fr.gouv.culture.oai
Class AbstractOAIHarvester

java.lang.Object
  extended by fr.gouv.culture.util.apache.avalon.excalibur.concurrent.Semaphore
      extended by fr.gouv.culture.util.apache.avalon.excalibur.concurrent.Mutex
          extended by fr.gouv.culture.util.apache.cocoon.xml.AbstractSynchronizedXMLProducer
              extended by fr.gouv.culture.util.apache.cocoon.xml.AbstractSynchronizedXMLPipe
                  extended by fr.gouv.culture.oai.SynchronizedOAIObjectImpl
                      extended by fr.gouv.culture.oai.AbstractOAIHarvester
All Implemented Interfaces:
EDU.oswego.cs.dl.util.concurrent.Sync, OAIHarvester, OAIObject, SynchronizedXMLConsumer, SynchronizedXMLPipe, SynchronizedXMLProducer, org.apache.avalon.excalibur.pool.Poolable, org.apache.avalon.excalibur.pool.Recyclable, org.apache.avalon.framework.configuration.Configurable, org.apache.avalon.framework.context.Contextualizable, org.apache.avalon.framework.logger.LogEnabled, org.apache.avalon.framework.service.Serviceable, org.apache.cocoon.xml.XMLPipe, org.apache.cocoon.xml.XMLProducer, org.apache.excalibur.xml.sax.XMLConsumer, org.apache.excalibur.xml.sax.XMLizable, org.xml.sax.ContentHandler, org.xml.sax.ext.LexicalHandler
Direct Known Subclasses:
AbstractDocumentBaseOAIHarvester

public abstract class AbstractOAIHarvester
extends SynchronizedOAIObjectImpl
implements OAIHarvester, org.apache.avalon.framework.service.Serviceable


Nested Class Summary
 
Nested classes/interfaces inherited from interface fr.gouv.culture.oai.OAIObject
OAIObject.Node
 
Field Summary
protected  java.lang.String[] adminEmails
          List of email address strings for administrators of this harvester
protected  boolean captureElemContent
          Flag for SAX event handling indicating that an element's content should be captured in the endElement method
protected  boolean captureRecord
          Flag for SAX event handling indicating that a record should be captured
protected  java.lang.String currentDatestamp
          The datestamp of the current record from the stream
protected  java.lang.String currentMetadtaUrlIdentifier
          Holds any value retrieved based on identifierName
protected  java.lang.String currentOaiIdentifier
          The OAI identifier of the current record from the stream
protected  java.lang.String currentOaiStatus
          The OAI status of the current record from the stream
protected  int cursor
          Cursor information from a request that uses resumption tokens to return an entire set in multiple parts
protected  boolean deleteRecord
          Flag for SAX event handling indicating that a record should be deleted
protected  java.lang.String errorCode
           
protected  org.apache.cocoon.xml.XMLConsumer firstXmlConsumer
          The first externally provided xml consumer.
protected  java.lang.String identifierName
          If an identifier name is provided, the harvester takes the value of the element with that name outside the OAI 2.0 namespace, assumes it is a valid URL identifier, retrieves the underlying XML document, and incorporates its XML content into the oai-record
protected  org.apache.avalon.framework.service.ServiceManager manager
          Service manager for the object
protected  java.lang.String newRequestUrl
          The new URL to resolve the next resumptionToken
static java.lang.String OAI_REPOSITORY_URL
           
static java.lang.String OAI_REQUEST_URL
           
protected  java.lang.String repoUrl
          The URL of the repository whose response is being received
protected  org.apache.avalon.framework.parameters.Parameters requestParams
          The parameters of the request whose response is being received
protected  java.lang.String requestUrl
          The URL of the request to be sent
protected  java.lang.String responseDate
          The datestamp of the response received from the repository
protected  java.lang.String resumptionToken
          The resumptionToken of the response received from the repository
protected  java.lang.StringBuffer sBuff
          Buffer for collecting data from the SAX stream
protected  java.lang.String userAgent
          User agent value to send with request
 
Fields inherited from class fr.gouv.culture.oai.SynchronizedOAIObjectImpl
_context, logger
 
Fields inherited from class fr.gouv.culture.util.apache.cocoon.xml.AbstractSynchronizedXMLProducer
synchronizedXmlConsumer
 
Fields inherited from interface fr.gouv.culture.oai.OAIObject
HTTP_HEADER_NAME_FROM, HTTP_HEADER_NAME_USER_AGENT, NUMBER_RECORDS_PER_RESPONSE, STRING_DATEFORMAT_GRANULARITY_DAY, STRING_DATEFORMAT_GRANULARITY_SECOND
 
Fields inherited from interface EDU.oswego.cs.dl.util.concurrent.Sync
ONE_CENTURY, ONE_DAY, ONE_HOUR, ONE_MINUTE, ONE_SECOND, ONE_WEEK, ONE_YEAR
 
Constructor Summary
AbstractOAIHarvester()
           
 
Method Summary
protected  void abortRecordCapture()
          Stops any record capture currently being executed and sends a flag to the called method telling it to delete any document saved to any media
protected abstract  void captureRecord()
          When a complete record is received, this method takes the necessary steps to save the record to any underlying media, or pre-media
protected abstract  void captureResourceFromUrlIdentifier()
          When a complete "underlying document" is received, this method takes the necessary steps to save the document to any underlying media, or pre-media
 void characters(char[] chars, int relation, int relation1)
          Receive notification of character data.
 void close()
          Close OAI harvester.
 void endElement(java.lang.String s, java.lang.String s1, java.lang.String s2)
          Receive notification of the end of an element.
 java.lang.String[] getAdminEmails()
          Retrieves the list of administrator email addresses
protected  org.apache.avalon.framework.parameters.Parameters getHarvestParameters()
          This method returns the parameters for the request sent by this harvester as well as the "repository url", "request url", and the "harvester admin email".
protected  void handleErrors(java.lang.String errorMsg)
          Logs error messages along with the request parameters that were sent to the repository and may have caused the error state
protected abstract  void handleResumptionToken()
          This method handles and reissues a new request using any resumption token received
protected abstract  void prepareRecordCapture()
          Prepares resources for capturing an OAI record
protected abstract  void prepareRecordForDeletion()
          After receiving a header@status="deleted" for a record, this method makes the necessary preparations to delete the record from the harvester
protected abstract  void prepareResourceFromUrlIdentifierCapture()
          Prepares resources for capturing the underlying document available via a URL described by the OAI record
 void receiveRequest(java.lang.String url)
          Internal receive request method that bypasses synchronization of this object, as it may already have been synchronized elsewhere in the processing.
 void receiveSynchronizedRequest(java.lang.String url)
          Receive an OAI request
 void receiveSynchronizedRequest(java.lang.String url, java.lang.String originalRequestUrl)
          Receive an OAI request as a URL.
 void recycle()
          Clears any consumers provided to this object
protected  void resetAllFields()
          Resets the necessary class fields
protected abstract  void resetRecordCaptureFields(boolean deleteDoc)
          Stops any record capture currently being executed, resets the corresponding class fields and potentially deletes any document saved to any media
protected  void resetResumptionToken()
           
protected abstract  void saveCriticalFields(boolean dataHarvested)
          If data has been harvested, this method saves all details of the harvest
 void service(org.apache.avalon.framework.service.ServiceManager serviceManager)
          Receives the service manager for this object
 void setAdminEmails(java.lang.String[] adminEmails)
          Establishes the list of administrator email addresses
 void setConsumer(org.apache.cocoon.xml.XMLConsumer consumer)
          Sets the consumer of this object's events and attempts to establish our firstXmlConsumer
 void setIdentifierName(java.lang.String name)
          Sets the identifierName field
protected abstract  boolean shouldHarvestDocument()
          Queries underlying data structures to determine whether the current OAI record should be harvested based on the state of the harvester (i.e. past harvests, presence or absence of the record in harvester data structures)
 void startElement(java.lang.String s, java.lang.String s1, java.lang.String s2, org.xml.sax.Attributes attributes)
          Receive notification of the beginning of an element.
protected abstract  void storeFailedHarvestData(java.lang.Exception e)
          This method stores information about a failed harvest request (an internal failure, not an error reported by the OAI repository), so that the valid request can be re-executed by the appropriate mechanism.
protected abstract  boolean storeHarvestedData()
          This method saves all harvested records to a particular medium
 void toSAX(org.xml.sax.ContentHandler contentHandler)
          Currently does nothing
 
Methods inherited from class fr.gouv.culture.oai.SynchronizedOAIObjectImpl
contextualize, enableLogging, getContext, sendElement, sendElementContent
 
Methods inherited from class fr.gouv.culture.util.apache.cocoon.xml.AbstractSynchronizedXMLPipe
acquireSynchronizedXMLConsumer, comment, endCDATA, endDocument, endDTD, endEntity, endPrefixMapping, ignorableWhitespace, processingInstruction, releaseSynchronizedXMLConsumer, setDocumentLocator, skippedEntity, startCDATA, startDocument, startDTD, startEntity, startPrefixMapping
 
Methods inherited from class fr.gouv.culture.util.apache.cocoon.xml.AbstractSynchronizedXMLProducer
setConsumer
 
Methods inherited from class fr.gouv.culture.util.apache.avalon.excalibur.concurrent.Mutex
acquired, isAcquired
 
Methods inherited from class fr.gouv.culture.util.apache.avalon.excalibur.concurrent.Semaphore
acquire, attempt, getTokens, release
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface fr.gouv.culture.oai.OAIHarvester
purgePastHarvestsData, sendPastHarvestsSummary, sendStoredHarvestingRequests
 
Methods inherited from interface org.apache.avalon.framework.logger.LogEnabled
enableLogging
 
Methods inherited from interface org.apache.avalon.framework.context.Contextualizable
contextualize
 
Methods inherited from interface org.xml.sax.ContentHandler
endDocument, endPrefixMapping, ignorableWhitespace, processingInstruction, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping
 
Methods inherited from interface org.xml.sax.ext.LexicalHandler
comment, endCDATA, endDTD, endEntity, startCDATA, startDTD, startEntity
 
Methods inherited from interface org.apache.avalon.framework.configuration.Configurable
configure
 
Methods inherited from interface fr.gouv.culture.util.apache.cocoon.xml.SynchronizedXMLProducer
acquired, setConsumer
 
Methods inherited from interface fr.gouv.culture.util.apache.cocoon.xml.SynchronizedXMLConsumer
acquired
 
Methods inherited from interface EDU.oswego.cs.dl.util.concurrent.Sync
acquire, attempt, release
 

Field Detail

OAI_REQUEST_URL

public static final java.lang.String OAI_REQUEST_URL
See Also:
Constant Field Values

OAI_REPOSITORY_URL

public static final java.lang.String OAI_REPOSITORY_URL
See Also:
Constant Field Values

manager

protected org.apache.avalon.framework.service.ServiceManager manager
Service manager for the object


adminEmails

protected java.lang.String[] adminEmails
List of email address strings for administrators of this harvester


userAgent

protected java.lang.String userAgent
User agent value to send with request


requestUrl

protected java.lang.String requestUrl
The URL of the request to be sent


newRequestUrl

protected java.lang.String newRequestUrl
The new URL to resolve the next resumptionToken


requestParams

protected org.apache.avalon.framework.parameters.Parameters requestParams
The parameters of the request whose response is being received


sBuff

protected java.lang.StringBuffer sBuff
Buffer for collecting data from the SAX stream


captureElemContent

protected boolean captureElemContent
Flag for SAX event handling indicating that an element's content should be captured in the endElement method


captureRecord

protected boolean captureRecord
Flag for SAX event handling indicating that a record should be captured


deleteRecord

protected boolean deleteRecord
Flag for SAX event handling indicating that a record should be deleted


repoUrl

protected java.lang.String repoUrl
The URL of the repository whose response is being received


responseDate

protected java.lang.String responseDate
The datestamp of the response received from the repository


resumptionToken

protected java.lang.String resumptionToken
The resumptionToken of the response received from the repository


cursor

protected int cursor
Cursor information from a request that uses resumption tokens to return an entire set in multiple parts


errorCode

protected java.lang.String errorCode

currentOaiIdentifier

protected java.lang.String currentOaiIdentifier
The OAI identifier of the current record from the stream


currentDatestamp

protected java.lang.String currentDatestamp
The datestamp of the current record from the stream


currentOaiStatus

protected java.lang.String currentOaiStatus
The OAI status of the current record from the stream


identifierName

protected java.lang.String identifierName
If an identifier name is provided, the harvester takes the value of the element with that name outside the OAI 2.0 namespace, assumes it is a valid URL identifier, retrieves the underlying XML document, and incorporates its XML content into the oai-record
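A minimal sketch of the extraction step described for identifierName, using only the JDK SAX API. It assumes identifierName is matched against an element's local name and that the OAI 2.0 namespace URI is http://www.openarchives.org/OAI/2.0/. UrlIdentifierExtractor and extract are hypothetical names, this is not the harvester's actual code, and fetching and inlining the referenced document is left out.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: returns the text of the first element named identifierName outside the OAI 2.0 namespace. */
class UrlIdentifierExtractor extends DefaultHandler {
    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    private final String identifierName;
    private final StringBuilder buffer = new StringBuilder();
    private boolean capturing;
    private String value; // only the first match is kept

    UrlIdentifierExtractor(String identifierName) {
        this.identifierName = identifierName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Skip elements in the OAI namespace, e.g. header/identifier.
        if (value == null && identifierName.equals(localName) && !OAI_NS.equals(uri)) {
            capturing = true;
            buffer.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (capturing) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (capturing && identifierName.equals(localName)) {
            value = buffer.toString().trim();
            capturing = false;
        }
    }

    static String extract(String xml, String identifierName) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // required so uri and localName are reported
            SAXParser parser = factory.newSAXParser();
            UrlIdentifierExtractor handler = new UrlIdentifierExtractor(identifierName);
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.value;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```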


currentMetadtaUrlIdentifier

protected java.lang.String currentMetadtaUrlIdentifier
Holds any value retrieved based on identifierName

See Also:
identifierName

firstXmlConsumer

protected org.apache.cocoon.xml.XMLConsumer firstXmlConsumer
The first externally provided XML consumer. Because serializing the SAX stream requires the consumer of this object to be changed dynamically, we keep a reference to the first externally provided consumer so that we can continue to supply it with SAX events.

Constructor Detail

AbstractOAIHarvester

public AbstractOAIHarvester()
Method Detail

service

public void service(org.apache.avalon.framework.service.ServiceManager serviceManager)
             throws org.apache.avalon.framework.service.ServiceException
Receives the service manager for this object

Specified by:
service in interface org.apache.avalon.framework.service.Serviceable
Parameters:
serviceManager -
Throws:
org.apache.avalon.framework.service.ServiceException

setConsumer

public void setConsumer(org.apache.cocoon.xml.XMLConsumer consumer)
Sets the consumer of this object's events and attempts to establish our firstXmlConsumer

Specified by:
setConsumer in interface org.apache.cocoon.xml.XMLProducer
Overrides:
setConsumer in class AbstractSynchronizedXMLProducer
Parameters:
consumer -
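The firstXmlConsumer bookkeeping can be modelled in plain Java. This sketch assumes that the first non-null consumer passed to setConsumer is the externally provided one and that later calls only redirect events; java.util.function.Consumer stands in for org.apache.cocoon.xml.XMLConsumer, and ConsumerHolder and sendToFirst are hypothetical names.

```java
import java.util.function.Consumer;

/** Hypothetical model: E stands for a SAX event, Consumer<E> for an XMLConsumer. */
class ConsumerHolder<E> {
    private Consumer<E> consumer;      // current target of events; may be redirected
    private Consumer<E> firstConsumer; // first externally provided consumer, kept for later events

    void setConsumer(Consumer<E> c) {
        consumer = c;
        // Only the first non-null consumer is remembered; later redirections leave it untouched.
        if (firstConsumer == null) {
            firstConsumer = c;
        }
    }

    /** Sends an event to the first external consumer, whatever the current redirection is. */
    void sendToFirst(E event) {
        if (firstConsumer != null) {
            firstConsumer.accept(event);
        }
    }

    Consumer<E> getConsumer() {
        return consumer;
    }

    Consumer<E> getFirstConsumer() {
        return firstConsumer;
    }

    /** Mirrors recycle(): clears every consumer provided to this object. */
    void recycle() {
        consumer = null;
        firstConsumer = null;
    }
}
```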

getAdminEmails

public java.lang.String[] getAdminEmails()
Retrieves the list of administrator email addresses

Specified by:
getAdminEmails in interface OAIHarvester
Returns:
String[]

setAdminEmails

public void setAdminEmails(java.lang.String[] adminEmails)
Establishes the list of administrator email addresses

Specified by:
setAdminEmails in interface OAIHarvester

setIdentifierName

public void setIdentifierName(java.lang.String name)
Sets the identifierName field

Specified by:
setIdentifierName in interface OAIHarvester
Parameters:
name -
See Also:
identifierName

toSAX

public void toSAX(org.xml.sax.ContentHandler contentHandler)
           throws org.xml.sax.SAXException
Currently does nothing

Specified by:
toSAX in interface org.apache.excalibur.xml.sax.XMLizable
Parameters:
contentHandler -
Throws:
org.xml.sax.SAXException

startElement

public void startElement(java.lang.String s,
                         java.lang.String s1,
                         java.lang.String s2,
                         org.xml.sax.Attributes attributes)
                  throws org.xml.sax.SAXException
Description copied from class: AbstractSynchronizedXMLPipe
Receive notification of the beginning of an element.

Specified by:
startElement in interface org.xml.sax.ContentHandler
Overrides:
startElement in class SynchronizedOAIObjectImpl
Parameters:
s - The Namespace URI, or the empty string if the element has no Namespace URI or if Namespace processing is not being performed.
s1 - The local name (without prefix), or the empty string if Namespace processing is not being performed.
s2 - The raw XML 1.0 name (with prefix), or the empty string if raw names are not available.
attributes - The attributes attached to the element. If there are no attributes, it shall be an empty Attributes object.
Throws:
org.xml.sax.SAXException

characters

public void characters(char[] chars,
                       int relation,
                       int relation1)
                throws org.xml.sax.SAXException
Description copied from class: AbstractSynchronizedXMLPipe
Receive notification of character data.

Specified by:
characters in interface org.xml.sax.ContentHandler
Overrides:
characters in class AbstractSynchronizedXMLPipe
Parameters:
chars - The characters from the XML document.
relation - The start position in the array.
relation1 - The number of characters to read from the array.
Throws:
org.xml.sax.SAXException

endElement

public void endElement(java.lang.String s,
                       java.lang.String s1,
                       java.lang.String s2)
                throws org.xml.sax.SAXException
Description copied from class: AbstractSynchronizedXMLPipe
Receive notification of the end of an element.

Specified by:
endElement in interface org.xml.sax.ContentHandler
Overrides:
endElement in class AbstractSynchronizedXMLPipe
Parameters:
s - The Namespace URI, or the empty string if the element has no Namespace URI or if Namespace processing is not being performed.
s1 - The local name (without prefix), or the empty string if Namespace processing is not being performed.
s2 - The raw XML 1.0 name (with prefix), or the empty string if raw names are not available.
Throws:
org.xml.sax.SAXException
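The interplay between startElement, characters and endElement, the sBuff buffer and the captureElemContent flag can be sketched with a plain JDK SAX handler that reads each record header's identifier, datestamp and status attribute. It assumes OAI-PMH 2.0 response markup; HeaderCaptureSketch and its Header holder are hypothetical, and the real handler also forwards events to its consumer, which is omitted here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler showing the capture-flag-plus-buffer pattern on OAI-PMH headers. */
class HeaderCaptureSketch extends DefaultHandler {
    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    /** Values the harvester keeps in currentOaiIdentifier, currentDatestamp and currentOaiStatus. */
    static final class Header {
        String identifier;
        String datestamp;
        String status; // null unless the repository sent status="deleted"

        boolean isDeleted() {
            return "deleted".equals(status);
        }
    }

    final List<Header> headers = new ArrayList<>();
    private final StringBuilder sBuff = new StringBuilder();
    private boolean captureElemContent;
    private Header current;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!OAI_NS.equals(uri)) {
            return;
        }
        if ("header".equals(localName)) {
            current = new Header();
            current.status = atts.getValue("", "status"); // unprefixed attribute
        } else if (current != null && ("identifier".equals(localName) || "datestamp".equals(localName))) {
            captureElemContent = true; // start collecting text for this element
            sBuff.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may split text across several calls, so append rather than assign.
        if (captureElemContent) {
            sBuff.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (!OAI_NS.equals(uri) || current == null) {
            return;
        }
        if ("identifier".equals(localName)) {
            current.identifier = sBuff.toString().trim();
        } else if ("datestamp".equals(localName)) {
            current.datestamp = sBuff.toString().trim();
        } else if ("header".equals(localName)) {
            headers.add(current);
            current = null;
        }
        captureElemContent = false;
    }

    static List<Header> parse(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            HeaderCaptureSketch handler = new HeaderCaptureSketch();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.headers;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```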

abortRecordCapture

protected void abortRecordCapture()
Stops any record capture currently being executed and sends a flag to the called method telling it to delete any document saved to any media


handleErrors

protected void handleErrors(java.lang.String errorMsg)
Logs error messages along with the request parameters that were sent to the repository and may have caused the error state

Parameters:
errorMsg -

prepareRecordCapture

protected abstract void prepareRecordCapture()
                                      throws org.xml.sax.SAXException
Prepares resources for capturing an OAI record

Throws:
org.xml.sax.SAXException

shouldHarvestDocument

protected abstract boolean shouldHarvestDocument()
Queries underlying data structures to determine whether the current OAI record should be harvested based on the state of the harvester (i.e. past harvests, presence or absence of the record in harvester data structures)

Returns:
boolean
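One possible decision rule, sketched under stated assumptions: harvest a record when it is unknown to the harvester or when its datestamp is later than the one stored from a past harvest. The in-memory map stands in for the harvester's data structures, and HarvestDecision and remember are hypothetical names. Plain string comparison is only valid when both datestamps share the same OAI-PMH granularity (YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ), because those fixed-width UTC formats then sort chronologically.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the harvester's record of past harvests. */
class HarvestDecision {
    // OAI identifier -> datestamp recorded when the record was last harvested
    private final Map<String, String> storedDatestamps = new HashMap<>();

    void remember(String oaiIdentifier, String datestamp) {
        storedDatestamps.put(oaiIdentifier, datestamp);
    }

    /**
     * Returns true when the record is unknown or newer than the stored copy.
     * Assumes both datestamps use the same granularity, so lexical order equals chronological order.
     */
    boolean shouldHarvestDocument(String oaiIdentifier, String datestamp) {
        String previous = storedDatestamps.get(oaiIdentifier);
        return previous == null || datestamp.compareTo(previous) > 0;
    }
}
```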

captureRecord

protected abstract void captureRecord()
                               throws java.lang.Exception
When a complete record is received, this method takes the necessary steps to save the record to any underlying media, or pre-media

Throws:
java.lang.Exception

prepareRecordForDeletion

protected abstract void prepareRecordForDeletion()
After receiving a header@status="deleted" for a record, this method makes the necessary preparations to delete the record from the harvester


prepareResourceFromUrlIdentifierCapture

protected abstract void prepareResourceFromUrlIdentifierCapture()
Prepares resources for capturing the underlying document available via a URL described by the OAI record

See Also:
currentMetadtaUrlIdentifier, identifierName

captureResourceFromUrlIdentifier

protected abstract void captureResourceFromUrlIdentifier()
When a complete "underlying document" is received, this method takes the necessary steps to save the document to any underlying media, or pre-media

See Also:
currentMetadtaUrlIdentifier, identifierName

storeHarvestedData

protected abstract boolean storeHarvestedData()
                                       throws java.lang.Exception
This method saves all harvested records to a particular medium

Throws:
java.lang.Exception

storeFailedHarvestData

protected abstract void storeFailedHarvestData(java.lang.Exception e)
This method stores information about a failed harvest request (an internal failure, not an error reported by the OAI repository), so that the valid request can be re-executed by the appropriate mechanism.
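A minimal in-memory sketch of the store-and-replay idea. The documented method receives only the exception; this sketch also passes the request URL explicitly, which the real implementation presumably takes from its own fields (an assumption). FailedHarvestStore, FailedRequest and drain are hypothetical names, and a real harvester would persist this data so that a later mechanism can re-execute the requests.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical in-memory store for requests that failed for internal reasons. */
class FailedHarvestStore {
    /** A failed request, kept verbatim so it can be re-executed later. */
    static final class FailedRequest {
        final String requestUrl;
        final String reason;

        FailedRequest(String requestUrl, String reason) {
            this.requestUrl = requestUrl;
            this.reason = reason;
        }
    }

    private final Deque<FailedRequest> pending = new ArrayDeque<>();

    /** Unlike the documented method, the request URL is passed in explicitly here. */
    void storeFailedHarvestData(String requestUrl, Exception e) {
        pending.addLast(new FailedRequest(requestUrl, String.valueOf(e.getMessage())));
    }

    /** Returns the stored requests in the order they failed and empties the store. */
    List<FailedRequest> drain() {
        List<FailedRequest> out = new ArrayList<>(pending);
        pending.clear();
        return out;
    }
}
```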


handleResumptionToken

protected abstract void handleResumptionToken()
This method handles and reissues a new request using any resumption token received
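Under OAI-PMH 2.0, resumptionToken is an exclusive argument: the follow-up request carries only the verb and the token, and an empty resumptionToken element marks the final part of the list. A minimal sketch of building the next request URL (the value the newRequestUrl field would hold) under those rules; ResumptionRequests and nextRequestUrl are hypothetical names, the base URL is assumed to have no query string, and the URLEncoder overload used requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for following OAI-PMH 2.0 resumption tokens. */
class ResumptionRequests {
    /**
     * Builds the follow-up request, or returns null when the list is complete.
     * Assumes baseUrl has no query string of its own.
     */
    static String nextRequestUrl(String baseUrl, String verb, String resumptionToken) {
        // An empty (or absent) token means the previous response was the last part.
        if (resumptionToken == null || resumptionToken.trim().isEmpty()) {
            return null;
        }
        // resumptionToken is exclusive: only the verb accompanies it.
        return baseUrl + "?verb=" + verb
                + "&resumptionToken=" + URLEncoder.encode(resumptionToken.trim(), StandardCharsets.UTF_8);
    }
}
```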


saveCriticalFields

protected abstract void saveCriticalFields(boolean dataHarvested)
                                    throws org.xml.sax.SAXException
If data has been harvested, this method saves all details of the harvest

Parameters:
dataHarvested - boolean indicating data was harvested
Throws:
org.xml.sax.SAXException

resetRecordCaptureFields

protected abstract void resetRecordCaptureFields(boolean deleteDoc)
Stops any record capture currently being executed, resets the corresponding class fields and potentially deletes any document saved to any media

Parameters:
deleteDoc - whether any document already saved to any media should be deleted
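A file-based sketch of the capture lifecycle, assuming records are written to temporary files: prepareRecordCapture opens a file, abortRecordCapture delegates to resetRecordCaptureFields(true), and deleteDoc decides whether the partial document is removed. The temp-file storage, the write and getCurrentFile helpers, and the FileRecordCapture class name are assumptions; the documented subclasses may store records quite differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical temp-file implementation of the record capture hooks. */
class FileRecordCapture {
    private Path currentFile; // document being written for the current record
    private Writer writer;

    void prepareRecordCapture() {
        try {
            currentFile = Files.createTempFile("oai-record-", ".xml");
            writer = Files.newBufferedWriter(currentFile, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void write(String xmlFragment) {
        try {
            writer.write(xmlFragment);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    Path getCurrentFile() {
        return currentFile;
    }

    /** Stops the capture and asks for the partial document to be deleted. */
    void abortRecordCapture() {
        resetRecordCaptureFields(true);
    }

    /** Closes the writer, deletes the document if requested, and clears the capture fields. */
    void resetRecordCaptureFields(boolean deleteDoc) {
        try {
            if (writer != null) {
                writer.close();
            }
            if (deleteDoc && currentFile != null) {
                Files.deleteIfExists(currentFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer = null;
            currentFile = null;
        }
    }
}
```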

resetAllFields

protected void resetAllFields()
Resets the necessary class fields


recycle

public void recycle()
Clears any consumers provided to this object

Specified by:
recycle in interface org.apache.avalon.excalibur.pool.Recyclable
Overrides:
recycle in class AbstractSynchronizedXMLProducer

resetResumptionToken

protected void resetResumptionToken()

receiveSynchronizedRequest

public void receiveSynchronizedRequest(java.lang.String url)
Receive an OAI request

Specified by:
receiveSynchronizedRequest in interface OAIHarvester

receiveSynchronizedRequest

public void receiveSynchronizedRequest(java.lang.String url,
                                       java.lang.String originalRequestUrl)
Receive an OAI request as a URL. The original request may have been changed if we are updating; in that case, we only want to retrieve the latest documents.

Specified by:
receiveSynchronizedRequest in interface OAIHarvester
Parameters:
url - the URL that represents the request
originalRequestUrl - the original request URL
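When updating, a harvester can narrow the original request with the OAI-PMH from argument so that only records changed since the last harvest are returned. The sketch below shows one way to rewrite the request URL; it assumes this is how the "latest documents" are selected and that lastHarvestDate is already formatted at the repository's granularity. IncrementalRequest and withFrom are hypothetical names.

```java
/** Hypothetical helper for incremental (selective) harvesting. */
class IncrementalRequest {
    /**
     * Returns requestUrl restricted with from=lastHarvestDate, replacing any existing from argument.
     * lastHarvestDate is assumed to be already formatted at the repository's granularity.
     */
    static String withFrom(String requestUrl, String lastHarvestDate) {
        if (lastHarvestDate == null) {
            return requestUrl; // no previous harvest: keep the original request
        }
        int q = requestUrl.indexOf('?');
        String base = q < 0 ? requestUrl : requestUrl.substring(0, q);
        StringBuilder query = new StringBuilder();
        if (q >= 0) {
            for (String pair : requestUrl.substring(q + 1).split("&")) {
                if (pair.isEmpty() || pair.startsWith("from=")) {
                    continue; // drop a stale from argument
                }
                if (query.length() > 0) {
                    query.append('&');
                }
                query.append(pair);
            }
        }
        if (query.length() > 0) {
            query.append('&');
        }
        query.append("from=").append(lastHarvestDate);
        return base + "?" + query;
    }
}
```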

receiveRequest

public void receiveRequest(java.lang.String url)
Internal receive request method that bypasses synchronization of this object, as it may already have been synchronized elsewhere in the processing. It is useful when handling resumption tokens.

Specified by:
receiveRequest in interface OAIHarvester
Parameters:
url -
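The reason for a separate, unsynchronized entry point can be modelled with a one-permit java.util.concurrent.Semaphore, which is not reentrant (the Mutex this class extends is built on a Semaphore and presumably behaves the same way): if the code that follows a resumption token called the synchronized method again, it would wait for a permit its own thread already holds. In this hypothetical sketch, nextTokens replaces real repository responses, and LockedHarvestModel and its members are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

/** Hypothetical model of a synchronized entry point plus an unsynchronized internal method. */
class LockedHarvestModel {
    private final Semaphore mutex = new Semaphore(1); // one permit: a second acquire by the same thread blocks
    private final List<String> nextTokens;            // canned token returned by each response; "" ends the list
    final List<String> requested = new ArrayList<>();

    LockedHarvestModel(List<String> nextTokens) {
        this.nextTokens = nextTokens;
    }

    /** Public entry point: takes the lock once for the whole harvest. */
    void receiveSynchronizedRequest(String url) {
        mutex.acquireUninterruptibly();
        try {
            receiveRequest(url);
        } finally {
            mutex.release();
        }
    }

    /** Caller must already hold the lock; calling receiveSynchronizedRequest here would deadlock. */
    void receiveRequest(String url) {
        requested.add(url);
        String token = nextTokens.get(requested.size() - 1);
        if (!token.isEmpty()) {
            receiveRequest("?verb=ListRecords&resumptionToken=" + token);
        }
    }

    boolean isLocked() {
        return mutex.availablePermits() == 0;
    }
}
```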

getHarvestParameters

protected org.apache.avalon.framework.parameters.Parameters getHarvestParameters()
This method returns the parameters for the request sent by this harvester as well as the "repository url", "request url", and the "harvester admin email". This info will be useful when harvested records are processed.

Returns:
Parameters

close

public void close()
Close OAI harvester.



Copyright © 2000-2010 Ministere de la culture et de la communication / AJLSM. All Rights Reserved.