fr.gouv.culture.sdx.oai
Class AbstractDocumentBaseOAIHarvester

java.lang.Object
  extended by fr.gouv.culture.util.apache.avalon.excalibur.concurrent.Semaphore
      extended by fr.gouv.culture.util.apache.avalon.excalibur.concurrent.Mutex
          extended by fr.gouv.culture.util.apache.cocoon.xml.AbstractSynchronizedXMLProducer
              extended by fr.gouv.culture.util.apache.cocoon.xml.AbstractSynchronizedXMLPipe
                  extended by fr.gouv.culture.oai.SynchronizedOAIObjectImpl
                      extended by fr.gouv.culture.oai.AbstractOAIHarvester
                          extended by fr.gouv.culture.sdx.oai.AbstractDocumentBaseOAIHarvester
All Implemented Interfaces:
EDU.oswego.cs.dl.util.concurrent.Sync, OAIHarvester, OAIObject, DocumentBaseOAIHarvester, Saveable, Target, SynchronizedXMLConsumer, SynchronizedXMLPipe, SynchronizedXMLProducer, org.apache.avalon.excalibur.pool.Poolable, org.apache.avalon.excalibur.pool.Recyclable, org.apache.avalon.framework.configuration.Configurable, org.apache.avalon.framework.context.Contextualizable, org.apache.avalon.framework.logger.LogEnabled, org.apache.avalon.framework.service.Serviceable, org.apache.cocoon.xml.XMLPipe, org.apache.cocoon.xml.XMLProducer, org.apache.excalibur.xml.sax.XMLConsumer, org.apache.excalibur.xml.sax.XMLizable, org.xml.sax.ContentHandler, org.xml.sax.ext.LexicalHandler
Direct Known Subclasses:
LuceneDocumentBaseOAIHarvester

public abstract class AbstractDocumentBaseOAIHarvester
extends AbstractOAIHarvester
implements DocumentBaseOAIHarvester


Nested Class Summary
 
Nested classes/interfaces inherited from interface fr.gouv.culture.sdx.oai.DocumentBaseOAIHarvester
DocumentBaseOAIHarvester.ConfigurationNode
 
Nested classes/interfaces inherited from interface fr.gouv.culture.oai.OAIObject
OAIObject.Node
 
Field Summary
protected  Database _database
          Underlying database to store any info
protected  org.apache.cocoon.serialization.XMLSerializer cBytes
           
protected  java.lang.String defaultTransformerFactory
           
protected  java.lang.String defaultTransformerIndent
           
protected  java.util.ArrayList deletedDocs
           
protected  DocumentBase docbase
          The underlying document base
protected  java.lang.String docbaseId
          Id of the underlying document base
protected static java.lang.String ERROR_CODE
           
protected  java.io.FileOutputStream fileOs
           
protected  java.util.Hashtable filesProperties
          List OAI files with OAI properties
protected  boolean forceIndexOnHarvestError
          Force indexation on harvest error option Default: false.
protected static java.lang.String FORCEINDEXONHARVESTERROR
           
protected  java.io.File harvestDoc
           
protected  IDGenerator harvesterIdGen
          IDGenerator for this object
protected  boolean indexAtHarvestEnd
          Indexation at the end of harvesting option Default: true.
protected static java.lang.String INDEXATHARVESTEND
           
protected  boolean keepDeletedRecords
           
protected  boolean keepHarvestedRecords
          Forces the harvester to keep the harvested records (XML files) on the server file system (default: false).
protected  java.util.Set m_docsaddedIds
           
protected  java.util.Set m_docsdeletedids
           
protected  java.util.Set m_docsToDeleteIds
           
protected static java.lang.String NO_DOCS_DELETED
           
protected static java.lang.String NO_DOCS_HARVESTED
           
protected  int noDocsDeleted
           
protected  int noHarvestedDocs
           
protected  int noRecordsPerBatch
           
protected static java.lang.String OAI_FAILED_HARVEST
           
protected static java.lang.String OAI_FROM
           
protected static java.lang.String OAI_HARVEST_ID
           
protected static java.lang.String OAI_HARVESTER_LAST_UPDATED
           
protected static java.lang.String OAI_HARVESTER_RESUMPTION_TOKEN
           
protected static java.lang.String OAI_IDENTIFIER
           
protected static java.lang.String OAI_METADATA_PREFIX
           
protected static java.lang.String OAI_SET
           
protected static java.lang.String OAI_UNTIL
           
protected static java.lang.String OAI_VERB
           
protected  org.apache.cocoon.xml.XMLPipe oaiStripper
           
protected  Pipeline pipe
          Pre-indexation pipeline
protected  TimeScheduler scheduler
          Time scheduler for stored requests
protected  java.util.Hashtable storedRequests
          Requests in application.xconf
protected  java.util.Hashtable storeRepositoriesRefs
          References to the underlying documentbase's/application's repositories
protected  java.io.File tempDir
           
protected  java.io.File tempDirBatch
           
protected  java.lang.String tempDirPath
          Directory to store harvested documents: temporary path of the directory where the harvested documents will be stored.
protected  java.lang.String TEMPFILE_SUFFIX
           
protected static java.lang.String TRANSFORMER_FACTORY
           
protected static java.lang.String TRANSFORMER_INDENT
           
protected  java.lang.String transformerFactory
          XML Transformer factory class name.
protected  java.lang.String transformerIndent
          XML Transformer indent option.
protected  XMLDocument urlResource
           
 
Fields inherited from class fr.gouv.culture.oai.AbstractOAIHarvester
adminEmails, captureElemContent, captureRecord, currentDatestamp, currentMetadtaUrlIdentifier, currentOaiIdentifier, currentOaiStatus, cursor, deleteRecord, errorCode, firstXmlConsumer, identifierName, manager, newRequestUrl, OAI_REPOSITORY_URL, OAI_REQUEST_URL, repoUrl, requestParams, requestUrl, responseDate, resumptionToken, sBuff, userAgent
 
Fields inherited from class fr.gouv.culture.oai.SynchronizedOAIObjectImpl
_context, logger
 
Fields inherited from class fr.gouv.culture.util.apache.cocoon.xml.AbstractSynchronizedXMLProducer
synchronizedXmlConsumer
 
Fields inherited from interface fr.gouv.culture.oai.OAIObject
HTTP_HEADER_NAME_FROM, HTTP_HEADER_NAME_USER_AGENT, NUMBER_RECORDS_PER_RESPONSE, STRING_DATEFORMAT_GRANULARITY_DAY, STRING_DATEFORMAT_GRANULARITY_SECOND
 
Fields inherited from interface EDU.oswego.cs.dl.util.concurrent.Sync
ONE_CENTURY, ONE_DAY, ONE_HOUR, ONE_MINUTE, ONE_SECOND, ONE_WEEK, ONE_YEAR
 
Fields inherited from interface fr.gouv.culture.sdx.utils.save.Saveable
ALL_SAVE_ATTRIB, PATH_ATTRIB, SAVE_DIRECTORY_PARAM
 
Constructor Summary
AbstractDocumentBaseOAIHarvester(DocumentBase base)
          Basic constructor
 
Method Summary
 void backup(SaveParameters save_config)
          Save the timeStamp of the Harvester
protected  void captureRecord()
          Ends the capture of an oai record.
protected  void captureResourceFromUrlIdentifier()
          Captures the xml from a url taken from an oai record and adds it to the oai-record as a sibling of the element
 boolean checkGranularity(java.lang.String granularity)
          Checks the granularity of an OAI provider: YYYY-MM-DDThh:mm:ssZ or YYYY-MM-DD
 void close()
          Close OAI harvester.
 void configure(org.apache.avalon.framework.configuration.Configuration configuration)
          OAI harvester configuration. Configures the OAI harvester by reading the application.xconf file, which may contain a section such as: <sdx:documentBase [...]> <sdx:oai-harvester adminEmail="{some.body@some.where}" keepDeletedRecords="{true|false}" noRecordsPerBatch="{number}" transformer-factory="{Transformer factory class name}" transformer-indent="{yes|no}" keepHarvestedRecords="{true|false}" tempDirPath="{directory path}"> <sdx:oai-data-providers> <sdx:oai-repository [...]>[...]
protected  void configureAdminEmails(org.apache.avalon.framework.configuration.Configuration configuration)
          Configures a list of admin emails, which can be given as sub-elements, a single attribute, or both
protected  void configureDatabase(org.apache.avalon.framework.configuration.Configuration configuration)
          Configures the internal database
protected  void configureDataProviders(org.apache.avalon.framework.configuration.Configuration configuration)
          Configures data provider information that can be reused and from which requests can be executed automatically
protected  void configureHarvestIDGenerator(org.apache.avalon.framework.configuration.Configuration configuration)
          Configures the id generator for harvests
protected  void configurePipeline(org.apache.avalon.framework.configuration.Configuration configuration)
          Configures the pre-indexation pipeline
protected  void configureStoreRepositories(java.lang.String repoUrl, org.apache.avalon.framework.configuration.Configuration oaiRepoConf)
          Configures the repositories to which data will be stored based upon their repository url
protected  void configureTempDir(org.apache.avalon.framework.configuration.Configuration conf)
          Configures the temporary directory where harvested documents will be stored in sub-directories.
protected  void configureUpdateTriggers(java.lang.String requestUrl, org.apache.avalon.framework.configuration.Configuration updateConf)
          Configures time triggers for stored requests
protected  void deleteOAIDocuments()
          Delete OAI documents from the current document base.
protected  void deleteTempDir()
          Deletes the directory represented by the tempDir class field
protected  void deleteTempDirBatch()
          Deletes the directory represented by the tempDirBatch class field
 void endElement(java.lang.String s, java.lang.String s1, java.lang.String s2)
          Receive notification of the end of an element.
protected  void endHarvest()
          Ends the harvest
protected  java.lang.String generateNewHarvestId()
          Generates an id to associate with a harvest
protected  java.lang.String getHarvesterId()
          Returns an id for this harvester based upon the underlying document base id
protected  IndexParameters getIndexParameters()
          Builds simple index parameters for indexation of OAI records into the underlying document base
protected  java.lang.String getIsoDate()
          Gets the current date in ISO 8601 format
protected  java.io.File getNewTempDirBatch()
          Creates a new temporary directory for writing harvested records before they are indexed
protected  void handleResumptionToken()
          Handles the resumption token by issuing another request based upon the request from which the resumption token was received.
protected  void initTempDir()
          Establishes the tempDirBatch class field
protected  boolean isStartsIndexation()
           
 java.util.Date lastUpdated()
          Retrieves the time when the harvester was last updated
protected  void prepareRecordCapture()
          Sets up resources to capture an oai record
protected  void prepareRecordForDeletion()
          Sets up resources to delete an OAI record: adds the record to the list of records to be removed
protected  void prepareResourceFromUrlIdentifierCapture()
          Prepares to read a url value from an oai record and retrieve the XML behind.
 void purgePastHarvestsData()
          Destroys all summary data pertaining to past harvests but not the actual oai records harvested
protected  void resetAllFields()
          Resets necessary class fields
protected  void resetRecordCaptureFields(boolean deleteDoc)
          Resets the class fields for record capture, possibly deleting the file underlying the current harvestDoc object
 void restore(SaveParameters save_config)
          Restore the timeStamp of the Harvester
protected  void saveCriticalFields(boolean dataHarvested)
          Saves critical data about a harvest
 void sendPastHarvestsSummary()
          Sends SAX events to the current consumer with summary details of all past harvests
 void sendStoredHarvestingRequests()
          Sends the details of stored harvesting requests to the current consumer
protected  boolean shouldHarvestDocument()
          Queries the underlying data structures, based on the current SAX flow position and class fields, to determine whether an OAI record should be harvested
 void startElement(java.lang.String s, java.lang.String s1, java.lang.String s2, org.xml.sax.Attributes attributes)
          Receive notification of the beginning of an element.
protected  void storeFailedHarvestData(java.lang.Exception e)
          Stores data about harvesting failures caused by problems other than oai errors sent from a queried repository
protected  boolean storeHarvestedData()
          Reads the documents from tempDirBatch and indexes them in the corresponding document base; any marked deletions are carried out as well
 void targetTriggered(java.lang.String triggerName)
          Triggers an OAI request to a repository based upon a trigger name (also a request url)
 
Methods inherited from class fr.gouv.culture.oai.AbstractOAIHarvester
abortRecordCapture, characters, getAdminEmails, getHarvestParameters, handleErrors, receiveRequest, receiveSynchronizedRequest, receiveSynchronizedRequest, recycle, resetResumptionToken, service, setAdminEmails, setConsumer, setIdentifierName, toSAX
 
Methods inherited from class fr.gouv.culture.oai.SynchronizedOAIObjectImpl
contextualize, enableLogging, getContext, sendElement, sendElementContent
 
Methods inherited from class fr.gouv.culture.util.apache.cocoon.xml.AbstractSynchronizedXMLPipe
acquireSynchronizedXMLConsumer, comment, endCDATA, endDocument, endDTD, endEntity, endPrefixMapping, ignorableWhitespace, processingInstruction, releaseSynchronizedXMLConsumer, setDocumentLocator, skippedEntity, startCDATA, startDocument, startDTD, startEntity, startPrefixMapping
 
Methods inherited from class fr.gouv.culture.util.apache.cocoon.xml.AbstractSynchronizedXMLProducer
setConsumer
 
Methods inherited from class fr.gouv.culture.util.apache.avalon.excalibur.concurrent.Mutex
acquired, isAcquired
 
Methods inherited from class fr.gouv.culture.util.apache.avalon.excalibur.concurrent.Semaphore
acquire, attempt, getTokens, release
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface fr.gouv.culture.oai.OAIHarvester
getAdminEmails, receiveRequest, receiveSynchronizedRequest, receiveSynchronizedRequest, setAdminEmails, setIdentifierName
 
Methods inherited from interface org.apache.avalon.framework.logger.LogEnabled
enableLogging
 
Methods inherited from interface org.apache.avalon.framework.context.Contextualizable
contextualize
 
Methods inherited from interface org.apache.excalibur.xml.sax.XMLizable
toSAX
 
Methods inherited from interface org.xml.sax.ContentHandler
characters, endDocument, endPrefixMapping, ignorableWhitespace, processingInstruction, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping
 
Methods inherited from interface org.xml.sax.ext.LexicalHandler
comment, endCDATA, endDTD, endEntity, startCDATA, startDTD, startEntity
 
Methods inherited from interface fr.gouv.culture.util.apache.cocoon.xml.SynchronizedXMLProducer
acquired, setConsumer
 
Methods inherited from interface org.apache.cocoon.xml.XMLProducer
setConsumer
 
Methods inherited from interface fr.gouv.culture.util.apache.cocoon.xml.SynchronizedXMLConsumer
acquired
 
Methods inherited from interface EDU.oswego.cs.dl.util.concurrent.Sync
acquire, attempt, release
 

Field Detail

docbase

protected DocumentBase docbase
The underlying document base


docbaseId

protected java.lang.String docbaseId
Id of the underlying document base


pipe

protected Pipeline pipe
Pre-indexation pipeline


_database

protected Database _database
Underlying database to store any info


storedRequests

protected java.util.Hashtable storedRequests
Requests in application.xconf


storeRepositoriesRefs

protected java.util.Hashtable storeRepositoriesRefs
References to the underlying documentbase's/application's repositories


scheduler

protected TimeScheduler scheduler
Time scheduler for stored requests


harvesterIdGen

protected IDGenerator harvesterIdGen
IDGenerator for this object


TEMPFILE_SUFFIX

protected java.lang.String TEMPFILE_SUFFIX

tempDir

protected java.io.File tempDir

tempDirBatch

protected java.io.File tempDirBatch

harvestDoc

protected java.io.File harvestDoc

fileOs

protected java.io.FileOutputStream fileOs

urlResource

protected XMLDocument urlResource

deletedDocs

protected java.util.ArrayList deletedDocs

noHarvestedDocs

protected int noHarvestedDocs

noDocsDeleted

protected int noDocsDeleted

m_docsaddedIds

protected java.util.Set m_docsaddedIds

m_docsToDeleteIds

protected java.util.Set m_docsToDeleteIds

m_docsdeletedids

protected java.util.Set m_docsdeletedids

keepDeletedRecords

protected boolean keepDeletedRecords

noRecordsPerBatch

protected int noRecordsPerBatch

keepHarvestedRecords

protected boolean keepHarvestedRecords
Force harvester to keep harvested records (default: false)

Forces the harvester to keep the harvested records (XML files) on the server file system.
Default is false.
This can be changed in the document base configuration file: <oai-harvester keepHarvestedRecords="{true|false}" [...]>


tempDirPath

protected java.lang.String tempDirPath
Directory to store harvested documents

Temporary path of the directory where the harvested documents will be stored.
Default is the servlet context temp dir (e.g., $TOMCAT/work/...). If this directory is not writable, the harvester uses the temporary directory of the JVM (i.e., the java.io.tmpdir system property).
This can be changed in the document base configuration file:

 <oai-harvester tempDirPath="{/path/to/directory}" [...]>
 
To resolve the path, the harvester uses Utilities.resolveFile(org.apache.avalon.framework.logger.Logger, String, Context, String, boolean).
By default, this directory is deleted after the harvest. This can be changed with the keepHarvestedRecords configuration attribute.

See Also:
Utilities.resolveFile(org.apache.avalon.framework.logger.Logger, String, Context, String, boolean)
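The fallback rule above (servlet context temp dir first, JVM temporary directory if it is not writable) can be sketched in plain Java. This is a minimal illustration under assumptions, not the SDX code: the class `TempDirFallback` and its method are hypothetical, path resolution through `Utilities.resolveFile(...)` is not reproduced, and writability is approximated with `File.canWrite()`.

```java
import java.io.File;

/** Hypothetical sketch of the documented temp-dir fallback; not the SDX implementation. */
public class TempDirFallback {

    /**
     * Returns the preferred directory if it exists and is writable,
     * otherwise the JVM temporary directory (java.io.tmpdir).
     */
    public static File chooseTempDir(File preferred) {
        if (preferred != null && preferred.isDirectory() && preferred.canWrite()) {
            return preferred;
        }
        // Fallback: the JVM-wide temporary directory.
        return new File(System.getProperty("java.io.tmpdir"));
    }
}
```

A non-existent or read-only path therefore resolves to `java.io.tmpdir`, which mirrors the behaviour described for `tempDirPath`.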

transformerFactory

protected java.lang.String transformerFactory
XML Transformer factory class name.

Default: Xalan, "org.apache.xalan.processor.TransformerFactoryImpl". This can be changed in the configuration file: <oai-harvester transformer-factory="{class name}" [...]>


defaultTransformerFactory

protected java.lang.String defaultTransformerFactory

transformerIndent

protected java.lang.String transformerIndent
XML Transformer indent option.

Default: no. This can be changed in the configuration file: <oai-harvester transformer-indent="yes|no" [...]>


defaultTransformerIndent

protected java.lang.String defaultTransformerIndent

indexAtHarvestEnd

protected boolean indexAtHarvestEnd
Indexation at the end of harvesting option

Default: true. This can be changed in the configuration file: <oai-harvester index-at-index-end="yes|no" [...]>


forceIndexOnHarvestError

protected boolean forceIndexOnHarvestError
Force indexation on harvest error option

Default: false. This can be changed in the configuration file: <oai-harvester force-index-on-harvest-error="yes|no" [...]>


TRANSFORMER_FACTORY

protected static final java.lang.String TRANSFORMER_FACTORY
See Also:
Constant Field Values

TRANSFORMER_INDENT

protected static final java.lang.String TRANSFORMER_INDENT
See Also:
Constant Field Values

INDEXATHARVESTEND

protected static final java.lang.String INDEXATHARVESTEND
See Also:
Constant Field Values

FORCEINDEXONHARVESTERROR

protected static final java.lang.String FORCEINDEXONHARVESTERROR
See Also:
Constant Field Values

OAI_HARVEST_ID

protected static final java.lang.String OAI_HARVEST_ID
See Also:
Constant Field Values

OAI_FAILED_HARVEST

protected static final java.lang.String OAI_FAILED_HARVEST
See Also:
Constant Field Values

OAI_HARVESTER_LAST_UPDATED

protected static final java.lang.String OAI_HARVESTER_LAST_UPDATED
See Also:
Constant Field Values

OAI_HARVESTER_RESUMPTION_TOKEN

protected static final java.lang.String OAI_HARVESTER_RESUMPTION_TOKEN
See Also:
Constant Field Values

OAI_VERB

protected static final java.lang.String OAI_VERB
See Also:
Constant Field Values

OAI_IDENTIFIER

protected static final java.lang.String OAI_IDENTIFIER
See Also:
Constant Field Values

OAI_METADATA_PREFIX

protected static final java.lang.String OAI_METADATA_PREFIX
See Also:
Constant Field Values

OAI_FROM

protected static final java.lang.String OAI_FROM
See Also:
Constant Field Values

OAI_UNTIL

protected static final java.lang.String OAI_UNTIL
See Also:
Constant Field Values

OAI_SET

protected static final java.lang.String OAI_SET
See Also:
Constant Field Values

NO_DOCS_DELETED

protected static final java.lang.String NO_DOCS_DELETED
See Also:
Constant Field Values

NO_DOCS_HARVESTED

protected static final java.lang.String NO_DOCS_HARVESTED
See Also:
Constant Field Values

ERROR_CODE

protected static final java.lang.String ERROR_CODE
See Also:
Constant Field Values

filesProperties

protected java.util.Hashtable filesProperties
List OAI files with OAI properties


cBytes

protected org.apache.cocoon.serialization.XMLSerializer cBytes

oaiStripper

protected org.apache.cocoon.xml.XMLPipe oaiStripper

Constructor Detail

AbstractDocumentBaseOAIHarvester

public AbstractDocumentBaseOAIHarvester(DocumentBase base)
Basic constructor

Method Detail

configure

public void configure(org.apache.avalon.framework.configuration.Configuration configuration)
               throws org.apache.avalon.framework.configuration.ConfigurationException
OAI harvester configuration

Configures the OAI harvester by reading the application.xconf file, which may contain a section such as:

 <sdx:documentBase [...]>
   <sdx:oai-harvester
           adminEmail="{some.body@some.where}"
           keepDeletedRecords="{true|false}"
           noRecordsPerBatch="{number}"
           transformer-factory="{Transformer factory class name}"
           transformer-indent="{yes|no}"
           keepHarvestedRecords="{true|false}"
           tempDirPath="{directory path}">
     <sdx:oai-data-providers>
       <sdx:oai-repository [...]>[...]</sdx:oai-repository>
       [...]
     </sdx:oai-data-providers>
   </sdx:oai-harvester>
 </sdx:documentBase>
 

Specified by:
configure in interface org.apache.avalon.framework.configuration.Configurable
Parameters:
configuration - the configuration to read
Throws:
org.apache.avalon.framework.configuration.ConfigurationException
See Also:
keepDeletedRecords, noRecordsPerBatch, transformerFactory, transformerIndent, keepHarvestedRecords, tempDirPath
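As an illustration of how the attributes listed above map to the documented defaults, here is a hypothetical sketch that reads an `<oai-harvester>` element with the JDK DOM parser. It assumes a standalone, un-prefixed element and covers only the attributes whose defaults are stated on this page (`keepHarvestedRecords`, `transformer-factory`, `transformer-indent`, `tempDirPath`). The real class reads an Avalon `Configuration`, and `HarvesterSettings` is not part of SDX.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

/** Hypothetical settings holder; defaults taken from the field documentation above. */
public class HarvesterSettings {
    public static final String DEFAULT_FACTORY = "org.apache.xalan.processor.TransformerFactoryImpl";

    public boolean keepHarvestedRecords; // documented default: false
    public String transformerFactory;    // documented default: Xalan
    public String transformerIndent;     // documented default: "no"
    public String tempDirPath;           // null here means "use the default temp dir"

    /** Parses a single <oai-harvester .../> element and applies the documented defaults. */
    public static HarvesterSettings fromXml(String xml) {
        Element e;
        try {
            e = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalArgumentException("Invalid harvester configuration", ex);
        }
        HarvesterSettings s = new HarvesterSettings();
        s.keepHarvestedRecords = Boolean.parseBoolean(attr(e, "keepHarvestedRecords", "false"));
        s.transformerFactory = attr(e, "transformer-factory", DEFAULT_FACTORY);
        s.transformerIndent = attr(e, "transformer-indent", "no");
        s.tempDirPath = attr(e, "tempDirPath", null);
        return s;
    }

    // DOM returns "" for a missing attribute, so check presence explicitly.
    private static String attr(Element e, String name, String def) {
        return e.hasAttribute(name) ? e.getAttribute(name) : def;
    }
}
```

Attributes that are absent fall back to the defaults, so `<oai-harvester keepHarvestedRecords="true"/>` keeps records while still using Xalan with indentation off.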

configureTempDir

protected void configureTempDir(org.apache.avalon.framework.configuration.Configuration conf)
                         throws org.apache.avalon.framework.configuration.ConfigurationException
Configures the temporary directory

Configures the temporary directory where harvested documents will be stored in sub-directories. There is one sub-directory per batch of the harvest. By default, this directory is deleted after the harvest; this can be changed with the keepHarvestedRecords configuration attribute.

Parameters:
conf - the configuration to read
Throws:
org.apache.avalon.framework.configuration.ConfigurationException
See Also:
keepHarvestedRecords, tempDirPath

configureDatabase

protected void configureDatabase(org.apache.avalon.framework.configuration.Configuration configuration)
                          throws org.apache.avalon.framework.configuration.ConfigurationException
Configures the internal database

Throws:
org.apache.avalon.framework.configuration.ConfigurationException

configureHarvestIDGenerator

protected void configureHarvestIDGenerator(org.apache.avalon.framework.configuration.Configuration configuration)
                                    throws org.apache.avalon.framework.configuration.ConfigurationException
Configures the id generator for harvests

Throws:
org.apache.avalon.framework.configuration.ConfigurationException

getHarvesterId

protected java.lang.String getHarvesterId()
Returns an id for this harvester based upon the underlying document base id


configureAdminEmails

protected void configureAdminEmails(org.apache.avalon.framework.configuration.Configuration configuration)
                             throws org.apache.avalon.framework.configuration.ConfigurationException
Configures a list of admin emails, which can be given as sub-elements, a single attribute, or both

Parameters:
configuration -
Throws:
org.apache.avalon.framework.configuration.ConfigurationException

configureDataProviders

protected void configureDataProviders(org.apache.avalon.framework.configuration.Configuration configuration)
                               throws org.apache.avalon.framework.configuration.ConfigurationException
Configures data provider information that can be reused and from which requests can be executed automatically

Parameters:
configuration -
Throws:
org.apache.avalon.framework.configuration.ConfigurationException
See Also:
storedRequests

configureUpdateTriggers

protected void configureUpdateTriggers(java.lang.String requestUrl,
                                       org.apache.avalon.framework.configuration.Configuration updateConf)
                                throws org.apache.avalon.framework.configuration.ConfigurationException
Configures time triggers for stored requests

Parameters:
requestUrl - The request url
updateConf - The configuration for updates
Throws:
org.apache.avalon.framework.configuration.ConfigurationException
See Also:
scheduler, storedRequests

configureStoreRepositories

protected void configureStoreRepositories(java.lang.String repoUrl,
                                          org.apache.avalon.framework.configuration.Configuration oaiRepoConf)
                                   throws org.apache.avalon.framework.configuration.ConfigurationException
Configures the repositories to which data will be stored based upon their repository url

Parameters:
repoUrl - The repository/data provider url
oaiRepoConf - The configuration
Throws:
org.apache.avalon.framework.configuration.ConfigurationException

checkGranularity

public boolean checkGranularity(java.lang.String granularity)
                         throws org.apache.avalon.framework.configuration.ConfigurationException
Checks the granularity of an OAI provider: YYYY-MM-DDThh:mm:ssZ or YYYY-MM-DD

Parameters:
granularity -
Returns:
true or false
Throws:
org.apache.avalon.framework.configuration.ConfigurationException
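In OAI-PMH 2.0, a repository's `Identify` response declares its granularity literally as `YYYY-MM-DD` or `YYYY-MM-DDThh:mm:ssZ`, and `from`/`until` arguments must use that granularity. The sketch below accepts exactly those two values and formats a UTC timestamp accordingly. `OaiGranularity` is a hypothetical name, and how `checkGranularity` treats other values (returning `false` or throwing) is not documented here; this version returns `false` from the check and throws only when asked to format with an unsupported granularity.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for the two OAI-PMH 2.0 granularities. */
public class OaiGranularity {
    public static final String DAY = "YYYY-MM-DD";
    public static final String SECOND = "YYYY-MM-DDThh:mm:ssZ";

    /** True only for the two granularity values defined by OAI-PMH 2.0. */
    public static boolean isSupported(String granularity) {
        return DAY.equals(granularity) || SECOND.equals(granularity);
    }

    /** Formats an instant in UTC at the given granularity, e.g. for a "from" argument. */
    public static String formatDatestamp(Instant t, String granularity) {
        if (!isSupported(granularity)) {
            throw new IllegalArgumentException("Unsupported granularity: " + granularity);
        }
        String pattern = DAY.equals(granularity) ? "yyyy-MM-dd" : "yyyy-MM-dd'T'HH:mm:ss'Z'";
        return DateTimeFormatter.ofPattern(pattern).withZone(ZoneOffset.UTC).format(t);
    }
}
```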

configurePipeline

protected void configurePipeline(org.apache.avalon.framework.configuration.Configuration configuration)
                          throws org.apache.avalon.framework.configuration.ConfigurationException
Configures the pre-indexation pipeline

Parameters:
configuration -
Throws:
org.apache.avalon.framework.configuration.ConfigurationException
See Also:
pipe

getNewTempDirBatch

protected java.io.File getNewTempDirBatch()
                                   throws SDXException,
                                          java.io.IOException
Creates a new temporary directory for writing harvested records before they are indexed

Returns:
File
Throws:
SDXException
java.io.IOException
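The per-batch layout described under configureTempDir (one sub-directory per batch, with `noRecordsPerBatch` records in each) might look like the following. The sketch is hypothetical: `BatchDirs` is not an SDX class, and creating each batch with `Files.createTempDirectory` and a `batch-` prefix is an assumption about naming, not the harvester's actual scheme.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: one sub-directory of tempDir per batch of records. */
public class BatchDirs {
    private final Path tempDir;
    private final int recordsPerBatch;
    private Path currentBatch;
    private int recordsInBatch;

    public BatchDirs(Path tempDir, int recordsPerBatch) {
        if (recordsPerBatch < 1) {
            throw new IllegalArgumentException("recordsPerBatch must be at least 1");
        }
        this.tempDir = tempDir;
        this.recordsPerBatch = recordsPerBatch;
    }

    /** Returns the directory for the next record, opening a new batch when the current one is full. */
    public Path dirForNextRecord() {
        if (currentBatch == null || recordsInBatch == recordsPerBatch) {
            try {
                // New uniquely named sub-directory under tempDir, prefixed "batch-".
                currentBatch = Files.createTempDirectory(tempDir, "batch-");
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            recordsInBatch = 0;
        }
        recordsInBatch++;
        return currentBatch;
    }
}
```

Keeping batches in separate directories lets each one be indexed (and, if requested, deleted) independently of the rest of the harvest.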

deleteTempDirBatch

protected void deleteTempDirBatch()
Deletes the directory represented by the tempDirBatch class field


deleteTempDir

protected void deleteTempDir()
Deletes the directory represented by the tempDir class field
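Because the temporary directory contains batch sub-directories, deleting it requires a recursive delete. The following sketch also applies the documented keepHarvestedRecords rule; `TempDirCleaner` and its methods are hypothetical and use only `java.io.File` plus a symbolic-link check from `java.nio.file.Files`.

```java
import java.io.File;
import java.nio.file.Files;

/** Hypothetical cleanup helper; not the SDX implementation. */
public class TempDirCleaner {

    /** Deletes dir and everything under it; returns true if dir no longer exists afterwards. */
    public static boolean deleteRecursively(File dir) {
        // Do not descend into symbolic links; delete() below removes only the link itself.
        if (!Files.isSymbolicLink(dir.toPath())) {
            File[] children = dir.listFiles(); // null for plain files or unreadable directories
            if (children != null) {
                for (File child : children) {
                    deleteRecursively(child);
                }
            }
        }
        dir.delete();
        return !dir.exists();
    }

    /** Removes the temporary directory after a harvest unless records are to be kept. */
    public static void cleanUpAfterHarvest(File tempDir, boolean keepHarvestedRecords) {
        if (!keepHarvestedRecords) {
            deleteRecursively(tempDir);
        }
    }
}
```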


initTempDir

protected void initTempDir()
                    throws SDXException,
                           java.io.IOException
Establishes the tempDirBatch class field

Throws:
SDXException
java.io.IOException

getIsoDate

protected java.lang.String getIsoDate()
Gets the current date in ISO 8601 format

Returns:
String

prepareRecordCapture

protected void prepareRecordCapture()
                             throws org.xml.sax.SAXException
Sets up resources to capture an oai record

Specified by:
prepareRecordCapture in class AbstractOAIHarvester
Throws:
org.xml.sax.SAXException

captureRecord

protected void captureRecord()
                      throws java.lang.Exception
Ends the capture of an OAI record. Stores the properties (datestamp and OAI identifier) needed for indexation.

Specified by:
captureRecord in class AbstractOAIHarvester
Throws:
java.lang.Exception

resetRecordCaptureFields

protected void resetRecordCaptureFields(boolean deleteDoc)
Resets the class fields for record capture, possibly deleting the file underlying the current harvestDoc object

Specified by:
resetRecordCaptureFields in class AbstractOAIHarvester
Parameters:
deleteDoc - flag for deletion of actual file

prepareRecordForDeletion

protected void prepareRecordForDeletion()
Sets up resources to delete an OAI record: adds the record to the list of records to be removed

Specified by:
prepareRecordForDeletion in class AbstractOAIHarvester
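In OAI-PMH 2.0, a deleted record is signalled by `status="deleted"` on its `<header>`, and such a record carries no `<metadata>`. The SAX handler below collects the identifiers of those records, which is the kind of list prepareRecordForDeletion maintains. It is a hypothetical, standalone sketch using the JDK SAX parser; the real harvester receives SAX events through its pipeline instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler collecting identifiers of records whose header is marked deleted. */
public class DeletedRecordCollector extends DefaultHandler {
    private final List<String> deletedIds = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inDeletedHeader;
    private boolean inIdentifier;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("header".equals(localName)) {
            inDeletedHeader = "deleted".equals(atts.getValue("status"));
        } else if (inDeletedHeader && "identifier".equals(localName)) {
            inIdentifier = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inIdentifier) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (inIdentifier && "identifier".equals(localName)) {
            deletedIds.add(text.toString().trim());
            inIdentifier = false;
        } else if ("header".equals(localName)) {
            inDeletedHeader = false;
        }
    }

    /** Parses an OAI-PMH response and returns the identifiers of deleted records. */
    public static List<String> collect(String xml) {
        try {
            SAXParserFactory f = SAXParserFactory.newInstance();
            f.setNamespaceAware(true); // so localName is populated
            DeletedRecordCollector h = new DeletedRecordCollector();
            f.newSAXParser().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), h);
            return h.deletedIds;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse OAI-PMH response", e);
        }
    }
}
```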

isStartsIndexation

protected boolean isStartsIndexation()

storeHarvestedData

protected boolean storeHarvestedData()
                              throws org.apache.cocoon.ProcessingException,
                                     java.io.IOException,
                                     SDXException,
                                     org.xml.sax.SAXException
Reads the documents from tempDirBatch and indexes them in the corresponding document base; any marked deletions are carried out as well

Specified by:
storeHarvestedData in class AbstractOAIHarvester
Returns:
boolean
Throws:
SDXException
org.xml.sax.SAXException
org.apache.cocoon.ProcessingException
java.io.IOException
See Also:
AbstractOAIHarvester.storeHarvestedData()

deleteOAIDocuments

protected void deleteOAIDocuments()
                           throws java.io.IOException,
                                  org.apache.cocoon.ProcessingException,
                                  SDXException,
                                  org.xml.sax.SAXException
Delete OAI documents from the current document base.

Throws:
java.io.IOException
org.apache.cocoon.ProcessingException
SDXException
org.xml.sax.SAXException

handleResumptionToken

protected void handleResumptionToken()
Handles the resumption token by issuing another request based upon the request from which the resumption token was received.

Specified by:
handleResumptionToken in class AbstractOAIHarvester
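In OAI-PMH 2.0, `resumptionToken` is an exclusive argument: the follow-up request repeats only the verb, and an empty `<resumptionToken/>` marks the final part of an incomplete list. A hypothetical helper that builds the next request URL from those rules is sketched below; the class and parameter names are illustrative, and `URLEncoder.encode(String, Charset)` requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for OAI-PMH flow control; not the SDX implementation. */
public class ResumptionRequests {

    /**
     * Builds the follow-up request for a resumption token, or returns null when the
     * token is empty (the last part of the list has been received).
     */
    public static String nextRequestUrl(String baseUrl, String verb, String resumptionToken) {
        if (resumptionToken == null || resumptionToken.isEmpty()) {
            return null;
        }
        // Only the verb and the (URL-encoded) token are sent; from/until/set/metadataPrefix are not repeated.
        return baseUrl + "?verb=" + URLEncoder.encode(verb, StandardCharsets.UTF_8)
                + "&resumptionToken=" + URLEncoder.encode(resumptionToken, StandardCharsets.UTF_8);
    }
}
```

Tokens are opaque to the harvester, so they are encoded rather than parsed.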

prepareResourceFromUrlIdentifierCapture

protected void prepareResourceFromUrlIdentifierCapture()
Prepares to read a url value from an oai record and retrieve the XML behind.

Specified by:
prepareResourceFromUrlIdentifierCapture in class AbstractOAIHarvester
See Also:
AbstractOAIHarvester.identifierName, AbstractOAIHarvester.currentMetadtaUrlIdentifier

captureResourceFromUrlIdentifier

protected void captureResourceFromUrlIdentifier()
Captures the xml from a url taken from an oai record and adds it to the oai-record as a sibling of the element

Specified by:
captureResourceFromUrlIdentifier in class AbstractOAIHarvester
See Also:
AbstractOAIHarvester.currentMetadtaUrlIdentifier, AbstractOAIHarvester.identifierName

resetAllFields

protected void resetAllFields()
Resets necessary class fields

Overrides:
resetAllFields in class AbstractOAIHarvester

endHarvest

protected void endHarvest()
Ends the harvest


getIndexParameters

protected IndexParameters getIndexParameters()
Builds simple index parameters for indexation of OAI records into the underlying document base

Returns:
IndexParameters

sendStoredHarvestingRequests

public void sendStoredHarvestingRequests()
                                  throws org.xml.sax.SAXException
Sends the details of stored harvesting requests to the current consumer

Specified by:
sendStoredHarvestingRequests in interface OAIHarvester
Throws:
org.xml.sax.SAXException

targetTriggered

public void targetTriggered(java.lang.String triggerName)
Triggers an OAI request to a repository based upon a trigger name (also a request url)

Specified by:
targetTriggered in interface Target
Parameters:
triggerName -

startElement

public void startElement(java.lang.String s,
                         java.lang.String s1,
                         java.lang.String s2,
                         org.xml.sax.Attributes attributes)
                  throws org.xml.sax.SAXException
Description copied from class: AbstractSynchronizedXMLPipe
Receive notification of the beginning of an element.

Specified by:
startElement in interface org.xml.sax.ContentHandler
Overrides:
startElement in class AbstractOAIHarvester
Parameters:
s - The Namespace URI, or the empty string if the element has no Namespace URI or if Namespace processing is not being performed.
s1 - The local name (without prefix), or the empty string if Namespace processing is not being performed.
s2 - The raw XML 1.0 name (with prefix), or the empty string if raw names are not available.
attributes - The attributes attached to the element. If there are no attributes, it shall be an empty Attributes object.
Throws:
org.xml.sax.SAXException

endElement

public void endElement(java.lang.String s,
                       java.lang.String s1,
                       java.lang.String s2)
                throws org.xml.sax.SAXException
Description copied from class: AbstractSynchronizedXMLPipe
Receive notification of the end of an element.

Specified by:
endElement in interface org.xml.sax.ContentHandler
Overrides:
endElement in class AbstractOAIHarvester
Parameters:
s - The Namespace URI, or the empty string if the element has no Namespace URI or if Namespace processing is not being performed.
s1 - The local name (without prefix), or the empty string if Namespace processing is not being performed.
s2 - The raw XML 1.0 name (with prefix), or the empty string if raw names are not available.
Throws:
org.xml.sax.SAXException
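The two callbacks above typically work as a pair: `startElement` opens a capture when the element of interest begins, and `endElement` closes it. The following sketch collects the text of every element whose local name matches a configurable value, in the spirit of `identifierName`. `IdentifierCollector` is hypothetical and only assumes a namespace-aware parser, so that `localName` is populated.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, not the SDX implementation: shows how startElement and
// endElement can bracket the text of an element whose local name is configurable.
final class IdentifierCollector extends DefaultHandler {

    private final String identifierName;
    private final List<String> values = new ArrayList<>();
    private StringBuilder buffer; // non-null only while inside a target element

    IdentifierCollector(String identifierName) {
        this.identifierName = identifierName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (identifierName.equals(localName)) {
            buffer = new StringBuilder(); // start capturing
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (buffer != null) {
            buffer.append(ch, start, length); // text may arrive in several chunks
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (buffer != null && identifierName.equals(localName)) {
            values.add(buffer.toString().trim());
            buffer = null; // stop capturing
        }
    }

    List<String> values() {
        return values;
    }
}
```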

shouldHarvestDocument

protected boolean shouldHarvestDocument()
Queries the underlying data structures, based upon the current position in the SAX flow and the class fields that have been set, to determine whether an OAI record should be harvested.

Specified by:
shouldHarvestDocument in class AbstractOAIHarvester
Returns:
boolean indicating whether the record should be handled

saveCriticalFields

protected void saveCriticalFields(boolean dataHarvested)
                           throws org.xml.sax.SAXException
Saves critical data about a harvest.

Specified by:
saveCriticalFields in class AbstractOAIHarvester
Parameters:
dataHarvested - whether data was harvested
Throws:
org.xml.sax.SAXException

generateNewHarvestId

protected java.lang.String generateNewHarvestId()
Generates an ID to associate with a harvest.

Returns:
String
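The format of the harvest ID is not documented here. One plausible approach, shown only as an assumption, combines a UTC start timestamp, which keeps IDs sortable, with a random UUID, which keeps them unique. `HarvestIds` is a hypothetical name.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;
import java.util.UUID;

// Hypothetical ID scheme: "<UTC timestamp>-<random UUID>".
final class HarvestIds {

    private HarvestIds() {
    }

    static String newHarvestId(Date start) {
        // SimpleDateFormat is not thread-safe, so a fresh instance is used per call.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMdd'T'HHmmss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(start) + "-" + UUID.randomUUID();
    }
}
```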

sendPastHarvestsSummary

public void sendPastHarvestsSummary()
                             throws org.xml.sax.SAXException
Sends SAX events to the current consumer with summary details of all the past harvests.

Specified by:
sendPastHarvestsSummary in interface OAIHarvester
Throws:
org.xml.sax.SAXException
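Sending a summary to the current consumer means emitting SAX events rather than building a document in memory. The sketch below shows the pattern on a plain `org.xml.sax.ContentHandler`; the element and attribute names are assumptions, not the vocabulary SDX actually emits. It leaves `startDocument`/`endDocument` to the caller, since the consumer may be in the middle of a larger pipeline.

```java
import java.util.Map;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical sketch: element and attribute names are illustrative only,
// not the vocabulary SDX actually emits.
final class HarvestSummaryEmitter {

    private HarvestSummaryEmitter() {
    }

    /** Streams one harvest element per entry (harvest id -> date) to the consumer. */
    static void send(ContentHandler consumer, Map<String, String> pastHarvests) throws SAXException {
        consumer.startElement("", "harvests", "harvests", new AttributesImpl());
        for (Map.Entry<String, String> e : pastHarvests.entrySet()) {
            AttributesImpl atts = new AttributesImpl();
            atts.addAttribute("", "id", "id", "CDATA", e.getKey());
            atts.addAttribute("", "date", "date", "CDATA", e.getValue());
            consumer.startElement("", "harvest", "harvest", atts);
            consumer.endElement("", "harvest", "harvest");
        }
        consumer.endElement("", "harvests", "harvests");
    }
}
```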

lastUpdated

public java.util.Date lastUpdated()
Retrieves the time when the harvester was last updated.

Returns:
Date

purgePastHarvestsData

public void purgePastHarvestsData()
Destroys all summary data pertaining to past harvests, but not the OAI records actually harvested.

Specified by:
purgePastHarvestsData in interface OAIHarvester

storeFailedHarvestData

protected void storeFailedHarvestData(java.lang.Exception e)
Stores data about harvesting failures caused by problems other than OAI errors sent from a queried repository.

Specified by:
storeFailedHarvestData in class AbstractOAIHarvester
Parameters:
e - the exception that caused the harvest to fail

backup

public void backup(SaveParameters save_config)
            throws SDXException
Saves the timestamp of the harvester.

Specified by:
backup in interface Saveable
Throws:
SDXException
See Also:
Saveable.backup(fr.gouv.culture.sdx.utils.save.SaveParameters)

restore

public void restore(SaveParameters save_config)
             throws SDXException
Restores the timestamp of the harvester.

Specified by:
restore in interface Saveable
Throws:
SDXException
See Also:
Saveable.restore(fr.gouv.culture.sdx.utils.save.SaveParameters)
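Backing up and restoring the timestamp is a round trip of a single `Date`. Since the `SaveParameters` API is not described in this section, the sketch below substitutes a `java.util.Properties` text for it and stores the time as epoch milliseconds so that nothing is lost. `TimestampBackup` and the property key are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Date;
import java.util.Properties;

// Hypothetical: SaveParameters is replaced by a plain Properties text, since
// its real API is not documented here.
final class TimestampBackup {

    private static final String KEY = "harvester.lastUpdated"; // hypothetical key

    private TimestampBackup() {
    }

    static String backup(Date lastUpdated) throws IOException {
        Properties props = new Properties();
        props.setProperty(KEY, Long.toString(lastUpdated.getTime())); // epoch millis, lossless
        StringWriter out = new StringWriter();
        props.store(out, null);
        return out.toString();
    }

    /** Returns the saved timestamp, or null when none was stored. */
    static Date restore(String saved) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(saved));
        String value = props.getProperty(KEY);
        return value == null ? null : new Date(Long.parseLong(value));
    }
}
```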

close

public void close()
Closes the OAI harvester.

Overrides:
close in class AbstractOAIHarvester


Copyright © 2000-2010 Ministere de la culture et de la communication / AJLSM. All Rights Reserved.