fr.gouv.culture.sdx.document
Class AbstractIndexableDocument

java.lang.Object
  extended by fr.gouv.culture.sdx.utils.AbstractSdxObject
      extended by fr.gouv.culture.sdx.document.AbstractDocument
          extended by fr.gouv.culture.sdx.document.AbstractIndexableDocument
All Implemented Interfaces:
Document, IndexableDocument, Describable, Encodable, Identifiable, Localizable, SdxObject, org.apache.avalon.framework.configuration.Configurable, org.apache.avalon.framework.context.Contextualizable, org.apache.avalon.framework.logger.LogEnabled, org.apache.avalon.framework.service.Serviceable, org.apache.excalibur.xml.sax.XMLConsumer, org.apache.excalibur.xml.sax.XMLizable, org.xml.sax.ContentHandler, org.xml.sax.ext.LexicalHandler
Direct Known Subclasses:
HTMLDocument, XMLDocument

public abstract class AbstractIndexableDocument
extends AbstractDocument
implements IndexableDocument

An abstract class for indexable documents. TODO: this description should explain in more detail how the basic indexation elements are created; it should refer to a schema, since the class works from the permutations of that schema.


Nested Class Summary
 class AbstractIndexableDocument.StoreHandler
           
 
Nested classes/interfaces inherited from interface fr.gouv.culture.sdx.utils.SdxObject
SdxObject.ConfigurationNode
 
Field Summary
protected  float _boost
           
protected  float _currentFieldBoost
           
protected  org.xml.sax.ContentHandler _msgHandler
           
protected  java.util.HashMap _xmlFieldList
          List of fields with an XML type
protected  java.io.ByteArrayOutputStream _xmlFieldOutput
          The output of the parsed XML field
protected  javax.xml.transform.sax.TransformerHandler _xmlFieldTransformer
          The transformer used to parse the XML fields
protected  java.util.Vector attachedDocuments
          A list of attached documents
protected  java.lang.StringBuffer characterBuffer
          A character buffer for element content.
protected  org.xml.sax.ContentHandler contentHandler
          The ContentHandler receiving SAX events.
protected  java.lang.String currentFieldName
          The current field name
protected  java.lang.String DOC_ATTACHEDOC_ELEMENT_NAME
           
protected  java.lang.String DOC_FIELD_ELEMENT_NAME
           
protected  java.lang.String DOC_MSG_ELEMENT_NAME
           
protected  java.lang.String DOC_NAMESPACE
           
protected  java.lang.String DOC_ROOT_ELEMENT_NAME
           
protected  org.xml.sax.ext.LexicalHandler lexicalHandler
          The LexicalHandler receiving SAX events.
protected  org.apache.avalon.framework.parameters.Parameters nsTable
           
protected  int openSdxDocElems
           
protected  java.util.Vector properties
          List of fields for indexing.
protected  AbstractIndexableDocument.StoreHandler storeHandler
           
protected  IndexableDocument subDoc
           
protected  java.io.ByteArrayOutputStream subDocBytes
           
protected  java.util.Vector subDocuments
          A list of sub(Indexable) documents
protected  IndexableDocument transformedDoc
          A document resulting from a transformation
protected  boolean withinSdxElement
           
protected  boolean withinXmlField
           
protected  org.apache.cocoon.xml.XMLConsumer xmlConsumer
          The XMLConsumer receiving SAX events.
 
Fields inherited from class fr.gouv.culture.sdx.document.AbstractDocument
idGenerator, idPrefix, idSuffix, mimeType, storeRepo
 
Fields inherited from class fr.gouv.culture.sdx.utils.AbstractSdxObject
_configuration, _context, _description, _encoding, _id, _locale, _logger, _manager, _xmlizable_objects, _xmlLang, isToSaxInitialized
 
Fields inherited from interface fr.gouv.culture.sdx.document.Document
CLASS_NAME_SUFFIX, DOCTYPE_BINARY, DOCTYPE_GROUP, DOCTYPE_HTML, DOCTYPE_USER, DOCTYPE_XML
 
Fields inherited from interface fr.gouv.culture.sdx.utils.Encodable
DEFAULT_ENCODING
 
Constructor Summary
AbstractIndexableDocument()
           
 
Method Summary
 void addAttachedDocument(java.lang.String id, java.net.URL baseURL, java.lang.String url, java.lang.String mimetype, java.lang.String repoId)
          Add an attached document to the list for this document.
 void characters(char[] ch, int start, int length)
           
 void comment(char[] chars, int i, int i1)
          Currently has no function
 void endCDATA()
          Currently has no function
 void endDocument()
          Currently has no function
 void endDTD()
          Currently has no function
 void endElement(java.lang.String nsURI, java.lang.String name, java.lang.String qName)
           
 void endEntity(java.lang.String s)
          Currently has no function
 void endPrefixMapping(java.lang.String s)
          Currently has no function
protected  java.lang.String generateId()
           
 java.util.Enumeration getAttachedDocuments()
          Retrieves an Enumeration of attached documents
 int getAttachedDocumentsSize()
           
 float getBoost()
          Gets a boost factor for scoring (currently Lucene specific)
 java.util.Enumeration getFieldValues()
          Returns field values.
 AbstractIndexableDocument.StoreHandler getStoreHandler()
           
 java.util.Enumeration getSubDocuments()
          Retrieves an Enumeration of sub(Indexable) documents
 int getSubDocumentsSize()
           
 IndexableDocument getTransformedDocument()
          Returns the transformed document object, or null if no transformed document was produced during the indexation pipeline
protected  void handleDocumentId(org.xml.sax.Attributes atts)
           
 void ignorableWhitespace(char[] chars, int i, int i1)
          Currently has no function
 void processingInstruction(java.lang.String s, java.lang.String s1)
          Currently has no function
 void resetAttachedDocuments()
          Reinitializes the Vector of attached documents
protected  void resetFields()
          Resets the objects we need to store indexation data or creates them if they do not exist
 void setAttachedDocuments(java.util.Vector list)
          Sets the list of attached documents for this document.
 void setBoost(float boost)
          Sets a boost factor for scoring (currently Lucene specific)
 void setConsumer(org.apache.cocoon.xml.XMLConsumer consumer)
          Set the XMLConsumer that will receive XML data.
 void setContentHandler(org.xml.sax.ContentHandler handler)
          Set the ContentHandler that will receive XML data.
 void setDocumentLocator(org.xml.sax.Locator locator)
          Currently has no function
 void setLexicalHandler(org.xml.sax.ext.LexicalHandler handler)
          Set the LexicalHandler that will receive XML data.
 void setMessageHandler(org.xml.sax.ContentHandler handler)
           
 void setUpdateAttachedDocuments(boolean updateAttachedDocuments)
          Indicates whether the list of attached documents must be refreshed.
protected  void setUpTransformedDocument()
           
 void setXMLFieldList(java.util.HashMap fieldList)
          Sets the XMLFieldList of the DocumentBase where the document is stored.
 void setXMLTransformerHandler(javax.xml.transform.sax.TransformerHandler xmlFieldTransformer)
          Sets the XMLTransformer used to parse the XML fields
 void skippedEntity(java.lang.String s)
          Currently has no function
 void startCDATA()
          Currently has no function
 void startDocument()
          Currently has no function
 void startDTD(java.lang.String s, java.lang.String s1, java.lang.String s2)
          Currently has no function
 void startElement(java.lang.String nsURI, java.lang.String name, java.lang.String qName, org.xml.sax.Attributes atts)
           
 void startEntity(java.lang.String s)
          Currently has no function
 void startPrefixMapping(java.lang.String s, java.lang.String s1)
          Currently has no function
 boolean updateAttachedDocuments()
          Indicates whether the attached documents must be refreshed.
 
Methods inherited from class fr.gouv.culture.sdx.document.AbstractDocument
getClassNameSuffix, getInputSource, getLength, getMimeType, getPreferredFilename, getRepositoryForStorage, getURL, initToSax, initVolatileObjectsToSax, openStream, save, setContent, setContent, setContent, setContent, setIdGenerator, setIdGenerator, setMimeType, setPreferredFilename, setPreferredFilename, setRepositoryForStorage, setURL
 
Methods inherited from class fr.gouv.culture.sdx.utils.AbstractSdxObject
configure, configureDescription, contextualize, enableLogging, getBaseAttributes, getConfiguration, getContext, getDescription, getEncoding, getId, getLocale, getLog, getServiceManager, getXmlLang, service, setDescription, setEncoding, setId, setLocale, setUpSdxObject, setUpSdxObject, setXmlLang, toSAX, verifyConfigurationResources
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface fr.gouv.culture.sdx.document.IndexableDocument
addAdditionalSystemFields, setTransformedDocument, setTransformedDocument, startIndexing
 
Methods inherited from interface fr.gouv.culture.sdx.document.Document
getDocType, getLength, getMimeType, getPreferredFilename, getRepositoryForStorage, getURL, openStream, save, setContent, setContent, setContent, setContent, setId, setIdGenerator, setIdGenerator, setMimeType, setPreferredFilename, setRepositoryForStorage, setURL
 
Methods inherited from interface fr.gouv.culture.sdx.utils.SdxObject
getLog
 
Methods inherited from interface org.apache.avalon.framework.logger.LogEnabled
enableLogging
 
Methods inherited from interface org.apache.avalon.framework.context.Contextualizable
contextualize
 
Methods inherited from interface org.apache.avalon.framework.service.Serviceable
service
 
Methods inherited from interface org.apache.avalon.framework.configuration.Configurable
configure
 
Methods inherited from interface fr.gouv.culture.sdx.utils.Identifiable
getId
 
Methods inherited from interface fr.gouv.culture.sdx.utils.Describable
getDescription, setDescription
 
Methods inherited from interface fr.gouv.culture.sdx.utils.Encodable
getEncoding, setEncoding
 
Methods inherited from interface fr.gouv.culture.sdx.utils.Localizable
getLocale, getXmlLang, setLocale, setXmlLang
 
Methods inherited from interface org.apache.excalibur.xml.sax.XMLizable
toSAX
 

Field Detail

DOC_NAMESPACE

protected java.lang.String DOC_NAMESPACE

DOC_ROOT_ELEMENT_NAME

protected java.lang.String DOC_ROOT_ELEMENT_NAME

DOC_FIELD_ELEMENT_NAME

protected java.lang.String DOC_FIELD_ELEMENT_NAME

DOC_ATTACHEDOC_ELEMENT_NAME

protected java.lang.String DOC_ATTACHEDOC_ELEMENT_NAME

DOC_MSG_ELEMENT_NAME

protected java.lang.String DOC_MSG_ELEMENT_NAME

nsTable

protected org.apache.avalon.framework.parameters.Parameters nsTable

properties

protected java.util.Vector properties
List of fields for indexing.


characterBuffer

protected java.lang.StringBuffer characterBuffer
A character buffer for element content.


currentFieldName

protected java.lang.String currentFieldName
The current field name


openSdxDocElems

protected int openSdxDocElems

attachedDocuments

protected java.util.Vector attachedDocuments
A list of attached documents


subDocuments

protected java.util.Vector subDocuments
A list of sub(Indexable) documents


transformedDoc

protected IndexableDocument transformedDoc
A document resulting from a transformation


subDoc

protected IndexableDocument subDoc

subDocBytes

protected java.io.ByteArrayOutputStream subDocBytes

withinSdxElement

protected boolean withinSdxElement

withinXmlField

protected boolean withinXmlField

_msgHandler

protected org.xml.sax.ContentHandler _msgHandler

_boost

protected float _boost
See Also:
Document.boost

_currentFieldBoost

protected float _currentFieldBoost

_xmlFieldList

protected java.util.HashMap _xmlFieldList
List of fields with an XML type


_xmlFieldTransformer

protected javax.xml.transform.sax.TransformerHandler _xmlFieldTransformer
The transformer used to parse the XML fields


_xmlFieldOutput

protected java.io.ByteArrayOutputStream _xmlFieldOutput
The output of the parsed XML field


xmlConsumer

protected org.apache.cocoon.xml.XMLConsumer xmlConsumer
The XMLConsumer receiving SAX events.


contentHandler

protected org.xml.sax.ContentHandler contentHandler
The ContentHandler receiving SAX events.


lexicalHandler

protected org.xml.sax.ext.LexicalHandler lexicalHandler
The LexicalHandler receiving SAX events.


storeHandler

protected AbstractIndexableDocument.StoreHandler storeHandler

Constructor Detail

AbstractIndexableDocument

public AbstractIndexableDocument()

Method Detail

startElement

public void startElement(java.lang.String nsURI,
                         java.lang.String name,
                         java.lang.String qName,
                         org.xml.sax.Attributes atts)
                  throws org.xml.sax.SAXException
Specified by:
startElement in interface org.xml.sax.ContentHandler
Throws:
org.xml.sax.SAXException

handleDocumentId

protected void handleDocumentId(org.xml.sax.Attributes atts)
                         throws SDXException
Throws:
SDXException

generateId

protected java.lang.String generateId()
                               throws SDXException
Throws:
SDXException

characters

public void characters(char[] ch,
                       int start,
                       int length)
                throws org.xml.sax.SAXException
Specified by:
characters in interface org.xml.sax.ContentHandler
Throws:
org.xml.sax.SAXException

comment

public void comment(char[] chars,
                    int i,
                    int i1)
             throws org.xml.sax.SAXException
Currently has no function

Specified by:
comment in interface org.xml.sax.ext.LexicalHandler
Throws:
org.xml.sax.SAXException

endCDATA

public void endCDATA()
              throws org.xml.sax.SAXException
Currently has no function

Specified by:
endCDATA in interface org.xml.sax.ext.LexicalHandler
Throws:
org.xml.sax.SAXException

endDTD

public void endDTD()
            throws org.xml.sax.SAXException
Currently has no function

Specified by:
endDTD in interface org.xml.sax.ext.LexicalHandler
Throws:
org.xml.sax.SAXException

endDocument

public void endDocument()
                 throws org.xml.sax.SAXException
Currently has no function

Specified by:
endDocument in interface org.xml.sax.ContentHandler
Throws:
org.xml.sax.SAXException

endEntity

public void endEntity(java.lang.String s)
               throws org.xml.sax.SAXException
Currently has no function

Specified by:
endEntity in interface org.xml.sax.ext.LexicalHandler
Throws:
org.xml.sax.SAXException

endPrefixMapping

public void endPrefixMapping(java.lang.String s)
                      throws org.xml.sax.SAXException
Currently has no function

Specified by:
endPrefixMapping in interface org.xml.sax.ContentHandler
Throws:
org.xml.sax.SAXException

ignorableWhitespace

public void ignorableWhitespace(char[] chars,
                                int i,
                                int i1)
                         throws org.xml.sax.SAXException
Currently has no function

Specified by:
ignorableWhitespace in interface org.xml.sax.ContentHandler
Throws:
org.xml.sax.SAXException

processingInstruction

public void processingInstruction(java.lang.String s,
                                  java.lang.String s1)
                           throws org.xml.sax.SAXException
Currently has no function

Specified by:
processingInstruction in interface org.xml.sax.ContentHandler
Throws:
org.xml.sax.SAXException

setDocumentLocator

public void setDocumentLocator(org.xml.sax.Locator locator)
Currently has no function

Specified by:
setDocumentLocator in interface org.xml.sax.ContentHandler

skippedEntity

public void skippedEntity(java.lang.String s)
                   throws org.xml.sax.SAXException
Currently has no function

Specified by:
skippedEntity in interface org.xml.sax.ContentHandler
Throws:
org.xml.sax.SAXException

startCDATA

public void startCDATA()
                throws org.xml.sax.SAXException
Currently has no function

Specified by:
startCDATA in interface org.xml.sax.ext.LexicalHandler
Throws:
org.xml.sax.SAXException

startDTD

public void startDTD(java.lang.String s,
                     java.lang.String s1,
                     java.lang.String s2)
              throws org.xml.sax.SAXException
Currently has no function

Specified by:
startDTD in interface org.xml.sax.ext.LexicalHandler
Throws:
org.xml.sax.SAXException

startDocument

public void startDocument()
                   throws org.xml.sax.SAXException
Currently has no function

Specified by:
startDocument in interface org.xml.sax.ContentHandler
Throws:
org.xml.sax.SAXException

startEntity

public void startEntity(java.lang.String s)
                 throws org.xml.sax.SAXException
Currently has no function

Specified by:
startEntity in interface org.xml.sax.ext.LexicalHandler
Throws:
org.xml.sax.SAXException

startPrefixMapping

public void startPrefixMapping(java.lang.String s,
                               java.lang.String s1)
                        throws org.xml.sax.SAXException
Currently has no function

Specified by:
startPrefixMapping in interface org.xml.sax.ContentHandler
Throws:
org.xml.sax.SAXException

endElement

public void endElement(java.lang.String nsURI,
                       java.lang.String name,
                       java.lang.String qName)
                throws org.xml.sax.SAXException
Specified by:
endElement in interface org.xml.sax.ContentHandler
Throws:
org.xml.sax.SAXException
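
The startElement, characters and endElement callbacks are the ones this page does not mark as having no function, and the fields characterBuffer, currentFieldName and properties suggest that field content is buffered between the start and end of a field element. The following is only a minimal sketch of that general SAX pattern, not the SDX implementation. The class FieldCollectorSketch, the namespace urn:example:sdx-sketch, the element name "field" and the attribute "name" are all assumptions chosen for illustration; the real values of DOC_NAMESPACE and DOC_FIELD_ELEMENT_NAME are not given on this page.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not part of SDX.
class FieldCollectorSketch extends DefaultHandler {
    // Assumed namespace and element name, for this sketch only.
    static final String NS = "urn:example:sdx-sketch";
    static final String FIELD = "field";

    // Collected {name, value} pairs, in document order.
    final List<String[]> fields = new ArrayList<>();
    private final StringBuilder characterBuffer = new StringBuilder();
    private String currentFieldName; // null while outside a field element

    @Override
    public void startElement(String nsURI, String name, String qName, Attributes atts) {
        if (NS.equals(nsURI) && FIELD.equals(name)) {
            currentFieldName = atts.getValue("name");
            characterBuffer.setLength(0); // start a fresh buffer for this field
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver text in several chunks, so append rather than assign.
        if (currentFieldName != null) {
            characterBuffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String nsURI, String name, String qName) {
        if (NS.equals(nsURI) && FIELD.equals(name) && currentFieldName != null) {
            fields.add(new String[] { currentFieldName, characterBuffer.toString() });
            currentFieldName = null;
        }
    }

    // Parses the given XML string and returns the collected fields.
    static List<String[]> collect(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            FieldCollectorSketch handler = new FieldCollectorSketch();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.fields;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sketch input", e);
        }
    }
}
```

Buffering in characters and committing in endElement matters because a SAX parser is free to split one text node across several characters calls.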

setConsumer

public void setConsumer(org.apache.cocoon.xml.XMLConsumer consumer)
Set the XMLConsumer that will receive XML data.
This method will simply call setContentHandler(consumer) and setLexicalHandler(consumer).
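
The delegation described here can be sketched as follows. This is an illustration under stated assumptions, not the SDX source: SketchConsumer is a hypothetical stand-in for org.apache.cocoon.xml.XMLConsumer (which combines ContentHandler and LexicalHandler), and ConsumerWiringSketch and NoOpConsumer are names invented for the example.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.ext.LexicalHandler;

// Hypothetical stand-in for org.apache.cocoon.xml.XMLConsumer.
interface SketchConsumer extends ContentHandler, LexicalHandler {}

// Hypothetical illustration only; not part of SDX.
class ConsumerWiringSketch {
    protected ContentHandler contentHandler;
    protected LexicalHandler lexicalHandler;

    // One consumer object serves as both the content and the lexical handler.
    public void setConsumer(SketchConsumer consumer) {
        setContentHandler(consumer);
        setLexicalHandler(consumer);
    }

    public void setContentHandler(ContentHandler handler) {
        this.contentHandler = handler;
    }

    public void setLexicalHandler(LexicalHandler handler) {
        this.lexicalHandler = handler;
    }

    // A consumer that ignores every event; DefaultHandler2 supplies empty
    // implementations of both handler interfaces.
    static class NoOpConsumer extends DefaultHandler2 implements SketchConsumer {}
}
```

After setConsumer returns, both protected fields refer to the same consumer instance, which is what lets subclasses forward content and lexical events to a single downstream pipeline component.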


setContentHandler

public void setContentHandler(org.xml.sax.ContentHandler handler)
Set the ContentHandler that will receive XML data.
Subclasses may retrieve this ContentHandler instance by accessing the protected super.contentHandler field.


setLexicalHandler

public void setLexicalHandler(org.xml.sax.ext.LexicalHandler handler)
Set the LexicalHandler that will receive XML data.
Subclasses may retrieve this LexicalHandler instance by accessing the protected super.lexicalHandler field.

Throws:
java.lang.IllegalStateException - If the LexicalHandler or the XMLConsumer were already set.

getFieldValues

public java.util.Enumeration getFieldValues()
Returns field values.

Specified by:
getFieldValues in interface IndexableDocument

addAttachedDocument

public void addAttachedDocument(java.lang.String id,
                                java.net.URL baseURL,
                                java.lang.String url,
                                java.lang.String mimetype,
                                java.lang.String repoId)
                         throws SDXException
Add an attached document to the list for this document.

Parameters:
id - The document id.
baseURL - The base URL, usually the parent document's URL
url - URL of the attached document.
mimetype - Mime type of the document, can be null.
repoId -
Throws:
SDXException
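
The baseURL and url parameters suggest that the attached document's location may be given relative to the parent document. How SDX actually combines the two is not documented on this page; the sketch below only shows standard relative-reference resolution with java.net.URI, as one plausible reading. AttachedUrlSketch and its resolve method are hypothetical names, and the example.org addresses are placeholders.

```java
import java.net.URI;

// Hypothetical illustration only; not part of SDX.
class AttachedUrlSketch {
    // Resolves url against baseURL using standard URI reference resolution.
    // An absolute url is returned unchanged; a relative one is resolved
    // against the directory of the base document.
    static String resolve(String baseURL, String url) {
        return URI.create(baseURL).resolve(url).toString();
    }
}
```

Under this reading, a parent at http://example.org/docs/parent.xml with url images/fig1.png would yield http://example.org/docs/images/fig1.png.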

setAttachedDocuments

public void setAttachedDocuments(java.util.Vector list)
Sets the list of attached documents for this document.

Parameters:
list - The list of attached documents.

updateAttachedDocuments

public boolean updateAttachedDocuments()
Indicates whether the attached documents must be refreshed.

Returns:
true if the attached documents must be refreshed, false otherwise

setUpdateAttachedDocuments

public void setUpdateAttachedDocuments(boolean updateAttachedDocuments)
Indicates whether the list of attached documents must be refreshed.

Parameters:
updateAttachedDocuments - A boolean indicator.

getAttachedDocuments

public java.util.Enumeration getAttachedDocuments()
Retrieves an Enumeration of attached documents

Specified by:
getAttachedDocuments in interface IndexableDocument
Returns:
An Enumeration of "BinaryDocument" objects
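
Because the attached documents are held in a java.util.Vector and exposed as a java.util.Enumeration, callers walk them with hasMoreElements and nextElement. The sketch below mimics that shape with plain strings in place of BinaryDocument objects. AttachedDocumentsSketch is a hypothetical class, and whether the real method can return null (for example before any document is attached) is not stated on this page.

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Vector;

// Hypothetical illustration only; strings stand in for BinaryDocument objects.
class AttachedDocumentsSketch {
    private Vector<String> attachedDocuments = new Vector<>();

    void addAttachedDocument(String id) {
        attachedDocuments.add(id);
    }

    Enumeration<String> getAttachedDocuments() {
        return attachedDocuments.elements();
    }

    int getAttachedDocumentsSize() {
        return attachedDocuments.size();
    }

    // Replaces the Vector with a fresh, empty one.
    void resetAttachedDocuments() {
        attachedDocuments = new Vector<>();
    }

    // Copies the remaining elements of an Enumeration into a List.
    static List<String> drain(Enumeration<String> e) {
        List<String> out = new ArrayList<>();
        while (e.hasMoreElements()) {
            out.add(e.nextElement());
        }
        return out;
    }
}
```

An Enumeration can only be traversed once, so code that needs the documents more than once should copy them into a list first, as drain does here.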

getAttachedDocumentsSize

public int getAttachedDocumentsSize()

resetFields

protected void resetFields()
Resets the objects we need to store indexation data or creates them if they do not exist


resetAttachedDocuments

public void resetAttachedDocuments()
Reinitializes the Vector of attached documents

Specified by:
resetAttachedDocuments in interface IndexableDocument

getTransformedDocument

public IndexableDocument getTransformedDocument()
Returns the transformed document object, or null if no transformed document was produced during the indexation pipeline

Specified by:
getTransformedDocument in interface IndexableDocument

setUpTransformedDocument

protected void setUpTransformedDocument()
                                 throws SDXException
Throws:
SDXException

getSubDocuments

public java.util.Enumeration getSubDocuments()
Retrieves an Enumeration of sub(Indexable) documents

Specified by:
getSubDocuments in interface IndexableDocument
Returns:
An Enumeration of "XMLDocument" objects

getSubDocumentsSize

public int getSubDocumentsSize()

getStoreHandler

public AbstractIndexableDocument.StoreHandler getStoreHandler()
Specified by:
getStoreHandler in interface IndexableDocument

setMessageHandler

public void setMessageHandler(org.xml.sax.ContentHandler handler)
Specified by:
setMessageHandler in interface IndexableDocument

setBoost

public void setBoost(float boost)
Sets a boost factor for scoring (currently Lucene specific)

Specified by:
setBoost in interface IndexableDocument
See Also:
Document.setBoost(float)

getBoost

public float getBoost()
Gets a boost factor for scoring (currently Lucene specific)

Specified by:
getBoost in interface IndexableDocument
See Also:
Document.getBoost()

setXMLFieldList

public void setXMLFieldList(java.util.HashMap fieldList)
Description copied from interface: IndexableDocument
Sets the XMLFieldList of the DocumentBase where the document is stored. The list gives the name of each field with an XML type.

Specified by:
setXMLFieldList in interface IndexableDocument
See Also:
fr.gouv.culture.sdx.document.IndexableDocument#setFieldList(java.util.HashMap)

setXMLTransformerHandler

public void setXMLTransformerHandler(javax.xml.transform.sax.TransformerHandler xmlFieldTransformer)
Description copied from interface: IndexableDocument
Sets the XMLTransformer used to parse the XML fields

Specified by:
setXMLTransformerHandler in interface IndexableDocument
See Also:
fr.gouv.culture.sdx.document.IndexableDocument#setXMLTransformer(javax.xml.transform.Transformer)


Copyright © 2000-2010 Ministere de la culture et de la communication / AJLSM. All Rights Reserved.