fr.gouv.culture.sdx.document
Class HTMLDocument

java.lang.Object
  extended by fr.gouv.culture.sdx.utils.AbstractSdxObject
      extended by fr.gouv.culture.sdx.document.AbstractDocument
          extended by fr.gouv.culture.sdx.document.AbstractIndexableDocument
              extended by fr.gouv.culture.sdx.document.HTMLDocument
All Implemented Interfaces:
Document, IndexableDocument, ParsableDocument, Describable, Encodable, Identifiable, Localizable, SdxObject, org.apache.avalon.framework.configuration.Configurable, org.apache.avalon.framework.context.Contextualizable, org.apache.avalon.framework.logger.LogEnabled, org.apache.avalon.framework.service.Serviceable, org.apache.cocoon.xml.XMLProducer, org.apache.excalibur.xml.sax.XMLConsumer, org.apache.excalibur.xml.sax.XMLizable, org.xml.sax.ContentHandler, org.xml.sax.ext.LexicalHandler

public class HTMLDocument
extends AbstractIndexableDocument
implements ParsableDocument

An HTML document, parsable and indexable.


Nested Class Summary
 
Nested classes/interfaces inherited from class fr.gouv.culture.sdx.document.AbstractIndexableDocument
AbstractIndexableDocument.StoreHandler
 
Nested classes/interfaces inherited from interface fr.gouv.culture.sdx.utils.SdxObject
SdxObject.ConfigurationNode
 
Field Summary
protected  java.io.File tidyConf
           
 
Fields inherited from class fr.gouv.culture.sdx.document.AbstractIndexableDocument
_boost, _currentFieldBoost, _msgHandler, _xmlFieldList, _xmlFieldOutput, _xmlFieldTransformer, attachedDocuments, characterBuffer, contentHandler, currentFieldName, DOC_ATTACHEDOC_ELEMENT_NAME, DOC_FIELD_ELEMENT_NAME, DOC_MSG_ELEMENT_NAME, DOC_NAMESPACE, DOC_ROOT_ELEMENT_NAME, lexicalHandler, nsTable, openSdxDocElems, properties, storeHandler, subDoc, subDocBytes, subDocuments, transformedDoc, withinSdxElement, withinXmlField, xmlConsumer
 
Fields inherited from class fr.gouv.culture.sdx.document.AbstractDocument
idGenerator, idPrefix, idSuffix, mimeType, storeRepo
 
Fields inherited from class fr.gouv.culture.sdx.utils.AbstractSdxObject
_configuration, _context, _description, _encoding, _id, _locale, _logger, _manager, _xmlizable_objects, _xmlLang, isToSaxInitialized
 
Fields inherited from interface fr.gouv.culture.sdx.document.Document
CLASS_NAME_SUFFIX, DOCTYPE_BINARY, DOCTYPE_GROUP, DOCTYPE_HTML, DOCTYPE_USER, DOCTYPE_XML
 
Fields inherited from interface fr.gouv.culture.sdx.utils.Encodable
DEFAULT_ENCODING
 
Constructor Summary
HTMLDocument()
          Creates an HTML document.
HTMLDocument(java.lang.String id)
          Creates an HTML document given an id.
 
Method Summary
 void addAdditionalSystemFields(org.apache.lucene.document.Document doc)
          Adds some additional system fields to the Lucene document.
 java.lang.String getDocType()
          Gets the docType for the document
 java.lang.String getMimeType()
          Returns the mimeType field (a String) for this document.
 void parse(org.apache.excalibur.xml.sax.SAXParser parser)
          Parses a document using the previously supplied consumer.
 void parse(org.apache.excalibur.xml.sax.SAXParser parser, org.apache.cocoon.xml.XMLConsumer consumer)
          Parses a document using a specific consumer.
 void setTidyConfiguration(java.io.File tidyConf)
          Sets the Tidy configuration file.
 void setTransformedDocument(byte[] content)
          Sets the transformed document for the parent document.
 void setTransformedDocument(java.io.File file)
          Sets the transformed document for the parent document.
 void startIndexing(org.apache.excalibur.xml.sax.SAXParser parser, org.apache.cocoon.xml.XMLConsumer consumer)
          Starts the indexing process.
 
Methods inherited from class fr.gouv.culture.sdx.document.AbstractIndexableDocument
addAttachedDocument, characters, comment, endCDATA, endDocument, endDTD, endElement, endEntity, endPrefixMapping, generateId, getAttachedDocuments, getAttachedDocumentsSize, getBoost, getFieldValues, getStoreHandler, getSubDocuments, getSubDocumentsSize, getTransformedDocument, handleDocumentId, ignorableWhitespace, processingInstruction, resetAttachedDocuments, resetFields, setAttachedDocuments, setBoost, setConsumer, setContentHandler, setDocumentLocator, setLexicalHandler, setMessageHandler, setUpdateAttachedDocuments, setUpTransformedDocument, setXMLFieldList, setXMLTransformerHandler, skippedEntity, startCDATA, startDocument, startDTD, startElement, startEntity, startPrefixMapping, updateAttachedDocuments
 
Methods inherited from class fr.gouv.culture.sdx.document.AbstractDocument
getClassNameSuffix, getInputSource, getLength, getPreferredFilename, getRepositoryForStorage, getURL, initToSax, initVolatileObjectsToSax, openStream, save, setContent, setContent, setContent, setContent, setIdGenerator, setIdGenerator, setMimeType, setPreferredFilename, setPreferredFilename, setRepositoryForStorage, setURL
 
Methods inherited from class fr.gouv.culture.sdx.utils.AbstractSdxObject
configure, configureDescription, contextualize, enableLogging, getBaseAttributes, getConfiguration, getContext, getDescription, getEncoding, getId, getLocale, getLog, getServiceManager, getXmlLang, service, setDescription, setEncoding, setId, setLocale, setUpSdxObject, setUpSdxObject, setXmlLang, toSAX, verifyConfigurationResources
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface fr.gouv.culture.sdx.document.Document
getLength, getPreferredFilename, getRepositoryForStorage, getURL, openStream, save, setContent, setContent, setContent, setContent, setId, setIdGenerator, setIdGenerator, setMimeType, setPreferredFilename, setRepositoryForStorage, setURL
 
Methods inherited from interface fr.gouv.culture.sdx.utils.SdxObject
getLog
 
Methods inherited from interface org.apache.avalon.framework.logger.LogEnabled
enableLogging
 
Methods inherited from interface org.apache.avalon.framework.context.Contextualizable
contextualize
 
Methods inherited from interface org.apache.avalon.framework.service.Serviceable
service
 
Methods inherited from interface org.apache.avalon.framework.configuration.Configurable
configure
 
Methods inherited from interface fr.gouv.culture.sdx.utils.Identifiable
getId
 
Methods inherited from interface fr.gouv.culture.sdx.utils.Describable
getDescription, setDescription
 
Methods inherited from interface fr.gouv.culture.sdx.utils.Encodable
getEncoding, setEncoding
 
Methods inherited from interface fr.gouv.culture.sdx.utils.Localizable
getLocale, getXmlLang, setLocale, setXmlLang
 
Methods inherited from interface org.apache.excalibur.xml.sax.XMLizable
toSAX
 
Methods inherited from interface org.apache.cocoon.xml.XMLProducer
setConsumer
 

Field Detail

tidyConf

protected java.io.File tidyConf

Constructor Detail

HTMLDocument

public HTMLDocument(java.lang.String id)
             throws SDXException
Creates an HTML document given an id.

Parameters:
id - The document's id. If logging is desired, the logger should be set after creation with enableLogging.
Throws:
SDXException
See Also:
AbstractSdxObject.enableLogging(org.apache.avalon.framework.logger.Logger)

HTMLDocument

public HTMLDocument()
Creates an HTML document. The document's id must be given later. If logging is desired, the logger should be set after creation with enableLogging.

See Also:
AbstractSdxObject.enableLogging(org.apache.avalon.framework.logger.Logger)

Method Detail

startIndexing

public void startIndexing(org.apache.excalibur.xml.sax.SAXParser parser,
                          org.apache.cocoon.xml.XMLConsumer consumer)
                   throws SDXException
Starts the indexing process.

Specified by:
startIndexing in interface IndexableDocument
Parameters:
parser - The parser to use
consumer - The consumer for the events generated by the indexing process
Throws:
SDXException

parse

public void parse(org.apache.excalibur.xml.sax.SAXParser parser)
           throws SDXException
Parses a document using the previously supplied consumer.

Specified by:
parse in interface ParsableDocument
Parameters:
parser - The parser to use.
Throws:
SDXException

parse

public void parse(org.apache.excalibur.xml.sax.SAXParser parser,
                  org.apache.cocoon.xml.XMLConsumer consumer)
           throws SDXException
Parses a document using a specific consumer.

Specified by:
parse in interface ParsableDocument
Parameters:
parser - The parser to use
consumer - The consumer of the events generated by the parse
Throws:
SDXException
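
The relationship between the two parse overloads can be illustrated with a small, self-contained sketch built only on JDK SAX classes. ParsableSketch, setConsumer and countElements here are hypothetical stand-ins, not SDX classes: the real HTMLDocument works with Excalibur's SAXParser and Cocoon's XMLConsumer, and its internal behavior may differ. The sketch only shows the general pattern of a one-argument parse that relies on a previously supplied consumer and a two-argument parse that takes the consumer explicitly.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for a parsable document; not an SDX class.
class ParsableSketch {
    private final String content;      // well-formed markup to parse
    private ContentHandler consumer;   // consumer supplied ahead of time

    ParsableSketch(String content) {
        this.content = content;
    }

    // Stores the consumer that the one-argument parse will use.
    void setConsumer(ContentHandler consumer) {
        this.consumer = consumer;
    }

    // Parses using the previously supplied consumer.
    void parse(XMLReader reader) throws Exception {
        if (consumer == null) {
            throw new IllegalStateException("No consumer has been set");
        }
        parse(reader, consumer);
    }

    // Parses using a specific consumer, ignoring any stored one.
    void parse(XMLReader reader, ContentHandler target) throws Exception {
        reader.setContentHandler(target);
        reader.parse(new InputSource(new StringReader(content)));
    }

    // Helper: counts the start-element events received by a stored consumer.
    static int countElements(String markup) {
        final int[] count = {0};
        ContentHandler counter = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        };
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            ParsableSketch doc = new ParsableSketch(markup);
            doc.setConsumer(counter);
            doc.parse(reader);   // uses the stored consumer
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return count[0];
    }
}
```

In this sketch the one-argument overload simply delegates to the two-argument one, so both follow the same parsing path. Whether HTMLDocument is implemented this way is an assumption.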

getDocType

public java.lang.String getDocType()
Gets the docType for the document

Specified by:
getDocType in interface Document

setTransformedDocument

public void setTransformedDocument(byte[] content)
                            throws SDXException
Sets the transformed document for the parent document. The transformed document will have the same id and preferred filename as the original.

Specified by:
setTransformedDocument in interface IndexableDocument
Parameters:
content - The byte array of data
Throws:
SDXException

setTransformedDocument

public void setTransformedDocument(java.io.File file)
                            throws SDXException
Sets the transformed document for the parent document. The transformed document will have the same id and preferred filename as the original.

Specified by:
setTransformedDocument in interface IndexableDocument
Parameters:
file - The transformed document file
Throws:
SDXException
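
The statement that the transformed document shares the id and preferred filename of the original can be pictured with the following minimal sketch. TransformedDocSketch and all of its members are hypothetical and written only for illustration. They are not the SDX implementation, which stores its data through SDX's own document and repository classes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a document with a transformed version; not an SDX class.
class TransformedDocSketch {
    private final String id;
    private final String preferredFilename;
    private final byte[] content;
    private TransformedDocSketch transformed;  // null until one is set

    TransformedDocSketch(String id, String preferredFilename, byte[] content) {
        this.id = id;
        this.preferredFilename = preferredFilename;
        this.content = content.clone();  // defensive copy of the caller's array
    }

    // Attaches a transformed version that reuses this document's id and filename.
    void setTransformedDocument(byte[] transformedContent) {
        this.transformed =
            new TransformedDocSketch(id, preferredFilename, transformedContent);
    }

    TransformedDocSketch getTransformedDocument() {
        return transformed;
    }

    String getId() {
        return id;
    }

    String getPreferredFilename() {
        return preferredFilename;
    }

    // Decodes the stored bytes as UTF-8 text.
    String getContentAsString() {
        return new String(content, StandardCharsets.UTF_8);
    }
}
```

A caller would create the original, pass in the transformed bytes, and then read the shared id and filename from the object returned by getTransformedDocument().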

getMimeType

public java.lang.String getMimeType()
Returns the mimeType field (a String) for this document.

Specified by:
getMimeType in interface Document
Overrides:
getMimeType in class AbstractDocument

setTidyConfiguration

public void setTidyConfiguration(java.io.File tidyConf)

addAdditionalSystemFields

public void addAdditionalSystemFields(org.apache.lucene.document.Document doc)
Adds some additional system fields to the Lucene document.

Specified by:
addAdditionalSystemFields in interface IndexableDocument


Copyright © 2000-2010 Ministere de la culture et de la communication / AJLSM. All Rights Reserved.