fr.gouv.culture.sdx.document
Class HTMLDocument
java.lang.Object
fr.gouv.culture.sdx.utils.AbstractSdxObject
fr.gouv.culture.sdx.document.AbstractDocument
fr.gouv.culture.sdx.document.AbstractIndexableDocument
fr.gouv.culture.sdx.document.HTMLDocument
- All Implemented Interfaces:
- Document, IndexableDocument, ParsableDocument, Describable, Encodable, Identifiable, Localizable, SdxObject, org.apache.avalon.framework.configuration.Configurable, org.apache.avalon.framework.context.Contextualizable, org.apache.avalon.framework.logger.LogEnabled, org.apache.avalon.framework.service.Serviceable, org.apache.cocoon.xml.XMLProducer, org.apache.excalibur.xml.sax.XMLConsumer, org.apache.excalibur.xml.sax.XMLizable, org.xml.sax.ContentHandler, org.xml.sax.ext.LexicalHandler
public class HTMLDocument
- extends AbstractIndexableDocument
- implements ParsableDocument
An HTML document, parsable and indexable.
Field Summary |
protected java.io.File |
tidyConf
|
Fields inherited from class fr.gouv.culture.sdx.document.AbstractIndexableDocument |
_boost, _currentFieldBoost, _msgHandler, _xmlFieldList, _xmlFieldOutput, _xmlFieldTransformer, attachedDocuments, characterBuffer, contentHandler, currentFieldName, DOC_ATTACHEDOC_ELEMENT_NAME, DOC_FIELD_ELEMENT_NAME, DOC_MSG_ELEMENT_NAME, DOC_NAMESPACE, DOC_ROOT_ELEMENT_NAME, lexicalHandler, nsTable, openSdxDocElems, properties, storeHandler, subDoc, subDocBytes, subDocuments, transformedDoc, withinSdxElement, withinXmlField, xmlConsumer |
Fields inherited from class fr.gouv.culture.sdx.utils.AbstractSdxObject |
_configuration, _context, _description, _encoding, _id, _locale, _logger, _manager, _xmlizable_objects, _xmlLang, isToSaxInitialized |
Constructor Summary |
HTMLDocument()
Creates an HTML document. |
HTMLDocument(java.lang.String id)
Creates an HTML document given an id. |
Method Summary |
void |
addAdditionalSystemFields(org.apache.lucene.document.Document doc)
Adds some additional system fields to the Lucene document. |
java.lang.String |
getDocType()
Gets the docType for the document |
java.lang.String |
getMimeType()
Returns the MIME type of this document, as a String. |
void |
parse(org.apache.excalibur.xml.sax.SAXParser parser)
Parses a document using the previously supplied consumer. |
void |
parse(org.apache.excalibur.xml.sax.SAXParser parser,
org.apache.cocoon.xml.XMLConsumer consumer)
Parses a document using a specific consumer. |
void |
setTidyConfiguration(java.io.File tidyConf)
Sets the Tidy configuration file. |
void |
setTransformedDocument(byte[] content)
Sets the transformed document for the parent document. |
void |
setTransformedDocument(java.io.File file)
Sets the transformed document for the parent document. |
void |
startIndexing(org.apache.excalibur.xml.sax.SAXParser parser,
org.apache.cocoon.xml.XMLConsumer consumer)
Starts the indexing process. |
Methods inherited from class fr.gouv.culture.sdx.document.AbstractIndexableDocument |
addAttachedDocument, characters, comment, endCDATA, endDocument, endDTD, endElement, endEntity, endPrefixMapping, generateId, getAttachedDocuments, getAttachedDocumentsSize, getBoost, getFieldValues, getStoreHandler, getSubDocuments, getSubDocumentsSize, getTransformedDocument, handleDocumentId, ignorableWhitespace, processingInstruction, resetAttachedDocuments, resetFields, setAttachedDocuments, setBoost, setConsumer, setContentHandler, setDocumentLocator, setLexicalHandler, setMessageHandler, setUpdateAttachedDocuments, setUpTransformedDocument, setXMLFieldList, setXMLTransformerHandler, skippedEntity, startCDATA, startDocument, startDTD, startElement, startEntity, startPrefixMapping, updateAttachedDocuments |
Methods inherited from class fr.gouv.culture.sdx.document.AbstractDocument |
getClassNameSuffix, getInputSource, getLength, getPreferredFilename, getRepositoryForStorage, getURL, initToSax, initVolatileObjectsToSax, openStream, save, setContent, setContent, setContent, setContent, setIdGenerator, setIdGenerator, setMimeType, setPreferredFilename, setPreferredFilename, setRepositoryForStorage, setURL |
Methods inherited from class fr.gouv.culture.sdx.utils.AbstractSdxObject |
configure, configureDescription, contextualize, enableLogging, getBaseAttributes, getConfiguration, getContext, getDescription, getEncoding, getId, getLocale, getLog, getServiceManager, getXmlLang, service, setDescription, setEncoding, setId, setLocale, setUpSdxObject, setUpSdxObject, setXmlLang, toSAX, verifyConfigurationResources |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface fr.gouv.culture.sdx.document.Document |
getLength, getPreferredFilename, getRepositoryForStorage, getURL, openStream, save, setContent, setContent, setContent, setContent, setId, setIdGenerator, setIdGenerator, setMimeType, setPreferredFilename, setRepositoryForStorage, setURL |
Methods inherited from interface fr.gouv.culture.sdx.utils.SdxObject |
getLog |
Methods inherited from interface org.apache.avalon.framework.logger.LogEnabled |
enableLogging |
Methods inherited from interface org.apache.avalon.framework.context.Contextualizable |
contextualize |
Methods inherited from interface org.apache.avalon.framework.service.Serviceable |
service |
Methods inherited from interface org.apache.avalon.framework.configuration.Configurable |
configure |
Methods inherited from interface org.apache.excalibur.xml.sax.XMLizable |
toSAX |
Methods inherited from interface org.apache.cocoon.xml.XMLProducer |
setConsumer |
tidyConf
protected java.io.File tidyConf
HTMLDocument
public HTMLDocument(java.lang.String id)
throws SDXException
- Creates an HTML document given an id.
- Parameters:
id
- The document's id.
If logging is desired, the logger returned by super.getLog() should be set after creation, using enableLogging.
- Throws:
SDXException
- See Also:
AbstractSdxObject.enableLogging(org.apache.avalon.framework.logger.Logger)
HTMLDocument
public HTMLDocument()
- Creates an HTML document.
The document's id must be given later.
If logging is desired, the logger returned by super.getLog() should be set after creation, using enableLogging.
- See Also:
AbstractSdxObject.enableLogging(org.apache.avalon.framework.logger.Logger)
startIndexing
public void startIndexing(org.apache.excalibur.xml.sax.SAXParser parser,
org.apache.cocoon.xml.XMLConsumer consumer)
throws SDXException
- Starts the indexing process.
- Specified by:
startIndexing
in interface IndexableDocument
- Parameters:
parser
- The parser to use
consumer
- The consumer for the events generated by the indexing process
- Throws:
SDXException
parse
public void parse(org.apache.excalibur.xml.sax.SAXParser parser)
throws SDXException
- Parses a document using the previously supplied consumer.
- Specified by:
parse
in interface ParsableDocument
- Parameters:
parser
- The parser to use.
- Throws:
SDXException
parse
public void parse(org.apache.excalibur.xml.sax.SAXParser parser,
org.apache.cocoon.xml.XMLConsumer consumer)
throws SDXException
- Parses a document using a specific consumer.
- Specified by:
parse
in interface ParsableDocument
- Parameters:
parser
- The parser to use
consumer
- The consumer of the events generated by the parse
- Throws:
SDXException
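The two parse overloads differ only in where the consumer comes from: one reuses a consumer supplied earlier, the other takes one for the current call. The following is a minimal sketch of that pattern, not the SDX implementation. It assumes JDK classes only: `javax.xml.parsers.SAXParser` and `org.xml.sax.ContentHandler` stand in for the Excalibur `SAXParser` and the Cocoon `XMLConsumer`, and `ParseSketchDocument`, `ElementCounter`, and `ParseSketch` are hypothetical names. The sketch wraps failures in an unchecked exception, whereas the documented methods throw `SDXException`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in for a parsable document; not the SDX HTMLDocument. */
class ParseSketchDocument {
    private final byte[] content;
    private ContentHandler consumer; // the "previously supplied" consumer

    ParseSketchDocument(String xml) {
        this.content = xml.getBytes(StandardCharsets.UTF_8);
    }

    void setConsumer(ContentHandler consumer) {
        this.consumer = consumer;
    }

    /** Parses with the consumer supplied earlier through setConsumer. */
    void parse(SAXParser parser) {
        if (consumer == null) {
            throw new IllegalStateException("no consumer was supplied");
        }
        parse(parser, consumer);
    }

    /** Parses with an explicit consumer, used for this call only. */
    void parse(SAXParser parser, ContentHandler target) {
        try {
            XMLReader reader = parser.getXMLReader();
            reader.setContentHandler(target);
            reader.parse(new InputSource(new ByteArrayInputStream(content)));
        } catch (IOException | SAXException e) {
            // The documented methods throw SDXException; this sketch uses an unchecked wrapper.
            throw new IllegalStateException("parse failed", e);
        }
    }
}

/** Counts start-element events so the target of each parse is observable. */
class ElementCounter extends DefaultHandler {
    int count;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        count++;
    }
}

class ParseSketch {
    static SAXParser newParser() {
        try {
            return SAXParserFactory.newInstance().newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("cannot create parser", e);
        }
    }

    public static void main(String[] args) {
        ParseSketchDocument doc = new ParseSketchDocument("<html><body><p>text</p></body></html>");
        ElementCounter stored = new ElementCounter();
        ElementCounter explicit = new ElementCounter();
        doc.setConsumer(stored);
        doc.parse(newParser());           // events go to the stored consumer
        doc.parse(newParser(), explicit); // events go to the explicit consumer only
        System.out.println(stored.count + " " + explicit.count);
    }
}
```

In this sketch the one-argument overload simply delegates to the two-argument one, so the explicit consumer never replaces the stored one.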
getDocType
public java.lang.String getDocType()
- Gets the docType for the document
- Specified by:
getDocType
in interface Document
setTransformedDocument
public void setTransformedDocument(byte[] content)
throws SDXException
- Sets the transformed document for the parent document.
The transformed document will have the same id and preferred
filename as the original.
- Specified by:
setTransformedDocument
in interface IndexableDocument
- Parameters:
content
- The byte array of data
- Throws:
SDXException
setTransformedDocument
public void setTransformedDocument(java.io.File file)
throws SDXException
- Sets the transformed document for the parent document.
The transformed document will have the same id and preferred
filename as the original.
- Specified by:
setTransformedDocument
in interface IndexableDocument
- Parameters:
file
- The transformed document file
- Throws:
SDXException
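Both setTransformedDocument overloads state that the transformed document takes the same id and preferred filename as the original. The following is a minimal sketch of that contract, not the SDX implementation. `TransformSketchParent` and `TransformSketchStored` are hypothetical stand-ins, and reading the file eagerly into memory is an assumption made here to keep the example short; the page does not say how SDX handles the file.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for a stored, transformed document; not an SDX class. */
class TransformSketchStored {
    final String id;
    final String preferredFilename;
    final byte[] content;

    TransformSketchStored(String id, String preferredFilename, byte[] content) {
        this.id = id;
        this.preferredFilename = preferredFilename;
        this.content = content;
    }
}

/** Hypothetical stand-in for the parent document; not the SDX HTMLDocument. */
class TransformSketchParent {
    private final String id;
    private final String preferredFilename;
    private TransformSketchStored transformed;

    TransformSketchParent(String id, String preferredFilename) {
        this.id = id;
        this.preferredFilename = preferredFilename;
    }

    /** Byte-array overload: the transformed copy inherits the parent's id and filename. */
    void setTransformedDocument(byte[] content) {
        transformed = new TransformSketchStored(id, preferredFilename, content.clone());
    }

    /** File overload: assumed here to read the file and delegate to the byte-array overload. */
    void setTransformedDocument(File file) {
        try {
            setTransformedDocument(Files.readAllBytes(file.toPath()));
        } catch (IOException e) {
            // The documented methods throw SDXException; this sketch uses an unchecked wrapper.
            throw new UncheckedIOException(e);
        }
    }

    TransformSketchStored getTransformedDocument() {
        return transformed;
    }
}

class TransformSketch {
    public static void main(String[] args) throws IOException {
        TransformSketchParent parent = new TransformSketchParent("doc-1", "page.html");

        parent.setTransformedDocument(new byte[] {1, 2, 3});
        System.out.println(parent.getTransformedDocument().id);

        Path tmp = Files.createTempFile("transformed", ".xml");
        try {
            Files.write(tmp, new byte[] {4, 5});
            parent.setTransformedDocument(tmp.toFile());
            System.out.println(parent.getTransformedDocument().preferredFilename);
        } finally {
            Files.delete(tmp);
        }
    }
}
```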
getMimeType
public java.lang.String getMimeType()
- Returns the MIME type of this document, as a String.
- Specified by:
getMimeType
in interface Document
- Overrides:
getMimeType
in class AbstractDocument
setTidyConfiguration
public void setTidyConfiguration(java.io.File tidyConf)
- Sets the Tidy configuration file.
- Parameters:
tidyConf
- The Tidy configuration file
addAdditionalSystemFields
public void addAdditionalSystemFields(org.apache.lucene.document.Document doc)
- Adds some additional system fields to the Lucene document.
- Specified by:
addAdditionalSystemFields
in interface IndexableDocument
Copyright © 2000-2010 Ministere de la culture et de la communication / AJLSM. All Rights Reserved.