5.5. XML formats

5.5.1. Preface

XML is a derivative of SGML and is used for structured storage of content and metadata.

The XML files are close to valid XML, with two problems: there is no doctype at the beginning, and attributes that have a numerical value are not enclosed in quotation marks. The second problem means the files are not well-formed, so a strict XML parser rejects them as they stand.
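Because the numeric attribute values are unquoted, a standard XML parser will typically refuse these files. The following is a minimal sketch of one way to work around that, assuming the bare values are plain integers and that every attribute is named "value". The function name quote_numeric_values and the sample fragment are hypothetical, and self-closing leaf tags are an assumption about the layout.

```python
import re
import xml.etree.ElementTree as ET

# Matches value=<integer> where the integer is not quoted and is
# followed by whitespace, "/" or ">".
_UNQUOTED = re.compile(r'\bvalue=(-?\d+)(?=[\s/>])')


def quote_numeric_values(text: str) -> str:
    """Wrap bare numeric attribute values in double quotes."""
    return _UNQUOTED.sub(r'value="\1"', text)


# Hypothetical fragment written in the style described above.
raw = '<XML><HTMLHelpDocInfo><NextCollectionId value=3/></HTMLHelpDocInfo></XML>'

fixed = quote_numeric_values(raw)
root = ET.fromstring(fixed)  # parses once the value is quoted
next_id = root.find('HTMLHelpDocInfo/NextCollectionId').get('value')
```

The substitution is purely textual, so a quoted string that happens to contain text such as value=3 would also be altered. A more careful tool would tokenize the tags instead.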

None of the tags contain any content; they only have sub-tags and attributes. It is the attributes which store content, and all the attributes are named "value".

The XML files are used to implement the Collections feature. Each COL file specifies a single collection and hhcolreg.dat is used to store data on all collections on a machine.

There are three tags in hhctrl.ocx that are yet to be seen in any of the XML formats: findmergedchms, showhomepage and homepage. At a guess, these are meta-data tags from the COL format.

5.5.2. COL format

The COL XML format describes a single collection: a set of individual CHM files whose content is grouped into one aggregate.

You can access a DTD that describes this XML format or read on for a more human-oriented description.

The outer element is XML. Immediately inside it is the HTMLHelpCollection element, which is the true container for the other elements.

First in the HTMLHelpCollection element come several meta-data elements:

Table 5.47. Meta-data elements in the COL XML format.

Element         Explanation
masterchm       The file-name stem of the main chm of the collection.
masterlangid    The numerical value of the main language, presumably that of the chm listed in the masterchm element.
samplelocation  Has only ever been seen empty. Presumably the location that stores the samples.
collectionnum   Which collection this is in the collection registry file (hhcolreg.dat). Numeric.
refcount        Presumably a reference count that stores how many times this collection is linked into other collections or into hhcolreg.dat.
version         Presumably the DTD version for this particular collection.
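As an illustration of the layout described so far, the sketch below reads the meta-data elements of a COL file into a dictionary. The sample text is hypothetical, its values are invented, and its numeric values are already quoted so that a standard parser accepts it. The helper name read_metadata is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical COL header; numeric values are pre-quoted for the parser.
SAMPLE = """
<XML>
<HTMLHelpCollection>
<masterchm value="guide"/>
<masterlangid value="1033"/>
<samplelocation value=""/>
<collectionnum value="1"/>
<refcount value="1"/>
<version value="1"/>
</HTMLHelpCollection>
</XML>
"""

META_ELEMENTS = ('masterchm', 'masterlangid', 'samplelocation',
                 'collectionnum', 'refcount', 'version')


def read_metadata(text):
    """Map each meta-data element name to its value attribute."""
    collection = ET.fromstring(text).find('HTMLHelpCollection')
    result = {}
    for name in META_ELEMENTS:
        element = collection.find(name)
        if element is not None:  # tolerate missing elements
            result[name] = element.get('value')
    return result


meta = read_metadata(SAMPLE)
```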

These are followed by a Folders element, which contains a single Folder element.

Each Folder element contains a TitleString element, a FolderOrder element (numerical) and 0 or more Folder elements. The value attribute of the TitleString element contains the string to display in the contents pane and the value attribute of the FolderOrder element stores the order in which its parent Folder element is put into the contents pane at the current Folder depth. If two Folder elements with the same parent have the same value attribute of the FolderOrder element then the one that occurs earlier in the file will come first in the contents pane.
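The ordering rule can be sketched with a stable sort, which keeps Folder elements with equal FolderOrder values in file order. This is only an illustration: the sample is hypothetical with pre-quoted numeric values, the helper name display_order is invented, and treating lower FolderOrder values as earlier is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical Folders fragment; numeric values are pre-quoted.
SAMPLE = """
<Folders>
  <Folder>
    <TitleString value="Root"/>
    <FolderOrder value="1"/>
    <Folder>
      <TitleString value="Second"/>
      <FolderOrder value="2"/>
    </Folder>
    <Folder>
      <TitleString value="First A"/>
      <FolderOrder value="1"/>
    </Folder>
    <Folder>
      <TitleString value="First B"/>
      <FolderOrder value="1"/>
    </Folder>
  </Folder>
</Folders>
"""


def display_order(folder, depth=0):
    """Yield (depth, title) pairs in the assumed contents-pane order."""
    yield depth, folder.find('TitleString').get('value')
    # sorted() is stable, so ties on FolderOrder keep their file order.
    children = sorted(
        folder.findall('Folder'),
        key=lambda f: int(f.find('FolderOrder').get('value')),
    )
    for child in children:
        yield from display_order(child, depth + 1)


root_folder = ET.fromstring(SAMPLE).find('Folder')
titles = [title for _, title in display_order(root_folder)]
```

Here "First A" and "First B" share a FolderOrder of 1, so they keep the order in which they appear in the file, and both precede "Second".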

Terminal Folder elements have TitleString elements with a value attribute of the form "=chmstem", where chmstem is the file-name stem of a chm file. Terminal Folder elements also have a LangId element whose value attribute is a numerical language identifier.
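A minimal sketch of picking out the terminal Folder elements follows. It treats a Folder as terminal when it has no child Folder elements, then strips the leading "=" to recover the chm stem. The sample and the helper name terminal_folders are hypothetical, and the numeric values are pre-quoted so that a standard parser accepts the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical Folders fragment; numeric values are pre-quoted.
SAMPLE = """
<Folders>
  <Folder>
    <TitleString value="Library"/>
    <FolderOrder value="1"/>
    <Folder>
      <TitleString value="=guide"/>
      <FolderOrder value="1"/>
      <LangId value="1033"/>
    </Folder>
    <Folder>
      <TitleString value="=reference"/>
      <FolderOrder value="2"/>
      <LangId value="1033"/>
    </Folder>
  </Folder>
</Folders>
"""


def terminal_folders(root):
    """Return (chm_stem, lang_id) for every Folder without sub-folders."""
    found = []
    for folder in root.iter('Folder'):
        if folder.find('Folder') is not None:
            continue  # has child Folder elements, so it is not terminal
        title = folder.find('TitleString').get('value')
        lang = folder.find('LangId')
        if title.startswith('=') and lang is not None:
            found.append((title[1:], int(lang.get('value'))))
    return found


stems = terminal_folders(ET.fromstring(SAMPLE))
```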

5.5.3. hhcolreg.dat format

hhcolreg.dat stores information on collections on the machine.

You can access a DTD that describes this XML format or read on for a more human-oriented description.

The outer element is XML. Immediately inside it is the HTMLHelpDocInfo element, which is the true container for the other elements.

First in the HTMLHelpDocInfo element comes the meta-data element NextCollectionId. The value attribute of this element stores the value that the ColNum of the next Collection will have if another collection is ever added.

Then come the following three container elements: Collections, Locations and DocCompilations.

The Collections element contains any number of Collection elements. Each Collection element contains a single ColNum element and a single ColName element. The value attribute of the ColNum element stores a unique number that identifies the collection. The value attribute of the ColName element stores the full path to the .col file for this collection.
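To make the relationship between NextCollectionId and the Collection elements concrete, here is a short sketch that reads both. The sample registry, its paths and the helper name read_collections are hypothetical, and the numeric values are pre-quoted so that a standard parser accepts the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical hhcolreg.dat excerpt; numeric values are pre-quoted.
SAMPLE = r"""
<XML>
<HTMLHelpDocInfo>
<NextCollectionId value="3"/>
<Collections>
<Collection>
<ColNum value="1"/>
<ColName value="C:\Help\first.col"/>
</Collection>
<Collection>
<ColNum value="2"/>
<ColName value="C:\Help\second.col"/>
</Collection>
</Collections>
</HTMLHelpDocInfo>
</XML>
"""


def read_collections(text):
    """Return (next_collection_id, {ColNum: path to the .col file})."""
    info = ET.fromstring(text).find('HTMLHelpDocInfo')
    next_id = int(info.find('NextCollectionId').get('value'))
    paths = {
        int(c.find('ColNum').get('value')): c.find('ColName').get('value')
        for c in info.findall('Collections/Collection')
    }
    return next_id, paths


next_id, paths = read_collections(SAMPLE)
```

In this sample the next identifier, 3, is not yet used by any Collection element, which matches the description of NextCollectionId above.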

The Locations element contains any number of Location elements. Each Location element contains one of each of the LocColNum, LocName, TitleString, LocPath and Volume elements. The table below explains the content and purpose of the value attribute of each of these elements.

Table 5.48. Location sub-elements

Element      Explanation
LocColNum    The value attribute of the ColNum element of the Collection element that this Location element applies to.
LocName      The name of this Location element. This is used in sub-elements of the LocationHistory element.
TitleString  This is displayed in a dialog box when the user is prompted to insert the required media because it is not accessible by hh.
LocPath      The full path where the location can be accessed.
Volume       The volume label of the filesystem that the LocPath is stored on.

The DocCompilations element contains any number of DocCompilation elements. Each DocCompilation element contains one of each of the DocCompId, DocCompLanguage and LocationHistory elements. The value attribute of the DocCompId element stores an identifier which is used in terminal Folder elements in the col files and is usually the stem of the CHM/CHI file for this element. The value attribute of the DocCompLanguage element stores the LCID. The LocationHistory element contains one of each of the ColNum, TitleLocation, IndexLocation, QueryLocation, LocationRef, Version, LastPromptedVersion, TitleSampleLocation, TitleQueryLocation and SupportsMerge elements, which are all described in the table below.

Table 5.49. LocationHistory subelements

Element              Explanation
ColNum               The collection number this DocCompilation is part of.
TitleLocation        The full path to the CHM.
IndexLocation        The full path to the CHI.
QueryLocation        The full path to the CHQ.
LocationRef          The LocName where the CHM/CHI are stored.
Version              Numeric. Presumably a version of CHM.
LastPromptedVersion  0. Presumably the last Version that was installed before the previous one.
TitleSampleLocation  The LocName where the samples for this CHM are stored.
TitleQueryLocation   The LocName where the CHQ is stored.
SupportsMerge        Numeric. Presumably whether or not the CHM has the other type of merging.
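The cross-references between DocCompilation and Location elements can be followed as in the sketch below, which looks up a DocCompilation by its DocCompId and then finds the Location named by its LocationRef. The sample data and the helper name media_info are hypothetical, the numeric values are pre-quoted, and matching on the pair (ColNum, LocName) is an assumption about how the two sections are meant to be joined.

```python
import xml.etree.ElementTree as ET

# Hypothetical hhcolreg.dat excerpt; numeric values are pre-quoted.
SAMPLE = r"""
<XML>
<HTMLHelpDocInfo>
<Locations>
<Location>
<LocColNum value="1"/>
<LocName value="CD1"/>
<TitleString value="Help Disc 1"/>
<LocPath value="D:\HELP"/>
<Volume value="HELPDISC1"/>
</Location>
</Locations>
<DocCompilations>
<DocCompilation>
<DocCompId value="guide"/>
<DocCompLanguage value="1033"/>
<LocationHistory>
<ColNum value="1"/>
<TitleLocation value="D:\HELP\guide.chm"/>
<LocationRef value="CD1"/>
</LocationHistory>
</DocCompilation>
</DocCompilations>
</HTMLHelpDocInfo>
</XML>
"""


def media_info(text, doc_comp_id):
    """Return (chm_path, location_title, volume), or None if not found."""
    info = ET.fromstring(text).find('HTMLHelpDocInfo')
    # Index Location elements by (collection number, LocName); assumed key.
    locations = {
        (l.find('LocColNum').get('value'), l.find('LocName').get('value')): l
        for l in info.findall('Locations/Location')
    }
    for comp in info.findall('DocCompilations/DocCompilation'):
        if comp.find('DocCompId').get('value') != doc_comp_id:
            continue
        hist = comp.find('LocationHistory')
        chm_path = hist.find('TitleLocation').get('value')
        key = (hist.find('ColNum').get('value'),
               hist.find('LocationRef').get('value'))
        loc = locations.get(key)
        if loc is None:
            return chm_path, None, None  # no matching Location element
        return (chm_path,
                loc.find('TitleString').get('value'),
                loc.find('Volume').get('value'))
    return None


result = media_info(SAMPLE, 'guide')
```

The sample LocationHistory omits several of the elements listed in the table to keep the example short. A real file is expected to carry all of them.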