5.3. Internal file formats

5.3. Internal file formats
	Chapter 5. Formats

5.3.1. Preface

Please let us know if you find any other internal files, figure out formats of any internal files or find out what unknown parts of the files below do. Any and all contributions will be fully attributed and, if appropriate, co-copyright given.

In this section, where the description of a file says that an item is an offset into another file, that file may be located in the same CHM, or it may be located in an accompanying CHI file.

The different types of ITSF files contain different internal files. The list below indicates which file types contain which internal files:

CHI: /#ITBITS, /#SYSTEM, /#IDXHDR, /#STRINGS, /#TOCIDX, /#TOPICS, /#URLSTR, /#URLTBL, /#WINDOWS, /$OBJINST, /$WWAssociativeLinks/BTree, /$WWAssociativeLinks/Data, /$WWAssociativeLinks/Map, /$WWAssociativeLinks/Property, /$WWKeywordLinks/BTree, /$WWKeywordLinks/Data, /$WWKeywordLinks/Map, /$WWKeywordLinks/Property
CHM: /#ITBITS, /#SYSTEM, /#IDXHDR, /#STRINGS, /#TOCIDX, /#TOPICS, /#URLSTR, /#URLTBL, /#IVB, /#SUBSETS, /#WINDOWS, /$FIftiMain, /$OBJINST, /$WWAssociativeLinks/BTree, /$WWAssociativeLinks/Data, /$WWAssociativeLinks/Map, /$WWAssociativeLinks/Property, /$WWKeywordLinks/BTree, /$WWKeywordLinks/Data, /$WWKeywordLinks/Map, /$WWKeywordLinks/Property
CHQ: /$FIftiMain, /$OBJINST, /$TitleMap
CHW: /$OBJINST, /$HHTitleMap, /$WWAssociativeLinks/BTree, /$WWAssociativeLinks/Data, /$WWAssociativeLinks/Map, /$WWAssociativeLinks/Property, /$WWKeywordLinks/BTree, /$WWKeywordLinks/Data, /$WWKeywordLinks/Map, /$WWKeywordLinks/Property
hh.dat: windowtype, AdvSearchUI/Keywords, AdvSearchUI/Properties, Bookmarks/v1/Count, Bookmarks/v1/n/Topic, Bookmarks/v1/n/Url
KPD: /#KEY_DATA, /#KEY_DELETED
Seen in HHA.dll or on the internet, but not seen in any ITSF files: #GRPINF (see helpdeco docs by Manfred Winterhoff for a possible function), #INFOTYPES (probably will be output when MS implements information types), #URLS (probably a previous incarnation of the #URLTBL + #URLSTR combination), #BSSC (8 bytes, based on something I saw in KeyTools.exe it might contain the version of RoboHelp used to create the CHM. If you have RoboHelp please check this out & be sure to send in your chm.)

5.3.2. `#ITBITS`

The files I have seen so far have been empty or filled with zero BYTEs so who knows. My guess is that it has something to do with information types. The file where it had a non-zero size (12 zero BYTEs in VOICESDK.CHI from the MSDN) also had a non-zero #SYSTEM code 15 (Information type checksum) entry of 0xFFFFFFFF.

5.3.3. `#SYSTEM`

The #SYSTEM file begins with a DWORD, which is a version number. It is 2 in files compiled with Compatibility set to "1.0" or 3 in files compiled with Compatibility set to 1.1 or later. Other values have not been found. It is followed by #SYSTEM entries to the EOF, which have the following format:

Table 5.1. The format of #SYSTEM entries.

Offset	Type	Comment/Value
0	WORD	code - see below for values & meanings
2	WORD	length of data
4	BYTEs	data

In the below list of the different codes the order of the codes in the #SYSTEM file is 10, 9, 4, 2, 3, 16, 6, (5,0,1 or 0,1,5 - haven't been able to make files with all three), 7, 11, 12, 13, 14, 8 and lastly 15.

Table 5.2. An explanation for each of the #SYSTEM codes.

Code^[a] Explanation

0 Value of Contents file in the [OPTIONS] section of the HHP file. NT

1 Value of Index file in the [OPTIONS] section of the HHP file. NT

2 Value of Default topic in the [OPTIONS] section of the HHP file. NT

3 Value of Title in the [OPTIONS] section of the HHP file. NT

28 (HHA Version 4.72.7294 and earlier) or 36 (HHA Version 4.72.8086 and later) byte structure:

Table 5.3. The format of the code 4 #SYSTEM entry.

Offset	Type	Comment/Value
0	DWORD	LCID from the HHP file.
4	DWORD	One if DBCS is in use.
8	DWORD	One if full-text search is on.
0xC	DWORD	Non-zero if the file has KLinks.
0x10	DWORD	Non-zero if the file has ALinks.
0x14	QWORD	timestamp. Derived from `GetSystemTimeAsFileTime`, as called by the compiler program. No conversion, just stored as two little endian DWORDs, `dwLowDateTime` first, then `dwHighDateTime`.
0x1C	DWORD	0/1 (unknown) Only `dsmsdn.chi` from the MSDN has 1 here. Perhaps 1 means it is the root chm of a collection?
0x20	DWORD	0 (unknown)

5 Value of Default Window in the [OPTIONS] section of the HHP file. NT

6 Value of Compiled file in the [OPTIONS] section of the HHP file. This is the lowercase of the stem of the CHM file name. If the name of the CHM is ..\bar\foo\ FOO-Bar . chm jimmy is a poo-bum then this will be foo-bar . NT

7 ^[a]DWORD present in files with Binary Index turned on. The entry in the #URLTBL file that points to the sitemap index had the same first DWORD.

Rare. VOICESDK.CHM & CHI and WOSA.CHI from the MSDN have one. The abbreviations and explanations seem to be the same in WOSA.CHI & VOICESDK.CHM, except for 2 mistakes (one in VOICESDK.CHM & one in WOSA.CHI) that seem to be created by bugs in the compiler. Both were compiled by the same version of HHA (4.72.8086), so perhaps this version has some weird bug. Each entry is 16 BYTEs:

Table 5.4. The format of the code 8 #SYSTEM entry.

Offset	Type	Comment/Value
0	DWORD	0, 4 in some (unknown)
4	DWORD	Offset in `#STRINGS` file. An abbreviation.
8	DWORD	3 where 1st DWORD is 0, 5 where it is 4 (unknown)
0xC	DWORD	Offset in `#STRINGS` file. An explanation of the abbreviation.

9 The version/program that the CHM was compiled by - shown in the version dialog as Compiled with %s where %s is what is in this entry of the #SYSTEM file. If compiled with the MS HTML Help Author dll then it will be something like HHA Version 4.74.8702. It comes directly from the resource strings of HHA.dll (I saw it there in Unicode and successfully altered it). Beware that the text control in the version dialog that displays it is only so big and in some cases the string won't be displayed, & in other cases only part, depending apon the effect of wrapping, so if you write a compiler, be sure to test it and use a short name and version. Usually NT, but HH won't crash if it isn't.

10 time_t timestamp (DWORD). Derived from GetLocalTime. Not sure of the conversion yet.

11 ^[a]DWORD present in files with Binary TOC turned on. The entry in the #URLTBL file that points to the sitemap contents has the same first DWORD.

12 ^[a]DWORD. Number of information types.

13 ^[a]The #IDXHDR file contains exactly the same bytes. See below for more info

14 Rare. The ones I saw were from MS Word 2000. My guess is that it is an MSOffice extension (or maybe not) that overrides the names & window types of the navigation tabs. DWORD number of windows to override, 2 ANSI/UTF-8 NT strings for each window. The first is the text for the tab & the second is probably the name of the window type to use. (eg 2, &Answer Wizard\0MsoHelpAWDlg\0&Index\0MsoHelpKeyDlg\0) These are from the Custom tab variables of the [OPTIONS] section of the HHP file. The resources from MSOHELP.EXE have a weird .reg file that gives the CLSIDs involved in the provision of these dialogs.

15 ^[a]DWORD. Information type checksum. Unknown algorithm & data source.

16 Value of Default Font in [OPTIONS] section of the HHP file. NT

^[a]Code 17 to code 65535 have not yet been seen. Please let us know if you see these.

^[a]Not present in files compiled with Compatibility set to 1.0.

5.3.4. `#IDXHDR`

This has exactly the same bytes as the code 13 entry in the #SYSTEM file and is 4096 bytes long.

Table 5.5. The format of the #IDXHDR file.

Offset	Type	Comment/Value
0	char[4]	T#SM
4	DWORD	Unknown timestamp/checksum
8	DWORD	1 (unknown)
0xC	DWORD	Number of topic nodes including the contents & index files
0x10	DWORD	0 (unknown)
0x14	DWORD	Offset in the `#STRINGS` file of the ImageList param of the "text/site properties" object of the sitemap contents (0/-1 = none)
0x18	DWORD	0 (unknown)
0x1C	DWORD	1 if the value of the ImageType param of the "text/site properties" object of the sitemap contents is `Folder`. 0 otherwise.
0x20	DWORD	The value of the Background param of the "text/site properties" object of the sitemap contents
0x24	DWORD	The value of the Foreground param of the "text/site properties" object of the sitemap contents
0x28	DWORD	Offset in the `#STRINGS` file of the Font param of the "text/site properties" object of the sitemap contents (0/-1 = none)
0x2C	DWORD	The value of the Window Styles param of the "text/site properties" object of the sitemap contents
0x30	DWORD	The value of the ExWindow Styles param of the "text/site properties" object of the sitemap contents
0x34	DWORD	Unknown. Often -1. Sometimes 0.
0x38	DWORD	Offset in the `#STRINGS` file of the FrameName param of the "text/site properties" object of the sitemap contents (0/-1 = none)
0x3C	DWORD	Offset in the `#STRINGS` file of the WindowName param of the "text/site properties" object of the sitemap contents (0/-1 = none)
0x40	DWORD	Number of information types.
0x44	DWORD	Unknown. Often 1. Also 0, 3.
0x48	DWORD	Number of files in the [MERGE FILES] list.
0x4C	DWORD	Unknown. Often 0. Non-zero mostly in files with some files in the merge files list.
0x50	DWORD[1004]	List of offsets in the `#STRINGS` file that are the [MERGE FILES] list. Zero terminated, but don't count on it.

5.3.5. `#WINDOWS`

This file contains information on the window types in the CHM. It has the following format:

Table 5.6. The format of the #WINDOWS header.

Offset	Type	Comment/Value
0	DWORD	Number of entries in the file
4	DWORD	Size of each of the entries in the file (188 or 196)
8	`#WINDOWS` entries to the EOF

#WINDOWS entries are basically HH_WINTYPE structures as specified in htmlhelp.h. Note the first DWORD can be used to specify different versions of this structure. Also note that the HHW docs show a different structure to htmlhelp.h. Therefore many CHM files need to be surveyed to find structures with sizes other than 188 or 196. In the description of #WINDOWS entries below, Arg n means that that item is argument n of the window definition in the HHP file, either converted to a DWORD or to an offset in the indicated file:

Table 5.7. The format of each #WINDOWS entry.

Offset	Type	Comment/Value
0	DWORD	Size of the entry (188 in CHMs compiled with Compatibility set to `1.0`, 196 in CHMs compiled with Compatibility set to `1.1 or later`)
4	DWORD	0 (unknown) - but `htmlhelp.h` indicates that this is "BOOL fUniCodeStrings; // IN/OUT: TRUE if all strings are in UNICODE"
8	DWORD	Arg 0. Offset in `#STRINGS` file.
0xC	DWORD	Which window properties are valid & are to be used for this window. See the table below.
0x10	DWORD	Arg 10.
0x14	DWORD	Arg 1. Offset in `#STRINGS` file.
0x18	DWORD	Arg 14.
0x1C	DWORD	Arg 15.
0x20	RECT	Arg 13. Order left, top, right & bottom.
0x30	DWORD	Arg 16.
0x34	DWORD	0 (unknown) - but `htmlhelp.h` indicates that this is "HWND hwndHelp; // OUT: window handle"
0x38	DWORD	0 (unknown) - but `htmlhelp.h` indicates that this is "HWND hwndCaller; // OUT: who called this window"
0x3C	DWORD	0 (unknown) - but `htmlhelp.h` indicates that this is "HH_INFOTYPE* paInfoTypes; // IN: Pointer to an array of Information Types"
0x40	DWORD	0 (unknown) - but `htmlhelp.h` indicates that this is "HWND hwndToolBar; // OUT: toolbar window in tri-pane window"
0x44	DWORD	0 (unknown) - but `htmlhelp.h` indicates that this is "HWND hwndNavigation; // OUT: navigation window in tri-pane window"
0x48	DWORD	0 (unknown) - but `htmlhelp.h` indicates that this is "HWND hwndHTML; // OUT: window displaying HTML in tri-pane window"
0x4C	DWORD	Arg 11.
0x50	BYTE[16]	0 (unknown) - but `htmlhelp.h` indicates that this is a RECT that is "RECT rcHTML; // OUT: HTML window coordinates" & the HHW docs say "Specifies the coordinates of the Topic pane."
0x60	DWORD	Arg 2. Offset in `#STRINGS` file.
0x64	DWORD	Arg 3. Offset in `#STRINGS` file.
0x68	DWORD	Arg 4. Offset in `#STRINGS` file.
0x6C	DWORD	Arg 5. Offset in `#STRINGS` file.
0x70	DWORD	Arg 12.
0x74	DWORD	Arg 17.
0x78	DWORD	Arg 18.
0x7C	DWORD	Arg 19.
0x80	DWORD	Arg 20.
0x84	BYTE[20]	0 (unknown) - but `htmlhelp.h` indicates that this is "BYTE tabOrder[HH_MAX_TABS + 1]; // IN/OUT: tab order: Contents, Index, Search, History, Favorites, Reserved 1-5, Custom tabs"
0x98	DWORD	0 (unknown) - but `htmlhelp.h` indicates that this is "int cHistory; // IN/OUT: number of history items to keep (default is 30)"
0x9C	DWORD	Arg 7. Offset in `#STRINGS` file.
0xA0	DWORD	Arg 9. Offset in `#STRINGS` file.
0xA4	DWORD	Arg 6. Offset in `#STRINGS` file.
0xA8	DWORD	Arg 8. Offset in `#STRINGS` file.
0xAC	BYTE[16]	0 (unknown) - but `htmlhelp.h` indicates that this is a RECT that is "RECT rcMinSize; // Minimum size for window (ignored in version 1)"
Everything after here is only present in CHMs compiled with Compatibility set to `1.1 or later`.
0xBC	DWORD	0 (unknown) - but `htmlhelp.h` indicates that this is "int cbInfoTypes; // size of paInfoTypes;"
0xC0	DWORD	0 (unknown) - but `htmlhelp.h` indicates that this is "LPCTSTR pszCustomTabs; // multiple zero-terminated strings"

Table 5.8. Flags used to specify which values are valid.

Value^[a]	Valid property
0x00000002	Navigation pane style.
0x00000004	Window style flags.
0x00000008	Window extended style flags.
0x00000010	Initial window position.
0x00000020	Navigation pane width.
0x00000040	Window show state.
0x00000080	Info types.
0x00000100	Buttons.
0x00000200	Navigation Pane initially closed state.
0x00000400	Tab position.
0x00000800	Tab order.
0x00001000	History count.
0x00002000	Default Pane.
^[a]The rest of the values either do nothing or are unknown. Please let us know if you find out what they are.

5.3.6. `#STRINGS`

This file is a list of ANSI/UTF-8 NT strings. The first is just a NIL character so that offsets to this file can specify zero & get a valid string. The strings are sliced up into blocks that are 4096 bytes in length. If a string crosses the end of a block then it will be cut off without a NT and repeated in full, with a NT, at the start of the next block. For eg "To customize the appearance of a contents file" might become "To customize the <block ending>To customize the appearance of a contents file" when there are 17 bytes left at the end of the block. The last block will be smaller than 4096 bytes if the strings do not fill it up.

The strings are in this order; "", [WINDOWS] (Arg 0, Arg 1, Arg 7, Arg 9, Arg 2, Arg 3, Arg 4, Arg 5, Arg 6, Arg 8) no. n..., Contents_0_Entry_title, Index_0_Keyword, Contents_Image_file, Contents_Font, Contents_Default_frame, Contents_Default_window, [MERGE FILES] no. n...

5.3.7. `#TOCIDX`

Present in files with a non-empty contents file, when Binary TOC is on and Compatibility is set to 1.1 or later.

This file is made up of 0x1000 byte blocks, but this is only apparent because of extra bytes interrupting what would otherwise be a stream of 20/28 byte structs. If the other parts (DWORDs & 16 byte structs) didn't fit into these blocks then presumably this would show up in the other parts too.

The first block is the header:

Table 5.9. The format of the #TOCIDX header.

Offset	Type	Comment/Value
0	DWORD	4096/header length/offset of no. 1 below
4	DWORD	offset of no. 3 below
8	DWORD	number of no. 3 below
0xC	DWORD	offset of no. 2 below
0x10	BYTE[4080]	0 (unknown)

The header is followed by the following different types of structs in the specified order:

20/28 byte structs (pages/books)
list of DWORDs pointing into the #TOPICS file
16 byte structs - links above stuff

First all the top level books/pages, then the next level, then the next & so on

Table 5.10. 20/28 byte structures

Offset	Type	Comment/Value
0	WORD	0 (unknown)
2	WORD	Unknown
4	DWORD	Seems to be a bit field: 0x2 is whether or not New is turned on, 0x4 is set when the entry is a book/has children and 0x8 is set when the entry has a Local value. The other bits are unknown (0x1, 0x40, 0x100 are sometimes set on books).
8	DWORD	Unknown. In some cases it is an index into the `#TOPICS` file of the entry containing offsets to the title & filename.
0xC	DWORD	Offset to the parent book.
0x10	DWORD	Offset to the next book/page in the current book/page.
The next two DWORDs are only present in books (28 byte structs)
0x14	DWORD	Offset to the first child of the book.
0x18	DWORD	0 (unknown)

Table 5.11. 16 byte structures

Offset	Type	Comment/Value
0	DWORD	Offset into no. 1 above.
4	DWORD	Some kind of sequence number that is incremented by one and starts at 666. I swear :)
8	DWORD	Offset into no. 2 above. Can contain RAM litter.
0xC	DWORD	Index in `#TOPICS` file of the entry containing offsets to the title & filename. Can contain RAM litter.

5.3.8. `#TOPICS`

	Note
	Note that an index into this file should be multiplied by 16 to get the offset of the entry.

An index into this file can be converted to an offset in the #URLTBL file, without reading this file using the following formula: offset = (index%341)*12 + index/341*4096

This file contains information on the topics present.

Each entry has the following format.

Table 5.12. The format of the #TOPICS file entries.

Offset	Type	Comment/Value
0	DWORD	Offset into the tree in the `#TOCIDX` file.
4	DWORD	Offset in `#STRINGS` file of the contents of the title tag or the Name param of the file in question. -1 = no title.
8	DWORD	Offset in `#URLTBL` of entry containing offset to `#URLSTR` entry containing the URL.
0xC	WORD	2 indicates not in contents, 6 indicates that it is in the contents, 0/4 something else (unknown)
0xE	WORD	0, 2, 4, 8, 10, 12, 16, 32 (unknown)

5.3.9. `#URLSTR`

This file is made up of 0x4000 byte blocks. If the last block is not filled then it will be smaller than 0x4000 bytes. The free space at the end of the blocks is filled with NUL bytes. The blocks contain the following elements one after another:

An unknown BYTE. So far this has been 0, 0x42 and in spechsdk.chi it was 0x49. Does not indicate presence/absence of URL/FrameName strings.

This is followed by pairs of URL, FrameName strings (both NT) from the HHC.

Then come all the Local strings from the HHC:

Table 5.13. The format of the #URLSTR entries.

Offset	Type	Comment/Value
0	DWORD	Offset of the URL for this topic.
4	DWORD	Offset of the FrameName for this topic.
8	ANSI/UTF-8 NT string that is the Local for this topic.

There is one way to tell where the end of the URL/FrameName pairs occurs: Repeat the following: read 2 DWORDs and if both are less than the current offset then this is the start of the Local strings else skip two NT strings.

5.3.10. `#URLTBL`

An offset in this file can be converted to an index into the #TOPICS file, without reading this file using the following formula: index = ((offset%4096)+((offset/4096)*4096-4))/12

Each 0x1000 byte block has the following format.

Table 5.14. The format of the #URLTBL blocks.

Offset	Type	Comment/Value
0	DWORD[3][341]	341 entries. 12 bytes each.
0xFFC	DWORD	4096 (unknown) Possibly the length of the block? That MS would pull this kind of shit is really annoying; they should have just put all the entries one after another, not stuffed in an arbitrary DWORD after every 4092 bytes. From this and other blockness I guess they are optimizing for the Wintel platform.

Offset

Type

Comment/Value

DWORD[3][341]

341 entries. 12 bytes each.

0xFFC

DWORD

4096 (unknown) Possibly the length of the block?

That MS would pull this kind of shit is really annoying; they should have just put all the entries one after another, not stuffed in an arbitrary DWORD after every 4092 bytes. From this and other blockness I guess they are optimizing for the Wintel platform.

Each entry has the following format.

Table 5.15. The format of the #URLTBL block entries.

Offset	Type	Comment/Value
0	DWORD	Unknown. I suspect that this is either some kind of unique ID or two WORDs.
4	DWORD	Index of entry in `#TOPICS` file.
8	DWORD	Offset in `#URLSTR` file of entry containing filename.

5.3.11. `#IVB`

This is basically the [ALIAS] section of the HHP file.

The #IVB file begins with a DWORD indicating the total size of all the #IVB entries, which then follow that DWORD.

#IVB entries have the following format.

Table 5.16. The format of the #IVB entries.

Offset	Type	Comment/Value
0	DWORD	The value of the alias
4	DWORD	Offset in `#STRINGS` file of the file to show

5.3.12. `#SUBSETS`

This file is present when the [SUBSETS] section is present in the HHP file.

Table 5.17. The format of the #SUBSETS header.

Offset	Type	Comment/Value
0	WORD	0 (unknown)
2	WORD	Number of bytes taken up by the subset entries.

The subset entries currently seem to be garbage left over from previous usage of the same memory locations. Based on the number of bytes per non-whitespace line in the [SUBSETS] section each subset entry is 12 BYTEs in length.

5.3.13. `$FIftiMain`

5.3.13.1. Preface

I would like to acknowledge Jed Wing for reverse-engineering this file and Razvan Cojocaru for pointing out clarity issues in this section.

This file stores information for the full-text search function of HTML Help. It prevents the need for viewers of CHM files to search each and every topic in the CHM to find words or phrases. This speeds up searching considerably, since the index that this file stores, contains data on which words occur in which files and at which locations.

This file will be empty when Full-text search has been turned off or when no files have been indexed. The CHM compiler will only index those files that contain .h in their names. All word sorting, processing and storage is done case-insensitively and is not case-preserving. If you have a word longer than 99 characters in a HTML file then it seems the indexing routines will die during indexing of that file and then skip on to the next one.

The full-text search data is structured as a tree that is more similar to the ITSP directory than the BTree file. The file begins with a header. This is followed by index nodes, leaf nodes and word location codes (WLCs). The index and leaf nodes are a fixed size (specified in the header) and the WLCs are variably sized (specified in the leaf nodes). There is a WLC block just before each leaf node and the pairs of WLC blocks and leaf nodes are interspersed with index nodes. Index nodes only point to nodes (index or leaf) that are further down the tree and at the same time located colser to the start of the file. The index nodes are directly after the last of their child nodes in the file. The structure of this format is best explained by the following diagram of a ficticious $FIftiMain file. For a more extensive set of examples see the example section

Figure 5.1. A pictorial description of the $FIftiMain format from CHM files.

This file makes use of two different ways of encoding integers in variable length fields; the so called scale and root method and a variant of the ENCINT method used in the PMGL/PMGI directory chunks from the ITSF format. For the ENCINTs in this file the bytes are stored least significant first (little endian), whereas in the PMGL/PMGI chunks they are stored most significant first (big endian).

For CHM viewers that want to quickly find out which files a word is in, the procedure is as follows; read the header, seek to the root index node, search the root index node for a word that is less than or equal to the desired, descend to the next index level, repeat the previous two steps as many times as the tree is deep, then search the resulting leaf node until the desired word is found, read the correct part of the WLCs for that leaf node and extract the topic numbers for that word. Once all the topic numbers have been found for the desired words, make sure there are no duplicates and extract the titles and urls from the #TOPICS, #URLTBL, #URLSTR and #STRINGS files.

For sample code for decoding $FIftiMain files, see chmdeco, xCHM or dump_fiftimain.c from the old html versions of this specification.

5.3.13.2. Structures

The file begins with a header that is 0x400 bytes in length:

Table 5.18. The format of the $FIftiMain header.

Offset	Type	Comment/Value
0	BYTE[4]	0x00 0x00 0x28 0x00 (unknown)
4	DWORD	Number of HTML files indexed after any automatic splitting.
8	DWORD	Offset to the single leaf node (if there are no index nodes), or the root index node of the tree (if there are any index nodes). This will be 4096 less than the file length.
0xC	DWORD	0 (unknown)
0x10	DWORD	The number of leaf nodes in the file.
0x14	DWORD	Same as the value at offset 8.
0x18	WORD	How many nodes deep the tree is: e.g. 1 if only a single leaf node, 2 if there is a single index node to index among the leaf nodes, 3 if there are 2 levels of index nodes.
0x1A	DWORD	7 (unknown)
0x1E	BYTE	Scale for encoding of the document index in the WLCs
0x1F	BYTE	Root size for encoding of the document index in the WLCs
0x20	BYTE	Scale for encoding of the code count in the WLCs
0x21	BYTE	Root size for encoding of the code count in the WLCs
0x22	BYTE	Scale for encoding of the location codes in the WLCs
0x23	BYTE	Root size for encoding of the location codes in the WLCs
0x24	BYTE[10]	0 (unknown)
0x2E	DWORD	Size in bytes of each of the leaf and index nodes (4096).
0x32	DWORD	0/1 (unknown)
0x36	DWORD	Word index of the last duplicate.
0x3A	DWORD	Character index of the last duplicate. From the first character of the first word. The whitespace after tags is not included. & type things are counted as one character. Line endings are not counted in this.
0x3E	DWORD	Length of the longest word in the list not including NT (maximum of 99).
0x42	DWORD	Number of words including duplicates.
0x46	DWORD	Number of words not including duplicates.
0x4A	DWORD	The total length of all the words including duplicates is this DWORD plus the next one. It is unknown how the split is performed.
0x4E	DWORD	This one is usually smaller than the previous one.
0x52	DWORD	Total length of all the words not including duplicates.
0x56	DWORD	Length of unused/null bytes at the end of the word block (if only 1 block, more than total if > 1 block - possibly some free space in WLC blocks).
0x5A	DWORD	0 (unknown)
0x5E	DWORD	One less than the number of HTML files indexed (not entirely sure).
0x62	BYTE[24]	0 (unknown)
0x7A	DWORD	Windows code page identifier (usually 1252 - Windows 3.1 US (ANSI))
0x7E	DWORD	LCID from the HHP file.
0x82	BYTE[894]	0 (unknown)

The index nodes begin with a WORD indicating the length of free space at the end of the node. This is followed by the entries, which fill up as much of the index node as possible.

Table 5.19. Index node entries

Offset	Type	Comment/Value
0	BYTE	One more than the length of the word/partial word in this entry.
1	BYTE	Position in the word where characters are placed.
2	BYTEs	Length-1 bytes make up the word or part of the word. Not NT
+0	DWORD	Offset of the leaf node whose last entry is this word.
+4	WORD	0 (unknown)

The leaf nodes begin with a short header, which is followed by entries:

Table 5.20. Leaf node header

Offset	Type	Comment/Value
0	DWORD	Offset to the next leaf node. 0 if this is the last leaf node.
4	WORD	0 (unknown)
6	WORD	Length of free space at the end of the current leaf node.

Table 5.21. Leaf node entries

Offset	Type	Comment/Value
0	BYTE	Length of the word/partial word in this entry plus one. Maximum of 100.
1	BYTE	Position in the word where characters are placed.
2	BYTEs	Length-1 bytes make up the word or part of the word. Maximum of 99 BYTEs. Not NT
+0	BYTE	Context (0 for body tag, 1 for title tag, other values are unknown)
+1	ENCINT	Number of WLC entries.
+0	DWORD	Offset in this file of the WLC entries for this word.
+4	WORD	0 (unknown)
+6	ENCINT	Number of bytes used by the WLC entries for this word.

Each WLC block is made up of bits packed as tightly as possible and is right-padded with 0s to a full byte. The fields are encoded as the scale and root variable-length integer format described below, with the parameters taken from the initial header. "Delta coding" is also used in a couple of places to reduce the size of the codes -- that is, the first value is stored verbatim, and subsequent values are stored as a delta or difference from the previous value.

Table 5.22. WLC entries

Encoding(s)	Name/Comment
s/r and delta (across the various entries for a single word)	Document index. Indicates in which document this entry is for by way of an index into the `#TOPICS` file.
s/r	Code count. The number of times that the word this entry specifies is used in the specified topic.
s/r and delta (within each WLC entry)	Location codes. A list of word indices (zero based) where this word occurs in the specified topic.

5.3.13.3. Scale and root endcoding

The scale and root method of integer encoding needs two parameters, which I'll call s (scale) and r (root size). In the context of $FIftiMain files, s always appears to be '2', but any other power of 2 could also work (and might be used in some rare cases).

The integer is encoded as two parts, p (prefix) and q (actual bits). p determines how many bits are stored, as well as implicitly determining the high-order bit of the integer. To encode an integer, p starts out as a single 0. If the integer fits in r bits, you're done. If the integer fits in r+1 bits (i.e. r-th bit is set, counting from 0), prepend a 1 to the p and store the low r bits of the integer in q. Otherwise, while the integer does not fit in the allotted space, prepend a bit to p, and increase the size of q by one bit. It's hard to see from the description, but an example will make it more clear. Using s=2, r=3:

Table 5.23. Example of scale and root encoding

Value	p	q
0	0	000
1	0	001
2	0	010
7	0	111
8	10	000
9	10	001
10	10	010
15	10	111
16	110	0000
17	110	0001
18	110	0010
30	110	1110
31	110	1111
32	1110	00000
33	1110	00001
34	1110	00010
62	1110	11110
63	1110	11111
64	11110	000000

A scale other than 2 has never been seen, so it is hard to say how s/r encoding works when s=4, etc. The following is how it might work using s=4, r=2 i.e. a base-4 digit is added each time, meaning two bits added each time.:

Table 5.24. Example of scale and root encoding where scale is greater than 2.

Value	p	q (base 4)	q (binary)
0	0	00	0000
1	0	01	0001
14	0	32	1110
15	0	33	1111
16	10	00	0000
17	10	01	0001
30	10	32	1110
31	10	33	1111
32	110	000	000000
33	110	001	000001

Of course, this is all wild speculation, since examples with s other than 2 haven't been seen... But the codes do work this way (i.e. prepending a 1 to the prefix multiplies the additive value 'b' by s and adds another log2(s) bits.)

5.3.13.4. Examples

This is all fairly complex, so some examples will be extremely useful here. First a short ficticious example, then an example from the real world.

The following table is a ficticious example of how the words appear in leaf nodes. The storage of words in index nodes is the same as in leaf nodes, except some fields are gone.

Table 5.25. Example leaf node entries

Stored word	Whole word	Length+1 to add	Position to add at	Context
`john`	`john`	5	0	0
`sh`	`josh`	3	2	0
`ington`	`joshington`	7	4	0
	`joshington`	1	10	1 (title)

This example is taken from a copy of windows.chm, the system documentation apparently distributed with some version of Windows 98:

Hex dump of two leaf node entries:

000223d:                                        02 00 31  ...0...........1
0002240: 00 0a 03 04 00 00 00 00 1d 01 01 01 01 20 04 00  ............. ..
0002250: 00 00 00 03

The fields of these two entries are as follows:

Table 5.26. Example leaf node entries

Field	1st entry	2nd entry
New length	2	1
Old length	0	1
Word	"1"
Context	0	1
Num WLC ents	0xA	1
Offset	0x403	0x420
Unknown	0	0
WLCs length	0x1D	3

Scanning over to offset 0x403 in the file, we see:

0000403:          f9 f4 60 86 b8 ea 6a 00 ed 78 00 2d c0
0000410: f8 d7 28 2c f0 f6 dc c8 ce 66 61 80 87 02 00 00
0000420: f9 f4 40

Broken out, these WLC entries are:

1          <10, 1027, 29>:  f9 f4 60 86 b8 ea 6a 00 ed 78 00 2d c0 f8 d7 28
                            2c f0 f6 dc c8 ce 66 61 80 87 02 00 00
1 (TITLE)  <1, 1056, 3>:    f9 f4 40

Now, the parameters for the WLC in this file are 2/2, 2/1, 2/5. Here is a quick reference table for the codes:

p       value       q (bits)
2/1:
0:      0-1         1
10:     2-3         1
110:    4-7         2
1110:   8-15        3
11110:  16-31       4
111110: 32-63       5
2/2:
0:      0-3         2
10:     4-7         2
110:    8-15        3
1110:   16-31       4
11110:  32-63       5
111110: 64-127      6
2/5:
0:      0-31        5
10:     32-63       5
110:    64-127      6
1110:   128-255     7
11110:  256-511     8
111110: 512-1023    9

Let's start with the short one, since it's very simple:

f9 f4 40 => 1111 1001 1111 0100 0100 0000
2/2 Document index: 111110 011111 => 64 + 31 => Document no. 95
2/1 Code count:     0 1           => 1
2/5 Location codes: 0 00100       => 4       => Word no. 4
    padding:        0000

So, in document no. 95, word no. 4 is a '1' which is in the title. Now, the ordering of the documents is provided by the #URLTBL and #URLSTR files. Looking up document no. 95 in there (0-based indexing!), we see the file is internet_account.htm, in which, the first non-markup text is:

<title>Dial-Up Networking: Step 1</title>
0: dial
1: up
2: networking
3: step
4: 1

Now, the next one is a little more complicated. I won't go over it in as much detail, but I'll just break it out quickly. It contains 10 entries:

(111110 011111) ( 0 1) (   0 00110  )             0000
(    10 00    ) ( 0 1) (  10 10111  )             000
(  1110 1010  ) ( 0 1) (  10 10100  )             0000000
(  1110 1101  ) ( 0 1) (1110 0000000)             000
(     0 01    ) ( 0 1) (  10 11100  )             0000
(111110 001101) ( 0 1) ( 110 010100 )             0
(     0 01    ) ( 0 1) (  10 01111  )             0000
( 11110 11011 ) ( 0 1) ( 110 011001 )             000
(   110 011   ) (10 0) ( 110 011001 ) ( 10 00011) 0000000
(    10 00    ) ( 0 1) ( 110 000001 )             0
                                                  00000000 00000000

Parsing those entries, we get:

Table 5.27. Example WLC entries.

Document number	Word numbers
95	6
95+4 = 99	55
99+26 = 125	52
125+29 = 154	128
154+1 = 155	60
155+77 = 232	84
232+1 = 233	47
233+59 = 292	89
292+11 = 303	89 and 89+35 = 124
303+4 = 307	65

Picking one at random, say, Document no. 303 with 2 hits, we open up windows_netsetup_netwin.htm, from which I've generated a wordlist containing all of the words in order:

  0: To(TITLE)
  1: set(TITLE)
  2: up(TITLE)
  ..
  ..
 86: client
 87: follow
 88: steps
 89: 1
 90: 3
 ..
122: follow
123: steps
124: 1
125: 3
126: and
 ..

And we can see the word '1' shows up in precisely the 89th and 124th spots.

5.3.13.5. Processing

The CHM compiler extracts words from all the input files (those with .h in their names), converts them to lowercase and stores them in the $FIftiMain file.

Words in the leaf and index nodes are made up of the following characters stored as is: 0x01 (buggy), 0-9, a-z, _, 0xDE, 0xFE. Bytes from the input files are converted and stored as per the table below. Character entity references of the form ▼ are truncated to BYTEs and translated as per the table below. Character entity references of the form & are treated as whitespace, except for the the latin characters, which are converted as per the table below.

Table 5.28. Conversions

Before	After
A-Z	a-z
0x8A, 0x9A	s
0x8C, 0x9C	oe
0x9F, 0xDD, 0xFD, 0xFF, Ý, ý, ÿ	y
0xC0-0xC5, 0xE0-0xE5, À, À, Á, Â, Ã, Ä, Å, à, á, â, ã, ä, å.	a
0xC6, 0xE6, Æ, æ	ae
0xC7, 0xE7, Ç, ç	c
0xC8-0xCB, 0xE8-0xEB, È, É, Ê, Ë, è, é, ê, ë	e
0xCC-0xCF, 0xEC-0xEF, Ì, Í, Î, Ï, ì, í, î, ï	i
0xD0, Ð	d
0xD1, 0xF1, ñ, Ñ	n
0xD2-0xD8, 0xF0, 0xF2-0xF8, ð, Ò, Ó, Ô, Õ, Ö, Ø, ò, ó, ô, õ, ö, ø	o
0xD9-0xDC, 0xF9-0xFC, Ù, Ú, Û, Ü, ù, ú, û, ü	u
0xDF, ß	ss
þ	0xFE
Þ	0xDE

These conversons may depend on the system codepage, character set, font and language set in the HHP file, but I'm just guessing here.

There are a few known bugs that are described in the following few paragraphs.

An 0x01 in a word causes the first whitespace character at the end of the word to be included in the word and if the next character is non-whitespace the word is joined to the next word. If the word begins with 0-9 then the word is terminated before the 0x01 and a new word begins at the 0x01. This bug affects the fields in the initial header. For example: abcd0x1efghi-foobar is converted to abcd0x1efghi-foobar. abcd0x1efghi- foobar is converted to abcd0x1efghi- and foobar. 0bcd0x1efghi-foobar is converted to 0bcd and 0x1efghi-foobar. 0bcd0x1efghi- foobar is converted to 0bcd, 0x1efghi- and foobar.

Weird bug where if the word is 16 characters in length then the word is doubled plus the first 7 chars in length.

There is a weird feature that if a word starts with 0-9 then it may contain multiple periods (0x2E = '.') or commas (0x2C = ',') embedded in the word before the non-period, non-comma word terminating character. I think this feature is so that the user can search for version numbers or numbers with a decimal point or thousands separator in them. Note that commas are removed from the word, while periods are not. For example v1.1.234.5......,6 will become v1 and 1.234.5......6.

There is a weird bug involving words ending in single quote (') being forgotten when the same word is also normal and also ending in a period (.).

There are probably many more hidden bugs and features in the word converter (I think its the the ITIR.StdWordBreaker class in ITIRCL.DLL).

5.3.14. `$OBJINST`

From the name and the number of GUIDs present I guess it has something to do with ActiveX objects. Seems like it can be deleted without major consequence.

Table 5.29. The format of the $OBJINST header.

Offset	Type	Comment/Value
0	DWORD	0x04000000 (unknown)
4	DWORD	Number of entries

This is followed by an listing, and each listing entry is as follows

Table 5.30. The format of the $OBJINST listing entries.

Offset	Type	Comment/Value
0	DWORD	Offset of the entry in this file
4	DWORD	Length of the entry

The listing is followed by the entries one after another at offsets specified in the listing.

There are 2 known types of entries. The first seems to be made up of up to 3 different sub entries. The second is a 36 BYTE structure.

Table 5.31. The first entry

Offset	Type	Comment/Value
0	GUID	{4662DAAF-D393-11D0-9A56-00C04FB68BF7}
0x10	DWORD	0x04000000 (unknown) Possibly a big-endian version number of the class that the GUID refers to.
0x14	DWORD	Unknown. Methinks bitflags that somehow affect the size of entries that have the 0x04000000 DWORD, like each bit specifies the presence/absence of a specific subentry.
0x18	DWORD	Windows code page identifier (usually 1252 - Windows 3.1 US (ANSI))
0x1C	DWORD	LCID from the HHP file.
0x20	BYTEs	Unknown
+0	Entries

I haven't been able to find any files without the data for bits 0 & 1 so I can't really say exactly how big the header is and which bytes are part of the bit 0 block and which are part of the bit 1 block. Together, though, bits 0 & 1 account for a large bulk of repeatedly increasing byte blocks of 10 bytes each, plus something else at the end. I suspect that the repeats are for bit 0 and the stuff at the end is bit 1. As to the function of these two bits blocks, well there are no GUIDs and no other clues, so who knows.

Table 5.32. bit 2. Only present when "Full text search stop list file" has been specified in the HHP.

Offset	Type	Comment/Value
0	char[4]	""(\0
4	DWORD	Length in bytes of the entries not including the last zero word.
8	BYTE[32]	0 (unknown)
0x28	Entries. The last entry has a zero length word.

Table 5.33. bit 2 entries

Offset	Type	Comment/Value
0	WORD	Length of the word
2	char[length]	ANSI/UTF-8 string from the stop list file, may be uniqified & sorted reverse alphabetically. Not NT.

Table 5.34. bit 3

Offset	Type	Comment/Value
0	GUID	{8FA0D5A8-DEDF-11D0-9A61-00C04FB68BF7}
0x10	DWORD	0x04000000 (unknown) Possibly a big-endian version number of the class that the GUID refers to.
0x14	DWORD	1 (unknown)
0x18	DWORD	Windows code page identifier (usually 1252 - Windows 3.1 US (ANSI))
0x1C	DWORD	LCID from the HHP file.
0x20	DWORD	0 (unknown)

Table 5.35. The second entry

Offset	Type	Comment/Value
0	GUID	{4662DAB0-D393-11D0-9A56-00C04FB68B66}
0x10	DWORD	666 (May represent the version of the class that the GUID refers to)
0x14	DWORD	Windows code page identifier (usually 1252 - Windows 3.1 US (ANSI))
0x18	DWORD	LCID from the HHP file.
0x1C	DWORD	Unknown. Almost always 10031. Also 66631 (`accessib.chm` from the MSDN).
0x20	DWORD	0 (unknown)

5.3.15. `$WWAssociativeLinks` & `$WWKeywordLinks`

5.3.15.1. Preface

The files in the $WWAssociativeLinks and $WWKeywordLinks directories have the same formats. The maximum total length (including parents) of an entry in one of these files is 488 characters (including NT). HHW complains about and refuses to output any that are greater than this length.

The $WWKeywordLinks dir specifies the contents of the Index navigation pane & the $WWAssociativeLinks dir specifies the Alinks.

5.3.15.2. `BTree`

In CHW files this is named BTREE and in CHI/CHM files it is named BTree.

This file has a 76 byte header, then 2048 byte blocks. First come all the listing blocks, then all the index blocks. This file is similar to the directory entries in the ITSF format, except that the index blocks are at the end instead of interspersed with the listing blocks. All block indices below are zero based. This file forms a tree, with the last (index mostly) block being the root of the tree. If there is more than one level of index blocks then the root block will have two children; the first in the block header and the second in the entry. WARNING: just as in the ITSF directory there can be garbage in the free space, so respect that first WORD and use it. I'm not yet sure how the listing blocks are split up, though it is probably the same as the ITSF directory (space filling).

Table 5.36. The format of the BTree header.

Offset	Type	Comment/Value
0	char[2]	;) (0x3B 0x29) (signature)^[a]
2	WORD	^[a]Flags. Bit 0x2 always 1. Bit 0x0400 1 if directory?? (this is always on)
4	WORD	Size of the blocks (2048)
6	BYTE[16]	Always X44. ^[a]It is likely to be a string describing format of data 'L' = DWORD (indexed) 'F' = NUL-terminated string (indexed) 'i' = NUL-terminated string (indexed) '2' = WORD '4' = DWORD 'z' = NUL-terminated string '!' = DWORD count value, count/8 * record DWORD filenumber DWORD TopicOffset
0x16	DWORD	0 (unknown)
0x1A	DWORD	Index of the last listing block in the file.
0x1E	DWORD	Index of the root block in the file.
0x22	DWORD	-1 (unknown)
0x26	DWORD	Number of blocks
0x2A	WORD	The depth of the tree of blocks (1 if no index blocks, 2 one level of index blocks, ...)
0x2C	DWORD	Number of keywords in the file.
0x30	DWORD	Windows code page identifier (usually 1252 - Windows 3.1 US (ANSI))
0x34	DWORD	LCID from the HHP file.
0x38	DWORD	0 if this a `BTREE` and is part of a CHW file, 1 if it is a `BTree` and is part of a CHI or CHM file
0x3C	DWORD	Unknown. Almost always 10031. Also 66631 (`accessib.chm`, `ieeula.chm`, `iesupp.chm`, `iexplore.chm`, `msoe.chm`, `mstask.chm`, `ratings.chm`, `wab.chm`).
0x40	DWORD	0 (unknown)
0x44	DWORD	0 (unknown)
0x48	DWORD	0 (unknown)
^[a]These were guessed from the documentation provided with helpdeco by Manfred Winterhoff.

Table 5.37. The format of the BTree listing blocks header.

Offset	Type	Comment/Value
0	WORD	Length of free space at the end of the block.
2	WORD	Number of entries in the block.
4	DWORD	Index of the previous block. -1 if this is the first listing block.
8	DWORD	Index of the next block. -1 if this is the last listing block.

Table 5.38. The format of the BTree listing block entries.

Offset	Type	Comment/Value
0	WCHARs	Value of the first Name entry from the HHK UTF-16/UCS-2. If this is a sub-keyword, then this will be all the parent keywords, including this one, separated by ", ". UTF-16/UCS-2 NT.
+0	WORD	2 if this keyword is a See Also keyword, 0 if it is not.
+2	WORD	Depth of this entry into the tree.
+4	DWORD	Character index of the last keyword in the ", " separated list.
+8	DWORD	0 (unknown)
+0xC	DWORD	Number of Name, Local pairs
+0x10	DWORDs or WCHARs	DWORDs:Index into the `#TOPICS` file. UTF-16/UCS-2 NT string: The value of the See Also string.
+0	DWORD	Mostly 1 (unknown)
+4	DWORD	Zero based index of this entry in the file (not block). Increments by 13 (each entry is 13 more than the last).

Table 5.39. The format of the BTree index blocks header.

Offset	Type	Comment/Value
0	WORD	Length of free space at the end of the block.
2	WORD	Number of entries in the block.
4	DWORD	Index of a child block.

Table 5.40. The format of the BTree index block entries.

Offset	Type	Comment/Value
0	WCHARs	Value of the first Name entry from the HHK. If this is a sub-keyword, then this will be all the parent keywords, including this one, separated by ", ". UTF-16/UCS-2 NT.
+0	WORD	2 if this keyword is a See Also keyword, 0 if it is not.
+2	WORD	Depth of this entry into the tree.
+4	DWORD	Character index of the last keyword in the ", " separated list.
+8	DWORD	0 (unknown)
+0xC	DWORD	Number of Name, Local pairs
+0x10	DWORDs or WCHARs	DWORDs:Index into the `#TOPICS` file. UTF-16/UCS-2 NT string: The value of the See Also string.
+0	DWORD	Index of a child block. If it is a listing block then it is the one that starts with the keyword at the start of this entry

5.3.15.3. `Data`

In CHW files this is named DATA and in CHI/CHM files it is named Data.

This file contains entries that are 13 bytes in length. All known entries have thus far contained the following bytes: 00000000 05000000 80000000 00. AFAICS this file is useless.

5.3.15.4. `Map`

In CHW files this is named MAP and in CHI/CHM files it is named Map.

Begins with a WORD indicating the number of entries in the file (also the number of listing blocks in the BTree file). Each entry is 2 DWORDs. The first is a cumulative sum of the number of keywords in the BTree listing blocks & the second is a consecutively increasing index number. Both start at zero.

5.3.15.5. `Property`

In CHW files this is named PROPERTY and in CHI/CHM files it is named Property.

If there are no links of this type in the CHM then this will be a zero DWORD. Othewise it contains the following DWORDs: 0, 0, 0, 0xC, 1, 1, 0, 0. AFAICS this file is pretty much useless.

5.3.16. `$HHTitleMap`

The file begins with a WORD indicating the number of entries.

Each entry has the following format:

Table 5.41. The format of the $HHTitleMap entries.

Offset	Type	Comment/Value
0	WORD	Length of the file stem.
2	BYTEs	File stem. ANSI/UTF-8 string. Not NT.
+0	DWORD	Unknown.
+4	DWORD	Unknown. Same value as previous DWORD.
+8	DWORD	LCID of the specified file.

5.3.17. `$TitleMap`

The file begins with a WORD indicating the number of entries.

Each entry is 68 BYTEs in length and has the following format:

Table 5.42. The format of the $TitleMap entries.

Offset	Type	Comment/Value
0	BYTE[25]	File stem. ANSI/UTF-8 NT fixed length string.
0x19	BYTE[25]	Unknown. Seems to be RAM litter, but contains paths, file names, zero bytes, DWORDs and mixtures.
0x32	WORD	An index number that begins at 1 and is incremented by 1 for each entry.
0x34	DWORD	Unknown.
0x38	DWORD	Unknown. Same value as previous DWORD.
0x3C	DWORD	LCID of the specified file.
0x40	DWORD	Number of topic nodes including the contents & index files in the specified file.

5.3.18. `/Path/file.chm/windowtype`

It is a cache of user customized bits of the windowtype entry from the #WINDOWS file of the \Path\file.chm CHM file.

Table 5.43. The format of the windowtype entries.

Offset	Type	Comment/Value
0	DWORD	Size of the file in bytes (44)
4	Signed DWORD	Position of the left edge of the window.
8	Signed DWORD	Position of the top edge of the window.
0xC	Signed DWORD	Position of the right edge of the window.
0x10	Signed DWORD	Position of the bottom edge of the window.
0x14	DWORD	Width of the navigation pane in pixels.
0x18	DWORD	Non-zero if search highlight is on.
0x1C	DWORD	Unknown. Not font size, printing options or show state.
0x20	DWORD	Non-zero if there is no text of the toolbar buttons.
0x24	DWORD	Non-zero if the navigation pane is initially closed.
0x28	DWORD	Which navigation tab is currently open.

5.3.19. `/Path/file.chm/AdvSearchUI/Keywords`

UTF-16/UCS-2 NT string. Each search item is separated by a UTF-16/UCS-2 Line Feed character. The string is followed by an unknown WORD.

5.3.20. `/Path/file.chm/AdvSearchUI/Properties`

DWORD. Only the lowest 3 bits are used, the other bits are unknown. Match similar words is controlled by bit 0. Search titles only is controlled by bit 1. Search previous results is controlled by bit 2. Note that since previous search results are not stored anywhere as yet HH will uncheck the Search previous results checkbox even if its bit is on. IMHO this is a bug: HH should automatically search the whole file if there are no previous results and the checkbox is checked.

5.3.21. `/Path/file.chm/Bookmarks/v1/Count`

A DWORD indicating the number of favourites stored for the \Path\file.chm CHM file.

5.3.22. `/Path/file.chm/Bookmarks/v1/n/Topic`

An NT UTF-16/UCS-2 string showing the topic name of bookmark number n (n is zero based).

5.3.23. `/Path/file.chm/Bookmarks/v1/n/Url`

An NT UTF-16/UCS-2 string showing the URL of bookmark number n (n is zero based). It is a fully qualified path into the \Path\file.chm CHM file.

5.3.24. `#KEY_DELETED`

A set of ANSI/UTF-8 NT strings indicating which internal files have been deleted in the new file. Names use backslash (\) instead of forward slash (/) & don't have an initial slash.

5.3.25. `#KEY_DATA`

Table 5.44. The format of the #KEY_DATA file.

Offset	Type	Comment/Value
0	DWORD[8]	Unknown.
0x20	DWORD	Length of the name of the old chm.
0x24	BYTEs	Name of the old chm. ANSI/UTF-8 NT.