Mixp Manual

This is Mixp, an XML Parser for Guile, written as an interface to James Clark’s expat library. The documentation is, of course, incomplete, and the interface is subject to change. However, it should be sufficient to get started. This documentation was last updated on 3 February 2020 and covers Mixp version 0.9.

1 Introduction
2 (mixp expat) Reference
3 (mixp utils) Reference
4 (mixp simit) Reference
GNU FDL
Index

1 Introduction

Mixp is a Scheme interface to James Clark’s expat library. It may be used to parse XML documents with Guile.

If you do not know expat, first have a look at the sample programs (see Sample programs). Typically, you create a parser object with parser-create, associate one or more handlers with it using hset!, and then parse the document with parse. The handlers work by side effect. Unless the algorithm is pure input/output, you will most likely also need to retrieve the state they maintain once the parse is done. See (mixp expat) Reference.

If you already know expat, you can easily find what you are looking for. Take a C expat function name, remove the XML_ prefix, and turn the remaining capitalized words into lowercase words separated by hyphens (for example, XML_ParserCreate becomes parser-create). Then search for the result in the reference documentation. In most cases, the prototype is the same, modulo the differences between C and Scheme.
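The naming rule is mechanical enough to sketch in code. The helper below, c-name->scheme-name, is hypothetical and not part of Mixp. It assumes the rule is applied literally: drop "XML_", lower-case everything, and insert a hyphen wherever a capital letter follows a lower-case one. Under that assumption, runs of capitals such as "NS" stay together.

```scheme
;; Hypothetical helper (not part of Mixp) applying the naming rule:
;; drop a leading "XML_", lower-case every character, and insert a
;; hyphen wherever an upper-case letter follows a lower-case one.
(define (c-name->scheme-name c-name)
  (let ((base (if (string-prefix? "XML_" c-name)
                  (substring c-name 4)
                  c-name)))
    (let loop ((chars (string->list base))
               (prev #f)                ; previous character, if any
               (acc '()))               ; output characters, reversed
      (if (null? chars)
          (list->string (reverse acc))
          (let* ((c (car chars))
                 (acc (if (and prev
                               (char-lower-case? prev)
                               (char-upper-case? c))
                          (cons #\- acc)
                          acc)))
            (loop (cdr chars) c (cons (char-downcase c) acc)))))))

(c-name->scheme-name "XML_SetParamEntityParsing")
;; ⇒ "set-param-entity-parsing"
```

The rule is only a guide. For example, expat's per-handler setters such as XML_SetElementHandler have no direct Scheme counterpart, because all handlers are set through hset!.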

1.1 Sample programs

The following sample program reads an XML file (provided with the Mixp distribution), and displays its start and end tags. You can launch a Guile shell from the test/ directory of the distribution, and execute this code. Your GUILE_LOAD_PATH variable should contain the directory in which you installed Mixp (that is, the directory which contains the mixp/ subdirectory).

(use-modules ((mixp expat) #:prefix E:)
             ((mixp utils) #:prefix U:))

(define (trace prefix)
  (lambda (name . ignored)
    (display prefix)
    (display name)
    (newline)))

;; Create the parser object.
(let ((parser (E:parser-create)))
  ;; Specify handlers.
  (E:hset! parser
           'element-start (trace "start ")
           'element-end   (trace "end "))
  ;; Parse the file.
  (call-with-input-file "REC-xml-19980210.xml"
    (lambda (port)
      (U:parse-data port parser))))

For more information about the Expat interface and handlers, see (mixp expat) Reference.

The following sample program builds a hierarchical tree structure from an XML document which resides in a string. This tree structure should be easy to use with traditional Scheme procedures.

(use-modules ((mixp utils) #:prefix U:))

(let ((xml-doc "<foo name='Paul'><bar>Some text</bar><void/></foo>"))
  (call-with-input-string xml-doc U:xml->tree))
⇒
((element ("foo" (("name" . "Paul")))
   (element ("bar" ())
     (character-data "Some text"))
   (element ("void" ()))))
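For instance, here is a minimal sketch that collects all the character data in such a tree. It assumes the shape shown above: a list of top-level nodes, where an element is (element (NAME ATTRIBUTES) CHILD ...) and text is (character-data TEXT ...). Any other kind of node is skipped. The names node-text and tree-text are hypothetical, not part of Mixp.

```scheme
;; Return the concatenated character data below NODE.
(define (node-text node)
  (cond ((not (pair? node)) "")
        ((eq? (car node) 'character-data)
         (apply string-append (cdr node)))
        ((eq? (car node) 'element)
         ;; (element (NAME ATTRIBUTES) CHILD ...): recurse on children.
         (apply string-append (map node-text (cddr node))))
        (else "")))                     ; comments, PIs, ...: ignored

;; TREE is a list of top-level nodes, as in the example above.
(define (tree-text tree)
  (apply string-append (map node-text tree)))

(tree-text
 '((element ("foo" (("name" . "Paul")))
     (element ("bar" ())
       (character-data "Some text"))
     (element ("void" ())))))
;; ⇒ "Some text"
```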

For more information about this interface, see (mixp utils) Reference.

1.2 Loading Mixp

From the Guile shell or from a Guile script, you should type the following commands before using the Mixp API:

(use-modules (mixp expat))
(use-modules (mixp utils))

Actually, you may load just (mixp expat) if you intend to use only the raw expat interface (see (mixp expat) Reference). You need (mixp utils) if you want to use the extension procedures (see (mixp utils) Reference).

1.3 Mixp components

Mixp contains three Scheme modules:

(mixp expat) is the low-level interface to expat. It does not mirror the expat API exactly, but anyone who already knows expat will recognize the broad similarities.
(mixp utils) contains additional procedures that might be useful. For example, if you need to parse an XML file, you can use call-with-input-file and parse-data instead of parse (from (mixp expat)). This module may evolve into a higher-level interface, for example an object-based interface.
(mixp simit) is an experimental module that provides “SXML over Expat”, more or less (see (mixp simit) Reference).

From another point of view, Mixp consists of files in a directory mixp, which in turn lives in a directory somewhere along your GUILE_LOAD_PATH.

1.4 How to...

This section describes a few common tasks which may be solved with Mixp.

Check that a document is well-formed.
Use parse-data from (mixp utils) without specifying a parser. A default one will be created. It does nothing interesting except signal an error if the document is not well-formed:
```
(call-with-input-string "<doc><elem></elem>" parse-data)
```
See (mixp utils) Reference.

Retrieve the content of an element.

Suppose you want to retrieve the text contained between an opening tag and the matching closing tag. You can do that by using the element-start, element-end and character-data handlers together. The following code retrieves the text between <title> and </title> in an XML document:

(use-modules ((mixp expat) #:prefix E:)
             ((mixp utils) #:prefix U:))

(let ((parser (E:parser-create))
      (in-title? #f) ; becomes #t inside the tag
      (title ""))    ; will contain the result

  (define (toggle sense)
    (lambda (name . ignored)
      (and (string=? "title" name)
           (set! in-title? sense))))

  (define (handle-character-data value)
    (and in-title?
         (set! title (string-append title value))))

  (E:hset! parser
           'element-start  (toggle #t)
           'element-end    (toggle #f)
           'character-data handle-character-data)
  (call-with-input-string
   "<doc><title>Hello</title></doc>"
   (lambda (port)
     (U:parse-data port parser)))
  (display title)
  (newline))

Build a tree structure from an XML document.

Use xml->tree.

(use-modules ((mixp utils) #:prefix U:))
(call-with-input-file "file.xml" U:xml->tree)

See (mixp utils) Reference.

Read the external DTD.

The following program will read an XML document in foo.xml, parse the DTD which may be referenced in the DOCTYPE declaration, and expand the entities.

(use-modules ((mixp expat) #:prefix E:)
             ((mixp utils) #:prefix U:))

(define (fso s . args)
  (apply simple-format #t s args))

;; Create the parser object.
(let ((parser (E:parser-create)))

  (define (xref-h context base system-id public-id)
    (fso "Ref to external entity: ~A.~%" system-id)
    (open-input-file system-id))

  (E:set-param-entity-parsing
   parser 'XML_PARAM_ENTITY_PARSING_ALWAYS)

  ;; Specify callback functions.
  (E:hset! parser
           'character-data (lambda (value)
                             (fso "Char: ~A.~%" value))
           'external-entity-ref xref-h)

  ;; Parse the file.
  (call-with-input-file "foo.xml"
    (lambda (port)
      (U:parse-data port parser))))

Specify handlers to be called in the DTD.
You may want to define handlers to be called when Mixp parses the DTD and finds an element declaration or an attribute list declaration. Unfortunately, Mixp does not currently provide handlers for these declarations.

However, you may try to use the default handler (see Expat handlers). If DTD reading is enabled (see the previous item), the default handler will be called repeatedly while reading the DTD, receiving a piece of the DTD each time. There is no guarantee about which piece it receives on each call. Building a representation of the DTD with the default handler is therefore possible, but not easy.

1.5 Bugs and suggestions

Please send bug reports to <guile-user@gnu.org>. We always appreciate feedback about Mixp, and suggestions about what could be improved in the interface.

2 (mixp expat) Reference

This chapter describes the libexpat interface, i.e., the (mixp expat) module. The interface has been modified to be more “Schemey”; it does not correspond one-to-one with libexpat. Notably, things are more symbolic and “condensed”.

Procedure: expat-version

Return a list containing version information for the underlying libexpat. The list has the form (major minor micro string). Here major, minor and micro are integers, and string is a string that begins with "expat_" and ends with the major, minor and micro numbers in dotted notation.

(expat-version) ⇒ (2 2 6 "expat_2.2.6")
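A program that depends on a recent libexpat can compare this list against a minimum version. The sketch below assumes only the list shape documented above. version-at-least? is a hypothetical helper, not part of Mixp.

```scheme
;; VERSION has the shape returned by (expat-version):
;; (MAJOR MINOR MICRO STRING).  MINIMUM is (MAJOR MINOR MICRO).
;; Compare the numeric components from left to right.
(define (version-at-least? version minimum)
  (let loop ((have (list-head version 3))
             (want minimum))
    (cond ((null? want) #t)                 ; all components equal
          ((> (car have) (car want)) #t)
          ((< (car have) (car want)) #f)
          (else (loop (cdr have) (cdr want))))))

(version-at-least? '(2 2 6 "expat_2.2.6") '(2 1 0))  ; ⇒ #t
```

With Mixp loaded, (version-at-least? (expat-version) '(2 1 0)) could, for example, guard code that relies on get-attribute-info, which needs libexpat 2.1.0 or later.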

Procedure: get-feature-list

Return an alist describing the features of the underlying libexpat. The alist keys are strings, one or more of:

XML_UNICODE
XML_UNICODE_WCHAR_T
XML_DTD
XML_CONTEXT_BYTES
XML_MIN_SIZE
sizeof(XML_Char)
sizeof(XML_LChar)
XML_NS
XML_LARGE_SIZE
XML_ATTR_INFO

For a simple feature, the value is absent. Otherwise, it is a feature-specific positive integer.

Additionally, the first pair has key EXPAT_VERSION and value a number which (in hex) represents the version numbers of the underlying libexpat. For example:

131585 ⇒ #x020201 ⇒ version 2.2.1
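The packed version number can be split back into its components with shifts and masks. The helpers below are hypothetical, not part of Mixp. The sample alist is made up for illustration. It also assumes that simple features appear as pairs whose key is the feature name, and that the version key is the string "EXPAT_VERSION".

```scheme
;; EXPAT_VERSION packs the version as #xMMmmuu: split it into bytes.
(define (decode-expat-version n)
  (list (ash n -16)                 ; major
        (logand (ash n -8) #xff)    ; minor
        (logand n #xff)))           ; micro

;; #t if NAME is a key of FEATURES, an alist as returned by
;; get-feature-list (every entry is assumed to be a pair).
(define (feature-present? features name)
  (and (assoc name features) #t))

(decode-expat-version 131585)       ; ⇒ (2 2 1)
```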

2.1 Symbols

Expat uses C #defines and enums to operate symbolically. For (mixp expat), we use Scheme symbols directly.

status

Several procedures return a symbolic status, one of the set:

XML_STATUS_ERROR
XML_STATUS_OK
XML_STATUS_SUSPENDED

error code

Here are all the symbolic error codes (see Doing a parse), presented without the ‘XML_ERROR_’ prefix.

;; since the beginning           FEATURE_REQUIRES_XML_DTD
NONE                             CANT_CHANGE_FEATURE_ONCE_PARSING
NO_MEMORY
SYNTAX                           ;; added in 1.95.7
NO_ELEMENTS                      UNBOUND_PREFIX
INVALID_TOKEN
UNCLOSED_TOKEN                   ;; added in 1.95.8
PARTIAL_CHAR                     UNDECLARING_PREFIX
TAG_MISMATCH                     INCOMPLETE_PE
DUPLICATE_ATTRIBUTE              XML_DECL
JUNK_AFTER_DOC_ELEMENT           TEXT_DECL
PARAM_ENTITY_REF                 PUBLICID
UNDEFINED_ENTITY                 SUSPENDED
RECURSIVE_ENTITY_REF             NOT_SUSPENDED
ASYNC_ENTITY                     ABORTED
BAD_CHAR_REF                     FINISHED
BINARY_ENTITY_REF                SUSPEND_PE
ATTRIBUTE_EXTERNAL_ENTITY_REF
MISPLACED_XML_PI                 ;; added in 2.0
UNKNOWN_ENCODING                 RESERVED_PREFIX_XML
INCORRECT_ENCODING               RESERVED_PREFIX_XMLNS
UNCLOSED_CDATA_SECTION           RESERVED_NAMESPACE_URI
EXTERNAL_ENTITY_HANDLING
NOT_STANDALONE                   ;; added in 2.2.1
UNEXPECTED_STATE                 INVALID_ARGUMENT
ENTITY_DECLARED_IN_PE

2.2 Parser

Everything revolves around the parser object. This section describes procedures to create and query such objects.

Procedure: parser-create [encoding]

Return a new parser object. Optional arg encoding is a string specifying the encoding to use (for example, "UTF-8").

Procedure: parser-create-ns [encoding [namespace-separator]]

Return a new parser object that performs namespace processing. Optional arg encoding is a string specifying the encoding to use. Second optional arg namespace-separator is a character used to separate the namespace part from the local part (e.g., #\:).

Note: Using this procedure (instead of parser-create) enables dispatch to the namespace-decl-start and namespace-decl-end handlers.
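With namespace processing, expat reports the name of an element or attribute that belongs to a namespace as three parts run together: the namespace URI, the separator character, and the local name. Names in no namespace are passed unchanged. Assuming Mixp passes such names through as-is, a hypothetical helper (not part of Mixp) can split them again. It splits at the last separator, because a URI may itself contain a separator such as #\:, while a local name cannot.

```scheme
;; Split NAME into (URI . LOCAL-NAME) at the last occurrence of
;; SEPARATOR.  Names without a separator yield (#f . NAME).
(define (split-expanded-name name separator)
  (let ((i (string-rindex name separator)))
    (if i
        (cons (substring name 0 i) (substring name (+ i 1)))
        (cons #f name))))

(split-expanded-name "http://www.w3.org/1999/xhtml:html" #\:)
;; ⇒ ("http://www.w3.org/1999/xhtml" . "html")
```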

Procedure: parser? obj

Return #t if obj is an XML-Parser object, otherwise #f.

Procedure: get-locus parser [stash]

Return “current” locus information for parser as a vector of four elements (all non-negative integers):

#(LINE COLUMN BYTE-COUNT BYTE-INDEX)

Optional arg stash specifies a vector to fill in rather than constructing a new one. If an element in stash is #f, the respective slot is skipped (it remains #f).

2.3 Expat handlers

You specify a set of handlers, or callback procedures, for the parser to call when it encounters specific situations while processing its input. Each handler is named by a symbol. Unlike libexpat, Mixp has one centralized procedure for setting handlers and one for getting them.

Procedure: hset! parser [plist…]

Set handlers for parser as specified in plist, a list of alternating handler names (symbols) and values. Valid values are a procedure, () (the empty list) or #f. Note, however, that no arity checks are done on the procedures.

As a special backward-compatible case, if the first key is a full alist, use that instead to specify the handlers to set, and ignore the rest of the args. NB: Support for this calling convention WILL BE REMOVED by the 1.0 release of Mixp.
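Code that still builds an alist of handlers can be migrated by flattening the alist into a plist. The helper below is a hypothetical sketch, not part of Mixp. It uses append-map from SRFI-1.

```scheme
(use-modules (srfi srfi-1))             ; for append-map

;; Turn an alist of (HANDLER-NAME . VALUE) pairs, as accepted by the
;; deprecated calling convention, into the flat plist hset! expects.
(define (alist->plist alist)
  (append-map (lambda (pair) (list (car pair) (cdr pair)))
              alist))

(alist->plist '((element-start . #f) (element-end . #f)))
;; ⇒ (element-start #f element-end #f)
```

Because hset! takes the plist as its remaining arguments, the result would be passed with apply, e.g. (apply E:hset! parser (alist->plist old-handlers)), where old-handlers is your existing alist.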

Procedure: hget-one parser handler

Return the procedure set as parser’s handler named handler (a symbol). If none is set, return #f.

NB: The following procedure WILL BE REMOVED by the 1.0 release of Mixp.

Procedure: hget parser

Return an alist representing the handler set of parser. If a particular handler is not specified, that pair’s CDR will be #f. The alist keys are handler names.

In the following description, the handler’s name is followed by the arguments that it will be called with. These are normally string values unless otherwise noted.

Handler: element-start name attributes

This handler is called when expat sees an element start. attributes is an alist whose keys and values are all strings.

<foo a="1" b="2">
      name ⇒ "foo"
attributes ⇒ (("a" . "1") ("b" . "2"))
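Since attributes is an ordinary alist, a handler can look up values with assoc-ref. The sketch below records the value of the "a" attribute for every element that has one. The names seen and remember-a are illustrative only. The handler can be tried by hand with the arguments from the example above, without creating a parser.

```scheme
;; Illustrative element-start handler: record (NAME . VALUE) for each
;; element carrying an "a" attribute.
(define seen '())

(define (remember-a name attributes)
  (let ((value (assoc-ref attributes "a")))   ; #f if absent
    (when value
      (set! seen (cons (cons name value) seen)))))

;; Called by hand with the arguments from <foo a="1" b="2">:
(remember-a "foo" '(("a" . "1") ("b" . "2")))
seen                                    ; ⇒ (("foo" . "1"))
```

With Mixp, it would be installed with (E:hset! parser 'element-start remember-a).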

Handler: element-end name

Likewise, for element end.

</foo>
name ⇒ "foo"

Handler: character-data data

This handler is called for normal text (outside ‘<>’ tags). data should never be the empty string. It is encoded in UTF-8.

Handler: processing-instruction target pi-data

This handler is called for every processing instruction (<? ... ?>).

<?a   b c d e f  ?>
 target ⇒ "a"
pi-data ⇒ "b c d e f  "
;; Note the trailing whitespace.

Handler: comment comment

This handler is called for comments (<!-- ... -->).

<!-- This is a comment.  -->
comment ⇒ " This is a comment.  "
;; Note the surrounding whitespace.

Handler: cdata-section-start
Handler: cdata-section-end

These handlers are called at the start and at the end, respectively, of a CDATA section (<![CDATA[ ... ]]>).

Handler: default data

Handler: default-expand data

Both default and default-expand specify the default handler. The difference regards processing of internal entities.

Using default inhibits expansion of internal entities; they are passed, instead, to the handler.
Using default-expand does not inhibit their expansion; they are not passed to the handler.

The default handler is called for any characters in the XML document for which there is no applicable handler. This includes both characters that are part of markup which is of a kind that is not reported (comments, markup declarations), or characters that are part of a construct which could be reported but for which no handler has been supplied. The characters are passed exactly as they were in the XML document except that they will be encoded in UTF-8.

Line boundaries are not normalized. Note that a byte order mark character is not passed to the default handler. There are no guarantees about how characters are divided between calls to the default handler: for example, a comment might be split between multiple calls.
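Because any call may deliver an arbitrary piece of the input, a default handler typically just accumulates what it receives and leaves interpretation until after the parse. The sketch below is one way to do that. make-chunk-accumulator is hypothetical, not part of Mixp.

```scheme
;; Return two values: a procedure usable as the default handler, which
;; records each piece of data it receives, and a thunk that returns
;; all pieces joined in arrival order.
(define (make-chunk-accumulator)
  (let ((chunks '()))                   ; most recent piece first
    (values (lambda (data) (set! chunks (cons data chunks)))
            (lambda () (apply string-append (reverse chunks))))))

;; A comment delivered in two pieces is reassembled afterwards.
(call-with-values make-chunk-accumulator
  (lambda (handler text)
    (handler "<!-- a split ")
    (handler "comment -->")
    (text)))
;; ⇒ "<!-- a split comment -->"
```

In a real program, the first value would be installed with hset! under 'default (or 'default-expand), and the thunk called once parsing has finished.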

Handler: doctype-decl-start doctype-name sysid pubid has-internal-subset?

This handler is called for the start of the DOCTYPE declaration, before any DTD or internal subset is parsed.

Handler: doctype-decl-end

This handler is called at the end of the DOCTYPE declaration, when the closing ‘>’ is encountered, but after processing any external subset.

Handler: entity-decl entity-name is-parameter-entity? value base system-id public-id notation-name

This handler is called for entity declarations. The arg is-parameter-entity? will be #t if the entity is a parameter entity, #f otherwise.

For internal entities (<!ENTITY foo "bar">), value will be non-#f, and system-id, public-id and notation-name will be #f. An internal entity may legitimately have a zero-length value. To test for an internal entity, check that value is not #f; do not check its length.

For external entities, value will be #f and system-id will be non-#f. The public-id argument will be #f unless a public identifier was provided. The notation-name argument will have a non-#f value only for unparsed entity declarations.
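These rules translate directly into a small classifier. The sketch below is hypothetical, not part of Mixp. The symbols it returns are arbitrary. Note that the empty string counts as a true value in Scheme, which is exactly why testing value against #f works for zero-length internal entities.

```scheme
;; Classify an entity declaration from entity-decl handler arguments.
(define (entity-kind value system-id notation-name)
  (cond (value 'internal)               ; "" still counts as internal
        (notation-name 'unparsed)       ; external, with a notation
        (system-id 'external-parsed)
        (else 'unknown)))

;; An entity-decl handler that prints one line per declaration,
;; prefixing parameter entities with "%".
(define (entity-decl-handler entity-name is-parameter-entity? value
                             base system-id public-id notation-name)
  (simple-format #t "~A~A: ~A~%"
                 (if is-parameter-entity? "%" "")
                 entity-name
                 (entity-kind value system-id notation-name)))
```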

Handler: unparsed-entity-decl entity-name base system-id public-id notation-name

NB: This handler is obsolete; use entity-decl instead. It WILL BE REMOVED by the 1.0 release of Mixp.

This handler is called for unparsed entity declarations (<!ENTITY ... NDATA ...>). The entity-name, system-id and notation-name arguments will never be #f. The other arguments may be. The base argument is whatever was set by set-base (see Expat misc).

<!ENTITY Antarctica SYSTEM 'http://www.antarctica.net'
         NDATA vrml>
  entity-name ⇒ "Antarctica"
    system-id ⇒ "http://www.antarctica.net"
    public-id ⇒ #f
notation-name ⇒ "vrml"

Handler: notation-decl notation-name base system-id public-id

This handler is called for notation declarations (<!NOTATION ...>). Except for notation-name, any of the args may be #f. The base argument is whatever was set by set-base (see Expat misc).

<!NOTATION vrml PUBLIC 'VRML 2'>
notation-name ⇒ "vrml"
    system-id ⇒ #f
    public-id ⇒ "VRML 2"

Handler: namespace-decl-start prefix uri

Handler: namespace-decl-end prefix

When namespace processing is enabled (i.e., the parser was created with parser-create-ns), these are called once for each namespace declaration. The calls to the start and end element handlers occur between the calls to the start and end namespace declaration handlers. prefix is #f for a default namespace declaration (xmlns="...").

<html xmlns="http://www.w3.org/1999/xhtml"
      xml:lang="en" lang="en">
prefix ⇒ #f
   uri ⇒ "http://www.w3.org/1999/xhtml"

Handler: not-standalone

This handler is called if the document is not standalone (it has an external subset or a reference to a parameter entity, but does not have ‘standalone="yes"’). If this handler returns #f, processing will not continue, and the parser will return an XML_ERROR_NOT_STANDALONE error.

Handler: external-entity-ref context base system-id public-id

Some of the args may be #f. The base argument is whatever was set by set-base (see Expat misc).

This handler is called when the parser finds a reference to an external entity in the document. For example, the <!DOCTYPE ...> declaration contains an external entity reference when it specifies an external DTD. In that case, you should also call set-param-entity-parsing (see Expat misc), because you probably want the parser to expand the references to entities declared in your DTD. For an example, see How to....

The external entity reference handler should return an open port to the external entity. For example, assuming that system-id refers to a relative file path, you may define the handler as follows:

(lambda (context base system-id public-id)
  (open-input-file system-id))

The system identifier (system-id) is defined by the XML specification as a URI. Therefore, the example above will only work if you know that the system id is actually a filename. You may need to use, for example, some kind of http client library if you want to support URIs which start with ‘http://’.

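Note that this handler behaves very differently in expat. There, the external entity reference handler is expected to parse the entity itself (typically with a separate external entity parser) and return a status code, rather than return a port.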
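A slightly more careful handler can resolve relative system identifiers against base. The sketch below makes several assumptions. base, when set, is the file name of the referencing document. system-id is a relative path, an absolute path, or a plain "file://" URI. No percent-decoding is done, and other URI schemes are not handled. resolve-system-id is a hypothetical helper, not part of Mixp.

```scheme
;; Map SYSTEM-ID to a file name, relative to the directory of BASE.
(define (resolve-system-id base system-id)
  (let ((path (if (string-prefix? "file://" system-id)
                  (substring system-id 7)      ; strip "file://"
                  system-id)))
    (if (or (not base) (string-prefix? "/" path))
        path                                   ; nothing to resolve
        (string-append (dirname base) "/" path))))

;; Handler in the shape documented above: return an open port.
(define (external-entity-ref-handler context base system-id public-id)
  (open-input-file (resolve-system-id base system-id)))
```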

Handler: skipped-entity entity-name is-parameter-entity?

This handler is called in two situations:

An entity reference is encountered for which no declaration has been read and this is not an error.
An internal entity reference is read, but not expanded, because the default handler was set with default (rather than default-expand).

Note: Skipped parameter entities in declarations and skipped general entities in attribute values cannot be reported, because the event would be out of sync with the reporting of the declarations or attribute values.

Handler: unknown-encoding name

This handler is called when the parser does not recognize the declared encoding of a document. If the handler can teach the parser how to decode name, it should do so using make-xml-encoding (see Expat misc).

NB: Unknown encoding handlers have not really been tested, so they probably don’t work for now.

2.4 Encodings

Expat supports the following encodings: UTF-8, UTF-16, ISO-8859-1, US-ASCII.

The encoding is usually indicated in the first line of an XML file (the <?xml... ?> declaration). However, all data passed to your handlers (tag names, attributes, character data...) is encoded in UTF-8, whatever the original encoding was. UTF-8 represents ASCII characters unchanged, but represents other characters as multi-byte sequences. ISO-8859-1 has better support in standard editors, but is too euro-centric.

The encoding features of expat are not completely supported in Mixp. Using unknown encoding handlers will not work, or at least I have not tested that feature. However, XML documents whose encoding (as specified in the <?xml... ?> declaration) is supported by expat should be parsed correctly. For example, you should get an error if you parse a document which claims to be US-ASCII but contains 8-bit characters.

Procedure: set-encoding parser encoding

Set the encoding for parser to encoding. Return XML_STATUS_OK for success. This is like calling parser-create (see Parser) with encoding as the first arg.

NB: Calling set-encoding after parse or parse-buffer has no effect and returns XML_STATUS_ERROR.

2.5 Expat misc

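Procedure: default-current parser

When called from within a handler (for example element-start, element-end, processing-instruction or character-data), pass the markup that triggered the current event to parser’s default handler.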

Procedure: set-hash-salt parser salt

Set the parser’s hash salt to salt, an unsigned integer. This number is used for internal hash calculations. Setting it helps prevent DoS attacks based on predicting hash function behavior. Return #t if successful (called before parsing has started), #f otherwise.

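Procedure: set-base parser base

Set base for parser to base (a string). Return a symbolic status.

Procedure: get-base parser

Return the base (a string) of parser. If none is set, return #f.

Procedure: get-specified-attribute-count parser

Return the number of attribute names and values passed in the most recent call to the element-start handler that were specified in the start-tag rather than defaulted. Each attribute/value pair counts as two.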

Procedure: get-attribute-info parser

Return a 2-D array describing the attributes of the current element. Each element of the array is an unsigned integer representing the byte offset of the attribute and value start and end positions. The end positions are “one past the last byte”. The array has the form:

attr1-attr-start  attr1-attr-end  attr1-val-start  attr1-val-end
attr2-attr-start  attr2-attr-end  attr2-val-start  attr2-val-end
      ...               ...             ...              ...
attrN-attr-start  attrN-attr-end  attrN-val-start  attrN-val-end

The attribute count (N) is half the value that get-specified-attribute-count returns.

API Note: If the underlying libexpat is older than 2.1.0, or if it doesn’t provide XML_GetAttributeInfo, get-attribute-info simply returns #f. You can detect this from the absence of XML_ATTR_INFO in the return value of get-feature-list (see (mixp expat) Reference).
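One possible use of these offsets is to recover the exact source text of each attribute. The sketch below makes two assumptions that should be checked against your libexpat. First, the offsets count bytes from the beginning of the document. Second, the whole document is still available as a bytevector. The helper names are hypothetical, not part of Mixp. Values are returned raw, without entity expansion or whitespace normalization.

```scheme
(use-modules (rnrs bytevectors))

;; Decode bytes START (inclusive) to END (exclusive) of BV as UTF-8.
(define (byte-range->string bv start end)
  (let ((out (make-bytevector (- end start))))
    (bytevector-copy! bv start out 0 (- end start))
    (utf8->string out)))

;; INFO is the 2-D array from get-attribute-info; DOC is the document
;; as a bytevector.  Return an alist of (NAME . RAW-VALUE), one entry
;; per row, in row order.
(define (attribute-info->alist info doc)
  (let loop ((i (- (car (array-dimensions info)) 1))
             (acc '()))
    (if (< i 0)
        acc
        (loop (- i 1)
              (cons (cons (byte-range->string doc
                                              (array-ref info i 0)
                                              (array-ref info i 1))
                          (byte-range->string doc
                                              (array-ref info i 2)
                                              (array-ref info i 3)))
                    acc)))))

;; In "<a x='1'/>", name x spans bytes 3-4 and value 1 spans 6-7.
(attribute-info->alist '#2((3 4 6 7)) (string->utf8 "<a x='1'/>"))
;; ⇒ (("x" . "1"))
```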

Procedure: set-param-entity-parsing parser code

Set entity parsing for parser to code (a symbol). Valid values for code are:

XML_PARAM_ENTITY_PARSING_NEVER
XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE
XML_PARAM_ENTITY_PARSING_ALWAYS

This controls parsing of parameter entities (including the external DTD subset). See /usr/include/expat.h for more information.

Procedure: make-xml-encoding map convert release

Return a new XML-Encoding object.

map is a vector of length 256. Each element is an integer specifying how many bytes are required to decode a multibyte “character” whose first byte is that element’s index.

convert is a proc that takes one arg, a unibyte string. It should return the "Unicode scalar value" of the string, or -1 if the byte sequence is malformed.

release is a thunk that the parser calls when it has finished all conversion work.

2.6 Doing a parse

After all the set up (see Expat handlers), you will want to apply the parser to some input. This section describes two procedures to do that, as well as two procedures to help you understand things better when All Does Not Go Well.

Procedure: parse parser s [finalp]

Use parser to parse string s. Optional third arg finalp, if non-#f, means that s is the last piece of the document to be parsed. Return a symbolic status.

Procedure: parse-buffer parser len [finalp]

Use parser to parse len bytes of the internal buffer. Optional third arg finalp, if non-#f, means this call is the last parsing to be done. Return a symbolic status.
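The finalp argument allows a document to be fed to parse in pieces. The sketch below reads a port in fixed-size pieces and flags only the last one as final. for-each-chunk is a hypothetical helper, not part of Mixp. It uses get-string-n from (rnrs io ports).

```scheme
(use-modules ((rnrs io ports) #:select (get-string-n)))

;; Read PORT in pieces of at most SIZE characters and call
;; (PROC piece final?) for each, with FINAL? true only on the last
;; call.  An empty input yields the single call (PROC "" #t).
(define (for-each-chunk port size proc)
  (let loop ((chunk (get-string-n port size)))
    (if (eof-object? chunk)
        (proc "" #t)
        (let ((next (get-string-n port size)))   ; look ahead one piece
          (proc chunk (eof-object? next))
          (unless (eof-object? next)
            (loop next))))))

;; Usage sketch with Mixp (not run here):
;;   (for-each-chunk port 4096
;;                   (lambda (piece final?) (E:parse parser piece final?)))
```

For brevity, the helper ignores the value returned by proc. Real code would check the symbolic status after each call to parse and stop on XML_STATUS_ERROR.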

Procedure: stop-parser parser resumable

Stop parser. Return a symbolic status. If resumable is non-#f, parsing is suspended. If #f, parsing is aborted.

This should be called from a handler (e.g., element-start). Note that some handlers will continue to be called before fully stopping (e.g., element-end).

Procedure: resume-parser parser

Resume parser. Return a symbolic status, the same as for parse or parse-buffer, with the addition of XML_ERROR_NOT_SUSPENDED.

This should not be called from a handler. It should be called first on the most deeply nested child parser, then successively on the parent parser(s).

Procedure: get-parsing-status parser

Return a list (status final-buffer?), where status is the symbolic status of parser with respect to being initialized, parsing, finished, or suspended, and final-buffer? is non-#f if processing is occurring for the final buffer.

Procedure: error-symbol parser

Return a symbol corresponding to the error code for parser.

Procedure: error-string code

Return a string describing the error code (a symbol). If code is not recognized, return #f.

Here is an example that uses the latter two procedures. See Symbols.

(define BAD-XML "<doc>dfssfd</do>")
;; NB: not same!  ^^^         ^^

(use-modules ((mixp expat) #:prefix E:))

(define PARSER (E:parser-create))
(define RES (E:parse PARSER BAD-XML #t))
RES ⇒ XML_STATUS_ERROR

(define ERR (E:error-symbol PARSER))
ERR ⇒ XML_ERROR_TAG_MISMATCH
(E:error-string ERR) ⇒ "mismatched tag"

2.7 Not implemented

The following functions are part of the expat interface, but are not exposed to Scheme.

C Function: XML_ParserCreate_MM
C Function: XML_MemMalloc
C Function: XML_MemRealloc
C Function: XML_MemFree

Customized memory management is out of scope for Mixp. (Sorry.) That said, maybe the next maintainer will be bolder. Only time can tell…

C Function: XML_GetBuffer

This kind of integration awaits widespread (and stable) Guile “array leasing” facilities.

3 (mixp utils) Reference

This chapter describes the (mixp utils) module, which provides high-level extensions to the raw expat interface.

Procedure: parse-data port [parser]

Read all bytes from port (until it yields the EOF object), and throw an error if the input is not a well-formed XML document. Optional arg parser specifies the parser to use. If it is omitted, a default parser is created.

The next three procedures take a source of input as their first argument, named from. Portability Note: When from is a string, Guile 2 and later use open-input-string with the fluid %default-port-encoding set to “binary” input (i.e., ISO-8859-1).

Procedure: utf8->latin1 from

Convert the byte list from from UTF-8 to Latin-1. Throw invalid-utf8 if from is not a valid UTF-8 stream, and no-latin1 if one of the characters cannot be represented in Latin-1. If from is a string, return a string. If from is a list, return a list.

Procedure: utf8->ucs2 from

Convert a UTF-8 string, such as those returned by the parser, to a UCS-2 list. from may be a string or a list. Return a list whose elements are sub-lists of length two, each encoding a character from the original stream. Throw a no-ucs2 error if one of the characters decoded from the UTF-8 string is not a UCS-2 character.

Procedure: utf8->ucs4 from

Convert a UTF-8 string, such as those returned by the parser, to a UCS-4 list. from may be a string or a list. Return a list whose elements are sub-lists of length four, each encoding a character from the original stream.

Procedure: xml->tree port [parser]

Build a tree data structure from the XML document read from port. Each XML element produces a new branch in the tree. Optional arg parser specifies another parser to use. The internal parser uses element start (and end), character-data, notation-decl, processing-instruction and comment handlers.

For example, consider this sample XML document:

<foo name='Paul'><bar>Some text</bar><void/></foo>

Here is the data structure produced by xml->tree:

(element ("foo" (("name" . "Paul")))
  (element ("bar" ())
    (character-data "Some text"))
  (element ("void" ())))
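Small accessors make such trees convenient to query. The sketch below assumes only the element shape shown above, (element (NAME ATTRIBUTES) CHILD ...). All names in it are hypothetical, not part of Mixp. It uses append-map from SRFI-1.

```scheme
(use-modules (srfi srfi-1))             ; for append-map

(define (element? node)
  (and (pair? node) (eq? (car node) 'element)))
(define (element-name node)       (car (cadr node)))
(define (element-attributes node) (cadr (cadr node)))
(define (element-children node)   (cddr node))

;; All elements named NAME in NODE's subtree, in document order,
;; including NODE itself if it matches.
(define (find-elements name node)
  (if (element? node)
      (append (if (string=? name (element-name node)) (list node) '())
              (append-map (lambda (child) (find-elements name child))
                          (element-children node)))
      '()))
```

For the sample tree above, (assoc-ref (element-attributes tree) "name") would give "Paul", and (find-elements "bar" tree) a one-element list holding the bar element.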

4 (mixp simit) Reference

SXML over SSAX is the usual way to go about things, but we don’t mind being unusual on occasion. This chapter describes the EXPERIMENTAL (mixp simit) module, which provides “SXML over Expat”, more or less. (The level of imitation is low while we figure out what the heck is going on. Later, things should align more, and weasel words like “SXML-ish” should go away…)

Procedure: from-port port namespaces

Parse an XML document from port. namespaces is a list whose elements each have the form (nick uri), where nick is a symbol and uri is a string.

The XML namespace is built-in:

(xml "http://www.w3.org/XML/1998/namespace")

If parsing is successful, return an SXML-ish tree. Otherwise, throw parse-error with two args, the symbolic reason (suitable for passing to error-string) and the location of the error as returned by get-locus.
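A caller can catch this exception and report the position using the locus vector, whose layout is #(LINE COLUMN BYTE-COUNT BYTE-INDEX) (see Parser). The sketch below is hypothetical, not part of Mixp. thunk stands for any code that calls from-port, and the message format is arbitrary.

```scheme
;; Run THUNK; if it throws parse-error, return a message string built
;; from the symbolic reason and the locus vector instead.
(define (call-with-parse-error-report thunk)
  (catch 'parse-error
    thunk
    (lambda (key reason locus)
      (simple-format #f "parse error at line ~A, column ~A: ~A"
                     (vector-ref locus 0)      ; LINE
                     (vector-ref locus 1)      ; COLUMN
                     reason))))
```

With (mixp expat) loaded, passing reason to error-string instead of printing the bare symbol would give a human-readable description.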

GNU FDL

Version 1.3, 3 November 2008

Copyright © 2000, 2001, 2002, 2007, 2008 Free Software Foundation, Inc.
https://fsf.org/

Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

PREAMBLE
The purpose of this License is to make a manual, textbook, or other functional and useful document free in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.

This License is a kind of “copyleft”, which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.

We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference.
APPLICABILITY AND DEFINITIONS
This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated herein. The “Document”, below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as “you”. You accept the license if you copy, modify or distribute the work in a way requiring permission under copyright law.

A “Modified Version” of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.

A “Secondary Section” is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document’s overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.

The “Invariant Sections” are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections then there are none.

The “Cover Texts” are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words.

A “Transparent” copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to thwart or discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount of text. A copy that is not “Transparent” is called “Opaque”.

Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF designed for human modification. Examples of transparent image formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by some word processors for output purposes only.

The “Title Page” means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, “Title Page” means the text near the most prominent appearance of the work’s title, preceding the beginning of the body of the text.

The “publisher” means any person or entity that distributes copies of the Document to the public.

A section “Entitled XYZ” means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as “Acknowledgements”, “Dedications”, “Endorsements”, or “History”.) To “Preserve the Title” of such a section when you modify the Document means that it remains a section “Entitled XYZ” according to this definition.

The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers are considered to be included by reference in this License, but only as regards disclaiming warranties: any other implication that these Warranty Disclaimers may have is void and has no effect on the meaning of this License.
2. VERBATIM COPYING
You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and you may publicly display copies.
3. COPYING IN QUANTITY
If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document’s license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.

If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.

It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.
4. MODIFICATIONS
You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:
A. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous versions (which should, if there were any, be listed in the History section of the Document). You may use the same title as a previous version if the original publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it has fewer than five), unless they release you from this requirement.
C. State on the Title page the name of the publisher of the Modified Version, as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.
F. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version under the terms of this License, in the form shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document’s license notice.
H. Include an unaltered copy of this License.
I. Preserve the section Entitled “History”, Preserve its Title, and add to it an item stating at least the title, year, new authors, and publisher of the Modified Version as given on the Title Page. If there is no section Entitled “History” in the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page, then add an item describing the Modified Version as stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document, and likewise the network locations given in the Document for previous versions it was based on. These may be placed in the “History” section. You may omit a network location for a work that was published at least four years before the Document itself, or if the original publisher of the version it refers to gives permission.
K. For any section Entitled “Acknowledgements” or “Dedications”, Preserve the Title of the section, and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein.
L. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles.
M. Delete any section Entitled “Endorsements”. Such a section may not be included in the Modified Version.
N. Do not retitle any existing section to be Entitled “Endorsements” or to conflict in title with any Invariant Section.
O. Preserve any Warranty Disclaimers.
If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version’s license notice. These titles must be distinct from any other section titles.

You may add a section Entitled “Endorsements”, provided it contains nothing but endorsements of your Modified Version by various parties—for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.

You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.
5. COMBINING DOCUMENTS
You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers.

The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.

In the combination, you must combine any sections Entitled “History” in the various original documents, forming one section Entitled “History”; likewise combine any sections Entitled “Acknowledgements”, and any sections Entitled “Dedications”. You must delete all sections Entitled “Endorsements.”
6. COLLECTIONS OF DOCUMENTS
You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.
7. AGGREGATION WITH INDEPENDENT WORKS
A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, is called an “aggregate” if the copyright resulting from the compilation is not used to limit the legal rights of the compilation’s users beyond what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document.

If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the entire aggregate, the Document’s Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate.
8. TRANSLATION
Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License, and all the license notices in the Document, and any Warranty Disclaimers, provided that you also include the original English version of this License and the original versions of those notices and disclaimers. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version will prevail.

If a section in the Document is Entitled “Acknowledgements”, “Dedications”, or “History”, the requirement (section 4) to Preserve its Title (section 1) will typically require changing the actual title.
9. TERMINATION
You may not copy, modify, sublicense, or distribute the Document except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, or distribute it is void, and will automatically terminate your rights under this License.

However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation.

Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice.

Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, receipt of a copy of some or all of the same material does not give you any rights to use it.
10. FUTURE REVISIONS OF THIS LICENSE
The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See https://www.gnu.org/licenses/.

Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License “or any later version” applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation. If the Document specifies that a proxy can decide which future versions of this License can be used, that proxy’s public statement of acceptance of a version permanently authorizes you to choose that version for the Document.
11. RELICENSING
“Massive Multiauthor Collaboration Site” (or “MMC Site”) means any World Wide Web server that publishes copyrightable works and also provides prominent facilities for anybody to edit those works. A public wiki that anybody can edit is an example of such a server. A “Massive Multiauthor Collaboration” (or “MMC”) contained in the site means any set of copyrightable works thus published on the MMC site.

“CC-BY-SA” means the Creative Commons Attribution-Share Alike 3.0 license published by Creative Commons Corporation, a not-for-profit corporation with a principal place of business in San Francisco, California, as well as future copyleft versions of that license published by that same organization.

“Incorporate” means to publish or republish a Document, in whole or in part, as part of another Document.

An MMC is “eligible for relicensing” if it is licensed under this License, and if all works that were first published under this License somewhere other than this MMC, and subsequently incorporated in whole or in part into the MMC, (1) had no cover texts or invariant sections, and (2) were thus incorporated prior to November 1, 2008.

The operator of an MMC Site may republish an MMC contained in the site under CC-BY-SA on the same site at any time before August 1, 2009, provided the MMC is eligible for relicensing.

ADDENDUM: How to use this License for your documents

To use this License in a document you have written, include a copy of the License in the document and put the following copyright and license notices just after the title page:

  Copyright (C)  YEAR  YOUR NAME.
  Permission is granted to copy, distribute and/or modify this document
  under the terms of the GNU Free Documentation License, Version 1.3
  or any later version published by the Free Software Foundation;
  with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
  Texts.  A copy of the license is included in the section entitled "GNU
  Free Documentation License".

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the “with…Texts.” line with this:

    with the Invariant Sections being LIST THEIR TITLES, with
    the Front-Cover Texts being LIST, and with the Back-Cover Texts
    being LIST.

If you have Invariant Sections without Cover Texts, or some other combination of the three, merge those two alternatives to suit the situation.

If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under your choice of free software license, such as the GNU General Public License, to permit their use in free software.

Index

	Index Entry	Section

C
	`cdata-section-end`:	Expat handlers
	`cdata-section-start`:	Expat handlers
	`character-data`:	Expat handlers
	`comment`:	Expat handlers

D
	`default`:	Expat handlers
	`default-current`:	Expat misc
	`default-expand`:	Expat handlers
	`doctype-decl-end`:	Expat handlers
	`doctype-decl-start`:	Expat handlers

E
	`element-end`:	Expat handlers
	`element-start`:	Expat handlers
	encodings:	Encodings
	`entity-decl`:	Expat handlers
	error code, symbolic:	Symbols
	`error-string`:	Doing a parse
	`error-symbol`:	Doing a parse
	`expat-version`:	Expat interface
	`external-entity-ref`:	Expat handlers

F
	`from-port`:	Imitating SXML

G
	`get-attribute-info`:	Expat misc
	`get-base`:	Expat misc
	`get-feature-list`:	Expat interface
	`get-locus`:	Parser
	`get-parsing-status`:	Doing a parse
	`get-specified-attribute-count`:	Expat misc

H
	handlers:	Expat handlers
	`hget`:	Expat handlers
	`hget-one`:	Expat handlers
	`hset!`:	Expat handlers

L
	leftovers, libexpat:	Not implemented
	libexpat leftovers:	Not implemented
	loading Mixp:	Loading Mixp

M
	`make-xml-encoding`:	Expat misc
	miscellaneous procedures, `(mixp expat)`:	Expat misc
	mixp components:	Mixp components

N
	`namespace-decl-end`:	Expat handlers
	`namespace-decl-start`:	Expat handlers
	`not-standalone`:	Expat handlers
	`notation-decl`:	Expat handlers

P
	`parse`:	Doing a parse
	`parse-buffer`:	Doing a parse
	`parse-data`:	High-level extensions
	parser object:	Parser
	parser, application to input:	Doing a parse
	`parser-create`:	Parser
	`parser-create-ns`:	Parser
	`parser?`:	Parser
	`processing-instruction`:	Expat handlers

R
	recipes:	How to...
	`resume-parser`:	Doing a parse

S
	sample programs:	Sample programs
	`set-base`:	Expat misc
	`set-encoding`:	Encodings
	`set-hash-salt`:	Expat misc
	`set-param-entity-parsing`:	Expat misc
	`skipped-entity`:	Expat handlers
	status, symbolic:	Symbols
	`stop-parser`:	Doing a parse
	SXML:	Imitating SXML
	symbolic error code:	Symbols
	symbolic status:	Symbols

U
	`unknown-encoding`:	Expat handlers
	`unparsed-entity-decl`:	Expat handlers
	`utf8->latin1`:	High-level extensions
	`utf8->ucs2`:	High-level extensions
	`utf8->ucs4`:	High-level extensions
	utilities:	High-level extensions

X
	`xml->tree`:	High-level extensions
	`XML_GetBuffer`:	Not implemented
	`XML_MemFree`:	Not implemented
	`XML_MemMalloc`:	Not implemented
	`XML_MemRealloc`:	Not implemented
	`XML_ParserCreate_MM`:	Not implemented

Table of Contents

1 Introduction
  1.1 Sample programs
  1.2 Loading Mixp
  1.3 Mixp components
  1.4 How to...
  1.5 Bugs and suggestions
2 (mixp expat) Reference
  2.1 Symbols
    status
    error code
  2.2 Parser
  2.3 Expat handlers
  2.4 Encodings
  2.5 Expat misc
  2.6 Doing a parse
  2.7 Not implemented
3 (mixp utils) Reference
4 (mixp simit) Reference
GNU FDL
  ADDENDUM: How to use this License for your documents
Index

Footnotes

(1)