Manpage for TXR

Jul 06, 2018





TXR - Programming Language (Version 198)



txr [ options ] [ script-file [ data-files ... ]]



TXR is a programming language supporting multiple paradigms. It in fact comprises two languages integrated into a single tool: a text scanning and extraction language referred to as the TXR pattern language, or sometimes just TXR when it is clear; and a general-purpose dialect of Lisp called TXR Lisp.

A script written in the TXR pattern language is referred to in this document as a query, and it specifies a pattern which matches (a prefix of) an entire file, or multiple files. Patterns can consist of large chunks of multi-line free-form text, which is matched literally against material in the input sources. Free variables occurring in the pattern (denoted by the @ symbol) are bound to the pieces of text occurring in the corresponding positions. If the overall match is successful, then TXR can do one of two things: it can report the list of variables which were bound, in the form of a set of variable assignments which can be evaluated by the eval command of the POSIX shell language, or generate a custom report according to special directives in the query. Patterns can be arbitrarily complex, and can be broken down into named pattern functions, which may be mutually recursive. TXR patterns can work horizontally (characters within a line) or vertically (spanning multiple lines). Multiple lines can be treated as a single line.

In addition to embedded variables which implicitly match text, the TXR pattern language supports a number of directives, for matching text using regular expressions, for continuing a match in another file, for searching through a file for the place where an entire sub-query matches, for collecting lists, and for combining sub-queries using logical conjunction, disjunction and negation, and numerous others.

Furthermore, embedded within TXR is a powerful Lisp dialect. TXR Lisp supports functional, imperative and object-oriented programming, and provides data types such as symbols, strings, vectors, hash tables with weak reference support, lazy lists, and arbitrary-precision (bignum integers).

TXR Lisp features an expressive foreign function interface (FFI) for calling into libraries and other software components that support C-language-style calls.



If TXR is given no arguments, it will enter into an interactive mode. See the INTERACTIVE LISTENER section for a description of this mode. When TXR enters interactive mode this way, it prints a one-line banner announcing the program name and version, and one line of help text instructing the user how to exit.

Options which don't take an argument may be combined together. The -v and -q options are mutually exclusive. Of these two, the one which occurs in the rightmost position in the argument list dominates. The -c and -f options are also mutually exclusive; if both are specified, it is a fatal error.

-Dvar=value
Binds the variable var to the value value prior to processing the query. The name is in scope over the entire query, so that all occurrences of the variable are substituted and match the equivalent text. If the value contains commas, these are interpreted as separators, which give rise to a list value. For instance -Dvar=a,b,c binds var to a list of the strings "a", "b" and "c". (See the Collect Directive below.) List variables provide a multiple match. That is to say, if a list variable occurs in a query, a successful match occurs if any of its values matches the text. If more than one value matches the text, the first one is taken.
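The comma-splitting rule can be sketched in Python. This is an illustrative model only; parse_D_option is a hypothetical name and not part of TXR:

```python
def parse_D_option(arg):
    """Model of the -D option argument: split "var=value"; a value
    containing commas becomes a list of strings (hypothetical helper,
    not TXR's actual implementation)."""
    var, _, value = arg.partition("=")
    if "," in value:
        return var, value.split(",")   # list-valued binding
    return var, value                  # plain string binding ("" for bare -Dvar)
```

For example, parse_D_option("x=a,b,c") yields ("x", ["a", "b", "c"]), while an argument with no = sign yields an empty-string binding.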

-Dvar
Binds the variable var to an empty string value prior to processing the query.

-q
Quiet operation during matching. Certain error messages are not reported on the standard error device (but if the situations occur, they still cause the query to fail). This option does not suppress error generation during the parsing of the query, only during its execution.

-i
If this option is present, then TXR will enter into an interactive interpretation mode after processing all options, and the input query if one is present. See the INTERACTIVE LISTENER section for a description of this mode.

-d or --debugger
Invoke the interactive TXR debugger. See the DEBUGGER section.

-n or --noninteractive
This option affects behavior related to TXR's *std-input* stream. It also has another, unrelated effect on the behavior of the interactive listener; see below.

Normally, if this stream is connected to a terminal device, it is automatically marked as having the real-time property when TXR starts up (see the functions stream-set-prop and real-time-stream-p). The -n option suppresses this behavior; the *std-input* stream remains ordinary.

The TXR pattern language reads standard input via a lazy list, created by applying the lazy-stream-cons function to the *std-input* stream. If that stream is marked real-time, then the lazy list which is returned by that function has behaviors that are better suited for scanning interactive input. A more detailed explanation is given under the description of this function.

If the -n option is in effect and TXR enters into the interactive listener, the listener operates in plain mode. The listener reads buffered lines from the operating system without any character-based editing features or history navigation. In plain mode, no prompts appear and no terminal control escape sequences are generated. The only output is the results of evaluation, related diagnostic messages, and any output generated by the evaluated expressions themselves.

-v
Verbose operation. Detailed logging is enabled.

-b sym=value
This option binds a Lisp global lexical variable (as if by the defparml function) to an object described by Lisp syntax. It requires an argument of the form sym=value where sym must be, syntactically, a token denoting a bindable symbol, and value is arbitrary TXR Lisp syntax. The sym syntax is converted to the symbol it denotes, which is bound as a global lexical variable, if it is not already a variable. The value syntax is parsed to the Lisp object it denotes. This object is not subject to evaluation; the object itself is stored into the variable binding denoted by sym. Note that if sym already exists as a global variable, then it is simply overwritten. If sym is marked special, then it stays special.

-B
If the query is successful, print the variable bindings as a sequence of assignments in shell syntax that can be eval-ed by a POSIX shell. If the query fails, print the word "false". Evaluation of this word by the shell has the effect of producing an unsuccessful termination status from the shell's eval command.

-l or --lisp-bindings
This option implies -B. Print the variable bindings in Lisp syntax instead of shell syntax.

-a num
This option implies -B. The decimal integer argument num specifies the maximum number of array dimensions to use for list-valued variable bindings. The default is 1. Additional dimensions are expressed using numeric suffixes in the generated variable names. For instance, consider the three-dimensional list arising out of a triply nested collect: ((("a" "b") ("c" "d")) (("e" "f") ("g" "h"))). Suppose this is bound to a variable V. With -a 1, this will be reported as:


With -a 2, it comes out as:


The leftmost bracketed index is the most major index. That is to say, the dimension order is: NAME_m_m+1_..._n[1][2]...[m-1].

-c query
Specifies the query in the form of a command line argument. If this option is used, the script-file argument is omitted. The first non-option argument, if there is one, now specifies the first input source rather than a query. Unlike queries read from a file, (non-empty) queries specified as arguments using -c do not have to properly end in a newline. Internally, TXR adds the missing newline before parsing the query. Thus -c "@a" is a valid query which matches a line.


Shell script which uses TXR to read two lines "1" and "2" from standard input, binding them to variables a and b. Standard input is specified as - and the data comes from shell "here document" redirection:


 txr -B -c "@a
 @b" - <<!
 1
 2
 !


The @; comment syntax can be used for better formatting:

  txr -B -c "@;
  @a
  @b" - <<!
  1
  2
  !

-f script-file
Specifies the file from which the query is to be read, instead of the script-file argument. This is useful in #! ("hash bang") scripts. (See Hash Bang Support below).

-e expression
Evaluates a TXR Lisp expression for its side effects, without printing its value. Can be specified more than once. The script-file argument becomes optional if -e is used at least once. If the evaluation of every expression evaluated this way terminates normally, and there is no script-file argument, then TXR terminates with a successful status.

-p expression
Just like -e but prints the value of expression using the prinl function.

-P expression
Like -p but prints using the pprinl function.

-t expression
Like -p but prints using the tprint function.

-C number

Requests TXR to behave in a manner that is compatible with the specified version of TXR. This makes a difference in situations when a release of TXR breaks backward compatibility. If some version N+1 deliberately introduces a change which is backward incompatible, then -C N can be used to request the old behavior.

The requested value of N can be too low, in which case TXR will complain and exit with an unsuccessful termination status. This indicates that TXR refuses to be compatible with such an old version. Users requiring the behavior of that version will have to install an older version of TXR which supports that behavior, or even that exact version.

If the option is specified more than once, the behavior is not specified.

Compatibility can also be requested via the TXR_COMPAT environment variable instead of the -C option.

For more information, see the COMPATIBILITY section.


--gc-delta number
The number argument to this option must be a decimal integer. It represents a megabyte value, the "GC delta": one megabyte is 1048576 bytes. The GC delta controls an aspect of the garbage collector behavior. See the gc-set-delta function for a description.

--debug-autoload
This option turns on debugging, like --debugger, but also requests stepping into the auto-load processing of TXR Lisp library code. Normally, debugging through the evaluations triggered by auto-loading is suppressed.

--debug-expansion
This option turns on debugging, like --debugger, but also requests stepping into the parse-time macro-expansion of TXR Lisp code embedded in TXR queries. Normally, this is suppressed.

--help
Prints a usage summary on standard output, and terminates successfully.

--license
Prints the software license. This depends on the software being installed such that the LICENSE file is in the data directory. Use of TXR implies agreement with the liability disclaimer in the license.

--version
Prints the program version on standard output, and terminates successfully.

--args
The --args option provides a way to encode multiple arguments as a single argument, which is useful on some systems which have limitations in their implementation of the "hash bang" mechanism. For details about its special syntax, see Hash Bang Support below. It is also very useful in stand-alone application deployment. See the section STAND-ALONE APPLICATION SUPPORT, in which example uses of --args are shown.

--eargs
The --eargs option (extended --args) is like --args but must be followed by an argument. The argument is removed from the argument list and substituted in place of occurrences of {} among the arguments expanded from the --eargs syntax.

--lisp or --compiled
These options influence the treatment of query files which do not have a suffix indicating their type. The --lisp option causes an unsuffixed file to be treated as Lisp source; --compiled causes it to be treated as a compiled file.

Moreover, if --lisp is specified, and an unsuffixed file does not exist, then TXR will add the ".tl" suffix and try the file again; --compiled will similarly add the ".tlo" suffix and try opening the file again. In the same situation, if neither --lisp nor --compiled has been specified, TXR will first try adding the ".txr" suffix. If that fails, then the ".tlo" suffix will be tried and finally ".tl". Note that --lisp and --compiled influence how the argument of the -f option is treated, but only if they precede that option.
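The suffix search order described above can be modeled in Python. This is a sketch of the documented rules; candidate_files is a hypothetical helper, and TXR itself consults the filesystem rather than returning a list:

```python
import os

def candidate_files(name, lisp=False, compiled=False):
    """Return the sequence of filenames that would be tried for a
    script argument, per the documented rules (illustrative only)."""
    if os.path.splitext(name)[1]:      # already suffixed: no search
        return [name]
    if lisp:
        return [name, name + ".tl"]
    if compiled:
        return [name, name + ".tlo"]
    return [name, name + ".txr", name + ".tlo", name + ".tl"]
```

For example, with neither option given, an unsuffixed name "query" is tried as-is, then as "query.txr", "query.tlo" and "query.tl" in that order.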

--reexec
On platforms which support the POSIX exec family of functions, this option causes TXR to re-execute itself. The re-executed image receives the remaining arguments which follow the --reexec argument. Note: this option is useful for supporting setuid operation in "hash bang" scripts. On some platforms, the interpreter designated by a "hash bang" script runs without altered privilege, even if that interpreter is installed setuid. If the interpreter is executed directly, then setuid applies to it, but not if it is executed via "hash bang". If the --reexec option is used in the interpreter command line of such a script, the interpreter will re-execute itself, thereby gaining the setuid privilege. The re-executed image will then obtain the script name from the arguments which are passed to it and determine whether that script will run setuid. See the section SETUID/SETGID OPERATION.

--gc-debug
This option enables a behavior which stresses the garbage collector with frequent garbage collection requests. The purpose is to make it more likely to reproduce certain kinds of bugs. It makes TXR run very slowly.

--vg-debug
If TXR is built with Valgrind support, then this option is available. It enables code which uses the Valgrind API to integrate with the Valgrind debugger, for more accurate tracking of garbage-collected objects. For example, objects which have been reclaimed by the garbage collector are marked as inaccessible, and marked as uninitialized when they are allocated again.

--dv-regex
If this option is used, then all regular expressions are treated using the derivative-based back-end. The NFA-based regex implementation is disabled. Normally, only regular expressions which require the intersection and complement operators are handled using the derivative back-end. This option makes it possible to test that back-end on test cases that it wouldn't normally receive.

--
Signifies the end of the option list.

-
This argument is not interpreted as an option, but treated as a filename argument. After the first such argument, no more options are recognized. Even if another argument looks like an option, it is treated as a name. This special argument - means "read from standard input" instead of a file. The script-file, or any of the data files, may be specified using this argument. If two or more files are specified as -, the behavior is system-dependent. It may be possible to indicate EOF from the interactive terminal, and then specify more input which is interpreted as the second file, and so forth.

After the options, the remaining arguments are files. The first file argument specifies the script file, and is mandatory if the -f option has not been specified, and TXR isn't operating in interactive mode or evaluating expressions from the command line via -e or one of the related options. A file argument consisting of a single - means to read the standard input instead of opening a file.

Specifying standard input as a source with an explicit - argument is unnecessary. If no data source arguments are present, then TXR scans standard input by default. This was not true in versions of TXR prior to 171; see the COMPATIBILITY section.

TXR begins by reading the script. In the case of the TXR pattern language, the entire query is scanned and internalized, and then, if it is free of syntax errors, execution begins. (TXR Lisp is processed differently, form by form.) On the other hand, the pattern language reads data files in a lazy manner. A file isn't opened until the query demands material from that file, and then the contents are read on demand, not all at once.

The suffix of the script-file is significant. If the name has no suffix, or if it has a ".txr" suffix, then it is assumed to be in the TXR pattern language. If it has the ".tl" suffix, then it is assumed to be TXR Lisp. The --lisp option changes the treatment of unsuffixed script file names, causing them to be interpreted as TXR Lisp.

If an unsuffixed script file name is specified, and cannot be opened, then TXR will add the ".txr" suffix and try again. If that fails, it will be tried with the ".tl" suffix, and treated as TXR Lisp. If the --lisp option has been specified, then TXR tries only the ".tl" suffix.

A TXR Lisp file is processed as if by the load macro: forms from the file are read and evaluated. If the forms do not terminate the TXR process or throw an exception, and there are no syntax errors, then TXR terminates successfully after evaluating the last form. If syntax errors are encountered in a form, then TXR terminates unsuccessfully. TXR Lisp is documented in the section TXR LISP.

If a query file is specified, but no file arguments, it is up to the query to open a file, pipe or standard input via the @(next) directive prior to attempting to make a match. If a query attempts to match text, but has run out of files to process, the match fails.



TXR sends errors and verbose logs to the standard error device. The following paragraphs apply when TXR is run without enabling verbose mode with -v, or the printing of variable bindings with -B or -a.

If the command line arguments are incorrect, TXR issues an error diagnostic and terminates with a failed status.

If the script-file specifies a query, and the query has a malformed syntax, TXR likewise issues error diagnostics and terminates with a failed status.

If the query fails due to a mismatch, TXR terminates with a failed status. No diagnostics are issued.

If the query is well-formed, and matches, then TXR issues no diagnostics, and terminates with a successful status.

In verbose mode (option -v), TXR issues diagnostics on the standard error device even in situations which are not erroneous.

In bindings-printing mode (options -B or -a), TXR prints the word false if the query fails, and exits with a failed termination status. If the query succeeds, the variable bindings, if any, are output on standard output.

If the script-file is TXR Lisp, then it is processed form by form. Each top-level Lisp form is evaluated after it is read. If any form is syntactically malformed, TXR issues diagnostics and terminates unsuccessfully. This is somewhat different from how the pattern language is treated: a script in the pattern language is parsed in its entirety before being executed.





A query may contain comments which are delimited by the sequence @; and extend to the end of the line. Whitespace can occur between the @ and ;. A comment which begins on a line swallows that entire line, as well as the newline which terminates it. In essence, the entire comment line disappears. If the comment follows some material in a line, then it does not consume the newline. Thus, the following two queries are equivalent:

 @a@; comment: match whole line against variable @a
 @; this comment disappears entirely


The comment after the @a does not consume the newline, but the comment which follows does. Without this intuitive behavior, line comments would give rise to empty lines that must match empty lines in the data, leading to spurious mismatches.

Instead of the ; character, the # character can be used. This is an obsolescent feature.


6.2 Hash Bang Support

TXR has several features which support use of the "hash bang" convention for creating apparently stand-alone executable programs.


6.2.1 Basic Hash Bang

Special processing is applied to TXR query or TXR Lisp script files that are specified on the command line via the -f option or as the first non-option argument. If the first line of such a file begins with the characters #!, that entire line is consumed and processed specially.

This removal allows TXR queries to be turned into standalone executable programs in the POSIX environment using the "hash bang" mechanism. Unlike most interpreters, TXR applies special processing to the #! line, which is described below, in the section Argument Generation with the Null Hack.

Shell session example: create a simple executable program called "hello.txr" and run it. This assumes TXR is installed in /usr/bin.

  $ cat > hello.txr
  #!/usr/bin/txr
  @(bind a "Hey")
  @(output)
  Hello, world!
  @(end)
  $ chmod a+x hello.txr
  $ ./hello.txr
  Hello, world!

When this plain hash bang line is used, TXR receives the name of the script as an argument. Therefore, it is not possible to pass additional options to TXR. For instance, if the above script is invoked like this

  $ ./hello.txr -B

the -B option isn't processed by TXR, but treated as an additional argument, just as if txr scriptname -B had been executed directly.

This behavior is useful if the script author does not want to expose the TXR options to the user of the script.

However, the hash bang line can use the -f option:

  #!/usr/bin/txr -f

Now, the name of the script is passed as an argument to the -f option, and TXR will look for more options after that, so that the resulting program appears to accept TXR options. Now we can run

  $ ./hello.txr -B
  Hello, world!
  a="Hey"

The -B option is honored.


6.2.2 Argument Generation with --args and --eargs

On some operating systems, it is not possible to pass more than one argument through the hash bang mechanism. That is to say, this will not work:

  #!/usr/bin/txr -B -f

To support systems like this, TXR supports the special argument --args, as well as an extended version, --eargs. With --args, it is possible to encode multiple arguments into one argument. The --args option must be followed by a separator character, chosen by the programmer. The characters after that are split into multiple arguments on the separator character. The --args option is then removed from the argument list and replaced with these arguments, which are processed in its place.


  #!/usr/bin/txr --args:-B:-f

The above has the same behavior as

  #!/usr/bin/txr -B -f

on a system which supports multiple arguments in hash bang. The separator character is the colon, and so the remainder of that argument, -B:-f, is split into the two arguments -B -f.
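The splitting rule can be sketched in Python. This is an illustrative model of the documented behavior only; expand_args is a hypothetical name:

```python
def expand_args(argv):
    """Model of --args expansion: the character right after "--args"
    is the separator; the remainder is split on it, and the resulting
    fields replace the --args argument in place (sketch only)."""
    out = []
    for arg in argv:
        if arg.startswith("--args") and len(arg) > len("--args"):
            sep = arg[len("--args")]                  # programmer-chosen separator
            out.extend(arg[len("--args") + 1:].split(sep))
        else:
            out.append(arg)
    return out
```

For example, expand_args(["--args:-B:-f", "script.txr"]) produces ["-B", "-f", "script.txr"].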

The --eargs mechanism allows additional flexibility. An --eargs argument must be followed by one more argument.

After --eargs performs the argument splitting in the same manner as --args, any of the arguments which it produces which are the two-character sequence {} are replaced with that following argument. Whether or not the replacement occurs, that following argument is then removed.


  #!/usr/bin/txr --eargs:-B:{}:--foo:42

This has an effect which cannot be replicated in any known implementation of the hash bang mechanism. Suppose that this hash bang line is placed in a script called script.txr. When this script is invoked with arguments, as in:

  script.txr a b c

then TXR is invoked similarly to:

  /usr/bin/txr --eargs:-B:{}:--foo:42 script.txr a b c

Then, when --eargs processing takes place, firstly the argument sequence

  -B {} --foo 42

is produced by splitting into four fields using the : character as the separator. Then, within these four fields, all occurrences of {} are replaced with the following argument script.txr, resulting in:

  -B script.txr --foo 42

Furthermore, that script.txr argument is removed from the remaining argument list.

The four arguments are then substituted in place of the original --eargs:-B:{}:--foo:42 syntax.

The resulting TXR invocation is, therefore:

  /usr/bin/txr -B script.txr --foo 42 a b c

Thus, --eargs allows some arguments to be encoded into the interpreter script, such that the script name is inserted anywhere among them, possibly multiple times. Arguments for the interpreter can be encoded, as well as arguments to be processed by the script.
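Putting the steps above together, --eargs processing can be modeled in Python. This is a sketch of the documented behavior, not TXR's actual code; expand_eargs is a hypothetical helper:

```python
def expand_eargs(argv):
    """Model of --eargs: split like --args, replace each {} field with
    the argument that follows the --eargs argument, then drop that
    following argument (illustrative only)."""
    out, i = [], 0
    while i < len(argv):
        arg = argv[i]
        if arg.startswith("--eargs") and len(arg) > len("--eargs"):
            sep = arg[len("--eargs")]
            fields = arg[len("--eargs") + 1:].split(sep)
            follow = argv[i + 1]            # --eargs must be followed by an argument
            out.extend(follow if f == "{}" else f for f in fields)
            i += 2                          # consume the following argument as well
        else:
            out.append(arg)
            i += 1
    return out
```

Applied to the example above, expand_eargs(["/usr/bin/txr", "--eargs:-B:{}:--foo:42", "script.txr", "a", "b", "c"]) yields ["/usr/bin/txr", "-B", "script.txr", "--foo", "42", "a", "b", "c"], matching the resulting invocation shown earlier.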


6.2.3 Argument Generation with the Null Hack

The --args and --eargs mechanisms do not solve the following problem: the POSIX env utility is often exploited for its PATH searching capability, and used to express hash bang scripts in the following way:

  #!/usr/bin/env txr

Here, the env utility searches for the txr program in the directories indicated by the PATH variable, which liberates the script from having to encode the exact location where the program is installed. However, if the operating system allows only one argument in the hash bang mechanism, then no arguments can be passed to the program.

To mitigate this problem, TXR supports a special feature in its hash bang support. If the hash bang #! line contains a null byte, then the text after the null byte, to the end of the line, is split into fields using the space character as a separator, and these fields are inserted into the command line. This manipulation happens during command-line processing, prior to the execution of the file, which happens after command-line processing.

If this processing is applied to a file that is specified using the -f option, then the arguments which arise from the special processing are inserted after that option and its argument. If it is applied to the file which is the first non-option argument, then the options are inserted before that argument; however, care is taken not to process that argument a second time. In either situation, processing of the command-line options continues, and the arguments which are processed next are the ones which were just inserted. This is true even if the options had been inserted as a result of processing the first non-option argument, which would ordinarily signal the termination of option processing.
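The null-byte splitting step can be sketched in Python. This is an illustrative model of the documented rule; null_hack_args is a hypothetical name:

```python
def null_hack_args(hash_bang_line):
    """Model of the null hack: if the #! line contains a NUL byte,
    split the text after the NUL into fields on spaces; these become
    the arguments inserted into the command line (sketch only)."""
    line = hash_bang_line.rstrip("\n")
    if "\0" not in line:
        return []                      # no null byte: no special processing
    return line.split("\0", 1)[1].split()
```

For instance, a line of the form "#!/usr/bin/env txr<NUL>-a 3" yields the two extra arguments -a and 3.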

In the following examples, it is assumed that the script is named, and invoked as, /home/jenny/foo.txr, and that txr resolves to /usr/bin/txr. In the basic example, the script is given the arguments --bar abc. The <NUL> code indicates a literal ASCII NUL character (a zero byte).

Basic example:

  #!/usr/bin/env txr<NUL>-a 3

Here, env searches for the txr program, which receives from the operating system the arguments:

  /usr/bin/txr /home/jenny/foo.txr --bar abc

The first non-option argument is the name of the script. TXR opens the script, and notices that it begins with a hash bang line. It consumes the hash bang line and finds the null byte inside it, retrieving the character string after it, which is "-a 3". This is split into the two arguments -a and 3, which are then inserted into the command line ahead of the script name. The effective command line then becomes:

  /usr/bin/txr -a 3 /home/jenny/foo.txr --bar abc

Command line option processing continues, beginning with the -a option. After the option is processed, /home/jenny/foo.txr is encountered again. This time it is not opened a second time; it signals the end of option processing, exactly as it would immediately have done if it hadn't triggered the insertion of any arguments.

Advanced example: use env to invoke txr, passing options to the interpreter and to the script:

  #!/usr/bin/env txr<NUL>--eargs:-C:175:{}:--debug

This example shows how --eargs can be used in conjunction with the null hack. When txr begins executing, it receives the arguments

  /usr/bin/txr /home/jenny/foo.txr

The script file is opened, and the arguments delimited by the null character in the hash bang line are inserted, resulting in the effective command line:

  /usr/bin/txr --eargs:-C:175:{}:--debug /home/jenny/foo.txr

Next, --eargs is processed in the ordinary way, transforming the command line into:

  /usr/bin/txr -C 175 /home/jenny/foo.txr --debug

The name of the script file is encountered, and signals the end of option processing. Thus txr receives the -C option, instructing it to emulate some behaviors from version 175, and the /home/jenny/foo.txr script receives --debug as its argument: it executes with the *args* list containing one element, the character string "--debug".

The hash bang null hack feature was introduced in TXR 177. Previous versions ignore the hash bang line, performing no special processing. Where a risk exists that programs which depend on the feature might be executed by an older version of TXR, care must be taken to detect and handle that situation, either by means of the txr-version variable, or else by some logic which infers that the processing of the hash bang line hadn't been performed.


6.2.4 Hash Bang and Setuid

TXR supports setuid hash bang scripting, even on platforms that do not support setuid and setgid attributes on hash bang scripts. On such platforms, TXR has to be installed setuid/setgid. See the section SETUID/SETGID OPERATION. On some platforms, it may also be necessary to use the --reexec option.


6.3 Whitespace

Outside of directives, whitespace is significant in TXR queries, and represents a pattern match for whitespace in the input. An extent of text consisting of an undivided mixture of tabs and spaces is a whitespace token.

Whitespace tokens match a precisely identical piece of whitespace in the input, with one exception: a whitespace token consisting of precisely one space has a special meaning. It is equivalent to the regular expression @/[ ]+/: match an extent of one or more spaces (but not tabs!). Multiple consecutive spaces do not have this meaning.

Thus, the query line "a b" (one space between a and b) matches text in which any number of spaces, one or more, occurs between the two letters.

For matching a single space, the syntax @\ can be used (backslash-escaped space).

It is more often necessary to match multiple spaces than to exactly match one space, so this rule simplifies many queries and adds inconvenience to only few.
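The single-space rule can be sketched as a translation into regular expressions in Python. This is an illustrative model only; whitespace_token_regex is a hypothetical helper, not part of TXR:

```python
import re

def whitespace_token_regex(tok):
    """Model of the whitespace rule: a token of exactly one space
    matches one or more spaces; any other run of spaces and tabs
    matches only an identical run (sketch only)."""
    return "[ ]+" if tok == " " else re.escape(tok)
```

So a single space becomes the pattern [ ]+, matching "a b" against "a    b", while a two-space token matches only exactly two spaces.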

In output clauses, string and character literals and quasiliterals, a space token denotes a space.


6.4 Text

Query material which is not escaped by the special character @ is literal text, which matches input character for character. Text which occurs at the beginning of a line matches the beginning of a line. Text which starts in the middle of a line, other than following a variable, must match exactly at the current position, where the previous match left off. Moreover, if the text is the last element in the line, its match is anchored to the end of the line.

An empty query line matches an empty line in the input. Note that an empty input stream does not contain any lines, and therefore is not matched by an empty line. An empty line in the input is represented by a newline character which is either the first character of the file, or follows a previous newline-terminated line.

Input streams which end without terminating their last line with a newline are tolerated, and are treated as if they had the terminator.

Text which follows a variable has special semantics, described in the section Variables below.

A query may not leave a line of input partially matched. If any portion of a line of input is matched, it must be entirely matched, otherwise a matching failure results. However, a query may leave unmatched lines. Matching only four lines of a ten line file is not a matching failure. The eof directive can be used to explicitly match the end of a file.

In the following example, the query matches the text, even though the text has an extra line.

 Four score and seven
 years ago our

 Four score and seven
 years ago our

In the following example, the query fails to match the text, because the text has extra material on one line that is not matched:

 I can carry nearly eighty gigs
 in my head

 I can carry nearly eighty gigs of data
 in my head

Needless to say, if the text has insufficient material relative to the query, that is a failure also.

To match arbitrary material from the current position to the end of a line, the "match any sequence of characters, including empty" regular expression @/.*/ can be used. Example:

 I can carry nearly eighty gigs@/.*/

 I can carry nearly eighty gigs of data

In this example, the query matches, since the regular expression matches the string "of data". (See Regular Expressions section below).

Another way to do this is:

 I can carry nearly eighty gigs@(skip)


6.5 Special Characters in Text

Control characters may be embedded directly in a query (with the exception of newline characters). An alternative to embedding is to use escape syntax. The following escapes are supported:

@\ newline
A backslash immediately followed by a newline introduces a physical line break without breaking up the logical line. Material following this sequence continues to be interpreted as a continuation of the previous line, so that indentation can be introduced to show the continuation without appearing in the data.
@\ space
A backslash followed by a space encodes a space. This is useful in line continuations when it is necessary for some or all of the leading spaces to be preserved. For instance, the two-line sequence

  abcd@\
    @\  efg

is equivalent to the line

  abcd  efg

The two spaces before the @\ in the second line are consumed. The spaces after are preserved.

@\a
Alert character (ASCII 7, BEL).
@\b
Backspace (ASCII 8, BS).
@\t
Horizontal tab (ASCII 9, HT).
@\n
Line feed (ASCII 10, LF). Serves as abstract newline on POSIX systems.
@\v
Vertical tab (ASCII 11, VT).
@\f
Form feed (ASCII 12, FF). This character clears the screen on many kinds of terminals, or ejects a page of text from a line printer.
@\r
Carriage return (ASCII 13, CR).
@\e
Escape (ASCII 27, ESC).
@\x hex-digits
A @\x immediately followed by a sequence of hex digits is interpreted as a hexadecimal numeric character code. For instance @\x41 is the ASCII character A. If a semicolon character immediately follows the hex digits, it is consumed, and characters which follow are not considered part of the hex escape even if they are hex digits.
@\ octal-digits

A @\ immediately followed by a sequence of octal digits (0 through 7) is interpreted as an octal character code. For instance @\010 is character 8, same as @\b. If a semicolon character immediately follows the octal digits, it is consumed, and subsequent characters are not treated as part of the octal escape, even if they are octal digits.

Note that if a newline is embedded into a query line with @\n, this does not split the line into two; it's embedded into the line and thus cannot match anything. However, @\n may be useful in the @(cat) directive and in @(output).
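The treatment of the optional terminating semicolon can be sketched as a tiny parser. The helper below is hypothetical (not part of TXR) and handles only the hex case:

```python
import re

def parse_hex_escape(rest):
    """Parse the text following '@\\x': consume a run of hex digits
    and an optional terminating semicolon, which is discarded."""
    m = re.match(r"([0-9A-Fa-f]+);?", rest)
    if not m:
        raise SyntaxError("expected hex digits after @\\x")
    return chr(int(m.group(1), 16)), rest[m.end():]

# '@\x41;BC' denotes 'A' followed by the literal text 'BC';
# without the semicolon, 'BC' would be consumed as hex digits.
```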


6.6 Character Handling and International Characters

TXR represents text internally using wide characters, which are used to represent Unicode code points. Script source code, as well as all data sources, are assumed to be in the UTF-8 encoding. In TXR and TXR Lisp source, extended characters can be used directly in comments, literal text, string literals, quasiliterals and regular expressions. Extended characters can also be expressed indirectly using hexadecimal or octal escapes. On some platforms, wide characters may be restricted to 16 bits, so that TXR can only work with characters in the BMP (Basic Multilingual Plane) subset of Unicode.

TXR does not use the localization features of the system library; its handling of extended characters is not affected by environment variables like LANG and LC_CTYPE. The program reads and writes only the UTF-8 encoding.

If TXR encounters invalid bytes in the UTF-8 input, what happens depends on the context in which this occurs. In a query, comments are read without regard for encoding, so invalid encoding bytes in comments are not detected. A comment is simply a sequence of bytes terminated by a newline. In lexical elements which represent text, such as string literals, invalid or unexpected encoding bytes are treated as syntax errors. The scanner issues an error message, then discards a byte and resumes scanning. Certain sequences pass through the scanner without triggering an error, namely some UTF-8 overlong sequences. These are caught when the lexeme is subject to UTF-8 decoding, and treated in the same manner as other UTF-8 data, described in the following paragraph.

Invalid bytes in data are treated as follows. When an invalid byte is encountered in the middle of a multibyte character, or if the input ends in the middle of a multibyte character, or if a character is extracted which is encoded as an overlong form, the UTF-8 decoder returns to the starting byte of the ill-formed multibyte character, and extracts just that byte, mapping it to the Unicode character range U+DC00 through U+DCFF. The decoding resumes afresh at the following byte, expecting that byte to be the start of a UTF-8 code.

Furthermore, because TXR internally uses a null-terminated character representation of strings which easily interoperates with C language interfaces, when a null character is read from a stream, TXR converts it to the code U+DC00. On output, this code converts back to a null byte, as explained in the previous paragraph. By means of this representational trick, TXR can handle textual data containing null bytes.
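Python's surrogateescape error handler implements essentially the same byte-preservation scheme, so it can serve as an executable illustration; note that the null-byte mapping described above is TXR-specific and is not reproduced by Python:

```python
data = b"ab\xffcd"          # 0xFF is never valid in UTF-8
s = data.decode("utf-8", errors="surrogateescape")
# the bad byte surfaces as a code in the U+DC00-U+DCFF range
assert s == "ab\udcffcd"
# and it round-trips back to the original byte on encoding
assert s.encode("utf-8", errors="surrogateescape") == data
```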


6.7 Regular Expression Directives

In place of a piece of text (see section Text above), a regular expression directive may be used, which has the following syntax:

  @/RE/

where the RE part enclosed in slashes represents regular expression syntax (described in the section Regular Expressions below).

Long regular expressions can be broken into multiple lines using a backslash-newline sequence. Whitespace before the sequence or after the sequence is not significant, so the following two are equivalent:

  @/reg \
  ular/

  @/regular/

There may not be whitespace between the backslash and newline.

Whereas literal text simply represents itself, a regular expression denotes a (potentially infinite) set of texts. The regular expression directive matches the longest piece of text (possibly empty) which belongs to the set denoted by the regular expression. The match is anchored to the current position; thus if the directive is the first element of a line, the match is anchored to the start of a line. If the regular expression directive is the last element of a line, it is anchored to the end of the line also: the regular expression must match the text from the current position to the end of the line.

Even if the regular expression matches the empty string, the match will fail if the input is empty, or has run out of data. For instance suppose the third line of the query is the regular expression @/.*/, but the input is a file which has only two lines. This will fail: the data has no line for the regular expression to match. A line containing no characters is not the same thing as the absence of a line, even though both abstractions imply an absence of characters.

Like text which follows a variable, a regular expression directive which follows a variable has special semantics, described in the section Variables below.


6.8 Variables

Much of the query syntax consists of arbitrary text, which matches file data character for character. Embedded within the query may be variables and directives which are introduced by a @ character. Two consecutive @@ characters encode a literal @.

A variable matching or substitution directive is written in one of several ways:

 @sident
 @{bident}
 @*sident
 @*{bident}
 @{bident /regex/}
 @{bident (fun [arg ... ])}
 @{bident number}

The forms with an * indicate a longest match; see Longest Match below. The last three forms, with the embedded regexp /regex/, number or function, have special semantics; see Positive Match below.

The identifier t cannot be used as a name; it is a reserved symbol which denotes the value true. An attempt to use the variable @t will result in an exception. The symbol nil can be used where a variable name is required syntactically, but it has special semantics, described in a section below.

A sident is a "simple identifier" form which is not delimited by braces.

A sident consists of any combination of one or more letters, numbers, and underscores. It may not look like a number, so that for instance 123 is not a valid sident, but 12A is valid. Case is sensitive, so that FOO is different from foo, which is different from Foo.
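The sident rule can be sketched as a simplified, ASCII-only check. The is_sident helper is hypothetical; TXR's lexer defines the exact "looks like a number" test:

```python
import re

def is_sident(name):
    """Simplified sketch: one or more letters, digits or underscores,
    not consisting entirely of digits (i.e. not looking like a number)."""
    return (re.fullmatch(r"[A-Za-z0-9_]+", name) is not None
            and re.fullmatch(r"[0-9]+", name) is None)

assert is_sident("12A")      # valid: contains a letter
assert not is_sident("123")  # invalid: looks like a number
```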

The braces around an identifier can be used when material which follows would otherwise be interpreted as being part of the identifier. When a name is enclosed in braces it is a bident.

The following additional characters may be used as part of a bident, which are not allowed in a sident:

 ! $ % & * + - < = > ? \ ~

Moreover, most Unicode characters beyond U+007F may appear in a bident, with certain exceptions. A character may not be used if it is any of the Unicode space characters, a member of the high or low surrogate region, a member of any Unicode private use area, or is one of the two characters U+FFFE or U+FFFF.

The rule still holds that a name cannot look like a number so +123 is not a valid bident but these are valid: a->b, *xyz*, foo-bar.

The syntax @FOO_bar introduces the name FOO_bar, whereas @{FOO}_bar means the variable named "FOO" followed by the text "_bar". There may be whitespace between the @ and the name, or opening brace. Whitespace is also allowed in the interior of the braces. It is not significant.

If a variable has no prior binding, then it specifies a match. The match is determined from some current position in the data: the character which immediately follows all that has been matched previously. If a variable occurs at the start of a line, it matches some text at the start of the line. If it occurs at the end of a line, it matches everything from the current position to the end of the line.


6.9 Negative Match

If a variable is one of the plain forms

 @sident
 @{bident}
 @*sident
 @*{bident}

then this is a "negative match". The extent of the matched text (the text bound to the variable) is determined by looking at what follows the variable, and ranges from the current position to some position where the following material finds a match. This is why this is called a "negative match": the spanned text which ends up bound to the variable is that in which the match for the trailing material did not occur.

A variable may be followed by a piece of text, a regular expression directive, a function call, a directive, another variable, or nothing (i.e. occurs at the end of a line). These cases are described in detail below.


6.9.1 Variable Followed by Nothing

If the variable is followed by nothing, the negative match extends from the current position in the data, to the end of the line. Example:
 a b c @FOO
 a b c defghijk
 FOO="defghijk"


6.9.2 Variable Followed by Text

For the purposes of determining the negative match, text is defined as a sequence of literal text and regular expressions, not divided by a directive. So for instance in this example:

  @a:@/foo/bcd e@(maybe)f@(end)

the variable @a is considered to be followed by ":@/foo/bcd e".

If a variable is followed by text, then the extent of the negative match is determined by searching for the first occurrence of that text within the line, starting at the current position.

The variable matches everything between the current position and the matching position (not including the matching position). Any whitespace which follows the variable (and is not enclosed inside braces that surround the variable name) is part of the text. For example:

 a b @FOO e f
 a b c d e f
 FOO="c d"

In the above example, the pattern text "a b " matches the data "a b ". So when the @FOO variable is processed, the data being matched is the remaining "c d e f". The text which follows @FOO is " e f". This is found within the data "c d e f" at position 3 (counting from 0). So positions 0-2 ("c d") constitute the matching text which is bound to FOO.
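The search described above can be mirrored directly in a few lines (an illustration of the rule, not TXR's implementation):

```python
line = "c d e f"           # remaining data after "a b " has matched
trailing = " e f"          # the text that follows @FOO in the pattern
pos = line.find(trailing)  # leftmost occurrence, searching from position 0
assert pos == 3
FOO = line[:pos]           # the variable binds everything before that position
assert FOO == "c d"
```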


6.9.3 Variable Followed by a Function Call or Directive

If the variable is followed by a function call, or a directive, the extent is determined by scanning the text for the first position where a match occurs for the entire remainder of the line. (For a description of functions, see Functions.)

For example:

  @foo@(bind a "abc")xyz

Here, foo will match the text from the current position to where "xyz" occurs, even though there is a @(bind) directive. Furthermore, if more material is added after the xyz, it is part of the search. Note the difference between the following two:

  @foo@/abc/@(func)

  @foo@(func)@/abc/

In the first example, the variable foo matches the text from the current position until the match for the regular expression abc. @(func) is not considered when processing @foo. In the second example, the variable foo matches the text from the current position until the position which matches the function call, followed by a match for the regular expression. The entire sequence @(func)@/abc/ is considered.


6.9.4 Consecutive Variables

If an unbound variable specifies a fixed-width match or a regular expression, then the issue of consecutive variables does not arise. Such a variable consumes text regardless of any context which follows it.

However, what if an unbound variable with no modifier is followed by another variable? The behavior depends on the nature of the other variable.

If the other variable is also unbound, and also has no modifier, this is a semantic error which will cause the query to fail. A diagnostic message will be issued, unless operating in quiet mode via -q. The reason is that there is no way to bind two consecutive variables to an extent of text; this is an ambiguous situation, since there is no matching criterion for dividing the text between two variables. (In theory, a repetition of the same variable, like @FOO@FOO, could find a solution by dividing the match extent in half, which would work only in the case when it contains an even number of characters. This behavior seems to have dubious value).

An unbound variable may be followed by one which is bound. The bound variable is effectively replaced by the text which it denotes, and the logic proceeds accordingly.

It is possible for a variable to be bound to a regular expression. If x is an unbound variable and y is bound to a regular expression RE, then @x@y means @x@/RE/. A variable v can be bound to a regular expression using, for example, @(bind v #/RE/).

The @* syntax for longest match is available. Example:

 @FOO:@BAR@FOO
 xyz:defxyz
 FOO=xyz, BAR=def

Here, FOO is matched with "xyz", based on the delimiting around the colon. The colon in the pattern then matches the colon in the data, so that BAR is considered for matching against "defxyz". BAR is followed by FOO, which is already bound to "xyz". Thus "xyz" is located in the "defxyz" data following "def", and so BAR is bound to "def".

If an unbound variable is followed by a variable which is bound to a list, or nested list, then each character string in the list is tried in turn to produce a match. The first match is taken.

An unbound variable may be followed by another unbound variable which specifies a regular expression or function call match. This is a special case called a "double variable match". What happens is that the text is searched using the regular expression or function. If the search fails, then neither variable is bound: it is a matching failure. If the search succeeds, then the first variable is bound to the text which is skipped by the search. The second variable is bound to the text matched by the regular expression or function. Examples:

 @foo@{bar /abc/}
 xyzabc
 foo="xyz", bar="abc"
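In regex terms, the double variable match behaves like a search whose skipped prefix is captured. The following Python analogy assumes the pattern @foo@{bar /abc/} applied to the input "xyzabc":

```python
import re

line = "xyzabc"
m = re.search(r"abc", line)   # search using the second variable's regex
foo = line[:m.start()]        # first variable: the text skipped by the search
bar = m.group()               # second variable: the regex match itself
assert (foo, bar) == ("xyz", "abc")
```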


6.9.5 Consecutive Variables Via Directive

Two variables can be de facto consecutive in a manner shown in the following example:

 @var1@(all)@var2@(end)

This is treated just like the variable followed by directive. No semantic error is identified, even if both variables are unbound. Here, @var2 matches everything at the current position, and so @var1 ends up bound to the empty string.

Example 1: b matches at position 0 and a binds the empty string:


Example 2: *a specifies longest match (see Longest Match below), and so it takes everything:



6.9.6 Longest Match

The closest-match behavior for the negative match can be overridden to longest-match behavior. A special syntax is provided for this: an asterisk between the @ and the variable, e.g.:
 a @*{FOO}cd
 a b cdcdcdcd
 FOO="b cdcdcd"

 a @{FOO}cd
 a b cdcdcd
 FOO="b "

In the former example, the match extends to the rightmost occurrence of "cd", and so FOO receives "b cdcdcd". In the latter example, the * syntax isn't used, and so a leftmost match takes place. The extent covers only the "b ", stopping at the first "cd" occurrence.
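The two behaviors correspond loosely to greedy and non-greedy capture in Perl-style regex engines (an analogy only; TXR's mechanism is variable-based, not regex-based):

```python
import re

line = "a b cdcdcdcd"
# longest match, like @*{FOO}cd: greedy capture runs to the rightmost "cd"
longest = re.fullmatch(r"a (.*)cd", line).group(1)
# closest match, like @{FOO}cd: non-greedy capture stops at the leftmost "cd"
closest = re.match(r"a (.*?)cd", line).group(1)
```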


6.10 Positive Match

There are syntactic variants of variable syntax which have an embedded expression enclosed with the variable in braces:

 @{bident /regex/}
 @{bident (fun [args...])}
 @{bident number}
 @{bident bident}

These specify a variable binding that is driven by a positive match derived from a regular expression, function or character count, rather than from trailing material (which is regarded as a "negative" match, since the variable is bound to material which is skipped in order to match the trailing material). In the /regex/ form, the match extends over all characters from the current position which match the regular expression regex. (see Regular Expressions section below). In the (fun [args ...]) form, the match extends over characters which are matched by the call to the function, if the call succeeds. Thus @{x (y z w)} is just like @(y z w), except that the region of text skipped over by @(y z w) is also bound to the variable x. See Functions below.

In the number form, the match processes a field of text which consists of the specified number of characters, which must be a non-negative number. If the data line doesn't have that many characters starting at the current position, the match fails. A match for zero characters produces an empty string. The text which is actually bound to the variable is all text within the specified field, but excluding leading and trailing whitespace. If the field contains only spaces, then an empty string is extracted.
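A sketch of the field-matching rule described above (match_field is a hypothetical helper, not part of TXR's API):

```python
def match_field(line, pos, n):
    """@{var n}: consume exactly n characters starting at pos;
    bind the field's content with surrounding whitespace trimmed."""
    field = line[pos:pos + n]
    if len(field) < n:
        return None                   # not enough characters: match fails
    return field.strip(), pos + n     # bound text, new position

assert match_field("ab   cd", 2, 3) == ("", 5)   # all-space field binds ""
assert match_field("ab", 0, 5) is None           # line too short: failure
```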

This syntax is processed without consideration of what other syntax follows. A positive match may be directly followed by an unbound variable.

The @{bident bident} syntax allows the number or regex modifier to come from a variable. The variable must be bound and contain a non-negative integer or regular expression. For example, @{x y} behaves like @{x 3} if y is bound to the integer 3. It is an error if y is unbound.


6.11 Special Symbols nil and t

Just like in the Common Lisp language, the names nil and t are special.

The nil symbol stands for the empty list object, an object which marks the end of a list, and Boolean false. It is synonymous with the syntax (), which may be used interchangeably with nil in most constructs.

In TXR Lisp, nil and t cannot be used as variables. When evaluated, they evaluate to themselves.

In the TXR pattern language, nil can be used in the variable binding syntax, but does not create a binding; it has a special meaning. It allows the variable matching syntax to be used to skip material, in ways similar to the skip directive.

The nil symbol is also used as a block name, both in the TXR pattern language and in TXR Lisp. A block named nil is considered to be anonymous.


6.12 Keyword Symbols

Symbols whose names begin with the : character are keyword symbols. Like nil and t, keywords may not be used as variables; they stand for themselves. Keywords are useful for labeling information and situations.


6.13 Regular Expressions

Regular expressions are a language for specifying sets of character strings. Through the use of pattern-matching elements, a regular expression is able to denote an infinite set of texts. TXR contains an original implementation of regular expressions, which supports the following syntax:

The period is a "wildcard" that matches any character.
Character class: matches a single character, from the set specified by special syntax written between the square brackets. This supports basic regexp character class syntax. POSIX notation like [:digit:] is not supported. The regex tokens \s, \d and \w are permitted in character classes, but not their complementing counterparts. These tokens simply contribute their characters to the class. The class [a-zA-Z] means match an uppercase or lowercase letter; the class [0-9a-f] means match a digit or a lowercase letter; the class [^0-9] means match a non-digit, and so forth. There are no locale-specific behaviors in TXR regular expressions; [A-Z] denotes an ASCII/Unicode range of characters. The class [\d.] means match a digit or the period character. A ] or - can be used within a character class, but must be escaped with a backslash. A ^ in the first position denotes a complemented class, unless it is escaped by backslash. In any other position, it denotes itself. Two backslashes code for one backslash. So for instance [\[\-] means match a [ or - character, [^^] means match any character other than ^, and [\^\\] means match either a ^ or a backslash. Regex operators such as *, + and & appearing in a character class represent ordinary characters. The characters -, ] and ^ occurring outside of a character class are ordinary. Unescaped / characters can appear within a character class. The empty character class [] matches no character at all, and its complement [^] matches any character, and is treated as a synonym for the . (period) wildcard operator.
\s, \w and \d
These regex tokens each match a single character. The \s regex token matches a wide variety of ASCII whitespace characters and Unicode spaces. The \w token matches alphabetic word characters; it is equivalent to the character class [A-Za-z_]. The \d token matches a digit, and is equivalent to [0-9].
\S, \W and \D
These regex tokens are the complemented counterparts of \s, \w and \d. The \S token matches all those characters which \s does not match, \W matches all characters that \w does not match and \D matches nondigits.
An empty expression is a regular expression. It represents the set of strings consisting of the empty string; i.e. it matches just the empty string. The empty regex can appear alone as a full regular expression (for instance the TXR syntax @// with nothing between the slashes) and can also be passed as a subexpression to operators, though this may require the use of parentheses to make the empty regex explicit. For example, the expression a| means: match either a, or nothing. The forms * and (*) are syntax errors; though not useful, the correct way to match the empty expression zero or more times is the syntax ()*.
The nomatch regular expression represents the empty set: it matches no strings at all, not even the empty string. There is no dedicated syntax to directly express nomatch in the regex language. However, the empty character class [] is equivalent to nomatch, and may be considered to be a notation for it. Other representations of nomatch are possible: for instance, the regex ~.* which is the complement of the regex that denotes the set of all possible strings, and thus denotes the empty set. A nomatch has uses; for instance, it can be used to temporarily "comment out" regular expressions. The regex ([]abc|xyz) is equivalent to (xyz), since the []abc branch cannot match anything. Using [] to "block" a subexpression allows you to leave it in place, then enable it later by removing the "block".
If R is a regular expression, then so is (R). The contents of parentheses denote one regular expression unit, so that for instance in (RE)*, the * operator applies to the entire parenthesized group. The syntax () is valid and equivalent to the empty regular expression.
R?
Optionally match the preceding regular expression R.
R*
Match the expression R zero or more times. This operator is sometimes called the "Kleene star", or "Kleene closure". The Kleene closure favors the longest match. Roughly speaking, if there are two or more ways in which R1*R2 can match, then that match occurs in which R1* matches the longest possible text.
R+
Match the preceding expression R one or more times. Like R*, this favors the longest possible match: R+ is equivalent to RR*.
R1%R2
Match R1 zero or more times, then match R2. If this match can occur in more than one way, then it occurs such that R1 is matched the fewest number of times, which is opposite from the behavior of R1*R2. Repetitions of R1 terminate at the earliest point in the text where a non-empty match for R2 occurs. Because it favors shorter matches, % is termed a non-greedy operator. If R2 is the empty expression, or equivalent to it, then R1%R2 reduces to R1*. So for instance (R%) is equivalent to (R*), since the missing right operand is interpreted as the empty regex. Note that whereas the expression (R1*R2) is equivalent to (R1*)R2, the expression (R1%R2) is not equivalent to (R1%)R2. Also note that A(XY%Z)B is equivalent to AX(Y%Z)B. This is because the precedence of % is higher than that of catenation on its left side; this rule prevents the given syntax from expressing the XY catenation. The expression may be understood as: A(X(Y%Z))B where the inner parentheses clarify how the syntax surrounding the % operator is being parsed, and the outer parentheses are superfluous. The correct way to assert catenation of XY as the left operand of % is A(XY)%ZB. To specify XY as the left operand, and limit the right operand to just Z, the correct syntax is A((XY)%Z)B. By contrast, the expression A(X%YZ)B is not equivalent to A(X%Y)ZB because the precedence of % is lower than that of catenation on its right side. The operator is effectively "bi-precedential".
~R
Match the opposite of the following expression R; that is, match exactly those texts that R does not match. This operator is called complement, or logical not.
R1R2
Two consecutive regular expressions denote catenation: the left expression must match, and then the right.
R1|R2
Match either the expression R1 or R2. This operator is known by a number of names: union, logical or, disjunction, branch, or alternative.
R1&R2
Match both the expressions R1 and R2 simultaneously; i.e. the matching text must be one of the texts which are in the intersection of the set of texts matched by R1 and the set matched by R2. This operator is called intersection, logical and, or conjunction.
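Python's regex engine has no % operator, but its non-greedy *? exhibits the same shortest-match preference, contrasted here with the greedy Kleene star (an analogy for simple cases only):

```python
import re

text = "abcb"
greedy = re.match(r".*b", text).group()     # like R*: favors the longest match
shortest = re.match(r".*?b", text).group()  # like R1%R2: fewest repetitions
```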

Any character which is not a regular expression operator, a backslash escape, or the slash delimiter, denotes one-position match of that character itself.

Any of the special characters, including the delimiting /, and the backslash, can be escaped with a backslash to suppress its meaning and denote the character itself.

Furthermore, all of the same escapes as are described in the section Special Characters in Text above are supported - the difference is that in regular expressions, the @ character is not required, so for example a tab is coded as \t rather than @\t. Octal and hex character escapes can be optionally terminated by a semicolon, which is useful if the following characters are octal or hex digits not intended to be part of the escape.

Only the above escapes are supported. Unlike in some other regular expression implementations, if a backslash appears before a character which isn't a regex special character or one of the supported escape sequences, it is an error. This wasn't true of historic versions of TXR. See the COMPATIBILITY section.

Precedence table, highest to lowest:
  (R) []            primary
  R? R+ R* R%...    postfix    left-to-right
  ~R ...%R          unary      right-to-left

The % operator is like a postfix operator with respect to its left operand, but like a unary operator with respect to its right operand. Thus a~b%c~d is a(~(b%(c(~d)))) , demonstrating right-to-left associativity, where all of b% may be regarded as a unary operator being applied to c~d. Similarly, a?*+%b means (((a?)*)+)%b, where the trailing %b behaves like a postfix operator.

In TXR, regular expression matches do not span multiple lines. The regex language has no feature for multi-line matching. However, the @(freeform) directive allows the remaining portion of the input to be treated as one string in which line terminators appear as explicit characters. Regular expressions may freely match through this sequence.

It's possible for a regular expression to match an empty string. For instance, if the next input character is z, facing the regular expression /a?/, there is a zero-character match: the regular expression's state machine can reach an acceptance state without consuming any characters. Examples:

 @A@/a?/@/.*/
 A=""

 @{A /a?/}@B
 A="", B="zzzz"

 @*{A /a?/}
 A="zzzz"


In the first example, variable @A is followed by a regular expression which can match an empty string. The expression faces the letter z at position 0 in the data line. A zero-character match occurs there, therefore the variable A takes on the empty string. The @/.*/ regular expression then consumes the line.

Similarly, in the second example, the /a?/ regular expression faces a z, and thus yields an empty string which is bound to A. Variable @B consumes the entire line.

The third example requests the longest match for the variable binding. Thus, a search takes place for the rightmost position where the regular expression matches. The regular expression matches anywhere, including the empty string after the last character, which is the rightmost place. Thus variable A fetches the entire line.
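The zero-character match can be reproduced with any regex engine; for instance, in Python:

```python
import re

# /a?/ facing "zzzz": the optional 'a' is absent, so the match
# succeeds while consuming zero characters
m = re.match(r"a?", "zzzz")
empty = m.group()     # the matched text is the empty string
consumed = m.end()    # the current position does not advance
```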

For additional information about the advanced regular expression operators, see NOTES ON EXOTIC REGULAR EXPRESSIONS below.


6.14 Compound Expressions

If the @ escape character is followed by an open parenthesis or square bracket, this is taken to be the start of a TXR Lisp compound expression.

The TXR language has the unusual property that its syntactic elements, so-called directives, are Lisp compound expressions. These expressions not only enclose syntax, but expressions which begin with certain symbols de facto behave as tokens in a phrase structure grammar. For instance, the expression @(collect) begins a block which must be terminated by the expression @(end), otherwise there is a syntax error. The collect expression can contain arguments which modify the behavior of the construct, for instance @(collect :gap 0 :vars (a b)). In some ways, this situation might be compared to the HTML language, in which an element such as <a> must be terminated by </a> and can have attributes such as <a href="...">.

Compound expressions contain subexpressions: other compound expressions, or literal objects of various kinds. Among these are: symbols, numbers, string literals, character literals, quasiliterals and regular expressions. These are described in the following sections. Additional kinds of literal objects exist, which are discussed in the TXR LISP section of the manual.

Some examples of compound expressions are:


  @(a b c (d e f))

  @(  a (b (c d) (e  ) ))

  @("apple" #\b #\space 3)

  @(a #/[a-z]*/ b)

  @(_ `@file.txt`)

Symbols occurring in a compound expression follow a slightly more permissive lexical syntax than the bident in the syntax @{bident} introduced earlier. The / (slash) character may be part of an identifier, or even constitute an entire identifier. In fact a symbol inside a directive is a lident. This is described in the Symbol Tokens section under TXR LISP. A symbol must not be a number; tokens that look like numbers are treated as numbers and not symbols.


6.15 Character Literals

Character literals are introduced by the #\ syntax, which is either followed by a character name, the letter x followed by hex digits, the letter o followed by octal digits, or a single character. Valid character names are:

  nul                 linefeed            return
  alarm               newline             esc
  backspace           vtab                space
  tab                 page                pnul

For instance #\esc denotes the escape character.

This convention for character literals is similar to that of the Scheme language. Note that #\linefeed and #\newline are the same character. The #\pnul character is specific to TXR and denotes the U+DC00 code in Unicode; the name stands for "pseudo-null", which is related to its special function. For more information about this, see the section "Character Handling and International Characters".


6.16 String Literals

String literals are delimited by double quotes. A double quote within a string literal is encoded using \" and a backslash is encoded as \\. Backslash escapes like \n and \t are recognized, as are hexadecimal escapes like \xFF or \xabc and octal escapes like \123. Ambiguity between an escape and subsequent text can be resolved by using a trailing semicolon delimiter: "\xabc;d" is a string consisting of the character U+0ABC followed by "d". The semicolon delimiter disappears. To write a literal semicolon immediately after a hex or octal escape, write two semicolons, the first of which will be interpreted as a delimiter. Thus, "\x21;;" represents "!;".

If the line ends in the middle of a literal, it is an error, unless the last character is a backslash. This backslash is a special escape which does not denote a character; rather, it indicates that the string literal continues on the next line. The backslash is deleted, along with whitespace which immediately precedes it, as well as leading whitespace in the following line. The escape sequence "\ " (backslash space) can be used to encode a significant space.

  "foo   \
   bar"

  "foo   \
  \ bar"

  "foo\  \
   bar"

The first string literal is the string "foobar". The second and third are "foo bar".


6.17 Word List Literals

A word list literal (WLL) provides a convenient way to write a list of strings when such a list can be given as whitespace-delimited words.

There are two flavors of the WLL: the regular WLL which begins with #" (hash, double-quote) and the splicing list literal which begins with #*" (hash, star, double-quote).

Both types are terminated by a double quote, which may be escaped as \" in order to include it as a character. All the escaping conventions used in string literals can be used in word literals.

Unlike in string literals, whitespace (tabs and spaces) is not significant in word literals: it separates words. Whitespace may be escaped with a backslash in order to include it as a literal character.

Just like in string literals, an unescaped newline character is not allowed. A newline preceded by a backslash is permitted. Such an escaped newline, together with any leading and trailing unescaped whitespace, is removed and replaced with a single space.


  #"abc def ghi"   --> notates ("abc" "def" "ghi")

  #"abc   def \
      ghi"         --> notates ("abc" "def" "ghi")

  #"abc\ def ghi" --> notates ("abc def" "ghi")

  #"abc\ def\ \
   \ ghi"         --> notates ("abc def " " ghi")

A splicing word literal differs from a word literal in that it does not produce a list of string literals, but rather it produces a sequence of string literals that is merged into the surrounding syntax. Thus, the following two notations are equivalent:

  (1 2 3 #*"abc def" 4 5 #"abc def")

  (1 2 3 "abc" "def" 4 5 ("abc" "def"))

The regular WLL produces a single list object, whereas the splicing WLL expands into multiple string literal objects.


6.18 String Quasiliterals

Quasiliterals are similar to string literals, except that they may contain variable references denoted by the usual @ syntax. The quasiliteral represents a string formed by substituting the values of those variables into the literal template. If a is bound to "apple" and b to "banana", the quasiliteral `one @a and two @{b}s` represents the string "one apple and two bananas". A backquote escaped by a backslash represents itself. Unlike in directive syntax, two consecutive @ characters do not code for a literal @, but cause a syntax error. The reason for this is that compounding of the @ syntax is meaningful. Instead, there is a \@ escape for encoding a literal @ character. Quasiliterals support the full output variable syntax. Expressions within variable substitutions follow the evaluation rules of TXR Lisp. This hasn't always been the case: see the COMPATIBILITY section.

Quasiliterals can be split into multiple lines in the same way as ordinary string literals.


6.19 Quasiword List Literals

The quasiword list literals (QLL-s) are to quasiliterals what WLL-s are to ordinary literals. (See the above section Word List Literals.)

A QLL combines the convenience of the WLL with the power of quasistrings.

Just as in the case of WLL-s, there are two flavors of the QLL: the regular QLL which begins with #`  (hash, backquote) and the splicing QLL which begins with #*`  (hash, star, backquote).

Both types are terminated by a backquote, which may be escaped as \`  in order to include it as a character. All the escaping conventions used in quasiliterals can be used in QLL.

Unlike in quasiliterals, whitespace (tabs and spaces) is not significant in QLL: it separates words. Whitespace may be escaped with a backslash in order to include it as a literal character.

A newline is not permitted unless escaped. An escaped newline works exactly the same way as it does in word list literals (WLL-s).

Note that the delimiting into words is done before the variable substitution. If the variable a contains spaces, then #`@a` nevertheless expands into a list of one item: the string derived from a.


  #`abc @a ghi`  --> notates (`abc` `@a` `ghi`)

  #`abc   @d@e@f \
  ghi`            --> notates (`abc` `@d@e@f` `ghi`)

  #`@a\ @b @c` --> notates (`@a @b` `@c`)

A splicing QLL differs from an ordinary QLL in that it does not produce a list of quasiliterals, but rather it produces a sequence of quasiliterals that is merged into the surrounding syntax.


6.20 Numbers

TXR supports integers and floating-point numbers.

An integer constant is made up of digits 0 through 9, optionally preceded by a + or - sign.
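
For example, the following are all integer constant tokens:

  123
  -34
  +0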



An integer constant can also be specified in hexadecimal using the prefix #x followed by an optional sign, followed by hexadecimal digits: 0 through 9 and the upper or lower case letters A through F:

  #xFF    ;; 255
  #x-ABC  ;; -2748

Similarly, octal numbers are supported with the prefix #o followed by octal digits:

  #o777   ;; 511

and binary numbers can be written with a #b prefix:

  #b1110  ;; 14

Note that the #b prefix is also used for buffer literals.

A floating-point constant is marked by the inclusion of a decimal point, the exponential "e notation", or both. It is an optional sign, followed by a mantissa consisting of digits, a decimal point, more digits, and then an optional exponential notation consisting of the letter e or E, an optional + or - sign, and then digits indicating the exponent value. In the mantissa, the digits are not optional. At least one digit must either precede the decimal point or follow. That is to say, a decimal point by itself is not a floating-point constant.
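
By these rules, the following are floating-point constant tokens:

  .123
  123.
  123.456
  1E3
  1.5e-10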



Examples which are not floating-point constant tokens:

  .      ;; dot token, not a number
  123E   ;; the symbol 123E
  1.0E-  ;; syntax error: invalid floating point constant
  1.0E   ;; syntax error: invalid floating point constant
  1.E    ;; syntax error: invalid floating point literal
  .e     ;; syntax error: dot token followed by symbol

In TXR there is a special "dotdot" token consisting of two consecutive periods. An integer constant followed immediately by dotdot is recognized as such; it is not treated as a floating constant followed by a dot. That is to say, 123.. does not mean 123. . (floating point 123.0 value followed by dot token). It means 123 .. (integer 123 followed by .. token).
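
For example:

  123..456   ;; integer 123, token .., integer 456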

Dialect note: unlike in Common Lisp, 123. is not an integer, but the floating-point number 123.0.



Comments of the form @; were introduced earlier. Inside compound expressions, another convention for comments exists: Lisp comments, which are introduced by the ; (semicolon) character and span to the end of the line.


  @(foo  ; this is a comment
    bar  ; this is another comment
   )

This is equivalent to @(foo bar).




7.1 Overview

When a TXR Lisp compound expression occurs in TXR, preceded by @, it is a directive.

Directives which are based on certain symbols are, additionally, involved in a phrase-structure syntax which uses Lisp expressions as if they were tokens.

For instance, the directive

  @(collect)

not only denotes a compound expression with the collect symbol in its head position, but it also introduces a syntactic phrase which requires a matching @(end) directive. In other words, @(collect) is not only an expression, but serves as a kind of token in a higher level phrase structure grammar.

Effectively, collect is a reserved symbol in the TXR language. A TXR program cannot use this symbol as the name of a pattern function, due to its role in the syntax. Lisp code, of course, can use the symbol.

Usually if this type of directive occurs alone in a line, not preceded or followed by other material, it is involved in a "vertical" (or line oriented) syntax.

If such a directive is embedded in a line (has preceding or trailing material) then it is in a horizontal syntactic and semantic context (character-oriented).

There is an exception: the definition of a horizontal function looks like this:

  @(define name (arg))body material@(end)

Yet, this is considered one vertical item, which means that it does not match a line of data. (This is necessary because all horizontal syntax matches something within a line of data, which is undesirable for definitions.)

Many directives exhibit both horizontal and vertical syntax, with different but closely related semantics. A few are vertical only, and some are horizontal only.

A summary of the available directives follows:

@(eof)
Explicitly match the end of file. Fails if unmatched data remains in the input stream.

@(eol)
Explicitly match the end of line. Fails if the current position is not the end of a line. Also fails if no data remains (there is no current line).

@(next)
Continue matching in another file or other data source.

@(block)
Groups together a sequence of directives into a logical named block, which can be explicitly terminated from within using the @(accept) and @(fail) directives. Blocks are described in the section Blocks below.

@(skip)
Treat the remaining query as a subquery unit, and search the lines (or characters) of the input file until that subquery matches somewhere. A skip is also an anonymous block.

@(trailer)
Treat the remaining query or subquery as a match for a trailing context. That is to say, if the remainder matches, the data position is not advanced.

@(freeform)
Treat the remainder of the input as one big string, and apply the following query line to that string. The newline characters (or custom separators) appear explicitly in that string.

@(fuzz)
The fuzz directive, inspired by the patch utility, specifies a partial match for some lines.

@(line) and @(chr)
These directives match a variable or expression against the current line number or character position.

@(name)
Match a variable against the name of the current data source.

@(data)
Match a variable against the remaining data (lazy list of strings).

@(some)
Multiple clauses are each applied to the same input. Succeeds if at least one of the clauses matches the input. The bindings established by earlier successful clauses are visible to the later clauses.

@(all)
Multiple clauses are applied to the same input. Succeeds if and only if each one of the clauses matches. The clauses are applied in sequence, and evaluation stops on the first failure. The bindings established by earlier successful clauses are visible to the later clauses.

@(none)
Multiple clauses are applied to the same input. Succeeds if and only if none of them match. The clauses are applied in sequence, and evaluation stops on the first success. No bindings are ever produced by this construct.

@(maybe)
Multiple clauses are applied to the same input. No failure occurs if none of them match. The bindings established by earlier successful clauses are visible to the later clauses.

@(cases)
Multiple clauses are applied to the same input. Evaluation stops on the first successful clause.

@(require)
The require directive is similar to the do directive in that it evaluates one or more TXR Lisp expressions. If the result of the rightmost expression is nil, then require triggers a match failure. See the TXR LISP section far below.

@(if), @(elif), and @(else)
The if directive with optional elif and else clauses allows one of multiple bodies of pattern matching directives to be conditionally selected by testing the values of Lisp expressions.

@(choose)
Multiple clauses are applied to the same input. The one whose effect persists is the one which maximizes or minimizes the length of a particular variable.

@(empty)
The @(empty) directive matches the empty string. It is useful in certain situations, such as expressing an empty match in a directive that doesn't accept an empty clause. The @(empty) syntax has another meaning in @(output) clauses, in conjunction with @(repeat).

@(define name (args ...))
Introduces a function. Functions are described in the Functions section below.

@(call expr args*)
Performs function indirection. Evaluates expr, which must produce a symbol that names a pattern function. Then that pattern function is invoked.

@(gather)
Searches text for matches for multiple clauses which may occur in arbitrary order. For convenience, lines of the first clause are treated as separate clauses.

@(collect)
Search the data for multiple matches of a clause. Collect the bindings in the clause into lists, which are output as array variables. The @(collect) directive is line-oriented. It works with a multi-line pattern and scans line by line. A similar directive called @(coll) works within one line.

A collect is an anonymous block.

@(and)
Separator of clauses for @(some), @(all), @(none), @(maybe) and @(cases). Equivalent to @(or). The choice is stylistic.

@(or)
Separator of clauses for @(some), @(all), @(none), @(maybe) and @(cases). Equivalent to @(and). The choice is stylistic.

@(end)
Required terminator for @(some), @(all), @(none), @(maybe), @(cases), @(if), @(collect), @(coll), @(output), @(repeat), @(rep), @(try), @(block) and @(define).

@(fail)
Terminate the processing of a block, as if it were a failed match. Blocks are described in the section Blocks below.

@(accept)
Terminate the processing of a block, as if it were a successful match. What bindings emerge may depend on the kind of block: collect has special semantics. Blocks are described in the section Blocks below.

@(try)
Indicates the start of a try block, which is related to exception handling, described in the Exceptions section below.

@(catch) and @(finally)
Special clauses within @(try). See Exceptions below.

@(defex) and @(throw)
Define custom exception types; throw an exception. See Exceptions below.

@(assert)
The assert directive requires the following material to match, otherwise it throws an exception. It is useful for catching mistakes or omissions in parts of a query that are sure-fire matches.

@(flatten)
Normalizes a set of specified variables to one-dimensional lists. Those variables which have scalar value are reduced to lists of that value. Those which are lists of lists (to an arbitrary level of nesting) are converted to flat lists of their leaf values.

@(merge)
Binds a new variable which is the result of merging two or more other variables. Merging has somewhat complicated semantics.

@(cat)
Decimates a list (of any number of dimensions) to a string, by catenating its constituent strings, with an optional separator string between all of the values.

@(bind)
Binds one or more variables against a value using a structural pattern match. A limited form of unification takes place which can cause a match to fail.

@(set)
Destructively assigns one or more existing variables using a structural pattern, using syntax similar to bind. Assignment to unbound variables triggers an error.

@(rebind)
Evaluates an expression in the current binding environment, and then creates new bindings for the variables in the structural pattern. Useful for temporarily overriding variable values in a scope.

@(forget)
Removes variable bindings.

@(local)
Synonym of @(forget).

@(output)
A directive which encloses an output clause in the query. An output section does not match text, but produces text. The directives above are not understood in an output clause.

@(repeat) and @(rep)
Directives understood within an @(output) section, for repeating multi-line text, with successive substitutions pulled from lists. The directive @(rep) produces iteration over lists horizontally within one line. These directives have a different meaning in matching clauses, providing a shorthand notation for @(collect :vars nil) and @(coll :vars nil), respectively.

@(deffilter)
The deffilter directive is used for defining named filters, which are useful for filtering variable substitutions in output blocks. Filters are useful when data must be translated between different representations that have different special characters or other syntax, requiring escaping or similar treatment. Note that it is also possible to use a function as a filter. See Function Filters below.

Named filters are stored in the hash table held in the Lisp special variable *filters*.

@(filter)
The filter directive passes one or more variables through a given filter or chain of filters, updating them with the filtered values.

@(load) and @(include)
The load and include directives allow TXR programs to be modularized. They bring in code from a file, in two different ways.

@(do)
The do directive is used to evaluate TXR Lisp expressions, discarding their result values. See the TXR LISP section far below.

@(mdo)
The mdo (macro do) directive evaluates TXR Lisp expressions immediately, during the parsing of the TXR syntax in which it occurs.

@(in-package)
The in-package directive is used to switch to a different symbol package. It mirrors the TXR Lisp macro of the same name.


7.2 Subexpression Evaluation

Some directives contain subexpressions which are evaluated. Two distinct styles of evaluations occur in TXR: bind expressions and Lisp expressions. Which semantics applies to an expression depends on the syntactic context in which it occurs: which position in which directive.

The evaluation of TXR Lisp expressions is described in the TXR LISP section of the manual.

Bind expressions are so named because they occur in the @(bind) directive. TXR pattern function invocations also treat argument expressions as bind expressions.

The @(rebind), @(set), @(merge), and @(deffilter) directives also use bind expression evaluation. Bind expression evaluation also occurs in the argument position of the :tlist keyword in the @(next) directive.

Unlike Lisp expressions, bind expressions do not support operators. If a bind expression is a nested list structure, it is a template denoting that structure. Any symbol in any position of that structure is interpreted as a variable. When the bind expression is evaluated, those corresponding positions in the template are replaced by the values of the variables.

Anywhere where a variable can appear in a bind expression's nested list structure, a Lisp expression can appear preceded by the @ character. That Lisp expression is evaluated and its value is substituted into the bind expression's template.

Moreover, a Lisp expression preceded by @ can be used as an entire bind expression. The value of that Lisp expression is then taken as the bind expression value.

Any object in a bind expression which is not a nested list structure containing Lisp expressions or variables denotes itself literally.


In the following examples, the variables a and b are assumed to have the string values "foo" and "bar", respectively.

The -> notation indicates the value of each expression.

  a              ->  "foo"
  (a b)          ->  ("foo" "bar")
  ((a) ((b) b))  ->  (("foo") (("bar") "bar"))
  (list a b)     ->  error: unbound variable list
  @(list a b)    ->  ("foo" "bar") ;; Lisp expression
  (a @[b 1..:])  ->  ("foo" "ar")  ;; Lisp eval of [b 1..:]
  (a @(+ 2 2))   ->  ("foo" 4)     ;; Lisp eval of (+ 2 2)
  #(a b)         ->  #(a b)        ;; Vector literal, not list.
  [a b]          ->  error: unbound variable dwim

The last example above [a b] is a notation equivalent to (dwim a b) and so follows similarly to the example involving list.


7.3 Input Scanning and Data Manipulation


7.3.1 The next directive

The next directive indicates that the remaining directives in the current block are to be applied against a new input source.

It can only occur by itself as the only element in a query line, and takes various arguments, according to these possibilities:

  @(next)
  @(next source)
  @(next source :nothrow)
  @(next :args)
  @(next :env)
  @(next :list lisp-expr)
  @(next :tlist bind-expr)
  @(next :string lisp-expr)
  @(next :var var)
  @(next nil)

The lone @(next) without arguments specifies that subsequent directives will match inside the next file in the argument list which was passed to TXR on the command line.

If source is given, it must be a TXR Lisp expression which denotes an input source. Its value may be a string or an input stream. For instance, if variable A contains the text "data", then @(next A) means switch to the file called "data", and @(next `@A.txt`) means to switch to the file "data.txt". The directive @(next (open-command `git log`)) switches to the input stream connected to the output of the git log command.

If the input source cannot be opened for whatever reason, TXR throws an exception (see Exceptions below). An unhandled exception will terminate the program. Often, such a drastic measure is inconvenient; if @(next) is invoked with the :nothrow keyword, then if the input source cannot be opened, the situation is treated as a simple match failure.

The variant @(next :args) means that the remaining command line arguments are to be treated as a data source. For this purpose, each argument is considered to be a line of text. The argument list does include that argument which specifies the file that is currently being processed or was most recently processed. As the arguments are matched, they are consumed. This means that if a @(next) directive without arguments is executed in the scope of @(next :args), it opens the file named by the first unconsumed argument.

To process arguments, and then continue with the original file and argument list, wrap the argument processing in a @(block). When the block terminates, the input source and argument list are restored to what they were before the block.

The variant @(next :env) means that the list of process environment variables is treated as a source of data. It looks like a text file stream consisting of lines of the form "name=value". If this feature is not available on a given platform, an exception is thrown.
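
For instance, the following sketch (the variable names name and value are arbitrary) collects the name and value of every environment variable into parallel lists:

  @(next :env)
  @(collect)
  @name=@value
  @(end)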

The syntax @(next :list lisp-expr) treats TXR Lisp expression lisp-expr as a source of text. The value of lisp-expr is flattened to a simple list in a way similar to the @(flatten) directive. The resulting list is treated as if it were the lines of a text file: each element of the list must be a string, which represents a line. If the strings happen to contain embedded newline characters, they are a visible constituent of the line, and do not act as line separators.

The syntax @(next :tlist bind-expr) is very similar to @(next :list ...) except that bind-expr is not a TXR Lisp expression, but a TXR bind expression.

The syntax @(next :var var) requires var to be a previously bound variable. The value of the variable is retrieved and treated like a list, in the same manner as under @(next :list ...). Note that @(next :var x) is not always the same as @(next :tlist x), because :var x strictly requires x to be a TXR variable, whereas the x in :tlist x is an expression which can potentially refer to a Lisp variable.

The syntax @(next :string lisp-expr) treats expression lisp-expr as a source of text. The value of the expression must be a string. Newlines in the string are interpreted as line terminators.

A string which is not terminated by a newline is tolerated, so that:

  @(next :string "abc")
  @a

binds a to "abc". Likewise, this is also the case with input files and other streams whose last line is not terminated by a newline.

However, watch out for empty strings, which are analogous to a correctly formed empty file which contains no lines:

  @(next :string "")
  @a

This will not bind a to ""; it is a matching failure. The behavior of :list is different. The query

  @(next :list "")
  @a

binds a to "". The reason is that under :list the string "" is flattened to the list ("") which is not an empty input stream, but a stream consisting of one empty line.

The @(next nil) variant indicates that the following subquery is applied to empty data, and the list of data sources from the command line is considered empty. This directive is useful in front of TXR code which doesn't process data sources from the command line, but takes command line arguments. The @(next nil) incantation absolutely prevents TXR from trying to open the first command line argument as a data source.

Note that the @(next) directive only redirects the source of input over the scope of the subquery in which that directive appears. For example, the following query looks for the line starting with "xyz" at the top of the file "foo.txt", within a some directive. After the @(end) which terminates the @(some), the "abc" is matched in the previous input stream which was in effect before the @(next) directive:

  @(some)
  @(next "foo.txt")
  xyz@suffix
  @(end)
  abc

However, if the @(some) subquery successfully matched "xyz@suffix" within the file foo.txt, there is now a binding for the suffix variable, which is visible to the remainder of the entire query. The variable bindings survive beyond the clause, but the data stream does not.


7.3.2 The skip directive

The skip directive considers the remainder of the query as a search pattern. The remainder is no longer required to strictly match at the current line in the current input stream. Rather, the current stream is searched, starting with the current line, for the first line where the entire remainder of the query will successfully match. If no such line is found, the skip directive fails. If a matching position is found, the remainder of the query is processed from that point.

Of course, the remainder of the query can itself contain skip directives. Each such directive performs a recursive subsearch.

Skip comes in vertical and horizontal flavors. For instance, skip and match the last line:

  @(skip)
  @last
  @(eof)

Skip and match the last character of the line:

  @(skip)@{last 1}@(eol)

The skip directive has two optional arguments, which are evaluated as TXR Lisp expressions. If the first argument evaluates to an integer, its value limits the range of lines scanned for a match. Judicious use of this feature can improve the performance of queries.

Example: scan until "size: @SIZE" matches, which must happen within the next 15 lines:

  @(skip 15)
  size: @SIZE

Without the range limitation, skip will keep searching until it consumes the entire input source. In a horizontal skip, the range-limiting numeric argument is expressed in characters, so that

  abc@(skip 5)def

means: there must be a match for "abc" at the start of the line, and then within the next five characters, there must be a match for "def".

Sometimes a skip is nested within a collect, or following another skip. For instance, consider:

  @(collect)
  begin @BEG_SYMBOL
  @(skip)
  end @BEG_SYMBOL
  @(end)

The above collect iterates over the entire input. But, potentially, so does the embedded skip. Suppose that "begin x" is matched, but the data has no matching "end x". The skip will search in vain all the way to the end of the data, and then the collect will try another iteration back at the beginning, just one line down from the original starting point. If it is a reasonable expectation that an "end x" occurs within 15 lines of a "begin x", this can be specified instead:

  @(collect)
  begin @BEG_SYMBOL
  @(skip 15)
  end @BEG_SYMBOL
  @(end)

If the symbol nil is used in place of a number, it means to scan an unlimited range of lines; thus, @(skip nil) is equivalent to @(skip).

If the symbol :greedy is used, it changes the semantics of the skip to longest match semantics. For instance, match the last three space-separated tokens of the line:

  @(skip :greedy) @a @b @c

Without :greedy, the variable @c can match multiple tokens, and end up with spaces in it, because nothing follows @c and so it matches from any position which follows a space to the end of the line. Also note the space in front of @a. Without this space, @a will get an empty string.

A line-oriented example of greedy skip: match the last line without using @(eof):

  @(skip :greedy)
  @last

There may be a second numeric argument. This specifies a minimum number of lines to skip before looking for a match. For instance, skip 15 lines and then search indefinitely for begin ...:

  @(skip nil 15)
  begin @BEG_SYMBOL

The two arguments may be used together. For instance, the following matches if, and only if, the 15th line of input starts with begin :

  @(skip 1 15)
  begin @BEG_SYMBOL

Essentially, @(skip 1 n) means "hard skip by n lines". @(skip 1 0) is the same as @(skip 1), which is a noop, because it means: "the remainder of the query must match starting on the very next line", or, more briefly, "skip exactly zero lines", which is the behavior if the skip directive is omitted altogether.

Here is one trick for grabbing the fourth line from the bottom of the input:

  @(skip)
  @fourth_from_bottom
  @(skip 1 3)
  @(eof)

Or using greedy skip:

  @(skip :greedy)
  @fourth_from_bottom
  @(skip 1 3)

Nongreedy skip with the @(eof) has a slight advantage because the greedy skip will keep scanning even though it has found the correct match, then backtrack to the last good match once it runs out of data. The regular skip with explicit @(eof) will stop when the @(eof) matches.


7.3.3 Reducing Backtracking with Blocks

skip can consume considerable CPU time when multiple skips are nested. Consider:

  @(skip)
  A
  @(skip)
  B
  @(skip)
  C


This is actually nesting: the second and third skips occur within the body of the first one, and thus this creates nested iteration. TXR is searching for the combination of skips which match the pattern of lines A, B and C, with backtracking behavior. The outermost skip marches through the data until it finds A, followed by a pattern match for the second skip. The second skip iterates within to find B, followed by the third skip, and the third skip iterates to find C. If there is only one line A, and one B, then this is reasonably fast. But suppose there are many lines matching A and B, giving rise to a large number of combinations of skips which match A and B, and yet do not find a match for C, triggering backtracking. The nested stepping which tries the combinations of A and B can give rise to a considerable running time.

One way to deal with the problem is to unravel the nesting with the help of blocks. For example:

  @(block)
  @  (skip)
  A
  @(end)
  @(block)
  @  (skip)
  B
  @(end)
  @(skip)
  C

Now the scope of each skip is just the remainder of the block in which it occurs. The first skip finds A, and then the block ends. Control passes to the next block, and backtracking will not take place to a block which completed (unless all these blocks are enclosed in some larger construct which backtracks, causing the blocks to be re-executed).

Of course, this rewrite is not equivalent, and cannot be used for instance in backreferencing situations such as:

  @; Find three lines anywhere in the input which are identical.
  @(skip)
  @line
  @(skip)
  @line
  @(skip)
  @line

This example depends on the nested search-within-search semantics.


7.3.4 The trailer directive

The trailer directive introduces a trailing portion of a query or subquery which matches input material normally, but in the event of a successful match, does not advance the current position. This can be used, for instance, to cause @(collect) to match partially overlapping regions.

Trailer can be used in vertical context:

  @(trailer)
  directives ...

or horizontal:

  @(trailer)directives ...

A vertical trailer prevents the vertical input position from advancing as it is matched by directives, whereas a horizontal trailer prevents the horizontal position from advancing. In other words, trailer performs matching without consuming the input, providing a look-ahead mechanism.



This script collects each line which has a duplicate somewhere later in the input. Without the @(trailer) directive, this does not work properly for inputs like:

  111
  222
  111
  222


Without @(trailer), the first duplicate pair constitutes a match which spans over the 222. After that pair is found, the matching continues after the second 111.

With the @(trailer) directive in place, the collect body, on each iteration, only consumes the lines matched prior to @(trailer).


7.3.5 The freeform directive

The freeform directive provides a useful alternative to TXR's line-oriented matching discipline. The freeform directive treats all remaining input from the current input source as one big line. The query line which immediately follows freeform is applied to that line.

The syntax variations are:

  @(freeform)
  ... query line ..

  @(freeform number)
  ... query line ..

  @(freeform string)
  ... query line ..

  @(freeform number string)
  ... query line ..

where number and string denote TXR Lisp expressions which evaluate to an integer or string value, respectively.

If number and string are both present, they may be given in either order.

If the number argument is given, its value limits the range of lines which are combined together. For instance @(freeform 5) means to only consider the next five lines to be one big line. Without this argument, freeform is "bottomless". It can match the entire file, which creates the risk of allocating a large amount of memory.

If the string argument is given, it specifies a custom line terminator. The default terminator is "\n". The terminator does not have to be one character long.

Freeform does not convert the entire remainder of the input into one big line all at once, but does so in a dynamic, lazy fashion, which takes place as the data is accessed. So at any time, only some prefix of the data exists as a flat line in which newlines are replaced by the terminator string, and the remainder of the data still remains as a list of lines.

After the subquery is applied to the virtual line, the unmatched remainder of that line is broken up into multiple lines again, by looking for and removing all occurrences of the terminator string within the flattened portion.

Care must be taken if the terminator is other than the default "\n". All occurrences of the terminator string are treated as line terminators in the flattened portion of the data, so extra line breaks may be introduced. Likewise, in the yet unflattened portion, no breaking takes place, even if the text contains occurrences of the terminator string. The extent of data which is flattened, and the amount of it which remains, depends entirely on the query line underneath @(freeform).

In the following example, lines of data are flattened using $ as the line terminator.

  @(freeform "$")
  @a$@b:
  @c
  @d

applied to the data:

  1
  2:3
  4

output (-B):

  a="1"
  b="2"
  c="3"
  d="4"
The data is turned into the virtual line 1$2:3$4$. The @a$@b: subquery matches the 1$2: portion, binding a to "1", and b to "2". The remaining portion 3$4$ is then split into separate lines again according to the line terminator $:

  3
  4

Thus the remainder of the query

  @c
  @d

faces these lines, binding c to "3" and d to "4". Note that since the data does not contain dollar signs, there is no ambiguity; the meaning may be understood in terms of the entire data being flattened and split again.

In the following example, freeform is used to solve a tokenizing problem. The Unix password file has fields separated by colons. Some fields may be empty. Using freeform, we can join the password file using ":" as a terminator. By restricting freeform to one line, we can obtain each line of the password file with a terminating ":", allowing for a simple tokenization, because now the fields are colon-terminated rather than colon-separated.


  @(next "/etc/passwd")
  @(freeform 1 ":")
  @(coll)@{token /[^:]*/}:@(end)


7.3.6 The fuzz directive

The fuzz directive allows for an imperfect match spanning a set number of lines. It takes two arguments, both of which are TXR Lisp expressions that should evaluate to integers:

@(fuzz m n)

This expresses that over the next n query lines, the matching strictness is relaxed a little bit. Only m out of those n lines have to match. Afterward, the rest of the query follows normal, strict processing.

In the degenerate situation that there are fewer than n query lines following the fuzz directive, then m of them must succeed nevertheless. (If there are fewer than m, then this is impossible.)
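For instance, the following sketch (the query lines are invented for illustration) requires only two of the next three lines to match:

  @(fuzz 2 3)
  alpha: @a
  beta: @b
  gamma: @c

If, say, only the alpha and gamma lines find matches over the next three lines of input, the fuzzy region still succeeds, and strict matching resumes with whatever follows.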


7.3.7 The line and chr directives

The line and chr directives perform binding between the current input line number or character position within a line, against an expression or variable:

  @(line 42)
  @(line x)
  abc@(chr 3)def@(chr y)

The directive @(line 42) means "match the current input line number against the integer 42". If the current line is 42, then the directive matches, otherwise it fails. line is a vertical directive which doesn't consume a line of input. Thus, the following matches at the beginning of an input stream, and x ends up bound to the first line of input:

  @(line 1)
  @(line 1)
  @(line 1)
  @x

The directive @(line x) binds variable x to the current input line number, if x is an unbound variable. If x is already bound, then the value of x must match the current line number, otherwise the directive fails.

The chr directive is similar to line except that it's a horizontal directive, and matches the character position rather than the line position. Character positions are measured from zero, rather than one. chr does not consume a character. Hence the two occurrences of chr in the following example both match, and x takes the entire line of input:

  @(chr 0)@(chr 0)@x

The argument of line or chr may be a @-delimited Lisp expression. This is useful for matching computed lines or character positions:

  @(line @(+ a (* b c)))


7.3.8 The name directive

The name directive performs a binding between the name of the current data source and a variable or bind expression:

  @(name na)
  @(name "data.txt")

If na is an unbound variable, it is bound and takes on the name of the data source, such as a file name. If na is bound, then it has to match the name of the data source, otherwise the directive fails.

The directive @(name "data.txt") fails unless the current data source has that name.
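For instance, this sketch (the variable names are illustrative) captures the current source's name together with its first line:

  @(name file)
  @first

If the data is coming from data.txt, then file is bound to "data.txt", while first takes the first line of input.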


7.3.9 The data directive

The data directive performs a binding between the unmatched data at the current position, and a variable or bind expression. The unmatched data takes the form of a list of strings:

  @(data d)

The binding is performed on object equality. If d is already bound, a matching failure occurs unless d contains the current unmatched data.

Matching the current data has various uses.

For instance, two branches of pattern matching can, at some point, bind the current data into different variables. When those paths join, the variables can be bound together to create the assertion that the current data had been the same at those points:

  @(all)
  @  (skip)
  bar
  @  (data x)
  @(and)
  @  (skip)
  bar
  @  (data y)
  @(end)
  @(require (eq x y))

Here, two branches of the @(all) match some material which ends in the line bar. However, it is possible that this is a different line. The data directives are used to create an assertion that the data regions matched by the two branches are identical. That is to say, the unmatched data x captured after the first bar and the unmatched data y captured after the second bar must be the same object in order for @(require (eq x y)) to succeed, which implies that the same bar was matched in both branches of the @(all).

Another use of data is simply to gain access to the trailing remainder of the unmatched input in order to print it, or do some special processing on it.

The tprint Lisp function is useful for printing the unmatched data as newline-terminated lines:

  @(data remainder)
  @(do (tprint remainder))


7.3.10 The some, all, none, maybe, cases and choose directives

These directives, called the parallel directives, combine multiple subqueries, which are applied at the same input position, rather than to consecutive input.

They come in vertical (line mode) and horizontal (character mode) flavors.

In horizontal mode, the current position is understood to be a character position in the line being processed. The clauses advance this character position by moving it to the right. In vertical mode, the current position is understood to be a line of text within the stream. A clause advances the position by some whole number of lines.

The syntax of these parallel directives follows this example:

  @(some)
  subquery1
  @(and)
  subquery2
  @(and)
  subquery3
  @(end)

And in horizontal mode:

  @(some)subquery1@(and)subquery2@(and)subquery3@(end)

Long horizontal lines can be broken up with line continuations, allowing the above example to be written like this, which is considered a single logical line:

  @(some)subquery1@\
  @(and)subquery2@\
  @(and)subquery3@\
  @(end)
The @(some), @(all), @(none), @(maybe), @(cases) or @(choose) must be followed by at least one subquery clause, and be terminated by @(end). If there are two or more subqueries, these additional clauses are indicated by @(and) or @(or), which are interchangeable. The separator and terminator directives also must appear as the only element in a query line.

The choose directive requires keyword arguments. See below.

The syntax supports arbitrary nesting. For example:

  QUERY:            SYNTAX TREE:

  @(all)            all -+
  @  (skip)              +- skip -+
  @  (some)              |        +- some -+
  it                     |        |        +- TEXT
  @  (and)               |        |        +- and
  @    (none)            |        |        +- none -+
  was                    |        |        |        +- TEXT
  @    (end)             |        |        |        +- end
  @  (end)               |        |        +- end
  a dark                 |        +- TEXT
  @(end)                 +- end

Nesting can be indicated using whitespace between @ and the directive expression. Thus, the above is an @(all) query containing a @(skip) clause which applies to a @(some) that is followed by the text line "a dark". The @(some) clause combines the text line "it", and a @(none) clause which contains just one clause consisting of the line "was".

The semantics of the parallel directives is:

@(all)
Each of the clauses is matched at the current position. If any of the clauses fails to match, the directive fails (and thus does not produce any variable bindings). Clauses following the failed clause are not evaluated. Bindings extracted by a successful clause are visible to the clauses which follow, and if the directive succeeds, all of the combined bindings emerge.

@(some [ :resolve (var ...) ])
Each of the clauses is matched at the current position. If any of the clauses succeed, the directive succeeds, retaining the bindings accumulated by the successfully matching clauses. Evaluation does not stop on the first successful clause. Bindings extracted by a successful clause are visible to the clauses which follow.

The :resolve parameter is for situations when the @(some) directive has multiple clauses that need to bind some common variables to different values: for instance, output parameters in functions. Resolve takes a list of variable name symbols as an argument. This is called the resolve set. If the clauses of @(some) bind variables in the resolve set, those bindings are not visible to later clauses. However, those bindings do emerge out of the @(some) directive as a whole. This creates a conflict: what if two or more clauses introduce different bindings for a variable in the resolve set? This is why it is called the resolve set: conflicts for variables in the resolve set are automatically resolved in favor of later directives.


  @(some :resolve (x))
  @  (bind a "a")
  @  (bind x "x1")
  @(and)
  @  (bind b "b")
  @  (bind x "x2")
  @(end)
Here, the two clauses both introduce a binding for x. Without the :resolve parameter, this would mean that the second clause fails, because x comes in with the value "x1", which does not bind with "x2". But because x is placed into the resolve set, the second clause does not see the "x1" binding. Both clauses establish their bindings independently, creating a conflict over x. The conflict is resolved in favor of the second clause, and so the bindings which emerge from the directive are:

  a="a"
  b="b"
  x="x2"
@(none)
Each of the clauses is matched at the current position. The directive succeeds only if all of the clauses fail. If any clause succeeds, the directive fails, and subsequent clauses are not evaluated. Thus, this directive never produces variable bindings, only matching success or failure.

@(maybe)
Each of the clauses is matched at the current position. The directive always succeeds, even if all of the clauses fail. Whatever bindings are found in any of the clauses are retained. Bindings extracted by any successful clause are visible to the clauses which follow.

@(cases)
The clauses are matched, in order, at the current position. If any clause matches, the matching stops and the bindings collected from that clause are retained. Any remaining clauses are not processed. If no clause matches, the directive fails, and produces no bindings.

@(choose [ :longest var | :shortest var ])
Each of the clauses is matched at the current position in order. In this construct, bindings established by an earlier clause are not visible to later clauses. Although any or all of the clauses can potentially match, the clause which succeeds is the one which maximizes or minimizes the length of the text bound to the specified variable. The other clauses have no effect.
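For instance, in the following sketch (the patterns and data are invented), both clauses can match a line such as a:b=ccc, but :longest x selects the clause which maximizes the text bound to x:

  @(choose :longest x)
  @x:@y
  @(or)
  @x=@y
  @(end)

The first clause binds x to "a" (the text before the first colon); the second binds x to "a:b" (the text before the equal sign), so the second clause's bindings are the ones retained.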

For all of the parallel directives other than @(none) and @(choose), the query advances the input position by the greatest number of lines that match in any of the successfully matching subclauses that are evaluated. The @(none) directive does not advance the input position.

For instance if there are two subclauses, and one of them matches three lines, but the other one matches five lines, then the overall clause is considered to have made a five line match at its position. If more directives follow, they begin matching five lines down from that position.


7.3.11 The require directive

The syntax of @(require) is:

  @(require lisp-expression)
The require directive evaluates a TXR Lisp expression. (See TXR LISP far below.) If the expression yields a true value, then it succeeds, and matching continues with the directives which follow. Otherwise the directive fails.

In the context of the require directive, the expression should not be introduced by the @ symbol; it is expected to be a Lisp expression.


  @; require that 4 is greater than 3
  @; This succeeds; therefore, @a is processed
  @(require (> (+ 2 2) 3))
  @a


7.3.12 The if directive

The if directive allows for conditional selection of pattern-matching clauses, based on the Boolean results of Lisp expressions.

The syntax of the if directive can be exemplified as follows:

  @(if lisp-expr)
  .
  .
  .
  @(elif lisp-expr)
  .
  .
  .
  @(else)
  .
  .
  .
  @(end)
The @(elif) and @(else) clauses are all optional. If @(else) is present, it must be last, before @(end), after any @(elif) clauses. Any of the clauses may be empty.


  @(if (> (length str) 42))
  foo: @a @b
  @(else)
  @c
  @(end)

In this example, if the length of the variable str is greater than 42, then matching continues with "foo: @a @b"; otherwise it proceeds with "@c".

More precisely, the if directive works as follows. The Lisp expressions are evaluated in order, starting with the if expression, then the elif expressions if any are present. If any Lisp expression yields a true result (any value other than nil) then evaluation of Lisp expressions stops. The corresponding clause of that Lisp expression is selected and pattern matching continues with that clause. The result of that clause (its success or failure, and any newly bound variables) is then taken as the result of the if directive. If none of the Lisp expressions yield true, and an else clause is present, then that clause is processed and its result determines the result of the if directive. If none of the Lisp expressions yield true, and there is no else clause, then the if directive is deemed to have trivially succeeded, allowing matching to continue with whatever directive follows it.


7.3.13 The gather directive

Sometimes text is structured as items that can appear in an arbitrary order. When multiple matches need to be extracted, there is a combinatorial explosion of possible orders, making it impractical to write pattern matches for all the possible orders.

The gather directive is for these situations. It specifies multiple clauses which all have to match somewhere in the data, but in any order.

For further convenience, the lines of the first clause of the gather directive are implicitly treated as separate clauses.

The syntax follows this pattern:

  @(gather)
  one-line-query1
  one-line-query2
  one-line-query3
  @(and)
  multi-line
  query1
  @(and)
  multi-line
  query2
  @(end)
Of course the multi-line clauses are optional. The gather directive takes keyword parameters, see below.


7.3.14 The until / last clause in gather

Similarly to collect, gather has an optional until/last clause:

  @(gather)
  clauses
  @(until)
  clauses
  @(end)
How gather works is that the text is searched for matches for the single line and multi-line queries. The clauses are applied in the order in which they appear. Whenever one of the clauses matches, any bindings it produces are retained and it is removed from further consideration. Multiple clauses can match at the same text position. The position advances by the longest match from among the clauses which matched. If no clauses match, the position advances by one line. The search stops when all clauses are eliminated, and then the cumulative bindings are produced. If the data runs out, but unmatched clauses remain, the directive fails.

Example: extract several environment variables, which do not appear in a particular order:

  @(next :env)
  @(gather)
  USER=@USER
  HOME=@HOME
  SHELL=@SHELL
  @(end)
If the until or last clause is present and a match occurs, then the matches from the other clauses are discarded and the gather terminates. The difference between until and last is that any bindings established in last are retained, and the input position is advanced past the matched material. The until/last clause has visibility to bindings established in the previous clauses in that same iteration, even though those bindings end up thrown away.

For consistency, the :mandatory keyword is supported in the until/last clause of gather. The semantics of using :mandatory in this situation is tricky. In particular, if it is in effect, and the gather terminates successfully by collecting all required matches, it will trigger a failure. On the other hand, if the until or last clause activates before all required matches are gathered, a failure also occurs, whether or not the clause is :mandatory.

Meaningful use of :mandatory requires that the gather be open-ended; it must allow some (or all) variables not to be required. The presence of the option means that for the gather to succeed, all required variables must be gathered first, but then termination must be achieved via the until/last clause before all gather clauses are satisfied.


7.3.15 Keyword parameters in gather

The gather directive accepts the keyword parameter :vars. The argument to :vars is a list of required and optional variables. A required variable is specified as a symbol. An optional variable is specified as a two element list which pairs a symbol with a Lisp expression. That Lisp expression is evaluated and specifies the default value for the variable.


  @(gather :vars (a b c (d "foo")))

Here, a, b and c are required variables, and d is optional, with the default value given by the Lisp expression "foo".

The presence of :vars changes the behavior in three ways.

Firstly, even if all the clauses in the gather match successfully and are eliminated, the directive will fail if the required variables do not have bindings. It doesn't matter whether the bindings are existing, or whether they are established by the gather.

Secondly, if some of the clauses of the gather did not match, but all of the required variables have bindings, then the directive succeeds. Without the presence of :vars, it would fail in this situation.

Thirdly, if gather succeeds (all required variables have bindings), then all of the optional variables which do not have bindings are given bindings to their default values.

The expressions which give the default values are evaluated whenever the gather directive is evaluated, whether or not their values are used.


7.3.16 The collect directive

The syntax of the collect directive is:

  @(collect)
  ... lines of subquery
  @(end)

or with an until or last clause:

  @(collect)
  ... lines of subquery: main clause
  @(until)
  ... lines of subquery: until clause
  @(end)

  @(collect)
  ... lines of subquery: main clause
  @(last)
  ... lines of subquery: last clause
  @(end)

The repeat symbol may be specified instead of collect, which changes the meaning; see below:

  @(repeat)
  ... lines of subquery
  @(end)

The subquery is matched repeatedly, starting at the current line. If it fails to match, it is tried starting at the subsequent line. If it matches successfully, it is tried at the line following the entire extent of matched data, if there is one. Thus, the collected regions do not overlap. (Overlapping behavior can be obtained: see the @(trailer) directive).

Unless certain keywords are specified, or unless the collection is explicitly failed with @(fail), it always succeeds, even if it collects nothing, and even if the until/last clause never finds a match.

If no until/last clause is specified, and the collect is not limited using parameters, the collection is unbounded: it consumes the entire data file. If any query material follows such a collect clause, it will fail if it tries to match anything in the current file; but of course, it is possible to continue matching in another file by means of @(next).


7.3.17 The until / last clause in collect

If an until/last clause is specified, the collection stops when that clause matches at the current position.

If an until clause terminates collect, no bindings are collected at that position, even if the main clause matches at that position also. Moreover, the position is not advanced. The remainder of the query begins matching at that position.

If a last clause terminates collect, the behavior is different. Any bindings captured by the main clause are thrown away, just like with the until clause. However, the bindings in the last clause itself survive, and the position is advanced to skip over that material.



Example:

  @(collect)
  @a
  @(until)
  42
  @b
  @(end)
  @c

Data:

  1
  2
  3
  42
  5
  6

Result:

  a[0]="1"
  a[1]="2"
  a[2]="3"
  c="42"

The line 42 is not collected, even though it matches @a. Furthermore, the @(until) does not advance the position, so variable c takes 42.

If the @(until) is changed to @(last), the output will be different:

  a[0]="1"
  a[1]="2"
  a[2]="3"
  b="5"
  c="6"

The 42 is not collected into the a list, just like before. But now the binding captured by @b emerges. Furthermore, the position advances, so variable c now takes 6.

Variables bound within the main clause of a collect are treated specially. The multiple matches for each variable are collected into lists, which then appear as array variables in the final output.
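For instance (a sketch; the data lines are invented for illustration):

  @(collect)
  @a:@b:@c
  @(end)

applied to the data

  John:Doe:101
  Mary:Jane:202
  Bob:Coder:313

yields list bindings which -B reports in array notation:

  a[0]="John"
  a[1]="Mary"
  a[2]="Bob"
  b[0]="Doe"
  b[1]="Jane"
  b[2]="Coder"
  c[0]="101"
  c[1]="202"
  c[2]="313"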



The query matches the data in three places, so each variable becomes a list of three elements, reported as an array.

Variables with list bindings may be referenced in a query. They denote a multiple match. The -D command line option can establish a one-dimensional list binding.

The clauses of collect may be nested. Variable matches collated into lists in an inner collect, are again collated into nested lists in the outer collect. Thus an unbound variable wrapped in N nestings of @(collect) will be an N-dimensional list. A one dimensional list is a list of strings; a two dimensional list is a list of lists of strings, etc.

It is important to note how the variables bound within the main clause of a collect behave: within the collect, the variables which are subject to collection appear as normal one-value bindings. The collation into lists happens outside of the collect. So for instance in the query:

  @(collect)
  @x=@x
  @(end)

the left @x establishes a binding for some material preceding an equal sign. The right @x refers to that binding. The value of @x is different in each iteration, and these values are collected. What finally comes out of the collect clause is a single variable called x which holds a list containing each value that was ever instantiated under that name within the collect clause.

Also note that the until clause has visibility over the bindings established in the main clause. This is true even in the terminating case when the until clause matches, and the bindings of the main clause are discarded.


7.3.18 Keyword parameters in collect

By default, collect searches the rest of the input indefinitely, or until the until/last clause matches. It skips arbitrary amounts of nonmatching material before the first match, and between matches.

Within the @(collect) syntax, it is possible to specify keyword parameters for additional control of the behavior. A keyword parameter consists of a keyword symbol followed by an argument, enclosed within the @(collect) syntax. The following are the supported keywords.

:maxgap n
The :maxgap keyword takes a numeric argument n, which is a Lisp expression. It causes the collect to terminate if it fails to find a match after skipping n lines from the starting position, or more than n lines since any successful match. For example,

  @(collect :maxgap 5)

specifies that the gap between the current position and the first match for the body of the collect, or between consecutive matches can be no longer than five lines. A :maxgap value of 0 means that the collected regions must be adjacent and must match right from the starting position. For instance:

  @(collect :maxgap 0)
  M @a

means: from here, collect consecutive lines of the form "M ...". This will not search for the first such line, nor will it skip lines which do not match this form.

:mingap n
The :mingap keyword complements :maxgap, though not exactly. Its argument n, a Lisp expression, specifies a minimum number of lines which must separate consecutive matches. However, it has no effect on the distance from the starting position to the first match.
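For instance, this sketch requires at least one line to be skipped between consecutive collected lines:

  @(collect :mingap 1)
  @a
  @(end)

Two adjacent matching lines will not both be collected; the first match, however, may still occur right at the starting position.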

:gap n
The :gap keyword effectively specifies :mingap and :maxgap at the same time, and can only be used if these other two are not used. Thus:

  @(collect :gap 1)

means: collect every other line starting with the current line.

:times n
This shorthand means the same thing as if :mintimes n :maxtimes n were specified. This means that exactly n matches must occur. If fewer occur, then the collect fails. Collect stops once it achieves n matches.
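For instance, this sketch collects exactly three lines:

  @(collect :times 3)
  @a
  @(end)

If the remaining input yields fewer than three matches, the collect fails; once the third match is found, scanning stops there.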

:mintimes n
The argument n of the :mintimes keyword is a Lisp expression which specifies that at least n matches must occur, or else the collect fails.

:maxtimes n
The Lisp argument expression n of the :maxtimes keyword specifies that at most n matches are collected.

:lines n
The argument n of the :lines keyword parameter is a Lisp expression which specifies the upper bound on how many lines should be scanned by collect, measuring from the starting position. The extent of the collect body is not counted. Example:

  @(collect :lines 2)
  foo: @a
  bar: @b
  baz: @c

The above collect will look for a match only twice: at the current position, and one line down.

:vars ({variable | (variable default-value)}*)
The :vars keyword specifies a restriction on what variables will emanate from the collect. Its argument is a list of variable names. An empty list may be specified using empty parentheses or, equivalently, the symbol nil. The default-value element of the syntax is a Lisp expression. The behavior of the :vars keyword is specified in the following section, "Specifying variables in collect".

:lists (variable*)
The :lists keyword indicates a list of variables. After the collect terminates, each variable in the list which does not have a binding is bound to the empty list symbol nil. Unlike :vars the :lists mechanism doesn't assert that only the listed variables may emanate from the collect. It also doesn't assert that each iteration of the collect must bind each of those variables.

:counter {variable | (variable starting-value)}
The :counter keyword's argument is a variable name symbol, or a compound expression consisting of a variable name symbol and the TXR Lisp expression starting-value. If this keyword argument is specified, then a binding for variable is established prior to each repetition of the collect body, to an integer value representing the repetition count. By default, repetition counts begin at zero. If starting-value is specified, it must evaluate to a number. This number is then added to each repetition count, and variable takes on the resulting displaced value.

If there is an existing binding for variable prior to the processing of the collect, then the variable is shadowed.

The binding is collected in the same way as other bindings that are established in the collect body.

The repetition count only increments after a successful match.

The variable is visible to the collect's until/last clause. If that clause is being processed after a successful match of the body, then variable holds an integer value. If the body fails to match, then the until/last clause sees a binding for variable with a value of nil.
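For instance, in this sketch each repetition binds i to the current repetition count alongside the collected line:

  @(collect :counter i)
  @a
  @(end)

On the first successful match i is 0, on the second 1, and so on; like a, the successive values of i are collated into a list when the collect terminates.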


7.3.19 Specifying variables in collect

Normally, any variable for which a new binding occurs in a collect block is collected. A collect clause may be "sloppy": it can neglect to collect some variables on some iterations, or bind some variables which are intended to behave like local temporaries, but end up collated into lists. Another issue is that the collect clause might not match anything at all, and then none of the variables are bound.

The :vars keyword allows the query writer to add discipline to the collect body.

The argument to :vars is a list of variable specs. A variable spec is either a symbol, or a (symbol default-value) pair, where default-value is a Lisp expression whose value specifies a default value for the variable.

When a :vars list is specified, it means that only the given variables can emerge from the successful collect. Any newly introduced bindings for other variables do not propagate.

Furthermore, for any variable which is not specified with a default value, the collect body, whenever it matches successfully, must bind that variable. If it neglects to bind the variable, an exception of type query-error is thrown. (If a collect body matches successfully, but produces no new bindings, then this error is suppressed.)

For any variable which does have a default value, if the collect body neglects to bind that variable, the behavior is as if collect did bind that variable to that default value.

The default values are expressions, and so can be quasiliterals.

Lastly, in the event that collect does not match anything, the variables specified in :vars (whether or not they have a default value) are all bound to empty lists. (These bindings are established after the processing of the until/last clause, if present.)


  @(collect :vars (a b (c "foo")))
  @a @c

Here, if the body "@a @c" matches, an error will be thrown because one of the mandatory variables is b, and the body neglects to produce a binding for b.


  @(collect :vars (a (c "foo")))
  @a @b

Here, if "@a @b" matches, only a will be collected, but not b, because b is not in the variable list. Furthermore, because there is no binding for c in the body, a binding is created with the value "foo", exactly as if c matched such a piece of text.

In the following example, the assumption is that THIS NEVER MATCHES is not found anywhere in the input, but the line THIS DOES MATCH is found and has a successor which is bound to a. Because the body did not match, the :vars variables a and b should be bound to empty lists. But a is bound by the last clause to some text, so this takes precedence. Only b is bound to an empty list.

  @(collect :vars (a b))
  THIS NEVER MATCHES
  @(last)
  THIS DOES MATCH
  @a
  @(end)

The following means: do not allow any variables to propagate out of any iteration of the collect and therefore collect nothing:

  @(collect :vars nil)

Instead of writing @(collect :vars nil), it is possible to write @(repeat). @(repeat) takes all collect keywords, except for :vars. There is a @(repeat) directive used in @(output) clauses; that is a different directive.
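For instance, these two sketches behave identically, scanning for matching lines but propagating no bindings:

  @(collect :vars nil)
  @a
  @(end)

  @(repeat)
  @a
  @(end)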


7.3.20 Mandatory until and last

The until/last clause supports the option keyword :mandatory, exemplified by the following:

  @(last :mandatory)

This means that the collect must be terminated by a match for the until/last clause, or else by an explicit @(accept).

Specifically, the collect cannot terminate due to simply running out of data, or exceeding a limit on the number of matches that may be collected. In those situations, if an until or last clause is present with :mandatory, the collect is deemed to have failed.


7.3.21 The coll directive

The coll directive is the horizontal version of collect. Whereas collect works with multi-line clauses on line-oriented material, coll works within a single line. With coll, it is possible to recognize repeating regularities within a line and collect lists.

Regular-expression based Positive Match variables work well with coll.

Example: collect a comma-separated list, terminated by a space.

 @(coll)@{A /[^, ]+/}@(until) @(end)@B
 foo,bar,xyzzy blorch

Here, the variable A is bound to tokens which match the regular expression /[^, ]+/: a non-empty sequence of characters other than commas or spaces.

Like collect, coll searches for matches. If no match occurs at the current character position, it tries at the next character position. Whenever a match occurs, it continues at the character position which follows the last character of the match, if such a position exists.

If not bounded by an until clause, it will exhaust the entire line. If the until clause matches, then the collection stops at that position, and any bindings from that iteration are discarded. Like collect, coll also supports an until/last clause, which propagates variable bindings and advances the position. The :mandatory keyword is supported.

coll clauses nest, and variables bound within a coll are available to clauses within the rest of the coll clause, including the until/last clause, and appear as single values. The final list aggregation is only visible after the coll clause.

The behavior of coll leads to difficulties when a delimited variable is used to match material which is delimiter-separated rather than terminated. For instance, entries in a comma-separated file usually do not appear as "a,b,c," but rather as "a,b,c".

So for instance, the following result is not satisfactory:

 @(coll)@a @(end)
 1 2 3 4 5

The 5 is missing because it isn't followed by a space, which the text-delimited variable match "@a " looks for. After matching "4 ", coll continues to look for matches, and doesn't find any. It is tempting to try to fix it like this:

 @(coll)@a@/ ?/@(end)
 1 2 3 4 5

The problem now is that the regular expression / ?/ (match either a space or nothing) matches at any position. So when it is used as a variable delimiter, it matches at the current position, which binds the empty string to the variable, the extent of the match being zero. In this situation, the coll directive proceeds character by character. The solution is to use positive matching: specify the regular expression which matches the item, rather than trying to match whatever follows. The coll directive will recognize all items which match the regular expression:

 @(coll)@{a /[^ ]+/}@(end)
 1 2 3 4 5

The until clause can specify a pattern which, when recognized, terminates the collection. So for instance, suppose that the list of items may or may not be terminated by a semicolon. We must exclude the semicolon from being a valid character inside an item, and add an until clause which recognizes a semicolon:

 @(coll)@{a /[^ ;]+/}@(until);@(end);
 1 2 3 4 5;

Whether followed by the semicolon or not, the items are collected properly.

Note that the @(end) is followed by a semicolon. That's because when the @(until) clause matches, the matching material is not consumed.

This repetition can, of course, be avoided by using @(last) instead of @(until) since @(last) consumes the terminating material.
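
For instance, the preceding example can be rewritten with @(last); because @(last) consumes the semicolon, the semicolon does not have to be repeated after the @(end):

 @(coll)@{a /[^ ;]+/}@(last);@(end)
 1 2 3 4 5;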

Instead of the above regular-expression-based approach, this extraction problem can also be solved with cases:

 @(coll)@(cases)@a @(or)@a@(end)@(end)
 1 2 3 4 5


7.3.22 Keyword parameters in coll

The @(coll) directive takes most of the same parameters as @(collect). See the section Keyword parameters in collect above. So for instance @(coll :gap 0) means that the collects must be consecutive, and @(coll :maxtimes 2) means that at most two matches will be collected. The :lines keyword does not exist, but there is an analogous :chars keyword.

The @(coll) directive takes the :vars keyword.

The shorthand @(rep) may be used instead of @(coll :vars nil). @(rep) takes all keywords, except :vars.


7.3.23 The flatten directive

The flatten directive can be used to convert variables to one dimensional lists. Variables which have a scalar value are converted to lists containing that value. Variables which are multidimensional lists are flattened to one-dimensional lists.

Example (without @(flatten)):


Example (with @(flatten)):

 @(flatten a b)

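As an illustration of these rules, suppose (hypothetically) that a is bound to the string "x" and b to the nested list (("1") ("2" "3")). Then after

 @(flatten a b)

a is bound to the one-element list ("x"), and b to the flattened list ("1" "2" "3").
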

7.3.24 The merge directive

The syntax of merge follows the pattern:

@(merge destination [sources ...])

destination is a variable, which receives a new binding. sources are bind expressions.

The merge directive provides a way of combining two or more variables or expressions in a somewhat complicated but very useful way. A new binding is created for the destination variable, which holds the result of the operation.

This directive is useful for combining the results from collects at different levels of nesting into a single nested list such that parallel elements are at equal depth.

The merge directive performs its special function if invoked with at least three arguments: a destination and two sources.

The one-argument case @(merge x) binds a new variable x and initializes it with the empty list and is thus equivalent to @(bind x). Likewise, the two-argument case @(merge x y) is equivalent to @(bind x y), establishing a binding for x which is initialized with the value of y.

To understand what merge does when two sources are given, as in @(merge C A B), we first have to define a property called depth. The depth of an atom such as a string is defined as 1. The depth of an empty list is 0. The depth of a nonempty list is one plus the depth of its deepest element. So for instance "foo" has depth 1, ("foo") has depth 2, and ("foo" ("bar")) has depth three.

We can now define a binary (two argument) merge(A, B) function as follows. First, merge(A, B) normalizes the values A and B to produce a pair of values which have equal depth, as defined above. If either value is an atom, it is first converted to a one-element list containing that atom. After this step, both values are lists; and the only way an argument has depth zero is if it is an empty list. Next, if either value has a smaller depth than the other, it is wrapped in a list as many times as needed to give it equal depth. For instance if A is (a) and B is (((("b" "c") ("d" "e")))) then A is converted to (((("a")))). Finally, the list values are appended together to produce the merged result. In the case of the preceding two example values, the result is: (((("a"))) ((("b" "c") ("d" "e")))). The result is stored into the newly bound destination variable C.

If more than two source arguments are given, these are merged by a left-associative reduction, which is to say that a three argument merge(X, Y, Z) is defined as merge(merge(X, Y), Z). The leftmost two values are merged, and then this result is merged with the third value, and so on.
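
To make the normalization concrete: suppose (hypothetically) that A is bound to ("x"), which has depth 2, and B to (("y") ("z")), which has depth 3. Then

  @(merge C A B)

wraps A once to give (("x")), equalizing the depths at 3, and then appends the two lists, binding C to (("x") ("y") ("z")).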


7.3.25 The cat directive

The cat directive converts a list variable into a single piece of text. The syntax is:

@(cat var [sep])

The sep argument is a Lisp expression whose value specifies a separating piece of text. If it is omitted, then a single space is used as the separator.


 @(coll)@{a /[^ ]+/}@(end)
 @(cat a ":")
 1 2 3 4 5

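If sep is omitted, a single space separates the items. For instance, with the same assumed input, the following leaves a holding the single piece of text "1 2 3 4 5":

 @(coll)@{a /[^ ]+/}@(end)
 @(cat a)
 1 2 3 4 5
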

7.3.26 The bind directive

The syntax of the bind directive is:

@(bind pattern bind-expression {keyword value}*)

The bind directive is a kind of pattern match, which matches one or more variables given in pattern against a value produced by the bind-expression on the right.

Variable names occurring in the pattern expression may refer to bound variables, or may be unbound.

All variable references occurring in bind-expression must have values.

Binding occurs as follows. The tree structure of pattern and the value of bind-expression are considered to be parallel structures.

Any variables in pattern which are unbound receive a new binding, which is initialized with the structurally corresponding piece of the object produced by bind-expression.

Any variables in pattern which are already bound must match the corresponding part of the value of bind-expression, or else the bind directive fails. Variables which are already bound are not altered, retaining their current values, even if the matching is inexact.

The simplest bind is of one variable against itself, for instance bind A against A:

  @(bind A A)

This will throw an exception if A is not bound. If A is bound, it succeeds, since A matches itself.

The next simplest bind binds one variable to another:

  @(bind A B)

Here, if A is unbound, it takes on the same value as B. If A is bound, it has to match B, or the bind fails. Matching means that either

A and B are the same text;
A is text, B is a list, and A occurs within B;
vice versa: B is text, A is a list, and B occurs within A; or
A and B are lists and are either identical, or one is found as substructure within the other.

The right hand side does not have to be a variable. It may be some other object, like a string, quasiliteral, regexp, or list of strings, et cetera. For instance

  @(bind A "ab\tc")

will bind the string "ab\tc" to the variable A if A is unbound. If A is bound, this will fail unless A already contains an identical string. However, the right hand side of a bind cannot be an unbound variable, nor a complex expression that contains unbound variables.

The left hand side of bind can be a nested list pattern containing variables. The last item of a list at any nesting level can be preceded by a . (dot), which means that the variable matches the rest of the list from that position.

Example 1:

Suppose that the list A contains ("how" "now" "brown" "cow"). Then the directive @(bind (H N . C) A), assuming that H, N and C are unbound variables, will bind H to "how", N to "now", and C to the remainder of the list ("brown" "cow").

Example: suppose that the list A is nested to two dimensions and contains (("how" "now") ("brown" "cow")). Then @(bind ((H N) (B C)) A) binds H to "how", N to "now", B to "brown" and C to "cow".

The dot notation may be used at any nesting level. It must be followed by an item. The forms (.) and (X .) are invalid, but (. X) is valid and equivalent to X.

The number of items in a left pattern match must match the number of items in the corresponding right side object. So the pattern () only matches an empty list. The notations () and nil mean exactly the same thing.
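
A minimal illustration of this rule:

  @(bind nil ())

This succeeds, since both sides denote the empty list, whereas @(bind () ("a")) fails, because the empty pattern cannot match a one-item list.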

The symbols nil, t and keyword symbols may be used on either side. They represent themselves. For example @(bind :foo :bar) fails, but @(bind :foo :foo) succeeds since the two sides denote the same keyword symbol object.

Example 2:

In this example, suppose A contains "foo" and B contains "bar". Then @(bind (X (Y Z)) (A (B "hey"))) binds X to "foo", Y to "bar" and Z to "hey". This is because the bind-expression produces the object ("foo" ("bar" "hey")) which is then structurally matched against the pattern (X (Y Z)), and the variables receive the corresponding pieces.


7.3.27 Keywords in the bind directive

The bind directive accepts these keywords:

The argument to :lfilt is a filter specification. When the left side pattern contains a binding which is therefore matched against its counterpart from the right side expression, the left side is filtered through the filter specified by :lfilt for the purposes of the comparison. For example:

  @(bind "a" "A" :lfilt :upcase)

produces a match, since the left side is the same as the right after filtering through the :upcase filter.

The argument to :rfilt is a filter specification. The specified filter is applied to the right-hand-side material prior to matching it against the left side. The filter is not applied if the left side is a variable with no binding. It is only applied to determine a match. Binding takes place using the unmodified right-hand-side object.

For example, the following produces a match:

  @(bind "A" "a" :rfilt :upcase)

The :filter keyword is a shorthand for specifying both filters with the same value. For instance :filter :upcase is equivalent to :lfilt :upcase :rfilt :upcase.

For a description of filters, see Output Filtering below.

Of course, compound filters like (:fromhtml :upcase) are supported with all these keywords. The filters apply across arbitrary patterns and nested data.


  @(bind (a b c) ("A" "B" "C"))
  @(bind (a b c) (("z" "a") "b" "c") :rfilt :upcase)

Here, the first bind establishes the values for a, b and c, and the second bind succeeds, because the value of a matches the second element of the list ("z" "a") if it is upcased, and likewise b matches "b" and c matches "c" if these are upcased.


7.3.28 Lisp forms in the bind directive

TXR Lisp forms, introduced by @, may be used in the bind-expression argument of bind, or as the entire form. This is consistent with the rules for bind expressions.

TXR Lisp forms can be used in the pattern expression also.


  @(bind a @(+ 2 2))
  @(bind @(+ 2 2) @(* 2 2))

Here, a is bound to the integer 4. The second bind then succeeds because the forms (+ 2 2) and (* 2 2) produce equal values.


7.3.29 The set directive

The set directive syntactically resembles bind, but is not a pattern match. It overwrites the previous values of variables with new values from the right hand side. Each variable that is assigned must have an existing binding: set will not induce binding.

Examples follow.

Store the value of A back into A, an operation with no effect:

  @(set A A)

Exchange the values of A and B:

  @(set (A B) (B A))

Store a string into A:

  @(set A "text")

Store a list into A:

  @(set A ("line1" "line2"))

Destructuring assignment. A ends up with "A", B ends up with ("B1" "B2"), and C ends up with ("C1" "C2").

  @(bind D ("A" ("B1" "B2") "C1" "C2"))
  @(bind (A B C) (() () ()))
  @(set (A B . C) D)

Note that set does not support a TXR Lisp expression on the left side, so the following are invalid syntax:

  @(set @(+ 1 1) @(* 2 2))
  @(set @b @(list "a"))

The second one is erroneous even though there is a variable on the left. Because it is preceded by the @ escape, it is a Lisp variable, and not a pattern variable.


7.3.30 The rebind directive

The rebind directive resembles set, but it is not an assignment. It combines the semantics of local, bind and set. The expression on the right-hand side is evaluated in the current environment. Then the variables in the pattern on the left are introduced as new bindings, whose values come from the corresponding pieces of the right-hand-side value.

rebind makes it easy to create temporary bindings based on existing bindings.

  @(define pattern-function (arg))
  @;; inside a pattern function:
  @(rebind recursion-level @(+ recursion-level 1))
  @;; ...

When the function terminates, the previous value of recursion-level is restored. The effect is like the following, but much easier to write and faster to execute:

  @(define pattern-function (arg))
  @;; inside a pattern function:
  @(local temp)
  @(set temp recursion-level)
  @(local recursion-level)
  @(set recursion-level @(+ temp 1))
  @;; ...


7.3.31 The forget directive

The forget directive has two spellings: @(forget) and @(local).

The arguments are one or more symbols, for example:

  @(forget a)
  @(forget a b c)

This can equivalently be written:

  @(local a)
  @(local a b c)

Directives which follow the forget or local directive no longer see any bindings for the symbols mentioned in that directive, and can establish new bindings.

It is not an error if the bindings do not exist.

It is strongly recommended to use the @(local) spelling in functions, because the forgetting action simulates local variables: for the given symbols, the machine forgets any earlier variables from outside of the function, and consequently, any new bindings for those variables belong to the function. (Furthermore, functions suppress the propagation of variables that are not in their parameter list, so these locals will be automatically forgotten when the function terminates.)
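
The forgetting action can be sketched as follows (the bindings are hypothetical):

  @(bind a "outer")
  @(local a)
  @(bind a "inner")

Without the @(local a) line, the second @(bind) would fail, because a would still be bound to "outer", which does not match "inner". After @(local a), the name a has no binding, so a fresh binding to "inner" is established.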


7.3.32 The do directive

The syntax of @(do) is:


The do directive evaluates zero or more TXR Lisp expressions. (See TXR LISP far below.) The value of the expression is ignored, and matching continues with the directives which follow the do directive, if any.

In the context of the do directive, the expression should not be introduced by the @ symbol; it is expected to be a Lisp expression.


  @; match text into variables a and b, then insert into hash table h
  @(bind h (hash))
  @(do (set [h a] b))


7.3.33 The mdo directive

The syntax of @(mdo) is:


Like the do directive, mdo (macro-time do) evaluates zero or more TXR Lisp expressions. Unlike do, mdo performs this evaluation immediately upon being parsed. Then it disappears from the syntax.

The effect of @(mdo e0 e1 e2 ...) is exactly like @(do (macro-time e0 e1 e2 ...)) except that do doesn't disappear from the syntax.

Another difference is that do can be used as a horizontal or vertical directive, whereas mdo is only vertical.


7.3.34 The in-package directive

The in-package directive shares the same syntax and semantics as the TXR Lisp macro of the same name:


The in-package directive is evaluated immediately upon being parsed, leaving no trace in the syntax tree of the surrounding TXR query.

It causes the *package* special variable to take on the package denoted by name.

The directive requires that name be either a string or a symbol; an error exception is thrown if this isn't the case. Otherwise it searches for the package. If the package is not found, an error exception is thrown.
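
A hedged sketch (the package name mypkg, and its prior creation with the TXR Lisp make-package function, are assumptions for illustration):

  @(mdo (make-package "mypkg"))
  @(in-package mypkg)

From this point onward in the query, *package* denotes the mypkg package; if no such package existed at the point of the in-package directive, an error exception would be thrown.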


7.4 Blocks


7.4.1 Overview

Blocks are sections of a query which are either denoted by a name, or are anonymous. They may nest: blocks can occur within blocks and other constructs.

Blocks are useful for terminating parts of a pattern matching search prematurely, and escaping to a higher level. This makes blocks not only useful for simplifying the semantics of certain pattern matches, but also an optimization tool.

Judicious use of blocks and escapes can reduce or eliminate the amount of backtracking that TXR performs.


7.4.2 The block directive

The @(block name) directive introduces a named block, except when name is the symbol nil. The @(block) directive introduces an unnamed block, equivalent to @(block nil).

The @(skip) and @(collect) directives introduce implicit anonymous blocks, as do function bodies.

Blocks must be terminated by @(end) and can be vertical:

  @(block [name])
  ...
  @(end)

or horizontal:

  @(block [name])...@(end)


7.4.3 Block Scope

The names of blocks are in a distinct namespace from the variable binding space. So @(block foo) is unrelated to the variable @foo.

A block extends from the @(block ...) directive which introduces it, until the matching @(end), and may be empty. For instance:

  @(block foo)

Here, the block foo occurs in a @(some) clause, and so it extends to the @(end) which terminates the block. After that @(end), the name foo is not associated with a block (is not "in scope"). The second @(end) terminates the @(some) block.

The implicit anonymous block introduced by @(skip) has the same scope as the @(skip): it extends over all of the material which follows the skip, to the end of the containing subquery.


7.4.4 Block Nesting

Blocks may nest, and nested blocks may have the same names as blocks in which they are nested. For instance:


is a nesting of two anonymous blocks, and

  @(block foo)
  @(block foo)

is a nesting of two named blocks which happen to have the same name. When a nested block has the same name as an outer block, it creates a block scope in which the outer block is "shadowed"; that is to say, directives which refer to that block name within the nested block refer to the inner block, and not to the outer one.
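
The shadowing can be sketched like this:

  @(block foo)
  @(block foo)
  @(fail foo)
  @(end)
  @(end)

Here the @(fail foo) terminates the inner foo block, not the outer one, because within the nested block the name foo refers to the innermost enclosing block by that name.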


7.4.5 Block Semantics

A block normally does nothing. The query material in the block is evaluated normally. However, a block serves as a termination point for @(fail) and @(accept) directives which are in scope of that block and refer to it.

The precise meaning of these directives is:

@(fail name)
Immediately terminate the enclosing query block called name, as if that block failed to match anything. If more than one block by that name encloses the directive, the inner-most block is terminated. No bindings emerge from a failed block.

@(fail)
Immediately terminate the innermost enclosing anonymous block, as if that block failed to match.

The @(fail) directive has a vertical and horizontal form.

If the implicit block introduced by @(skip) is terminated in this manner, this has the effect of causing skip itself to fail. I.e. the behavior is as if skip search did not find a match for the trailing material, except that it takes place prematurely (before the end of the available data source is reached).

If the implicit block associated with a @(collect) is terminated this way, then the entire collect fails. This is a special behavior, because a collect normally does not fail, even if it matches nothing and collects nothing!

To prematurely terminate a collect by means of its anonymous block, without failing it, use @(accept).

@(accept name)
Immediately terminate the enclosing query block called name, as if that block successfully matched. If more than one block by that name encloses the directive, the inner-most block is terminated.

@(accept)
Immediately terminate the innermost enclosing anonymous block, as if that block successfully matched.

@(accept) communicates the current bindings and input position to the terminated block. These bindings and current position may be altered by special interactions between certain directives and @(accept), described in the following section. Communicating the current bindings and input position means that the block which is terminated by @(accept) exhibits the bindings which were collected just prior to the execution of that @(accept) and the input position which was in effect at that time.

@(accept) has a vertical and horizontal form. In the horizontal form, it communicates a horizontal input position. A horizontal input position thus communicated will only take effect if the block being terminated had been suspended on the same line of input.

If the implicit block introduced by @(skip) is terminated by @(accept), this has the effect of causing the skip itself to succeed, as if all of the trailing material had successfully matched.

If the implicit block associated with a @(collect) is terminated by @(accept), then the collection stops. All bindings collected in the current iteration of the collect are discarded. Bindings collected in previous iterations are retained, and collated into lists in accordance with the semantics of collect.

Example: alternative way to achieve @(until) termination:

  @  (maybe)
  @  (accept)
  @  (end)

This query will collect entire lines into a list called LINE. However, if the line --- is matched (by the embedded @(maybe)), the collection is terminated. Only the lines up to, and not including, the --- line are collected. The effect is identical to:


The difference (not relevant in these examples) is that the until clause has visibility into the bindings set up by the main clause.

However, the following example has a different meaning:

  @  (maybe)
  @  (accept)
  @  (end)

Now, lines are collected until the end of the data source, or until a line is found which is followed by a --- line. If such a line is found, the collection stops, and that line is not included in the collection! The @(accept) terminates the process of the collect body, and so the action of collecting the last @LINE binding into the list is not performed.

Example: communication of bindings and input position:

 @(block foo)
 @(accept foo)

At the point where the accept occurs, the foo block has matched the first line, bound the text "1" to the variable @first. The block is then terminated. Not only does the @first binding emerge from this terminated block, but what also emerges is that the block advanced the data past the first line to the second line. Next, the @(some) directive ends, and propagates the bindings and position. Thus the @second which follows then matches the second line and takes the text "2".

Example: abandonment of @(some) clause by @(accept):

In the following query, the foo block occurs inside a maybe clause. Inside the foo block there is a @(some) clause. Its first subclause matches variable @first and then terminates block foo. Since block foo is outside of the @(some) directive, this has the effect of terminating the @(some) clause:

 @(block foo)
 @  (some)
 @  (accept foo)
 @  (or)
 @  (end)

The second clause of the @(some) directive, namely:


is never processed. The reason is that subclauses are processed in top-to-bottom order, but the processing was aborted within the first clause by the @(accept foo). The @(some) construct never gets the opportunity to match four lines.

If the @(accept foo) line is removed from the above query, the output is different:

 @(block foo)
 @  (some)
 @#          <--  @(accept foo) removed from here!!!
 @  (or)
 @  (end)

Now, all clauses of the @(some) directive have the opportunity to match. The second clause grabs four lines, which is the longest match. And so, the next line of input available for matching is 5, which goes to the @second variable.


7.4.6 Interaction Between the trailer and accept Directives

If one of the clauses which follow a @(trailer) requests a successful termination to an outer block via @(accept), then @(trailer) intercepts the escape and adjusts the data extent to the position that it was given.



The variable line3 is bound to "1" because although @(accept) yields a data position which has advanced to the third line, this is intercepted by @(trailer) and adjusted back to the first line. Neglecting to do this adjustment would violate the semantics of trailer.


7.4.7 Interaction Between the next and accept Directives

When the clauses under a next directive are terminated by an accept, such that control passes to a block which surrounds that next, the accept is intercepted by next.

The input position being communicated by the accept is replaced with the original input position in the original stream which is in effect prior to the next directive. The accept transfer is then resumed.

In other words, accept cannot be used to "leak" the new stream out of a next scope.

However, next has no effect on the bindings being communicated.


 @(next "file-x")
 @(block b)
 @(next "file-y")
 @(accept b)

Here, the variable line matches the first line of the file "file-y", after which an accept transfer is initiated, targeting block b. This transfer communicates the line binding, as well as the position within file-y, pointing at the second line. However, the accept traverses the next directive, causing it to be abandoned. The special unwinding action within that directive detects this transfer and rewrites the input position to be the original one within the stream associated with "file-x". Note that this special handling exists in order for the behavior to be consistent with what would happen if the @(accept b) were removed, and the block b terminated normally: because the inner next is nested within that block, TXR would backtrack to the previous input position within "file-x".


7.4.8 Interaction Between Functions and the accept directive

If a pattern function is terminated due to accept, the function return mechanism intercepts the accept. The bindings being communicated by that accept are then subject to the special resolution with respect to the function parameters, exactly as if the bindings were being returned normally out of the function. The resolved bindings then replace those being communicated by the accept and the accept transfer is resumed.


 @(define fun (a))
 @  (bind a "a")
 @  (bind b "b")
 @  (accept blk)
 @(block blk)
 @(fun x)
 this line is skipped by accept

Here, the accept initiates a control transfer which communicates the a and b variable bindings which are visible in that scope. This transfer is intercepted by the function, and the treatment of the bindings follows to the same rules as a normal return (which, in the given function, would readily take place if the accept directive were removed). The b variable is suppressed, because b isn't a parameter of the function. Because a is a parameter, and the argument to that parameter is the unbound variable x, the effect is that x is bound to the value of a. When the accept transfer reaches block blk and terminates it, all that emerges is the x binding carrying "a".

If the accept invocation is removed from fun, then of course the function returns normally, producing the x binding. In that case, the line this line is skipped by accept isn't skipped since the block isn't being terminated; that line must match something.


7.4.9 Interaction Between finally and the accept directive

If the protected body of an exception-handling try directive is terminated by an accept transfer, and that try has a finally block, then there is a special interaction between the finally block and the accept transfer.

The processing of the finally block detects that it has been triggered by an accept transfer. Consequently, it retrieves the current input position and bindings from that transfer, and uses that position and those bindings for the processing of the finally clauses.

If the finally clauses succeed, then the new input position and new bindings are installed into the accept control transfer and that transfer resumes.

If the finally clauses fail, then the accept transfer is converted to a fail, with exactly the same block as its destination.
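
These rules can be sketched with a hypothetical query:

  @(block b)
  @(try)
  @first
  @(accept b)
  @(finally)
  @second
  @(end)
  @(end)

When @(accept b) unwinds through the try, the finally clause runs: @second matches at the input position communicated by the accept. If it succeeds, the possibly updated bindings and position are installed into the transfer, which then terminates block b; if it fails, the transfer becomes a @(fail) targeting the same block b.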


7.4.10 Vertical-Horizontal Mismatch Between block and accept

The block, accept and fail directives come in horizontal and vertical forms.

This creates the possibility that an accept in horizontal context targets a vertical block or vice versa, raising the question of how the input position is treated. The semantics of this is defined.

If a horizontal-context accept targets a vertical block, the current position at the target block will be the following line. That is to say, when the horizontal accept occurs, there is a current input line which may have unconsumed material past the current position. If the accept communicates its input position to a vertical context, that unconsumed material is skipped, as if it had been matched and the vertical position is advanced to the next line.

If a horizontal block catches a vertical accept, it rejects that accept's position and stays at the current backtracking position for that block. Only the bindings from the accept are retained.


7.4.11 Horizontal-Horizontal Mismatch between block and accept

It is possible for a horizontal accept to terminate in a horizontal block which is processing a different line of input (or even a different input stream). This situation is treated the same way as vertical accept terminating in a horizontal block: the position communicated by accept is ignored, and only the bindings are taken.


7.5 Functions


7.5.1 Overview

TXR functions allow a query to be structured to avoid repetition. On a theoretical note, because TXR functions support recursion, functions enable TXR to match some kinds of patterns which exhibit self-embedding, or nesting, and thus cannot be matched by a regular language.

Functions in TXR are not exactly like functions in mathematics or functional languages, and are not like procedures in imperative programming languages. They are not exactly like macros either. What it means for a TXR function to take arguments and produce a result is different from the conventional notion of a function.

A TXR function may have one or more parameters. When such a function is invoked, an argument must be specified for each parameter. However, a special behavior is at play here. Namely, some or all of the argument expressions may be unbound variables. In that case, the corresponding parameters behave like unbound variables also. Thus TXR function calls can transmit the "unbound" state from argument to parameter.

It should be mentioned that functions have access to all bindings that are visible in the caller; functions may refer to variables which are not mentioned in their parameter list.

With regard to returning, TXR functions are also unconventional. If the function fails, then the function call is considered to have failed. The function call behaves like a kind of match; if the function fails, then the call is like a failed match.

When a function call succeeds, the bindings emanating from that function are processed specially. Firstly, any bindings for variables which do not correspond to one of the function's parameters are thrown away. Functions may internally bind arbitrary variables in order to get their job done, but only those variables which are named in the function's parameter list may propagate out of the function call. Thus, a function with no parameters can only indicate matching success or failure, but cannot produce any bindings. Secondly, variables do not propagate out of the function directly, but undergo a renaming. For each parameter which went into the function as an unbound variable (because its corresponding argument was an unbound variable), if that parameter now has a value, that value is bound onto the corresponding argument.


  @(define collect-words (list))
  @(coll)@{list /[^ \t]+/}@(end)

The above function collect-words contains a query which collects words from a line (sequences of characters other than space or tab), into the list variable called list. This variable is named in the parameter list of the function, therefore, its value, if it has one, is permitted to escape from the function call.

Suppose the input data is:

  Fine summer day

and the function is called like this:

  @(collect-words wordlist)

The result (with txr -B) is:


How it works is that in the function call @(collect-words wordlist), wordlist is an unbound variable. The parameter corresponding to that unbound variable is the parameter list. Therefore, that parameter is unbound over the body of the function. The function body collects the words of "Fine summer day" into the variable list, and then yields that binding. The function call then completes by noticing that the function parameter list now has a binding, and that the corresponding argument wordlist has no binding. The binding is thus transferred to the wordlist variable. After that, the bindings produced by the function are thrown away. The only enduring effects are:

the function matched and consumed some input; and
the function succeeded; and
the wordlist variable now has a binding.

Another way to understand the parameter behavior is that function parameters behave like proxies which represent their arguments. If an argument is an established value, such as a character string or bound variable, the parameter is a proxy for that value and behaves just like that value. If an argument is an unbound variable, the function parameter acts as a proxy representing that unbound variable. The effect of binding the proxy is that the variable becomes bound, an effect which is settled when the function goes out of scope.

Within the function, both the original variable and the proxy are visible simultaneously, and are independent. What if a function binds both of them? Suppose a function with a parameter called P is called with an argument A which is an unbound variable, and then, in the function, both A and P are bound. This is permitted, and they can even be bound to different values. However, when the function terminates, the local binding of A simply disappears (because the symbol A is not among the parameters of the function). Only the value bound to P emerges, and is bound to A, which still appears unbound at that point. The P binding disappears also, and the net effect is that A is now bound. The "proxy" binding of A through the parameter P "wins" the conflict with the direct binding.
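This resolution can be illustrated with a short sketch (the function f and the variables a and p are hypothetical names):

  @(define f (p))
  @(bind a "direct")
  @(bind p "proxy")
  @(end)
  @(f a)

During the call @(f a), the argument a is unbound, so the parameter p starts out unbound. Inside f, a is bound directly to "direct", while p, the proxy for a, is bound to "proxy". When f terminates, the direct binding of a is discarded, and the value of p is bound onto a, so that a ends up bound to "proxy".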


7.5.2 Definition Syntax

Function definition syntax comes in two flavors: vertical and horizontal. Horizontal definitions actually come in two forms, the distinction between which is hardly noticeable, and the need for which is made clear below.

A function definition begins with a @(define ...) directive. For vertical functions, this is the only element in a line.

The define symbol must be followed by a symbol, which is the name of the function being defined. After the symbol, there is an optional parenthesized parameter list. If there is no such list, or if the list is specified as () or the symbol nil, then the function has no parameters. Examples of valid define syntax are:

  @(define foo)
  @(define bar ())
  @(define match (a b c))

If the define directive is followed by more material on the same line, then it defines a horizontal function:

  @(define match-x)x@(end)

If the define is the sole element in a line, then it is a vertical function, and the function definition continues below:

  @(define match-x)

The difference between the two is that a horizontal function matches characters within a line, whereas a vertical function matches lines within a stream. The former match-x matches the character x, advancing to the next character position. The latter match-x matches a line consisting of the character x, advancing to the next line.
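For instance, the horizontal form may be called from within a line (a sketch using the definitions above):

  @(define match-x)x@(end)
  a@(match-x)b

This query line matches the text axb: the literal a, then the character x via the function call, then the literal b. By contrast, the vertical form:

  @(define match-x)
  x
  @(end)
  @(match-x)

matches a line of input consisting of the single character x.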

Material between @(define) and @(end) is the function body. The define directive may be followed directly by the @(end) directive, in which case the function has an empty body.

Functions may be nested within function bodies. Such local functions have dynamic scope. They are visible in the function body in which they are defined, and in any functions invoked from that body.

The body of a function is an anonymous block. (See Blocks above).


7.5.3 Two Forms of The Horizontal Function

If a horizontal function is defined as the only element of a line, it may not be followed by additional material. The following construct is erroneous:

  @(define horiz (x))@foo:@bar@(end)lalala

This kind of definition is actually considered to be in the vertical context, and like other directives that have special effects and that do not match anything, it does not consume a line of input. If the above syntax were allowed, it would mean that the line would not only define a function but also match lalala. This, in turn, would mean that the @(define)...@(end) is actually in horizontal mode, and so it matches a span of zero characters within a line (which means that it would require a line of input to match: a surprising behavior for a non-matching directive!)

A horizontal function can be defined in an actual horizontal context. This occurs if it is in a line where it is preceded by other material. For instance:

  X@(define fun)...@(end)Y

This is a query line which must match the text XY. It also defines the function fun. The main use of this form is for nested horizontal functions:

  @(define fun)@(define local_fun)...@(end)@(end)


7.5.4 Vertical-Horizontal Overloading

A function of the same name may be defined as both vertical and horizontal. Both functions are available at the same time. Which one is used by a call is resolved by context. See the section Vertical Versus Horizontal Calls below.


7.5.5 Call Syntax

A function is invoked by a compound directive whose first symbol is the name of that function. Additional elements in the directive are the arguments. Arguments may be symbols, or other objects like string and character literals, quasiliterals or regular expressions.


 @(define pair (a b))
 @a @b
 @(end)
 @(pair first second)
 @(pair "ice" cream)
 one two
 ice milk

The first call to the function takes the line "one two". The parameter a takes "one" and parameter b takes "two". These are rebound to the arguments first and second. The second call to the function binds the a parameter to the word "ice", and b is unbound, because the corresponding argument cream is unbound. Thus inside the function, a is forced to match ice. Then a space is matched and b collects the text "milk". When the function returns, the unbound cream variable gets this value.

If a symbol occurs multiple times in the argument list, it constrains the corresponding parameters to bind to the same value. That is to say, all parameters which bind a value in the body of the function, and which are derived from the same argument symbol, must bind to the same value. This is settled when the function terminates, not while it is matching. Example:

 @(define pair (a b))
 @a @b
 @(end)
 @(pair same same)
 one two
 [query fails]

Here the query fails because a and b are effectively proxies for the same unbound variable same and are bound to different values, creating a conflict which constitutes a match failure.
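By contrast, if the two fields are identical, the constraint is satisfied (the same sketch as above, with different data):

  @(define pair (a b))
  @a @b
  @(end)
  @(pair same same)
  one one

Here both a and b bind to "one"; when the function terminates, the two proxies agree, and same is bound to "one".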


7.5.6 Vertical Versus Horizontal Calls

A function call which is the only element of the query line in which it occurs is ambiguous. It can go either to a vertical function or to the horizontal one. If both are defined, then it goes to the vertical one.


 @(define which (x))@(bind x "horizontal")@(end)
 @(define which (x))
 @(bind x "vertical")
 @(end)
 @(which fun)

Not only does this call go to the vertical function, but it is in a vertical context.

If only a horizontal function is defined, then that is the one which is called, even if the call is the only element in the line. This takes place in a horizontal character-matching context, which requires a line of input which can be traversed:


 @(define which (x))@(bind x "horizontal")@(end)
 @(which fun)
 ABC
 [query fails]

The query fails because @(which fun) is in horizontal mode, and so it matches characters in a line. Since the function body consists only of @(bind ...), which doesn't match any characters, the function call requires an empty line to match. The line ABC is not empty, and so there is a matching failure. The following example corrects this:


 @(define which (x))@(bind x "horizontal")@(end)
 @(which fun)
 [empty line]

A call made in a clearly horizontal context will prefer the horizontal function, and only fall back on the vertical one if the horizontal one doesn't exist. (In this fall-back case, the vertical function is called with empty data; it is useful for calling vertical functions which process arguments and produce values.)

In the next example, the call is followed by trailing material, placing it in a horizontal context. Leading material will do the same thing:


 @(define which (x))@(bind x "horizontal")@(end)
 @(define which (x))
 @(bind x "vertical")
 @(end)
 @(which fun)B


7.5.7 Local Variables

As described earlier, variables bound in a function body which are not parameters of the function are discarded when the function returns. However, that, by itself, doesn't make these variables local, because pattern functions have visibility to all variables in their calling environment. If a variable x exists already when a function is called, then an attempt to bind it inside a function may result in a failure. The local directive must be used in a pattern function to list which variables are local.


  @(define path (path))@\
    @(local x y)@\
    @(cases)@\
      (@(path x))@(path y)@(bind path `(@x)@y`)@\
    @(or)@\
      @{x /[.,;'!?][^ \t\f\v]/}@(path y)@(bind path `@x@y`)@\
    @(or)@\
      @{x /[^ .,;'!?()\t\f\v]/}@(path y)@(bind path `@x@y`)@\
    @(or)@\
      @(bind path "")@\
    @(end)@\
  @(end)
This is a horizontal function which matches a path, the recognition of which decomposes into four cases, three of them recursive. A path can be a parenthesized path followed by a path; it can be certain characters followed by a path; or it can be empty.

This function ensures that the variables it uses internally, x and y, do not have anything to do with any inherited bindings for x and y.

Note that the function is recursive, which cannot work without x and y being local, even if no such bindings exist prior to the top-level invocation of the function. The invocation @(path x) causes x to be bound, which is visible inside the invocation @(path y), but that invocation needs to have its own binding of x for local use.


7.5.8 Nested Functions

Function definitions may appear in a function. Such definitions are visible in all functions which are invoked from the body (and not necessarily enclosed in the body). In other words, the scope is dynamic, not lexical. Inner definitions shadow outer definitions. This means that a caller can redirect the function calls that take place in a callee, by defining local functions which capture the references.


 @(define which)
 @  (fun)
 @(end)
 @(define fun)
 @  (output)
 top-level fun!
 @  (end)
 @(end)
 @(define callee)
 @  (define fun)
 @    (output)
 local fun!
 @    (end)
 @  (end)
 @  (which)
 @(end)
 @(callee)
 @(which)

The output is:

 local fun!
 top-level fun!

Here, the function which is defined; it calls fun. A top-level definition of fun is introduced which outputs "top-level fun!". The function callee provides its own local definition of fun, which outputs "local fun!", before calling which. When callee is invoked, it calls which, whose @(fun) call is routed to callee's local definition. When which is called directly from the top level, its fun call goes to the top-level definition.


7.5.9 Indirect Calls

Function indirection may be performed using the call directive. If fun-expr is an expression which evaluates to a symbol, and that symbol names a function which takes no arguments, then
  @(call fun-expr)
may be used to invoke the function. Of course, additional expressions may be supplied which specify arguments.

Example 1:

 @(define foo (arg))
 @(bind arg "abc")
 @(end)
 @(call @'foo b)

In this example, the effect is that foo is invoked, and b ends up bound to "abc".

The call directive here uses the @'foo expression to calculate the name of the function to be invoked. The @ symbol indicates that the expression which follows is TXR Lisp, and 'foo is the TXR Lisp syntax for quoting a symbol. (See the quote operator.)

Of course, this particular call expression can just be replaced by the direct invocation syntax @(foo b).

The power of call lies in being able to specify the function as a value which comes from elsewhere in the program, as in the following example.

 @(define foo (arg))
 @(bind arg "abc")
 @(end)
 @(bind f @'foo)
 @(call f b)

Here the call directive obtains the name of the function from the f variable.

Note that function names are resolved to functions in the environment that is apparent at the point in execution where the call takes place. Very simply, the directive @(call f args ...) is precisely equivalent to @(s args ...) if, at the point of the call, f is a variable which holds the symbol s and symbol s is defined as a function. Otherwise it is erroneous.


7.6 Modularization


7.6.1 The load and include directives

The syntax of the load and include directives is:

  @(load expr)
  @(include expr)

where expr is a Lisp expression that evaluates to a string giving the path of the file to load.

If the *load-path* has a current value which is not nil and the path is pure relative according to the pure-rel-path-p function, then the path is interpreted relative to the directory portion of the path which is stored in *load-path*.

If *load-path* is nil, or the load path is not pure relative, then the path is taken as-is.

If the file named by the path cannot be opened, then the .txr suffix is added and another attempt is made. Thus load expressions need not refer to the suffix. In the future, additional suffixes may be searched (compiled versions of a file).

Both the load and include directives bind the *load-path* variable to the path of the loaded file just before parsing syntax from it, and remove the binding when their processing of the file is complete. Processing TXR Lisp code means that each of its forms is read and evaluated. Processing TXR code means parsing the file in its entirety, and then executing its directives against the current input.

The load and include directives differ as follows. The action of load is not performed immediately but at evaluation time. Evaluation time occurs after a TXR program is read from beginning to end and parsed. That is to say, when a TXR query is parsed, any embedded @(load ...) forms in it are parsed and constitute part of its syntax tree. They are executed when that query is executed and its execution reaches those load directives.

By contrast, the action of include is performed immediately, right after the @(include ...) directive syntax is parsed. That is to say, as the TXR parser encounters this syntax it processes it immediately. The included material is read and processed. If it is TXR syntax, then it is parsed and incorporated into the syntax tree in place of the include directive. The parser then continues processing the original file after the include directive. If TXR Lisp code is processed by the include directive, then its forms are read and evaluated. An empty directive is substituted into the syntax tree in this case.

Note: the include directive is useful for loading TXR files which contain Lisp macros which are needed by the parent program. The parent program cannot use load to bring in macros because macros are required during expansion, which takes place prior to evaluation time, whereas load doesn't execute until evaluation time.
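As an illustrative sketch (the file name defs.tl and the macro double are hypothetical), suppose a TXR Lisp file defs.tl contains a macro definition:

  (defmacro double (x) ^(* 2 ,x))

A program which uses this macro must bring it in with include, so that the definition exists before expansion takes place:

  @(include "defs.tl")
  @(bind y @(double 21))

Here @(double 21) can expand to (* 2 21) because the include directive evaluated the defmacro form at parse time. Had load been used instead, the expansion would be attempted before the load executes, at which point the macro is not yet defined.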

See also: the self-path, stdlib and *load-path* variables in TXR Lisp.


7.7 Output


7.7.1 Introduction

A TXR query may perform custom output. Output is performed by output clauses, which may be embedded anywhere in the query, or placed at the end. Output occurs as a side effect of producing a part of a query which contains an @(output) directive, and is executed even if that part of the query ultimately fails to find a match. Thus output can be useful for debugging. An output clause specifies that its output goes to a file, pipe, or (by default) standard output. If any output clause is executed whose destination is standard output, TXR makes a note of this, and later, just prior to termination, suppresses the usual printing of the variable bindings or the word false.


7.7.2 The output directive

The syntax of the @(output) directive is:

  @(output [ destination ] { bool-keyword | keyword value }* )
  .
  .  (one or more output directives or lines)
  .
  @(end)

If the directive has arguments, then the first one is evaluated. If it is an object other than a keyword symbol, then it specifies the optional destination. Any remaining arguments after the optional destination are the keyword list. If the destination is missing, then the entire argument list is a keyword list.

The destination argument, if present, is treated as a TXR Lisp expression and evaluated. The resulting value is taken as the output destination. The value may be a string which gives the path name of a file to open for output. Otherwise, the destination must be a stream object.

The keyword list consists of a mixture of Boolean keywords which do not have an argument, or keywords with arguments.

The following Boolean keywords are supported:

:nothrow
The output directive throws an exception if the output destination cannot be opened, unless the :nothrow keyword is present, in which case the situation is treated as a match failure.

Note that since command pipes are processes that report errors asynchronously, a failing command will not throw an immediate exception that can be suppressed with :nothrow. The :nothrow mechanism is for synchronous errors, like trying to open a destination file but not having permission.

:append
This keyword is meaningful for files, specifying append mode: the output is to be added to the end of the file rather than overwriting the file.

The following value keywords are supported:

:filter
The argument can be a symbol, which specifies a filter to be applied to the variable substitutions occurring within the output clause. The argument can also be a list of filter symbols, which specifies that multiple filters are to be applied, in left-to-right order.

See the sections Output Filtering and The Deffilter Directive below.

:into
The argument of :into is a symbol which denotes a variable. The output will go into that variable. If the variable is unbound, it will be created. Otherwise, its contents are overwritten unless the :append keyword is used. If :append is used, then the new content will be appended to the previous content of the variable, after flattening the content to a list, as if by the flatten directive.

:named
The argument of :named is a symbol which denotes a variable. The file or pipe stream which is opened for the output is stored in this variable, and is not closed at the end of the output block. This allows a subsequent output block to continue output on the same stream, which is possible using the next two keywords, :continue or :finish. A new binding is established for the variable, even if it already has an existing binding.

:continue
A destination should not be specified if :continue is used. The argument of :continue is an expression, such as a variable name, that evaluates to a stream object. That stream object is used for the output block. At the end of the output block, the stream is flushed, but not closed. A usage example is given in the documentation for the Close Directive below.

:finish
A destination should not be specified if :finish is used. The argument of :finish is an expression, such as a variable name, that evaluates to a stream object. That stream object is used for the output block. At the end of the output block, the stream is closed. An example is given in the documentation for the Close Directive below.


7.7.3 Output Text

Text in an output clause is not matched against anything, but is output verbatim to the destination file, device or command pipe.


7.7.4 Output Variables

Variables occurring in an output clause do not match anything; instead their contents are output.

A variable being output can be any object. If it is of a type other than a list or string, it will be converted to a string as if by the tostring function in TXR Lisp.

A list is converted to a string in a special way: the elements are individually converted to a string and then they are catenated together. The default separator string is a single space; an alternate separator can be specified as an argument in the brace substitution syntax. Empty lists turn into an empty string.

Lists may be output within @(repeat) or @(rep) clauses. Each nesting of these constructs removes one level of nesting from the list variables that it contains.

In an output clause, the @{name number} variable syntax generates a fixed-width field which contains the variable's text. The absolute value of the number specifies the field width. For instance -20 and 20 both specify a field width of twenty. If the text is longer than the field, then it overflows the field. If the text is shorter than the field, then it is left-adjusted within that field if the width is specified as a positive number, and right-adjusted if the width is specified as negative.
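For example, assuming a variable a bound to the string "abc", the following sketch shows both adjustments in a field of width ten:

  @(output)
  [@{a 10}]
  [@{a -10}]
  @(end)

The output is:

  [abc       ]
  [       abc]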

An output variable may specify a filter which overrides any filter established for the output clause. The syntax for this is @{NAME :filter filterspec}. The filter specification syntax is the same as in the output clause. See Output Filtering below.


7.7.5 Output Variables: Indexing

Additional syntax is supported in output variables that does not appear in pattern matching variables.

A square bracket index notation may be used to extract elements or ranges from a variable, which works with strings, vectors and lists. Elements are indexed from zero. This notation is only available in brace-enclosed syntax, and looks like this:

@{name[expr]}
Extract the element at the position given by expr.

@{name[expr1..expr2]}
Extract a range of elements from the position given by expr1, up to one position less than the position given by expr2.

If the variable is a list, it is treated as a list substitution, exactly as if it were the value of an unsubscripted list variable. The elements of the list are converted to strings and catenated together with a separator string between them, the default one being a single space.

An alternate character may be given as a string argument in the brace notation.


  @(bind a ("a" "b" "c" "d"))
  @{a[1..3] "," 10}

The above produces the text "b,c" in a field 10 spaces wide. The [1..3] argument extracts a range of elements from a; the "," argument specifies an alternate separator string, and 10 specifies the field width.


7.7.6 Output Substitutions

The brace syntax has another syntactic and semantic extension in output clauses. In place of the symbol, an expression may appear. The value of that expression is substituted.


 @(bind a "foo")
 @{`@a:` -10}

Here, the quasiliteral expression `@a:` is evaluated, producing the string "foo:". This string is printed right-adjusted in a 10 character field.


7.7.7 The repeat directive

The repeat directive generates repeated text from a "boilerplate", by taking successive elements from lists. The syntax of repeat is like this:

  @(repeat)
  main clause material, required
  special clauses, optional
  @(end)

repeat has four types of special clauses, any of which may be specified with empty contents, or omitted entirely. They are described below.

repeat takes arguments, also described below.

All of the material in the main clause and optional clauses is examined for the presence of variables. If none of the variables hold lists which contain at least one item, then no output is performed (unless the repeat specifies an @(empty) clause; see below). Otherwise, among those variables which contain non-empty lists, repeat finds the length of the longest list. The length of this list determines the number of repetitions, R.

If the repeat contains only a main clause, then the lines of this clause are output R times. Over the first repetition, all of the variables which, outside of the repeat, contain lists are locally rebound to just their first item. Over the second repetition, all of the list variables are bound to their second item, and so forth. Any variables which hold shorter lists than the longest list eventually end up with empty values over some repetitions.

Example: if the list A holds "1", "2" and "3"; the list B holds "A", "B"; and the variable C holds "X", then

  >> @C
  >> @A @B

will produce three repetitions (since there are two lists, the longest of which has three items). The output is:

  >> X
  >> 1 A
  >> X
  >> 2 B
  >> X
  >> 3

The last line has a trailing space, since it is produced by "@A @B", where B has an empty value. Since C is not a list variable, it produces the same value in each repetition.

The special clauses are:

@(single)
If the repeat produces exactly one repetition, then the contents of this clause are processed for that one and only repetition, instead of the main clause or any other clause which would otherwise be processed.

@(first)
The body of this clause specifies an alternative body to be used for the first repetition, instead of the material from the main clause.

@(last)
The body of this clause is used instead of the main clause for the last repetition.

@(empty)
If the repeat produces no repetitions, then the body of this clause is output. If this clause is absent or empty, the repeat produces no output.

@(mod n m)
The forms n and m are Lisp expressions that evaluate to integers. The value of m should be nonzero. The clause denoted this way is active if the repetition modulo m is equal to n. The first repetition is numbered zero. For instance the clause headed by @(mod 0 2) will be used on repetitions 0, 2, 4, 6, ... and @(mod 1 2) will be used on repetitions 1, 3, 5, 7, ...

@(modlast n m)
The meaning of n and m is the same as in @(mod n m), but one more condition is imposed. This clause is used if the repetition modulo m is equal to n, and if it is the last repetition.

The precedence among the clauses which take an iteration is: single > first > mod > modlast > last > main. That is, if two or more of these clauses can apply to a repetition, then the leftmost one in this precedence list applies. For instance, if there is just a single repetition, then any of these special clause types can apply to that repetition, since it is the only repetition, as well as the first and last one. In this situation, if there is a @(single) clause present, then the repetition is processed using that clause. Otherwise, if there is a @(first) clause present, that clause is used. Failing that, @(mod) is used if there is such a clause and its numeric conditions are satisfied. If there isn't, then @(modlast) clauses are considered, and if there are none, or none of them activate, then @(last) is considered. Finally, if none of these clauses are present or apply, then the repetition is processed using the main clause.

The repeat directive supports the following arguments:

  @(repeat [:counter {symbol | (symbol expr)}]
           [:vars ({symbol | (symbol expr)}*)])

The :counter argument designates a symbol which will behave as an integer variable over the scope of the clauses inside the repeat. The variable provides access to the repetition count, starting at zero, incrementing with each repetition. If the argument is given as (symbol expr) then expr is a Lisp expression whose value is taken as a displacement value which is added to each iteration of the counter. For instance :counter (c 1) specifies a counter c which counts from 1.
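For instance, the following sketch (the variable names item and n are hypothetical) uses a displaced counter to number the elements of a list:

  @(bind item ("a" "b" "c"))
  @(output)
  @(repeat :counter (n 1))
  @{n}: @item
  @(end)
  @(end)

The output is:

  1: a
  2: b
  3: c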

The :vars argument specifies a list of variable names, or pairs consisting of a variable name and Lisp expression. For every variable paired with a Lisp expression, the expression is evaluated, and a binding is introduced, associating that variable with the expression's value.

The repeat directive then processes the list of variables, selecting from it those which have a binding, either a previously existing binding or one just introduced from a Lisp expression. For each selected variable, repeat will assume that the variable occurs in the repeat block and contains a list to be iterated.

The :vars mechanism serves two purposes. Firstly, it is needed for situations in which @(repeat) is not able to deduce the existence of a variable in the block. It does not dig very deeply to discover variables, and does not "see" variables that are referenced via embedded TXR Lisp expressions. For instance, the following produces no output:

  @(bind list ("a" "b" "c"))
  @(output)
  @(repeat)
  @(format nil "<~a>" list)
  @(end)
  @(end)

Although the list variable appears in the repeat block, it is embedded in a TXR Lisp construct. That construct will never be evaluated because no repetitions take place: the repeat construct doesn't find any variables and so doesn't iterate. The remedy is to provide a little help via the :vars parameter:

  @(bind list ("a" "b" "c"))
  @(output)
  @(repeat :vars (list))
  @(format nil "<~a>" list)
  @(end)
  @(end)

Now the repeat block iterates over list, and the output is:

  <a>
  <b>
  <c>

Secondly, the variable binding syntax supported by :vars additionally provides a solution for situations when it is necessary to iterate over some list, but that list is the result of an expression, and not stored in any variable. A repeat block iterates only over lists emanating from variables; it does not iterate over lists pulled from arbitrary expressions.

Example: output all file names matching the *.txr pattern in the current directory:

  @(output)
  @(repeat :vars ((name (glob "*.txr"))))
  @name
  @(end)
  @(end)


7.7.8 Nested repeat directives

If a repeat clause encloses variables which hold multidimensional lists, those lists require additional nesting levels of repeat (or rep). It is an error to attempt to output a list variable which has not been decimated into primary elements via a repeat construct.

Suppose that a variable X is two-dimensional (contains a list of lists). X must be twice nested in a repeat. The outer repeat will traverse the lists contained in X. The inner repeat will traverse the elements of each of these lists.
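As a sketch (the variable x is a hypothetical name), suppose x holds a list of lists:

  @(bind x (("a" "b") ("c" "d")))
  @(output)
  @(repeat)
  @(rep)@x @(end)
  @(end)
  @(end)

The outer @(repeat) iterates over the two sublists of x; within each of its repetitions, x denotes one sublist, and the inner @(rep) iterates over the elements of that sublist, so that each output line shows the elements of one sublist separated by spaces.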

A nested repeat may be embedded in any of the clauses of a repeat, not only the main clause.


7.7.9 The rep directive

The rep directive is similar to repeat. Whereas repeat is line oriented, rep generates material within a line. It has all the same clauses, but everything is specified within one line:

  @(rep)... main material ... special clauses ...@(end)

More than one @(rep) can occur within a line, mixed with other material. A @(rep) can be nested within a @(repeat) or within another @(rep).

Also, @(rep) accepts the same :counter and :vars arguments.


7.7.10 repeat and rep Examples

Example 1: show the list L in parentheses, with spaces between the elements, or the word EMPTY if the list is empty:

  @(rep)@L @(single)(@L)@(first)(@L @(last)@L)@(empty)EMPTY@(end)

Here, the @(empty) clause specifies EMPTY. So if there are no repetitions, the text EMPTY is produced. If there is a single item in the list L, then @(single)(@L) produces that item between parentheses. Otherwise if there are two or more items, the first item is produced with a leading parenthesis followed by a space by @(first)(@L and the last item is produced with a closing parenthesis: @(last)@L). All items in between are emitted with a trailing space by the main clause: @(rep)@L.

Example 2: show the list L like Example 1 above, but the empty list is ().

  (@(rep)@L @(last)@L@(end))

This is simpler. The parentheses are part of the text which surrounds the @(rep) construct, produced unconditionally. If the list L is empty, then @(rep) produces no output, resulting in (). If the list L has one or more items, then they are produced with a space after each one, except the last, which has no trailing space. If the list has exactly one item, then @(last) applies to it instead of the main clause: it is produced with no trailing space.


7.7.11 The close directive

The syntax of the close directive is:

  @(close expr)

Where expr evaluates to a stream. The close directive can be used to explicitly close streams created using @(output ... :named var) syntax, as an alternative to @(output :finish expr).


Write two lines to "foo.txt" over two output blocks using a single stream:

  @(output "foo.txt" :named foo)
  line 1
  @(end)
  @(output :continue foo)
  line 2
  @(end)
  @(close foo)

The same as above, using :finish rather than :continue so that the stream is closed at the end of the second block:

  @(output "foo.txt" :named foo)
  line 1
  @(end)
  @(output :finish foo)
  line 2
  @(end)


7.7.12 Output Filtering

Often it is necessary to transform the output to preserve its meaning under the convention of a given data format. For instance, if a piece of text contains the characters < or >, then if that text is being substituted into HTML, these should be replaced by &lt; and &gt;. This is what filtering is for. Filtering is applied to the contents of output variables, not to any template text. TXR implements named filters. Built-in filters are named by keywords, given below. User-defined filters are possible, however. See notes on the deffilter directive below.

Instead of a filter name, the syntax (:fun name) can be used. This denotes that the function called name is to be used as a filter. This is described in the next section Function Filters below.

Built-in filters named by keywords:

:tohtml
Filter text to HTML, representing special characters using HTML ampersand sequences. For instance > is replaced by &gt;.

:tohtml*
Filter text to HTML, representing special characters using HTML ampersand sequences. Unlike :tohtml, this filter doesn't treat the single and double quote characters. It is not suitable for preparing HTML fragments which end up inserted into HTML tag attributes.

:fromhtml
Filter text with HTML codes into text in which the codes are replaced by the corresponding characters. For instance &gt; is replaced by >.

:upcase
Convert the 26 lower case letters of the English alphabet to upper case.

:downcase
Convert the 26 upper case letters of the English alphabet to lower case.

:frompercent
Decode percent-encoded text. Character triplets consisting of the % character followed by a pair of hexadecimal digits (case insensitive) are converted to bytes having the value represented by the hexadecimal digits (most significant nybble first). Sequences of one or more such bytes are treated as UTF-8 data and decoded to characters.

:topercent
Convert to percent encoding according to RFC 3986. The text is first converted to UTF-8 bytes. The bytes are then converted back to text as follows. Bytes in the range 0 to 32, and 127 to 255 (note: including the ASCII DEL), bytes whose values correspond to ASCII characters which are listed by RFC 3986 as being in the "reserved set", and the byte value corresponding to the ASCII % character are encoded as a three-character sequence consisting of the % character followed by two hexadecimal digits derived from the byte value (most significant nybble first, upper case). All other bytes are converted directly to characters of the same value without any such encoding.

:fromurl
Decode from URL encoding, which is like percent encoding, except that if the unencoded + character occurs, it is decoded to a space character. Of course %20 still decodes to space, and %2B to the + character.

:tourl
Encode to URL encoding, which is like percent encoding except that a space maps to + rather than %20. The + character, being in the reserved set, encodes to %2B.

:frombase64
Decode from the Base64 encoding described in RFC 4648.

:tobase64
Encode to the RFC 4648 Base64 encoding.

:tonumber
Converts strings to numbers. Strings that contain a period, e or E are converted to floating point as if by the Lisp function flo-str. Otherwise they are converted to integer as if using int-str with a radix of 10. Non-numeric junk results in the object nil.

:toint
Converts strings to integers as if using int-str with a radix of 10. Non-numeric junk results in the object nil.

:tofloat
Converts strings to floating-point values as if using the function flo-str. Non-numeric junk results in the object nil.

:hextoint
Converts strings to integers as if using int-str with a radix of 16. Non-numeric junk results in the object nil.
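The :topercent conversion described above can be modeled in a few lines of Python. This is an illustrative sketch, not TXR code; the RFC 3986 reserved set is written out explicitly as an assumption.

```python
# Minimal Python model of the :topercent algorithm (illustrative, not TXR).
RESERVED = set(":/?#[]@!$&'()*+,;=")   # RFC 3986 reserved characters

def to_percent(text):
    out = []
    for byte in text.encode('utf-8'):            # convert to UTF-8 bytes first
        ch = chr(byte)
        if byte <= 32 or byte >= 127 or ch in RESERVED or ch == '%':
            out.append('%%%02X' % byte)          # %XX, upper-case hex digits
        else:
            out.append(ch)                       # pass through unchanged
    return ''.join(out)

print(to_percent("a b/c"))   # a%20b%2Fc
```

The space and slash are encoded because one falls in the 0-32 byte range and the other is in the reserved set.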


To escape HTML characters in all variable substitutions occurring in an output clause, specify :filter :tohtml in the directive:

  @(output :filter :tohtml)

To filter an individual variable, add the syntax to the variable spec:

  @{x :filter :tohtml}

Multiple filters can be applied at the same time. For instance:

  @{x :filter (:upcase :tohtml)}

This will fold the contents of x to upper case, and then encode any special characters into HTML. Beware of combinations that do not make sense. For instance, suppose the original text is HTML, containing codes like &quot;. The compound filter (:upcase :fromhtml) will not work because &quot; will turn to &QUOT; which is no longer recognized by the :fromhtml filter, since the entity names in HTML codes are case-sensitive.

Capture some numeric variables and convert to numbers:

  @date @time @temperature @pressure
  @(filter :tofloat temperature pressure)
  @;; temperature and pressure can now be used in calculations


7.7.13 Function Filters

A function can be used as a filter. For this to be possible, the function must conform to certain rules:

The function must take two special arguments, which may be followed by additional arguments.

When the function is called, the first argument will be bound to a string, and the second argument will be unbound. The function must produce a value by binding it to the second argument. If the filter is to be used as the final filter in a chain, it must produce a string.

For instance, the following is a valid filter function:

  @(define foo_to_bar (in out))
  @  (next :string in)
  @  (cases)
  foo
  @    (bind out "bar")
  @  (or)
  @    (bind out in)
  @  (end)
  @(end)

This function binds the out parameter to "bar" if the in parameter is "foo", otherwise it binds the out parameter to a copy of the in parameter. This is a simple filter.

To use the filter, use the syntax (:fun foo_to_bar) in place of a filter name. For instance in the bind directive:

  @(bind "foo" "bar" :lfilt (:fun foo_to_bar))

The above should succeed since the left side is filtered from "foo" to "bar", so that there is a match.

Of course, function filters can be used in a chain:

  @(output :filter (:downcase (:fun foo_to_bar) :upcase))

Here is a split function which takes an extra argument which specifies the separator:

  @(define split (in out sep))
  @  (next :list in)
  @  (coll)@(maybe)@token@sep@(or)@token@(end)@(end)
  @  (bind out token)
  @(end)

Furthermore, note that it produces a list rather than a string. This function separates the argument in into tokens according to the separator text carried in the variable sep.

Here is another function, join, which catenates a list:

  @(define join (in out sep))
  @  (output :into out)
  @  (rep)@in@sep@(last)@in@(end)
  @  (end)
  @(end)

Now here are these two being used in a chain:

  @(bind text "how,are,you")
  @(output :filter ((:fun split ",") (:fun join "-")))
  @text
  @(end)



When the filter invokes a function, it generates the first two arguments internally to pass in the input value and capture the output. The remaining arguments from the (:fun ...) construct are also passed to the function. Thus the string objects "," and "-" are passed as the sep argument to split and join.

Note that split puts out a list, which join accepts. So the overall filter chain operates on a string: a string goes into split, and a string comes out of join.
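The shape of this chain can be sketched outside TXR. The following Python model is illustrative only: TXR's actual machinery binds an out parameter via pattern matching rather than using return values, but the flow of the value and the extra (:fun ...) arguments is the same.

```python
# Illustrative Python model of the split/join filter chain (not TXR).
def split_f(value, sep):
    return value.split(sep)          # produces a list, like the split filter

def join_f(value, sep):
    return sep.join(value)           # catenates a list, like the join filter

def run_chain(value, *filters):
    # the machinery supplies the current value; the extra argument from
    # each (:fun ...) construct is passed along after it
    for fn, arg in filters:
        value = fn(value, arg)
    return value

print(run_chain("how,are,you", (split_f, ","), (join_f, "-")))   # how-are-you
```

A string goes in, a list flows between the two stages, and a string comes out, mirroring the TXR chain above.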


7.7.14 The deffilter directive

The deffilter directive allows a query to define a custom filter, which can then be used in output clauses to transform substituted data.

The syntax of deffilter is illustrated in this example:

 @(deffilter rot13
    ("a" "n")
    ("b" "o")
    ("c" "p")
    ("d" "q")
    ("e" "r")
    ("f" "s")
    ("g" "t")
    ("h" "u")
    ("i" "v")
    ("j" "w")
    ("k" "x")
    ("l" "y")
    ("m" "z")
    ("n" "a")
    ("o" "b")
    ("p" "c")
    ("q" "d")
    ("r" "e")
    ("s" "f")
    ("t" "g")
    ("u" "h")
    ("v" "i")
    ("w" "j")
    ("x" "k")
    ("y" "l")
    ("z" "m"))
 @(output :filter rot13)
 hey there!
 @(end)

The output of this query is:

 url gurer!

The deffilter symbol must be followed by the name of the filter to be defined, followed by bind expressions which evaluate to lists of strings. Each list must be at least two elements long and specifies one or more texts which are mapped to a replacement text. For instance, the following specifies a telephone keypad mapping from upper case letters to digits.

  @(deffilter alpha_to_phone ("E" "0")
                             ("J" "N" "Q" "1")
                             ("R" "W" "X" "2")
                             ("D" "S" "Y" "3")
                             ("F" "T" "4")
                             ("A" "M" "5")
                             ("C" "I" "V" "6")
                             ("B" "K" "U" "7")
                             ("L" "O" "P" "8")
                             ("G" "H" "Z" "9"))

Bind expressions in a deffilter may also be quasiliterals, allowing the filter strings to be computed from existing variable bindings:

  @(deffilter foo (`@a` `@b`) ("c" `->@d`))

  @(bind x ("from" "to"))
  @(bind y ("---" "+++"))
  @(deffilter sub x y)

The last deffilter has the same effect as the @(deffilter sub ("from" "to") ("---" "+++")) directive.

Filtering works using a longest-match algorithm. The input is scanned from left to right. At each character position, the longest piece of text which matches a string on the left-hand side is identified, and that text is replaced with its associated replacement text. The scanning then continues at the first character after the matched text.

If none of the strings matches at a given character position, then that character is passed through the filter untranslated, and the scan continues at the next character in the input.

Filtering is not in-place but rather instantiates a new text, and so replacement text is not re-scanned for more replacements.
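The scanning algorithm described above can be sketched in Python. This is an illustrative model, not TXR code; `table` maps left-hand strings to their replacement texts.

```python
# Python model of the longest-match filter scan (illustrative, not TXR).
def run_filter(table, text):
    out, i = [], 0
    while i < len(text):
        # find the longest left-hand string matching at position i
        best = None
        for key in table:
            if text.startswith(key, i) and (best is None or len(key) > len(best)):
                best = key
        if best is not None:
            out.append(table[best])   # emit replacement; it is not re-scanned
            i += len(best)
        else:
            out.append(text[i])       # no match: character passes through
            i += 1
    return ''.join(out)

print(run_filter({"<": "&lt;", ">": "&gt;"}, "a<b>"))   # a&lt;b&gt;
```

Because the output is accumulated separately, replacement text never feeds back into the scan, matching the behavior described above.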

If a filter definition accidentally contains two or more repetitions of the same left hand string with different right hand translations, the later ones take precedence. No warning is issued.


7.7.15 The filter directive

The syntax of the filter directive is:

  @(filter FILTER { VAR }+ )

A filter is specified, followed by one or more variables whose values are filtered and stored back into each variable.

Example: convert a, b, and c to upper case and HTML encode:

  @(filter (:upcase :tohtml) a b c)


7.8 Exceptions


7.8.1 Introduction

The exceptions mechanism in TXR is another disciplined form of non-local transfer, in addition to the blocks mechanism (see Blocks above). Like blocks, exceptions provide a construct which serves as the target for a dynamic exit. Both blocks and exceptions can be used to bail out of deep nesting when some condition occurs. However, exceptions provide more complexity. Exceptions are useful for error handling, and TXR in fact maps certain error situations to exception control transfers. However, exceptions are not inherently an error-handling mechanism; they are a structured dynamic control transfer mechanism, one of whose applications is error handling.

An exception control transfer (simply called an exception) is always identified by a symbol, which is its type. Types are organized in a subtype-supertype hierarchy. For instance, the file-error exception type is a subtype of the error type. This means that a file error is a kind of error. An exception handling block which catches exceptions of type error will catch exceptions of type file-error, but a block which catches file-error will not catch all exceptions of type error. A query-error is a kind of error, but not a kind of file-error. The symbol t is the supertype of every type: every exception type is considered to be a kind of t. (Mnemonic: t stands for type, as in any type).
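The subtype relation described above can be modeled as a chain of immediate supertypes rooted at t. A toy Python sketch (not TXR; the table uses the file-error and query-error types mentioned above):

```python
# Toy model of the exception type hierarchy (illustrative, not TXR).
# Each type maps to its immediate supertype; t is the root.
supertype = {'file-error': 'error', 'query-error': 'error', 'error': 't'}

def subtype_p(sub, sup):
    # every type is a subtype of itself, and of t
    while True:
        if sub == sup or sup == 't':
            return True
        if sub not in supertype:
            return False
        sub = supertype[sub]   # climb one level toward t

print(subtype_p('file-error', 'error'))   # a file error is a kind of error
print(subtype_p('error', 'file-error'))   # but not every error is a file error
```

A handler catching error would therefore also catch file-error, while a handler catching file-error would not catch a plain error.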

Exceptions are handled using @(catch) clauses within a @(try) directive.

In addition to being useful for exception handling, the @(try) directive also provides unwind protection by means of a @(finally) clause, which specifies query material to be executed unconditionally when the try clause terminates, no matter how it terminates.


7.8.2 The try directive

The general syntax of the try directive is

  @(try)
  ... main clause, required ...
  ... optional catch clauses ...
  ... optional finally clause ...
  @(end)

A catch clause looks like:

  @(catch TYPE [ PARAMETERS ])

and also this simple form:

  @(catch)

which catches all exceptions, and is equivalent to @(catch t).

A finally clause looks like:

  @(finally)
The main clause may not be empty, but the catch and finally may be.

A try clause is surrounded by an implicit anonymous block (see Blocks section above). So for instance, the following is a no-op (an operation with no effect, other than successful execution):

  @(try)
  @(accept)
  @(end)
The @(accept) causes a successful termination of the implicit anonymous block. Execution resumes with query lines or directives which follow, if any.

try clauses and blocks interact. For instance, an accept from within a try clause invokes a finally.

 @(block foo)
 @  (try)
 @    (accept foo)
 @  (finally)
 @     (output)
 hello
 @     (end)
 @  (end)

How this works: the try block's main clause is @(accept foo). This causes the enclosing block named foo to terminate, as a successful match. Since the try is nested within this block, it too must terminate in order for the block to terminate. But the try has a finally clause, which executes unconditionally, no matter how the try block terminates. The finally clause performs some output, which is seen.

Note that finally interacts with accept in subtle ways not revealed in this example; they are documented in the description of accept under the block directive documentation.
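This interaction resembles try/finally in other languages. In the following rough Python analogue (illustrative only, not TXR), an early return plays the role of the accept:

```python
# Rough analogue (Python, not TXR): the finally suite runs even though
# the try suite exits early, just as the finally clause runs when
# @(accept foo) terminates the enclosing block.
def inner():
    try:
        return "accepted"        # early, successful exit (like the accept)
    finally:
        print("finally runs")    # executes unconditionally (the output clause)

inner()
```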


7.8.3 The finally clause

A try directive can terminate in one of three ways. The main clause may match successfully, and possibly yield some new variable bindings. The main clause may fail to match. Or the main clause may be terminated by a non-local control transfer, like an exception being thrown or a block return (like the block foo example in the previous section).

No matter how the try clause terminates, the finally clause is processed.

The finally clause is itself a query which binds variables, which leads to questions: what happens to such variables? What if the finally block fails as a query? As well as: what if a finally clause itself initiates a control transfer? Answers follow.

Firstly, a finally clause will contribute variable bindings only if the main clause terminates normally (either as a successful or failed match). If the main clause of the try block successfully matches, then the finally block continues matching at the next position in the data, and contributes bindings. If the main clause fails, then the finally block tries to match at the same position where the main clause failed.

The overall try directive succeeds as a match if either the main clause or the finally clause succeed. If both fail, then the try directive is a failed match.



  @(try)
  @a
  @(finally)
  @b
  @(end)
  @c

In this example, the main clause of the try captures line "1" of the data as variable a, then the finally clause captures "2" as b, and then the query continues with the @c line after try block, so that c captures "3".


 @(try)
 hello @a
 @(finally)
 @b
 @(end)
 @c

In this example, the main clause of the try fails to match, because the input is not prefixed with "hello ". However, the finally clause matches, binding b to "1". This means that the try block is a successful match, and so processing continues with @c which captures "2".

When finally clauses are processed during a non-local return, they have no externally visible effect if they do not bind variables. However, their execution makes itself known if they perform side effects, such as output.

A finally clause guards only the main clause and the catch clauses. It does not guard itself. Once the finally clause is executing, the try block is no longer guarded. This means if a nonlocal transfer, such as a block accept or exception, is initiated within the finally clause, it will not re-execute the finally clause. The finally clause is simply abandoned.

The disestablishment of blocks and try clauses is properly interleaved with the execution of finally clauses. This means that all surrounding exit points are visible in a finally clause, even if the finally clause is being invoked as part of a transfer to a distant exit point. The finally clause can make a control transfer to an exit point which is more near than the original one, thereby "hijacking" the control transfer. Also, the anonymous block established by the try directive is visible in the finally clause.


  @(try)
  @  (try)
  @    (next "nonexistent-file")
  @  (finally)
  @    (accept)
  @  (end)
  @(catch file-error)
  @  (output)
  file error caught
  @  (end)
  @(end)

In this example, the @(next) directive throws an exception of type file-error, because the given file does not exist. The exit point for this exception is the @(catch file-error) clause in the outer-most try block. The inner block is not eligible because it contains no catch clauses at all. However, the inner try block has a finally clause, and so during the processing of this exception which is headed for @(catch file-error), the finally clause performs an anonymous accept. The exit point for that accept is the anonymous block surrounding the inner try. So the original transfer to the catch clause is thereby abandoned. The inner try terminates successfully due to the accept, and since it constitutes the main clause of the outer try, that also terminates successfully. The "file error caught" message is never printed.
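This "hijacking" behavior has analogues in other languages. In the following Python sketch (illustrative, not TXR), a return executed in a finally suite abandons an exception that was headed for an outer handler:

```python
# Python sketch (not TXR): a control transfer initiated inside a
# finally suite abandons the transfer already in progress.
def hijack():
    try:
        raise RuntimeError("headed for an outer handler")
    finally:
        return "hijacked"   # the nearer exit point wins; the exception is dropped

print(hijack())
```

As in the TXR example, the original transfer never reaches its handler; the function simply returns.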


7.8.4 catch clauses

catch clauses establish their associated try blocks as potential exit points for exception-induced control transfers (called "throws").

A catch clause specifies an optional list of symbols which represent the exception types which it catches. The catch clause will catch exceptions which are a subtype of any one of those exception types.

If a try block has more than one catch clause which can match a given exception, the first one will be invoked.

When a catch is invoked, it is of course understood that the main clause did not terminate normally, and so the main clause could not have produced any bindings.

catch clauses are processed prior to finally.

If a catch clause itself throws an exception, that exception cannot be caught by that same clause or its siblings in the same try block. The catch clauses of that block are no longer visible at that point. Nevertheless, the catch clauses are still protected by the finally block. If a catch clause throws, or otherwise terminates, the finally block is still processed.

If a finally block throws an exception, then it is simply aborted; the remaining directives in that block are not processed.

So the success or failure of the try block depends on the behavior of the catch clause or the finally clause, if there is one. If either of them succeed, then the try block is considered a successful match.


 @(try)
 @  (next "nonexistent-file")
 @  x
 @  (catch file-error)
 @a
 @  (finally)
 @b
 @  (end)
 @c

Here, the try block's main clause is terminated abruptly by a file-error exception from the @(next) directive. This is handled by the catch clause, which binds variable a to the input line "1". Then the finally clause executes, binding b to "2". The try block then terminates successfully, and so @c takes "3".


7.8.5 catch Clauses with Parameters

A catch clause may have parameters following the type name, like this:

  @(catch pair (a b))

To write a catch-all with parameters, explicitly write the master supertype t:

  @(catch t (arg ...))

Parameters are useful in conjunction with throw. The built-in error exceptions carry one argument, which is a string containing the error message. Using throw, arbitrary parameters can be passed from the throw site to the catch site.


7.8.6 The throw directive

The throw directive generates an exception. A type must be specified, followed by optional arguments, which are bind expressions. For example,

  @(throw pair "a" `@file.txt`)

throws an exception of type pair, with two arguments, being "a" and the expansion of the quasiliteral `@file.txt`.

The selection of the target catch is performed purely using the type name; the parameters are not involved in the selection.

Binding takes place between the arguments given in throw and the target catch.

If any catch parameter, for which a throw argument is given, is a bound variable, it has to be identical to the argument, otherwise the catch fails. (Control still passes to the catch, but the catch is a failed match).

 @(bind a "apple")
 @(try)
 @(throw e "banana")
 @(catch e (a))
 @(end)
 [query fails]

If any argument is an unbound variable, the corresponding parameter in the catch is left alone: if it is an unbound variable, it remains unbound, and if it is bound, it stays as is.

 @(try)
 @(throw e "honda" unbound)
 @(catch e (car1 car2))
 @car1 @car2
 @(end)
 honda toyota

If a catch has fewer parameters than there are throw arguments, the excess arguments are ignored:

 @(try)
 @(throw e "banana" "apple" "pear")
 @(catch e (fruit))
 @(end)

If a catch has more parameters than there are throw arguments, the excess parameters are left alone. They may be bound or unbound variables.

 @(try)
 @(throw e "honda")
 @(catch e (car1 car2))
 @car1 @car2
 @(end)
 honda toyota

A throw argument passing a value to a catch parameter which is unbound causes that parameter to be bound to that value.

throw arguments are evaluated in the context of the throw, and the bindings which are available there. Consideration of what parameters are bound is done in the context of the catch.

 @(bind c "c")
 @(try)
 @(forget c)
 @(bind (a c) ("a" "lc"))
 @(throw e a c)
 @(catch e (b a))
 @(end)

In the above example, c has a top-level binding to the string "c", but then becomes unbound via forget within the try construct, and rebound to the value "lc". Since the try construct is terminated by a throw, these modifications of the binding environment are discarded. Hence, at the end of the query, variable c ends up bound to the original value "c". The throw still takes place within the scope of the bindings set up by the try clause, so the values of a and c that are thrown are "a" and "lc". However, at the catch site, variable a does not have a binding. At that point, the binding to "a" established in the try has disappeared already. Being unbound, the catch parameter a can take whatever value the corresponding throw argument provides, so it ends up with "lc".

There is a horizontal form of throw. For instance:

  abc@(throw e 1)

throws exception e if abc matches.


7.8.7 The defex directive

The defex directive allows the query writer to invent custom exception types, which are arranged in a type hierarchy (meaning that some exception types are considered subtypes of other types).

Subtyping means that if an exception type B is a subtype of A, then every exception of type B is also considered to be of type A. So a catch for type A will also catch exceptions of type B. Every type is a supertype of itself: an A is a kind of A. This of course implies that every type is a subtype of itself also. Furthermore, every type is a subtype of the type t, which has no supertype other than itself. Type nil is a subtype of every type, including itself. The subtyping relationship is also transitive: if A is a subtype of B, and B is a subtype of C, then A is a subtype of C.

defex may be invoked with no arguments, in which case it does nothing:

  @(defex)
It may be invoked with one argument, which must be a symbol. This introduces a new exception type. Strictly speaking, such an introduction is not necessary; any symbol may be used as an exception type without being introduced by @(defex):

  @(defex a)

Therefore, this also does nothing, other than document the intent to use a as an exception.

If two or more argument symbols are given, the symbols are all introduced as types, engaged in a subtype-supertype relationship from left to right. That is to say, the first (leftmost) symbol is a subtype of the next one, which is a subtype of the next one and so on. The last symbol, if it had not been already defined as a subtype of some type, becomes a direct subtype of the master supertype t. Example:

  @(defex d e)
  @(defex a b c d)

The first directive defines d as a subtype of e, and e as a subtype of t. The second defines a as a subtype of b, b as a subtype of c, and c as a subtype of d, which is already defined as a subtype of e. Thus a is now a subtype of e. The above can be condensed to:

  @(defex a b c d e)


 @(defex gorilla ape primate)
 @(defex monkey primate)
 @(defex human primate)
 @(collect)
 @(try)
 @(cases)
 gorilla @name
 @(throw gorilla name)
 @(or)
 monkey @name
 @(throw monkey name)
 @(or)
 human @name
 @(throw human name)
 @(end)
 @(catch primate (name))
 @kind @name
 @(output)
 we have a primate @name of kind @kind
 @(end)
 @(end)
 @(end)

Given the input:

 gorilla joe
 human bob
 monkey alice

the output is:

 we have a primate joe of kind gorilla
 we have a primate bob of kind human
 we have a primate alice of kind monkey

Exception types have a pervasive scope. Once a type relationship is introduced, it is visible everywhere. Moreover, the defex directive is destructive, meaning that the supertype of a type can be redefined. This is necessary so that something like the following works right:

  @(defex gorilla ape)
  @(defex ape primate)

These directives are evaluated in sequence. So after the first one, the ape type has the type t as its immediate supertype. But in the second directive, ape appears again, and is assigned the primate supertype, while retaining gorilla as a subtype. This situation could be diagnosed as an error, forcing the programmer to reorder the statements, but instead TXR obliges. However, there are limitations. It is an error to define a subtype-supertype relationship between two types if they are already connected by such a relationship, directly or transitively. So the following definitions are in error:

  @(defex a b)
  @(defex b c)
  @(defex a c)@# error: a is already a subtype of c, through b

  @(defex x y)
  @(defex y x)@# error: circularity; y is already a supertype of x.


7.8.8 The assert directive

The assert directive requires the remaining query or sub-query which follows it to match. If the remainder fails to match, the assert directive throws an exception. If the directive is simply

  @(assert)

then it throws an exception of type assert, which is a subtype of error. The assert directive also takes arguments similar to the throw directive: an exception symbol and additional arguments which are bind expressions, and may be unbound variables. The following assert directive, if it triggers, will throw an exception of type foo, with arguments 1 and "2":

  @(assert foo 1 "2")


  @(collect)
  Important Header
  ----------------
  @(assert)
  Foo: @a, @b
  @(end)

Without the assertion in place, if the Foo: @a, @b part does not match, then the entire interior of the @(collect) clause fails, and the collect continues searching for another match.

With the assertion in place, if the text "Important Header" and its underline match, then the remainder of the collect body must match, otherwise an exception is thrown. Now the program will not silently skip over any Important Header sections due to a problem in its matching logic. This is particularly useful when the matching is varied with numerous cases, and they must all be handled.

There is a horizontal assert directive also. For instance:

  abc@(assert)d@x
asserts that if the prefix "abc" is matched, then it must be followed by a successful match for "d@x", or else an exception is thrown.



8 TXR LISP

The TXR language contains an embedded Lisp dialect called TXR Lisp.

This language is exposed in TXR in a number of ways.

In any situation that calls for an expression, a Lisp expression can be used, if it is preceded by the @ character. The Lisp expression is evaluated and its value becomes the value of that expression. Thus, TXR directives are embedded in literal text using @, and Lisp expressions are embedded in directives using @ also.

Furthermore, certain directives evaluate Lisp expressions without requiring @. These are @(do), @(require), @(assert), @(if) and @(next).

TXR Lisp code can be placed into files. On the command line, TXR treats files with a ".tl" suffix as TXR Lisp code, and the @(load) directive does also.

TXR also provides an interactive listener for Lisp evaluation.

Lastly, TXR Lisp expressions can be evaluated via the command line, using the -e and -p options.


Bind variable a to the integer 4:

  @(bind a @(+ 2 2))

Bind variable b to the standard input stream. Note that @ is not required on a Lisp variable:

  @(bind b *stdin*)

Define several Lisp functions inside @(do):

  @(do
    (defun add (x y) (+ x y))

    (defun occurs (item list)
      (cond ((null list) nil)
            ((atom list) (eql item list))
            (t (or (eq (first list) item)
                   (occurs item (rest list)))))))

Trigger a failure unless previously bound variable answer is greater than 42:

  @(require (> (int-str answer) 42))


8.1 Overview

TXR Lisp is a small and simple dialect, like Scheme, but much more similar to Common Lisp than Scheme. It has separate value and function binding namespaces, like Common Lisp (and thus is a Lisp-2 type dialect), and represents Boolean true and false with the symbols t and nil (note the case sensitivity of identifiers denoting symbols!) Furthermore, the symbol nil is also the empty list, which terminates nonempty lists.

TXR Lisp has lexically scoped local variables and dynamic global variables, similarly to Common Lisp, including the convention that defvar marks symbols for dynamic binding in local scopes. Lexical closures are supported. TXR Lisp also supports global lexical variables via defvarl.

Functions are lexically scoped in TXR Lisp; they can be defined in the pervasive global environment using defun, or in local scopes using flet and labels.


8.2 Additional Syntax

Much of the TXR Lisp syntax has been introduced in the previous sections of the manual, since directive forms are based on it. There is some additional syntax that is useful in TXR Lisp programming.


8.2.1 Symbol Tokens

The symbol token in TXR Lisp, called a lident (Lisp identifier), has a similar syntax to the bident (braced identifier) in the TXR pattern language. It may consist of all the same characters, as well as the / (slash) character which may not be used in a bident. Thus a lident may consist of these characters, in addition to letters, numbers and underscores:

 ! $ % & * + - < = > ? \ ~ /

and of course, may not look like a number.

A lident may also include all of the Unicode characters which are permitted in a bident.

The one character which is allowed in a lident but not in a bident is / (forward slash).

A lone / is a valid lident and consequently a symbol token in TXR Lisp. The token /abc/ is also a symbol, and, unlike in a braced expression, is not a regular expression. In TXR Lisp expressions, regular expressions are written with a leading #.


8.2.2 Package Prefixes

If a symbol name contains a colon, the lident characters, if any, before that colon constitute the package prefix.

For example, the syntax foo:bar denotes bar symbol in the foo package.

It is a syntax error to read a symbol whose package doesn't exist.

If the package exists, but the symbol name doesn't exist in that package, then the symbol is interned in that package.

If the package name is an empty string (the colon is preceded by nothing), the package is understood to be the keyword package. The symbol is interned in that package.

The syntax :test denotes the symbol test in the keyword package, the same as keyword:test.

Symbols in the keyword package are self-evaluating. This means that when a keyword symbol is evaluated as a form, the value of that form is the keyword symbol itself. Exactly two non-keyword symbols also have this special self-evaluating behavior: the symbols t and nil in the user package, whose fully qualified names are usr:t and usr:nil.
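For instance, each of the following forms evaluates to itself:

  :test  ->  :test
  t      ->  t
  nil    ->  nil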

The syntax @foo:bar denotes the meta prefix @ being applied to the foo:bar symbol, not to a symbol in the @foo package.

The syntax #:bar denotes an uninterned symbol named bar, described in the next section.

Dialect note:

In ANSI Common Lisp, the foo:bar syntax does not intern the symbol bar in the foo package; the symbol must exist or else the syntax is erroneous.


8.2.3 Uninterned Symbols

Uninterned symbols are written with the #: prefix, followed by zero or more lident characters. When an uninterned symbol is read, a new, unique symbol is constructed, with the specified name. Even if two uninterned symbols have the same name, they are different objects. The make-sym and gensym functions produce uninterned symbols.

"Uninterned" means "not entered into a package". Interning refers to a process which combines package lookup with symbol creation, which ensures that multiple occurrences of a symbol name in written syntax are all converted to the same object: the first occurrence creates the symbol and associates it with its name in a package. Subsequent occurrences do not create a new symbol, but retrieve the existing one.


8.2.4 Consing Dot

Unlike other major Lisp dialects, TXR Lisp allows a consing dot with no forms preceding it. This construct simply denotes the form which follows the dot. That is to say, the parser implements the following transformation:

  (. expr) -> expr

This is convenient in writing function argument lists that only take variable arguments. Instead of the syntax:

  (defun fun args ...)

the following syntax can be used:

  (defun fun (. args) ...)

When a lambda form is printed, it is printed in the following style.

  (lambda nil ...) -> (lambda () ...)
  (lambda sym ...) -> (lambda (. sym) ...)
  (lambda (sym) ...) -> (lambda (sym) ...)

In no other circumstances is nil printed as (), or an atom sym as (. sym).


8.2.5 Referencing Dot

A dot token which is flanked by expressions on both sides, without any intervening whitespace, is the referencing dot, and not the consing dot. The referencing dot is a syntactic sugar which translates to the qref syntax ("quoted ref"). This syntax denotes structure access; see Structures.

  ;; a and b may be almost any expressions
  a.b           <-->  (qref a b)
  a.b.c         <-->  (qref a b c)
  a.(qref b c)  <-->  (qref a b c)
  (qref a b).c  <-->  (qref (qref a b) c)

That is to say, this dot operator constructs a qref expression out of its left and right arguments. If the right argument of the dot is already a qref expression (whether produced by another instance of the dot operator, or expressed directly) it is merged. And the qref dot operator is right-to-left associative, so that a.b.c first produces (qref b c) via the right dot, and then a is adjoined into the syntax via the left dot.

Integer tokens cannot be involved in this syntax, because they form floating-point constants when juxtaposed with a dot. Such ambiguous uses of floating-point tokens are diagnosed as syntax errors:

  (a.4)   ;; error: cramped floating-point literal
  (a .4)  ;; good: a followed by 0.4


8.2.6 Unbound Referencing Dot

Closely related to the referencing dot syntax is the unbound referencing dot. This is a dot which is flanked by an expression on the right, without any intervening whitespace, but is not preceded by an expression. Rather, it is preceded by whitespace, or some punctuation such as [, ( or '. This is a syntactic sugar which translates to uref syntax:

  .a            <--> (uref a)
  .a.b          <--> (uref a b)

When the unbound referencing dot is applied to a dotted expression, this can be understood as a conversion of qref to uref.

Indeed, this is exactly what happens if the unbound dot is applied to an explicit qref expression:

  .(qref a b)   <--> (uref a b)

The unbound referencing dot takes its name from the semantics of the uref macro, which produces a function that implements late binding of an object to a method slot. Whereas the expression obj.a.b denotes accessing object obj to retrieve slot a and then accessing slot b of the object from that slot, the expression .a.b represents a "disembodied" reference: it produces a function which takes an object as an argument and then performs the implied slot referencing on that argument. When the function is called, it is said to bind the referencing to the object. Hence that referencing is "unbound".
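For instance, such a function can be passed to mapcar to access the same slot of several objects. A sketch, using a hypothetical point structure type invented for illustration:

  (defstruct point nil
    x y)

  (mapcar .x (list (new point x 1 y 2)
                   (new point x 3 y 4)))
  ->  (1 3)

Here .x denotes (uref x): a function which, applied to an object, retrieves that object's x slot.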


8.2.7 Quote and Quasiquote


The quote character in front of an expression is used for suppressing evaluation, which is useful for forms that evaluate to something other than themselves. For instance if '(+ 2 2) is evaluated, the value is the three-element list (+ 2 2), whereas if (+ 2 2) is evaluated, the value is 4. Similarly, the value of 'a is the symbol a itself, whereas the value of a is the contents of the variable a.


The caret in front of an expression is a quasiquote. A quasiquote is like a quote, but with the possibility of substitution of material.

Under a quasiquote, form is considered to be a quasiquote template. The template is considered to be a literal structure, except that it may contain the notations ,expr and ,*expr which denote non-constant parts.

A quasiquote gets translated into code which, when evaluated, constructs the structure implied by qq-template, taking into account the unquotes and splices.

A quasiquote also processes nested quasiquotes specially.

If qq-template does not contain any unquotes or splices (which match its level of nesting), or is simply an atom, then ^qq-template is equivalent to 'qq-template. In other words, it is like an ordinary quote. For instance ^(a b ^(c ,d)) is equivalent to '(a b ^(c ,d)). Although there is an unquote ,d it belongs to the inner quasiquote ^(c ,d), and the outer quasiquote does not have any unquotes of its own, making it equivalent to a quote.

Dialect note: in Common Lisp and Scheme, ^form is written `form, and quasiquotes are also informally known as backquotes. In TXR, the backquote character ` is used for quasi string literals.


The comma character is used within a qq-template to denote an unquote. Whereas the quasiquote suppresses evaluation, similarly to the quote, the comma introduces an exception: an element of a form which is evaluated. For example, the value of ^(a b c ,(+ 2 2) (+ 2 2)) is the list (a b c 4 (+ 2 2)). Everything in the quasiquote stands for itself, except for the ,(+ 2 2) which is evaluated.

Note: if a variable is called *x*, then the syntax ,*x* means ,* x*: splice the value of x*. In this situation, to denote the unquoted value of *x*, whitespace between the comma and the variable name must be used: , *x*.


The comma-star operator is used within a quasiquote template to denote a splicing unquote. The form which follows ,* must evaluate to a list. That list is spliced into the structure which the quasiquote denotes. For example: ^(a b c ,*(list (+ 3 3) (+ 4 4)) d) evaluates to (a b c 6 8 d). The expression (list (+ 3 3) (+ 4 4)) is evaluated to produce the list (6 8), and this list is spliced into the quoted template.

Dialect Notes:

In other Lisp dialects, like Scheme and ANSI Common Lisp, the equivalent syntax is usually ,@ (comma at). The @ character already has an assigned meaning in TXR, so * is used.

However, * is also a character that may appear in a symbol name, which creates a potential for ambiguity. The syntax ,*abc denotes the application of the ,* splicing operator to the symbolic expression abc; to apply the ordinary non-splicing unquote to the symbol *abc, whitespace must be used: , *abc.

In TXR, the unquoting and splicing forms may freely appear outside of a quasiquote template. If they are evaluated as forms, however, they throw an exception:

   ,(+ 2 2) ;; error!

   ',(+ 2 2) --> ,(+ 2 2)

In other Lisp dialects, a comma not enclosed by backquote syntax is treated as a syntax error by the reader.


8.2.8 Quasiquoting non-List Objects

Quasiquoting is supported over hash table and vector literals (see Vectors and Hashes below). A hash table or vector literal can be quoted, like any object, for instance:

  '#(1 2 3)

The #(1 2 3) literal is turned into a vector atom right in the TXR parser, and this atom is being quoted: this is (quote atom) syntactically, which evaluates to atom.

When a vector containing no unquotes is quasiquoted, this is likewise a case of ^atom which evaluates to atom.

A vector can be quasiquoted, for example:

  ^#(1 2 3)

Of course, unquotes can occur within it.

  (let ((a 42))
    ^#(1 ,a 3)) ; value is #(1 42 3)

In this situation, the ^#(...) notation produces code which constructs a vector.

The vector in the following example is also a quasivector. It contains unquotes, and though the quasiquote is not directly applied to it, it is embedded in a quasiquote:

  (let ((a 42))
    ^(a b c #(d ,a))) ; value is (a b c #(d 42))

Hash table literals have two parts: the list of hash construction arguments and the key-value pairs. For instance:

   #H((:eql-based) (a 1) (b 2))

where (:eql-based) indicates that this hash table's keys are treated using eql equality, and (a 1) and (b 2) are the key/value entries. Hash literals may be quasiquoted. In quasiquoting, the arguments and pairs are treated as separate syntax; it is not one big list. So the following is not a possible way to express the above hash:

  ;; not supported: splicing across the entire syntax
  (let ((hash-syntax '((:eql-based) (a 1) (b 2))))
    ^#H(,*hash-syntax))

This is correct:

  ;; fine: splicing hash arguments and contents separately
  (let ((hash-args '(:eql-based))
        (hash-contents '((a 1) (b 2))))
    ^#H(,hash-args ,*hash-contents))


8.2.9 Quasiquoting combined with Quasiliterals

When a quasiliteral is embedded in a quasiquote, it is possible to use splicing to insert material into the quasiliteral.


  (eval (let ((a 3)) ^`abc @,a @{,a} @{(list 1 2 ,a)}`))

  -> "abc 3 3 1 2 3"


8.2.10 Vector Literals


A hash token followed by a list denotes a vector. For example #(1 2 a) is a three-element vector containing the numbers 1 and 2, and the symbol a.


8.2.11 Struct Literals

#S(name {slot value}*)

The notation #S followed by a nested list syntax denotes a struct literal. The first item in the syntax is a symbol denoting the struct type name. This must be the name of a struct type, otherwise the literal is erroneous. Following the struct type name are slot names interleaved with their values. The values are literal expressions, not subject to evaluation. Each slot name which is present in the literal must name a slot in the struct type, though not all slots in the struct type must be present in the literal.

When a struct literal is read, an instance of the denoted struct type is constructed, as if by a call to make-struct with an empty plist argument, followed by a sequence of assignments which store into each slot the corresponding value expression.
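For instance, given a hypothetical point structure type with slots x and y (invented for illustration), an instance may be written literally:

  (defstruct point nil
    x y)

  #S(point x 1 y 2)  ;; a point whose x slot holds 1 and y slot holds 2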


8.2.12 Hash Literals

#H((hash-argument*) (key value)*)

The notation #H followed by a nested list syntax denotes a hash table literal. The first item in the syntax is a list of keywords. These are the same keywords as are used when calling the function hash to construct a hash table. Allowed keywords are: :equal-based, :eql-based, :weak-keys, :weak-values, and :userdata. If the :userdata keyword is present, it must be followed by an object; that object specifies the hash table's user data, which can be retrieved using the hash-userdata function. The :equal-based and :eql-based keywords are mutually exclusive.

An empty argument list can be specified as nil or (); this specifies a hash table based on the eql function, with no weak semantics or user data.

The hash table key-value contents are specified as zero or more two-element lists, whose first element specifies the key and whose second specifies the value. Both expressions are literal objects, not subject to evaluation.
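For instance, a sketch using the gethash function to retrieve an entry from a hash literal:

  (gethash #H(() (a 1) (b 2)) 'b)  ->  2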


8.2.13 Range Literals

#R(from to)

The notation #R followed by a two-element list syntax denotes a range literal. It combines from and to expressions, themselves literals not subject to evaluation, producing the range object whose corresponding from and to fields are the objects denoted by these expressions.
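For instance, assuming the from and to accessor functions for range objects:

  (from #R(1 3))  ->  1
  (to #R(1 3))    ->  3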


8.2.14 Buffer Literals


The notation #b' introduces a buffer object: a data representation for a block of bytes. This #b' prefix must be followed by a data section and a closing quote. The data section consists of hexadecimal digits, among which may be interspersed whitespace: tabs, spaces and newlines. There must be an even number of digits, or else the notation is ill-formed. The whitespace is ignored, and pairs of successive hex digits specify bytes. If there are no hex digits, then a zero length buffer is specified.

Buffers may be constructed by the make-buf function, and other means such as the ffi-get function.
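For example:

  #b''          ;; a zero-length buffer
  #b'0102 ff'   ;; a three-byte buffer holding the bytes 01 02 FF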

Note that the #b prefix is also used for binary numbers. In that syntax, it is followed by an optional sign, and then a mixture of one or more of the digits 0 or 1.


8.2.15 The .. notation

In TXR Lisp, there is a special "dotdot" notation consisting of a pair of dots. This can be written between successive atoms or compound expressions, and is a shorthand for rcons.

That is to say, A .. B translates to (rcons A B), and so for instance (a b .. (c d) e .. f . g) means (a (rcons b (c d)) (rcons e f) . g).

The rcons function constructs a range object, which denotes a pair of values. Range objects are most commonly used for referencing subranges of sequences.

For instance, if L is a list, then [L 1 .. 3] computes a sublist of L consisting of elements 1 through 2 (counting from zero).

Note that if this notation is used in the dot position of an improper list, the transformation still applies. That is, the syntax (a . b .. c) is valid and produces the object (a . (rcons b c)) which is another way of writing (a rcons b c), which is quite probably nonsense.

The notation's .. operator associates right to left, so that a..b..c denotes (rcons a (rcons b c)).

Note that range objects are not printed using the dotdot notation. A range literal has the syntax of a two-element list, prefixed by #R. (See Range Literals above).

In any context where the dotdot notation may be used, and where it is evaluated to its value, a range literal may also be specified. If an evaluated dotdot notation specifies two constant expressions, then an equivalent range literal can replace it. For instance the form [L 1 .. 3] can also be written [L #R(1 3)]. The two are syntactically different, and so if these expressions are being considered for their syntax rather than value, they are not the same.


8.2.16 The DWIM Brackets

TXR Lisp has a square bracket notation. The syntax [...] is a shorthand way of writing (dwim ...). The [] syntax is useful for situations where the expressive style of a Lisp-1 dialect is useful.

For instance if foo is a variable which holds a function object, then [foo 3] can be used to call it, instead of (call foo 3). If foo is a vector, then [foo 3] retrieves the fourth element, like (vecref foo 3). Indexing over lists, strings and hash tables is possible, and the notation is assignable.

Furthermore, any arguments enclosed in [] which are symbols are treated according to a modified namespace lookup rule.

More details are given in the documentation for the dwim operator.
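A few illustrative applications of the bracket notation:

  [#(10 20 30) 2]  ->  30     ;; vector indexing
  ["abcd" 1]       ->  #\b    ;; string indexing
  [(fun +) 2 3]    ->  5      ;; calling a function object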


8.2.17 Compound Forms

In TXR Lisp, there are two types of compound forms: Lisp-2 style compound forms, denoted by ordinary lists that are expressed with parentheses; and Lisp-1 style compound forms, denoted by the DWIM brackets described in the previous section.

The first position of an ordinary Lisp-2 style compound form is expected to have a function or operator name. Then arguments follow. There may also be an expression in the dotted position, if the form is a function call.

If the form is a function call then the arguments are evaluated. If any of the arguments are symbols, they are treated according to Lisp-2 namespacing rules.

A function name may be a symbol, or else any of the syntactic forms given in the description of the function func-get-name.


8.2.18 Dot Position in Function Calls

If there is an expression in the dotted position of a function call expression, it is also evaluated, and the resulting value is involved in the function call in a special way.

Firstly, note that a compound form cannot be used in the dot position, for obvious reasons, namely that (a b c . (foo z)) does not mean that there is a compound form in the dot position, but denotes an alternate spelling for (a b c foo z), where foo behaves as a variable.

If the dot position of a compound form is an atom, then the behavior may be understood according to the following transformations:

  (f a b c ... . x)  -->  (apply (fun f) a b c ... x)
  [f a b c ... . x]  -->  [apply f a b c ... x]

In addition to atoms, meta-expressions and meta-variables can appear in the dot position, even though their underlying syntax is comprised of a compound expression. This appears to work according to a transformation pattern which superficially appears to be the same as that for atoms:

  (f a b c ... . @x)  -->  (apply (fun f) a b c ... @x)

However, in this situation, the @x is actually the form (sys:var x) and the dotted form is actually a proper list. The transformation is in fact taking place over a proper list, like this:

  (f a b c ... sys:var x)  -->  (apply (fun f) a b c ... (sys:var x))

That is to say, the TXR Lisp form expander reacts to the presence of a sys:var or sys:expr atom embedded in the form. That symbol and the items which follow it are wrapped in an additional level of nesting, converted into a single compound form element.

Effectively, in all these cases, the dot notation constitutes a shorthand for apply.


  ;; a contains 3
  ;; b contains 4
  ;; c contains #(5 6 7)
  ;; s contains "xyz"

  (foo a b . c)  ;; calls (foo 3 4 5 6 7)
  (foo a)        ;; calls (foo 3)
  (foo . s)      ;; calls (foo #\x #\y #\z)

  (list . a)     ;; yields 3
  (list a . b)   ;; yields (3 . 4)
  (list a . c)   ;; yields (3 5 6 7)
  (list* a c)    ;; yields (3 . #(5 6 7))

  (cons a . b)   ;; error: cons isn't variadic.
  (cons a b . c) ;; error: cons requires exactly two arguments.

  [foo a b . c]  ;; calls (foo 3 4 5 6 7)

  [c 1]          ;; indexes into vector #(5 6 7) to yield 6

  (call (op list 1 . @1) 2) ;; yields (1 . 2)

Note that the atom in the dot position of a function call may be a symbol macro. Since the semantics works as if by transformation to an apply form in which the original dot position atom is an ordinary argument, the symbol macro may produce a compound form.


  (symacrolet ((x 2))
    (list 1 . x))  ;; yields (1 . 2)

  (symacrolet ((x (list 1 2)))
    (list 1 . x))  ;; yields (1 1 2)

That is to say, the expansion of x is not substituted into the form (list 1 . x) but rather the transformation to apply syntax takes place first, and so the substitution of x takes place in a form resembling (apply (fun list) 1 x).

Dialect Note:

In some other Lisp dialects like ANSI Common Lisp, the improper list syntax may not be used as a function call; a function called apply (or similar) must be used for application even if the expression which gives the trailing arguments is a symbol. Moreover, applying sequences other than lists is not supported.


8.2.19 Improper Lists as Macro Calls

TXR Lisp allows macros to be called using forms which are improper lists. These forms are simply destructured by the usual macro parameter list destructuring. To be callable this way, the macro must have an argument list which specifies a parameter match in the dot position. This dot position must either match the terminating atom of the improper list form, or else match the trailing portion of the improper list form.

For instance if a macro mac is defined as

  (defmacro mac (a b . c) ...)

then it may not be invoked as (mac 1 . 2) because the required argument b is not satisfied, and so the 2 argument cannot match the dot position c as required. The macro may be called as (mac 1 2 . 3) in which case c receives the form 3. If it is called as (mac 1 2 3 . 4) then c receives the improper list form 3 . 4.
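For illustration, a sketch of such a macro, whose expansion simply lists its destructured arguments:

  (defmacro mac (a b . c)
    ^(list ',a ',b ',c))

  (mac 1 2 . 3)    ->  (1 2 3)
  (mac 1 2 3 . 4)  ->  (1 2 (3 . 4))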


8.2.20 Regular Expression Literals

In TXR Lisp, the / character can occur in symbol names, and the / token is a symbol. Therefore the /regex/ syntax is not used for denoting regular expressions; rather, the #/regex/ syntax is used.


8.2.21 Notation for Circular and Shared Structure

TXR Lisp supports a printed notation called circle notation which accurately articulates the representation of objects which contain shared substructures as well as circular references. The notation is supported as a means of input, and is also optionally produced as output, controlled by the *print-circle* variable.

Ordinarily, shared substructure in printed objects is not evident, except in the case of multiple occurrences of interned symbols, in whose semantics it is implicit that they refer to the same object. Other shared structure is printed as separate copies which look like distinct objects. For instance, the object produced by (let ((shared '(1 2))) (list shared shared)) is printed as ((1 2) (1 2)), where it is not clear that the two occurrences of (1 2) are actually the same object. Under the circle notation, this object can be represented as (#5=(1 2) #5#). The #5= part introduces a reference label, associating the arbitrarily chosen non-negative integer 5 with the object which follows. The subsequent notation #5# simply refers to the object labeled by 5, reproducing that object by reference. The result is a two-element list which has the same (1 2) in two places.

Circular structure presents a greater challenge to printing: namely, if it is printed by a naive recursive descent, it results in infinite output, and possibly stack exhaustion due to recursion. The circle notation detects and handles circular references. For instance, the object produced by (let ((c (list 1))) (rplacd c c)) produces a circular list which looks like an infinite list of 1's: (1 1 1 1 ...). This cannot be printed. However, under the circle notation, it can be represented as #1=(1 . #1#). The entire object itself is labeled by the integer 1. Then, enclosed within the syntax of that labeled object itself, a reference occurs to the label. This circular label reference represents the corresponding circular reference in the object.
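The notation can be observed on output by enabling *print-circle*; a sketch:

  (let ((*print-circle* t)
        (shared (list 1 2)))
    (prinl (list shared shared)))  ;; prints something like (#1=(1 2) #1#)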

A detailed description of the notational elements follows:

#digits= object

The #= syntax introduces an object label which denotes the object whose printed representation follows. The label is identified by the integer value of digits, which are one or more decimal digits. Note: the value zero is permitted, even though when the notation is produced by the TXR Lisp printer, labeling begins at 1. Negative values are not possible because a leading sign is not part of the syntax.

There may be no more than one definition for a given label within the syntactic scope being parsed, otherwise a syntax error occurs. In TXR pattern language code, an entire source file is parsed as one unit, and so scope for the circular notation's references is the entire source file. Files processed by @(include) have their own scope. The scope for labels in TXR Lisp source code is the top-level expression in which they appear. Consequently, references in one TXR Lisp top-level expression cannot reach definitions in another.


#digits#

The #digits# syntax denotes a label reference: the repetition of an object that was previously labeled by the integer given by digits. If no such label has been introduced in the syntactic scope, a syntax error occurs. An object was previously labeled by digits if a #= definition occurs in the same syntactic scope as the reference, and is applied to an object which either encloses the reference, or lexically precedes the reference. Forward references such as (#1# #1=(1 2)) are not supported.


Circular notation can span hash table literals. The syntax #1=#H((:eql-based) (#1# #1#)) denotes an eql-based hash table which contains one entry, in which that same table itself is both the key and value. This kind of circularity is not supported for equal-based hash tables. The analogous syntax #1=#H(() (#1# #1#)) produces a hash table in an inconsistent state.

Dialect note:

Circle notation is taken from Common Lisp, intended to be unsurprising to users familiar with that language. The implementation is based on descriptions in the ANSI Common Lisp document, judiciously taking into account the content of the X3J13 Cleanup Issues named PRINT-CIRCLE-STRUCTURE:USER-FUNCTIONS-WORK and PRINT-CIRCLE-SHARED:RESPECT-PRINT-CIRCLE.


8.2.22 Notation for Erasing Objects

#; expr

The notation #; indicates that the expression expr which follows is to be read and then discarded, as if it were replaced by whitespace.

This is useful for temporarily "commenting out" an expression.


Whereas it is valid for a TXR Lisp source file to be empty, it is a syntax error if a TXR Lisp source file contains nothing but one or more objects which are each suppressed by a preceding #;. In the interactive listener, an input line consisting of nothing but commented-out objects is similarly a syntax error.

The notation does not cascade; consecutive occurrences of #; trigger a syntax error.

The notation interacts with the circle notation. Firstly, if an object which is erased by #; contains circular-referencing instances of the label notation, those instances refer to nil. Secondly, commented-out objects may introduce labels which may be referenced by subsequent objects. An example of the first situation occurs in:


  #;(#1=(#1#))

Here the #1# label is a circular reference because it refers to an object which is a parent of the object which contains that reference. Such a reference is only satisfied by a "backpatching" process once the entire surrounding syntax is processed to the top level. The erasure perpetrated by #; causes the #1# label reference to be replaced by nil, and therefore the labeled object is the object (nil).

An example of the second situation is

  #;(#2=(a b c)) #2#

Here, even though the expression (#2=(a b c)) is suppressed, the label definition which it has introduced persists into the following object, where the label reference #2# resolves to (a b c).

A combination of the two situations occurs in

  #;(#1=(#1#)) #1#

which yields (nil). This is because the #1= label is available; but the earlier #1# reference, being a circular reference inside an erased object, had lapsed to nil.


8.3 Generalization of List Accessors

In ancient Lisp in the 1960's, it was not possible to apply the operations car and cdr to the nil symbol (empty list), because it is not a cons cell. In the InterLisp dialect, this restriction was lifted: these operations were extended to accept nil (and return nil). The convention was adopted in other Lisp dialects such as MacLisp and eventually in Common Lisp. Thus there exists an object which is not a cons, yet which takes car and cdr.

In TXR Lisp, this relaxation is extended further. For the sake of convenience, the operations car and cdr, are made to work with strings and vectors:

  (cdr "") -> nil
  (car "") -> nil

  (car "abc") -> #\a
  (cdr "abc") -> "bc"

  (cdr #(1 2 3)) -> #(2 3)
  (car #(1 2 3)) -> 1

Moreover, structure types which define the methods car, cdr and nullify can also be treated in the same way.

The ldiff function is also extended in a special way. When the right parameter is a non-list sequence, then it uses the equal equality test rather than eq for detecting the tail of the list.

  (ldiff "abcd" "cd") -> (#\a #\b)

The ldiff operation starts with "abcd" and repeatedly applies cdr to produce "bcd" and "cd", until the suffix is equal to the second argument: (equal "cd" "cd") yields true.

Operations based on car, cdr and ldiff, such as keep-if and remq, extend to strings and vectors.
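For instance, a sketch using keep-if with the chr-isdigit and evenp predicates:

  (keep-if chr-isdigit "a1b2c3")  ->  "123"
  (keep-if evenp #(1 2 3 4))      ->  #(2 4)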

Most derived list processing operations such as remq or mapcar obey the following rule: the returned object follows the type of the leftmost input list object. For instance, if one or more sequences are processed by mapcar, and the leftmost one is a character string, the function is expected to return characters, which are converted to a character string. However, in the event that the objects produced cannot be assembled into that type of sequence, a list is returned instead.

For example [mapcar list "ab" "12"] returns ((#\a #\b) (#\1 #\2)), because a string cannot hold lists of characters. However [mappend list "ab" "12"] returns "a1b2".

The lazy versions of these functions such as mapcar* do not have this behavior; they produce lazy lists.


8.4 Callable Objects

In TXR Lisp, sequences (strings, vectors and lists) as well as hashes and regular expressions can be used as functions everywhere, not just with the DWIM brackets.

Sequences work as one or two-argument functions. With a single argument, an element is selected by position and returned. With two arguments, a range is extracted and returned.

Moreover, when a sequence is used as a function of one argument, and the argument is a range object rather than an integer, then the call is equivalent to the two-argument form. This is the basis for array slice syntax like ["abc" 0..1].

Hashes also work as one or two argument functions, corresponding to the arguments of the gethash function.

A regular expression behaves as a one, two, or three argument function, which operates on a string argument. It returns the leftmost matching substring, or else nil.

Example 1:

  (mapcar "abc" '(2 0 1)) -> (#\c #\a #\b)

Here, mapcar treats the string "abc" as a function of one argument (since there is one list argument). This function maps the indices 0, 1 and 2 to the corresponding characters of string "abc". Through this function, the list of integer indices (2 0 1) is taken to the list of characters (#\c #\a #\b).

Example 2:

  (call '(1 2 3 4) 1..3) -> (2 3)

Here, the shorthand 1 .. 3 denotes (rcons 1 3). A range used as an argument to a sequence performs range extraction: taking a slice starting at index 1, up to and not including index 3, as if by the call (sub '(1 2 3 4) 1 3).

Example 3:

  (call '(1 2 3 4) '(0 2)) -> (1 2)

A list of indices applied to a sequence is equivalent to using the select function, as if (select '(1 2 3 4) '(0 2)) were called.

Example 4:

  (call #/b./ "abcd") -> "bc"

Here, the regular expression, called as a function, finds the matching substring "bc" within the argument "abcd".


8.5 Special Variables

Similarly to Common Lisp, TXR Lisp is lexically scoped by default, but also has dynamically scoped (a.k.a "special") variables.

When a variable is defined with defvar or defparm, a binding for the symbol is introduced in the global name space, regardless of in what scope the defvar form occurs.

Furthermore, at the time the defvar form is evaluated, the symbol which names the variable is tagged as special.

When a symbol is tagged as special, it behaves differently when it is used in a lexical binding construct like let, and all other such constructs such as function parameter lists. Such a binding is not the usual lexical binding, but a "rebinding" of the global variable. Over the dynamic scope of the form, the global variable takes on the value given to it by the rebinding. When the form terminates, the prior value of the variable is restored. (This is true no matter how the form terminates; even if by an exception.)

Because of this "pervasive special" behavior of a symbol that has been used as the name of a global variable, a good practice is to make global variables have visually distinct names via the "earmuffs" convention: beginning and ending the name with an asterisk.


  (defvar *x* 42)     ;; *x* has a value of 42

  (defun print-x ()
    (format t "~a\n" *x*))

  (let ((*x* "abc"))  ;; this overrides *x*
    (print-x))        ;; *x* is now "abc" and so that is printed

  (print-x)           ;; *x* is 42 again and so "42" is printed

Dialect Note 1:

The terms bind and binding are used differently in TXR Lisp compared to ANSI Common Lisp. In TXR Lisp, a binding is an association between a symbol and an abstract storage location. The association is registered in some namespace, such as the global namespace or a lexical scope. That storage location, in turn, contains a value. In ANSI Lisp, a binding of a dynamic variable is the association between the symbol and a value. It is possible for a dynamic variable to exist, and not have a value. A value can be assigned, which creates a binding. In TXR Lisp, an assignment is an operation which transfers a value into a binding, not one which creates a binding.

In ANSI Lisp, a dynamic variable can exist which has no value. Accessing the value signals a condition, but storing a value is permitted; doing so creates a binding. By contrast, in TXR Lisp a global variable cannot exist without a value. If a defvar form doesn't specify a value, and the variable doesn't exist, it is created with a value of nil.

Dialect Note 2:

Unlike ANSI Common Lisp, TXR Lisp has global lexical variables in addition to special variables. These are defined using defvarl and defparml. The only difference is that when variables are introduced by these macros, the symbols are not marked special, so their binding in lexical scopes is not altered to dynamic binding.
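For instance (the names lex and read-lex are illustrative), a global lexical introduced with defvarl is not dynamically rebound by a let:

  (defvarl lex 1)

  (defun read-lex () lex)

  (let ((lex 2))   ;; ordinary lexical binding; lex is not special
    (read-lex))    ;; -> 1: the function still sees the global lexical

Had lex been introduced with defvar instead, the let would rebind the global over its dynamic scope, and (read-lex) would yield 2.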

Many variables in TXR Lisp's standard library are global lexicals. Those which are special variables obey the "earmuffs" convention in their naming. For instance s-ifmt, log-emerg and sig-hup are global lexicals, because they provide constant values for which overriding doesn't make sense. On the other hand the standard output stream variable *stdout* is special. Overriding it over a dynamic scope is very useful.

Dialect Note 3:

In Common Lisp, defparm is known as defparameter.


8.6 Syntactic Places and Accessors

The TXR Lisp feature known as syntactic places allows programs to use the syntax of a form which is used to access a value from an environment or object, as an expression which denotes a place where a value may be stored.

They are almost exactly the same concept as "generalized references" in Common Lisp, and are related to "lvalues" in languages in the C family, or "designators" in Pascal.


8.6.1 Symbolic Places

A symbol is a syntactic place if it names a variable. If a is a variable, then it may be assigned using the set operator: the form (set a 42) causes a to have the integer value 42.


8.6.2 Compound Places

A compound expression can be a syntactic place, if its leftmost constituent is a symbol which is specially registered, and if the form has the correct syntax for that kind of place, and suitable semantics. Such an expression is a compound place.

An example of a compound place is a car form. If c is an expression denoting a cons cell, then (car c) is not only an expression which retrieves the value of the car field of the cell. It is also a syntactic place which denotes that field as a storage location. Consequently, the expression (set (car c) "abc") stores the character string "abc" in that location. Although the same effect can be obtained with (rplaca c "abc") the syntactic place frees the programmer from having to remember different update functions for different kinds of places. There are various other advantages. TXR Lisp provides a plethora of operators for modifying a place in addition to set. Subject to certain usage restrictions, these operators work uniformly on all places. For instance, the expression (rotate (car x) [str 3] y) causes three different kinds of places to exchange contents, while the three expressions denoting those places are evaluated only once. New kinds of place update macros like rotate are quite easily defined, as are new kinds of compound places.
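A minimal sketch of the car place in action:

  (let ((c (cons 1 2)))
    (set (car c) "abc")  ;; same effect as (rplaca c "abc")
    c)
  -> ("abc" . 2)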


8.6.3 Accessor Functions

When a function call form such as the above (car x) is a syntactic place, then the function is called an accessor. This term is used throughout this document to denote functions which have associated syntactic places.


8.6.4 Macro Call Syntactic Places

Syntactic places can be macros (global and lexical), including symbol macros. So for instance in (set x 42) the x place can actually be a symbolic macro which expands to, say, (cdr y). This means that the assignment is effectively (set (cdr y) 42).
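A sketch of this behavior using a lexical symbol macro (the names x and y are illustrative):

  (let ((y (cons 1 2)))
    (symacrolet ((x (car y)))
      (set x 42))        ;; effectively (set (car y) 42)
    y)
  -> (42 . 2)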


8.6.5 User-Defined Syntactic Places and Place Operators

Syntactic places, as well as operators upon syntactic places, are both open-ended. Code can be written quite easily in TXR Lisp to introduce new kinds of places, as well as new place-mutating operators. New places can be introduced with the help of the defplace macro, or possibly the define-place-macro macro in simple cases when a new syntactic place can be expressed as a transformation of the syntax of an existing place. Three ways exist for developing new place update macros (place operators). They can be written using the ordinary macro definer defmacro, with the help of special utility macros called with-update-expander, with-clobber-expander, and with-delete-expander. They can also be written using defmacro in conjunction with the operators placelet or placelet*. Simple update macros similar to inc and push can be written compactly using define-modify-macro.


8.6.6 Deletable Places

Unlike generalized references in Common Lisp, TXR Lisp syntactic places support the concept of deletion. Some kinds of places can be deleted, which is an action distinct from (but does not preclude) being overwritten with a value. What exactly it means for a place to be deleted, or whether that is even permitted, depends on the kind of place. For instance a place which denotes a lexical variable may not be deleted, whereas a global variable may be. A place which denotes a hash table entry may be deleted, and results in the entry being removed from the hash table. Deleting a place in a list causes the trailing items, if any, or else the terminating atom, to move in to close the gap. Users may, of course, define new kinds of places which support deletion semantics.
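For instance, deleting a hash-table entry place with the del operator (described below) yields the stored value and removes the entry:

  (let ((h (hash)))
    (set [h 'a] 1)
    (list (del [h 'a])     ;; returns the prior value 1
          (gethash h 'a))) ;; the entry is gone; nil is returned
  -> (1 nil)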


8.6.7 Evaluation of Places

To bring about their effect, place operators must evaluate one or more places. Moreover, some of them evaluate additional forms which are not places. Which arguments of a place operator form are places and which are ordinary forms depends on its specific syntax. For all the built-in place operators, the position of an argument in the syntax determines whether it is treated as (and consequently required to be) a syntactic place, or whether it is an ordinary form.

All built-in place operators perform the evaluation of place and non-place argument forms in strict left to right order.

Place forms are evaluated not in order to compute a value, but in order to determine the storage location. In addition to determining a storage location, the evaluation of a place form may possibly give rise to side effects. Once a place is fully evaluated, the storage location can then be accessed. Access to the storage location is not considered part of the evaluation of a place. To determine a storage location means to compute some hidden referential object which provides subsequent access to that location without the need for a re-evaluation of the original place form. (The subsequent access to the place through this referential object may still require a multi-step traversal of a data structure; minimizing such steps is a matter of optimization.)

Place forms may themselves be compounds, which contain subexpressions that must be evaluated. All such evaluation for the built-in places takes place in left to right order.

Certain place operators, such as shift and rotate, exhibit an unspecified behavior with regard to the timing of the access of the prior value of a place, relative to the evaluation of places which occur later in the same place operator form. Access to the prior values may be delayed until the entire form is evaluated, or it may be interleaved into the evaluation of the form. For example, in the form (shift a b c 1), the prior value of a can be accessed and saved as soon as a is evaluated, prior to the evaluation of b. Alternatively, a may be accessed and saved later, after the evaluation of b or after the evaluation of all the forms. This issue affects the behavior of place-modifying forms whose subforms contain side effects. It is recommended that such forms not be used in programs.


8.6.8 Nested Places

Certain place forms are required to have one or more arguments which are themselves places. The prime example of this, and the only example from among the built-in syntactic places, is the DWIM form. A DWIM form has the syntax

  (dwim obj-place index [alt])

and of course the square-bracket-notation equivalent:

  [obj-place index [alt]]

Note that not only is the entire form a place, denoting some element or element range of obj-place, but there is the added constraint that obj-place must also itself be a syntactic place.

This requirement is necessary, because it supports the behavior that when the element or element range is updated, then obj-place is also potentially updated.

After the assignment (set [obj 0..3] '("forty" "two")) not only is the range of places denoted by [obj 0..3] replaced by the list of strings ("forty" "two") but obj may also be overwritten with a new value.

This behavior is necessary because the DWIM brackets notation maintains the illusion of an encapsulated array-like container over several dissimilar types, including Lisp lists. But Lisp lists do not behave as fully encapsulated containers. Some mutations on Lisp lists return new objects, which then have to be stored (or otherwise accepted) in place of the original objects in order to maintain the array-like container illusion.
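A sketch of this behavior:

  (let ((obj (list 1 2 3 4)))
    (set [obj 0..3] '("forty" "two"))
    obj)                   ;; obj itself has been updated
  -> ("forty" "two" 4)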


8.6.9 Built-In Syntactic Places

The following is a summary of the built-in place forms, in addition to symbolic places denoting variables. Of course, new syntactic place forms can be defined by TXR programs.

(last object [num])
(butlast object [num])
(nthcdr index list)
(nthlast index list)
(butlastn num list)
(ref seq idx)
(sub sequence [from [to]])
(vecref vec idx)
(chr-str str idx)
(gethash hash key [alt])
(dwim obj-place index [alt])
[obj-place index [alt]] ;; equivalent to dwim
(slot struct-obj slot-name-valued-form)
(qref struct-obj slot-name) ;; by macro-expansion to (slot ...)
struct-obj.slot-name ;; equivalent to qref


8.6.10 Built-In Place-Mutating Operators

The following is a summary of the built-in place mutating macros. They are described in detail in their own sections.

(set {place new-value}*)
Assigns the values of expressions to places, performing assignments in left to right order, returning the value assigned to the rightmost place.

(pset {place new-value}*)
Assigns the values of expressions to places, performing the determination of places and evaluation of the expressions left to right, but the assignment in parallel. Returns the value assigned to the rightmost place.

(zap place [new-value])
Assigns new-value to place, defaulting to nil, and returns the prior value.

(flip place)
Logically toggles the Boolean value of place, and returns the new value.

(test-set place)
If place contains nil, stores t into the place and returns t to indicate that the store took place. Otherwise does nothing and returns nil.

(test-clear place)
If place contains a Boolean true value, stores nil into the place and returns t to indicate that the store took place. Otherwise does nothing and returns nil.

(compare-swap place cmp-fun cmp-val store-val)
Examines the value of place and compares it to cmp-val using the comparison function given by the function name cmp-fun. If the comparison is false, returns nil. Otherwise, stores the store-val value into place and returns t.

(inc place [delta])
Increments place by delta, which defaults to 1, and returns the new value.

(dec place [delta])
Decrements place by delta, which defaults to 1, and returns the new value.

(pinc place [delta])
Increments place by delta, which defaults to 1, and returns the old value.

(pdec place [delta])
Decrements place by delta, which defaults to 1, and returns the old value.

(test-inc place [delta [from-val]])
Increments place by delta and returns t if the previous value was eql to from-val, where delta defaults to 1 and from-val defaults to zero.

(test-dec place [delta [to-val]])
Decrements place by delta and returns t if the new value is eql to to-val, where delta defaults to 1 and to-val defaults to 0.

(swap left-place right-place)
Exchanges the values of left-place and right-place.

(push item place)
Pushes item into the list stored in place and returns item.

(pop place)
Pops the list stored in place and returns the popped value.

(shift place+ shift-in-value)
Treats one or more places as a "multi-place shift register". Values are shifted to the left among the places. The rightmost place receives shift-in-value, and the value of the leftmost place emerges as the return value.

(rotate place*)
Treats zero or more places as a "multi-place rotate register". The places exchange values among themselves, by a rotation by one place to the left. The value of the leftmost place goes to the rightmost place, and that value is returned.

(del place)
Deletes a place which supports deletion, and returns the value which existed in that place prior to deletion.

(lset {place}+ list-expr)
Sets multiple places to values obtained from successive elements of the list produced by evaluating list-expr.

(upd place opip-arg*)
Applies an opip-style operational pipeline to the value of place and stores the result back into place.
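The following sketch exercises a few of these operators on lexical variables:

  (let ((a 1) (b 2) (c 3))
    (inc a 10)                     ;; a is now 11
    (swap b c)                     ;; b is 3, c is 2
    (list (shift a b c 0) a b c))  ;; shift returns the old a
  -> (11 3 2 0)

  (let ((stack nil))
    (push 1 stack)
    (push 2 stack)   ;; stack is now (2 1)
    (pop stack))
  -> 2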


8.7 Namespaces and Environments

TXR Lisp is a Lisp-2 dialect: it features separate namespaces for functions and variables.


8.7.1 Global Functions and Operator Macros

In TXR Lisp, global functions and operator macros co-exist, meaning that the same symbol can be defined as both a macro and a function.

There is a global namespace for functions, into which functions can be introduced with the defun macro. The global function environment can be inspected and modified using the symbol-function accessor.

There is a global namespace for macros, into which macros are introduced with the defmacro macro. The global macro environment can be inspected and modified using the symbol-macro accessor.

If a name x is defined as both a function and a macro, then an expression of the form (x ...) is expanded by the macro, whereas an expression of the form [x ...] refers to the function. Moreover, the macro can produce a call to the function. The expression (fun x) will retrieve the function object.
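For instance (the name add is illustrative), a macro and a function may share a name:

  (defun add (a b) (+ a b))
  (defmacro add (a b) ^(+ ,a ,b))

  (add 1 2)   ;; macro call; expands to (+ 1 2) -> 3
  [add 3 4]   ;; function call -> 7
  (fun add)   ;; retrieves the function object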


8.7.2 Global and Dynamic Variables

There is a global namespace for variables also. The operators defvar and defparm introduce bindings into this namespace. These operators have the side effect of marking a symbol as a special variable, so that bindings of the symbol are treated as dynamic variables, subject to rebinding. The global variable namespace together with the special dynamic rebinding is called the dynamic environment. The dynamic environment can be inspected and modified using the symbol-value accessor.

The operators defvarl and defparml introduce bindings into the global namespace without marking symbols as special variables. Such bindings are called global lexical variables.


8.7.3 Global Symbol Macros

Symbol macros may be defined over the global variable namespace using defsymacro.

Note that whereas a symbol may simultaneously have both a function and macro binding in the global namespace, a symbol may not simultaneously have a variable and symbol macro binding.


8.7.4 Lexical Environments

In addition to global and dynamic namespaces, TXR Lisp provides lexically scoped bindings for functions, variables, macros, and symbol macros. Lexical variable bindings are introduced with let, let* or various binding macros derived from these. Lexical functions are bound with flet and labels. Lexical macros are established with macrolet, and lexical symbol macros with symacrolet.

Macros receive an environment parameter with which they may expand forms in their correct environment, and perform some limited introspection over that environment in order to determine the nature of bindings, or the classification of forms in those environments. This introspection is provided by lexical-var-p, lexical-fun-p, and lexical-lisp1-binding.

Lexical operator macros and lexical functions can also co-exist in the following way. A lexical function shadows a global or lexical macro completely. However, the reverse is not the case. A lexical macro shadows only those uses of a function which look like macro calls. This is succinctly demonstrated by the following form:

  (flet ((foo () 43))
    (macrolet ((foo () 44))
      (list (fun foo) (foo) [foo])))

  -> (#<interpreted fun: lambda nil> 44 43)

The (fun foo) and [foo] expressions are oblivious to the macro; the macro-expansion process doesn't treat the symbol foo as a macro call in those contexts. However, the form (foo) is subject to macro expansion and is replaced with 44.

If the flet and macrolet are reversed, the behavior is different:

  (macrolet ((foo () 44))
    (flet ((foo () 43))
      (list (fun foo) (foo) [foo])))

  -> (#<interpreted fun: lambda nil> 43 43)

All three forms refer to the function, which lexically shadows the macro.


8.7.5 Pattern Language and Lisp Scope Nesting

TXR Lisp expressions can be embedded in the TXR pattern language in various ways. Likewise, the pattern language can be invoked from TXR Lisp. This brings about the possibility that Lisp code attempts to access pattern variables bound in the pattern language. The TXR pattern language can also attempt to access TXR Lisp variables.

The rules are as follows, but they have undergone historic changes. See the COMPATIBILITY section, in particular notes under 138 and 121, and also 124.

A Lisp expression evaluated from the TXR pattern language executes in a null lexical environment. The current set of pattern variables captured up to that point by the pattern language are installed as dynamic variables. They shadow any Lisp global variables (whether those are defined by defvar or defvarl).

In the reverse direction, a variable reference from the TXR pattern language searches the pattern variable space first. If a variable doesn't exist there, then the lookup refers to the TXR Lisp global variable space. The pattern language doesn't see Lisp lexical variables.

When Lisp code is evaluated from the pattern language, the pattern variable bindings are not only installed as dynamic variables for the sake of their visibility from Lisp, but they are also specially stored in a dynamic environment frame. When TXR pattern code is re-entered from Lisp, these bindings are picked up from the closest such environment frame, allowing the nested invocation of pattern code to continue with the bindings captured by outer pattern code.

Concisely, in any context in which a symbol has both a binding as a Lisp global variable as well as a pattern variable, that symbol refers to the pattern variable. Pattern variables are propagated through Lisp evaluation into nested invocations of the pattern language.

The pattern language can also reference Lisp variables using the @ prefix, which is a consequence of that prefix introducing an expression that is evaluated as Lisp, the name of a variable being such an expression.




9.1 Conventions

The following sections list all of the special operators, macros and functions in TXR Lisp.

In these sections, syntax is indicated using these conventions:

A symbol in fixed-width-italic font denotes some syntactic unit: it may be a symbol or compound form. The syntactic unit is explained in the corresponding Description section.

{syntax}* word*
This indicates a repetition of zero or more of the syntax enclosed in the braces, or of the given syntactic unit. The curly braces may be omitted if the scope of the * is clear.

{syntax}+ word+
This indicates a repetition of one or more of the syntax enclosed in the braces, or of the given syntactic unit. The curly braces may be omitted if the scope of the + is clear.

{syntax | syntax | ...}
This indicates a choice among alternatives. May be combined with + or * repetition.

[syntax] [word]
Square brackets indicate optional syntax.

'[' ']'
The quoted square brackets indicate literal brackets which appear in the syntax (without the quotes). For instance, '['foo [ bar ]']' is a pattern which denotes the two possible expressions [foo] and [foo bar].

syntax -> result
The arrow notation is used in examples to indicate that the evaluation of the given syntax produces a value, whose printed representation is result.


9.2 Form Evaluation

A compound expression with a symbol as its first element, if intended to be evaluated, denotes either an operator invocation or a function call. This depends on whether the symbol names an operator or a function.

When the form is an operator invocation, the interpretation of the meaning of that form is under the complete control of that operator.

If the compound form is a function call, the remaining forms, if any, denote argument expressions to the function. They are evaluated in left to right order to produce the argument values, which are passed to the function. An exception is thrown if there are not enough arguments, or too many. Programs can define named functions with the defun operator.

Some operators are macros. There exist predefined macros in the library, and macro operators can also be user-defined using the macro-defining operator defmacro. Operators that are not macros are called special operators.

Macro operators work as functions which are given the source code of the form. They analyze the form, and translate it to another form which is substituted in their place. This happens during a code walking phase called the expansion phase, which is applied to each top-level expression prior to evaluation. All macros occurring in a form are expanded in the expansion phase, and subsequent evaluation takes place on a structure which is devoid of macros. All that remains are the executable forms of special operators, function calls, symbols denoting either variables or themselves, and atoms such as numeric and string literals.
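This translation can be observed directly with the macroexpand function. A sketch, using an illustrative macro twice:

  (defmacro twice (x) ^(+ ,x ,x))

  (macroexpand '(twice 3)) -> (+ 3 3)
  (twice 3) -> 6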

Special operators can also perform code transformations during the expansion phase, but that is not considered macroexpansion; rather, it is an adjustment of the representation of the operator into a required executable form. In effect, it is a post-macro compilation phase.

Note that Lisp forms occurring in the TXR pattern language are not individual top-level forms. Rather, the entire TXR query is parsed at the same time, and the macros occurring in its Lisp forms are expanded at that time.


9.2.1 Operator quote

  (quote form)

The quote operator, when evaluated, suppresses the evaluation of form, and instead returns form itself as an object. For example, if form is a symbol, then form is not evaluated to the symbol's value; rather the symbol itself is returned.

Note: the quote syntax '<form> is translated to (quote form).


  ;; yields symbol a itself, not value of variable a
  (quote a) -> a

  ;; yields three-element list (+ 2 2), not 4.
  (quote (+ 2 2)) -> (+ 2 2)


9.3 Variable Binding

Variables are associations between symbols and storage locations which hold values. These associations are called bindings.

Bindings are held in a context called an environment.

Lexical environments hold local variables, and nest according to the syntactic structure of the program. Lexical bindings are always introduced by some form known as a binding construct, and the corresponding environment is instantiated during the evaluation of that construct. There also exist bindings outside of any binding construct, in the so-called global environment. Bindings in the global environment can be temporarily shadowed by lexically-established bindings in the dynamic environment. See the Special Variables section above.

Certain special symbols cannot be used as variable names, namely the symbols t and nil, and all of the keyword symbols (symbols in the keyword package), which are denoted by a leading colon. When any of these symbols is evaluated as a form, the resulting value is that symbol itself. It is said that these special symbols are self-evaluating or self-quoting, similarly to all other atom objects such as numbers or strings.

When a form consisting of a symbol, other than the above special symbols, is evaluated, it is treated as a variable, and yields the value of the variable's storage location. If the variable doesn't exist, an exception is thrown.

Note: symbol forms may also denote invocations of symbol macros. (See the operators defsymacro and symacrolet). All macros, including symbol macros, which occur inside a form are fully expanded prior to the evaluation of a form, therefore evaluation does not consider the possibility of a symbol being a symbol macro.


9.3.1 Operator defvar and macro defparm


  (defvar sym [value])
  (defparm sym value)


The defvar operator binds a name in the variable namespace of the global environment. Binding a name means creating a binding: recording, in some namespace of some environment, an association between a name and some named entity. In the case of a variable binding, that entity is a storage location for a value. The value of a variable is that which has most recently been written into the storage location, and is also said to be a value of the binding, or stored in the binding.

If the variable named sym already exists in the global environment, the form has no effect; the value form is not evaluated, and the value of the variable is unchanged.

If the variable does not exist, then a new binding is introduced, with a value given by evaluating the value form. If the form is absent, the variable is initialized to nil.

The value form is evaluated in the environment in which the defvar form occurs, not necessarily in the global environment.

The symbols t and nil may not be used as variables, and neither can be keyword symbols: symbols denoted by a leading colon.

In addition to creating a binding, the defvar operator also marks sym as the name of a special variable. This changes what it means to bind that symbol in a lexical binding construct such as the let operator, or a function parameter list. See the section "Special Variables" far above.

The defparm macro behaves like defvar when a variable named sym doesn't already exist.

If sym already denotes a variable binding in the global namespace, defparm evaluates the value form and assigns the resulting value to the variable.

The following equivalence holds:

  (defparm x y)  <-->  (prog1 (defvar x) (set x y))

The defvar and defparm forms return sym.


9.3.2 Macros defvarl and defparml


  (defvarl sym [value])
  (defparml sym value)


The defvarl and defparml macros behave, respectively, almost exactly like defvar and defparm.

The difference is that these operators do not mark sym as special.

If a global variable sym does not previously exist, then after the evaluation of either of these forms (boundp sym) is true, but (special-var-p sym) isn't.

If sym had been already introduced as a special variable, it stays that way after the evaluation of defvarl or defparml.
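For instance (counter is an illustrative name):

  (defvarl counter 0)

  (boundp 'counter)        -> t
  (special-var-p 'counter) -> nil

  (defvar *level* 0)
  (special-var-p '*level*) -> t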


9.3.3 Operators let and let*


  (let ({sym | (sym init-form)}*) body-form*)
  (let* ({sym | (sym init-form)}*) body-form*)


The let and let* operators introduce a new scope with variables and evaluate forms in that scope. The operator symbol, either let or let*, is followed by a list which can contain any mixture of variable name symbols and (sym init-form) pairs. A plain symbol denotes the name of a variable to be instantiated and initialized to the value nil. A symbol specified together with an init-form denotes a variable which is initialized from the value of the init-form.

The symbols t and nil may not be used as variables, and neither can be keyword symbols: symbols denoted by a leading colon.

The difference between let and let* is that in let*, later init-form-s have visibility over the variables established earlier in the same let* construct. In plain let, the variables are not visible to any of the init-form-s.

When the variables are established, then the body-form-s are evaluated in order. The value of the last body-form becomes the return value of the let.

If there are no body-form-s, then the return value nil is produced.

The list of variables may be empty.


  (let ((a 1) (b 2)) (list a b)) -> (1 2)
  (let* ((a 1) (b (+ a 1))) (list a b (+ a b))) -> (1 2 3)
  (let ()) -> nil
  (let (:a nil)) -> error, :a and nil can't be used as variables


9.4 Functions


9.4.1 Operator defun


  (defun name (param* [: opt-param*] [. rest-param]) body-form*)


The defun operator introduces a new function in the global function namespace. The function is similar to a lambda, and has the same parameter syntax and semantics as the lambda operator.

Note that the above syntax synopsis describes only the canonical parameter syntax which remains after parameter list macros are expanded. See the section Parameter List Macros.

Unlike in lambda, the body-form-s of a defun are surrounded by a block. The name of this block is the same as the name of the function, making it possible to terminate the function and return a value using (return-from name value). For more information, see the definition of the block operator.

A function may call itself by name, allowing for recursion.

The special symbols t and nil may not be used as function names. Neither can keyword symbols.

It is possible to define methods as well as macros with defun, as an alternative to the defmeth and defmacro forms.

To define a method, the syntax (meth type name) should be used as the argument to the name parameter. This gives rise to the syntax (defun (meth type name) args form*) which is equivalent to the (defmeth type name args form*) syntax.

Macros can be defined using (macro name) as the name parameter of defun. This way of defining a macro doesn't support destructuring; it defines the expander as an ordinary function with an ordinary argument list. To work, the function must accept two arguments: the entire macro call form that is to be expanded, and the macro environment. Thus, the macro definition syntax is (defun (macro name) (form env) form*), which is equivalent to the (defmacro name (:form form :env env) form*) syntax.

Dialect Note:

In ANSI Common Lisp, keywords may be used as function names. In TXR Lisp, they may not.

Dialect Note:

A function defined by defun may co-exist with a macro defined by defmacro. This is not permitted in ANSI Common Lisp.


9.4.2 Operator lambda


  (lambda (param* [: opt-param*] [. rest-param]) body-form*)


The lambda operator produces a value which is a function. Like in most other Lisps, functions are objects in TXR Lisp. They can be passed to functions as arguments, returned from functions, aggregated into lists, stored in variables, et cetera.

Note that the above syntax synopsis describes only the canonical parameter syntax which remains after parameter list macros are expanded. See the section Parameter List Macros.

The first argument of lambda is the list of parameters for the function. It may be empty, and it may also be an improper list (dot notation) where the terminating atom is a symbol other than nil. It can also be a single symbol.

The second and subsequent arguments are the forms making up the function body. The body may be empty.

When a function is called, the parameters are instantiated as variables that are visible to the body forms. The variables are initialized from the values of the argument expressions appearing in the function call.

The dotted notation can be used to write a function that accepts a variable number of arguments. There are two ways to write a function that accepts only a variable argument list and no required arguments:

  (lambda (. rest-param) ...)
  (lambda rest-param ...)

(These notations are syntactically equivalent because the list notation (. X) actually denotes the object X which isn't wrapped in any list).

The keyword symbol : (colon) can appear in the parameter list. This is the symbol in the keyword package whose name is the empty string. This symbol is treated specially: it serves as a separator between required parameters and optional parameters. Furthermore, the : symbol has a role to play in function calls: it can be specified as an argument value to an optional parameter by which the caller indicates that the optional argument is not being specified. It will be processed exactly that way.

An optional parameter can also be written in the form (name expr [sym]). In this situation, if the call does not specify a value for the parameter (or specifies a value as the keyword : (colon)) then the parameter takes on the value of the expression expr. If sym is specified, then sym will be introduced as an additional binding with a Boolean value which indicates whether or not the optional parameter had been specified by the caller.
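For instance, a sketch using an invented indicator variable b-p:

  [(lambda (a : (b 10 b-p)) (list a b b-p)) 1]    -> (1 10 nil)
  [(lambda (a : (b 10 b-p)) (list a b b-p)) 1 2]  -> (1 2 t)
  [(lambda (a : (b 10 b-p)) (list a b b-p)) 1 :]  -> (1 10 nil)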

The initializer expressions are evaluated in an environment in which all of the previous parameters are visible, in addition to the surrounding environment of the lambda. For instance:

  (let ((default 0))
    (lambda (str : (end (length str)) (counter default))
      (list str end counter)))

In this lambda, the initializing expression for the optional parameter end is (length str), and the str variable it refers to is the previous argument. The initializer for the optional variable counter is the expression default, and it refers to the binding established by the surrounding let. This reference is captured as part of the lambda's lexical closure.


Counting function:
This function, which takes no arguments, captures the variable counter. Whenever this object is called, it increments counter by 1 and returns the incremented value.

  (let ((counter 0))
    (lambda () (inc counter)))

Function that takes two or more arguments:
The third and subsequent arguments are aggregated into a list passed as the single parameter z:

  (lambda (x y . z) (list 'my-arguments-are x y z))

Variadic function:

  (lambda args (list 'my-list-of-arguments args))

Optional arguments:

  [(lambda (x : y) (list x y)) 1] -> (1 nil)
  [(lambda (x : y) (list x y)) 1 2] -> (1 2)


9.4.3 Macros flet and labels


  (flet ({(name param-list function-body-form*)}*)
    body-form*)

  (labels ({(name param-list function-body-form*)}*)
    body-form*)


The flet and labels macros bind local, named functions in the lexical scope.

Note that the above syntax synopsis describes only the canonical parameter syntax which remains after parameter list macros are expanded. See the section Parameter List Macros.

The difference between flet and labels is that a function defined by labels can see itself, and therefore recurse directly by name. Moreover, if multiple functions are defined by the same labels construct, they all have each other's names in scope of their bodies. By contrast, a flet-defined function does not have itself in scope and cannot recurse. Multiple functions in the same flet do not have each other's names in their scopes.
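The difference can be sketched with a deliberately wrong global definition; the name fact is invented for illustration:

  (defun fact (n) 1)  ;; wrong on purpose

  ;; flet: the inner call (fact (- n 1)) goes to the global fact
  (flet ((fact (n) (if (zerop n) 1 (* n (fact (- n 1))))))
    (fact 3))  -> 3

  ;; labels: the inner call recurses into the local fact
  (labels ((fact (n) (if (zerop n) 1 (* n (fact (- n 1))))))
    (fact 3))  -> 6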

More formally, the function-body-form-s and param-list of the functions defined by labels are in a scope in which all of the function names being defined by that same labels construct are visible.

Under both labels and flet, the local functions that are defined are lexically visible to the main body-form-s.

Note that labels and flet are properly scoped with regard to macros. During macro expansion, function bindings introduced by these macro operators shadow macros defined by macrolet and defmacro.

Furthermore, function bindings introduced by labels and flet also shadow symbol macros defined by symacrolet, when those symbol macros occur as arguments of a dwim form.

See also: the macrolet operator.

Dialect Note:

The flet and labels macros do not establish named blocks around the body forms of the local functions which they bind. This differs from ANSI Common Lisp, whose local functions have implicit named blocks, allowing return-from to be used.


  ;; Wastefully slow algorithm for determining evenness.
  ;; Note:
  ;; - mutual recursion between labels-defined functions
  ;; - inner is-even bound by labels shadows the outer
  ;;   one bound by defun so the (is-even n) call goes
  ;;   to the local function.

  (defun is-even (n)
   (labels ((is-even (n)
              (if (zerop n) t (is-odd (- n 1))))
            (is-odd (n)
              (if (zerop n) nil (is-even (- n 1)))))
     (is-even n)))


9.4.4 Function call


  (call function argument*)


The call function invokes function, passing it the given arguments, if any.


Apply arguments 1 2 to a lambda which adds them to produce 3:

  (call (lambda (a b) (+ a b)) 1 2)

Useless use of call on a named function; equivalent to (list 1 2):

  (call (fun list) 1 2)


9.4.5 Operator fun


  (fun function-name)

The fun operator retrieves the function object corresponding to a named function in the current lexical environment.

The function-name may be a symbol denoting a named function: a built-in function, or one defined by defun.

The function-name may also take any of the forms specified in the description of the func-get-name function. If such a function-name refers to a function which exists, then the fun operator yields that function.

Note: the fun operator does not see macro bindings via their symbolic names with which they are defined by defmacro. However, the name syntax (macro name) may be used to refer to macros. This syntax is documented in the description of func-get-name. It is also possible to retrieve a global macro expander using the function symbol-macro.


9.4.6 Operator dwim


  (dwim argument*)
  '['argument*']'

  (set (dwim obj-place index [alt]) new-value)
  (set '['obj-place index [alt]']' new-value)


The dwim operator's name is an acronym: DWIM may be taken to mean "Do What I Mean", or alternatively, "Dispatch, in a Way that is Intelligent and Meaningful".

The notation [...] is a shorthand which denotes (dwim ...).

Note that since the [ and ] are used in this document for indicating optional syntax, in the above Syntax synopsis the quoted notation '[' and ']' denotes bracket tokens which literally appear in the syntax.

The dwim operator takes a variable number of arguments, which are treated as expressions to be individually macro-expanded and evaluated, using the same rules.

This means that the first argument isn't a function name, but an ordinary expression which can simply compute a function object (or, more generally, a callable object).

Furthermore, for those arguments of dwim which are symbols (after all macro-expansion is performed), the evaluation rules are altered. For the purposes of resolving symbols to values, the function and variable binding namespaces are considered to be merged into a single space, creating a situation that is very similar to a Lisp-1 style dialect.

This special Lisp-1 evaluation is not recursively applied. All arguments of dwim which, after macro expansion, are not symbols are evaluated using the normal Lisp-2 evaluation rules. Thus, the DWIM operator must be used in every expression where the Lisp-1 rules for reducing symbols to values are desired.

If a symbol has bindings both in the variable and function namespace in scope, and is referenced by a dwim argument, this constitutes a conflict which is resolved according to two rules. When nested scopes are concerned, then an inner binding shadows an outer binding, regardless of their kind. An inner variable binding for a symbol shadows an outer or global function binding, and vice versa.

If a symbol is bound to both a function and variable in the global namespace, then the variable binding is favored.
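As an illustrative sketch: if a global variable named list is defined, a dwim form resolves the symbol to the variable, while an ordinary form still resolves it to the function:

  (defvar list '(10 20 30))

  [list 1] -> 20    ;; variable binding favored: element indexing
  (list 1) -> (1)   ;; ordinary form: function call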

Macros do not participate in the special scope conflation, with one exception. What this means is that the space of symbol macros is not folded together with the space of operator macros. An argument of dwim that is a symbol might be a symbol macro, variable or function, but it cannot be interpreted as the name of an operator macro.

The exception is this: from the perspective of a dwim form, function bindings can shadow symbol macros. If a function binding is defined in an inner scope relative to a symbol macro for the same symbol, using flet or labels, the function hides the symbol macro. In other words, when macro expansion processes an argument of a dwim form, and that argument is a symbol, it is treated specially in order to provide a consistent name lookup behavior. If the innermost binding for that symbol is a function binding, it refers to that function binding, even if a more outer symbol macro binding exists, and so the symbol is not expanded using the symbol macro. By contrast, in an ordinary form, a symbolic argument never resolves to a function binding. The symbol refers to either a symbol macro or a variable, whichever is nested closer.

If, after macro expansion, the leftmost argument of the dwim is the name of a special operator or macro, the dwim form doesn't denote an invocation of that operator or macro. A dwim form is an invocation of the dwim operator, and the leftmost argument of that operator, if it is a symbol, is treated as a binding to be resolved in the variable or function namespace, like any other argument. Thus [if x y] is an invocation of the if function, not the if operator.

How many arguments are required by the dwim operator depends on the type of object to which the first argument expression evaluates. The possibilities are:

[function argument*]
Call the given function object with the given arguments.

[symbol argument*]
If the first expression evaluates to a symbol, that symbol is resolved in the function namespace, and then the resulting function, if found, is called with the given arguments.

[sequence index]
Retrieve an element from sequence, at the specified index, which is a nonnegative integer.

This form is also a syntactic place. If a value is stored to this place, it replaces the element.

The place may also be deleted, which has the effect of removing the element from the sequence, shifting the elements at higher indices, if any, down one element position, and shortening the sequence by one. If the place is deleted, and if sequence is a list, then the sequence form itself must be a place.

[sequence range]
Retrieve the specified range of elements. The range of elements is specified in the from and to fields of a range object. The .. (dotdot) syntactic sugar denotes its construction via the rcons function. See the section on Range Indexing below.

This form is also a syntactic place. Storing a value in this place has the effect of replacing the subsequence with a new subsequence. Deleting the place has the effect of removing the specified subsequence from sequence. If sequence is a list, then the sequence form must itself be a place. The new-value argument in a range assignment can be a string, vector or list, regardless of whether the target is a string, vector or list. If the target is a string, the replacement sequence must be a string, or a list or vector of characters.
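For instance, a sketch of range assignment and deletion on a string:

  (defvar s "firetruck")

  (set [s 0..4] "ladder")  ;; s is now "laddertruck"
  (del [s 6..t])           ;; s is now "ladder"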

[sequence index-list]
Elements specified by index-list, which may be a list or vector, are extracted from sequence and returned as a sequence of the same kind as sequence.

This form is equivalent to (select sequence index-list) except when it is the target of an assignment operation.

This form is a syntactic place if sequence is one. If a sequence is assigned to this place, then elements of the sequence are distributed to the specified locations.

The following equivalences hold between index-list-based indexing and the select and replace functions, except that set always returns the value assigned, whereas replace returns its first argument:

  [seq idx-list] <--> (select seq idx-list)

  (set [seq idx-list] new) <--> (replace seq new idx-list)

Note that unlike the select function, this does not support [hash index-list], because hash keys may be lists, which makes that syntax indistinguishable from a simple hash lookup in which index-list is the key.

[hash key [alt]]
Retrieve a value from the hash table corresponding to key, or else return alt if there is no such entry. The expression alt is always evaluated, whether or not its value is used.
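For instance, a sketch of hash lookup with and without the alt argument:

  (let ((h (hash)))
    (set [h 'a] 1)
    (list [h 'a] [h 'b] [h 'b 42]))
  -> (1 nil 42)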

[regex [start [from-end]] string ]
Determine whether regular expression regex matches string, and in that case return the (possibly empty) leftmost matching substring. Otherwise, return nil.

If start is specified, it gives the starting position where the search begins, and if from-end is given, and has a value other than nil, it specifies a search from right to left. These optional arguments have the same conventions and semantics as their equivalents in the search-regex function.

Note that string is always required, and is always the rightmost argument.
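For instance:

  [#/o+/ "foot"]  -> "oo"
  [#/z+/ "foot"]  -> nil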

[struct arg*]
The structure instance struct is inquired whether it supports a method named by the symbol lambda. If so, that method is invoked on the object. The method receives struct followed by the value of every arg. If this form is used as a place, then the object must support a lambda-set method.

[carray index]
Element and range indexing is possible on objects of type carray, which manipulate arrays in a foreign ("C language") representation, and are closely associated with the Foreign Function Interface (FFI). Just like in the case of sequences, the semantics of referencing carray objects with the bracket notation is based on the functions ref, refset, sub and replace. These, in turn, rely on the specialized functions carray-ref, carray-refset, carray-sub and carray-replace.

[buf index]
Indexing is supported for objects of type buf. This provides a way to access and store the individual bytes of a buffer.

Range Indexing:

Vector and list range indexing is zero-based, meaning that the first element is numbered zero, the second element one, and so on. Negative values are allowed; the value -1 refers to the last element of the vector or list, and -2 to the second last, and so forth. Thus the range 1 .. -2 means "everything except for the first element and the last two".

The symbol t represents the position one past the end of the vector, string or list, so 0 .. t denotes the entire list or vector, and the range t .. t represents the empty range just beyond the last element. It is possible to assign to t .. t. For instance:

  (defvar list '(1 2 3))
  (set [list t .. t] '(4)) ;; list is now (1 2 3 4)

The value zero has a "floating" behavior when used as the end of a range. If the start of the range is a negative value, and the end of the range is zero, the zero is interpreted as being the position past the end of the sequence, rather than the first element. For instance the range -1..0 means the same thing as -1..t. Zero at the start of a range always means the first element, so that 0..-1 refers to all the elements except for the last one.
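For instance:

  (defvar lst '(1 2 3 4 5))

  [lst -2..0]  -> (4 5)      ;; zero floats past the end
  [lst 0..-1]  -> (1 2 3 4)  ;; zero at the start means the first element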


The dwim operator allows for a Lisp-1 flavor of programming in TXR Lisp, which is principally a Lisp-2 dialect.

A Lisp-1 dialect is one in which an expression like (a b) treats both a and b as expressions subject to the same evaluation rules—at least, when a isn't an operator or an operator macro. This means that the symbols a and b are resolved to values in the same namespace. The form denotes a function call if the value of variable a is a function object. Thus in a Lisp-1, named functions do not exist as such: they are just variable bindings. In a Lisp-1, the form (car 1) means that there is a variable called car, which holds a function, which is retrieved from that variable and the argument 1 is applied to it. In the expression (car car), both occurrences of car refer to the variable, and so this form applies the car function to itself. It is almost certainly meaningless. In a Lisp-2 (car 1) means that there is a function called car, in the function namespace. In the expression (car car) the two occurrences refer to different bindings: one is a function and the other a variable. Thus there can exist a variable car which holds a cons cell object, rather than the car function, and the form makes sense.

The Lisp-1 approach is useful for functional programming, because it eliminates cluttering occurrences of the call and fun operators. For instance:

  ;; regular notation

  (call foo (fun second) '((1 a) (2 b)))

  ;; [] notation

  [foo second '((1 a) (2 b))]

Lisp-1 dialects can also provide useful extensions by giving a meaning to objects other than functions in the first position of a form, and the dwim/[...] syntax does exactly this.

TXR Lisp is a Lisp-2 because Lisp-2 also has advantages. Lisp-2 programs which use macros naturally achieve hygiene because lexical variables do not interfere with the function namespace. If a Lisp-2 program has a local variable called list, this does not interfere with the hidden use of the function list in a macro expansion in the same block of code. Lisp-1 dialects have to provide hygienic macro systems to attack this problem. Furthermore, even when not using macros, Lisp-1 programmers have to avoid using the names of functions as lexical variable names, if the enclosing code might use them.

The two namespaces of a Lisp-2 also naturally accommodate symbol macros and operator macros. Whereas functions and variables can be represented in a single namespace readily, because functions are data objects, this is not so with symbol macros and operator macros, the latter of which are distinguished syntactically by their position in a form. In a Lisp-1 dialect, given (foo bar), either of the two symbols could be a symbol macro, but only foo can possibly be an operator macro. Yet, having only a single namespace, a Lisp-1 doesn't permit (foo foo), where foo is simultaneously a symbol macro and an operator macro, though the situation is unambiguous by syntax even in Lisp-1. In other words, Lisp-1 dialects do not entirely remove the special syntactic recognition given to the leftmost position of a compound form, yet at the same time they prohibit the user from taking full advantage of it by providing only one namespace.

TXR Lisp provides the "best of both worlds": the DWIM brackets notation provides a model of Lisp-1 computation that is purer than Lisp-1 dialects (since the leftmost argument is not given any special syntactic treatment at all) while the Lisp-2 foundation provides a traditional Lisp environment with its "natural hygiene".


9.5 Sequencing, Selection and Iteration


9.5.1 Operators progn and prog1


  (progn form*)
  (prog1 form*)

The progn operator evaluates forms in order, and returns the value of the last form. The return value of the form (progn) is nil.

The prog1 operator evaluates forms in order, and returns the value of the first form. The return value of the form (prog1) is nil.

Various other operators such as let also arrange for the evaluation of a body of forms, the value of the last of which is returned. These operators are said to feature an implicit progn.


9.5.2 Operator cond


  (cond {(test form*)}*)


The cond operator provides a multi-branching conditional evaluation of forms. Enclosed in the cond form are groups of forms expressed as lists. Each group must be a list of at least one form.

The forms are processed from left to right as follows: the first form, test, in each group is evaluated. If it evaluates true, then the remaining forms in that group, if any, are also evaluated. Processing then terminates and the result of the last form in the group is taken as the result of cond. If test is the only form in the group, then the result of test is taken as the result of cond.

If the first form of a group yields nil, then processing continues with the next group, if any. If all form groups yield nil, then the cond form yields nil. This holds in the case that the syntax is empty: (cond) yields nil.
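For instance:

  (cond ((> 1 2) 'first)
        ((> 3 2) 'second)
        (t 'fallback))
  -> second

  (cond) -> nil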


9.5.3 Macros caseq, caseql and casequal


  (caseq test-form normal-clause* [else-clause])
  (caseql test-form normal-clause* [else-clause])
  (casequal test-form normal-clause* [else-clause])


These three macros arrange for the evaluation of test-form, whose value is then compared against the key or keys in each normal-clause in turn. When the value matches a key, then the remaining forms of normal-clause are evaluated, and the value of the last form is returned; subsequent clauses are not evaluated. When the value doesn't match any of the keys of a normal-clause then the next normal-clause is tested. If all these clauses are exhausted, and there is no else-clause, then the value nil is returned. Otherwise, the forms in the else-clause are evaluated, and the value of the last one is returned. If there are no forms, then nil is returned.

The syntax of a normal-clause takes on these two forms:

  (key form*)

where key may be an atom which denotes a single key, or else a list of keys. There is a restriction that the symbol t may not be used as a key. The form (t) may be used as a key to match that symbol.

The syntax of an else-clause is:

  (t form*)
which resembles a form that is often used as the final clause in the cond syntax.

The three forms of the case construct differ in the type of test they apply between the value of test-form and the keys. The caseq macro generates code which uses the eq function's equality. The caseql macro uses eql, and casequal uses equal.


  (let ((command-symbol (casequal command-string
                          (("q" "quit") 'quit)
                          (("a" "add") 'add)
                          (("d" "del" "delete") 'delete)
                          (t 'unknown))))
    ...)


9.5.4 Macros caseq*, caseql* and casequal*


  (caseq* test-form normal-clause* [else-clause])
  (caseql* test-form normal-clause* [else-clause])
  (casequal* test-form normal-clause* [else-clause])


The caseq*, caseql*, and casequal* macros are similar to the macros caseq, caseql, and casequal, differing from them only in the following regard. The normal-clause of these macros has the form (evaluated-key form*), where evaluated-key is either an atom, which is evaluated to produce a key, or else a compound form, whose elements are evaluated as forms, producing multiple keys. This evaluation takes place at macro-expansion time, in the dynamic environment.

The else-clause works the same way under these macros as under caseq et al.

Note that although in a normal-clause, evaluated-key must not be the atom t, there is no restriction against it being an atom which evaluates to t. In this situation, the value t has no special meaning.

The evaluated-key expressions are evaluated in the order in which they appear in the construct, at the time the caseq*, caseql* or casequal* macro is expanded.

Note: these macros allow the use of variables and global symbol macros as case keys.


  (defvarl red 0)
  (defvarl green 1)
  (defvarl blue 2)

  (let ((color blue))
    (caseql* color
      (red "hot")
      ((green blue) "cool")))
  --> "cool"


9.5.5 Operator/function if


  (if cond t-form [e-form])
  '['if cond then [else]']'


There exist both an if operator and an if function. A list form with the symbol if in the first position is interpreted as an invocation of the if operator. The function can be accessed using the DWIM bracket notation and in other ways.

The if operator provides a simple two-way-selective evaluation control. The cond form is evaluated. If it yields true then t-form is evaluated, and that form's return value becomes the return value of the if. If cond yields false, then e-form is evaluated and its return value is taken to be that of if. If e-form is omitted, then the behavior is as if e-form were specified as nil.

The if function provides no evaluation control. All of its arguments are evaluated from left to right. If the cond argument is true, it returns the then argument; otherwise it returns the value of the else argument if present, or else nil.
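For instance:

  (if (> 2 1) 'yes 'no)  -> yes  ;; operator: 'no is not evaluated
  [if (> 2 1) 'yes 'no]  -> yes  ;; function: all arguments are evaluated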


9.5.6 Operator/function and

  (and form*)
  '['and arg*']'


There exist both an and operator and an and function. A list form with the symbol and in the first position is interpreted as an invocation of the operator. The function can be accessed using the DWIM bracket notation and in other ways.

The and operator provides three functionalities in one. It computes the logical "and" function over several forms. It controls evaluation (a.k.a. "short-circuiting"). It also provides an idiom for the convenient substitution of a value in place of nil when some other values are all true.

The and operator evaluates as follows. First, a return value is established and initialized to the value t. The form-s, if any, are evaluated from left to right. The return value is overwritten with the result of each form. Evaluation stops when all forms are exhausted, or when nil is stored in the return value. When evaluation stops, the operator yields the return value.

The and function provides no evaluation control; it receives all of its arguments fully evaluated. If it is given no arguments, it returns t. If it is given one or more arguments, and any of them are nil, it returns nil. Otherwise it returns the value of the last argument.


  (and) -> t
  (and (> 10 5) (stringp "foo")) -> t
  (and 1 2 3) -> 3  ;; shorthand for (if (and 1 2) 3).


9.5.7 Operator/function or


  (or form*)
  '['or arg*']'

There exist both an or operator and an or function. A list form with the symbol or in the first position is interpreted as an invocation of the operator. The function can be accessed using the DWIM bracket notation and in other ways.

The or operator provides three functionalities in one. It computes the logical "or" function over several forms. It controls evaluation (a.k.a. "short-circuiting"). The behavior of or also provides an idiom for the selection of the first non-nil value from a sequence of forms.

The or operator evaluates as follows. First, a return value is established and initialized to the value nil. The form-s, if any, are evaluated from left to right. The return value is overwritten with the result of each form. Evaluation stops when all forms are exhausted, or when a true value is stored into the return value. When evaluation stops, the operator yields the return value.

The or function provides no evaluation control; it receives all of its arguments fully evaluated. If it is given no arguments, it returns nil. If all of its arguments are nil, it also returns nil. Otherwise, it returns the value of the first argument which isn't nil.


  (or) -> nil
  (or 1 2) -> 1
  (or nil 2) -> 2
  (or (> 10 20) (stringp "foo")) -> t


9.5.8 Macros when and unless


  (when expression form*)
  (unless expression form*)


The when macro operator evaluates expression. If expression yields true, and there are additional forms, then each form is evaluated. The value of the last form becomes the result value of the when form. If there are no forms, then the result is nil.

The unless operator is similar to when, except that it reverses the logic of the test. The forms, if any, are evaluated if, and only if expression is false.
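For instance:

  (when (> 2 1) 'a 'b)    -> b
  (when (> 1 2) 'a 'b)    -> nil
  (unless (> 1 2) 'a 'b)  -> b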


9.5.9 Macros while and until


  (while expression form*)
  (until expression form*)


The while macro operator provides a looping construct. It evaluates expression. If expression yields nil, then the evaluation of the while form terminates, producing the value nil. Otherwise, if there are additional forms, then each form is evaluated. Next, evaluation returns to expression, repeating all of the previous steps.

The until macro operator is similar to while, except that the until form terminates when expression evaluates true, rather than false.

These operators arrange for the evaluation of all their enclosed forms in an anonymous block. Any of the form-s, or expression, may use the return operator to terminate the loop, and optionally to specify a result value for the form.

The only way these forms can yield a value other than nil is if the return operator is used to terminate the implicit anonymous block, and is given an argument, which becomes the result value.
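For instance, a sketch of a loop terminated by return with a result value:

  (let ((i 0))
    (while t
      (inc i)
      (if (>= i 5)
        (return i))))
  -> 5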


9.5.10 Macros while* and until*


  (while* expression form*)
  (until* expression form*)


The while* and until* macros are similar, respectively, to the macros while and until.

They differ in one respect: they begin by evaluating the form-s one time unconditionally, without first evaluating expression. After this evaluation, the subsequent behavior is like that of while or until.

Another way to regard the behavior is that these forms execute one iteration unconditionally, without evaluating the termination test prior to the first iteration. Yet another view is that these constructs relocate the test from the "top of the loop" to the "bottom of the loop".
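For instance, a sketch in which the body runs once even though the test is false from the outset:

  (let ((n 0))
    (while* (< n 0)
      (inc n))
    n)
  -> 1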


9.5.11 Macro whilet


  (whilet ({sym | (sym init-form)}+)
    body-form*)


The whilet macro provides a construct which combines iteration with variable binding.

The evaluation of the form takes place as follows. First, fresh bindings are established for sym-s as if by the let* operator. It is an error for the list of variable bindings to be empty.

After the establishment of the bindings, the value of the last sym is tested. If the value is nil, then whilet terminates. Otherwise, body-form-s are evaluated in the scope of the variable bindings, and then whilet iterates from the beginning, again establishing fresh bindings for the sym-s, and testing the value of the last sym.

All evaluation takes place in an anonymous block, which can be terminated with the return operator. Doing so terminates the loop. If the whilet loop is thus terminated by an explicit return, a return value can be specified. Under normal termination, the return value is nil.

In the syntax, a small convenience is permitted. Instead of the last (sym init-form) it is permissible for the syntax (init-form) to appear, the sym being omitted. A machine-generated variable is substituted in place of the missing sym and that variable is then initialized from init-form and used as the basis of the test.


  ;; read lines of text from *std-input* and print them,
  ;; until the end-of-stream condition:

  (whilet ((line (get-line)))
    (put-line line))

  ;; read lines of text from *std-input* and print them,
  ;; until the end-of-stream condition occurs or
  ;; a line is identical to the character string "end".

  (whilet ((line (get-line))
           (more (and line (not (equal line "end")))))
    (put-line line))


9.5.12 Macros iflet and whenlet


  (iflet {({sym | (sym init-form)}+) | atom-form}
    then-form [else-form])

  (whenlet {({sym | (sym init-form)}+) | atom-form}
    body-form*)


The iflet and whenlet macros combine the variable binding of let* with conditional evaluation of if and when, respectively.

In either construct's syntax, a non-compound form atom-form may appear in place of the variable binding list. In this case, atom-form is evaluated as a form, and the construct is equivalent to its respective ordinary if or when counterpart.

If the list of variable bindings is empty, it is interpreted as the atom nil and treated as an atom-form.

If one or more bindings are specified rather than atom-form, then the evaluation of these forms takes place as follows. First, fresh bindings are established for sym-s as if by the let* operator.

Then, the last variable's value is tested. If it is not nil then the test is true, otherwise false.

In the syntax, a small convenience is permitted. Instead of the last (sym init-form) it is permissible for the syntax (init-form) to appear, the sym being omitted. A machine-generated variable is substituted in place of the missing sym and that variable is then initialized from init-form and used as the basis of the test. This is intended to be useful in situations in which then-form or else-form do not require access to the tested value.

In the case of the iflet operator, if the test is true, the operator evaluates then-form and yields its value. Otherwise the test is false, and if the optional else-form is present, that is evaluated instead and its value is returned. If this form is missing, then nil is returned.

In the case of the whenlet operator, if the test is true, then the body-form-s, if any, are evaluated. The value of the last one is returned, otherwise nil if the forms are missing. If the test is false, then evaluation of body-form-s is skipped, and nil is returned.


  ;; dispose of foo-resource if present
  (whenlet ((foo-res (get-foo-resource obj)))
    (foo-shutdown foo-res)
    (set-foo-resource obj nil))

  ;; Contrast with the above, using when and let
  (let ((foo-res (get-foo-resource obj)))
    (when foo-res
      (foo-shutdown foo-res)
      (set-foo-resource obj nil)))

  ;; print frobosity value if it exceeds 150
  (whenlet ((fv (get-frobosity-value))
            (exceeds-p (> fv 150)))
    (format t "frobosity value ~a exceeds 150\n" fv))

  ;; same as above, taking advantage of the
  ;; last variable being optional:
  (whenlet ((fv (get-frobosity-value))
            ((> fv 150)))
    (format t "frobosity value ~a exceeds 150\n" fv))

  ;; yield 4: 3 interpreted as atom-form
  (whenlet 3 4)

  ;; yield 4: nil interpreted as atom-form
  (iflet () 3 4)


9.5.13 Macro condlet


  (condlet
    {([({sym | (sym init-form)}+) | atom-form]
      body-form*)}*)


The condlet macro generalizes iflet.

Each argument is a compound consisting of at least one item: a list of bindings or atom-form. This item is followed by zero or more body-form-s.

If there are no body-form-s then the situation is treated as if there were a single body-form specified as nil.

The arguments of condlet are considered in sequence, starting with the leftmost.

If the argument's left item is an atom-form then the form is evaluated. If it yields true, then the body-form-s next to it are evaluated in order, and the condlet form terminates, yielding the value obtained from the last body-form. If atom-form yields false, then the next argument is considered, if there is one.

If the argument's left item is a list of bindings, then it is processed with exactly the same logic as under the iflet macro. If the last binding contains a true value, then the adjoining body-form-s are evaluated in a scope in which all of the bindings are visible, and condlet terminates, yielding the value of the last body-form. Otherwise, the next argument of condlet is considered (processed in a scope in which the bindings produced by the current item are no longer visible).

If condlet runs out of arguments, it terminates and returns nil.


  (let ((l '(1 2 3)))
    (condlet
      ;; first arg
      (((a (first l))   ;; a binding gets 1
        (b (second l))  ;; b binding gets 2
        (g (> a b)))    ;; last variable g is nil
       'foo)            ;; not evaluated
      ;; second arg
      (((b (second l))  ;; b gets 2
        (c (third l))   ;; c gets 3
        (g (< b c)))    ;; last variable g is true
       'bar)))          ;; condlet terminates
   --> bar              ;; result is bar


9.5.14 Macro ifa


  (ifa cond then [else])


The ifa macro provides an anaphoric conditional operator resembling the if operator. Around the evaluation of the then and else forms, the symbol it is implicitly bound to a subexpression of cond, a subexpression which is thereby identified as the it-form. This it alias provides a convenient reference to that place or value, similar to the word "it" in the English language, and similar anaphoric pronouns in other languages.

If it is bound to a place form, the binding is established as if using the placelet operator: the form is evaluated only once, even if the it alias is used multiple times in the then or else expressions. Otherwise, if the form is not a syntactic place it is bound as an ordinary lexical variable to the form's value.

An it-candidate is an expression viable for having its value or storage location bound to the it symbol. An it-candidate is any expression which is not a constant expression according to the constantp function, and not a symbol.

The ifa macro applies several rules to the cond expression:

1. The cond expression must be either an atom, a function call form, or a dwim form. Otherwise the ifa expression is ill-formed, and throws an exception at macro-expansion time. For the purposes of these rules, a dwim form is considered as a function call expression, whose first argument is the second element of the form. That is to say, [f x], which is equivalent to (dwim f x), is treated similarly to (f x) as a one-argument call.

2. If the cond expression is a function call with two or more arguments, at most one of them may be an it-candidate. If two or more arguments are it-candidates, the situation is ambiguous. The ifa expression is ill-formed and throws an exception at macro-expansion time.

3. If cond is an atom, or a function call expression with no arguments, then the it symbol is not bound. Effectively, the ifa macro behaves like the ordinary if operator.

4. If cond is a function call or dwim expression with exactly one argument, then the it variable is bound to the argument expression, except when the function being called is not, null, or false. This binding occurs regardless of whether the expression is an it-candidate.

5. If cond is a function call with exactly one argument to the Boolean negation function which goes by one of the three names not, null, or false, then that situation is handled by a rewrite according to the following pattern:

  (ifa (not expr) then else) -> (ifa expr else then)

which applies likewise for null or false substituted for not. The Boolean inverse function is removed, and the then and else expressions are exchanged.

6. If cond is a function call with two or more arguments, then it is only well-formed if at most one of those arguments is an it-candidate. If there is one such argument, then the it variable is bound to it.

7. Otherwise the variable is bound to the leftmost argument expression, regardless of whether that argument expression is an it-candidate.

In all other regards, the ifa macro behaves similarly to if.

The cond expression is evaluated, and, if applicable, the value of, or storage location denoted by the appropriate argument is captured and bound to the variable it whose scope extends over the then form, as well as over else, if present.

If cond yields a true value, then then is evaluated and the resulting value is returned, otherwise else is evaluated if present and its value is returned. A missing else is treated as if it were the nil form.


  (ifa t 1 0)  ->  1

  ;; Rule 6: it binds to (* x x), which is
  ;; the only it-candidate.
  (let ((x 6) (y 49))
    (ifa (> y (* x x)) ;; it binds to (* x x)
      (list it)))
  -> (36)

  ;; Rule 4: it binds to argument of evenp,
  ;; even though 4 isn't an it-candidate.
  (ifa (evenp 4)
    (list it))
  -> (4)

  ;; Rule 5:
  (ifa (not (oddp 4))
    (list it))
  -> (4)

  ;; Rule 7: no candidates: choose leftmost
  (let ((x 6) (y 49))
    (ifa (< 1 x y)
      (list it)))
  -> (1)

  ;; Violation of Rule 1:
  ;; while is not a function
  (ifa (while t (print 42))
    (list it))
  --> exception!

  ;; Violation of Rule 2:
  (let ((x 6) (y 49))
    (ifa (> (* y y y) (* x x))
      (list it)))
  --> exception!


9.5.15 Macro conda


  (conda {(test form*)}*)


The conda operator provides a multi-branching conditional evaluation of forms, similarly to the cond operator. Enclosed in the conda form are groups of forms expressed as lists. Each group must be a list of at least one form.

The conda operator is anaphoric: it expands into a nested structure of zero or more ifa invocations, according to these patterns:

  (conda) -> nil
  (conda (x y ...) ...) -> (ifa x (progn y ...) (conda ...))

Thus, conda inherits all the restrictions on the test expressions from ifa, as well as the anaphoric it variable feature.
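The following sketch (not from the original manual) illustrates conda together with the anaphoric it variable; each test is a one-argument call, so under rule 4 of ifa, it binds to the tested value:

```lisp
  ;; classify a value; it is bound by each underlying ifa test
  (let ((x 4))
    (conda
      ((oddp x)  (list 'odd it))
      ((evenp x) (list 'even it))
      (t 'other)))
  -> (even 4)
```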


9.5.16 Macro whena


  (whena test form*)


The whena macro is similar to the when macro, except that it is anaphoric in exactly the same manner as the ifa macro. It may be understood as conforming to the following equivalence:

  (whena x f0 f1 ...)  <-->  (ifa x (progn f0 f1 ...))
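A brief sketch of the anaphoric behavior: since the test is a one-argument call, it binds to the argument, as under ifa.

```lisp
  ;; it binds to the sole argument of the one-argument call
  (whena (evenp 10)
    (list 'even it))
  -> (even 10)
```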


9.5.17 Macro dotimes


  (dotimes (var count-form [result-form])
    body-form*)

The dotimes macro implements a simple counting loop. var is established as a variable, and initialized to zero. count-form is evaluated one time to produce a limiting value, which should be a number. Then, if the value of var is less than the limiting value, the body-form-s are evaluated, var is incremented by one, and the process repeats with a new comparison of var against the limiting value possibly leading to another evaluation of the forms.

If var is found to equal or exceed the limiting value, then the loop terminates.

When the loop terminates, its return value is nil unless a result-form is present, in which case the value of that form specifies the return value.

body-form-s as well as result-form are evaluated in the scope in which the binding of var is visible.
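The following sketches (not from the original manual) show a simple counting loop, and the use of result-form together with an outer accumulator variable:

```lisp
  ;; print the numbers 0 through 4
  (dotimes (i 5)
    (prinl i))

  ;; sum the numbers 0 through 9; sum is the result-form
  (let ((sum 0))
    (dotimes (i 10 sum)
      (inc sum i)))
  -> 45
```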


9.5.18 Operators each, each*, collect-each, collect-each*, append-each and append-each*


  (each ({(sym init-form)}*) body-form*)
  (each* ({(sym init-form)}*) body-form*)
  (collect-each ({(sym init-form)}*) body-form*)
  (collect-each* ({(sym init-form)}*) body-form*)
  (append-each ({(sym init-form)}*) body-form*)
  (append-each* ({(sym init-form)}*) body-form*)


These operators establish a loop for iterating over the elements of one or more lists. Each init-form must evaluate to a list. The lists are then iterated in parallel over repeated evaluations of the body-form-s, with each sym variable being assigned to successive elements of its list. The shortest list determines the number of iterations, so if any of the init-form-s evaluate to an empty list, the body is not executed.

If the list of (sym init-form) pairs itself is empty, then an infinite loop is specified.

The body forms are enclosed in an anonymous block, allowing the return operator to terminate the loop prematurely and optionally specify the return value.

The collect-each and collect-each* variants are like each and each*, except that for each iteration, the resulting value of the body is collected into a list. When the iteration terminates, the return value of the collect-each or collect-each* operator is this collection.

The append-each and append-each* variants are like each and each*, except that for each iteration other than the last, the resulting value of the body must be a list. The last iteration may produce either an atom or a list. The objects produced by the iterations are combined together as if they were arguments to the append function, and the resulting value is the value of the append-each or append-each* operator.

The alternate forms denoted by the adorned symbols each*, collect-each* and append-each*, differ from each, collect-each and append-each in the following way. The plain forms evaluate the init-form-s in an environment in which none of the sym variables are yet visible. By contrast, the alternate forms evaluate each init-form in an environment in which bindings for the previous sym variables are visible. In this phase of evaluation, sym variables are list-valued: one by one they are each bound to the list object emanating from their corresponding init-form. Just before the first loop iteration, however, the sym variables are assigned the first item from each of their lists.


 ;; print numbers from 1 to 10 and whether they are even or odd
 (each* ((n (range 1 10))  ;; n is a list here!
         (even (collect-each ((m n)) (evenp m))))
   ;; n is an item here!
   (format t "~a is ~a\n" n (if even "even" "odd")))


 1 is odd
 2 is even
 3 is odd
 4 is even
 5 is odd
 6 is even
 7 is odd
 8 is even
 9 is odd
 10 is even


9.5.19 Operators for and for*


  ({for | for*} ({sym | (sym init-form)}*)
    ([test-form result-form*])
    (inc-form*)
    body-form*)

The for and for* operators combine variable binding with loop iteration. The first argument is a list of variables with optional initializers, exactly the same as in the let and let* operators. Furthermore, the difference between for and for* is like that between let and let* with regard to this list of variables.

The for and for* operators execute these steps:

1. Establish an anonymous block over the entire form, allowing the return operator to be used to terminate the loop.

2. Establish bindings for the specified variables similarly to let and let*. The variable bindings are visible over the test-form, each result-form, each inc-form and each body-form.

3. Evaluate test-form. If test-form yields nil, then the loop terminates. Each result-form is evaluated, and the value of the last of these forms is the result value of the loop. If there are no result-form-s then the result value is nil. If the test-form is omitted, then the test is taken to be true, and the loop does not terminate.

4. Otherwise, if test-form yields true, then each body-form is evaluated in turn. Then, each inc-form is evaluated in turn and processing resumes at step 3.


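As a sketch of the syntax (this example is not from the original manual), the following loop collects the squares of the numbers 0 through 4, using the result-form to deliver the accumulated list:

```lisp
  (let ((acc nil))
    (for ((i 0)) ((< i 5) (nreverse acc)) ((inc i))
      (push (* i i) acc)))
  -> (0 1 4 9 16)
```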

9.5.20 Macros doloop and doloop*


  ({doloop | doloop*} ({sym | (sym [init-form [step-form]])}*)
    ([test-form result-form*])
    tagbody-form*)


The doloop and doloop* macros provide an iteration construct inspired by the ANSI Common Lisp do and do* macros.

Each sym element in the form must be a symbol suitable for use as a variable name.

The tagbody-form-s are placed into an implicit tagbody, meaning that a tagbody-form which is an integer, character or symbol is interpreted as a tagbody label which may be the target of a control transfer via the go macro.

The doloop macro binds each sym to the value produced by evaluating the adjacent init-form. Then, in the environment in which these variables now exist, test-form is evaluated. If that form yields nil, then the loop terminates. The result-form-s are evaluated, and the value of the last one is returned.

If result-form-s are absent, then nil is returned.

If test-form is also absent, then the loop terminates and returns nil.

If test-form produces a true value, then result-form-s are not evaluated. Instead, the implicit tagbody comprised of the tagbody-form-s is evaluated. If that evaluation terminates normally, the loop variables are then updated by assigning to each sym the value of step-form.

The following defaulting behaviors apply in regard to the variable syntax. For each sym which has an associated init-form but no step-form, the init-form is duplicated and taken as the step-form. Thus a variable specification like (x y) is equivalent to (x y y). If both forms are omitted, then the init-form is taken to be nil, and the step-form is taken to be sym. This means that the variable form (x) is equivalent to (x nil x) which has the effect that x retains its current value when the next loop iteration begins. Lastly, the sym variant is equivalent to (sym) so that x is also equivalent to (x nil x).

The differences between doloop and doloop* are: doloop binds the variables in parallel, similarly to let, whereas doloop* binds sequentially, like let*; moreover, doloop performs the step-form assignments in parallel as if using a single (pset sym0 step-form-0 sym1 step-form-1 ...) form, whereas doloop* performs the assignment sequentially as if using set rather than pset.

The doloop and doloop* macros establish an anonymous block, allowing early return from the loop, with a value, via the return operator.
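The following sketch (not from the original manual) shows parallel stepping in doloop: because the step assignments are performed as if by pset, acc's step-form sees the previous value of i on each iteration:

```lisp
  (doloop ((i 0 (succ i))
           (acc nil (cons i acc)))
          ((< i 5) (nreverse acc)))
  -> (0 1 2 3 4)
```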

Dialect Note:

These macros are substantially different from the ANSI Common Lisp do and do* macros. Firstly, the termination logic is inverted; effectively they implement "while" loops, whereas their ANSI CL counterparts implement "until" loops. Secondly, in the ANSI CL macros, the defaulting of the missing step-form is different. Variables with no step-form are not updated. In particular, this means that the form (x y) is not equivalent to (x y y); the ANSI CL macros do not feature the automatic replication of init-form into the step-form position.


9.5.21 Operators block and block*


  (block name body-form*)
  (block* name-form body-form*)


The block operator introduces a named block around the execution of some forms. The name argument must be a symbol. Since a block name is not a variable binding, keyword symbols are permitted, and so are the symbols t and nil. A block named by the symbol nil is slightly special: it is understood to be an anonymous block.

The block* operator differs from block in that it evaluates name-form, which is expected to produce a symbol. The resulting symbol is used for the name of the block.

A named or anonymous block establishes an exit point for the return-from or return operator, respectively. These operators can be invoked within a block to cause its immediate termination with a specified return value.

A block also establishes a prompt for a delimited continuation. Anywhere in a block, a continuation can be captured using the sys:capture-cont function. Delimited continuations are described in the section Delimited Continuations. A delimited continuation allows an apparently abandoned block to be restarted at the capture point, with the entire call chain and dynamic environment between the prompt and the capture point intact.

Blocks in TXR Lisp have dynamic scope. This means that the following situation is allowed:

  (defun func () (return-from foo 42))
  (block foo (func))

The function can return from the foo block even though the foo block does not lexically surround the return-from form within func.

It is because blocks are dynamic that the block* variant exists; for lexically scoped blocks, it would make little sense to support a dynamically computed name.

Thus blocks in TXR Lisp provide dynamic non-local returns, as well as returns out of lexical nesting.

It is permitted for blocks to be aggressively progn-converted by compilation. This means that a block form which meets certain criteria is converted to a progn form which surrounds the body-form-s and thus no longer establishes an exit point.

A block form will be spared from progn-conversion by the compiler if it meets any of the following rules.

1. Any body-form references the block's name in a return, return-from, sys:abscond-from or sys:capture-cont expression.

2. The form contains at least one direct call to a function not in the standard TXR Lisp library.

3. The form contains at least one direct call to the functions sys:capture-cont, return*, sys:abscond*, match-fun, eval, load, compile, compile-file or compile-toplevel.

4. The form references any of the functions in rules 2 and 3 as a function binding via the dwim operator (or the DWIM brackets notation) or via the fun operator.

5. The form is a block* form; these are spared from the optimization.

Removal of blocks under the above rules means that some use of blocks which works in interpreted code will not work in compiled programs. Programs which adhere to the rules are not affected by such a difference.

Additionally, the compiler may progn-convert blocks in contravention of the above rules, but only if doing so makes no difference to visible program behavior.


  (defun helper ()
    (return-from top 42))

  ;; defun implicitly defines a block named top
  (defun top ()
    (helper) ;; function returns 42
    (prinl 'notreached)) ;; never printed

  (defun top2 ()
    (let ((h (fun helper)))
      (block top (call h))   ;; may progn-convert
      (block top (call 'helper)) ;; may progn-convert
      (block top (helper)))) ;; not removed

In the above examples, the block containing (call h) may be converted to progn because it doesn't express a direct call to the helper function. The block which calls helper using (call 'helper) is also not considered to be making a direct call.

Dialect Note:

In Common Lisp, blocks are lexical. A separate mechanism consisting of catch and throw operators performs non-local transfer based on symbols. The TXR Lisp example:

  (defun func () (return-from foo 42))
  (block foo (func))

is not allowed in Common Lisp, but can be transliterated to:

  (defun func () (throw 'foo 42))
  (catch 'foo (func))

Note that foo is quoted in CL. This underscores the dynamic nature of the construct. throw itself is a function and not an operator. Also note that the CL example, in turn, is even more closely transcribed back into TXR Lisp simply by replacing its throw and catch with return* and block*:

  (defun func () (return* 'foo 42))
  (block* 'foo (func))

Common Lisp blocks also do not support delimited continuations.


9.5.22 Operators return and return-from


  (return [value])
  (return-from name [value])


The return operator must be dynamically enclosed within an anonymous block (a block named by the symbol nil). It immediately terminates the evaluation of the innermost anonymous block which encloses it, causing it to return the specified value. If the value is omitted, the anonymous block returns nil.

The return-from operator must be dynamically enclosed within a named block whose name matches the name argument. It immediately terminates the evaluation of the innermost such block, causing it to return the specified value. If the value is omitted, that block returns nil.


    (block foo
      (let ((a "abc\n")
            (b "def\n"))
        (pprint a *stdout*)
        (return-from foo 42)
        (pprint b *stdout*)))

Here, the output produced is "abc". The value of b is not printed, because return-from terminates block foo, and so the second pprint form is not evaluated.


9.5.23 Function return*


  (return* name [value])


The return* function is similar to the return-from operator, except that name is an ordinary function parameter, and so when return* is used, an argument expression must be specified which evaluates to a symbol. Thus return* allows the target block of a return to be dynamically computed.

The following equivalence holds between the operator and function:

  (return-from a b)  <-->  (return* 'a b)

Expressions used as name arguments to return* which do not simply quote a symbol have no equivalent in return-from.
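The following sketch (not from the original manual) shows a target block name computed at run time; the hypothetical helper leave selects which enclosing block to terminate:

```lisp
  (defun leave (name val)
    (return* name val))  ;; name computed at run time

  (block a
    (block b
      (leave 'a 42)
      (prinl 'not-reached))) ;; never printed
  -> 42
```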


9.5.24 Macros tagbody and go


  (tagbody {form | label}*)


The tagbody macro provides a form of the "go to" control construct. The arguments of a tagbody form are a mixture of zero or more forms and go labels. The latter consist of those arguments which are symbols, integers or characters. Labels are not considered by tagbody and go to be forms, and are not subject to macro expansion or evaluation.

The go macro is available inside tagbody. It is erroneous for a go form to occur outside of a tagbody. This situation is diagnosed by a global macro called go, which unconditionally throws an error.

In the absence of invocations of go or other control transfers, the tagbody macro evaluates each form in left to right order. The go labels are ignored. After the last form is evaluated, the tagbody form terminates, and yields nil.

Any form itself, or else any of its sub-forms, may be the form (go label) where label matches one of the go labels of a surrounding tagbody. When this go form is evaluated, then the evaluation of form is immediately abandoned, and control transfers to the specified label. The forms are then evaluated in left-to-right order starting with the form immediately after that label. If the label is not followed by any forms, then the tagbody terminates. If label doesn't match any label in any surrounding tagbody, the go form is erroneous.

The abandonment of a form by invocation of go is a dynamic transfer. All necessary unwinding inside form takes place.

The go labels are lexically scoped, but dynamically bound. Their scope being lexical means that the labels are not visible to forms which are not enclosed within the tagbody, even if their evaluation is invoked from that tagbody. The dynamic binding means that the labels of a tagbody form are established when it begins evaluating, and removed when that form terminates. Once a label is removed, it is not available to be the target of a go control transfer, even if that go form has the label in its lexical scope. Such an attempted transfer is erroneous.

It is permitted for tagbody forms to nest arbitrarily. The labels of an inner tagbody are not visible to an outer tagbody. However, the reverse is true: a go form in an inner tagbody may branch to a label in an outer tagbody, in which case the entire inner tagbody terminates.

In cases where the same objects are used as labels by an inner and outer tagbody, the inner labels shadow the outer labels.

There is no restriction on what kinds of symbols may be labels. Symbols in the keyword package as well as the symbols t and nil are valid tagbody labels.

Dialect Note:

ANSI Common Lisp tagbody supports only symbols and integers as labels (which are called "go tags"); characters are not supported.


  ;; print the numbers 1 to 10
  (let ((i 0))
    (tagbody
      (go skip) ;; forward goto skips 0
      again
      (prinl i)
      skip
      (when (<= (inc i) 10)
        (go again))))

  ;; Example of erroneous usage: by the time func is invoked
  ;; by (call func) the tagbody has already terminated. The
  ;; lambda body can still "see" the label, but it doesn't
  ;; have a binding.
  (let (func)
    (tagbody
      (set func (lambda () (go label)))
      (go out)
      label
      (prinl 'never-reached)
      out)
    (call func))

  ;; Example of unwinding when the unwind-protect
  ;; form is abandoned by (go out). Output is:
  ;;   reached
  ;;   cleanup
  ;;   out
  (tagbody
    (unwind-protect
      (progn
        (prinl 'reached)
        (go out)
        (prinl 'notreached))
      (prinl 'cleanup))
    out
    (prinl 'out))


9.5.25 Macros prog and prog*


  (prog ({sym | (sym init-form)}*) {body-form | label}*)
  (prog* ({sym | (sym init-form)}*) {body-form | label}*)


The prog and prog* macros combine the features of let and let*, respectively, with anonymous block and tagbody.

The prog macro treats the sym and init-form expressions similarly to let, establishing variable bindings in parallel. The prog* macro treats these expressions in a similar way to let*.

The forms enclosed are treated like the argument forms of the tagbody macro: labels are permitted, along with use of go.

Finally, an anonymous block is established around all of the enclosed forms (both the init-form-s and body-form-s) allowing the use of return to terminate evaluation with a value.

The prog macro may be understood according to the following equivalence:

   (prog vars forms ...)  <-->  (block nil
                                  (let vars
                                    (tagbody forms ...)))

Likewise, the prog* macro follows an analogous equivalence, with let replaced by let*.
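The following sketch (not from the original manual) combines all three features of prog: variable binding, go labels, and return via the anonymous block:

```lisp
  ;; print 0 1 2 3, then return done via the anonymous block
  (prog ((i 0))
    again
    (when (> i 3)
      (return 'done))
    (prinl i)
    (inc i)
    (go again))
  -> done
```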


9.6 Evaluation


9.6.1 Function eval


  (eval form [env])


The eval function treats the form object as a Lisp expression, which is evaluated. The side effects implied by the form are performed, and the value which it produces is returned. The optional env object specifies an environment for resolving the function and variable references encountered in the expression. If this argument is omitted or specified as nil, then evaluation takes place in the global environment.

See also: the make-env function.
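For example, evaluating quoted forms in the global environment:

```lisp
  (eval '(+ 1 2))          -> 3
  (eval '(list 1 (+ 1 1))) -> (1 2)
```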


9.6.2 Function constantp


  (constantp form [env])


The constantp function determines whether form is a constant form, with respect to environment env.

If env is absent, the global environment is used. The env argument is used for macro-expanding form.

Currently, constantp returns true for any form which, after macro-expansion, is any of the following: a compound form with the symbol quote in its first position; a non-symbolic atom; or one of the symbols which evaluate to themselves and cannot be bound as variables. These symbols are the keyword symbols, and the symbols t and nil.
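The current rules can be sketched as follows (these particular calls are illustrative, not from the original manual):

```lisp
  (constantp 42)        -> t    ;; non-symbolic atom
  (constantp :key)      -> t    ;; keyword symbol
  (constantp t)         -> t    ;; self-evaluating symbol
  (constantp ''(a b))   -> t    ;; quote form
  (constantp 'x)        -> nil  ;; ordinary symbol
  (constantp '(+ 1 2))  -> nil  ;; calls not yet recognized
```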

In the future, constantp will be able to recognize more constant forms, such as calls to certain functions whose arguments are constant forms.


9.6.3 Function make-env


  (make-env [var-bindings [fun-bindings [next-env]]])


The make-env function creates an environment object suitable as the env parameter.

The var-bindings and fun-bindings parameters, if specified, should be association lists, mapping symbols to objects. The objects in fun-bindings should be functions, or objects callable as functions.

The next-env argument, if specified, should be an environment.

Note: bindings can also be added to an environment using the env-vbind and env-fbind functions.
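A sketch of constructing an environment and using it with eval; the bindings x and double are hypothetical names chosen for illustration:

```lisp
  (let ((e (make-env)))
    (env-vbind e 'x 10)
    (env-fbind e 'double (lambda (n) (* 2 n)))
    (eval '(double x) e))
  -> 20
```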


9.6.4 Functions env-vbind and env-fbind


  (env-vbind env symbol value)
  (env-fbind env symbol value)


These functions bind a symbol to a value in either the function or variable space of environment env.

Values established in the function space should be functions or objects that can be used as functions such as lists, strings, arrays or hashes.

If symbol already exists in the environment, in the given space, then its value is updated with value.

If env is specified as nil, then the binding takes place in the global environment.


9.7 Global Environment


9.7.1 Accessors symbol-function, symbol-macro and symbol-value


  (symbol-function {symbol | method-name})
  (symbol-macro symbol)
  (symbol-value symbol)
  (set (symbol-function symbol) new-value)
  (set (symbol-macro symbol) new-value)
  (set (symbol-value symbol) new-value)


If given a symbol argument, the symbol-function function retrieves the value of the global function binding of the given symbol if it has one: that is, the function object bound to the symbol. If symbol has no global function binding, then nil is returned.

The symbol-function function supports method names of the form (meth struct slot) where struct names a struct type, and slot is either a static slot or one of the keyword symbols :init or :postinit which refer to special functions associated with a structure type. Names in this format are returned by the func-get-name function. The symbol-function function also supports names of the form (macro name) which denote macros. Thus, symbol-function provides unified access to functions, methods and macros.

The symbol-macro function retrieves the value of the global macro binding of symbol if it has one.

Note: the name of this function has nothing to do with symbol macros; it is named for consistency with symbol-function and symbol-value, referring to the "macro-expander binding of the symbol cell".

The value of a macro binding is a function object. Intrinsic macros are C functions in the TXR kernel, which receive the entire macro call form and macro environment, performing their own destructuring. Currently, macros written in TXR Lisp are represented as curried C functions which carry the following list object in their environment cell:

  (#<environment object> macro-parameter-list body-form*)

Local macros created by macrolet have nil in place of the environment object.

This representation is likely to change or expand to include other forms in future TXR versions.

The symbol-value function retrieves the value stored in the dynamic binding of symbol that is apparent in the current context. If the variable has no dynamic binding, then symbol-value retrieves its value in the global environment. If symbol has no variable binding, but is defined as a global symbol macro, then the value of that symbol macro binding is retrieved. The value of a symbol macro binding is simply the replacement form.

Rather than throwing an exception, each of these functions returns nil if the argument symbol doesn't have the binding in the respective namespace or namespaces which that function searches.

A symbol-function, symbol-macro, or symbol-value form denotes a place, if symbol has a binding of the respective kind. This place may be assigned to or deleted. Assignment to the place causes the denoted binding to have a new value. Deleting a place with the del macro removes the binding, and returns the previous contents of that binding. A binding denoted by a symbol-function form is removed using fmakunbound, one denoted by symbol-macro is removed using mmakunbound and a binding denoted by symbol-value is removed using makunbound.

Deleting a method via symbol-function is not possible; an attempt to do so has no effect.

Storing a value, using any one of these three accessors, to a nonexistent variable, function or macro binding, is not erroneous. It has the effect of creating that binding.

Deleting a binding, using any of these three accessors, when the binding does not exist, also isn't erroneous. There is no effect and the del operator yields nil as the prior value, consistent with the behavior when accessors are used to retrieve a nonexistent value.
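A sketch of these accessors used as places, including the creation of a new binding by assignment; the names counter and twice are hypothetical:

```lisp
  (defvar counter 0)
  (symbol-value 'counter)            -> 0
  (set (symbol-value 'counter) 5)
  counter                            -> 5

  ;; assignment creates a function binding that didn't exist
  (set (symbol-function 'twice) (lambda (x) (* 2 x)))
  (twice 7)                          -> 14
```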

Dialect note:

In ANSI Common Lisp, the symbol-function function retrieves a function, macro or special operator binding of a symbol. These are all in one space and may not co-exist. In TXR Lisp, it retrieves a symbol's function binding only. The symbol-macro function doesn't exist in Common Lisp.


9.7.2 Functions boundp, fboundp and mboundp




boundp returns t if the symbol is bound as a variable or symbol macro in the global environment, otherwise nil.

fboundp returns t if the symbol has a function binding in the global environment, otherwise it returns nil.

mboundp returns t if the symbol has an operator macro binding in the global environment, otherwise nil.
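
By way of illustration, the following results could be expected for some well-known global names (when is a standard TXR Lisp macro):

  (fboundp 'car) -> t
  (mboundp 'when) -> t
  (boundp 'surely-not-a-bound-variable) -> nil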

Dialect Notes:

The boundp function in ANSI Common Lisp doesn't report that global symbol macros have a binding. They are not considered bindings. In TXR Lisp, they are considered bindings.

The ANSI Common Lisp fboundp yields true if its argument has a function, macro or special operator binding. The behavior of the Common Lisp expression (fboundp x) can be obtained in TXR Lisp using the expression:

  (or (fboundp x) (mboundp x) (special-operator-p x))


The mboundp function doesn't exist in ANSI Common Lisp.


9.7.3 Functions makunbound, fmakunbound and mmakunbound




The function makunbound removes the binding of symbol from either the dynamic environment or the global symbol macro environment. After the call to makunbound, symbol appears to be unbound.

If the makunbound call takes place in a scope in which there exists a dynamic rebinding of symbol, the information for restoring the previous binding is not affected by makunbound. When that scope terminates, the previous binding will be restored.

If the makunbound call takes place in a scope in which the dynamic binding for symbol is the global binding, then the global binding is removed. When the global binding is removed, then if symbol was previously marked as special (for instance by defvar) this marking is removed.

Otherwise if symbol has a global symbol macro binding, that binding is removed.

If symbol has no apparent dynamic binding, and no global symbol macro binding, makunbound does nothing.

In all cases, makunbound returns symbol.
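
For instance, a global variable introduced by defvar can be removed again, together with its special marking; a sketch:

  (defvar *x* 42)
  (boundp '*x*) -> t
  (makunbound '*x*) -> *x*
  (boundp '*x*) -> nil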

Dialect Note:

The behavior of makunbound differs from its counterpart in ANSI Common Lisp.

The makunbound function in Common Lisp only removes a value from a dynamic variable. The dynamic variable does not cease to exist; it only ceases to have a value (in Common Lisp, the binding of a dynamic variable is its value). In TXR Lisp, the variable ceases to exist. The binding of a variable isn't its value, it is the variable itself: the association between a name and an abstract storage location, in some environment. If the binding is undone, the variable disappears.

The makunbound function in Common Lisp does not remove global symbol macros, which are not considered to be bindings in the variable namespace. That is to say, the Common Lisp boundp does not report true for symbol macros.

The Common Lisp makunbound also doesn't remove the special attribute from a symbol. If a variable is introduced with defvar and then removed with makunbound, the symbol continues to exhibit dynamic binding rather than lexical in subsequent scopes. In TXR Lisp, if a global binding is removed, so is the special attribute.


9.7.4 Functions fmakunbound and mmakunbound




The function fmakunbound removes any binding for symbol from the function namespace of the global environment. If symbol has no such binding, it does nothing. In either case, it returns symbol.

The function mmakunbound removes any binding for symbol from the operator macro namespace of the global environment. If symbol has no such binding, it does nothing. In either case, it returns symbol.
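
A sketch of both functions, using the hypothetical names f and m:

  (defun f () 42)
  (fmakunbound 'f) -> f
  (fboundp 'f) -> nil

  (defmacro m () 42)
  (mmakunbound 'm) -> m
  (mboundp 'm) -> nil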

Dialect Note:

The behavior of fmakunbound differs from its counterpart in ANSI Common Lisp. The fmakunbound function in Common Lisp removes a function or macro binding; in that dialect, these do not coexist.

The mmakunbound function doesn't exist in Common Lisp.


9.7.5 Function func-get-form




The func-get-form function retrieves a source code form of func, which must be an interpreted function. The source code form has the syntax (name arglist body-form*).


9.7.6 Function func-get-name


  (func-get-name func [env])


The func-get-name function tries to resolve the function object func to a name. If that is not possible, it returns nil.

The resolution is performed by an exhaustive search through up to three spaces.

If an environment is specified by env, then this is searched first. If a binding is found in that environment which resolves to the function, then the search terminates and the binding's symbol is returned as the function's name.

If the search through environment env fails, or if that argument is not specified, then the global environment is searched for a function binding which resolves to func. If such a binding is found, then the search terminates, and the binding's symbol is returned. If two or more symbols in the global environment resolve to the function, it is not specified which one is returned.

If the global function environment search fails, then the function is considered as a possible macro. The global macro environment is searched for a macro binding whose expander function is func, similarly to the way the function environment was searched. If a binding is found, then the syntax (macro name) is returned, where name is the name of the global macro binding that was found which resolves to func. If two or more global macro bindings share func, it is not specified which of those bindings provides name.

If the global macro search fails, then func is considered as a possible method. The static slot space of all struct types is searched for a slot which contains func. If such a slot is found, then the method name is returned, consisting of the syntax (meth type name) where type is a symbol denoting the struct type and name is the static slot of the struct type which holds func.

A check is also performed whether func might be equal to one of the two special functions of a structure type: its initfun or postinitfun, in which case it is returned as either the (meth type :init) or the (meth type :postinit) syntax.

If func is an interpreted function not found under any name, then a lambda expression denoting that function is returned in the syntax (lambda args form*).

If func cannot be identified as a function, then nil is returned.


9.7.7 Function func-get-env




The func-get-env function retrieves the environment object associated with function func. The environment object holds the captured bindings of a lexical closure.


9.7.8 Function functionp




The functionp function returns t if obj is a function, otherwise it returns nil.


9.7.9 Function interp-fun-p




The interp-fun-p function returns t if obj is an interpreted function, otherwise it returns nil.


9.7.10 Function vm-fun-p




The vm-fun-p function returns t if obj is a function compiled for the virtual machine: a function representation produced by means of the functions compile-file, compile-toplevel or compile. If obj is of any other type, the function returns nil.


9.7.11 Function special-var-p




The special-var-p function returns t if obj is a symbol marked for special variable binding, otherwise it returns nil. Symbols are marked special by defvar and defparm.


9.7.12 Function special-operator-p




The special-operator-p function returns t if obj is a symbol which names a special operator, otherwise it returns nil.


9.8 Object Type

In TXR Lisp, objects obey the following type hierarchy. In this type hierarchy, the internal nodes denote abstract types: no object is an instance of an abstract type. Nodes in square brackets indicate an internal structure in the type graph, visible to programs, and angle brackets indicate a plurality of types which are not listed by name:

  t ----+--- [cobj types] ---+--- hash
        |                    |
        |                    +--- stream
        |                    |
        |                    +--- random-state
        |                    |
        |                    +--- regex
        |                    |
        |                    +--- buf
        |                    |
        |                    +--- cptr
        |                    |
        |                    +--- struct-type
        |                    |
        |                    +--- <structures>
        |                    |
        |                    +--- ... others
        +--- sequence ---+--- string ---+--- str
        |                |              |
        |                |              +--- lstr
        |                |              |
        |                |              +--- lit
        |                |
        |                +--- list ---+--- null
        |                |            |
        |                |            +--- cons
        |                |            |
        |                |            +--- lcons
        |                |
        |                +--- vec
        +--- number ---+--- float
        |              |
        |              +--- integer ---+--- fixnum
        |                              |
        |                              +--- bignum
        +--- sym
        +--- env
        +--- range
        +--- pkg
        +--- fun

In addition to the above hierarchy, the following relationships also exist:

  t ---+--- atom --- <any type other than cons> --- nil
       |
       +--- cons ---+--- lcons --- nil
                    |
                    +--- nil

  sym --- null

That is to say, the types are exhaustively partitioned into atoms and conses; an object is either a cons or else it isn't, in which case it is the abstract type atom.

The cons type is odd in that it is both an abstract type, serving as a supertype for the type lcons, and a concrete type, in that regular conses are of this type.

The type nil is an abstract type which is empty. That is to say, no object is of type nil. This type is considered the abstract subtype of every other type, including itself.

The type nil is not to be confused with the type null which is the type of the nil symbol.

Lastly, because the type of nil is the type null and nil is also a symbol, the null type is a subtype of sym.


9.8.1 Function typeof




The typeof function returns a symbol representing the type of value.

The core types are identified by the following symbols:

cons
Cons cell.

lit
Literal string embedded in the TXR executable image.

fixnum
Fixnum integer: an integer that fits into the value word, not having to be heap-allocated.

bignum
A bignum integer: arbitrary-precision integer that is heap-allocated.

flo
Floating-point number.

pkg
Symbol package.

lcons
Lazy cons.

range
Range object.

lstr
Lazy string.

env
Function/variable binding environment.

hash
Hash table.

stream
I/O stream of any kind.

regex
Regular expression object.

struct-type
A structure type: the type of any one of the values which represents a structure type.

There are more kinds of objects, such as user-defined structures.


9.8.2 Function subtypep


  (subtypep left-type-symbol right-type-symbol)


The subtypep function tests whether left-type-symbol and right-type-symbol name a pair of types, such that the left type is a subtype of the right type.

If either argument doesn't name a type, the behavior is unspecified.

Each type is a subtype of itself. Most other type relationships can be inferred from the type hierarchy diagrams given in the introduction to this section.

In addition, there are inheritance relationships among structures. If left-type-symbol and right-type-symbol both name structure types, then subtypep yields true if the types are the same struct type, or if the right type is a direct or indirect supertype of the left.
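
The following relationships, read off the type hierarchy diagrams above, illustrate the function:

  (subtypep 'fixnum 'integer) -> t
  (subtypep 'integer 'number) -> t
  (subtypep 'null 'sym) -> t      ;; nil is a symbol
  (subtypep 'str 'list) -> nil
  (subtypep 'float 'float) -> t   ;; every type is its own subtype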


9.8.3 Function typep


  (typep object type-symbol)


The typep function tests whether the type of object is a subtype of the type named by type-symbol.

The following equivalence holds:

  (typep a b) --> (subtypep (typeof a) b)


9.8.4 Macro typecase


  (typecase test-form {(type-sym clause-form*)}*)


The typecase macro evaluates test-form and then successively tests its type against each clause.

Each clause consists of a type symbol type-sym and zero or more clause-form-s.

The first clause whose type-sym is a supertype of the type of test-form's value is considered to be the matching clause. That clause's clause-form-s are evaluated, and the value of the last form is returned.

If there is no matching clause, or there are no clauses present, or the matching clause has no clause-form-s, then nil is returned.

Note: since t is the supertype of every type, a clause whose type-sym is the symbol t always matches. If such a clause is placed as the last clause of a typecase, it provides a fallback case, whose forms are evaluated if none of the previous clauses match.
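
For example, a typecase with a t fallback clause might be used to classify a value:

  (typecase 3.14
    (integer 'exact)
    (float 'inexact)
    (t 'other))
  -> inexact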


9.9 Object Equivalence


9.9.1 Functions identity and use




The identity function returns its argument.

The use function is a synonym.


The identity function is useful as a functional argument, when a transformation function is required, but no transformation is actually desired. In this role, the use synonym leads to readable code. For instance:
  ;; construct a function which returns its integer argument
  ;; if it is odd, otherwise it returns its successor.
  ;; "If it's odd, use it, otherwise take its successor".

  [iff oddp use succ]

  ;; Applications of the function:

  [[iff oddp use succ] 3] -> 3  ;; use applied to 3

  [[iff oddp use succ] 2] -> 3  ;; succ applied to 2


9.9.2 Functions null, not and false




The null, not and false functions are synonyms. They test whether value is the object nil. They return t if this is the case, nil otherwise.


  (null '()) -> t
  (null nil) -> t
  (null ()) -> t
  (false t) -> nil

  (if (null x) (format t "x is nil!"))

  (let ((list '(b c d)))
    (if (not (memq 'a list))
      (format t "list ~s does not contain the symbol a\n" list)))


9.9.3 Functions true and have




The true function is the complement of the null, not and false functions. The have function is a synonym for true.

Each returns t if value is any object other than nil. If value is nil, it returns nil.

Note: programs should avoid explicitly testing values with true. For instance (if x ...) should be favored over (if (true x) ...). However, the latter is useful with the ifa macro because (ifa (true expr) ...) binds the it variable to the value of expr, no matter what kind of form expr is, which is not true in the (ifa expr ...) form.


   ;; Compute indices where the list '(1 nil 2 nil 3)
   ;; has true values:
   [where '(1 nil 2 nil 3) true] -> (1 3)


9.9.4 Functions eq, eql and equal


  (eq left-obj right-obj)
  (eql left-obj right-obj)
  (equal left-obj right-obj)


The principal equality test functions eq, eql and equal test whether two objects are equivalent, using different criteria. They return t if the objects are equivalent, and nil otherwise.

The eq function uses the strictest equivalence test, called implementation equality. The eq function returns t if, and only if, left-obj and right-obj are actually the same object. The eq test is implemented by comparing the raw bit pattern of the value, whether it is an immediate value or a pointer to a heaped object. Two character values are eq if they are the same character, and two fixnum integers are eq if they have the same value. All other object representations are actually pointers, and are eq if, and only if, they point to the same object in memory. So, for instance, two bignum integers might not be eq even if they have the same numeric value, two lists might not be eq even if all their corresponding elements are eq, and two strings might not be eq even if they hold identical text.

The eql function is slightly less strict than eq. The difference between eql and eq is that if left-obj and right-obj are numbers which are of the same kind and have the same numeric value, eql returns t, even if they are different objects. Note that an integer and a floating-point number are not eql even if one has a value which converts to the other: thus, (eql 0.0 0) yields nil; a comparison expression which finds these numbers equal is (= 0.0 0). The eql function also specially treats range objects. Two distinct range objects are eql if their corresponding from and to fields are eql. For all other object types, eql behaves like eq.
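
The distinctions among the three functions can be sketched as follows:

  (eq 'a 'a) -> t                     ;; the same symbol object
  (eq (list 1) (list 1)) -> nil       ;; distinct conses
  (eql 1.0 1.0) -> t                  ;; same kind, same numeric value
  (eql 0.0 0) -> nil                  ;; different kinds of number
  (equal (list 1 2) (list 1 2)) -> t  ;; structural equivalence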

The equal function is less strict still than eql. In general, it recurses into some kinds of aggregate objects to perform a structural equivalence check. For struct types, it also supports customization via equality substitution. See the Equality Substitution section under Structures.

Firstly, if left-obj and right-obj are eql then they are also equal, though of course the converse isn't necessarily the case.

If two objects are both cons cells, then they are equal if their car fields are equal and their cdr fields are equal.

If two objects are vectors, they are equal if they have the same length, and their corresponding elements are equal.

If two objects are strings, they are equal if they are textually identical.

If two objects are functions, they are equal if they have equal environments, and if they have the same code. Two compiled functions are considered to have the same code if and only if they are pointers to the same function. Two interpreted functions are considered to have the same code if their list structure is equal.

Two hashes are equal if they use the same equality (both are :equal-based, or both are :eql-based), if their associated user data elements are equal (see the function hash-userdata), if their sets of keys are identical, and if the data items associated with corresponding keys from each respective hash are equal objects.

Two ranges are equal if their corresponding to and from fields are equal.

For some aggregate objects, there is no special semantics. Two arguments which are symbols, packages, or streams are equal if and only if they are the same object.

Certain object types have a custom equal function.
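
The aggregate cases above can be illustrated with a brief sketch:

  (equal "abc" "abc") -> t        ;; textually identical strings
  (equal #(1 2 3) #(1 2 3)) -> t  ;; same length, equal elements
  (equal '(1 (2)) '(1 (2))) -> t  ;; recursive comparison of conses
  (equal 'a 'b) -> nil            ;; distinct symbols are never equal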


9.9.5 Functions neq, neql and nequal


  (neq left-obj right-obj)
  (neql left-obj right-obj)
  (nequal left-obj right-obj)


The functions neq, neql and nequal are logically negated counterparts of, respectively, eq, eql and equal.

If eq returns t for a given pair of arguments left-obj and right-obj, then neq returns nil. Vice versa, if eq returns nil, neq returns t.

The same relationship exists between eql and neql, and between equal and nequal.


9.9.6 Function less


  (less left-obj right-obj)
  (less obj obj*)


The less function, when called with two arguments, determines whether left-obj compares less than right-obj in a generic way which handles arguments of various types.

The argument syntax of less is generalized. It can accept one argument, in which case it unconditionally returns t regardless of that argument's value. If more than two arguments are given, then less generalizes in a way which can be described by the following equivalence pattern, with the understanding that each argument expression is evaluated exactly once:

  (less a b c) <--> (and (less a b) (less b c))
  (less a b c d) <--> (and (less a b) (less b c) (less c d))

The less function is used as the default for the lessfun argument of the functions sort and merge, as well as the testfun argument of the pos-min and find-min functions.

The less function is capable of comparing numbers, characters, symbols, strings, as well as lists and vectors of these.

If both arguments are the same object so that (eq left-obj right-obj) holds true, then the function returns nil regardless of the type of left-obj, even if the function doesn't handle comparing different instances of that type. In other words, no object is less than itself, no matter what it is.

If both arguments are numbers or characters, they are compared as if using the < function.

If both arguments are strings, they are compared as if using the string-lt function.

If both arguments are symbols, then their names are compared in their place, as if by the string-lt function.

If both arguments are conses, then they are compared as follows:

The less function is recursively applied to the car fields of both arguments. If it yields true, then left-obj is deemed to be less than right-obj.
Otherwise, if the car fields are unequal under the equal function, less returns nil.
If the car fields are equal then less is recursively applied to the cdr fields of the arguments, and the result of that comparison is returned.

This logic performs a lexicographic comparison on ordinary lists such that for instance (1 1) is less than (1 1 1) but not less than (1 0) or (1).

Note that the empty list nil compared to a cons is handled by type-based precedence, described below.

Two vectors are compared by less lexicographically, similarly to strings. Corresponding elements, starting with element 0, of the vectors are compared until an index position is found where corresponding elements of the two vectors are not equal. If this differing position is beyond the end of one of the two vectors, then the shorter vector is considered to be lesser. Otherwise, the result of less is the outcome of comparing those differing elements themselves with less.

Two ranges are compared by less using lexicographic logic similar to conses and vectors. The from fields of the ranges are first compared. If they are not equal, then less is applied to those fields and the result is returned. If the from fields are equal, then less is applied to the to fields and that result is returned.

If the two arguments are of the above types, but of mutually different types, then less resolves the situation based on the following precedence: numbers and characters are less than ranges, which are less than strings, which are less than symbols, which are less than conses, which are less than vectors.

Note that since nil is a symbol, it is ranked lower than a cons. This interpretation ensures correct behavior when nil is regarded as an empty list, since the empty list is lexicographically prior to a nonempty list.
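
Putting the above rules together, a few illustrative comparisons:

  (less 1 2) -> t
  (less '(1 1) '(1 1 1)) -> t  ;; lexicographic; shorter prefix is lesser
  (less '(1 1) '(1 0)) -> nil
  (less nil '(1)) -> t         ;; nil is a symbol, ranked before conses
  (less "abc" 'abc) -> t       ;; strings rank before symbols
  (less 1 2 3) -> t            ;; generalized to three arguments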

If either argument is a structure for which the equal method is defined, the method is invoked on that argument, and the value returned is used in place of that argument for performing the comparison. Structures with no equal method cannot participate in a comparison, resulting in an error. See the Equality Substitution section under Structures.

Finally, if either of the arguments has a type other than the above types, the situation is an error.


9.9.7 Function greater


  (greater left-obj right-obj)
  (greater obj obj*)


The greater function is equivalent to less with the arguments reversed. That is to say, the following equivalences hold:

  (greater a) <--> (less a) <--> t
  (greater a b) <--> (less b a)
  (greater a b c ...) <--> (less ... c b a)

The greater function is used as the default for the testfun argument of the pos-max and find-max functions.


9.9.8 Functions lequal and gequal


  (lequal obj obj*)
  (gequal obj obj*)


The functions lequal and gequal are similar to less and greater respectively, but differ in the following respect: when called with two arguments which compare true under the equal function, the lequal and gequal functions return t.

When called with only one argument, both functions return t and both functions generalize to three or more arguments in the same way as do less and greater.
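
A brief sketch of the difference from less and greater:

  (less 1 1) -> nil
  (lequal 1 1) -> t
  (gequal "abc" "abc") -> t
  (lequal 1 2 2 3) -> t   ;; a non-decreasing chain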


9.10 List Manipulation


9.10.1 Function cons


  (cons car-value cdr-value)


The cons function allocates, initializes and returns a single cons cell. A cons cell has two fields called car and cdr, which are accessed by functions of the same name, or by the functions first and rest, which are synonyms for these.

Lists are made up of conses. A (proper) list is either the symbol nil denoting an empty list, or a cons cell which holds the first item of the list in its car, and the list of the remaining items in cdr. The expression (cons 1 nil) allocates and returns a single cons cell which denotes the one-element list (1). The cdr is nil, so there are no additional items.

A cons cell whose cdr is an atom other than nil is printed with the dotted pair notation. For example the cell produced by (cons 1 2) is denoted (1 . 2). The notation (1 . nil) is perfectly valid as input, but the cell which it denotes will print back as (1). The notations are equivalent.

The dotted pair notation can be used regardless of what type of object is the cons cell's cdr, so that for instance (a . (b c)) denotes the cons cell whose car is the symbol a and whose cdr is the list (b c). This is exactly the same thing as (a b c). In other words (a b ... l m . (n o ... w . (x y z))) is exactly the same as (a b ... l m n o ... w x y z).

Every list, and more generally cons cell tree structure, can be written in a "fully dotted" notation, such that there are as many dots as there are cells. For instance the cons structure of the nested list (1 (2) (3 4 (5))) can be made more explicit using (1 . ((2 . nil) . ((3 . (4 . ((5 . nil) . nil))) . nil))). The structure contains eight conses, and so there are eight dots in the fully dotted notation.

The number of conses in a linear list like (1 2 3) is simply the number of items, so that list in particular is made of three conses. Additional nestings require additional conses, so for instance (1 2 (3)) requires four conses. A visual way to count the conses from the printed representation is to count the atoms, then add the count of open parentheses, and finally subtract one.

A list terminated by an atom other than nil is called an improper list, and the dot notation is extended to cover improper lists. For instance (1 2 . 3) is an improper list of two elements, terminated by 3, and can be constructed using (cons 1 (cons 2 3)). The fully dotted notation for this list is (1 . (2 . 3)).
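
These notations can be verified constructively; a sketch:

  (cons 1 nil) -> (1)
  (cons 1 2) -> (1 . 2)
  (cons 1 (cons 2 3)) -> (1 2 . 3)  ;; improper list
  (cons 'a '(b c)) -> (a b c)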


9.10.2 Function atom




The atom function tests whether value is an atom. It returns t if this is the case, nil otherwise. All values which are not cons cells are atoms.

(atom x) is equivalent to (not (consp x)).


  (atom 3) -> t
  (atom (cons 1 2)) -> nil
  (atom "abc") -> t
  (atom '(3)) -> nil


9.10.3 Function consp




The consp function tests whether value is a cons. It returns t if this is the case, nil otherwise.

(consp x) is equivalent to (not (atom x)).

Non-empty lists test positive under consp because a list is represented as a reference to the first cons in a chain of one or more conses.

Note that a lazy cons is a cons and satisfies the consp test. See the function make-lazy-cons and the macro lcons.


  (consp 3) -> nil
  (consp (cons 1 2)) -> t
  (consp "abc") -> nil
  (consp '(3)) -> t


9.10.4 Accessors car and first


  (set (car object) new-value)
  (set (first object) new-value)


The functions car and first are synonyms.

If object is a cons cell, these functions retrieve the car field of that cons cell. (car (cons 1 2)) yields 1.

For programming convenience, object may be of several other kinds in addition to conses.

(car nil) is allowed, and returns nil.

object may also be a vector or a string. If it is an empty vector or string, then nil is returned. Otherwise the first character of the string or first element of the vector is returned.

object may be a structure. The car operation is possible if the object has a car method. If so, car invokes that method and returns whatever the method returns. If the structure has no car method, but has a lambda method, then the car function calls that method with one argument, that being the integer zero. Whatever the method returns, car returns. If neither method is defined, an error exception is thrown.

A car form denotes a valid place whenever object is a valid argument for the rplaca function. Modifying the place denoted by the form is equivalent to invoking rplaca with object as the left argument, and the replacement value as the right argument. It takes place in the manner given under the description of the rplaca function, and obeys the same restrictions.

A car form supports deletion. The following equivalence then applies:

  (del (car place)) <--> (pop place)

This implies that deletion requires the argument of the car form to be a place, rather than the whole form itself. In this situation, the argument place may have a value which is nil, because pop is defined on an empty list.

The abstract concept behind deleting a car is that physically deleting this field from a cons, thereby breaking it in half, would result in just the cdr remaining. Though fragmenting a cons in this manner is impossible, deletion simulates it by replacing the place which previously held the cons, with that cons' cdr field. This semantics happens to coincide with deleting the first element of a list by a pop operation.
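
A sketch of deletion through a car place, using a local variable as the underlying place:

  (let ((l (list 1 2 3)))
    (del (car l))   ;; yields 1
    l)
  -> (2 3)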


9.10.5 Accessors cdr and rest


  (set (cdr object) new-value)
  (set (rest object) new-value)


The functions cdr and rest are synonyms.

If object is a cons cell, these functions retrieve the cdr field of that cons cell. (cdr (cons 1 2)) yields 2.

For programming convenience, object may be of several other kinds in addition to conses.

(cdr nil) is allowed, and returns nil.

object may also be a vector or a string. If it is a non-empty string or vector containing at least two items, then the remaining part of the object is returned, with the first element removed. For example (cdr "abc") yields "bc". If object is a one-element vector or string, or an empty vector or string, then nil is returned. Thus (cdr "a") and (cdr "") both result in nil.

If object is a structure, then cdr requires it to support either the cdr method or the lambda method. If both are present, cdr is used. When the cdr function uses the cdr method, it invokes it with no arguments. Whatever value the method returns becomes the return value of cdr. When cdr invokes a structure's lambda method, it passes as the argument the range object #R(1 t). Whatever the lambda method returns becomes the return value of cdr.

The invocation syntax of a cdr or rest form is a syntactic place. The place is semantically correct if object is a valid argument for the rplacd function. Modifying the place denoted by the form is equivalent to invoking rplacd with object as the left argument, and the replacement value as the right argument. It takes place in the manner given under the description of the rplacd function, and obeys the same restrictions.

A cdr place supports deletion, according to the following near equivalence:

  (del (cdr place)) <--> (prog1 (cdr place)
                                (set place (car place)))

Of course, place is evaluated only once.

Note that this is symmetric with the delete semantics of car in that the cons stored in place goes away, as does the cdr field, leaving just the car, which takes the place of the original cons.
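
A sketch of deletion through a cdr place, symmetric with the car example; note that the variable is left holding the car, which here is an atom:

  (let ((l (list 1 2 3)))
    (del (cdr l))   ;; yields (2 3)
    l)
  -> 1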


Walk every element of the list (1 2 3) using a for loop:

    (for ((i '(1 2 3))) (i) ((set i (cdr i)))
      (print (car i) *stdout*)
      (print #\newline *stdout*))

The variable i marches over the cons cells which make up the "backbone" of the list. The elements are retrieved using the car function. Advancing to the next cell is achieved using (cdr i). If i is the last cell in a (proper) list, (cdr i) yields nil and so i becomes nil, the loop guard expression i fails and the loop terminates.


9.10.6 Functions rplaca and rplacd


  (rplaca object new-car-value)
  (rplacd object new-cdr-value)


If object is a cons cell or lazy cons cell, then rplaca and rplacd functions assign new values into the car and cdr fields of the object. In addition, these functions are meaningful for other kinds of objects also.

Note that, except for the difference in return value, (rplaca x y) is the same as the more generic (set (car x) y), and likewise (rplacd x y) can be written as (set (cdr x) y).

The rplaca and rplacd functions return object. Note: in TXR versions 89 and earlier, these functions returned the new value. The behavior was undocumented.

The object argument does not have to be a cons cell. Both functions support meaningful semantics for vectors and strings. If object is a string, it must be modifiable.

The rplaca function replaces the first element of a vector or first character of a string. The vector or string must be at least one element long.

The rplacd function replaces the suffix of a vector or string after the first element with a new suffix. The new-cdr-value must be a sequence, and if the suffix of a string is being replaced, it must be a sequence of characters. The suffix here refers to the portion of the vector or string after the first element.

It is permissible to use rplacd on an empty string or vector. In this case, new-cdr-value specifies the contents of the entire string or vector, as if the operation were done on a non-empty vector or string, followed by the deletion of the first element.

The object argument may be a structure. In the case of rplaca, the structure must have a defined rplaca method or else, failing that, a lambda-set method. The first of these methods which is available, in the given order, is used to perform the operation. Whatever the respective method returns, rplaca itself returns object. If the lambda-set method is used, it is called with two arguments (in addition to object): the integer zero, and new-car-value.

In the case of rplacd, the structure must have a defined rplacd method or else, failing that, a lambda-set method. The first of these methods which is available, in the given order, is used to perform the operation. Whatever the respective method returns, rplacd itself returns object. If the lambda-set method is used, it is called with two arguments (in addition to object): the range value #R(1 t) and new-cdr-value.
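The following sketch illustrates typical uses on a cons, a vector and a string (illustrative, using the -> convention of this manual):

```lisp
(let ((c (list 1 2 3)))
  (rplaca c 0)      ;; c is now (0 2 3)
  (rplacd c '(9))   ;; c is now (0 9)
  c)
-> (0 9)

;; rplacd on a vector replaces the suffix after the first element
(let ((v (vec 1 2 3)))
  (rplacd v #(8 9))
  v)
-> #(1 8 9)

;; rplaca on a modifiable string replaces the first character
(let ((s (copy "abc")))
  (rplaca s #\z)
  s)
-> "zbc"
```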


9.10.7 Accessors second, third, fourth, fifth, sixth, seventh, eighth, ninth and tenth


  (set (first object) new-value)
  (set (second object) new-value)
  (set (tenth object) new-value)


Used as functions, these accessors retrieve the elements of a sequence by position. If the sequence is shorter than implied by the position, these functions return nil.

When used as syntactic places, these accessors denote the storage locations by position. The location must exist, otherwise an error exception results. The places support deletion.


  (third '(1 2)) -> nil
  (second "ab") -> #\b
  (third '(1 2 . 3)) -> **error, improper list**

  (let ((x (copy "abcd")))
    (inc (third x))
    x) -> "abce"


9.10.8 Functions append and nconc


  (append [sequence ... last-arg])
  (nconc [sequence ... last-arg])


The append function creates a new object which is a catenation of the list arguments. All arguments are optional; (append) produces the empty list, and if a single argument is specified, that argument is returned.

If two or more arguments are present, then the situation is identified as one or more sequence arguments followed by last-arg. The sequence arguments must be sequences; last-arg may be a sequence or atom.

The append operation over three or more arguments is left-associative, such that (append x y z) is equivalent to both (append (append x y) z) and (append x (append y z)).

This allows the catenation of an arbitrary number of arguments to be understood in terms of a repeated application of the two-argument case, whose semantics is given by these rules:

nil catenates with nil to produce nil:
  (append nil nil) -> nil
nil catenates with a proper or improper list, producing that list itself:
  (append nil '(1 2)) -> (1 2)
  (append nil '(1 2 . 3)) -> (1 2 . 3)
A proper list catenates with nil, producing that list itself:
  (append '(1 2) nil) -> (1 2)
A proper list catenates with an atom, producing an improper list terminated by that atom, whether or not that atom is a sequence:
  (append '(1 2) #(3)) -> (1 2 . #(3))
  (append '(1 2) 3) -> (1 2 . 3)
A non-list sequence catenates with another sequence into a sequence, producing a sequence which contains the elements of both, of the same kind as the left sequence. The elements must be compatible; a string can only catenate with a sequence of characters.
  (append #(1 2) #(3 4)) -> #(1 2 3 4)
  (append "ab" "cd") -> "abcd"
  (append "ab" #(#\c #\d)) -> "abcd"
  (append "ab" #(3 4)) -> ;; error
A non-list sequence catenates with an atom if it is a suitable element type for that kind of sequence. The resulting sequence is of the same kind, and includes that atom:
  (append #(1 2) 3) -> #(1 2 3)
  (append "ab" #\c) -> "abc"
  (append "ab" 3) -> ;; error
If an improper list is catenated with any object, the catenation takes place between the terminating atom of that list and that object. This requires the terminating atom to be a sequence. If the catenation is possible, then the result is a new improper list which is a copy of the original, but with the terminating atom replaced by a catenation of that atom and the object:
  (append '(1 2 . "ab") "c") -> (1 2 . "abc")
  (append '(1 2 . "ab") '(2 3)) -> ;; error
A non-sequence atom doesn't catenate; the situation is erroneous:
  (append 1 2) -> ;; error
  (append '(1 . 2) 3) -> ;; error

When all the arguments are lists: if N arguments are specified, where N > 1, then the first N-1 arguments must be proper lists. Copies of these lists are catenated together. The last argument N, shown in the above syntax as last-arg, may be any kind of object. It is installed into the cdr field of the last cons cell of the resulting list. Thus, if argument N is also a list, it is catenated onto the resulting list, but without being copied. Argument N may be an atom other than nil; in that case append produces an improper list.

The nconc function works like append, but may destructively manipulate any of the input objects.


  ;; An atom is returned.
  (append 3) -> 3

  ;; A list is also just returned: no copying takes place.
  ;; The eq function can verify that the same object emerges
  ;; from append that went in.
  (let ((list '(1 2 3)))
    (eq (append list) list)) -> t

  (append '(1 2 3) '(4 5 6) 7) -> (1 2 3 4 5 6 . 7)

  ;; the (4 5 6) tail of the resulting list is the original
  ;; (4 5 6) object, shared with that list.

  (append '(1 2 3) '(4 5 6)) -> '(1 2 3 4 5 6)

  (append nil) -> nil

  ;; (1 2 3) is copied: it is not the last argument
  (append '(1 2 3) nil) -> (1 2 3)

  ;; empty lists disappear
  (append nil '(1 2 3) nil '(4 5 6)) -> (1 2 3 4 5 6)
  (append nil nil nil) -> nil

  ;; atoms and improper lists other than in the last position
  ;; are erroneous
  (append '(a . b) 3 '(1 2 3)) -> **error**

  ;; sequences other than lists can be catenated.
  (append "abc" "def" "g" #\h) -> "abcdefgh"

  ;; lists followed by non-list sequences end with non-list
  ;; sequences catenated in the terminating atom:
  (append '(1 2) '(3 4) "abc" "def") -> (1 2 3 4 . "abcdef")


9.10.9 Function append*


  (append* [list ... last-arg])


The append* function lazily catenates lists.

If invoked with no arguments, it returns nil. If invoked with a single argument, it returns that argument.

Otherwise, it returns a lazy list consisting of the elements of every list argument from left to right.

Arguments other than the last are treated as lists, and traversed using car and cdr functions to visit their elements.

The last argument isn't traversed: rather, that object itself becomes the cdr field of the last cons cell of the lazy list constructed from the previous arguments.


9.10.10 Functions revappend and nreconc


  (revappend list1 list2)
  (nreconc list1 list2)


The revappend function returns a list consisting of list2 appended to a reversed copy of list1. The returned object shares structure with list2, which is unmodified.

The nreconc function behaves similarly, except that the returned object may share structure with not only list2 but also list1, which is modified.
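A sketch of revappend in action (illustrative):

```lisp
(revappend '(3 2 1) '(4 5)) -> (1 2 3 4 5)
(revappend nil '(4 5)) -> (4 5)
```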


9.10.11 Function list


  (list value*)

The list function creates a new list, whose elements are the argument values.


  (list) -> nil
  (list 1) -> (1)
  (list 'a 'b) -> (a b)


9.10.12 Function list*


  (list* value*)

The list* function is a generalization of cons. If called with exactly two arguments, it behaves exactly like cons: (list* x y) is identical to (cons x y). If three or more arguments are specified, the leading arguments specify additional atoms to be consed to the front of the list. So for instance (list* 1 2 3) is the same as (cons 1 (cons 2 3)) and produces the improper list (1 2 . 3). Generalizing in the other direction, list* can be called with just one argument, in which case it returns that argument, and can also be called with no arguments in which case it returns nil.


  (list*) -> nil
  (list* 1) -> 1
  (list* 'a 'b) -> (a . b)
  (list* 'a 'b 'c) -> (a b . c)

Dialect Note:

Note that unlike in some other Lisp dialects, the effect of (list* 1 2 x) can also be obtained using (list 1 2 . x). However, (list* 1 2 (func 3)) cannot be rewritten as (list 1 2 . (func 3)) because the latter is equivalent to (list 1 2 func 3).


9.10.13 Function sub-list


  (sub-list list [from [to]])


This function is like the sub function, except that it operates strictly on lists.

For a description of the arguments and semantics, refer to the sub function.


9.10.14 Function replace-list


  (replace-list list item-sequence [from [to]])


The replace-list function is like the replace function, except that the first argument must be a list.

For a description of the arguments, semantics and return value, refer to the replace function.


9.10.15 Functions listp and proper-list-p


  (listp value)
  (proper-list-p value)

The listp and proper-list-p functions test, respectively, whether value is a list, or a proper list, and return t or nil accordingly.

The listp test is weaker, and executes without having to traverse the object. The value produced by the expression (listp x) is the same as that of (or (null x) (consp x)), except that x is evaluated only once. The empty list nil is a list, and a cons cell is a list.

The proper-list-p function returns t only for proper lists. A proper list is either nil, or a cons whose cdr is a proper list. proper-list-p traverses the list, and its execution will not terminate if the list is circular.

Dialect Note: in TXR 137 and older, proper-list-p is called proper-listp. The name was changed for adherence to conventions and compatibility with other Lisp dialects, like Common Lisp. However, the function continues to be available under the old name. Code that must run on TXR 137 and older installations should use proper-listp, but its use going forward is deprecated.


9.10.16 Function endp


  (endp object)

The endp function returns t if object is the object nil.

If object is a cons cell, then endp returns nil.

Otherwise, the endp function throws an exception.


9.10.17 Function length-list


  (length-list list)

The length-list function returns the length of list, which may be a proper or improper list. The length of a list is the number of conses in that list.


9.10.18 Function copy-list


  (copy-list list)

The copy-list function returns a list similar to list, but with a newly allocated cons cell structure.

If list is an atom, it is simply returned.

Otherwise, list is a cons cell, and copy-list returns the same object as the expression (cons (car list) (copy-list (cdr list))).

Note that the object (car list) is not deeply copied, but only propagated by reference into the new list. copy-list produces a new list structure out of the same items that are in list.

Dialect Note:

Common Lisp does not allow the argument to be an atom, except for the empty list nil.


9.10.19 Function copy-cons


  (copy-cons cons)

This function creates a fresh cons cell, whose car and cdr fields are copied from cons.


9.10.20 Functions reverse and nreverse


  (reverse list)
  (nreverse list)

The functions reverse and nreverse produce an object which contains the same items as proper list list, but in reverse order. If list is nil, then both functions return nil.

The reverse function is non-destructive: it creates a new list.

The nreverse function creates the structure of the reversed list out of the cons cells of the input list, thereby destructively altering it (if it contains more than one element). How nreverse uses the material from the original list is unspecified. It may rearrange the cons cells into a reverse order, or it may keep the structure intact, but transfer the car values among cons cells into reverse order. Other approaches are possible.


9.10.21 Accessor nthlast


  (nthlast index list)
  (set (nthlast index list) new-value)


The nthlast function retrieves the n-th last cons cell of a list, indexed from one. The index parameter must be an integer. If index is positive and so large that it specifies a nonexistent cons beyond the beginning of the list, nthlast returns list. Effectively, values of index larger than the length of the list are clamped to the length. If index is negative, then nthlast yields nil. An index value of zero retrieves the terminating atom of list or else the value list itself, if list is an atom.

The following equivalences hold:

  (nthlast 1 list) <--> (last list)

An nthlast place designates the storage location which holds the n-th cell, as indicated by the value of index.

A negative index doesn't denote a place.

A positive index greater than the length of the list is treated as if it were equal to the length of the list.

If list is itself a syntactic place, then the index value n is permitted for a list of length n. This index value denotes the list place itself. Storing to this value overwrites list. If list isn't a syntactic place, then storing to position n isn't permitted.

If list is of length zero, or an atom (in which case its length is considered to be zero), then the above remarks about position n apply to an index value of zero: if list is a syntactic place, then the position denotes list itself; otherwise the position doesn't exist as a place.

If list contains one or more elements, then index value of zero denotes the cdr field of its last cons cell. Storing a value to this place overwrites the terminating atom.
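The following sketch illustrates both the function and the place semantics (illustrative, not from the reference examples):

```lisp
(nthlast 1 '(1 2 3)) -> (3)
(nthlast 2 '(1 2 3)) -> (2 3)
(nthlast 0 '(1 2 3)) -> nil          ;; terminating atom
(nthlast 10 '(1 2 3)) -> (1 2 3)     ;; clamped to length

;; storing to an nthlast place: the location holding the
;; last cell is overwritten
(let ((x (list 1 2 3)))
  (set (nthlast 1 x) '(z))
  x)
-> (1 2 z)
```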


9.10.22 Accessor butlastn


  (butlastn num list)
  (set (butlastn num list) new-value)


The butlastn function calculates that initial portion of list which excludes the last num elements.

Note: the butlastn function doesn't support non-list sequences as sequences; it treats them as the terminating atom of a zero-length improper list. The butlast sequence function supports non-list sequences. If x is a list, then the following equivalence holds:

  (butlastn n x)  <-->  (butlast x n)

If num is zero, or negative, then butlastn returns list.

If num is positive, and meets or exceeds the length of list, then butlastn returns nil.

If a butlastn form is used as a syntactic place, then list must be a place. Assigning to the form causes list to be replaced with a new list which is a catenation of the new value and the last num elements of the original list, according to the following equivalence:

  (set (butlastn n x) v)  <-->  (progn (set x (append v (nthlast n x))) v)

except that n, x and v are evaluated only once, in left-to-right order.
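A sketch of butlastn as a function and as a place (illustrative):

```lisp
(butlastn 1 '(1 2 3)) -> (1 2)
(butlastn 0 '(1 2 3)) -> (1 2 3)
(butlastn 5 '(1 2 3)) -> nil

;; assigning to a butlastn place replaces the initial portion,
;; keeping the last num elements
(let ((x (list 1 2 3 4)))
  (set (butlastn 2 x) '(a))
  x)
-> (a 3 4)
```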


9.10.23 Accessor nth


  (nth index object)
  (set (nth index object) new-value)


The nth function performs random access on a list, retrieving the n-th element indicated by the zero-based index value given by index. The index argument must be a non-negative integer.

If index indicates an element beyond the end of the list, then the function returns nil.

The following equivalences hold:

  (nth 0 list) <--> (car list) <--> (first list)
  (nth 1 list) <--> (cadr list) <--> (second list)
  (nth 2 list) <--> (caddr list) <--> (third list)

  (nth x y) <--> (car (nthcdr x y))


9.10.24 Accessor nthcdr


  (nthcdr index list)
  (set (nthcdr index list) new-value)


The nthcdr function retrieves the n-th cons cell of a list, indexed from zero. The index parameter must be a non-negative integer. If index specifies a nonexistent cons beyond the end of the list, then nthcdr yields nil.

The following equivalences hold:

  (nthcdr 0 list) <--> list
  (nthcdr 1 list) <--> (cdr list)
  (nthcdr 2 list) <--> (cddr list)

  (car (nthcdr x y)) <--> (nth x y)

An nthcdr place designates the storage location which holds the n-th cell, as indicated by the value of index. Indices beyond the last cell of list do not designate a valid place. If list is itself a place, then the zeroth index is permitted and the resulting place denotes list. Storing a value to (nthcdr 0 list) overwrites list. Otherwise if list isn't a syntactic place, then the zeroth index does not designate a valid place; index must have a positive value. A nthcdr place does not support deletion.
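A sketch of nthcdr as a function and as a place (illustrative):

```lisp
(nthcdr 2 '(1 2 3 4)) -> (3 4)

;; storing to an nthcdr place overwrites the location
;; holding the n-th cell
(let ((x (list 1 2 3 4)))
  (set (nthcdr 2 x) '(z))
  x)
-> (1 2 z)
```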

Dialect Note:

In Common Lisp, nthcdr is only a function, not an accessor; nthcdr forms do not denote places.


9.10.25 Accessors caar, cadr, cdar, cddr, ... and cdddddr


  (set (caar object) new-value)
  (set (cadr object) new-value)


The a-d accessors provide a shorthand notation for accessing two to five levels deep into a cons-cell-based tree structure. For instance, the equivalent of the nested function call expression (car (car (cdr object))) can be achieved using the single function call (caadr object). The symbol names of the a-d accessors are a generalization of the words "car" and "cdr". They encode the pattern of car and cdr traversal of the structure using a sequence of the letters a and d placed between c and r. The traversal is encoded in right-to-left order, so that cadr indicates a traversal of the cdr link, followed by the car. This order corresponds to the nested function call notation, which also encodes the traversal right-to-left. The following diagram illustrates the straightforward relationship:
  (cdr (car (cdr x)))
    ^    ^    ^
    |   /     |
    |  /     /
    | / ____/
    || /
  (cdadr x)

TXR Lisp provides all possible a-d accessors up to five levels deep, from caar all the way through cdddddr.

Expressions involving a-d accessors are places. For example, (caddr x) denotes the same place as (car (cddr x)), and (cdadr x) denotes the same place as (cdr (cadr x)).

The a-d accessor places support deletion, with semantics derived from the deletion semantics of the car and cdr places. For example, (del (caddr x)) means the same as (del (car (cddr x))).


9.10.26 Functions flatten and flatten*


  (flatten list)
  (flatten* list)

The flatten function produces a list whose elements are all of the non-nil atoms contained in the structure of list.

The flatten* function works like flatten except that it produces a lazy list. It can be used to lazily flatten an infinite lazy structure.


  (flatten '(1 2 () (3 4))) -> (1 2 3 4)

  ;; equivalent to previous, since
  ;; nil is the same thing as ()
  (flatten '(1 2 nil (3 4))) -> (1 2 3 4)

  (flatten nil) -> nil

  (flatten '(((()) ()))) -> nil


9.10.27 Functions flatcar and flatcar*


  (flatcar tree)
  (flatcar* tree)

The flatcar function produces a list of all the atoms contained in the tree structure tree, in the order in which they appear, when the structure is traversed left to right.

This list includes those nil atoms which appear in car fields.

The list excludes nil atoms which appear in cdr fields.

The flatcar* function works like flatcar except that it produces a lazy list. It can be used to lazily flatten an infinite lazy structure.


  (flatcar '(1 2 () (3 4))) -> (1 2 nil 3 4)

  (flatcar '(a (b . c) d (e) (((f)) . g) (nil . z) nil . h))

  --> (a b c d e f g nil z nil h)


9.10.28 Function tree-find


  (tree-find obj tree test-function)


The tree-find function searches tree for an occurrence of obj. Tree can be any atom, or a cons. If tree is a cons, it is understood to be a proper list whose elements are also trees.

The equivalence test is performed by test-function which must take two arguments, and has conventions similar to eq, eql or equal.

tree-find works as follows. If tree is equivalent to obj under test-function, then t is returned to announce a successful finding. If this test fails, and tree is an atom, nil is returned immediately to indicate that the find failed. Otherwise, tree is taken to be a proper list, and tree-find is recursively applied to each element of the list in turn, using the same obj and test-function arguments, stopping at the first element which returns a non-nil value.
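A sketch of tree-find usage (illustrative; the test function is passed as a value, relying on TXR Lisp's fallback from unbound variables to function bindings):

```lisp
(tree-find 3 '(1 (2 (3 4)) 5) eql) -> t
(tree-find 'x '(1 (2 (3 4)) 5) eql) -> nil
(tree-find "b" '(("a") ("b" "c")) equal) -> t
```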


9.10.29 Functions memq, memql and memqual


  (memq object list)
  (memql object list)
  (memqual object list)


The memq, memql and memqual functions search list for a member which is, respectively, eq, eql or equal to object. (See the eq, eql and equal functions above.)

If no such element found, nil is returned.

Otherwise, that suffix of list is returned whose first element is the matching object.
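A sketch of the three functions (illustrative):

```lisp
(memq 'c '(a b c d)) -> (c d)
(memq 'z '(a b c d)) -> nil
(memql 3.0 '(1.0 2.0 3.0 4.0)) -> (3.0 4.0)
(memqual "b" '("a" "b" "c")) -> ("b" "c")
```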


9.10.30 Functions member and member-if


  (member key sequence [testfun [keyfun]])
  (member-if predfun sequence [keyfun])


The member and member-if functions search through sequence for an item which matches a key, or satisfies a predicate function, respectively.

The keyfun argument specifies a function which is applied to the elements of the sequence to produce the comparison key. If this argument is omitted, then the untransformed elements of the sequence themselves are examined.

The member function's testfun argument specifies the test function which is used to compare the comparison keys taken from the sequence to the search key. If this argument is omitted, then the equal function is used. If member does not find a matching element, it returns nil. Otherwise it returns the suffix of sequence which begins with the matching element.

The member-if function's predfun argument specifies a predicate function which is applied to the successive comparison keys pulled from the sequence by applying the key function to successive elements. If no match is found, then nil is returned, otherwise what is returned is the suffix of sequence which begins with the matching element.
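The following sketch illustrates both functions, including the use of the : symbol to default the testfun argument while supplying a keyfun (illustrative):

```lisp
(member 3 '(1 2 3 4)) -> (3 4)

;; compare against a transformed key: here, list lengths
(member 2 '((a) (b c) (d e f)) : length) -> ((b c) (d e f))

(member-if oddp '(2 4 5 6)) -> (5 6)
```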


9.10.31 Functions rmemq, rmemql, rmemqual, rmember and rmember-if


  (rmemq object list)
  (rmemql object list)
  (rmemqual object list)
  (rmember key sequence [testfun [keyfun]])
  (rmember-if predfun sequence [keyfun])


These functions are counterparts to memq, memql, memqual, member and member-if which look for the right-most element which matches object, rather than for the left-most element.


9.10.32 Functions conses and conses*


  (conses list)
  (conses* list)

These functions return a list whose elements are the conses which make up list. The conses* function does this in a lazy way, avoiding the computation of the entire list: it returns a lazy list of the conses of list. The conses function computes the entire list before returning.

The input list may be proper or improper.

The first cons of list is that list itself. The second cons is the rest of the list, or (cdr list). The third cons is (cdr (cdr list)) and so on.


  (conses '(1 2 3)) -> ((1 2 3) (2 3) (3))

Dialect Note:

These functions are useful for simulating the maplist function found in other dialects like Common Lisp.

TXR Lisp's (conses x) can be expressed in Common Lisp as (maplist #'identity x).

Conversely, the Common Lisp operation (maplist function list) can be computed in TXR Lisp as (mapcar function (conses list)).

More generally, the Common Lisp operation

  (maplist function list0 list1 ... listn)

can be expressed as:

  (mapcar function (conses list0)
                   (conses list1) ... (conses listn))


9.11 Association Lists

Association lists are ordinary lists formed according to a special convention. Firstly, any empty list is a valid association list. A non-empty association list contains only cons cells as elements. These cons cells are understood to represent key/value associations, hence the name "association list".


9.11.1 Function assoc


  (assoc key alist)


The assoc function searches an association list alist for a cons cell whose car field is equivalent to key under the equal function. The first such cons is returned. If no such cons is found, nil is returned.
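A sketch of assoc usage (illustrative; note that only the first matching cell is reported):

```lisp
(assoc 'b '((a . 1) (b . 2) (b . 3))) -> (b . 2)
(assoc "b" '(("a" . 1) ("b" . 2))) -> ("b" . 2)
(assoc 'z '((a . 1))) -> nil
```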


9.11.2 Function assql


  (assql key alist)


The assql function is just like assoc, except that the equality test is determined using the eql function rather than equal.


9.11.3 Functions rassql and rassoc


  (rassql value alist)
  (rassoc value alist)


The rassql and rassoc functions are reverse lookup counterparts to assql and assoc. When searching, they examine the cdr field of the pairs of alist rather than the car field.

The rassql function searches association list alist for a cons whose cdr field is equivalent to value according to the eql function. If such a cons is found, it is returned. Otherwise nil is returned.

The rassoc function searches in the same way as rassql but compares values using equal.


9.11.4 Function acons


  (acons car cdr alist)


The acons function constructs a new alist by consing a new cons to the front of alist. The following equivalence holds:

  (acons car cdr alist) <--> (cons (cons car cdr) alist)


9.11.5 Function acons-new


  (acons-new car cdr alist)


The acons-new function searches alist, as if using the assoc function, for an existing cell which matches the key provided by the car argument. If such a cell exists, then its cdr field is overwritten with the cdr argument, and then the alist is returned. If no such cell exists, then a new list is returned by adding a new cell to the input list consisting of the car and cdr values, as if by the acons function.
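A sketch of both behaviors (illustrative; a freshly constructed list is used in the first case, since the existing cell is updated destructively):

```lisp
;; key exists: the cell is updated in place, same list returned
(let ((al (list (cons 'a 1) (cons 'b 2))))
  (acons-new 'b 20 al))
-> ((a . 1) (b . 20))

;; key absent: a new cell is consed onto the front
(acons-new 'c 3 '((a . 1))) -> ((c . 3) (a . 1))
```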


9.11.6 Function aconsql-new


  (aconsql-new car cdr alist)


This function is like acons-new, except that the eql function is used for equality testing. Thus, the list is searched for an existing cell as if using the assql function rather than assoc.


9.11.7 Function alist-remove


  (alist-remove alist keys)


The alist-remove function takes association list alist and produces a duplicate from which cells matching the specified keys have been removed. The keys argument is a list of the keys not to appear in the output list.
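A sketch of alist-remove usage (illustrative):

```lisp
(alist-remove '((a . 1) (b . 2) (c . 3)) '(b)) -> ((a . 1) (c . 3))
(alist-remove '((a . 1) (b . 2)) '(x)) -> ((a . 1) (b . 2))
```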


9.11.8 Function alist-nremove


  (alist-nremove alist keys)


The alist-nremove function is like alist-remove, but potentially destructive. The input list alist may be destroyed and its structural material re-used to form the output list. The application should not retain references to the input list.


9.11.9 Function copy-alist


  (copy-alist alist)

The copy-alist function duplicates alist. Unlike copy-list, which only duplicates list structure, copy-alist also duplicates each cons cell of the input alist. That is to say, each element of the output list is produced as if by the copy-cons function applied to the corresponding element of the input list.
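The difference from copy-list can be sketched as follows (illustrative): because each cell is duplicated, mutating a cell of the copy leaves the original untouched.

```lisp
(let* ((orig (list (cons 'x 1) (cons 'y 2)))
       (copy (copy-alist orig)))
  (rplacd (car copy) 99)   ;; modify a cell of the copy only
  (list orig copy))
-> (((x . 1) (y . 2)) ((x . 99) (y . 2)))
```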


9.12 Property Lists

A property list, also referred to as a plist, is a flat list of even length consisting of interleaved pairs of property names (usually symbols) and their values (arbitrary objects). An example property list is (:a 1 :b "two") which contains two properties, :a having value 1, and :b having value "two".

An improper plist represents Boolean properties in a condensed way, as property indicators which are not followed by a value. Such properties only indicate their presence or absence, which is useful for encoding a Boolean value. If it is absent, then the property is false. Correctly using an improper plist requires that the exact set of Boolean keys is established by convention.

In this document, the unqualified terms property list and plist refer strictly to an ordinary plist, not to an improper plist.

Dialect Note:

Unlike in some other Lisp dialects, including ANSI Common Lisp, symbols do not have property lists in TXR Lisp. Improper plists aren't a concept in ANSI CL.


9.12.1 Function prop


  (prop plist key)


The prop function searches property list plist for key key. If the key is found, then the value next to it is returned. Otherwise nil is returned.

It is ambiguous whether nil is returned due to the property not being found, or due to the property being present with a nil value.

The indicators in plist are compared with key using eq equality, allowing them to be symbols, characters or fixnum integers.


9.12.2 Function memp


  (memp key plist)


The memp function searches property list plist for key key, using eq equality.

If the key is found, then the entire suffix of plist beginning with the indicator is returned, such that the first element of the returned list is key and the second element is the property value.

Note the reversed argument convention relative to the prop function, harmonizing with functions in the member family.
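A sketch contrasting prop and memp (illustrative):

```lisp
(prop '(:a 1 :b 2) :b) -> 2
(prop '(:a 1 :b 2) :c) -> nil

(memp :b '(:a 1 :b 2)) -> (:b 2)
(memp :c '(:a 1 :b 2)) -> nil
```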


9.12.3 Functions plist-to-alist and improper-plist-to-alist


  (plist-to-alist plist)
  (improper-plist-to-alist imp-plist bool-keys)


The functions plist-to-alist and improper-plist-to-alist convert, respectively, a property list and improper property list to an association list.

The plist-to-alist function scans plist and returns the indicator-property pairs as a list of cons cells, such that each car is the indicator, and each cdr is the value.

The improper-plist-to-alist function is similar, except that it handles the Boolean properties which, by convention, aren't followed by a value. The list of all such indicators is specified by the bool-keys argument.


  (plist-to-alist '(a 1 b 2))  -->  ((a . 1) (b . 2))

  (improper-plist-to-alist '(:x 1 :blue :y 2) '(:blue))
  -->  ((:x . 1) (:blue) (:y . 2))


9.13 List Sorting

Note: these functions operate on lists. The principal sorting function in TXR Lisp is sort, described under Sequence Manipulation.

The merge function described here provides access to an elementary step of the algorithm used internally by sort when operating on lists.

The multi-sort operation sorts multiple lists in parallel. It is implemented using sort.


9.13.1 Function merge


  (merge seq1 seq2 [lessfun [keyfun]])


The merge function merges two sorted sequences seq1 and seq2 into a single sorted sequence. The semantics and defaulting behavior of the lessfun and keyfun arguments are the same as those of the sort function.

The sequence which is returned is of the same kind as seq1.

This function is destructive of any inputs that are lists. If the output is a list, it is formed out of the structure of the input lists.
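A sketch of merge usage (illustrative; freshly constructed lists are passed because merge is destructive of list inputs):

```lisp
(merge (list 1 3 5) (list 2 4 6)) -> (1 2 3 4 5 6)

;; descending merge with an explicit lessfun
(merge (list 5 3 1) (list 6 4 2) greater) -> (6 5 4 3 2 1)
```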


9.13.2 Function multi-sort


  (multi-sort columns less-funcs [key-funcs])


The multi-sort function regards a list of lists to be the columns of a database. The corresponding elements from each list constitute a record. These records are to be sorted, producing a new list of lists.

The columns argument supplies the list of lists which comprise the columns of the database. The lists should ideally be of the same length. If the lists are of different lengths, then the shortest list is taken to be the length of the database. Excess elements in the longer lists are ignored, and do not appear in the sorted output.

The less-funcs argument supplies a list of comparison functions which are applied to the columns. Successive functions correspond to successive columns. If less-funcs is an empty list, then the sorted database will emerge in the original order. If less-funcs contains exactly one function, then the rows of the database are sorted according to the first column. The remaining columns simply follow their row. If less-funcs contains more than one function, then additional columns are taken into consideration if the items in the previous columns compare equal. For instance if two elements from column one compare equal, then the corresponding second-column elements are compared using the second column comparison function.

The optional key-funcs argument supplies transformation functions through which column entries are converted to comparison keys, similarly to the single key function used in the sort function and others. If there are more key functions than less functions, the excess key functions are ignored.
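A sketch of multi-sort on a two-column database (illustrative): the rows are sorted by the first column, with ties broken by the second.

```lisp
;; rows are (30 "b"), (20 "c"), (30 "a");
;; sort by number, then by string on ties
(multi-sort (list (list 30 20 30) (list "b" "c" "a"))
            (list less less))
-> ((20 30 30) ("c" "a" "b"))
```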


9.14 Lazy Lists and Lazy Evaluation


9.14.1 Function make-lazy-cons


  (make-lazy-cons function)

The function make-lazy-cons makes a special kind of cons cell called a lazy cons, or lcons. Lazy conses are useful for implementing lazy lists.

Lazy lists are lists which are not allocated all at once. Rather, their elements materialize when they are accessed, like magic stepping stones appearing under one's feet out of thin air.

A lazy cons has car and cdr fields like a regular cons, and those fields are initialized to nil when the lazy cons is created. A lazy cons also has an update function, the one which is provided as the function argument to make-lazy-cons.

When either the car or cdr field of a lazy cons is accessed for the first time, the function is automatically invoked first. That function has the opportunity to initialize the car and cdr fields. Once the function is called, it is removed from the lazy cons: the lazy cons no longer has an update function.

To continue a lazy list, the function can make another call to make-lazy-cons and install the resulting cons as the cdr of the lazy cons.


  ;;; lazy list of integers between min and max
  (defun integer-range (min max)
    (let ((counter min))
      ;; min is greater than max; just return empty list,
      ;; otherwise return a lazy list
      (if (> min max)
          nil
          (make-lazy-cons
            (lambda (lcons)
              ;; install next number into car
              (rplaca lcons counter)
              ;; now deal with cdr field
              (cond
                ;; max reached, terminate list with nil!
                ((eql counter max)
                 (rplacd lcons nil))
                ;; max not reached: increment counter
                ;; and extend with another lazy cons
                (t
                 (inc counter)
                 (rplacd lcons (make-lazy-cons
                                 (lcons-fun lcons))))))))))


9.14.2 Function lconsp


  (lconsp value)

The lconsp function returns t if value is a lazy cons cell. Otherwise it returns nil, even if value is an ordinary cons cell.
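The distinction can be illustrated with a minimal sketch based on the above description:

  (lconsp (make-lazy-cons (lambda (lc)
                            (rplaca lc 1)
                            (rplacd lc nil))))
  -> t

  (lconsp '(1 2))  -> nil
  (lconsp 42)      -> nil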


9.14.3 Function lcons-fun


  (lcons-fun lcons)

The lcons-fun function retrieves the update function of a lazy cons. Once a lazy cons has been accessed, it no longer has an update function and lcons-fun returns nil. While the update function of a lazy cons is executing, it is still accessible. This allows the update function to retrieve a reference to itself and propagate itself into another lazy cons (as in the example under make-lazy-cons).


9.14.4 Macro lcons


  (lcons car-expression cdr-expression)


The lcons macro simplifies the construction of structures based on lazy conses. Syntactically, it resembles the cons function. However, the arguments are expressions rather than values. The macro generates code which, when evaluated, immediately produces a lazy cons. The expressions car-expression and cdr-expression are not immediately evaluated. Rather, when either the car or cdr field of the lazy cons cell is accessed, these expressions are both evaluated at that time, in the order that they appear in the lcons expression, and in the original lexical scope in which that expression was evaluated. The return values of these expressions are used, respectively, to initialize the corresponding fields of the lazy cons.

Note: the lcons macro may be understood in terms of the following reference implementation, as a syntactic sugar combining the make-lazy-cons constructor with a lexical closure provided by a lambda function:

  (defmacro lcons (car-form cdr-form)
    (let ((lc (gensym)))
       ^(make-lazy-cons (lambda (,lc)
                          (rplaca ,lc ,car-form)
                          (rplacd ,lc ,cdr-form)))))


  ;; Given the following function ...

  (defun fib-generator (a b)
    (lcons a (fib-generator b (+ a b))))

  ;; ... the following function call generates the Fibonacci
  ;; sequence as an infinite lazy list.

  (fib-generator 1 1) -> (1 1 2 3 5 8 13 ...)


9.14.5 Functions lazy-stream-cons and get-lines


  (lazy-stream-cons stream)
  (get-lines [stream])

The lazy-stream-cons and get-lines functions are synonyms, except that the stream argument is optional in get-lines and defaults to *stdin*. Thus, the following description of lazy-stream-cons also applies to get-lines.

The lazy-stream-cons function returns a lazy cons which generates a lazy list based on reading lines of text from the input stream stream; these lines form the elements of the list. The get-line function is called on demand to add elements to the list.

The lazy-stream-cons function itself makes the first call to get-line on the stream. If this returns nil, then the stream is closed and nil is returned. Otherwise, a lazy cons is returned whose update function will install that line into the car field of the lazy cons, and continue the lazy list by making another call to lazy-stream-cons, installing the result into the cdr field.

lazy-stream-cons inspects the real-time property of a stream as if by the real-time-stream-p function. This determines which of two styles of lazy list are returned. For an ordinary (non-real-time) stream, the lazy list treats the end-of-file condition accurately: an empty file turns into the empty list nil, a one line file into a one-element list which contains that line and so on. This accuracy requires one line of lookahead which is not acceptable in real-time streams, and so a different type of lazy list is used, which generates an extra nil item after the last line. Under this type of lazy list, an empty input stream translates to the list (nil); a one-line stream translates to ("line" nil) and so forth.
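For example, the following sketch prints the first line of a file without reading the remaining lines into memory (the file name is hypothetical):

  (let ((lines (get-lines (open-file "input.txt" "r"))))
    (put-line (car lines)))

Accessing (car lines) forces only the first get-line call; the rest of the file is read lazily if and when the list is traversed further.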


9.14.6 Macro delay


  (delay expression)

The delay operator arranges for the delayed (or "lazy") evaluation of expression. This means that the expression is not evaluated immediately. Rather, the delay expression produces a promise object.

The promise object can later be passed to the force function (described later in this document). The force function will trigger the evaluation of the expression and retrieve the value.

The expression is evaluated in the original scope, no matter where the force takes place.

The expression is evaluated at most once, by the first call to force. Additional calls to force only retrieve a cached value.


  ;; list is popped only once: the value is computed
  ;; just once when force is called on a given promise
  ;; for the first time.

  (defun get-it (promise)
    (format t "*list* is ~s\n" *list*)
    (format t "item is ~s\n" (force promise))
    (format t "item is ~s\n" (force promise))
    (format t "*list* is ~s\n" *list*))

  (defvar *list* '(1 2 3))

  (get-it (delay (pop *list*)))


  *list* is (1 2 3)
  item is 1
  item is 1
  *list* is (2 3)


9.14.7 Accessor force


  (force promise)
  (set (force promise) new-value)


The force function accepts a promise object produced by the delay macro. The first time force is invoked, the expression which was wrapped inside promise by the delay macro is evaluated (in its original lexical environment, regardless of where in the program the force call takes place). The value of expression is cached inside promise and returned, becoming the return value of the force function call. If the force function is invoked additional times on the same promise, the cached value is retrieved.

A force form is a syntactic place, denoting the value cache location within promise.

Storing a value in a force place causes future accesses to the promise to return that value.

If the promise had not yet been forced, then storing a value into it prevents that from ever happening. The delayed expression will never be evaluated.

If, while a promise is being forced, the evaluation of expression itself causes an assignment to the promise, it is not specified whether the promise will take on the value of expression or the assigned value.
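The interaction between stores and evaluation may be sketched as follows; the error call stands in for an arbitrary expensive or effectful computation:

  (defvar *p* (delay (error "not evaluated")))

  ;; store into the force place of the unforced promise;
  ;; the delayed expression is discarded, never evaluated
  (set (force *p*) 42)

  (force *p*)  ->  42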


9.14.8 Function promisep


  (promisep object)

The promisep function returns t if object is a promise object: an object created by the delay macro. Otherwise it returns nil.

Note: promise objects are conses. The typeof function applied to a promise returns cons.


9.14.9 Macro mlet


  (mlet ({sym | (sym init-form)}*) body-form*)


The mlet macro ("magic let" or "mutual let") implements a variable binding construct similar to let and let*.

Under mlet, the scope of the bindings of the sym variables extends over the init-forms, as well as the body-forms.

Unlike the let* construct, each init-form has each sym in scope. That is to say, an init-form can refer not only to previous variables, but also to later variables as well as to its own variable.

The variables are not initialized until their values are accessed for the first time. Any sym whose value is not accessed is not initialized.

Furthermore, the evaluation of each init-form does not take place until the time when its value is needed to initialize the associated sym. This evaluation takes place once. If a given sym is not accessed during the evaluation of the mlet construct, then its init-form is never evaluated.

The bound variables may be assigned. If, before initialization, a variable is updated in such a way that its prior value is not needed, it is unspecified whether initialization takes place, and thus whether its init-form is evaluated.

Direct circular references are erroneous and are diagnosed. This takes place when the macro-expanded form is evaluated, not during the expansion of mlet.


  ;; Dependent calculations in arbitrary order
  (mlet ((x (+ y 3))
         (z (+ x 1))
         (y 4))
    (+ z 4))  -->  12

  ;; Error: circular reference:
  ;; x depends on y, y on z, but z on x again.
  (mlet ((x (+ y 1))
         (y (+ z 1))
         (z (+ x 1)))
    x)

  ;; Okay: lazy circular reference because lcons is used
  (mlet ((list (lcons 1 list)))
    list)  -->  (1 1 1 1 1 ...) ;; circular list

In the last example, the list variable is accessed for the first time in the body of the mlet form. This causes the evaluation of the lcons form. This form evaluates its arguments lazily, which means that it is not a problem that list is not yet initialized. The form produces a lazy cons, which is then used to initialize list. When the car or cdr fields of the lazy cons are accessed, the list expression in the lcons argument is accessed. By that time, the variable is initialized and holds the lazy cons itself, which creates the circular reference, and a circular list.


9.14.10 Functions generate, giterate and ginterate


  (generate while-fun gen-fun)
  (giterate while-fun gen-fun [value])
  (ginterate while-fun gen-fun [value])


The generate function produces a lazy list which dynamically produces items according to the following logic.

The arguments to generate are functions which do not take any arguments. The return value of generate is a lazy list.

When the lazy list is accessed, for instance with the functions car and cdr, it produces items on demand. Prior to producing each item, while-fun is called. If it returns a true Boolean value (any value other than nil), then the gen-fun function is called, and its return value is incorporated as the next item of the lazy list. But if while-fun yields nil, then the lazy list immediately terminates.

Prior to returning the lazy list, generate invokes while-fun one time. If while-fun yields nil, then generate returns the empty list nil instead of a lazy list. Otherwise, it instantiates a lazy list, and invokes gen-fun to populate it with the first item.

The giterate function is similar to generate, except that while-fun and gen-fun are functions of one argument rather than functions of no arguments. The optional value argument defaults to nil and is threaded through the function calls. That is to say, the lazy list returned is (value [gen-fun value] [gen-fun [gen-fun value]] ...).

The lazy list terminates when a value fails to satisfy while-fun. That is to say, prior to generating each value, the lazy list tests the value using while-fun. If that function returns nil, then the item is not added, and the sequence terminates.

Note: giterate could be written in terms of generate like this:

  (defun giterate (w g v)
     (generate (lambda () [w v])
               (lambda () (prog1 v (set v [g v])))))

The ginterate function is a variant of giterate which includes the test-failing item in the generated sequence. That is to say ginterate generates the next value and adds it to the lazy list. The value is then tested using while-fun. If that function returns nil, then the list is terminated, and no more items are produced.


  (giterate (op > 5) (op + 1) 0) -> (0 1 2 3 4)
  (ginterate (op > 5) (op + 1) 0) -> (0 1 2 3 4 5)


9.14.11 Function expand-right


  (expand-right gen-fun value)


The expand-right function is a complement to reduce-right, with lazy semantics.

The gen-fun parameter is a function, which must accept a single argument, and return either a cons pair or nil.

The value parameter is any value.

The first call to gen-fun receives value.

The return value is interpreted as follows. If gen-fun returns a cons cell pair (elem . next) then elem specifies the element to be added to the lazy list, and next specifies the value to be passed to the next call to gen-fun. If gen-fun returns nil then the lazy list ends.


  ;; Count down from 5 to 1 using explicit lambda
  ;; for gen-fun:
  [expand-right
    (lambda (item)
      (if (zerop item) nil
        (cons item (pred item))))
    5]
  --> (5 4 3 2 1)

  ;; Using functional combinators:
  [expand-right [iff zerop nilf [callf cons identity pred]] 5]
  --> (5 4 3 2 1)

  ;; Include zero:
  [expand-right
    [iff null
         nilf
         [callf cons identity [iff zerop nilf pred]]] 5]
  --> (5 4 3 2 1 0)


9.14.12 Functions expand-left and nexpand-left


  (expand-left gen-fun value)
  (nexpand-left gen-fun value)


The expand-left function is a companion to expand-right.

Unlike expand-right, it has eager semantics: it calls gen-fun repeatedly and accumulates an output list, not returning until gen-fun returns nil.

The semantics is as follows. expand-left initializes an empty accumulation list. Then gen-fun is called, with value as its argument.

If gen-fun returns a cons cell, then the car of that cons cell is pushed onto the accumulation list, and the procedure is repeated: gen-fun is called again, with the cdr of that cell taking the place of value.

If gen-fun returns nil, then the accumulation list is returned.

If the expression (expand-right f v) produces a terminating list, then the following equivalence holds:

  (expand-left f v) <--> (reverse (expand-right f v))

Of course, the equivalence cannot hold for arguments to expand-left which produce an infinite list.
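For instance, the countdown generator shown under expand-right produces its items in the reverse order under expand-left:

  [expand-left
    (lambda (item)
      (if (zerop item) nil
        (cons item (pred item))))
    5]
  --> (1 2 3 4 5)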

The nexpand-left function is a destructive version of expand-left.

The list returned by nexpand-left is composed of the cons cells returned by gen-fun whereas the list returned by expand-left is composed of freshly allocated cons cells.


9.14.13 Function repeat


  (repeat list [count])


If list is empty, then repeat returns an empty list.

If count is omitted, the repeat function produces an infinite lazy list formed by catenating together copies of list.

If count is specified and is zero or negative, then an empty list is returned.

Otherwise a list is returned consisting of count repetitions of list catenated together.
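These cases can be illustrated as follows:

  (repeat '(1 2 3) 2)      -> (1 2 3 1 2 3)
  (repeat nil)             -> nil
  (take 7 (repeat '(0 1))) -> (0 1 0 1 0 1 0)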


9.14.14 Function pad


  (pad sequence object [count])


The pad function produces a lazy list which consists of all of the elements of sequence followed by repetitions of object.

If count is omitted, then the repetition of object is infinite. Otherwise the specified number of repetitions occur.

Note that sequence may be a lazy list which is infinite. In that case, the repetitions of object will never occur.
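For example:

  (pad '(1 2 3) 0 2)     -> (1 2 3 0 0)
  (take 5 (pad '(1 2) 0)) -> (1 2 0 0 0)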


9.14.15 Function weave


  (weave {sequence}*)


The weave function interleaves elements from the sequences given as arguments.

If called with no arguments, it returns the empty list.

If called with a single sequence, it returns the elements of that sequence as a new lazy list.

When called with two or more sequences, weave returns a lazy list which draws elements from the sequences in a round-robin fashion, repeatedly scanning the sequences from left to right, and taking an item from each one, removing it from the sequence. Whenever a sequence runs out of items, it is deleted; the weaving then continues with the remaining sequences. The weaved sequence terminates when all sequences are eliminated. (If at least one of the sequences is an infinite lazy list, then the weaved sequence is infinite.)


  ;; Weave negative integers with positive ones:
  (weave (range 1) (range -1 : -1)) -> (1 -1 2 -2 3 -3 ...)

  (weave "abcd" (range 1 3) '(x x x x x x x))
  --> (#\a 1 x #\b 2 x #\c 3 x #\d x x x x)


9.14.16 Macros gen and gun


  (gen while-expression produce-item-expression)
  (gun produce-item-expression)


The gen macro operator produces a lazy list, in a manner similar to the generate function. Whereas the generate function takes functional arguments, the gen operator takes two expressions, which is often more convenient.

The return value of gen is a lazy list. When the lazy list is accessed, for instance with the functions car and cdr, it produces items on demand. Prior to producing each item, the while-expression is evaluated, in its original lexical scope. If the expression yields a non-nil value, then produce-item-expression is evaluated, and its return value is incorporated as the next item of the lazy list. If the expression yields nil, then the lazy list immediately terminates.

The gen operator itself immediately evaluates while-expression before producing the lazy list. If the expression yields nil, then the operator returns the empty list nil. Otherwise, it instantiates the lazy list and invokes the produce-item-expression to force the first item.

The gun macro similarly creates a lazy list according to the following rules. Each successive item of the lazy list is obtained as a result of evaluating produce-item-expression. However, when produce-item-expression yields nil, then the list terminates (without adding that nil as an item).

Note 1: the form gun can be implemented as a macro-expanding to an instance of the gen operator, like this:

  (defmacro gun (expr)
    (let ((var (gensym)))
      ^(let (,var)
         (gen (set ,var ,expr)
              ,var))))
This exploits the fact that the set operator returns the value that is assigned, so the set expression is tested as a condition by gen, while having the side effect of storing the next item temporarily in a hidden variable.

In turn, gen can be implemented as a macro expanding to some lambda functions which are passed to the generate function:

  (defmacro gen (while-expr produce-expr)
    ^(generate (lambda () ,while-expr) (lambda () ,produce-expr)))

Note 2: gen can be considered as an acronym for Generate, testing Expression before Next item, whereas gun stands for Generate Until Null.


  ;; Make a lazy list of integers up to 1000
  ;; access and print the first three.
  (let* ((counter 0)
         (list (gen (< counter 1000) (inc counter))))
    (format t "~s ~s ~s\n" (pop list) (pop list) (pop list)))

  1 2 3


9.14.17 Functions range and range*


  (range [from [to [step]]])
  (range* [from [to [step]]])


The range and range* functions generate a lazy sequence of integers, with a fixed step between successive values.

The difference between range and range* is that range* excludes the endpoint. For instance (range 0 3) generates the list (0 1 2 3), whereas (range* 0 3) generates (0 1 2).

All arguments are optional. If the step argument is omitted, then it defaults to 1: each value in the sequence is greater than the previous one by 1. Positive or negative step sizes are allowed. There is no check for a step size of zero, or for a step direction which cannot meet the endpoint.

The to argument specifies the endpoint value, which, if it occurs in the sequence, is excluded from it by the range* function, but included by the range function. If to is missing, or specified as nil, then there is no endpoint, and the sequence which is generated is infinite, regardless of step.

If from is omitted, then the sequence begins at zero, otherwise from must be an integer which specifies the initial value.

The sequence stops if it reaches the endpoint value (which is included in the case of range, and excluded in the case of range*). However, a sequence with a stepsize greater than 1 or less than -1 might step over the endpoint value, and therefore never attain it. In this situation, the sequence also stops, and the excess value which surpasses the endpoint is excluded from the sequence.
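For instance:

  (range 1 10 3)   -> (1 4 7 10)
  (range* 1 10 3)  -> (1 4 7)
  (range 1 10 4)   -> (1 5 9)  ;; 13 would overshoot 10 and is excluded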


9.14.18 Functions rlist and rlist*


  (rlist item*)
  (rlist* item*)

The rlist ("range list") function is useful for producing a list consisting of a mixture of discontinuous numeric or character ranges and individual items.

The function returns a lazy list of elements. The items are produced by converting the function's successive item arguments into lists, which are lazily catenated together to form the output list.

Each item is transformed into a list as follows. Any item which is not a range object is trivially turned into a one-element list as if by the (list item) expression.

Any item which is a range object whose to field isn't itself a range is turned into a lazy list as if by evaluating the (range (from item) (to item)) expression. Thus, for instance, the argument 1..10 turns into the (lazy) list (1 2 3 4 5 6 7 8 9 10).

Any item which is a range object such that its to field is also a range is turned into a lazy list as if by evaluating the (range (from item) (from (to item)) (to (to item))) expression. Thus, for instance, the argument expression 1..10..2 produces an item which rlist turns into the lazy list (1 3 5 7 9) as if by the call (range 1 10 2). Note that the expression 1..10..2 stands for the expression (range 1 (range 10 2)) which evaluates to #R(1 #R(10 2)).

The #R(1 #R(10 2)) range literal syntax can be passed as an argument to rlist with the same result as 1..10..2.

The rlist* function differs from rlist in one regard: under rlist*, the ranges denoted by the range notation exclude the endpoint. That is, the ranges are generated as if by the range* function rather than range.

Note: it is permissible for item objects to specify infinite ranges. It is also permissible to apply an infinite argument list to rlist.


  (rlist 1 "two" :three)  ->  (1 "two" :three)
  (rlist 10 15..16 #\a..#\d 2) -> (10 15 16 #\a #\b #\c 2)
  (take 7 (rlist 1 2 5..:)) -> (1 2 5 6 7 8 9)


9.15 Ranges

Ranges are objects that aggregate two values, not unlike cons cells. However, they are atoms, and are primarily intended to hold numeric or character values in their two fields. These fields are called from and to which are the names of the functions which access them. These fields are not mutable; a new value cannot be stored into either field of a range.

The printed notation for a range object consists of the prefix #R (hash R) followed by the two values expressed as a two-element list. Ranges can be constructed using the rcons function. The notation x..y corresponds to (rcons x y).

Ranges behave as a numeric type and support a subset of the numeric operations. Two ranges can be added or subtracted, which obeys these equivalences:

  (+ a..b c..d)  <-->  (+ a c)..(+ b d)
  (- a..b c..d)  <-->  (- a c)..(- b d)

A range a..b can be combined with a character or number n using addition or subtractions, which obeys these equivalences:

  (+ a..b n)  <-->  (+ n a..b)  <-->  (+ a n)..(+ b n)
  (- a..b n)  <-->  (- a n)..(- b n)
  (- n a..b)  <-->  (- n a)..(- n b)

A range can be multiplied by a number:

  (* a..b n)  <-->  (* n a..b)  <-->  (* a n)..(* b n)

A range can be divided by a number using the / or trunc functions, but a number cannot be divided by a range:

  (trunc a..b n)  <-->  (trunc a n)..(trunc b n)
  (/ a..b n)      <-->  (/ a n)..(/ b n)

Ranges can be compared using the equality and inequality functions =, <, >, <= and >=. Equality obeys this equivalence:

  (= a..b c..d)  <-->  (and (= a c) (= b d))

Inequality comparisons give the from components precedence over the to components: if the from components of the two ranges are not equal under the = function, then the inequality is determined solely by them. If they are equal, then the inequality is determined by the to components. This gives rise to the following equivalences:

  (< a..b c..d)   <-->  (if (= a c) (< b d) (< a c))
  (> a..b c..d)   <-->  (if (= a c) (> b d) (> a c))
  (>= a..b c..d)  <-->  (if (= a c) (>= b d) (> a c))
  (<= a..b c..d)  <-->  (if (= a c) (<= b d) (< a c))

Ranges can be negated with the one-argument form of the - function, which is equivalent to subtraction from zero: the negation distributes over the two range components.

The abs function also applies to ranges and distributes into their components.
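Based on the equivalences above, and using the dotdot notation for the results, for example:

  (- 5..10)        -> -5..-10
  (abs -5..-10)    -> 5..10
  (+ 1..2 10..20)  -> 11..22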

The succ and pred family of functions also operate on ranges.

The length of the range a..b is defined as (- b a), and may be obtained using the length function. The empty function accepts ranges and tests them for zero length.


9.15.1 Function rcons


  (rcons from to)


The rcons function constructs a range object which holds the values from and to.

Though range objects are effectively binary cells like conses, they are atoms. They also aren't considered sequences, nor are they structures.

Range objects are used for indicating numeric ranges, such as subranges of lists, vectors and strings. The dotdot notation serves as a syntactic sugar for rcons. The syntax a..b denotes the expression (rcons a b).

Note that ranges are immutable, meaning that it is not possible to replace the values in a range.


9.15.2 Function rangep


  (rangep value)

The rangep function returns t if value is a range. Otherwise it returns nil.


9.15.3 Functions from and to


  (from range)
  (to range)

The from and to functions retrieve, respectively, the from and to fields of a range.

Note that these functions are not accessors, which is because ranges are immutable.


9.16 Characters and Strings


9.16.1 Function mkstring


  (mkstring length [char])


The mkstring function constructs a string object of a length specified by the length parameter. Every position in the string is initialized with char, which must be a character value.

If the optional argument char is not specified, it defaults to the space character.
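For example:

  (mkstring 3 #\a)  -> "aaa"
  (mkstring 3)      -> "   "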


9.16.2 Function copy-str


  (copy-str string)

The copy-str function constructs a new string whose contents are identical to string.

If string is a lazy string, then a lazy string is constructed with the same attributes as string. The new lazy string has its own copy of the prefix portion of string which has been forced so far. The unforced list and separator string are shared between string and the newly constructed lazy string.


9.16.3 Function upcase-str


  (upcase-str string)

The upcase-str function produces a copy of string such that all lower-case characters of the English alphabet are mapped to their upper case counterparts.


9.16.4 Function downcase-str


  (downcase-str string)

The downcase-str function produces a copy of string such that all upper case characters of the English alphabet are mapped to their lower case counterparts.


9.16.5 Function string-extend


  (string-extend string tail)


The string-extend function destructively increases the length of string, which must be an ordinary dynamic string. It is an error to invoke this function on a literal string or a lazy string.

The tail argument can be a character, string or integer. If it is a string or character, it specifies material which is to be added to the end of the string: either a single character or a sequence of characters. If it is an integer, it specifies the number of characters to be added to the string.

If tail is an integer, the newly added characters have indeterminate contents. The string appears to be the original one, because an internal terminating null character remains in place, but the characters beyond that terminating null are indeterminate.
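A sketch, assuming that copy-str yields an ordinary dynamic string (literals themselves may not be extended):

  (let ((s (copy-str "abc")))
    (string-extend s "def")  ;; append a string
    (string-extend s #\!)    ;; append a single character
    s)
  -> "abcdef!"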


9.16.6 Function stringp


  (stringp obj)

The stringp function returns t if obj is one of the several kinds of strings. Otherwise it returns nil.


9.16.7 Function length-str


  (length-str string)

The length-str function returns the length of string in characters. The argument must be a string.


9.16.8 Function search-str


  (search-str haystack needle [start [from-end]])


The search-str function finds an occurrence of the string needle inside the haystack string and returns its position. If no such occurrence exists, it returns nil.

If a start argument is not specified, it defaults to zero. If it is a non-negative integer, it specifies the starting character position for the search. Negative values of start indicate positions from the end of the string, such that -1 is the last character of the string.

If the from-end argument is specified and is not nil, it means that the search is conducted right-to-left. If multiple matches are possible, it will find the rightmost one rather than the leftmost one.
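For example, using zero-based positions:

  (search-str "banana" "an")      -> 1
  (search-str "banana" "an" 2)    -> 3
  (search-str "banana" "an" 0 t)  -> 3
  (search-str "banana" "xy")      -> nil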


9.16.9 Function search-str-tree


  (search-str-tree haystack tree [start [from-end]])


The search-str-tree function is similar to search-str, except that instead of searching haystack for the occurrence of a single needle string, it searches for the occurrence of numerous strings at the same time. These search strings are specified, via the tree argument, as an arbitrarily structured tree whose leaves are strings.

The function finds the earliest possible match, in the given search direction, from among all of the needle strings.

If tree is a single string, the semantics is equivalent to search-str.


9.16.10 Function match-str


  (match-str bigstring littlestring [start])


Without the start argument, the match-str function determines whether littlestring is a prefix of bigstring, returning a t or nil indication.

If the start argument is specified, and is a non-negative integer, then the function tests whether littlestring matches a prefix of that portion of bigstring which starts at the given position.

If the start argument is a negative integer, then match-str determines whether littlestring is a suffix of bigstring, ending on that position of bigstring, where -1 denotes the last character of bigstring, -2 the second last one and so on.

If start is -1, then this corresponds to testing whether littlestring is a suffix of bigstring.


9.16.11 Function match-str-tree


  (match-str-tree bigstring tree [start])


The match-str-tree function is a generalization of match-str which matches multiple test strings against bigstring at the same time. The value reported is the longest match from among any of the strings.

The strings are specified as an arbitrarily shaped tree structure which has strings at the leaves.

If tree is a single string atom, then the function behaves exactly like match-str.


9.16.12 Function sub-str


  (sub-str string [from [to]])


The sub-str function is like the more generic function sub, except that it operates only on strings. For a description of the arguments and semantics, refer to the sub function.


9.16.13 Function replace-str


  (replace-str string item-sequence [from [to]])


The replace-str function is like the replace function, except that the first argument must be a string.

For a description of the arguments, semantics and return value, refer to the replace function.


9.16.14 Function cat-str


  (cat-str string-list [sep-string])


The cat-str function catenates a list of strings given by string-list into a single string. The optional sep-string argument specifies a separator string which is interposed between the catenated strings.
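For example:

  (cat-str '("a" "b" "c"))      -> "abc"
  (cat-str '("a" "b" "c") "-")  -> "a-b-c"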


9.16.15 Function split-str


  (split-str string sep [keep-between])


The split-str function breaks the string into pieces, returning a list thereof. The sep argument must be either a string or a regular expression. It specifies the separator character sequence within string.

All non-overlapping matches for sep within string are identified in left to right order, and are removed from string. The string is broken into pieces according to the gaps left behind by the removed separators, and a list of the remaining pieces is returned.

If sep is the empty string, then the separator pieces removed from the string are considered to be the empty strings between its characters. In this case, if string is of length one or zero, then it is considered to have no such pieces, and a list of one element is returned containing the original string. These remarks also apply to the situation when sep is a regular expression which matches only an empty substring of string.

If a match for sep is not found in the string at all (not even an empty match), then the string is not split at all: a list of one element is returned containing the original string.

If sep matches the entire string, then a list of two empty strings is returned, except in the case that the original string is empty, in which case a list of one element is returned, containing the empty string.

Whenever two adjacent matches for sep occur, they are considered separate cuts with an empty piece between them.

This operation is nondestructive: string is not modified in any way.

If the optional keep-between argument is specified and is not nil, then split-str incorporates the matching separating pieces of string into the resulting list, such that if the resulting list is catenated, a string equivalent to the original string is produced.

Note: to split a string into pieces of length one such that an empty string produces nil rather than (""), use the (tok-str string #/./) pattern.

Note: the function call (split-str s r t) produces a resulting list identical to (tok-str s r t), for all values of r and s, provided that r does not match empty strings. If r matches empty strings, then the tok-str call returns extra elements compared to split-str, because tok-str allows empty matches to take place and extract empty tokens before the first character of the string, and after the last character, whereas split-str does not recognize empty separators at these outer limits of the string.
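The following examples (illustrative calls, whose results follow from the rules described above) show the basic cases:

```lisp
;; basic splitting on a separator string
(split-str "a,b,c" ",")   ;; -> ("a" "b" "c")

;; adjacent separators produce an empty piece between them
(split-str "a,,b" ",")    ;; -> ("a" "" "b")

;; no match for the separator: one-element list with the original string
(split-str "" ",")        ;; -> ("")

;; keep-between: separator pieces are retained in the result
(split-str "a,b" "," t)   ;; -> ("a" "," "b")
```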


9.16.16 Function spl


(spl sep [keep-between] string)


The spl function performs the same computation as split-str. The same-named parameters of spl and split-str have the same semantics. The difference is the argument order. The spl function takes the sep argument first. The last argument is always string, whether there are two arguments or three. If there are three arguments, then keep-between is the middle one.

Note: the argument conventions of spl facilitate less verbose partial application, such as with macros in the op family, in the common situation when string is the unbound argument.
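For example (illustrative, based on the argument convention described above):

```lisp
(spl "," "a,b,c")                     ;; -> ("a" "b" "c")

;; string as the unbound argument under partial application with op
[mapcar (op spl ",") '("a,b" "c,d")]  ;; -> (("a" "b") ("c" "d"))
```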


9.16.17 Function split-str-set


(split-str-set string set)


The split-str-set function breaks the string into pieces, returning a list thereof. The set argument must be a string. It specifies a set of characters. All occurrences of any of these characters within string are identified, and are removed from string. The string is broken into pieces according to the gaps left behind by the removed separators.

Adjacent occurrences of characters from set within string are considered to be separate gaps which come between empty strings.

This operation is nondestructive: string is not modified in any way.
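For example (illustrative calls following the description above):

```lisp
;; any character from the set acts as a separator
(split-str-set "a,b;c" ",;")  ;; -> ("a" "b" "c")

;; adjacent separator characters come between empty strings
(split-str-set "a,,b" ",")    ;; -> ("a" "" "b")
```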


9.16.18 Functions tok-str and tok-where


(tok-str string regex [keep-between])
(tok-where string regex)


The tok-str function searches string for tokens, which are defined as substrings of string which match the regular expression regex in the longest possible way, and do not overlap. These tokens are extracted from the string and returned as a list.

Whenever regex matches an empty string, then an empty token is returned, and the search for another token within string resumes after advancing by one character position. However, if an empty match occurs immediately after a non-empty token, that empty match is not turned into a token.

So for instance, (tok-str "abc" #/a?/) returns ("a" "" ""). After the token "a" is extracted from a non-empty match for the regex, an empty match for the regex occurs just before the character b. This match is discarded because it is an empty match which immediately follows the non-empty match. The character b is skipped. The next match is an empty match between the b and c characters. This match causes an empty token to be extracted. The character c is skipped, and one more empty match occurs after that character and is extracted.

If the keep-between argument is specified, and is not nil, then the behavior of tok-str changes in the following way. The pieces of string which are skipped by the search for tokens are included in the output. If no token is found in string, then a list of one element is returned, containing string. Generally, if N tokens are found, then the returned list consists of 2N + 1 elements. The first element of the list is the (possibly empty) substring which had to be skipped to find the first token. Then the token follows. The next element is the next skipped substring and so on. The last element is the substring of string between the last token and the end.

The tok-where function works similarly to tok-str, but instead of returning the extracted tokens themselves, it returns a list of the character position ranges within string where matches for regex occur. The ranges are pairs of numbers, represented as cons cells, where the first number of the pair gives the starting character position, and the second number is one position past the end of the match. If a match is empty, then the two numbers are equal.

The tok-where function does not support the keep-between parameter.
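The following examples (illustrative, consistent with the rules described above) show token extraction with and without keep-between, and the corresponding position ranges:

```lisp
(tok-str "ab, cd, ef" #/[a-z]+/)    ;; -> ("ab" "cd" "ef")

;; keep-between: 3 tokens yield 2N + 1 = 7 elements
(tok-str "ab, cd, ef" #/[a-z]+/ t)  ;; -> ("" "ab" ", " "cd" ", " "ef" "")

;; character position ranges of the matches
(tok-where "ab, cd" #/[a-z]+/)      ;; -> ((0 . 2) (4 . 6))
```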


9.16.19 Function tok


(tok regex [keep-between] string)


The tok function performs the same computation as tok-str. The same-named parameters of tok and tok-str have the same semantics. The difference is the argument order. The tok function takes the regex argument first. The last argument is always string, whether there are two arguments or three. If there are three arguments, then keep-between is the middle one.

Note: the argument conventions of tok facilitate less verbose partial application, such as with macros in the op family, in the common situation when string is the unbound argument.


9.16.20 Function list-str


(list-str string)

The list-str function converts a string into a list of characters.


9.16.21 Function trim-str


(trim-str string)

The trim-str function produces a copy of string from which leading and trailing tabs, spaces and newlines are removed.


9.16.22 Function chrp


(chrp obj)

Returns t if obj is a character, otherwise nil.


9.16.23 Function chr-isalnum


(chr-isalnum char)

Returns t if char is an alpha-numeric character, otherwise nil. Alpha-numeric means one of the upper or lower case letters of the English alphabet found in ASCII, or an ASCII digit. This function is not affected by locale.


9.16.24 Function chr-isalpha


(chr-isalpha char)

Returns t if char is an alphabetic character, otherwise nil. Alphabetic means one of the upper or lower case letters of the English alphabet found in ASCII. This function is not affected by locale.


9.16.25 Function chr-isascii


(chr-isascii char)

This function returns t if the code of character char is in the range 0 to 127 inclusive. For characters outside of this range, it returns nil.


9.16.26 Function chr-iscntrl


(chr-iscntrl char)

This function returns t if the character char is a character whose code ranges from 0 to 31, or is 127. In other words, any non-printable ASCII character. For other characters, it returns nil.


9.16.27 Functions chr-isdigit and chr-digit


(chr-isdigit char)
(chr-digit char)

If char is an ASCII decimal digit character, chr-isdigit returns the value t, and chr-digit returns the integer value corresponding to that digit character, a value in the range 0 to 9. Otherwise, both functions return nil.
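For example (illustrative, per the description above):

```lisp
(chr-isdigit #\7)  ;; -> t
(chr-digit #\7)    ;; -> 7
(chr-digit #\a)    ;; -> nil
```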


9.16.28 Function chr-isgraph


(chr-isgraph char)

This function returns t if char is a non-space printable ASCII character. It returns nil if it is a space or control character.

It also returns nil for non-ASCII characters: Unicode characters with a code above 127.


9.16.29 Function chr-islower


(chr-islower char)

This function returns t if char is an ASCII lower case letter. Otherwise it returns nil.


9.16.30 Function chr-isprint


(chr-isprint char)

This function returns t if char is an ASCII character which is not a control character. It also returns nil for all non-ASCII characters: Unicode characters with a code above 127.


9.16.31 Function chr-ispunct


(chr-ispunct char)

This function returns t if char is an ASCII punctuation character: a printable character which is not a space and not alphanumeric. It also returns nil for all non-ASCII characters: Unicode characters with a code above 127.


9.16.32 Function chr-isspace


(chr-isspace char)

This function returns t if char is an ASCII whitespace character: any of the characters in the set #\space, #\tab, #\linefeed, #\newline, #\return, #\vtab and #\page. For all other characters, it returns nil.


9.16.33 Function chr-isblank


(chr-isblank char)

This function returns t if char is a space or tab: the character #\space or #\tab. For all other characters, it returns nil.


9.16.34 Function chr-isunisp


(chr-isunisp char)

This function returns t if char is a Unicode whitespace character. This is the case for all the characters for which chr-isspace returns t. It also returns t for these additional characters: #\xa0, #\x1680, #\x180e, #\x2000, #\x2001, #\x2002, #\x2003, #\x2004, #\x2005, #\x2006, #\x2007, #\x2008, #\x2009, #\x200a, #\x2028, #\x2029, #\x205f, and #\x3000. For all other characters, it returns nil.


9.16.35 Function chr-isupper


(chr-isupper char)


This function returns t if char is an ASCII upper case letter. Otherwise it returns nil.


9.16.36 Functions chr-isxdigit and chr-xdigit


(chr-isxdigit char)
(chr-xdigit char)

If char is a hexadecimal digit character, chr-isxdigit returns the value t, and chr-xdigit returns the integer value corresponding to that digit character, a value in the range 0 to 15. Otherwise, both functions return nil.

A hexadecimal digit is one of the ASCII digit characters 0 through 9, or else one of the letters A through F or their lower-case equivalents a through f denoting the values 10 to 15.
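For example (illustrative, per the description above):

```lisp
(chr-isxdigit #\f)  ;; -> t
(chr-xdigit #\f)    ;; -> 15
(chr-xdigit #\g)    ;; -> nil
```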


9.16.37 Function chr-toupper


(chr-toupper char)

If character char is a lower case ASCII letter character, this function returns the upper case equivalent character. If it is some other character, then it just returns char.


9.16.38 Function chr-tolower


(chr-tolower char)

If character char is an upper case ASCII letter character, this function returns the lower case equivalent character. If it is some other character, then it just returns char.


9.16.39 Functions int-chr and chr-int


(int-chr num)
(chr-int char)

The argument char must be a character. The chr-int function returns that character's Unicode code point value as an integer.

The argument num must be a fixnum integer in the range 0 to #x10FFFF. The int-chr function takes the argument to be a Unicode code point value and returns the corresponding character object.

Note: these functions are also known by the obsolescent names num-chr and chr-num.
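For example (illustrative, per the description above):

```lisp
(chr-int #\A)  ;; -> 65
(int-chr 65)   ;; -> #\A
```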


9.16.40 Accessor chr-str


(chr-str str idx)
(set (chr-str str idx) new-value)


The chr-str function performs random access on string str to retrieve the character whose position is given by integer idx, which must be within range of the string.

The index value 0 corresponds to the first (leftmost) character of the string and so non-negative values up to one less than the length are possible.

Negative index values are also allowed, such that -1 corresponds to the last (rightmost) character of the string, and so negative values down to the additive inverse of the string length are possible.

An empty string cannot be indexed. A string of length one supports index 0 and index -1. A string of length two is indexed left to right by the values 0 and 1, and from right to left by -1 and -2.

If the element idx of string str exists, and the string is modifiable, then the chr-str form denotes a place.

A chr-str place supports deletion. When a deletion takes place, then the character at idx is removed from the string. Any characters after that position move by one position to close the gap, and the length of the string decreases by one.


Direct use of chr-str is equivalent to the DWIM bracket notation except that str must be a string. The following relation holds:

  (chr-str s i) --> [s i]

Since [s i] <--> (ref s i), this also holds:

  (chr-str s i) --> (ref s i)

However, note the following difference. When the expression [s i] is used as a place, then the subexpression s must be a place. When (chr-str s i) is used as a place, s need not be a place.
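The following sketch (illustrative; copy is used to obtain a modifiable string, since literals should not be mutated) shows access, assignment and deletion through chr-str:

```lisp
(defvar s (copy "abc"))
(chr-str s 0)            ;; -> #\a
(chr-str s -1)           ;; -> #\c
(set (chr-str s 1) #\x)  ;; s is now "axc"
(del (chr-str s 1))      ;; s is now "ac"
```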


9.16.41 Function chr-str-set


(chr-str-set str idx char)


The chr-str-set function performs random access on string str to overwrite the character whose position is given by integer idx, which must be within range of the string. The character at idx is overwritten with character char.

The idx argument works exactly as in chr-str.

The str argument must be a modifiable string.


Direct use of chr-str-set is equivalent to the DWIM bracket notation provided that str is a string and idx an integer. The following relation holds:

  (chr-str-set s i c) --> (set [s i] c)

Since (set [s i] c) <--> (refset s i c) for an integer index i, this also holds:

  (chr-str-set s i c) --> (refset s i c)


9.16.42 Function span-str


(span-str str set)


The span-str function determines the longest prefix of string str which consists only of the characters in string set, in any combination.


9.16.43 Function compl-span-str


(compl-span-str str set)


The compl-span-str function determines the longest prefix of string str which consists only of the characters which do not appear in set, in any combination.


9.16.44 Function break-str


(break-str str set)


The break-str function returns an integer which represents the position of the first character in string str which appears in string set.

If there is no such character, then nil is returned.
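For example (illustrative, per the description above):

```lisp
;; the first character of str found in set is #\1, at position 3
(break-str "abc123" "0123456789")  ;; -> 3

;; no character of str occurs in set
(break-str "abc" "0123456789")     ;; -> nil
```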


9.17 Lazy Strings

Lazy strings are objects that were developed for the TXR pattern-matching language, and are exposed via TXR Lisp. Lazy strings behave much like strings, and can be substituted for strings. However, unlike regular strings, which exist in their entirety, first to last character, from the moment they are created, lazy strings do not exist all at once, but are created on demand. If the character at index N of a lazy string is accessed, then characters 0 through N of that string are forced into existence. However, characters at indices beyond N need not necessarily exist.

A lazy string dynamically grows by acquiring new text from a list of strings which is attached to that lazy string object. When the lazy string is accessed beyond the end of its hitherto materialized prefix, it takes enough strings from the list in order to materialize the index. If the list doesn't have enough material, then the access fails, just like an access beyond the end of a regular string. A lazy string always takes whole strings from the attached list.

Lazy string growth is achieved via the lazy-str-force-upto function which forces a string to exist up to a given character position. This function is used internally to handle various situations.

The lazy-str-force function forces the entire string to materialize. If the string is connected to an infinite lazy list, this will exhaust all memory.

Lazy strings are specially recognized in many of the regular string functions, which do the right thing with lazy strings. For instance when sub-str is invoked on a lazy string, a special version of the sub-str logic is used which handles various lazy string cases, and can potentially return another lazy string. Taking a sub-str of a lazy string from a given character position to the end does not force the entire lazy string to exist, and in fact the operation will work on a lazy string that is infinite.

Furthermore, special lazy string functions are provided which allow programs to be written carefully to take better advantage of lazy strings. Here, writing code carefully means avoiding unnecessary forcing of the lazy string. For instance, in many situations it is necessary to obtain the length of a string, only to test it for equality or inequality with some number. But it is not necessary to compute the length of a string in order to know that it is greater than some value.


9.17.1 Function lazy-str


(lazy-str string-list [terminator [limit-count]])


The lazy-str function constructs a lazy string which draws material from string-list which is a list of strings.

If the optional terminator argument is given, then it specifies a string which is appended to every string from string-list, before that string is incorporated into the lazy string. If terminator is not given, then it defaults to the string "\n", and so the strings from string-list are effectively treated as lines which get terminated by newlines as they accumulate into the growing prefix of the lazy string. To avoid the use of a terminator string, a null string terminator argument must be explicitly passed. In that case, the lazy string grows simply by catenating elements from string-list.

If the limit-count argument is specified, it must be a positive integer. It expresses a maximum limit on how many elements will be consumed from string-list in order to feed the lazy string. Once that many elements are drawn, the string ends, even if the list has not been exhausted.
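For example (an illustrative sketch; lazy-str-force, described below, is used here only to reveal the fully materialized content):

```lisp
;; default "\n" terminator: elements are treated as lines
(defvar ls (lazy-str '("abc" "def")))
(lazy-str-force ls)                       ;; -> "abc\ndef\n"

;; null terminator: elements are simply catenated
(defvar cs (lazy-str '("abc" "def") ""))
(lazy-str-force cs)                       ;; -> "abcdef"
```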


9.17.2 Function lazy-stringp


(lazy-stringp obj)

The lazy-stringp function returns t if obj is a lazy string. Otherwise it returns nil.


9.17.3 Function lazy-str-force-upto


(lazy-str-force-upto lazy-str index)


The lazy-str-force-upto function tries to instantiate the lazy string such that the position given by index materializes. The index is a character position, exactly as used in the chr-str function.

Some positions beyond index may also materialize, as a side effect.

If the string is already materialized through to at least index, or if it is possible to materialize the string that far, then the value t is returned to indicate success.

If there is insufficient material to force the lazy string through to the index position, then nil is returned.

It is an error if the lazy-str argument isn't a lazy string.


9.17.4 Function lazy-str-force


(lazy-str-force lazy-str)

The lazy-str argument must be a lazy string. The lazy string is forced to fully materialize.

The return value is an ordinary, non-lazy string equivalent to the fully materialized lazy string.


9.17.5 Function lazy-str-get-trailing-list


(lazy-str-get-trailing-list string index)


The lazy-str-get-trailing-list function can be considered, in some way, an inverse operation to the production of the lazy string from its associated list.

First, string is forced up through the position index. That is the only extent to which string is modified by this function.

Next, the suffix of the materialized part of the lazy string starting at position index, is split into pieces on occurrences of the terminator character (which had been given as the terminator argument in the lazy-str constructor, and defaults to newline). If the index position is beyond the part of the string which can be materialized (in adherence with the lazy string's limit-count constructor parameter), then the list of pieces is considered to be empty.

Finally, a list is returned consisting of the pieces produced by the split, to which is appended the remaining list of the string which has not yet been forced to materialize.


9.17.6 Functions length-str->, length-str->=, length-str-< and length-str-<=


(length-str-> string len)
(length-str->= string len)
(length-str-< string len)
(length-str-<= string len)


These functions compare the lengths of two strings. The following equivalences hold, as far as the resulting value is concerned:

  (length-str-> s l) <--> (> (length-str s) l)
  (length-str->= s l) <--> (>= (length-str s) l)
  (length-str-< s l) <--> (< (length-str s) l)
  (length-str-<= s l) <--> (<= (length-str s) l)

The difference between these functions and the equivalent forms lies in their behavior on lazy strings. If the string is lazy, the length-str function will fully force it in order to calculate and return its length. Consequently, length-str cannot compute the length of a lazy string of unbounded length; it will exhaust all memory trying to force the string.

By contrast, these functions only force a string up to position len, so they are not only more efficient, but they are usable on infinitely long lazy strings.

Thus these functions can be used to test whether a string is longer or shorter than a given length, without forcing the string beyond that length.
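The following sketch (illustrative; it assumes an unbounded lazy string built over an infinite lazy list produced by repeat) shows a safe length test:

```lisp
;; an unbounded lazy string: infinite repetition of "x" lines
(defvar inf (lazy-str (repeat '("x"))))

;; forces only about 100 characters, then answers
(length-str-> inf 100)  ;; -> t

;; (length-str inf) would try to force the whole string
;; and exhaust all memory
```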


9.17.7 Function cmp-str


(cmp-str left-string right-string)


The cmp-str function returns a negative integer if left-string is lexicographically prior to right-string, and a positive integer if the reverse situation is the case. Otherwise the strings are equal and zero is returned.

If either or both of the strings are lazy, then they are only forced to the minimum extent necessary for the function to reach a conclusion and return the appropriate value, since there is no need to look beyond the first character position in which they differ.

The lexicographic ordering is naive, based on the character code point values in Unicode taken as integers, without regard for locale-specific collation orders.
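For example (illustrative, per the description above):

```lisp
(cmp-str "abc" "abd")  ;; -> a negative integer
(cmp-str "abc" "abc")  ;; -> 0
(cmp-str "abd" "abc")  ;; -> a positive integer
```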


9.17.8 Functions str=, str<, str>, str>= and str<=


(str= left-string right-string)
(str< left-string right-string)
(str> left-string right-string)
(str>= left-string right-string)
(str<= left-string right-string)


These functions compare left-string and right-string lexicographically, as if by the cmp-str function.

The str= function returns t if the two strings are exactly the same, character for character, otherwise it returns nil.

The str< function returns t if left-string is lexicographically before right-string, otherwise nil.

The str> function returns t if left-string is lexicographically after right-string, otherwise nil.

The str>= function returns t if left-string is lexicographically after right-string, or if they are exactly the same, otherwise nil.

The str<= function returns t if left-string is lexicographically before right-string, or if they are exactly the same, otherwise nil.


9.17.9 Function string-lt


(string-lt left-str right-str)


The string-lt function is a deprecated alias for str<.


9.18 Vectors


9.18.1 Function vector


(vector length [initval])


The vector function creates and returns a vector object of the specified length. The elements of the vector are initialized to initval, or to nil if initval is omitted.


9.18.2 Function vec


(vec arg*)

The vec function creates a vector out of its arguments.
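For example (illustrative, per the descriptions above):

```lisp
(vector 3)    ;; -> #(nil nil nil)
(vector 3 0)  ;; -> #(0 0 0)
(vec 1 2 3)   ;; -> #(1 2 3)
```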


9.18.3 Function vectorp


(vectorp obj)

The vectorp function returns t if obj is a vector, otherwise it returns nil.


9.18.4 Function vec-set-length


(vec-set-length vec len)


The vec-set-length function modifies the length of vec, making it longer or shorter. If the vector is made longer, then the newly added elements are initialized to nil. The len argument must be nonnegative.

The return value is vec.


9.18.5 Accessor vecref


(vecref vec idx)
(set (vecref vec idx) new-value)


The vecref function performs indexing into a vector. It retrieves an element of vec at position idx, counted from zero. The idx value must range from 0 to one less than the length of the vector. The specified element is returned. This function performs similarly to the generic function ref, except that the first argument must be a vector.

If the element idx of vector vec exists, then the vecref form denotes a place.

A vecref place supports deletion. When a deletion takes place, then if idx denotes the last element in the vector, the vector's length is decreased by one, so that the vector no longer has that element. Otherwise, if idx isn't the last element, then each element at a higher index than idx shifts by one position to the adjacent lower index. Then, the length of the vector is decreased by one, so that the last element position disappears.


9.18.6 Function vec-push


(vec-push vec elem)


The vec-push function extends the length of a vector vec by one element, and sets the new element to the value elem.

The previous length of the vector (which is also the position of elem) is returned.
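The following sketch (illustrative, per the descriptions above) shows access, place deletion and pushing:

```lisp
(defvar v (vec 1 2 3))
(vecref v 0)        ;; -> 1
(del (vecref v 1))  ;; -> 2; v is now #(1 3)
(vec-push v 4)      ;; -> 2, the previous length; v is now #(1 3 4)
```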


9.18.7 Function length-vec


(length-vec vec)

The length-vec function returns the length of vector vec. It performs similarly to the generic length function, except that the argument must be a vector.


9.18.8 Function size-vec


(size-vec vec)

The size-vec function returns the number of elements for which storage is reserved in the vector vec.


The length of the vector can be extended up to this size without any memory allocation operations having to be performed.


9.18.9 Function vec-list


(vec-list list)

This function returns a vector which contains all of the same elements, in the same order, as the list list.

Note: this function is also known by the obsolescent name vector-list.


9.18.10 Function list-vec


(list-vec vec)

The list-vec function returns a list of the elements of vector vec.

Note: this function is also known by the obsolescent name list-vector.


9.18.11 Function copy-vec


(copy-vec vec)

The copy-vec function returns a new vector object of the same length as vec and containing the same elements in the same order.


9.18.12 Function sub-vec


(sub-vec vec [from [to]])


The sub-vec function is like the more generic function sub, except that it operates only on vectors.

For a description of the arguments and semantics, refer to the sub function.


9.18.13 Function replace-vec


(replace-vec vec item-sequence [from [to]])


The replace-vec function is like the replace function, except that the first argument must be a vector.

For a description of the arguments, semantics and return value, refer to the replace function.


9.18.14 Function cat-vec


(cat-vec vec-list)

The vec-list argument is a list of vectors. The cat-vec function produces a catenation of the vectors listed in vec-list. It returns a single large vector formed by catenating those vectors together in order.
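For example (illustrative, per the description above):

```lisp
(cat-vec (list #(1 2) #(3) #(4 5)))  ;; -> #(1 2 3 4 5)
```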


9.19 Structures

TXR supports a structure data type. Structures are objects which hold multiple storage locations called slots, which are named by symbols. Structures can be related to each other by inheritance.

The type of a structure is itself an object, of type struct-type.

When the program defines a new structure type, it does so by creating a new struct-type instance, with properties which describe the new structure type: its name, its list of slots, its initialization and "boa constructor" functions, and the structure type it inherits from (the "super").

The struct-type object is then used to generate instances.

Structure instances are not only containers which hold named slots; they also indicate their struct type. Two structures which have the same number of slots, having the same names, are not necessarily of the same type.

Structure types and structures may be created and manipulated using a programming interface based on functions.

For more convenient and clutter-free expression of structure-based program code, macros are also provided.

Furthermore, concise and expressive slot access syntax is provided courtesy of the referencing dot and unbound referencing dot syntax, a syntactic sugar for the qref and uref macros.

Structure types have a name, which is a symbol. The typeof function, when applied to any struct type, returns the symbol struct-type. When typeof is applied to a struct instance, it returns the name of the struct type. Effectively, struct names are types.

The consequences are unspecified if an existing struct name is re-used for a different struct type, or an existing type name is used for a struct type.


9.19.1 Static Slots

Structure slots can be of two kinds: they can be the ordinary instance slots or they can be static slots. The instances of a given structure type have their own instance of a given instance slot. However, they all share a single instance of a static slot.

Static slots are allocated in a global area associated with a structure type and are initialized when the structure type is created. They are useful for efficiently representing properties which have the same value for all instances of a struct. These properties don't have to occupy space in each instance, and time doesn't have to be wasted initializing them each time a new instance is created. Static slots are also useful for struct-specific global variables. Lastly, static slots are also useful for holding methods and functions. Although structures can have methods and functions in their instances, usually, all structures of the same type share the same functions. The defstruct macro supports a special syntax for defining methods and struct-specific functions at the same time when a new structure type is defined. The defmeth macro can be used for adding new methods and functions to an existing structure and its descendants.

Static slots may be assigned just like instance slots. Changing a static slot, of course, changes that slot in every structure of the same type.

Static slots are not listed in the #S(...) notation when a structure is printed. When the structure notation is read from a stream, if static slots are present, they will be processed and their values stored in the static locations they represent, thus changing their values for all instances.

Static slots are inherited just like instance slots. If a given structure B has some static slot s, and a new structure D is derived from B, using defstruct, and does not define a slot s, then D inherits s. This means that D shares the static slot with B: both types share a single instance of that slot.

On the other hand, if D defines a static slot s, then that slot will have its own instance in the D structure type; D will not inherit the B instance of slot s. Moreover, if the definition of D omits the init-form for slot s, then that slot will be initialized with a copy of the current value of slot s of the B base type, which allows derived types to obtain the value of the base type's static slot, yet hold that value in their own instance.

The slot type can be overridden. A structure type deriving from another type can introduce slots which have the same names as the supertype, but are of a different kind: an instance slot in the supertype can be replaced by a static slot in the derived type or vice versa.

A structure type is associated with a static initialization function which may be used to store initial values into static slots. This function is invoked once in a type's life time, when the type is created. The function is also inherited by derived struct types and invoked when they are created.


9.19.2 Dirty Flags

All structure instances contain a Boolean flag called the dirty flag. This flag is not a slot, but rather a meta-data property that is exposed to program access. When the flag is set, an object is said to be dirty; otherwise it is clean.

Newly constructed objects come into existence dirty. The dirty flag state can be tested with the function test-dirty. An object can be marked as clean by clearing its dirty flag with clear-dirty. A combined operation test-clear-dirty is provided which clears the dirty flag, and returns its previous value.

The dirty flag is set whenever a new value is stored into the instance slot of an object.

Note: the dirty flag can be used to support the caching of values derived from an object's slots. The derived values don't have to be re-computed while an object remains clean.
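The following sketch (illustrative; the widget struct is a hypothetical example type) shows the dirty-flag life cycle described above:

```lisp
;; hypothetical struct type, for illustration only
(defstruct widget nil
  (w 0))

(defvar a (new widget))
(test-dirty a)   ;; -> t; newly constructed objects are dirty
(clear-dirty a)
(test-dirty a)   ;; -> nil; the object is now clean
(set a.w 1)      ;; storing into an instance slot sets the flag
(test-dirty a)   ;; -> t
```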


9.19.3 Equality Substitution

In object-based or object-oriented programming, it is sometimes necessary for a new data type to provide its own notion of equality: its own requirements for when two distinct instances of the type are considered equal. Furthermore, types sometimes also have to implement their own notion of inequality: the requirements for the manner in which one instance is considered lesser or greater than another.

TXR Lisp structures implement a concept called equality substitution which provides a simple, unified way for the implementor of an object to encode the requirements for both equality and inequality. Equality substitution allows for objects to be used as keys in a hash table according to the custom equality, without the programmer being burdened with the responsibility of developing a custom hashing function.

An object participates in equality substitution by implementing the equal method. The equal method takes no arguments other than the object itself. It returns a representative value which is used in place of that object for the purposes of equal comparison.

Whenever an object which supports equality substitution is used as an argument of any of the functions equal, nequal, greater, less, gequal, lequal or hash-equal, the equal method of that object is invoked, and the return value of that method is taken in place of that object.

The same is true if an object which supports equality substitution is used as a key in an :equal-based hash table.

The substitution is applied repeatedly: if the return value of the object's equal method is an object which itself supports equality substitution, then that returned object's equal method is invoked to fetch its equality substitute. This repeats as many times as necessary until an object is determined which isn't a structure that supports equality substitution.

Once the equality substitute is determined, then the given function proceeds with the replacement object. Thus for example equal compares the replacement object in place of the original, and an :equal-based hash table uses the replacement object as the key for the purposes of hashing and comparison.
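The following sketch (illustrative; the pt struct is a hypothetical type, defined with the defstruct macro described below) shows equality substitution via an equal method:

```lisp
;; hypothetical type whose equality is determined by its x and y slots
(defstruct pt nil
  x y
  (:method equal (self) (list self.x self.y)))

;; both instances substitute (1 2), so they compare equal
(equal (new pt x 1 y 2) (new pt x 1 y 2))  ;; -> t
(equal (new pt x 1 y 2) (new pt x 1 y 3))  ;; -> nil
```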


9.19.4 Macro defstruct


  (defstruct {name | (name arg*)} super slot-specifier*)


The defstruct macro defines a new structure type and registers it under name, which must be a bindable symbol, according to the bindable function. Likewise, the name of every slot must also be a bindable symbol.

The super argument must either be nil or a symbol which names an existing struct type. The newly defined struct type will inherit all slots, as well as initialization behaviors from this type.

The defstruct macro is implemented using the make-struct-type function, which is more general. The macro analyzes the defstruct argument syntax, and synthesizes arguments which are then used to call the function. Some remarks in the description of defstruct only apply to structure types defined using that macro.

Slots are specified using zero or more slot specifiers. Slot specifiers come in the following varieties:

name
The simplest slot specifier is just a name, which must be a bindable symbol, as defined by the bindable function. This form is a short form for the (:instance name) syntax.
(name init-form)
This syntax is a short form for the (:instance name init-form) syntax.
(:instance name [init-form])
This syntax specifies an instance slot called name whose initial value is obtained by evaluating init-form whenever a new instance of the structure is created. This evaluation takes place in the original lexical environment in which the defstruct form occurs. If init-form is omitted, the slot is initialized to nil.
(:static name [init-form])
This syntax specifies a static slot called name whose initial value is obtained by evaluating init-form once, during the evaluation of the defstruct form in which it occurs, if the init-form is present. If init-form is absent, and a static slot with the same name exists in the super base type, then this slot is initialized with the value of that slot. Otherwise it is initialized to nil.

The definition of a static slot in a defstruct causes the new type to have its own instance of that slot, even if a same-named static slot occurs in the super base type, or its bases.

(:method name (param+) body-form*)
This syntax creates a static slot called name which is initialized with an anonymous function. The anonymous function is created during the evaluation of the defstruct form. The function takes the arguments specified by the param symbols, and its body consists of the body-form-s. There must be at least one param. When the function is invoked as a method, as intended, the leftmost param receives the structure instance. The body-form-s are evaluated in a context in which a block named name is visible. Consequently, return-from may be used to terminate the execution of a method and return a value. Methods are invoked using the instance.(name arg ...) syntax, which implicitly inserts the instance into the argument list.

Due to the semantics of static slots, methods are naturally inherited from a base structure to a derived one, and defining a method in a derived class which also exists in a base class performs OOP-style overriding.
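As a sketch of this inheritance and overriding behavior (the animal and dog types are hypothetical, invented for this example):

  (defstruct animal nil
    (:method speak (self) "..."))

  ;; dog overrides the inherited speak method
  (defstruct dog animal
    (:method speak (self) "woof!"))

  (let ((d (new dog)))
    d.(speak)) -> "woof!"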

(:function name (param*) body-form*)
This syntax creates a static slot called name which is initialized with an anonymous function. The anonymous function is created during the evaluation of the defstruct form. The function takes the arguments specified by the param symbols, and its body consists of the body-form-s. This specifier differs from :method only in one respect: there may be zero parameters. A structure function defined this way is intended to be used as a utility function which doesn't receive the structure instance as an argument. The body-form-s are evaluated in a context in which a block named name is visible. Consequently, return-from may be used to terminate the execution of the function and return a value. Such functions are called using the instance.[name arg ...] syntax which doesn't insert the instance into the argument list.

The remarks about inheritance and overriding in the description of :method also apply to :function.

(:init (param) body-form*)
The :init specifier doesn't describe a slot. Rather, it specifies code which is executed when a structure is instantiated, after the slot initializations specific to the structure type are performed. The code consists of body-form-s which are evaluated in order in a lexical scope in which the variable param is bound to the structure object.

The :init specifier may not appear more than once in a given defstruct form.

When an object with one or more levels of inheritance is instantiated, the :init code of a base structure type, if any, is executed before any initializations specific to a derived structure type.

The :init initializations are executed before any other slot initializations. The argument values passed to the new or lnew operator or the make-struct function are not yet stored in the object's slots, and are not accessible. Initialization code which needs these values to be available can be defined with :postinit.

Initializers in base structures must be careful about assumptions about slot kinds, because derived structures can alter static slots to instance slots or vice versa. To avoid an unwanted initialization being applied to the wrong kind of slot, initialization code can be made conditional on the outcome of static-slot-p applied to the slot. (Code generated by defstruct for initializing instance slots performs this kind of check).

The body-form-s of an :init specifier are not surrounded by an implicit block.

(:postinit (param) body-form*)
The :postinit specifier is very similar to :init. Both specify forms which are evaluated during object instantiation. The difference is that the body-form-s of a :postinit are evaluated after other initializations have taken place, including the :init initializations, as a second pass. By the time :postinit initialization runs, the argument material from the make-struct, new or lnew invocation has already been processed and stored into slots. Like :init actions, :postinit actions registered at different levels of the type's inheritance hierarchy are invoked in the base-to-derived order.
(:fini (param) body-form*)
The :fini specifier doesn't describe a slot. Rather, it specifies a finalization function which is associated with the structure instance, as if by use of the finalize function. This finalization registration takes place as the first step when an instance of the structure is created, before the slots are initialized and the :init code, if any, has been executed. The registration takes place as if by the evaluation of the form (finalize obj (lambda (param) body-form...) t) where obj denotes the structure instance. Note the t argument which requests reverse order of registration, ensuring that if an object has multiple finalizers registered at different levels of inheritance hierarchy, the finalizers specified for a derived structure type are called before inherited finalizers.

The body-form-s of a :fini specifier are not surrounded by an implicit block.

Note that an object's finalizers can be called explicitly with call-finalizers.

The with-objects macro arranges for finalizers to be called on objects when the execution of a scope terminates by any means.
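The interaction of :init and :fini can be sketched as follows (the resource type is hypothetical, invented for this example):

  (defstruct resource nil
    (:init (self) (put-line "acquired"))
    (:fini (self) (put-line "released")))

  ;; "acquired" is printed when the instance is
  ;; created; "released" when call-finalizers
  ;; explicitly invokes the finalizer.
  (let ((r (new resource)))
    (call-finalizers r))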

The slot names given in a defstruct must all be unique among themselves, but they may match the names of existing slots in the super base type.

A given structure type can have only one slot under a given symbolic name. If a newly specified slot matches the name of an existing slot in the super type or that type's chain of ancestors, it is called a repeated slot.

The kind of the repeated slot (static or instance) is not inherited; it is established by the defstruct and may be different from the kind of the same-named slot in the supertype or its ancestors.

If a repeated slot is introduced as a static slot, and has no init-form, then it receives the current value of the static slot of the same name from the nearest supertype which has such a slot.

If a repeated slot is an instance slot, no such inheritance of value takes place; only the local init-form applies to it. If that is absent, the slot is initialized to nil in each newly created instance of the new type.

However, :init and :postinit initializations are inherited from a base type and they apply to the repeated slots, regardless of their kind. These initializations take place on the instantiated object, and the slot references resolve accordingly.

The initialization of slots which are specified using the :method or :function specifiers is reordered with respect to :static slots. Regardless of their placement in the defstruct form, :method and :function slots are initialized before :static slots. This ordering is useful, because it means that when the initialization expression for a given static slot constructs an instance of the struct type, any instance initialization code executing for that instance can use all functions and methods of the struct type. However, note that static slots which follow that slot in the defstruct syntax are not yet initialized. If it is necessary for a structure's initialization code to have access to all static slots, even when the structure is instantiated during the initialization of a static slot, a possible solution is to use lazy instantiation via the lnew operator, rather than ordinary eager instantiation via new. It is also necessary to ensure that the instance isn't accessed until all static initializations are complete, since access to the instance slots of a lazily instantiated structure triggers its initialization.

The structure name is specified using one of two forms: a plain name, or the syntax (name arg*). If the second form is used, then the structure type will support "boa construction", where "boa" stands for "by order of arguments". The arg-s specify the list of slot names which are to be initialized in the by-order-of-arguments style. For instance, if three slot names are given, then those slots can be optionally initialized by giving three arguments in the new macro or the make-struct function.

Slots are first initialized according to their init-form-s, regardless of whether they are involved in boa construction.

A slot initialized in this style still has an init-form which is processed independently of the existence of, and prior to, boa construction.

The boa constructor syntax can specify optional parameters, delimited by a colon, similarly to the lambda syntax. However, the optional parameters may not be arbitrary symbols; they must be symbols which name slots. Moreover, the (name init-form [present-p]) optional parameter syntax isn't supported.

When boa construction is invoked with optional arguments missing, the default values for those arguments come from the init-form-s in the remaining defstruct syntax.


  (defvar *counter* 0)

  ;; New struct type foo with no super type:
  ;; Slots a and b initialize to nil.
  ;; Slot c is initialized by value of (inc *counter*).
  (defstruct foo nil a b (c (inc *counter*)))

  (new foo) -> #S(foo a nil b nil c 1)
  (new foo) -> #S(foo a nil b nil c 2)

  ;; New struct bar inheriting from foo.
  (defstruct bar foo (c 0) (d 100))

  (new bar) -> #S(bar a nil b nil c 0 d 100)
  (new bar) -> #S(bar a nil b nil c 0 d 100)

  ;; *counter* was still incremented during
  ;; construction of bar objects, due to the
  ;; inherited init-form of slot c:
  *counter* -> 4

  ;; override slots with new arguments
  (new foo a "str" c 17) -> #S(foo a "str" b nil c 17)

  *counter* -> 5

  ;; boa initialization
  (defstruct (point x : y) nil (x 0) (y 0))

  (new point) -> #S(point x 0 y 0)
  (new (point 1 1)) -> #S(point x 1 y 1)

  ;; property list style initialization
  ;; can always be used:
  (new point x 4 y 5) -> #S(point x 4 y 5)

  ;; boa applies last:
  (new (point 1 1) x 4 y 5) -> #S(point x 1 y 1)

  ;; boa with optional argument omitted:
  (new (point 1)) -> #S(point x 1 y 0)

  ;; boa with optional argument omitted and
  ;; with property list style initialization:
  (new (point 1) x 5 y 5) -> #S(point x 1 y 5)


9.19.5 Macro defmeth


  (defmeth type-name name param-list body-form*)


Unless name is one of the two symbols :init or :postinit, the defmeth macro installs a function into the static slot named by the symbol name in the struct type indicated by type-name.

If the structure type doesn't already have such a static slot, it is first added, as if by the static-slot-ensure function, subject to the same checks.

If the function has at least one argument, it can be used as a method. In that situation, the leftmost argument passes the structure instance on which the method is being invoked.

The function takes the arguments specified by the param-list symbols, and its body consists of the body-form-s.

The body-form-s are placed into a block named name.

A method named lambda allows a structure to be used as if it were a function. When arguments are applied to the structure as if it were a function, the lambda method is invoked with those arguments, with the object itself inserted into the leftmost argument position.

If defmeth is used to redefine an existing method, the semantics can be inferred from that of static-slot-ensure. In particular, the method will be imposed into all subtypes which inherit (do not override) the method.

If name is the keyword symbol :init, then instead of operating on a static slot, the macro redefines the initfun of the given structure type, as if by a call to the function struct-set-initfun.

Similarly, if name is the keyword symbol :postinit, then the macro redefines the postinitfun of the given structure type, as if by a call to the function struct-set-postinitfun.

When redefining the initfun via :init, the admonishments given in the description of struct-set-initfun apply: if the type has an initfun generated by the defstruct macro, then that initfun is what implements all of the slot initializations given in the slot specifier syntax. These initializations are lost if the initfun is overwritten.

The defmeth macro returns a method name: a unit of syntax of the form (meth type-name name) which can be used as an argument to the accessor symbol-function, and in other situations.
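
As a sketch, a method can be added to a previously defined type (the circle type and double-radius method are hypothetical, invented for this example):

  (defstruct circle nil
    (radius 1))

  ;; install double-radius into the static slots
  ;; of the existing circle type
  (defmeth circle double-radius (self)
    (* 2 self.radius))

  (let ((c (new circle radius 3)))
    c.(double-radius)) -> 6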


9.19.6 Macros new and lnew


  (new {name | (name arg*)} {slot init-form}*)
  (lnew {name | (name arg*)} {slot init-form}*)


The new macro creates a new instance of the structure type named by name.

If the structure supports "boa construction", then, optionally, the arguments may be given using the syntax (name arg*) instead of name.

Slot values may also be specified by the slot and init-form arguments.

Note: the evaluation order in new is surprising: namely, init-form-s are evaluated before arg-s if both are present.

When the object is constructed, all default initializations take place first. If the object's structure type has a supertype, then the supertype initializations take place. Then the type's initializations take place, followed by the slot init-form overrides from the new macro, and lastly the "boa constructor" overrides.

If any of the initializations abandon the evaluation of new by a non-local exit such as an exception throw, the object's finalizers, if any, are invoked.

The macro lnew differs from new in that it specifies the construction of a lazy struct, as if by the make-lazy-struct function. When lnew is used to construct an instance, a lazy struct is returned immediately, without evaluating any of the arg and init-form expressions. The expressions are evaluated when any of the object's instance slots is accessed for the first time. At that time, these expressions are evaluated (in the same order as under new) and initialization proceeds in the same way.

If any of the initializations abandon the delayed initialization steps arranged by lnew by a non-local exit such as an exception throw, the object's finalizers, if any, are invoked.

Lazy initialization does not detect cycles. Immediately prior to the lazy initialization of a struct, the struct is marked as no longer requiring initialization. Thus, during initialization, its instance slots may be freely accessed. Slots not yet initialized evaluate as nil.
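The deferred evaluation under lnew can be sketched as follows (the widget type and *made* variable are hypothetical, invented for this example):

  (defvar *made* nil)

  (defstruct widget nil
    (id (progn (set *made* t) 42)))

  (defvar w (lnew widget))

  *made* -> nil  ;; init-form not yet evaluated
  w.id -> 42     ;; first slot access triggers initialization
  *made* -> t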


9.19.7 Macro with-slots


  (with-slots ({slot | (sym slot)}*) struct-expr
    body-form*)


The with-slots macro binds lexical macros to serve as aliases for the slots of a structure.

The struct-expr argument is expected to be an expression which evaluates to a struct object. It is evaluated once, and its value is retained. The aliases are then established to the slots of the resulting struct value.

The aliases are specified as zero or more expressions which consist of either a single symbol slot or a (sym slot) pair. The simple form binds a macro named slot to a slot also named slot. The pair form binds a macro named sym to a slot named slot.

The lexical aliases are syntactic places: assigning to an alias causes the value to be stored into the slot which it denotes.

After evaluating struct-expr the with-slots macro arranges for the evaluation of body-form-s in the lexical scope in which the aliases are visible.

Dialect Notes:

The intent of the with-slots macro is to help reduce the verbosity of code which makes multiple references to the same slot. Use of with-slots is less necessary in TXR Lisp than in other Lisp dialects, thanks to the dot operator for accessing struct slots.

Lexical aliases to struct places can also be arranged with considerable convenience using the placelet operator. However, placelet will not bind multiple aliases to multiple slots of the same object such that the expression which produces the object is evaluated only once.


  (defstruct point nil x y)

  ;; Here, with-slots introduces verbosity because
  ;; each slot is accessed only once. The function
  ;; is equivalent to:
  ;; (defun point-delta (p0 p1)
  ;;   (new point x (- p1.x p0.x) y (- p1.y p0.y)))
  ;; Also contrast with the use of placelet:
  ;; (defun point-delta (p0 p1)
  ;;   (placelet ((x0 p0.x) (y0 p0.y)
  ;;              (x1 p1.x) (y1 p1.y))
  ;;     (new point x (- x1 x0) y (- y1 y0))))

  (defun point-delta (p0 p1)
    (with-slots ((x0 x) (y0 y)) p0
      (with-slots ((x1 x) (y1 y)) p1
        (new point x (- x1 x0) y (- y1 y0)))))


9.19.8 Macro qref


  (qref obj {slot | (slot arg*) | [slot arg*]}+)


The qref macro ("quoted reference") performs structure slot access. Structure slot access is more conveniently expressed using the referencing dot notation, which works by translating to qref syntax, according to the following equivalence:

  a.b.c.d <--> (qref a b c d)  ;; a b c d must not be numbers

(See the Referencing Dot section under Additional Syntax.)

The leftmost argument of qref is an expression which is evaluated. This argument is followed by one or more reference designators. If there are two or more designators, the following equivalence applies:

  (qref obj d1 d2 ...)  <---> (qref (qref obj d1) d2 ...)

That is to say, qref is applied to the object and a single designator. This must yield an object, to which the next designator is applied as if by another qref operation, and so forth.

Thus, qref can be understood entirely in terms of the semantics of the binary form (qref object-form designator).

Designators come in three forms: a lone symbol, an ordinary compound expression consisting of a symbol followed by arguments, or a DWIM expression consisting of a symbol followed by arguments.

A lone symbol designator indicates the slot of that name. That is to say, the following equivalence applies:

  (qref o n)  <-->  (slot o 'n)

where slot is the structure slot accessor function. Because slot is an accessor, this form denotes the slot as a syntactic place; slots can be modified via assignment to the qref form and the referencing dot syntax.

The slot name being implicitly quoted is the basis of the term "quoted reference", giving rise to the qref name.

A compound designator indicates that the named slot is a function, and arguments are to be applied to it. The following equivalence applies in this case, except that o is evaluated only once:

  (qref o (n arg ...)) <--> (call (slot o 'n) o arg ...)

A DWIM designator indicates that the named slot is a function or an indexable or callable object. The following equivalence applies:

  (qref obj [name arg ...])  <-->  [(slot obj 'name) arg ...]


  (defstruct foo nil
    (array (vec 1 2 3))
    (increment (lambda (self index delta)
                 (inc [self.array index] delta))))

  (defvarl s (new foo))

  ;; access third element of s.array:
  s.[array 2]  -->  3

  ;; increment first element of array by 42
  s.(increment 0 42)  -->  43

  ;; access array member
  s.array  -->  #(43 2 3)

Note how increment behaves much like a single-argument-dispatch object-oriented method. Firstly, the syntax s.(increment 0 42) effectively selects the increment function which is particular to the s object. Secondly, the object is passed to the selected function as the leftmost argument, so that the function has access to the object.


9.19.9 Macro uref


  (uref {slot | (slot arg*) | [slot arg*]}+)


The uref macro ("unbound reference") expands to an expression which evaluates to a function. The function takes exactly one argument: an object. When the function is invoked on an object, it references slots or methods relative to that object.

Note: the uref syntax may be used directly, but it is also produced by the unbound referencing dot syntactic sugar:

  .a          -->  (uref a)
  .(f x)      -->  (uref (f x))
  .(f x).b    -->  (uref (f x) b)
  .a.(f x).b  -->  (uref a (f x) b)

The macro may be understood in terms of the following translation scheme:

  (uref a b c ...)  -->  (lambda (o) (qref o a b c ...))

where o is understood to be a unique symbol (for instance, as produced by the gensym function).

When only one uref argument is present, these equivalences also hold:

  (uref (f a b c ...))  <-->  (umeth f a b c ...)
  (uref s)  <-->  (usl s)

The terminology "unbound reference" refers to the property that uref expressions produce a function which isn't bound to a structure object. The function binds a slot or method; the call to that function then binds an object to that function, as an argument.
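A common use of such unbound references is with higher-order functions; a sketch (the point type is hypothetical, invented for this example):

  (defstruct point nil x y)

  (let ((pts (list (new point x 1 y 2)
                   (new point x 3 y 4))))
    ;; .x is (uref x): a one-argument function
    ;; which retrieves the x slot of its argument
    (mapcar .x pts)) -> (1 3)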


Suppose that the objects in