Manpage for TXR

Mar 17, 2024





TXR - Programming Language (Version 294)



txr [ options ] [ script-file [ arguments ... ]]



TXR is a general-purpose, multi-paradigm programming language. It comprises two languages integrated into a single tool: a text scanning and extraction language referred to as the TXR Pattern Language (sometimes just "TXR"), and a general-purpose dialect of Lisp called TXR Lisp.

TXR can be used for everything from "one liner" data transformation tasks at the command line, to data scanning and extracting scripts, to full application development in a wide range of areas.

A script written in the TXR Pattern Language, also referred to in this document as a query, specifies a pattern which matches one or more sources of inputs, such as text files. Patterns can consist of large chunks of multiline free-form text, which is matched literally against material in the input sources. Free variables occurring in the pattern (denoted by the @ symbol) are bound to the pieces of text occurring in the corresponding positions. Patterns can be arbitrarily complex, and can be broken down into named pattern functions, which may be mutually recursive.

In addition to embedded variables which implicitly match text, the TXR pattern language supports a number of directives, for matching text using regular expressions, for continuing a match in another file, for searching through a file for the place where an entire subquery matches, for collecting lists, and for combining subqueries using logical conjunction, disjunction and negation, and numerous others.

Patterns can contain actions which transform data and generate output. These actions can be embedded anywhere within the pattern-matching logic. A common structure for small TXR scripts is to perform a complete matching session at the top of the script, and then deal with processing and reporting at the bottom.

The TXR Lisp language can be used from within TXR scripts as an embedded language, or completely standalone. It supports functional, imperative and object-oriented programming, and provides numerous data types such as symbols, strings, vectors, hash tables with weak reference support, lazy lists, and arbitrary-precision ("bignum") integers. It has an expressive foreign function interface (FFI) for calling into libraries and other software components that support C-language-style calls.

TXR Lisp source files as well as individual functions can be optionally compiled for execution on a virtual machine that is built into TXR. Compiled files execute and load faster, and resist reverse-engineering. Standalone application delivery is possible.

TXR is free software offered under the two-clause BSD license which places almost no restrictions on redistribution, and allows every conceivable use, of the whole software or any constituent part, royalty-free, free of charge, and free of any restrictions.



If TXR is given no arguments, it will enter into an interactive mode. See the INTERACTIVE LISTENER section for a description of this mode. When TXR enters interactive mode this way, it prints a one-line banner announcing the program name and version, and one line of help text instructing the user how to exit.

If TXR is invoked under the name txrlisp, it behaves as if the --lisp option had been specified before any other option. Similarly, if TXR is invoked under the name txrvm, it behaves as if the --compiled option had been given.

Unless the -c or -f options are present, the first non-option argument is treated as a script-file which is executed. This treatment is described below, following the descriptions of the options. Any additional arguments have no fixed meaning; they are available to the TXR query or TXR Lisp application for specifying input files to be processed, or for other purposes under the control of the application.

Options which don't take an argument may be combined together. The -v and -q options are mutually exclusive. Of these two, the one which occurs in the rightmost position in the argument list dominates. The -c and -f options are also mutually exclusive; if both are specified, it is a fatal error.

-Dvar=value
Bind the variable var to the value value prior to processing the query. The name is in scope over the entire query, so that all occurrences of the variable are substituted and match the equivalent text. If the value contains commas, these are interpreted as separators, which give rise to a list value. For instance -Dvar=a,b,c binds var to the list of the strings "a", "b" and "c". (See the @(collect) directive.) List variables provide a multiple match. That is to say, if a list variable occurs in a query, a successful match occurs if any of its values matches the text. If more than one value matches the text, the first one is taken.

-Dvar
Binds the variable var to an empty string value prior to processing the query.

-q
Quiet operation during matching. Certain error messages are not reported on the standard error device (but if the situations occur, they still fail the query). This option does not suppress error generation during the parsing of the query, only during its execution.

-i
If this option is present, then TXR will enter into an interactive interpretation mode after processing all options, and the input query if one is present. See the INTERACTIVE LISTENER section for a description of this mode.

-d, --debugger
Invoke the interactive TXR debugger. See the DEBUGGER section. Implies --backtrace.

--backtrace
Turns on the establishment of backtrace frames for function calls so that a backtrace can be produced when an unhandled exception occurs, and in other situations. Backtraces are helpful in identifying the causes of errors, but require extra stack space and slow down execution.

-n, --noninteractive
This option affects behavior related to TXR's *stdin* stream. It also has another, unrelated effect on the behavior of the interactive listener; see below.

Normally, if this stream is connected to a terminal device, it is automatically marked as having the real-time property when TXR starts up (see the functions stream-set-prop and real-time-stream-p). The -n option suppresses this behavior; the *stdin* stream remains ordinary.

The TXR pattern language reads standard input via a lazy list, created by applying the lazy-stream-cons function to the *stdin* stream. If that stream is marked real-time, then the lazy list which is returned by that function has behaviors that are better suited for scanning interactive input. A more detailed explanation is given under the description of this function.

If the -n option is in effect and TXR enters the interactive listener, the listener operates in plain mode instead of visual mode. The listener reads buffered lines from the operating system without any character-based editing features or history navigation. In plain mode, no prompts appear and no terminal control escape sequences are generated. The only output is the results of evaluation, related diagnostic messages, and any output generated by the evaluated expressions themselves.

-v
Verbose operation. Detailed logging is enabled.

-b sym=value
This option binds a Lisp global lexical variable (as if by the defparml function) to an object described by Lisp syntax. It requires an argument of the form sym=value where sym must be, syntactically, a token denoting a bindable symbol, and value is arbitrary TXR Lisp syntax. The sym syntax is converted to the symbol it denotes, which is bound as a global lexical variable, if it is not already a variable. The value syntax is parsed to the Lisp object it denotes. This object is not subject to evaluation; the object itself is stored into the variable binding denoted by sym. Note that if sym already exists as a global variable, then it is simply overwritten. If sym is marked special, then it stays special.

-B
If the query is successful, print the variable bindings as a sequence of assignments in shell syntax that can be eval-ed by a POSIX shell. If the query fails, print the word "false". Evaluation of this word by the shell has the effect of producing an unsuccessful termination status from the shell's eval command.
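
The failure case can be sketched in plain shell. This is a simulation: the out variable stands in for the output of a hypothetical failed txr -B run, which a real script would capture with command substitution.

```shell
# "out" stands in for what a failed query prints under -B; a real
# invocation would be something like: out=$(txr -B query.txr data.txt)
out='false'
if eval "$out"; then
  result="query matched"
else
  result="query failed"
fi
echo "$result"
```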

-l, --lisp-bindings
This option implies -B. Print the variable bindings in Lisp syntax instead of shell syntax.

-a num
This option implies -B. The decimal integer argument num specifies the maximum number of array dimensions to use for list-valued variable bindings. The default is 1. Additional dimensions are expressed using numeric suffixes in the generated variable names. For instance, consider the three-dimensional list arising out of a triply nested collect: ((("a" "b") ("c" "d")) (("e" "f") ("g" "h"))). Suppose this is bound to a variable V. With -a 1, this will be reported as:

  V_0_0[0]="a"
  V_0_0[1]="b"
  V_0_1[0]="c"
  V_0_1[1]="d"
  V_1_0[0]="e"
  V_1_0[1]="f"
  V_1_1[0]="g"
  V_1_1[1]="h"

With -a 2, it comes out as:

  V_0[0][0]="a"
  V_0[0][1]="b"
  V_0[1][0]="c"
  V_0[1][1]="d"
  V_1[0][0]="e"
  V_1[0][1]="f"
  V_1[1][0]="g"
  V_1[1][1]="h"

The leftmost bracketed index is the most major index. That is to say, the dimension order is: NAME_m_m+1_..._n[1][2]...[m-1].

-c query
Specifies the query in the form of a command-line argument. If this option is used, the script-file argument is omitted. The first non-option argument, if there is one, now specifies the first input source rather than a query. Unlike queries read from a file, (nonempty) queries specified as arguments using -c do not have to properly end in a newline. Internally, TXR adds the missing newline before parsing the query. Thus -c "@a" is a valid query which matches a line.


Shell script which uses TXR to read two lines "1" and "2" from standard input, binding them to variables a and b. Standard input is specified as - and the data comes from shell "here document" redirection:


  txr -B -c "@a
  @b" - <<!
  1
  2
  !

The @; comment syntax can be used for better formatting:

  txr -B -c "@;
  @a
  @b" - <<!
  1
  2
  !

-f script-file
Provides a way to specify the file from which the query is to be read, as an alternative to using the main script-file argument. This is useful in #! ("hash bang") scripts. (See Hash-Bang Support below.) Use of this option does not affect the order of processing. All of the options are processed first, before the script-file is read, as if it were specified by the main script-file argument. If the argument to -f is - (dash) then the script will be read from standard input instead of a file. If this option is used, the first non-option argument, if there is one, no longer specifies the script-file. It is an argument to the script, such as the name of an input source.

-e expressions
Evaluates zero or more TXR Lisp expressions for their side effects, without implicitly printing their values. Can be specified more than once. The argument may be empty, in which case the argument has no effect, since it calls for an empty sequence of forms to be evaluated.

The script-file argument becomes optional if at least one -e, -p, -P or -t option is processed.

If the evaluation of every expression evaluated this way terminates normally, and there is no script-file argument, then TXR terminates with a successful status, instead of entering the interactive listener. The -i option can be used to request the listener.

-p expression
The argument must specify exactly one valid TXR Lisp form. If this is successfully parsed and evaluated, the value of the expression is printed as if using the prinl function.

-P expression
Like -p but prints using the pprinl function.

-t expression
Like -p but prints using the tprint function.

-C number
Requests TXR to behave in a manner that is compatible with the specified version of TXR. This makes a difference in situations when a release of TXR breaks backward compatibility. If some version N+1 deliberately introduces a change which is backward incompatible, then -C N can be used to request the old behavior.

The requested value of N can be too low, in which case TXR will complain and exit with an unsuccessful termination status. This indicates that TXR refuses to be compatible with such an old version. Users requiring the behavior of that version will have to install an older version of TXR which supports that behavior, or even that exact version.

If the option is specified more than once, the behavior is not specified.

Compatibility can also be requested via the TXR_COMPAT environment variable instead of the -C option.

For more information, see the COMPATIBILITY section.

--gc-delta number
The number argument to this option must be a decimal integer. It represents a megabyte value, the "GC delta": one megabyte is 1048576 bytes. The "GC delta" controls an aspect of the garbage collector behavior. See the gc-set-delta function for a description.

--debug-autoload
This option turns on debugging, like --debugger, but also requests stepping into the autoload processing of TXR Lisp library code. Normally, debugging through the evaluations triggered by autoloading is suppressed. Implies --backtrace.

--debug-expansion
This option turns on debugging, like --debugger, but also requests stepping into the parse-time macro-expansion of TXR Lisp code embedded in TXR queries. Normally, this is suppressed. Implies --backtrace.

--help
Prints a usage summary on standard output, and terminates successfully.

--license
Prints the software license. This depends on the software being installed such that the LICENSE file is in the data directory. Use of TXR implies agreement with the liability disclaimer in the license.

--version
Prints a message on standard output which includes the program version, and then immediately causes TXR to terminate with a successful status.

--build-id
If TXR was built with an embedded build ID string, this option prints that string. Otherwise nothing is printed. In either case, TXR then immediately terminates with a successful status.

--args
The --args option provides a way to encode multiple arguments as a single argument, which is useful on some systems which have limitations in their implementation of the hash-bang mechanism. For details about its special syntax, see Hash-Bang Support below. It is also useful in standalone application deployment. See the section STANDALONE APPLICATION SUPPORT, in which example uses of --args are shown.

--eargs
The --eargs option (extended --args) is like --args but must be followed by an argument. The argument is removed from the argument list and substituted in place of occurrences of {} among the arguments expanded from the --eargs syntax.

--lisp
--compiled
These options influence the treatment of query files which do not have a recognized suffix indicating their type. The --lisp option causes a file with an unrecognized suffix, or no suffix, to be treated as Lisp source; --compiled causes it to be treated as a compiled TXR Lisp file. Moreover, --lisp and --compiled influence the suffix search. By default, when a query file name does not have a recognizable suffix, and the file does not exist, TXR adds the ".txr" suffix to the name and tries opening that name, and then in a similar way tries ".tlo", ".tlo.gz" and finally ".tl". If either of these two options is specified, TXR tries only the ".tlo", ".tlo.gz" and ".tl" suffixes, in that order, avoiding the ".txr" suffix. The search order among the Lisp suffixes is always ".tlo" first, then ".tlo.gz", then ".tl", regardless of whether --lisp or --compiled is specified.

Note that --lisp and --compiled influence how the argument of the -f option is treated, but only if they precede that option.

If the file has a recognized suffix: ".tl", ".tlo", ".tlo.gz", ".txr" or ".txr_profile", then these options have no effect. The suffix determines the interpretation of the content. Moreover, no suffix search takes place: only the given path name is tried.
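
The default suffix search can be sketched as a plain-shell simulation. This is illustrative only (TXR performs the search internally), and the unsuffixed name "query" and its ".tl" file are hypothetical:

```shell
# Set up a scratch directory where only the Lisp source variant exists.
dir=$(mktemp -d)
touch "$dir/query.tl"
name="$dir/query"            # unsuffixed name, as given on the command line
found=
# Default order; with --lisp or --compiled, the .txr step is skipped.
for suf in .txr .tlo .tlo.gz .tl; do
  if [ -e "$name$suf" ]; then
    found="$name$suf"
    break
  fi
done
base=${found##*/}
echo "would load: $base"
rm -rf "$dir"
```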

--reexec
On platforms which support the POSIX exec family of functions, this option causes TXR to re-execute itself. The re-executed image receives the remaining arguments which follow the --reexec argument. Note: this option is useful for supporting setuid operation in hash-bang scripts. On some platforms, the interpreter designated by a hash-bang script runs without altered privilege, even if that interpreter is installed setuid. If the interpreter is executed directly, then setuid applies to it, but not if it is executed via hash bang. If the --reexec option is used in the interpreter command line of such a script, the interpreter will re-execute itself, thereby gaining the setuid privilege. The re-executed image will then obtain the script name from the arguments which are passed to it and determine whether that script will run setuid. See the section SETUID/SETGID OPERATION.

--noprofile
If entering the interactive listener, suppress the reading of the .txr_profile file in the home directory. See the Interactive Profile File subsection in the INTERACTIVE LISTENER section of the manual.

--gc-debug
This option enables a behavior which stresses the garbage collector with frequent garbage collection requests. The purpose is to make it more likely to reproduce certain kinds of bugs. Use of this option severely degrades the performance of TXR.

--vg-debug
If TXR was built with Valgrind support, then this option is available. It enables code which uses the Valgrind API to integrate with the Valgrind debugger, for more accurate tracking of garbage-collected objects. For example, objects which have been reclaimed by the garbage collector are marked as inaccessible, and marked as uninitialized when they are allocated again.

--free-all
This option specifies that all memory allocated by TXR should be freed upon normal termination. This behavior is useful for debugging memory leaks. An accurate leak detection tool, such as the one built into Valgrind, should report zero leaked or still-reachable memory if --free-all has been used and TXR has terminated normally. A nonzero report in that situation indicates either a leak in TXR, a leak or global object retention in a platform library, or else a leak introduced through misuse of the FFI.

--dv-regex
If this option is used, then regular expressions are all treated using the derivative-based back-end. The NFA-based regex implementation is disabled. Normally, only regular expressions which require the intersection and complement operators are handled using the derivative back-end. This option makes it possible to test that back-end on test cases that it wouldn't normally receive.

--in-package name
This option changes to the specified package, by finding the package of the specified name and assigning that to the *package* special variable. If the package is not found, a diagnostic is issued, and TXR terminates unsuccessfully. The package thus specified is visible to the subsequent occurrences of the -e family of options as well as of the --compile option. It does not affect the value of *package* which is in effect when a script-file is executed or when the interactive listener is entered.

--compile source-file[:target-file]
This option invokes the compile-update-file function on source-file. If target-file is specified, it is passed to compile-update-file as the target argument; otherwise, that argument is defaulted. The option can be used multiple times to process multiple files. Unsuccessful compilation throws an exception, causing TXR to terminate abnormally. Similarly to the -e option, if this option is used at least once, and all of the invocations are successful, and there is no script-file argument, then TXR terminates with a successful status instead of entering the interactive listener. The -i option can be used to request the listener.

--
Signifies the end of the option list.

-
This argument is not interpreted as an option, but treated as a filename argument. After the first such argument, no more options are recognized. Even if another argument looks like an option, it is treated as a name. This special argument - means "read from standard input" instead of a file. The script-file, or any of the data files, may be specified using this argument. If two or more files are specified as -, the behavior is system-dependent. It may be possible to indicate EOF from the interactive terminal, and then specify more input which is interpreted as the second file, and so forth.

After the options, the remaining arguments are treated as follows.

If neither the -f nor the -c option was specified, then the first argument is treated as the script-file. If no arguments are present, then TXR enters interactive mode, provided that none of the -e, -p, -P or -t options have been processed; if any of them have, TXR instead terminates.

The TXR Pattern Language has features for implicitly treating the subsequent command-line arguments as input files. It follows the convention that an argument consisting of a single - (dash) character specifies that standard input is to be used, instead of opening a file. If the query does not use the @(next) directive to select an alternative data source, and a pattern-matching construct is processed which demands data, then the first argument will be opened as a data source. Arguments not opened as data sources can be assigned alternative meanings and uses, or can be ignored entirely, under control of the query.

Specifying standard input as a source with an explicit - argument is unnecessary. If no arguments are present, then TXR scans standard input by default. This was not true in versions of TXR prior to 171; see the COMPATIBILITY section.

TXR begins by reading the script, which is given as the contents of the argument of the -c option, or else as the contents of an input source specified by the -f option or by the script-file argument. If -f or the script-file argument specify - (dash) then the script is read from standard input.

In the case of the TXR pattern language, the entire query is scanned, internalized, and then begins executing, if it is free of syntax errors. (TXR Lisp is processed differently, form by form.) On the other hand, the pattern language reads data files in a lazy manner. A file isn't opened until the query demands material from that file, and then the contents are read on demand, not all at once.

The suffix of the script-file is significant. If the name has no suffix, or if it has a ".txr" suffix, then it is assumed to be in the TXR pattern language. If it has the ".tl" suffix, then it is assumed to be TXR Lisp. The --lisp and --compiled options change the treatment of unsuffixed script file names, causing them to be interpreted as TXR Lisp source or compiled TXR Lisp, respectively.

If a file name is specified which does not have a recognized suffix, and names a file which doesn't exist, then TXR adds the ".txr" suffix and tries again. If that doesn't exist, another attempt is made with the ".tlo" suffix, which will be treated as a TXR Lisp compiled file. If that doesn't exist, then ".tlo.gz" is tried, expected to be a file compressed in gzip format. Finally, if that doesn't exist, the ".tl" suffix is tried, which will be treated as containing TXR Lisp source. If either the --lisp or --compiled option has been specified, then TXR skips trying the ".txr" suffix, and tries only ".tlo" followed by ".tlo.gz" and ".tl".

A TXR Lisp file is processed as if by the load macro: forms from the file are read and evaluated. If the forms do not terminate the TXR process or throw an exception, and there are no syntax errors, then TXR terminates successfully after evaluating the last form. If syntax errors are encountered in a form, then TXR terminates unsuccessfully. TXR Lisp is documented in the section TXR LISP.

If a query file is specified, but no file arguments, it is up to the query to open a file, pipe or standard input via the @(next) directive prior to attempting to make a match. If a query attempts to match text, but has run out of files to process, the match fails.



TXR sends errors and verbose logs to the standard error device. The following paragraphs apply when TXR is run without enabling verbose mode with -v, or the printing of variable bindings with -B or -a.

If the command-line arguments are incorrect, TXR issues an error diagnostic and terminates with a failed status.

If the script-file specifies a query, and the query has a malformed syntax, TXR likewise issues error diagnostics and terminates with a failed status.

If the query fails due to a mismatch, TXR terminates with a failed status. No diagnostics are issued.

If the query is well-formed, and matches, then TXR issues no diagnostics, and terminates with a successful status.

In verbose mode (option -v), TXR issues diagnostics on the standard error device even in situations which are not erroneous.

In bindings-printing mode (options -B or -a), TXR prints the word false if the query fails, and exits with a failed termination status. If the query succeeds, the variable bindings, if any, are output on standard output.

If the script-file is TXR Lisp, then it is processed form by form. Each top-level Lisp form is evaluated after it is read. If any form is syntactically malformed, TXR issues diagnostics and terminates unsuccessfully. This is somewhat different from how the pattern language is treated: a script in the pattern language is parsed in its entirety before being executed.





6.1 Comments

A query may contain comments which are delimited by the sequence @; and extend to the end of the line. Whitespace can occur between the @ and ;. A comment which begins on a line swallows that entire line, as well as the newline which terminates it. In essence, the entire comment line disappears. If the comment follows some material in a line, then it does not consume the newline. Thus, the following two queries are equivalent:

 @a@; comment: match whole line against variable @a
 @; this comment disappears entirely


The comment after the @a does not consume the newline, but the comment which follows does. Without this intuitive behavior, line comments would give rise to empty lines that must match empty lines in the data, leading to spurious mismatches.

Instead of the ; character, the # character can be used. This is an obsolescent feature.


6.2 Hash-Bang Support

TXR has several features which support use of the hash-bang convention for creating apparently standalone executable programs.


6.2.1 Basic Hash Bang

Special processing is applied to TXR query or TXR Lisp script files that are specified on the command line via the -f option or as the first non-option argument. If the first line of such a file begins with the characters #!, that entire line is consumed and processed specially.

This removal allows for TXR queries to be turned into standalone executable programs in the POSIX environment using the hash-bang mechanism. Unlike most interpreters, TXR applies special processing to the #! line, which is described below, in the section Argument Generation with the Null Hack.

Shell session example: create a simple executable program called "hello.txr" and run it. This assumes TXR is installed in /usr/bin.

  $ cat > hello.txr
  #!/usr/bin/txr
  @(bind a "Hey")
  @(output)
  Hello, world!
  @(end)
  $ chmod a+x hello.txr
  $ ./hello.txr
  Hello, world!

When this plain hash-bang line is used, TXR receives the name of the script as an argument. Therefore, it is not possible to pass additional options to TXR. For instance, if the above script is invoked like this

  $ ./hello.txr -B

the -B option isn't processed by TXR, but treated as an additional argument, just as if txr script-file -B had been executed directly.

This behavior is useful if the script author does not want to expose the TXR options to the user of the script.

However, the hash-bang line can use the -f option:

  #!/usr/bin/txr -f

Now, the name of the script is passed as an argument to the -f option, and TXR will look for more options after that, so that the resulting program appears to accept TXR options. Now we can run

  $ ./hello.txr -B
  Hello, world!
  a="Hey"

The -B option is honored.


6.2.2 Argument Generation with --args and --eargs

On some operating systems, it is not possible to pass more than one argument through the hash-bang mechanism. That is to say, this will not work.

  #!/usr/bin/txr -B -f

To support systems like this, TXR supports the special argument --args, as well as an extended version, --eargs. With --args, it is possible to encode multiple arguments into one argument. The --args option must be followed by a separator character, chosen by the programmer. The characters after that are split into multiple arguments on the separator character. The --args option is then removed from the argument list and replaced with these arguments, which are processed in its place.


  #!/usr/bin/txr --args:-B:-f

The above has the same behavior as

  #!/usr/bin/txr -B -f

on a system which supports multiple arguments in the hash-bang line. The separator character is the colon, and so the remainder of that argument, -B:-f, is split into the two arguments -B -f.
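
The splitting can be simulated in a few lines of plain shell. This is an illustration of the documented rule, not TXR's internal code:

```shell
arg='--args:-B:-f'
rest=${arg#--args}           # ":-B:-f": separator character plus payload
sep=${rest%"${rest#?}"}      # the first character is the separator, ":"
payload=${rest#?}            # "-B:-f"
IFS=$sep                     # split the payload on the separator
set -f                       # no globbing during the unquoted expansion
set -- $payload
count=$#                     # 2 fields
first=$1                     # -B
second=$2                    # -f
printf '%s\n' "$@"
```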

The --eargs option is similar to --args, but must be followed by one more argument. After --eargs performs the argument splitting in the same manner as --args, any of the arguments which it produces which are the two-character sequence {} are replaced with that following argument. Whether or not the replacement occurs, that following argument is then removed.


  #!/usr/bin/txr --eargs:-B:{}:--foo:42

This has an effect which cannot be replicated in any known implementation of the hash-bang mechanism. Suppose that this hash-bang line is placed in a script called script.txr. When this script is invoked with arguments, as in:

  script.txr a b c

then TXR is invoked similarly to:

  /usr/bin/txr --eargs:-B:{}:--foo:42 script.txr a b c

Then, when --eargs processing takes place, firstly the argument sequence

  -B {} --foo 42

is produced by splitting into four fields using the : (colon) character as the separator. Then, within these four fields, all occurrences of {} are replaced with the following argument script.txr, resulting in:

  -B script.txr --foo 42

Furthermore, that script.txr argument is removed from the remaining argument list.

The four arguments are then substituted in place of the original --eargs:-B:{}:--foo:42 syntax.

The resulting TXR invocation is, therefore:

  /usr/bin/txr -B script.txr --foo 42 a b c

Thus, --eargs allows some arguments to be encoded into the interpreter script, such that the script name is inserted anywhere among them, possibly multiple times. Arguments for the interpreter can be encoded, as well as arguments to be processed by the script.
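
The complete --eargs transformation can likewise be simulated in plain shell. This is illustrative only; the spec and script values are taken from the example above:

```shell
spec='--eargs:-B:{}:--foo:42'
script='script.txr'          # the argument that follows the --eargs syntax
rest=${spec#--eargs}
sep=${rest%"${rest#?}"}      # ":"
IFS=$sep
set -f
set -- ${rest#?}             # -B {} --foo 42
cmd='/usr/bin/txr'
for field in "$@"; do
  if [ "$field" = '{}' ]; then
    field=$script            # {} is replaced by the script name
  fi
  cmd="$cmd $field"
done
cmd="$cmd a b c"             # the script's own arguments follow
echo "$cmd"
```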


6.2.3 Argument Generation with the Null Hack

The --args and --eargs mechanisms do not solve the following problem: the POSIX env utility is often exploited for its PATH searching capability, and used to express hash-bang scripts in the following way:

  #!/usr/bin/env txr

Here, the env utility searches for the txr program in the directories indicated by the PATH variable, which liberates the script from having to encode the exact location where the program is installed. However, if the operating system allows only one argument in the hash-bang mechanism, then no arguments can be passed to the program.

To mitigate this problem, TXR supports a special feature in its hash-bang support. If the hash-bang line contains a null byte, then the text from after the null byte until the end of the line is split into fields using the space character as a separator, and these fields are inserted into the command line. This manipulation happens during command-line processing, i.e. prior to the execution of the file.

If this processing is applied to a file that is specified using the -f option, then the arguments which arise from the special processing are inserted after that option and its argument. If this processing is applied to the file which is the first non-option argument, then the options are inserted before that argument. However, care is taken not to process that argument a second time.

In either situation, processing of the command-line options continues, and the arguments which are processed next are the ones which were just inserted. This is true even if the options had been inserted as a result of processing the first non-option argument, which would ordinarily signal the termination of option processing.

In the following examples, it is assumed that the script is named, and invoked, as /home/jenny/foo.txr, and is given arguments --bar abc, and that txr resolves to /usr/bin/txr. The <NUL> code indicates a literal ASCII NUL character (the zero byte).

Basic example:

  #!/usr/bin/env txr<NUL>-a 3

Here, env searches for txr, finding it in /usr/bin. Thus, including the executable name, TXR receives this full argument list:

  /usr/bin/txr /home/jenny/foo.txr --bar abc

The first non-option argument is the name of the script. TXR opens the script, and notices that it begins with a hash-bang line. It consumes the hash-bang line and finds the null byte inside it, retrieving the character string after it, which is "-a 3". This is split into the two arguments -a and 3, which are then inserted into the command line ahead of the script name. The effective command line then becomes:

  /usr/bin/txr -a 3 /home/jenny/foo.txr --bar abc

Command-line option processing continues, beginning with the -a option. After the option is processed, /home/jenny/foo.txr is encountered again. This time it is not opened a second time; it signals the end of option processing, exactly as it would immediately do if it hadn't triggered the insertion of any arguments.
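The splicing rule described above can be modeled in a few lines of ordinary Python (an illustration of the splitting behavior, not TXR source; the argument values are taken from the example above):

```python
# Model of the null-hack rule: the text after the NUL byte is split into
# fields on spaces, and the fields are inserted ahead of the script name.
hash_bang = "#!/usr/bin/env txr\0-a 3"
tail = hash_bang.split("\0", 1)[1]     # text after the null byte: "-a 3"
inserted = tail.split(" ")             # fields: ["-a", "3"]

argv = ["/usr/bin/txr", "/home/jenny/foo.txr", "--bar", "abc"]
argv[1:1] = inserted                   # inserted ahead of the script name
print(argv)
# ['/usr/bin/txr', '-a', '3', '/home/jenny/foo.txr', '--bar', 'abc']
```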

Advanced example: use env to invoke txr, passing options to the interpreter and to the script:

  #!/usr/bin/env txr<NUL>--eargs:-C:175:{}:--debug

This example shows how --eargs can be used in conjunction with the null hack. When txr begins executing, it receives the arguments

  /usr/bin/txr /home/jenny/foo.txr

The script file is opened, and the arguments delimited by the null character in the hash-bang line are inserted, resulting in the effective command line:

  /usr/bin/txr --eargs:-C:175:{}:--debug /home/jenny/foo.txr

Next, --eargs is processed in the ordinary way, transforming the command line into:

  /usr/bin/txr -C 175 /home/jenny/foo.txr --debug

The name of the script file is encountered, and signals the end of option processing. Thus txr receives the -C option, instructing it to emulate some behaviors from version 175, and the /home/jenny/foo.txr script receives --debug as its argument: it executes with the *args* list containing one element, the character string "--debug".

The hash-bang null-hack feature was introduced in TXR 177. Previous versions ignore the hash-bang line, performing no special processing. Where a risk exists that programs which depend on the feature might be executed by an older version of TXR, care must be taken to detect and handle that situation, either by means of the txr-version variable, or else by some logic which infers that the processing of the hash-bang line hasn't been performed.


6.2.4 Passing Options to TXR via Hash-Bang Null Hack

It is possible to use the Hash-Bang Null Hack, such that the resulting executable program recognizes TXR options. This is made possible by a special behavior in the processing of the -f option.

For instance, suppose that the effect of the following familiar hash-bang line is required:

  #!/path/to/txr -f

However, suppose there is also a requirement to use the env utility to find TXR. Furthermore, the operating system allows only one hash-bang argument. Using the Null Hack, this is rewritten as:

  #!/usr/bin/env txr<NUL>-f

then if the script is invoked with arguments -i a b c, the command line will ultimately be transformed into:

  /path/to/txr -f /path/to/scriptfile -i a b c

which allows TXR to process the -i option, leaving a, b and c as arguments for the script.

However, note that there is a subtle issue with the -f option that has been inserted via the Null Hack: namely, this insertion happens after TXR has opened the script file and read the hash-bang line from it. This means that when the inserted -f option is being processed, the script file is already open. A special behavior occurs. The -f option processing notices that the argument to -f is identical to the pathname of the script file that TXR has already opened for processing. The -f option and its argument are then skipped.


6.2.5 Hash Bang and Setuid

TXR supports setuid hash-bang scripting, even on platforms that do not support setuid and setgid attributes on hash-bang scripts. On such platforms, TXR has to be installed setuid/setgid. See the section SETUID/SETGID OPERATION. On some platforms, it may also be necessary to use the --reexec option.


6.3 Whitespace

Outside of directives, whitespace is significant in TXR queries, and represents a pattern match for whitespace in the input. An extent of text consisting of an undivided mixture of tabs and spaces is a whitespace token.

Whitespace tokens match a precisely identical piece of whitespace in the input, with one exception: a whitespace token consisting of precisely one space has a special meaning. It is equivalent to the regular expression @/[ ]+/: match an extent of one or more spaces (but not tabs!). Multiple consecutive spaces do not have this meaning.

Thus, the query line "a b" (one space between a and b) matches "a b" with one or more spaces between the two letters.

For matching a single space, the syntax @\ can be used (backslash-escaped space).

It is more often necessary to match multiple spaces than to match exactly one space, so this rule simplifies many queries and inconveniences only a few.
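As a cross-check, the single-space rule corresponds to the regular expression already given; the same idea can be expressed with Python's re module (an analogy only, not TXR syntax):

```python
import re

# A single space token acts like [ ]+: one or more spaces, but not tabs.
pat = re.compile(r"a[ ]+b")
print(bool(pat.fullmatch("a    b")))  # True: a run of spaces matches
print(bool(pat.fullmatch("a\tb")))    # False: a tab is not a space
```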

In output clauses, string and character literals and quasiliterals, a space token denotes a space.


6.4 Text

Query material which is not escaped by the special character @ is literal text, which matches input character for character. Text which occurs at the beginning of a line matches the beginning of a line. Text which starts in the middle of a line, other than following a variable, must match exactly at the current position, where the previous match left off. Moreover, if the text is the last element in the line, its match is anchored to the end of the line.

An empty query line matches an empty line in the input. Note that an empty input stream does not contain any lines, and therefore is not matched by an empty line. An empty line in the input is represented by a newline character which is either the first character of the file, or follows a previous newline-terminated line.

Input streams which end without terminating their last line with a newline are tolerated, and are treated as if they had the terminator.

Text which follows a variable has special semantics, described in the section Variables below.

A query may not leave a line of input partially matched. If any portion of a line of input is matched, it must be entirely matched, otherwise a matching failure results. However, a query may leave unmatched lines. Matching only four lines of a ten-line file is not a matching failure. The eof directive can be used to explicitly match the end of a file.

In the following example, the query matches the text, even though the text has an extra line.

 Four score and seven
 years ago our

 Four score and seven
 years ago our

In the following example, the query fails to match the text, because the text has extra material on one line that is not matched:

 I can carry nearly eighty gigs
 in my head

 I can carry nearly eighty gigs of data
 in my head

Needless to say, if the text has insufficient material relative to the query, that is a failure also.

To match arbitrary material from the current position to the end of a line, the "match any sequence of characters, including empty" regular expression @/.*/ can be used. Example:

 I can carry nearly eighty gigs@/.*/

 I can carry nearly eighty gigs of data

In this example, the query matches, since the regular expression matches the string "of data". (See the Regular Expressions section below.)

Another way to do this is:

 I can carry nearly eighty gigs@(skip)


6.5 Special Characters in Text

Control characters may be embedded directly in a query (with the exception of newline characters). An alternative to embedding is to use escape syntax. The following escapes are supported:

A backslash immediately followed by a newline introduces a physical line break without breaking up the logical line. Material following this sequence continues to be interpreted as a continuation of the previous line, so that indentation can be introduced to show the continuation without appearing in the data.
A backslash followed by a space encodes a space. This is useful in line continuations when it is necessary for some or all of the leading spaces to be preserved. For instance the two line sequence

  abcd@\
    @\  efg

is equivalent to the line

  abcd  efg

The two spaces before the @\ in the second line are consumed. The spaces after are preserved.

@\a Alert character (ASCII 7, BEL).
@\b Backspace (ASCII 8, BS).
@\t Horizontal tab (ASCII 9, HT).
@\n Line feed (ASCII 10, LF). Serves as abstract newline on POSIX systems.
@\v Vertical tab (ASCII 11, VT).
@\f Form feed (ASCII 12, FF). This character clears the screen on many kinds of terminals, or ejects a page of text from a line printer.
@\r Carriage return (ASCII 13, CR).
@\e Escape (ASCII 27, ESC).
A @\x immediately followed by a sequence of hex digits is interpreted as a hexadecimal numeric character code. For instance @\x41 is the ASCII character A. If a semicolon character immediately follows the hex digits, it is consumed, and characters which follow are not considered part of the hex escape even if they are hex digits.
A @\ immediately followed by a sequence of octal digits (0 through 7) is interpreted as an octal character code. For instance @\010 is character 8, same as @\b. If a semicolon character immediately follows the octal digits, it is consumed, and subsequent characters are not treated as part of the octal escape, even if they are octal digits.

Note that if a newline is embedded into a query line with @\n, this does not split the line into two; it's embedded into the line and thus cannot match anything. However, @\n may be useful in the @(cat) directive and in @(output).
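The semicolon-termination rule for numeric escapes can be sketched in Python (read_hex_escape is a hypothetical helper for illustration; TXR's lexer is not implemented this way):

```python
def read_hex_escape(s):
    """Model of the @\\x rule: consume hex digits; a semicolon
    immediately after them is eaten and excluded from the text."""
    i = 0
    while i < len(s) and s[i] in "0123456789abcdefABCDEF":
        i += 1
    ch = chr(int(s[:i], 16))
    if i < len(s) and s[i] == ";":
        i += 1  # the semicolon terminates the escape and is consumed
    return ch, s[i:]

print(read_hex_escape("41;BC"))  # ('A', 'BC'): semicolon stops the digits
print(read_hex_escape("41"))     # ('A', '')
```

Without the semicolon, an input like "41BC" would be consumed entirely, since B and C are themselves hex digits.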


6.6 Character Handling and International Characters

TXR represents text internally using wide characters, which are used to represent Unicode code points. Script source code, as well as all data sources, are assumed to be in the UTF-8 encoding. In TXR and TXR Lisp source, extended characters can be used directly in comments, literal text, string literals, quasiliterals and regular expressions. Extended characters can also be expressed indirectly using hexadecimal or octal escapes. On some platforms, wide characters may be restricted to 16 bits, so that TXR can only work with characters in the BMP (Basic Multilingual Plane) subset of Unicode.

TXR does not use the localization features of the system library; its handling of extended characters is not affected by environment variables like LANG and LC_CTYPE. The program reads and writes only the UTF-8 encoding.

TXR deals with UTF-8 separately in its parser and in its I/O streams implementation.

TXR's text streams perform UTF-8 conversion internally, such that TXR applications use Unicode code points.

In text streams, invalid UTF-8 bytes are treated as follows. When an invalid byte is encountered in the middle of a multibyte character, or if the input ends in the middle of a multibyte character, or if an invalid character is decoded, such as an overlong form, or code in the range U+DC00 through U+DCFF, the UTF-8 decoder returns to the starting byte of the ill-formed multibyte character, and extracts just one byte, mapping that byte to the Unicode character range U+DC00 through U+DCFF, producing that code point as the decoded result. The decoder is then reset to its initial state and begins decoding at the following byte, where the same algorithm is repeated.

Furthermore, because TXR internally uses a null-terminated character representation of strings which easily interoperates with C language interfaces, when a null character is read from a stream, TXR converts it to the code U+DC00. On output, this code converts back to a null byte, as explained in the previous paragraph. By means of this representational trick, TXR can handle textual data containing null bytes.
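Python's "surrogateescape" error handler performs the same byte-to-U+DCxx mapping, which makes it a convenient way to see the scheme in action (an analogy, not TXR code):

```python
# The invalid byte 0xFF maps to U+DCFF on decoding, and the mapping is
# reversible on encoding, so binary-contaminated text round-trips.
data = b"ab\xffcd"                    # 0xFF is an invalid UTF-8 byte
s = data.decode("utf-8", "surrogateescape")
print(hex(ord(s[2])))                 # 0xdcff: the bad byte maps to U+DCFF
print(s.encode("utf-8", "surrogateescape") == data)  # True: round-trips
```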

In contrast to the above, the TXR parser scans raw UTF-8 bytes from a binary stream, rather than using a text stream. The parser performs its own recognition of UTF-8 sequences in certain language constructs, using a UTF-8 decoder only when processing certain kinds of tokens.

Comments are read without regard for encoding, so invalid encoding bytes in comments are not detected. A comment is simply a sequence of bytes terminated by a newline.

Invalid UTF-8 encountered while scanning identifiers and character names in character literal (hash-backslash) syntax is diagnosed as a syntax error.

UTF-8 in string literals is treated in the same way as UTF-8 in text streams. Invalid UTF-8 bytes are mapped into code points in the U+DC00 through U+DCFF range, and incorporated as such into the resulting string object which the literal denotes. The same remarks apply to regular-expression literals.


6.7 Regular Expression Directives

In place of a piece of text (see section Text above), a regular-expression directive may be used, which has the following syntax:

  @/RE/

where the RE part enclosed in slashes represents regular-expression syntax (described in the section Regular Expressions below).

Long regular expressions can be broken into multiple lines using a backslash-newline sequence. Whitespace before the sequence or after the sequence is not significant, so the following two are equivalent:

  @/reg \
    ular/

  @/regular/
There may not be whitespace between the backslash and newline.

Whereas literal text simply represents itself, regular expression denotes a (potentially infinite) set of texts. The regular-expression directive matches the longest piece of text (possibly empty) which belongs to the set denoted by the regular expression. The match is anchored to the current position; thus if the directive is the first element of a line, the match is anchored to the start of a line. If the regular-expression directive is the last element of a line, it is anchored to the end of the line also: the regular expression must match the text from the current position to the end of the line.

Even if the regular expression matches the empty string, the match will fail if the input is empty, or has run out of data. For instance suppose the third line of the query is the regular expression @/.*/, but the input is a file which has only two lines. This will fail: the data has no line for the regular expression to match. A line containing no characters is not the same thing as the absence of a line, even though both abstractions imply an absence of characters.

Like text which follows a variable, a regular-expression directive which follows a variable has special semantics, described in the section Variables below.


6.8 Variables

Much of the query syntax consists of arbitrary text, which matches file data character for character. Embedded within the query may be variables and directives which are introduced by a @ character. Two consecutive @@ characters encode a literal @.

A variable-matching or substitution directive is written in one of several ways:

  @sident
  @{bident}
  @*sident
  @*{bident}
  @{bident /regex/}
  @{bident (fun [args ...])}
  @{bident number}
  @{bident bident}

The forms with an * indicate a longest match; see Longest Match below. The forms with the embedded regex /regex/, function call, or number have special semantics; see Positive Match below.

The identifier t cannot be used as a name; it is a reserved symbol which denotes the value true. An attempt to use the variable @t will result in an exception. The symbol nil can be used where a variable name is required syntactically, but it has special semantics, described in a section below.

A sident is a "simple identifier" form which is not delimited by braces.

A sident consists of any combination of one or more letters, numbers, and underscores. It may not look like a number, so that for instance 123 is not a valid sident, but 12A is valid. Case is sensitive, so that FOO is different from foo, which is different from Foo.

The braces around an identifier can be used when material which follows would otherwise be interpreted as being part of the identifier. When a name is enclosed in braces it is a bident.

The following additional characters may be used as part of a bident which are not allowed in a sident:

  ! $ % & * + - < = > ? \ ~

Moreover, most Unicode characters beyond U+007F may appear in a bident, with certain exceptions. A character may not be used if it is any of the Unicode space characters, a member of the high or low surrogate region, a member of any Unicode private-use area, or is either of the two characters U+FFFE and U+FFFF. These situations produce a syntax error. Invalid UTF-8 in an identifier is also a syntax error.

The rule still holds that a name cannot look like a number, so +123 is not a valid bident, but these are valid: a->b, *xyz*, foo-bar.

The syntax @FOO_bar introduces the name FOO_bar, whereas @{FOO}_bar means the variable named "FOO" followed by the text "_bar". There may be whitespace between the @ and the name, or opening brace. Whitespace is also allowed in the interior of the braces. It is not significant.

If a variable has no prior binding, then it specifies a match. The match is determined from some current position in the data: the character which immediately follows all that has been matched previously. If a variable occurs at the start of a line, it matches some text at the start of the line. If it occurs at the end of a line, it matches everything from the current position to the end of the line.


6.9 Negative Match

If a variable is one of the plain forms

  @sident
  @{bident}
  @*sident
  @*{bident}

then this is a "negative match". The extent of the matched text (the text bound to the variable) is determined by looking at what follows the variable, and ranges from the current position to some position where the following material finds a match. This is why this is called a "negative match": the spanned text which ends up bound to the variable is that in which the match for the trailing material did not occur.

A variable may be followed by a piece of text, a regular-expression directive, a function call, a directive, another variable, or nothing (i.e. occurs at the end of a line). These cases are described in detail below.


6.9.1 Variable Followed by Nothing

If the variable is followed by nothing, the negative match extends from the current position in the data, to the end of the line. Example:
 a b c @FOO
 a b c defghijk

 FOO="defghijk"


6.9.2 Variable Followed by Text

For the purposes of determining the negative match, text is defined as a sequence of literal text and regular expressions, not divided by a directive. So for instance in this example:

  @a:@/foo/bcd e@(maybe)f@(end)

the variable a is considered to be followed by ":@/foo/bcd e".

If a variable is followed by text, then the extent of the negative match is determined by searching for the first occurrence of that text within the line, starting at the current position.

The variable matches everything between the current position and the matching position (not including the matching position). Any whitespace which follows the variable (and is not enclosed inside braces that surround the variable name) is part of the text. For example:

 a b @FOO e f
 a b c d e f
 FOO="c d"

In the above example, the pattern text "a b " matches the data "a b ". So when the @FOO variable is processed, the data being matched is the remaining "c d e f". The text which follows @FOO is " e f". This is found within the data "c d e f" at position 3 (counting from 0). So positions 0–2 ("c d") constitute the matching text which is bound to FOO.
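In ordinary Python terms, the search performed for the negative match is a leftmost substring search (illustration only; FOO is the variable from the example above):

```python
remaining = "c d e f"   # the data left after "a b " has matched
trailing = " e f"       # the text which follows @FOO in the pattern
pos = remaining.find(trailing)
print(pos)              # 3: first occurrence of the trailing text
print(remaining[:pos])  # 'c d': the span bound to FOO
```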


6.9.3 Variable Followed by a Function Call or Directive

If the variable is followed by a function call, or a directive, the extent is determined by scanning the text for the first position where a match occurs for the entire remainder of the line. (For a description of functions, see Functions.)

For example:

  @foo@(bind a "abc")xyz

Here, @foo will match the text from the current position to where "xyz" occurs, even though there is a @(bind) directive. Furthermore, if more material is added after the "xyz", it is part of the search. Note the difference between the following two:

  @foo@/abc/@(func)
  @foo@(func)@/abc/
In the first example, @foo matches the text from the current position until the match for the regular expression "abc". @(func) is not considered when processing @foo. In the second example, @foo matches the text from the current position until the position which matches the function call, followed by a match for the regular expression. The entire sequence @(func)@/abc/ is considered.


6.9.4 Consecutive Variables

If an unbound variable specifies a fixed-width match or a regular expression, then the issue of consecutive variables does not arise. Such a variable consumes text regardless of any context which follows it.

However, what if an unbound variable with no modifier is followed by another variable? The behavior depends on the nature of the other variable.

If the other variable is also unbound, and also has no modifier, this is a semantic error which will cause the query to fail. A diagnostic message will be issued, unless operating in quiet mode via -q. The reason is that there is no way to bind two consecutive variables to an extent of text; this is an ambiguous situation, since there is no matching criterion for dividing the text between two variables. (In theory, a repetition of the same variable, like @FOO@FOO, could find a solution by dividing the match extent in half, which would work only in the case when it contains an even number of characters. This behavior seems to have dubious value.)

An unbound variable may be followed by one which is bound. The bound variable is effectively replaced by the text which it denotes, and the logic proceeds accordingly.

It is possible for a variable to be bound to a regular expression. If x is an unbound variable and y is bound to a regular expression RE, then @x@y means @x@/RE/. A variable v can be bound to a regular expression using, for example, @(bind v #/RE/).

The @* syntax for longest match is available. Example:

 @*FOO:@BAR@FOO
 xyz:defxyz

 FOO=xyz, BAR=def

Here, FOO is matched with "xyz", based on the delimiting around the colon. The colon in the pattern then matches the colon in the data, so that BAR is considered for matching against "defxyz". BAR is followed by FOO, which is already bound to "xyz". Thus "xyz" is located in the "defxyz" data following "def", and so BAR is bound to "def".

If an unbound variable is followed by a variable which is bound to a list, or nested list, then each character string in the list is tried in turn to produce a match. The first match is taken.

An unbound variable may be followed by another unbound variable which specifies a regular expression or function call match. This is a special case called a "double variable match". What happens is that the text is searched using the regular expression or function. If the search fails, then neither variable is bound: it is a matching failure. If the search succeeds, then the first variable is bound to the text which is skipped by the search. The second variable is bound to the text matched by the regular expression or function. Example:

 @foo@{bar /abc/}
 xyzabc

 foo="xyz", bar="abc"
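The division of the text in a double variable match can be pictured with Python's re.search (an analogy; the variable names correspond to the example above):

```python
import re

m = re.search(r"abc", "xyzabc")  # search using the second variable's regex
print("xyzabc"[:m.start()])      # 'xyz': the skipped text, bound to foo
print(m.group(0))                # 'abc': the matched text, bound to bar
```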


6.9.5 Consecutive Variables via Directive

Two variables can be de facto consecutive in a manner shown in the following example:

 @var1@(all)@var2@(end)

This is treated just like the variable followed by directive. No semantic error is identified, even if both variables are unbound. Here, @var2 matches everything at the current position, and so @var1 ends up bound to the empty string.

Example 1: b matches at position 0 and a binds the empty string:

 @a@(all)@b@(end)

Example 2: *a specifies longest match (see Longest Match below), and so it takes everything:

 @*a@(all)@b@(end)

6.9.6 Longest Match

The closest-match behavior for the negative match can be overridden to longest match behavior. A special syntax is provided for this: an asterisk between the @ and the variable, e.g.:
 a @*{FOO}cd
 a b cdcdcdcd
 FOO="b cdcdcd"

 a @{FOO}cd
 a b cdcdcd
 FOO="b "

In the former example, the match extends to the rightmost occurrence of "cd", and so FOO receives "b cdcdcd". In the latter example, the * syntax isn't used, and so a leftmost match takes place. The extent covers only the "b ", stopping at the first "cd" occurrence.
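The two behaviors correspond to a leftmost versus a rightmost substring search, which can be pictured with Python's find and rfind (illustration only, using the data from the first example):

```python
data = "b cdcdcdcd"
print(data[:data.find("cd")])   # 'b ': closest match, as with @{FOO}cd
print(data[:data.rfind("cd")])  # 'b cdcdcd': longest match, as with @*{FOO}cd
```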


6.10 Positive Match

There are syntactic variants of variable syntax which have an embedded expression enclosed with the variable in braces:

  @{bident /regex/}
  @{bident (fun [args ...])}
  @{bident number}
  @{bident bident}

These specify a variable binding that is driven by a positive match derived from a regular expression, function or character count, rather than from trailing material (which is regarded as a "negative" match, since the variable is bound to material which is skipped in order to match the trailing material).

The positive match syntax is processed without considering any following syntax, and therefore may be followed by an unbound variable.

In the @{bident /regex/} form, the match extends over all characters from the current position which match the regular expression regex. (See the Regular Expressions section below.) If the variable already has a value, the text extracted by the regular expression must exactly match the variable.
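The anchored, current-position behavior of the positive regex match resembles Python's re.match, which is likewise anchored at the position where matching begins (analogy only, not TXR syntax):

```python
import re

m = re.match(r"[0-9]+", "123abc")     # anchored at the current position
print(m.group(0))                     # '123': the span such a variable binds
print(re.match(r"[0-9]+", "abc123"))  # None: no match at the anchor point
```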

In the @{bident (fun [args ...])} form, the match extends over lines or characters which are matched by the call to the function, if the call succeeds. Thus @{x (y z w)} is just like @(y z w), except that the region of text skipped over by @(y z w) is also bound to the variable x. Except in one special case, the matching takes place horizontally within the current line, and the spanned range of text is treated as a string. The exception is that if the @{bident (fun [args ...])} appears as the only element of a line, and fun has a binding as a vertical function, then the function is invoked in the same manner as it would be by the @(fun [args ...]) syntax. Then the variable indicated by bident is bound to the list of lines matched by the function call. Pattern functions are described in the Functions section below. The function is invoked even if the variable already has a value. The text matched by the function must match the variable.

In the @{bident number} form, the match processes a field of text which consists of the specified number of characters, which must be a nonnegative number. If the data line doesn't have that many characters starting at the current position, the match fails. A match for zero characters produces an empty string. The text which is actually bound to the variable is all text within the specified field, but excluding leading and trailing whitespace. If the field contains only spaces, then an empty string is extracted. This fixed-field extraction takes place whether or not the variable already has a binding. If it already has a binding, then it must match the extracted, trimmed text.
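The fixed-field extraction with trimming described above can be sketched as follows (fixed_field is a hypothetical helper for illustration, not part of TXR):

```python
def fixed_field(line, pos, width):
    """Take exactly `width` characters at `pos`; fail if the line is
    too short; trim leading and trailing whitespace from the result."""
    field = line[pos:pos + width]
    if len(field) < width:
        return None          # match failure: not enough characters
    return field.strip()

print(fixed_field("ab   cd  ", 2, 5))  # 'cd': field '   cd', trimmed
print(fixed_field("ab", 0, 5))         # None: fewer than 5 characters
```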

The @{bident bident} syntax allows the number or regex modifier to come from a variable. The variable must be bound and contain a nonnegative integer or regular expression. For example, @{x y} behaves like @{x 3} if y is bound to the integer 3. It is an error if y is unbound.


6.11 Special Symbols nil and t

Just like in the Common Lisp language, the names nil and t are special.

The nil symbol stands for the empty list object, which marks the end of a list, and also represents Boolean false. It is synonymous with the syntax () which may be used interchangeably with nil in most constructs.

In TXR Lisp, nil and t cannot be used as variables. When evaluated, they evaluate to themselves.

In the TXR pattern language, nil can be used in the variable binding syntax, but does not create a binding; it has a special meaning. It allows the variable-matching syntax to be used to skip material, in ways similar to the skip directive.

The nil symbol is also used as a block name, both in the TXR pattern language and in TXR Lisp. A block named nil is considered to be anonymous.


6.12 Keyword Symbols

Names beginning with the : (colon) character are keyword symbols. These also stand for themselves and may not be used as variables. Keywords are useful for labeling information and situations.


6.13 Regular Expressions

Regular expressions are a language for specifying sets of character strings. Through the use of pattern-matching elements, a regular expression is able to denote an infinite set of texts. TXR contains an original implementation of regular expressions, which supports the following syntax:

The period is a "wildcard" that matches any character.
Character class: matches a single character, from the set specified by special syntax written between the square brackets. This supports basic regexp character class syntax. POSIX notation like [:digit:] is not supported. The regex tokens \s, \d and \w are permitted in character classes, but not their complementing counterparts. These tokens simply contribute their characters to the class. The class [a-zA-Z] means match an uppercase or lowercase letter; the class [0-9a-f] means match a digit or a lowercase letter; the class [^0-9] means match a non-digit, and so forth. There are no locale-specific behaviors in TXR regular expressions; [A-Z] denotes an ASCII/Unicode range of characters. The class [\d.] means match a digit or the period character. A ] or - can be used within a character class, but must be escaped with a backslash. A ^ in the first position denotes a complemented class, unless it is escaped by backslash. In any other position, it denotes itself. Two backslashes code for one backslash. So for instance [\[\-] means match a [ or - character, [^^] means match any character other than ^, and [\^\\] means match either a ^ or a backslash. Regex operators such as *, + and & appearing in a character class represent ordinary characters. The characters -, ] and ^ occurring outside of a character class are ordinary. Unescaped / characters can appear within a character class. The empty character class [] matches no character at all, and its complement [^] matches any character, and is treated as a synonym for the . (period) wildcard operator.
\s, \w and \d
These regex tokens each match a single character. The \s regex token matches a wide variety of ASCII whitespace characters and Unicode spaces. The \w token matches alphabetic word characters; it is equivalent to the character class [A-Za-z_]. The \d token matches a digit, and is equivalent to [0-9].
\S, \W and \D
These regex tokens are the complemented counterparts of \s, \w and \d. The \S token matches all those characters which \s does not match, \W matches all characters that \w does not match and \D matches nondigits.
An empty expression is a regular expression. It represents the set of strings consisting of the empty string; i.e. it matches just the empty string. The empty regex can appear alone as a full regular expression (for instance the TXR syntax @// with nothing between the slashes) and can also be passed as a subexpression to operators, though this may require the use of parentheses to make the empty regex explicit. For example, the expression a| means: match either a, or nothing. The forms * and (*) are syntax errors; though not useful, the correct way to match the empty expression zero or more times is the syntax ()*.
The nomatch regular expression represents the empty set: it matches no strings at all, not even the empty string. There is no dedicated syntax to directly express nomatch in the regex language. However, the empty character class [] is equivalent to nomatch, and may be considered to be a notation for it. Other representations of nomatch are possible: for instance, the regex ~.* which is the complement of the regex that denotes the set of all possible strings, and thus denotes the empty set. A nomatch has uses; for instance, it can be used to temporarily "comment out" regular expressions. The regex ([]abc|xyz) is equivalent to (xyz), since the []abc branch cannot match anything. Using [] to "block" a subexpression allows you to leave it in place, then enable it later by removing the "block".
If R is a regular expression, then so is (R). The contents of parentheses denote one regular expression unit, so that for instance in (RE)*, the * operator applies to the entire parenthesized group. The syntax () is valid and equivalent to the empty regular expression.
R?
Optionally match the preceding regular expression R.
R*
Match the expression R zero or more times. This operator is sometimes called the "Kleene star", or "Kleene closure". The Kleene closure favors the longest match. Roughly speaking, if there are two or more ways in which R1*R2 can match, then that match occurs in which R1* matches the longest possible text.
R+
Match the preceding expression R one or more times. Like R*, this favors the longest possible match: R+ is equivalent to RR*.
R1%R2
Match R1 zero or more times, then match R2. If this match can occur in more than one way, then it occurs such that R1 is matched the fewest number of times, which is opposite from the behavior of R1*R2. Repetitions of R1 terminate at the earliest point in the text where a nonempty match for R2 occurs. Because it favors shorter matches, % is termed a non-greedy operator.

If R2 is the empty expression, or equivalent to it, then R1%R2 reduces to R1*. So for instance (R%) is equivalent to (R*), since the missing right operand is interpreted as the empty regex.

Note that whereas the expression (R1*R2) is equivalent to (R1*)R2, the expression (R1%R2) is not equivalent to (R1%)R2. Also note that A(XY%Z)B is equivalent to AX(Y%Z)B. This is because the precedence of % is higher than that of catenation on its left side; this rule prevents the given syntax from expressing the XY catenation. The expression may be understood as A(X(Y%Z))B, where the inner parentheses clarify how the syntax surrounding the % operator is being parsed, and the outer parentheses are superfluous. The correct way to assert catenation of XY as the left operand of % is A(XY)%ZB. To specify XY as the left operand, and limit the right operand to just Z, the correct syntax is A((XY)%Z)B. By contrast, the expression A(X%YZ)B is not equivalent to A(X%Y)ZB because the precedence of % is lower than that of catenation on its right side. The operator is effectively "bi-precedential".
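
For instance, given the input text "ab:cd:", the non-greedy and greedy operators yield different matches (illustrative examples):

  .%:   matches "ab:"      the shortest match ending in a colon
  .*:   matches "ab:cd:"   the longest match ending in a colon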
~R
Match the opposite of the following expression R; that is, match exactly those texts that R does not match. This operator is called complement, or logical not.
R1R2
Two consecutive regular expressions denote catenation: the left expression must match, and then the right.
R1|R2
Match either the expression R1 or R2. This operator is known by a number of names: union, logical or, disjunction, branch, or alternative.
R1&R2
Match both the expression R1 and R2 simultaneously; i.e. the matching text must be one of the texts which are in the intersection of the set of texts matched by R1 and the set matched by R2. This operator is called intersection, logical and, or conjunction.
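
For example, intersection combined with complement can express "matches R1 but not R2" (an illustrative sketch):

  [a-z]+&~.*aa.*   nonempty lowercase strings which do not contain "aa"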

Any character which is not a regular-expression operator, a backslash escape, or the slash delimiter, denotes a one-position match of that character itself.

Any of the special characters, including the delimiting /, and the backslash, can be escaped with a backslash to suppress its meaning and denote the character itself.

Furthermore, all of the same escapes that are described in the section Special Characters in Text above are supported — the difference is that in regular expressions, the @ character is not required, so for example a tab is coded as \t rather than @\t. Octal and hex character escapes can be optionally terminated by a semicolon, which is useful if the following characters are octal or hex digits not intended to be part of the escape.

Only the above escapes are supported. Unlike in some other regular-expression implementations, if a backslash appears before a character which isn't a regex special character or one of the supported escape sequences, it is an error. This wasn't true of historic versions of TXR. See the COMPATIBILITY section.

Precedence table, highest to lowest:

  operators        class          associativity
  (R) []           primary
  R? R+ R* R%...   postfix        left-to-right
  ~R ...%R         unary          right-to-left
  R1R2             catenation     left-to-right
  R1&R2            intersection   left-to-right
  R1|R2            union          left-to-right

The % operator is like a postfix operator with respect to its left operand, but like a unary operator with respect to its right operand. Thus a~b%c~d is a(~(b%(c(~d)))), demonstrating right-to-left associativity, where all of b% may be regarded as a unary operator being applied to c~d. Similarly, a?*+%b means (((a?)*)+)%b, where the trailing %b behaves like a postfix operator.

In TXR, regular expression matches do not span multiple lines. The regex language has no feature for multiline matching. However, the @(freeform) directive allows the remaining portion of the input to be treated as one string in which line terminators appear as explicit characters. Regular expressions may freely match through this sequence.

It's possible for a regular expression to match an empty string. For instance, if the next input character is z, facing the regular expression /a?/, there is a zero-character match: the regular expression's state machine can reach an acceptance state without consuming any characters. Examples:


 @A@/a?/@/.*/
 A=""

 @{A /a?/}@B
 A="", B="zzzz"

 @*A@/a?/
 A="zzzz"


In the first example, variable @A is followed by a regular expression which can match an empty string. The expression faces the letter z at position 0 in the data line. A zero-character match occurs there, therefore the variable A takes on the empty string. The @/.*/ regular expression then consumes the line.

Similarly, in the second example, the /a?/ regular expression faces a z, and thus yields an empty string which is bound to A. Variable @B consumes the entire line.

The third example requests the longest match for the variable binding. Thus, a search takes place for the rightmost position where the regular expression matches. The regular expression matches anywhere, including the empty string after the last character, which is the rightmost place. Thus variable A fetches the entire line.

For additional information about the advanced regular-expression operators, see NOTES ON EXOTIC REGULAR EXPRESSIONS below.


6.14 Compound Expressions

If the @ escape character is followed by an open parenthesis or square bracket, this is taken to be the start of a TXR Lisp compound expression.

The TXR language has the unusual property that its syntactic elements, so-called directives, are Lisp compound expressions. These expressions not only enclose syntax; expressions which begin with certain symbols also behave, de facto, as tokens in a phrase-structure grammar. For instance, the expression @(collect) begins a block which must be terminated by the expression @(end); otherwise there is a syntax error. The collect expression can contain arguments which modify the behavior of the construct, for instance @(collect :gap 0 :vars (a b)). In some ways, this situation might be compared to HTML, in which an element such as <a> must be terminated by </a>, and can have attributes such as <a href="...">.
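
For instance, a minimal collect block (an illustrative sketch) gathers every line of the input into the list variable line:

  @(collect)
  @line
  @(end)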

Compound expressions contain subexpressions which are other compound expressions or literal objects of various kinds. Among these are: symbols, numbers, string literals, character literals, quasiliterals and regular expressions. These are described in the following sections. Additional kinds of literal objects exist, which are discussed in the TXR LISP section of the manual.

Some examples of compound expressions are:


  (a b c (d e f))

  (  a (b (c d) (e  ) ))

  ("apple" #\b #\space 3)

  (a #/[a-z]*/ b)

  (_ `@file.txt`)

Symbols occurring in a compound expression follow a slightly more permissive lexical syntax than the bident in the syntax @{bident} introduced earlier. The / (slash) character may be part of an identifier, or even constitute an entire identifier. In fact a symbol inside a directive is a lident. This is described in the Symbol Tokens section under TXR LISP. A symbol must not be a number; tokens that look like numbers are treated as numbers and not symbols.


6.15 Character Literals

Character literals are introduced by the #\ (hash-backslash) syntax, which is either followed by a character name, the letter x followed by hex digits, the letter o followed by octal digits, or a single character. Valid character names are:

  nul                 linefeed            return
  alarm               newline             esc
  backspace           vtab                space
  tab                 page                pnul

For instance #\esc denotes the escape character.

This convention for character literals is similar to that of the Scheme language. Note that #\linefeed and #\newline are the same character. The #\pnul character is specific to TXR and denotes the U+DC00 code in Unicode; the name stands for "pseudo-null", which is related to its special function. For more information about this, see the section "Character Handling and International Characters".
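
The hex and octal forms denote characters by code point. For example (illustrative examples):

  #\x41    ;; the character A (U+0041)
  #\o101   ;; also the character A: octal 101 is 65
  #\space  ;; the space character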


6.16 String Literals

String literals are delimited by double quotes. A double quote within a string literal is encoded using \" and a backslash is encoded as \\. Backslash escapes like \n and \t are recognized, as are hexadecimal escapes like \xFF or \xabc and octal escapes like \123. Ambiguity between an escape and subsequent text can be resolved by adding a semicolon delimiter after the escape: "\xabc;d" is a string consisting of the character U+0ABC followed by "d". The semicolon delimiter disappears. To write a literal semicolon immediately after a hex or octal escape, write two semicolons, the first of which will be interpreted as a delimiter. Thus, "\x21;;" represents "!;".

Note that the source code syntax of TXR string literals is specified in UTF-8, which is decoded into an internal string representation consisting of code points. The numeric escape sequences are an abstract syntax for specifying code points, not for specifying bytes to be inserted into the UTF-8 representation, even if they lie in the 8-bit range. Bytes cannot be directly specified, other than literally. However, when a TXR string object is encoded to UTF-8, every code point lying in the range U+DC00 through U+DCFF is converted to a single byte by taking the low-order eight bits of its value. By manipulating code points in this special range, TXR programs can reproduce arbitrary byte sequences in text streams. Also note that the \u escape sequence for specifying code points found in some languages is unnecessary and absent, since the existing hexadecimal and octal escapes satisfy this requirement. More detailed information is given in the earlier section Character Handling and International Characters.

If the line ends in the middle of a literal, it is an error, unless the last character is a backslash. This backslash is a special escape which does not denote a character; rather, it indicates that the string literal continues on the next line. The backslash is deleted, along with whitespace which immediately precedes it, as well as leading whitespace in the following line. The escape sequence "\ " (backslash space) can be used to encode a significant space.


  "foo   \

  "foo   \
  \ bar"

  "foo\  \

The first string literal is the string "foobar". The second two are "foo bar".


6.17 Word List Literals

A word list literal (WLL) provides a convenient way to write a list of strings when such a list can be given as whitespace-delimited words.

There are two flavors of the WLL: the regular WLL which begins with #" (hash, double quote) and the splicing list literal which begins with #*" (hash, star, double quote).

Both types are terminated by a double quote, which may be escaped as \" in order to include it as a character. All the escaping conventions used in string literals can be used in word literals.

Unlike in string literals, whitespace (tabs and spaces) is not significant in word literals: it separates words. A whitespace character may be escaped with a backslash in order to include it as a literal character.

Just like in string literals, an unescaped newline character is not allowed. A newline preceded by a backslash is permitted. Such an escaped backslash, together with any leading and trailing unescaped whitespace, is removed and replaced with a single space.


  #"abc def ghi"   --> notates ("abc" "def" "ghi")

  #"abc   def \
      ghi"         --> notates ("abc" "def" "ghi")

  #"abc\ def ghi" --> notates ("abc def" "ghi")

  #"abc\ def\ \
   \ ghi"         --> notates ("abc def " " ghi")

A splicing word literal differs from a word literal in that it does not produce a list of string literals, but rather it produces a sequence of string literals that is merged into the surrounding syntax. Thus, the following two notations are equivalent:

  (1 2 3 #*"abc def" 4 5 #"abc def")

  (1 2 3 "abc" "def" 4 5 ("abc" "def"))

The regular WLL produces a single list object, whereas the splicing WLL expands into multiple string literal objects.


6.18 String Quasiliterals

Quasiliterals are similar to string literals, except that they may contain variable references denoted by the usual @ syntax. The quasiliteral represents a string formed by substituting the values of those variables into the literal template. If a is bound to "apple" and b to "banana", the quasiliteral `one @a and two @{b}s` represents the string "one apple and two bananas". A backquote escaped by a backslash represents itself. Unlike in directive syntax, two consecutive @ characters do not code for a literal @, but cause a syntax error. The reason for this is that compounding of the @ syntax is meaningful. Instead, there is a \@ escape for encoding a literal @ character. Quasiliterals support the full output variable syntax. Expressions within variable substitutions follow the evaluation rules of TXR Lisp. This hasn't always been the case: see the COMPATIBILITY section.

Quasiliterals can be split into multiple lines in the same way as ordinary string literals.


6.19 Quasiword List Literals

The quasiword list literals (QLLs) are to quasiliterals what WLLs are to ordinary literals. (See the above section Word List Literals.)

A QLL combines the convenience of the WLL with the power of quasistrings.

Just as in the case of WLLs, there are two flavors of the QLL: the regular QLL which begins with #` (hash, backquote) and the splicing QLL which begins with #*` (hash, star, backquote).

Both types are terminated by a backquote, which may be escaped as \` in order to include it as a character. All the escaping conventions used in quasiliterals can be used in QLLs.

Unlike in quasiliterals, whitespace (tabs and spaces) is not significant in QLLs: it separates words. A whitespace character may be escaped with a backslash in order to include it as a literal character.

A newline is not permitted unless escaped. An escaped newline works exactly the same way as it does in WLLs.

Note that the delimiting into words is done before the variable substitution. If the variable a contains spaces, then #`@a` nevertheless expands into a list of one item: the string derived from a.


  #`abc @a ghi`  --> notates (`abc` `@a` `ghi`)

  #`abc   @d@e@f \
  ghi`            --> notates (`abc` `@d@e@f` `ghi`)

  #`@a\ @b @c` --> notates (`@a @b` `@c`)

A splicing QLL differs from an ordinary QLL in that it does not produce a list of quasiliterals, but rather it produces a sequence of quasiliterals that is merged into the surrounding syntax.


6.20 Numbers

TXR supports integers and floating-point numbers.

An integer literal is made up of digits 0 through 9, optionally preceded by a + or - sign. The character , (comma) may appear between digits, as a visual separator of no semantic significance. The digit sequence must start and end with a digit. Runs of consecutive commas are permitted. Commas outside of the digit sequence are interpreted as the Lisp unquote syntax.

Compatibility note: support for separator commas appeared in TXR 283. Older TXR versions will interpret commas in the middle of numeric constants as instances of the unquote syntax.


  1,2,3,,4  ;; equivalent to 1234

Examples that are not integer tokens:

  ,123   ;; equivalent to (sys:unquote 123)
  123,a  ;; equivalent to 123, followed by (sys:unquote a)
  -,1    ;; symbol -  followed by (sys:unquote 1)

An integer constant can also be specified in hexadecimal using the prefix #x followed by an optional sign, followed by hexadecimal digits: 0 through 9 and the uppercase or lowercase letters A through F:

  #xFF    ;; 255
  #x-ABC  ;; -2748

These digits may contain separator commas, just as in the case of the decimal integer:

  #xFF,FF  ;; 65535

Similarly, octal numbers are supported with the prefix #o followed by octal digits:

  #o777         ;; 511
  #o123,456     ;; 42798

and binary numbers can be written with a #b prefix:

  #b1110        ;; 14
  #b1111,1111   ;; 255

A comma between the radix prefix and digits is a syntax error:

  #x,DEF5,549C  ;; Syntax error
  #b,1001,1101  ;; Likewise

Note that the #b prefix is also used for buffer literals.

A floating-point literal is marked by the inclusion of a decimal point, the scientific E notation, or both. It is an optional sign, followed by a mantissa consisting of digits, a decimal point, more digits, and then an optional E notation consisting of the letter e or E, an optional + or - sign, and then digits indicating the exponent value. In the mantissa, the digits are not optional. At least one digit must either precede the decimal point or follow it. That is to say, a decimal point by itself is not a floating-point constant.

The digits of the mantissa may include separator commas, in the same manner as decimal integer literals, in both the integer and fractional part. The digits of the exponent may not include separator commas.

Examples of floating-point constant tokens:

  1.0
  .123
  123.
  1E-3
  1,234.5E+10



Examples which are not floating-point constant tokens:

  .      ;; dot token, not a number
  123E   ;; the symbol 123E
  1.0E-  ;; syntax error: invalid floating point constant
  1.0E   ;; syntax error: invalid floating point constant
  1.E    ;; syntax error: invalid floating point literal
  .e     ;; syntax error: dot token followed by symbol
  ,1.0   ;; equivalent to (sys:unquote 1.0)

In TXR there is a special "dotdot" token consisting of two consecutive periods. An integer constant followed immediately by dotdot is recognized as such; it is not treated as a floating constant followed by a dot. That is to say, 123.. does not mean 123. . (floating point 123.0 value followed by dot token). It means 123 .. (integer 123 followed by .. token).
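
For example (illustrative):

  123..456   ;; three tokens: integer 123, the dotdot token, integer 456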

Dialect Note: unlike in Common Lisp, 123. is not an integer, but the floating-point number 123.0.

Integers within a certain small range centered on zero have fixnum type. Values in the fixnum range fit into a Lisp value directly, not requiring heap allocation. A value which is implemented as a reference to a heap-allocated object is called boxed, whereas a self-contained value not referencing any storage elsewhere is called unboxed. Thus values in the fixnum range are unboxed; those outside of the range have bignum type instead, and are boxed. The variables fixnum-min and fixnum-max indicate the range.

Floating-point values are all unboxed if TXR is built with "NaN boxing" enabled, otherwise they are all boxed. The Lisp expression (eq (read "0.0") (read "0.0")) returns t under NaN boxing, indicating that the two instances of 0.0 are the same object. In the absence of NaN boxing, the two read calls produce distinct, boxed representations of 0.0, which compare unequal under eq. (The expression (eq 0.0 0.0) may not be relied upon if it is compiled, since compilation may deduplicate identical boxed literals, leading to a false positive.)



6.21 Comments

Comments of the form @; were introduced earlier. Inside compound expressions, another convention for comments exists: Lisp comments, which are introduced by the ; (semicolon) character and span to the end of the line.


  @(foo  ; this is a comment
    bar  ; this is another comment
   )

This is equivalent to @(foo bar).




7.1 Overview

When a TXR Lisp compound expression occurs in TXR preceded by a @, it is a directive.

Directives which are based on certain symbols are, additionally, involved in a phrase-structure syntax which uses Lisp expressions as if they were tokens.

For instance, the directive

  @(collect)


not only denotes a compound expression with the collect symbol in its head position, but it also introduces a syntactic phrase which requires a matching @(end) directive. In other words, @(collect) is not only an expression, but serves as a kind of token in a higher-level, phrase-structure grammar.

Effectively, collect is a reserved symbol in the TXR language. A TXR program cannot use this symbol as the name of a pattern function due to its role in the syntax. The symbol has no reserved role in TXR Lisp.

Usually if this type of directive occurs alone in a line, not preceded or followed by other material, it is involved in a "vertical" (or line-oriented) syntax.

If such a directive is embedded in a line (has preceding or trailing material) then it is in a horizontal syntactic and semantic context (character-oriented).
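
For example (illustrative):

  @(skip)
  abc@(skip)def

In the first line, @(skip) occurs alone and is a vertical directive; in the second, it is embedded within a line and is treated as horizontal.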

There is an exception: the definition of a horizontal function looks like this:

  @(define name (arg))body material@(end)

Yet, this is considered one vertical item, which means that it does not match a line of data. (This is necessary because all horizontal syntax matches something within a line of data, which is undesirable for definitions.)

Many directives exhibit both horizontal and vertical syntax, with different but closely related semantics. Some are vertical only, some are horizontal only.

A summary of the available directives follows:

@(eof)
Explicitly match the end of file. Fails if unmatched data remains in the input stream. Can capture or match the termination status of a pipe.

@(eol)
Explicitly match the end of line. Fails if the current position is not the end of a line. Also fails if no data remains (there is no current line).

@(next)
Continue matching in another file or data source.

@(block)
Groups together a sequence of directives into a logical name block, which can be explicitly terminated from within by using the @(accept) and @(fail) directives. Blocks are described in the section Blocks below.

@(skip)
Treat the remaining query as a subquery unit, and search the lines (or characters) of the input file until that subquery matches somewhere. A skip is also an anonymous block.

@(trailer)
Treat the remaining query or subquery as a match for a trailing context. That is to say, if the remainder matches, the data position is not advanced.

@(freeform)
Treat the remainder of the input as one big string, and apply the following query line to that string. The newline characters (or custom separators) appear explicitly in that string.

@(fuzz)
The fuzz directive, inspired by the patch utility, specifies a partial match for some lines.

@(line) and @(chr)
These directives match a variable or expression against the current line number or character position.

@(name)
Match a variable against the name of the current data source.

@(data)
Match a variable against the remaining data (a lazy list of strings).

@(some)
Multiple clauses are each applied to the same input. Succeeds if at least one of the clauses matches the input. The bindings established by earlier successful clauses are visible to the later clauses.

@(all)
Multiple clauses are applied to the same input. Succeeds if and only if each one of the clauses matches. The clauses are applied in sequence, and evaluation stops on the first failure. The bindings established by earlier successful clauses are visible to the later clauses.

@(none)
Multiple clauses are applied to the same input. Succeeds if and only if none of them match. The clauses are applied in sequence, and evaluation stops on the first success. No bindings are ever produced by this construct.

@(maybe)
Multiple clauses are applied to the same input. No failure occurs if none of them match. The bindings established by earlier successful clauses are visible to the later clauses.

@(cases)
Multiple clauses are applied to the same input. Evaluation stops on the first successful clause.

@(require)
The require directive is similar to the do directive in that it evaluates one or more TXR Lisp expressions. If the result of the rightmost expression is nil, then require triggers a match failure. See the TXR LISP section far below.

@(if), @(elif), and @(else)
The if directive with optional elif and else clauses allows one of multiple bodies of pattern-matching directives to be conditionally selected by testing the values of Lisp expressions. It is also available inside @(output) for conditionally selecting output clauses.

@(choose)
Multiple clauses are applied to the same input. The one whose effect persists is the one which maximizes or minimizes the length of a particular variable.

@(empty)
The @(empty) directive matches the empty string. It is useful in certain situations, such as expressing an empty match in a directive that doesn't accept an empty clause. The @(empty) syntax has another meaning in @(output) clauses, in conjunction with @(repeat).

@(define name (args ...))
Introduces a function. Functions are described in the Functions section below.

@(call expr arg*)
Performs function indirection. Evaluates expr, which must produce a symbol that names a pattern function. Then that pattern function is invoked.

@(gather)
Searches text for matches for multiple clauses which may occur in arbitrary order. For convenience, lines of the first clause are treated as separate clauses.

@(collect)
Search the data for multiple matches of a clause. Collect the bindings in the clause into lists, which are output as array variables. The @(collect) directive is line-oriented. It works with a multiline pattern and scans line by line. A similar directive called @(coll) works within one line.

A collect is an anonymous block.

@(and)
Separator of clauses for @(some), @(all), @(none), @(maybe) and @(cases). Equivalent to @(or). The choice is stylistic.

@(or)
Separator of clauses for @(some), @(all), @(none), @(maybe) and @(cases). Equivalent to @(and). The choice is stylistic.

@(end)
Required terminator for @(some), @(all), @(none), @(maybe), @(cases), @(if), @(collect), @(coll), @(output), @(repeat), @(rep), @(try), @(block) and @(define).

@(fail)
Terminate the processing of a block, as if it were a failed match. Blocks are described in the section Blocks below.

@(accept)
Terminate the processing of a block, as if it were a successful match. What bindings emerge may depend on the kind of block: collect has special semantics. Blocks are described in the section Blocks below.

@(try)
Indicates the start of a try block, which is related to exception handling, described in the Exceptions section below.

@(catch) and @(finally)
Special clauses within @(try). See Exceptions below.

@(defex) and @(throw)
Define custom exception types; throw an exception. See Exceptions below.

@(assert)
The assert directive requires the following material to match, otherwise it throws an exception. It is useful for catching mistakes or omissions in parts of a query that are surefire matches.

@(flatten)
Normalizes a set of specified variables to one-dimensional lists. Those variables which have a scalar value are reduced to lists of that value. Those which are lists of lists (to an arbitrary level of nesting) are converted to flat lists of their leaf values.

@(merge)
Binds a new variable which is the result of merging two or more other variables. Merging has somewhat complicated semantics.

@(cat)
Decimates a list (any number of dimensions) to a string, by catenating its constituent strings, with an optional separator string between all of the values.

@(bind)
Binds one or more variables against a value using a structural pattern match. A limited form of unification takes place which can cause a match to fail.

@(set)
Destructively assigns one or more existing variables using a structural pattern, using syntax similar to bind. Assignment to unbound variables triggers an error.

@(rebind)
Evaluates an expression in the current binding environment, and then creates new bindings for the variables in the structural pattern. Useful for temporarily overriding variable values in a scope.

@(forget)
Removes variable bindings.

@(local)
Synonym of @(forget).

@(output) and @(push)
A directive which encloses an output clause in the query. An output section does not match text, but produces text which can be directed to various destinations, the default being standard output. Most directives cannot be used inside an output clause. The @(push) clause is a variant of @(output) which produces text that is implicitly pushed back into the input stream to be matched.

@(repeat) and @(rep)
A directive understood within an @(output) section, for repeating multiline text, with successive substitutions pulled from lists. The directive @(rep) produces iteration over lists horizontally within one line. These directives have a different meaning in matching clauses, providing a shorthand notation for @(collect :vars nil) and @(coll :vars nil), respectively.

@(deffilter)
The deffilter directive is used for defining named filters, which are useful for filtering variable substitutions in output blocks. Filters are useful when data must be translated between different representations that have different special characters or other syntax, requiring escaping or similar treatment. Note that it is also possible to use a function as a filter. See Function Filters below.

Named filters are stored in the hash table held in the Lisp special variable *filters*.

@(filter)
The filter directive passes one or more variables through a given filter or chain of filters, updating them with the filtered values.

@(load) and @(include)
The load and include directives allow TXR programs to be modularized. They bring in code from a file, in two different ways.

@(do)
The do directive is used to evaluate TXR Lisp expressions, discarding their result values. See the TXR LISP section far below.

@(mdo)
The mdo (macro do) directive evaluates TXR Lisp expressions immediately, during the parsing of the TXR syntax in which it occurs.

@(in-package)
The in-package directive is used to switch to a different symbol package. It mirrors the TXR Lisp macro of the same name.


7.2 Subexpression Evaluation

Some directives contain subexpressions which are evaluated. Two distinct styles of evaluations occur in TXR: bind expressions and Lisp expressions. Which semantics applies to an expression depends on the syntactic context in which it occurs: which position in which directive.

The evaluation of TXR Lisp expressions is described in the TXR LISP section of the manual.

Bind expressions are so named because they occur in the @(bind) directive. TXR pattern function invocations also treat argument expressions as bind expressions.

The @(rebind), @(set), @(merge), and @(deffilter) directives also use bind expression evaluation. Bind expression evaluation also occurs in the argument position of the :tlist keyword in the @(next) directive.

Unlike Lisp expressions, bind expressions do not support operators. If a bind expression is a nested list structure, it is a template denoting that structure. Any symbol in any position of that structure is interpreted as a variable. When the bind expression is evaluated, those corresponding positions in the template are replaced by the values of the variables.

Anywhere where a variable can appear in a bind expression's nested list structure, a Lisp expression can appear preceded by the @ character. That Lisp expression is evaluated and its value is substituted into the bind expression's template.

Moreover, a Lisp expression preceded by @ can be used as an entire bind expression. The value of that Lisp expression is then taken as the bind expression value.

Any object in a bind expression which is not a nested list structure containing Lisp expressions or variables denotes itself literally.


In the following examples, the variables a and b are assumed to have the string values "foo" and "bar", respectively.

The -> notation indicates the value of each expression.

  a              ->  "foo"
  (a b)          ->  ("foo" "bar")
  ((a) ((b) b))  ->  (("foo") (("bar") "bar"))
  (list a b)     ->  error: unbound variable list
  @(list a b)    ->  ("foo" "bar") ;; Lisp expression
  (a @[b 1..:])  ->  ("foo" "ar")  ;; Lisp eval of [b 1..:]
  (a @(+ 2 2))   ->  ("foo" 4)     ;; Lisp eval of (+ 2 2)
  #(a b)         ->  (a b)         ;; Vector literal, not list.
  [a b]          ->  error: unbound variable dwim

The last example above, [a b], is a notation equivalent to (dwim a b), and so it fails in the same way as the example involving list.


7.3 Input Scanning and Data Manipulation


7.3.1 The next Directive

The next directive indicates that the remaining directives in the current block are to be applied against a new input source.

It can only occur by itself as the only element in a query line, and takes various arguments, according to these possibilities:

  @(next)
  @(next source [:nothrow] [:noclose])
  @(next :args)
  @(next :env)
  @(next :list lisp-expr)
  @(next :tlist bind-expr)
  @(next :string lisp-expr)
  @(next :var var)
  @(next nil)

The lone @(next) without arguments specifies that subsequent directives will match inside the next file in the argument list which was passed to TXR on the command line.

If source is given, it must be a TXR Lisp expression which denotes an input source. Its value may be a string or an input stream. For instance, if variable A contains the text "data", then @(next A) means switch to the file called "data", and @(next `@A.txt`) means to switch to the file "data.txt". The directive @(next (open-command `git log`)) switches to the input stream connected to the output of the git log command.

If the input source cannot be opened for whatever reason, TXR throws an exception (see Exceptions below). An unhandled exception will terminate the program. Often, such a drastic measure is inconvenient; if @(next) is invoked with the :nothrow keyword, then if the input source cannot be opened, the situation is treated as a simple match failure. The :nothrow keyword also ensures that when the stream is later closed, which occurs when the lazy list reads all of the available data, the implicit call to the close-stream function specifies nil as the argument value to that function's throw-on-error-p parameter. This :nothrow mechanism does not suppress all exceptions related to the processing of that stream; unusual conditions encountered during the reading of data from the stream may throw exceptions.
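
For instance, the following sketch (the file name is hypothetical) turns an unopenable file into a simple match failure rather than an exception:

  @(next "optional-config.txt" :nothrow)
  setting: @value

If "optional-config.txt" cannot be opened, the @(next) directive fails, which an enclosing construct such as @(maybe) or @(some) can absorb.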

When the subsequent directives which follow @(next) are processed, the directive terminates, and any stream which had been opened for source is closed. If the :noclose keyword is present, then this is prevented; the stream remains open. Note: keeping the stream open may be necessary if the @(data) directive is used to capture the input list into a variable whose value is used after the @(next) directive terminates, because the input list is lazy, and may depend on the stream continuing to be open.

The variant @(next :args) means that the remaining command-line arguments are to be treated as a data source. For this purpose, each argument is considered to be a line of text. The argument list does include that argument which specifies the file that is currently being processed or was most recently processed. As the arguments are matched, they are consumed. This means that if a @(next) directive without arguments is executed in the scope of @(next :args), it opens the file named by the first unconsumed argument.

To process arguments, and then continue with the original file and argument list, wrap the argument processing in a @(block). When the block terminates, the input source and argument list are restored to what they were before the block.
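
For instance, the following sketch (variable names are illustrative) gathers the remaining arguments inside a block, after which matching resumes with the original input source:

  @(block)
  @(next :args)
  @(collect)
  @arg
  @(end)
  @(end)
  @first_line_of_original_source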

The variant @(next :env) means that the list of process environment variables is treated as a source of data. It looks like a text file stream consisting of lines of the form "name=value". If this feature is not available on a given platform, an exception is thrown.

The syntax @(next :list lisp-expr) treats TXR Lisp expression lisp-expr as a source of text. The value of lisp-expr is flattened to a simple list in a way similar to the @(flatten) directive. The resulting list is treated as if it were the lines of a text file: each element of the list must be a string, which represents a line. If the strings happen to contain embedded newline characters, they are a visible constituent of the line, and do not act as line separators.
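
As an illustrative sketch (the strings and variable names are invented), the following treats a freshly constructed Lisp list as two lines of input:

  @(next :list (list "alpha" "beta"))
  @first
  @second

Here first should bind to "alpha" and second to "beta".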

The syntax @(next :tlist bind-expr) is similar to @(next :list ...) except that bind-expr is not a TXR Lisp expression, but a TXR bind expression.

The syntax @(next :var var) requires var to be a previously bound variable. The value of the variable is retrieved and treated like a list, in the same manner as under @(next :list ...). Note that @(next :var x) is not always the same as @(next :tlist x), because :var x strictly requires x to be a TXR variable, whereas the x in :tlist x is an expression which can potentially refer to a Lisp variable.
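
For instance, in the following sketch (variable names are illustrative), x is first bound to a list of strings, which @(next :var x) then treats as two lines of input:

  @(bind x ("one" "two"))
  @(next :var x)
  @a

Here a should bind to "one", the first line of the synthetic input.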

The syntax @(next :string lisp-expr) treats expression lisp-expr as a source of text. The value of the expression must be a string. Newlines in the string are interpreted as line terminators.

A string which is not terminated by a newline is tolerated, so that:

  @(next :string "abc")
  @a

binds a to "abc". Likewise, this is also the case with input files and other streams whose last line is not terminated by a newline.

However, watch out for empty strings, which are analogous to a correctly formed empty file which contains no lines:

  @(next :string "")
  @a

This will not bind a to ""; it is a matching failure. The behavior of :list is different. The query

  @(next :list "")
  @a

binds a to "". The reason is that under :list the string "" is flattened to the list ("") which is not an empty input stream, but a stream consisting of one empty line.

The @(next nil) variant indicates that the following subquery is applied to empty data, and the list of data sources from the command line is considered empty. This directive is useful in front of TXR code which doesn't process data sources from the command line, but takes command-line arguments. The @(next nil) incantation absolutely prevents TXR from trying to open the first command-line argument as a data source.

Note that the @(next) directive only redirects the source of input over the scope of the subquery in which that directive appears. For example, the following query looks for the line starting with "xyz" at the top of the file "foo.txt", within a some directive. After the @(end) which terminates the @(some), the "abc" is matched in the previous input stream which was in effect before the @(next) directive:

  @(some)
  @(next "foo.txt")
  xyz@suffix
  @(end)
  abc

However, if the @(some) subquery successfully matched "xyz@suffix" within the file foo.txt, there is now a binding for the suffix variable, which is visible to the remainder of the entire query. The variable bindings survive beyond the clause, but the data stream does not.


7.3.2 The skip Directive

The skip directive considers the remainder of the query as a search pattern. The remainder is no longer required to strictly match at the current line in the current input stream. Rather, the current stream is searched, starting with the current line, for the first line where the entire remainder of the query will successfully match. If no such line is found, the skip directive fails. If a matching position is found, the remainder of the query is processed from that point.

The remainder of the query can itself contain skip directives. Each such directive performs a recursive subsearch.

Skip comes in vertical and horizontal flavors. For instance, skip and match the last line:

  @(skip)
  @last
  @(eof)

Skip and match the last character of the line:

  @(skip)@{last 1}@(eol)

The skip directive has two optional arguments, which are evaluated as TXR Lisp expressions. If the first argument evaluates to an integer, its value limits the range of lines scanned for a match. Judicious use of this feature can improve the performance of queries.

Example: scan until "size: @SIZE" matches, which must happen within the next 15 lines:

  @(skip 15)
  size: @SIZE

Without the range limitation, skip will keep searching until it consumes the entire input source. In a horizontal skip, the range-limiting numeric argument is expressed in characters, so that

  abc@(skip 5)def

means: there must be a match for "abc" at the start of the line, and then within the next five characters, there must be a match for "def".

Sometimes a skip is nested within a collect, or following another skip. For instance, consider:

  @(collect)
  begin @BEG_SYMBOL
  @(skip)
  end @BEG_SYMBOL
  @(end)

The above collect iterates over the entire input. But, potentially, so does the embedded skip. Suppose that "begin x" is matched, but the data has no matching "end x". The skip will search in vain all the way to the end of the data, and then the collect will try another iteration back at the beginning, just one line down from the original starting point. If it is a reasonable expectation that an "end x" occurs within 15 lines of a "begin x", this can be specified instead:

  @(collect)
  begin @BEG_SYMBOL
  @(skip 15)
  end @BEG_SYMBOL
  @(end)

If the symbol nil is used in place of a number, it means to scan an unlimited range of lines; thus, @(skip nil) is equivalent to @(skip).

If the symbol :greedy is used, it changes the semantics of the skip to longest match semantics. For instance, match the last three space-separated tokens of the line:

  @(skip :greedy) @a @b @c

Without :greedy, the variable @c may match multiple tokens, and end up with spaces in it, because nothing follows @c and so it matches from any position which follows a space to the end of the line. Also note the space in front of @a. Without this space, @a will get an empty string.

A line-oriented example of greedy skip: match the last line without using @(eof):

  @(skip :greedy)
  @last

There may be a second numeric argument. This specifies a minimum number of lines to skip before looking for a match. For instance, skip 15 lines and then search indefinitely for begin ...:

  @(skip nil 15)
  begin @BEG_SYMBOL

The two arguments may be used together. For instance, the following matches if and only if the 15th line of input starts with begin :

  @(skip 1 15)
  begin @BEG_SYMBOL

Essentially, @(skip 1 n) means "hard skip by n lines". @(skip 1 0) is the same as @(skip 1), which is a noop, because it means: "the remainder of the query must match starting on the next line", or, more briefly, "skip exactly zero lines", which is the behavior if the skip directive is omitted altogether.

Here is one trick for grabbing the fourth line from the bottom of the input:

  @(skip)
  @fourth_from_bottom
  @(skip 1 3)
  @(eof)

Or using greedy skip:

  @(skip :greedy)
  @fourth_from_bottom
  @(skip 1 3)

Non-greedy skip with the @(eof) directive has a slight advantage because the greedy skip will keep scanning even though it has found the correct match, then backtrack to the last good match once it runs out of data. The regular skip with explicit @(eof) will stop when the @(eof) matches.


7.3.3 Reducing Backtracking with Blocks

The skip directive can consume considerable CPU time when multiple skips are nested. Consider:

  @(skip)
  A
  @(skip)
  B
  @(skip)
  C

This is actually nesting: the second and third skips occur within the body of the first one, and thus this creates nested iteration. TXR is searching for the combination of skips which match the pattern of lines A, B and C with backtracking behavior. The outermost skip marches through the data until it finds A followed by a pattern match for the second skip. The second skip iterates to find B followed by the third skip, and the third skip iterates to find C. If A and B are only one line each, then this is reasonably fast. But suppose there are many lines matching A and B, giving rise to a large number of combinations of skips which match A and B, and yet do not find a match for C, triggering backtracking. The nested stepping which tries the combinations of A and B can give rise to a considerable running time.

One way to deal with the problem is to unravel the nesting with the help of blocks. For example:

  @(block)
  @  (skip)
  A
  @(end)
  @(block)
  @  (skip)
  B
  @(end)
  @(skip)
  C

Now the scope of each skip is just the remainder of the block in which it occurs. The first skip finds A, and then the block ends. Control passes to the next block, and backtracking will not take place to a block which completed (unless all these blocks are enclosed in some larger construct which backtracks, causing the blocks to be re-executed).

This rewrite is not equivalent, and cannot be used for instance in backreferencing situations such as:

  @; Find three lines anywhere in the input which are identical.
  @(skip)
  @line
  @(skip)
  @line
  @(skip)
  @line

This example depends on the nested search-within-search semantics.


7.3.4 The trailer Directive

The trailer directive introduces a trailing portion of a query or subquery which matches input material normally, but in the event of a successful match, does not advance the current position. This can be used, for instance, to cause @(collect) to match partially overlapping regions.

Trailer can be used in vertical context:

  @(trailer)
  directives ...

or horizontal:

  @(trailer) directives ...

A vertical trailer prevents the vertical input position from advancing as it is matched by directives, whereas a horizontal trailer prevents the horizontal position from advancing. In other words, trailer performs matching without consuming the input, providing a lookahead mechanism.



This script collects each line which has a duplicate somewhere later in the input:

  @(collect)
  @line
  @(trailer)
  @(skip)
  @line
  @(end)

Without the @(trailer) directive, this does not work properly for inputs like:

  111
  222
  111
  222

Without @(trailer), the first duplicate pair constitutes a match which spans over the 222. After that pair is found, the matching continues after the second 111.

With the @(trailer) directive in place, the collect body, on each iteration, only consumes the lines matched prior to @(trailer).


7.3.5 The freeform Directive

The freeform directive provides a useful alternative to TXR's line-oriented matching discipline. The freeform directive treats all remaining input from the current input source as one big line. The query line which immediately follows freeform is applied to that line.

The syntax variations are:

  @(freeform)
  ... query line ..

  @(freeform number) 
  ... query line ..

  @(freeform string) 
  ... query line ..

  @(freeform number string) 
  ... query line ..

where number and string denote TXR Lisp expressions which evaluate to an integer or string value, respectively.

If number and string are both present, they may be given in either order.

If the number argument is given, its value limits the range of lines which are combined together. For instance @(freeform 5) means to only consider the next five lines to be one big line. Without this argument, freeform is "bottomless". It can match the entire file, which creates the risk of allocating a large amount of memory.

If the string argument is given, it specifies a custom line terminator. The default terminator is "\n". The terminator does not have to be one character long.

Freeform does not convert the entire remainder of the input into one big line all at once, but does so in a dynamic, lazy fashion, which takes place as the data is accessed. So at any time, only some prefix of the data exists as a flat line in which newlines are replaced by the terminator string, and the remainder of the data still remains as a list of lines.

After the subquery is applied to the virtual line, the unmatched remainder of that line is broken up into multiple lines again, by looking for and removing all occurrences of the terminator string within the flattened portion.

Care must be taken if the terminator is other than the default "\n". All occurrences of the terminator string are treated as line terminators in the flattened portion of the data, so extra line breaks may be introduced. Likewise, in the yet unflattened portion, no breaking takes place, even if the text contains occurrences of the terminator string. The extent of data which is flattened, and the amount of it which remains, depends entirely on the query line underneath @(freeform).

In the following example, lines of data are flattened using $ as the line terminator.

  code:

    @(freeform "$")
    @a$@b:
    @c
    @d

  data:

    1
    2:3
    4

  output (-B):

    a="1"
    b="2"
    c="3"
    d="4"

The data is turned into the virtual line 1$2:3$4$. The @a$@b: subquery matches the 1$2: portion, binding a to "1", and b to "2". The remaining portion 3$4$ is then split into separate lines again according to the line terminator $:

  3
  4

Thus the remainder of the query

  @c
  @d

faces these lines, binding c to "3" and d to "4". Note that since the data does not contain dollar signs, there is no ambiguity; the meaning may be understood in terms of the entire data being flattened and split again.

In the following example, freeform is used to solve a tokenizing problem. The Unix password file has fields separated by colons. Some fields may be empty. Using freeform, we can join the password file using ":" as a terminator. By restricting freeform to one line, we can obtain each line of the password file with a terminating ":", allowing for a simple tokenization, because now the fields are colon-terminated rather than colon-separated.


  @(next "/etc/passwd")
  @(freeform 1 ":")
  @(coll)@{token /[^:]*/}:@(end)


7.3.6 The fuzz Directive

The fuzz directive allows for an imperfect match spanning a set number of lines. It takes two arguments, both of which are TXR Lisp expressions that should evaluate to integers:

@(fuzz m n)

This expresses that over the next n query lines, the matching strictness is relaxed a little bit. Only m out of those n lines have to match. Afterward, the rest of the query follows normal, strict processing.

In the degenerate situation where there are fewer than n query lines following the fuzz directive, then m of them must succeed anyway. (If there are fewer than m, then this is impossible.)
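
As an illustrative sketch (the three line patterns are invented), the following requires at least two of the next three lines to match:

  @(fuzz 2 3)
  alpha: @a
  beta: @b
  gamma: @c

Whichever line fails to match may leave its variable unbound, while the query as a whole still succeeds.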


7.3.7 The line and chr Directives

The line and chr directives perform binding between the current input line number or character position within a line, against an expression or variable:

  @(line 42)
  @(line x)
  abc@(chr 3)def@(chr y)

The directive @(line 42) means "match the current input line number against the integer 42". If the current line is 42, then the directive matches, otherwise it fails. line is a vertical directive which doesn't consume a line of input. Thus, the following matches at the beginning of an input stream, and x ends up bound to the first line of input:

  @(line 1)
  @(line 1)
  @(line 1)
  @x

The directive @(line x) binds variable x to the current input line number, if x is an unbound variable. If x is already bound, then the value of x must match the current line number, otherwise the directive fails.
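
One use of the unbound-variable form of line is to record where a match occurred. The following sketch (the marker line "begin" and the variable n are illustrative) captures the input position just after a matched line:

  @(skip)
  begin
  @(line n)

Since the match for "begin" consumes that line, n is bound to the number of the line which follows it.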

The chr directive is similar to line except that it's a horizontal directive, and matches the character position rather than the line position. Character positions are measured from zero, rather than one. chr does not consume a character. Hence the two occurrences of chr in the following example both match, and x takes the entire line of input:

  @(chr 0)@(chr 0)@x

The argument of line or chr may be an @-delimited Lisp expression. This is useful for matching computed lines or character positions:

  @(line @(+ a (* b c)))


7.3.8 The name Directive

The name directive performs a binding between the name of the current data source and a variable or bind expression:

  @(name na)
  @(name "data.txt")

If na is an unbound variable, it is bound and takes on the name of the data source, such as a file name. If na is bound, then it has to match the name of the data source, otherwise the directive fails.

The directive @(name "data.txt") fails unless the current data source has that name.


7.3.9 The data Directive

The data directive performs a binding between the unmatched data at the current position, and a variable or bind expression. The unmatched data takes the form of a list of strings:

  @(data d)

The binding is performed on object equality. If d is already bound, a matching failure occurs unless d contains the current unmatched data.

Matching the current data has various uses.

For instance, two branches of pattern matching can, at some point, bind the current data into different variables. When those paths join, the variables can be bound together to create the assertion that the current data had been the same at those points:

  @(all)
  @  (skip)
  foo
  @  (skip)
  bar
  @  (data x)
  @(and)
  @  (skip)
  xyzzy
  @  (skip)
  bar
  @  (data y)
  @(end)
  @(require (eq x y))

Here, two branches of the @(all) match some material which ends in the line bar. However, it is possible that this is a different line. The data directives are used to create an assertion that the data regions matched by the two branches are identical. That is to say, the unmatched data x captured after the first bar and the unmatched data y captured after the second bar must be the same object in order for @(require (eq x y)) to succeed, which implies that the same bar was matched in both branches of the @(all).

Another use of data is simply to gain access to the trailing remainder of the unmatched input in order to print it, or do some special processing on it.

The tprint Lisp function is useful for printing the unmatched data as newline-terminated lines:

  @(data remainder)
  @(do (tprint remainder))


7.3.10 The eof Directive

The eof directive, if not given any argument, matches successfully when no more input is available from the current input source.

In the following example, the line variable captures the text "One-line file" and then, since that is the last line of input, the eof directive matches:

  @line
  @(eof)

This query matches the one-line input:

  One-line file

If the data consisted of two or more lines, eof would fail.

The eof directive may be given a single argument, which is a pattern that matches the termination status of the input source. This is useful when the input source is a process pipe. For the purposes of eof, sources which are not process pipes have the symbol t as their termination status.

In the following example, which assumes the availability of a POSIX shell command interpreter in the host system, the variable a captures the string "a" and the status variable captures the integer value 5, which is the termination status of the command:

  @(next (open-command "echo a; exit 5"))
  @a
  @(eof status)


7.3.11 The some, all, none, maybe, cases and choose Directives

These directives, called the parallel directives, combine multiple subqueries, which are applied at the same input position, rather than to consecutive input.

They come in vertical (line mode) and horizontal (character mode) flavors.

In horizontal mode, the current position is understood to be a character position in the line being processed. The clauses advance this character position by moving it to the right. In vertical mode, the current position is understood to be a line of text within the stream. A clause advances the position by some whole number of lines.

The syntax of these parallel directives follows this example:

  @(some)
  subquery1
  ...
  @(and)
  subquery2
  ...
  @(and)
  subquery3
  ...
  @(end)

And in horizontal mode:

  @(some)subquery1...@(and)subquery2...@(and)subquery3...@(end)

Long horizontal lines can be broken up with line continuations, allowing the above example to be written like this, which is considered a single logical line:

  @(some)subquery1...@\
  @(and)subquery2...@\
  @(and)subquery3...@\
  @(end)

The @(some), @(all), @(none), @(maybe), @(cases) or @(choose) must be followed by at least one subquery clause, and be terminated by @(end). If there are two or more subqueries, these additional clauses are indicated by @(and) or @(or), which are interchangeable. The separator and terminator directives also must appear as the only element in a query line.

The choose directive requires keyword arguments. See below.

The syntax supports arbitrary nesting. For example:

  QUERY:            SYNTAX TREE:

  @(all)            all -+
  @  (skip)              +- skip -+
  @  (some)              |        +- some -+
  it                     |        |        +- TEXT
  @  (and)               |        |        +- and
  @    (none)            |        |        +- none -+
  was                    |        |        |        +- TEXT
  @    (end)             |        |        |        +- end
  @  (end)               |        |        +- end
  a dark                 |        +- TEXT
  @(end)                 *- end

Nesting can be indicated using whitespace between @ and the directive expression. Thus, the above is an @(all) query containing a @(skip) clause which applies to a @(some) that is followed by the text line "a dark". The @(some) clause combines the text line "it", and a @(none) clause which contains just one clause consisting of the line "was".

The semantics of the parallel directives is:

@(all)
Each of the clauses is matched at the current position. If any of the clauses fails to match, the directive fails (and thus does not produce any variable bindings). Clauses following the failed clause are not evaluated. Bindings extracted by a successful clause are visible to the clauses which follow, and if the directive succeeds, all of the combined bindings emerge.

@(some [ :resolve (var ...) ])
Each of the clauses is matched at the current position. If any of the clauses succeed, the directive succeeds, retaining the bindings accumulated by the successfully matching clauses. Evaluation does not stop on the first successful clause. Bindings extracted by a successful clause are visible to the clauses which follow.

The :resolve parameter is for situations when the @(some) directive has multiple clauses that need to bind some common variables to different values: for instance, output parameters in functions. Resolve takes a list of variable name symbols as an argument. This is called the resolve set. If the clauses of @(some) bind variables in the resolve set, those bindings are not visible to later clauses. However, those bindings do emerge out of the @(some) directive as a whole. This creates a conflict: what if two or more clauses introduce different bindings for a variable in the resolve set? This is why it is called the resolve set: conflicts for variables in the resolve set are automatically resolved in favor of later directives.


  @(some :resolve (x))
  @  (bind a "a")
  @  (bind x "x1")
  @(and)
  @  (bind b "b")
  @  (bind x "x2")
  @(end)

Here, the two clauses both introduce a binding for x. Without the :resolve parameter, this would mean that the second clause fails, because x comes in with the value "x1", which does not bind with "x2". But because x is placed into the resolve set, the second clause does not see the "x1" binding. Both clauses establish their bindings independently, creating a conflict over x. The conflict is resolved in favor of the second clause, and so the bindings which emerge from the directive are:

  a="a"
  b="b"
  x="x2"

@(none)
Each of the clauses is matched at the current position. The directive succeeds only if all of the clauses fail. If any clause succeeds, the directive fails, and subsequent clauses are not evaluated. Thus, this directive never produces variable bindings, only matching success or failure.

@(maybe)
Each of the clauses is matched at the current position. The directive always succeeds, even if all of the clauses fail. Whatever bindings are found in any of the clauses are retained. Bindings extracted by any successful clause are visible to the clauses which follow.

@(cases)
The clauses are matched, in order, at the current position. If any clause matches, the matching stops and the bindings collected from that clause are retained. Any remaining clauses after that one are not processed. If no clause matches, the directive fails, and produces no bindings.

@(choose [ :longest var | :shortest var ])
Each of the clauses is matched at the current position in order. In this construct, bindings established by an earlier clause are not visible to later clauses. Although any or all of the clauses can potentially match, the clause which succeeds is the one which maximizes or minimizes the length of the text bound to the specified variable. The other clauses have no effect.

For all of the parallel directives other than @(none) and @(choose), the query advances the input position by the greatest number of lines that match in any of the successfully matching subclauses that are evaluated. The @(none) directive does not advance the input position.

For instance if there are two subclauses, and one of them matches three lines, but the other one matches five lines, then the overall clause is considered to have made a five line match at its position. If more directives follow, they begin matching five lines down from that position.


7.3.12 The require Directive

The syntax of @(require) is:

  @(require lisp-expression)

The require directive evaluates a TXR Lisp expression. (See TXR LISP far below.) If the expression yields a true value, then it succeeds, and matching continues with the directives which follow. Otherwise the directive fails.

In the context of the require directive, the expression should not be introduced by the @ symbol; it is expected to be a Lisp expression.


  @; require that 4 is greater than 3
  @; This succeeds; therefore, @a is processed
  @(require (> (+ 2 2) 3))
  @a


7.3.13 The if Directive

The if directive allows for conditional selection of pattern-matching clauses, based on the Boolean results of Lisp expressions.

A variant of the if directive is also available for use inside output clauses, where it similarly allows for the conditional selection of output clauses.

The syntax of the if directive can be exemplified as follows:

  @(if lisp-expr)
  directives ...
  @(elif lisp-expr)
  directives ...
  @(else)
  directives ...
  @(end)

The @(elif) and @(else) clauses are all optional. If @(else) is present, it must be last, before @(end), after any @(elif) clauses. Any of the clauses may be empty.


  @(if (> (length str) 42))
  foo: @a @b
  @(else)
  {@c}
  @(end)

In this example, if the length of the variable str is greater than 42, then matching continues with "foo: @a @b", otherwise it proceeds with "{@c}".

More precisely, how the if directive works is as follows. The Lisp expressions are evaluated in order, starting with the if expression, then the elif expressions if any are present. If any Lisp expression yields a true result (any value other than nil) then evaluation of Lisp expressions stops. The corresponding clause of that Lisp expression is selected and pattern matching continues with that clause. The result of that clause (its success or failure, and any newly bound variables) is then taken as the result of the if directive. If none of the Lisp expressions yield true, and an else clause is present, then that clause is processed and its result determines the result of the if directive. If none of the Lisp expressions yield true, and there is no else clause, then the if directive is deemed to have trivially succeeded, allowing matching to continue with whatever directive follows it.


7.3.14 The Lisp if versus TXR if

The @(output) directive supports the embedding of Lisp expressions, whose values are interpolated into the output. In particular, Lisp if expressions are useful. For instance @(if expr "A" "B") reproduces A if expr yields a true value, otherwise B. Yet the @(if) directive is also supported in @(output). How the apparent conflict between the two is resolved is that the two take different numbers of arguments. An @(if) which has no arguments at all is a syntax error. One that has one argument is the head of the if directive syntax which must be terminated by @(end) and which takes the optional @(elif) and @(else) clauses. An @(if) which has two or more arguments is parsed as a self-contained Lisp expression.
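
A hedged illustration of the two forms inside @(output) (the variable count is assumed to be bound to a number, and the surrounding text is invented). The form with two or more arguments is a Lisp expression interpolated into the output:

  @(output)
  @(if (> count 0) "some" "none") items found
  @(end)

whereas the one-argument form heads a directive which selects between output clauses and is terminated by @(end):

  @(output)
  @(if (> count 0))
  count is positive
  @(else)
  count is not positive
  @(end)
  @(end)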


7.3.15 The gather Directive

Sometimes text is structured as items that can appear in an arbitrary order. When multiple matches need to be extracted, there is a combinatorial explosion of possible orders, making it impractical to write pattern matches for all the possible orders.

The gather directive is for these situations. It specifies multiple clauses which all have to match somewhere in the data, but in any order.

For further convenience, the lines of the first clause of the gather directive are implicitly treated as separate clauses.

The syntax follows this pattern:

  @(gather)
  one-line-clause-1
  one-line-clause-2
  ...
  @(and)
  multiline-clause
  ...
  @(end)

The multiline clauses are optional. The gather directive takes keyword parameters; see below.


7.3.16 The until / last clause in gather

Similarly to collect, gather has an optional until/last clause:

  @(gather)
  clauses ...
  @(until)
  directives ...
  @(end)

How gather works is that the text is searched for matches for the single-line and multiline queries. The clauses are applied in the order in which they appear. Whenever one of the clauses matches, any bindings it produces are retained and it is removed from further consideration. Multiple clauses can match at the same text position. The position advances by the longest match from among the clauses which matched. If no clauses match, the position advances by one line. The search stops when all clauses are eliminated, and then the cumulative bindings are produced. If the data runs out, but unmatched clauses remain, the directive fails.

Example: extract several environment variables, which do not appear in a particular order:

  @(next :env)
  @(gather)
  USER=@USER
  HOME=@HOME
  SHELL=@SHELL
  @(end)

If the until or last clause is present and a match occurs, then the matches from the other clauses are discarded and the gather terminates. The difference between until and last is that any bindings established in last are retained, and the input position is advanced past the matching material. The until/last clause has visibility to bindings established in the previous clauses in that same iteration, even though those bindings end up thrown away.

For consistency, the :mandatory keyword is supported in the until/last clause of gather. The semantics of using :mandatory in this situation is tricky. In particular, if it is in effect, and the gather terminates successfully by collecting all required matches, it will trigger a failure. On the other hand, if the until or last clause activates before all required matches are gathered, a failure also occurs, whether or not the clause is :mandatory.

Meaningful use of :mandatory requires that the gather be open-ended; it must allow some (or all) variables not to be required. The presence of the option means that for gather to succeed, all required variables must be gathered first, but then termination must be achieved via the until/last clause before all gather clauses are satisfied.
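For an illustrative sketch (the clause patterns, variable names and END marker here are hypothetical), the following gather requires variable a, makes b optional, and insists on terminating via the last clause:

  @(gather :vars (a (b "none")))
  A=@a
  B=@b
  @(last :mandatory)
  END
  @(end)

If a line matching A=@a is gathered and a line reading END is then reached, the gather succeeds, with b defaulting to "none" if no B= line was seen. If the input runs out without the END line appearing, the gather fails even though a was gathered.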


7.3.17 Keyword Parameters in gather

The gather directive accepts the keyword parameter :vars. The argument to :vars is a list of required and optional variables. A required variable is specified as a symbol. An optional variable is specified as a two element list which pairs a symbol with a Lisp expression. That Lisp expression is evaluated and specifies the default value for the variable.


  @(gather :vars (a b c (d "foo")))

Here, a, b and c are required variables, and d is optional, with the default value given by the Lisp expression "foo".

The presence of :vars changes the behavior in three ways.

Firstly, even if all the clauses in the gather match successfully and are eliminated, the directive will fail if the required variables do not have bindings. It doesn't matter whether the bindings are existing, or whether they are established by gather.

Secondly, if some of the clauses of gather did not match, but all of the required variables have bindings, then the directive succeeds. Without the presence of :vars, it would fail in this situation.

Thirdly, if gather succeeds (all required variables have bindings), then all of the optional variables which do not have bindings are given bindings to their default values.

The expressions which give the default values are evaluated whenever the gather directive is evaluated, whether or not their values are used.


7.3.18 The collect Directive

The syntax of the collect directive is:

  @(collect)
  ... lines of subquery
  @(end)

or with an until or last clause:

  @(collect)
  ... lines of subquery: main clause
  @(until)
  ... lines of subquery: until clause
  @(end)

  @(collect)
  ... lines of subquery: main clause
  @(last)
  ... lines of subquery: last clause
  @(end)

The repeat symbol may be specified instead of collect, which changes the meaning:

  @(repeat)
  ... lines of subquery
  @(end)

The @(repeat) syntax is equivalent to @(collect :vars nil) and doesn't take the :vars clause. It accepts other collect parameters.

The subquery is matched repeatedly, starting at the current line. If it fails to match, it is tried starting at the subsequent line. If it matches successfully, it is tried at the line following the entire extent of matched data, if there is one. Thus, the collected regions do not overlap. (Overlapping behavior can be obtained: see the @(trailer) directive.)

Unless certain keywords are specified, or unless the collection is explicitly failed with @(fail), it always succeeds, even if it collects nothing, and even if the until/last clause never finds a match.

If no until/last clause is specified, and the collect is not limited using parameters, the collection is unbounded: it consumes the entire data file.


7.3.19 The until / last clause in collect

If an until/last clause is specified, the collection stops when that clause matches at the current position.

If an until clause terminates collect, no bindings are collected at that position, even if the main clause matches at that position also. Moreover, the position is not advanced. The remainder of the query begins matching at that position.

If a last clause terminates collect, the behavior is different. Any bindings captured by the main clause are thrown away, just like with the until clause. However, the bindings in the last clause itself survive, and the position is advanced to skip over that material.



Example:

  @(collect)
  @a
  @(until)
  42
  @b
  @(end)
  @c

Data:

  1
  2
  3
  42
  5
  6

Output:

  a[0]="1"
  a[1]="2"
  a[2]="3"
  c="42"

The line 42 is not collected, even though it matches @a. Furthermore, the @(until) does not advance the position, so variable c takes 42.

If the @(until) is changed to @(last), the output will be different:

  a[0]="1"
  a[1]="2"
  a[2]="3"
  b="5"
  c="6"

The 42 is not collected into a list, just like before. But now the binding captured by @b emerges. Furthermore, the position advances past the 42 and 5 lines, so variable c now takes 6.

The binding variables within the clause of a collect are treated specially. The multiple matches for each variable are collected into lists, which then appear as array variables in the final output.



Example:

  @(collect)
  @a:@b
  @(end)

Data:

  alpha:one
  beta:two
  gamma:three

Output:

  a[0]="alpha"
  a[1]="beta"
  a[2]="gamma"
  b[0]="one"
  b[1]="two"
  b[2]="three"

The query matches the data in three places, so each variable becomes a list of three elements, reported as an array.

Variables with list bindings may be referenced in a query. They denote a multiple match. The -D command-line option can establish a one-dimensional list binding.

The clauses of collect may be nested. Variable matches collated into lists in an inner collect are again collated into nested lists in the outer collect. Thus an unbound variable wrapped in N nestings of @(collect) will be an N-dimensional list. A one-dimensional list is a list of strings; a two-dimensional list is a list of lists of strings, etc.
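As an illustrative sketch (the patterns and data layout here are hypothetical), the following nested collect yields a two-dimensional list:

  @(collect)
  section @name
  @(collect :gap 0)
  - @item
  @(end)
  @(end)

Given data in which each section line is immediately followed by zero or more adjacent - lines, name emerges as a one-dimensional list of section names, while item emerges as a list of lists: one inner list of items per section.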

It is important to note that the variables which are bound within the main clause of a collect, that is, the variables which are subject to collection, appear, within the collect, as normal one-value bindings. The collation into lists happens outside of the collect. So for instance in the query:


  @(collect)
  @x=@x
  @(end)

The left @x establishes a binding for some material preceding an equal sign. The right @x refers to that binding. The value of @x is different in each iteration, and these values are collected. What finally comes out of the collect clause is a single variable called x which holds a list containing each value that was ever instantiated under that name within the collect clause.

Also note that the until clause has visibility over the bindings established in the main clause. This is true even in the terminating case when the until clause matches, and the bindings of the main clause are discarded.


7.3.20 Keyword Parameters in collect

By default, collect searches the rest of the input indefinitely, or until the until/last clause matches. It skips arbitrary amounts of nonmatching material before the first match, and between matches.

Within the @(collect) syntax, it is possible to specify keyword parameters for additional control of the behavior. A keyword parameter consists of a keyword symbol followed by an argument, enclosed within the @(collect) syntax. The following are the supported keywords.

:maxgap n
The :maxgap keyword takes a numeric argument n, which is a Lisp expression. It causes collect to terminate if it fails to find a match after skipping n lines from the starting position, or more than n lines since any successful match. For example,

  @(collect :maxgap 5)

specifies that the gap between the current position and the first match for the body of the collect, or between consecutive matches, can be no longer than five lines. A :maxgap value of 0 means that the collected regions must be adjacent and must match right from the starting position. For instance:

  @(collect :maxgap 0)
  M @a

means: from here, collect consecutive lines of the form "M ...". This will not search for the first such line, nor will it skip lines which do not match this form.

:mingap n
The :mingap keyword complements :maxgap, though not exactly. Its argument n, a Lisp expression, specifies a minimum number of lines which must separate consecutive matches. However, it has no effect on the distance from the starting position to the first match.

:gap n
The :gap keyword effectively specifies :mingap and :maxgap at the same time, and can only be used if these other two are not used. Thus:

  @(collect :gap 1)

means: collect every other line starting with the current line.
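For instance, in this sketch (the data is hypothetical), the query

  @(collect :gap 1)
  @a
  @(end)

applied to input consisting of the lines 1 through 5 collects "1", "3" and "5": the first match occurs at the starting line, and each subsequent match is separated from the previous one by exactly one skipped line.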

:times n
This shorthand means the same thing as if
:mintimes n :maxtimes n
were specified. This means that exactly n matches must occur. If fewer occur, then collect fails. The collect stops once it achieves n matches.

:mintimes n
The argument n of the :mintimes keyword is a Lisp expression which specifies that at least n matches must occur, or else collect fails.

:maxtimes n
The Lisp argument expression n of the :maxtimes keyword specifies that at most n matches are collected.

:lines n
The argument n of the :lines keyword parameter is a Lisp expression which specifies the upper bound on how many lines should be scanned by collect, measuring from the starting position. The extent of the collect body is not counted. Example:

  @(collect :lines 2)
  foo: @a
  bar: @b
  baz: @c

The above collect will look for a match only twice: at the current position, and one line down.

:vars ({variable | (variable default-value)}*)
The :vars keyword specifies a restriction on what variables will emanate from the collect. Its argument is a list of variable names. An empty list may be specified using empty parentheses or, equivalently, the symbol nil. The default-value element of the syntax is a Lisp expression. The behavior of the :vars keyword is specified in the following section, "Specifying variables in collect".

:lists (variable*)
The :lists keyword indicates a list of variables. After the collect terminates, each variable in the list which does not have a binding is bound to the empty list symbol nil. Unlike :vars the :lists mechanism doesn't assert that only the listed variables may emanate from the collect. It also doesn't assert that each iteration of the collect must bind each of those variables.
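For instance, in the following sketch (the pattern is hypothetical), if no line of the form @a=@b is ever matched, the variables a and b nevertheless emerge from the collect bound to nil, rather than remaining unbound:

  @(collect :lists (a b))
  @a=@b
  @(end)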

:counter {variable | (variable starting-value)}
The :counter keyword's argument is a variable name symbol, or a compound expression consisting of a variable name symbol and the TXR Lisp expression starting-value. If this keyword argument is specified, then a binding for variable is established prior to each repetition of the collect body, to an integer value representing the repetition count. By default, repetition counts begin at zero. If starting-value is specified, it must evaluate to a number. This number is then added to each repetition count, and variable takes on the resulting displaced value.

If there is an existing binding for variable prior to the processing of the collect, then the variable is shadowed.

The binding is collected in the same way as other bindings that are established in the collect body.

The repetition count only increments after a successful match.

The variable is visible to the collect's until/last clause. If that clause is being processed after a successful match of the body, then variable holds an integer value. If the body fails to match, then the until/last clause sees a binding for variable with a value of nil.
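An illustrative sketch (the pattern and starting value are hypothetical): the following collects a repetition count alongside each match, displaced to begin at 1:

  @(collect :counter (i 1))
  item @x
  @(end)

Since the binding for i is collected like any other, if three item lines match, i emerges as the list of counts 1, 2 and 3, parallel to the three collected values of x.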


7.3.21 Specifying Variables in collect

Normally, any variable for which a new binding occurs in a collect block is collected. A collect clause may be "sloppy": it can neglect to collect some variables on some iterations, or bind some variables which are intended to behave like local temporaries, but end up collated into lists. Another issue is that the collect clause might not match anything at all, and then none of the variables are bound.

The :vars keyword allows the query writer to add discipline to the collect body.

The argument to :vars is a list of variable specs. A variable spec is either a symbol, denoting a required variable, or a (symbol default-value) pair, where default-value is a Lisp expression whose value specifies a default value for the variable, which is optional.

When a :vars list is specified, it means that only the given variables can emerge from the successful collect. Any newly introduced bindings for other variables do not propagate. More precisely, whenever the collect body matches successfully, the following three rules apply:

If :vars specifies required variables, the collect body must bind all of them, or else must not bind any variable at all, whether listed in :vars or not, otherwise an exception of type query-error is thrown.
If :vars specifies required variables, and also specifies default variables, and the collect body binds no variable at all, then the default variables are not bound to their default values.
If :vars specifies optional variables, and all required variables are bound by the collect body, then all those optional variables that are not bound by the collect body are bound to their default values. Under this rule, if :vars specifies no required variables, that is deemed to be logically equivalent to all required variables being bound.

In the event that collect does not match anything, the variables specified in :vars, whether required or optional, are all bound to empty lists. These bindings are established after the processing of the until/last clause, if present.


  @(collect :vars (a b (c "foo")))
  @a @c

Here, if the body "@a @c" matches, an error will be thrown because one of the mandatory variables is b, and the body neglects to produce a binding for b.


  @(collect :vars (a (c "foo")))
  @a @b

Here, if "@a @b" matches, only a will be collected, but not b, because b is not in the variable list. Furthermore, because there is no binding for c in the body, a binding is created with the value "foo", exactly as if c matched such a piece of text.

In the following example, the assumption is that THIS NEVER MATCHES is not found anywhere in the input but the line THIS DOES MATCH is found and has a successor which is bound to a. Because the body did not match, the :vars a and b should be bound to empty lists. But a is bound by the last clause to some text, so this takes precedence. Only b is bound to an empty list.

  @(collect :vars (a b))
  THIS NEVER MATCHES
  @(last)
  THIS DOES MATCH
  @a
  @(end)

The following means: do not allow any variables to propagate out of any iteration of the collect and therefore collect nothing:

  @(collect :vars nil)

Instead of writing @(collect :vars nil), it is possible to write @(repeat). @(repeat) takes all collect keywords, except for :vars. There is a @(repeat) directive used in @(output) clauses; that is a different directive.


7.3.22 Mandatory until and last

The until/last clause supports the option keyword :mandatory, exemplified by the following:

  @(last :mandatory)

This means that the collect must be terminated by a match for the until/last clause, or else by an explicit @(accept).

Specifically, the collect cannot terminate due to simply running out of data, or exceeding a limit on the number of matches that may be collected. In those situations, if an until or last clause is present with :mandatory, the collect is deemed to have failed.


7.3.23 The coll Directive

The coll directive is the horizontal version of collect. Whereas collect works with multiline clauses on line-oriented material, coll works within a single line. With coll, it is possible to recognize repeating regularities within a line and collect lists.

Regular-expression-based Positive Match variables work well with coll.

Example: collect a comma-separated list, terminated by a space.

 @(coll)@{A /[^, ]+/}@(until) @(end)@B
 foo,bar,xyzzy blorch

Here, the variable A is bound to tokens which match the regular expression /[^, ]+/: nonempty sequence of characters other than commas or spaces.

Like collect, coll searches for matches. If no match occurs at the current character position, it tries at the next character position. Whenever a match occurs, it continues at the character position which follows the last character of the match, if such a position exists.

If not bounded by an until clause, it will exhaust the entire line. If the until clause matches, then the collection stops at that position, and any bindings from that iteration are discarded. Like collect, coll also supports an until/last clause, which propagates variable bindings and advances the position. The :mandatory keyword is supported.

coll clauses nest, and variables bound within a coll are available to clauses within the rest of the coll clause, including the until/last clause, and appear as single values. The final list aggregation is only visible after the coll clause.

The behavior of coll leads to difficulties when a delimited variable is used to match material which is delimiter-separated rather than terminated. For instance, entries in a comma-separated file usually do not appear as "a,b,c," but rather as "a,b,c".

So for instance, the following result is not satisfactory:

 @(coll)@a @(end)
 1 2 3 4 5

The 5 is missing because it isn't followed by a space, which the text-delimited variable match "@a " looks for. After matching "4 ", coll continues to look for matches, and doesn't find any. It is tempting to try to fix it like this:

 @(coll)@a@/ ?/@(end)
 1 2 3 4 5

The problem now is that the regular expression / ?/ (match either a space or nothing) matches at any position. So when it is used as a variable delimiter, it matches at the current position, which binds the empty string to the variable, the extent of the match being zero. In this situation, the coll directive proceeds character by character. The solution is to use positive matching: specify the regular expression which matches the item, rather than trying to match whatever follows. The coll directive will recognize all items which match the regular expression:

 @(coll)@{a /[^ ]+/}@(end)
 1 2 3 4 5

The until clause can specify a pattern which, when recognized, terminates the collection. So for instance, suppose that the list of items may or may not be terminated by a semicolon. We must exclude the semicolon from being a valid character inside an item, and add an until clause which recognizes a semicolon:

 @(coll)@{a /[^ ;]+/}@(until);@(end);
 1 2 3 4 5;

Whether followed by the semicolon or not, the items are collected properly.

Note that the @(end) is followed by a semicolon. That's because when the @(until) clause matches, the matching material is not consumed.

This repetition can be avoided by using @(last) instead of @(until) since @(last) consumes the terminating material.
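Applied to the earlier example, this means the semicolon after @(end) can be dropped, because the last clause itself consumes the terminating semicolon:

 @(coll)@{a /[^ ;]+/}@(last);@(end)
 1 2 3 4 5;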

Instead of the above regular-expression-based approach, this extraction problem can also be solved with cases:

 @(coll)@(cases)@a @(or)@a@(end)@(end)
 1 2 3 4 5


7.3.24 Keyword Parameters in coll

The @(coll) directive takes most of the same parameters as @(collect). See the section Keyword parameters in collect above. So for instance @(coll :gap 0) means that the collects must be consecutive, and @(coll :maxtimes 2) means that at most two matches will be collected. The :lines keyword does not exist, but there is an analogous :chars keyword.

The @(coll) directive takes the :vars keyword.

The shorthand @(rep) may be used instead of @(coll :vars nil). @(rep) takes all keywords, except :vars.


7.3.25 The flatten Directive

The flatten directive can be used to convert variables to one-dimensional lists. Variables which have a scalar value are converted to lists containing that value. Variables which are multidimensional lists are flattened to one-dimensional lists.

Example:

  @(flatten a b)

After this directive, the variables a and b are bound to one-dimensional lists, regardless of whether they were previously scalars, one-dimensional lists, or more deeply nested lists.
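As an illustrative sketch (the bindings here are hypothetical): if a is bound to the string "x" and b is bound to the nested list (("1" "2") ("3")), then after

  @(flatten a b)

a holds the one-element list ("x") and b holds the flattened one-dimensional list ("1" "2" "3").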


7.3.26 The merge Directive

The syntax of merge follows the pattern:

@(merge destination [sources ...])

destination is a variable, which receives a new binding. sources are bind expressions.

The merge directive provides a way of combining collected data from multiple nested lists in a way which normalizes different nesting levels among the sources. This directive is useful for combining the results from collects at different levels of nesting into a single nested list such that parallel elements are at equal depth.

A new binding is created for the destination variable, which holds the result of the operation.

The merge directive performs its special function if invoked with at least three arguments: a destination and two sources.

The one-argument case @(merge x) binds a new variable x and initializes it with the empty list and is thus equivalent to @(bind x). Likewise, the two-argument case @(merge x y) is equivalent to @(bind x y), establishing a binding for x which is initialized with the value of y.

To understand what merge does when two sources are given, as in @(merge C A B), we first have to define a property called depth. The depth of an atom such as a string is defined as 1. The depth of an empty list is 0. The depth of a nonempty list is one plus the depth of its deepest element. So for instance "foo" has depth 1, ("foo") has depth 2, and ("foo" ("bar")) has depth 3.

We can now define a binary (two argument) merge(A, B) function as follows. First, merge(A, B) normalizes the values A and B to produce a pair of values which have equal depth, as defined above. If either value is an atom it is first converted to a one-element list containing that atom. After this step, both values are lists; and the only way an argument has depth zero is if it is an empty list. Next, if either value has a smaller depth than the other, it is wrapped in a list as many times as needed to give it equal depth. For instance if A is ("a") and B is (((("b" "c") ("d" "e")))) then A is converted to (((("a")))). Finally, the list values are appended together to produce the merged result. In the case of the preceding two example values, the result is: (((("a"))) ((("b" "c") ("d" "e")))). The result is stored into the newly bound destination variable C.

If more than two source arguments are given, these are merged by a left-associative reduction, which is to say that a three argument merge(X, Y, Z) is defined as merge(merge(X, Y), Z). The leftmost two values are merged, and then this result is merged with the third value, and so on.
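For an illustrative sketch (the bindings here are hypothetical), suppose a is bound to ("a" "b"), which has depth 2, and b is bound to (("c") ("d")), which has depth 3. Then

  @(merge c a b)

normalizes a by wrapping it once, giving (("a" "b")), and appends the normalized values, binding c to (("a" "b") ("c") ("d")).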


7.3.27 The cat Directive

The cat directive converts a list variable into a single piece of text. The syntax is:

@(cat var [sep])

The sep argument is a Lisp expression whose value specifies a separating piece of text. If it is omitted, then a single space is used as the separator.


 @(coll)@{a /[^ ]+/}@(end)
 @(cat a ":")
 1 2 3 4 5


7.3.28 The bind Directive

The syntax of the bind directive is:

@(bind pattern bind-expression {keyword value}*)

The bind directive is a kind of pattern match, which matches one or more variables given in pattern against a value produced by the bind-expression on the right.

Variable names occurring in the pattern expression may refer to bound or unbound variables.

All variable references occurring in bind-expression must have a value.

Binding occurs as follows. The tree structure of pattern and the value of bind-expression are considered to be parallel structures.

Any variables in pattern which are unbound receive a new binding, which is initialized with the structurally corresponding piece of the object produced by bind-expression.

Any variables in pattern which are already bound must match the corresponding part of the value of bind-expression, or else the bind directive fails. Variables which are already bound are not altered, retaining their current values even if the matching is inexact.

The simplest bind is of one variable against itself, for instance binding A against A:

  @(bind A A)

This will throw an exception if A is not bound. If A is bound, it succeeds, since A matches itself.

The next simplest bind binds one variable to another:

  @(bind A B)

Here, if A is unbound, it takes on the same value as B. If A is bound, it has to match B, or the bind fails. Matching means that either

A and B are the same text
A is text, B is a list, and A occurs within B.
vice versa: B is text, A is a list, and B occurs within A.
A and B are lists and are either identical, or one is found as a substructure within the other.

The right-hand side does not have to be a variable. It may be some other object, like a string, quasiliteral, regexp, or list of strings, etc. For instance,

  @(bind A "ab\tc")

will bind the string "ab\tc" to the variable A if A is unbound. If A is bound, this will fail unless A already contains an identical string. However, the right-hand side of a bind cannot be an unbound variable, nor a complex expression that contains unbound variables.

The left-hand side of bind can be a nested list pattern containing variables. The last item of a list at any nesting level can be preceded by a . (dot), which means that the variable matches the rest of the list from that position.

Example 1:

Suppose that the list A contains ("how" "now" "brown" "cow"). Then the directive @(bind (H N . C) A), assuming that H, N and C are unbound variables, will bind H to "how", N to "now", and C to the remainder of the list ("brown" "cow").

Example: suppose that the list A is nested to two dimensions and contains (("how" "now") ("brown" "cow")). Then @(bind ((H N) (B C)) A) binds H to "how", N to "now", B to "brown" and C to "cow".

The dot notation may be used at any nesting level. It must be followed by an item. The forms (.) and (X .) are invalid, but (. X) is valid and equivalent to X.

The number of items in a left pattern match must match the number of items in the corresponding right side object. So the pattern () only matches an empty list. The notations () and nil mean exactly the same thing.

The symbols nil, t and keyword symbols may be used on either side. They represent themselves. For example @(bind :foo :bar) fails, but @(bind :foo :foo) succeeds since the two sides denote the same keyword symbol object.

Example 2:

In this example, suppose A contains "foo" and B contains "bar". Then @(bind (X (Y Z)) (A (B "hey"))) binds X to "foo", Y to "bar" and Z to "hey". This is because the bind-expression produces the object ("foo" ("bar" "hey")) which is then structurally matched against the pattern (X (Y Z)), and the variables receive the corresponding pieces.


7.3.29 Keywords in The bind Directive

The bind directive accepts these keywords:

:lfilt filter
The argument to :lfilt is a filter specification. When the left-side pattern contains a binding which is therefore matched against its counterpart from the right-side expression, the left side is filtered through the filter specified by :lfilt for the purposes of the comparison. For example:

  @(bind "a" "A" :lfilt :upcase)

produces a match, since the left side is the same as the right after filtering through the :upcase filter.

:rfilt filter
The argument to :rfilt is a filter specification. The specified filter is applied to the right-hand-side material prior to matching it against the left side. The filter is not applied if the left side is a variable with no binding. It is only applied to determine a match. Binding takes place using the unmodified right-hand-side object.

For example, the following produces a match:

  @(bind "A" "a" :rfilt :upcase)

:filter filter
This keyword is a shorthand for specifying both filters with the same value. For instance :filter :upcase is equivalent to :lfilt :upcase :rfilt :upcase.

For a description of filters, see Output Filtering below.

Compound filters like (:fromhtml :upcase) are supported with all these keywords. The filters apply across arbitrary patterns and nested data.


  @(bind (a b c) ("A" "B" "C"))
  @(bind (a b c) (("z" "a") "b" "c") :rfilt :upcase)

Here, the first bind establishes the values for a, b and c, and the second bind succeeds, because the value of a matches the second element of the list ("z" "a") if it is upcased, and likewise b matches "b" and c matches "c" if these are upcased.


7.3.30 Lisp Forms in The bind Directive

TXR Lisp forms, introduced by @, may be used in the bind-expression argument of bind, or as the entire form. This is consistent with the rules for bind expressions.

TXR Lisp forms can be used in the pattern expression also.


  @(bind a @(+ 2 2))
  @(bind @(+ 2 2) @(* 2 2))

Here, a is bound to the integer 4. The second bind then succeeds because the forms (+ 2 2) and (* 2 2) produce equal values.


7.3.31 The set Directive

The syntax of the set directive is:

@(set pattern bind-expression)

The set directive syntactically resembles bind, but is not a pattern match. It overwrites the previous values of variables with new values from the right-hand side. Each variable that is assigned must have an existing binding: set will not induce binding.

Examples follow.

Store the value of A back into A, an operation with no effect:

  @(set A A)

Exchange the values of A and B:

  @(set (A B) (B A))

Store a string into A:

  @(set A "text")

Store a list into A:

  @(set A ("line1" "line2"))

Destructuring assignment: A ends up with "A", B ends up with ("B1" "B2"), and C ends up with ("C1" "C2").

  @(bind D ("A" ("B1" "B2") "C1" "C2"))
  @(bind (A B C) (() () ()))
  @(set (A B . C) D)

Note that set does not support a TXR Lisp expression on the left side, so the following are invalid syntax:

  @(set @(+ 1 1) @(* 2 2))
  @(set @b @(list "a"))

The second one is erroneous even though there is a variable on the left. Because it is preceded by the @ escape, it is a Lisp variable, and not a pattern variable.

The set directive also doesn't support Lisp expressions in the pattern, which must consist only of variables.


7.3.32 The rebind Directive

The syntax of the rebind directive is:

@(rebind pattern bind-expression)

The rebind directive resembles bind. It combines the semantics of local and bind into a single directive. The bind-expression is evaluated in the current environment, and its value remembered. Then a new environment is produced in which all the variables specified in pattern are absent. Then, the pattern is newly bound in that environment against the previously produced value, as if using bind.

The old environment with the previous variables is not modified; it continues to exist. This is in contrast with the set directive, which mutates existing bindings.

rebind makes it easy to create temporary bindings based on existing bindings.

  @(define pattern-function (arg))
  @;; inside a pattern function:
  @(rebind recursion-level @(+ recursion-level 1))
  @;; ...
  @(end)

When the function terminates, the previous value of recursion-level is restored. The effect is less verbose and more efficient than the following equivalent

  @(define pattern-function (arg))
  @;; inside a pattern function:
  @(local temp)
  @(set temp recursion-level)
  @(local recursion-level)
  @(set recursion-level @(+ temp 1))
  @;; ...
  @(end)

Like bind, rebind supports nested patterns, such as

  @(rebind (a (b c)) (1 (2 3)))

but it does not support any keyword arguments. The filtering features of bind do not make sense in rebind because the variables are always reintroduced into an environment in which they don't exist, whereas filtering applies in situations when bound variables are matched against values.

The rebind directive also doesn't support Lisp expressions in the pattern, which must consist only of variables.


7.3.33 The forget Directive

The forget directive has two spellings: @(forget) and @(local).

The arguments are one or more symbols, for example:

  @(forget a)
  @(forget a b c)

This can equally be written:

  @(local a)
  @(local a b c)

Directives which follow the forget or local directive no longer see any bindings for the symbols mentioned in that directive, and can establish new bindings.

It is not an error if the bindings do not exist.

It is strongly recommended to use the @(local) spelling in functions, because the forgetting action simulates local variables: for the given symbols, the machine forgets any earlier variables from outside of the function, and consequently, any new bindings for those variables belong to the function. (Furthermore, functions suppress the propagation of variables that are not in their parameter list, so these locals will be automatically forgotten when the function terminates.)


7.3.34 The do Directive

The syntax of @(do) is:

  @(do expr ...)
The do directive evaluates zero or more TXR Lisp expressions. (See TXR LISP far below.) The values of the expressions are ignored, and matching continues with the directives which follow the do directive, if any.

In the context of the do directive, the expression should not be introduced by the @ symbol; it is expected to be a Lisp expression.


  @; match text into variables a and b, then insert into hash table h
  @(bind h @(hash))
  @(do (set [h a] b))


7.3.35 The mdo Directive

The syntax of @(mdo) is:

  @(mdo expr ...)
Like the do directive, mdo (macro-time do) evaluates zero or more TXR Lisp expressions. Unlike do, mdo performs this evaluation immediately upon being parsed. Then it disappears from the syntax.

The effect of @(mdo e0 e1 e2 ...) is exactly like @(do (macro-time e0 e1 e2 ...)) except that do doesn't disappear from the syntax.

Another difference is that do can be used as a horizontal or vertical directive, whereas mdo is only vertical.
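For example (the variable name is hypothetical), an mdo form can establish a variable at parse time, which a later do form then uses at evaluation time:

  @(mdo (defvarl limit 42))
  @(do (put-line `limit is @limit`))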


7.3.36 The in-package Directive

The in-package directive shares the same syntax and semantics as the TXR Lisp macro of the same name:

  @(in-package name)
The in-package directive is evaluated immediately upon being parsed, leaving no trace in the syntax tree of the surrounding TXR query.

It causes the *package* special variable to take on the package denoted by name.

The directive requires that name be either a string or a symbol; an error exception is thrown if this isn't the case. Otherwise the package is searched for. If the package is not found, an error exception is thrown.
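For example, assuming a package is first created at parse time (here via the TXR Lisp make-package function, using a hypothetical package name):

  @(mdo (make-package "app"))
  @(in-package app)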


7.4 Blocks


7.4.1 Overview

Blocks are sections of a query which are either denoted by a name, or are anonymous. They may nest: blocks can occur within blocks and other constructs.

Blocks are useful for terminating parts of a pattern-matching search prematurely, and escaping to a higher level. This makes blocks not only useful for simplifying the semantics of certain pattern matches, but also an optimization tool.

Judicious use of blocks and escapes can reduce or eliminate the amount of backtracking that TXR performs.


7.4.2 The block Directive

The @(block name) directive introduces a named block, except when name is the symbol nil. The @(block) directive introduces an unnamed block, equivalent to @(block nil).

The @(skip) and @(collect) directives introduce implicit anonymous blocks, as do function bodies.

Blocks must be terminated by @(end) and can be vertical:

  @(block name)
  ...
  @(end)

or horizontal:

  @(block name)...@(end)


7.4.3 Block Scope

The names of blocks are in a distinct namespace from the variable binding space. So @(block foo) is unrelated to the variable @foo.

A block extends from the @(block ...) directive which introduces it, until the matching @(end), and may be empty. For instance:

  @(some)
  @(block foo)
  ...
  @(end)
  @(end)

Here, the block foo occurs in a @(some) clause, and so it extends to the @(end) which terminates the block. After that @(end), the name foo is not associated with a block (is not "in scope"). The second @(end) terminates the @(some) block.

The implicit anonymous block introduced by @(skip) has the same scope as the @(skip): it extends over all of the material which follows the skip, to the end of the containing subquery.


7.4.4 Block Nesting

Blocks may nest, and nested blocks may have the same names as blocks in which they are nested. For instance:

  @(block)
  @(block)
  @(end)
  @(end)
is a nesting of two anonymous blocks, and

  @(block foo)
  @(block foo)
  @(end)
  @(end)

is a nesting of two named blocks which happen to have the same name. When a nested block has the same name as an outer block, it creates a block scope in which the outer block is "shadowed"; that is to say, directives which refer to that block name within the nested block refer to the inner block, and not to the outer one.
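For instance, in the following sketch, the @(accept foo) terminates only the inner foo block; the outer foo block remains active:

  @(block foo)
  @(block foo)
  @(accept foo)
  @(end)
  @(end)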


7.4.5 Block Semantics

A block normally does nothing. The query material in the block is evaluated normally. However, a block serves as a termination point for @(fail) and @(accept) directives which are in scope of that block and refer to it.

The precise meaning of these directives is:

@(fail name)
Immediately terminate the enclosing query block called name, as if that block failed to match anything. If more than one block by that name encloses the directive, the innermost block is terminated. No bindings emerge from a failed block.

@(fail)
Immediately terminate the innermost enclosing anonymous block, as if that block failed to match.

The @(fail) directive has a vertical and horizontal form.

If the implicit block introduced by @(skip) is terminated in this manner, this has the effect of causing skip itself to fail. In other words, the behavior is as if @(skip)'s search did not find a match for the trailing material, except that it takes place prematurely (before the end of the available data source is reached).

If the implicit block associated with a @(collect) is terminated this way, then the entire collect fails. This is a special behavior, because a collect normally does not fail, even if it matches nothing and collects nothing!

To prematurely terminate a collect by means of its anonymous block, without failing it, use @(accept).
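For example, the following sketch fails an entire collect, by way of its implicit anonymous block, if a line reading ERROR is encountered:

  @(collect)
  @  (maybe)
  ERROR
  @  (fail)
  @  (end)
  @LINE
  @(end)

If no ERROR line occurs, the embedded @(maybe) fails harmlessly and each line is collected into LINE; if one does occur, the @(fail) terminates the collect's anonymous block as a failed match, failing the collect as a whole.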

@(accept name)
Immediately terminate the enclosing query block called name, as if that block successfully matched. If more than one block by that name encloses the directive, the innermost block is terminated.

@(accept)
Immediately terminate the innermost enclosing anonymous block, as if that block successfully matched.

@(accept) communicates the current bindings and input position to the terminated block. These bindings and current position may be altered by special interactions between certain directives and @(accept), described in the following section. Communicating the current bindings and input position means that the block which is terminated by @(accept) exhibits the bindings which were collected just prior to the execution of that @(accept) and the input position which was in effect at that time.

@(accept) has a vertical and horizontal form. In the horizontal form, it communicates a horizontal input position. A horizontal input position thus communicated will only take effect if the block being terminated had been suspended on the same line of input.

If the implicit block introduced by @(skip) is terminated by @(accept), this has the effect of causing the skip itself to succeed, as if all of the trailing material had successfully matched.

If the implicit block associated with a @(collect) is terminated by @(accept), then the collection stops. All bindings collected in the current iteration of the collect are discarded. Bindings collected in previous iterations are retained, and collated into lists in accordance with the semantics of collect.

Example: alternative way to achieve @(until) termination:

  @(collect)
  @  (maybe)
  ---
  @  (accept)
  @  (end)
  @LINE
  @(end)

This query will collect entire lines into a list called LINE. However, if the line --- is matched (by the embedded @(maybe)), the collection is terminated. Only the lines up to, and not including the --- line, are collected. The effect is identical to:

  @(collect)
  @LINE
  @(until)
  ---
  @(end)
The difference (not relevant in these examples) is that the until clause has visibility into the bindings set up by the main clause.

However, the following example has a different meaning:

  @(collect)
  @LINE
  @  (maybe)
  ---
  @  (accept)
  @  (end)
  @(end)

Now, lines are collected until the end of the data source, or until a line is found which is followed by a --- line. If such a line is found, the collection stops, and that line is not included in the collection! The @(accept) terminates the process of the collect body, and so the action of collecting the last @LINE binding into the list is not performed.

Example: communication of bindings and input position:

 @(some)
 @(block foo)
 @first
 @(accept foo)
 @(end)
 @(end)
 @second

At the point where the accept occurs, the foo block has matched the first line, bound the text "1" to the variable @first. The block is then terminated. Not only does the @first binding emerge from this terminated block, but what also emerges is that the block advanced the data past the first line to the second line. Next, the @(some) directive ends, and propagates the bindings and position. Thus the @second which follows then matches the second line and takes the text "2".

Example: abandonment of @(some) clause by @(accept):

In the following query, the foo block occurs inside a maybe clause. Inside the foo block there is a @(some) clause. Its first subclause matches variable @first and then terminates block foo. Since block foo is outside of the @(some) directive, this has the effect of terminating the @(some) clause:

 @(maybe)
 @(block foo)
 @  (some)
 @first
 @  (accept foo)
 @  (or)
 @one
 @two
 @three
 @four
 @  (end)
 @(end)
 @(end)
 @second

The second clause of the @(some) directive, namely:

 @one
 @two
 @three
 @four
is never processed. The reason is that subclauses are processed in top-to-bottom order, but the processing was aborted within the first clause by the @(accept foo). The @(some) construct never gets the opportunity to match four lines.

If the @(accept foo) line is removed from the above query, the output is different:

 @(maybe)
 @(block foo)
 @  (some)
 @first
 @#          <--  @(accept foo) removed from here!!!
 @  (or)
 @one
 @two
 @three
 @four
 @  (end)
 @(end)
 @(end)
 @second

Now, all clauses of the @(some) directive have the opportunity to match. The second clause grabs four lines, which is the longest match. And so, the next line of input available for matching is 5, which goes to the @second variable.


7.4.6 Interaction Between The trailer and accept Directives

If one of the clauses which follow a @(trailer) requests a successful termination to an outer block via @(accept), then @(trailer) intercepts the escape and adjusts the data extent to the position that it was given.


 @(block b)
 @(trailer)
 @line1
 @line2
 @(accept b)
 @(end)
 @line3
The variable line3 is bound to "1" because although @(accept) yields a data position which has advanced to the third line, this is intercepted by @(trailer) and adjusted back to the first line. Neglecting to do this adjustment would violate the semantics of trailer.


7.4.7 Interaction Between The next and accept Directives

When the clauses under a next directive are terminated by an accept, such that control passes to a block which surrounds that next, the accept is intercepted by next.

The input position being communicated by the accept is replaced with the original input position in the original stream which is in effect prior to the next directive. The accept transfer is then resumed.

In other words, accept cannot be used to "leak" the new stream out of a next scope.

However, next has no effect on the bindings being communicated.


 @(next "file-x")
 @(block b)
 @(next "file-y")
 @line
 @(accept b)
 @(end)

Here, the variable line matches the first line of the file "file-y", after which an accept transfer is initiated, targeting block b. This transfer communicates the line binding, as well as the position within file-y, pointing at the second line. However, the accept traverses the next directive, causing it to be abandoned. The special unwinding action within that directive detects this transfer and rewrites the input position to be the original one within the stream associated with "file-x". Note that this special handling exists in order for the behavior to be consistent with what would happen if the @(accept b) were removed, and the block b terminated normally: because the inner next is nested within that block, TXR would backtrack to the previous input position within "file-x".


7.4.8 Interaction Between Functions and the accept Directive

If a pattern function is terminated due to accept, the function return mechanism intercepts the accept. The bindings being communicated by that accept are then subject to the special resolution with respect to the function parameters, exactly as if the bindings were being returned normally out of the function. The resolved bindings then replace those being communicated by the accept and the accept transfer is resumed.


 @(define fun (a))
 @  (bind a "a")
 @  (bind b "b")
 @  (accept blk)
 @(end)
 @(block blk)
 @(fun x)
 this line is skipped by accept
 @(end)

Here, the accept initiates a control transfer which communicates the a and b variable bindings which are visible in that scope. This transfer is intercepted by the function, and the treatment of the bindings follows the same rules as a normal return (which, in the given function, would readily take place if the accept directive were removed). The b variable is suppressed, because b isn't a parameter of the function. Because a is a parameter, and the argument to that parameter is the unbound variable x, the effect is that x is bound to the value of a. When the accept transfer reaches block blk and terminates it, all that emerges is the x binding carrying "a".

If the accept invocation is removed from fun, then the function returns normally, producing the x binding. In that case, the line this line is skipped by accept isn't skipped since the block isn't being terminated; that line must match something.


7.4.9 Interaction Between finally and the accept directive

If the exception handling try directive protected body is terminated by an accept transfer, and if that try has a finally block, then there is a special interaction between the finally block and the accept transfer.

The processing of the finally block detects that it has been triggered by an accept transfer. Consequently, it retrieves the current input position and bindings from that transfer, and uses that position and those bindings for the processing of the finally clauses.

If the finally clauses succeed, then the new input position and new bindings are installed into the accept control transfer and that transfer resumes.

If the finally clauses fail, then the accept transfer is converted to a fail, with exactly the same block as its destination.
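A minimal sketch of this interaction: the finally clauses below run with the position and bindings communicated by the accept, before the transfer to block b resumes:

  @(block b)
  @(try)
  @line
  @(accept b)
  @(finally)
  @(do (put-line "cleanup"))
  @(end)
  @(end)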


7.4.10 Vertical-Horizontal Mismatch Between block and accept

The block, accept and fail directives come in horizontal and vertical forms.

This creates the possibility that an accept in horizontal context targets a vertical block or vice versa, raising the question of how the input position is treated. The semantics of each case is defined as follows.

If a horizontal-context accept targets a vertical block, the current position at the target block will be the following line. That is to say, when the horizontal accept occurs, there is a current input line which may have unconsumed material past the current position. If the accept communicates its input position to a vertical context, that unconsumed material is skipped, as if it had been matched and the vertical position is advanced to the next line.

If a horizontal block catches a vertical accept, it rejects that accept's position and stays at the current backtracking position for that block. Only the bindings from the accept are retained.


7.4.11 Horizontal-Horizontal Mismatch between block and accept

It is possible for a horizontal accept to terminate in a horizontal block which is processing a different line of input (or even a different input stream). This situation is treated the same way as vertical accept terminating in a horizontal block: the position communicated by accept is ignored, and only the bindings are taken.


7.5 Functions


7.5.1 Overview

TXR functions allow a query to be structured to avoid repetition. On a theoretical note, because TXR functions support recursion, functions enable TXR to match some kinds of patterns which exhibit self-embedding, or nesting, and thus cannot be described by a regular language.

Functions in TXR are not exactly like functions in mathematics or functional languages, and are not like procedures in imperative programming languages. They are not exactly like macros either. What it means for a TXR function to take arguments and produce a result is different from the conventional notion of a function.

A TXR function may have one or more parameters. When such a function is invoked, an argument must be specified for each parameter. However, a special behavior is at play here. Namely, some or all of the argument expressions may be unbound variables. In that case, the corresponding parameters behave like unbound variables also. Thus TXR function calls can transmit the "unbound" state from argument to parameter.

It should be mentioned that functions have access to all bindings that are visible in the caller; functions may refer to variables which are not mentioned in their parameter list.

With regard to returning, TXR functions are also unconventional. If the function fails, then the function call is considered to have failed. The function call behaves like a kind of match; if the function fails, then the call is like a failed match.

When a function call succeeds, then the bindings emanating from that function are processed specially. Firstly, any bindings for variables which do not correspond to one of the function's parameters are thrown away. Functions may internally bind arbitrary variables in order to get their job done, but only those variables which are named in the function argument list may propagate out of the function call. Thus, a function with no arguments can only indicate matching success or failure, but not produce any bindings. Secondly, variables do not propagate out of the function directly, but undergo a renaming. For each parameter which went into the function as an unbound variable (because its corresponding argument was an unbound variable), if that parameter now has a value, that value is bound onto the corresponding argument.


  @(define collect-words (list))
  @(coll)@{list /[^ \t]+/}@(end)
  @(end)

The above function collect-words contains a query which collects words from a line (sequences of characters other than space or tab), into the list variable called list. This variable is named in the parameter list of the function, therefore, its value, if it has one, is permitted to escape from the function call.

Suppose the input data is:

  Fine summer day

and the function is called like this:

  @(collect-words wordlist)

The result (with txr -B) is:

  wordlist[0]="Fine"
  wordlist[1]="summer"
  wordlist[2]="day"
How it works is that in the function call @(collect-words wordlist), wordlist is an unbound variable. The parameter corresponding to that unbound variable is the parameter list. Therefore, that parameter is unbound over the body of the function. The function body collects the words of "Fine summer day" into the variable list, and then yields that binding. Then the function call completes by noticing that the function parameter list now has a binding, and that the corresponding argument wordlist has no binding. The binding is thus transferred to the wordlist variable. After that, the bindings produced by the function are thrown away. The only enduring effects are:

the function matched and consumed some input; and
the function succeeded; and
the wordlist variable now has a binding.

Another way to understand the parameter behavior is that function parameters behave like proxies which represent their arguments. If an argument is an established value, such as a character string or bound variable, the parameter is a proxy for that value and behaves just like that value. If an argument is an unbound variable, the function parameter acts as a proxy representing that unbound variable. The effect of binding the proxy is that the variable becomes bound, an effect which is settled when the function goes out of scope.

Within the function, both the original variable and the proxy are visible simultaneously, and are independent. What if a function binds both of them? Suppose a function has a parameter called P, which is called with an argument A, which is an unbound variable, and then, in the function, both A and P are bound. This is permitted, and they can even be bound to different values. However, when the function terminates, the local binding of A simply disappears (because the symbol A is not among the parameters of the function). Only the value bound to P emerges, and is bound to A, which still appears unbound at that point. The P binding disappears also, and the net effect is that A is now bound. The "proxy" binding of A through the parameter P "wins" the conflict with the direct binding.


7.5.2 Definition Syntax

Function definition syntax comes in two flavors: vertical and horizontal. Horizontal definitions actually come in two forms, the distinction between which is hardly noticeable, and the need for which is made clear below.

A function definition begins with a @(define ...) directive. For vertical functions, this is the only element in a line.

The define symbol must be followed by a symbol, which is the name of the function being defined. After the symbol, there is a parenthesized optional argument list. If there is no such list, or if the list is specified as () or the symbol nil, then the function has no parameters. Examples of valid define syntax are:

  @(define foo)
  @(define bar ())
  @(define match (a b c))

If the define directive is followed by more material on the same line, then it defines a horizontal function:

  @(define match-x)x@(end)

If the define is the sole element in a line, then it is a vertical function, and the function definition continues below:

  @(define match-x)
  x
  @(end)

The difference between the two is that a horizontal function matches characters within a line, whereas a vertical function matches lines within a stream. The former match-x matches the character x, advancing to the next character position. The latter match-x matches a line consisting of the character x, advancing to the next line.

Material between @(define) and @(end) is the function body. The define directive may be followed directly by the @(end) directive, in which case the function has an empty body.

Functions may be nested within function bodies. Such local functions have dynamic scope. They are visible in the function body in which they are defined, and in any functions invoked from that body.

The body of a function is an anonymous block. (See Blocks above.)
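Because the body is an anonymous block, a plain @(accept) acts as an early, successful return from a function. For instance (the function name is hypothetical):

  @(define succeed-early ())
  @(accept)
  this material is never processed
  @(end)

When succeed-early is called, the @(accept) terminates the body's anonymous block as a successful match, so the line below it is never required to match anything.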


7.5.3 Two Forms of The Horizontal Function

If a horizontal function is defined as the only element of a line, it may not be followed by additional material. The following construct is erroneous:

  @(define horiz (x))@foo:@bar@(end)lalala

This kind of definition is actually considered to be in the vertical context, and like other directives that have special effects and that do not match anything, it does not consume a line of input. If the above syntax were allowed, it would mean that the line would not only define a function but also match lalala. This, in turn, would mean that the @(define)...@(end) is actually in horizontal mode, and so it matches a span of zero characters within a line (which means that it would require a line of input to match: a surprising behavior for a nonmatching directive!)

A horizontal function can be defined in an actual horizontal context. This occurs if it is in a line where it is preceded by other material. For instance:

  X@(define fun)...@(end)Y

This is a query line which must match the text XY. It also defines the function fun. The main use of this form is for nested horizontal functions:

  @(define fun)@(define local_fun)...@(end)@(end)


7.5.4 Vertical-Horizontal Overloading

A function of the same name may be defined as both vertical and horizontal. Both functions are available at the same time. Which one is used by a call is resolved by context. See the section Vertical Versus Horizontal Calls below.


7.5.5 Call Syntax

A function is invoked by a compound directive whose first symbol is the name of that function. Additional elements in the directive are the arguments. Arguments may be symbols, or other objects such as string and character literals, quasiliterals or regular expressions.


 @(define pair (a b))
 @a @b
 @(end)
 @(pair first second)
 @(pair "ice" cream)
 one two
 ice milk

The first call to the function takes the line "one two". The parameter a takes "one" and parameter b takes "two". These are rebound to the arguments first and second. The second call to the function binds the a parameter to the word "ice", and the b is unbound, because the corresponding argument cream is unbound. Thus inside the function, a is forced to match ice. Then a space is matched and b collects the text "milk". When the function returns, the unbound "cream" variable gets this value.

If a symbol occurs multiple times in the argument list, it constrains both parameters to bind to the same value. That is to say, all parameters which, in the body of the function, bind a value, and which are all derived from the same argument symbol must bind to the same value. This is settled when the function terminates, not while it is matching. Example:

 @(define pair (a b))
 @a @b
 @(end)
 @(pair same same)
 one two
 [query fails]

Here the query fails because a and b are effectively proxies for the same unbound variable same and are bound to different values, creating a conflict which constitutes a match failure.


7.5.6 Vertical Versus Horizontal Calls

A function call which is the only element of the query line in which it occurs is ambiguous. It can go either to a vertical function or to the horizontal one. If both are defined, then it goes to the vertical one.


 @(define which (x))@(bind x "horizontal")@(end)
 @(define which (x))
 @(bind x "vertical")
 @(end)
 @(which fun)

Not only does this call go to the vertical function, but it is in a vertical context.

If only a horizontal function is defined, then that is the one which is called, even if the call is the only element in the line. This takes place in a horizontal character-matching context, which requires a line of input which can be traversed:


 @(define which (x))@(bind x "horizontal")@(end)
 @(which fun)
 ABC
 [query fails]

The query fails because since @(which fun) is in horizontal mode, it matches characters in a line. Since the function body consists only of @(bind ...) which doesn't match any characters, the function call requires an empty line to match. The line ABC is not empty, and so there is a matching failure. The following example corrects this:


 @(define which (x))@(bind x "horizontal")@(end)
 @(which fun)
 [empty line]

A call made in a clearly horizontal context will prefer the horizontal function, and only fall back on the vertical one if the horizontal one doesn't exist. (In this fallback case, the vertical function is called with empty data; it is useful for calling vertical functions which process arguments and produce values.)

In the next example, the call is followed by trailing material, placing it in a horizontal context. Leading material will do the same thing:


 @(define which (x))@(bind x "horizontal")@(end)
 @(define which (x))
 @(bind x "vertical")
 @(end)
 @(which fun)B


7.5.7 Local Variables

As described earlier, variables bound in a function body which are not parameters of the function are discarded when the function returns. However, that, by itself, doesn't make these variables local, because pattern functions have visibility to all variables in their calling environment. If a variable x exists already when a function is called, then an attempt to bind it inside a function may result in a failure. The local directive must be used in a pattern function to list which variables are local.


  @(define path (path))@\
    @(local x y)@\
    @(cases)@\
      (@(path x))@(path y)@(bind path `(@x)@y`)@\
    @(or)@\
      @{x /[.,;'!?][^ \t\f\v]/}@(path y)@(bind path `@x@y`)@\
    @(or)@\
      @{x /[^ .,;'!?()\t\f\v]/}@(path y)@(bind path `@x@y`)@\
    @(or)@\
      @(bind path "")@\
    @(end)@\
  @(end)

This is a horizontal function which matches a path, handled by way of recursive cases. A path can be a parenthesized path followed by a path; it can be certain characters followed by a path; or it can be empty.

This function ensures that the variables it uses internally, x and y, do not have anything to do with any inherited bindings for x and y.

Note that the function is recursive, which cannot work without x and y being local, even if no such bindings exist prior to the top-level invocation of the function. The invocation @(path x) causes x to be bound, which is visible inside the invocation @(path y), but that invocation needs to have its own binding of x for local use.


7.5.8 Nested Functions

Function definitions may appear in a function. Such definitions are visible in all functions which are invoked from the body (and not necessarily enclosed in the body). In other words, the scope is dynamic, not lexical. Inner definitions shadow outer definitions. This means that a caller can redirect the function calls that take place in a callee, by defining local functions which capture the references.


 @(define which)
 @  (fun)
 @(end)
 @(define fun)
 @  (output)
 top-level fun!
 @  (end)
 @(end)
 @(define callee)
 @  (define fun)
 @    (output)
 local fun!
 @    (end)
 @  (end)
 @  (which)
 @(end)
 @(callee)
 @(which)

The output is:

 local fun!
 top-level fun!

Here, the function which is defined; it calls fun. A top-level definition of fun is introduced which outputs "top-level fun!". The function callee provides its own local definition of fun which outputs "local fun!" before calling which. When callee is invoked, it calls which, whose @(fun) call is routed to callee's local definition. When which is called directly from the top level, its fun call goes to the top-level definition.


7.5.9 Indirect Calls

Function indirection may be performed using the call directive. If fun-expr is a Lisp expression which evaluates to a symbol, and that symbol names a function which takes no arguments, then

  @(call fun-expr)

may be used to invoke the function. Additional expressions may be supplied which specify arguments.

Example 1:

 @(define foo (arg))
 @  (bind arg "abc")
 @(end)
 @(call 'foo b)

In this example, the effect is that foo is invoked, and b ends up bound to "abc".

The call directive here uses the 'foo expression to calculate the name of the function to be invoked. (See the quote operator).

This particular call expression can just be replaced by the direct invocation syntax @(foo b).

The power of call lies in being able to specify the function as a value which comes from elsewhere in the program, as in the following example.

 @(define foo (arg))
 @  (bind arg "abc")
 @(end)
 @(bind f @'foo)
 @(call f b)

Here the call directive obtains the name of the function from the f variable.

Note that function names are resolved to functions in the environment that is apparent at the point in execution where the call takes place. The directive @(call f args ...) is precisely equivalent to @(s args ...) if, at the point of the call, f is a variable which holds the symbol s and symbol s is defined as a function. Otherwise it is erroneous.


7.6 Modularization


7.6.1 The load and include Directives

The syntax of the load and include directives is:

  @(load expr)
  @(include expr)
Where expr is a Lisp expression that evaluates to a string giving the path of the file to load.

Firstly, the path given by expr is converted to an effective path, as follows.

If the *load-path* variable has a current value which is not nil, and the path given in expr is pure relative according to the pure-rel-path-p function, then the effective path is taken relative to the directory portion of the path which is stored in *load-path*.

If *load-path* is nil, or the path given in expr is not pure relative, then that path is taken as-is as the effective path.

Next, an attempt is made to open the file for processing, in almost exactly the same manner as by the TXR Lisp function load. The difference is that if the effective path is unsuffixed, then the .txr suffix is added to it, and that resulting path is tried first, and if it succeeds, then the file is treated as TXR Pattern Language syntax. If that fails, then the suffix .tlo is tried, and so forth, as described for the load function.

If these initial attempts to find the file fail, and the failure is due to the file not being found rather than some other problem such as a permission error, and expr isn't an absolute path according to abs-path-p, then additional attempts are made by searching for the file in the list of directories given in the *load-search-dirs* variable. Details are given in the description of the TXR Lisp load function.

Both the load and include directives bind the *load-path* variable to the path of the loaded file just before parsing syntax from it. The *package* variable is also given a new dynamic binding, whose value is the same as the existing binding. These bindings are removed when the load operation completes, restoring the prior values of these variables. The *load-hooks* variable is given a new dynamic binding, with a nil value.

If the file opened for processing is TXR Lisp source, or a compiled TXR Lisp file, then it is processed in the manner described for the load function.

Different requirements apply to the processing of the file under the load and include directives.

The include directive performs the processing of the file at parse time. If the file being processed is TXR Pattern Language, then it is parsed, and then its syntax replaces the include directive, as if it had originally appeared in its place. If a TXR Lisp source or a compiled TXR Lisp file is processed by include then the include directive is removed from the syntax.

The load directive performs the processing of the file at evaluation time. Evaluation time occurs after a TXR program is read from beginning to end and parsed. That is to say, when a TXR query is parsed, any embedded @(load ...) forms in it are parsed and constitute part of its syntax tree. They are executed when that query is executed, whenever its execution reaches those load directives. When the load directive processes TXR Pattern Language syntax, it parses the file in its entirety and then executes that file's directives against the current input position. Repeated executions of the same load directive result in repeated processing of the file.

Note: the include directive is useful for loading TXR files which contain Lisp macros which are needed by the parent program. The parent program cannot use load to bring in macros because macros are required during expansion, which takes place prior to evaluation time, whereas load doesn't execute until evaluation time.
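To illustrate the note above, suppose a file (hypothetically named macros.tl) defines a macro my-macro needed by the parent script. The parent must bring it in with include, so that the definition exists by the time the @(do ...) form is expanded:

  @(include "macros.tl")
  @(do (my-macro 1 2 3))

Had @(load "macros.tl") been used instead, the macro definition would not yet be available when the @(do ...) form undergoes expansion.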

Note: the load directive doesn't provide access to the value propagated by a return via the load block.

See also: the load function, and the self-path, stdlib and *load-path* variables in TXR Lisp.


7.7 Output


7.7.1 Introduction

A TXR query may perform custom output. Output is performed by output clauses, which may be embedded anywhere in the query, or placed at the end. An output clause is executed as a side effect of processing the part of the query in which it is embedded, even if that part ultimately fails to find a match. Thus output can be useful for debugging. An output clause specifies that its output goes to a file, pipe, or (by default) standard output. If any output clause is executed whose destination is standard output, TXR makes a note of this, and later, just prior to termination, suppresses the usual printing of the variable bindings or the word false.


7.7.2 The output Directive

The syntax of the @(output) directive is:

  @(output [ destination ] { bool-keyword | keyword value }* )
  .
  .  one or more output directives or lines
  .
  @(end)

If the directive has arguments, then the first one is evaluated. If it is an object other than a keyword symbol, then it specifies the optional destination. Any remaining arguments after the optional destination are the keyword list. If the destination is missing, then the entire argument list is a keyword list.

The destination argument, if present, is treated as a TXR Lisp expression and evaluated. The resulting value is taken as the output destination. The value may be a string which gives the pathname of a file to open for output. Otherwise, the destination must be a stream object.

The keyword list consists of a mixture of Boolean keywords which do not have an argument, or keywords with arguments.

The following Boolean keywords are supported:

:nothrow
The output directive throws an exception if the output destination cannot be opened, unless the :nothrow keyword is present, in which case the situation is treated as a match failure.

Note that since command pipes are processes that report errors asynchronously, a failing command will not throw an immediate exception that can be suppressed with :nothrow. The :nothrow mechanism applies to synchronous errors, such as failure to open the destination file due to insufficient permissions.

:append
This keyword is meaningful for files, specifying append mode: the output is to be added to the end of the file rather than overwriting the file.
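For instance, the following sketch (using a hypothetical file name) appends a line to log.txt, and treats an inability to open that file as a match failure rather than an exception:

  @(output "log.txt" :append :nothrow)
  run complete
  @(end)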

The following value keywords are supported:

:filter
The argument can be a symbol, which specifies a filter to be applied to the variable substitutions occurring within the output clause. The argument can also be a list of filter symbols, which specifies that multiple filters are to be applied, in left-to-right order.

See the sections Output Filtering and The deffilter Directive below.

:into
The argument of :into is a symbol which denotes a variable. The output will go into that variable. If the variable is unbound, it will be created. Otherwise, its contents are overwritten unless the :append keyword is used. If :append is used, then the new content will be appended to the previous content of the variable, after flattening the content to a list, as if by the flatten directive.
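A sketch of :into and :append (here lines is an arbitrarily chosen variable name):

  @(output :into lines)
  first
  @(end)
  @(output :into lines :append)
  second
  @(end)

After these two blocks, the variable lines holds both output lines as a list.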

:named
The argument of :named is a symbol which denotes a variable. The file or pipe stream which is opened for the output is stored in this variable, and is not closed at the end of the output block. This allows a subsequent output block to continue output on the same stream, which is possible using the next two keywords, :continue or :finish. A new binding is established for the variable, even if it already has an existing binding.

:continue
A destination should not be specified if :continue is used. The argument of :continue is an expression, such as a variable name, that evaluates to a stream object. That stream object is used for the output block. At the end of the output block, the stream is flushed, but not closed. A usage example is given in the documentation for the close directive below.

:finish
A destination should not be specified if :finish is used. The argument of :finish is an expression, such as a variable name, that evaluates to a stream object. That stream object is used for the output block. At the end of the output block, the stream is closed. An example is given in the documentation for the close directive below.


7.7.3 The push Directive

The @(push) directive is a variant of @(output) which produces lines of text that are pushed back into the input stream.

Of the keyword arguments supported by @(output), this directive supports only :filter.

After the execution of a @(push), the next pattern matching syntax that is evaluated now faces the material produced by that @(push) followed by the original input. In order to preserve the line numbering of the original input, @(push) adjusts the line number for the synthetic input by subtracting the number of synthetic lines from the original input's line number. For instance if the original input is line 5, and 7 lines are prepended by @(push), then those lines are numbered -2 to 4.

The input-synthesizing effect of @(push) is visible to a subsequent form in exactly those situations in which an input-consuming effect of a pattern matching directive would also be visible. For instance, a @(push) occurring in the body of a @(collect) can produce input that is visible to the next iteration.

The @(push) directive interacts with the parallel matching directives such as @(some). When multiple parallel clauses match, the input position is advanced by the longest match. Lines pushed into the input by @(push) look like negative advancement. If one clause advances in the input, while another one pushes into it, the push will lose to the advancement and its effect will disappear. If two clauses push varying amounts of material, the shorter push will win.


Swap the first two lines if they start with a colon, changing the colon to a period:

  @  (push)
  @  (end)
  @(data capture)
  @(do (tprint capture))




7.7.4 Output Text

Text in an output clause is not matched against anything, but is output verbatim to the destination file, device or command pipe.


7.7.5 Output Variables

Variables occurring in an output clause do not match anything; instead their contents are output.

A variable being output can be any object. If it is of a type other than a list or string, it will be converted to a string as if by the tostring function in TXR Lisp.

A list is converted to a string in a special way: the elements are individually converted to a string and then they are catenated together. The default separator string is a single space; an alternate separator can be specified as an argument in the brace substitution syntax. Empty lists turn into an empty string.

Lists may be output within @(repeat) or @(rep) clauses. Each nesting of these constructs removes one level of nesting from the list variables that it contains.

In an output clause, the @{name number} variable syntax generates a fixed-width field containing the variable's text. The absolute value of the number specifies the field width. For instance -20 and 20 both specify a field width of twenty. If the text is longer than the field, then it overflows the field. If the text is shorter than the field, then it is left-adjusted within that field if the width is specified as a positive number, and right-adjusted if the width is specified as negative.
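For instance, assuming a variable x holding the text "abc", with bracket characters as surrounding literal text:

  [@{x 10}]
  [@{x -10}]

The first line renders as [abc       ], left-adjusted in a ten-character field; the second as [       abc], right-adjusted.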

An output variable may specify a filter which overrides any filter established for the output clause. The syntax for this is @{NAME :filter filterspec}. The filter specification syntax is the same as in the output clause. See Output Filtering below.


7.7.6 Output Variables: Indexing

Additional syntax is supported in output variables that does not appear in pattern-matching variables.

A square bracket index notation may be used to extract elements or ranges from a variable, which works with strings, vectors and lists. Elements are indexed from zero. This notation is only available in brace-enclosed syntax, and looks like this:

@{name [expr]}
Extract the element at the position given by expr.

@{name [expr1..expr2]}
Extract a range of elements from the position given by expr1, up to one position less than the position given by expr2.

If the variable is a list, it is treated as a list substitution, exactly as if it were the value of an unsubscripted list variable. The elements of the list are converted to strings and catenated together with a separator string between them, the default one being a single space.

An alternate separator may be given as a string argument in the brace notation.


  @(bind a ("a" "b" "c" "d"))
  @{a[1..3] "," 10}

The above produces the text "b,c" in a field 10 spaces wide. The [1..3] argument extracts a range of a; the "," argument specifies an alternate separator string, and 10 specifies the field width.


7.7.7 Output Substitutions

The brace syntax has another syntactic and semantic extension in output clauses. In place of the symbol, an expression may appear. The value of that expression is substituted.


 @(bind a "foo")
 @{`@a:` -10}

Here, the quasiliteral expression `@a:` is evaluated, producing the string "foo:". This string is printed right-adjusted in a 10 character field.


7.7.8 The repeat Directive

The repeat directive generates repeated text from a "boilerplate", by taking successive elements from lists. The syntax of repeat is like this:

  @(repeat)
  main clause material, required
  special clauses, optional
  @(end)

repeat has six types of special clauses, any of which may be specified with empty contents, or omitted entirely. They are described below.

repeat takes arguments, also described below.

All of the material in the main clause and optional clauses is examined for the presence of variables. If none of the variables hold lists which contain at least one item, then no output is performed (unless the repeat specifies an @(empty) clause; see below). Otherwise, among those variables which contain nonempty lists, repeat finds the length of the longest list. The length of this list determines the number of repetitions, R.

If the repeat contains only a main clause, then the lines of this clause are output R times. Over the first repetition, all of the variables which, outside of the repeat, contain lists are locally rebound to just their first item. Over the second repetition, all of the list variables are bound to their second item, and so forth. Any variables which hold shorter lists than the longest list eventually end up with empty values over some repetitions.

Example: if the list A holds "1", "2" and "3"; the list B holds "A", "B"; and the variable C holds "X", then

  @(repeat)
  >> @C
  >> @A @B
  @(end)

will produce three repetitions (since there are two lists, the longest of which has three items). The output is:

  >> X
  >> 1 A
  >> X
  >> 2 B
  >> X
  >> 3

The last line has a trailing space, since it is produced by "@A @B", where B has an empty value. Since C is not a list variable, it produces the same value in each repetition.

The special clauses are:

@(single)
If the repeat produces exactly one repetition, then the contents of this clause are processed for that one and only repetition, instead of the main clause or any other clause which would otherwise be processed.

@(first)
The body of this clause specifies an alternative body to be used for the first repetition, instead of the material from the main clause.

@(last)
The body of this clause is used instead of the main clause for the last repetition.

@(empty)
If the repeat produces no repetitions, then the body of this clause is output. If this clause is absent or empty, the repeat produces no output.

@(mod n m)
The forms n and m are Lisp expressions that evaluate to integers. The value of m should be nonzero. The clause denoted this way is active if the repetition modulo m is equal to n. The first repetition is numbered zero. For instance the clause headed by @(mod 0 2) will be used on repetitions 0, 2, 4, 6, ... and @(mod 1 2) will be used on repetitions 1, 3, 5, 7, ...

@(modlast n m)
The meaning of n and m is the same as in @(mod n m), but one more condition is imposed. This clause is used if the repetition modulo m is equal to n, and if it is the last repetition.

The precedence among the clauses which take an iteration is: single > first > modlast > last > mod > main. That is, whenever two or more of these clauses can apply to a repetition, then the leftmost one in this precedence list will be selected. It is possible for all these clauses to be viable for processing the same repetition. If a repeat occurs which has only one repetition, then that repetition is simultaneously the first, only and last repetition. Moreover, it also matches (mod 0 m) and, because it is the last repetition, it matches (modlast 0 m). In this situation, if there is a @(single) clause present, then the repetition shall be processed using that clause. Otherwise, if there is a @(first) clause present, that clause is activated. Failing that, @(modlast) is used if there is such a clause, featuring an n argument of zero. If there isn't, then the @(last) clause is considered, if present. Otherwise, the @(mod) clause is considered if present with an n argument of zero. Otherwise, none of these clauses are present or applicable, and the repetition is processed using the main clause.

The @(empty) clause does not appear in the above precedence list because it is mutually exclusive with respect to the others: it is processed only when there are no iterations, in which case even the main clause isn't active.
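The modulo clauses can be sketched as follows, reusing the list A ("1", "2", "3") from the earlier example:

  @(repeat)
  odd:  @A
  @(mod 0 2)
  even: @A
  @(end)

Repetitions 0 and 2 are processed by the @(mod 0 2) clause and repetition 1 by the main clause, so by the rules above this produces the lines even: 1, odd: 2 and even: 3.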

The @(repeat) clause supports arguments.

  @(repeat [:counter { symbol | (symbol expr) }]
           [:vars ({ symbol | (symbol expr) }*)])

The :counter argument designates a symbol which will behave as an integer variable over the scope of the clauses inside the repeat. The variable provides access to the repetition count, starting at zero, incrementing with each repetition. If the argument is given as (symbol expr) then expr is a Lisp expression whose value is taken as a displacement value which is added to each iteration of the counter. For instance :counter (c 1) specifies a counter c which counts from 1.
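For instance, this sketch numbers the items of the list A ("1", "2", "3") from the earlier example, counting from 1:

  @(repeat :counter (n 1))
  item @n: @A
  @(end)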

The :vars argument specifies a list of variable name symbols symbol or else pairs of the form (symbol init-form) consisting of a variable name and Lisp expression. Historically, the former syntax informed repeat about references to variables contained in Lisp code. This usage is no longer necessary as of TXR 243, since the repeat construct walks Lisp code, identifying all free variables. The latter syntax introduces a new pattern variable binding for symbol over the scope of the repeat construct. The init-form specifies a Lisp expression which is evaluated to produce the binding's value.

The repeat directive then processes the list of variables, selecting from it those which have a binding, either a previously existing binding or the one just introduced. For each selected variable, repeat will assume that the variable occurs in the repeat block and contains a list to be iterated.

The variable binding syntax supported by :vars of the form (symbol init-form) provides a solution for situations when it is necessary to iterate over some list, but that list is the result of an expression, and not stored in any variable. A repeat block iterates only over lists emanating from variables; it does not iterate over lists pulled from arbitrary expressions.

Example: output all file names matching the *.txr pattern in the current directory:

  @(output)
  @(repeat :vars ((name (glob "*.txr"))))
  @name
  @(end)
  @(end)

Prior to TXR 243, the simple variable-binding syntax supported by :vars of the form symbol was needed for situations in which TXR Lisp expressions which referenced variables were embedded in @(repeat) blocks. Variable references embedded in Lisp code were not identified in @(repeat). For instance, the following produced no output, because no variables were found in the repeat body:

  @(bind trigraph ("abc" "def" "ghi"))
  @(output)
  @(repeat)
  @(reverse trigraph)
  @(end)
  @(end)

There is a reference to trigraph but it's inside the (reverse trigraph) Lisp expression that was not processed by repeat. The solution was to mention trigraph in the :vars construct:

  @(bind trigraph ("abc" "def" "ghi"))
  @(output)
  @(repeat :vars (trigraph))
  @(reverse trigraph)
  @(end)
  @(end)

Then the repeat block would iterate over trigraph, producing the output:

  cba
  fed
  ihg
This workaround is no longer required as of TXR 243; the output is produced by the first example, without :vars.


7.7.9 Nested repeat directives

If a repeat clause encloses variables which hold multidimensional lists, those lists require additional nesting levels of repeat (or rep). It is an error to attempt to output a list variable which has not been decimated into primary elements via a repeat construct.

Suppose that a variable X is two-dimensional (contains a list of lists). X must be nested twice in a repeat. The outer repeat will traverse the lists contained in X. The inner repeat will traverse the elements of each of these lists.

A nested repeat may be embedded in any of the clauses of a repeat, not only in the main clause.
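A sketch of the nesting requirement, using a hypothetical two-dimensional variable x:

  @(bind x (("a" "b") ("c" "d")))
  @(output)
  @(repeat)
  row: @(rep)@x @(end)
  @(end)
  @(end)

The outer @(repeat) visits the two sublists; the inner @(rep) visits the elements of each, so the output consists of the lines row: a b and row: c d, each with a trailing space from the main clause of @(rep).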


7.7.10 The rep Directive

The rep directive is similar to repeat. Whereas repeat is line-oriented, rep generates material within a line. It has all the same clauses, but everything is specified within one line:

  @(rep)... main material ... .... special clauses ...@(end)

More than one @(rep) can occur within a line, mixed with other material. A @(rep) can be nested within a @(repeat) or within another @(rep).

Also, @(rep) accepts the same :counter and :vars arguments.


7.7.11 repeat and rep Examples

Example 1: show the list L in parentheses, with spaces between the elements, or the word EMPTY if the list is empty:

  @(rep)@L @(single)(@L)@(first)(@L @(last)@L)@(empty)EMPTY@(end)

Here, the @(empty) clause specifies EMPTY. So if there are no repetitions, the text EMPTY is produced. If there is a single item in the list L, then @(single)(@L) produces that item between parentheses. Otherwise if there are two or more items, the first item is produced with a leading parenthesis followed by a space by @(first)(@L and the last item is produced with a closing parenthesis: @(last)@L). All items in between are emitted with a trailing space by the main clause: @(rep)@L.

Example 2: show the list L like Example 1 above, but the empty list is ().

  (@(rep)@L @(last)@L@(end))

This is simpler. The parentheses are part of the text which surrounds the @(rep) construct, produced unconditionally. If the list L is empty, then @(rep) produces no output, resulting in (). If the list L has two or more items, then each one is produced with a trailing space, except the last, which the @(last) clause produces with no trailing space. If the list has exactly one item, then @(last) applies to it instead of the main clause: it is produced with no trailing space.


7.7.12 The close Directive

The syntax of the close directive is:

  @(close expr)

where expr evaluates to a stream. The close directive can be used to explicitly close streams created using @(output ... :named var) syntax, as an alternative to @(output :finish expr).


Write two lines to "foo.txt" over two output blocks using a single stream:

  @(output "foo.txt" :named foo)
  line 1
  @(end)
  @(output :continue foo)
  line 2
  @(end)
  @(close foo)

The same as above, using :finish rather than :continue so that the stream is closed at the end of the second block:

  @(output "foo.txt" :named foo)
  line 1
  @(end)
  @(output :finish foo)
  line 2
  @(end)


7.7.13 Output Filtering

Often it is necessary to transform the output to preserve its meaning under the convention of a given data format. For instance, if a piece of text contains the characters < or >, then if that text is being substituted into HTML, these should be replaced by &lt; and &gt;. This is what filtering is for. Filtering is applied to the contents of output variables, not to any template text. TXR implements named filters. Built-in filters are named by keywords, given below. User-defined filters are possible, however. See notes on the deffilter directive below.

Instead of a filter name, the syntax (fun name) can be used. This denotes that the function called name is to be used as a filter. This is described in the next section Function Filters below.

Built-in filters named by keywords:

:tohtml
Filter text to HTML, representing special characters using HTML ampersand sequences. For instance > is replaced by &gt;.

:tohtml*
Filter text to HTML, representing special characters using HTML ampersand sequences. Unlike :tohtml, this filter doesn't treat the single and double quote characters. It is not suitable for preparing HTML fragments which end up inserted into HTML tag attributes.

:fromhtml
Filter text with HTML codes into text in which the codes are replaced by the corresponding characters. For instance &gt; is replaced by >.

:upcase
Convert the 26 lowercase letters of the English alphabet to uppercase.

:downcase
Convert the 26 uppercase letters of the English alphabet to lowercase.

:frompercent
Decode percent-encoded text. Character triplets consisting of the % character followed by a pair of hexadecimal digits (case insensitive) are converted to bytes having the value represented by the hexadecimal digits (most significant nybble first). Sequences of one or more such bytes are treated as UTF-8 data and decoded to characters.

:topercent
Convert to percent encoding according to RFC 3986. The text is first converted to UTF-8 bytes. The bytes are then converted back to text as follows. Bytes in the range 0 to 32, and 127 to 255 (note: including the ASCII DEL), bytes whose values correspond to ASCII characters which are listed by RFC 3986 as being in the "reserved set", and the byte value corresponding to the ASCII % character are encoded as a three-character sequence consisting of the % character followed by two hexadecimal digits derived from the byte value (most significant nybble first, upper case). All other bytes are converted directly to characters of the same value without any such encoding.

:fromurl
Decode from URL encoding, which is like percent encoding, except that if the unencoded + character occurs, it is decoded to a space character. The %20 sequence still decodes to space, and %2B to the + character.

:tourl
Encode to URL encoding, which is like percent encoding except that a space maps to + rather than %20. The + character, being in the reserved set, encodes to %2B.

:frombase64
Decode from the Base 64 encoding described in RFC 4648, section 5.

:tobase64
Encode to the Base 64 encoding described in RFC 4648, section 5.

:frombase64url
Decode from the Base 64 encoding described in RFC 4648, section 6. This uses the URL and filename safe alphabet, in which the + (plus) and / (slash) characters used in regular Base 64 are respectively replaced with - (minus) and _ (underscore).

:tobase64url
Encode to the Base 64 encoding described in RFC 4648, section 6. See :frombase64url above.

:tonumber
Converts strings to numbers. Strings that contain a period, e or E are converted to floating point as if by the Lisp function flo-str. Otherwise they are converted to integer as if using int-str with a radix of 10. Non-numeric junk results in the object nil.

:toint
Converts strings to integers as if using int-str with a radix of 10. Non-numeric junk results in the object nil.

:tofloat
Converts strings to floating-point values as if using the function flo-str. Non-numeric junk results in the object nil.

:tohex
Converts strings to integers as if using int-str with a radix of 16. Non-numeric junk results in the object nil.
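To illustrate a few of the encoding rules above:

  @(bind s "a b/c")
  @(output)
  @{s :filter :topercent}
  @{s :filter :tourl}
  @(end)

The space (byte 32) and the / character (in the RFC 3986 reserved set) are encoded, so the first line is a%20b%2Fc; under URL encoding the space instead becomes +, giving a+b%2Fc.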


To escape HTML characters in all variable substitutions occurring in an output clause, specify :filter :tohtml in the directive:

  @(output :filter :tohtml)

To filter an individual variable, add the syntax to the variable spec:

  @{x :filter :tohtml}

Multiple filters can be applied at the same time. For instance:

  @{x :filter (:upcase :tohtml)}

This will fold the contents of x to uppercase, and then encode any special characters into HTML. Beware of combinations that do not make sense. For instance, suppose the original text is HTML, containing codes like &quot;. The compound filter (:upcase :fromhtml) will not work because &quot; will turn to &QUOT;, which is no longer recognized by the :fromhtml filter, since the entity names in HTML codes are case-sensitive.

Capture some numeric variables and convert to numbers:

  @date @time @temperature @pressure
  @(filter :tofloat temperature pressure)
  @;; temperature and pressure can now be used in calculations


7.7.14 Function Filters

A function can be used as a filter. For this to be possible, the function must conform to certain rules:

The function must take two special arguments, which may be followed by additional arguments.
When the function is called, the first argument will be bound to a string, and the second argument will be unbound. The function must produce a value by binding it to the second argument. If the filter is to be used as the final filter in a chain, it must produce a string.

For instance, the following is a valid filter function:

  @(define foo_to_bar (in out))
  @  (next :string in)
  @  (cases)
  foo
  @    (bind out "bar")
  @  (or)
  @    (bind out in)
  @  (end)
  @(end)

This function binds the out parameter to "bar" if the in parameter is "foo", otherwise it binds the out parameter to a copy of the in parameter. This is a simple filter.

To use the filter, use the syntax (:fun foo_to_bar) in place of a filter name. For instance in the bind directive:

  @(bind "foo" "bar" :lfilt (:fun foo_to_bar))

The above should succeed since the left side is filtered from "foo" to "bar", so that there is a match.

Function filters can be used in a chain:

  @(output :filter (:downcase (:fun foo_to_bar) :upcase))

Here is a split function which takes an extra argument which specifies the separator:

  @(define split (in out sep))
  @  (next :list in)
  @  (coll)@(maybe)@token@sep@(or)@token@(end)@(end)
  @  (bind out token)
  @(end)

Note that this function produces a list rather than a string: it separates the argument in into tokens according to the separator text carried in the variable sep.

Here is another function, join, which catenates a list:

  @(define join (in out sep))
  @  (output :into out)
  @  (rep)@in@sep@(last)@in@(end)
  @  (end)
  @(end)

Now here are these two being used in a chain:

  @(bind text "how,are,you")
  @(output :filter ((:fun split ",") (:fun join "-")))
  @text
  @(end)

The output of this query is:

  how-are-you
When the filter invokes a function, it generates the first two arguments internally to pass in the input value and capture the output. The remaining arguments from the (:fun ...) construct are also passed to the function. Thus the string objects "," and "-" are passed as the sep argument to split and join.

Note that split puts out a list, which join accepts. So the overall filter chain operates on a string: a string goes into split, and a string comes out of join.


7.7.15 The deffilter Directive

The deffilter directive allows a query to define a custom filter, which can then be used in output clauses to transform substituted data.

The syntax of deffilter is illustrated in this example:

 @(deffilter rot13
    ("a" "n")
    ("b" "o")
    ("c" "p")
    ("d" "q")
    ("e" "r")
    ("f" "s")
    ("g" "t")
    ("h" "u")
    ("i" "v")
    ("j" "w")
    ("k" "x")
    ("l" "y")
    ("m" "z")
    ("n" "a")
    ("o" "b")
    ("p" "c")
    ("q" "d")
    ("r" "e")
    ("s" "f")
    ("t" "g")
    ("u" "h")
    ("v" "i")
    ("w" "j")
    ("x" "k")
    ("y" "l")
    ("z" "m"))
 @(output :filter rot13)
 hey there!
 @(end)

The output of this query is:

  url gurer!

The deffilter symbol must be followed by the name of the filter to be defined, followed by bind expressions which evaluate to lists of strings. Each list must be at least two elements long and specifies one or more texts which are mapped to a replacement text. For instance, the following specifies a telephone keypad mapping from uppercase letters to digits.

  @(deffilter alpha_to_phone ("E" "0")
                             ("J" "N" "Q" "1")
                             ("R" "W" "X" "2")
                             ("D" "S" "Y" "3")
                             ("F" "T" "4")
                             ("A" "M" "5")
                             ("C" "I" "V" "6")
                             ("B" "K" "U" "7")
                             ("L" "O" "P" "8")
                             ("G" "H" "Z" "9"))

The bind expressions may be quasiliterals which refer to existing bindings, as in the following example, which assumes that variables a, b and d have string bindings:

  @(deffilter foo (`@a` `@b`) ("c" `->@d`))

  @(bind x ("from" "to"))
  @(bind y ("---" "+++"))
  @(deffilter sub x y)

The last deffilter has the same effect as the @(deffilter sub ("from" "to") ("---" "+++")) directive.

Filtering works using a longest match algorithm. The input is scanned from left to right, and the longest piece of text is identified at every character position which matches a string on the left-hand side, and that text is replaced with its associated replacement text. The scanning then continues at the first character after the matched text.

If none of the strings matches at a given character position, then that character is passed through the filter untranslated, and the scan continues at the next character in the input.
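For instance, given this hypothetical filter:

  @(deffilter demo ("a" "Y") ("ab" "X"))

the input "aab" filters to "YX": at the first position only "a" matches, yielding Y; scanning resumes at the second character, where "ab" is the longest match, yielding X.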

Filtering is not in-place but rather instantiates a new text, and so replacement text is not re-scanned for more replacements.

If a filter definition accidentally contains two or more repetitions of the same left-hand string with different right-hand translations, the later ones take precedence. No warning is issued.


7.7.16 The filter Directive

The syntax of the filter directive is:

  @(filter FILTER { VAR }+ )

A filter is specified, followed by one or more variables whose values are filtered and stored back into each variable.

Example: convert a, b, and c to uppercase and HTML encode:

  @(filter (:upcase :tohtml) a b c)


7.8 Exceptions


7.8.1 Introduction

The exceptions mechanism in TXR is another disciplined form of nonlocal transfer, in addition to the blocks mechanism (see Blocks above). Like blocks, exceptions provide a construct which serves as the target for a dynamic exit. Both blocks and exceptions can be used to bail out of deep nesting when some condition occurs. However, exceptions are a more elaborate mechanism. Exceptions are useful for error handling, and TXR in fact maps certain error situations to exception control transfers. However, exceptions are not inherently an error-handling mechanism; they are a structured dynamic control transfer mechanism, one of whose applications is error handling.

An exception control transfer (simply called an exception) is always identified by a symbol, which is its type. Types are organized in a subtype-supertype hierarchy. For instance, the file-error exception type is a subtype of the error type. This means that a file error is a kind of error. An exception handling block which catches exceptions of type error will catch exceptions of type file-error, but a block which catches file-error will not catch all exceptions of type error. A query-error is a kind of error, but not a kind of file-error. The symbol t is the supertype of every type: every exception type is considered to be a kind of t. (Mnemonic: t stands for type, as in any type).

Exceptions are handled using @(catch) clauses within a @(try) directive.

In addition to being useful for exception handling, the @(try) directive also provides unwind protection by means of a @(finally) clause, which specifies query material to be executed unconditionally when the try clause terminates, no matter how it terminates.


7.8.2 The try Directive

The general syntax of the try directive is

  @(try)
  ... main clause, required ...
  ... optional catch clauses ...
  ... optional finally clause
  @(end)

A catch clause looks like:

  @(catch TYPE [ PARAMETERS ])

and also this simple form:

  @(catch)

which catches all exceptions, and is equivalent to @(catch t).

A finally clause looks like:

  @(finally)

The main clause may not be empty, but the catch and finally clauses may be.

A try clause is surrounded by an implicit anonymous block (see Blocks section above). So for instance, the following is a no-op (an operation with no effect, other than successful execution):

  @(try)
  @(accept)
  @(end)

The @(accept) causes a successful termination of the implicit anonymous block. Execution resumes with query lines or directives which follow, if any.

try clauses and blocks interact. For instance, an accept from within a try clause invokes a finally.

 @(block foo)
 @  (try)
 @    (accept foo)
 @  (finally)
 @     (output)
 @     (end)
 @  (end)

How this works: the try block's main clause is @(accept foo). This causes the enclosing block named foo to terminate, as a successful match. Since the try is nested within this block, it too must terminate in order for the block to terminate. But the try has a finally clause, which executes unconditionally, no matter how the try block terminates. The finally clause performs some output, which is seen.

Note that finally interacts with accept in subtle ways not revealed in this example; they are documented in the description of accept under the block directive documentation.


7.8.3 The finally clause

A try directive can terminate in one of three ways. The main clause may match successfully, and possibly yield some new variable bindings. The main clause may fail to match. Or the main clause may be terminated by a nonlocal control transfer, like an exception being thrown or a block return (like the block foo example in the previous section).

No matter how the try clause terminates, the finally clause is processed.

The finally clause is itself a query which binds variables, which leads to questions: what happens to such variables? What if the finally block fails as a query? As well as: what if a finally clause itself initiates a control transfer? Answers follow.

Firstly, a finally clause will contribute variable bindings only if the main clause terminates normally (either as a successful or failed match). If the main clause of the try block successfully matches, then the finally block continues matching at the next position in the data, and contributes bindings. If the main clause fails, then the finally block tries to match at the same position where the main clause failed.

The overall try directive succeeds as a match if either the main clause or the finally clause succeed. If both fail, then the try directive is a failed match.



 @(try)
 @a
 @(finally)
 @b
 @(end)
 @c

In this example, the main clause of the try captures line "1" of the data as variable a, then the finally clause captures "2" as b, and then the query continues with the @c line after the try block, so that c captures "3".


 @(try)
 hello @a
 @(finally)
 @b
 @(end)
 @c

In this example, the main clause of the try fails to match, because the input is not prefixed with "hello ". However, the finally clause matches, binding b to "1". This means that the try block is a successful match, and so processing continues with @c which captures "2".

When finally clauses are processed during a nonlocal return, they have no externally visible effect if they do not bind variables. However, their execution makes itself known if they perform side effects, such as output.

A finally clause guards only the main clause and the catch clauses. It does not guard itself. Once the finally clause is executing, the try block is no longer guarded. This means if a nonlocal transfer, such as a block accept or exception, is initiated within the finally clause, it will not re-execute the finally clause. The finally clause is simply abandoned.

The disestablishment of blocks and try clauses is properly interleaved with the execution of finally clauses. This means that all surrounding exit points are visible in a finally clause, even if the finally clause is being invoked as part of a transfer to a distant exit point. The finally clause can make a control transfer to an exit point which is more near than the original one, thereby "hijacking" the control transfer. Also, the anonymous block established by the try directive is visible in the finally clause.


  @(try)
  @  (try)
  @    (next "nonexistent-file")
  @  (finally)
  @    (accept)
  @  (end)
  @(catch file-error)
  @  (output)
  file error caught
  @  (end)
  @(end)

In this example, the @(next) directive throws an exception of type file-error, because the given file does not exist. The exit point for this exception is the @(catch file-error) clause in the outermost try block. The inner block is not eligible because it contains no catch clauses at all. However, the inner try block has a finally clause, and so during the processing of this exception which is headed for @(catch file-error), the finally clause performs an anonymous accept. The exit point for that accept is the anonymous block surrounding the inner try. So the original transfer to the catch clause is thereby abandoned. The inner try terminates successfully due to the accept, and since it constitutes the main clause of the outer try, that also terminates successfully. The "file error caught" message is never printed.


7.8.4 catch clauses

catch clauses establish their associated try blocks as potential exit points for exception-induced control transfers (called "throws").

A catch clause specifies an optional list of symbols which represent the exception types which it catches. The catch clause will catch exceptions which are a subtype of any one of those exception types.

If a try block has more than one catch clause which can match a given exception, the first one will be invoked.
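The clause-selection rule described above can be modeled in a few lines of Python. This is a hypothetical sketch, not TXR internals; select_catch and the subtype_of callback are invented names, with subtype_of standing in for the exception-type subtype test:

```python
# Model of catch-clause selection: scan the clauses in order; the first
# clause handling a supertype of the thrown exception's type wins.
def select_catch(catch_clauses, exc_type, subtype_of):
    # catch_clauses: list of (handled_types, handler) pairs.
    for types, handler in catch_clauses:
        if any(subtype_of(exc_type, t) for t in types):
            return handler
    return None  # no clause matches: the exception propagates outward

clauses = [(['file-error'], 'h1'), (['error'], 'h2')]
# Toy subtype relation: file-error and query-error are kinds of error.
sub = lambda a, b: a == b or (a, b) in {('file-error', 'error'),
                                        ('query-error', 'error')}
print(select_catch(clauses, 'file-error', sub))   # h1 (first matching clause)
print(select_catch(clauses, 'query-error', sub))  # h2
```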

When a catch is invoked, it is understood that the main clause did not terminate normally, and so the main clause could not have produced any bindings.

catch clauses are processed prior to finally.

If a catch clause itself throws an exception, that exception cannot be caught by that same clause or its siblings in the same try block. The catch clauses of that block are no longer visible at that point. Nevertheless, the catch clauses are still protected by the finally block. If a catch clause throws, or otherwise terminates, the finally block is still processed.

If a finally block throws an exception, then it is simply aborted; the remaining directives in that block are not processed.

So the success or failure of the try block depends on the behavior of the catch clause or the finally clause, if there is one. If either of them succeed, then the try block is considered a successful match.


 @(try)
 @  (next "nonexistent-file")
 @  x
 @  (catch file-error)
 @a
 @  (finally)
 @b
 @  (end)
 @c

Here, the try block's main clause is terminated abruptly by a file-error exception from the @(next) directive. This is handled by the catch clause, which binds variable a to the input line "1". Then the finally clause executes, binding b to "2". The try block then terminates successfully, and so @c takes "3".


7.8.5 catch Clauses with Parameters

A catch clause may have parameters following the type name, like this:

  @(catch pair (a b))

To write a catch-all with parameters, explicitly write the master supertype t:

  @(catch t (arg ...))

Parameters are useful in conjunction with throw. The built-in error exceptions carry one argument, which is a string containing the error message. Using throw, arbitrary parameters can be passed from the throw site to the catch site.


7.8.6 The throw Directive

The throw directive generates an exception. A type must be specified, followed by optional arguments, which are bind expressions. For example,

  @(throw pair "a" `@file.txt`)

throws an exception of type pair, with two arguments, being "a" and the expansion of the quasiliteral `@file.txt`.

The selection of the target catch is performed purely using the type name; the parameters are not involved in the selection.

Binding takes place between the arguments given in throw and the target catch.

If any catch parameter, for which a throw argument is given, is a bound variable, it has to be identical to the argument, otherwise the catch fails. (Control still passes to the catch, but the catch is a failed match).

 @(bind a "apple")
 @(try)
 @(throw e "banana")
 @(catch e (a))
 @(end)
 [query fails]

If any argument is an unbound variable, the corresponding parameter in the catch is left alone: if it is an unbound variable, it remains unbound, and if it is bound, it stays as is.

 @(try)
 @(throw e "honda" unbound)
 @(catch e (car1 car2))
 @car1 @car2
 @(end)
 honda toyota

If a catch has fewer parameters than there are throw arguments, the excess arguments are ignored:

 @(try)
 @(throw e "banana" "apple" "pear")
 @(catch e (fruit))
 @(end)

If a catch has more parameters than there are throw arguments, the excess parameters are left alone. They may be bound or unbound variables.

 @(try)
 @(throw e "honda")
 @(catch e (car1 car2))
 @car1 @car2
 @(end)
 honda toyota

A throw argument passing a value to a catch parameter which is unbound causes that parameter to be bound to that value.
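The parameter-binding rules above can be summarized in a Python sketch. This is a hypothetical model for illustration, not TXR's implementation; catch_bind, the bindings dictionary and the UNBOUND sentinel are invented names:

```python
# Model of matching throw arguments against catch parameters. Bindings are
# a dict; UNBOUND marks an unbound variable passed as a throw argument.
UNBOUND = object()

def catch_bind(bindings, params, args):
    # Excess arguments are ignored; excess parameters are left alone,
    # because zip stops at the shorter sequence.
    for p, a in zip(params, args):
        if a is UNBOUND:
            continue              # unbound argument: parameter untouched
        if p in bindings:
            if bindings[p] != a:  # bound parameter must equal the argument
                return None       # catch is taken, but as a failed match
        else:
            bindings[p] = a       # unbound parameter is bound to the value
    return bindings

print(catch_bind({'a': 'apple'}, ['a'], ['banana']))         # None (fails)
print(catch_bind({}, ['car1', 'car2'], ['honda', UNBOUND]))  # {'car1': 'honda'}
```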

throw arguments are evaluated in the context of the throw, and the bindings which are available there. Consideration of what parameters are bound is done in the context of the catch.

 @(bind c "c")
 @(try)
 @(forget c)
 @(bind (a c) ("a" "lc"))
 @(throw e a c)
 @(catch e (b a))
 @(end)

In the above example, c has a top-level binding to the string "c", but then becomes unbound via forget within the try construct, and rebound to the value "lc". Since the try construct is terminated by a throw, these modifications of the binding environment are discarded. Hence, at the end of the query, variable c ends up bound to the original value "c". The throw still takes place within the scope of the bindings set up by the try clause, so the values of a and c that are thrown are "a" and "lc". However, at the catch site, variable a does not have a binding. At that point, the binding to "a" established in the try has disappeared already. Being unbound, the catch parameter a can take whatever value the corresponding throw argument provides, so it ends up with "lc".

There is a horizontal form of throw. For instance:

  abc@(throw e 1)

throws exception e if abc matches.

If throw is used to generate an exception derived from type error and that exception is not handled, TXR will issue diagnostics on the *stderr* stream and terminate. If an exception derived from warning is not handled, TXR will generate diagnostics on the *stderr* stream, after which control returns to the throw directive, and proceeds with the next directive. If an exception derived from neither error nor warning is not handled, control silently returns to the throw directive and proceeds with the next directive.


7.8.7 The defex Directive

The defex directive allows the query writer to invent custom exception types, which are arranged in a type hierarchy (meaning that some exception types are considered subtypes of other types).

Subtyping means that if an exception type B is a subtype of A, then every exception of type B is also considered to be of type A. So a catch for type A will also catch exceptions of type B. Every type is a supertype of itself: an A is a kind of A. This implies that every type is a subtype of itself also. Furthermore, every type is a subtype of the type t, which has no supertype other than itself. Type nil is a subtype of every type, including itself. The subtyping relationship is transitive also. If A is a subtype of B, and B is a subtype of C, then A is a subtype of C.

defex may be invoked with no arguments, in which case it does nothing:

  @(defex)
It may be invoked with one argument, which must be a symbol. This introduces a new exception type. Strictly speaking, such an introduction is not necessary; any symbol may be used as an exception type without being introduced by @(defex):

  @(defex a)

Therefore, this also does nothing, other than document the intent to use a as an exception.

If two or more argument symbols are given, the symbols are all introduced as types, engaged in a subtype-supertype relationship from left to right. That is to say, the first (leftmost) symbol is a subtype of the next one, which is a subtype of the next one and so on. The last symbol, if it had not been already defined as a subtype of some type, becomes a direct subtype of the master supertype t. Example:

  @(defex d e)
  @(defex a b c d)

The first directive defines d as a subtype of e, and e as a subtype of t. The second defines a as a subtype of b, b as a subtype of c, and c as a subtype of d, which is already defined as a subtype of e. Thus a is now a subtype of e. The above can be condensed to:

  @(defex a b c d e)
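The subtype chains built by defex, and the subtype test used to select a catch, can be sketched in Python. This is a hypothetical model of the described semantics, not TXR internals; defex, subtype_p and the supertype table are names invented for illustration:

```python
# Model of the exception-type hierarchy: each type has one immediate
# supertype; t is the master supertype of everything.
supertype = {}  # type symbol -> immediate supertype

def defex(*types):
    # Each symbol becomes a subtype of the one to its right; the last one,
    # if not already registered, becomes a direct subtype of t.
    for sub, sup in zip(types, types[1:]):
        supertype[sub] = sup
    if types and types[-1] not in supertype:
        supertype[types[-1]] = 't'

def subtype_p(a, b):
    # Every type is a subtype of itself; the relation is transitive.
    while True:
        if a == b or b == 't':
            return True
        if a == 't':
            return False
        a = supertype.get(a, 't')

defex('d', 'e')
defex('a', 'b', 'c', 'd')
print(subtype_p('a', 'e'))  # True: a -> b -> c -> d -> e
```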


 @(defex gorilla ape primate)
 @(defex monkey primate)
 @(defex human primate)
 gorilla @name
 @(throw gorilla name)
 monkey @name
 @(throw monkey name)
 human @name
 @(throw human name)
 @(catch primate (name))
 @kind @name
 we have a primate @name of kind @kind
 gorilla joe
 human bob
 monkey alice
 we have a primate joe of kind gorilla
 we have a primate bob of kind human
 we have a primate alice of kind monkey

Exception types have a pervasive scope. Once a type relationship is introduced, it is visible everywhere. Moreover, the defex directive is destructive, meaning that the supertype of a type can be redefined. This is necessary so that something like the following works right:

  @(defex gorilla ape)
  @(defex ape primate)

These directives are evaluated in sequence. So after the first one, the ape type has the type t as its immediate supertype. But in the second directive, ape appears again, and is assigned the primate supertype, while retaining gorilla as a subtype. This situation could be diagnosed as an error, forcing the programmer to reorder the statements, but instead TXR obliges. However, there are limitations. It is an error to define a subtype-supertype relationship between two types if they are already connected by such a relationship, directly or transitively. So the following definitions are in error:

  @(defex a b)
  @(defex b c)
  @(defex a c)@# error: a is already a subtype of c, through b

  @(defex x y)
  @(defex y x)@# error: circularity; y is already a supertype of x.


7.8.8 The assert Directive

The assert directive requires the remaining query or subquery which follows it to match. If the remainder fails to match, the assert directive throws an exception. If the directive is simply

  @(assert)

then it throws an assertion of type assert, which is a subtype of error. The assert directive also takes arguments similar to the throw directive: an exception symbol and additional arguments which are bind expressions, and may be unbound variables. The following assert directive, if it triggers, will throw an exception of type foo, with arguments 1 and "2":

  @(assert foo 1 "2")


  @(collect)
  Important Header
  ----------------
  @(assert)
  Foo: @a, @b
  @(end)

Without the assertion in place, if the Foo: @a, @b part does not match, then the entire interior of the @(collect) clause fails, and the collect continues searching for another match.

With the assertion in place, if the text "Important Header" and its underline match, then the remainder of the collect body must match, otherwise an exception is thrown. Now the program will not silently skip over any Important Header sections due to a problem in its matching logic. This is particularly useful when the matching is varied with numerous cases, and they must all be handled.

There is a horizontal assert directive also. For instance:

  abc@(assert)d@x

asserts that if the prefix "abc" is matched, then it must be followed by a successful match for "d@x", or else an exception is thrown.

If the exception is not handled, and is derived from error then TXR issues diagnostics on the *stderr* stream and terminates. If the exception is derived from warning and not handled, TXR issues a diagnostic on *stderr* after which control returns to the assert directive. Control silently returns to the assert directive if an exception of any other kind is not handled.

When control returns to assert due to an unhandled exception, it behaves like a failed match, similarly to the require directive.



8 TXR LISP

The TXR language contains an embedded Lisp dialect called TXR Lisp.

This language is exposed in TXR in a number of ways.

In any situation that calls for an expression, a Lisp expression can be used, if it is preceded by the @ character. The Lisp expression is evaluated and its value becomes the value of that expression. Thus, TXR directives are embedded in literal text using @, and Lisp expressions are embedded in directives using @ also.

Furthermore, certain directives evaluate Lisp expressions without requiring @. These are @(do), @(require), @(assert), @(if) and @(next).

TXR Lisp code can be placed into files. On the command line, TXR treats files with a ".tl", ".tlo" or ".tlo.gz" suffix as TXR Lisp source or compiled code, and the @(load) directive does also.

TXR also provides an interactive listener for Lisp evaluation.

Lastly, TXR Lisp expressions can be evaluated via the command line, using the -e and -p options.


Bind variable a to the integer 4:

  @(bind a @(+ 2 2))

Bind variable b to the standard input stream. Note that @ is not required on a Lisp variable:

  @(bind b *stdin*)

Define several Lisp functions inside @(do):

  @(do
    (defun add (x y) (+ x y))

    (defun occurs (item list)
      (cond ((null list) nil)
            ((atom list) (eql item list))
            (t (or (eq (first list) item)
                   (occurs item (rest list)))))))

Trigger a failure unless previously bound variable answer is greater than 42:

  @(require (> (int-str answer) 42))


8.1 Overview

TXR Lisp is a small and simple dialect, like Scheme, but much more similar to Common Lisp than Scheme. It has separate value and function binding namespaces, like Common Lisp (and thus is a Lisp-2 type dialect), and represents Boolean true and false with the symbols t and nil (note the case sensitivity of identifiers denoting symbols!). Furthermore, the symbol nil is also the empty list, which terminates nonempty lists.

TXR Lisp has lexically scoped local variables and dynamic global variables, similarly to Common Lisp, including the convention that defvar marks symbols for dynamic binding in local scopes. Lexical closures are supported. TXR Lisp also supports global lexical variables via defvarl.

Functions are lexically scoped in TXR Lisp; they can be defined in the pervasive global environment using defun or in local scopes using flet and labels.


8.2 Additional Syntax

Much of the TXR Lisp syntax has been introduced in the previous sections of the manual, since directive forms are based on it. There is some additional syntax that is useful in TXR Lisp programming.


8.2.1 Symbol Tokens

The symbol token in TXR Lisp, called a lident (Lisp identifier), has a similar syntax to the bident (braced identifier) in the TXR pattern language. It may consist of all the same characters, as well as the / (slash) character, which may not be used in a bident. Thus a lident may consist of these characters, in addition to letters, numbers and underscores:

 ! $ % & * + - < = > ? \ ~ /

and may not look like a number.

A lident may also include all of the Unicode characters which are permitted in a bident.

The one character which is allowed in a lident but not in a bident is / (forward slash).

A lone / is a valid lident and consequently a symbol token in TXR Lisp. The token /abc/ is also a symbol, and, unlike in a braced expression, is not a regular expression. In TXR Lisp expressions, regular expressions are written with a leading #.


8.2.2 Package Prefixes

If a symbol name contains a colon, the lident characters, if any, before that colon constitute the package prefix.

For example, the syntax foo:bar denotes the symbol bar in the foo package.

It is a syntax error to read a symbol whose package doesn't exist.

If the package exists, but the symbol name doesn't exist in that package, then the symbol is interned in that package.

If the package name is an empty string (the colon is preceded by nothing), the package is understood to be the keyword package. The symbol is interned in that package.

The syntax :test denotes the symbol test in the keyword package, the same as keyword:test.

Symbols in the keyword package are self-evaluating. This means that when a keyword symbol is evaluated as a form, the value of that form is the keyword symbol itself. Exactly two non-keyword symbols also have this special self-evaluating behavior: the symbols t and nil in the user package, whose fully qualified names are usr:t and usr:nil.

The syntax @foo:bar denotes the meta prefix @ being applied to the foo:bar symbol, not to a symbol in the @foo package.

The syntax #:bar denotes an uninterned symbol named bar, described in the next section.

Dialect Note:

In ANSI Common Lisp, the foo:bar syntax does not intern the symbol bar in the foo package; the symbol must exist and be an exported symbol, or else the syntax is erroneous. In ANSI Common Lisp, the syntax foo::bar does intern bar in the foo package. TXR's package system has no double-colon syntax, and lacks the concept of exported symbols.


8.2.3 Uninterned Symbols

Uninterned symbols are written with the #: prefix, followed by zero or more lident characters. When an uninterned symbol is read, a new, unique symbol is constructed, with the specified name. Even if two uninterned symbols have the same name, they are different objects. The make-sym and gensym functions produce uninterned symbols.

"Uninterned" means "not entered into a package". Interning refers to a process which combines package lookup with symbol creation, which ensures that multiple occurrences of a symbol name in written syntax are all converted to the same object: the first occurrence creates the symbol and associates it with its name in a package. Subsequent occurrences do not create a new symbol, but retrieve the existing one.
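The distinction between interning and uninterned symbol creation can be sketched in Python. This is a simplified, hypothetical model; the Sym class, the package dictionary and these function names are invented for illustration (TXR's intern, make-sym and gensym operate on real packages):

```python
# Model of interning: a package maps names to symbol objects, so repeated
# occurrences of a name yield the same object.
class Sym:
    def __init__(self, name):
        self.name = name

package = {}  # name -> symbol object

def intern(name):
    # First occurrence creates the symbol; later occurrences retrieve it.
    if name not in package:
        package[name] = Sym(name)
    return package[name]

def make_sym(name):
    # Uninterned: never entered into any package, so always a fresh object.
    return Sym(name)

print(intern('foo') is intern('foo'))      # True: same object
print(make_sym('foo') is make_sym('foo'))  # False: distinct objects, same name
```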


8.2.4 Meta-Atoms and Meta-Expressions

An expression may be preceded by the @ (at sign) character. If the expression is an atom, then this is a meta-atom, otherwise it is a meta-expression.

When the atom is a symbol, this is also called a meta-symbol and in situations when such a symbol behaves like a variable, it is also referred to as a meta-variable.

When the atom is an integer, the meta-atom expression is called a meta-number.

Meta-atom and meta-expression expressions have no evaluation semantics; evaluating them throws an exception. They play a syntactic role in the op operator, which makes use of meta-variables and meta-numbers, and in structural pattern matching, which uses meta-variables as pattern variables and whose operator vocabulary is based on meta-expressions.

Meta-expressions also appear in the quasiliteral notation.

In other situations, application code may assign meaning to meta syntax as the programmer sees fit.

Meta syntax is defined as a shorthand notation, as follows:

If X is the syntax of an atom, such as a symbol, string or vector, then @X is a shorthand for the expression (sys:var X). Here, sys:var refers to the var symbol in the sys package.

If X is a compound expression, either (...) or [...], then @X is a shorthand for the expression (sys:expr X).
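The two shorthand rules above amount to a single dispatch on the shape of the following expression. The Python sketch below is a hypothetical reader model, not TXR's reader; meta is an invented name, and tuples stand in for Lisp compound expressions:

```python
# Model of the @ meta shorthand: @atom -> (sys:var atom);
# @(...) or @[...] -> (sys:expr form).
def meta(form):
    if isinstance(form, tuple):       # compound expression
        return ('sys:expr', form)
    return ('sys:var', form)          # atom: symbol, string, vector, ...

print(meta('x'))         # ('sys:var', 'x')
print(meta(('a', 'b')))  # ('sys:expr', ('a', 'b'))
```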

The behavior of @ followed by the syntax of a floating-point constant introduced by a leading decimal point, not preceded by digits, is unspecified. Examples of this are @.123 and @.123E+5.

The behavior of @ followed by the syntax of a floating-point expression in E notation, which lacks a decimal point, is also unspecified. An example of this is @12E5.

It is a syntax error for @ to be followed by what appears to be a floating-point constant consisting of a decimal point flanked by digits on both sides. For instance @1.2 is rejected.

A meta-expression followed by a period, and the syntax of another object is otherwise interpreted as a referencing dot expression. For instance @1.E3 denotes (qref @1 E3) which, in turn, denotes (qref (sys:var 1) E3), even though the unprefixed character sequence 1.E3 is otherwise a floating-point constant.


8.2.5 Consing Dot

Unlike other major Lisp dialects, TXR Lisp allows a consing dot with no forms preceding it. This construct simply denotes the form which follows the dot. That is to say, the parser implements the following transformation:

  (. expr) -> expr

This is convenient in writing function argument lists that only take variable arguments. Instead of the syntax:

  (defun fun args ...)

the following syntax can be used:

  (defun fun (. args) ...)

When a lambda form is printed, it is printed in the following style.

  (lambda nil ...) -> (lambda () ...)
  (lambda sym ...) -> (lambda (. sym) ...)
  (lambda (sym) ...) -> (lambda (sym) ...)

In no other circumstances is nil printed as (), or an atom sym as (. sym).

This notation is implemented for the square brackets, according to this transformation:

  [. expr] -> (dwim . expr)

This is useful in Structural Pattern Matching, allowing a pattern like

  [. @args]

to match a dwim expression and capture all of its arguments in a variable, without having to resort to the internal notation:

  (dwim . @args)
Compatibility Note: support for [. expr] was introduced in TXR 282. Older versions do not read the syntax, but do print (dwim . @var) as [. @var] which is then unreadable in those versions, breaking read-print consistency.


8.2.6 Referencing Dot

A dot token which is flanked by expressions on both sides, without any intervening whitespace, is the referencing dot, and not the consing dot. The referencing dot is a syntactic sugar which translates to the qref syntax ("quoted ref"). When evaluated as a form, this syntax denotes structure access; see Structures. However, it is possible to put this syntax to use for other purposes, in other contexts.

  ;; a, b and c may be almost any expressions
  a.b           <-->  (qref a b)
  a.b.c         <-->  (qref a b c)
  a.(qref b c)  <-->  (qref a b c)
  (qref a b).c  <-->  (qref (qref a b) c)

That is to say, this dot operator constructs a qref expression out of its left and right arguments. If the right argument of the dot is already a qref expression (whether produced by another instance of the dot operator, or expressed directly) it is merged. This requires the qref dot operator to be right-to-left associative, so that a.b.c works by first translating b.c to (qref b c), and then adjoining a to produce (qref a b c).
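The merge rule described above can be modeled in a few lines of Python. This is a hypothetical sketch of the translation, not TXR's parser; dot is an invented name, and tuples stand in for Lisp forms:

```python
# Model of the referencing-dot translation: the dot builds a qref form,
# merging when its right argument is already a qref expression.
def dot(left, right):
    if isinstance(right, tuple) and right and right[0] == 'qref':
        return ('qref', left) + right[1:]  # a . (qref b c) -> (qref a b c)
    return ('qref', left, right)

# Right-to-left associativity: a.b.c parses as a . (b . c).
print(dot('a', dot('b', 'c')))  # ('qref', 'a', 'b', 'c')
```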

If the referencing dot is immediately followed by a question mark, it forms a single token, which produces the following syntactic variation, in which the item preceding it is annotated as a list headed by the symbol t:

  a.?b     <-->  (t a).b         <--> (qref (t a) b)
  a.?b.?c  <-->  (t a).(t b).c   <--> (qref (t a) (t b) c)
  a.?(b)   <-->  (t a).(b)       <--> (qref (t a) (b))
  (a).?b   <-->  (t (a)).b       <--> (qref (t (a)) b)

This syntax denotes null-safe access to structure slots and methods. a.?b means that a may evaluate to nil, in which case the expression yields nil; otherwise, a must evaluate to a struct which has a slot b, and the expression denotes access to that slot. Similarly, a.?(b 1) means that if a evaluates to nil, the expression yields nil; otherwise, a is treated as a struct object whose method b is invoked with argument 1, and the value returned by that method becomes the value of the expression.

Integer tokens cannot be involved in this syntax, because they form floating-point constants when juxtaposed with a dot. Such ambiguous uses of floating-point tokens are diagnosed as syntax errors:

  (a.4)   ;; error: cramped floating-point literal
  (a .4)  ;; good: a followed by 0.4


8.2.7 Unbound Referencing Dot

Closely related to the referencing dot syntax is the unbound referencing dot. This is a dot which is flanked by an expression on the right, without any intervening whitespace, but is not preceded by an expression. Rather, it is preceded by whitespace, or some punctuation such as [, ( or '. This is a syntactic sugar which translates to uref syntax:

  .a       <--> (uref a)
  .a.b     <--> (uref a b)
  .a.?b    <--> (uref (t a) b)

If the unbound referencing dot is itself combined with a question mark to form the .? token, then the translation to uref is as follows:

  .?a      <--> (uref t a)
  .?a.b    <--> (uref t a b)
  .?a.?b   <--> (uref t a (t b))

When the unbound referencing dot is applied to a dotted expression, this can be understood as a conversion of qref to uref.

Indeed, this is exactly what happens if the unbound dot is applied to an explicit qref expression:

  .(qref a b)   <--> (uref a b)

The unbound referencing dot takes its name from the semantics of the uref macro, which produces a function that implements late binding of an object to a method slot. Whereas the expression obj.a.b denotes accessing object obj to retrieve slot a and then accessing slot b of the object from that slot, the expression .a.b represents a "disembodied" reference: it produces a function which takes an object as an argument and then performs the implied slot referencing on that argument. When the function is called, it is said to bind the referencing to the object. Hence that referencing is "unbound".

Whereas the expression .a produces a function whose argument must be an object, .?a produces a function whose argument may be nil. The function detects this case and returns nil.
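The "function awaiting an object" semantics can be illustrated with a Python analogy. This is a hypothetical model, not the uref macro itself; uref, Box and null_safe are invented names, with Python attributes standing in for struct slots:

```python
# Model of an unbound reference: uref returns a function which binds to an
# object only when it is called, walking the slot chain like .a.b.
def uref(*slots, null_safe=False):
    def ref(obj):
        if null_safe and obj is None:
            return None              # .?a: a nil argument yields nil
        for s in slots:
            obj = getattr(obj, s)    # follow each slot in turn
        return obj
    return ref

class Box:
    def __init__(self, **kw):
        self.__dict__.update(kw)

inner = Box(b=42)
outer = Box(a=inner)
get_ab = uref('a', 'b')                  # analogous to .a.b
print(get_ab(outer))                     # 42
print(uref('a', null_safe=True)(None))   # None
```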


8.2.8 Quote and Quasiquote

The quote character in front of an expression is used for suppressing evaluation, which is useful for forms that evaluate to something other than themselves. For instance if '(+ 2 2) is evaluated, the value is the three-element list (+ 2 2), whereas if (+ 2 2) is evaluated, the value is 4. Similarly, the value of 'a is the symbol a itself, whereas the value of a is the contents of the variable a.

The caret in front of an expression is a quasiquote. A quasiquote is like a quote, but with the possibility of substitution of material.

Under a quasiquote, form is considered to be a quasiquote template. The template is considered to be a literal structure, except that it may contain the notations ,expr and ,*expr which denote non-constant parts.

A quasiquote gets translated into code which, when evaluated, constructs the structure implied by qq-template, taking into account the unquotes and splices.

A quasiquote also processes nested quasiquotes specially.

If qq-template does not contain any unquotes or splices (which match its level of nesting), or is simply an atom, then ^qq-template is equivalent to 'qq-template; in other words, it is like an ordinary quote. For instance ^(a b ^(c ,d)) is equivalent to '(a b ^(c ,d)). Although there is an unquote ,d it belongs to the inner quasiquote ^(c ,d), and the outer quasiquote does not have any unquotes of its own, making it equivalent to a quote.

Dialect Note: in Common Lisp and Scheme, ^form is written `form, and quasiquotes are also informally known as backquotes. In TXR, the backquote character ` is used for string quasiliterals.

The comma character is used within a qq-template to denote an unquote. Whereas the quasiquote suppresses evaluation, similarly to the quote, the comma introduces an exception: an element of a form which is evaluated. For example, the value of ^(a b c ,(+ 2 2) (+ 2 2)) is the list (a b c 4 (+ 2 2)). Everything in the quasiquote stands for itself, except for the ,(+ 2 2) which is evaluated.

Note: if a variable is named *x*, then the syntax ,*x* is read as ,* x*: a splice of the value of the variable x*. To denote the ordinary unquote of the variable *x*, whitespace must appear between the comma and the variable name: , *x*.

The comma-star operator is used within a quasiquote to denote a splicing unquote. The form which follows ,* must evaluate to a list. That list is spliced into the structure which the quasiquote denotes. For example: ^(a b c ,*(list (+ 3 3) (+ 4 4)) d) evaluates to (a b c 6 8 d). The expression (list (+ 3 3) (+ 4 4)) is evaluated to produce the list (6 8), and this list is spliced into the quoted template.

This syntax is not a distinct quasiquoting operator, but rather the combination of an unquote occurring as a meta-expression, denoting the structure (sys:expr ,expr). This structure is treated specially by the quasiquote expander. Code is generated for it such that if expr evaluates to a value val which is an atom, then the result will be the (sys:var val) structure. If val is a cons rather than an atom, then the result is the (sys:expr val) structure. In other words, when quasiquoting is used to insert a value under the @ meta prefix, the expander generates code to analyze the type of the value, and produce the form which is most likely intended.

Dialect Notes:

In other Lisp dialects, like Scheme and ANSI Common Lisp, the equivalent syntax is usually ,@ (comma at). The @ character already has an assigned meaning in TXR, so * is used.

However, * is also a character that may appear in a symbol name, which creates a potential for ambiguity. The syntax ,*abc denotes the application of the ,* splicing operator to the symbolic expression abc; to apply the ordinary non-splicing unquote to the symbol *abc, whitespace must be used: , *abc.

In TXR, the unquoting and splicing forms may freely appear outside of a quasiquote template. If they are evaluated as forms, however, they throw an exception:

   ,(+ 2 2) ;; error!

   ',(+ 2 2) --> ,(+ 2 2)

In other Lisp dialects, a comma not enclosed by backquote syntax is treated as a syntax error by the reader.

TXR's quasiquote supports splicing multiple items into a quote, if that quote is itself evaluated via an unquote. Concretely, these two examples produce the same result:

      (let ((args '(a b c)))
        (eval (eval ^^(let ((a 1) (b 2) (c 3))
                        (list ,',*args)))))
  -> (1 2 3)

      (let ((args '(a b c)))
        (eval (eval ^^(let ((a 1) (b 2) (c 3))
                        (list ,*',args)))))
  -> (1 2 3)

The only difference is that the former example uses ,',*args whereas the latter ,*',args. Thus the former example splices args into the quote as if by (quote ,*args) which is invalid quote syntax if args doesn't expand to exactly one element. This invalid quote syntax is accepted by the quasiquote expander when it occurs in the above unquoting and splicing situation. Effectively, it behaves as if the splice distributes across the quoted unquote, such that all the arguments of the quote end up individually quoted, and spliced into the surrounding list.

The Common Lisp equivalent of this combination, ,',@args, works in some Common Lisp implementations, such as CLISP.


8.2.9 Quasiquoting non-List Objects

Quasiquoting is supported over hash table and vector literals (see Vectors and Hashes below). A hash table or vector literal can be quoted, like any object, for instance:

  '#(1 2 3)

The #(1 2 3) literal is turned into a vector atom right in the TXR parser, and this atom is being quoted: this is (quote atom) syntactically, which evaluates to atom.

When a vector containing no unquotes is quasiquoted, this is likewise a case of ^atom which evaluates to atom.

A vector can be quasiquoted, for example:

  ^#(1 2 3)

Unquotes can occur within a quasiquoted vector:

  (let ((a 42))
    ^#(1 ,a 3)) ; value is #(1 42 3)

In this situation, the ^#(...) notation produces code which constructs a vector.

The vector in the following example is also a quasivector. It contains unquotes, and though the quasiquote is not directly applied to it, it is embedded in a quasiquote:

  (let ((a 42))
    ^(a b c #(d ,a))) ; value is (a b c #(d 42))

Hash-table literals have two parts: the list of hash construction arguments and the key-value pairs. For instance:

   #H((:eql-based) (a 1) (b 2))

where (:eql-based) indicates that this hash table's keys are treated using eql equality, and (a 1) and (b 2) are the key/value entries. Hash literals may be quasiquoted. In quasiquoting, the arguments and pairs are treated as separate syntax; it is not one big list. So the following is not a possible way to express the above hash:

  ;; not supported: splicing across the entire syntax
  (let ((hash-syntax '((:eql-based) (a 1) (b 2))))
    ^#H(,*hash-syntax))

This is correct:

  ;; fine: splicing hash arguments and contents separately
  (let ((hash-args '(:eql-based))
        (hash-contents '((a 1) (b 2))))
    ^#H(,hash-args ,*hash-contents))


8.2.10 Quasiquoting combined with Quasiliterals

When a quasiliteral is embedded in a quasiquote, it is possible to use splicing to insert material into the quasiliteral.


  (eval (let ((a 3)) ^`abc @,a @{,a} @{(list 1 2 ,a)}`))

  -> "abc 3 3 1 2 3"


8.2.11 Vector Literals

A hash token followed by a list denotes a vector. For example #(1 2 a) is a three-element vector containing the numbers 1 and 2, and the symbol a.


8.2.12 Struct Literals

#S(name {slot value}*)
The notation #S followed by a nested list syntax denotes a struct literal. The first item in the syntax is a symbol denoting the struct type name. This must be the name of a struct type, otherwise the literal is erroneous. Following the struct type name are slot names interleaved with their values. The values are literal expressions, not subject to evaluation. Each slot name which is present in the literal must name a slot in the struct type, though not all slots in the struct type must be present in the literal.

When a struct literal is read, an instance of the denoted struct type is constructed, as if by a call to make-struct with an empty plist argument, followed by a sequence of assignments which store into each slot the corresponding value expression.
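To illustrate, here is a sketch assuming a point struct type with slots x and y (the defstruct syntax is described later in this manual):

  (defstruct point nil x y)

  #S(point x 1 y 2)   ;; a point whose x slot is 1 and y slot is 2
  #S(point y 3)       ;; the unmentioned x slot remains nil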


8.2.13 Hash Literals

#H((hash-argument*) (key value)*)
The notation #H followed by list syntax denotes a hash-table literal. The first item in the syntax is a list of keywords. These are the same keywords as are used when calling the function hash to construct a hash table. Allowed keywords are: :equal-based, :eql-based, :eq-based, :weak-keys, :weak-vals, and :userdata. If the :userdata keyword is present, it must be followed by an object; that object specifies the hash table's user data, which can be retrieved using the hash-userdata function. The :equal-based, :eql-based and :eq-based keywords are mutually exclusive.

An empty keyword list can be specified as nil or (); it specifies the default: a hash table based on the eql function, with no weak semantics or user data.

The entire syntax following #H may be an empty list; however, that empty list may not be specified as nil; the empty parentheses notation is required.

The hash table's key-value contents are specified as zero or more two-element lists, whose first element specifies the key and whose second specifies the value. Both expressions are literal objects, not subject to evaluation.
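For instance, here is a sketch of some hash literals:

  #H(() (a 1) (b 2))           ;; eql-based hash: key a -> 1, key b -> 2
  #H((:equal-based) ("x" 1))   ;; equal-based hash with string key "x"
  #H((:userdata foo))          ;; empty hash whose user data is the symbol foo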


8.2.14 Range Literals

#R(from to)
The notation #R followed by a two-element list syntax denotes a range literal. It combines from and to expressions, themselves literals not subject to evaluation, producing the range object whose corresponding from and to fields are the objects denoted by these expressions.
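For instance, a sketch (the from and to accessor functions are described later in this manual):

  #R(1 10)          ;; range literal
  (rcons 1 10)      ;; an equal range, constructed when evaluated
  (from #R(1 10))   ;; -> 1
  (to #R(1 10))     ;; -> 10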


8.2.15 Buffer Literals

The notation #b' introduces a buffer object: a data representation for a block of bytes. This #b' prefix must be followed by a data section and a closing quote. The data section consists of hexadecimal digits, among which may be interspersed whitespace: tabs, spaces and newlines. There must be an even number of digits, or else the notation is ill-formed. The whitespace is ignored, and pairs of successive hex digits specify bytes. If there are no hex digits, then a zero length buffer is specified.
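For instance:

  #b'6869'   ;; two-byte buffer holding the bytes 68 and 69 (hex)
  #b'68 69
     6a'     ;; three-byte buffer; interspersed whitespace is ignored
  #b''       ;; zero-length buffer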

Buffers may be constructed by the make-buf function, and other means such as the ffi-get function.

Note that the #b prefix is also used for binary numbers. In that syntax, it is followed by an optional sign, and then a mixture of one or more of the digits 0 or 1.


8.2.16 Tree Node Literals

#N([key [left [right]]])
The notation #N followed by list syntax denotes a tree node literal. The list syntax must be a proper list that has up to three elements. If the list is empty, it may not be written as nil.

A tree node is an object of type tnode. Every tnode has three elements: a key, a left link and a right link. They may be objects of any type. If the tree node literal syntax omits any of these, they default to nil.
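For instance:

  #N(1)              ;; node with key 1; left and right links are nil
  #N(2 #N(1) #N(3))  ;; node with key 2, and left and right child nodes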


8.2.17 Tree Literals

#T([([keyfun [lessfun [equalfun]]]) item*])
The notation #T followed by list syntax denotes a tree literal, which specifies an object of type tree. Objects of type tree are search trees.

The list syntax which follows #T may be empty. If so, it cannot be written as nil.

The first element of the #T syntax, if present, must be a list of zero to three elements. These elements are symbols giving the names of the tree object's key abstraction functions. keyfun specifies the key function which is applied to each element to retrieve its key. If it is omitted, the object shall use the identity function as its key function. The lessfun specifies the name of the comparison function by which keys are compared for inequality. It defaults to less. The equalfun specifies the function by which keys are compared for equality. It defaults to equal. A symbol which is specified as the name of any of these three special functions must be an element of the list stored in the special variable *tree-fun-whitelist*, otherwise the tree literal is diagnosed as erroneous. Note: this is due to security considerations, since these three functions are executed during the processing of tree syntax.

A tree object is constructed from a tree literal by first creating an empty tree endowed with the three key abstraction functions that are indicated in the syntax, either explicitly or as defaults. Then, every element object is constructed from its respective literal syntax and inserted into the tree.

Duplicate objects are preserved. For instance the tree literal #T(() 1 1 1) specifies a tree with three nodes which have the same key. Duplicates appear in the tree in the order that they appear in the literal.
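For instance, here is a sketch (the last example assumes that car is among the functions listed in *tree-fun-whitelist*):

  #T(())                 ;; empty tree with default key abstraction functions
  #T(() 1 2 3)           ;; tree of three elements which are their own keys
  #T((car) (1 a) (2 b))  ;; elements are keyed by their car fields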


8.2.18 JSON Literals

#Jjson-syntax
Introduces a JSON literal.

#J^json-syntax
Introduces a JSON quasiquote, allowing unquoting and splicing of Lisp expressions.

The implementation of JSON syntax is based on, and intended to conform with, the IETF RFC 8259 document. Only TXR's extensions to JSON syntax are described in this manual, as well as the correspondence between JSON syntax and Lisp.

The json-syntax is translated into a TXR Lisp object as follows.

A JSON string corresponds to a Lisp string. A JSON number corresponds to a Lisp floating-point number. A JSON array corresponds to a Lisp vector. A JSON object corresponds to an equal-based hash table.

The JSON Boolean symbols true and false translate to the Lisp symbols t and nil, respectively, those being the standard ones in the usr package.

The JSON symbol null maps to the null symbol in the usr package.

The #Jjson-syntax expression produces the object:

  (json quote lisp-object)

where lisp-object is the Lisp value which corresponds to the json-syntax.

Similarly, but with a key difference, the #J^json-syntax expression produces the object:

  (json sys:qquote lisp-object)

in which quote has been replaced with sys:qquote.

The json symbol is bound as a macro, which is expanded when a #J expression is evaluated.

The following remarks indicate special treatment and extensions in the processing of JSON. Similar remarks regarding the production of JSON are given under the put-json function.

When an invalid UTF-8 byte is encountered inside a JSON string, its value is mapped into the code point range U+DC01 to U+DCFF. That byte is consumed, and decoding continues with the next byte. This treatment is consistent with the treatment of invalid UTF-8 bytes in TXR Lisp literals and I/O streams. If the NUL byte (U+0000), which is valid UTF-8, occurs in a JSON string, it is mapped to U+DC00, TXR's pseudo-null character. This treatment is consistent with TXR string literals and I/O streams.

The JSON escape sequence \u0000 denoting the U+0000 NUL character is also converted to U+DC00.

TXR Lisp does not impose the restriction that the keys in a JSON object must be strings: #J{1:2,true:false} is accepted.

TXR Lisp allows the circle notation to occur within JSON syntax. See the section Notation for Circular and Shared Structure.

TXR Lisp supports the extension of Lisp comments in JSON. When the ; character (semicolon) occurs in the middle of JSON syntax, outside of a token, that character and all characters until the end of the line constitute a comment that is discarded. TXR Lisp never produces comments when printing JSON.

TXR Lisp allows for JSON syntax to be quasiquoted, and provides two extensions for writing unquotes and splicing unquotes. Within a JSON quasiquote, the ~ (tilde) character introduces a Lisp expression whose value is to be substituted at that point. Thus, the tilde serves the role of the unquoting comma used in Lisp quasiquotes. Splicing is indicated by the character sequence ~*, which introduces a Lisp expression that is expected to produce a list, whose elements are interpolated into the JSON value.

Note: quasiquoting allows Lisp values to be introduced into the resulting object which are outside of the JSON type system, such as integers, characters, symbols or structures. These objects have no representation in JSON syntax.


  ;; Basic JSON:

  #Jtrue -> t
  #Jfalse -> nil
  (list #J true #Jtrue #Jfalse) -> (t t nil)
  #J[1, 2, 3.14] -> #(1.0 2.0 3.14)
  #J{"foo":"bar"} -> #H(() ("foo" "bar"))

  ;; Quoting JSON shows the json expression

  '#Jfalse -> (json quote ())
  '#Jtrue -> (json quote t)
  '#J["a", true, 3.0] -> (json quote #("a" t 3.0))
  '#J^[~(+ 2 2), 3] -> (json sys:qquote #(,(+ 2 2) 3.0))

  ;; Circle notation:

  #J[#1="abc", #1#, #1#] -> #("abc" "abc" "abc")

  ;; JSON Quasiquote:

  #J^[~*(list 1.0 2.0 3.0), ~(* 2.0 2), 5.0]
  --> #(1.0 2.0 3.0 4.0 5.0)

  ;; Lisp quasiquote around JSON quote: requires evaluation round.

  ^#J[~*(list 1.0 2.0 3.0), ~(* 2.0 2), 5.0]
  --> (json quote #(1.0 2.0 3.0 4.0 5.0))

  (eval ^#J[~*(list 1.0 2.0 3.0), ~(* 2.0 2), 5.0])
  --> #(1.0 2.0 3.0 4.0 5.0)

  ;; Comment extension
  #J[1, ; Comment inside JSON.
     2, ; Another one.
     3] ; Lisp comment outside of JSON.
  --> #(1.0 2.0 3.0)


8.2.19 The .. notation

In TXR Lisp, there is a special "dotdot" notation consisting of a pair of dots. This can be written between successive atoms or compound expressions, and is a shorthand for rcons.

That is to say, A .. B translates to (rcons A B), and so for instance (a b .. (c d) e .. f . g) means (a (rcons b (c d)) (rcons e f) . g).

The rcons function constructs a range object, which denotes a pair of values. Range objects are most commonly used for referencing subranges of sequences.

For instance, if L is a list, then [L 1 .. 3] computes a sublist of L consisting of elements 1 through 2 (counting from zero).

Note that if this notation is used in the dot position of an improper list, the transformation still applies. That is, the syntax (a . b .. c) is valid and produces the object (a . (rcons b c)) which is another way of writing (a rcons b c), which is quite probably nonsense.

The notation's .. operator associates right to left, so that a..b..c denotes (rcons a (rcons b c)).
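For instance, since a dotdot expression evaluates to a range object, it can be used directly for slicing sequences:

  (1 .. 5)           ;; -> #R(1 5)
  ["abcdef" 1..3]    ;; -> "bc"
  [#(1 2 3 4) 0..2]  ;; -> #(1 2)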

Note that range objects are not printed using the dotdot notation. A range literal has the syntax of a two-element list, prefixed by #R. (See Range Literals above.)

In any context where the dotdot notation may be used, and where it is evaluated to its value, a range literal may also be specified. If an evaluated dotdot notation specifies two constant expressions, then an equivalent range literal can replace it. For instance the form [L 1 .. 3] can also be written [L #R(1 3)]. The two are syntactically different, and so if these expressions are being considered for their syntax rather than value, they are not the same.


8.2.20 The DWIM Brackets

TXR Lisp has a square bracket notation. The syntax [...] is a shorthand way of writing (dwim ...). The [] syntax is convenient in situations which call for the expressive style of a Lisp-1 dialect.

For instance if foo is a variable which holds a function object, then [foo 3] can be used to call it, instead of (call foo 3). If foo is a vector, then [foo 3] retrieves the fourth element, like (vecref foo 3). Indexing over lists, strings and hash tables is possible, and the notation is assignable.
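For instance, here is a sketch:

  (let ((sq (lambda (x) (* x x))))
    [sq 3])             ;; -> 9

  (let ((v #(10 20 30)))
    [v 2])              ;; -> 30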

Furthermore, any arguments enclosed in [] which are symbols are treated according to a modified namespace lookup rule.

More details are given in the documentation for the dwim operator.


8.2.21 Compound Forms

In TXR Lisp, there are two types of compound forms: Lisp-2 style compound forms, denoted by ordinary lists that are expressed with parentheses, and Lisp-1 style compound forms, denoted by the DWIM brackets described in the previous section.

The first position of an ordinary Lisp-2 style compound form is expected to hold a function or operator name. Then arguments follow. There may also be an expression in the dotted position, if the form is a function call.

If the form is a function call then the arguments are evaluated. If any of the arguments are symbols, they are treated according to Lisp-2 namespacing rules.

A function name may be a symbol, or else any of the syntactic forms given in the description of the function func-get-name.


8.2.22 Dot Position in Function Calls

If there is an expression in the dotted position of a function call expression, it is also evaluated, and the resulting value is involved in the function call in a special way.

Firstly, note that a compound form cannot be used in the dot position, for obvious reasons, namely that (a b c . (foo z)) does not mean that there is a compound form in the dot position, but denotes an alternate spelling for (a b c foo z), where foo behaves as a variable.

If the dot position of a compound form is an atom, then the behavior may be understood according to the following transformations:

  (f a b c ... . x)  -->  (apply (fun f) a b c ... x)
  [f a b c ... . x]  -->  [apply f a b c ... x]

In addition to atoms, meta-expressions and meta-symbols can appear in the dot position, even though their underlying syntax is actually a compound expression. This is made to work according to a transformation pattern which superficially resembles the above one for atoms:

  (f a b c ... . @x)  -->  (apply (fun f) a b c ... @x)

However, in this situation, the @x is a notation denoting the expression (sys:var x) and thus the entire form is a proper list, not a dotted list. With the underlying syntax revealed, the transformation looks like this:

  (f a b c ... sys:var x)  -->  (apply (fun f) a b c ... (sys:var x))

That is to say, the TXR Lisp form expander reacts to the presence of a sys:var or sys:expr atom embedded in the form. That symbol and the items which follow it are wrapped in an additional level of nesting, converted into a single compound-form element.

Effectively, in all these cases, the dot notation constitutes a shorthand for apply.


  ;; a contains 3
  ;; b contains 4
  ;; c contains #(5 6 7)
  ;; s contains "xyz"

  (foo a b . c)  ;; calls (foo 3 4 5 6 7)
  (foo a)        ;; calls (foo 3)
  (foo . s)      ;; calls (foo #\x #\y #\z)

  (list . a)     ;; yields 3
  (list a . b)   ;; yields (3 . 4)
  (list a . c)   ;; yields (3 5 6 7)
  (list* a c)    ;; yields (3 . #(5 6 7))

  (cons a . b)   ;; error: cons isn't variadic.
  (cons a b . c) ;; error: cons requires exactly two arguments.

  [foo a b . c]  ;; calls (foo 3 4 5 6 7)

  [c 1]          ;; indexes into vector #(5 6 7) to yield 6

  (call (op list 1 . @1) 2) ;; yields (1 . 2)

Note that the atom in the dot position of a function call may be a symbol macro. Since the semantics works as if by transformation to an apply form in which the original dot position atom is an ordinary argument, the symbol macro may produce a compound form.


  (symacrolet ((x 2))
    (list 1 . x))  ;; yields (1 . 2)

  (symacrolet ((x (list 1 2)))
    (list 1 . x))  ;; yields (1 1 2)

That is to say, the expansion of x is not substituted into the form (list 1 . x) but rather the transformation to apply syntax takes place first, and so the substitution of x takes place in a form resembling (apply (fun list) 1 x).

Dialect Note:

In some other Lisp dialects like ANSI Common Lisp, the improper list syntax may not be used as a function call; a function called apply (or similar) must be used for application even if the expression which gives the trailing arguments is a symbol. Moreover, applying sequences other than lists is not supported.


8.2.23 Improper Lists as Macro Calls

TXR Lisp allows macros to be called using forms which are improper lists. These forms are simply destructured by the usual macro parameter list destructuring. To be callable this way, the macro must have an argument list which specifies a parameter match in the dot position. This dot position must either match the terminating atom of the improper list form, or else match the trailing portion of the improper list form.

For instance if a macro mac is defined as

  (defmacro mac (a b . c) ...)

then it may not be invoked as (mac 1 . 2) because the required argument b is not satisfied, and so the 2 argument cannot match the dot position c as required. The macro may be called as (mac 1 2 . 3) in which case c receives the form 3. If it is called as (mac 1 2 3 . 4) then c receives the improper list form 3 . 4.


8.2.24 Regular-Expression Literals

In TXR Lisp, the / character can occur in symbol names, and the / token is a symbol. Therefore the /regex/ syntax is not used for denoting regular expressions; rather, the #/regex/ syntax is used.


8.2.25 Notation for Circular and Shared Structure

TXR Lisp supports a printed notation called circle notation which accurately articulates the representation of objects which contain shared substructures as well as circular references. The notation is supported as a means of input, and is also optionally produced as output, controlled by the *print-circle* variable.

Ordinarily, shared substructure in printed objects is not evident, except in the case of multiple occurrences of interned symbols, in whose semantics it is implicit that they refer to the same object. Other shared structure is printed as separate copies which look like distinct objects. For instance, the object produced by (let ((shared '(1 2))) (list shared shared)) is printed as ((1 2) (1 2)), where it is not clear that the two occurrences of (1 2) are actually the same object. Under the circle notation, this object can be represented as (#5=(1 2) #5#). The #5= part introduces a reference label, associating the arbitrarily chosen nonnegative integer 5 with the object which follows. The subsequent notation #5# simply refers to the object labeled by 5, reproducing that object by reference. The result is a two-element list which has the same (1 2) in two places.

Circular structure presents a greater challenge to printing: namely, if it is printed by a naive recursive descent, it results in infinite output, and possibly stack exhaustion due to recursion. The circle notation detects and handles circular references. For instance, the object produced by (let ((c (list 1))) (rplacd c c)) produces a circular list which looks like an infinite list of 1's: (1 1 1 1 ...). This cannot be printed. However, under the circle notation, it can be represented as #1=(1 . #1#). The entire object itself is labeled by the integer 1. Then, enclosed within the syntax of that labeled object itself, a reference occurs to the label. This circular label reference represents the corresponding circular reference in the object.

A detailed description of the notational elements follows:

#digits= object
The #= syntax introduces an object label which denotes the object whose printed representation follows. The label is identified by the integer value of digits, which consist of one or more decimal digits. Note: the value zero is permitted, even though, when the notation is produced by the TXR Lisp printer, labeling begins at 1. Negative values are not possible because a leading sign is not part of the syntax.

There may be no more than one definition for a given label within the syntactic scope being parsed, otherwise a syntax error occurs. In TXR pattern language code, an entire source file is parsed as one unit, and so scope for the circular notation's references is the entire source file. Files processed by @(include) have their own scope. The scope for labels in TXR Lisp source code is the top-level expression in which they appear. Consequently, references in one TXR Lisp top-level expression cannot reach definitions in another.

#digits#
The ## syntax denotes a label reference: the repetition of an object that was previously labeled by the integer given by digits. If no such label had been introduced in the syntactic scope, a syntax error occurs. An object was previously labeled by digits if a #= definition occurs in the same syntactic scope as the reference, and is applied to an object which either encloses the reference, or lexically precedes the reference. Forward references such as (#1# #1=(1 2)) are not supported.


Circular notation can span hash-table literals. The syntax #1=#H((:eql-based) (#1# #1#)) denotes an eql-based hash table which contains one entry, in which that same table itself is both the key and value. This kind of circularity is not supported for equal-based hash tables. The analogous syntax #1=#H(() (#1# #1#)) produces a hash table in an inconsistent state.

Dialect Note:

Circle notation is taken from Common Lisp, intended to be unsurprising to users familiar with that language. The implementation is based on descriptions in the ANSI Common Lisp document, judiciously taking into account the content of the X3J13 Cleanup Issues named PRINT-CIRCLE-STRUCTURE:USER-FUNCTIONS-WORK and PRINT-CIRCLE-SHARED:RESPECT-PRINT-CIRCLE.


8.2.26 Notation for Erasing Objects

#; expr
The notation #; indicates that the expression expr which follows is to be read and then discarded, as if it were replaced by whitespace.

This is useful for temporarily "commenting out" an expression.


Whereas it is valid for a TXR Lisp source file to be empty, it is a syntax error if a TXR Lisp source file contains nothing but one or more objects which are each suppressed by a preceding #;. In the interactive listener, an input line consisting of nothing but commented-out objects is similarly a syntax error.

The notation does not cascade; consecutive occurrences of #; trigger a syntax error.

The notation interacts with the circle notation. Firstly, if an object which is erased by #; contains circular-referencing instances of the label notation, those instances refer to nil. Secondly, commented-out objects may introduce labels which may be referenced from subsequent objects. An example of the first situation occurs in:

  #;(#1=(#1#))
Here the #1# label is a circular reference because it refers to an object which is a parent of the object which contains that reference. Such a reference is only satisfied by a "backpatching" process once the entire surrounding syntax is processed to the top level. The erasure perpetrated by #; causes the #1# label reference to be replaced by nil, and therefore the labeled object is the object (nil).

An example of the second situation is

  #;(#2=(a b c)) #2#

Here, even though the expression (#2=(a b c)) is suppressed, the label definition which it has introduced persists into the following object, where the label reference #2# resolves to (a b c).

A combination of the two situations occurs in

  #;(#1=(#1#)) #1#

which yields (nil). This is because the #1= label is available; but the earlier #1# reference, being a circular reference inside an erased object, had lapsed to nil.


8.3 Generalization of List Accessors

In ancient Lisp in the 1960's, it was not possible to apply the operations car and cdr to the nil symbol (empty list), because it is not a cons cell. In the InterLisp dialect, this restriction was lifted: these operations were extended to accept nil (and return nil). The convention was adopted in other Lisp dialects such as MacLisp and eventually in Common Lisp. Thus there exists an object which is not a cons, yet which takes car and cdr.

In TXR Lisp, this relaxation is extended further. For the sake of convenience, the operations car and cdr are made to work with strings and vectors:

  (cdr "") -> nil
  (car "") -> nil

  (car "abc") -> #\a
  (cdr "abc") -> "bc"

  (cdr #(1 2 3)) -> #(2 3)
  (car #(1 2 3)) -> 1

Moreover, structure types which define the methods car, cdr and nullify can also be treated in the same way.

The ldiff function is also extended in a special way. When the right parameter is a non-list sequence, it uses the equal equality test rather than eq for detecting the tail of the list.

  (ldiff "abcd" "cd") -> (#\a #\b)

The ldiff operation starts with "abcd" and repeatedly applies cdr to produce "bcd" and "cd", until the suffix is equal to the second argument: (equal "cd" "cd") yields true.

Operations based on car, cdr and ldiff, such as keep-if and remq extend to strings and vectors.

Most derived list processing operations such as remq or mapcar obey the following rule: the returned object follows the type of the leftmost input list object. For instance, if one or more sequences are processed by mapcar, and the leftmost one is a character string, the function is expected to return characters, which are converted to a character string. However, in the event that the objects produced cannot be assembled into that type of sequence, a list is returned instead.

For example [mapcar list "ab" "12"] returns ((#\a #\1) (#\b #\2)), because a string cannot hold lists of characters. However [mappend list "ab" "12"] returns "a1b2".

The lazy versions of these functions such as mapcar* do not have this behavior; they produce lazy lists.


8.4 Generalization of Iteration

TXR Lisp implements a unified paradigm for iterating over sequence-like container structures and abstract spaces such as bounded and unbounded ranges of integers. This concept is based around an iterator abstraction which is directly compatible with Lisp cons-cell traversal in the sense that when iteration takes place over lists, the iterator instance is nothing but a cons cell.

An iterator is created using the constructor function iter-begin which takes a single argument. The argument denotes a space to be traversed; the iterator provides the means for that traversal.

When the iter-begin function is applied to a list (a cons cell or the nil object), the return value is that object itself. The remaining functions in the iterator API then behave like aliases for list processing functions. The iter-more function behaves like identity, iter-item behaves like car and iter-step behaves like cdr.

For example, the following loops not only produce identical behavior, but the iter variable steps through the cons cells in the same manner in both:

  ;; print all symbols in the list (a b c d):

  (let ((iter '(a b c d)))
    (while iter
      (prinl (car iter))
      (set iter (cdr iter))))

  ;; likewise:

  (let ((iter (iter-begin '(a b c d))))
    (while (iter-more iter)
      (prinl (iter-item iter))
      (set iter (iter-step iter))))

There are three important differences.

Firstly, both examples will still work if the list (a b c d) is replaced by a different kind of sequence, such as the string "abcd" or the vector #(a b c d). However, the former example will not execute efficiently on these objects. The reason is that the cdr function will construct successive suffixes of the string or vector object. That requires not only the allocation of memory, but also changes the running time complexity of the loop from linear to quadratic.

Secondly, the former example with car/cdr will not work correctly if the sequence is an empty non-list sequence, like the null string or empty vector. Rectifying this problem requires the nullify function to be used:

  ;; print all characters of the string "abcd":

  (let ((iter (nullify "abcd")))
    (while iter
      (prinl (car iter))
      (set iter (cdr iter))))

The nullify function converts empty sequences of all kinds into the empty list nil.
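
The following sketch illustrates this behavior: empty sequences of any kind map to nil, while nonempty sequences are returned unchanged:

  (nullify "")      -> nil
  (nullify #())     -> nil
  (nullify '())     -> nil
  (nullify "abcd")  -> "abcd"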

Thirdly, the second example will work even if the input list is replaced with certain objects which are not sequences at all:

  ;; Print the integers from 0 to 3

  (let ((iter (iter-begin 0..4)))
    (while (iter-more iter)
      (prinl (iter-item iter))
      (set iter (iter-step iter))))

  ;; Print incrementing integers starting at 1,
  ;; breaking out of the loop after 100.

  (let ((iter (iter-begin 1)))
    (while (iter-more iter)
      (if (eql 100 (prinl (iter-item iter)))
        (return))
      (set iter (iter-step iter))))

In TXR Lisp, numerous functions that appear as list-processing functions in other contemporary and historical Lisp dialects are actually sequence-processing functions based on the above iterator paradigm.


8.5 Callable Objects

In TXR Lisp, sequences (strings, vectors and lists) as well as hashes and regular expressions can be used as functions everywhere, not just with the DWIM brackets.

Sequences work as one- or two-argument functions. With a single argument, an element is selected by position and returned. With two arguments, a range is extracted and returned.

Moreover, when a sequence is used as a function of one argument, and the argument is a range object rather than an integer, then the call is equivalent to the two-argument form. This is the basis for array slice syntax like ["abc" 0..1].

Hashes also work as one or two argument functions, corresponding to the arguments of the gethash function.
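
For example, a call with one argument performs a lookup, and a second argument supplies the default value, just as with gethash. The following sketch uses a hash literal:

  (let ((h #H(() (a 1) (b 2))))
    (list [h 'a] [h 'c 42]))
  -> (1 42)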

A regular expression behaves as a one, two, or three argument function, which operates on a string argument. It returns the leftmost matching substring, or else nil.

Structure objects are callable if they implement the lambda method.
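
For instance, a structure type (here a hypothetical accum type) can be made callable by giving it a lambda method; this is a sketch, not library functionality:

  (defstruct accum ()
    (total 0)
    ;; the lambda method makes instances callable;
    ;; it receives the instance as the first argument
    (:method lambda (self delta) (inc self.total delta)))

  (let ((a (new accum)))
    [a 5]
    [a 3]
    a.total)
  -> 8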

Integers and ranges are callable like functions. They take one argument, which must be a sequence or hash. An integer selects the corresponding element position from the sequence, and a range extracts a slice of its argument.

Example 1:

  (mapcar "abc" '(2 0 1)) -> (#\c #\a #\b)

Here, mapcar treats the string "abc" as a function of one argument (since there is one list argument). This function maps the indices 0, 1 and 2 to the corresponding characters of string "abc". Through this function, the list of integer indices (2 0 1) is taken to the list of characters (#\c #\a #\b).

Example 2:

  (call '(1 2 3 4) 1..3) -> (2 3)

Here, the shorthand 1..3 denotes (rcons 1 3). A range used as an argument to a sequence performs range extraction: taking a slice starting at index 1, up to and not including index 3, as if by the call (sub '(1 2 3 4) 1 3).

Example 3:

  (call '(1 2 3 4) '(0 2)) -> (1 2)

A sequence applied to a list of index arguments is equivalent to using the select function, as if (select '(1 2 3 4) '(0 2)) were called.

Example 4:

  (call #/b./ "abcd") -> "bc"

Here, the regular expression, called as a function, finds the matching substring "bc" within the argument "abcd".

Example 5:

  [1 "abcd"] -> #\b

  ["abcd" 1] -> #\b

An integer used as a function indexes into a sequence. This produces the same result as when the sequence is used as a function with an integer argument.

Example 6:

  [1..3 '(a b c d)] -> (b c)
  ['(a b c d) 1..3] -> (b c)

A range used as a function extracts a slice of its argument.


8.6 Special Variables

Similarly to Common Lisp, TXR Lisp is lexically scoped by default, but also has dynamically scoped (a.k.a "special") variables.

When a variable is defined with defvar or defparm, a binding for the symbol is introduced in the global name space, regardless of in what scope the defvar form occurs.

Furthermore, at the time the defvar form is evaluated, the symbol which names the variable is tagged as special.

When a symbol is tagged as special, it behaves differently when it is used in a lexical binding construct like let, and all other such constructs such as function parameter lists. Such a binding is not the usual lexical binding, but a "rebinding" of the global variable. Over the dynamic scope of the form, the global variable takes on the value given to it by the rebinding. When the form terminates, the prior value of the variable is restored. (This is true no matter how the form terminates; even if by an exception.)

Because of this "pervasive special" behavior of a symbol that has been used as the name of a global variable, a good practice is to make global variables have visually distinct names via the "earmuffs" convention: beginning and ending the name with an asterisk.


  (defvar *x* 42)     ;; *x* has a value of 42

  (defun print-x ()
    (format t "~a\n" *x*))

  (let ((*x* "abc"))  ;; this overrides *x*
    (print-x))        ;; *x* is now "abc" and so that is printed

  (print-x)           ;; *x* is 42 again and so "42" is printed

Dialect Note 1:

The terms bind and binding are used differently in TXR Lisp compared to ANSI Common Lisp. In TXR Lisp binding is an association between a symbol and an abstract storage location. The association is registered in some namespace, such as the global namespace or a lexical scope. That storage location, in turn, contains a value. In ANSI Lisp, a binding of a dynamic variable is the association between the symbol and a value. It is possible for a dynamic variable to exist, and not have a value. A value can be assigned, which creates a binding. In TXR Lisp, an assignment is an operation which transfers a value into a binding, not one which creates a binding.

In ANSI Lisp, a dynamic variable can exist which has no value. Accessing the value signals a condition, but storing a value is permitted; doing so creates a binding. By contrast, in TXR Lisp a global variable cannot exist without a value. If a defvar form doesn't specify a value, and the variable doesn't exist, it is created with a value of nil.

Dialect Note 2:

Unlike ANSI Common Lisp, TXR Lisp has global lexical variables in addition to special variables. These are defined using defvarl and defparml. The only difference is that when variables are introduced by these macros, the symbols are not marked special, so their binding in lexical scopes is not altered to dynamic binding.

Many variables in TXR Lisp's standard library are global lexicals. Those which are special variables obey the "earmuffs" convention in their naming. For instance s-ifmt, log-emerg and sig-hup are global lexicals, because they provide constant values for which overriding doesn't make sense. On the other hand the standard output stream variable *stdout* is special. Overriding it over a dynamic scope is useful, as a means of redirecting the output of functions which write to the *stdout* stream.
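
For example, output produced over a dynamic scope in which *stdout* is rebound can be captured into a string. This sketch uses the string-output-stream functions:

  (let ((s (make-string-output-stream)))
    (let ((*stdout* s))    ;; dynamic rebinding over this scope
      (pprinl "redirected"))
    (get-string-from-stream s))
  -> "redirected\n"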

Dialect Note 3:

In Common Lisp, defparm is known as defparameter.


8.7 Syntactic Places and Accessors

The TXR Lisp feature known as syntactic places allows programs to use the syntax of a form which is used to access a value from an environment or object, as an expression which denotes a place where a value may be stored.

They are almost exactly the same concept as "generalized references" in Common Lisp, and are related to "lvalues" in languages in the C family, or "designators" in Pascal.


8.7.1 Symbolic Places

A symbol is a syntactic place if it names a variable. If a is a variable, then it may be assigned using the set operator: the form (set a 42) causes a to have the integer value 42.


8.7.2 Compound Places

A compound expression can be a syntactic place, if its leftmost constituent is a symbol which is specially registered, and if the form has the correct syntax for that kind of place, and suitable semantics. Such an expression is a compound place.

An example of a compound place is a car form. If c is an expression denoting a cons cell, then (car c) is not only an expression which retrieves the value of the car field of the cell. It is also a syntactic place which denotes that field as a storage location. Consequently, the expression (set (car c) "abc") stores the character string "abc" in that location. Although the same effect can be obtained with (rplaca c "abc") the syntactic place frees the programmer from having to remember different update functions for different kinds of places. There are various other advantages. TXR Lisp provides a plethora of operators for modifying a place in addition to set. Subject to certain usage restrictions, these operators work uniformly on all places. For instance, the expression (rotate (car x) [str 3] y) causes three different kinds of places to exchange contents, while the three expressions denoting those places are evaluated only once. New kinds of place update macros like rotate are quite easily defined, as are new kinds of compound places.
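
For instance, the following sketch updates several dissimilar kinds of places with the same set and inc operators:

  (let ((c (cons 1 2))
        (v (vec 'a 'b 'c)))
    (set (car c) 10)      ;; place denoting the car field
    (inc (cdr c))         ;; increment the cdr field in place
    (set [v 2] 'z)        ;; place denoting a vector element
    (list c v))
  -> ((10 . 3) #(a b z))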


8.7.3 Accessor Functions

When a function call form such as the above (car x) is a syntactic place, then the function is called an accessor. This term is used throughout this document to denote functions which have associated syntactic places.


8.7.4 Macro Call Syntactic Places

Syntactic places can be macros (global and lexical), including symbol macros. So for instance in (set x 42) the x place can actually be a symbolic macro which expands to, say, (cdr y). This means that the assignment is effectively (set (cdr y) 42).
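
The following sketch demonstrates this with a lexical symbol macro:

  (let ((y (cons 1 2)))
    (symacrolet ((x (cdr y)))
      (set x 42))          ;; effectively (set (cdr y) 42)
    y)
  -> (1 . 42)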


8.7.5 User-Defined Syntactic Places and Place Operators

Syntactic places, as well as operators upon syntactic places, are both open-ended. Code can be written quite easily in TXR Lisp to introduce new kinds of places, as well as new place-mutating operators. New places can be introduced with the help of the defplace, define-accessor or defset macros, or possibly the define-place-macro macro in simple cases when a new syntactic place can be expressed as a transformation to the syntax of an existing place. Three ways exist for developing new place update macros (place operators). They can be written using the ordinary macro definer defmacro, with the help of special utility macros called with-update-expander, with-clobber-expander, and with-delete-expander. They can also be written using defmacro in conjunction with the operators placelet or placelet*. Simple update macros similar to inc and push can be written compactly using define-modify-macro.
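
For instance, a simple update macro (here a hypothetical mul macro, analogous to inc) can be defined compactly with define-modify-macro:

  ;; (mul place factor) updates place with (* old-value factor)
  (define-modify-macro mul (factor) *)

  (let ((x 5))
    (mul x 3)    ;; like (set x (* x 3))
    x)
  -> 15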


8.7.6 Deletable Places

Unlike generalized references in Common Lisp, TXR Lisp syntactic places support the concept of deletion. Some kinds of places can be deleted, which is an action distinct from (but does not preclude) being overwritten with a value. What exactly it means for a place to be deleted, or whether that is even permitted, depends on the kind of place. For instance a place which denotes a lexical variable may not be deleted, whereas a global variable may be. A place which denotes a hash-table entry may be deleted, and results in the entry being removed from the hash table. Deleting a place in a list causes the trailing items, if any, or else the terminating atom, to move in to close the gap. Users may define new kinds of places which support deletion semantics.
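
For instance, deleting a hash-entry place removes the entry and yields the value that was stored:

  (let ((h #H(() (a 1) (b 2))))
    (del [h 'a])     ;; returns 1; entry is removed
    (hash-count h))
  -> 1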


8.7.7 Evaluation of Places

To bring about their effect, place operators must evaluate one or more places. Moreover, some of them evaluate additional forms which are not places. Which arguments of a place operator form are places and which are ordinary forms depends on its specific syntax. For all the built-in place operators, the position of an argument in the syntax determines whether it is treated as (and consequently required to be) a syntactic place, or whether it is an ordinary form.

All built-in place operators perform the evaluation of place and non-place argument forms in strict left-to-right order.

Place forms are evaluated not in order to compute a value, but in order to determine the storage location. In addition to determining a storage location, the evaluation of a place form may possibly give rise to side effects. Once a place is fully evaluated, the storage location can then be accessed. Access to the storage location is not considered part of the evaluation of a place. To determine a storage location means to compute some hidden referential object which provides subsequent access to that location without the need for a reevaluation of the original place form. (The subsequent access to the place through this referential object may still require a multi-step traversal of a data structure; minimizing such steps is a matter of optimization.)

Place forms may themselves be compounds, which contain subexpressions that must be evaluated. All such evaluation for the built-in places takes place in left to right order.

Certain place operators, such as shift and rotate, exhibit an unspecified behavior with regard to the timing of the access of the prior value of a place, relative to the evaluation of places which occur later in the same place operator form. Access to the prior values may be delayed until the entire form is evaluated, or it may be interleaved into the evaluation of the form. For example, in the form (shift a b c 1), the prior value of a can be accessed and saved as soon as a is evaluated, prior to the evaluation of b. Alternatively, a may be accessed and saved later, after the evaluation of b or after the evaluation of all the forms. This issue affects the behavior of place-modifying forms whose subforms contain side effects. It is recommended that such forms not be used in programs.


8.7.8 Nested Places

Certain place forms are required to have one or more arguments which are themselves places. The prime example of this, and the only example from among built-in syntactic places, are DWIM forms. A DWIM form has the syntax

  (dwim obj-place index [alt])

and the square-bracket-notation equivalent:

  [obj-place index [alt]]

Note that not only is the entire form a place, denoting some element or element range of obj-place, but there is the added constraint that obj-place must also itself be a syntactic place.

This requirement is necessary, because it supports the behavior that when the element or element range is updated, then obj-place is also potentially updated.

After the assignment (set [obj 0..3] '("forty" "two")) not only is the range of places denoted by [obj 0..3] replaced by the list of strings ("forty" "two") but obj may also be overwritten with a new value.

This behavior is necessary because the DWIM brackets notation maintains the illusion of an encapsulated array-like container over several dissimilar types, including Lisp lists. But Lisp lists do not behave as fully encapsulated containers. Some mutations on Lisp lists return new objects, which then have to be stored (or otherwise accepted) in place of the original objects in order to maintain the array-like container illusion.
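
The following sketch shows the obj variable itself being updated when a range assignment changes the length of a list:

  (let ((obj (list 1 2 3 4)))
    (set [obj 0..3] '("forty" "two"))  ;; replaces three elements with two
    obj)
  -> ("forty" "two" 4)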


8.7.9 Built-In Syntactic Places

The following is a summary of the built-in place forms, in addition to symbolic places denoting variables. New syntactic place forms can be defined by TXR programs.

(last object [num])
(butlast object [num])
(nthcdr index obj)
(nthlast index obj)
(butlastn num obj)
(nth index obj)
(ref seq idx)
(sub sequence [from [to]])
(vecref vec idx)
(chr-str str idx)
(gethash hash key [alt])
(dwim obj-place index [alt])
(dwim integer obj-place) ;; integers are callable
(dwim range obj-place) ;; ranges are callable
(sub-list obj [from [to]])
(sub-vec obj [from [to]])
(sub-str str [from [to]])
[obj-place index [alt]] ;; equivalent to dwim
[integer obj-place]
[range obj-place]
(slot struct-obj slot-name-valued-form)
(qref struct-obj slot-name) ;; by macro-expansion to (slot ...)
struct-obj.slot-name ;; equivalent to qref
(sock-opt socket level option [ffi-type])
(carray-sub carray [from [to]])
(sub-buf buf [from [to]])


8.7.10 Built-In Place-Mutating Operators

The following is a summary of the built-in place mutating macros. They are described in detail in their own sections.

(set {place new-value}*)
Assigns the values of expressions to places, performing assignments in left-to-right order, returning the value assigned to the rightmost place.

(pset {place new-value}*)
Assigns the values of expressions to places, performing the determination of places and evaluation of the expressions left to right, but the assignment in parallel. Returns the value assigned to the rightmost place.

(zap place [new-value])
Assigns new-value to place, defaulting to nil, and returns the prior value.

(flip place)
Logically toggles the Boolean value of place, and returns the new value.

(test-set place)
If place contains nil, stores t into the place and returns t to indicate that the store took place. Otherwise does nothing and returns nil.

(test-clear place)
If place contains a Boolean true value, stores nil into the place and returns t to indicate that the store took place. Otherwise does nothing and returns nil.

(compare-swap place cmp-fun cmp-val store-val)
Examines the value of place and compares it to cmp-val using the comparison function given by the function name cmp-fun. If the comparison is false, returns nil. Otherwise, stores the store-val value into place and returns t.

(ensure place init-expr)
If the place is nil, evaluates init-expr, stores that value into place and returns it. Otherwise, returns the value of place without changing its value or evaluating init-expr.

(inc place [delta])
Increments place by delta, which defaults to 1, and returns the new value.

(dec place [delta])
Decrements place by delta, which defaults to 1, and returns the new value.

(pinc place [delta])
Increments place by delta, which defaults to 1, and returns the old value.

(pdec place [delta])
Decrements place by delta, which defaults to 1, and returns the old value.

(test-inc place [delta [from-val]])
Increments place by delta and returns t if the previous value was eql to from-val, where delta defaults to 1 and from-val defaults to zero.

(test-dec place [delta [to-val]])
Decrements place by delta and returns t if the new value is eql to to-val, where delta defaults to 1 and to-val defaults to 0.

(swap left-place right-place)
Exchanges the values of left-place and right-place.

(push item place)
Adds item to the front of the list which is currently stored in place, then stores the extended list back into place and returns it.

(pop place)
Pops the list stored in place and returns the popped value.

(shift place+ shift-in-value)
Treats one or more places as a "multi-place shift register". Values are shifted to the left among the places. The rightmost place receives shift-in-value, and the value of the leftmost place emerges as the return value.

(rotate place*)
Treats zero or more places as a "multi-place rotate register". The places exchange values among themselves, by a rotation by one place to the left. The value of the leftmost place goes to the rightmost place, and that value is returned.

(del place)
Deletes a place which supports deletion, and returns the value which existed in that place prior to deletion.

(lset {place}+ sequence)
Sets multiple places to values obtained from successive elements of sequence.

(upd place opip-arg*)
Applies an opip-style operational pipeline to the value of place and stores the result back into place.

(set-mask place integer*)
Sets to 1 the bits in place corresponding to bits that are equal to 1 in the mask made up of the integer arguments (by combining them together with the inclusive or operation).

(clear-mask place integer*)
Clears (sets to 0) the bits in place corresponding to bits that are equal to 1 in the mask made up of the integer arguments (by combining them together with the inclusive or operation).


8.8 Namespaces and Environments

TXR Lisp is a Lisp-2 dialect: it features separate namespaces for functions and variables.


8.8.1 Global Functions and Operator Macros

In TXR Lisp, global functions and operator macros coexist, meaning that the same symbol can be defined as both a macro and a function.

There is a global namespace for functions, into which functions can be introduced with the defun macro. The global function environment can be inspected and modified using the symbol-function accessor.

There is a global namespace for macros, into which macros are introduced with the defmacro macro. The global macro environment can be inspected and modified using the symbol-macro accessor.

If a name x is defined as both a function and a macro, then an expression of the form (x ...) is expanded by the macro, whereas an expression of the form [x ...] refers to the function. Moreover, the macro can produce a call to the function. The expression (fun x) will retrieve the function object.
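
A sketch of this coexistence, using a hypothetical name twice:

  (defun twice (x) (* 2 x))
  (defmacro twice (x) ^(+ ,x ,x))

  (twice 5)     ;; macro call: expands to (+ 5 5) -> 10
  [twice 5]     ;; function call -> 10
  (fun twice)   ;; retrieves the function object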


8.8.2 Global and Dynamic Variables

There is a global namespace for variables also. The operators defvar and defparm introduce bindings into this namespace. These operators have the side effect of marking a symbol as a special variable, whereby bindings of the symbol are treated as dynamic variables, subject to rebinding. The global variable namespace together with the special dynamic rebinding is called the dynamic environment. The dynamic environment can be inspected and modified using the symbol-value accessor.

The operators defvarl and defparml introduce bindings into the global namespace without marking symbols as special variables. Such bindings are called global lexical variables.


8.8.3 Global Symbol Macros

Symbol macros may be defined over the global variable namespace using defsymacro.

Note that whereas a symbol may simultaneously have both a function and macro binding in the global namespace, a symbol may not simultaneously have a variable and symbol macro binding.


8.8.4 Lexical Environments

In addition to global and dynamic namespaces, TXR Lisp provides lexically scoped binding for functions, variables, macros, and symbol macros. Lexical variable bindings are introduced with let, let* or various binding macros derived from these. Lexical functions are bound with flet and labels. Lexical macros are established with macrolet and lexical symbol macros with symacrolet.

Macros receive an environment parameter with which they may expand forms in their correct environment, and perform some limited introspection over that environment in order to determine the nature of bindings, or the classification of forms in those environments. This introspection is provided by lexical-var-p, lexical-fun-p, and lexical-lisp1-binding.

Lexical operator macros and lexical functions can also coexist in the following way. A lexical function shadows a global or lexical macro completely. However, the reverse is not the case. A lexical macro shadows only those uses of a function which look like macro calls. This is succinctly demonstrated by the following form:

  (flet ((foo () 43))
    (macrolet ((foo () 44))
      (list (fun foo) (foo) [foo])))

  -> (#<interpreted fun: lambda nil> 44 43)

The (fun foo) and [foo] expressions are oblivious to the macro; the macro-expansion process leaves the symbol foo alone in those contexts. However, the form (foo) is subject to macro expansion and is replaced by 44.

If the flet and macrolet are reversed, the behavior is different:

  (macrolet ((foo () 44))
    (flet ((foo () 43))
      (list (fun foo) (foo) [foo])))

  -> (#<interpreted fun: lambda nil> 43 43)

All three forms refer to the function, which lexically shadows the macro.


8.8.5 Pattern Language and Lisp Scope Nesting

TXR Lisp expressions can be embedded in the TXR pattern language in various ways. Likewise, the pattern language can be invoked from TXR Lisp. This brings about the possibility that Lisp code attempts to access pattern variables bound in the pattern language. The TXR pattern language can also attempt to access TXR Lisp variables.

The rules are as follows, but they have undergone historic changes. See the COMPATIBILITY section, in particular notes under 138 and 121, and also 124.

A Lisp expression evaluated from the TXR pattern language executes in a null lexical environment. The current set of pattern variables captured up to that point by the pattern language are installed as dynamic variables. They shadow any Lisp global variables (whether those are defined by defvar or defvarl).

In the reverse direction, a variable reference from the TXR pattern language searches the pattern variable space first. If a variable doesn't exist there, then the lookup refers to the TXR Lisp global variable space. The pattern language doesn't see Lisp lexical variables.

When Lisp code is evaluated from the pattern language, the pattern variable bindings are not only installed as dynamic variables for the sake of their visibility from Lisp, but they are also specially stored in a dynamic environment frame. When TXR pattern code is reentered from Lisp, these bindings are picked up from the closest such environment frame, allowing the nested invocation of pattern code to continue with the bindings captured by outer pattern code.

Concisely, in any context in which a symbol has both a binding as a Lisp global variable as well as a pattern variable, that symbol refers to the pattern variable. Pattern variables are propagated through Lisp evaluation into nested invocations of the pattern language.

The pattern language can also reference Lisp variables using the @ prefix, which is a consequence of that prefix introducing an expression that is evaluated as Lisp, the name of a variable being such an expression.




9.1 Conventions

The following sections list all of the special operators, macros and functions in TXR Lisp.

In these sections, syntax is indicated using these conventions:

A symbol in fixed-width-italic font denotes some syntactic unit: it may be a symbol or compound form. The syntactic unit is explained in the corresponding Description section.

{syntax}* word*
This indicates a repetition of zero or more of the given syntax enclosed in the braces or syntactic unit. The curly braces may be omitted if the scope of the * is clear.

{syntax}+ word+
This indicates a repetition of one or more of the given syntax enclosed in the braces or syntactic unit. The curly braces may be omitted if the scope of the + is clear.

{syntax | syntax | ...}
This indicates a single, mandatory element, which is selected from among the indicated alternatives. May be combined with + or * repetition.

[syntax] [word]
Square brackets indicate optional syntax.

[syntax | syntax | ...]
Square brackets containing piped elements indicate an optional element, which, if present, must be chosen from among the indicated alternatives.

'[' ']'
The quoted square brackets indicate literal brackets which appear in the syntax, which they do without quotes. For instance '['foo [ bar ]']' is a pattern which denotes the two possible expressions [foo] and [foo bar].

syntax -> result
The arrow notation is used in examples to indicate that the evaluation of the given syntax produces a value, whose printed representation is result.


9.2 Form Evaluation

A compound expression with a symbol as its first element, if intended to be evaluated, denotes either an operator invocation or a function call. This depends on whether the symbol names an operator or a function.

When the form is an operator invocation, the interpretation of the meaning of that form is under the complete control of that operator.

If the compound form is a function call, the remaining forms, if any, denote argument expressions to the function. They are evaluated in left-to-right order to produce the argument values, which are passed to the function. An exception is thrown if there are not enough arguments, or too many. Programs can define named functions with the defun operator.

Some operators are macros. There exist predefined macros in the library, and macro operators can also be user-defined using the macro-defining operator defmacro. Operators that are not macros are called special operators.

Macro operators work as functions which are given the source code of the form. They analyze the form, and translate it to another form which is substituted in their place. This happens during a code walking phase called the expansion phase, which is applied to each top-level expression prior to evaluation. All macros occurring in a form are expanded in the expansion phase, and subsequent evaluation takes place on a structure which is devoid of macros. All that remains are the executable forms of special operators, function calls, symbols denoting either variables or themselves, and atoms such as numeric and string literals.

Special operators can also perform code transformations during the expansion phase, but that is not considered macroexpansion, but rather an adjustment of the representation of the operator into a required executable form. In effect, it is a post-macro compilation phase.

Note that Lisp forms occurring in the TXR pattern language are not individual top-level forms. Rather, the entire TXR query is parsed at the same time, and the macros occurring in its Lisp forms are expanded at that time.


9.2.1 Operator quote

  (quote form)




The quote operator, when evaluated, suppresses the evaluation of form, and instead returns form itself as an object. For example, if form is a symbol sym, then the value of (quote sym) is sym itself. Without quote, sym would evaluate to the value held by the variable which is named sym, or else throw an error if there is no such variable. The quote operator never raises an error, if it is given exactly one argument, as required.

The notation 'obj is translated to the object (quote obj) providing a shorthand for quoting. Likewise, when an object of the form (quote obj) is printed, it appears as 'obj.


  ;; yields symbol a itself, not value of variable a
  (quote a) -> a

  ;; yields three-element list (+ 2 2), not 4.
  (quote (+ 2 2)) -> (+ 2 2)


9.3 Variable Binding

Variables are associations between symbols and storage locations which hold values. These associations are called bindings.

Bindings are held in a context called an environment.

Lexical environments hold local variables, and nest according to the syntactic structure of the program. Lexical bindings are always introduced by some form known as a binding construct, and the corresponding environment is instantiated during the evaluation of that construct. There also exist bindings outside of any binding construct, in the so-called global environment. Bindings in the global environment can be temporarily shadowed by lexically-established bindings in the dynamic environment. See the Special Variables section above.

Certain special symbols cannot be used as variable names, namely the symbols t and nil, and all of the keyword symbols (symbols in the keyword package), which are denoted by a leading colon. When any of these symbols is evaluated as a form, the resulting value is that symbol itself. It is said that these special symbols are self-evaluating or self-quoting, similarly to all other atom objects such as numbers or strings.

When a form consisting of a symbol, other than the above special symbols, is evaluated, it is treated as a variable, and yields the value of the variable's storage location. If the variable doesn't exist, an exception is thrown.

Note: symbol forms may also denote invocations of symbol macros. (See the operators defsymacro and symacrolet). All macros, including symbol macros, which occur inside a form are fully expanded prior to the evaluation of a form, therefore evaluation does not consider the possibility of a symbol being a symbol macro.


9.3.1 Operator defvar and Macro defparm


  (defvar sym [value])
  (defparm sym value)


The defvar operator binds a name in the variable namespace of the global environment. Binding a name means creating a binding: recording, in some namespace of some environment, an association between a name and some named entity. In the case of a variable binding, that entity is a storage location for a value. The value of a variable is that which has most recently been written into the storage location, and is also said to be a value of the binding, or stored in the binding.

If the variable named sym already exists in the global environment, the form has no effect; the value form is not evaluated, and the value of the variable is unchanged.

If the variable does not exist, then a new binding is introduced, with a value given by evaluating the value form. If the form is absent, the variable is initialized to nil.

The value form is evaluated in the environment in which the defvar form occurs, not necessarily in the global environment.

The symbols t and nil may not be used as variables; neither may keyword symbols (symbols denoted by a leading colon).

In addition to creating a binding, the defvar operator also marks sym as the name of a special variable. This changes what it means to bind that symbol in a lexical binding construct such as the let operator, or a function parameter list. See the section "Special Variables" far above.

The defparm macro behaves like defvar when a variable named sym doesn't already exist.

If sym already denotes a variable binding in the global namespace, defparm evaluates the value form and assigns the resulting value to the variable.

The following equivalence holds:

  (defparm x y)  <-->  (prog1 (defvar x) (set x y))

The defvar and defparm forms return sym.
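
The difference may be sketched as follows; the variable name and values here are arbitrary:

  (defvar a 1)  ;; binding created; a is initialized to 1
  (defvar a 2)  ;; binding exists; no effect, value form not evaluated
  a -> 1

  (defparm a 2) ;; binding exists; value is assigned
  a -> 2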


9.3.2 Macros defvarl and defparml


  (defvarl sym [value])
  (defparml sym value)


The defvarl and defparml macros behave, respectively, almost exactly like defvar and defparm.

The difference is that these operators do not mark sym as special.

If a global variable sym does not previously exist, then after the evaluation of either of these forms (boundp sym) is true, but (special-var-p sym) isn't.

If sym had been already introduced as a special variable, it stays that way after the evaluation of defvarl or defparml.
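
The distinction may be sketched as follows, assuming that neither variable previously exists:

  (defvarl lex 1)
  (boundp 'lex)        -> t
  (special-var-p 'lex) -> nil

  (defvar *dyn* 1)
  (special-var-p '*dyn*) -> t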


9.3.3 Operators let and let*


  (let ({sym | (sym init-form)}*) body-form*)
  (let* ({sym | (sym init-form)}*) body-form*)


The let and let* operators introduce a new scope with variables and evaluate forms in that scope. The operator symbol, either let or let*, is followed by a list which can contain any mixture of sym or (sym init-form) pairs. Each sym must be a symbol, and specifies the name of a variable to be instantiated and initialized.

The (sym init-form) variant specifies that the new variable sym receives an initial value from the evaluation of init-form. The plain sym variant specifies a variable which is initialized to nil. The init-forms are evaluated in order, by both let and let*.

The symbols t and nil may not be used as variables, and neither can keyword symbols: symbols denoted by a leading colon.

The difference between let and let* is that in let*, each init-form is evaluated in a scope which includes the variables established by the earlier items of the same let* construct. In plain let, all the init-forms are evaluated in a scope which does not include any of the new variables.

When the variables are established, the body-forms are evaluated in order. The value of the last body-form becomes the return value of the let. If there are no body-forms, then the return value nil is produced.

The list of variables may be empty.

The list of variables may contain duplicate syms if the operator is let*. In that situation, a given init-form has in scope the rightmost duplicate of any given sym that has been previously established. The body-forms have in scope the rightmost duplicate of any sym in the construct. Therefore, the following form calculates the value 3:

  (let* ((a 1)
         (a (succ a))
         (a (succ a)))
    a)

  --> 3

Each duplicate is a separately instantiated binding, and may be independently captured by a lexical closure placed in a subsequent init-form:

  (let* ((a 0)
         (f1 (lambda () (inc a)))
         (a 0)
         (f2 (lambda () (inc a))))
    (list [f1] [f1] [f1] [f2] [f2] [f2]))

  --> (1 2 3 1 2 3)

The preceding example shows that there are two mutable variables named a in independent scopes, each respectively captured by the separate closures f1 and f2. Three calls to f1 increment the first a while the second a retains its initial value.

Under let, the behavior of duplicate variables is unspecified.

Implementation note: the TXR compiler diagnoses and rejects duplicate symbols in let whereas the interpreter ignores the situation.

When the name of a special variable is specified in let or let*, a new binding is created for it in the dynamic environment, rather than the lexical environment. In let*, later init-forms are evaluated in a dynamic scope in which previous dynamic variables are established, and later dynamic variables are not yet established. A special variable may appear multiple times in a let*, just like a lexical variable. Each duplicate occurrence extends the dynamic environment with a new dynamic binding. All these dynamic environments are removed when the let or let* form terminates. Dynamic environments aren't captured by lexical closures, but are captured in delimited continuations.
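
The dynamic rebinding of a special variable may be sketched as follows; the function observer sees the dynamic binding established by let, even though it has no lexical relationship to the let form:

  (defvar *indent* 0)

  (defun observer () *indent*)

  (observer)                -> 0
  (let ((*indent* 4))
    (observer))             -> 4
  (observer)                -> 0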


  (let ((a 1) (b 2)) (list a b)) -> (1 2)
  (let* ((a 1) (b (+ a 1))) (list a b (+ a b))) -> (1 2 3)
  (let ()) -> nil
  (let (:a nil)) -> error, :a and nil can't be used as variables


TXR Lisp follows ANSI Common Lisp in making let the parallel binding construct, and let* the sequential one. In that language, the situation exists for historic reasons: mainly that let was initially understood as being a macro for an immediately-called lambda where the parameters come into existence simultaneously, receiving the evaluated values of all the argument expressions. The need for sequential binding was recognized later, by which time let was cemented as a parallel binding construct. There are very good arguments for, in a new design, using the let name for the construct which has sequential semantics. Nevertheless, in this matter, TXR Lisp remains compatible with dialects like ANSI CL and Emacs Lisp.


9.3.4 Operator progv


  (progv symbols-expr values-expr body-form*)


The progv operator binds dynamic variables, and evaluates the body-forms in the dynamic scope of those bindings. The bindings are removed when the form terminates. The result value is that of the last body-form or else nil if there are no forms.

The symbols-expr and values-expr are expressions which are evaluated. Their values are expected to be lists, of bindable symbols and arbitrary values, respectively. The symbols coming from one list are bound to the values coming from the other list.

If there are more symbols than values, then the extra symbols will appear unbound, as if they were first bound and then hidden using the makunbound function.

If there are more values than symbols, the extra values are ignored.

Note that dynamic binding takes place for the symbols even if they have not been introduced as special variables via defvar or defparm. However, if those symbols appear as expressions denoting variables inside the body-forms, they will not necessarily be treated as dynamic variables. If they have lexical definitions in scope, those will be referenced. Furthermore, the compiler treats undefined variables as global references, and not dynamic.


  (progv '(a b) '(1 2) (cons a b))  ->  (1 . 2)

  (progv '(x) '(1) (let ((x 4)) (symbol-value 'x))) -> 1

  (let ((x 'lexical)
        (vars (list 'x))
        (vals (list 'dynamic)))
    (progv vars vals (list x (symbol-value 'x))))

  --> (lexical dynamic)


9.4 Functions


9.4.1 Operator defun


  (defun name (param* [: opt-param*] [. rest-param])
    body-form*)


The defun operator introduces a new function in the global function namespace. The function is similar to a lambda, and has the same parameter syntax and semantics as the lambda operator.

Note that the above syntax synopsis describes only the canonical parameter syntax which remains after parameter list macros are expanded. See the section Parameter List Macros.

Unlike in lambda, the body-forms of a defun are surrounded by a block. The name of this block is the same as the name of the function, making it possible to terminate the function and return a value using (return-from name value). For more information, see the definition of the block operator.

A function may call itself by name, allowing for recursion.

The special symbols t and nil may not be used as function names. Neither can keyword symbols.

It is possible to define methods as well as macros with defun, as an alternative to the defmeth and defmacro forms.

To define a method, the syntax (meth type name) should be used as the argument to the name parameter. This gives rise to the syntax (defun (meth type name) args form*) which is equivalent to the (defmeth type name args form*) syntax.

Macros can be defined using (macro name) as the name parameter of defun. This way of defining a macro doesn't support destructuring; it defines the expander as an ordinary function with an ordinary argument list. To work, the function must accept two arguments: the entire macro call form that is to be expanded, and the macro environment. Thus, the macro definition syntax is (defun (macro name) (form env) form*) which is equivalent to the (defmacro name (:form form :env env) form*) syntax.

Dialect Note:

In ANSI Common Lisp, keywords may be used as function names. In TXR Lisp, they may not.

Dialect Note:

A function defined by defun may coexist with a macro defined by defmacro. This is not permitted in ANSI Common Lisp.
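
The following sketch illustrates the implicit block: the return-from form terminates the function early, yielding the first even element of the list:

  (defun first-even (list)
    (each ((x list))
      (if (evenp x)
        (return-from first-even x))))

  (first-even '(1 3 4 5)) -> 4
  (first-even '(1 3 5))   -> nil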


9.4.2 Operator lambda


  (lambda (param* [: opt-param*] [. rest-param])
    body-form*)


The lambda operator produces a value which is a function. Like in most other Lisps, functions are objects in TXR Lisp. They can be passed to functions as arguments, returned from functions, aggregated into lists, stored in variables, etc.

Note that the above syntax synopsis describes only the canonical parameter syntax which remains after parameter list macros are expanded. See the section Parameter List Macros.

The first argument of lambda is the list of parameters for the function. It may be empty, and it may also be an improper list (dot notation) where the terminating atom is a symbol other than nil. It can also be a single symbol.

The second and subsequent arguments are the forms making up the function body. The body may be empty.

When a function is called, the parameters are instantiated as variables that are visible to the body forms. The variables are initialized from the values of the argument expressions appearing in the function call.

The dotted notation can be used to write a function that accepts a variable number of arguments. There are two ways to write a function that accepts only a variable argument list and no required arguments:

  (lambda (. rest-param) ...)
  (lambda rest-param ...)

(These notations are syntactically equivalent because the list notation (. X) actually denotes the object X which isn't wrapped in any list).

The keyword symbol : (colon) can appear in the parameter list. This is the symbol in the keyword package whose name is the empty string. This symbol is treated specially: it serves as a separator between required parameters and optional parameters. Furthermore, the : symbol has a role to play in function calls: it can be specified as an argument value to an optional parameter by which the caller indicates that the optional argument is not being specified; the argument is then processed exactly as if it had been omitted.

An optional parameter can also be written in the form (name expr [sym]). In this situation, if the call does not specify a value for the parameter, or specifies a value as the : (colon) keyword symbol, then the parameter takes on the value of the expression expr. This expression is only evaluated when its value is required.

If sym is specified, then sym will be introduced as an additional binding with a Boolean value which indicates whether or not the optional parameter had been specified by the caller.

Each expr that is evaluated is evaluated in an environment in which all of the previous parameters are visible, in addition to the surrounding environment of the lambda. For instance:

  (let ((default 0))
    (lambda (str : (end (length str)) (counter default))
      (list str end counter)))

In this lambda, the initializing expression for the optional parameter end is (length str), and the str variable it refers to is the previous argument. The initializer for the optional variable counter is the expression default, and it refers to the binding established by the surrounding let. This reference is captured as part of the lambda's lexical closure.

Keyword symbols, and the symbols t and nil may not be used as parameter names. The behavior is unspecified if the same symbol is specified more than once anywhere in the parameter list, whether as a parameter name or as the indicator sym in an optional parameter or any combination.

Implementation note: the TXR compiler diagnoses and rejects duplicate symbols in lambda whereas the interpreter ignores the situation.

Note: it is not always necessary to use the lambda operator directly in order to produce an anonymous function.

In situations when lambda is being written in order to simulate partial evaluation, it may be possible to instead make use of the op macro. For instance the function (lambda (. args) [apply + a args]), which adds together the values of all of its arguments and the lexically captured variable a, can be written more succinctly as (op + a). The op operator is the main representative of a family of operators: lop, ap, ip, do, ado, opip and oand.

In situations when functions are simply combined together, the effect may be achieved using some of the available functional combinators, instead of a lambda. For instance chaining together functions as in (lambda (x) (square (cos x))) is achievable using the chain function: [chain cos square]. The opip operator can also be used: (opip cos square). Numerous combinators are available; see the section Partial Evaluation and Combinators.

When a function is needed which accesses an object, there are also alternatives. Instead of (lambda (obj) obj.slot) and (lambda (obj arg) obj.(slot arg)), it is simpler to use the .slot and .(slot arg) notations. See the section Unbound Referencing Dot. Also see the functions umethod and uslot as well as the related convenience macros umeth and usl.

If a function is needed which partially applies, to some arguments, a method invoked on a specific object, the method function or meth macro may be used. For instance, instead of (lambda (arg) obj.(method 3 arg)), it is possible to write (meth obj 3) except that the latter produces a variadic function.


The following expression returns a function which captures the variable counter. Whenever the returned function is called, it increments counter by one, and returns the incremented value.

  (let ((counter 0))
    (lambda () (inc counter)))

The following produces a variadic function which requires at least two arguments. The third and subsequent arguments are aggregated into a list passed as the single parameter z:

  (lambda (x y . z) (list 'my-arguments-are x y z))

A variadic function with no required arguments. The parameter name for the received arguments appears alone in place of the parameter list.

  (lambda args (list 'my-list-of-arguments args))

Same as the previous example, using a dotted notation specific to TXR Lisp.

  (lambda (. args) (list 'my-list-of-arguments args))

Note that (. args) is just a written notation equivalent to args and not a different object structure.

Optional arguments:

  [(lambda (x : y) (list x y)) 1] -> (1 nil)
  [(lambda (x : y) (list x y)) 1 2] -> (1 2)

Passing : (colon symbol) to request default value of optional parameter:

  [(lambda (x : (y 42) z) (list x y z)) 1 2 3] -> (1 2 3)
  [(lambda (x : (y 42) z) (list x y z)) 1 : 3] -> (1 42 3)
  [(lambda (x : (y 42) z) (list x y z)) 1] -> (1 42 nil)

Presence-indicating variable accompanying optional parameter:

  [(lambda (x : (y 42 have-y)) (list x y have-y)) 1 2]
  -> (1 2 t)

  [(lambda (x : (y 42 have-y)) (list x y have-y)) 1]
  -> (1 42 nil)

  ;; defaulting via : is indistinguishable from missing
  [(lambda (x : (y 42 have-y)) (list x y have-y)) 1 :]
  -> (1 42 nil)


9.4.3 Macros flet and labels


  (flet ({(name param-list function-body-form*)}*)
    body-form*)
  (labels ({(name param-list function-body-form*)}*)
    body-form*)


The flet and labels macros bind local, named functions in the lexical scope.

Note that the above syntax synopsis describes only the canonical parameter syntax which remains after parameter list macros are expanded. See the section Parameter List Macros.

The difference between flet and labels is that a function defined by labels can see itself, and therefore recurse directly by name. Moreover, if multiple functions are defined by the same labels construct, they all have each other's names in scope of their bodies. By contrast, a flet-defined function does not have itself in scope and cannot recurse. Multiple functions in the same flet do not have each other's names in their scopes.

More formally, the function-body-forms and param-list of the functions defined by labels are in a scope in which all of the function names being defined by that same labels construct are visible.

Under both labels and flet, the local functions that are defined are lexically visible to the main body-forms.

Note that labels and flet are properly scoped with regard to macros. During macro expansion, function bindings introduced by these macro operators shadow macros defined by macrolet and defmacro.

Furthermore, function bindings introduced by labels and flet also shadow symbol macros defined by symacrolet, when those symbol macros occur as arguments of a dwim form.

See also: the macrolet operator.

Dialect Note:

The flet and labels macros do not establish named blocks around the body forms of the local functions which they bind. This differs from ANSI Common Lisp, whose local functions have implicit named blocks, allowing for return-from to be used.


  ;; Wastefully slow algorithm for determining evenness.
  ;; Note:
  ;; - mutual recursion between labels-defined functions
  ;; - inner is-even bound by labels shadows the outer
  ;;   one bound by defun so the (is-even n) call goes
  ;;   to the local function.

  (defun is-even (n)
    (labels ((is-even (n)
               (if (zerop n) t (is-odd (- n 1))))
             (is-odd (n)
               (if (zerop n) nil (is-even (- n 1)))))
      (is-even n)))


9.4.4 Function call


  (call function argument*)


The call function invokes function, passing it the given arguments, if any.

function need not be a function; other kinds of objects can be used in place of functions with various semantics. The details are given in the description of the dwim operator.


Apply a lambda to the arguments 1 and 2, adding them to produce 3:

  (call (lambda (a b) (+ a b)) 1 2)

Useless use of call on a named function; equivalent to (list 1 2):

  (call (fun list) 1 2)


9.4.5 Functions apply and iapply


  (apply function [arg* trailing-args])
  (iapply function [arg* trailing-args])


The apply function invokes function, optionally passing to it an argument list. The return value of the apply call is that of function.

If no arguments are present after function, then function is invoked without arguments.

If one argument is present after function, then it is interpreted as trailing-args. If this is a sequence (a list, vector or string), then the elements of the sequence are passed as individual arguments to function. If trailing-args is not a sequence, then function is invoked with an improper argument list, terminated by the trailing-args atom.

If two or more arguments are present after function, then the last of these arguments is interpreted as trailing-args. The previous arguments are leading arguments. When the argument list to which function is applied is formed, the leading arguments become individual arguments presented in the same order, followed by arguments taken from the trailing-args list.

Note that if trailing-args value is an atom or an improper list, the function is then invoked with an improper argument list. Only a variadic function may be invoked with an improper argument list. Moreover, all of the function's required and optional parameters must be satisfied by elements of the improper list, such that the terminating atom either matches the rest-param directly (see the lambda operator) or else the rest-param receives an improper list terminated by that atom. To treat the terminating atom of an improper list as an ordinary element which can satisfy a required or optional function parameter, the iapply function may be used, described next.

The iapply function ("improper apply") is similar to apply, except with regard to the treatment of trailing-args. Firstly, under iapply, if trailing-args is an atom other than nil (possibly a sequence, such as a vector or string), then it is treated as an ordinary argument: function is invoked with a proper argument list, whose last element is trailing-args. Secondly, if trailing-args is a list, but an improper list, then the terminating atom of trailing-args becomes an individual argument. This terminating atom is not split into multiple arguments, even if it is a sequence. Thus, in all possible cases, iapply treats an extra non-nil atom as an argument, and never calls function with an improper argument list.


  ;; '(1 2 3) becomes arguments to list, thus (list 1 2 3).
  (apply (fun list) '(1 2 3)) -> (1 2 3)

  ;; this effectively invokes (list 1 2 3 4)
  (apply (fun list) 1 2 '(3 4)) -> (1 2 3 4)

  ;; this effectively invokes (list 1 2 . 3)
  (apply (fun list) 1 2 3) -> (1 2 . 3)

  ;; "abc" is separated into characters
  ;; which become arguments of list
  (apply (fun list) "abc") -> (#\a #\b #\c)
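
The contrasting behavior of iapply follows from the rules stated above:

  ;; trailing atom 3 becomes an ordinary argument
  (iapply (fun list) 1 2 3) -> (1 2 3)

  ;; terminating atom of the improper list becomes an argument
  (iapply (fun list) 1 '(2 . 3)) -> (1 2 3)

  ;; a string atom is not split into characters
  (iapply (fun list) "abc") -> ("abc")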

Dialect Note:

Note that some uses of this function that are necessary in other Lisp dialects are not necessary in TXR Lisp. The reason is that in TXR Lisp, improper list syntax is accepted as a compound form, and performs application:

  (foo a b . x)

Here, the variables a and b supply the first two arguments for foo. In the dotted position, x must evaluate to a list or vector. The list or vector's elements are pulled out and treated as additional arguments for foo. This syntax can only be used if x is a symbolic form or an atom. It cannot be a compound form, because (foo a b . (x)) and (foo a b x) are equivalent structures.


9.4.6 Operator fun


  (fun function-name)


The fun operator retrieves the function object corresponding to a named function in the current lexical environment.

The function-name may be a symbol denoting a named function: a built-in function, or one defined by defun.

The function-name may also take any of the forms specified in the description of the func-get-name function. If such a function-name refers to a function which exists, then the fun operator yields that function.

Note: the fun operator does not see macro bindings via their symbolic names with which they are defined by defmacro. However, the name syntax (macro name) may be used to refer to macros. This syntax is documented in the description of func-get-name. It is also possible to retrieve a global macro expander using the function symbol-macro.
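
For instance, assuming a function add defined by defun:

  (defun add (x y) (+ x y))

  (call (fun add) 1 2)       -> 3
  (mapcar (fun succ) '(1 2)) -> (2 3)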


9.4.7 Operator dwim


  (set (dwim obj-place index [alt]) new-value)
  (set (dwim {integer | range} obj-place) new-value)
  (set '['obj-place index [alt]']' new-value)
  (set '[{'integer | range} obj-place']' new-value)


The dwim operator's name is an acronym: DWIM may be taken to mean "Do What I Mean", or alternatively, "Dispatch, in a Way that is Intelligent and Meaningful".

The notation [...] is a shorthand which denotes (dwim ...).

Note that since the [ and ] are used in this document for indicating optional syntax, in the above Syntax synopsis the quoted notation '[' and ']' denotes bracket tokens which literally appear in the syntax.

The dwim operator takes a variable number of arguments, which are treated as expressions to be individually macro-expanded and evaluated, using the same rules.

This means that the first argument isn't a function name, but an ordinary expression which can simply compute a function object (or, more generally, a callable object).

Furthermore, for those arguments of dwim which are symbols (after all macro-expansion is performed), the evaluation rules are altered. For the purposes of resolving symbols to values, the function and variable binding namespaces are considered to be merged into a single space, creating a situation that is similar to a Lisp-1 style dialect.

This special Lisp-1 evaluation is not recursively applied. All arguments of dwim which, after macro expansion, are not symbols are evaluated using the normal Lisp-2 evaluation rules. Thus, the DWIM operator must be used in every expression where the Lisp-1 rules for reducing symbols to values are desired.

If a symbol has bindings both in the variable and function namespace in scope, and is referenced by a dwim argument, this constitutes a conflict which is resolved according to two rules. When nested scopes are concerned, then an inner binding shadows an outer binding, regardless of their kind. An inner variable binding for a symbol shadows an outer or global function binding, and vice versa.

If a symbol is bound to both a function and variable in the global namespace, then the variable binding is favored.

Macros do not participate in the special scope conflation, with one exception. What this means is that the space of symbol macros is not folded together with the space of operator macros. An argument of dwim that is a symbol might be a symbol macro, variable or function, but it cannot be interpreted as the name of an operator macro.

The exception is this: from the perspective of a dwim form, function bindings can shadow symbol macros. If a function binding is defined in an inner scope relative to a symbol macro for the same symbol, using flet or labels, the function hides the symbol macro. In other words, when macro expansion processes an argument of a dwim form, and that argument is a symbol, it is treated specially in order to provide a consistent name lookup behavior. If the innermost binding for that symbol is a function binding, it refers to that function binding, even if a more outer symbol macro binding exists, and so the symbol is not expanded using the symbol macro. By contrast, in an ordinary form, a symbolic argument never resolves to a function binding. The symbol refers to either a symbol macro or a variable, whichever is nested closer.

If, after macro expansion, the leftmost argument of the dwim is the name of a special operator or macro, the dwim form doesn't denote an invocation of that operator or macro. A dwim form is an invocation of the dwim operator, and the leftmost argument of that operator, if it is a symbol, is treated as a binding to be resolved in the variable or function namespace, like any other argument. Thus [if x y] is an invocation of the if function, not the if operator.

How many arguments are required by the dwim operator depends on the type of object to which the first argument expression evaluates. The possibilities are:

[function argument*]
Call the given function object with the given arguments.

[symbol argument*]
If the first expression evaluates to a symbol, that symbol is resolved in the function namespace, and then the resulting function, if found, is called with the given arguments.

[sequence index]
Retrieve an element from sequence, at the specified index, which is a nonnegative integer.

This form is also a syntactic place. If a value is stored to this place, it replaces the element.

The place may also be deleted, which has the effect of removing the element from the sequence, shifting the elements at higher indices, if any, down one element position, and shortening the sequence by one. If the place is deleted, and if sequence is a list, then the sequence form itself must be a place.

This form is implemented using the ref accessor such that, except for the argument evaluation semantics of the DWIM brackets, it is equivalent to using the (ref sequence index) syntax.

[sequence range]
Retrieve the specified range of elements. The range of elements is specified in the from and to fields of a range object. The .. (dotdot) syntactic sugar denotes the construction of the range object via the rcons function. See the section on Range Indexing below.

This form is also a syntactic place. Storing a value in this place has the effect of replacing the subsequence with a new subsequence. Deleting the place has the effect of removing the specified subsequence from sequence. If sequence is a list, then the sequence form must itself be a place. The new-value argument in a range assignment can be a string, vector or list, regardless of whether the target is a string, vector or list. If the target is a string, the replacement sequence must be a string, or a list or vector of characters.

The semantics is implemented using the sub accessor, such that the following equivalence holds:

  [seq from..to]  <-->  (sub seq from..to)

For this reason, sequence may be any object that is iterable by iter-begin.

[sequence index-seq]
Elements of sequence specified by elements of index-seq, are extracted and returned as a sequence of the same kind as sequence.

This form is equivalent to (select sequence index-seq) except when it is the target of an assignment operation.

This form is a syntactic place if sequence is one. If a sequence is assigned to this place, then elements of the sequence are distributed to the specified locations.

The following equivalences hold between index-sequence-based indexing and the select and replace functions, except that set always returns the value assigned, whereas replace returns its first argument:

  [seq idx-seq] <--> (select seq idx-seq)

  (set [seq idx-seq] new) <--> (replace seq new idx-seq)

Note that unlike the select function, this does not support [hash index-seq] because hash keys may be lists, making that syntax indistinguishable from a simple hash lookup where index-seq is the key.

[hash key [alt]]
Retrieve a value from the hash table corresponding to key, or else return alt if there is no such entry. The expression alt is always evaluated, whether or not its value is used.
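
As a sketch, using arbitrary keys and values:

  (let ((h (hash)))
    (set [h 'a] 1)
    (list [h 'a] [h 'b] [h 'b 42]))
  -> (1 nil 42)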

[search-tree key]
Retrieves an element from the search tree as if by applying the tree-lookup function to key.

[search-tree from-key..to-below-key]
Retrieves a list of elements from the search tree as if by evaluating the (sub-tree search-tree from-key to-below-key) expression.

[regex [start [from-end]] string]
Determine whether regular expression regex matches string, and in that case return the (possibly empty) leftmost matching substring. Otherwise, return nil.

If start is specified, it gives the starting position where the search begins, and if from-end is given, and has a value other than nil, it specifies a search from right to left. These optional arguments have the same conventions and semantics as their equivalents in the search-regex function.

Note that string is always required, and is always the rightmost argument.

[struct arg*]
The structure instance struct is inquired whether it supports a method named by the symbol lambda. If so, that method is invoked on the object. The method receives struct followed by the value of every arg. If this form is used as a place, then the object must support a lambda-set method.
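
For instance, a structure type can be made callable by defining a lambda method (an illustrative sketch; the adder type and its amount slot are invented for this example):

  (defstruct adder nil
    amount
    (:method lambda (self x) (+ self.amount x)))

  (let ((a (new adder amount 10)))
    [a 5])
  -> 15
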
[carray index]
Element and range indexing is possible on objects of type carray, which manipulate arrays in a foreign ("C language") representation, and are closely associated with the Foreign Function Interface (FFI). Just like in the case of sequences, the semantics of referencing carray objects with the bracket notation is based on the functions ref, refset, sub and replace. These, in turn, rely on the specialized functions carray-ref, carray-refset, carray-sub and carray-replace.

[buf index]
Indexing is supported for objects of type buf. This provides a way to access and store the individual bytes of a buffer.

[integer sequence]
If the left argument is an integer, it denotes selection of an element from sequence. The integer value acts as the index into a vector-like or list-like sequence, or a key into a hash table.

[range {seq | ind}]
If the left argument is a range, and there is one argument, the semantics is that of the rangeref function: either the selection of a point from the range by an integer index ind, or the selection of a subrange of sequence seq according to the endpoints of range.

Note that the various above forms are not actually cases of the dwim operator, but are due to the semantics of the left-argument objects being used as functions. All of the semantics described above is available in any situation in which an object is used as a function: for instance, as an argument of the call or apply operators, or the functional argument in mapcar.

Range Indexing:

Vector and list range indexing is based from zero, meaning that the first element is numbered zero, the second one, and so on. Negative values are allowed; the value -1 refers to the last element of the vector or list, and -2 to the second last, and so forth. Thus the range 1 .. -2 means "everything except for the first element and the last two".

The symbol t represents the position one past the end of the vector, string or list, so 0..t denotes the entire list or vector, and the range t..t represents the empty range just beyond the last element. It is possible to assign to t..t. For instance:

  (defvar list '(1 2 3))
  (set [list t..t] '(4)) ;; list is now (1 2 3 4)

The value zero has a "floating" behavior when used as the end of a range. If the start of the range is a negative value, and the end of the range is zero, the zero is interpreted as being the position past the end of the sequence, rather than the first element. For instance the range -1..0 means the same thing as -1..t. Zero at the start of a range always means the first element, so that 0..-1 refers to all the elements except for the last one.
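
For instance, with a six-character string, these conventions yield the following (illustrative):

  (let ((s "abcdef"))
    (list [s 1..-2]    ;; "bcd": drop the first element and the last two
          [s -3..0]    ;; "def": trailing zero means one past the end
          [s 0..-1]))  ;; "abcde": all but the last element
  -> ("bcd" "def" "abcde")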


The dwim operator allows for a Lisp-1 flavor of programming in TXR Lisp, which is principally a Lisp-2 dialect.

A Lisp-1 dialect is one in which an expression like (a b) treats both a and b as expressions subject to the same evaluation rules—at least, when a isn't an operator or an operator macro. This means that the symbols a and b are resolved to values in the same namespace. The form denotes a function call if the value of variable a is a function object. Thus in a Lisp-1, named functions do not exist as such: they are just variable bindings. In a Lisp-1, the form (car 1) means that there is a variable called car, which holds a function, which is retrieved from that variable and applied to the 1 argument. In the expression (car car), both occurrences of car refer to the variable, and so this form applies the car function to itself. It is almost certainly meaningless. In a Lisp-2 (car 1) means that there is a function called car, in the function namespace. In the expression (car car) the two occurrences refer to different bindings: one is a function and the other a variable. Thus there can exist a variable car which holds a cons-cell object, rather than the car function, and the form makes sense.

The Lisp-1 approach is useful for functional programming, because it eliminates cluttering occurrences of the call and fun operators. For instance:

  ;; regular notation

  (call foo (fun second) '((1 a) (2 b)))

  ;; [] notation

  [foo second '((1 a) (2 b))]

Lisp-1 dialects can also provide useful extensions by giving a meaning to objects other than functions in the first position of a form, and the dwim/[...] syntax does exactly this.

TXR Lisp is a Lisp-2 because Lisp-2 also has advantages. Lisp-2 programs which use macros naturally achieve hygiene because lexical variables do not interfere with the function namespace. If a Lisp-2 program has a local variable called list, this does not interfere with the hidden use of the function list in a macro expansion in the same block of code. Lisp-1 dialects have to provide hygienic macro systems to attack this problem. Furthermore, even when not using macros, Lisp-1 programmers have to avoid using the names of functions as lexical variable names, if the enclosing code might use them.

The two namespaces of a Lisp-2 also naturally accommodate symbol macros and operator macros. Whereas functions and variables can be represented in a single namespace readily, because functions are data objects, this is not so with symbol macros and operator macros, the latter of which are distinguished syntactically by their position in a form. In a Lisp-1 dialect, given (foo bar), either of the two symbols could be a symbol macro, but only foo can possibly be an operator macro. Yet, having only a single namespace, a Lisp-1 doesn't permit (foo foo), where foo is simultaneously a symbol macro and an operator macro, though the situation is unambiguous by syntax even in Lisp-1. In other words, Lisp-1 dialects do not entirely remove the special syntactic recognition given to the leftmost position of a compound form, yet at the same time they prohibit the user from taking full advantage of it by providing only one namespace.

TXR Lisp provides the "best of both worlds": the DWIM brackets notation provides a model of Lisp-1 computation that is purer than Lisp-1 dialects (since the leftmost argument is not given any special syntactic treatment at all) while the Lisp-2 foundation provides a traditional Lisp environment with its "natural hygiene".


9.4.8 Function functionp




  (functionp obj)

The functionp function returns t if obj is a function, otherwise it returns nil.


9.4.9 Function copy-fun




  (copy-fun function)

The copy-fun function produces and returns a duplicate of function, which must be a function.

A duplicate of a function is a distinct function object not eq to the original function, yet which accepts the same arguments and behaves exactly the same way as the original.

If a function contains no captured environment, then a copy made of that function by copy-fun is indistinguishable from the original function in every regard, except for being a distinct object that compares unequal to the original under the eq function.

If a function contains a captured environment, then a copy of that function made by copy-fun has its own copy of that environment. If the copied function changes the values of captured lexical variables, the original function is not affected by these changes and vice versa.

The entire lexical environment is copied; the copy and original function do not share any portion of the environment at any level of nesting.
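
The environment-copying behavior can be sketched with a closure over a counter variable (illustrative example):

  (defun make-counter ()
    (let ((n 0))
      (lambda () (inc n))))

  (let* ((c (make-counter))
         (d (copy-fun c)))  ;; d receives its own copy of n
    [c] [c]                 ;; c's captured n becomes 2
    [d])                    ;; d's copy of n is independent
  -> 1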


9.5 Sequencing, Selection and Iteration


9.5.1 Operators/Functions progn and prog1




  (progn form*)
  (prog1 form*)

The progn operator evaluates each form in left-to-right order, and returns the value of the last form. The value of the form (progn) is nil.

The prog1 operator evaluates each form in left-to-right order, and returns the value of the first form. The value of the form (prog1) is nil.

Various other operators such as let also arrange for the evaluation of a body of forms, the value of the last of which is returned. These operators are said to feature an implicit progn.

These special operators are also functions. The progn function accepts zero or more arguments. It returns its last argument, or nil if called with no arguments. The prog1 function likewise accepts zero or more arguments. It returns its first argument, or nil if called with no arguments.
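
The following examples illustrate both the operators and the function bindings (illustrative):

  (progn 1 2 3)  ->  3
  (prog1 1 2 3)  ->  1
  (progn)        ->  nil
  (prog1)        ->  nil

  ;; progn as a function: all arguments are evaluated
  [progn (+ 1 1) (+ 2 2)]  ->  4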

Dialect Notes:

In ANSI Common Lisp, prog1 requires at least one argument. Neither progn nor prog1 exist as functions.


9.5.2 Macro/Function prog2




  (prog2 form*)

The prog2 macro evaluates each form in left-to-right order. The value is that of the second form, if present, otherwise it is nil.

The form (prog2 1 2 3) yields 2. The value of (prog2 1 2) is also 2; (prog2 1) and (prog2) yield nil.

The prog2 symbol also has a function binding. The prog2 function accepts any number of arguments. If invoked with at least two arguments, it returns the second one. Otherwise it returns nil.

Dialect Notes:

In ANSI Common Lisp, prog2 requires at least two arguments. It does not exist as a function.


9.5.3 Operator cond


  (cond {(test form*)}*)


The cond operator provides a multi-branching conditional evaluation of forms. Enclosed in the cond form are groups of forms expressed as lists. Each group must be a list of at least one form.

The forms are processed from left to right as follows: the first form, test, in each group is evaluated. If it evaluates true, then the remaining forms in that group, if any, are also evaluated. Processing then terminates and the result of the last form in the group is taken as the result of cond. If test is the only form in the group, then result of test is taken as the result of cond.

If the first form of a group yields nil, then processing continues with the next group, if any. If all form groups yield nil, then the cond form yields nil. This holds in the case that the syntax is empty: (cond) yields nil.
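
For example (illustrative):

  (let ((x 5))
    (cond ((> x 10) 'big)
          ((> x 3)  'medium)
          (t        'small)))
  -> medium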


9.5.4 Macros caseq, caseql and casequal


  (caseq test-form normal-clause* [else-clause])
  (caseql test-form normal-clause* [else-clause])
  (casequal test-form normal-clause* [else-clause])


These three macros arrange for the evaluation of test-form, whose value is then compared against the key or keys in each normal-clause. When the value matches a key, then the remaining forms of normal-clause are evaluated, and the value of the last form is returned; subsequent clauses are not evaluated.

If no normal-clause matches, and there is no else-clause, then the value nil is returned. Otherwise, the forms in the else-clause are evaluated, and the value of the last one is returned. If there are no forms, then nil is returned.

If duplicate keys are present in such a way that the value of the test-form matches multiple normal-clauses, it is unspecified which of those clauses is evaluated.

The syntax of a normal-clause takes on these two forms:

  (key form*)

where key may be an atom which denotes a single key, or else a list of keys. There is a restriction that the symbol t may not be used as key. The form (t) may be used as a key to match that symbol.

The syntax of an else-clause is:


  (t form*)

which resembles a form that is often used as the final clause in the cond syntax.

The three forms of the case construct differ in what type of test they apply between the value of test-form and the keys. The caseq macro generates code which uses the eq function's equality. The caseql macro uses eql, and casequal uses equal.


  (let ((command-symbol (casequal command-string
                          (("q" "quit") 'quit)
                          (("a" "add") 'add)
                          (("d" "del" "delete") 'delete)
                          (t 'unknown))))
    ...)


9.5.5 Macros caseq*, caseql* and casequal*


  (caseq* test-form normal-clause* [else-clause])
  (caseql* test-form normal-clause* [else-clause])
  (casequal* test-form normal-clause* [else-clause])


The caseq*, caseql*, and casequal* macros are similar to the macros caseq, caseql, and casequal, differing from them only in the following regard. The normal-clause of these macros has the form (evaluated-key form*), where evaluated-key is either an atom, which is evaluated to produce a key, or else a compound form, whose elements are evaluated as forms, producing multiple keys. This evaluation takes place at macro-expansion time, in the global environment.

The else-clause works the same way under these macros as under caseq et al.

Note that although in a normal-clause, evaluated-key must not be the atom t, there is no restriction against it being an atom which evaluates to t. In this situation, the value t has no special meaning.

The evaluated-key expressions are evaluated in the order in which they appear in the construct, at the time the caseq*, caseql* or casequal* macro is expanded.

Note: these macros allow the use of variables and global symbol macros as case keys.


  (defvarl red 0)
  (defvarl green 1)
  (defvarl blue 2)

  (let ((color blue))
    (caseql* color
      (red "hot")
      ((green blue) "cool")))
  --> "cool"


9.5.6 Macros ecaseq, ecaseql, ecasequal, ecaseq*, ecaseql* and ecasequal*


  (ecaseq test-form normal-clause* [else-clause])
  (ecaseql test-form normal-clause* [else-clause])
  (ecasequal test-form normal-clause* [else-clause])
  (ecaseq* test-form normal-clause* [else-clause])
  (ecaseql* test-form normal-clause* [else-clause])
  (ecasequal* test-form normal-clause* [else-clause])


These macros are error-catching variants of, respectively, caseq, caseql, casequal, caseq*, caseql* and casequal*.

If the else-clause is present in the invocation of an error-catching case macro, then the invocation is precisely equivalent to the corresponding non-error-trapping variant.

If the else-clause is missing in the invocation of an error-catching variant, then a default else-clause is inserted which throws an exception of type case-error, derived from error. After this insertion, the semantics follows that of the non-error-trapping variant.

For instance, (ecaseql 3), which has no else-clause, is equivalent to (caseql 3 (t expr)) where expr indicates the inserted expression which throws case-error. However, (ecaseql 3 (t 42)) is simply equivalent to (caseql 3 (t 42)), since it has an else-clause.

Note: the error-catching case macros are intended for situations in which it is a matter of program correctness that every possible value of test-form matches a normal-clause, such that if a failure to match occurs, it indicates a software defect. The error-throwing else-clause helps to ensure that the error situation is noticed. Without this clause, the case macro terminates with a value of nil, which may conceal the defect and delay its identification.
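
For example (illustrative):

  (ecaseql (+ 1 1)
    (1 'one)
    (2 'two))
  -> two

  (ecaseql 3
    (1 'one)
    (2 'two))  ;; no clause matches: throws case-error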


9.5.7 Operator/Function if


  (if cond t-form [e-form])
  [if cond then [else]]


There exist both an if operator and an if function. A list form with the symbol if in the first position is interpreted as an invocation of the if operator. The function can be accessed using the DWIM bracket notation and in other ways.

The if operator provides a simple two-way-selective evaluation control. The cond form is evaluated. If it yields true then t-form is evaluated, and that form's return value becomes the return value of the if. If cond yields false, then e-form is evaluated and its return value is taken to be that of if. If e-form is omitted, then the behavior is as if e-form were specified as nil.

The if function provides no evaluation control. All of its arguments are evaluated from left to right. If the cond argument is true, then it returns the then argument, otherwise it returns the value of the else argument if present, otherwise it returns nil.
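
The if function is useful where a functional argument is expected, for instance to select elementwise between two lists (illustrative):

  [mapcar if '(t nil t) '(1 2 3) '(4 5 6)]
  -> (1 5 3)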


9.5.8 Operator/Function and




  (and form*)

There exist both an and operator and an and function. A list form with the symbol and in the first position is interpreted as an invocation of the operator. The function can be accessed using the DWIM bracket notation and in other ways.

The and operator provides three functionalities in one. It computes the logical "and" function over several forms. It controls evaluation (a.k.a. "short-circuiting"). It also provides an idiom for the convenient substitution of a value in place of nil when some other values are all true.

The and operator evaluates as follows. First, a return value is established and initialized to the value t. The forms, if any, are evaluated from left to right. The return value is overwritten with the result of each form. Evaluation stops when all forms are exhausted, or when nil is stored in the return value. When evaluation stops, the operator yields the return value.

The and function provides no evaluation control: it receives all of its arguments fully evaluated. If it is given no arguments, it returns t. If it is given one or more arguments, and any of them are nil, it returns nil. Otherwise, it returns the value of the last argument.


  (and) -> t
  (and (> 10 5) (stringp "foo")) -> t
  (and 1 2 3) -> 3  ;; shorthand for (if (and 1 2) 3).


9.5.9 Macro/Function nand




  (nand form*)

There exist both a nand macro and a nand function. A list form with the symbol nand in the first position is interpreted as an invocation of the macro. The function can be accessed using the DWIM bracket notation and in other ways.

The nand macro and function are the logical negation of the and operator and function. They are related according to the following equivalences:

  (nand f0 f1 f2 ...) <--> (not (and f0 f1 f2 ...))
  [nand f0 f1 f2 ...] <--> (not [and f0 f1 f2 ...])


9.5.10 Operator/Function or




  (or form*)

There exist both an or operator and an or function. A list form with the symbol or in the first position is interpreted as an invocation of the operator. The function can be accessed using the DWIM bracket notation and in other ways.

The or operator provides three functionalities in one. It computes the logical "or" function over several forms. It controls evaluation (a.k.a. "short-circuiting"). The behavior of or also provides an idiom for the selection of the first non-nil value from a sequence of forms.

The or operator evaluates as follows. First, a return value is established and initialized to the value nil. The forms, if any, are evaluated from left to right. The return value is overwritten with the result of each form. Evaluation stops when all forms are exhausted, or when a true value is stored into the return value. When evaluation stops, the operator yields the return value.

The or function provides no evaluation control: it receives all of its arguments fully evaluated. If it is given no arguments, it returns nil. If all of its arguments are nil, it also returns nil. Otherwise, it returns the value of the first argument which isn't nil.


  (or) -> nil
  (or 1 2) -> 1
  (or nil 2) -> 2
  (or (> 10 20) (stringp "foo")) -> t


9.5.11 Macro/Function nor




  (nor form*)

There exist both a nor macro and a nor function. A list form with the symbol nor in the first position is interpreted as an invocation of the macro. The function can be accessed using the DWIM bracket notation and in other ways.

The nor macro and function are the logical negation of the or operator and function. They are related according to the following equivalences:

  (nor f0 f1 f2 ...) <--> (not (or f0 f1 f2 ...))
  [nor f0 f1 f2 ...] <--> (not [or f0 f1 f2 ...])


9.5.12 Macros when and unless


  (when expression form*)
  (unless expression form*)


The when macro operator evaluates expression. If expression yields true, and there are additional forms, then each form is evaluated. The value of the last form becomes the result value of the when form. If there are no forms, then the result is nil.

The unless operator is similar to when, except that it reverses the logic of the test. The forms, if any, are evaluated if and only if expression is false.
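
For example (illustrative):

  (when (> 3 2) 'a 'b)    ->  b
  (when (< 3 2) 'a 'b)    ->  nil
  (unless (< 3 2) 'a 'b)  ->  b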


9.5.13 Macros while and until


  (while expression form*)
  (until expression form*)


The while macro operator provides a looping construct. It evaluates expression. If expression yields nil, then the evaluation of the while form terminates, producing the value nil. Otherwise, if there are additional forms, then each form is evaluated. Next, evaluation returns to expression, repeating all of the previous steps.

The until macro operator is similar to while, except that the until form terminates when expression evaluates true, rather than false.

These operators arrange for the evaluation of all their enclosed forms in an anonymous block. Any of the forms, or expression, may use the return operator to terminate the loop, and optionally to specify a result value for the form.

The only way these forms can yield a value other than nil is if the return operator is used to terminate the implicit anonymous block, and is given an argument, which becomes the result value.
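
For instance, the return operator can terminate the loop with a value (illustrative):

  (let ((i 0))
    (while (< i 10)
      (if (> i 4)
        (return i))
      (inc i)))
  -> 5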


9.5.14 Macros while* and until*


  (while* expression form*)
  (until* expression form*)


The while* and until* macros are similar, respectively, to the macros while and until.

They differ in one respect: they begin by evaluating the forms one time unconditionally, without first evaluating expression. After this evaluation, the subsequent behavior is like that of while or until.

Another way to regard the behavior is that these forms execute one iteration unconditionally, without evaluating the termination test prior to the first iteration. Yet another view is that these constructs relocate the test from the top of the loop to the bottom of the loop.
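
For instance, the body executes once even when the test is false at the outset (illustrative):

  (let ((n 10))
    (while* (< n 5)
      (prinl n)))  ;; prints 10 once, then the test fails
  -> nil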


9.5.15 Macro whilet


  (whilet ({sym | (sym init-form)}+) body-form*)


The whilet macro provides a construct which combines iteration with variable binding.

The evaluation of the form takes place as follows. First, fresh bindings are established for syms as if by the let* operator. It is an error for the list of variable bindings to be empty.

After the establishment of the bindings, the value of the last sym is tested. If the value is nil, then whilet terminates. Otherwise, body-forms are evaluated in the scope of the variable bindings, and then whilet iterates from the beginning, again establishing fresh bindings for the syms, and testing the value of the last sym.

All evaluation takes place in an anonymous block, which can be terminated with the return operator. Doing so terminates the loop. If the whilet loop is thus terminated by an explicit return, a return value can be specified. Under normal termination, the return value is nil.

In the syntax, a small convenience is permitted. Instead of the last (sym init-form) it is permissible for the syntax (init-form) to appear, the sym being omitted. A machine-generated variable is substituted in place of the missing sym and that variable is then initialized from init-form and used as the basis of the test.


  ;; read lines of text from *stdin* and print them,
  ;; until the end-of-stream condition:

  (whilet ((line (get-line)))
    (put-line line))

  ;; read lines of text from *stdin* and print them,
  ;; until the end-of-stream condition occurs or
  ;; a line is identical to the character string "end".

  (whilet ((line (get-line))
           (more (and line (nequal line "end"))))
    (put-line line))


9.5.16 Macros iflet and whenlet


  (iflet {({sym | (sym init-form)}+) | atom-form}
    then-form [else-form])
  (whenlet {({sym | (sym init-form)}+) | atom-form}
    body-form*)


The iflet and whenlet macros combine the variable binding of let* with conditional evaluation of if and when, respectively.

In either construct's syntax, a non-compound form atom-form may appear in place of the variable binding list. In this case, atom-form is evaluated as a form, and the construct is equivalent to its respective ordinary if or when counterpart.

If the list of variable bindings is empty, it is interpreted as the atom nil and treated as an atom-form.

If one or more bindings are specified rather than atom-form, then the evaluation of these forms takes place as follows. First, fresh bindings are established for syms as if by the let* operator.

Then, the last variable's value is tested. If it is not nil then the test is true, otherwise false.

In the syntax, a small convenience is permitted. Instead of the last (sym init-form) it is permissible for the syntax (init-form) to appear, the sym being omitted. A machine-generated variable is substituted in place of the missing sym and that variable is then initialized from init-form and used as the basis of the test. This is intended to be useful in situations in which then-form or else-form do not require access to the tested value.

In the case of the iflet operator, if the test is true, the operator evaluates then-form and yields its value. Otherwise the test is false, and if the optional else-form is present, that is evaluated instead and its value is returned. If this form is missing, then nil is returned.

In the case of the whenlet operator, if the test is true, then the body-forms, if any, are evaluated. The value of the last one is returned, otherwise nil if the forms are missing. If the test is false, then evaluation of body-forms is skipped, and nil is returned.


  ;; dispose of foo-resource if present
  (whenlet ((foo-res (get-foo-resource obj)))
    (foo-shutdown foo-res)
    (set-foo-resource obj nil))

  ;; Contrast with the above, using when and let
  (let ((foo-res (get-foo-resource obj)))
    (when foo-res
      (foo-shutdown foo-res)
      (set-foo-resource obj nil)))

  ;; print frobosity value if it exceeds 150
  (whenlet ((fv (get-frobosity-value))
            (exceeds-p (> fv 150)))
    (format t "frobosity value ~a exceeds 150\n" fv))

  ;; same as above, taking advantage of the
  ;; last variable being optional:
  (whenlet ((fv (get-frobosity-value))
            ((> fv 150)))
    (format t "frobosity value ~a exceeds 150\n" fv))

  ;; yield 4: 3 interpreted as atom-form
  (whenlet 3 4)

  ;; yield 4: nil interpreted as atom-form
  (iflet () 3 4)


9.5.17 Macro condlet


  (condlet
    {([({sym | (sym init-form)}+) | atom-form]
      body-form*)}*)


The condlet macro generalizes iflet.

Each argument is a compound consisting of at least one item: a list of bindings or atom-form. This item is followed by zero or more body-forms.

If there are no body-forms then the situation is treated as if there were a single body-form specified as nil.

The arguments of condlet are considered in sequence, starting with the leftmost.

If the argument's left item is an atom-form then the form is evaluated. If it yields true, then the body-forms next to it are evaluated in order, and the condlet form terminates, yielding the value obtained from the last body-form. If atom-form yields false, then the next argument is considered, if there is one.

If the argument's left item is a list of bindings, then it is processed with exactly the same logic as under the iflet macro. If the last binding contains a true value, then the adjoining body-forms are evaluated in a scope in which all of the bindings are visible, and condlet terminates, yielding the value of the last body-form. Otherwise, the next argument of condlet is considered (processed in a scope in which the bindings produced by the current item are no longer visible).

If condlet runs out of arguments, it terminates and returns nil.


  (let ((l '(1 2 3)))
    (condlet
      ;; first arg
      (((a (first l))   ;; a binding gets 1
        (b (second l))  ;; b binding gets 2
        (g (> a b)))    ;; last variable g is nil
       'foo)            ;; not evaluated
      ;; second arg
      (((b (second l))  ;; b gets 2
        (c (third l))   ;; c gets 3
        (g (< b c)))    ;; last variable g is true
       'bar)))          ;; condlet terminates
  --> bar               ;; result is bar


9.5.18 Macro ifa


  (ifa cond then [else])


The ifa macro provides an anaphoric conditional operator resembling the if operator. Around the evaluation of the then and else forms, the symbol it is implicitly bound to a subexpression of cond, a subexpression which is thereby identified as the it-form. This it alias provides a convenient reference to that place or value, similar to the word "it" in the English language, and similar anaphoric pronouns in other languages.

If it is bound to a place form, the binding is established as if using the placelet operator: the form is evaluated only once, even if the it alias is used multiple times in the then or else expressions. Furthermore, the place form is implicitly surrounded with read-once so that the place's value is accessed only once, and multiple references to it refer to a copy of the value cached in a hidden variable, rather than generating multiple accesses to the place. Otherwise, if the form is not a syntactic place it is bound as an ordinary lexical variable to the form's value.

An it-candidate is an expression viable for having its value or storage location bound to the it symbol. An it-candidate is any expression which is not a constant expression according to the constantp function, and not a symbol.

The ifa macro applies several rules to the cond expression:

1. The cond expression must be either an atom, a function call form, or a dwim form. Otherwise the ifa expression is ill-formed, and throws an exception at macro-expansion time. For the purposes of these rules, a dwim form is considered as a function call expression, whose first argument is the second element of the form. That is to say, [f x], which is equivalent to (dwim f x), is treated similarly to (f x) as a one-argument call.

2. If the cond expression is a function call with two or more arguments, at most one of them may be an it-candidate. If two or more arguments are it-candidates, the situation is ambiguous. The ifa expression is ill-formed and throws an exception at macro-expansion time.

3. If cond is an atom, or a function call expression with no arguments, then the it symbol is not bound. Effectively, the ifa macro behaves like the ordinary if operator.

4. If cond is a function call or dwim expression with exactly one argument, then the it variable is bound to the argument expression, except when the function being called is not, null, or false. This binding occurs regardless of whether the expression is an it-candidate.

5. If cond is a function call with exactly one argument to the Boolean negation function which goes by one of the three names not, null, or false, then that situation is handled by a rewrite according to the following pattern:

     (ifa (not expr) then else) -> (ifa expr else then)

   which applies likewise with null or false substituted for not. The Boolean inverse function is removed, and the then and else expressions are exchanged.

6. If cond is a function call with two or more arguments, then it is only well-formed if at most one of those arguments is an it-candidate. If there is one such argument, then the it variable is bound to it.

7. Otherwise the it variable is bound to the leftmost argument expression, regardless of whether that argument expression is an it-candidate.

In all other regards, the ifa macro behaves similarly to if.

The cond expression is evaluated, and, if applicable, the value of, or storage location denoted by the appropriate argument is captured and bound to the variable it whose scope extends over the then form, as well as over else, if present.

If cond yields a true value, then then is evaluated and the resulting value is returned, otherwise else is evaluated if present and its value is returned. A missing else is treated as if it were the nil form.


  (ifa t 1 0)  ->  1

  ;; Rule 6: it binds to (* x x), which is
  ;; the only it-candidate.
  (let ((x 6) (y 49))
    (ifa (> y (* x x)) ;; it binds to (* x x)
      (list it)))
  -> (36)

  ;; Rule 4: it binds to argument of evenp,
  ;; even though 4 isn't an it-candidate.
  (ifa (evenp 4)
    (list it))
  -> (4)

  ;; Rule 5:
  (ifa (not (oddp 4))
    (list it))
  -> (4)

  ;; Rule 7: no candidates: choose leftmost
  (let ((x 6) (y 49))
    (ifa (< 1 x y)
      (list it)))
  -> (1)

  ;; Violation of Rule 1:
  ;; while is not a function
  (ifa (while t (print 42))
    (list it))
  --> exception!

  ;; Violation of Rule 2:
  (let ((x 6) (y 49))
    (ifa (> (* y y y) (* x x))
      (list it)))
  --> exception!


9.5.19 Macro conda


  (conda {(test form*)}*)


The conda operator provides a multi-branching conditional evaluation of forms, similarly to the cond operator. Enclosed in the conda form are groups of forms expressed as lists. Each group must be a list of at least one form.

The conda operator is anaphoric: it expands into a nested structure of zero or more ifa invocations, according to these patterns:

  (conda) -> nil
  (conda (x y ...) ...) -> (ifa x (progn y ...) (conda ...))

Thus, conda inherits all the restrictions on the test expressions from ifa, as well as the anaphoric it variable feature.
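
The following sketch (not one of the manual's original examples) illustrates conda; per the ifa rules, it is bound to the sole argument of the first test that succeeds:

  ;; hypothetical example: it binds to x
  (let ((x 5))
    (conda
      ((evenp x) (list 'even it))
      ((oddp x) (list 'odd it))))
  -> (odd 5)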


9.5.20 Macro whena


  (whena test form*)


The whena macro is similar to the when macro, except that it is anaphoric in exactly the same manner as the ifa macro. It may be understood as conforming to the following equivalence:

  (whena x f0 f1 ...)  <-->  (ifa x (progn f0 f1 ...))
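
A brief sketch (not from the original examples) of the anaphoric binding; as with ifa, it is bound to the argument of the one-argument test function:

  (whena (evenp 10)
    (+ it 1))
  -> 11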


9.5.21 Macro dotimes


  (dotimes (var count-form [result-form]) body-form*)


The dotimes macro implements a simple counting loop. var is established as a variable, and initialized to zero. count-form is evaluated one time to produce a limiting value, which should be a number. Then, if the value of var is less than the limiting value, the body-forms are evaluated, var is incremented by one, and the process repeats with a new comparison of var against the limiting value possibly leading to another evaluation of the forms.

If var is found to equal or exceed the limiting value, then the loop terminates.

When the loop terminates, its return value is nil unless a result-form is present, in which case the value of that form specifies the return value.

body-forms as well as result-form are evaluated in the scope in which the binding of var is visible.
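
For illustration, a minimal sketch (not from the original examples) of dotimes summing the integers below 5, using the inc macro to update an accumulator:

  (let ((s 0))
    (dotimes (i 5)
      (inc s i))
    s)
  -> 10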


9.5.22 Operators each, each*, collect-each, collect-each*, append-each and append-each*


  (each ({(sym init-form)}*) body-form*)
  (each* ({(sym init-form)}*) body-form*)
  (collect-each ({(sym init-form)}*) body-form*)
  (collect-each* ({(sym init-form)}*) body-form*)
  (append-each ({(sym init-form)}*) body-form*)
  (append-each* ({(sym init-form)}*) body-form*)


These operators establish a loop for iterating over the elements of one or more sequences. Each init-form must evaluate to an iterable object that is suitable as an argument for the iter-begin function. The sequences are then iterated in parallel over repeated evaluations of the body-forms, with each sym variable being assigned to successive elements of its sequence. The shortest list determines the number of iterations, so if any of the init-forms evaluate to an empty sequence, the body is not executed.

If the list of (sym init-form) pairs itself is empty, then an infinite loop is specified.

The body forms are enclosed in an anonymous block, allowing the return operator to terminate the loop prematurely and optionally specify the return value.

The collect-each and collect-each* variants are like each and each*, except that for each iteration, the resulting value of the body is collected into a list. When the iteration terminates, the return value of the collect-each or collect-each* operator is this collection.

The append-each and append-each* variants are like each and each*, except that for each iteration other than the last, the resulting value of the body must be a list. The last iteration may produce either an atom or a list. The objects produced by the iterations are combined together as if they were arguments to the append function, and the resulting value is the value of the append-each or append-each* operator.

The alternate forms denoted by the adorned symbols each*, collect-each* and append-each*, differ from each, collect-each and append-each in the following way. The plain forms evaluate the init-forms in an environment in which none of the sym variables are yet visible. By contrast, the alternate forms evaluate each init-form in an environment in which bindings for the previous sym variables are visible. In this phase of evaluation, sym variables are list-valued: one by one they are each bound to the list object emanating from their corresponding init-form. Just before the first loop iteration, however, the sym variables are assigned the first item from each of their lists.


The semantics of collect-each may be understood in terms of an equivalence to a code pattern involving mapcar:

  (collect-each ((x xinit)        (mapcar (lambda (x y)
                 (y yinit))  <-->           body)
    body)                                 xinit yinit)

The collect-each* variant may be understood in terms of the following equivalence involving let* for sequential binding and mapcar:

  (collect-each* ((x xinit)        (let* ((x xinit)
                  (y yinit))  <-->        (y yinit))
    body)                            (mapcar (lambda (x y)
                                               body)
                                             x y))

However, note that the let* as well as each invocation of the lambda binds fresh instances of the variables, whereas these operators are permitted to bind a single instance of the variables, which are first initialized with the initializing expressions, and then reused as iteration variables which are stepped by assignment.

The other operators may be understood likewise, with the substitution of the mapdo function in the case of each and each* and of the mappend function in the case of append-each and append-each*.


 ;; print numbers from 1 to 10 and whether they are even or odd
 (each* ((n 1..11) ;; n is just a range object in this scope
         (even (collect-each ((m n)) (evenp m))))
   ;; n is an integer in this scope
   (format t "~s is ~s\n" n (if even "even" "odd")))


 1 is "odd"
 2 is "even"
 3 is "odd"
 4 is "even"
 5 is "odd"
 6 is "even"
 7 is "odd"
 8 is "even"
 9 is "odd"
 10 is "even"


9.5.23 Macros for and for*


  ({for | for*} ({sym | (sym init-form)}*)
    ([test-form result-form*])
    (inc-form*)
    body-form*)
  ({for | for*} ({sym | (sym init-form)}*)
    ([test-form result-form*]))
  ({for | for*} ({sym | (sym init-form)}*))


The macros for and for* combine variable binding with loop iteration. The first argument is a list of variables with optional initializers, exactly the same as in the let and let* operators. Furthermore, the difference between for and for* is like that between let and let* with regard to this list of variables.

The second variant in the above syntax synopsis shows that when body-forms are absent, then a list of inc-forms which is empty may be omitted from the syntax.

The for and for* macros execute these steps:

1. Establish an anonymous block over the entire form, allowing the return operator to be used to terminate the loop.
2. Establish bindings for the specified variables similarly to let and let*. The variable bindings are visible over the test-form, each result-form, each inc-form and each body-form.
3. Evaluate test-form. If test-form yields nil, then the loop terminates. Each result-form is evaluated, and the value of the last of these forms is the result value of the loop. If there are no result-forms then the result value is nil. If the test-form is omitted, then the test is taken to be true, and the loop does not terminate.
4. Otherwise, if test-form yields true, then each body-form is evaluated in turn. Then, each inc-form is evaluated in turn and processing resumes at step 3.
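
As an illustrative sketch (not from the original examples), the following for loop collects the squares of 0 through 4; the test-form is paired with a result-form that delivers the loop's value:

  (for ((i 0) (acc ())) ((< i 5) (nreverse acc)) ((inc i))
    (push (* i i) acc))
  -> (0 1 4 9 16)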


9.5.24 Macros doloop and doloop*


  ({doloop | doloop*}
     ({sym | (sym [init-form [step-form]])}*)
     ([test-form result-form*])
     tagbody-form*)


The doloop and doloop* macros provide an iteration construct inspired by the ANSI Common Lisp do and do* macros.

Each sym element in the form must be a symbol suitable for use as a variable name.

The tagbody-forms are placed into an implicit tagbody, meaning that a tagbody-form which is an integer, character or symbol is interpreted as a tagbody label which may be the target of a control transfer via the go macro.

The doloop macro binds each sym to the value produced by evaluating the adjacent init-form. Then, in the environment in which these variables now exist, test-form is evaluated. If that form yields nil, then the loop terminates. The result-forms are evaluated, and the value of the last one is returned.

If result-forms are absent, then nil is returned.

If test-form is also absent, then the loop terminates and returns nil.

If test-form produces a true value, then result-forms are not evaluated. Instead, the implicit tagbody consisting of the tagbody-forms is evaluated. If that evaluation terminates normally, the loop variables are then updated by assigning to each sym the value of step-form.

The following defaulting behaviors apply in regard to the variable syntax. For each sym which has an associated init-form but no step-form, the init-form is duplicated and taken as the step-form. Thus a variable specification like (x y) is equivalent to (x y y). If both forms are omitted, then the init-form is taken to be nil, and the step-form is taken to be sym. This means that the variable form (x) is equivalent to (x nil x) which has the effect that x retains its current value when the next loop iteration begins. Lastly, the sym variant is equivalent to (sym) so that x is also equivalent to (x nil x).

The differences between doloop and doloop* are: doloop binds the variables in parallel, similarly to let, whereas doloop* binds sequentially, like let*; moreover, doloop performs the step-form assignments in parallel as if using a single (pset sym0 step-form-0 sym1 step-form-1 ...) form, whereas doloop* performs the assignment sequentially as if using set rather than pset.

The doloop and doloop* macros establish an anonymous block, allowing early return from the loop, with a value, via the return operator.

Dialect Note:

These macros are substantially different from the ANSI Common Lisp do and do* macros. Firstly, the termination logic is inverted; effectively they implement "while" loops, whereas their ANSI CL counterparts implement "until" loops. Secondly, in the ANSI CL macros, the defaulting of the missing step-form is different. Variables with no step-form are not updated. In particular, this means that the form (x y) is not equivalent to (x y y); the ANSI CL macros do not feature the automatic replication of init-form into the step-form position.
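
As a sketch (not from the original examples) of the stepping behavior, the following doloop counts down while consing onto acc; the parallel assignment means each step-form sees the previous iteration's values:

  (doloop ((i 5 (- i 1))
           (acc nil (cons i acc)))
          ((> i 0) acc))
  -> (1 2 3 4 5)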


9.5.25 Macros sum-each, sum-each*, mul-each and mul-each*


  (sum-each ({(sym init-form)}*) body-form*)
  (sum-each* ({(sym init-form)}*) body-form*)
  (mul-each ({(sym init-form)}*) body-form*)
  (mul-each* ({(sym init-form)}*) body-form*)


The macros sum-each and mul-each behave very similarly to the each operator. Whereas the each operator form returns nil as its result, the sum-each and mul-each forms, if they execute to completion and return normally, return an accumulated value.

The sum-each macro initializes a newly instantiated, hidden accumulator variable to the value 0. For each iteration of the loop, the body-forms are evaluated, and are expected to produce a value. This value is added to the current value of the hidden accumulator using the + function, and the result is stored into the accumulator. If sum-each returns normally, then the value of this accumulator becomes its resulting value.

The mul-each macro similarly initializes a hidden accumulator to the value 1. The value from each iteration of the body is multiplied with the accumulator using the * function, and the result is stored into the accumulator. If mul-each returns normally, then the value of this accumulator becomes its resulting value.

The sum-each* and mul-each* variants of the macros implement the sequential scoping rule for the variable bindings, exactly the way each* alters the semantics of each.

The body-forms are enclosed in an implicit anonymous block. If the forms terminate by returning from the anonymous block then these macros terminate with the specified value.

When sum-each* and sum-each are specified with variables whose values specify zero iterations, or with no variables at all, the form terminates with a value of 0. In this situation, mul-each and mul-each* terminate with 1. Note that this behavior differs from each, and its closely-related operators, which loop infinitely when no variables are specified.

It is unspecified whether mul-each and mul-each* continue iterating when the accumulator takes on a value satisfying the zerop predicate.
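
For illustration (examples not from the original manual), sum-each here computes an inner product, and mul-each a simple running product:

  (sum-each ((x '(1 2 3))
             (y '(10 20 30)))
    (* x y))
  -> 140

  (mul-each ((x '(2 3 4)))
    x)
  -> 24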


9.5.26 Macros each-true, some-true, each-false and some-false


  (each-true ({(sym init-form)}*) body-form*)
  (some-true ({(sym init-form)}*) body-form*)
  (each-false ({(sym init-form)}*) body-form*)
  (some-false ({(sym init-form)}*) body-form*)


These macros iterate zero or more variables over sequences, similarly to the each operator, and calculate logical results, with short-circuiting semantics.

The each-true macro initializes an internal result variable to the t value. It then evaluates the body-forms for each tuple of variable values, replacing the result variable with the value produced by these forms. If that value is nil, the iteration stops. When the iteration terminates normally, the value of the result variable is returned.

If no variables are specified, termination occurs immediately. Note that this is different from the each operator, which iterates indefinitely if no variables are specified.

The body-forms are surrounded by an implicit anonymous block, making it possible to terminate via return or return-from. In these cases, the form terminates with nil or the specified return value. The internal result is ignored.

The some-true macro is similar to each-true, with the following differences. The internal result variable is initialized to nil rather than t. The iteration stops whenever the body-forms produce a true value, and that value is returned.

The each-false and some-false macros are, respectively, similar to each-true and some-true, with one difference. After each iteration, the value produced by the body-forms is logically inverted using the not function prior to being assigned to the result variable.


  (each-true ()) -> t
  (each-true ((a ()))) -> t
  (each-true ((a '(1 2 3))) a) -> 3

  (each-true ((a '(1 2 3))
              (b '(4 5 6)))
    (< a b))
  -> t

  (each-true ((a '(1 2 3))
              (b '(4 0 6)))
    (< a b))
  -> nil

  (some-true ((a '(1 2 3))) a) -> 1
  (some-true ((a '(nil 2 3))) a) -> 2
  (some-true ((a '(nil nil nil))) a) -> nil

  (some-true ((a '(1 2 3))
              (b '(4 0 6)))
    (< a b))
  -> t

  (some-true ((a '(1 2 3))
              (b '(0 1 2)))
    (< a b))
  -> nil

  (each-false ((a '(1 2 3))
               (b '(4 5 6)))
    (> a b))
  -> t

  (each-false ((a '(1 2 3))
               (b '(4 0 6)))
    (> a b))
  -> nil

  (some-false ((a '(1 2 3))
               (b '(4 0 6)))
    (> a b))
  -> t

  (some-false ((a '(1 2 3))
               (b '(0 1 2)))
    (> a b))
  -> nil


9.5.27 Macros each-prod, collect-each-prod and append-each-prod


  (each-prod ({(sym init-form)}*) body-form*)
  (collect-each-prod ({(sym init-form)}*) body-form*)
  (append-each-prod ({(sym init-form)}*) body-form*)


The macros each-prod, collect-each-prod and append-each-prod have a similar syntax to each, collect-each and append-each. However, instead of iterating over sequences in parallel, they iterate over the Cartesian product of the elements from the sequences. The difference between collect-each and collect-each-prod is analogous to that between the functions mapcar and maprod.

Like in the each operator family, the body-forms are surrounded by an anonymous block. If these forms execute a return from this block, then these macros terminate with the specified return value.

When no iterations are performed, including in the case when an empty list of variables is specified, all these macro forms terminate and return nil. Note that this behavior differs from each, and its closely-related operators, which loop infinitely when no variables are specified.

With one caveat noted below, these macros can be understood as providing syntactic sugar according to the pattern established by the following equivalences:

  (each-prod               (block nil
    ((x xinit)               (let ((#:gx xinit) (#:gy yinit))
     (y yinit))       <-->     (maprodo (lambda (x y)
    body)                                 body)
                                        #:gx #:gy))

  (collect-each-prod       (block nil
    ((x xinit)               (let ((#:gx xinit) (#:gy yinit))
     (y yinit))       <-->     (maprod (lambda (x y)
    body)                                body)
                                       #:gx #:gy))

  (append-each-prod        (block nil
    ((x xinit)               (let ((#:gx xinit) (#:gy yinit))
     (y yinit))       <-->     (maprend (lambda (x y)
    body)                                body)
                                       #:gx #:gy))

However, note that each invocation of the lambda binds fresh instances of the variables, whereas these operators are permitted to bind a single instance of the variables, which are then stepped by assignment.


  (collect-each-prod ((a '(a b c))
                      (n #(1 2)))
    (cons a n))

 --> ((a . 1) (a . 2) (b . 1) (b . 2) (c . 1) (c . 2))


9.5.28 Macros each-prod*, collect-each-prod* and append-each-prod*


  (each-prod* ({(sym init-form)}*) body-form*)
  (collect-each-prod* ({(sym init-form)}*) body-form*)
  (append-each-prod* ({(sym init-form)}*) body-form*)


The macros each-prod*, collect-each-prod* and append-each-prod* are variants of each-prod, collect-each-prod and append-each-prod with sequential binding.

These macros can be understood as providing syntactic sugar according to the pattern established by the following equivalences:

  (each-prod*              (let* ((x xinit)
    ((x xinit)                    (y yinit))
     (y yinit))       <-->   (maprodo (lambda (x y) body)
    body)                             x y)

  (collect-each-prod*      (let* ((x xinit)
    ((x xinit)                    (y yinit))
     (y yinit))       <-->   (maprod (lambda (x y) body)
    body)                            x y)

  (append-each-prod*       (let* ((x xinit)
    ((x xinit)                    (y yinit))
     (y yinit))       <-->   (maprend (lambda (x y) body)
    body)                             x y)

However, note that the let* as well as each invocation of the lambda binds fresh instances of the variables, whereas these operators are permitted to bind a single instance of the variables, which are first initialized with the initializing expressions, and then reused as iteration variables which are stepped by assignment.


  (collect-each-prod* ((a "abc")
                       (b (upcase-str a)))
    `@a@b`)

  --> ("aA" "aB" "aC" "bA" "bB" "bC" "cA" "cB" "cC")


9.5.29 Macros sum-each-prod, sum-each-prod*, mul-each-prod and mul-each-prod*


  (sum-each-prod ({(sym init-form)}*) body-form*)
  (sum-each-prod* ({(sym init-form)}*) body-form*)
  (mul-each-prod ({(sym init-form)}*) body-form*)
  (mul-each-prod* ({(sym init-form)}*) body-form*)


The macros sum-each-prod and mul-each-prod have a similar syntax to sum-each and mul-each. However, instead of iterating over sequences in parallel, they iterate over the Cartesian product of the elements from the sequences.

The macros sum-each-prod* and mul-each-prod* variants perform sequential variable binding when establishing the initial values of the variables, similarly to the each* operator.

The body-forms are surrounded by an implicit anonymous block. If these forms execute a return from this block, then these macros terminate with the specified return value.

When no iterations are specified, including in the case when an empty list of variables is specified, the summing macros terminate, yielding 0, and the multiplicative macros terminate with 1. Note that this behavior differs from each, and its closely-related operators, which loop infinitely when no variables are specified.


  ;; Inefficiently calculate (+ (* 1 2 3) (* 4 3 2)).
  ;; Every value from (1 2 3) is paired with every value
  ;; from (4 3 2) to form a partial products, and
  ;; sum-each-prod adds these together implicitly:

  (sum-each-prod ((x '(1 2 3))
                  (y '(4 3 2)))
    (* x y))
  -> 54


9.5.30 Operators block and block*


  (block name body-form*)
  (block* name-form body-form*)


The block operator introduces a named block around the execution of some forms. The name argument may be any object, though block names are usually symbols. Two block name objects are considered to be the same name according to eq equality. Since a block name is not a variable binding, keyword symbols are permitted, and so are the symbols t and nil. A block named by the symbol nil is slightly special: it is understood to be an anonymous block.

The block* operator differs from block in that it evaluates name-form, which is expected to produce a symbol. The resulting symbol is used for the name of the block.

A named or anonymous block establishes an exit point for the return-from or return operator, respectively. These operators can be invoked within a block to cause its immediate termination with a specified return value.

A block also establishes a prompt for a delimited continuation. Anywhere in a block, a continuation can be captured using the sys:capture-cont function. Delimited continuations are described in the section Delimited Continuations. A delimited continuation allows an apparently abandoned block to be restarted at the capture point, with the entire call chain and dynamic environment between the prompt and the capture point intact.

Blocks in TXR Lisp have dynamic scope. This means that the following situation is allowed:

  (defun func () (return-from foo 42))
  (block foo (func))

The function can return from the foo block even though the foo block does not lexically surround the body of func.

It is because blocks are dynamic that the block* variant exists; for lexically scoped blocks, it would make little sense to support a dynamically computed name.

Thus blocks in TXR Lisp provide dynamic nonlocal returns, as well as returns out of lexical nesting.

It is permitted for blocks to be aggressively progn-converted by compilation. This means that a block form which meets certain criteria is converted to a progn form which surrounds the body-forms and thus no longer establishes an exit point.

A block form will be spared from progn-conversion by the compiler if it meets at least one of the following rules.

1. Any body-form references the block's name in a return, return-from, sys:abscond-from or sys:capture-cont expression.
2. The form contains at least one direct call to a function not in the standard TXR Lisp library.
3. The form contains at least one direct call to the functions sys:capture-cont, return*, sys:abscond*, match-fun, eval, load, compile, compile-file or compile-toplevel.
4. The form references any of the functions in rules 2 and 3 as a function binding via the dwim operator (or the DWIM brackets notation) or via the fun operator.
5. The form is a block* form; these are spared from the optimization.
Removal of blocks under the above rules means that some use of blocks which works in interpreted code will not work in compiled programs. Programs which adhere to the rules are not affected by such a difference.

Additionally, the compiler may progn-convert blocks in contravention of the above rules, but only if doing so makes no difference to visible program behavior.


  (defun helper ()
    (return-from top 42))

  ;; defun implicitly defines a block named top
  (defun top ()
    (helper) ;; function returns 42
    (prinl 'notreached)) ;; never printed

  (defun top2 ()
    (let ((h (fun helper)))
      (block top (call h))   ;; may progn-convert
      (block top (call 'helper)) ;; may progn-convert
      (block top (helper)))) ;; not removed

In the above examples, the block containing (call h) may be converted to progn because it doesn't express a direct call to the helper function. The block which calls helper using (call 'helper) is also not considered to be making a direct call.

Dialect Note:

In Common Lisp, blocks are lexical. A separate mechanism consisting of catch and throw operators performs nonlocal transfer based on symbols. The TXR Lisp example:

  (defun func () (return-from foo 42))
  (block foo (func))

is not allowed in Common Lisp, but can be transliterated to:

  (defun func () (throw 'foo 42))
  (catch 'foo (func))

Note that foo is quoted in CL. This underscores the dynamic nature of the construct. throw itself is a function and not an operator. Also note that the CL example, in turn, is even more closely transcribed back into TXR Lisp simply by replacing its throw and catch with return* and block*:

  (defun func () (return* 'foo 42))
  (block* 'foo (func))

Common Lisp blocks also do not support delimited continuations.


9.5.31 Operators return and return-from


  (return [value])
  (return-from name [value])


The return operator must be dynamically enclosed within an anonymous block (a block named by the symbol nil). It immediately terminates the evaluation of the innermost anonymous block which encloses it, causing it to return the specified value. If the value is omitted, the anonymous block returns nil.

The return-from operator must be dynamically enclosed within a named block whose name matches the name argument. It immediately terminates the evaluation of the innermost such block, causing it to return the specified value. If the value is omitted, that block returns nil.


    (block foo
      (let ((a "abc\n")
            (b "def\n"))
        (pprint a *stdout*)
        (return-from foo 42)
        (pprint b *stdout*)))

Here, the output produced is "abc". The value of b is not printed, because return-from terminates block foo, and so the second pprint form is not evaluated.


9.5.32 Function return*


  (return* name [value])


The return* function is similar to the return-from operator, except that name is an ordinary function parameter, and so when return* is used, an argument expression must be specified which evaluates to a symbol. Thus return* allows the target block of a return to be dynamically computed.

The following equivalence holds between the operator and function:

  (return-from a b)  <-->  (return* 'a b)

Expressions used as name arguments to return* which do not simply quote a symbol have no equivalent in return-from.
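
A small sketch (not from the original examples) showing a dynamically computed block name, something return-from cannot express:

  (let ((target 'foo))
    (block foo
      (return* target 42)
      'notreached))
  -> 42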


9.5.33 Macros tagbody and go


  (tagbody {form | label}*)
  (go label)


The tagbody macro provides a form of the "go to" control construct. The arguments of a tagbody form are a mixture of zero or more forms and go labels. The latter consist of those arguments which are symbols, integers or characters. Labels are not considered by tagbody and go to be forms, and are not subject to macro expansion or evaluation.

The go macro is available inside tagbody. It is erroneous for a go form to occur outside of a tagbody. This situation is diagnosed by a global macro called go, which unconditionally throws an error.

In the absence of invocations of go or other control transfers, the tagbody macro evaluates each form in left-to-right order. The go labels are ignored. After the last form is evaluated, the tagbody form terminates, and yields nil.

Any form itself, or else any of its subforms, may be the form (go label) where label matches one of the go labels of a surrounding tagbody. When this go form is evaluated, then the evaluation of form is immediately abandoned, and control transfers to the specified label. The forms are then evaluated in left-to-right order starting with the form immediately after that label. If the label is not followed by any forms, then the tagbody terminates. If label doesn't match any label in any surrounding tagbody, the go form is erroneous.

The abandonment of a form by invocation of go is a dynamic transfer. All necessary unwinding inside form takes place.

The go labels are lexically scoped, but dynamically bound. Their scope being lexical means that the labels are not visible to forms which are not enclosed within the tagbody, even if their evaluation is invoked from that tagbody. The dynamic binding means that the labels of a tagbody form are established when it begins evaluating, and removed when that form terminates. Once a label is removed, it is not available to be the target of a go control transfer, even if that go form has the label in its lexical scope. Such an attempted transfer is erroneous.

It is permitted for tagbody forms to nest arbitrarily. The labels of an inner tagbody are not visible to an outer tagbody. However, the reverse is true: a go form in an inner tagbody may branch to a label in an outer tagbody, in which case the entire inner tagbody terminates.

In cases where the same objects are used as labels by an inner and outer tagbody, the inner labels shadow the outer labels.

There is no restriction on what kinds of symbols may be labels. Symbols in the keyword package as well as the symbols t and nil are valid tagbody labels.

Dialect Note:

ANSI Common Lisp tagbody supports only symbols and integers as labels (which are called "go tags"); characters are not supported.


  ;; print the numbers 1 to 10
  (let ((i 0))
    (tagbody
      (go skip) ;; forward goto skips 0
     again
      (prinl i)
     skip
      (when (<= (inc i) 10)
        (go again))))

  ;; Example of erroneous usage: by the time func is invoked
  ;; by (call func) the tagbody has already terminated. The
  ;; lambda body can still "see" the label, but it doesn't
  ;; have a binding.
  (let (func)
    (tagbody
      (set func (lambda () (go label)))
      (go out)
     label
      (prinl 'never-reached)
     out)
    (call func))

  ;; Example of unwinding when the unwind-protect
  ;; form is abandoned by (go out). Output is:
  ;;   reached
  ;;   cleanup
  ;;   out
  (tagbody
    (unwind-protect
        (progn
          (prinl 'reached)
          (go out)
          (prinl 'notreached))
      (prinl 'cleanup))
   out
    (prinl 'out))


9.5.34 Macros prog and prog*


  (prog ({sym | (sym init-form)}*)
    {body-form | label}*)
  (prog* ({sym | (sym init-form)}*)
    {body-form | label}*)


The prog and prog* macros combine the features of let and let*, respectively, with an anonymous block and a tagbody.

The prog macro treats the sym and init-form expressions similarly to let, establishing variable bindings in parallel. The prog* macro treats these expressions in a similar way to let*.

The forms enclosed are treated like the argument forms of the tagbody macro: labels are permitted, along with use of go.

Finally, an anonymous block is established around all of the enclosed forms (both the init-forms and body-forms) allowing the use of return to terminate evaluation with a value.

The prog macro may be understood according to the following equivalence:

   (prog vars forms ...)  <-->  (block nil
                                  (let vars
                                    (tagbody forms ...)))

Likewise, the prog* macro follows an analogous equivalence, with let replaced by let*.
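
A brief sketch (not from the original examples) combining the three features: variable bindings, a go label, and an early return via the anonymous block:

  (prog ((i 0) (acc ()))
   again
    (when (< i 3)
      (push i acc)
      (inc i)
      (go again))
    (return (nreverse acc)))
  -> (0 1 2)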


9.6 Evaluation


9.6.1 Function eval


  (eval form [env [menv]])


The eval function treats the form object as a Lisp expression, which is expanded and evaluated. The side effects implied by the form are performed, and the value which it produces is returned.

The optional env argument specifies an environment for resolving the function and variable references encountered in form. If this argument is omitted, then evaluation takes place in the global environment.

The optional menv object specifies a macro environment for expanding macros encountered in form. If this argument is omitted, then form may refer to only global macros.

If both menv and env are specified, then env takes precedence over menv, behaving like a more nested scope. Definitions contained in env shadow same-named definitions in menv.
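
For a basic illustration (not from the original examples), evaluating quoted forms in the global environment:

  (eval '(+ 2 3))
  -> 5

  (eval '(cons 'a 'b))
  -> (a . b)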

The form is not expanded all at once. Rather, it is treated by the following algorithm:

1. First, if form is a macro, it is macro-expanded as if by an application of the function macroexpand (with a suitable environment argument, calculated by a combination of env and menv).
2. If the resulting expanded form is a progn, compile-only, or eval-only form, then eval iterates over that form's argument expressions, passing each expression to a recursive call to eval using the same env.
3. Otherwise, if the expanded form isn't one of the above three kinds of expressions, it is subject to a full expansion and evaluation.
This algorithm allows a sequence of top-level forms to be combined into a single top-level form, even when the expansion of forms occurring later in the sequence depends on the evaluation effects of forms earlier in the sequence.

For instance, a form like (progn (defmacro foo ()) (foo)) may be processed with eval, because the above algorithm ensures that the (defmacro foo ()) expression is fully evaluated first, thereby providing the macro definition required by (foo).
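
This behavior may be illustrated with a sketch along the following lines, in which the macro defined by the first argument of the progn is available for expanding the second:

  (eval '(progn
           (defmacro double (x) ^(* 2 ,x))
           (double 21)))
  -> 42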

This expansion and evaluation order is important because the semantics of eval forms the reference model for how the load function processes top-level forms. Moreover, file compilation performs a similar treatment of top-level forms, together with incremental macro expansion. The result is that the behavior is consistent between source files and compiled files. See the sections Top-Level Forms and File Compilation Model.

Note that, according to these rules, the constituent body forms of a macrolet or symacrolet top-level form are not individual top-level forms, even if the expansion of the construct combines the expanded versions of those forms with progn.

The form (macrolet () (defmacro foo ()) (foo)) will therefore not work correctly. However, the specific problem in this situation can be resolved by rewriting foo as a macrolet macro: (macrolet ((foo ())) (foo)).

See also: the make-env function.


9.6.2 Function constantp


  (constantp form [env])


The constantp function determines whether form is a constant form, with respect to environment env.

If env is absent, the global environment is used. The env argument is used for fully expanding form prior to analyzing it.

Currently, constantp returns true for any form which, after macro-expansion, is any of the following: a compound form with the symbol quote in its first position; a non-symbolic atom; or one of the symbols which evaluate to themselves and cannot be bound as variables. These symbols are the keyword symbols, and the symbols t and nil.

Additionally, constantp returns true for a compound form, or a DWIM form, whose symbol is a member of a large set of constant-foldable library functions, and whose arguments are, recursively, constantp expressions for the same environment. The arithmetic functions are members of this set.

For all other inputs, constantp returns nil.

Note: some uses of constantp require manual expansion.


  (constantp nil) -> t
  (constantp t) -> t
  (constantp :key) -> t
  (constantp :) -> t
  (constantp 'a) -> nil
  (constantp 42) -> t

  (constantp '(+ 2 2 [* 3 (/ 4 4)])) -> t

  ;; symacrolet form expands to 42, which is constant
  (constantp '(symacrolet ((a 42)) a)) -> t

  (defmacro cp (:env e arg)
    (constantp arg e))

  ;; macro call (cp 'a) is replaced by t because
  ;; the symbol a expands to (+ 2 2) in the given environment,
  ;; and so (* a a) expands to (* (+ 2 2) (+ 2 2)) which is constantp.
  (symacrolet ((a (+ 2 2)))
    (cp '(* a a))) -> t


9.6.3 Function make-env


  (make-env [var-bindings [fun-bindings [next-env]]])


The make-env function creates an environment object suitable for use as the env argument of functions such as eval.

The var-bindings and fun-bindings parameters, if specified, should be association lists, mapping symbols to objects. The objects in fun-bindings should be functions, or objects callable as functions.

The next-env argument, if specified, should be an environment.

Note: bindings can also be added to an environment using the env-vbind and env-fbind functions.
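
For instance, the following sketch constructs an environment with one variable binding, adds another binding via env-vbind, and then evaluates a form in that environment:

  (let ((e (make-env '((x . 10)))))
    (env-vbind e 'y 20)
    (eval '(+ x y) e))
  -> 30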


9.6.4 Functions env-vbind and env-fbind


  (env-vbind env symbol value)
  (env-fbind env symbol value)


These functions bind a symbol to a value in either the function or variable space of environment env.

Values established in the function space should be functions, or objects that can be used as functions, such as lists, strings, arrays or hashes.

If symbol already exists in the environment, in the given space, then its value is updated with value.

If env is specified as nil, then the binding takes place in the global environment.


9.6.5 Functions env-vbindings, env-fbindings and env-next


  (env-vbindings env)
  (env-fbindings env)
  (env-next env)

These functions retrieve the components of env, which must be an environment. The env-vbindings function retrieves the association list representing variable bindings. Similarly, the env-fbindings function retrieves the association list of function bindings. The env-next function retrieves the next environment, if env has one, otherwise nil.

If e is an environment constructed by the expression (make-env v f n), then (env-vbindings e) retrieves v, (env-fbindings e) retrieves f and (env-next e) returns n.
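
For example, in accordance with the equivalence described above:

  (let ((e (make-env '((a . 1)) nil nil)))
    (list (env-vbindings e) (env-fbindings e) (env-next e)))
  -> (((a . 1)) nil nil)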


9.7 Global Environment


9.7.1 Accessors symbol-function, symbol-macro and symbol-value


  (symbol-function {symbol | method-name | lambda-expr})
  (set (symbol-function {symbol | method-name}) new-value)
  (set (symbol-macro symbol) new-value)
  (set (symbol-value symbol) new-value)


If given a symbol argument, the symbol-function function retrieves the value of the global function binding of the given symbol if it has one: that is, the function object bound to the symbol. If symbol has no global function binding, then nil is returned.

The symbol-function function supports method names of the form (meth struct slot) where struct names a struct type, and slot is either a static slot or one of the keyword symbols :init or :postinit which refer to special functions associated with a structure type. Names in this format are returned by the func-get-name function. The symbol-function function also supports names of the form (macro name) which denote macros. Thus, symbol-function provides unified access to functions, methods and macros.

If a lambda expression is passed to symbol-function, then the expression is macro-expanded and if that is successful, the function implied by that expression is returned. It is unspecified whether this function is interpreted or compiled.

The symbol-macro function retrieves the value of the global macro binding of symbol if it has one.

Note: the name of this function has nothing to do with symbol macros; it is named for consistency with symbol-function and symbol-value, referring to the "macro-expander binding of the symbol cell".

The value of a macro binding is a function object. Intrinsic macros are C functions in the TXR kernel, which receive the entire macro call form and macro environment, performing their own destructuring. Currently, macros written in TXR Lisp are represented as curried C functions which carry the following list object in their environment cell:

  (#<environment object> macro-parameter-list body-form*)

Local macros created by macrolet have nil in place of the environment object.

This representation is likely to change or expand to include other forms in future TXR versions.

The symbol-value function retrieves the value stored in the dynamic binding of symbol that is apparent in the current context. If the variable has no dynamic binding, then symbol-value retrieves its value in the global environment. If symbol has no variable binding, but is defined as a global symbol macro, then the value of that symbol macro binding is retrieved. The value of a symbol macro binding is simply the replacement form.

Rather than throwing an exception, each of these functions returns nil if the argument symbol doesn't have the binding in the respective namespace or namespaces which that function searches.

A symbol-function, symbol-macro, or symbol-value form denotes a place, if symbol has a binding of the respective kind. This place may be assigned to or deleted. Assignment to the place causes the denoted binding to have a new value. Deleting a place with the del macro removes the binding, and returns the previous contents of that binding. A binding denoted by a symbol-function form is removed using fmakunbound, one denoted by symbol-macro is removed using mmakunbound, and a binding denoted by symbol-value is removed using makunbound.

Deleting a method via symbol-function is not possible; an attempt to do so has no effect.

Storing a value, using any one of these three accessors, to a nonexistent variable, function or macro binding, is not erroneous. It has the effect of creating that binding.

Using the symbol-function accessor to assign a value to a lambda expression is erroneous.

Deleting a binding, using any of these three accessors, when the binding does not exist, also isn't erroneous. There is no effect and the del operator yields nil as the prior value, consistent with the behavior when accessors are used to retrieve a nonexistent value.
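
The following sketch illustrates retrieval through symbol-function, and the creation of a variable binding by assignment through symbol-value:

  (defun add2 (x) (+ x 2))

  (call (symbol-function 'add2) 5) -> 7

  ;; assigning to a nonexistent variable binding creates it
  (set (symbol-value 'counter) 10)
  counter -> 10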

Dialect Note:

In ANSI Common Lisp, the symbol-function function retrieves a function, macro or special operator binding of a symbol. These are all in one space and may not coexist. In TXR Lisp, it retrieves a symbol's function binding only. Common Lisp has an accessor named macro-function similar to symbol-macro.


9.7.2 Functions boundp, fboundp and mboundp


  (boundp symbol)
  (fboundp {symbol | method-name | lambda-expr})
  (mboundp symbol)


boundp returns t if the symbol is bound as a variable or symbol macro in the global environment, otherwise nil.

fboundp returns t if the symbol has a function binding in the global environment, the method specified by method-name exists, or a lambda expression argument is given. Otherwise it returns nil.

mboundp returns t if the symbol has an operator macro binding in the global environment, otherwise nil.

Dialect Notes:

The boundp function in ANSI Common Lisp doesn't report that global symbol macros have a binding. They are not considered bindings. In TXR Lisp, they are considered bindings.

The ANSI Common Lisp fboundp yields true if its argument has a function, macro or special operator binding, whereas the TXR Lisp fboundp does not consider operators or macros. The ANSI CL fboundp does not yield true for lambda expressions. Behavior similar to that of the Common Lisp expression (fboundp x) can be obtained in TXR Lisp using the

  (or (fboundp x) (mboundp x) (special-operator-p x))

expression, except that this will also yield true when x is a lambda expression.

The mboundp function doesn't exist in ANSI Common Lisp.
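
The following sketch illustrates the three predicates:

  (defvar v 1)
  (defun f () 42)
  (defmacro m () nil)

  (boundp 'v) -> t
  (fboundp 'f) -> t
  (mboundp 'm) -> t
  (fboundp 'm) -> nil  ;; fboundp doesn't consider macros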


9.7.3 Function makunbound


  (makunbound symbol)

The function makunbound removes the binding of symbol from either the dynamic environment or the global symbol macro environment. After the call to makunbound, symbol appears to be unbound.

If the makunbound call takes place in a scope in which there exists a dynamic rebinding of symbol, the information for restoring the previous binding is not affected by makunbound. When that scope terminates, the previous binding will be restored.

If the makunbound call takes place in a scope in which the dynamic binding for symbol is the global binding, then the global binding is removed. When the global binding is removed, then if symbol was previously marked as special (for instance by defvar) this marking is removed.

Otherwise if symbol has a global symbol macro binding, that binding is removed.

If symbol has no apparent dynamic binding, and no global symbol macro binding, makunbound does nothing.

In all cases, makunbound returns symbol.
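
For instance:

  (defvar x 1)
  (boundp 'x) -> t
  (makunbound 'x) -> x
  (boundp 'x) -> nil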

Dialect Note:

The behavior of makunbound differs from its counterpart in ANSI Common Lisp.

The makunbound function in Common Lisp only removes a value from a dynamic variable. The dynamic variable does not cease to exist, it only ceases to have a value (because a binding is a value). In TXR Lisp, the variable ceases to exist. The binding of a variable isn't its value, it is the variable itself: the association between a name and an abstract storage location, in some environment. If the binding is undone, the variable disappears.

The makunbound function in Common Lisp does not remove global symbol macros, which are not considered to be bindings in the variable namespace. That is to say, the Common Lisp boundp does not report true for symbol macros.

The Common Lisp makunbound also doesn't remove the special attribute from a symbol. If a variable is introduced with defvar and then removed with makunbound, the symbol continues to exhibit dynamic binding rather than lexical in subsequent scopes. In TXR Lisp, if a global binding is removed, so is the special attribute.


9.7.4 Functions fmakunbound and mmakunbound


  (fmakunbound symbol)
  (mmakunbound symbol)

The function fmakunbound removes any binding for symbol from the function namespace of the global environment. If symbol has no such binding, it does nothing. In either case, it returns symbol.

The function mmakunbound removes any binding for symbol from the operator macro namespace of the global environment. If symbol has no such binding, it does nothing. In either case, it returns symbol.

Dialect Note:

The behavior of fmakunbound differs from its counterpart in ANSI Common Lisp. The fmakunbound function in Common Lisp removes a function or macro binding, which do not coexist.

The mmakunbound function doesn't exist in Common Lisp.
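
For instance:

  (defun f () 42)
  (fboundp 'f) -> t
  (fmakunbound 'f) -> f
  (fboundp 'f) -> nil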


9.7.5 Function func-get-form


  (func-get-form func)

The func-get-form function retrieves a source code form of func, which must be an interpreted function. The source code form has the syntax (name arglist body-form*).
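
For instance, assuming that add1 is an interpreted function rather than a compiled one:

  (defun add1 (x) (+ x 1))
  (func-get-form (symbol-function 'add1)) -> (add1 (x) (+ x 1))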


9.7.6 Function func-get-name


  (func-get-name func [env])


The func-get-name function tries to resolve the function object func to a name. If that is not possible, it returns nil.

The resolution is performed by an exhaustive search through up to three spaces.

If an environment is specified by env, then this is searched first. If a binding is found in that environment which resolves to the function, then the search terminates and the binding's symbol is returned as the function's name.

If the search through environment env fails, or if that argument is not specified, then the global environment is searched for a function binding which resolves to func. If such a binding is found, then the search terminates, and the binding's symbol is returned. If two or more symbols in the global environment resolve to the function, it is not specified which one is returned.

If the global function environment search fails, then the function is considered as a possible macro. The global macro environment is searched for a macro binding whose expander function is func, similarly to the way the function environment was searched. If a binding is found, then the syntax (macro name) is returned, where name is the name of the global macro binding that was found which resolves to func. If two or more global macro bindings share func, it is not specified which of those bindings provides name.

If the global macro search fails, then func is considered as a possible method. The static slot space of all struct types is searched for a slot which contains func. If such a slot is found, then the method name is returned, consisting of the syntax (meth type name) where type is a symbol denoting the struct type and name is the static slot of the struct type which holds func.

A check is also performed whether func might be equal to one of the two special functions of a structure type: its initfun or postinitfun, in which case it is returned as either the (meth type :init) or the (meth type :postinit) syntax.

If func is an interpreted function not found under any name, then a lambda expression denoting that function is returned, in the syntax (lambda args form*).

If func cannot be identified as a function, then nil is returned.
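
For instance, resolution of built-in functions through the global environment search:

  (func-get-name (symbol-function 'car)) -> car
  (func-get-name (fun cdr)) -> cdr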


9.7.7 Function func-get-env


  (func-get-env func)

The func-get-env function retrieves the environment object associated with function func. The environment object holds the captured bindings of a lexical closure.


9.7.8 Functions fun-fixparam-count and fun-optparam-count


  (fun-fixparam-count func)
  (fun-optparam-count func)

The fun-fixparam-count function reports func's number of fixed parameters. The fixed parameters consist of the required parameters and the optional parameters. Variadic functions have a parameter which captures the remaining arguments which are in excess of the fixed parameters. That parameter is not considered a fixed parameter and therefore doesn't contribute to this count.

The fun-optparam-count function reports func's number of optional parameters.

The func argument must be a function.

Note: if a function isn't variadic (see the fun-variadic function) then the value reported by fun-fixparam-count represents the maximum number of arguments which can be passed to the function. The minimum number of required arguments can be calculated for any function by subtracting the value reported by fun-optparam-count from the value reported by fun-fixparam-count.
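
The following sketch assumes a function with two required parameters, one optional parameter and a variadic parameter:

  (defun f (a b : c . rest) (list a b c rest))

  (fun-fixparam-count (fun f)) -> 3
  (fun-optparam-count (fun f)) -> 1
  (fun-variadic (fun f)) -> t
  ;; minimum required arguments: 3 - 1 = 2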


9.7.9 Function fun-variadic


  (fun-variadic func)

The fun-variadic function returns t if func is a variadic function, otherwise nil.

The func argument must be a function.


9.7.10 Function interp-fun-p


  (interp-fun-p obj)

The interp-fun-p function returns t if obj is an interpreted function, otherwise it returns nil.


9.7.11 Function vm-fun-p


  (vm-fun-p obj)

The vm-fun-p function returns t if obj is a function compiled for the virtual machine: a function representation produced by means of the functions compile-file, compile-toplevel or compile. If obj is of any other type, the function returns nil.


9.7.12 Function special-var-p


  (special-var-p obj)

The special-var-p function returns t if obj is a symbol marked for special variable binding, otherwise it returns nil. Symbols are marked special by defvar and defparm.


9.7.13 Function special-operator-p


  (special-operator-p obj)

The special-operator-p function returns t if obj is a symbol which names a special operator, otherwise it returns nil.


9.7.14 Symbol Macro %fun%


The symbol macro %fun% expands to the name of the current function. There is a global %fun% symbol macro which expands to nil. Around certain kinds of named functions, a local binding for %fun% is established which provides the function name. The purpose of this name is for use in diagnostic messages; therefore it is an abbreviated name.

The %fun% macro is established for defun, defmacro and defmeth forms. It is also established for methods defined inside a defstruct form including the methods :init, :postinit, :fini and :postfini.

The %fun% macro is visible not only to its function's body, but also to the expressions inside the parameter list which compute the default values for optional parameters.

The name provided by %fun% is intended for use in diagnostic messages and is therefore an informal name, and not the formal name which can be passed to symbol-function to retrieve the function.

In the case of a defun function named x, the %fun% name is that symbol, x. Thus, in this case, the name is the same as the formal name. In the case of a defmacro named x, %fun% also expands to the symbol x, which is not the formal name of the macro; the formal name is (macro x). In the case of a method x of a structure type s, %fun% is the two-element list (s x), rather than the formal name (meth s x).


  ;; log a message naming the function
  (defun connect-to-host (addr)
    (format t "~s: connecting to host ~s" %fun% addr))


9.8 Object Type

In TXR Lisp, objects obey the following type hierarchy. In this type hierarchy, the internal nodes denote abstract types: no object is an instance of an abstract type. Nodes in square brackets indicate an internal structure in the type graph, invisible to programs, and angle brackets indicate a plurality of types which are not listed by name:

  t ----+--- [cobj types] ---+--- hash
        |                    |
        |                    +--- hash-iter
        |                    |
        |                    +--- stream
        |                    |
        |                    +--- random-state
        |                    |
        |                    +--- regex
        |                    |
        |                    +--- buf
        |                    |
        |                    +--- tree
        |                    |
        |                    +--- tree-iter
        |                    |
        |                    +--- seq-iter
        |                    |
        |                    +--- cptr
        |                    |
        |                    +--- dir
        |                    |
        |                    +--- struct-type
        |                    |
        |                    +--- <all structures>
        |                    |
        |                    +--- ... others
        +--- sequence ---+--- string ---+--- str
        |                |              |
        |                |              +--- lstr
        |                |              |
        |                |              +--- lit
        |                |
        |                +--- list ---+--- null
        |                |            |
        |                |            +--- cons
        |                |            |
        |                |            +--- lcons
        |                |
        |                +--- vec
        |                |
        |                +--- <structures with car or length methods>
        +--- number ---+--- float
        |              |
        |              +--- integer ---+--- fixnum
        |                              |
        |                              +--- bignum
        +--- chr
        +--- sym
        +--- env
        +--- range
        +--- tnode
        +--- pkg
        +--- fun
        +--- args

In addition to the above hierarchy, the following relationships also exist:

  t ---+--- atom --- <any type other than cons> --- nil
       +--- cons ---+--- lcons --- nil
                    +--- nil

  sym --- null

  struct ---- <all structures>

That is to say, the types are exhaustively partitioned into atoms and conses; an object is either a cons or else it isn't, in which case it is the abstract type atom.

The cons type is odd in that it is both an abstract type, serving as a supertype for the type lcons, and a concrete type, in that regular conses are of this type.

The type nil is an abstract type which is empty. That is to say, no object is of type nil. This type is considered the abstract subtype of every other type, including itself.

The type nil is not to be confused with the type null which is the type of the nil symbol.

Because the type of nil is the type null and nil is also a symbol, the null type is a subtype of sym.

Lastly, the symbol struct serves as the supertype of all structures.


9.8.1 Function typeof


  (typeof value)

The typeof function returns a symbol representing the type of value.

The core types are identified by the following symbols:

cons
  Cons cell.

lit
  Literal string embedded in the TXR executable image.

fixnum
  Fixnum integer: an integer that fits into the value word, not having to be heap-allocated.

bignum
  Bignum integer: an arbitrary-precision integer that is heap-allocated.

float
  Floating-point number.

pkg
  Symbol package.

lcons
  Lazy cons.

range
  Range object.

lstr
  Lazy string.

env
  Function/variable binding environment.

hash
  Hash table.

stream
  I/O stream of any kind.

regex
  Regular-expression object.

struct-type
  A structure type: the type of any one of the values which represents a structure type.

tnode
  Binary search tree node.

tree
  Binary search tree.

args
  Function argument list represented as an object.

There are more kinds of objects, such as user-defined structures.


9.8.2 Function subtypep


  (subtypep left-type right-type)


The subtypep function tests whether left-type and right-type name a pair of types, such that the left type is a subtype of the right type.

The arguments are either type symbols, or structure type objects, as returned by the find-struct-type function. Thus, the symbol time, which is the name of a predefined struct type, and the object returned by (find-struct-type 'time) are considered equivalent argument values.

If either argument doesn't name a type, the behavior is unspecified.

Each type is a subtype of itself. Most other type relationships can be inferred from the type hierarchy diagrams given in the introduction to this section.

In addition, there are inheritance relationships among structures. If left-type and right-type are both structure types, then subtypep yields true if the types are the same struct type, or if the right type is a direct or indirect supertype of the left.

The type symbol struct is a supertype of all structure types.
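
For instance, following the hierarchy diagrams of the previous section:

  (subtypep 'lcons 'cons) -> t
  (subtypep 'str 'string) -> t
  (subtypep 'fixnum 'integer) -> t
  (subtypep 'integer 'number) -> t
  (subtypep 'string 'integer) -> nil
  (subtypep 'time 'struct) -> t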


9.8.3 Function typep


  (typep object type-symbol)


The typep function tests whether the type of object is a subtype of the type named by type-symbol.

The following equivalence holds:

  (typep a b) --> (subtypep (typeof a) b)
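
For instance:

  (typep 42 'fixnum) -> t
  (typep 42 'integer) -> t
  (typep 42.0 'integer) -> nil
  (typep "abc" 'string) -> t
  (typep '(1 2) 'sequence) -> t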


9.8.4 Macro typecase


  (typecase test-form {(type-sym clause-form*)}*)


The typecase macro evaluates test-form and then successively tests its type against each clause.

Each clause consists of a type symbol type-sym and zero or more clause-forms.

The first clause whose type-sym is a supertype of the type of test-form's value is considered to be the matching clause. That clause's clause-forms are evaluated, and the value of the last form is returned.

If there is no matching clause, or there are no clauses present, or the matching clause has no clause-forms, then nil is returned.

Note: since t is the supertype of every type, a clause whose type-sym is the symbol t always matches. If such a clause is placed as the last clause of a typecase, it provides a fallback case, whose forms are evaluated if none of the previous clauses match.
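
The following sketch relies on clause order; the null clause must precede the cons clause so that the nil object, whose type null is a subtype of list, is handled specifically:

  (defun describe-obj (x)
    (typecase x
      (null "the object nil")
      (cons "a nonempty list")
      (string "a string")
      (t "something else")))

  (describe-obj nil) -> "the object nil"
  (describe-obj '(1 2)) -> "a nonempty list"
  (describe-obj "abc") -> "a string"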


9.8.5 Macro etypecase


  (etypecase test-form {(type-sym clause-form*)}*)


The etypecase macro is the error-catching variant of typecase, similar to the relationship between the ecaseq and caseq families of macros.

If one of the clauses has a type-sym which is the symbol t, then etypecase is precisely equivalent to typecase. Otherwise, a clause with a type-sym of t and which throws an exception of type case-error, derived from error, is appended to the existing clauses, after which the semantics follows that of typecase.


9.8.6 Function built-in-type-p


  (built-in-type-p object)

The built-in-type-p function returns t if object is a symbol which is the name of a built-in type. For all other objects it returns nil.


9.9 Object Equivalence


9.9.1 Functions identity, identity* and use


  (identity value)
  (identity* value*)
  (use value)

The identity function returns its argument.

If the identity* function is given at least one argument, then it returns its leftmost argument, otherwise it returns nil.

The use function is a synonym of identity.


The identity function is useful as a functional argument, when a transformation function is required, but no transformation is actually desired. In this role, the use synonym leads to readable code. For instance:

  ;; construct a function which returns its integer argument
  ;; if it is odd, otherwise it returns its successor.
  ;; "If it's odd, use it, otherwise take its successor".

  [iff oddp use succ]

  ;; Applications of the function:

  [[iff oddp use succ] 3] -> 3  ;; use applied to 3

  [[iff oddp use succ] 2] -> 3  ;; succ applied to 2


9.9.2 Functions null, not and false


  (null value)
  (not value)
  (false value)

The null, not and false functions are synonyms. They test whether value is the object nil. They return t if this is the case, nil otherwise.


  (null '()) -> t
  (null nil) -> t
  (null ()) -> t
  (false t) -> nil

  (if (null x) (format t "x is nil!"))

  (let ((list '(b c d)))
    (if (not (memq 'a list))
      (format t "list ~s does not contain the symbol a\n" list)))


9.9.3 Functions true and have


  (true value)
  (have value)

The true function is the complement of the null, not and false functions. The have function is a synonym for true.

They return t if value is any object other than nil. If value is nil, they return nil.

Note: programs should avoid explicitly testing values with true. For instance (if x ...) should be favored over (if (true x) ...). However, the latter is useful with the ifa macro because (ifa (true expr) ...) binds the it variable to the value of expr, no matter what kind of form expr is, which is not true in the (ifa expr ...) form.


   ;; Compute indices where the list '(1 nil 2 nil 3)
   ;; has true values:
   [where '(1 nil 2 nil 3) true] -> (1 3)


9.9.4 Functions eq, eql and equal


  (eq left-obj right-obj)
  (eql left-obj right-obj)
  (equal left-obj right-obj)


The principal equality test functions eq, eql and equal test whether two objects are equivalent, using different criteria. They return t if the objects are equivalent, and nil otherwise.

The eq function uses the strictest equivalence test, called implementation equality. The eq function returns t if and only if, left-obj and right-obj are actually the same object. The eq test is implemented by comparing the raw bit pattern of the value, whether or not it is an immediate value or a pointer to a heaped object. Two character values are eq if they are the same character, and two fixnum integers are eq if they have the same value. All other object representations are actually pointers, and are eq if and only if they point to the same object in memory. So, for instance, two bignum integers might not be eq even if they have the same numeric value, two lists might not be eq even if all their corresponding elements are eq and two strings might not be eq even if they hold identical text.

The eql function is slightly less strict than eq. The difference between eql and eq is that if left-obj and right-obj are numbers which are of the same kind and have the same numeric value, eql returns t, even if they are different objects. Note that an integer and a floating-point number are not eql even if one has a value which converts to the other: thus, (eql 0.0 0) yields nil; a comparison expression which finds these numbers equal is (= 0.0 0). The eql function also specially treats range objects. Two distinct range objects are eql if their corresponding from and to fields are eql. For all other object types, eql behaves like eq.

The equal function is less strict still than eql. In general, it recurses into some kinds of aggregate objects to perform a structural equivalence check. For struct types, it also supports customization via equality substitution. See the Equality Substitution section under Structures.

Firstly, if left-obj and right-obj are eql then they are also equal, though the converse isn't necessarily the case.

If two objects are both cons cells, then they are equal if their car fields are equal and their cdr fields are equal.

If two objects are vectors, they are equal if they have the same length, and their corresponding elements are equal.

If two objects are strings, they are equal if they are textually identical.

If two objects are functions, they are equal if they have equal environments, and if they have the same code. Two compiled functions are considered to have the same code if and only if they are pointers to the same function. Two interpreted functions are considered to have the same code if their list structure is equal.

Two hashes are equal if they use the same equality (both are :equal-based, or both are :eql-based or else both are :eq-based), if their associated user data elements are equal (see the function hash-userdata), if their sets of keys are identical, and if the data items associated with corresponding keys from each respective hash are equal objects.

Two ranges are equal if their corresponding to and from fields are equal.

For some aggregate objects, there is no special semantics. Two arguments which are symbols, packages, or streams are equal if and only if they are the same object.

Certain object types have a custom equal function.


9.9.5 Functions neq, neql and nequal


  (neq left-obj right-obj)
  (neql left-obj right-obj)
  (nequal left-obj right-obj)


The functions neq, neql and nequal are logically negated counterparts of, respectively, eq, eql and equal.

If eq returns t for a given pair of arguments left-obj and right-obj, then neq returns nil. Vice versa, if eq returns nil, neq returns t.

The same relationship exists between eql and neql, and between equal and nequal.


9.9.6 Functions meq, meql and mequal


  (meq left-obj right-obj*)
  (meql left-obj right-obj*)
  (mequal left-obj right-obj*)


The functions meq, meql and mequal ("member equal" or "multi-equal") provide a particular kind of a generalization of the binary equality functions eq, eql and equal to multiple arguments.

The left-obj value is compared to each right-obj value using the corresponding binary equality function. If a match occurs, then t is returned, otherwise nil.

The traversal of the right-obj argument values proceeds from left to right, and stops when a match is found.
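For example:

  (meql 2 1 2 3) -> t
  (meq 'd 'a 'b 'c) -> nil
  (mequal "b" "a" "b" "c") -> t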


9.9.7 Function less


  (less left-obj right-obj)
  (less obj obj*)


The less function, when called with two arguments, determines whether left-obj compares less than right-obj in a generic way which handles arguments of various types.

The argument syntax of less is generalized. It can accept one argument, in which case it unconditionally returns t regardless of that argument's value. If more than two arguments are given, then less generalizes in a way which can be described by the following equivalence pattern, with the understanding that each argument expression is evaluated exactly once:

  (less a b c) <--> (and (less a b) (less b c))
  (less a b c d) <--> (and (less a b) (less b c) (less c d))

The less function is used as the default for the lessfun argument of the functions sort and merge, as well as the testfun argument of the functions pos-min and find-min.

The less function is capable of comparing numbers, characters, symbols, strings, as well as lists and vectors of these. It can also compare buffers.

If both arguments are the same object so that (eq left-obj right-obj) holds true, then the function returns nil regardless of the type of left-obj, even if the function doesn't handle comparing different instances of that type. In other words, no object is less than itself, no matter what it is.

The less function pairs with the equal function. If values a and b are objects which are of suitable types to the less function, then exactly one of the following three expressions must be true: (equal a b), (less a b) or (less b a).

The less relation is: antisymmetric, such that if (less a b) is true, then (less b a) is false; irreflexive, such that (less a a) is false; and transitive, such that (less a b) and (less b c) imply (less a c).

The following are detailed criteria that less applies to arguments of different types and combinations thereof.

If both arguments are numbers or characters, they are compared as if using the < function.

If both arguments are strings, they are compared as if using the string-lt function.

If both arguments are symbols, the following rules apply. If the symbols have names which are different, then the result is that of their names being compared by the string-lt function. If less is passed symbols which have the same name, and neither of these symbols has a home package, then the raw bit patterns of their values are compared as integers: effectively, the object with the lower machine address is considered lesser than the other. If only one of the two same-named symbols has no home package, then if that symbol is the left argument, less returns t, otherwise nil. If both same-named symbols have home packages, then the result of less is that of string-lt applied to the names of their respective packages. Thus a:foo is less than z:foo.

If both arguments are conses, then they are compared as follows:

The less function is recursively applied to the car fields of both arguments. If it yields true, then left-obj is deemed to be less than right-obj.
Otherwise, if the car fields are unequal under the equal function, less returns nil.
If the car fields are equal then less is recursively applied to the cdr fields of the arguments, and the result of that comparison is returned.

This logic performs a lexicographic comparison on ordinary lists such that for instance (1 1) is less than (1 1 1) but not less than (1 0) or (1).

Note that the empty list nil compared to a cons is handled by type-based precedence, described below.

Two vectors are compared by less lexicographically, similarly to strings. Corresponding elements, starting with element 0, of the vectors are compared until an index position is found where corresponding elements of the two vectors are not equal. If this differing position is beyond the end of one of the two vectors, then the shorter vector is considered to be lesser. Otherwise, the result of less is the outcome of comparing those differing elements themselves with less.

Two buffers are also compared by less lexicographically, as if they were vectors of integer byte values.

Two ranges are compared by less using lexicographic logic similar to conses and vectors. The from fields of the ranges are first compared. If they are not equal, then less is applied to those fields and the result is returned. If the from fields are equal, then less is applied to the to fields and that result is returned.

If the two arguments are of the above types, but of different types from each other, then less resolves the situation based on the following precedence: numbers and characters are less than ranges, which are less than strings, which are less than symbols, which are less than conses, which are less than vectors, which are less than buffers.

Note that since nil is a symbol, it is ranked lower than a cons. This interpretation ensures correct behavior when nil is regarded as an empty list, since the empty list is lexicographically prior to a nonempty list.
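The following examples illustrate these rules, including the type-based precedence:

  (less 1 2 3) -> t
  (less "abc" "abd") -> t
  (less '(1 1) '(1 1 1)) -> t
  (less 1 'a) -> t      ;; numbers precede symbols
  (less nil '(1)) -> t  ;; nil is a symbol, which precedes conses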

If either argument is a structure for which the equal method is defined, the method is invoked on that argument, and the value returned is used in place of that argument for performing the comparison. Structures with no equal method cannot participate in a comparison, resulting in an error. See the Equality Substitution section under Structures.

Finally, if either of the arguments has a type other than the above types, the situation is an error.


9.9.8 Function greater


  (greater left-obj right-obj)
  (greater obj obj*)


The greater function is equivalent to less with the arguments reversed. That is to say, the following equivalences hold:

  (greater a) <--> (less a) <--> t
  (greater a b) <--> (less b a)
  (greater a b c ...) <--> (less ... c b a)

The greater function is used as the default for the testfun argument of the pos-max and find-max functions.
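For example:

  (greater 3 2 1) -> t
  (greater 1 2) -> nil
  (greater "b" "a") -> t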


9.9.9 Functions lequal and gequal


  (lequal obj obj*)
  (gequal obj obj*)


The functions lequal and gequal are similar to less and greater respectively, but differ in the following respect: when called with two arguments which compare true under the equal function, the lequal and gequal functions return t.

When called with only one argument, both functions return t and both functions generalize to three or more arguments in the same way as do less and greater.
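For example:

  (lequal 1 1 2) -> t
  (gequal 3 3 2) -> t
  (lequal 2 1) -> nil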


9.9.10 Function copy


  (copy object)

The copy function duplicates objects of various supported types: sequences, hashes, structures and random states. If object is nil, it returns nil. Otherwise, copy is equivalent to invoking a more specific copying function according to the type of the argument, as follows:

  list           (copy-list object)
  str            (copy-str object)
  vec            (copy-vec object)
  hash           (copy-hash object)
  struct type    (copy-struct object)
  fun            (copy-fun object)
  buf            (copy-buf object)
  carray         (copy-carray object)
  random-state   (make-random-state object)
  tnode          (copy-tnode object)
  tree           (copy-search-tree object)
  tree-iter      (copy-tree-iter object)
  cptr           (copy-cptr object)

For all other types of object, the invocation is erroneous.

Except in the case when object is nil, copy returns a value that is distinct from (not eq to) object. This is different from the behavior of [object 0..t] or (sub object 0 t), which recognize that they need not make a copy of object, and just return it.

Note, however, that the elements of the returned sequence may be eq to elements of the original sequence. In other words, copy is a deeper copy than just duplicating the sequence value itself, but it is not a deep copy.
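For example, the copy is distinct from the original, but shares its elements:

  (let* ((x (list "a" "b"))
         (y (copy x)))
    (list (eq x y) (eq (car x) (car y)))) -> (nil t)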


9.10 List Manipulation


9.10.1 Function cons


  (cons car-value cdr-value)


The cons function allocates, initializes and returns a single cons cell. A cons cell has two fields called car and cdr, which are accessed by functions of the same name, or by the functions first and rest, which are synonyms for these.

Lists are made up of conses. A (proper) list is either the symbol nil denoting an empty list, or a cons cell which holds the first item of the list in its car, and the list of the remaining items in cdr. The expression (cons 1 nil) allocates and returns a single cons cell which denotes the one-element list (1). The cdr is nil, so there are no additional items.

A cons cell whose cdr is an atom other than nil is printed with the dotted pair notation. For example the cell produced by (cons 1 2) is denoted (1 . 2). The notation (1 . nil) is perfectly valid as input, but the cell which it denotes will print back as (1). The notations are equivalent.

The dotted pair notation can be used regardless of what type of object is the cons cell's cdr, so that for instance (a . (b c)) denotes the cons cell whose car is the symbol a and whose cdr is the list (b c). This is exactly the same thing as (a b c). In other words (a b ... l m . (n o ... w . (x y z))) is exactly the same as (a b ... l m n o ... w x y z).

Every list, and more generally cons-cell tree structure, can be written in a "fully dotted" notation, such that there are as many dots as there are cells. For instance the cons structure of the nested list (1 (2) (3 4 (5))) can be made more explicit using (1 . ((2 . nil) . ((3 . (4 . ((5 . nil) . nil))) . nil))). The structure contains eight conses, and so there are eight dots in the fully dotted notation.

The number of conses in a linear list like (1 2 3) is simply the number of items, so that list in particular is made of three conses. Additional nestings require additional conses, so for instance (1 2 (3)) requires four conses. A visual way to count the conses from the printed representation is to count the atoms, then add the count of open parentheses, and finally subtract one.

A list terminated by an atom other than nil is called an improper list, and the dot notation is extended to cover improper lists. For instance (1 2 . 3) is an improper list of two elements, terminated by 3, and can be constructed using (cons 1 (cons 2 3)). The fully dotted notation for this list is (1 . (2 . 3)).
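For example:

  (cons 1 nil) -> (1)
  (cons 1 2) -> (1 . 2)
  (cons 1 (cons 2 3)) -> (1 2 . 3)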


9.10.2 Function atom


  (atom value)

The atom function tests whether value is an atom. It returns t if this is the case, nil otherwise. All values which are not cons cells are atoms.

(atom x) is equivalent to (not (consp x)).


  (atom 3) -> t
  (atom (cons 1 2)) -> nil
  (atom "abc") -> t
  (atom '(3)) -> nil


9.10.3 Function consp


  (consp value)

The consp function tests whether value is a cons. It returns t if this is the case, nil otherwise.

(consp x) is equivalent to (not (atom x)).

Nonempty lists test positive under consp because a list is represented as a reference to the first cons in a chain of one or more conses.

Note that a lazy cons is a cons and satisfies the consp test. See the function make-lazy-cons and the macro lcons.


  (consp 3) -> nil
  (consp (cons 1 2)) -> t
  (consp "abc") -> nil
  (consp '(3)) -> t


9.10.4 Accessors car and first


  (set (car object) new-value)
  (set (first object) new-value)


The functions car and first are synonyms.

If object is a cons cell, these functions retrieve the car field of that cons cell. (car (cons 1 2)) yields 1.

For programming convenience, object may be of several other kinds in addition to conses.

(car nil) is allowed, and returns nil.

object may also be a vector or a string. If it is an empty vector or string, then nil is returned. Otherwise the first character of the string or first element of the vector is returned.

object may be a structure. The car operation is possible if the object has a car method. If so, car invokes that method and returns whatever the method returns. If the structure has no car method, but has a lambda method, then the car function calls that method with one argument, that being the integer zero. Whatever the method returns, car returns. If neither method is defined, an error exception is thrown.

A car form denotes a valid place whenever object is a valid argument for the rplaca function. Modifying the place denoted by the form is equivalent to invoking rplaca with object as the left argument, and the replacement value as the right argument. It takes place in the manner given under the description of the rplaca function, and obeys the same restrictions.

A car form supports deletion. The following equivalence then applies:

  (del (car place)) <--> (pop place)

This implies that deletion requires the argument of the car form to be a place, rather than the whole form itself. In this situation, the argument place may have a value which is nil, because pop is defined on an empty list.

The abstract concept behind deleting a car is that physically deleting this field from a cons, thereby breaking it in half, would result in just the cdr remaining. Though fragmenting a cons in this manner is impossible, deletion simulates it by replacing the place which previously held the cons, with that cons' cdr field. This semantics happens to coincide with deleting the first element of a list by a pop operation.
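For example:

  (car '(1 2 3)) -> 1
  (car nil) -> nil
  (car "abc") -> #\a

  ;; deletion follows the pop equivalence
  (let ((l (list 1 2 3)))
    (del (car l))
    l) -> (2 3)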


9.10.5 Accessors cdr and rest


  (set (cdr object) new-value)
  (set (rest object) new-value)


The functions cdr and rest are synonyms.

If object is a cons cell, these functions retrieve the cdr field of that cons cell. (cdr (cons 1 2)) yields 2.

For programming convenience, object may be of several other kinds in addition to conses.

(cdr nil) is allowed, and returns nil.

object may also be a vector or a string. If it is a nonempty string or vector containing at least two items, then the remaining part of the object is returned, with the first element removed. For example (cdr "abc") yields "bc". If object is a one-element vector or string, or an empty vector or string, then nil is returned. Thus (cdr "a") and (cdr "") both result in nil.

If object is a structure, then cdr requires it to support either the cdr method or the lambda method. If both are present, cdr is used. When the cdr function uses the cdr method, it invokes it with no arguments. Whatever value the method returns becomes the return value of cdr. When cdr invokes a structure's lambda method, it passes as the argument the range object #R(1 t). Whatever the lambda method returns becomes the return value of cdr.

The invocation syntax of a cdr or rest form is a syntactic place. The place is semantically correct if object is a valid argument for the rplacd function. Modifying the place denoted by the form is equivalent to invoking rplacd with object as the left argument, and the replacement value as the right argument. It takes place in the manner given under the description of the rplacd function, and obeys the same restrictions.

A cdr place supports deletion, according to the following near equivalence:

  (del (cdr place)) <--> (prog1 (cdr place)
                                (set place (car place)))

The place expression is evaluated only once.

Note that this is symmetric with the delete semantics of car in that the cons stored in place goes away, as does the cdr field, leaving just the car, which takes the place of the original cons.


Walk every element of the list (1 2 3) using a for loop:

    (for ((i '(1 2 3))) (i) ((set i (cdr i)))
      (print (car i) *stdout*)
      (print #\newline *stdout*))

The variable i marches over the cons cells which make up the "backbone" of the list. The elements are retrieved using the car function. Advancing to the next cell is achieved using (cdr i). If i is the last cell in a (proper) list, (cdr i) yields nil and so i becomes nil, the loop guard expression i fails and the loop terminates.


9.10.6 Functions rplaca and rplacd


  (rplaca object new-car-value)
  (rplacd object new-cdr-value)


If object is a cons cell or lazy cons cell, then the rplaca and rplacd functions assign new values into the car and cdr fields of the object. In addition, these functions are meaningful for other kinds of objects also.

Note that, except for the difference in return value, (rplaca x y) is the same as the more generic (set (car x) y), and likewise (rplacd x y) can be written as (set (cdr x) y).

The rplaca and rplacd functions return object. Note: in TXR versions 89 and earlier, these functions returned the new value. The behavior was undocumented.

The object argument does not have to be a cons cell. Both functions support meaningful semantics for vectors and strings. If object is a string, it must be modifiable.

The rplaca function replaces the first element of a vector or first character of a string. The vector or string must be at least one element long.

The rplacd function replaces the suffix of a vector or string after the first element with a new suffix. The new-cdr-value must be a sequence, and if the suffix of a string is being replaced, it must be a sequence of characters. The suffix here refers to the portion of the vector or string after the first element.

It is permissible to use rplacd on an empty string or vector. In this case, new-cdr-value specifies the contents of the entire string or vector, as if the operation were done on a nonempty vector or string, followed by the deletion of the first element.
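For example, applying these functions to a string and a vector:

  (let ((s (copy "abc")))
    (rplaca s #\z)
    s) -> "zbc"

  (let ((v (vec 1 2 3)))
    (rplacd v '(9 9))
    v) -> #(1 9 9)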

The object argument may be a structure. In the case of rplaca, the structure must have a defined rplaca method or else, failing that, a lambda-set method. The first of these methods which is available, in the given order, is used to perform the operation, and whatever the respective method returns, rplaca returns. If the lambda-set method is used, it is called with two arguments (in addition to object): the integer zero, and new-car-value.

In the case of rplacd, the structure must have a defined rplacd method or else, failing that, a lambda-set method. The first of these methods which is available, in the given order, is used to perform the operation, and whatever the respective method returns, rplacd returns. If the lambda-set method is used, it is called with two arguments (in addition to object): the range value #R(1 t) and new-cdr-value.


9.10.7 Accessors second, third, fourth, fifth, sixth, seventh, eighth, ninth and tenth


  (set (first object) new-value)
  (set (second object) new-value)
  (set (tenth object) new-value)


Used as functions, these accessors retrieve the elements of a sequence by position. If the sequence is shorter than implied by the position, these functions return nil.

When used as syntactic places, these accessors denote the storage locations by position. The location must exist, otherwise an error exception results. The places support deletion.


  (third '(1 2)) -> nil
  (second "ab") -> #\b
  (third '(1 2 . 3)) -> **error, improper list**

  (let ((x (copy "abcd")))
    (inc (third x))
    x) -> "abdd"


9.10.8 Functions append and nconc


  (append [sequence* last-arg])
  (nconc [sequence* last-arg])


The append function creates a new object which is a catenation of the list arguments. All arguments are optional; with no arguments, append produces the empty list, and if a single argument is specified, that argument is returned.

If two or more arguments are present, then the situation is identified as one or more sequence arguments followed by last-arg. The sequence arguments must be sequences; last-arg may be a sequence or atom.

The append operation over three or more arguments is left-associative, such that (append x y z) is equivalent to both (append (append x y) z) and (append x (append y z)).

This allows the catenation of an arbitrary number of arguments to be understood in terms of a repeated application of the two-argument case, whose semantics is given by these rules:

nil catenates with nil to produce nil:
  (append nil nil) -> nil
nil catenates with a proper or improper list, producing that list itself:
  (append nil '(1 2)) -> (1 2)
  (append nil '(1 2 . 3)) -> (1 2 . 3)
A proper list catenates with nil, producing that list itself:
  (append '(1 2) nil) -> (1 2)
A proper list catenates with an atom, producing an improper list terminated by that atom, whether or not that atom is a sequence:
  (append '(1 2) #(3)) -> (1 2 . #(3))
  (append '(1 2) 3) -> (1 2 . 3)
A non-list sequence catenates with another sequence into a sequence, producing a sequence which contains the elements of both, of the same kind as the left sequence. The elements must be compatible; a string can only catenate with a sequence of characters.
  (append #(1 2) #(3 4)) -> #(1 2 3 4)
  (append "ab" "cd") -> "abcd"
  (append "ab" #(#\c #\d)) -> "abcd"
  (append "ab" #(3 4)) -> ;; error
A non-list sequence catenates with an atom if it is a suitable element type for that kind of sequence. The resulting sequence is of the same kind, and includes that atom:
  (append #(1 2) 3) -> #(1 2 3)
  (append "ab" #\c) -> "abc"
  (append "ab" 3) -> ;; error
If an improper list is catenated with any object, the catenation takes place between the terminating atom of that list and that object. This requires the terminating atom to be a sequence. If the catenation is possible, then the result is a new improper list which is a copy of the original, but with the terminating atom replaced by a catenation of that atom and the object:
  (append '(1 2 . "ab") "c") -> (1 2 . "abc")
  (append '(1 2 . "ab") '(2 3)) -> ;; error
A non-sequence atom doesn't catenate; the situation is erroneous:
  (append 1 2) -> ;; error
  (append '(1 . 2) 3) -> ;; error

If N arguments are specified, where N > 1, then the first N-1 arguments must be proper lists. Copies of these lists are catenated together. The last argument N, shown in the above syntax as last-arg, may be any kind of object. It is installed into the cdr field of the last cons cell of the resulting list. Thus, if argument N is also a list, it is catenated onto the resulting list, but without being copied. Argument N may be an atom other than nil; in that case append produces an improper list.

The nconc function works like append, but may destructively manipulate any of the input objects.


  ;; An atom is returned.
  (append 3) -> 3

  ;; A list is also just returned: no copying takes place.
  ;; The eq function can verify that the same object emerges
  ;; from append that went in.
  (let ((list '(1 2 3)))
    (eq (append list) list)) -> t

  (append '(1 2 3) '(4 5 6) 7) -> (1 2 3 4 5 6 . 7)

  ;; the (4 5 6) tail of the resulting list is the original
  ;; (4 5 6) object, shared with that list.

  (append '(1 2 3) '(4 5 6)) -> '(1 2 3 4 5 6)

  (append nil) -> nil

  ;; (1 2 3) is copied: it is not the last argument
  (append '(1 2 3) nil) -> (1 2 3)

  ;; empty lists disappear
  (append nil '(1 2 3) nil '(4 5 6)) -> (1 2 3 4 5 6)
  (append nil nil nil) -> nil

  ;; atoms and improper lists other than in the last position
  ;; are erroneous
  (append '(a . b) 3 '(1 2 3)) -> **error**

  ;; sequences other than lists can be catenated.
  (append "abc" "def" "g" #\h) -> "abcdefgh"

  ;; lists followed by non-list sequences end with non-list
  ;; sequences catenated in the terminating atom:
  (append '(1 2) '(3 4) "abc" "def") -> (1 2 3 4 . "abcdef")


9.10.9 Function append*


  (append* [list* last-arg])


The append* function lazily catenates lists.

If invoked with no arguments, it returns nil. If invoked with a single argument, it returns that argument.

Otherwise, it returns a lazy list consisting of the elements of every list argument from left to right.

Arguments other than the last are treated as lists, and traversed using car and cdr functions to visit their elements.

The last argument isn't traversed: rather, that object itself becomes the cdr field of the last cons cell of the lazy list constructed from the previous arguments.
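For example:

  (append* '(1 2) '(3 4) '(5 6)) -> (1 2 3 4 5 6)

The result is a lazy list; the argument lists are traversed as its elements are demanded.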


9.10.10 Functions revappend and nreconc


  (revappend list1 list2)
  (nreconc list1 list2)


The revappend function returns a list consisting of list2 appended to a reversed copy of list1. The returned object shares structure with list2, which is unmodified.

The nreconc function behaves similarly, except that the returned object may share structure with not only list2 but also list1, which is modified.
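For example:

  (revappend '(3 2 1) '(4 5)) -> (1 2 3 4 5)
  (revappend nil '(1 2)) -> (1 2)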


9.10.11 Function list


  (list value*)

The list function creates a new list, whose elements are the argument values.


  (list) -> nil
  (list 1) -> (1)
  (list 'a 'b) -> (a b)


9.10.12 Function list*


  (list* value*)

The list* function is a generalization of cons. If called with exactly two arguments, it behaves exactly like cons: (list* x y) is identical to (cons x y). If three or more arguments are specified, the leading arguments specify additional atoms to be consed to the front of the list. So for instance (list* 1 2 3) is the same as (cons 1 (cons 2 3)) and produces the improper list (1 2 . 3). Generalizing in the other direction, list* can be called with just one argument, in which case it returns that argument, and can also be called with no arguments in which case it returns nil.


  (list*) -> nil
  (list* 1) -> 1
  (list* 'a 'b) -> (a . b)
  (list* 'a 'b 'c) -> (a b . c)

Dialect Note:

Note that unlike in some other Lisp dialects, the effect of (list* 1 2 x) can also be obtained using (list 1 2 . x). However, (list* 1 2 (func 3)) cannot be rewritten as (list 1 2 . (func 3)) because the latter is equivalent to (list 1 2 func 3).


9.10.13 Accessor sub-list


  (sub-list list [from [to]])
  (set (sub-list list [from [to]]) new-value)


The sub-list function has the same parameters and semantics as the sub function, except that it operates on its list argument using list operations, and assumes that list is terminated by nil.

If a sub-list form is used as a place, then the list argument form must also be a place.

The sub-list place denotes a subrange of list as if it were a storage location. The previous value of this location, if needed, is fetched by a call to sub-list. Storing new-value to the place is performed by a call to replace-list. The return value of replace-list is stored into list. In an update operation which accesses the prior value and stores a new value, the arguments list, from, to and new-value are evaluated once.
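For example:

  (sub-list '(1 2 3 4 5) 1 3) -> (2 3)

  (let ((x (list 1 2 3 4 5)))
    (set (sub-list x 1 3) '(a b))
    x) -> (1 a b 4 5)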


9.10.14 Function replace-list


  (replace-list list item-sequence [from [to]])


The replace-list function is like the replace function, except that it operates on its list argument using list operations. It assumes that list is terminated by nil, and that it is made of cells which can be mutated using rplaca.
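For example:

  (replace-list (list 1 2 3 4) '(a b) 1 3) -> (1 a b 4)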


9.10.15 Functions listp and proper-list-p


  (listp value)
  (proper-list-p value)

The listp and proper-list-p functions test, respectively, whether value is a list, or a proper list, and return t or nil accordingly.

The listp test is weaker, and executes without having to traverse the object. The value produced by the expression (listp x) is the same as that of (or (null x) (consp x)), except that x is evaluated only once. The empty list nil is a list, and a cons cell is a list.

The proper-list-p function returns t only for proper lists. A proper list is either nil, or a cons whose cdr is a proper list. proper-list-p traverses the list, and its execution will not terminate if the list is circular.

These functions return nil for list-like sequences that are not made of actual cons cells.
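For example:

  (listp nil) -> t
  (listp '(1 2 . 3)) -> t
  (proper-list-p '(1 2 . 3)) -> nil
  (proper-list-p '(1 2 3)) -> t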

Dialect Note: in TXR 137 and older, proper-list-p is called proper-listp. The name was changed for adherence to conventions and compatibility with other Lisp dialects, like Common Lisp. However, the function continues to be available under the old name. Code that must run on TXR 137 and older installations should use proper-listp, but its use going forward is deprecated.


9.10.16 Function endp


  (endp object)

The endp function returns t if object is the object nil.

If object is a cons cell, then endp returns nil.

Otherwise, the endp function throws an exception.
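For example:

  (endp nil) -> t
  (endp '(1 2)) -> nil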


9.10.17 Function length-list


  (length-list list)

The length-list function returns the length of list, which may be a proper or improper list. The length of a list is the number of conses in that list.
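For example:

  (length-list nil) -> 0
  (length-list '(1 2 3)) -> 3
  (length-list '(1 2 . 3)) -> 2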


9.10.18 Function copy-list


  (copy-list list)

The copy-list function returns a list similar to list, but with a newly allocated cons-cell structure.

If list is an atom, it is simply returned.

Otherwise, list is a cons cell, and copy-list returns the same object as the expression (cons (car list) (copy-list (cdr list))).

Note that the object (car list) is not deeply copied, but only propagated by reference into the new list. copy-list produces a new list structure out of the same items that are in list.
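For example:

  (let* ((x (list 1 2 3))
         (y (copy-list x)))
    (list (equal x y) (eq x y))) -> (t nil)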

Dialect Note:

Common Lisp does not allow the argument to be an atom, except for the empty list nil.


9.10.19 Function length-list-<


  (length-list-< list len)


The length-list-< function determines whether the length of list is less than the integer len.

The expression

  (length-list-< x y)

is similar to, but usefully different from

  (< (length-list x) y)

because length-list-< is required to traverse list only far enough to be able to determine the return value. If the end of the list is reached before len conses are encountered, the function returns t; otherwise, once len conses are encountered, the function terminates immediately and returns nil.

The length-list-< function is therefore safe to use with infinite lazy lists and circular lists, for which length would not terminate.

Note: there is a more generic function length-< which works efficiently with different kinds of sequences.

Note: the length-list-< function is useful in situations when a decision must be made between two algorithms based on the length of one or more input lists. The decision can be made without wastefully performing a full pass over the input lists to measure their lengths.
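For example:

  (length-list-< '(1 2 3) 4) -> t
  (length-list-< '(1 2 3) 3) -> nil

  ;; terminates: only 100 conses of the infinite lazy list
  ;; produced by (range 1) are visited
  (length-list-< (range 1) 100) -> nil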


9.10.20 Function copy-cons


  (copy-cons cons)

The copy-cons function creates and returns a new object that is a replica of cons.

The cons argument must be either a cons cell, or else a lazy cons: an object of type lcons.

A new cell of the same type as cons is created, and all of its fields are initialized by copying the corresponding fields from cons.

If cons is lazy, the newly created object is in the same state as the original. If the original has not yet been updated and thus has an update function, the copy also has not yet been updated and has the same update function.


9.10.21 Function copy-tree


  (copy-tree obj)

The copy-tree function returns a copy of obj which represents an arbitrary cons-cell-based structure.

The cell structure of obj is traversed and a similar structure is constructed, but without regard for substructure sharing or circularity.

More precisely, if obj is an atom, then it is returned. If it is an ordinary cons cell, then copy-tree is recursively applied to the car and cdr fields to produce their individual replicas. A new cons cell is then produced from the replicated car and cdr. If obj is a lazy cons, then just like in the ordinary cons case, the car and cdr fields are duplicated with a recursive call to copy-tree. Then, a lazy cons is created from these replicated fields. If the original cell has an update function, then the newly created lazy cons has the same update function; the function isn't copied.

Like copy-cons, the copy-tree function doesn't trigger the update of lazy conses. The copies of lazy conses which have not been updated are also conses which have not been updated.
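For example, substructure is replicated rather than shared:

  (let* ((x '((1 2) (3 4)))
         (y (copy-tree x)))
    (list (equal x y) (eq (car x) (car y)))) -> (t nil)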


9.10.22 Functions reverse and nreverse


  (reverse list)
  (nreverse list)

The functions reverse and nreverse produce an object which contains the same items as proper list list, but in reverse order. If list is nil, then both functions return nil.

The reverse function is non-destructive: it creates a new list.

The nreverse function creates the structure of the reversed list out of the cons cells of the input list, thereby destructively altering it (if it contains more than one element). How nreverse uses the material from the original list is unspecified. It may rearrange the cons cells into a reverse order, or it may keep the structure intact, but transfer the car values among cons cells into reverse order. Other approaches are possible.
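For example:

  (reverse '(1 2 3)) -> (3 2 1)
  (reverse nil) -> nil

  ;; nreverse may destroy its input; use it only when the
  ;; original list structure is expendable:
  (nreverse (list 1 2 3)) -> (3 2 1)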


9.10.23 Accessor nthlast


  (nthlast index list)
  (set (nthlast index list) new-value)


The nthlast function retrieves the n-th last cons cell of a list, indexed from one. The index parameter must be an integer. If index is positive and so large that it specifies a nonexistent cons beyond the beginning of the list, nthlast returns list. Effectively, values of index larger than the length of the list are clamped to the length. If index is negative, then nthlast yields nil. An index value of zero retrieves the terminating atom of list or else the value list itself, if list is an atom.

The following equivalences hold:

  (nthlast 1 list) <--> (last list)
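
For example:

  (nthlast 1 '(1 2 3))  ->  (3)
  (nthlast 2 '(1 2 3))  ->  (2 3)
  (nthlast 0 '(1 2 3))  ->  nil
  (nthlast 4 '(1 2 3))  ->  (1 2 3)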

An nthlast place designates the storage location which holds the n-th cell, as indicated by the value of index.

A negative index doesn't denote a place.

A positive index greater than the length of the list is treated as if it were equal to the length of the list.

If list is itself a syntactic place, then the index value n is permitted for a list of length n. This index value denotes the list place itself. Storing to this value overwrites list. If list isn't a syntactic place, then storing to position n isn't permitted.

If list is of length zero, or an atom (in which case its length is considered to be zero) then the above remarks about position n apply to an index value of zero: if list is a syntactic place, then the position denotes list itself, otherwise the position doesn't exist as a place.

If list contains one or more elements, then index value of zero denotes the cdr field of its last cons cell. Storing a value to this place overwrites the terminating atom.
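
For example, based on the semantics described above:

  (defvar l (list 1 2 3))
  (set (nthlast 0 l) (list 4 5))
  l  ->  (1 2 3 4 5)
  (set (nthlast 2 l) (list 'x 'y))
  l  ->  (1 2 3 x y)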


9.10.24 Accessor butlastn


  (butlastn num list)
  (set (butlastn num list) new-value)


The butlastn function calculates that initial portion of list which excludes the last num elements.

Note: the butlastn function doesn't support non-list sequences as sequences; it treats them as the terminating atom of a zero-length improper list. The butlast sequence function supports non-list sequences. If x is a list, then the following equivalence holds:

  (butlastn n x)  <-->  (butlast x n)

If num is zero, or negative, then butlastn returns list.

If num is positive, and meets or exceeds the length of list, then butlastn returns nil.
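
For example:

  (butlastn 1 '(1 2 3))  ->  (1 2)
  (butlastn 0 '(1 2 3))  ->  (1 2 3)
  (butlastn 3 '(1 2 3))  ->  nil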

If a butlastn form is used as a syntactic place, then list must be a place. Assigning to the form causes list to be replaced with a new list which is a catenation of the new value and the last num elements of the original list, according to the following equivalence:

  (set (butlastn n x) v)
    <-->
  (progn (set x (append v (nthlast n x))) v)

except that n, x and v are evaluated only once, in left-to-right order.


9.10.25 Accessor nth


  (nth index object)
  (set (nth index object) new-value)


The nth function performs random access on a list, retrieving the n-th element indicated by the zero-based index value given by index. The index argument must be a nonnegative integer.

If index indicates an element beyond the end of the list, then the function returns nil.

The following equivalences hold:

  (nth 0 list) <--> (car list) <--> (first list)
  (nth 1 list) <--> (cadr list) <--> (second list)
  (nth 2 list) <--> (caddr list) <--> (third list)

  (nth x y) <--> (car (nthcdr x y))


9.10.26 Accessor nthcdr


  (nthcdr index list)
  (set (nthcdr index list) new-value)


The nthcdr function retrieves the n-th cons cell of a list, indexed from zero. The index parameter must be a nonnegative integer. If index specifies a nonexistent cons beyond the end of the list, then nthcdr yields nil.

The following equivalences hold:

  (nthcdr 0 list) <--> list
  (nthcdr 1 list) <--> (cdr list)
  (nthcdr 2 list) <--> (cddr list)

  (car (nthcdr x y)) <--> (nth x y)

An nthcdr place designates the storage location which holds the n-th cell, as indicated by the value of index. Indices beyond the last cell of list do not designate a valid place. If list is itself a place, then the zeroth index is permitted and the resulting place denotes list. Storing a value to (nthcdr 0 list) overwrites list. Otherwise, if list isn't a syntactic place, then the zeroth index does not designate a valid place; index must have a positive value. An nthcdr place does not support deletion.
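
For example, storing to an nthcdr place overwrites the cdr field holding the indicated cell:

  (defvar l (list 1 2 3))
  (set (nthcdr 1 l) (list 9))
  l  ->  (1 9)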

Dialect Note:

In Common Lisp, nthcdr is only a function, not an accessor; nthcdr forms do not denote places.


9.10.27 Function tailp


  (tailp object list)


The tailp function tests whether object is a tail of list. This means that object is either list itself, one of the cons cells of list, or the terminating atom of list.

More formally, a recursive definition follows. If object and list are the same object (thus equal under the eq function) then tailp returns t. If list is an atom, and is not object, then the function returns nil. Otherwise, list is a cons that is not object and tailp yields the same value as the (tailp object (cdr list)) expression.
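
For example, since tailp compares with eq, a list which is merely equal to a tail of list does not qualify:

  (defvar l '(1 2 3))
  (tailp l l)             ->  t
  (tailp (cdr l) l)       ->  t
  (tailp nil l)           ->  t    ;; terminating atom
  (tailp (list 2 3) l)    ->  nil  ;; equal, but a distinct object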


9.10.28 Accessors caar, cadr, cdar, cddr, ..., cdddddr


  (caar object)
  (cadr object)
  (set (caar object) new-value)
  (set (cadr object) new-value)


The a-d accessors provide a shorthand notation for accessing two to five levels deep into a cons-cell-based tree structure. For instance, the equivalent of the nested function call expression (car (car (cdr object))) can be achieved using the single function call (caadr object). The symbol names of the a-d accessors are a generalization of the words "car" and "cdr". They encode the pattern of car and cdr traversal of the structure using a sequence of the letters a and d placed between c and r. The traversal is encoded in right-to-left order, so that cadr indicates a traversal of the cdr link, followed by the car. This order corresponds to the nested function call notation, which also encodes the traversal right-to-left. The following diagram illustrates the straightforward relationship:

  (cdr (car (cdr x)))
    ^    ^    ^
    |   /     |
    |  /     /
    | / ____/
    || /
  (cdadr x)

TXR Lisp provides all possible a-d accessors up to five levels deep, from caar all the way through cdddddr.

Expressions involving a-d accessors are places. For example, (caddr x) denotes the same place as (car (cddr x)), and (cdadr x) denotes the same place as (cdr (cadr x)).

The a-d accessor places support deletion, with semantics derived from the deletion semantics of the car and cdr places. For example, (del (caddr x)) means the same as (del (car (cddr x))).
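
For example:

  (cadr '(1 2 3))       ->  2      ;; (car (cdr x))
  (cddr '(1 2 3 4))     ->  (3 4)  ;; (cdr (cdr x))
  (caadr '(a (b c) d))  ->  b      ;; (car (car (cdr x)))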


9.10.29 Functions cyr and cxr


  (cyr address object)
  (cxr address object)


The cyr and cxr functions provide car/cdr navigation of tree structure driven by numeric address given by the address argument.

The address argument can express any combination of the application of car and cdr functions, including none at all.

The difference between cyr and cxr is the bit order of the encoding. Under cyr, the most significant bit of the encoding given in address indicates the initial car/cdr navigation, and the least significant bit gives the final one. Under cxr, it is opposite.

Both functions require address to be a positive integer. Any other argument raises an error.

Under both functions, the address value 1 encodes the identity operation: no car/cdr navigation is performed, and object itself is returned.
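
For example, address value 1 returns the second argument unchanged under either function:

  (cyr 1 '(a (b) c))  ->  (a (b) c)
  (cxr 1 '(a (b) c))  ->  (a (b) c)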


9.10.30 Functions flatten and flatten*


  (flatten {list | atom})
  (flatten* {list | atom})


The flatten function recursively traverses a nested list, returning a list whose elements are all of the non-nil atoms contained in list, at any level of nesting. If the argument is an atom rather than a list, then it is returned. Otherwise, the list argument must be a proper list, as must all lists nested within it.
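
For example, note that nil, being both an atom and the empty list, does not survive flattening:

  (flatten '(1 (2 (3 4)) () 5))  ->  (1 2 3 4 5)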