Manpage for TXR 151

Sep 27, 2016

Contents

1 NAME
2 SYNOPSIS
3 DESCRIPTION
4 ARGUMENTS AND OPTIONS
5 STATUS AND ERROR REPORTING
6 BASIC TXR SYNTAX
7 DIRECTIVES
8 TXR LISP
9 LISP OPERATOR, FUNCTION AND MACRO REFERENCE
10 INTERACTIVE LISTENER
11 SETUID/SETGID OPERATION
12 STAND-ALONE APPLICATION SUPPORT
13 DEBUGGER
14 COMPATIBILITY
15 APPENDIX

 

1 NAME

TXR - text processing language (version 151)

 

2 SYNOPSIS

txr [ options ] [ script-file [ data-files ... ]]

 

3 DESCRIPTION

TXR is a language oriented toward processing text from files or streams, supporting multiple programming paradigms. It is a combination of two programming languages: a text scanning and extraction language referred to as the TXR pattern language, or sometimes just TXR when it is clear, and a general-purpose dialect of Lisp called TXR Lisp.

A script written in the TXR pattern language is referred to in this document as a query, and it specifies a pattern which matches (a prefix of) an entire file, or multiple files. Patterns can consist of large chunks of multi-line free-form text, which is matched literally against material in the input sources. Free variables occurring in the pattern (denoted by the @ symbol) are bound to the pieces of text occurring in the corresponding positions. If the overall match is successful, then TXR can do one of two things: it can report the list of variables which were bound, in the form of a set of variable assignments which can be evaluated by the eval command of the POSIX shell language, or it can generate a custom report according to special directives in the query. Patterns can be arbitrarily complex, and can be broken down into named pattern functions, which may be mutually recursive. TXR patterns can work horizontally (characters within a line) or vertically (spanning multiple lines). Multiple lines can be treated as a single line.

In addition to embedded variables which implicitly match text, the TXR pattern language supports a number of directives, for matching text using regular expressions, for continuing a match in another file, for searching through a file for the place where an entire sub-query matches, for collecting lists, and for combining sub-queries using logical conjunction, disjunction and negation, and numerous others.

Furthermore, embedded within TXR is a powerful Lisp dialect. TXR Lisp supports functional and imperative programming, and provides data types such as symbols, strings, vectors, hash tables with weak reference support, lazy lists, and arbitrary-precision (bignum) integers.

 

4 ARGUMENTS AND OPTIONS

If TXR is given no arguments, it will enter into an interactive mode. See the INTERACTIVE LISTENER section for a description of this mode. When TXR enters interactive mode this way, it prints a one-line banner announcing the program name and version, and one line of help text instructing the user how to exit.

Options which don't take an argument may be combined together. The -v and -q options are mutually exclusive. Of these two, the one which occurs in the rightmost position in the argument list dominates. The -c and -f options are also mutually exclusive; if both are specified, it is a fatal error.

-Dvar=value
Bind the variable var to the value value prior to processing the query. The name is in scope over the entire query, so that all occurrences of the variable are substituted and match the equivalent text. If the value contains commas, these are interpreted as separators, which give rise to a list value. For instance -Dvar=a,b,c creates a list of the strings "a", "b" and "c". (See Collect Directive below). List variables provide a multiple match. That is to say, if a list variable occurs in a query, a successful match occurs if any of its values matches the text. If more than one value matches the text, the first one is taken.
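
For example (a sketch using hypothetical script and data file names):

  txr -Duser=alice lookup.txr records.txt
  txr -Dhost=www1,www2,www3 scan.txr server.log

The first invocation binds user to the single string "alice"; the second binds host to a list of three strings, any of which can match where @host occurs in the query.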

-Dvar
Binds the variable var to an empty string value prior to processing the query.

-q
Quiet operation during matching. Certain error messages are not reported on the standard error device (but if the situations occur, they still fail the query). This option does not suppress error generation during the parsing of the query, only during its execution.

-i
If this option is present, then TXR will enter into an interactive interpretation mode after processing all options, and the input query if one is present. See the INTERACTIVE LISTENER section for a description of this mode.

-d
--debugger
Invoke the interactive TXR debugger. See the DEBUGGER section.

-n
--noninteractive
This option affects behavior related to TXR's *std-input* stream. Normally, if this stream is connected to a terminal device, it is automatically marked as having the real-time property when TXR starts up (see the functions stream-set-prop and real-time-stream-p). The -n option suppresses this behavior; the *std-input* stream remains ordinary.

The TXR pattern language reads standard input via a lazy list, created by applying the lazy-stream-cons function to the *std-input* stream. If that stream is marked real-time, then the lazy list which is returned by that function has behaviors that are better suited for scanning interactive input. A more detailed explanation is given under the description of this function.

-v
Verbose operation. Detailed logging is enabled.

-b
This is a deprecated option, which is silently ignored. In TXR versions prior to 90, the printing of variable bindings (see -B option) was implicit behavior which was automatically suppressed in certain situations. The -b option suppressed it unconditionally.

-B
If the query is successful, print the variable bindings as a sequence of assignments in shell syntax that can be eval-ed by a POSIX shell. If the query fails, print the word "false". Evaluation of this word by the shell has the effect of producing an unsuccessful termination status from the shell's eval command.

-l or --lisp-bindings
This option implies -B. Print the variable bindings in Lisp syntax instead of shell syntax.

-a num
This option implies -B. The decimal integer argument num specifies the maximum number of array dimensions to use for list-valued variable bindings. The default is 1. Additional dimensions are expressed using numeric suffixes in the generated variable names. For instance, consider the three-dimensional list arising out of a triply nested collect: ((("a" "b") ("c" "d")) (("e" "f") ("g" "h"))). Suppose this is bound to a variable V. With -a 1, this will be reported as:


  V_0_0[0]="a"
  V_0_1[0]="b"
  V_1_0[0]="c"
  V_1_1[0]="d"
  V_0_0[1]="e"
  V_0_1[1]="f"
  V_1_0[1]="g"
  V_1_1[1]="h"

With -a 2, it comes out as:


  V_0[0][0]="a"
  V_1[0][0]="b"
  V_0[0][1]="c"
  V_1[0][1]="d"
  V_0[1][0]="e"
  V_1[1][0]="f"
  V_0[1][1]="g"
  V_1[1][1]="h"

The leftmost bracketed index is the most major index. That is to say, the dimension order is: NAME_m_m+1_..._n[1][2]...[m-1].

-c query
Specifies the query in the form of a command line argument. If this option is used, the script-file argument is omitted. The first non-option argument, if there is one, now specifies the first input source rather than a query. Unlike queries read from a file, (non-empty) queries specified as arguments using -c do not have to properly end in a newline. Internally, TXR adds the missing newline before parsing the query. Thus -c "@a" is a valid query which matches a line.

Example:

Shell script which uses TXR to read two lines "1" and "2" from standard input, binding them to variables a and b. Standard input is specified as - and the data comes from shell "here document" redirection:

code:
 #!/bin/sh


 txr -B -c "@a
 @b" - <<!
 1
 2
 !

output:
 a=1
 b=2

The @; comment syntax can be used for better formatting:


  txr -B -c "@;
  @a
  @b"

-f script-file
Specifies the file from which the query is to be read, instead of the script-file argument. This is useful in #! ("hash bang") scripts. (See Hash Bang Support below).

-e expression
Evaluates a TXR Lisp expression for its side effects, without printing its value. Can be specified more than once. The script-file argument becomes optional if -e is used at least once. If the evaluation of every expression evaluated this way terminates normally, and there is no script-file argument, then TXR terminates with a successful status.

-p expression
Just like -e but prints the value of expression using the prinl function.

-P expression
Like -p but prints using the pprinl function.

-t expression
Like -p but prints using the tprint function.
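
For instance (a small sketch; put-line is the TXR Lisp function which writes a line of text to standard output):

  $ txr -e '(put-line "side effect only")'
  side effect only
  $ txr -p '(+ 2 3)'
  5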

-C number
--compat=number

Requests TXR to behave in a manner that is compatible with the specified version of TXR. This makes a difference in situations when a release of TXR breaks backward compatibility. If some version N+1 deliberately introduces a change which is backward incompatible, then -C N can be used to request the old behavior.

The requested value of N can be too low, in which case TXR will complain and exit with an unsuccessful termination status. This indicates that TXR refuses to be compatible with such an old version. Users requiring the behavior of that version will have to install an older version of TXR which supports that behavior, or even that exact version.

If the option is specified more than once, the behavior is not specified.

Compatibility can also be requested via the TXR_COMPAT environment variable instead of the -C option.

For more information, see the COMPATIBILITY section.

--gc-delta=number

The number argument to this option must be a decimal integer. It represents a megabyte value, the "GC delta": one megabyte is 1048576 bytes. The "GC delta" controls an aspect of the garbage collector behavior. See the gc-set-delta function for a description.

--debug-autoload
This option turns on debugging, like --debugger but also requests stepping into the auto-load processing of TXR Lisp library code. Normally, debugging through the evaluations triggered by auto-loading is suppressed.

--debug-expansion
This option turns on debugging, like --debugger but also requests stepping into the parse-time macro-expansion of TXR Lisp code embedded in TXR queries. Normally, this is suppressed.

--help
Prints usage summary on standard output, and terminates successfully.

--license
Prints the software license. This depends on the software being installed such that the LICENSE file is in the data directory. Use of TXR implies agreement with the liability disclaimer in the license.

--version
Prints program version on standard output, and terminates successfully.

--args
The --args option provides a way to encode multiple arguments as a single argument, which is useful on some systems which have limitations in their implementation of the "hash bang" mechanism. For details about its special syntax, see Hash Bang Support below. It is also very useful in stand-alone application deployment. See the section STAND-ALONE APPLICATION SUPPORT, in which example uses of --args are shown.

--eargs
The --eargs option (extended --args) is like --args but must be followed by an argument. The argument is substituted in place of occurrences of {} in the --eargs syntax.

--lisp
This option influences the treatment of query files which do not have a suffix indicating their type: they are treated as TXR Lisp source. Moreover, if --lisp is specified, and an unsuffixed file does not exist, then TXR will add the ".tl" suffix and try the file again. In the same situation, if --lisp is not present, TXR will first try adding the ".txr" suffix. If that fails, then the ".tl" suffix will be tried. Note that --lisp influences how the argument of the -f option is treated, but only if it precedes that option. It has no effect on the -c option. The argument of -c is always TXR pattern language code. Lisp code can be evaluated using the -e, -p, or -P options.

--reexec
On platforms which support the POSIX exec family of functions, this option causes TXR to re-execute itself. The re-executed image receives the remaining arguments which follow the --reexec argument. Note: this option is useful for supporting setuid operation in "hash bang" scripts. On some platforms, the interpreter designated by a "hash bang" script runs without altered privilege, even if that interpreter is installed setuid. If the interpreter is executed directly, then setuid applies to it, but not if it is executed via "hash bang". If the --reexec option is used in the interpreter command line of such a script, the interpreter will re-execute itself, thereby gaining the setuid privilege. The re-executed image will then obtain the script name from the arguments which are passed to it and determine whether that script will run setuid. See the section SETUID/SETGID OPERATION.

--gc-debug
This option enables a behavior which stresses the garbage collector with frequent garbage collection requests. The purpose is to make it more likely to reproduce certain kinds of bugs. It makes TXR run very slowly.

--vg-debug
If TXR is enabled with Valgrind support, then this option is available. It enables code which uses the Valgrind API to integrate with the Valgrind debugger, for more accurate tracking of garbage collected objects. For example, objects which have been reclaimed by the garbage collector are marked as inaccessible, and marked as uninitialized when they are allocated again.

--dv-regex
If this option is used, then regular expressions are all treated using the derivative-based back-end. The NFA-based regex implementation is disabled. Normally, only regular expressions which require the intersection and complement operators are handled using the derivative back-end. This option makes it possible to test that back-end on test cases that it wouldn't normally receive.

--
Signifies the end of the option list.

-
This argument is not interpreted as an option, but treated as a filename argument. After the first such argument, no more options are recognized. Even if another argument looks like an option, it is treated as a name. This special argument - means "read from standard input" instead of a file. The script-file, or any of the data files, may be specified using this option. If two or more files are specified as -, the behavior is system-dependent. It may be possible to indicate EOF from the interactive terminal, and then specify more input which is interpreted as the second file, and so forth.

After the options, the remaining arguments are files. The first file argument specifies the script file, and is mandatory if the -f option has not been specified, and TXR isn't operating in interactive mode or evaluating expressions from the command line via -e or one of the related options. A file argument consisting of a single - means to read the standard input instead of opening a file.

TXR begins by reading the script. In the case of the TXR pattern language, the entire query is scanned, internalized and then begins executing, if it is free of syntax errors. (TXR Lisp is processed differently, form by form). On the other hand, the pattern language reads data files in a lazy manner. A file isn't opened until the query demands material from that file, and then the contents are read on demand, not all at once.

The suffix of the script-file is significant. If the name has no suffix, or if it has a ".txr" suffix, then it is assumed to be in the TXR pattern language. If it has the ".tl" suffix, then it is assumed to be TXR Lisp. The --lisp option changes the treatment of unsuffixed script file names, causing them to be interpreted as TXR Lisp.

If an unsuffixed script file name is specified, and cannot be opened, then TXR will add the ".txr" suffix and try again. If that fails, it will be tried with the ".tl" suffix, and treated as TXR Lisp. If the --lisp option has been specified, then TXR tries only the ".tl" suffix.
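
As a sketch of the resolution order (using a hypothetical file name report):

  txr report          # tries "report", then "report.txr", then "report.tl"
  txr --lisp report   # tries "report", then "report.tl"; treats them as TXR Lisp
  txr report.tl       # treated as TXR Lisp because of the ".tl" suffix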

A TXR Lisp file is processed as if by the load macro: forms from the file are read and evaluated. If the forms do not terminate the TXR process or throw an exception, and there are no syntax errors, then TXR terminates successfully after evaluating the last form. If syntax errors are encountered in a form, then TXR terminates unsuccessfully. TXR Lisp is documented in the section TXR LISP.

If a query file is specified, but no file arguments, it is up to the query to open a file, pipe or standard input via the @(next) directive prior to attempting to make a match. If a query attempts to match text, but has run out of files to process, the match fails.

 

5 STATUS AND ERROR REPORTING

TXR sends errors and verbose logs to the standard error device. The following paragraphs apply when TXR is run without enabling verbose mode with -v, or the printing of variable bindings with -B or -a.

If the command line arguments are incorrect, TXR issues an error diagnostic and terminates with a failed status.

If the script-file specifies a query, and the query has a malformed syntax, TXR likewise issues error diagnostics and terminates with a failed status.

If the query fails due to a mismatch, TXR terminates with a failed status. No diagnostics are issued.

If the query is well-formed, and matches, then TXR issues no diagnostics, and terminates with a successful status.

In verbose mode (option -v), TXR issues diagnostics on the standard error device even in situations which are not erroneous.

In bindings-printing mode (options -B or -a), TXR prints the word false if the query fails, and exits with a failed termination status. If the query succeeds, the variable bindings, if any, are output on standard output.

If the script-file is TXR Lisp, then it is processed form by form. Each top-level Lisp form is evaluated after it is read. If any form is syntactically malformed, TXR issues diagnostics and terminates unsuccessfully. This is somewhat different from how the pattern language is treated: a script in the pattern language is parsed in its entirety before being executed.

 

6 BASIC TXR SYNTAX

 

6.1 Comments

A query may contain comments which are delimited by the sequence @; and extend to the end of the line. Whitespace can occur between the @ and ;. A comment which begins on a line swallows that entire line, as well as the newline which terminates it. In essence, the entire comment line disappears. If the comment follows some material in a line, then it does not consume the newline. Thus, the following two queries are equivalent:

1.
 @a@; comment: match whole line against variable @a
 @; this comment disappears entirely
 @b

2.
 @a
 @b

The comment after the @a does not consume the newline, but the comment which follows does. Without this intuitive behavior, line comment would give rise to empty lines that must match empty lines in the data, leading to spurious mismatches.

Instead of the ; character, the # character can be used. This is an obsolescent feature.

 

6.2 Hash Bang Support

TXR has several features which support use of the "hash bang" convention for creating apparently stand-alone executable programs.

If the first line of a query begins with the characters #!, that entire line is deleted from the query. This allows for TXR queries to be turned into standalone executable programs in the POSIX environment.

Shell example: create a simple executable program called "hello.txr" and run it. This assumes TXR is installed in /usr/bin.


  $ cat > hello.txr
  #!/usr/bin/txr
  @(bind a "Hey")
  @(output)
  Hello, world!
  @(end)
  $ chmod a+x hello.txr
  $ ./hello.txr
  Hello, world!

When this plain hash bang line is used, TXR receives the name of the script as an argument. Therefore, it is not possible to pass additional options to TXR. For instance, if the above script is invoked like this


  $ ./hello.txr -B

the -B option isn't processed by TXR, but treated as an additional argument, just as if txr scriptname -B had been executed directly.

This behavior is useful if the script author does not want to expose the TXR options to the user of the script.

However, the hash bang line can use the -f option:


  #!/usr/bin/txr -f

Now, the name of the script is passed as an argument to the -f option, and TXR will look for more options after that, so that the resulting program appears to accept TXR options. Now we can run


  $ ./hello.txr -B
  Hello, world!
  a="Hey"

The -B option is honored.

On some operating systems, it is not possible to pass more than one argument through the hash bang mechanism. That is to say, this will not work.


  #!/usr/bin/txr -B -f

To support systems like this, TXR supports the special argument --args, as well as an extended version, --eargs. With --args, it is possible to encode multiple arguments into one argument. The --args option must be followed by a separator character, chosen by the programmer. The characters after that are split into multiple arguments on the separator character. The --args option is then removed from the argument list and replaced with these arguments, which are processed in its place.

Example:


  #!/usr/bin/txr --args:-B:-f

The above has the same behavior as


  #!/usr/bin/txr -B -f

on a system which supports multiple arguments in hash bang. The separator character is the colon, and so the remainder of that argument, -B:-f, is split into the two arguments -B -f.

The --eargs mechanism allows additional flexibility. An --eargs argument must be followed by one more argument. Occurrences of the two-character sequence {} in the encoded argument string are replaced with that following argument. This replacement occurs after the argument splitting.

Example:


  #!/usr/bin/txr --eargs:-B:{}:--foo:42

This has an effect which cannot be replicated in any known implementation of the hash bang mechanism. Suppose that this hash bang line is placed in a script called script.txr. When this script is invoked with arguments, as in:


  script.txr a b c

then TXR is invoked similarly to:


  /usr/bin/txr --eargs:-B:{}:--foo:42 script.txr a b c

Then, when --eargs processing takes place, firstly the argument sequence


  -B {} --foo 42

is produced. Then, all occurrences of {} are replaced with script.txr, resulting in:


  -B script.txr --foo 42

The resulting TXR invocation is


  /usr/bin/txr -B script.txr --foo 42 a b c

Thus, --eargs allows some arguments to be encoded into the interpreter script, such that the script name is inserted anywhere among them, possibly multiple times. Arguments for the interpreter can be encoded, as well as arguments to be processed by the script.

TXR supports setuid hash bang scripting, even on platforms that do not support setuid and setgid attributes on hash bang scripts. On such platforms, TXR has to be installed setuid/setgid. See the section SETUID/SETGID OPERATION. On some platforms, it may also be necessary to use the --reexec option.

 

6.3 Whitespace

Outside of directives, whitespace is significant in TXR queries, and represents a pattern match for whitespace in the input. An extent of text consisting of an undivided mixture of tabs and spaces is a whitespace token.

Whitespace tokens match a precisely identical piece of whitespace in the input, with one exception: a whitespace token consisting of precisely one space has a special meaning. It is equivalent to the regular expression @/[ ]+/: match an extent of one or more spaces (but not tabs!). Multiple consecutive spaces do not have this meaning.

Thus, the query line "a b" (one space between a and b) matches "a b" with any number of spaces between the two letters.

For matching a single space, the syntax @\ can be used (backslash-escaped space).

It is more often necessary to match multiple spaces than to exactly match one space, so this rule simplifies many queries and adds inconvenience to only a few.
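
For instance (a small sketch using a hypothetical variable name), the single space after the colon in the query matches the entire run of spaces in the data:

code:
 name: @name
data:
 name:     Ada
result:
 name="Ada"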

In output clauses, string and character literals and quasiliterals, a space token denotes a space.

 

6.4 Text

Query material which is not escaped by the special character @ is literal text, which matches input character for character. Text which occurs at the beginning of a line matches the beginning of a line. Text which starts in the middle of a line, other than following a variable, must match exactly at the current position, where the previous match left off. Moreover, if the text is the last element in the line, its match is anchored to the end of the line.

An empty query line matches an empty line in the input. Note that an empty input stream does not contain any lines, and therefore is not matched by an empty line. An empty line in the input is represented by a newline character which is either the first character of the file, or follows a previous newline-terminated line.

Input streams which end without terminating their last line with a newline are tolerated, and are treated as if they had the terminator.

Text which follows a variable has special semantics, described in the section Variables below.

A query may not leave a line of input partially matched. If any portion of a line of input is matched, it must be entirely matched, otherwise a matching failure results. However, a query may leave unmatched lines. Matching only four lines of a ten line file is not a matching failure. The eof directive can be used to explicitly match the end of a file.

In the following example, the query matches the text, even though the text has an extra line.

code:
 Four score and seven
 years ago our

data:
 Four score and seven
 years ago our
 forefathers

In the following example, the query fails to match the text, because the text has extra material on one line that is not matched:

code:
 I can carry nearly eighty gigs
 in my head

data:
 I can carry nearly eighty gigs of data
 in my head

Needless to say, if the text has insufficient material relative to the query, that is a failure also.

To match arbitrary material from the current position to the end of a line, the "match any sequence of characters, including empty" regular expression @/.*/ can be used. Example:

code:
 I can carry nearly eighty gigs@/.*/

data:
 I can carry nearly eighty gigs of data

In this example, the query matches, since the regular expression matches the string "of data". (See Regular Expressions section below).

Another way to do this is:

code:
 I can carry nearly eighty gigs@(skip)

 

6.5 Special Characters in Text

Control characters may be embedded directly in a query (with the exception of newline characters). An alternative to embedding is to use escape syntax. The following escapes are supported:

@\ newline
A backslash immediately followed by a newline introduces a physical line break without breaking up the logical line. Material following this sequence continues to be interpreted as a continuation of the previous line, so that indentation can be introduced to show the continuation without appearing in the data.
@\ space
A backslash followed by a space encodes a space. This is useful in line continuations when it is necessary for some or all of the leading spaces to be preserved. For instance the two line sequence


  abcd@\
    @\  efg

is equivalent to the line


  abcd  efg

The two spaces before the @\ in the second line are consumed. The spaces after are preserved.

@\a
Alert character (ASCII 7, BEL).
@\b
Backspace (ASCII 8, BS).
@\t
Horizontal tab (ASCII 9, HT).
@\n
Line feed (ASCII 10, LF). Serves as abstract newline on POSIX systems.
@\v
Vertical tab (ASCII 11, VT).
@\f
Form feed (ASCII 12, FF). This character clears the screen on many kinds of terminals, or ejects a page of text from a line printer.
@\r
Carriage return (ASCII 13, CR).
@\e
Escape (ASCII 27, ESC)
@\x hex-digits
A @\x immediately followed by a sequence of hex digits is interpreted as a hexadecimal numeric character code. For instance @\x41 is the ASCII character A. If a semicolon character immediately follows the hex digits, it is consumed, and characters which follow are not considered part of the hex escape even if they are hex digits.
@\ octal-digits

A @\ immediately followed by a sequence of octal digits (0 through 7) is interpreted as an octal character code. For instance @\010 is character 8, same as @\b. If a semicolon character immediately follows the octal digits, it is consumed, and subsequent characters are not treated as part of the octal escape, even if they are octal digits.

Note that if a newline is embedded into a query line with @\n, this does not split the line into two; it's embedded into the line and thus cannot match anything. However, @\n may be useful in the @(cat) directive and in @(output).
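
To illustrate the semicolon termination rule for hex and octal escapes described above (a small sketch):

  @\x41;BC    the character A followed by the literal text "BC", i.e. "ABC"
  @\x41BC     the single character U+41BC, since all four digits are hex digits
  @\101;23    the character A (octal 101) followed by the literal text "23"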

 

6.6 Character Handling and International Characters

TXR represents text internally using wide characters, which are used to represent Unicode code points. Script source code, as well as all data sources, are assumed to be in the UTF-8 encoding. In TXR and TXR Lisp source, extended characters can be used directly in comments, literal text, string literals, quasiliterals and regular expressions. Extended characters can also be expressed indirectly using hexadecimal or octal escapes. On some platforms, wide characters may be restricted to 16 bits, so that TXR can only work with characters in the BMP (Basic Multilingual Plane) subset of Unicode.

TXR does not use the localization features of the system library; its handling of extended characters is not affected by environment variables like LANG and LC_CTYPE. The program reads and writes only the UTF-8 encoding.

If TXR encounters invalid bytes in the UTF-8 input, what happens depends on the context in which this occurs. In a query, comments are read without regard for encoding, so invalid encoding bytes in comments are not detected. A comment is simply a sequence of bytes terminated by a newline. In lexical elements which represent text, such as string literals, invalid or unexpected encoding bytes are treated as syntax errors. The scanner issues an error message, then discards a byte and resumes scanning. Certain sequences pass through the scanner without triggering an error, namely some UTF-8 overlong sequences. These are caught when the lexeme is subject to UTF-8 decoding, and treated in the same manner as other UTF-8 data, described in the following paragraph.

Invalid bytes in data are treated as follows. When an invalid byte is encountered in the middle of a multibyte character, or if the input ends in the middle of a multibyte character, or if a character is extracted which is encoded as an overlong form, the UTF-8 decoder returns to the starting byte of the ill-formed multibyte character, and extracts just that byte, mapping it to the Unicode character range U+DC00 through U+DCFF. The decoding resumes afresh at the following byte, expecting that byte to be the start of a UTF-8 code.

Furthermore, because TXR internally uses a null-terminated character representation of strings which easily interoperates with C language interfaces, when a null character is read from a stream, TXR converts it to the code U+DC00. On output, this code converts back to a null byte, as explained in the previous paragraph. By means of this representational trick, TXR can handle textual data containing null bytes.

 

6.7 Regular Expression Directives

In place of a piece of text (see section Text above), a regular expression directive may be used, which has the following syntax:


  @/RE/

where the RE part enclosed in slashes represents regular expression syntax (described in the section Regular Expressions below).

Long regular expressions can be broken into multiple lines using a backslash-newline sequence. Whitespace before the sequence or after the sequence is not significant, so the following two are equivalent:


  @/reg \
    ular/


  @/regular/

There may not be whitespace between the backslash and newline.

Whereas literal text simply represents itself, regular expression denotes a (potentially infinite) set of texts. The regular expression directive matches the longest piece of text (possibly empty) which belongs to the set denoted by the regular expression. The match is anchored to the current position; thus if the directive is the first element of a line, the match is anchored to the start of a line. If the regular expression directive is the last element of a line, it is anchored to the end of the line also: the regular expression must match the text from the current position to the end of the line.

Even if the regular expression matches the empty string, the match will fail if the input is empty, or has run out of data. For instance suppose the third line of the query is the regular expression @/.*/, but the input is a file which has only two lines. This will fail: the data has no line for the regular expression to match. A line containing no characters is not the same thing as the absence of a line, even though both abstractions imply an absence of characters.
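
Here is a small hedged example (hypothetical field names). The regular expression consumes the longest run of digits at the current position, after which matching continues with the literal space and the variable:

code:
 ID @/[0-9]+/ @name
data:
 ID 42 widget
result:
 name="widget"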

Like text which follows a variable, a regular expression directive which follows a variable has special semantics, described in the section Variables below.

 

6.8 Variables

Much of the query syntax consists of arbitrary text, which matches file data character for character. Embedded within the query may be variables and directives which are introduced by a @ character. Two consecutive @@ characters encode a literal @.

A variable matching or substitution directive is written in one of several ways:


  @sident
  @{bident}
  @*sident
  @*{bident}
  @{bident /regex/}
  @{bident (fun [arg ... ])}
  @{bident number}

The forms with an * indicate a long match; see Longest Match below. The last three forms, with the embedded regexp /regex/, function call or number, have special semantics; see Positive Match below.

The identifier t cannot be used as a name; it is a reserved symbol which denotes the value true. An attempt to use the variable @t will result in an exception. The symbol nil can be used where a variable name is required syntactically, but it has special semantics, described in a section below.

A sident is a "simple identifier" form which is not delimited by braces.

A sident consists of any combination of one or more letters, numbers, and underscores. It may not look like a number, so that for instance 123 is not a valid sident, but 12A is valid. Case is sensitive, so that FOO is different from foo, which is different from Foo.

The braces around an identifier can be used when material which follows would otherwise be interpreted as being part of the identifier. When a name is enclosed in braces it is a bident.

The following additional characters may be used as part of bident which are not allowed in a sident:


 ! $ % & * + - < = > ? \ _ ~

The rule still holds that a name cannot look like a number so +123 is not a valid bident but these are valid: a->b, *xyz*, foo-bar.

The syntax @FOO_bar introduces the name FOO_bar, whereas @{FOO}_bar means the variable named "FOO" followed by the text "_bar". There may be whitespace between the @ and the name, or opening brace. Whitespace is also allowed in the interior of the braces. It is not significant.

If a variable has no prior binding, then it specifies a match. The match is determined from some current position in the data: the character which immediately follows all that has been matched previously. If a variable occurs at the start of a line, it matches some text at the start of the line. If it occurs at the end of a line, it matches everything from the current position to the end of the line.

 

6.9 Negative Match

If a variable is one of the plain forms


  @sident
  @{bident}
  @*sident
  @*{bident}

then this is a "negative match". The extent of the matched text (the text bound to the variable) is determined by looking at what follows the variable, and ranges from the current position to some position where the following material finds a match. This is why this is called a "negative match": the spanned text which ends up bound to the variable is that in which the match for the trailing material did not occur.

A variable may be followed by a piece of text, a regular expression directive, a function call, a directive, another variable, or nothing (i.e. occurs at the end of a line). These cases are described in detail below.

 

6.9.1 Variable Followed by Nothing

If the variable is followed by nothing, the negative match extends from the current position in the data, to the end of the line. Example:
code:
 a b c @FOO
data:
 a b c defghijk
result:
 FOO="defghijk"

 

6.9.2 Variable Followed by Text

For the purposes of determining the negative match, text is defined as a sequence of literal text and regular expressions, not divided by a directive. So for instance in this example:


  @a:@/foo/bcd e@(maybe)f@(end)

the variable @a is considered to be followed by ":@/foo/bcd e".

If a variable is followed by text, then the extent of the negative match is determined by searching for the first occurrence of that text within the line, starting at the current position.

The variable matches everything between the current position and the matching position (not including the matching position). Any whitespace which follows the variable (and is not enclosed inside braces that surround the variable name) is part of the text. For example:

code:
 a b @FOO e f
data:
 a b c d e f
result:
 FOO="c d"

In the above example, the pattern text "a b " matches the data "a b ". So when the @FOO variable is processed, the data being matched is the remaining "c d e f". The text which follows @FOO is " e f". This is found within the data "c d e f" at position 3 (counting from 0). So positions 0-2 ("c d") constitute the matching text which is bound to FOO.

 

6.9.3 Variable Followed by a Function Call or Directive

If the variable is followed by a function call, or a directive, the extent is determined by scanning the text for the first position where a match occurs for the entire remainder of the line. (For a description of functions, see Functions.)

For example:


  @foo@(bind a "abc")xyz

Here, foo will match the text from the current position to where "xyz" occurs, even though there is a @(bind) directive. Furthermore, if more material is added after the xyz, it is part of the search. Note the difference between the following two:


  @foo@/abc/@(func)
  @foo@(func)@/abc/

In the first example, the variable foo matches the text from the current position until the match for the regular expression abc. @(func) is not considered when processing @foo. In the second example, the variable foo matches the text from the current position until the position which matches the function call, followed by a match for the regular expression. The entire sequence @(func)@/abc/ is considered.

 

6.9.4 Consecutive Variables

If an unbound variable specifies a fixed-width match or a regular expression, then the issue of consecutive variables does not arise. Such a variable consumes text regardless of any context which follows it.

However, what if an unbound variable with no modifier is followed by another variable? The behavior depends on the nature of the other variable.

If the other variable is also unbound, and also has no modifier, this is a semantic error which will cause the query to fail. A diagnostic message will be issued, unless operating in quiet mode via -q. The reason is that there is no way to bind two consecutive variables to an extent of text; this is an ambiguous situation, since there is no matching criterion for dividing the text between two variables. (In theory, a repetition of the same variable, like @FOO@FOO, could find a solution by dividing the match extent in half, which would work only in the case when it contains an even number of characters. This behavior seems to have dubious value).

An unbound variable may be followed by one which is bound. The bound variable is effectively replaced by the text which it denotes, and the logic proceeds accordingly.

It is possible for a variable to be bound to a regular expression. If x is an unbound variable and y is bound to a regular expression RE, then @x@y means @x@/RE/. A variable v can be bound to a regular expression using, for example, @(bind v #/RE/).
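
A small sketch of this (with a hypothetical variable sep bound to a regex):

code:
 @(bind sep #/[,;]/)
 @a@sep@b
data:
 left;right
result:
 a="left", b="right"

Here @a@sep@b behaves as @a@/[,;]/@b: a takes the text up to the first match of the regex, the regex consumes the separator, and b takes the remainder of the line.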

The @* syntax for longest match is available. Example:

code:
 @FOO:@BAR@FOO
data:
 xyz:defxyz
result:
 FOO=xyz, BAR=def

Here, FOO is matched with "xyz", based on the delimiting around the colon. The colon in the pattern then matches the colon in the data, so that BAR is considered for matching against "defxyz". BAR is followed by FOO, which is already bound to "xyz". Thus "xyz" is located in the "defxyz" data following "def", and so BAR is bound to "def".

If an unbound variable is followed by a variable which is bound to a list, or nested list, then each character string in the list is tried in turn to produce a match. The first match is taken.

An unbound variable may be followed by another unbound variable which specifies a regular expression or function call match. This is a special case called a "double variable match". What happens is that the text is searched using the regular expression or function. If the search fails, then neither variable is bound: it is a matching failure. If the search succeeds, then the first variable is bound to the text which is skipped by the search. The second variable is bound to the text matched by the regular expression or function. Examples:

code:
 @foo@{bar /abc/}
data:
 xyz@#abc
result:
 foo="xyz@#", bar="abc"

 

6.9.5 Consecutive Variables Via Directive

Two variables can be de facto consecutive in a manner shown in the following example:


  @var1@(all)@var2@(end)

This is treated just like the variable followed by directive. No semantic error is identified, even if both variables are unbound. Here, @var2 matches everything at the current position, and so @var1 ends up bound to the empty string.

Example 1: b matches at position 0 and a binds the empty string:

code:
 @a@(all)@b@(end)
data:
 abc
result:
 a=""
 b="abc"

Example 2: *a specifies longest match (see Longest Match below), and so it takes everything:

code:
 @*a@(all)@b@(end)
data:
 abc
result:
 a="abc"
 b=""

 

6.9.6 Longest Match

The closest-match behavior for the negative match can be overridden to longest match behavior. A special syntax is provided for this: an asterisk between the @ and the variable, e.g:
code:
 a @*{FOO}cd
data:
 a b cdcdcdcd
result:
 FOO="b cdcdcd"

code:
 a @{FOO}cd
data:
 a b cdcdcd
result:
 FOO="b "

In the former example, the match extends to the rightmost occurrence of "cd", and so FOO receives "b cdcdcd". In the latter example, the * syntax isn't used, and so a leftmost match takes place. The extent covers only the "b ", stopping at the first "cd" occurrence.

 

6.10 Positive Match

There are syntactic variants of variable syntax which have an embedded expression enclosed with the variable in braces:


  @{bident /regex/}
  @{bident (fun [args...])}
  @{bident number}

These specify a variable binding that is driven by a positive match derived from a regular expression, function or character count, rather than from trailing material (which is regarded as a "negative" match, since the variable is bound to material which is skipped in order to match the trailing material). In the /regex/ form, the match extends over all characters from the current position which match the regular expression regex. (see Regular Expressions section below). In the (fun [args ...]) form, the match extends over characters which are matched by the call to the function, if the call succeeds. Thus @{x (y z w)} is just like @(y z w), except that the region of text skipped over by @(y z w) is also bound to the variable x. See Functions below.

In the number form, the match processes a field of text which consists of the specified number of characters, which must be a nonnegative number. If the data line doesn't have that many characters starting at the current position, the match fails. A match for zero characters produces an empty string. The text which is actually bound to the variable is all text within the specified field, but excluding leading and trailing whitespace. If the field contains only spaces, then an empty string is extracted.

This syntax is processed without consideration of what other syntax follows. A positive match may be directly followed by an unbound variable.
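
A small sketch combining the regex and character-count forms (hypothetical variable names):

code:
 @{num /[0-9]+/}-@{day 3}
data:
 20-Mon
result:
 num="20", day="Mon"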

 

6.11 Special Symbols nil and t

Just like in the Common Lisp language, the names nil and t are special.

The nil symbol stands for the empty list object, an object which marks the end of a list, and Boolean false. It is synonymous with the syntax () which may be used interchangeably with nil in most constructs.

In TXR Lisp, nil and t cannot be used as variables. When evaluated, they evaluate to themselves.

In the TXR pattern language, nil can be used in the variable binding syntax, but does not create a binding; it has a special meaning. It allows the variable matching syntax to be used to skip material, in ways similar to the skip directive.
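
A small sketch of using nil to skip material (hypothetical data):

code:
 @nil:@a
data:
 junkprefix:value
result:
 a="value"

The text before the colon is skipped without creating a binding.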

The nil symbol is also used as a block name, both in the TXR pattern language and in TXR Lisp. A block named nil is considered to be anonymous.

 

6.12 Keyword Symbols

Symbols whose names begin with the : character are keyword symbols. These also may not be used as variables; they stand for themselves. Keywords are useful for labeling information and situations.

 

6.13 Regular Expressions

Regular expressions are a language for specifying sets of character strings. Through the use of pattern matching elements, a regular expression is able to denote an infinite set of texts. TXR contains an original implementation of regular expressions, which supports the following syntax:

.
The period is a "wildcard" that matches any character.
[]
Character class: matches a single character, from the set specified by special syntax written between the square brackets. This supports basic regexp character class syntax. POSIX notation like [:digit:] is not supported. The regex tokens \s, \d and \w are permitted in character classes, but not their complementing counterparts. These tokens simply contribute their characters to the class. The class [a-zA-Z] means match an uppercase or lowercase letter; the class [0-9a-f] means match a digit or a lowercase letter; the class [^0-9] means match a non-digit, and so forth. There are no locale-specific behaviors in TXR regular expressions; [A-Z] denotes an ASCII/Unicode range of characters. The class [\d.] means match a digit or the period character. A ] or - can be used within a character class, but must be escaped with a backslash. A ^ in the first position denotes a complemented class, unless it is escaped by backslash. In any other position, it denotes itself. Two backslashes code for one backslash. So for instance [\[\-] means match a [ or - character, [^^] means match any character other than ^, and [\^\\] means match either a ^ or a backslash. Regex operators such as *, + and & appearing in a character class represent ordinary characters. The characters -, ] and ^ occurring outside of a character class are ordinary. Unescaped / characters can appear within a character class. The empty character class [] matches no character at all, and its complement [^] matches any character, and is treated as a synonym for the . (period) wildcard operator.
\s, \w and \d
These regex tokens each match a single character. The \s regex token matches a wide variety of ASCII whitespace characters and Unicode spaces. The \w token matches alphabetic word characters; it is equivalent to the character class [A-Za-z_]. The \d token matches a digit, and is equivalent to [0-9].
\S, \W and \D
These regex tokens are the complemented counterparts of \s, \w and \d. The \S token matches all those characters which \s does not match, \W matches all characters that \w does not match and \D matches nondigits.
empty
An empty expression is a regular expression. It represents the set of strings consisting of the empty string; i.e. it matches just the empty string. The empty regex can appear alone as a full regular expression (for instance the TXR syntax @// with nothing between the slashes) and can also be passed as a subexpression to operators, though this may require the use of parentheses to make the empty regex explicit. For example, the expression a| means: match either a, or nothing. The forms * and (*) are syntax errors; though not useful, the correct way to match the empty expression zero or more times is the syntax ()*.
nomatch
The nomatch regular expression represents the empty set: it matches no strings at all, not even the empty string. There is no dedicated syntax to directly express nomatch in the regex language. However, the empty character class [] is equivalent to nomatch, and may be considered to be a notation for it. Other representations of nomatch are possible: for instance, the regex ~.* which is the complement of the regex that denotes the set of all possible strings, and thus denotes the empty set. A nomatch has uses; for instance, it can be used to temporarily "comment out" regular expressions. The regex ([]abc|xyz) is equivalent to (xyz), since the []abc branch cannot match anything. Using [] to "block" a subexpression allows you to leave it in place, then enable it later by removing the "block".
(R)
If R is a regular expression, then so is (R). The contents of parentheses denote one regular expression unit, so that for instance in (RE)*, the * operator applies to the entire parenthesized group. The syntax () is valid and equivalent to the empty regular expression.
R?
Optionally match the preceding regular expression R.
R*
Match the expression R zero or more times. This operator is sometimes called the "Kleene star", or "Kleene closure". The Kleene closure favors the longest match. Roughly speaking, if there are two or more ways in which R1*R2 can match, then that match occurs in which R1* matches the longest possible text.
R+
Match the preceding expression R one or more times. Like R*, this favors the longest possible match: R+ is equivalent to RR*.
R1%R2
Match R1 zero or more times, then match R2. If this match can occur in more than one way, then it occurs such that R1 is matched the fewest number of times, which is opposite from the behavior of R1*R2. Repetitions of R1 terminate at the earliest point in the text where a non-empty match for R2 occurs. Because it favors shorter matches, % is termed a non-greedy operator. If R2 is the empty expression, or equivalent to it, then R1%R2 reduces to R1*. So for instance (R%) is equivalent to (R*), since the missing right operand is interpreted as the empty regex. Note that whereas the expression (R1*R2) is equivalent to (R1*)R2, the expression (R1%R2) is not equivalent to (R1%)R2.
~R
Match the opposite of the following expression R; that is, match exactly those texts that R does not match. This operator is called complement, or logical not.
R1R2
Two consecutive regular expressions denote catenation: the left expression must match, and then the right.
R1|R2
match either the expression R1 or R2. This operator is known by a number of names: union, logical or, disjunction, branch, or alternative.
R1&R2
Match both the expression R1 and R2 simultaneously; i.e. the matching text must be one of the texts which are in the intersection of the set of texts matched by R1 and the set matched by R2. This operator is called intersection, logical and, or conjunction.

Any character which is not a regular expression operator, a backslash escape, or the slash delimiter, denotes one-position match of that character itself.

Any of the special characters, including the delimiting /, and the backslash, can be escaped with a backslash to suppress its meaning and denote the character itself.

Furthermore, all of the same escapes as are described in the section Special Characters in Text above are supported - the difference is that in regular expressions, the @ character is not required, so for example a tab is coded as \t rather than @\t. Octal and hex character escapes can be optionally terminated by a semicolon, which is useful if the following characters are octal or hex digits not intended to be part of the escape.

Only the above escapes are supported. Unlike in some other regular expression implementations, if a backslash appears before a character which isn't a regex special character or one of the supported escape sequences, it is an error. This wasn't true of historic versions of TXR. See the COMPATIBILITY section.
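
As a brief hedged illustration combining the complement operator with the positive-match variable syntax described earlier (hypothetical variable names):

code:
 @{word /~.*\s.*/} @rest
data:
 alpha beta gamma
result:
 word="alpha", rest="beta gamma"

The regex denotes the set of strings containing no whitespace, so word takes the longest such prefix, "alpha".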

Precedence table, highest to lowest:
  Operators          Class           Associativity
  (R) []             primary
  R? R+ R* R%...     postfix         left-to-right
  R1R2               catenation      left-to-right
  ~R ...%R           unary           right-to-left
  R1&R2              intersection    left-to-right
  R1|R2              union           left-to-right

The % operator is like a postfix operator with respect to its left operand, but like a unary operator with respect to its right operand. Thus a~b%c~d is a(~(b%(c(~d)))) , demonstrating right-to-left associativity, where all of b% may be regarded as a unary operator being applied to c~d. Similarly, a?*+%b means (((a?)*)+)%b, where the trailing %b behaves like a postfix operator.

In TXR, regular expression matches do not span multiple lines. The regex language has no feature for multi-line matching. However, the @(freeform) directive allows the remaining portion of the input to be treated as one string in which line terminators appear as explicit characters. Regular expressions may freely match through this sequence.

It's possible for a regular expression to match an empty string. For instance, if the next input character is z, facing the regular expression /a?/, there is a zero-character match: the regular expression's state machine can reach an acceptance state without consuming any characters. Examples:

code:
 @A@/a?/@/.*/
data:
 zzzzz
result:
 A=""

code:
 @{A /a?/}@B
data:
 zzzzz
result:
 A="", B="zzzzz"

code:
 @*A@/a?/
data:
 zzzzz
result:
 A="zzzzz"

In the first example, variable @A is followed by a regular expression which can match an empty string. The expression faces the letter z at position 0 in the data line. A zero-character match occurs there, therefore the variable A takes on the empty string. The @/.*/ regular expression then consumes the line.

Similarly, in the second example, the /a?/ regular expression faces a z, and thus yields an empty string which is bound to A. Variable @B consumes the entire line.

The third example requests the longest match for the variable binding. Thus, a search takes place for the rightmost position where the regular expression matches. The regular expression matches anywhere, including the empty string after the last character, which is the rightmost place. Thus variable A fetches the entire line.

For additional information about the advanced regular expression operators, see NOTES ON EXOTIC REGULAR EXPRESSIONS below.

 

6.14 Compound Expressions

If the @ escape character is followed by an open parenthesis or square bracket, this is taken to be the start of a TXR Lisp compound expression.

The TXR language has the unusual property that its syntactic elements, so-called directives, are Lisp compound expressions. These expressions not only enclose syntax, but expressions which begin with certain symbols de facto behave as tokens in a phrase structure grammar. For instance, the expression @(collect) begins a block which must be terminated by the expression @(end), otherwise there is a syntax error. The collect expression can contain arguments which modify the behavior of the construct, for instance @(collect :gap 0 :vars (a b)). In some ways, this situation might be compared to the HTML language, in which an element such as <a> must be terminated by </a> and can have attributes such as <a href="...">.

Compound expressions contain subexpressions: other compound expressions, or literal objects of various kinds. Among these are: symbols, numbers, string literals, character literals, quasiliterals and regular expressions. These are described in the following sections. Additional kinds of literal objects exist, which are discussed in the TXR LISP section of the manual.

Some examples of compound expressions are:


  (banana)


  (a b c (d e f))


  (  a (b (c d) (e  ) ))


  ("apple" #\b #\space 3)


  (a #/[a-z]*/ b)


  (_ `@file.txt`)

Symbols occurring in a compound expression follow a slightly more permissive lexical syntax than the bident in the syntax @{bident} introduced earlier. The / (slash) character may be part of an identifier, or even constitute an entire identifier. In fact a symbol inside a directive is a lident. This is described in the Symbol Tokens section under TXR LISP. A symbol must not be a number; tokens that look like numbers are treated as numbers and not symbols.

 

6.15 Character Literals

Character literals are introduced by the #\ syntax, which is either followed by a character name, the letter x followed by hex digits, the letter o followed by octal digits, or a single character. Valid character names are:


  nul                 linefeed            return
  alarm               newline             esc
  backspace           vtab                space
  tab                 page                pnul

For instance #\esc denotes the escape character.

This convention for character literals is similar to that of the Scheme language. Note that #\linefeed and #\newline are the same character. The #\pnul character is specific to TXR and denotes the U+DC00 code in Unicode; the name stands for "pseudo-null", which is related to its special function. For more information about this, see the section "Character Handling and International Characters".
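For illustration, here are a few character literals together with the characters they denote, a brief sketch following the rules above:


  #\a        ;; the lowercase letter a
  #\space    ;; the space character
  #\x41      ;; U+0041 (the letter A), via hex digits
  #\o101     ;; also U+0041, via octal digits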

 

6.16 String Literals

String literals are delimited by double quotes. A double quote within a string literal is encoded using \" and a backslash is encoded as \\. Backslash escapes like \n and \t are recognized, as are hexadecimal escapes like \xFF or \xabc and octal escapes like \123. Ambiguity between an escape and subsequent text can be resolved by using a trailing semicolon delimiter: "\xabc;d" is a string consisting of the character U+0ABC followed by "d". The semicolon delimiter disappears. To write a literal semicolon immediately after a hex or octal escape, write two semicolons, the first of which will be interpreted as a delimiter. Thus, "\x21;;" represents "!;".
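As an illustrative sketch of these escape rules, the commented values below follow from the description just given:


  "a\tb"      ;; the letter a, a tab, the letter b
  "\x41;BC"   ;; "ABC": hex escape for A; the semicolon delimiter vanishes
  "\x41BC"    ;; a one-character string containing U+41BC
  "\101\102"  ;; "AB" via octal escapes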

If the line ends in the middle of a literal, it is an error, unless the last character is a backslash. This backslash is a special escape which does not denote a character; rather, it indicates that the string literal continues on the next line. The backslash is deleted, along with whitespace which immediately precedes it, as well as leading whitespace in the following line. The escape sequence "\ " (backslash space) can be used to encode a significant space.

Example:


  "foo   \
   bar"


  "foo   \
  \ bar"


  "foo\  \
   bar"

The first string literal denotes the string "foobar". The second and third both denote "foo bar".

 

6.17 Word List Literals

A word list literal (WLL) provides a convenient way to write a list of strings when such a list can be given as whitespace-delimited words.

There are two flavors of the WLL: the regular WLL which begins with #" (hash, double-quote) and the splicing list literal which begins with #*" (hash, star, double-quote).

Both types are terminated by a double quote, which may be escaped as \" in order to include it as a character. All the escaping conventions used in string literals can be used in word literals.

Unlike in string literals, whitespace (tabs and spaces) is not significant in word literals: it separates words. Whitespace may be escaped with a backslash in order to include it as a literal character.

Just like in string literals, an unescaped newline character is not allowed. A newline preceded by a backslash is permitted. Such an escaped newline, together with any leading and trailing unescaped whitespace, is removed and replaced with a single space.

Example:


  #"abc def ghi"   --> notates ("abc" "def" "ghi")


  #"abc   def \
      ghi"         --> notates ("abc" "def" "ghi")


  #"abc\ def ghi" --> notates ("abc def" "ghi")


  #"abc\ def\ \
   \ ghi"         --> notates ("abc def " " ghi")

A splicing word literal differs from a word literal in that it does not produce a list of string literals, but rather it produces a sequence of string literals that is merged into the surrounding syntax. Thus, the following two notations are equivalent:


  (1 2 3 #*"abc def" 4 5 #"abc def")


  (1 2 3 "abc" "def" 4 5 ("abc" "def"))

The regular WLL produces a single list object, whereas the splicing WLL expands into multiple string literal objects.

 

6.18 String Quasiliterals

Quasiliterals are similar to string literals, except that they may contain variable references denoted by the usual @ syntax. The quasiliteral represents a string formed by substituting the values of those variables into the literal template. If a is bound to "apple" and b to "banana", the quasiliteral `one @a and two @{b}s` represents the string "one apple and two bananas". A backquote escaped by a backslash represents itself. Unlike in directive syntax, two consecutive @ characters do not code for a literal @, but cause a syntax error. The reason for this is that compounding of the @ syntax is meaningful. Instead, there is a \@ escape for encoding a literal @ character. Quasiliterals support the full output variable syntax. Expressions within variable substitutions follow the evaluation rules of TXR Lisp. This hasn't always been the case: see the COMPATIBILITY section.

Quasiliterals can be split into multiple lines in the same way as ordinary string literals.
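For illustration, assuming a variable a bound to "apple", the following sketch combines a variable reference, a Lisp expression substitution and the \@ escape:


  `@a costs @(+ 2 3) cents, care of user\@example.com`

Under these assumptions, this quasiliteral denotes the string "apple costs 5 cents, care of user@example.com".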

 

6.19 Quasiword List Literals

The quasiword list literals (QLL-s) are to quasiliterals what WLL-s are to ordinary literals. (See the above section Word List Literals.)

A QLL combines the convenience of the WLL with the power of quasistrings.

Just as in the case of WLL-s, there are two flavors of the QLL: the regular QLL which begins with #`  (hash, backquote) and the splicing QLL which begins with #*`  (hash, star, backquote).

Both types are terminated by a backquote, which may be escaped as \`  in order to include it as a character. All the escaping conventions used in quasiliterals can be used in QLL.

Unlike in quasiliterals, whitespace (tabs and spaces) is not significant in QLL: it separates words. Whitespace may be escaped with a backslash in order to include it as a literal character.

A newline is not permitted unless escaped. An escaped newline works exactly the same way as it does in word list literals (WLL-s).

Note that the delimiting into words is done before the variable substitution. If the variable a contains spaces, then #`@a` nevertheless expands into a list of one item: the string derived from a.

Examples:


  #`abc @a ghi`  --> notates (`abc` `@a` `ghi`)


  #`abc   @d@e@f \
  ghi`            --> notates (`abc` `@d@e@f` `ghi`)


  #`@a\ @b @c` --> notates (`@a @b` `@c`)

A splicing QLL differs from an ordinary QLL in that it does not produce a list of quasiliterals, but rather it produces a sequence of quasiliterals that is merged into the surrounding syntax.

 

6.20 Numbers

TXR supports integers and floating-point numbers.

An integer constant is made up of digits 0 through 9, optionally preceded by a + or - sign.

Examples:


  123
  -34
  +0
  -0
  +234483527304983792384729384723234

An integer constant can also be specified in hexadecimal using the prefix #x followed by an optional sign, followed by hexadecimal digits: 0 through 9 and the upper or lower case letters A through F:


  #xFF    ;; 255
  #x-ABC  ;; -2748

Similarly, octal numbers are supported with the prefix #o followed by octal digits:


  #o777   ;; 511

and binary numbers can be written with a #b prefix:


  #b1110  ;; 14

A floating-point constant is marked by the inclusion of a decimal point, the exponential "e notation", or both. It is an optional sign, followed by a mantissa consisting of digits, a decimal point, more digits, and then an optional exponential notation consisting of the letter e or E, an optional + or - sign, and then digits indicating the exponent value. In the mantissa, the digits are not entirely optional: at least one digit must appear, either before or after the decimal point. That is to say, a decimal point by itself is not a floating-point constant.

Examples:


  .123
  123.
  1E-3
  20E40
  .9E1
  9.E19
  -.5
  +3E+3
  1.E5

Examples which are not floating-point constant tokens:


  .      ;; dot token, not a number
  123E   ;; the symbol 123E
  1.0E-  ;; syntax error: invalid floating point constant
  1.0E   ;; syntax error: invalid floating point constant
  1.E    ;; syntax error: invalid floating point literal
  .e     ;; syntax error: dot token followed by symbol

In TXR there is a special "dotdot" token consisting of two consecutive periods. An integer constant followed immediately by dotdot is recognized as such; it is not treated as a floating constant followed by a dot. That is to say, 123.. does not mean 123. . (floating point 123.0 value followed by dot token). It means 123 .. (integer 123 followed by .. token).

Dialect note: unlike in Common Lisp, 123. is not an integer, but the floating-point number 123.0.
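A brief illustrative sketch of how such tokens divide:


  123..456   ;; three tokens: the integer 123, the .. token, the integer 456
  123. 456   ;; two tokens: the floating-point number 123.0, then the integer 456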

 

6.21 Comments

Comments of the form @; were introduced earlier. Inside compound expressions, another convention for comments exists: Lisp comments, which are introduced by the ; (semicolon) character and span to the end of the line.

Example:


  @(foo  ; this is a comment
    bar  ; this is another comment
    )

This is equivalent to @(foo bar).

 

7 DIRECTIVES

 

7.1 Overview

When a TXR Lisp compound expression occurs in TXR preceded by a @, it is a directive.

Directives which are based on certain symbols are, additionally, involved in a phrase-structure syntax which uses Lisp expressions as if they were tokens.

For instance, the directive


  @(collect)

not only denotes a compound expression with the collect symbol in its head position, but it also introduces a syntactic phrase which requires a matching @(end) directive. In other words, @(collect) is not only an expression, but serves as a kind of token in a higher level phrase structure grammar.

Effectively, collect is a reserved symbol in the TXR language. A TXR program cannot use this symbol as the name of a pattern function, due to its role in the syntax. Lisp code, of course, can use the symbol.

Usually if this type of directive occurs alone in a line, not preceded or followed by other material, it is involved in a "vertical" (or line oriented) syntax.

If such a directive is embedded in a line (has preceding or trailing material) then it is in a horizontal syntactic and semantic context (character-oriented).

There is an exception: the definition of a horizontal function looks like this:


  @(define name (arg))body material@(end)

Yet, this is considered one vertical item, which means that it does not match a line of data. (This is necessary because all horizontal syntax matches something within a line of data, which is undesirable for definitions.)

Many directives exhibit both horizontal and vertical syntax, with different but closely related semantics. A few are vertical only, and some are horizontal only.

A summary of the available directives follows:

@(eof)
Explicitly match the end of file. Fails if unmatched data remains in the input stream.

@(eol)
Explicitly match the end of line. Fails if the current position is not the end of a line. Also fails if no data remains (there is no current line).

@(next)
Continue matching in another file or other data source.

@(block)
Groups together a sequence of directives into a logical, named block, which can be explicitly terminated from within using the @(accept) and @(fail) directives. Blocks are described in the section BLOCKS below.

@(skip)
Treat the remaining query as a subquery unit, and search the lines (or characters) of the input file until that subquery matches somewhere. A skip is also an anonymous block.

@(trailer)
Treat the remaining query or subquery as a match for a trailing context. That is to say, if the remainder matches, the data position is not advanced.

@(freeform)
Treat the remainder of the input as one big string, and apply the following query line to that string. The newline characters (or custom separators) appear explicitly in that string.

@(fuzz)
The fuzz directive, inspired by the patch utility, specifies a partial match for some lines.

@(line) and @(chr)
These directives match a variable or expression against the current line number or character position.

@(name)
Match a variable against the name of the current data source.

@(data)
Match a variable against the remaining data (lazy list of strings).

@(some)
Multiple clauses are each applied to the same input. Succeeds if at least one of the clauses matches the input. The bindings established by earlier successful clauses are visible to the later clauses.

@(all)
Multiple clauses are applied to the same input. Succeeds if and only if each one of the clauses matches. The clauses are applied in sequence, and evaluation stops on the first failure. The bindings established by earlier successful clauses are visible to the later clauses.

@(none)
Multiple clauses are applied to the same input. Succeeds if and only if none of them match. The clauses are applied in sequence, and evaluation stops on the first success. No bindings are ever produced by this construct.

@(maybe)
Multiple clauses are applied to the same input. No failure occurs if none of them match. The bindings established by earlier successful clauses are visible to the later clauses.

@(cases)
Multiple clauses are applied to the same input. Evaluation stops on the first successful clause.

@(require)
The require directive is similar to the do directive: it evaluates one or more TXR Lisp expressions. If the result of the rightmost expression is nil, then require triggers a match failure. See the TXR LISP section far below.

@(if), @(elif), and @(else)
The if directive with optional elif and else clauses allows one of multiple bodies of pattern matching directives to be conditionally selected by testing the values of Lisp expressions.

@(choose)
Multiple clauses are applied to the same input. The one whose effect persists is the one which maximizes or minimizes the length of a particular variable.

@(empty)
The @(empty) directive matches the empty string. It is useful in certain situations, such as expressing an empty match in a directive that doesn't accept an empty clause. The @(empty) syntax has another meaning in @(output) clauses, in conjunction with @(repeat).

@(define name (args ...))
Introduces a function. Functions are described in the Functions section below.

@(call expr args*)
Performs function indirection. Evaluates expr, which must produce a symbol that names a pattern function. Then that pattern function is invoked.

@(gather)
Searches text for matches for multiple clauses which may occur in arbitrary order. For convenience, lines of the first clause are treated as separate clauses.

@(collect)
Search the data for multiple matches of a clause. Collect the bindings in the clause into lists, which are output as array variables. The @(collect) directive is line oriented. It works with a multi-line pattern and scans line by line. A similar directive called @(coll) works within one line.

A collect is an anonymous block.

@(and)
Separator of clauses for @(some), @(all), @(none), @(maybe) and @(cases). Equivalent to @(or). The choice is stylistic.

@(or)
Separator of clauses for @(some), @(all), @(none), @(maybe) and @(cases). Equivalent to @(and). The choice is stylistic.

@(end)
Required terminator for @(some), @(all), @(none), @(maybe), @(cases), @(if), @(collect), @(coll), @(output), @(repeat), @(rep), @(try), @(block) and @(define).

@(fail)
Terminate the processing of a block, as if it were a failed match. Blocks are described in the section BLOCKS below.

@(accept)
Terminate the processing of a block, as if it were a successful match. What bindings emerge may depend on the kind of block: collect has special semantics. Blocks are described in the section BLOCKS below.

@(try)
Indicates the start of a try block, which is related to exception handling, described in the EXCEPTIONS section below.

@(catch) and @(finally)
Special clauses within @(try). See EXCEPTIONS below.

@(defex) and @(throw)
Define custom exception types; throw an exception. See EXCEPTIONS below.

@(assert)
The assert directive requires the following material to match, otherwise it throws an exception. It is useful for catching mistakes or omissions in parts of a query that are sure-fire matches.

@(flatten)
Normalizes a set of specified variables to one-dimensional lists. Those variables which have a scalar value are reduced to lists of that value. Those which are lists of lists (to an arbitrary level of nesting) are converted to flat lists of their leaf values.

@(merge)
Binds a new variable which is the result of merging two or more other variables. Merging has somewhat complicated semantics.

@(cat)
Decimates a list (any number of dimensions) to a string, by catenating its constituent strings, with an optional separator string between all of the values.

@(bind)
Binds one or more variables against a value using a structural pattern match. A limited form of unification takes place which can cause a match to fail.

@(set)
Destructively assigns one or more existing variables using a structural pattern, using syntax similar to bind. Assignment to unbound variables triggers an error.

@(rebind)
Evaluates an expression in the current binding environment, and then creates new bindings for the variables in the structural pattern. Useful for temporarily overriding variable values in a scope.

@(forget)
Removes variable bindings.

@(local)
Synonym of @(forget).

@(output)
A directive which encloses an output clause in the query. An output section does not match text, but produces text. The directives above are not understood in an output clause.

@(repeat)
A directive understood within an @(output) section, for repeating multi-line text, with successive substitutions pulled from lists. The directive @(rep) produces iteration over lists horizontally within one line. These directives have a different meaning in matching clauses, providing a shorthand notation for @(repeat :vars nil) and @(rep :vars nil), respectively.

@(deffilter)
This directive is used for defining named filters, which are useful for filtering variable substitutions in output blocks. Filters are useful when data must be translated between different representations that have different special characters or other syntax, requiring escaping or similar treatment. Note that it is also possible to use a function as a filter. See Function Filters below.

@(filter)
The filter directive passes one or more variables through a given filter or chain of filters, updating them with the filtered values.

@(load) and @(include)
These directives allow TXR programs to be modularized. They bring in code from a file, in two different ways.

@(do)
The do directive is used to evaluate TXR Lisp expressions, discarding their result values. See the TXR LISP section far below.

 

7.2 Subexpression Evaluation

Some directives contain subexpressions which are evaluated. Two distinct styles of evaluations occur in TXR: bind expressions and Lisp expressions. Which semantics applies to an expression depends on the syntactic context in which it occurs: which position in which directive.

The evaluation of TXR Lisp expressions is described in the TXR LISP section of the manual.

Bind expressions are so named because they occur in the @(bind) directive. TXR pattern function invocations also treat argument expressions as bind expressions.

The @(rebind), @(set), @(merge), and @(deffilter) directives also use bind expression evaluation. Bind expression evaluation also occurs in the argument position of the :tlist keyword in the @(next) directive.

Unlike Lisp expressions, bind expressions do not support operators. If a bind expression is a nested list structure, it is a template denoting that structure. Any symbol in any position of that structure is interpreted as a variable. When the bind expression is evaluated, those corresponding positions in the template are replaced by the values of the variables.

Anywhere where a variable can appear in a bind expression's nested list structure, a Lisp expression can appear preceded by the @ character. That Lisp expression is evaluated and its value is substituted into the bind expression's template.

Moreover, a Lisp expression preceded by @ can be used as an entire bind expression. The value of that Lisp expression is then taken as the bind expression value.

Any object in a bind expression which is not a nested list structure containing Lisp expressions or variables denotes itself literally.

Examples:

In the following examples, the variables a and b are assumed to have the string values "foo" and "bar", respectively.

The -> notation indicates the value of each expression.


  a              ->  "foo"
  (a b)          ->  ("foo" "bar")
  ((a) ((b) b))  ->  (("foo") (("bar") "bar"))
  (list a b)     ->  error: unbound variable list
  @(list a b)    ->  ("foo" "bar") ;; Lisp expression
  (a @[b 1..:])  ->  ("foo" "ar")  ;; Lisp eval of [b 1..:]
  (a @(+ 2 2))   ->  ("foo" 4)     ;; Lisp eval of (+ 2 2)
  #(a b)         ->  (a b)         ;; Vector literal, not list.
  [a b]          ->  error: unbound variable dwim

The last example above [a b] is a notation equivalent to (dwim a b) and so follows similarly to the example involving list.

 

7.3 Input Scanning and Data Manipulation

 

7.3.1 The next directive

The next directive indicates that the remaining directives in the current block are to be applied against a new input source.

It can only occur by itself as the only element in a query line, and takes various arguments, according to these possibilities:


  @(next)
  @(next source)
  @(next source :nothrow)
  @(next :args)
  @(next :env)
  @(next :list lisp-expr)
  @(next :tlist bind-expr)
  @(next :string lisp-expr)
  @(next nil)

The lone @(next) without arguments specifies that subsequent directives will match inside the next file in the argument list which was passed to TXR on the command line.

If source is given, it must be a TXR Lisp expression which denotes an input source. Its value may be a string or an input stream. For instance, if variable A contains the text "data", then @(next A) means switch to the file called "data", and @(next `@A.txt`) means to switch to the file "data.txt". The directive @(next (open-command `git log`)) switches to the input stream connected to the output of the git log command.

If the input source cannot be opened for whatever reason, TXR throws an exception (see EXCEPTIONS below). An unhandled exception will terminate the program. Often, such a drastic measure is inconvenient; if @(next) is invoked with the :nothrow keyword, then if the input source cannot be opened, the situation is treated as a simple match failure.

The variant @(next :args) means that the remaining command line arguments are to be treated as a data source. For this purpose, each argument is considered to be a line of text. The argument list does include that argument which specifies the file that is currently being processed or was most recently processed. As the arguments are matched, they are consumed. This means that if a @(next) directive without arguments is executed in the scope of @(next :args), it opens the file named by the first unconsumed argument.

To process arguments, and then continue with the original file and argument list, wrap the argument processing in a @(block). When the block terminates, the input source and argument list are restored to what they were before the block.
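A sketch of this idiom follows; the --verbose flag and the variable name are hypothetical stand-ins. The block scans the arguments for the flag, and after @(end) matching resumes against the original input source:


  @(block)
  @  (next :args)
  @  (skip)
  --verbose
  @(end)
  @line_from_original_input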

The variant @(next :env) means that the list of process environment variables is treated as a source of data. It looks like a text file stream consisting of lines of the form "name=value". If this feature is not available on a given platform, an exception is thrown.

The syntax @(next :list lisp-expr) treats TXR Lisp expression lisp-expr as a source of text. The value of lisp-expr is flattened to a simple list in a way similar to the @(flatten) directive. The resulting list is treated as if it were the lines of a text file: each element of the list must be a string, which represents a line. If the strings happen to contain embedded newline characters, they are a visible constituent of the line, and do not act as line separators.

The syntax @(next :tlist bind-expr) is very similar to @(next :list ...) except that bind-expr is not a TXR Lisp expression, but a TXR bind expression.

The syntax @(next :string lisp-expr) treats expression lisp-expr as a source of text. The value of the expression must be a string. Newlines in the string are interpreted as line terminators.

A string which is not terminated by a newline is tolerated, so that:


  @(next :string "abc")
  @a

binds a to "abc". Likewise, this is also the case with input files and other streams whose last line is not terminated by a newline.

However, watch out for empty strings, which are analogous to a correctly formed empty file which contains no lines:


  @(next :string "")
  @a

This will not bind a to ""; it is a matching failure. The behavior of :list is different. The query


  @(next :list "")
  @a

binds a to "". The reason is that under :list the string "" is flattened to the list ("") which is not an empty input stream, but a stream consisting of one empty line.

The @(next nil) variant indicates that the following subquery is applied to empty data, and the list of data sources from the command line is considered empty. This directive is useful in front of TXR code which doesn't process data sources from the command line, but takes command line arguments. The @(next nil) incantation absolutely prevents TXR from trying to open the first command line argument as a data source.

Note that the @(next) directive only redirects the source of input over the scope of the subquery in which it appears, not necessarily all remaining directives. For example, the following query looks for the line starting with "xyz" at the top of the file "foo.txt", within a some directive. After the @(end) which terminates the @(some), the "abc" is matched in the previous input stream which was in effect before the @(next) directive:


  @(some)
  @(next "foo.txt")
  xyz@suffix
  @(end)
  abc

However, if the @(some) subquery successfully matched "xyz@suffix" within the file foo.txt, there is now a binding for the suffix variable, which is visible to the remainder of the entire query. The variable bindings survive beyond the clause, but the data stream does not.

 

7.3.2 The skip directive

The skip directive considers the remainder of the query as a search pattern. The remainder is no longer required to strictly match at the current line in the current input stream. Rather, the current stream is searched, starting with the current line, for the first line where the entire remainder of the query will successfully match. If no such line is found, the skip directive fails. If a matching position is found, the remainder of the query is processed from that point.

Of course, the remainder of the query can itself contain skip directives. Each such directive performs a recursive subsearch.

Skip comes in vertical and horizontal flavors. For instance, skip and match the last line:


  @(skip)
  @last
  @(eof)

Skip and match the last character of the line:


  @(skip)@{last 1}@(eol)

The skip directive has two optional arguments, which are evaluated as TXR Lisp expressions. If the first argument evaluates to an integer, its value limits the range of lines scanned for a match. Judicious use of this feature can improve the performance of queries.

Example: scan until "size: @SIZE" matches, which must happen within the next 15 lines:


  @(skip 15)
  size: @SIZE

Without the range limitation skip will keep searching until it consumes the entire input source. In a horizontal skip, the range-limiting numeric argument is expressed in characters, so that


  abc@(skip 5)def

means: there must be a match for "abc" at the start of the line, and then within the next five characters, there must be a match for "def".

Sometimes a skip is nested within a collect, or following another skip. For instance, consider:


  @(collect)
  begin @BEG_SYMBOL
  @(skip)
  end @BEG_SYMBOL
  @(end)

The above collect iterates over the entire input. But, potentially, so does the embedded skip. Suppose that "begin x" is matched, but the data has no matching "end x". The skip will search in vain all the way to the end of the data, and then the collect will try another iteration back at the beginning, just one line down from the original starting point. If it is a reasonable expectation that an "end x" occurs within 15 lines of a "begin x", this can be specified instead:


  @(collect)
  begin @BEG_SYMBOL
  @(skip 15)
  end @BEG_SYMBOL
  @(end)

If the symbol nil is used in place of a number, it means to scan an unlimited range of lines; thus, @(skip nil) is equivalent to @(skip).

If the symbol :greedy is used, it changes the semantics of the skip to longest match semantics. For instance, match the last three space-separated tokens of the line:


  @(skip :greedy) @a @b @c

Without :greedy, the variable @c can match multiple tokens, and end up with spaces in it, because nothing follows @c, and so it matches from any position which follows a space to the end of the line. Also note the space in front of @a. Without this space, @a will get an empty string.

A line-oriented example of greedy skip: match the last line without using @(eof):


  @(skip :greedy)
  @last_line

There may be a second numeric argument. This specifies a minimum number of lines to skip before looking for a match. For instance, skip 15 lines and then search indefinitely for begin ...:


  @(skip nil 15)
  begin @BEG_SYMBOL

The two arguments may be used together. For instance, the following matches if, and only if, the 15th line of input starts with begin :


  @(skip 1 15)
  begin @BEG_SYMBOL

Essentially, @(skip 1 n) means "hard skip by n lines". @(skip 1 0) is the same as @(skip 1), which is a noop, because it means: "the remainder of the query must match starting on the very next line", or, more briefly, "skip exactly zero lines", which is the behavior if the skip directive is omitted altogether.

Here is one trick for grabbing the fourth line from the bottom of the input:


  @(skip)
  @fourth_from_bottom
  @(skip 1 3)
  @(eof)

Or using greedy skip:


  @(skip :greedy)
  @fourth_from_bottom
  @(skip 1 3)

Nongreedy skip with the @(eof) has a slight advantage because the greedy skip will keep scanning even though it has found the correct match, then backtrack to the last good match once it runs out of data. The regular skip with explicit @(eof) will stop when the @(eof) matches.

 

7.3.3 Reducing Backtracking with Blocks

skip can consume considerable CPU time when multiple skips are nested. Consider:


  @(skip)
  A
  @(skip)
  B
  @(skip)
  C

This is actually nesting: the second and third skips occur within the body of the first one, and thus this creates nested iteration. TXR is searching for the combination of skips which finds a match for the pattern of lines A, B and C, with backtracking behavior. The outermost skip marches through the data until it finds A, followed by a pattern match for the second skip. The second skip iterates within to find B, followed by the third skip, and the third skip iterates to find C. If there is only one line A, and one B, then this is reasonably fast. But suppose there are many lines matching A and B, giving rise to a large number of combinations of skips which match A and B, and yet do not find a match for C, triggering backtracking. The nested stepping which tries the combinations of A and B can give rise to a considerable running time.

One way to deal with the problem is to unravel the nesting with the help of blocks. For example:


  @(block)
  @  (skip)
  A
  @(end)
  @(block)
  @  (skip)
  B
  @(end)
  @(skip)
  C

Now the scope of each skip is just the remainder of the block in which it occurs. The first skip finds A, and then the block ends. Control passes to the next block, and backtracking will not take place to a block which completed (unless all these blocks are enclosed in some larger construct which backtracks, causing the blocks to be re-executed).

Of course, this rewrite is not equivalent, and cannot be used for instance in backreferencing situations such as:


  @;
  @; Find three lines anywhere in the input which are identical.
  @;
  @(skip)
  @line
  @(skip)
  @line
  @(skip)
  @line

This example depends on the nested search-within-search semantics.

 

7.3.4 The trailer directive

The trailer directive introduces a trailing portion of a query or subquery which matches input material normally, but in the event of a successful match, does not advance the current position. This can be used, for instance, to cause @(collect) to match partially overlapping regions.

Example:


  @(collect)
  @line
  @(trailer)
  @(skip)
  @line
  @(end)

This script collects each line which has a duplicate somewhere later in the input. Without the @(trailer) directive, this does not work properly for inputs like:


  111
  222
  111
  222

Without @(trailer), the first duplicate pair constitutes a match which spans over the 222. After that pair is found, the matching continues after the second 111.

With the @(trailer) directive in place, the collect body, on each iteration, only consumes the lines matched prior to @(trailer).

 

7.3.5 The freeform directive

The freeform directive provides a useful alternative to TXR's line-oriented matching discipline. The freeform directive treats all remaining input from the current input source as one big line. The query line which immediately follows freeform is applied to that line.

The syntax variations are:


  @(freeform)
  ... query line ...


  @(freeform number)
  ... query line ...


  @(freeform string)
  ... query line ...


  @(freeform number string)
  ... query line ...

where number and string denote TXR Lisp expressions which evaluate to an integer or string value, respectively.

If number and string are both present, they may be given in either order.

If the number argument is given, its value limits the range of lines which are combined together. For instance @(freeform 5) means to only consider the next five lines to be one big line. Without this argument, freeform is "bottomless". It can match the entire file, which creates the risk of allocating a large amount of memory.

If the string argument is given, it specifies a custom line terminator. The default terminator is "\n". The terminator does not have to be one character long.

Freeform does not convert the entire remainder of the input into one big line all at once, but does so in a dynamic, lazy fashion, which takes place as the data is accessed. So at any time, only some prefix of the data exists as a flat line in which newlines are replaced by the terminator string, and the remainder of the data still remains as a list of lines.

After the subquery is applied to the virtual line, the unmatched remainder of that line is broken up into multiple lines again, by looking for and removing all occurrences of the terminator string within the flattened portion.

Care must be taken if the terminator is other than the default "\n". All occurrences of the terminator string are treated as line terminators in the flattened portion of the data, so extra line breaks may be introduced. Likewise, in the yet unflattened portion, no breaking takes place, even if the text contains occurrences of the terminator string. The extent of data which is flattened, and the amount of it which remains, depends entirely on the query line underneath @(freeform).

In the following example, lines of data are flattened using $ as the line terminator.

code:
 @(freeform "$")
 @a$@b:
 @c
 @d

data:
 1
 2:3
 4

output (-B):
 a="1"
 b="2"
 c="3"
 d="4"

The data is turned into the virtual line 1$2:3$4$. The @a$@b: subquery matches the 1$2: portion, binding a to "1", and b to "2". The remaining portion 3$4$ is then split into separate lines again according to the line terminator $:


  3
  4

Thus the remainder of the query


  @c
  @d

faces these lines, binding c to 3 and d to 4. Note that since the data does not contain dollar signs, there is no ambiguity; the meaning may be understood in terms of the entire data being flattened and split again.

In the following example, freeform is used to solve a tokenizing problem. The Unix password file has fields separated by colons. Some fields may be empty. Using freeform, we can join the password file using ":" as a terminator. By restricting freeform to one line, we can obtain each line of the password file with a terminating ":", allowing for a simple tokenization, because now the fields are colon-terminated rather than colon-separated.

Example:


  @(next "/etc/passwd")
  @(collect)
  @(freeform 1 ":")
  @(coll)@{token /[^:]*/}:@(end)
  @(end)

 

7.3.6 The fuzz directive

The fuzz directive allows for an imperfect match spanning a set number of lines. It takes two arguments, both of which are TXR Lisp expressions that should evaluate to integers:

@(fuzz m n)
  ...

This expresses that over the next n query lines, the matching strictness is relaxed a little bit. Only m out of those n lines have to match. Afterward, the rest of the query follows normal, strict processing.

In the degenerate situation that there are fewer than n query lines following the fuzz directive, then m of them must succeed nevertheless. (If there are fewer than m, then this is impossible.)
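For illustration, a sketch with hypothetical data: the following requires only two of the next three query lines to match:


  @(fuzz 2 3)
  alpha
  beta
  gamma

Against an input whose next three lines are alpha, xyz and gamma, this succeeds, since two of the three query lines match; if only one of them matched, the directive would fail.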

 

7.3.7 The line and chr directives

The line and chr directives match the current input line number, or character position within a line, against an expression or variable:


  @(line 42)
  @(line x)
  abc@(chr 3)def@(chr y)

The directive @(line 42) means "match the current input line number against the integer 42". If the current line is 42, then the directive matches, otherwise it fails. line is a vertical directive which doesn't consume a line of input. Thus, the following matches at the beginning of an input stream, and x ends up bound to the first line of input:


  @(line 1)
  @(line 1)
  @(line 1)
  @x

The directive @(line x) binds variable x to the current input line number, if x is an unbound variable. If x is already bound, then the value of x must match the current line number, otherwise the directive fails.

The chr directive is similar to line except that it's a horizontal directive, and matches the character position rather than the line position. Character positions are measured from zero, rather than one. chr does not consume a character. Hence the two occurrences of chr in the following example both match, and x takes the entire line of input:


  @(chr 0)@(chr 0)@x

The argument of line or chr may be a @-delimited Lisp expression. This is useful for matching computed lines or character positions:


  @(line @(+ a (* b c)))

 

7.3.8 The name directive

The name directive performs a binding between the name of the current data source and a variable or bind expression:


  @(name na)
  @(name "data.txt")

If na is an unbound variable, it is bound and takes on the name of the data source, such as a file name. If na is bound, then it has to match the name of the data source, otherwise the directive fails.

The directive @(name "data.txt") fails unless the current data source has that name.
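A short sketch, in which the file name report.txt is hypothetical:


  @(next "report.txt")
  @(name fname)

Here fname, if unbound, is bound to "report.txt"; if it is already bound, it must match that name.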

 

7.3.9 The data directive

The data directive performs a binding between the unmatched data at the current position and a variable or bind expression. The unmatched data takes the form of a list of strings:


  @(data d)

The binding is performed on object equality. If d is already bound, a matching failure occurs unless d contains the current unmatched data.

Matching the current data has various uses.

For instance, two branches of pattern matching can, at some point, bind the current data into different variables. When those paths join, the variables can be bound together to create the assertion that the current data had been the same at those points:


  @(all)
  @  (skip)
  foo
  @  (skip)
  bar
  @  (data x)
  @(or)
  @  (skip)
  xyzzy
  @  (skip)
  bar
  @  (data y)
  @(end)
  @(require (eq x y))

Here, two branches of the @(all) match some material which ends in the line bar. However, it is possible that this is a different line. The data directives are used to create an assertion that the data regions matched by the two branches are identical. That is to say, the unmatched data x captured after the first bar and the unmatched data y captured after the second bar must be the same object in order for @(require (eq x y)) to succeed, which implies that the same bar was matched in both branches of the @(all).

Another use of data is simply to gain access to the trailing remainder of the unmatched input in order to print it, or do some special processing on it.

The tprint Lisp function is useful for printing the unmatched data as newline-terminated lines:


  @(data remainder)
  @(do (tprint remainder))

 

7.3.10 The some, all, none, maybe, cases and choose directives

These directives, called the parallel directives, combine multiple subqueries, which are applied at the same input position, rather than to consecutive input.

They come in vertical (line mode) and horizontal (character mode) flavors.

In horizontal mode, the current position is understood to be a character position in the line being processed. The clauses advance this character position by moving it to the right. In vertical mode, the current position is understood to be a line of text within the stream. A clause advances the position by some whole number of lines.

The syntax of these parallel directives follows this example:


  @(some)
  subquery1
  .
  .
  .
  @(and)
  subquery2
  .
  .
  .
  @(and)
  subquery3
  .
  .
  .
  @(end)

And in horizontal mode:


  @(some)subquery1...@(and)subquery2...@(and)subquery3...@(end)

Long horizontal lines can be broken up with line continuations, allowing the above example to be written like this, which is considered a single logical line:


  @(some)@\
     subquery1...@\
  @(and)@\
     subquery2...@\
  @(and)@\
     subquery3...@\
  @(end)

The @(some), @(all), @(none), @(maybe), @(cases) or @(choose) must be followed by at least one subquery clause, and be terminated by @(end). If there are two or more subqueries, these additional clauses are indicated by @(and) or @(or), which are interchangeable. The separator and terminator directives also must appear as the only element in a query line.

The choose directive requires keyword arguments. See below.

The syntax supports arbitrary nesting. For example:


  QUERY:            SYNTAX TREE:


  @(all)            all -+
  @  (skip)              +- skip -+
  @  (some)              |        +- some -+
  it                     |        |        +- TEXT
  @  (and)               |        |        +- and
  @    (none)            |        |        +- none -+
  was                    |        |        |        +- TEXT
  @    (end)             |        |        |        +- end
  @  (end)               |        |        +- end
  a dark                 |        +- TEXT
  @(end)                 *- end

Nesting can be indicated using whitespace between @ and the directive expression. Thus, the above is an @(all) query containing a @(skip) clause which applies to a @(some) that is followed by the text line "a dark". The @(some) clause combines the text line "it", and a @(none) clause which contains just one clause consisting of the line "was".

The semantics of the parallel directives is:

@(all)
Each of the clauses is matched at the current position. If any of the clauses fails to match, the directive fails (and thus does not produce any variable bindings). Clauses following the failed directive are not evaluated. Bindings extracted by a successful clause are visible to the clauses which follow, and if the directive succeeds, all of the combined bindings emerge.

@(some [ :resolve (var ...) ])
Each of the clauses is matched at the current position. If any of the clauses succeed, the directive succeeds, retaining the bindings accumulated by the successfully matching clauses. Evaluation does not stop on the first successful clause. Bindings extracted by a successful clause are visible to the clauses which follow.

The :resolve parameter is for situations when the @(some) directive has multiple clauses that need to bind some common variables to different values: for instance, output parameters in functions. Resolve takes a list of variable name symbols as an argument. This is called the resolve set. If the clauses of @(some) bind variables in the resolve set, those bindings are not visible to later clauses. However, those bindings do emerge out of the @(some) directive as a whole. This creates a conflict: what if two or more clauses introduce different bindings for a variable in the resolve set? This is why it is called the resolve set: conflicts for variables in the resolve set are automatically resolved in favor of later directives.

Example:


  @(some :resolve (x))
  @  (bind a "a")
  @  (bind x "x1")
  @(or)
  @  (bind b "b")
  @  (bind x "x2")
  @(end)

Here, the two clauses both introduce a binding for x. Without the :resolve parameter, this would mean that the second clause fails, because x comes in with the value "x1", which does not bind with "x2". But because x is placed into the resolve set, the second clause does not see the "x1" binding. Both clauses establish their bindings independently creating a conflict over x. The conflict is resolved in favor of the second clause, and so the bindings which emerge from the directive are:


  a="a"
  b="b"
  x="x2"

@(none)
Each of the clauses is matched at the current position. The directive succeeds only if all of the clauses fail. If any clause succeeds, the directive fails, and subsequent clauses are not evaluated. Thus, this directive never produces variable bindings, only matching success or failure.

@(maybe)
Each of the clauses is matched at the current position. The directive always succeeds, even if all of the clauses fail. Whatever bindings are found in any of the clauses are retained. Bindings extracted by any successful clause are visible to the clauses which follow.

@(cases)
The clauses are matched, in order, at the current position. If any clause matches, the matching stops and the bindings collected from that clause are retained. Any remaining clauses after that one are not processed. If no clause matches, the directive fails, and produces no bindings.

@(choose [ :longest var | :shortest var ])
Each of the clauses is matched at the current position in order. In this construct, bindings established by an earlier clause are not visible to later clauses. Although any or all of the clauses can potentially match, the clause which succeeds is the one which maximizes or minimizes the length of the text bound to the specified variable. The other clauses have no effect.

For all of the parallel directives other than @(none) and @(choose), the query advances the input position by the greatest number of lines that match in any of the successfully matching subclauses that are evaluated. The @(none) directive does not advance the input position.

For instance if there are two subclauses, and one of them matches three lines, but the other one matches five lines, then the overall clause is considered to have made a five line match at its position. If more directives follow, they begin matching five lines down from that position.
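For illustration, here is a sketch of the @(choose) directive over hypothetical data; the clause whose bindings survive is the one which binds the longest text to x:

code:
 @(choose :longest x)
 @x:@y
 @(or)
 @x!@y
 @(end)
data:
 abc:def!ghi
result:
 x="abc:def"
 y="ghi"

The first clause binds x to "abc" and the second binds x to "abc:def"; since the second binding is longer, that clause's bindings are the ones retained.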

 

7.3.11 The require directive

The syntax of @(require) is:


  @(require lisp-expression)

The require directive evaluates a TXR Lisp expression. (See TXR LISP far below.) If the expression yields a true value, then it succeeds, and matching continues with the directives which follow. Otherwise the directive fails.

In the context of the require directive, the expression should not be introduced by the @ symbol; it is expected to be a Lisp expression.

Example:


  @; require that 4 is greater than 3
  @; This succeeds; therefore, @a is processed
  @(require (> (+ 2 2) 3))
  @a

 

7.3.12 The if directive

The if directive allows for conditional selection of pattern matching clauses, based on the Boolean results of Lisp expressions.

The syntax of the if directive can be exemplified as follows:


  @(if lisp-expr)
  .
  .
  .
  @(elif lisp-expr)
  .
  .
  .
  @(elif lisp-expr)
  .
  .
  .
  @(else)
  .
  .
  .
  @(end)

The @(elif) and @(else) clauses are all optional. If @(else) is present, it must be last, before @(end), after any @(elif) clauses. Any of the clauses may be empty.

Example:


  @(if (> (length str) 42))
  foo: @a @b
  @(else)
  {@c}
  @(end)

In this example, if the length of the variable str is greater than 42, then matching continues with "foo: @a @b", otherwise it proceeds with {@c}.

More precisely, how the if directive works is as follows. The Lisp expressions are evaluated in order, starting with the if expression, then the elif expressions if any are present. If any Lisp expression yields a true result (any value other than nil) then evaluation of Lisp expressions stops. The corresponding clause of that Lisp expression is selected and pattern matching continues with that clause. The result of that clause (its success or failure, and any newly bound variables) is then taken as the result of the if directive. If none of the Lisp expressions yield true, and an else clause is present, then that clause is processed and its result determines the result of the if directive. If none of the Lisp expressions yield true, and there is no else clause, then the if directive is deemed to have trivially succeeded, allowing matching to continue with whatever directive follows it.

 

7.3.13 The gather directive

Sometimes text is structured as items that can appear in an arbitrary order. When multiple matches need to be extracted, there is a combinatorial explosion of possible orders, making it impractical to write pattern matches for all the possible orders.

The gather directive is for these situations. It specifies multiple clauses which all have to match somewhere in the data, but in any order.

For further convenience, the lines of the first clause of the gather directive are implicitly treated as separate clauses.

The syntax follows this pattern:


  @(gather)
  one-line-query1
  one-line-query2
  .
  .
  .
  one-line-queryN
  @(and)
  multi
  line
  query1
  .
  .
  .
  @(and)
  multi
  line
  query2
  .
  .
  .
  @(end)

Of course the multi-line clauses are optional. The gather directive takes keyword parameters, see below.

 

7.3.14 The until / last clause in gather

Similarly to collect, gather has an optional until/last clause:


  @(gather)
  ...
  @(until)
  ...
  @(end)

How gather works is that the text is searched for matches for the single line and multi-line queries. The clauses are applied in the order in which they appear. Whenever one of the clauses matches, any bindings it produces are retained and it is removed from further consideration. Multiple clauses can match at the same text position. The position advances by the longest match from among the clauses which matched. If no clauses match, the position advances by one line. The search stops when all clauses are eliminated, and then the cumulative bindings are produced. If the data runs out, but unmatched clauses remain, the directive fails.

Example: extract several environment variables, which do not appear in a particular order:


  @(next :env)
  @(gather)
  USER=@USER
  HOME=@HOME
  SHELL=@SHELL
  @(end)

If the until or last clause is present and a match occurs, then the matches from the other clauses are discarded and the gather terminates. The difference between until and last is that any bindings established in last are retained, and the input position is advanced past the matched material. The until/last clause has visibility to bindings established in the previous clauses in that same iteration, even though those bindings end up thrown away.

For consistency, the :mandatory keyword is supported in the until/last clause of gather. The semantics of using :mandatory in this situation is tricky. In particular, if it is in effect, and the gather terminates successfully by collecting all required matches, it will trigger a failure. On the other hand, if the until or last clause activates before all required matches are gathered, a failure also occurs, whether or not the clause is :mandatory.

Meaningful use of :mandatory requires that the gather be open-ended; it must allow some (or all) variables not to be required. The presence of the option means that for the gather to succeed, all required variables must be gathered first, but then termination must be achieved via the until/last clause before all gather clauses are satisfied.

 

7.3.15 Keyword parameters in gather

The gather directive accepts the keyword parameter :vars. The argument to :vars is a list of required and optional variables. A required variable is specified as a symbol. An optional variable is specified as a two-element list which pairs a symbol with a Lisp expression. That Lisp expression is evaluated and specifies the default value for the variable.

Example:


  @(gather :vars (a b c (d "foo")))
  ...
  @(end)

Here, a, b and c are required variables, and d is optional, with the default value given by the Lisp expression "foo".

The presence of :vars changes the behavior in three ways.

Firstly, even if all the clauses in the gather match successfully and are eliminated, the directive will fail if the required variables do not have bindings. It doesn't matter whether the bindings are existing, or whether they are established by the gather.

Secondly, if some of the clauses of the gather did not match, but all of the required variables have bindings, then the directive succeeds. Without the presence of :vars, it would fail in this situation.

Thirdly, if gather succeeds (all required variables have bindings), then all of the optional variables which do not have bindings are given bindings to their default values.

The expressions which give the default values are evaluated whenever the gather directive is evaluated, whether or not their values are used.

 

7.3.16 The collect directive

The syntax of the collect directive is:


  @(collect)
  ... lines of subquery
  @(end)

or with an until or last clause:


  @(collect)
  ... lines of subquery: main clause
  @(until)
  ... lines of subquery: until clause
  @(end)


  @(collect)
  ... lines of subquery: main clause
  @(last)
  ... lines of subquery: last clause
  @(end)

The repeat symbol may be specified instead of collect, which changes the meaning, see below:


  @(repeat)
  ... lines of subquery
  @(end)

The subquery is matched repeatedly, starting at the current line. If it fails to match, it is tried starting at the subsequent line. If it matches successfully, it is tried at the line following the entire extent of matched data, if there is one. Thus, the collected regions do not overlap. (Overlapping behavior can be obtained: see the @(trailer) directive).

Unless certain keywords are specified, or unless the collection is explicitly failed with @(fail), it always succeeds, even if it collects nothing, and even if the until/last clause never finds a match.

If no until/last clause is specified, and the collect is not limited using parameters, the collection is unbounded: it consumes the entire data file. If any query material follows such a collect clause, it will fail if it tries to match anything in the current file; but of course, it is possible to continue matching in another file by means of @(next).

 

7.3.17 The until / last clause in collect

If an until/last clause is specified, the collection stops when that clause matches at the current position.

If an until clause terminates collect, no bindings are collected at that position, even if the main clause matches at that position also. Moreover, the position is not advanced. The remainder of the query begins matching at that position.

If a last clause terminates collect, the behavior is different. Any bindings captured by the main clause are thrown away, just like with the until clause. However, the bindings in the last clause itself survive, and the position is advanced to skip over that material.

Example:

code:
 @(collect)
 @a
 @(until)
 42
 @b
 @(end)
 @c
data:
 1
 2
 3
 42
 5
 6
result:
 a[0]="1"
 a[1]="2"
 a[2]="3"
 c="42"

The line 42 is not collected, even though it matches @a. Furthermore, the @(until) does not advance the position, so variable c takes 42.

If the @(until) is changed to @(last) the output will be different:

result:
 a[0]="1"
 a[1]="2"
 a[2]="3"
 b="5"
 c="6"

The 42 is not collected into the a list, just like before. But now the binding captured by @b emerges. Furthermore, the position advances, so variable c now takes 6.

The variables bound within the clause of a collect are treated specially. The multiple matches for each variable are collected into lists, which then appear as array variables in the final output.

Example:

code:
 @(collect)
 @a:@b:@c
 @(end)
data:
 John:Doe:101
 Mary:Jane:202
 Bob:Coder:313
result:
 a[0]="John"
 a[1]="Mary"
 a[2]="Bob"
 b[0]="Doe"
 b[1]="Jane"
 b[2]="Coder"
 c[0]="101"
 c[1]="202"
 c[2]="313"

The query matches the data in three places, so each variable becomes a list of three elements, reported as an array.

Variables with list bindings may be referenced in a query. They denote a multiple match. The -D command line option can establish a one-dimensional list binding.

The clauses of collect may be nested. Variable matches collated into lists in an inner collect, are again collated into nested lists in the outer collect. Thus an unbound variable wrapped in N nestings of @(collect) will be an N-dimensional list. A one dimensional list is a list of strings; a two dimensional list is a list of lists of strings, etc.

It is important to note how the variables bound within the main clause of a collect are treated. The variables which are subject to collection appear, within the collect, as normal one-value bindings. The collation into lists happens outside of the collect. So for instance in the query:


 @(collect)
 @x=@x
 @(end)

The left @x establishes a binding for some material preceding an equal sign. The right @x refers to that binding. The value of @x is different in each iteration, and these values are collected. What finally comes out of the collect clause is a single variable called x which holds a list containing each value that was ever instantiated under that name within the collect clause.

Also note that the until clause has visibility over the bindings established in the main clause. This is true even in the terminating case when the until clause matches, and the bindings of the main clause are discarded.

 

7.3.18 Keyword parameters in collect

By default, collect searches the rest of the input indefinitely, or until the until/last clause matches. It skips arbitrary amounts of nonmatching material before the first match, and between matches.

Within the @(collect) syntax, it is possible to specify keyword parameters for additional control of the behavior. A keyword parameter consists of a keyword symbol followed by an argument, enclosed within the @(collect) syntax. The following are the supported keywords.

:maxgap n
The :maxgap keyword takes a numeric argument n, which is a Lisp expression. It causes the collect to terminate if it fails to find a match after skipping n lines from the starting position, or more than n lines since any successful match. For example,


  @(collect :maxgap 5)

specifies that the gap between the current position and the first match for the body of the collect, or between consecutive matches can be no longer than five lines. A :maxgap value of 0 means that the collected regions must be adjacent. For instance:


  @(collect :maxgap 0)
  M @a
  @(end)

means: from here, collect consecutive lines of the form "M ...". This will not search for the first such line, nor will it skip lines which do not match this form.

:mingap n
The :mingap keyword complements :maxgap, though not exactly. Its argument n, a Lisp expression, specifies a minimum number of lines which must separate consecutive matches. However, it has no effect on the distance from the starting position to the first match.
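For illustration, the following sketch (patterned after the :maxgap 0 example above) requires at least one nonmatching line between consecutive matches of the form "M ...", while placing no constraint on the distance to the first match:


  @(collect :mingap 1)
  M @a
  @(end)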

:gap n
The :gap keyword effectively specifies :mingap and :maxgap at the same time, and can only be used if these other two are not used. Thus:


  @(collect :gap 1)
  @a
  @(end)

means: collect every other line starting with the current line.

:times n
This shorthand means the same thing as if
:mintimes n :maxtimes n
were specified. This means that exactly n matches must occur. If fewer occur, then the collect fails. Collect stops once it achieves n matches.
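For instance, the following sketch must collect exactly three matching lines; it fails if fewer than three are found, and stops after the third:


  @(collect :times 3)
  @a
  @(end)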

:mintimes n
The argument n of the :mintimes keyword is a Lisp expression which specifies that at least n matches must occur, or else the collect fails.

:maxtimes n
The Lisp argument expression n of the :maxtimes keyword specifies that at most n matches are collected.

:lines n
The argument n of the :lines keyword parameter is a Lisp expression which specifies the upper bound on how many lines should be scanned by collect, measuring from the starting position. The extent of the collect body is not counted. Example:


  @(collect :lines 2)
  foo: @a
  bar: @b
  baz: @c
  @(end)

The above collect will look for a match only twice: at the current position, and one line down.

:vars ({variable | (variable default-value)}*)
The :vars keyword specifies a restriction on what variables will emanate from the collect. Its argument is a list of variable names. An empty list may be specified using empty parentheses or, equivalently, the symbol nil. The default-value element of the syntax is a Lisp expression. The behavior of the :vars keyword is specified in the following section, "Specifying variables in collect".

:counter {variable | (variable starting-value)}
The :counter keyword's argument is a variable name symbol, or a compound expression consisting of a variable name symbol and the TXR Lisp expression starting-value. If this keyword argument is specified, then a binding for variable is established prior to each repetition of the collect body, to an integer value representing the repetition count. By default, repetition counts begin at zero. If starting-value is specified, it must evaluate to a number. This number is then added to each repetition count, and variable takes on the resulting displaced value.

If there is an existing binding for variable prior to the processing of the collect, then the variable is shadowed.

The binding is collected in the same way as other bindings that are established in the collect body.

The repetition count only increments after a successful match.

The variable is visible to the collect's until/last clause. If that clause is being processed after a successful match of the body, then variable holds an integer value. If the body fails to match, then the until/last clause sees a binding for variable with a value of nil.
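As an illustrative sketch (the names line and i are arbitrary), the following collects each line together with its repetition count, so that i emerges as a list of integers 0, 1, 2, ... parallel to the list collected for line:


  @(collect :counter i)
  @line
  @(end)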

 

7.3.19 Specifying variables in collect

Normally, any variable for which a new binding occurs in a collect block is collected. A collect clause may be "sloppy": it can neglect to collect some variables on some iterations, or bind some variables which are intended to behave like local temporaries, but end up collated into lists. Another issue is that the collect clause might not match anything at all, and then none of the variables are bound.

The :vars keyword allows the query writer to add discipline to the collect body.

The argument to :vars is a list of variable specs. A variable spec is either a symbol, or a (symbol default-value) pair, where default-value is a Lisp expression whose value specifies a default value for the variable.

When a :vars list is specified, it means that only the given variables can emerge from the successful collect. Any newly introduced bindings for other variables do not propagate.

Furthermore, for any variable which is not specified with a default value, the collect body, whenever it matches successfully, must bind that variable. If it neglects to bind the variable, an exception of type query-error is thrown. (If a collect body matches successfully, but produces no new bindings, then this error is suppressed.)

For any variable which does have a default value, if the collect body neglects to bind that variable, the behavior is as if collect did bind that variable to that default value.

The default values are expressions, and so can be quasiliterals.

Lastly, in the event that collect does not match anything, the variables specified in :vars (whether or not they have a default value) are all bound to empty lists. (These bindings are established after the processing of the until/last clause, if present.)

Example:


  @(collect :vars (a b (c "foo")))
  @a @c
  @(end)

Here, if the body "@a @c" matches, an error will be thrown because one of the mandatory variables is b, and the body neglects to produce a binding for b.

Example:


  @(collect :vars (a (c "foo")))
  @a @b
  @(end)

Here, if "@a @b" matches, only a will be collected, but not b, because b is not in the variable list. Furthermore, because there is no binding for c in the body, a binding is created with the value "foo", exactly as if c matched such a piece of text.

In the following example, the assumption is that THIS NEVER MATCHES is not found anywhere in the input but the line THIS DOES MATCH is found and has a successor which is bound to a. Because the body did not match, the :vars a and b should be bound to empty lists. But a is bound by the last clause to some text, so this takes precedence. Only b is bound to an empty list.


  @(collect :vars (a b))
  THIS NEVER MATCHES
  @(last)
  THIS DOES MATCH
  @a
  @(end)

The following means: do not allow any variables to propagate out of any iteration of the collect and therefore collect nothing:


  @(collect :vars nil)
  ...
  @(end)

Instead of writing @(collect :vars nil), it is possible to write @(repeat). @(repeat) takes all collect keywords, except for :vars. There is a @(repeat) directive used in @(output) clauses; that is a different directive.

 

7.3.20 Mandatory until and last

The until/last clause supports the option keyword :mandatory, exemplified by the following:


  @(collect)
  ...
  @(last :mandatory)
  ...
  @(end)

This means that the collect must be terminated by a match for the until/last clause, or else by an explicit @(accept).

Specifically, the collect cannot terminate due to simply running out of data, or exceeding a limit on the number of matches that may be collected. In those situations, if an until or last clause is present with :mandatory, the collect is deemed to have failed.
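For instance, under the following sketch (END is just an ordinary line of text, not special syntax), the collect fails unless a line consisting of END is eventually found to terminate it:


  @(collect)
  @item
  @(last :mandatory)
  END
  @(end)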

 

7.3.21 The coll directive

The coll directive is the horizontal version of collect. Whereas collect works with multi-line clauses on line-oriented material, coll works within a single line. With coll, it is possible to recognize repeating regularities within a line and collect lists.

Regular-expression based Positive Match variables work well with coll.

Example: collect a comma-separated list, terminated by a space.

code:
 @(coll)@{A /[^, ]+/}@(until) @(end)@B
data:
 foo,bar,xyzzy blorch
result:
 A[0]="foo"
 A[1]="bar"
 A[2]="xyzzy"
 B="blorch"

Here, the variable A is bound to tokens which match the regular expression /[^, ]+/: non-empty sequence of characters other than commas or spaces.

Like collect, coll searches for matches. If no match occurs at the current character position, it tries at the next character position. Whenever a match occurs, it continues at the character position which follows the last character of the match, if such a position exists.

If not bounded by an until clause, it will exhaust the entire line. If the until clause matches, then the collection stops at that position, and any bindings from that iteration are discarded. Like collect, coll also supports an until/last clause, which propagates variable bindings and advances the position. The :mandatory keyword is supported.

coll clauses nest, and variables bound within a coll are available to clauses within the rest of the coll clause, including the until/last clause, and appear as single values. The final list aggregation is only visible after the coll clause.

The behavior of coll leads to difficulties when delimited variables are used to match material which is delimiter-separated rather than terminated. For instance, entries in a comma-separated file usually do not appear as "a,b,c," but rather "a,b,c".

So for instance, the following result is not satisfactory:

code:
 @(coll)@a @(end)
data:
 1 2 3 4 5
result:
 a[0]="1"
 a[1]="2"
 a[2]="3"
 a[3]="4"

The 5 is missing because it isn't followed by a space, which the text-delimited variable match "@a " looks for. After matching "4 ", coll continues to look for matches, and doesn't find any. It is tempting to try to fix it like this:

code:
 @(coll)@a@/ ?/@(end)
data:
 1 2 3 4 5
result:
 a[0]=""
 a[1]=""
 a[2]=""
 a[3]=""
 a[4]=""
 a[5]=""
 a[6]=""
 a[7]=""
 a[8]=""

The problem now is that the regular expression / ?/ (match either a space or nothing), matches at any position. So when it is used as a variable delimiter, it matches at the current position, which binds the empty string to the variable, the extent of the match being zero. In this situation, the coll directive proceeds character by character. The solution is to use positive matching: specify the regular expression which matches the item, rather than trying to match whatever follows. The coll directive will recognize all items which match the regular expression:

code:
 @(coll)@{a /[^ ]+/}@(end)
data:
 1 2 3 4 5
result:
 a[0]="1"
 a[1]="2"
 a[2]="3"
 a[3]="4"
 a[4]="5"

The until clause can specify a pattern which, when recognized, terminates the collection. So for instance, suppose that the list of items may or may not be terminated by a semicolon. We must exclude the semicolon from being a valid character inside an item, and add an until clause which recognizes a semicolon:

code:
 @(coll)@{a /[^ ;]+/}@(until);@(end);
data:
 1 2 3 4 5;
result:
 a[0]="1"
 a[1]="2"
 a[2]="3"
 a[3]="4"
 a[4]="5"

Whether followed by the semicolon or not, the items are collected properly.

Note that the @(end) is followed by a semicolon. That's because when the @(until) clause meets a match, the matching material is not consumed.

This repetition can, of course, be avoided by using @(last) instead of @(until) since @(last) consumes the terminating material.

Instead of the above regular-expression-based approach, this extraction problem can also be solved with cases:

code:
 @(coll)@(cases)@a @(or)@a@(end)@(end)
data:
 1 2 3 4 5
result:
 a[0]="1"
 a[1]="2"
 a[2]="3"
 a[3]="4"
 a[4]="5"

 

7.3.22 Keyword parameters in coll

The @(coll) directive takes most of the same parameters as @(collect). See the section Keyword parameters in collect above. So for instance @(coll :gap 0) means that the collects must be consecutive, and @(coll :maxtimes 2) means that at most two matches will be collected. The :lines keyword does not exist, but there is an analogous :chars keyword.
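For example, by analogy with the :lines example above, the following sketch confines the search for matches to the first ten character positions from the starting position:


  @(coll :chars 10)@{a /[^ ]+/}@(end)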

The @(coll) directive takes the :vars keyword.

The shorthand @(rep) may be used instead of @(coll :vars nil). @(rep) takes all keywords, except :vars.

 

7.3.23 The flatten directive

The flatten directive can be used to convert variables to one dimensional lists. Variables which have a scalar value are converted to lists containing that value. Variables which are multidimensional lists are flattened to one-dimensional lists.

Example (without @(flatten))

code:
 @b
 @(collect)
 @(collect)
 @a
 @(end)
 @(end)
data:
 0
 1
 2
 3
 4
 5
result:
 b="0"
 a_0[0]="1"
 a_1[0]="2"
 a_2[0]="3"
 a_3[0]="4"
 a_4[0]="5"

Example (with @(flatten)):

code:
 @b
 @(collect)
 @(collect)
 @a
 @(end)
 @(end)
 @(flatten a b)
data:
 0
 1
 2
 3
 4
 5
result:
 b="0"
 a[0]="1"
 a[1]="2"
 a[2]="3"
 a[3]="4"
 a[4]="5"

 

7.3.24 The merge directive

The syntax of merge follows the pattern:

@(merge destination [sources ...])

destination is a variable, which receives a new binding. sources are bind expressions.

The merge directive provides a way of combining two or more variables or expressions in a somewhat complicated but very useful way. A new binding is created for the destination variable, which holds the result of the operation.

This directive is useful for combining the results from collects at different levels of nesting into a single nested list such that parallel elements are at equal depth.

The merge directive performs its special function if invoked with at least three arguments: a destination and two sources.

The one-argument case @(merge x) binds a new variable x and initializes it with the empty list and is thus equivalent to @(bind x). Likewise, the two-argument case @(merge x y) is equivalent to @(bind x y), establishing a binding for x which is initialized with the value of y.

To understand what merge does when two sources are given, as in @(merge C A B), we first have to define a property called depth. The depth of an atom such as a string is defined as 1. The depth of an empty list is 0. The depth of a nonempty list is one plus the depth of its deepest element. So for instance "foo" has depth 1, ("foo") has depth 2, and ("foo" ("bar")) has depth three.

We can now define a binary (two argument) merge(A, B) function as follows. First, merge(A, B) normalizes the values A and B to produce a pair of values which have equal depth, as defined above. If either value is an atom it is first converted to a one-element list containing that atom. After this step, both values are lists; and the only way an argument has depth zero is if it is an empty list. Next, if either value has a smaller depth than the other, it is wrapped in a list as many times as needed to give it equal depth. For instance if A is ("a") and B is (((("b" "c") ("d" "e")))) then A is converted to (((("a")))). Finally, the list values are appended together to produce the merged result. In the case of the preceding two example values, the result is: (((("a"))) ((("b" "c") ("d" "e")))). The result is stored into the newly bound destination variable C.

If more than two source arguments are given, these are merged by a left-associative reduction, which is to say that a three argument merge(X, Y, Z) is defined as merge(merge(X, Y), Z). The leftmost two values are merged, and then this result is merged with the third value, and so on.
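As a small worked illustration (the variable names are arbitrary), suppose A is bound to the string "a" (depth 1) and B to the list ("b" "c") (depth 2). Then


  @(merge C A B)

normalizes A to ("a") so that both sources have depth 2, appends the two lists, and binds C to ("a" "b" "c").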

 

7.3.25 The cat directive

The cat directive converts a list variable into a single piece of text. The syntax is:


  @(cat var [sep])

The sep argument is a Lisp expression whose value specifies a separating piece of text. If it is omitted, then a single space is used as the separator.

Example:

code:
 @(coll)@{a /[^ ]+/}@(end)
 @(cat a ":")
data:
 1 2 3 4 5
result:
 a="1:2:3:4:5"

 

7.3.26 The bind directive

The syntax of the bind directive is:


  @(bind pattern bind-expression {keyword value}*)

The bind directive is a kind of pattern match, which matches one or more variables given in pattern against a value produced by the bind-expression on the right.

Variable names occurring in the pattern expression may refer to bound variables, or may be unbound.

All variable references occurring in bind-expression must have values.

Binding occurs as follows. The tree structure of pattern and the value of bind-expression are considered to be parallel structures.

Any variables in pattern which are unbound receive a new binding, which is initialized with the structurally corresponding piece of the object produced by bind-expression.

Any variables in pattern which are already bound must match the corresponding part of the value of bind-expression, or else the bind directive fails. Variables which are already bound are not altered, retaining their current values, even if the matching is inexact.

The simplest bind is of one variable against itself, for instance bind A against A:


  @(bind A A)

This will throw an exception if A is not bound. If A is bound, it succeeds, since A matches itself.

The next simplest bind binds one variable to another:


  @(bind A B)

Here, if A is unbound, it takes on the same value as B. If A is bound, it has to match B, or the bind fails. Matching means that either

- A and B are the same text
- A is text, B is a list, and A occurs within B.
- vice versa: B is text, A is a list, and B occurs within A.
- A and B are lists and are either identical, or one is found as substructure within the other.

The right hand side does not have to be a variable. It may be some other object, like a string, quasiliteral, regexp, or list of strings, et cetera. For instance


  @(bind A "ab\tc")

will bind the string "ab\tc" to the variable A if A is unbound. If A is bound, this will fail unless A already contains an identical string. However, the right hand side of a bind cannot be an unbound variable, nor a complex expression that contains unbound variables.

The left hand side of bind can be a nested list pattern containing variables. The last item of a list at any nesting level can be preceded by a . (dot), which means that the variable matches the rest of the list from that position.

Example 1:

Suppose that the list A contains ("how" "now" "brown" "cow"). Then the directive @(bind (H N . C) A), assuming that H, N and C are unbound variables, will bind H to "how", N to "now", and C to the remainder of the list ("brown" "cow").

Example: suppose that the list A is nested to two dimensions and contains (("how" "now") ("brown" "cow")). Then @(bind ((H N) (B C)) A) binds H to "how", N to "now", B to "brown" and C to "cow".

The dot notation may be used at any nesting level. It must be followed by an item. The forms (.) and (X .) are invalid, but (. X) is valid and equivalent to X.

The number of items in a left pattern match must match the number of items in the corresponding right side object. So the pattern () only matches an empty list. The notations () and nil mean exactly the same thing.

The symbols nil, t and keyword symbols may be used on either side. They represent themselves. For example @(bind :foo :bar) fails, but @(bind :foo :foo) succeeds since the two sides denote the same keyword symbol object.

Example 2:

In this example, suppose A contains "foo" and B contains "bar". Then @(bind (X (Y Z)) (A (B "hey"))) binds X to "foo", Y to "bar" and Z to "hey". This is because the bind-expression produces the object ("foo" ("bar" "hey")) which is then structurally matched against the pattern (X (Y Z)), and the variables receive the corresponding pieces.

 

7.3.27 Keywords in the bind directive

The bind directive accepts these keywords:

:lfilt
The argument to :lfilt is a filter specification. When the left side pattern contains a binding which is therefore matched against its counterpart from the right side expression, the left side is filtered through the filter specified by :lfilt for the purposes of the comparison. For example:


  @(bind "a" "A" :lfilt :upcase)

produces a match, since the left side is the same as the right after filtering through the :upcase filter.

:rfilt
The argument to :rfilt is a filter specification. The specified filter is applied to the right hand side material prior to matching it against the left side. The filter is not applied if the left side is a variable with no binding. It is only applied to determine a match. Binding takes place using the unmodified right hand side object.

For example, the following produces a match:


  @(bind "A" "a" :rfilt :upcase)

:filter
This keyword is a shorthand to specify both filters to the same value. For instance :filter :upcase is equivalent to :lfilt :upcase :rfilt :upcase.

For a description of filters, see Output Filtering below.

Of course, compound filters like (:fromhtml :upcase) are supported with all these keywords. The filters apply across arbitrary patterns and nested data.

Example:


  @(bind (a b c) ("A" "B" "C"))
  @(bind (a b c) (("z" "a") "b" "c") :rfilt :upcase)

Here, the first bind establishes the values for a, b and c, and the second bind succeeds, because the value of a matches the second element of the list ("z" "a") if it is upcased, and likewise b matches "b" and c matches "c" if these are upcased.

 

7.3.28 Lisp forms in the bind directive

TXR Lisp forms, introduced by @, may be used in the bind-expression argument of bind, or as the entire form. This is consistent with the rules for bind expressions.

TXR Lisp forms can be used in the pattern expression also.

Example:


  @(bind a @(+ 2 2))
  @(bind @(+ 2 2) @(* 2 2))

Here, a is bound to the integer 4. The second bind then succeeds because the forms (+ 2 2) and (* 2 2) produce equal values.

 

7.3.29 The set directive

The set directive syntactically resembles bind, but is not a pattern match. It overwrites the previous values of variables with new values from the right hand side. Each variable that is assigned must have an existing binding: set will not induce binding.

Examples follow.

Store the value of A back into A, an operation with no effect:


  @(set A A)

Exchange the values of A and B:


  @(set (A B) (B A))

Store a string into A:


  @(set A "text")

Store a list into A:


  @(set A ("line1" "line2"))

Destructuring assignment. A ends up with "A", B ends up with ("B1" "B2") and C binds to ("C1" "C2").


  @(bind D ("A" ("B1" "B2") "C1" "C2"))
  @(bind (A B C) (() () ()))
  @(set (A B . C) D)

Note that set does not support a TXR Lisp expression on the left side, so the following are invalid syntax:


  @(set @(+ 1 1) @(* 2 2))
  @(set @b @(list "a"))

The second one is erroneous even though there is a variable on the left. Because it is preceded by the @ escape, it is a Lisp variable, and not a pattern variable.

 

7.3.30 The rebind directive

The rebind directive resembles set but it is not an assignment. It combines the semantics of local, bind and set. The expression on the right hand side is evaluated in the current environment. Then the variables in the pattern on the left are introduced as new bindings, whose values come from the evaluated expression.

rebind makes it easy to create temporary bindings based on existing bindings.


  @(define pattern-function (arg))
  @;; inside a pattern function:
  @(rebind recursion-level @(+ recursion-level 1))
  @;; ...
  @(end)

When the function terminates, the previous value of recursion-level is restored. The effect is like the following, but much easier to write and faster to execute:


  @(define pattern-function (arg))
  @;; inside a pattern function:
  @(local temp)
  @(set temp recursion-level)
  @(local recursion-level)
  @(set recursion-level @(+ temp 1))
  @;; ...
  @(end)

 

7.3.31 The forget directive

The forget directive has two spellings: @(forget) and @(local).

The arguments are one or more symbols, for example:


  @(forget a)
  @(forget a b c)

which can equivalently be written


  @(local a)
  @(local a b c)

Directives which follow the forget or local directive no longer see any bindings for the symbols mentioned in that directive, and can establish new bindings.

It is not an error if the bindings do not exist.

It is strongly recommended to use the @(local) spelling in functions, because the forgetting action simulates local variables: for the given symbols, the machine forgets any earlier variables from outside of the function, and consequently, any new bindings for those variables belong to the function. (Furthermore, functions suppress the propagation of variables that are not in their parameter list, so these locals will be automatically forgotten when the function terminates.)
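For example, in the following sketch (the names first-word and rest are arbitrary), rest serves only as a scratch variable for the tail of the line. Declaring it local ensures that a binding for rest inherited from the caller cannot interfere with the match; and since rest is not a parameter, it does not propagate out of the function:


  @(define first-word (w))
  @(local rest)
  @w @rest
  @(end)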

 

7.3.32 The do directive

The syntax of @(do) is:


  @(do lisp-expression)

The do directive evaluates a TXR Lisp expression. (See TXR LISP far below.) The value of the expression is ignored, and matching continues with the directives which follow the do directive, if any.

In the context of the do directive, the expression should not be introduced by the @ symbol; it is expected to be a Lisp expression.

Example:


  @; match text into variables a and b, then insert into hash table h
  @(bind h (hash :equal-based))
  @a:@b
  @(do (set [h a] b))

 

7.4 Blocks

 

7.4.1 Overview

Blocks are sections of a query which are either denoted by a name, or are anonymous. They may nest: blocks can occur within blocks and other constructs.

Blocks are useful for terminating parts of a pattern-matching search prematurely, and escaping to a higher level. This makes blocks useful not only for simplifying the semantics of certain pattern matches, but also as an optimization tool.

Judicious use of blocks and escapes can reduce or eliminate the amount of backtracking that TXR performs.

 

7.4.2 The block directive

The @(block name) directive introduces a named block, except when name is the symbol nil. The @(block) directive introduces an unnamed block, equivalent to @(block nil).

The @(skip) and @(collect) directives introduce implicit anonymous blocks, as do function bodies.

 

7.4.3 Block Scope

The names of blocks are in a distinct namespace from the variable binding space. So @(block foo) is unrelated to the variable @foo.

A block extends from the @(block ...) directive which introduces it, until the matching @(end), and may be empty. For instance:


  @(some)
  abc
  @(block foo)
  xyz
  @(end)
  @(end)

Here, the block foo occurs in a @(some) clause, and extends to the first @(end), which terminates it. After that @(end), the name foo is not associated with a block (is not "in scope"). The second @(end) terminates the @(some) clause.

The implicit anonymous block introduced by @(skip) has the same scope as the @(skip): it extends over all of the material which follows the skip, to the end of the containing subquery.

 

7.4.4 Block Nesting

Blocks may nest, and nested blocks may have the same names as blocks in which they are nested. For instance:


  @(block)
  @(block)
  ...
  @(end)
  @(end)

is a nesting of two anonymous blocks, and


  @(block foo)
  @(block foo)
  @(end)
  @(end)

is a nesting of two named blocks which happen to have the same name. When a nested block has the same name as an outer block, it creates a block scope in which the outer block is "shadowed"; that is to say, directives which refer to that block name within the nested block refer to the inner block, and not to the outer one.

 

7.4.5 Block Semantics

A block normally does nothing. The query material in the block is evaluated normally. However, a block serves as a termination point for @(fail) and @(accept) directives which are in scope of that block and refer to it.

The precise meaning of these directives is:

@(fail name)
Immediately terminate the enclosing query block called name, as if that block failed to match anything. If more than one block by that name encloses the directive, the inner-most block is terminated. No bindings emerge from a failed block.

@(fail)
Immediately terminate the innermost enclosing anonymous block, as if that block failed to match.

If the implicit block introduced by @(skip) is terminated in this manner, this has the effect of causing skip itself to fail. I.e. the behavior is as if the skip search did not find a match for the trailing material, except that it takes place prematurely (before the end of the available data source is reached).

If the implicit block associated with a @(collect) is terminated this way, then the entire collect fails. This is a special behavior, because a collect normally does not fail, even if it matches nothing and collects nothing!

To prematurely terminate a collect by means of its anonymous block, without failing it, use @(accept).

@(accept name)
Immediately terminate the enclosing query block called name, as if that block successfully matched. If more than one block by that name encloses the directive, the inner-most block is terminated. Any bindings established within that block until this point emerge from that block.

@(accept)
Immediately terminate the innermost enclosing anonymous block, as if that block successfully matched. Any bindings established within that block until this point emerge from that block.

If the implicit block introduced by @(skip) is terminated in this manner, this has the effect of causing the skip itself to succeed, as if all of the trailing material had successfully matched.

If the implicit block associated with a @(collect) is terminated this way, then the collection stops. All bindings collected in the current iteration of the collect are discarded. Bindings collected in previous iterations are retained, and collated into lists in accordance with the semantics of collect.

Example: alternative way to achieve @(until) termination:


  @(collect)
  @  (maybe)
  ---
  @  (accept)
  @  (end)
  @LINE
  @(end)

This query will collect entire lines into a list called LINE. However, if the line --- is matched (by the embedded @(maybe)), the collection is terminated. Only the lines up to, and not including the --- line, are collected. The effect is identical to:


  @(collect)
  @LINE
  @(until)
  ---
  @(end)

The difference (not relevant in these examples) is that the until clause has visibility into the bindings set up by the main clause.

However, the following example has a different meaning:


  @(collect)
  @LINE
  @  (maybe)
  ---
  @  (accept)
  @  (end)
  @(end)

Now, lines are collected until the end of the data source, or until a line is found which is followed by a --- line. If such a line is found, the collection stops, and that line is not included in the collection! The @(accept) terminates the process of the collect body, and so the action of collecting the last @LINE binding into the list is not performed.

 

7.4.6 Data Extent of Terminated Blocks

A query block may have matched some material prior to being terminated by accept. In that case, it is deemed to have only matched that material, and not any material which follows. This may matter, depending on the context in which the block occurs.

Example:

code:
 @(some)
 @(block foo)
 @first
 @(accept foo)
 @ignored
 @(end)
 @second
data:
 1
 2
 3
result:
 first="1"
 second="2"

At the point where the accept occurs, the foo block has matched the first line, binding the text "1" to the variable @first. The block is then terminated. Not only does the @first binding emerge from this terminated block, but what also emerges is that the block advanced the data past the first line to the second line. Next, the @(some) directive ends, and propagates the bindings and position. Thus the @second which follows then matches the second line and takes the text "2".

In the following query, the foo block occurs inside a maybe clause. Inside the foo block there is a @(some) clause. Its first subclause matches variable @first and then terminates block foo. Since block foo is outside of the @(some) directive, this has the effect of terminating the @(some) clause:

code:
 @(maybe)
 @(block foo)
 @  (some)
 @first
 @  (accept foo)
 @  (or)
 @one
 @two
 @three
 @four
 @  (end)
 @(end)
 @second
data:
 1
 2
 3
 4
 5
result:
 first="1"
 second="2"

The second clause of the @(some) directive, namely:


  @one
  @two
  @three
  @four

is never processed. The reason is that subclauses are processed in top-to-bottom order, and the processing was aborted within the first clause by the @(accept foo). The @(some) construct never gets the opportunity to match four lines.

If the @(accept foo) line is removed from the above query, the output is different:

code:
 @(maybe)
 @(block foo)
 @  (some)
 @first
 @#          <--  @(accept foo) removed from here!!!
 @  (or)
 @one
 @two
 @three
 @four
 @  (end)
 @(end)
 @second
data:
 1
 2
 3
 4
 5
result:
 first="1"
 one="1"
 two="2"
 three="3"
 four="4"
 second="5"

Now, all clauses of the @(some) directive have the opportunity to match. The second clause grabs four lines, which is the longest match. And so, the next line of input available for matching is 5, which goes to the @second variable.

 

7.4.7 Interaction Between the trailer and accept Directives

If one of the clauses which follow a @(trailer) requests a successful termination to an outer block via @(accept), then @(trailer) intercepts the escape and adjusts the data extent to the position that it was given.

Example:

code:
 @(block)
 @(trailer)
 @line1
 @line2
 @(accept)
 @(end)
 @line3
data:
 1
 2
 3
result:
 line1="1"
 line2="2"
 line3="1"

The variable line3 is bound to "1" because although @(accept) yields a data position which has advanced to the third line, this is intercepted by @(trailer) and adjusted back to the first line. Neglecting to do this adjustment would violate the semantics of trailer.

Directives other than @(trailer) have no such special interaction with accept.

 

7.5 Functions

 

7.5.1 Overview

TXR functions allow a query to be structured to avoid repetition. On a theoretical note, because TXR functions support recursion, functions enable TXR to match some kinds of patterns which exhibit self-embedding, or nesting, and thus cannot be matched by a regular language.

Functions in TXR are not exactly like functions in mathematics or functional languages, and are not like procedures in imperative programming languages. They are not exactly like macros either. What it means for a TXR function to take arguments and produce a result is different from the conventional notion of a function.

A TXR function may have one or more parameters. When such a function is invoked, an argument must be specified for each parameter. However, a special behavior is at play here. Namely, some or all of the argument expressions may be unbound variables. In that case, the corresponding parameters behave like unbound variables also. Thus TXR function calls can transmit the "unbound" state from argument to parameter.

It should be mentioned that functions have access to all bindings that are visible in the caller; functions may refer to variables which are not mentioned in their parameter list.

With regard to returning, TXR functions are also unconventional. If the function fails, then the function call is considered to have failed. The function call behaves like a kind of match; if the function fails, then the call is like a failed match.

When a function call succeeds, then the bindings emanating from that function are processed specially. Firstly, any bindings for variables which do not correspond to one of the function's parameters are thrown away. Functions may internally bind arbitrary variables in order to get their job done, but only those variables which are named in the function argument list may propagate out of the function call. Thus, a function with no arguments can only indicate matching success or failure, but not produce any bindings. Secondly, variables do not propagate out of the function directly, but undergo a renaming. For each parameter which went into the function as an unbound variable (because its corresponding argument was an unbound variable), if that parameter now has a value, that value is bound onto the corresponding argument.

Example:


  @(define collect-words (list))
  @(coll)@{list /[^ \t]+/}@(end)
  @(end)

The above function collect-words contains a query which collects words from a line (sequences of characters other than space or tab), into the list variable called list. This variable is named in the parameter list of the function, therefore, its value, if it has one, is permitted to escape from the function call.

Suppose the input data is:


  Fine summer day

and the function is called like this:


  @(collect-words wordlist)

The result (with txr -B) is:


  wordlist[0]=Fine
  wordlist[1]=summer
  wordlist[2]=day

How it works is that in the function call @(collect-words wordlist), wordlist is an unbound variable. The parameter corresponding to that unbound variable is the parameter list. Therefore, that parameter is unbound over the body of the function. The function body collects the words of "Fine summer day" into the variable list, and then yields that binding. Then the function call completes by noticing that the function parameter list now has a binding, and that the corresponding argument wordlist has no binding. The binding is thus transferred to the wordlist variable. After that, the bindings produced by the function are thrown away. The only enduring effects are:

- the function matched and consumed some input; and
- the function succeeded; and
- the wordlist variable now has a binding.

Another way to understand the parameter behavior is that function parameters behave like proxies which represent their arguments. If an argument is an established value, such as a character string or bound variable, the parameter is a proxy for that value and behaves just like that value. If an argument is an unbound variable, the function parameter acts as a proxy representing that unbound variable. The effect of binding the proxy is that the variable becomes bound, an effect which is settled when the function goes out of scope.

Within the function, both the original variable and the proxy are visible simultaneously, and are independent. What if a function binds both of them? Suppose a function has a parameter called P, which is called with an argument A, which is an unbound variable, and then, in the function, both A and P are bound. This is permitted, and they can even be bound to different values. However, when the function terminates, the local binding of A simply disappears (because the symbol A is not among the parameters of the function). Only the value bound to P emerges, and is bound to A, which still appears unbound at that point. The P binding disappears also, and the net effect is that A is now bound. The "proxy" binding of A through the parameter P "wins" the conflict with the direct binding.

 

7.5.2 Definition Syntax

Function definition syntax comes in two flavors: vertical and horizontal. Horizontal definitions actually come in two forms, the distinction between which is hardly noticeable, and the need for which is made clear below.

A function definition begins with a @(define ...) directive. For vertical functions, this is the only element in a line.

The define symbol must be followed by a symbol, which is the name of the function being defined. After the symbol, there is a parenthesized optional argument list. If there is no such list, or if the list is specified as () or the symbol nil then the function has no parameters. Examples of valid define syntax are:


  @(define foo)
  @(define bar ())
  @(define match (a b c))

If the define directive is followed by more material on the same line, then it defines a horizontal function:


  @(define match-x)x@(end)

If the define is the sole element in a line, then it is a vertical function, and the function definition continues below:


  @(define match-x)
  x
  @(end)

The difference between the two is that a horizontal function matches characters within a line, whereas a vertical function matches lines within a stream. The former match-x matches the character x, advancing to the next character position. The latter match-x matches a line consisting of the character x, advancing to the next line.

Material between @(define) and @(end) is the function body. The define directive may be followed directly by the @(end) directive, in which case the function has an empty body.

Functions may be nested within function bodies. Such local functions have dynamic scope. They are visible in the function body in which they are defined, and in any functions invoked from that body.

The body of a function is an anonymous block. (See BLOCKS above).

 

7.5.3 Two Forms of The Horizontal Function

If a horizontal function is defined as the only element of a line, it may not be followed by additional material. The following construct is erroneous:


  @(define horiz (x))@foo:@bar@(end)lalala

This kind of definition is actually considered to be in the vertical context, and like other directives that have special effects and that do not match anything, it does not consume a line of input. If the above syntax were allowed, it would mean that the line would not only define a function but also match lalala. This, in turn, would mean that the @(define)...@(end) is actually in horizontal mode, and so it matches a span of zero characters within a line (which means that it would require a line of input to match: a surprising behavior for a non-matching directive!)

A horizontal function can be defined in an actual horizontal context. This occurs if it is in a line where it is preceded by other material. For instance:


  X@(define fun)...@(end)Y

This is a query line which must match the text XY. It also defines the function fun. The main use of this form is for nested horizontal functions:


  @(define fun)@(define local_fun)...@(end)@(end)

 

7.5.4 Vertical-Horizontal Overloading

A function of the same name may be defined as both vertical and horizontal. Both functions are available at the same time. Which one is used by a call is resolved by context. See the section Vertical Versus Horizontal Calls below.

 

7.5.5 Call Syntax

A function is invoked by a compound directive whose first symbol is the name of that function. Additional elements in the directive are the arguments. Arguments may be symbols, or other objects like string and character literals, quasiliterals or regular expressions.

Example:

code:
 @(define pair (a b))
 @a @b
 @(end)
 @(pair first second)
 @(pair "ice" cream)
data:
 one two
 ice milk
result:
 first="one"
 second="two"
 cream="milk"

The first call to the function takes the line "one two". The parameter a takes "one" and parameter b takes "two". These are rebound to the arguments first and second. The second call to the function binds the a parameter to the word "ice", and the b is unbound, because the corresponding argument cream is unbound. Thus inside the function, a is forced to match ice. Then a space is matched and b collects the text "milk". When the function returns, the unbound "cream" variable gets this value.

If a symbol occurs multiple times in the argument list, it constrains all of the corresponding parameters to bind to the same value. That is to say, all parameters which, in the body of the function, bind a value, and which are derived from the same argument symbol, must bind to the same value. This is settled when the function terminates, not while it is matching. Example:

code:
 @(define pair (a b))
 @a @b
 @(end)
 @(pair same same)
data:
 one two
result:
 [query fails]

Here the query fails because a and b are effectively proxies for the same unbound variable same and are bound to different values, creating a conflict which constitutes a match failure.

 

7.5.6 Vertical Versus Horizontal Calls

A function call which is the only element of the query line in which it occurs is ambiguous. It can go either to a vertical function or to the horizontal one. If both are defined, then it goes to the vertical one.

Example:

code:
 @(define which (x))@(bind x "horizontal")@(end)
 @(define which (x))
 @(bind x "vertical")
 @(end)
 @(which fun)
result:
 fun="vertical"

Not only does this call go to the vertical function, but it is in a vertical context.

If only a horizontal function is defined, then that is the one which is called, even if the call is the only element in the line. This takes place in a horizontal character-matching context, which requires a line of input which can be traversed:

Example:

code:
 @(define which (x))@(bind x "horizontal")@(end)
 @(which fun)
data:
 ABC
result:
 [query fails]

The query fails because @(which fun) is in horizontal mode, and so it matches characters in a line. Since the function body consists only of @(bind ...) which doesn't match any characters, the function call requires an empty line to match. The line ABC is not empty, and so there is a matching failure. The following example corrects this:

Example:

code:
 @(define which (x))@(bind x "horizontal")@(end)
 @(which fun)
data:
 [empty line]
result:
 fun="horizontal"

A call made in a clearly horizontal context will prefer the horizontal function, and only fall back on the vertical one if the horizontal one doesn't exist. (In this fall-back case, the vertical function is called with empty data; it is useful for calling vertical functions which process arguments and produce values.)

In the next example, the call is followed by trailing material, placing it in a horizontal context. Leading material will do the same thing:

Example:

code:
 @(define which (x))@(bind x "horizontal")@(end)
 @(define which (x))
 @(bind x "vertical")
 @(end)
 @(which fun)B
data:
 B
result:
 fun="horizontal"

 

7.5.7 Local Variables

As described earlier, variables bound in a function body which are not parameters of the function are discarded when the function returns. However, that, by itself, doesn't make these variables local, because pattern functions have visibility to all variables in their calling environment. If a variable x exists already when a function is called, then an attempt to bind it inside a function may result in a failure. The local directive must be used in a pattern function to list which variables are local.

Example:


  @(define path (path))@\
    @(local x y)@\
    @(cases)@\
      (@(path x))@(path y)@(bind path `(@x)@y`)@\
    @(or)@\
      @{x /[.,;'!?][^ \t\f\v]/}@(path y)@(bind path `@x@y`)@\
    @(or)@\
      @{x /[^ .,;'!?()\t\f\v]/}@(path y)@(bind path `@x@y`)@\
    @(or)@\
      @(bind path "")@\
    @(end)@\
  @(end)

This is a horizontal function which matches a path, breaking the problem down into four recursive cases. A path can be a parenthesized path followed by a path; it can be a certain character followed by a path; or it can be empty.

This function ensures that the variables it uses internally, x and y, do not have anything to do with any inherited bindings for x and y.

Note that the function is recursive, which cannot work without x and y being local, even if no such bindings exist prior to the top-level invocation of the function. The invocation @(path x) causes x to be bound, which is visible inside the invocation @(path y), but that invocation needs to have its own binding of x for local use.

 

7.5.8 Nested Functions

Function definitions may appear in a function. Such definitions are visible in all functions which are invoked from the body (and not necessarily enclosed in the body). In other words, the scope is dynamic, not lexical. Inner definitions shadow outer definitions. This means that a caller can redirect the function calls that take place in a callee, by defining local functions which capture the references.

Example:

code:
 @(define which)
 @  (fun)
 @(end)
 @(define fun)
 @  (output)
 toplevel fun!
 @  (end)
 @(end)
 @(define callee)
 @  (define fun)
 @    (output)
 local fun!
 @    (end)
 @  (end)
 @  (which)
 @(end)
 @(callee)
 @(which)
output:
 local fun!
 toplevel fun!

Here, the function named which is defined; it calls fun. A toplevel definition of fun is introduced which outputs "toplevel fun!". The function callee provides its own local definition of fun, which outputs "local fun!", before calling which. When callee is invoked, it calls which, whose @(fun) call is routed to callee's local definition. When which is called directly from the top level, its fun call goes to the toplevel definition.

 

7.5.9 Indirect Calls

Function indirection may be performed using the call directive. If fun-expr is an expression which evaluates to a symbol, and that symbol names a function which takes no arguments, then
  @(call fun-expr)
may be used to invoke the function. Of course, additional expressions may be supplied which specify arguments.

Example 1:

  @(define foo (arg))
  @(bind arg "abc")
  @(end)
  @(call @'foo b)

In this example, the effect is that foo is invoked, and b ends up bound to "abc".

The call directive here uses the @'foo expression to calculate the name of the function to be invoked. The @ symbol indicates that the expression which follows is TXR Lisp, and 'foo is the TXR Lisp syntax for quoting a symbol. (See the quote operator).

Of course, this particular call expression can just be replaced by the direct invocation syntax @(foo b).

The power of call lies in being able to specify the function as a value which comes from elsewhere in the program, as in the following example.

  @(define foo (arg))
  @(bind arg "abc")
  @(end)
  @(bind f @'foo)
  @(call f b)

Here the call directive obtains the name of the function from the f variable.

Note that function names are resolved to functions in the environment that is apparent at the point in execution where the call takes place. Very simply, the directive @(call f args ...) is precisely equivalent to @(s args ...) if, at the point of the call, f is a variable which holds the symbol s and symbol s is defined as a function. Otherwise it is erroneous.

 

7.6 Modularization

 

7.6.1 The load and include directives

The syntax of the load and include directives is:


  @(load expr)
  @(include expr)

Where expr is a Lisp expression that evaluates to a string giving the path of the file to load.

If the *load-path* has a current value which is not nil and the path is pure relative according to the pure-rel-path-p function, then the path is interpreted relative to the directory portion of the path which is stored in *load-path*.

If *load-path* is nil, or the load path is not pure relative, then the path is taken as-is.

If the file named by the path cannot be opened, then the .txr suffix is added and another attempt is made. Thus load expressions need not refer to the suffix. In the future, additional suffixes may be searched (compiled versions of a file).

Both the load and include directives bind the *load-path* variable to the path of the loaded file just before parsing syntax from it, and remove the binding when their processing of the file is complete. Processing TXR Lisp code means that each of its forms is read, and evaluated. Processing TXR code means parsing the entire file in its entirety, and then executing its directives against the current input.

The load and include directives differ as follows. The action of load is not performed immediately but at evaluation time. Evaluation time occurs after a TXR program is read from beginning to end and parsed. That is to say, when a TXR query is parsed, any embedded @(load ...) forms in it are parsed and constitute part of its syntax tree. They are executed when that query is executed and its execution reaches those load directives.

By contrast, the action of include is performed immediately, right after the @(include ...) directive syntax is parsed. That is to say, as the TXR parser encounters this syntax it processes it immediately. The included material is read and processed. If it is TXR syntax, then it is parsed and incorporated into the syntax tree in place of the include directive. The parser then continues processing the original file after the include directive. If TXR Lisp code is processed by the include directive, then its forms are read and evaluated. An empty directive is substituted into the syntax tree in this case.

Note: the include directive is useful for loading TXR files which contain Lisp macros which are needed by the parent program. The parent program cannot use load to bring in macros because macros are required during expansion, which takes place prior to evaluation time, whereas load doesn't execute until evaluation time.
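For instance, a query might pull in definitions at parse time like this (the file name defs.txr is hypothetical):


  @(include "defs.txr")

If defs.txr defines macros, they are available during expansion of the including query, whereas a @(load "defs") of the same file would take effect only later, at evaluation time.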

See also: the self-path, stdlib and *load-path* variables in TXR Lisp.

 

7.7 Output

 

7.7.1 Introduction

A TXR query may perform custom output. Output is performed by output clauses, which may be embedded anywhere in the query, or placed at the end. Output occurs as a side effect of producing a part of a query which contains an @(output) directive, and is executed even if that part of the query ultimately fails to find a match. Thus output can be useful for debugging. An output clause specifies that its output goes to a file, pipe, or (by default) standard output. If any output clause is executed whose destination is standard output, TXR makes a note of this, and later, just prior to termination, suppresses the usual printing of the variable bindings or the word false.

 

7.7.2 The output directive

The syntax of the @(output) directive is:


  @(output [ destination ] { bool-keyword | keyword value }* )
  .
  . one or more output directives or lines
  .
  @(end)

If the directive has arguments, then the first one is evaluated. If it is an object other than a keyword symbol, then it specifies the optional destination. Any remaining arguments after the optional destination are the keyword list. If the destination is missing, then the entire argument list is a keyword list.

The destination argument, if present, is treated as a TXR Lisp expression and evaluated. The resulting value is taken as the output destination. The value may be a string which gives the path name of a file to open for output. Otherwise, the destination must be a stream object.

The keyword list consists of a mixture of Boolean keywords which do not have an argument, or keywords with arguments.

The following Boolean keywords are supported:

:nothrow
The output directive throws an exception if the output destination cannot be opened, unless the :nothrow keyword is present, in which case the situation is treated as a match failure.

Note that since command pipes are processes that report errors asynchronously, a failing command will not throw an immediate exception that can be suppressed with :nothrow. This is for synchronous errors, like trying to open a destination file, but not having permissions, etc.

:append
This keyword is meaningful for files, specifying append mode: the output is to be added to the end of the file rather than overwriting the file.
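For instance, the following sketch (the file name log.txt is hypothetical, and the variable count is assumed to be bound) appends a line to a file rather than overwriting it:


  @(output "log.txt" :append)
  processed @count records
  @(end)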

The following value keywords are supported:

:filter
The argument can be a symbol, which specifies a filter to be applied to the variable substitutions occurring within the output clause. The argument can also be a list of filter symbols, which specifies that multiple filters are to be applied, in left to right order.

See the later sections Output Filtering below, and The Deffilter Directive.

:into
The argument of :into is a symbol which denotes a variable. The output will go into that variable. If the variable is unbound, it will be created. Otherwise, its contents are overwritten unless the :append keyword is used. If :append is used, then the new content will be appended to the previous content of the variable, after flattening the content to a list, as if by the flatten directive.

:named
The argument of :named is a symbol which denotes a variable. The file or pipe stream which is opened for the output is stored in this variable, and is not closed at the end of the output block. This allows a subsequent output block to continue output on the same stream, which is possible using the next two keywords, :continue or :finish. A new binding is established for the variable, even if it already has an existing binding.

:continue
A destination should not be specified if :continue is used. The argument of :continue is an expression, such as a variable name, that evaluates to a stream object. That stream object is used for the output block. At the end of the output block, the stream is flushed, but not closed. A usage example is given in the documentation for the Close Directive below.

:finish
A destination should not be specified if :finish is used. The argument of :finish is an expression, such as a variable name, that evaluates to a stream object. That stream object is used for the output block. At the end of the output block, the stream is closed. An example is given in the documentation for the Close Directive below.

 

7.7.3 Output Text

Text in an output clause is not matched against anything, but is output verbatim to the destination file, device or command pipe.

 

7.7.4 Output Variables

Variables occurring in an output clause do not match anything; instead their contents are output.

A variable being output can be any object. If it is of a type other than a list or string, it will be converted to a string as if by the tostring function in TXR Lisp.

A list is converted to a string in a special way: the elements are individually converted to a string and then they are catenated together. The default separator string is a single space; an alternate separator can be specified as an argument in the brace substitution syntax. Empty lists turn into an empty string.

Lists may be output within @(repeat) or @(rep) clauses. Each nesting of these constructs removes one level of nesting from the list variables that it contains.

In an output clause, the @{name number} variable syntax generates a fixed-width field containing the variable's text. The absolute value of the number specifies the field width. For instance -20 and 20 both specify a field width of twenty. If the text is longer than the field, then it overflows the field. If the text is shorter than the field, then it is left-adjusted within that field if the width is specified as a positive number, and right-adjusted if the width is specified as negative.
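
A minimal sketch, assuming a variable name bound to "abc": the positive width left-adjusts the text and the negative width right-adjusts it, so the line below is expected to come out as [abc       ][       abc].


  @(bind name "abc")
  @(output)
  [@{name 10}][@{name -10}]
  @(end)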

An output variable may specify a filter which overrides any filter established for the output clause. The syntax for this is @{NAME :filter filterspec}. The filter specification syntax is the same as in the output clause. See Output Filtering below.

 

7.7.5 Output Variables: Indexing

Additional syntax is supported in output variables that does not appear in pattern matching variables.

A square bracket index notation may be used to extract elements or ranges from a variable, which works with strings, vectors and lists. Elements are indexed from zero. This notation is only available in brace-enclosed syntax, and looks like this:

@{name[expr]}
Extract the element at the position given by expr.

@{name[expr1..expr2]}
Extract a range of elements from the position given by expr1, up to one position less than the position given by expr2.

If the variable is a list, it is treated as a list substitution, exactly as if it were the value of an unsubscripted list variable. The elements of the list are converted to strings and catenated together with a separator string between them, the default one being a single space.

An alternate separator may be given as a string argument in the brace notation.

Example:


  @(bind a ("a" "b" "c" "d"))
  @(output)
  @{a[1..3] "," 10}
  @(end)

The above produces the text "b,c" in a field 10 spaces wide. The [1..3] argument extracts a range of a; the "," argument specifies an alternate separator string, and 10 specifies the field width.

 

7.7.6 Output Substitutions

The brace syntax has another syntactic and semantic extension in output clauses. In place of the symbol, an expression may appear. The value of that expression is substituted.

Example:


 @(bind a "foo")
 @(output)
 @{`@a:` -10}
 @(end)

Here, the quasiliteral expression `@a:` is evaluated, producing the string "foo:". This string is printed right-adjusted in a 10 character field.

 

7.7.7 The repeat directive

The repeat directive generates repeated text from a "boilerplate", by taking successive elements from lists. The syntax of repeat is like this:


  @(repeat)
  .
  .
  main clause material, required
  .
  .
  special clauses, optional
  .
  .
  @(end)

repeat has six types of special clauses, any of which may be specified with empty contents, or omitted entirely. They are described below.

repeat takes arguments, also described below.

All of the material in the main clause and optional clauses is examined for the presence of variables. If none of the variables hold lists which contain at least one item, then no output is performed (unless the repeat specifies an @(empty) clause; see below). Otherwise, among those variables which contain non-empty lists, repeat finds the length of the longest list. The length of this list determines the number of repetitions, R.

If the repeat contains only a main clause, then the lines of this clause are output R times. Over the first repetition, all of the variables which, outside of the repeat, contain lists are locally rebound to just their first item. Over the second repetition, all of the list variables are bound to their second item, and so forth. Any variables which hold shorter lists than the longest list eventually end up with empty values over some repetitions.

Example: if the list A holds "1", "2" and "3"; the list B holds "A", "B"; and the variable C holds "X", then


  @(repeat)
  >> @C
  >> @A @B
  @(end)

will produce three repetitions (since there are two lists, the longest of which has three items). The output is:


  >> X
  >> 1 A
  >> X
  >> 2 B
  >> X
  >> 3

The last line has a trailing space, since it is produced by "@A @B", where B has an empty value. Since C is not a list variable, it produces the same value in each repetition.

The special clauses are:

@(single)
If the repeat produces exactly one repetition, then the contents of this clause are processed for that one and only repetition, instead of the main clause or any other clause which would otherwise be processed.

@(first)
The body of this clause specifies an alternative body to be used for the first repetition, instead of the material from the main clause.

@(last)
The body of this clause is used instead of the main clause for the last repetition.

@(empty)
If the repeat produces no repetitions, then the body of this clause is output. If this clause is absent or empty, the repeat produces no output.

@(mod n m)
The forms n and m are Lisp expressions that evaluate to integers. The value of m should be nonzero. The clause denoted this way is active if the repetition modulo m is equal to n. The first repetition is numbered zero. For instance the clause headed by @(mod 0 2) will be used on repetitions 0, 2, 4, 6, ... and @(mod 1 2) will be used on repetitions 1, 3, 5, 7, ...

@(modlast n m)
The meaning of n and m is the same as in @(mod n m), but one more condition is imposed. This clause is used if the repetition modulo m is equal to n, and if it is the last repetition.

The precedence among the clauses which take an iteration is: single > first > mod > modlast > last > main. That is, if two or more of these clauses can apply to a repetition, then the leftmost one in this precedence list applies. For instance, if there is just a single repetition, then any of these special clause types can apply to that repetition, since it is the only repetition, as well as the first and last one. In this situation, if there is a @(single) clause present, then the repetition is processed using that clause. Otherwise, if there is a @(first) clause present, that clause is used. Failing that, @(mod) is used if there is such a clause and its numeric conditions are satisfied. If there isn't, then @(modlast) clauses are considered, and if there are none, or none of them activate, then @(last) is considered. Finally, if none of these clauses are present or apply, then the repetition is processed using the main clause.
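
For illustration, in the following sketch (the variable item and its values are invented for the example), the @(mod 0 2) clause is expected to be selected on repetitions 0, 2, ... and the main clause on the remaining ones, producing the lines "* a", ". b", "* c" and ". d".


  @(bind item ("a" "b" "c" "d"))
  @(output)
  @(repeat)
  . @item
  @(mod 0 2)
  * @item
  @(end)
  @(end)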

Repeat supports arguments.


  @(repeat
      [:counter { symbol | (symbol expr) }]
      [:vars ({ symbol | (symbol expr) }*)])

The :counter argument designates a symbol which will behave as an integer variable over the scope of the clauses inside the repeat. The variable provides access to the repetition count, starting at zero, incrementing with each repetition. If the argument is given as (symbol expr), then expr is a Lisp expression whose value is taken as a displacement which is added to the counter on each iteration. For instance :counter (c 1) specifies a counter c which counts from 1.
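
A brief sketch of :counter (the names item and n are invented for the example); the expected output numbers the items starting from 1:


  @(bind item ("apple" "banana" "cherry"))
  @(output)
  @(repeat :counter (n 1))
  @{n}. @item
  @(end)
  @(end)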

The :vars argument specifies a list of variable names, or pairs consisting of a variable name and Lisp expression. For every variable paired with a Lisp expression, the expression is evaluated, and a binding is introduced, associating that variable with the expression's value.

The repeat directive then processes the list of variables, selecting from it those which have a binding, either a previously existing binding or one just introduced from a Lisp expression. For each selected variable, repeat will assume that the variable occurs in the repeat block and contains a list to be iterated.

The :vars mechanism serves two purposes. Firstly, it is needed for situations in which @(repeat) is not able to deduce the existence of a variable in the block. It does not dig very deeply to discover variables, and does not "see" variables that are referenced via embedded TXR Lisp expressions. For instance, the following produces no output:


  @(bind list ("a" "b" "c"))
  @(output)
  @(repeat)
  @(format nil "<~a>" list)
  @(end)
  @(end)

Although the list variable appears in the repeat block, it is embedded in a TXR Lisp construct. That construct will never be evaluated because no repetitions take place: the repeat construct doesn't find any variables and so doesn't iterate. The remedy is to provide a little help via the :vars parameter:


  @(bind list ("a" "b" "c"))
  @(output)
  @(repeat :vars (list))
  @(format nil "<~a>" list)
  @(end)
  @(end)

Now the repeat block iterates over list and the output is:


  <a>
  <b>
  <c>

Secondly, the variable binding syntax supported by :vars additionally provides a solution for situations when it is necessary to iterate over some list, but that list is the result of an expression, and not stored in any variable. A repeat block iterates only over lists emanating from variables; it does not iterate over lists pulled from arbitrary expressions.

Example: output all file names matching the *.txr pattern in the current directory:


  @(output)
  @(repeat :vars ((name (glob "*.txr"))))
  @name
  @(end)
  @(end)

 

7.7.8 Nested repeat directives

If a repeat clause encloses variables which hold multidimensional lists, those lists require additional nesting levels of repeat (or rep). It is an error to attempt to output a list variable which has not been decimated into primary elements via a repeat construct.

Suppose that a variable X is two-dimensional (contains a list of lists). X must be twice nested in a repeat. The outer repeat will traverse the lists contained in X. The inner repeat will traverse the elements of each of these lists.
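
A minimal sketch (the two-dimensional variable x is invented for the example), pairing an outer repeat with an inner rep; each sublist of x is expected to come out on its own "row:" line:


  @(bind x (("a" "b") ("c" "d" "e")))
  @(output)
  @(repeat)
  row: @(rep)@x @(end)
  @(end)
  @(end)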

A nested repeat may be embedded in any of the clauses of a repeat, not only the main clause.

 

7.7.9 The rep directive

The rep directive is similar to repeat. Whereas repeat is line oriented, rep generates material within a line. It has all the same clauses, but everything is specified within one line:


  @(rep)... main material ... .... special clauses ...@(end)

More than one @(rep) can occur within a line, mixed with other material. A @(rep) can be nested within a @(repeat) or within another @(rep).

Also, @(rep) accepts the same :counter and :vars arguments.
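
For instance, this sketch (the variable x is invented for the example) numbers items within a single line using a rep counter; the expected output line is "0:a 1:b 2:c ":


  @(bind x ("a" "b" "c"))
  @(output)
  @(rep :counter i)@{i}:@{x} @(end)
  @(end)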

 

7.7.10 repeat and rep Examples

Example 1: show the list L in parentheses, with spaces between the elements, or the word EMPTY if the list is empty:


  @(output)
  @(rep)@L @(single)(@L)@(first)(@L @(last)@L)@(empty)EMPTY@(end)
  @(end)

Here, the @(empty) clause specifies EMPTY. So if there are no repetitions, the text EMPTY is produced. If there is a single item in the list L, then @(single)(@L) produces that item between parentheses. Otherwise if there are two or more items, the first item is produced with a leading parenthesis followed by a space by @(first)(@L and the last item is produced with a closing parenthesis: @(last)@L). All items in between are emitted with a trailing space by the main clause: @(rep)@L.

Example 2: show the list L like Example 1 above, but the empty list is ().


  @(output)
  (@(rep)@L @(last)@L@(end))
  @(end)

This is simpler. The parentheses are part of the text which surrounds the @(rep) construct, produced unconditionally. If the list L is empty, then @(rep) produces no output, resulting in (). If the list L has one or more items, then each is produced followed by a space, except the last, which has no trailing space. If the list has exactly one item, then the @(last) clause applies to it instead of the main clause: it is produced with no trailing space.

 

7.7.11 The close directive

The syntax of the close directive is:


  @(close expr)

Where expr evaluates to a stream. The close directive can be used to explicitly close streams created using @(output ... :named var) syntax, as an alternative to @(output :finish expr).

Examples:

Write two lines to "foo.txt" over two output blocks using a single stream:


  @(output "foo.txt" :named foo)
  Hello,
  @(end)
  @(output :continue foo)
  world!
  @(end)
  @(close foo)

The same as above, using :finish rather than :continue so that the stream is closed at the end of the second block:


  @(output "foo.txt" :named foo)
  Hello,
  @(end)
  @(output :finish foo)
  world!
  @(end)

 

7.7.12 Output Filtering

Often it is necessary to transform the output to preserve its meaning under the convention of a given data format. For instance, if a piece of text contains the characters < or >, then if that text is being substituted into HTML, these should be replaced by &lt; and &gt;. This is what filtering is for. Filtering is applied to the contents of output variables, not to any template text. TXR implements named filters. Built-in filters are named by keywords, given below. User-defined filters are possible, however. See notes on the deffilter directive below.

Instead of a filter name, the syntax (fun name) can be used. This denotes that the function called name is to be used as a filter. This is described in the next section Function Filters below.

Built-in filters named by keywords:

:tohtml
Filter text to HTML, representing special characters using HTML ampersand sequences. For instance > is replaced by &gt;.

:tohtml*
Filter text to HTML, representing special characters using HTML ampersand sequences. Unlike :tohtml, this filter doesn't treat the single and double quote characters. It is not suitable for preparing HTML fragments which end up inserted into HTML tag attributes.

:fromhtml
Filter text with HTML codes into text in which the codes are replaced by the corresponding characters. For instance &gt; is replaced by >.

:upcase
Convert the 26 lower case letters of the English alphabet to upper case.

:downcase
Convert the 26 upper case letters of the English alphabet to lower case.

:frompercent
Decode percent-encoded text. Character triplets consisting of the % character followed by a pair of hexadecimal digits (case insensitive) are converted to bytes having the value represented by the hexadecimal digits (most significant nybble first). Sequences of one or more such bytes are treated as UTF-8 data and decoded to characters.

:topercent
Convert to percent encoding according to RFC 3986. The text is first converted to UTF-8 bytes. The bytes are then converted back to text as follows. Bytes in the range 0 to 32, and 127 to 255 (note: including the ASCII DEL), bytes whose values correspond to ASCII characters which are listed by RFC 3986 as being in the "reserved set", and the byte value corresponding to the ASCII % character are encoded as a three-character sequence consisting of the % character followed by two hexadecimal digits derived from the byte value (most significant nybble first, upper case). All other bytes are converted directly to characters of the same value without any such encoding.

:fromurl
Decode from URL encoding, which is like percent encoding, except that if the unencoded + character occurs, it is decoded to a space character. Of course %20 still decodes to space, and %2B to the + character.

:tourl
Encode to URL encoding, which is like percent encoding except that a space maps to + rather than %20. The + character, being in the reserved set, encodes to %2B.

:frombase64
Decode from the Base 64 encoding described in RFC 4648.

:tobase64
Encodes to the RFC 4648 Base 64 encoding.

:tonumber
Converts strings to numbers. Strings that contain a period, e or E are converted to floating point as if by the Lisp function flo-str. Otherwise they are converted to integer as if using int-str with a radix of 10. Non-numeric junk results in the object nil.

:toint
Converts strings to integers as if using int-str with a radix of 10. Non-numeric junk results in the object nil.

:tofloat
Converts strings to floating-point values as if using the function flo-str. Non-numeric junk results in the object nil.

:hextoint
Converts strings to integers as if using int-str with a radix of 16. Non-numeric junk results in the object nil.

Examples:

To escape HTML characters in all variable substitutions occurring in an output clause, specify :filter :tohtml in the directive:


  @(output :filter :tohtml)
  ...
  @(end)

To filter an individual variable, add the syntax to the variable spec:


  @(output)
  @{x :filter :tohtml}
  @(end)

Multiple filters can be applied at the same time. For instance:


  @(output)
  @{x :filter (:upcase :tohtml)}
  @(end)

This will fold the contents of x to upper case, and then encode any special characters into HTML. Beware of combinations that do not make sense. For instance, suppose the original text is HTML, containing codes like &quot;. The compound filter (:upcase :fromhtml) will not work because &quot; will turn to &QUOT;, which will no longer be recognized by the :fromhtml filter, since the entity names in HTML codes are case-sensitive.

Capture some numeric variables and convert to numbers:


  @date @time @temperature @pressure
  @(filter :tofloat temperature pressure)
  @;; temperature and pressure can now be used in calculations

 

7.7.13 Function Filters

A function can be used as a filter. For this to be possible, the function must conform to certain rules:

1.
The function must take two special arguments, which may be followed by additional arguments.

2.
When the function is called, the first argument will be bound to a string, and the second argument will be unbound. The function must produce a value by binding it to the second argument. If the filter is to be used as the final filter in a chain, it must produce a string.

For instance, the following is a valid filter function:


  @(define foo_to_bar (in out))
  @  (next :string in)
  @  (cases)
  foo
  @    (bind out "bar")
  @  (or)
  @    (bind out in)
  @  (end)
  @(end)

This function binds the out parameter to "bar" if the in parameter is "foo", otherwise it binds the out parameter to a copy of the in parameter. This is a simple filter.

To use the filter, use the syntax (:fun foo_to_bar) in place of a filter name. For instance in the bind directive:


  @(bind "foo" "bar" :lfilt (:fun foo_to_bar))

The above should succeed since the left side is filtered from "foo" to "bar", so that there is a match.

Of course, function filters can be used in a chain:


  @(output :filter (:downcase (:fun foo_to_bar) :upcase))
  ...
  @(end)

Here is a split function which takes an extra argument which specifies the separator:


  @(define split (in out sep))
  @  (next :list in)
  @  (coll)@(maybe)@token@sep@(or)@token@(end)@(end)
  @  (bind out token)
  @(end)

This function separates the argument in into tokens according to the separator text carried in the variable sep. Note that it produces a list rather than a string.

Here is another function, join, which catenates a list:


  @(define join (in out sep))
  @  (output :into out)
  @  (rep)@in@sep@(last)@in@(end)
  @  (end)
  @(end)

Now here are these two functions being used in a chain:


  @(bind text "how,are,you")
  @(output :filter (:fun split ",") (:fun join "-"))
  @text
  @(end)

Output:


  how-are-you

When the filter invokes a function, it generates the first two arguments internally to pass in the input value and capture the output. The remaining arguments from the (:fun ...) construct are also passed to the function. Thus the string objects "," and "-" are passed as the sep argument to split and join.

Note that split puts out a list, which join accepts. So the overall filter chain operates on a string: a string goes into split, and a string comes out of join.

 

7.7.14 The deffilter directive

The deffilter directive allows a query to define a custom filter, which can then be used in output clauses to transform substituted data.

This directive's syntax is illustrated in this example:

code:
 @(deffilter rot13
    ("a" "n")
    ("b" "o")
    ("c" "p")
    ("d" "q")
    ("e" "r")
    ("f" "s")
    ("g" "t")
    ("h" "u")
    ("i" "v")
    ("j" "w")
    ("k" "x")
    ("l" "y")
    ("m" "z")
    ("n" "a")
    ("o" "b")
    ("p" "c")
    ("q" "d")
    ("r" "e")
    ("s" "f")
    ("t" "g")
    ("u" "h")
    ("v" "i")
    ("w" "j")
    ("x" "k")
    ("y" "l")
    ("z" "m"))
 @(collect)
 @line
 @(end)
 @(output :filter rot13)
 @(repeat)
 @line
 @(end)
 @(end)
data:
 hey there!
output:
 url gurer!

The deffilter symbol must be followed by the name of the filter to be defined, followed by bind expressions which evaluate to lists of strings. Each list must be at least two elements long and specifies one or more texts which are mapped to a replacement text. For instance, the following specifies a telephone keypad mapping from upper case letters to digits.


  @(deffilter alpha_to_phone ("E" "0")
                             ("J" "N" "Q" "1")
                             ("R" "W" "X" "2")
                             ("D" "S" "Y" "3")
                             ("F" "T" "4")
                             ("A" "M" "5")
                             ("C" "I" "V" "6")
                             ("B" "K" "U" "7")
                             ("L" "O" "P" "8")
                             ("G" "H" "Z" "9"))


  @(deffilter foo (`@a` `@b`) ("c" `->@d`))


  @(bind x ("from" "to"))
  @(bind y ("---" "+++"))
  @(deffilter sub x y)

The last deffilter has the same effect as the @(deffilter sub ("from" "to") ("---" "+++")) directive.

Filtering works using a longest-match algorithm. The input is scanned from left to right; at each character position, the longest piece of text which matches one of the left-hand strings is identified, and that text is replaced with its associated replacement text. The scanning then continues at the first character after the matched text.

If none of the strings matches at a given character position, then that character is passed through the filter untranslated, and the scan continues at the next character in the input.
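
For instance, in the following sketch of a filter whose left-hand strings overlap, filtering the text "abcab" (here via the filter directive described below) is expected to yield "YX": at the first position the longer string "abc" wins over "ab", and the scan resumes after it, where "ab" then matches.


  @(deffilter f ("ab" "X") ("abc" "Y"))
  @(bind s "abcab")
  @(filter f s)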

Filtering is not in-place but rather instantiates a new text, and so replacement text is not re-scanned for more replacements.

If a filter definition accidentally contains two or more repetitions of the same left hand string with different right hand translations, the later ones take precedence. No warning is issued.

 

7.7.15 The filter directive

The syntax of the filter directive is:


  @(filter FILTER { VAR }+ )

A filter is specified, followed by one or more variables whose values are filtered and stored back into each variable.

Example: convert a, b, and c to upper case and HTML encode:


  @(filter (:upcase :tohtml) a b c)

 

7.8 Exceptions

 

7.8.1 Introduction

The exceptions mechanism in TXR is another disciplined form of non-local transfer, in addition to the blocks mechanism (see BLOCKS above). Like blocks, exceptions provide a construct which serves as the target for a dynamic exit. Both blocks and exceptions can be used to bail out of deep nesting when some condition occurs. However, exceptions provide more complexity. Exceptions are useful for error handling, and TXR in fact maps certain error situations to exception control transfers. However, exceptions are not inherently an error-handling mechanism; they are a structured dynamic control transfer mechanism, one of whose applications is error handling.

An exception control transfer (simply called an exception) is always identified by a symbol, which is its type. Types are organized in a subtype-supertype hierarchy. For instance, the file-error exception type is a subtype of the error type. This means that a file error is a kind of error. An exception handling block which catches exceptions of type error will catch exceptions of type file-error, but a block which catches file-error will not catch all exceptions of type error. A query-error is a kind of error, but not a kind of file-error. The symbol t is the supertype of every type: every exception type is considered to be a kind of t. (Mnemonic: t stands for type, as in any type).

Exceptions are handled using @(catch) clauses within a @(try) directive.

In addition to being useful for exception handling, the @(try) directive also provides unwind protection by means of a @(finally) clause, which specifies query material to be executed unconditionally when the try clause terminates, no matter how it terminates.

 

7.8.2 The try directive

The general syntax of the try directive is


  @(try)
  ... main clause, required ...
  ... optional catch clauses ...
  ... optional finally clause
  @(end)

A catch clause looks like:


  @(catch TYPE [ PARAMETERS ])
  .
  .
  .

and also this simple form:


  @(catch)
  .
  .
  .

which catches all exceptions, and is equivalent to @(catch t).

A finally clause looks like:


  @(finally)
  ...
  .
  .

The main clause may not be empty, but the catch and finally may be.

A try clause is surrounded by an implicit anonymous block (see BLOCKS section above). So for instance, the following is a no-op (an operation with no effect, other than successful execution):


  @(try)
  @(accept)
  @(end)

The @(accept) causes a successful termination of the implicit anonymous block. Execution resumes with query lines or directives which follow, if any.

try clauses and blocks interact. For instance, an accept from within a try clause invokes a finally.

code:
 @(block foo)
 @  (try)
 @    (accept foo)
 @  (finally)
 @     (output)
 bye!
 @     (end)
 @  (end)
output:
 bye!

How this works: the try block's main clause is @(accept foo). This causes the enclosing block named foo to terminate, as a successful match. Since the try is nested within this block, it too must terminate in order for the block to terminate. But the try has a finally clause, which executes unconditionally, no matter how the try block terminates. The finally clause performs some output, which is seen.

 

7.8.3 The finally clause

A try directive can terminate in one of three ways. The main clause may match successfully, and possibly yield some new variable bindings. The main clause may fail to match. Or the main clause may be terminated by a non-local control transfer, like an exception being thrown or a block return (like the block foo example in the previous section).

No matter how the try clause terminates, the finally clause is processed.

The finally clause is itself a query which binds variables, which leads to questions: what happens to such variables? What if the finally block fails as a query? As well as: what if a finally clause itself initiates a control transfer? Answers follow.

Firstly, a finally clause will contribute variable bindings only if the main clause terminates normally (either as a successful or failed match). If the main clause of the try block successfully matches, then the finally block continues matching at the next position in the data, and contributes bindings. If the main clause fails, then the finally block tries to match at the same position where the main clause failed.

The overall try directive succeeds as a match if either the main clause or the finally clause succeed. If both fail, then the try directive is a failed match.

Example:

code:
 @(try)
 @a
 @(finally)
 @b
 @(end)
 @c
data:
 1
 2
 3
result:
 a="1"
 b="2"
 c="3"

In this example, the main clause of the try captures line "1" of the data as variable a, then the finally clause captures "2" as b, and then the query continues with the @c line after the try block, so that c captures "3".

Example:

code:
 @(try)
 hello @a
 @(finally)
 @b
 @(end)
 @c
data:
 1
 2
result:
 b="1"
 c="2"

In this example, the main clause of the try fails to match, because the input is not prefixed with "hello ". However, the finally clause matches, binding b to "1". This means that the try block is a successful match, and so processing continues with @c which captures "2".

When finally clauses are processed during a non-local return, they have no externally visible effect if they do not bind variables. However, their execution makes itself known if they perform side effects, such as output.

A finally clause guards only the main clause and the catch clauses. It does not guard itself. Once the finally clause is executing, the try block is no longer guarded. This means if a nonlocal transfer, such as a block accept or exception, is initiated within the finally clause, it will not re-execute the finally clause. The finally clause is simply abandoned.

The disestablishment of blocks and try clauses is properly interleaved with the execution of finally clauses. This means that all surrounding exit points are visible in a finally clause, even if the finally clause is being invoked as part of a transfer to a distant exit point. The finally clause can make a control transfer to an exit point which is nearer than the original one, thereby "hijacking" the control transfer. Also, the anonymous block established by the try directive is visible in the finally clause.

Example:


  @(try)
  @  (try)
  @    (next "nonexistent-file")
  @  (finally)
  @    (accept)
  @  (end)
  @(catch file-error)
  @  (output)
  file error caught
  @  (end)
  @(end)

In this example, the @(next) directive throws an exception of type file-error, because the given file does not exist. The exit point for this exception is the @(catch file-error) clause in the outer-most try block. The inner block is not eligible because it contains no catch clauses at all. However, the inner try block has a finally clause, and so during the processing of this exception which is headed for @(catch file-error), the finally clause performs an anonymous accept. The exit point for that accept is the anonymous block surrounding the inner try. So the original transfer to the catch clause is thereby abandoned. The inner try terminates successfully due to the accept, and since it constitutes the main clause of the outer try, that also terminates successfully. The "file error caught" message is never printed.

 

7.8.4 catch clauses

catch clauses establish their associated try blocks as potential exit points for exception-induced control transfers (called "throws").

A catch clause specifies an optional list of symbols which represent the exception types which it catches. The catch clause will catch exceptions which are a subtype of any one of those exception types.

If a try block has more than one catch clause which can match a given exception, the first one will be invoked.

When a catch is invoked, it is of course understood that the main clause did not terminate normally, and so the main clause could not have produced any bindings.

catch clauses are processed prior to finally.

If a catch clause itself throws an exception, that exception cannot be caught by that same clause or its siblings in the same try block. The catch clauses of that block are no longer visible at that point. Nevertheless, the catch clauses are still protected by the finally block. If a catch clause throws, or otherwise terminates, the finally block is still processed.

If a finally block throws an exception, then it is simply aborted; the remaining directives in that block are not processed.

So the success or failure of the try block depends on the behavior of the catch clause or the finally clause, if there is one. If either of them succeed, then the try block is considered a successful match.

Example:

code:
 @(try)
 @  (next "nonexistent-file")
 @  x
 @  (catch file-error)
 @a
 @(finally)
 @b
 @(end)
 @c
data:
 1
 2
 3
result:
 a="1"
 b="2"
 c="3"

Here, the try block's main clause is terminated abruptly by a file-error exception from the @(next) directive. This is handled by the catch clause, which binds variable a to the input line "1". Then the finally clause executes, binding b to "2". The try block then terminates successfully, and so @c takes "3".

 

7.8.5 catch Clauses with Parameters

A catch clause may have parameters following the type name, like this:


  @(catch pair (a b))

To write a catch-all with parameters, explicitly write the master supertype t:


  @(catch t (arg ...))

Parameters are useful in conjunction with throw. The built-in error exceptions carry one argument, which is a string containing the error message. Using throw, arbitrary parameters can be passed from the throw site to the catch site.

 

7.8.6 The throw directive

The throw directive generates an exception. A type must be specified, followed by optional arguments, which are bind expressions. For example,


  @(throw pair "a" `@file.txt`)

throws an exception of type pair, with two arguments, being "a" and the expansion of the quasiliteral `@file.txt`.

The selection of the target catch is performed purely using the type name; the parameters are not involved in the selection.

Binding takes place between the arguments given in throw and the target catch.

If any catch parameter, for which a throw argument is given, is a bound variable, it has to be identical to the argument, otherwise the catch fails. (Control still passes to the catch, but the catch is a failed match).

code:
 @(bind a "apple")
 @(try)
 @(throw e "banana")
 @(catch e (a))
 @(end)
result:
 [query fails]

If any argument is an unbound variable, the corresponding parameter in the catch is left alone: if it is an unbound variable, it remains unbound, and if it is bound, it stays as is.

code:
 @(try)
 @(trow e "honda" unbound)
 @(catch e (car1 car2))
 @car1 @car2
 @(end)
data:
 honda toyota
result:
 car1="honda"
 car2="toyota"

If a catch has fewer parameters than there are throw arguments, the excess arguments are ignored:

code:
 @(try)
 @(throw e "banana" "apple" "pear")
 @(catch e (fruit))
 @(end)
result:
 fruit="banana"

If a catch has more parameters than there are throw arguments, the excess parameters are left alone. They may be bound or unbound variables.

code:
 @(try)
 @(trow e "honda")
 @(catch e (car1 car2))
 @car1 @car2
 @(end)
data:
 honda toyota
result:
 car1="honda"
 car2="toyota"

A throw argument passing a value to a catch parameter which is unbound causes that parameter to be bound to that value.

throw arguments are evaluated in the context of the throw, and the bindings which are available there. Consideration of what parameters are bound is done in the context of the catch.

code:
 @(bind c "c")
 @(try)
 @(forget c)
 @(bind (a c) ("a" "lc"))
 @(throw e a c)
 @(catch e (b a))
 @(end)
result:
 c="c"
 b="a"
 a="lc"

In the above example, c has a toplevel binding to the string "c", but then becomes unbound via forget within the try construct, and rebound to the value "lc". Since the try construct is terminated by a throw, these modifications of the binding environment are discarded. Hence, at the end of the query, variable c ends up bound to the original value "c". The throw still takes place within the scope of the bindings set up by the try clause, so the values of a and c that are thrown are "a" and "lc". However, at the catch site, variable a does not have a binding. At that point, the binding to "a" established in the try has disappeared already. Being unbound, the catch parameter a can take whatever value the corresponding throw argument provides, so it ends up with "lc".

 

7.8.7 The defex directive

The defex directive allows the query writer to invent custom exception types, which are arranged in a type hierarchy (meaning that some exception types are considered subtypes of other types).

Subtyping means that if an exception type B is a subtype of A, then every exception of type B is also considered to be of type A. So a catch for type A will also catch exceptions of type B. Every type is a supertype of itself: an A is a kind of A. This of course implies that every type is a subtype of itself also. Furthermore, every type is a subtype of the type t, which has no supertype other than itself. Type nil is a subtype of every type, including itself. The subtyping relationship is also transitive: if A is a subtype of B, and B is a subtype of C, then A is a subtype of C.

defex may be invoked with no arguments, in which case it does nothing:


  @(defex)

It may be invoked with one argument, which must be a symbol. This introduces a new exception type. Strictly speaking, such an introduction is not necessary; any symbol may be used as an exception type without being introduced by @(defex):


  @(defex a)

Therefore, this also does nothing, other than document the intent to use a as an exception.

If two or more argument symbols are given, the symbols are all introduced as types, engaged in a subtype-supertype relationship from left to right. That is to say, the first (leftmost) symbol is a subtype of the next one, which is a subtype of the next one and so on. The last symbol, if it had not been already defined as a subtype of some type, becomes a direct subtype of the master supertype t. Example:


  @(defex d e)
  @(defex a b c d)

The first directive defines d as a subtype of e, and e as a subtype of t. The second defines a as a subtype of b, b as a subtype of c, and c as a subtype of d, which is already defined as a subtype of e. Thus a is now a subtype of e. The above can be condensed to:


  @(defex a b c d e)

Example:

code:
 @(defex gorilla ape primate)
 @(defex monkey primate)
 @(defex human primate)
 @(collect)
 @(try)
 @(skip)
 @(cases)
 gorilla @name
 @(throw gorilla name)
 @(or)
 monkey @name
 @(throw monkey name)
 @(or)
 human @name
 @(throw human name)
 @(end)@#cases
 @(catch primate (name))
 @kind @name
 @(output)
 we have a primate @name of kind @kind
 @(end)@#output
 @(end)@#try
 @(end)@#collect
data:
 gorilla joe
 human bob
 monkey alice
output:
 we have a primate joe of kind gorilla
 we have a primate bob of kind human
 we have a primate alice of kind monkey

Exception types have a pervasive scope. Once a type relationship is introduced, it is visible everywhere. Moreover, the defex directive is destructive, meaning that the supertype of a type can be redefined. This is necessary so that something like the following works right:


  @(defex gorilla ape)
  @(defex ape primate)

These directives are evaluated in sequence. So after the first one, the ape type has the type t as its immediate supertype. But in the second directive, ape appears again, and is assigned the primate supertype, while retaining gorilla as a subtype. This situation could be diagnosed as an error, forcing the programmer to reorder the statements, but instead TXR obliges. However, there are limitations. It is an error to define a subtype-supertype relationship between two types if they are already connected by such a relationship, directly or transitively. So the following definitions are in error:


  @(defex a b)
  @(defex b c)
  @(defex a c)@# error: a is already a subtype of c, through b


  @(defex x y)
  @(defex y x)@# error: circularity; y is already a supertype of x.

 

7.8.8 The assert directive

The assert directive requires the remaining query or sub-query which follows it to match. If the remainder fails to match, the assert directive throws an exception. If the directive is simply


  @(assert)

then it throws an exception of type assert, which is a subtype of error. The assert directive also takes arguments similar to the throw directive: an exception symbol and additional arguments which are bind expressions, and may be unbound variables. The following assert directive, if it triggers, will throw an exception of type foo, with arguments 1 and "2":


  @(assert foo 1 "2")

Example:


  @(collect)
  Important Header
  ----------------
  @(assert)
  Foo: @a, @b
  @(end)

Without the assertion in place, if the Foo: @a, @b part does not match, then the entire interior of the @(collect) clause fails, and the collect continues searching for another match.

With the assertion in place, if the text "Important Header" and its underline match, then the remainder of the collect body must match, otherwise an exception is thrown. Now the program will not silently skip over any Important Header sections due to a problem in its matching logic. This is particularly useful when the matching is varied with numerous cases, and they must all be handled.

There is a horizontal assert directive also. For instance:


  abc@(assert)d@x

asserts that if the prefix "abc" is matched, then it must be followed by a successful match for "d@x", or else an exception is thrown.

 

8 TXR LISP

The TXR language contains an embedded Lisp dialect called TXR Lisp.

This language is exposed in TXR in several ways.

Firstly, in any situation that calls for an expression, a Lisp expression can be used, if it is preceded by the @ character. The Lisp expression is evaluated and its value becomes the value of that expression. Thus, TXR directives are embedded in literal text using @, and Lisp expressions are embedded in directives using @ also.

Secondly, certain directives evaluate Lisp expressions without requiring @. These are @(do), @(require), @(assert), @(if) and @(next).

Thirdly, TXR Lisp code can be placed into files. On the command line, TXR treats files with a ".tl" suffix as TXR Lisp code, and the @(load) directive does also.

Lastly, TXR Lisp expressions can be evaluated via the command line, using the -e and -p options.

Examples:

Bind variable a to the integer 4:


  @(bind a @(+ 2 2))

Bind variable b to the standard input stream. Note that @ is not required on a Lisp variable:


  @(bind b *stdin*)

Define several Lisp functions inside @(do):


  @(do
    (defun add (x y) (+ x y))


    (defun occurs (item list)
      (cond ((null list) nil)
            ((atom list) (eql item list))
            (t (or (eq (first list) item)
                   (occurs item (rest list)))))))

Trigger a failure unless previously bound variable answer is greater than 42:


  @(require (> (int-str answer) 42))
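
The command-line evaluation mentioned above might look like the following hypothetical shell session; -p prints the value of the expression, whereas -e evaluates it without printing:


  $ txr -p '(+ 2 2)'
  4
  $ txr -e '(put-line "hello")'
  hello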

 

8.1 Overview

TXR Lisp is a small and simple dialect, like Scheme, but much more similar to Common Lisp than Scheme. It has separate value and function binding namespaces, like Common Lisp (and thus is a Lisp-2 type dialect), and represents Boolean true and false with the symbols t and nil (note the case sensitivity of identifiers denoting symbols!) Furthermore, the symbol nil is also the empty list, which terminates nonempty lists.

TXR Lisp has lexically scoped local variables and dynamic global variables, similarly to Common Lisp, including the convention that defvar marks symbols for dynamic binding in local scopes. Lexical closures are supported. TXR Lisp also supports global lexical variables via defvarl.

Functions are lexically scoped in TXR Lisp; they can be defined in the pervasive global environment using defun or in local scopes using flet and labels.

 

8.2 Additional Syntax

Much of the TXR Lisp syntax has been introduced in the previous sections of the manual, since directive forms are based on it. There is some additional syntax that is useful in TXR Lisp programming.

 

8.2.1 Symbol Tokens

The symbol token in TXR Lisp, called a lident (Lisp identifier), has a similar syntax to the bident (braced identifier) in the TXR pattern language. It may consist of all the same characters, as well as the / (slash) character, which may not be used in a bident. Thus a lident may consist of these characters, in addition to letters and numbers:


 ! $ % & * + - < = > ? \ _ ~ /

and of course, may not look like a number. A lone / is a symbol in TXR Lisp. The token /abc/ is also a symbol, and not a regular expression, like it is in the braced variable syntax. Within TXR Lisp, regular expressions are written with a leading #.
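
A couple of hedged illustrations of these token rules:


  '/abc/    ;; a symbol whose name is "/abc/", not a regular expression
  #/abc/    ;; a regular expression object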

 

8.2.2 Consing Dot

Unlike other major Lisp dialects, TXR Lisp allows a consing dot with no forms preceding it. This construct simply denotes the form which follows the dot. That is to say, the parser implements the following transformation:


  (. expr) -> expr

This is convenient in writing function argument lists that only take variable arguments. Instead of the syntax:


  (defun fun args ...)

the following syntax can be used:


  (defun fun (. args) ...)
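
For example, the following hypothetical function (the name keep-args is invented for the example) simply collects all of its arguments:


  (defun keep-args (. args)
    args)

  (keep-args 1 2 3)  ;; -> (1 2 3)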

When a lambda form is printed, it is printed in the following style.


  (lambda nil ...) -> (lambda () ...)
  (lambda sym ...) -> (lambda (. sym) ...)
  (lambda (sym) ...) -> (lambda (sym) ...)

In no other circumstances is nil printed as (), or an atom sym as (. sym).

 

8.2.3 Referencing Dot

A dot token which is flanked by expressions on both sides, without any intervening whitespace, is the referencing dot, and not the consing dot. The referencing dot is a syntactic sugar which translates to the qref syntax ("quoted ref"). This syntax denotes structure access; see Structures.


  ;; a, b, c may be almost any expressions
  a.b           <-->  (qref a b)
  a.b.c         <-->  (qref a b c)
  a.(qref b c)  <-->  (qref a b c)
  (qref a b).c  <-->  (qref (qref a b) c)

That is to say, this dot operator constructs a qref expression out of its left and right arguments. If the right argument of the dot is already a qref expression (whether produced by another instance of the dot operator, or expressed directly), it is merged. And the qref dot operator is right-to-left associative, so that a.b.c first produces (qref b c) via the right dot, and then a is adjoined into the syntax via the left dot.

Integer tokens cannot be involved in this syntax, because they form floating-point constants when juxtaposed with a dot. Such ambiguous uses of floating-point tokens are diagnosed as syntax errors:


  (a.4)   ;; error: cramped floating-point literal
  (a .4)  ;; good: a followed by 0.4

 

8.2.4 Quote and Quasiquote

'expr

The quote character in front of an expression is used for suppressing evaluation, which is useful for forms that evaluate to something other than themselves. For instance if '(+ 2 2) is evaluated, the value is the three-element list (+ 2 2), whereas if (+ 2 2) is evaluated, the value is 4. Similarly, the value of 'a is the symbol a itself, whereas the value of a is the contents of the variable a.

^qq-template

The caret in front of an expression is a quasiquote. A quasiquote is like a quote, but with the possibility of substitution of material.

Under a quasiquote, form is considered to be a quasiquote template. The template is considered to be a literal structure, except that it may contain the notations ,expr and ,*expr which denote non-constant parts.

A quasiquote gets translated into code which, when evaluated, constructs the structure implied by qq-template, taking into account the unquotes and splices.

A quasiquote also processes nested quasiquotes specially.

If qq-template does not contain any unquotes or splices (which match its level of nesting), or is simply an atom, then ^qq-template is equivalent to 'qq-template. In other words, it is like an ordinary quote. For instance ^(a b ^(c ,d)) is equivalent to '(a b ^(c ,d)). Although there is an unquote ,d it belongs to the inner quasiquote ^(c ,d), and the outer quasiquote does not have any unquotes of its own, making it equivalent to a quote.

Dialect note: in Common Lisp and Scheme, ^form is written `form, and quasiquotes are also informally known as backquotes. In TXR, the backquote character ` is used for quasi string literals.

,expr

The comma character is used within a qq-template to denote an unquote. Whereas the quasiquote suppresses evaluation, similarly to the quote, the comma introduces an exception: an element of a form which is evaluated. For example, the value of ^(a b c ,(+ 2 2) (+ 2 2)) is the list (a b c 4 (+ 2 2)). Everything in the quasiquote stands for itself, except for the ,(+ 2 2) which is evaluated.

Note: if a variable is called *x*, then the syntax ,*x* means ,* x*: splice the value of x*. In this situation, whitespace between the comma and the variable name should be used: , *x*.

,*expr

The comma-star operator is used within a quasiquote list to denote a splicing unquote. The form which follows ,* must evaluate to a list. That list is spliced into the structure which the quasiquote denotes. For example: ^(a b c ,*(list (+ 3 3) (+ 4 4)) d) evaluates to (a b c 6 8 d). The expression (list (+ 3 3) (+ 4 4)) is evaluated to produce the list (6 8), and this list is spliced into the quoted template.

Dialect note: in other Lisp dialects, the equivalent syntax is usually ,@ (comma at). The @ character already has an assigned meaning, so * is used.

 

8.2.5 Quasiquoting non-List Objects

Quasiquoting is supported over hash table and vector literals (see Vectors and Hashes below). A hash table or vector literal can be quoted, like any object, for instance:


  '#(1 2 3)

The #(1 2 3) literal is turned into a vector atom right in the TXR parser, and this atom is being quoted: this is (quote atom) syntactically, which evaluates to atom.

When a vector is quasi-quoted, this is a case of ^atom which evaluates to atom.

A vector can be quasiquoted, for example:


  ^#(1 2 3)

Of course, unquotes can occur within it.


  (let ((a 42))
    ^#(1 ,a 3)) ; value is #(1 42 3)

In this situation, the ^#(...) notation produces code which constructs a vector.

The vector in the following example is also a quasivector. It contains unquotes, and though the quasiquote is not directly applied to it, it is embedded in a quasiquote:


  (let ((a 42))
    ^(a b c #(d ,a))) ; value is (a b c #(d 42))

Hash table literals have two parts: the list of hash construction arguments and the key-value pairs. For instance:


   #H((:equal-based) (a 1) (b 2))

where (:equal-based) is the list of construction arguments and the pairs (a 1) and (b 2) are the key/value entries. Hash literals may be quasiquoted. In quasiquoting, the arguments and pairs are treated as separate syntax; it is not one big list. So the following is not a possible way to express the above hash:


  ;; not supported: splicing across the entire syntax
  (let ((hash-syntax '((:equal-based) (a 1) (b 2))))
    ^#H(,*hash-syntax))

This is correct:


  ;; fine: splicing hash arguments and contents separately
  (let ((hash-args '(:equal-based))
        (hash-contents '((a 1) (b 2))))
    ^#H(,hash-args ,*hash-contents))

 

8.2.6 Quasiquoting combined with Quasiliterals

When a quasiliteral is embedded in a quasiquote, it is possible to use splicing to insert material into the quasiliteral.

Example:


  (eval (let ((a 3)) ^`abc @,a @{,a} @{(list 1 2 ,a)}`))


  -> "abc 3 3 1 2 3"

 

8.2.7 Vector Literals

#(...)

A hash token followed by a list denotes a vector. For example #(1 2 a) is a three-element vector containing the numbers 1 and 2, and the symbol a.

 

8.2.8 Struct Literals

#S(name {slot value}*)

The notation #S followed by a nested list syntax denotes a struct literal. The first item in the syntax is a symbol denoting the struct type name. This must be the name of a struct type, otherwise the literal is erroneous. Following the struct type name are slot names interleaved with their values. Each slot name which is present in the literal must name a slot in the struct type, though not all slots in the struct type must be present in the literal. When a struct literal is read, an instance of the denoted struct type is constructed as if by a call to make-struct whose plist argument is formed from the slot and value elements of the literal, individually quoted to suppress their evaluation as forms.
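
A brief sketch, assuming a hypothetical struct type point with slots x and y:


  (defstruct point nil
    x y)

  #S(point x 1 y 2)  ;; an instance of point whose x slot is 1 and y slot is 2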

 

8.2.9 Hash Literals

#H((hash-argument*) (key value)*)

The notation #H followed by a nested list syntax denotes a hash table literal. The first item in the syntax is a list of keywords. These are the same keywords as are used when calling the function hash to construct a hash table. Allowed keywords are: :equal-based, :weak-keys and :weak-values. An empty list can be specified as nil or (), which defaults to a hash table based on the eql function, with no weak semantics.
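
Two small illustrations of the notation:


  #H(() (a 1) (b 2))          ;; eql-based hash table with keys a and b
  #H((:equal-based) ("x" 1))  ;; equal-based hash table keyed on the string "x"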

 

8.2.10 Range Literals

#R(from to)

The notation #R followed by a two-element list syntax denotes a range literal.

 

8.2.11 The .. notation

In TXR Lisp, there is a special "dotdot" notation consisting of a pair of dots. This can be written between successive atoms or compound expressions, and is a shorthand for rcons.

That is to say, A .. B translates to (rcons A B), and so for instance (a b .. (c d) e .. f . g) means (a (rcons b (c d)) (rcons e f) . g).

The rcons function constructs a range object, which denotes a pair of values. Range objects are most commonly used for referencing subranges of sequences.

For instance, if L is a list, then [L 1 .. 3] computes a sublist of L consisting of elements 1 through 2 (counting from zero).
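
A small sketch of this usage (the variable l is invented for the example):


  (let ((l '(a b c d e)))
    [l 1 .. 3])  ;; -> (b c)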

Note that if this notation is used in the dot position of an improper list, the transformation still applies. That is, the syntax (a . b .. c) is valid and produces the object (a . (rcons b c)) which is another way of writing (a rcons b c), which is quite probably nonsense.

The notation's .. operator associates right to left, so that a..b..c denotes (rcons a (rcons b c)).

Note that range objects are not printed using the dotdot notation. A range literal has the syntax of a two-element list, prefixed by #R. (See Range Literals above).

In any context where the dotdot notation may be used, and where it is evaluated to its value, a range literal may also be specified. If an evaluated dotdot notation specifies two constant expressions, then an equivalent range literal can replace it. For instance the form [L 1 .. 3] can also be written [L #R(1 3)]. The two are syntactically different, and so if these expressions are being considered for their syntax rather than value, they are not the same.

 

8.2.12 The DWIM Brackets

TXR Lisp has a square bracket notation. The syntax [...] is a shorthand way of writing (dwim ...). The [] syntax is useful for situations where the expressive style of a Lisp-1 dialect is useful.

For instance if foo is a variable which holds a function object, then [foo 3] can be used to call it, instead of (call foo 3). If foo is a vector, then [foo 3] retrieves the fourth element, like (vecref foo 3). Indexing over lists, strings and hash tables is possible, and the notation is assignable.

Furthermore, any arguments enclosed in [] which are symbols are treated according to a modified namespace lookup rule.

More details are given in the documentation for the dwim operator.
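
A hedged sketch (the names sq and vec are invented for the example):


  (let ((sq (lambda (x) (* x x)))
        (vec #(10 20 30)))
    (list [sq 3] [vec 1]))  ;; -> (9 20)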

 

8.2.13 Compound Forms

In TXR Lisp, there are two types of compound forms: Lisp-2 style compound forms, denoted by ordinary lists expressed with parentheses, and Lisp-1 style compound forms, denoted by the DWIM brackets described in the previous section.

The first position of an ordinary Lisp-2 style compound form is expected to have a function or operator name. Then arguments follow. There may also be an expression in the dotted position, if the form is a function call.

If the form is a function call then the arguments are evaluated. If any of the arguments are symbols, they are treated according to Lisp-2 namespacing rules.

 

8.2.14 Dot Position in Function Calls

If there is an expression in the dotted position of a function call expression, it is also evaluated, and the resulting value is involved in the function call in a special way.

Firstly, note that a compound form cannot be used in the dot position, for obvious reasons, namely that (a b c . (foo z)) does not mean that there is a compound form in the dot position, but denotes an alternate spelling for (a b c foo z), where foo behaves as a variable. (There exists a special exception to this, namely that the meta-numbers and meta-symbols of the op operator can be used in the dot position).

If the dot position of a compound form is an atom, then the behavior may be understood according to the following transformations:


  (f a b c ... . x)  -->  (apply (fun f) a b c ... x)
  [f a b c ... . x]  -->  [apply f a b c ... x]

Effectively, the dot notation constitutes a shorthand for apply.

Examples:


  ;; a contains 3
  ;; b contains 4
  ;; c contains #(5 6 7)
  ;; s contains "xyz"


  (foo a b . c)  ;; calls (foo 3 4 5 6 7)
  (foo a)        ;; calls (foo 3)
  (foo . s)      ;; calls (foo #\x #\y #\z)


  (list . a)     ;; yields 3
  (list a . b)   ;; yields (3 . 4)
  (list a . c)   ;; yields (3 5 6 7)
  (list* a c)    ;; yields (3 . #(5 6 7))


  (cons a . b)   ;; error: cons isn't variadic.
  (cons a b . c) ;; error: cons requires exactly two arguments.


  [foo a b . c]  ;; calls (foo 3 4 5 6 7)


  [c 1]          ;; indexes into vector #(5 6 7) to yield 6


  (call (op list 1 . @1) 2) ;; yields 2

Note that the atom in the dot position of a function call may be a symbol macro. Since the semantics works as if by transformation to an apply form in which the original dot position atom is an ordinary argument, the symbol macro may produce a compound form.

Thus:


  (symacrolet ((x 2))
    (list 1 . x))  ;; yields (1 . 2)


  (symacrolet ((x (list 1 2)))
    (list 1 . x))  ;; yields (1 1 2)

That is to say, the expansion of x is not substituted into the form (list 1 . x) but rather the transformation to apply syntax takes place first, and so the substitution of x takes place in a form resembling (apply (fun list) 1 x).

Dialect Note:

In some other Lisp dialects like ANSI Common Lisp, the improper list syntax may not be used as a function call; a function called apply (or similar) must be used for application even if the expression which gives the trailing arguments is a symbol. Moreover, applying sequences other than lists is not supported.

 

8.2.15 Improper Lists as Macro Calls

TXR Lisp allows macros to be called using forms which are improper lists. These forms are simply destructured by the usual macro parameter list destructuring. To be callable this way, the macro must have an argument list which specifies a parameter match in the dot position. This dot position must either match the terminating atom of the improper list form, or else match the trailing portion of the improper list form.

For instance if a macro mac is defined as


  (defmacro mac (a b . c) ...)

then it may not be invoked as (mac 1 . 2) because the required argument b is not satisfied, and so the 2 argument cannot match the dot position c as required. The macro may be called as (mac 1 2 . 3) in which case c receives the form 3. If it is called as (mac 1 2 3 . 4) then c receives the improper list form 3 . 4.

 

8.2.16 Regular Expression Literals

In TXR Lisp, the / character can occur in symbol names, and the / token is a symbol. Therefore the /regex/ syntax is not used for denoting regular expressions; rather, the #/regex/ syntax is used.

 

8.3 Generalization of List Accessors

In ancient Lisp in the 1960's, it was not possible to apply the operations car and cdr to the nil symbol (empty list), because it is not a cons cell. In the InterLisp dialect, this restriction was lifted: these operations were extended to accept nil (and return nil). The convention was adopted in other Lisp dialects such as MacLisp and eventually in Common Lisp. Thus there exists an object which is not a cons, yet which takes car and cdr.

In TXR Lisp, this relaxation is extended further. For the sake of convenience, the operations car and cdr are made to work with strings and vectors:


  (cdr "") -> nil
  (car "") -> nil


  (car "abc") -> #\a
  (cdr "abc") -> "bc"


  (cdr #(1 2 3)) -> #(2 3)
  (car #(1 2 3)) -> 1

Moreover, structure types which define the methods car, cdr and nullify can also be treated in the same way.

The ldiff function is also extended in a special way. When the right parameter is a non-list sequence, it uses the equal equality test rather than eq for detecting the tail of the list.


  (ldiff "abcd" "cd") -> (#\a #\b)

The ldiff operation starts with "abcd" and repeatedly applies cdr to produce "bcd" and "cd", until the suffix is equal to the second argument: (equal "cd" "cd") yields true.

Operations based on car, cdr and ldiff, such as keep-if and remq extend to strings and vectors.
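
The following sketch illustrates this extension, using the standard character predicate chr-isdigit:


  (keep-if (fun chr-isdigit) "a1b2c3")  ;; -> "123"
  [keep-if chr-isdigit "a1b2c3"]        ;; -> "123", via the DWIM brackets
  (remq #\a "banana")                   ;; -> "bnn"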

Most derived list processing operations such as remq or mapcar obey the following rule: the returned object follows the type of the leftmost input list object. For instance, if one or more sequences are processed by mapcar, and the leftmost one is a character string, the function is expected to return characters, which are converted to a character string. However, in the event that the objects produced cannot be assembled into that type of sequence, a list is returned instead.

For example [mapcar list "ab" "12"] returns ((#\a #\b) (#\1 #\2)), because a string cannot hold lists of characters. However [mappend list "ab" "12"] returns "a1b2".

The lazy versions of these functions such as mapcar* do not have this behavior; they produce lazy lists.

 

8.4 Callable Objects

In TXR Lisp, sequences (strings, vectors and lists) as well as hashes and regular expressions can be used as functions everywhere, not just with the DWIM brackets.

Sequences work as one or two-argument functions. With a single argument, an element is selected by position and returned. With two arguments, a range is extracted and returned.

Moreover, when a sequence is used as a function of one argument, and the argument is a range object rather than an integer, then the call is equivalent to the two-argument form. This is the basis for array slice syntax like ["abc" 0..1] .

Hashes also work as one or two argument functions, corresponding to the arguments of the gethash function.

A regular expression behaves as a one, two, or three argument function, which operates on a string argument. It returns the leftmost matching substring, or else nil.

Example 1:


  (mapcar "abc" '(2 0 1)) -> (#\c #\a #\b)

Here, mapcar treats the string "abc" as a function of one argument (since there is one list argument). This function maps the indices 0, 1 and 2 to the corresponding characters of string "abc". Through this function, the list of integer indices (2 0 1) is taken to the list of characters (#\c #\a #\b).

Example 2:


  (call '(1 2 3 4) 1..3) -> (2 3)

Here, the shorthand 1 .. 3 denotes (rcons 1 3). A range used as an argument to a sequence performs range extraction: taking a slice starting at index 1, up to and not including index 3, as if by the call (sub '(1 2 3 4) 1 3).

Example 3:


  (call '(1 2 3 4) '(0 2)) -> (1 2)

A list of indices applied to a sequence is equivalent to using the select function, as if (select '(1 2 3 4) '(0 2)) were called.

Example 4:


  (call #/b./ "abcd") -> "bc"

Here, the regular expression, called as a function, finds the matching substring "bc" within the argument "abcd".

 

8.5 Special Variables

Similarly to Common Lisp, TXR Lisp is lexically scoped by default, but also has dynamically scoped (a.k.a "special") variables.

When a variable is defined with defvar or defparm, a binding for the symbol is introduced in the global name space, regardless of in what scope the defvar form occurs.

Furthermore, at the time the defvar form is evaluated, the symbol which names the variable is tagged as special.

When a symbol is tagged as special, it behaves differently when it is used in a lexical binding construct like let, and all other such constructs such as function parameter lists. Such a binding is not the usual lexical binding, but a "rebinding" of the global variable. Over the dynamic scope of the form, the global variable takes on the value given to it by the rebinding. When the form terminates, the prior value of the variable is restored. (This is true no matter how the form terminates; even if by an exception.)

Because of this "pervasive special" behavior of a symbol that has been used as the name of a global variable, a good practice is to make global variables have visually distinct names via the "earmuffs" convention: beginning and ending the name with an asterisk.

Example:


  (defvar *x* 42)     ;; *x* has a value of 42


  (defun print-x ()
    (format t "~a\n" *x*))


  (let ((*x* "abc"))  ;; this overrides *x*
    (print-x))        ;; *x* is now "abc" and so that is printed


  (print-x)           ;; *x* is 42 again and so "42" is printed

Dialect Note 1:

The terms bind and binding are used differently in TXR Lisp compared to ANSI Common Lisp. In TXR Lisp binding is an association between a symbol and an abstract storage location. The association is registered in some namespace, such as the global namespace or a lexical scope. That storage location, in turn, contains a value. In ANSI Lisp, a binding of a dynamic variable is the association between the symbol and a value. It is possible for a dynamic variable to exist, and not have a value. A value can be assigned, which creates a binding. In TXR Lisp, an assignment is an operation which transfers a value into a binding, not one which creates a binding.

In ANSI Lisp, a dynamic variable can exist which has no value. Accessing the value signals a condition, but storing a value is permitted; doing so creates a binding. By contrast, in TXR Lisp a global variable cannot exist without a value. If a defvar form doesn't specify a value, and the variable doesn't exist, it is created with a value of nil.

Dialect Note 2:

Unlike ANSI Common Lisp, TXR Lisp has global lexical variables in addition to special variables. These are defined using defvarl and defparml. The only difference is that when variables are introduced by these macros, the symbols are not marked special, so their binding in lexical scopes is not altered to dynamic binding.

Many variables in TXR Lisp's standard library are global lexicals. Those which are special variables obey the "earmuffs" convention in their naming. For instance s-ifmt, log-emerg and sig-hup are global lexicals, because they provide constant values for which overriding doesn't make sense. On the other hand the standard output stream variable *stdout* is special. Overriding it over a dynamic scope is very useful.
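
The following sketch, using hypothetical variable names, illustrates the difference in rebinding behavior:


  (defvarl g 1)    ;; global lexical
  (defvar *s* 2)   ;; special (dynamic)


  (defun report () (list g *s*))


  (let ((g 10) (*s* 20))
    (report))      ;; -> (1 20): the binding of g in the let is purely
                   ;; lexical and invisible to report, whereas *s* is
                   ;; dynamically rebound over the extent of the let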

Dialect Note 3:

In Common Lisp, defparm is known as defparameter.

 

8.6 Syntactic Places and Accessors

The TXR Lisp feature known as syntactic places allows programs to use the syntax of a form which is used to access a value from an environment or object, as an expression which denotes a place where a value may be stored.

They are almost exactly the same concept as "generalized references" in Common Lisp, and are related to "lvalues" in languages in the C family, or "designators" in Pascal.

 

8.6.1 Symbolic Places

A symbol is a syntactic place if it names a variable. If a is a variable, then it may be assigned using the set operator: the form (set a 42) causes a to have the integer value 42.

 

8.6.2 Compound Places

A compound expression can be a syntactic place, if its leftmost constituent is a symbol which is specially registered, and if the form has the correct syntax for that kind of place, and suitable semantics. Such an expression is a compound place.

An example of a compound place is a car form. If c is an expression denoting a cons cell, then (car c) is not only an expression which retrieves the value of the car field of the cell. It is also a syntactic place which denotes that field as a storage location. Consequently, the expression (set (car c) "abc") stores the character string "abc" in that location. Although the same effect can be obtained with (rplaca c "abc") the syntactic place frees the programmer from having to remember different update functions for different kinds of places. There are various other advantages. TXR Lisp provides a plethora of operators for modifying a place in addition to set. Subject to certain usage restrictions, these operators work uniformly on all places. For instance, the expression (rotate (car x) [str 3] y) causes three different kinds of places to exchange contents, while the three expressions denoting those places are evaluated only once. New kinds of place update macros like rotate are quite easily defined, as are new kinds of compound places.
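
A brief sketch of these ideas:


  (defvar c (cons 1 2))
  (set (car c) "abc")     ;; c is now ("abc" . 2)


  (defvar p (list 'a 'b))
  (defvar q (list 'x 'y))
  (swap (car p) (car q))  ;; p is now (x b); q is now (a y)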

 

8.6.3 Accessor Functions

When a function call form such as the above (car x) is a syntactic place, then the function is called an accessor. This term is used throughout this document to denote functions which have associated syntactic places.

 

8.6.4 Macro Call Syntactic Places

Syntactic places can be macros (global and lexical), including symbol macros. So for instance in (set x 42) the x place can actually be a symbol macro which expands to, say, (cdr y). This means that the assignment is effectively (set (cdr y) 42).
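
A small sketch of a symbol macro serving as a place:


  (let ((y (cons 1 2)))
    (symacrolet ((x (cdr y)))
      (set x 42)   ;; effectively (set (cdr y) 42)
      y))          ;; -> (1 . 42)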

 

8.6.5 User-Defined Syntactic Places and Place Operators

Syntactic places, as well as operators upon syntactic places, are both open-ended. Code can be written quite easily in TXR Lisp to introduce new kinds of places, as well as new place-mutating operators. New places can be introduced with the help of the defplace macro, or possibly the define-place-macro macro in simple cases when a new syntactic place can be expressed as a transformation to the syntax of an existing place. Three ways exist for developing new place update macros (place operators). They can be written using the ordinary macro definer defmacro, with the help of special utility macros called with-update-expander, with-clobber-expander, and with-delete-expander. They can also be written using defmacro in conjunction with the operators placelet or placelet*. Simple update macros similar to inc and push can be written compactly using define-modify-macro.

 

8.6.6 Deletable Places

Unlike generalized references in Common Lisp, TXR Lisp syntactic places support the concept of deletion. Some kinds of places can be deleted, which is an action distinct from (but does not preclude) being overwritten with a value. What exactly it means for a place to be deleted, or whether that is even permitted, depends on the kind of place. For instance a place which denotes a lexical variable may not be deleted, whereas a global variable may be. A place which denotes a hash table entry may be deleted, and results in the entry being removed from the hash table. Deleting a place in a list causes the trailing items, if any, or else the terminating atom, to move in to close the gap. Users may, of course, define new kinds of places which support deletion semantics.
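
A sketch of deletion, using a hash table entry and a list element:


  (defvar h (hash))
  (set (gethash h 'a) 1)
  (del (gethash h 'a))  ;; -> 1; the entry for a is removed from h


  (defvar lst (list 1 2 3))
  (del [lst 0])         ;; -> 1; lst is now (2 3)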

 

8.6.7 Evaluation of Places

To bring about their effect, place operators must evaluate one or more places. Moreover, some of them evaluate additional forms which are not places. Which arguments of a place operator form are places and which are ordinary forms depends on its specific syntax. For all the built-in place operators, the position of an argument in the syntax determines whether it is treated as (and consequently required to be) a syntactic place, or whether it is an ordinary form.

All built-in place operators perform the evaluation of place and non-place argument forms in strict left to right order.

Place forms are evaluated not in order to compute a value, but in order to determine the storage location. In addition to determining a storage location, the evaluation of a place form may possibly give rise to side effects. Once a place is fully evaluated, the storage location can then be accessed. Access to the storage location is not considered part of the evaluation of a place. To determine a storage location means to compute some hidden referential object which provides subsequent access to that location without the need for a re-evaluation of the original place form. (The subsequent access to the place through this referential object may still require a multi-step traversal of a data structure; minimizing such steps is a matter of optimization.)

Place forms may themselves be compounds, which contain subexpressions that must be evaluated. All such evaluation for the built-in places takes place in left to right order.

Certain place operators, such as shift and rotate, exhibit an unspecified behavior with regard to the timing of the access of the prior value of a place, relative to the evaluation of places which occur later in the same place operator form. Access to the prior values may be delayed until the entire form is evaluated, or it may be interleaved into the evaluation of the form. For example, in the form (shift a b c 1), the prior value of a can be accessed and saved as soon as a is evaluated, prior to the evaluation of b. Alternatively, a may be accessed and saved later, after the evaluation of b or after the evaluation of all the forms. This issue affects the behavior of place-modifying forms whose subforms contain side effects. It is recommended that such forms not be used in programs.

 

8.6.8 Nested Places

Certain place forms are required to have one or more arguments which are themselves places. The prime example of this, and the only example from among built-in syntactic places, is the DWIM form. A DWIM form has the syntax


  (dwim obj-place index [alt])

and of course the square-bracket-notation equivalent:


  [obj-place index [alt]]

Note that not only is the entire form a place, denoting some element or element range of obj-place, but there is the added constraint that obj-place must also itself be a syntactic place.

This requirement is necessary, because it supports the behavior that when the element or element range is updated, then obj-place is also potentially updated.

After the assignment (set [obj 0..3] '("forty" "two")) not only is the range of places denoted by [obj 0..3] replaced by the list of strings ("forty" "two") but obj may also be overwritten with a new value.

This behavior is necessary because the DWIM brackets notation maintains the illusion of an encapsulated array-like container over several dissimilar types, including Lisp lists. But Lisp lists do not behave as fully encapsulated containers. Some mutations on Lisp lists return new objects, which then have to be stored (or otherwise accepted) in place of the original objects in order to maintain the array-like container illusion.
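
A concrete sketch of the assignment described above:


  (defvar obj (list 1 2 3 4 5))
  (set [obj 0..3] '("forty" "two"))
  obj  ;; -> ("forty" "two" 4 5)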

 

8.6.9 Built-In Syntactic Places

The following is a summary of the built-in place forms, in addition to symbolic places denoting variables. Of course, new syntactic place forms can be defined by TXR programs.


  (car object)
  (first object)
  (rest object)
  (second object)
  (third object)
  ...
  (tenth object)
  (cdr object)
  (caar object)
  (cadr object)
  (cdar object)
  (cddr object)
  ...
  (cdddddr object)
  (nthcdr index list)
  (rest object)
  (vecref vec idx)
  (chr-str str idx)
  (gethash hash key [alt])
  (dwim obj-place index [alt])
  [obj-place index [alt]] ;; equivalent to dwim
  (symbol-value symbol-valued-form)
  (symbol-function symbol-valued-form)
  (symbol-macro symbol-valued-form)
  (fun function-name)
  (force promise)
  (errno)

 

8.6.10 Built-In Place-Mutating Operators

The following is a summary of the built-in place mutating macros. They are described in detail in their own sections.

(set {place new-value}*)
Assigns the values of expressions to places, performing assignments in left to right order, returning the value assigned to the rightmost place.

(pset {place new-value}*)
Assigns the values of expressions to places, performing the determination of places and evaluation of the expressions left to right, but the assignment in parallel. Returns the value assigned to the rightmost place.

(zap place [new-value])
Assigns new-value to place, defaulting to nil, and returns the prior value.

(flip place)
Logically toggles the Boolean value of place, and returns the new value.

(test-set place)
If place contains nil, stores t into the place and returns t to indicate that the store took place. Otherwise does nothing and returns nil.

(test-clear place)
If place contains a Boolean true value, stores nil into the place and returns t to indicate that the store took place. Otherwise does nothing and returns nil.

(compare-swap place cmp-fun cmp-val store-val)
Examines the value of place and compares it to cmp-val using the comparison function given by the function name cmp-fun. If the comparison is false, returns nil. Otherwise, stores the store-val value into place and returns t.

(inc place [delta])
Increments place by delta, which defaults to 1, and returns the new value.

(dec place [delta])
Decrements place by delta, which defaults to 1, and returns the new value.

(pinc place [delta])
Increments place by delta, which defaults to 1, and returns the old value.

(pdec place [delta])
Decrements place by delta, which defaults to 1, and returns the old value.

(test-inc place [delta [from-val]])
Increments place by delta and returns t if the previous value was eql to from-val, where delta defaults to 1 and from-val defaults to zero.

(test-dec place [delta [to-val]])
Decrements place by delta and returns t if the new value is eql to to-val, where delta defaults to 1 and to-val defaults to 0.

(swap left-place right-place)
Exchanges the values of left-place and right-place.

(push item place)
Pushes item into the list stored in place and returns item.

(pop place)
Pops the list stored in place and returns the popped value.

(shift place+ shift-in-value)
Treats one or more places as a "multi-place shift register". Values are shifted to the left among the places. The rightmost place receives shift-in-value, and the value of the leftmost place emerges as the return value.

(rotate place*)
Treats zero or more places as a "multi-place rotate register". The places exchange values among themselves, by a rotation by one place to the left. The value of the leftmost place goes to the rightmost place, and that value is returned.

(del place)
Deletes a place which supports deletion, and returns the value which existed in that place prior to deletion.

(lset {place}+ list-expr)
Sets multiple places to values obtained from successive elements of the sequence produced by list-expr.

(upd place opip-arg*)
Applies an opip-style operational pipeline to the value of place and stores the result back into place.
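
A brief sketch exercising a few of these operators:


  (defvar n 0)
  (inc n 5)        ;; -> 5; n is now 5
  (pinc n)         ;; -> 5; n is now 6


  (defvar stack nil)
  (push 'a stack)  ;; -> a; stack is now (a)
  (pop stack)      ;; -> a; stack is now nil


  (defvar x 1)
  (defvar y 2)
  (swap x y)       ;; x is now 2 and y is now 1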

 

8.7 Namespaces and Environments

TXR Lisp is a Lisp-2 dialect: it features separate namespaces for functions and variables.

 

8.7.1 Global Functions and Operator Macros

In TXR Lisp, global functions and operator macros co-exist, meaning that the same symbol can be defined as both a macro and a function.

There is a global namespace for functions, into which functions can be introduced with the defun macro. The global function environment can be inspected and modified using the symbol-function accessor.

There is a global namespace for macros, into which macros are introduced with the defmacro macro. The global macro environment can be inspected and modified using the symbol-macro accessor.

If a name x is defined as both a function and a macro, then an expression of the form (x ...) is expanded by the macro, whereas an expression of the form [x ...] refers to the function. Moreover, the macro can produce a call to the function. The expression (fun x) will retrieve the function object.
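
A small sketch of this coexistence, using a hypothetical name foo:


  (defun foo (x) (* x 2))
  (defmacro foo (x) ^(+ ,x 1))


  (foo 3)             ;; macro call; expands to (+ 3 1) -> 4
  [foo 3]             ;; function call -> 6
  (call (fun foo) 3)  ;; function call -> 6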

 

8.7.2 Global and Dynamic Variables

There is a global namespace for variables also. The operators defvar and defparm introduce bindings into this namespace. These operators have the side effect of marking a symbol as a special variable; bindings of such a symbol are treated as dynamic variables, subject to rebinding. The global variable namespace together with the special dynamic rebinding is called the dynamic environment. The dynamic environment can be inspected and modified using the symbol-value accessor.

The operators defvarl and defparml introduce bindings into the global namespace without marking symbols as special variables. Such bindings are called global lexical variables.

 

8.7.3 Global Symbol Macros

Symbol macros may be defined over the global variable namespace using defsymacro.

 

8.7.4 Lexical Environments

In addition to global and dynamic namespaces, TXR Lisp provides lexically scoped binding for functions, variables, macros, and symbol macros. Lexical variable bindings are introduced with let, let* or various binding macros derived from these. Lexical functions are bound with flet and labels. Lexical macros are established with macrolet and lexical symbol macros with symacrolet.

Macros receive an environment parameter with which they may expand forms in their correct environment, and perform some limited introspection over that environment in order to determine the nature of bindings, or the classification of forms in those environments. This introspection is provided by lexical-var-p, lexical-fun-p, and lexical-lisp1-binding.

Lexical operator macros and lexical functions can also co-exist in the following way. A lexical function shadows a global or lexical macro completely. However, the reverse is not the case. A lexical macro shadows only those uses of a function which look like macro calls. This is succinctly demonstrated by the following form:


  (flet ((foo () 43))
    (macrolet ((foo () 44))
      (list (fun foo) (foo) [foo])))


  -> (#<interpreted fun: lambda nil> 44 43)

The (fun foo) and [foo] expressions are oblivious to the macro; the macro expansion process leaves the symbol foo alone in those contexts. However the form (foo) is subject to macro expansion and is replaced with 44.

If the flet and macrolet are reversed, the behavior is different:


  (macrolet ((foo () 44))
    (flet ((foo () 43))
      (list (fun foo) (foo) [foo])))


  -> (#<interpreted fun: lambda nil> 43 43)

All three forms refer to the function, which lexically shadows the macro.

 

8.7.5 Pattern Language and Lisp Scope Nesting

TXR Lisp expressions can be embedded in the TXR pattern language in various ways. Likewise, the pattern language can be invoked from TXR Lisp. This brings about the possibility that Lisp code attempts to access pattern variables bound in the pattern language. The TXR pattern language can also attempt to access TXR Lisp variables.

The rules are as follows, but they have undergone historic changes. See the COMPATIBILITY section, in particular notes under 138 and 121, and also 124.

A Lisp expression evaluated from the TXR pattern language executes in a null lexical environment. The current set of pattern variables captured up to that point by the pattern language are installed as dynamic variables. They shadow any Lisp global variables (whether those are defined by defvar or defvarl).

In the reverse direction, a variable reference from the TXR pattern language searches the pattern variable space first. If a variable doesn't exist there, then the lookup refers to the TXR Lisp global variable space. The pattern language doesn't see Lisp lexical variables.

When Lisp code is evaluated from the pattern language, the pattern variable bindings are not only installed as dynamic variables for the sake of their visibility from Lisp, but they are also specially stored in a dynamic environment frame. When TXR pattern code is re-entered from Lisp, these bindings are picked up from the closest such environment frame, allowing the nested invocation of pattern code to continue with the bindings captured by outer pattern code.

Concisely, in any context in which a symbol has both a binding as a Lisp global variable as well as a pattern variable, that symbol refers to the pattern variable. Pattern variables are propagated through Lisp evaluation into nested invocations of the pattern language.

The pattern language can also reference Lisp variables using the @ prefix, which is a consequence of that prefix introducing an expression that is evaluated as Lisp, the name of a variable being such an expression.

 

9 LISP OPERATOR, FUNCTION AND MACRO REFERENCE

 

9.1 Conventions

The following sections list all of the special operators, macros and functions in TXR Lisp.

In these sections Syntax is indicated using these conventions:

word
A symbol in fixed-width-italic font denotes some syntactic unit: it may be a symbol or compound form. The syntactic unit is explained in the corresponding Description section.

{syntax}* word*
This indicates a repetition of zero or more of the given syntax enclosed in the braces or syntactic unit. The curly braces may be omitted if the scope of the * is clear.

{syntax}+ word+
This indicates a repetition of one or more of the given syntax enclosed in the braces or syntactic unit. The curly braces may be omitted if the scope of the + is clear.

{syntax | syntax | ...}
This indicates a choice among alternatives. May be combined with + or * repetition.

[syntax] [word]
Square brackets indicate optional syntax.

syntax -> result
The arrow notation is used in examples to indicate that the evaluation of the given syntax produces a value, whose printed representation is result.

 

9.2 Form Evaluation

A compound expression with a symbol as its first element, if intended to be evaluated, denotes either an operator invocation or a function call. This depends on whether the symbol names an operator or a function.

When the form is an operator invocation, the interpretation of the meaning of that form is under the complete control of that operator.

If the compound form is a function call, the remaining forms, if any, denote argument expressions to the function. They are evaluated in left to right order to produce the argument values, which are passed to the function. An exception is thrown if there are not enough arguments, or too many. Programs can define named functions with the defun operator.

Some operators are macros. There exist predefined macros in the library, and macro operators can also be user-defined using the macro-defining operator defmacro. Operators that are not macros are called special operators.

Macro operators work as functions which are given the source code of the form. They analyze the form, and translate it to another form which is substituted in their place. This happens during a code walking phase called the expansion phase, which is applied to each top-level expression prior to evaluation. All macros occurring in a form are expanded in the expansion phase, and subsequent evaluation takes place on a structure which is devoid of macros. All that remains are the executable forms of special operators, function calls, symbols denoting either variables or themselves, and atoms such as numeric and string literals.

Special operators can also perform code transformations during the expansion phase, but that is not considered macroexpansion, but rather an adjustment of the representation of the operator into a required executable form. In effect, it is a post-macro compilation phase.

Note that Lisp forms occurring in the TXR pattern language are not individual top-level forms. Rather, the entire TXR query is parsed at the same time, and the macros occurring in its Lisp forms are expanded at that time.

 

9.2.1 Operator quote

Syntax:


  (quote form)

Description:

The quote operator, when evaluated, suppresses the evaluation of form, and instead returns form itself as an object. For example, if form is a symbol, then form is not evaluated to the symbol's value; rather the symbol itself is returned.

Note: the quote syntax '<form> is translated to (quote form).

Example:


  ;; yields symbol a itself, not value of variable a
  (quote a) -> a


  ;; yields three-element list (+ 2 2), not 4.
  (quote (+ 2 2)) -> (+ 2 2)

 

9.3 Variable Binding

Variables are associations between symbols and storage locations which hold values. These associations are called bindings.

Bindings are held in a context called an environment.

Lexical environments hold local variables, and nest according to the syntactic structure of the program. Lexical bindings are always introduced by some form known as a binding construct, and the corresponding environment is instantiated during the evaluation of that construct. There also exist bindings outside of any binding construct, in the so-called global environment. Bindings in the global environment can be temporarily shadowed by bindings established in the dynamic environment. See the Special Variables section above.

Certain special symbols cannot be used as variable names, namely the symbols t and nil, and all of the keyword symbols (symbols in the keyword package), which are denoted by a leading colon. When any of these symbols is evaluated as a form, the resulting value is that symbol itself. It is said that these special symbols are self-evaluating or self-quoting, similarly to all other atom objects such as numbers or strings.

When a form consisting of a symbol, other than the above special symbols, is evaluated, it is treated as a variable, and yields the value of the variable's storage location. If the variable doesn't exist, an exception is thrown.

Note: symbol forms may also denote invocations of symbol macros. (See the operators defsymacro and symacrolet). All macros, including symbol macros, which occur inside a form are fully expanded prior to the evaluation of a form, therefore evaluation does not consider the possibility of a symbol being a symbol macro.

 

9.3.1 Operator defvar and macro defparm

Syntax:


  (defvar sym [value])
  (defparm sym value)

Description:

The defvar operator binds a name in the variable namespace of the global environment. Binding a name means creating a binding: recording, in some namespace of some environment, an association between a name and some named entity. In the case of a variable binding, that entity is a storage location for a value. The value of a variable is that which has most recently been written into the storage location, and is also said to be a value of the binding, or stored in the binding.

If the variable named sym already exists in the global environment, the form has no effect; the value form is not evaluated, and the value of the variable is unchanged.

If the variable does not exist, then a new binding is introduced, with a value given by evaluating the value form. If the form is absent, the variable is initialized to nil.

The value form is evaluated in the environment in which the defvar form occurs, not necessarily in the global environment.

The symbols t and nil may not be used as variables, and neither can be keyword symbols: symbols denoted by a leading colon.

In addition to creating a binding, the defvar operator also marks sym as the name of a special variable. This changes what it means to bind that symbol in a lexical binding construct such as the let operator, or a function parameter list. See the section "Special Variables" far above.

The defparm macro behaves like defvar when a variable named sym doesn't already exist.

If sym already denotes a variable binding in the global namespace, defparm evaluates the value form and assigns the resulting value to the variable.

The following equivalence holds:


  (defparm x y)  <-->  (prog1 (defvar x) (set x y))

The defvar and defparm forms return sym.
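
For example:


  (defvar a 1)   ;; binds a to 1
  (defvar a 2)   ;; no effect: a exists, so the value form isn't evaluated
  a              ;; -> 1


  (defparm a 3)  ;; assigns to the existing variable
  a              ;; -> 3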

 

9.3.2 Macros defvarl and defparml

Syntax:


  (defvarl sym [value])
  (defparml sym value)

Description:

The defvarl and defparml macros behave, respectively, almost exactly like defvar and defparm.

The difference is that these operators do not mark sym as special.

If a global variable sym does not previously exist, then after the evaluation of either of these forms (boundp sym) is true, but (special-var-p sym) isn't.

If sym had been already introduced as a special variable, it stays that way after the evaluation of defvarl or defparml.

 

9.3.3 Operators let and let*

Syntax:


  (let ({sym | (sym init-form)}*) body-form*)
  (let* ({sym | (sym init-form)}*) body-form*)

Description:

The let and let* operators introduce a new scope with variables and evaluate forms in that scope. The operator symbol, either let or let*, is followed by a list which can contain any mixture of variable name symbols, or (sym init-form) pairs. A symbol denotes the name of a variable to be instantiated and initialized to the value nil. A symbol specified with an init-form denotes a variable which is initialized from the value of the init-form.

The symbols t and nil may not be used as variables, and neither can be keyword symbols: symbols denoted by a leading colon.

The difference between let and let* is that in let*, later init-form-s have visibility over the variables established earlier in the same let* construct. In plain let, the variables are not visible to any of the init-form-s.

When the variables are established, then the body-form-s are evaluated in order. The value of the last body-form becomes the return value of the let.

If there are no body-form-s, then the return value nil is produced.

The list of variables may be empty.

Examples:


  (let ((a 1) (b 2)) (list a b)) -> (1 2)
  (let* ((a 1) (b (+ a 1))) (list a b (+ a b))) -> (1 2 3)
  (let ()) -> nil
  (let (:a nil)) -> error, :a and nil can't be used as variables

 

9.4 Functions

 

9.4.1 Operator defun

Syntax:


  (defun name (param* [: opt-param*] [. rest-param])
    body-form)

Description:

The defun operator introduces a new function in the global function namespace. The function is similar to a lambda, and has the same parameter syntax and semantics as the lambda operator.

Unlike in lambda, the body-form-s of a defun are surrounded by a block. The name of this block is the same as the name of the function, making it possible to terminate the function and return a value using (return-from name value). For more information, see the definition of the block operator.

A function may call itself by name, allowing for recursion.

The special symbols t and nil may not be used as function names. Neither can keyword symbols.

It is possible to define methods with defun, as an alternative to the defmeth macro.

To define a method, the syntax (meth type name) should be used as the argument to the name parameter.

The syntax (defun (meth type name) args forms) is equivalent to the (defmeth type name args forms) syntax.

Dialect Note:

In ANSI Common Lisp, keywords may be used as function names. In TXR Lisp, they may not.

Dialect Note:

A function defined by defun may co-exist with a macro defined by defmacro. This is not permitted in ANSI Common Lisp.

 

9.4.2 Operator lambda

Syntax:


  (lambda (param* [: opt-param*] [. rest-param])
    body-form)
  (lambda rest-param
    body-form)

Description:

The lambda operator produces a value which is a function. Like in most other Lisps, functions are objects in TXR Lisp. They can be passed to functions as arguments, returned from functions, aggregated into lists, stored in variables, et cetera.

The first argument of lambda is the list of parameters for the function. It may be empty, and it may also be an improper list (dot notation) where the terminating atom is a symbol other than nil. It can also be a single symbol.

The second and subsequent arguments are the forms making up the function body. The body may be empty.

When a function is called, the parameters are instantiated as variables that are visible to the body forms. The variables are initialized from the values of the argument expressions appearing in the function call.

The dotted notation can be used to write a function that accepts a variable number of arguments. There are two ways to write a function that accepts only a variable argument list and no required arguments:


  (lambda (. rest-param) ...)
  (lambda rest-param ...)

(These notations are syntactically equivalent because the list notation (. X) actually denotes the object X which isn't wrapped in any list).

The keyword symbol : (colon) can appear in the parameter list. This is the symbol in the keyword package whose name is the empty string. This symbol is treated specially: it serves as a separator between required parameters and optional parameters. Furthermore, the : symbol has a role to play in function calls: it can be specified as an argument value to an optional parameter by which the caller indicates that the optional argument is not being specified. It will be processed exactly that way.

An optional parameter can also be written in the form (name expr [sym]). In this situation, if the call does not specify a value for the parameter (or specifies a value as the keyword : (colon)) then the parameter takes on the value of the expression expr. If sym is specified, then sym will be introduced as an additional binding with a Boolean value which indicates whether or not the optional parameter had been specified by the caller.

The initializer expressions are evaluated in an environment in which all of the previous parameters are visible, in addition to the surrounding environment of the lambda. For instance:


  (let ((default 0))
    (lambda (str : (end (length str)) (counter default))
      (list str end counter)))

In this lambda, the initializing expression for the optional parameter end is (length str), and the str variable it refers to is the previous argument. The initializer for the optional variable counter is the expression default, and it refers to the binding established by the surrounding let. This reference is captured as part of the lambda's lexical closure.

Examples:

Counting function:
This function, which takes no arguments, captures the variable counter. Whenever this object is called, it increments counter by 1 and returns the incremented value.


  (let ((counter 0))
    (lambda () (inc counter)))

Function that takes two or more arguments:
The third and subsequent arguments are aggregated into a list passed as the single parameter z:


  (lambda (x y . z) (list 'my-arguments-are x y z))

Variadic function:


  (lambda args (list 'my-list-of-arguments args))

Optional arguments:


  [(lambda (x : y) (list x y)) 1] -> (1 nil)
  [(lambda (x : y) (list x y)) 1 2] -> (1 2)

 

9.4.3 Macros flet and labels

Syntax:


  (flet ({(name param-list function-body-form*)}*)
    body-form*)


  (labels ({(name param-list function-body-form*)}*)
    body-form*)

Description:

The flet and labels macros bind local, named functions in the lexical scope. The difference between flet and labels is that a function defined by labels can see itself, and therefore recurse directly by name. Moreover, if multiple functions are defined by the same labels construct, they all have each other's names in scope of their bodies. By contrast, a flet-defined function does not have itself in scope and cannot recurse. Multiple functions in the same flet do not have each other's names in their scopes.

More formally, the function-body-form-s and param-list of the functions defined by labels are in a scope in which all of the function names being defined by that same labels construct are visible.

Under both labels and flet, the local functions that are defined are lexically visible to the main body-form-s.

Note that labels and flet are properly scoped with regard to macros. During macro expansion, function bindings introduced by these macro operators shadow macros defined by macrolet and defmacro.

Furthermore, function bindings introduced by labels and flet also shadow symbol macros defined by symacrolet, when those symbol macros occur as arguments of a dwim form.

See also: the macrolet operator.

Dialect Note:

The flet and labels macros do not establish named blocks around the body forms of the local functions which they bind. This differs from ANSI Common Lisp, whose local functions have implicit named blocks, allowing for return-from to be used.

Examples:


  ;; Wastefully slow algorithm for determining evenness.
  ;; Note:
  ;; - mutual recursion between labels-defined functions
  ;; - inner is-even bound by labels shadows the outer
  ;;   one bound by defun so the (is-even n) call goes
  ;;   to the local function.


  (defun is-even (n)
   (labels ((is-even (n)
              (if (zerop n) t (is-odd (- n 1))))
            (is-odd (n)
              (if (zerop n) nil (is-even (- n 1)))))
     (is-even n)))

 

9.4.4 Function call

Syntax:


  (call function argument*)

Description:

The call function invokes function, passing it the given arguments, if any.

Examples:

Apply arguments 1 2 to a lambda which adds them to produce 3:


  (call (lambda (a b) (+ a b)) 1 2)

Useless use of call on a named function; equivalent to (list 1 2):


  (call (fun list) 1 2)

 

9.4.5 Operator fun

Syntax:


  (fun function-name)

Description:

The fun operator retrieves the function object corresponding to a named function in the current lexical environment.

The function-name is a symbol denoting a named function: a built in function, or one defined by defun.

Note: the fun operator does not see macro bindings. It is possible to retrieve a global macro expander using symbol-function.

Dialect Note:

A lambda expression is not a function name in TXR Lisp. The syntax (fun (lambda ...)) is invalid.

 

9.4.6 Operator dwim

Syntax:


  (dwim argument*)
  [argument*]
  (set (dwim obj-place index [alt]) new-value)
  (set [obj-place index [alt]] new-value)

Description:

The dwim operator's name is an acronym: DWIM may be taken to mean "Do What I Mean", or alternatively, "Dispatch, in a Way that is Intelligent and Meaningful".

The notation [...] is a shorthand which denotes (dwim ...).

The dwim operator takes a variable number of arguments, which are treated as expressions to be individually macro-expanded and evaluated, using the same rules.

This means that the first argument isn't a function name, but an ordinary expression which can simply compute a function object (or, more generally, a callable object).

Furthermore, for those arguments of dwim which are symbols (after all macro-expansion is performed), the evaluation rules are altered. For the purposes of resolving symbols to values, the function and variable binding namespaces are considered to be merged into a single space, creating a situation that is very similar to a Lisp-1 style dialect.

This special Lisp-1 evaluation is not recursively applied. All arguments of dwim which, after macro expansion, are not symbols are evaluated using the normal Lisp-2 evaluation rules. Thus, the DWIM operator must be used in every expression where the Lisp-1 rules for reducing symbols to values are desired.

If a symbol has bindings both in the variable and function namespace in scope, and is referenced by a dwim argument, this constitutes a conflict which is resolved according to two rules. When nested scopes are concerned, then an inner binding shadows an outer binding, regardless of their kind. An inner variable binding for a symbol shadows an outer or global function binding, and vice versa.

If a symbol is bound to both a function and variable in the global namespace, then the variable binding is favored.

Macros do not participate in the special scope conflation, with one exception. What this means is that the space of symbol macros is not folded together with the space of operator macros. An argument of dwim that is a symbol might be a symbol macro, variable or function, but it cannot be interpreted as the name of an operator macro.

The exception is this: from the perspective of a dwim form, function bindings can shadow symbol macros. If a function binding is defined in an inner scope relative to a symbol macro for the same symbol, using flet or labels, the function hides the symbol macro. In other words, when macro expansion processes an argument of a dwim form, and that argument is a symbol, it is treated specially in order to provide a consistent name lookup behavior. If the innermost binding for that symbol is a function binding, it refers to that function binding, even if a more outer symbol macro binding exists, and so the symbol is not expanded using the symbol macro. By contrast, in an ordinary form, a symbolic argument never resolves to a function binding. The symbol refers to either a symbol macro or a variable, whichever is nested closer.

If, after macro expansion, the leftmost argument of the dwim is the name of a special operator or macro, the dwim form doesn't denote an invocation of that operator or macro. A dwim form is an invocation of the dwim operator, and the leftmost argument of that operator, if it is a symbol, is treated as a binding to be resolved in the variable or function namespace, like any other argument. Thus [if x y] is an invocation of the if function, not the if operator.

How many arguments are required by the dwim operator depends on the type of object to which the first argument expression evaluates. The possibilities are:

[function argument*]
Call the given function object with the given arguments.

[symbol argument*]
If the first expression evaluates to a symbol, that symbol is resolved in the function namespace, and then the resulting function, if found, is called with the given arguments.

[sequence index]
Retrieve an element from sequence, at the specified index, which is a nonnegative integer.

This form is also a place if the sequence subform is a place. If a value is stored to this place, it replaces the element.

The place may also be deleted, which has the effect of removing the element from the sequence, shifting the elements at higher indices, if any, down one element position, and shortening the sequence by one.

[sequence from-index..to-below-index]
Retrieve the specified range of elements. The range of elements is specified in the from and to fields of a range object. The .. (dotdot) syntactic sugar denotes its construction via the rcons function. See the section on Range Indexing below.

This form is also a syntactic place, if the sequence subform is a place. Storing a value in this place has the effect of replacing the subsequence with a new subsequence. Deleting the place has the effect of removing the specified subsequence from sequence. The new-value argument in a range assignment can be a string, vector or list, regardless of whether the target is a string, vector or list. If the target is a string, the replacement sequence must be a string, or a list or vector of characters.

[sequence index-list]
Elements specified by index-list, which may be a list or vector, are extracted from sequence and returned as a sequence of the same kind as sequence.

This form is equivalent to (select sequence where-index) except when the target of an assignment operation.

This form is a syntactic place if sequence is one. If a sequence is assigned to this place, then elements of the sequence are distributed to the specified locations.

The following equivalences hold between index-list-based indexing and the select and replace functions, except that set always returns the value assigned, whereas replace returns its first argument:


  [seq idx-list] <--> (select seq idx-list)


  (set [seq idx-list] new) <--> (replace seq new idx-list)

Note that unlike the select function, this does not support [hash index-list] because hash keys may be lists, making that syntax indistinguishable from a simple hash lookup in which index-list is the key.

[hash key [alt]]
Retrieve a value from the hash table corresponding to key, or else return alt if there is no such entry. The expression alt is always evaluated, whether or not its value is used.

[regex [start [from-end]] string ]
Determine whether regular expression regex matches string, and in that case return the (possibly empty) leftmost matching substring. Otherwise, return nil.

If start is specified, it gives the starting position where the search begins, and if from-end is given, and has a value other than nil, it specifies a search from right to left. These optional arguments have the same conventions and semantics as their equivalents in the search-regst function.

Note that string is always required, and is always the rightmost argument.

Range Indexing:

Vector and list range indexing is zero-based, meaning that the first element is numbered zero, the second one, and so on. Negative values are allowed; the value -1 refers to the last element of the vector or list, and -2 to the second last and so forth. Thus the range 1 .. -2 means "everything except for the first element and the last two".

The symbol t represents the position one past the end of the vector, string or list, so 0 .. t denotes the entire list or vector, and the range t .. t represents the empty range just beyond the last element. It is possible to assign to t .. t. For instance:


  (defvar list '(1 2 3))
  (set [list t .. t] '(4)) ;; list is now (1 2 3 4)

The value zero has a "floating" behavior when used as the end of a range. If the start of the range is a negative value, and the end of the range is zero, the zero is interpreted as being the position past the end of the sequence, rather than the first element. For instance the range -1..0 means the same thing as -1..t. Zero at the start of a range always means the first element, so that 0..-1 refers to all the elements except for the last one.
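
The following sketch illustrates these rules on a string:


  (defvar s "abcdef")


  [s 1..3]    ;; -> "bc"
  [s 1..-2]   ;; -> "bcd"; all but the first element and the last two
  [s 0..-1]   ;; -> "abcde"
  [s -3..0]   ;; -> "def"; zero after a negative start means the end
  [s 0..t]    ;; -> "abcdef"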

Notes:

The dwim operator allows for a Lisp-1 flavor of programming in TXR Lisp, which is principally a Lisp-2 dialect.

A Lisp-1 dialect is one in which an expression like (a b) treats both a and b as expressions subject to the same evaluation rules—at least, when a isn't an operator or an operator macro. This means that the symbols a and b are resolved to values in the same namespace. The form denotes a function call if the value of variable a is a function object. Thus in a Lisp-1, named functions do not exist as such: they are just variable bindings. In a Lisp-1, the form (car 1) means that there is a variable called car, which holds a function, which is retrieved from that variable and the argument 1 is applied to it. In the expression (car car), both occurrences of car refer to the variable, and so this form applies the car function to itself. It is almost certainly meaningless. In a Lisp-2 (car 1) means that there is a function called car, in the function namespace. In the expression (car car) the two occurrences refer to different bindings: one is a function and the other a variable. Thus there can exist a variable car which holds a cons cell object, rather than the car function, and the form makes sense.

The Lisp-1 approach is useful for functional programming, because it eliminates cluttering occurrences of the call and fun operators. For instance:


  ;; regular notation


  (call foo (fun second) '((1 a) (2 b)))


  ;; [] notation


  [foo second '((1 a) (2 b))]

Lisp-1 dialects can also provide useful extensions by giving a meaning to objects other than functions in the first position of a form, and the dwim/[...] syntax does exactly this.

TXR Lisp is a Lisp-2 because Lisp-2 also has advantages. Lisp-2 programs which use macros naturally achieve hygiene because lexical variables do not interfere with the function namespace. If a Lisp-2 program has a local variable called list, this does not interfere with the hidden use of the function list in a macro expansion in the same block of code. Lisp-1 dialects have to provide hygienic macro systems to attack this problem. Furthermore, even when not using macros, Lisp-1 programmers have to avoid using the names of functions as lexical variable names, if the enclosing code might use them.

The two namespaces of a Lisp-2 also naturally accommodate symbol macros and operator macros. Whereas functions and variables can be represented in a single namespace readily, because functions are data objects, this is not so with symbol macros and operator macros, the latter of which are distinguished syntactically by their position in a form. In a Lisp-1 dialect, given (foo bar), either of the two symbols could be a symbol macro, but only foo can possibly be an operator macro. Yet, having only a single namespace, a Lisp-1 doesn't permit (foo foo), where foo is simultaneously a symbol macro and an operator macro, though the situation is unambiguous by syntax even in Lisp-1. In other words, Lisp-1 dialects do not entirely remove the special syntactic recognition given to the leftmost position of a compound form, yet at the same time they prohibit the user from taking full advantage of it by providing only one namespace.

TXR Lisp provides the "best of both worlds": the DWIM brackets notation provides a model of Lisp-1 computation that is purer than Lisp-1 dialects (since the leftmost argument is not given any special syntactic treatment at all) while the Lisp-2 foundation provides a traditional Lisp environment with its "natural hygiene".

 

9.5 Sequencing, Selection and Iteration

 

9.5.1 Operators progn and prog1

Syntax:


  (progn form*)
  (prog1 form*)

Description:

The progn operator evaluates forms in order, and returns the value of the last form. The return value of the form (progn) is nil.

The prog1 operator evaluates forms in order, and returns the value of the first form. The return value of the form (prog1) is nil.

Various other operators such as let also arrange for the evaluation of a body of forms, the value of the last of which is returned. These operators are said to feature an implicit progn.
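
For illustration, the descriptions above imply evaluations such as the following:

  (progn 1 2 3) -> 3
  (prog1 1 2 3) -> 1
  (progn) -> nil
  (prog1) -> nil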

 

9.5.2 Operator cond

Syntax:


  (cond {(test form*)}*)

Description:

The cond operator provides a multi-branching conditional evaluation of forms. Enclosed in the cond form are groups of forms expressed as lists. Each group must be a list of at least one form.

The forms are processed from left to right as follows: the first form, test, in each group is evaluated. If it evaluates true, then the remaining forms in that group, if any, are also evaluated. Processing then terminates and the result of the last form in the group is taken as the result of cond. If test is the only form in the group, then the result of test is taken as the result of cond.

If the first form of a group yields nil, then processing continues with the next group, if any. If all form groups yield nil, then the cond form yields nil. This holds in the case that the syntax is empty: (cond) yields nil.
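
For illustration, the clause selection described above proceeds like this:

  (let ((x 15))
    (cond ((< x 10) 'small)
          ((< x 100) 'medium)
          (t 'large)))
  -> medium

  (cond) -> nil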

 

9.5.3 Macros caseq, caseql and casequal

Syntax:


  (caseq test-form normal-clause* [else-clause])
  (caseql test-form normal-clause* [else-clause])
  (casequal test-form normal-clause* [else-clause])

Description:

These three macros arrange for the evaluation of test-form, whose value is then compared against the key or keys in each normal-clause in turn. When the value matches a key, then the remaining forms of normal-clause are evaluated, and the value of the last form is returned; subsequent clauses are not evaluated. When the value doesn't match any of the keys of a normal-clause, then the next normal-clause is tested. If all these clauses are exhausted, and there is no else-clause, then the value nil is returned. Otherwise, the forms in the else-clause are evaluated, and the value of the last one is returned.

The syntax of a normal-clause takes on these two forms:


  (key form*)

where key may be an atom which denotes a single key, or else a list of keys. There is a restriction that the symbol t may not be used as a key. The form (t) may be used as a key to match that symbol.

The syntax of an else-clause is:


  (t form*)

which resembles a form that is often used as the final clause in the cond syntax.

The three forms of the case construct differ in what type of test they apply between the value of test-form and the keys. The caseq macro generates code which uses the eq function's equality. The caseql macro uses eql, and casequal uses equal.

Example


  (let ((command-symbol (casequal command-string
                          (("q" "quit") 'quit)
                          (("a" "add") 'add)
                          (("d" "del" "delete") 'delete)
                          (t 'unknown))))
    ...)

 

9.5.4 Operator/function if

Syntax:


  (if cond t-form [e-form])
  [if cond then [else]]

Description:

There exist both an if operator and an if function. A list form with the symbol if in the first position is interpreted as an invocation of the if operator. The function can be accessed using the DWIM bracket notation and in other ways.

The if operator provides a simple two-way-selective evaluation control. The cond form is evaluated. If it yields true then t-form is evaluated, and that form's return value becomes the return value of the if. If cond yields false, then e-form is evaluated and its return value is taken to be that of if. If e-form is omitted, then the behavior is as if e-form were specified as nil.

The if function provides no evaluation control. All of its arguments are evaluated from left to right. If the cond argument is true, then it returns the then argument, otherwise it returns the value of the else argument if present, otherwise it returns nil.
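
For instance, based on the behavior described above, one would expect:

  (if (> 2 1) 'yes 'no) -> yes
  (if nil 'yes) -> nil

  ;; function form: all arguments are evaluated,
  ;; and the then argument is returned since (> 2 1) is true
  [if (> 2 1) 'yes 'no] -> yes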

 

9.5.5 Operator/function and

Syntax:


  (and form*)
  [and arg*]

Description:

There exist both an and operator and an and function. A list form with the symbol and in the first position is interpreted as an invocation of the operator. The function can be accessed using the DWIM bracket notation and in other ways.

The and operator provides three functionalities in one. It computes the logical "and" function over several forms. It controls evaluation (a.k.a. "short-circuiting"). It also provides an idiom for the convenient substitution of a value in place of nil when some other values are all true.

The and operator evaluates as follows. First, a return value is established and initialized to the value t. The form-s, if any, are evaluated from left to right. The return value is overwritten with the result of each form. Evaluation stops when all forms are exhausted, or when nil is stored in the return value. When evaluation stops, the operator yields the return value.

The and function provides no evaluation control; it receives all of its arguments fully evaluated. If it is given no arguments, it returns t. If it is given one or more arguments, and any of them are nil, it returns nil. Otherwise it returns the value of the last argument.

Examples:


  (and) -> t
  (and (> 10 5) (stringp "foo")) -> t
  (and 1 2 3) -> 3  ;; shorthand for (if (and 1 2) 3).

 

9.5.6 Operator/function or

Syntax:


  (or form*)
  [or arg*]

Description:

There exist both an or operator and an or function. A list form with the symbol or in the first position is interpreted as an invocation of the operator. The function can be accessed using the DWIM bracket notation and in other ways.

The or operator provides three functionalities in one. It computes the logical "or" function over several forms. It controls evaluation (a.k.a. "short-circuiting"). The behavior of or also provides an idiom for the selection of the first non-nil value from a sequence of forms.

The or operator evaluates as follows. First, a return value is established and initialized to the value nil. The form-s, if any, are evaluated from left to right. The return value is overwritten with the result of each form. Evaluation stops when all forms are exhausted, or when a true value is stored into the return value. When evaluation stops, the operator yields the return value.

The or function provides no evaluation control; it receives all of its arguments fully evaluated. If it is given no arguments, it returns nil. If all of its arguments are nil, it also returns nil. Otherwise, it returns the value of the first argument which isn't nil.

Examples:


  (or) -> nil
  (or 1 2) -> 1
  (or nil 2) -> 2
  (or (> 10 20) (stringp "foo")) -> t

 

9.5.7 Macros when and unless

Syntax:


  (when expression form*)
  (unless expression form*)

Description:

The when macro operator evaluates expression. If expression yields true, and there are additional forms, then each form is evaluated. The value of the last form becomes the result value of the when form. If there are no forms, then the result is nil.

The unless operator is similar to when, except that it reverses the logic of the test. The forms, if any, are evaluated if, and only if, expression is false.
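
For instance, one would expect:

  (when (> 3 2) 'a 'b) -> b
  (when (> 2 3) 'a 'b) -> nil
  (unless (> 2 3) 'a 'b) -> b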

 

9.5.8 Macros while and until

Syntax:


  (while expression form*)
  (until expression form*)

Description:

The while macro operator provides a looping construct. It evaluates expression. If expression yields nil, then the evaluation of the while form terminates, producing the value nil. Otherwise, if there are additional forms, then each form is evaluated. Next, evaluation returns to expression, repeating all of the previous steps.

The until macro operator is similar to while, except that the until form terminates when expression evaluates true, rather than false.

These operators arrange for the evaluation of all their enclosed forms in an anonymous block. Any of the form-s, or expression, may use the return operator to terminate the loop, and optionally to specify a result value for the form.

The only way these forms can yield a value other than nil is if the return operator is used to terminate the implicit anonymous block, and is given an argument, which becomes the result value.
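
As an illustrative sketch of the interaction with the implicit anonymous block, a loop might compute a value by means of return:

  ;; sum successive integers until the total exceeds 100;
  ;; return terminates the implicit block with that total
  (let ((i 0) (sum 0))
    (while t
      (inc i)
      (set sum (+ sum i))
      (if (> sum 100) (return sum))))
  -> 105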

 

9.5.9 Macros while* and until*

Syntax:


  (while* expression form*)
  (until* expression form*)

Description:

The while* and until* macros are similar, respectively, to the macros while and until.

They differ in one respect: they begin by evaluating the form-s one time unconditionally, without first evaluating expression. After this evaluation, the subsequent behavior is like that of while or until.

Another way to regard the behavior is that these forms execute one iteration unconditionally, without evaluating the termination test prior to the first iteration. Yet another view is that these constructs relocate the test from the "top of the loop" to the "bottom of the loop".
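
For example, the following sketch illustrates the unconditional first iteration:

  ;; the body executes once even though the test is false
  ;; from the outset
  (let ((count 0))
    (while* (< count 0)
      (inc count))
    count)
  -> 1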

 

9.5.10 Macro whilet

Syntax:


  (whilet ({sym | (sym init-form)}+)
    body-form*)

Description:

The whilet macro provides a construct which combines iteration with variable binding.

The evaluation of the form takes place as follows. First, fresh bindings are established for sym-s as if by the let* operator. It is an error for the list of variable bindings to be empty.

After the establishment of the bindings, the value of the last sym is tested. If the value is nil, then whilet terminates. Otherwise, body-form-s are evaluated in the scope of the variable bindings, and then whilet iterates from the beginning, again establishing fresh bindings for the sym-s, and testing the value of the last sym.

All evaluation takes place in an anonymous block, which can be terminated with the return operator. Doing so terminates the loop. If the whilet loop is thus terminated by an explicit return, a return value can be specified. Under normal termination, the return value is nil.

Examples:


  ;; read lines of text from *std-input* and print them,
  ;; until the end-of-stream condition:


  (whilet ((line (get-line)))
    (put-line line))


  ;; read lines of text from *std-input* and print them,
  ;; until the end-of-stream condition occurs or
  ;; a line is identical to the character string "end".


  (whilet ((line (get-line))
           (more (and line (not (equal line "end")))))
    (put-line line))

 

9.5.11 Macros iflet and whenlet

Syntax:


  (iflet {({sym | (sym init-form)}+) | atom-form}
    then-form [else-form])
  (whenlet {({sym | (sym init-form)}+) | atom-form}
    body-form*)

Description:

The iflet and whenlet macros combine the variable binding of let* with conditional evaluation of if and when, respectively.

In either construct's syntax, a non-compound form atom-form may appear in place of the variable binding list. In this case, atom-form is evaluated as a form, and the construct is equivalent to its respective ordinary if or when counterpart.

If the list of variable bindings is empty, it is interpreted as the atom nil and treated as an atom-form.

If one or more bindings are specified rather than atom-form, then the evaluation of these forms takes place as follows. First, fresh bindings are established for sym-s as if by the let* operator.

Then, the last variable's value is tested. If it is not nil then the test is true, otherwise false.

In the case of the iflet operator, if the test is true, the operator evaluates then-form and yields its value. Otherwise the test is false, and if the optional else-form is present, that is evaluated instead and its value is returned. If this form is missing, then nil is returned.

In the case of the whenlet operator, if the test is true, then the body-form-s, if any, are evaluated, and the value of the last one is returned; if there are no body-form-s, nil is returned. If the test is false, then evaluation of the body-form-s is skipped, and nil is returned.

Examples:


  ;; dispose of foo-resource if present
  (whenlet ((foo-res (get-foo-resource obj)))
    (foo-shutdown foo-res)
    (set-foo-resource obj nil))


  ;; Contrast with: above, using when and let
  (let ((foo-res (get-foo-resource obj)))
    (when foo-res
      (foo-shutdown foo-res)
      (set-foo-resource obj nil)))


  ;; print frobosity value if it exceeds 150
  (whenlet ((fv (get-frobosity-value))
            (exceeds-p (> fv 150)))
    (format t "frobosity value ~a exceeds 150\n" fv))


  ;; yield 4: 3 interpreted as atom-form
  (whenlet 3 4)


  ;; yield 4: nil interpreted as atom-form
  (iflet () 3 4)

 

9.5.12 Macro condlet

Syntax:


  (condlet
     ([({sym | (sym init-form)}+) | atom-form]
      body-form*)*)

Description:

The condlet macro generalizes iflet.

Each argument is a compound consisting of at least one item: a list of bindings or atom-form. This item is followed by zero or more body-form-s.

If there are no body-form-s, then the situation is treated as if there were a single body-form specified as nil.

The arguments of condlet are considered in sequence, starting with the leftmost.

If the argument's left item is an atom-form then the form is evaluated. If it yields true, then the body-form-s next to it are evaluated in order, and the condlet form terminates, yielding the value obtained from the last body-form. If atom-form yields false, then the next argument is considered, if there is one.

If the argument's left item is a list of bindings, then it is processed with exactly the same logic as under the iflet macro. If the last binding contains a true value, then the adjoining body-form-s are evaluated in a scope in which all of the bindings are visible, and condlet terminates, yielding the value of the last body-form. Otherwise, the next argument of condlet is considered (processed in a scope in which the bindings produced by the current item are no longer visible).

If condlet runs out of arguments, it terminates and returns nil.

Example:


  (let ((l '(1 2 3)))
    (condlet
       ;; first arg
       (((a (first l))   ;; a binding gets 1
         (b (second l))  ;; b binding gets 2
         (g (> a b)))    ;; last variable g is nil
        'foo)            ;; not evaluated
       ;; second arg
       (((b (second l))  ;; b gets 2
         (c (third l))   ;; c gets 3
         (g (< b c)))    ;; last variable g is true
        'bar)))          ;; condlet terminates
   --> bar               ;; result is bar

 

9.5.13 Macro ifa

Syntax:


  (ifa cond then [else])

Description:

The ifa macro provides an anaphoric conditional operator resembling the if operator. Around the evaluation of the then and else forms, the symbol it is implicitly bound to a subexpression of cond, a subexpression which is thereby identified as the it-form. This it alias provides a convenient reference to that place or value, similar to the word "it" in the English language, and similar anaphoric pronouns in other languages.

If the it-form is a syntactic place, the binding is established as if using the placelet operator: the place form is evaluated only once, even if the it alias is used multiple times in the then or else expressions. Otherwise, if the form is not a syntactic place, it is bound as an ordinary lexical variable to the form's value.

An it-candidate is an expression viable for having its value or storage location bound to the it symbol. An it-candidate is any expression which is not a constant expression according to the constantp function, and not a symbol.

The ifa macro applies several rules to the cond expression:

1.
The cond expression must be either an atom, a function call form, or a dwim form. Otherwise the ifa expression is ill-formed, and throws an exception at macro-expansion time. For the purposes of these rules, a dwim form is considered as a function call expression, whose first argument is the second element of the form. That is to say, [f x], which is equivalent to (dwim f x), is treated similarly to (f x): as a one-argument call.

2.
If the cond expression is a function call with two or more arguments, at most one of them may be an it-candidate. If two or more arguments are it-candidates, the situation is ambiguous. The ifa expression is ill-formed and throws an exception at macro-expansion time.
3.
If cond is an atom, or a function call expression with no arguments, then the it symbol is not bound. Effectively, the ifa macro behaves like the ordinary if operator.
4.
If cond is a function call or dwim expression with exactly one argument, then the it variable is bound to the argument expression, except when the function being called is not, null, or false. This binding occurs regardless of whether the expression is an it-candidate.
5.
If cond is a function call with exactly one argument to the Boolean negation function which goes by one of the three names not, null, or false, then that situation is handled by a rewrite according to the following pattern:


  (ifa (not expr) then else) -> (ifa expr else then)

which applies likewise for null or false substituted for not. The Boolean inverse function is removed, and the then and else expressions are exchanged.

6.
If cond is a function call with two or more arguments, then it is only well-formed if at most one of those arguments is an it-candidate. If there is one such argument, then the it variable is bound to it.
7.
Otherwise the variable is bound to the leftmost argument expression, regardless of whether that argument expression is an it-candidate.

In all other regards, the ifa macro behaves similarly to if.

The cond expression is evaluated and, if applicable, the value of, or storage location denoted by, the appropriate argument is captured and bound to the variable it, whose scope extends over the then form, as well as over else, if present.

If cond yields a true value, then then is evaluated and the resulting value is returned, otherwise else is evaluated if present and its value is returned. A missing else is treated as if it were the nil form.

Examples:


  (ifa t 1 0)  ->  1


  ;; Rule 6: it binds to (* x x), which is
  ;; the only it-candidate.
  (let ((x 6) (y 49))
    (ifa (> y (* x x)) ;; it binds to (* x x)
      (list it)))
  -> (36)


  ;; Rule 4: it binds to argument of evenp,
  ;; even though 4 isn't an it-candidate.
  (ifa (evenp 4)
    (list it))
  -> (4)


  ;; Rule 5:
  (ifa (not (oddp 4))
    (list it))
  -> (4)


  ;; Violation of Rule 1:
  ;; while is not a function
  (ifa (while t (print 42))
    (list it))
  --> exception!


  ;; Violation of Rule 2:
  (let ((x 6) (y 49))
    (ifa (> (* y y y) (* x x))
      (list it)))
  --> exception!

 

9.5.14 Macro conda

Syntax:


  (conda {(test form*)}*)

Description:

The conda operator provides a multi-branching conditional evaluation of forms, similarly to the cond operator. Enclosed in the conda form are groups of forms expressed as lists. Each group must be a list of at least one form.

The conda operator is anaphoric: it expands into a nested structure of zero or more ifa invocations, according to these patterns:


  (conda) -> nil
  (conda (x y ...) ...) -> (ifa x (progn y ...) (conda ...))

Thus, conda inherits all the restrictions on the test expressions from ifa, as well as the anaphoric it variable feature.
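
For illustration, based on the expansion pattern above, the it variable is available in each clause just as it is with ifa:

  (let ((x 4))
    (conda
      ((oddp x) (list 'odd it))
      ((evenp x) (list 'even it))))
  -> (even 4)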

 

9.5.15 Macro dotimes

Syntax:


  (dotimes (var count-form [result-form]) body-form*)

Description:

The dotimes macro implements a simple counting loop. var is established as a variable, and initialized to zero. count-form is evaluated one time to produce a limiting value, which should be a number. Then, if the value of var is less than the limiting value, the body-form-s are evaluated, var is incremented by one, and the process repeats with a new comparison of var against the limiting value possibly leading to another evaluation of the forms.

If var is found to equal or exceed the limiting value, then the loop terminates.

When the loop terminates, its return value is nil unless a result-form is present, in which case the value of that form specifies the return value.

body-form-s as well as result-form are evaluated in the scope in which the binding of var is visible.
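
For instance:

  ;; sum the integers 0 through 4; sum serves as the result-form
  (let ((sum 0))
    (dotimes (i 5 sum)
      (set sum (+ sum i))))
  -> 10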

 

9.5.16 Operators each, each*, collect-each, collect-each*, append-each and append-each*

Syntax:


  (each ({(sym init-form)}*) body-form*)
  (each* ({(sym init-form)}*) body-form*)
  (collect-each ({(sym init-form)}*) body-form*)
  (collect-each* ({(sym init-form)}*) body-form*)
  (append-each ({(sym init-form)}*) body-form*)
  (append-each* ({(sym init-form)}*) body-form*)

Description:

These operators establish a loop for iterating over the elements of one or more lists. Each init-form must evaluate to a list. The lists are then iterated in parallel over repeated evaluations of the body-form-s, with each sym variable being assigned to successive elements of its list. The shortest list determines the number of iterations, so if any of the init-form-s evaluate to an empty list, the body is not executed.

The body forms are enclosed in an anonymous block, allowing the return operator to terminate the loop prematurely and optionally specify the return value.

The collect-each and collect-each* variants are like each and each*, except that for each iteration, the resulting value of the body is collected into a list. When the iteration terminates, the return value of the collect-each or collect-each* operator is this collection.

The append-each and append-each* variants are like each and each*, except that for each iteration other than the last, the resulting value of the body must be a list. The last iteration may produce either an atom or a list. The objects produced by the iterations are combined together as if they were arguments to the append function, and the resulting value is the value of the append-each or append-each* operator.

The alternate forms, denoted by the adorned symbols each*, collect-each* and append-each*, differ from each, collect-each and append-each in the following way. The plain forms evaluate the init-form-s in an environment in which none of the sym variables are yet visible. By contrast, the alternate forms evaluate each init-form in an environment in which bindings for the previous sym variables are visible. In this phase of evaluation, sym variables are list-valued: one by one they are each bound to the list object emanating from their corresponding init-form. Just before the first loop iteration, however, the sym variables are assigned the first item from each of their lists.

Example:


 ;; print numbers from 1 to 10 and whether they are even or odd
 (each* ((n (range 1 10)) ;; n is a list here!
         (even (collect-each ((m n)) (evenp m))))
   ;; n is an item here!
   (format t "~s is ~a\n" n (if even "even" "odd")))

Output:


 1 is odd
 2 is even
 3 is odd
 4 is even
 5 is odd
 6 is even
 7 is odd
 8 is even
 9 is odd
 10 is even

 

9.5.17 Operators for and for*

Syntax:


  ({for | for*} ({sym | (sym init-form)}*)
                ([test-form result-form*])
                (inc-form*)
    body-form*)

Description:

The for and for* operators combine variable binding with loop iteration. The first argument is a list of variables with optional initializers, exactly the same as in the let and let* operators. Furthermore, the difference between for and for* is like that between let and let* with regard to this list of variables.

The for and for* operators execute these steps:

1.
Establish an anonymous block over the entire form, allowing the return operator to be used to terminate the loop.
2.
Establish bindings for the specified variables similarly to let and let*. The variable bindings are visible over the test-form, each result-form, each inc-form and each body-form.
3.
Evaluate test-form. If test-form yields nil, then the loop terminates. Each result-form is evaluated, and the value of the last of these forms is the result value of the loop. If there are no result-form-s, then the result value is nil. If the test-form is omitted, then the test is taken to be true, and the loop does not terminate.
4.
Otherwise, if test-form yields true, then each body-form is evaluated in turn. Then, each inc-form is evaluated in turn and processing resumes at step 3.

Furthermore, the for and for* operators establish an anonymous block, allowing the return operator to be used to terminate at any point.
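
As an illustrative sketch, the following loop collects the squares of 1 through 5; the result-form (reverse acc) supplies the loop's value:

  (for ((i 1) (acc nil)) ((<= i 5) (reverse acc)) ((inc i))
    (set acc (cons (* i i) acc)))
  -> (1 4 9 16 25)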

 

9.5.18 Operators block and block*

Syntax:


  (block name body-form*)
  (block* name-form body-form*)

Description:

The block operator introduces a named block around the execution of some forms. The name argument must be a symbol. Since a block name is not a variable binding, keyword symbols are permitted, and so are the symbols t and nil. A block named by the symbol nil is slightly special: it is understood to be an anonymous block.

The block* operator differs from block in that it evaluates name-form, which is expected to produce a symbol. The resulting symbol is used for the name of the block.

A named or anonymous block establishes an exit point for the return-from or return operator, respectively. These operators can be invoked within a block to cause its immediate termination with a specified return value.

A block also establishes a prompt for a delimited continuation. Anywhere in a block, a continuation can be captured using the sys:capture-cont function. Delimited continuations are described in the section Delimited Continuations. A delimited continuation allows an apparently abandoned block to be restarted at the capture point, with the entire call chain and dynamic environment between the prompt and the capture point intact.

Blocks in TXR Lisp have dynamic scope. This means that the following situation is allowed:


  (defun func () (return-from foo 42))
  (block foo (func))

The function can return from the foo block even though the foo block does not lexically surround the body of func.

It is because blocks are dynamic that the block* variant exists; for lexically scoped blocks, it would make little sense to support a dynamically computed name.

Thus blocks in TXR Lisp provide dynamic non-local returns, as well as returns out of lexical nesting.

Dialect Note:

In Common Lisp, blocks are lexical. A separate mechanism consisting of catch and throw operators performs non-local transfer based on symbols. The TXR Lisp example:


  (defun func () (return-from foo 42))
  (block foo (func))

is not allowed in Common Lisp, but can be transliterated to:


  (defun func () (throw 'foo 42))
  (catch 'foo (func))

Note that foo is quoted in CL. This underscores the dynamic nature of the construct. throw itself is a function and not an operator. Also note that the CL example, in turn, is even more closely transcribed back into TXR Lisp simply by replacing its throw and catch with return* and block*:


  (defun func () (return* 'foo 42))
  (block* 'foo (func))

Common Lisp blocks also do not support delimited continuations.

 

9.5.19 Operators return and return-from

Syntax:


  (return [value])
  (return-from name [value])

Description:

The return operator must be dynamically enclosed within an anonymous block (a block named by the symbol nil). It immediately terminates the evaluation of the innermost anonymous block which encloses it, causing it to return the specified value. If the value is omitted, the anonymous block returns nil.

The return-from operator must be dynamically enclosed within a named block whose name matches the name argument. It immediately terminates the evaluation of the innermost such block, causing it to return the specified value. If the value is omitted, that block returns nil.

Example:


    (block foo
      (let ((a "abc\n")
            (b "def\n"))
        (pprint a *stdout*)
        (return-from foo 42)
        (pprint b *stdout*)))

Here, the output produced is "abc". The value of b is not printed, because return-from terminates block foo, and so the second pprint form is not evaluated.

 

9.5.20 Function return*

Syntax:


  (return* name [value])

Description:

The return* function is similar to the return-from operator, except that name is an ordinary function parameter, and so when return* is used, an argument expression must be specified which evaluates to a symbol. Thus return* allows the target block of a return to be dynamically computed.

The following equivalence holds between the operator and function:


  (return-from a b)  <-->  (return* 'a b)

Expressions used as name arguments to return* which do not simply quote a symbol have no equivalent in return-from.

 

9.6 Evaluation

 

9.6.1 Function eval

Syntax:


  (eval form [env])

Description:

The eval function treats the form object as a Lisp expression, which is evaluated. The side effects implied by the form are performed, and the value which it produces is returned. The optional env object specifies an environment for resolving the function and variable references encountered in the expression. If this argument is omitted or is nil, then evaluation takes place in the global environment.

See also: the make-env function.
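
For example:

  (eval '(+ 1 2)) -> 3
  (eval '(cons 'a 'b)) -> (a . b)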

 

9.6.2 Function constantp

Syntax:


  (constantp form [env])

Description:

The constantp function determines whether form is a constant form, with respect to environment env.

If env is absent, the global environment is used. The env argument is used for macro-expanding form.

Currently, constantp returns true for any form which, after macro-expansion, is a compound form with the symbol quote in its first position, a non-symbolic atom, or one of the symbols which evaluate to themselves and cannot be bound as variables. These symbols are the keyword symbols, and the symbols t and nil.

In the future, constantp will be able to recognize more constant forms, such as calls to certain functions whose arguments are constant forms.
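
By way of illustration, the current behavior described above implies results like these:

  (constantp 42) -> t
  (constantp :key) -> t
  (constantp ''(1 2)) -> t     ;; quote form
  (constantp 'x) -> nil        ;; ordinary symbol
  (constantp '(+ 1 2)) -> nil  ;; not yet recognized as constant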

 

9.6.3 Function make-env

Syntax:


  (make-env [variable-bindings [function-bindings [next-env]]])

Description:

The make-env function creates an environment object suitable as the env parameter.

The variable-bindings and function-bindings parameters, if specified, should be association lists, mapping symbols to objects. The objects in function-bindings should be functions, or objects callable as functions.

The next-env argument, if specified, should be an environment.

Note: bindings can also be added to an environment using the env-vbind and env-fbind functions.

 

9.6.4 Functions env-vbind and env-fbind

Syntax:


  (env-vbind env symbol value)
  (env-fbind env symbol value)

Description:

These functions bind a symbol to a value in either the function or variable space of environment env.

Values established in the function space should be functions or objects that can be used as functions such as lists, strings, arrays or hashes.

If symbol already exists in the environment, in the given space, then its value is updated with value.
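
The following sketch illustrates how make-env, env-vbind and env-fbind might be combined with eval; the names e, x, y and add are hypothetical, and it assumes that eval resolves the call to add through the function space of the given environment, as described above:

  (let ((e (make-env '((x . 10)))))
    (env-vbind e 'y 20)         ;; add a variable binding
    (env-fbind e 'add (fun +))  ;; add a function binding
    (eval '(add x y) e))
  -> 30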

 

9.7 Global Environment

 

9.7.1 Accessors symbol-function, symbol-macro and symbol-value

Syntax:


  (symbol-function symbol)
  (symbol-macro symbol)
  (symbol-value symbol)
  (set (symbol-function symbol) new-value)
  (set (symbol-macro symbol) new-value)
  (set (symbol-value symbol) new-value)

Description:

The symbol-function function retrieves the value of the global function binding of the given symbol if it has one: that is, the function object bound to the symbol. If symbol has no global function binding, then nil is returned.

The symbol-macro function retrieves the value of the global macro binding of symbol if it has one. The value of a macro binding isn't a function object, but a list of the following form:


  (#<environment object> macro-parameter-list body-form*)

This representation is likely to change or expand to include other forms in future TXR versions.

Note: the name of this function has nothing to do with symbol macros; it is named for consistency with symbol-function and symbol-value, referring to the "macro-expander binding of the symbol cell".

The symbol-value function retrieves the value stored in the dynamic binding of symbol that is apparent in the current context. If the variable has no dynamic binding, then symbol-value retrieves its value in the global environment. If symbol has no variable binding, but is defined as a global symbol macro, then the value of that symbol macro binding is retrieved. The value of a symbol macro binding is simply the replacement form.

Rather than throwing an exception, each of these functions returns nil if the argument symbol doesn't have the binding in the respective namespace or namespaces which that function searches.

A symbol-function, symbol-macro, or symbol-value form denotes a place, if symbol has a binding of the respective kind. This place may be assigned to or deleted. Assignment to the place causes the denoted binding to have a new value. Deleting a place with the del macro removes the binding, and returns the previous contents of that binding. A binding denoted by a symbol-function form is removed using fmakunbound, one denoted by symbol-macro is removed using mmakunbound and a binding denoted by symbol-value is removed using makunbound.

If one of these three accessors is applied to a symbol which doesn't have a binding in the respective namespace corresponding to that accessor, then the form denotes a nonexistent place. An attempt to store a value to this place results in an exception being thrown.

Deleting such a nonexistent place doesn't throw an exception. If a nonexistent place is deleted using the del macro, nothing happens, and instead of the prior value of the place, which doesn't exist, the macro yields the value nil.
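
For illustration (counter here is a hypothetical special variable freshly introduced by defvar):

  (symbol-function 'car)               ;; yields the car function object
  [(symbol-function 'car) '(1 2 3)] -> 1

  (defvar counter 0)
  (symbol-value 'counter) -> 0
  (set (symbol-value 'counter) 1)
  (symbol-value 'counter) -> 1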

Dialect note:

In ANSI Common Lisp, the symbol-function function retrieves a function, macro or special operator binding. These are all in one space and may not co-exist. In TXR Lisp, it retrieves strictly a function binding. The symbol-macro function doesn't exist in Common Lisp.

 

9.7.2 Functions boundp, fboundp and mboundp

Syntax:


  (boundp symbol)
  (fboundp symbol)
  (mboundp symbol)

Description:

boundp returns t if the symbol is bound as a variable or symbol macro in the global environment, otherwise nil.

fboundp returns t if the symbol has a function binding in the global environment, otherwise it returns nil.

mboundp returns t if the symbol has an operator macro binding in the global environment, otherwise nil.
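
For instance, assuming that x has just been introduced with defvar, and that the hypothetical symbol zzz has no bindings of any kind:

  (defvar x 42)
  (boundp 'x) -> t
  (boundp 'zzz) -> nil
  (fboundp 'car) -> t
  (mboundp 'when) -> t
  (mboundp 'car) -> nil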

Dialect Notes:

The boundp function in ANSI Common Lisp doesn't report that global symbol macros have a binding. They are not considered bindings. In TXR Lisp, they are considered bindings.

The ANSI Common Lisp fboundp yields true if its argument has a function, macro or operator binding. The behavior of the Common Lisp expression (fboundp x) can be obtained in TXR Lisp using the expression

  (or (fboundp x) (mboundp x) (special-operator-p x))

The mboundp function doesn't exist in ANSI Common Lisp.

 

9.7.3 Function makunbound

Syntax:


  (makunbound symbol)
  (fmakunbound symbol)
  (mmakunbound symbol)

Description:

The function makunbound removes the binding of symbol from either the dynamic environment or the global symbol macro environment. After the call to makunbound, symbol appears to be unbound.

If the makunbound call takes place in a scope in which there exists a dynamic rebinding of symbol, the information for restoring the previous binding is not affected by makunbound. When that scope terminates, the previous binding will be restored.

If the makunbound call takes place in a scope in which the dynamic binding for symbol is the global binding, then the global binding is removed. When the global binding is removed, then if symbol was previously marked as special (for instance by defvar) this marking is removed.

Otherwise if symbol has a global symbol macro binding, that binding is removed.

If symbol has no apparent dynamic binding, and no global symbol macro binding, makunbound does nothing.

In all cases, makunbound returns symbol.
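
For example, with a hypothetical variable x:

  (defvar x 42)
  (boundp 'x) -> t
  (makunbound 'x) -> x
  (boundp 'x) -> nil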

Dialect Note:

The behavior of makunbound differs from its counterpart in ANSI Common Lisp.

The makunbound function in Common Lisp only removes a value from a dynamic variable. The dynamic variable does not cease to exist, it only ceases to have a value (because a binding is a value). In TXR Lisp, the variable ceases to exist. The binding of a variable isn't its value, it is the variable itself: the association between a name and an abstract storage location, in some environment. If the binding is undone, the variable disappears.

The makunbound function in Common Lisp does not remove global symbol macros, which are not considered to be bindings in the variable namespace. That is to say, the Common Lisp boundp does not report true for symbol macros.

The Common Lisp makunbound also doesn't remove the special attribute from a symbol. If a variable is introduced with defvar and then removed with makunbound, the symbol continues to exhibit dynamic binding rather than lexical in subsequent scopes. In TXR Lisp, if a global binding is removed, so is the special attribute.

 

9.7.4 Functions fmakunbound and mmakunbound

Syntax:


  (fmakunbound symbol)
  (mmakunbound symbol)

Description:

The function fmakunbound removes any binding for symbol from the function namespace of the global environment. If symbol has no such binding, it does nothing. In either case, it returns symbol.

The function mmakunbound removes any binding for symbol from the operator macro namespace of the global environment. If symbol has no such binding, it does nothing. In either case, it returns symbol.

Dialect Note:

The behavior of fmakunbound differs from its counterpart in ANSI Common Lisp. The fmakunbound function in Common Lisp removes a function or macro binding, which do not coexist.

The mmakunbound function doesn't exist in Common Lisp.

 

9.7.5 Function func-get-form

Syntax:


  (func-get-form func)

Description:

The func-get-form function retrieves a source code form of func, which must be an interpreted function. The source code form has the syntax (name arglist body-form*).

 

9.7.6 Function func-get-name

Syntax:


  (func-get-name func [env])

Description:

The func-get-name function tries to resolve the function object func to a name. If that is not possible, it returns nil.

The resolution is performed by an exhaustive search through up to three spaces.

If an environment is specified by env, then this is searched first. If a binding is found in that environment which resolves to the function, then the search terminates and the binding's symbol is returned as the function's name.

If the search through environment env fails, or if that argument is not specified, then the global environment is searched for a function binding which resolves to func. If such a binding is found, then the search terminates, and the binding's symbol is returned. If two or more symbols in the global environment resolve to the function, it is not specified which one is returned.

If the global function environment search fails, then the function is considered as a possible method. The static slot space of all struct types is searched for a slot which contains func. If such a slot is found, then the method name is returned, consisting of the syntax (meth type name) where type is a symbol denoting the struct type and name is the static slot of the struct type which holds func.

If all the searches fail, then nil is returned.
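
For illustration (assuming that the cons function isn't also reachable under some other global name):

  (func-get-name (fun cons)) -> cons
  (func-get-name (lambda (x) x)) -> nil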

 

9.7.7 Function func-get-env

Syntax:


  (func-get-env func)

Description:

The func-get-env function retrieves the environment object associated with function func. The environment object holds the captured bindings of a lexical closure.

 

9.7.8 Function functionp

Syntax:


  (functionp obj)

Description:

The functionp function returns t if obj is a function, otherwise it returns nil.

 

9.7.9 Function interp-fun-p

Syntax:


  (interp-fun-p obj)

Description:

The interp-fun-p function returns t if obj is an interpreted function, otherwise it returns nil.

 

9.7.10 Function special-var-p

Syntax:


  (special-var-p obj)

Description:

The special-var-p function returns t if obj is a symbol marked for special variable binding, otherwise it returns nil. Symbols are marked special by defvar and defparm.

 

9.7.11 Function special-operator-p

Syntax:


  (special-operator-p obj)

Description:

The special-operator-p function returns t if obj is a symbol which names a special operator, otherwise it returns nil.

 

9.8 Object Type

In TXR Lisp, objects obey the following type hierarchy. In this type hierarchy, the internal nodes denote abstract types: no object is an instance of an abstract type. Nodes in square brackets indicate an internal structure in the type graph, visible to programs, and angle brackets indicate a plurality of types which are not listed by name:


  t ----+--- [cobj types] ---+--- hash
        |                    |
        |                    +--- stream
        |                    |
        |                    +--- random-state
        |                    |
        |                    +--- regex
        |                    |
        |                    +--- struct-type
        |                    |
        |                    +--- <structures>
        |                    |
        |                    +... <others>
        |
        |
        +--- sequence ---+--- string ---+--- str
        |                |              |
        |                |              +--- lstr
        |                |              |
        |                |              +--- lit
        |                |
        |                +--- list ---+--- null
        |                |            |
        |                |            +--- cons
        |                |            |
        |                |            +--- lcons
        |                |
        |                +--- vec
        |
        +--- number ---+--- float
        |              |
        |              +--- integer ---+--- fixnum
        |                              |
        |                              +--- bignum
        |
        +--- sym
        |
        +--- env
        |
        +--- range
        |
        +--- pkg
        |
        +--- fun

In addition to the above hierarchy, the following relationships also exist:


  t ---+--- atom --- <any type other than cons> --- nil
       |
       +--- cons ---+--- lcons --- nil
                    |
                    +--- nil


  sym --- null

That is to say, the types are exhaustively partitioned into atoms and conses; an object is either a cons or else it isn't, in which case it is the abstract type atom.

The cons type is unusual in that it is both an abstract type, serving as a supertype for the type lcons, and a concrete type, in that regular conses are of this type.

The type nil is an abstract type which is empty. That is to say, no object is of type nil. This type is considered the abstract subtype of every other type, including itself.

The type nil is not to be confused with the type null which is the type of the nil symbol.

Lastly, because the type of nil is the type null and nil is also a symbol, the null type is a subtype of sym.

 

9.8.1 Function typeof

Syntax:


  (typeof value)

Description:

The typeof function returns a symbol representing the type of value.

The core types are identified by the following symbols:

cons
Cons cell.

str
String.

lit
Literal string embedded in the TXR executable image.

chr
Character.

fixnum
Fixnum integer: an integer that fits into the value word, not having to be heap allocated.

bignum
A bignum integer: arbitrary precision integer that is heap-allocated.

float
Floating-point number.

sym
Symbol.

pkg
Symbol package.

fun
Function.

vec
Vector.

lcons
Lazy cons.

range
Range object.

lstr
Lazy string.

env
Function/variable binding environment.

hash
Hash table.

stream
I/O stream of any kind.

regex
Regular expression object.

struct-type
A structure type: the type of any one of the values which represents a structure type.

There are more kinds of objects, such as user-defined structures.
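
For illustration, one would expect results along these lines:

  (typeof 1) -> fixnum
  (typeof 1.0) -> float
  (typeof #\a) -> chr
  (typeof "abc") -> str
  (typeof 'a) -> sym
  (typeof nil) -> null
  (typeof '(1 2)) -> cons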

 

9.8.2 Function subtypep

Syntax:


  (subtypep left-type-symbol right-type-symbol)

Description:

The subtypep function tests whether left-type-symbol and right-type-symbol name a pair of types, such that the left type is a subtype of the right type.

Each type is a subtype of itself. Most other type relationships can be inferred from the type hierarchy diagrams given in the introduction to this section.

In addition, there are inheritance relationships among structures. If left-type-symbol and right-type-symbol both name structure types, then subtypep yields true if the types are the same struct type, or if the right type is a direct or indirect supertype of the left.
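
For instance, following the hierarchy diagrams given in the introduction to this section:

  (subtypep 'fixnum 'integer) -> t
  (subtypep 'integer 'number) -> t
  (subtypep 'str 'string) -> t
  (subtypep 'string 'str) -> nil
  (subtypep 'null 'sym) -> t
  (subtypep 'cons 'cons) -> t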

 

9.8.3 Function typep

Syntax:


  (typep object type-symbol)

Description:

The typep function tests whether the type of object is a subtype of the type named by type-symbol.

The following equivalence holds:


  (typep a b) --> (subtypep (typeof a) b)
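
For example:

  (typep 5 'number) -> t
  (typep "abc" 'string) -> t
  (typep nil 'sym) -> t
  (typep '(1) 'atom) -> nil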

 

9.8.4 Macro typecase

Syntax:


  (typecase test-form {(type-sym clause-form*)}*)

Description:

The typecase macro evaluates test-form and then successively tests its type against each clause.

Each clause consists of a type symbol type-sym and zero or more clause-form-s.

The first clause whose type-sym is a supertype of the type of test-form's value is considered to be the matching clause. That clause's clause-form-s are evaluated, and the value of the last form is returned.

If there is no matching clause, or there are no clauses present, or the matching clause has no clause-form-s, then nil is returned.

Note: since t is the supertype of every type, a clause whose type-sym is the symbol t always matches. If such a clause is placed as the last clause of a typecase, it provides a fallback case, whose forms are evaluated if none of the previous clauses match.
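
For instance:

  (typecase 3.14
    (integer 'exact)
    (float 'inexact)
    (t 'other))
  -> inexact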

 

9.9 Object Equivalence

 

9.9.1 Functions identity and use

Syntax:


  (identity value)
  (use value)

Description:

The identity function returns its argument.

The use function is a synonym.

Notes:

The identity function is useful as a functional argument, when a transformation function is required, but no transformation is actually desired. In this role, the use synonym leads to readable code. For instance:
  ;; construct a function which returns its integer argument
  ;; if it is odd, otherwise it returns its successor.
  ;; "If it's odd, use it, otherwise take its successor".


  [iff oddp use succ]


  ;; Applications of the function:


  [[iff oddp use succ] 3] -> 3  ;; use applied to 3


  [[iff oddp use succ] 2] -> 3  ;; succ applied to 2

 

9.9.2 Functions null, not and false

Syntax:


  (null value)
  (not value)
  (false value)

Description:

The null, not and false functions are synonyms. They test whether value is the object nil. They return t if this is the case, nil otherwise.

Examples:


  (null '()) -> t
  (null nil) -> t
  (null ()) -> t
  (false t) -> nil


  (if (null x) (format t "x is nil!"))


  (let ((list '(b c d)))
    (if (not (memq 'a list))
      (format t "list ~s does not contain the symbol a\n")))

 

9.9.3 Functions true and have

Syntax:


  (true value)
  (have value)

Description:

The true function is the complement of the null, not and false functions. The have function is a synonym for true.

It returns t if value is any object other than nil. If value is nil, it returns nil.

Note: programs should avoid explicitly testing values with true. For instance (if x ...) should be favored over (if (true x) ...). However, the latter is useful with the ifa macro because (ifa (true expr) ...) binds the it variable to the value of expr, no matter what kind of form expr is, which is not true in the (ifa expr ...) form.

Example:


   ;; Compute indices where the list '(1 nil 2 nil 3)
   ;; has true values:
   [where '(1 nil 2 nil 3) true] -> (1 3)

 

9.9.4 Functions eq, eql and equal

Syntax:


  (eq left-obj right-obj)
  (eql left-obj right-obj)
  (equal left-obj right-obj)

Description:

The principal equality test functions eq, eql and equal test whether two objects are equivalent, using different criteria. They return t if the objects are equivalent, and nil otherwise.

The eq function uses the strictest equivalence test, called implementation equality. The eq function returns t if, and only if, left-obj and right-obj are actually the same object. The eq test is implemented by comparing the raw bit pattern of the value, whether it is an immediate value or a pointer to a heaped object. Two character values are eq if they are the same character, and two fixnum integers are eq if they have the same value. All other object representations are actually pointers, and are eq if, and only if, they point to the same object in memory. So, for instance, two bignum integers might not be eq even if they have the same numeric value, two lists might not be eq even if all their corresponding elements are eq, and two strings might not be eq even if they hold identical text.

The eql function is slightly less strict than eq. The difference between eql and eq is that if left-obj and right-obj are numbers which are of the same kind and have the same numeric value, eql returns t, even if they are different objects. Note that an integer and a floating-point number are not eql even if one has a value which converts to the other: thus, (eql 0.0 0) yields nil; the comparison operation which finds these numbers equal is (= 0.0 0). The eql function also specially treats range objects. Two distinct range objects are eql if their corresponding from and to fields are eql. For all other object types, eql behaves like eq.

The equal function is less strict still than eql. In general, it recurses into some kinds of aggregate objects to perform a structural equivalence check. For struct types, it also supports customization via equality substitution. See the Equality Substitution section under Structures.

Firstly, if left-obj and right-obj are eql then they are also equal, though of course the converse isn't necessarily the case.

If two objects are both cons cells, then they are equal if their car fields are equal and their cdr fields are equal.

If two objects are vectors, they are equal if they have the same length, and their corresponding elements are equal.

If two objects are strings, they are equal if they are textually identical.

If two objects are functions, they are equal if they have equal environments, and if they have the same code. Two compiled functions are considered to have the same code if and only if they are pointers to the same function. Two interpreted functions are considered to have the same code if their list structure is equal.

Two hashes are equal if they use the same equality (both are :equal-based, or both are the default :eql-based), if their associated user data elements are equal (see the function get-hash-userdata), if their sets of keys are identical, and if the data items associated with corresponding keys from each respective hash are equal objects.

Two ranges are equal if their corresponding to and from fields are equal.

For some aggregate objects, there is no special semantics. Two arguments which are symbols, packages, or streams are equal if and only if they are the same object.

Certain object types have a custom equal function.
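
For illustration:

  (eq 'a 'a) -> t
  (eq (list 1 2) (list 1 2)) -> nil
  (eql 1.0 1.0) -> t
  (eql 0.0 0) -> nil
  (eql (list 1 2) (list 1 2)) -> nil
  (equal (list 1 2) (list 1 2)) -> t
  (equal "abc" "abc") -> t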

 

9.9.5 Function less

Syntax:


  (less left-obj right-obj)
  (less obj obj*)

Description:

The less function, when called with two arguments, determines whether left-obj compares less than right-obj in a generic way which handles arguments of various types.

The argument syntax of less is generalized. It can accept one argument, in which case it unconditionally returns t regardless of that argument's value. If more than two arguments are given, then less generalizes in a way which can be described by the following equivalence pattern, with the understanding that each argument expression is evaluated exactly once:


  (less a b c) <--> (and (less a b) (less b c))
  (less a b c d) <--> (and (less a b) (less b c) (less c d))

The less function is used as the default for the lessfun argument of the functions sort and merge, as well as the testfun argument of the pos-min and find-min.

The less function is capable of comparing numbers, characters, symbols, strings, as well as lists and vectors of these.

If both arguments are the same object so that (eq left-obj right-obj) holds true, then the function returns nil regardless of the type of left-obj, even if the function doesn't handle comparing different instances of that type. In other words, no object is less than itself, no matter what it is.

If both arguments are numbers or characters, they are compared as if using the < function.

If both arguments are strings, they are compared as if using the string-lt function.

If both arguments are symbols, then their names are compared in their place, as if by the string-lt function.

If both arguments are conses, then they are compared as follows:

1.
The less function is recursively applied to the car fields of both arguments. If it yields true, then left-obj is deemed to be less than right-obj.
2.
Otherwise, if the car fields are unequal under the equal function, less returns nil.
3.
If the car fields are equal then less is recursively applied to the cdr fields of the arguments, and the result of that comparison is returned.

This logic performs a lexicographic comparison on ordinary lists such that for instance (1 1) is less than (1 1 1) but not less than (1 0) or (1).

Note that the empty list nil compared to a cons is handled by type-based precedence, described below.

If the arguments are vectors, they are compared lexicographically, similar to strings. Corresponding elements, starting with element 0, of the vectors are compared until an index position is found where the vectors differ. If this differing position is beyond the end of one of the two vectors, then the shorter vector is considered to be lesser. Otherwise, the result of less is the outcome of comparing those differing elements themselves with less.

If the two arguments are of the above types, but of mutually different types, then less resolves the situation based on the following precedence: numbers and characters are less than strings, which are less than symbols, which are less than conses, which are less than vectors.

Note that since nil is a symbol, it is ranked lower than a cons. This interpretation ensures correct behavior when nil is regarded as an empty list, since the empty list is lexicographically prior to a nonempty list.

If either argument is a structure for which the equal method is defined, the method is invoked on that argument, and the value returned is used in place of that argument for performing the comparison. Structures with no equal method cannot participate in a comparison, resulting in an error. See the Equality Substitution section under Structures.

Finally, if either of the arguments has a type other than the above types, the situation is an error.
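
For instance, the comparisons described above give results such as:

  (less 1 2) -> t
  (less 2 2) -> nil
  (less "abc" "abd") -> t
  (less '(1 1) '(1 1 1)) -> t
  (less '(1 1) '(1 0)) -> nil
  (less 1 "abc") -> t     ;; numbers precede strings
  (less 1 2 3 4) -> t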

 

9.9.6 Function greater

Syntax:


  (greater left-obj right-obj)
  (greater obj obj*)

Description:

The greater function is equivalent to less with the arguments reversed. That is to say, the following equivalences hold:


  (greater a) <--> (less a) <--> t
  (greater a b) <--> (less b a)
  (greater a b c ...) <--> (less ... c b a)

The greater function is used as the default for the testfun argument of the pos-max and find-max functions.

 

9.9.7 Functions lequal and gequal

Syntax:


  (lequal obj obj*)
  (gequal obj obj*)

Description:

The functions lequal and gequal are similar to less and greater respectively, but differ in the following respect: when called with two arguments which compare true under the equal function, the lequal and gequal functions return t.

When called with only one argument, both functions return t and both functions generalize to three or more arguments in the same way as do less and greater.
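
For example:

  (lequal 1 1 2) -> t
  (lequal 2 1) -> nil
  (gequal "b" "b" "a") -> t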

 

9.10 List Manipulation

 

9.10.1 Function cons

Syntax:


  (cons car-value cdr-value)

Description:

The cons function allocates, initializes and returns a single cons cell. A cons cell has two fields called car and cdr, which are accessed by functions of the same name, or by the functions first and rest, which are synonyms for these.

Lists are made up of conses. A (proper) list is either the symbol nil denoting an empty list, or a cons cell which holds the first item of the list in its car, and the list of the remaining items in cdr. The expression (cons 1 nil) allocates and returns a single cons cell which denotes the one-element list (1). The cdr is nil, so there are no additional items.

A cons cell whose cdr is an atom other than nil is printed with the dotted pair notation. For example the cell produced by (cons 1 2) is denoted (1 . 2). The notation (1 . nil) is perfectly valid as input, but the cell which it denotes will print back as (1). The notations are equivalent.

The dotted pair notation can be used regardless of what type of object the cons cell's cdr is, so that for instance (a . (b c)) denotes the cons cell whose car is the symbol a and whose cdr is the list (b c). This is exactly the same thing as (a b c). In other words (a b ... l m . (n o ... w . (x y z))) is exactly the same as (a b ... l m n o ... w x y z).

Every list, and more generally cons cell tree structure, can be written in a "fully dotted" notation, such that there are as many dots as there are cells. For instance the cons structure of the nested list (1 (2) (3 4 (5))) can be made more explicit using (1 . ((2 . nil) . ((3 . (4 . ((5 . nil) . nil))) . nil))). The structure contains eight conses, and so there are eight dots in the fully dotted notation.

The number of conses in a linear list like (1 2 3) is simply the number of items, so that list in particular is made of three conses. Additional nestings require additional conses, so for instance (1 2 (3)) requires four conses. A visual way to count the conses from the printed representation is to count the atoms, then add the count of open parentheses, and finally subtract one.

A list terminated by an atom other than nil is called an improper list, and the dot notation is extended to cover improper lists. For instance (1 2 . 3) is an improper list of two elements, terminated by 3, and can be constructed using (cons 1 (cons 2 3)). The fully dotted notation for this list is (1 . (2 . 3)).
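
Examples:

These examples restate, in example form, the cases discussed above:


  (cons 1 nil) -> (1)
  (cons 1 2) -> (1 . 2)
  (cons 'a '(b c)) -> (a b c)
  (cons 1 (cons 2 3)) -> (1 2 . 3)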

 

9.10.2 Function atom

Syntax:


  (atom
value)

Description:

The atom function tests whether value is an atom. It returns t if this is the case, nil otherwise. All values which are not cons cells are atoms.

(atom x) is equivalent to (not (consp x)).

Examples:


  (atom 3) -> t
  (atom (cons 1 2)) -> nil
  (atom "abc") -> t
  (atom '(3)) -> nil

 

9.10.3 Function consp

Syntax:


  (consp
value)

Description:

The consp function tests whether value is a cons. It returns t if this is the case, nil otherwise.

(consp x) is equivalent to (not (atom x)).

Non-empty lists test positive under consp because a list is represented as a reference to the first cons in a chain of one or more conses.

Note that a lazy cons is a cons and satisfies the consp test. See the function make-lazy-cons and the macro lcons.

Examples:


  (consp 3) -> nil
  (consp (cons 1 2)) -> t
  (consp "abc") -> nil
  (consp '(3)) -> t

 

9.10.4 Accessors car and first

Syntax:


  (car
object)
  (first
object)
  (set (car
object) new-value)
  (set (first
object) new-value)

Description:

The functions car and first are synonyms.

If object is a cons cell, these functions retrieve the car field of that cons cell. (car (cons 1 2)) yields 1.

For programming convenience, object may be of several other kinds in addition to conses.

(car nil) is allowed, and returns nil.

object may also be a vector or a string. If it is an empty vector or string, then nil is returned. Otherwise the first character of the string or first element of the vector is returned.

A car form denotes a valid place when object is accessible via car, isn't the object nil, and is modifiable.

A car form supports deletion. The following equivalence then applies:


  (del (car place)) <--> (pop place)

This implies that deletion requires the argument of the car form to be a place, rather than the whole form itself. In this situation, the argument place may have a value which is nil, because pop is defined on an empty list.

The abstract concept behind deleting a car is that physically deleting this field from a cons, thereby breaking it in half, would result in just the cdr remaining. Though fragmenting a cons in this manner is impossible, deletion simulates it by replacing the place which previously held the cons, with that cons' cdr field. This semantics happens to coincide with deleting the first element of a list by a pop operation.
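
Examples:

Illustrative results; the final example relies on the pop equivalence given above:


  (car '(1 2 3)) -> 1
  (car nil) -> nil
  (car "abc") -> #\a
  (first #(1 2 3)) -> 1


  ;; (del (car l)) behaves like (pop l)
  (let ((l (list 1 2 3)))
    (del (car l))
    l) -> (2 3)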

 

9.10.5 Accessors cdr and rest

Syntax:


  (cdr
object)
  (rest
object)
  (set (cdr
object) new-value)
  (set (rest
object) new-value)

Description:

The functions cdr and rest are synonyms.

If object is a cons cell, these functions retrieve the cdr field of that cons cell. (cdr (cons 1 2)) yields 2.

For programming convenience, object may be of several other kinds in addition to conses.

(cdr nil) is allowed, and returns nil.

object may also be a vector or a string. If it is a string or vector containing at least two items, then the remaining part of the object is returned, with the first element removed. For example (cdr "abc") yields "bc". If object is a one-element vector or string, or an empty vector or string, then nil is returned. Thus (cdr "a") and (cdr "") both result in nil.

The invocation syntax of a cdr or rest form is a syntactic place. The place is semantically valid when object is accessible via cdr, isn't the object nil, and is modifiable.

A cdr place supports deletion, according to the following near equivalence:


  (del (cdr place)) <--> (prog1 (cdr place)
                                (set place (car place)))

Of course, place is evaluated only once.

Note that this is symmetric with the delete semantics of car in that the cons stored in place goes away, as does the cdr field, leaving just the car, which takes the place of the original cons.

Example:

Walk every element of the list (1 2 3) using a for loop:


    (for ((i '(1 2 3))) (i) ((set i (cdr i)))
      (print (car i) *stdout*)
      (print #\newline *stdout*))

The variable i marches over the cons cells which make up the "backbone" of the list. The elements are retrieved using the car function. Advancing to the next cell is achieved using (cdr i). If i is the last cell in a (proper) list, (cdr i) yields nil and so i becomes nil, the loop guard expression i fails and the loop terminates.

 

9.10.6 Functions rplaca and rplacd

Syntax:


  (rplaca
cons new-car-value)
  (rplacd
cons new-cdr-value)

Description:

The rplaca and rplacd functions assign new values into the car and cdr fields of the cell cons.

Note that, except for the difference in return value, (rplaca x y) is the same as the more generic (set (car x) y), and likewise (rplacd x y) can be written as (set (cdr x) y).

It is an error if cons is not an object which supports these operations: a cons or lazy cons, or (as described below) a vector or string. In particular, whereas (car nil) is correct, (rplaca nil ...) is erroneous.

The rplaca and rplacd functions return cons. Note: in TXR versions 89 and earlier, these functions returned the new value; that behavior was undocumented.

Thus the cons argument does not have to be an actual cons cell. Both functions support meaningful semantics for vectors and strings. If cons is a string, it must be modifiable.

The rplaca function replaces the first element of a vector or first character of a string. The vector or string must be at least one element long.

The rplacd function replaces the suffix of a vector or string after the first element with a new suffix. The new-cdr-value must be a sequence, and if the suffix of a string is being replaced, it must be a sequence of characters. The suffix here refers to the portion of the vector or string after the first element.

It is permissible to use rplacd on an empty string or vector. In this case, new-cdr-value specifies the contents of the entire string or vector, as if the operation were done on a non-empty vector or string, followed by the deletion of the first element.
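
Examples:

Illustrative results; the string example assumes a modifiable string obtained via copy-str:


  (rplaca (cons 1 2) 0) -> (0 . 2)
  (rplacd (cons 1 2) '(2 3)) -> (1 2 3)


  (let ((s (copy-str "abc")))
    (rplaca s #\z)) -> "zbc"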

 

9.10.7 Accessors second, third, fourth, fifth, sixth, seventh, eighth, ninth and tenth

Syntax:


  (first
object)
  (second
object)
  (third
object)
  (fourth
object)
  (fifth
object)
  (sixth
object)
  (seventh
object)
  (eighth
object)
  (ninth
object)
  (tenth
object)
  (set (first
object) new-value)
  (set (second
object) new-value)
  ...
  (set (tenth
object) new-value)

Description:

Used as functions, these accessors retrieve the elements of a sequence by position. If the sequence is shorter than implied by the position, these functions return nil.

When used as syntactic places, these accessors denote the storage locations by position. The location must exist, otherwise an error exception results. The places support deletion.

Examples:


  (third '(1 2)) -> nil
  (second "ab") -> #\b
  (third '(1 2 . 3)) -> **error, improper list**


  (let ((x (copy "abcd")))
    (inc (third x))
    x) -> "abce"

 

9.10.8 Functions append, nconc and append*

Syntax:


  (append [
list* last-arg])
  (nconc [
list* last-arg])
  (append* [
list* last-arg])

Description:

The append function creates a new list which is a catenation of the list arguments. All arguments are optional; (append) produces the empty list.

If a single argument is specified, then append simply returns the value of that argument. It may be any kind of object.

If N arguments are specified, where N > 1, then the first N-1 arguments must be proper lists. Copies of these lists are catenated together. The last argument N, shown in the above syntax as last-arg, may be any kind of object. It is installed into the cdr field of the last cons cell of the resulting list. Thus, if argument N is also a list, it is catenated onto the resulting list, but without being copied. Argument N may be an atom other than nil; in that case append produces an improper list.

The nconc function works like append, but avoids consing. It destructively manipulates (that is to say, mutates) incoming lists to catenate them, and so must be used with care.

The append* function works like append, but returns a lazy list which produces the catenation of the lists on demand. If some of the arguments are themselves lazy lists which are infinite, then append* can return immediately, whereas append will get caught in an infinite loop trying to produce a catenation and eventually exhaust available memory. (However, the last argument to append may be an infinite lazy list, because append does not traverse the last argument.)

Examples:


  ;; An atom is returned.
  (append 3) -> 3


  ;; A list is also just returned: no copying takes place.
  ;; The eq function can verify that the same object emerges
  ;; from append that went in.
  (let ((list '(1 2 3)))
    (eq (append list) list)) -> t


  (append '(1 2 3) '(4 5 6) 7) -> (1 2 3 4 5 6 . 7)


  ;; the (4 5 6) tail of the resulting list is the original
  ;; (4 5 6) object, shared with that list.


  (append '(1 2 3) '(4 5 6)) -> '(1 2 3 4 5 6)


  (append nil) -> nil


  ;; (1 2 3) is copied: it is not the last argument
  (append '(1 2 3) nil) -> (1 2 3)


  ;; empty lists disappear
  (append nil '(1 2 3) nil '(4 5 6)) -> (1 2 3 4 5 6)
  (append nil nil nil) -> nil


  ;; atoms and improper lists other than in the last position
  ;; are erroneous
  (append '(a . b) 3 '(1 2 3)) -> **error**

 

9.10.9 Functions revappend and nreconc

Syntax:


  (revappend
list1 list2)
  (nreconc
list1 list2)

Description:

The revappend function returns a list consisting of list2 appended to a reversed copy of list1. The returned object shares structure with list2, which is unmodified.

The nreconc function behaves similarly, except that the returned object may share structure with not only list2 but also list1, which is modified.
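
Example:

An illustrative result:


  (revappend '(3 2 1) '(4 5)) -> (1 2 3 4 5)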

 

9.10.10 Function list

Syntax:


  (list
value*)

Description:

The list function creates a new list, whose elements are the argument values.

Examples:


  (list) -> nil
  (list 1) -> (1)
  (list 'a 'b) -> (a b)

 

9.10.11 Function list*

Syntax:


  (list*
value*)

Description:

The list* function is a generalization of cons. If called with exactly two arguments, it behaves exactly like cons: (list* x y) is identical to (cons x y). If three or more arguments are specified, the leading arguments specify additional atoms to be consed to the front of the list. So for instance (list* 1 2 3) is the same as (cons 1 (cons 2 3)) and produces the improper list (1 2 . 3). Generalizing in the other direction, list* can be called with just one argument, in which case it returns that argument, and can also be called with no arguments in which case it returns nil.

Examples:


  (list*) -> nil
  (list* 1) -> 1
  (list* 'a 'b) -> (a . b)
  (list* 'a 'b 'c) -> (a b . c)

Dialect Note:

Note that unlike in some other Lisp dialects, the effect of (list* 1 2 x) can also be obtained using (list 1 2 . x). However, (list* 1 2 (func 3)) cannot be rewritten as (list 1 2 . (func 3)) because the latter is equivalent to (list 1 2 func 3).

 

9.10.12 Function sub-list

Syntax:


  (sub-list
list [from [to]])

Description:

This function is like the sub function, except that it operates strictly on lists.

For a description of the arguments and semantics, refer to the sub function.
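
Examples:

Illustrative results, assuming the range conventions of sub, in which from is inclusive, to is exclusive, and to defaults to the end of the sequence:


  (sub-list '(1 2 3 4 5) 1 3) -> (2 3)
  (sub-list '(1 2 3 4 5) 2) -> (3 4 5)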

 

9.10.13 Function replace-list

Syntax:


  (replace-list
list item-sequence [from [to]])

Description:

The replace-list function is like the replace function, except that the first argument must be a list.

For a description of the arguments, semantics and return value, refer to the replace function.

 

9.10.14 Functions listp and proper-list-p

Syntax:


  (listp
value)
  (proper-list-p
value)

Description:

The listp and proper-list-p functions test, respectively, whether value is a list, or a proper list, and return t or nil accordingly.

The listp test is weaker, and executes without having to traverse the object. (listp x) is equivalent to (or (null x) (consp x)). The empty list nil is a list, and a cons cell is a list.

The proper-list-p function returns t only for proper lists. A proper list is either nil, or a cons whose cdr is a proper list. proper-list-p traverses the list, and its execution will not terminate if the list is circular.
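
Examples:

These results follow from the definitions above:


  (listp nil) -> t
  (listp '(1 2)) -> t
  (listp '(1 . 2)) -> t
  (listp "abc") -> nil
  (proper-list-p '(1 2)) -> t
  (proper-list-p '(1 . 2)) -> nil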

Dialect Note: in TXR 137 and older, proper-list-p is called proper-listp. The name was changed for adherence to conventions and compatibility with other Lisp dialects, like Common Lisp. However, the function continues to be available under the old name. Code that must run on TXR 137 and older installations should use proper-listp, but its use going forward is deprecated.

 

9.10.15 Function length-list

Syntax:


  (length-list
list)

Description:

The length-list function returns the length of list, which may be a proper or improper list. The length of a list is the number of conses in that list.
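
Examples:

Because the length is the number of conses, the terminating atom of an improper list is not counted:


  (length-list '(1 2 3)) -> 3
  (length-list '(1 2 . 3)) -> 2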

 

9.10.16 Function copy-list

Syntax:


  (copy-list
list)

Description:

The copy-list function returns a list similar to list, but with a newly allocated cons cell structure.

If list is an atom, it is simply returned.

Otherwise, list is a cons cell, and copy-list returns the same object as the expression (cons (car list) (copy-list (cdr list))).

Note that the object (car list) is not deeply copied, but only propagated by reference into the new list. copy-list produces a new list structure out of the same items that are in list.

Dialect Note:

Common Lisp does not allow the argument to be an atom, except for the empty list nil.

 

9.10.17 Function copy-cons

Syntax:


  (copy-cons
cons)

Description:

This function creates a fresh cons cell, whose car and cdr fields are copied from cons.

 

9.10.18 Functions reverse and nreverse

Syntax:


  (reverse
list)
  (nreverse
list)

Description:

The functions reverse and nreverse produce an object which contains the same items as proper list list, but in reverse order. If list is nil, then both functions return nil.

The reverse function is non-destructive: it creates a new list.

The nreverse function creates the structure of the reversed list out of the cons cells of the input list, thereby destructively altering it (if it contains more than one element). How nreverse uses the material from the original list is unspecified. It may rearrange the cons cells into a reverse order, or it may keep the structure intact, but transfer the car values among cons cells into reverse order. Other approaches are possible.
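
Examples:

For illustration:


  (reverse '(1 2 3)) -> (3 2 1)
  (reverse nil) -> nil
  (nreverse (list 1 2 3)) -> (3 2 1)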

 

9.10.19 Function ldiff

Syntax:


  (ldiff
list sublist)

Description:

The values list and sublist are proper lists.

The ldiff function determines whether sublist is a structural suffix of list (meaning that it actually is a suffix, and is not merely equal to one).

This is true if list and sublist are the same object, or else, recursively, if sublist is a suffix of (cdr list).

The object nil is the sublist of every list, including itself.

The ldiff function returns a new list consisting of the elements of the prefix of list which come before the sublist suffix. The elements are in the same order as in list. If sublist is not a suffix of list, then a copy of list is returned.

This function also works more generally on sequences. The list and sublist arguments may be strings or vectors. In this case, the suffix-matching behavior is relaxed to one of structural equivalence. See the relevant examples below.

Examples:


  ;;; unspecified: the compiler could make
  ;;; '(2 3) a suffix of '(1 2 3),
  ;;; or they could be separate objects.
  (ldiff '(1 2 3) '(2 3)) -> either (1) or (1 2 3)


  ;; b is the (1 2) suffix of a, so the ldiff is (1)
  (let ((a '(1 2 3)) (b (cdr a)))
    (ldiff a b))
  -> (1)


  ;; string and vector behavior
  (ldiff "abc" "bc") -> "a"


  (ldiff "abc" nil) -> "abc"


  (ldiff #(1 2 3) #(3)) -> #(1 2)


  ;; mixtures do not have above behavior
  (ldiff #(1 2 3) '(3)) -> #(1 2 3)


  (ldiff '(1 2 3) #(3)) -> #(1 2 3)


  (ldiff "abc" #(#\b #\c)) -> "abc"

 

9.10.20 Function last

Syntax:


  (last
seq)

Description:

If seq is a nonempty proper or improper list, the last function returns the last cons cell in the list: that cons cell whose cdr field is a terminating atom.

If seq is nil, then nil is returned.

If seq is a non-list sequence, then a one-element suffix of seq is returned, or an empty suffix if seq is an empty sequence.
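
Examples:

Illustrative results covering the list and non-list cases described above:


  (last '(1 2 3)) -> (3)
  (last '(1 2 . 3)) -> (2 . 3)
  (last nil) -> nil
  (last "abc") -> "c"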

 

9.10.21 Accessor nthcdr

Syntax:


  (nthcdr
index list)
  (set (nthcdr
index list) new-value)

Description:

The nthcdr function retrieves the n-th cons cell of a list, indexed from zero. The index parameter must be a non-negative integer. If index specifies a nonexistent cons beyond the end of the list, then nthcdr yields nil. The following equivalences hold:


  (nthcdr 0 list) <--> list
  (nthcdr 1 list) <--> (cdr list)
  (nthcdr 2 list) <--> (cddr list)

An nthcdr place designates the storage location which holds the n-th cell, as indicated by the value of index. Indices beyond the last cell of list do not designate a valid place. If list is itself a place, then the zeroth index is permitted, and the resulting place denotes list: storing a value to (nthcdr 0 list) overwrites list. Otherwise, if list isn't a syntactic place, then the zeroth index does not designate a valid place; index must have a positive value. An nthcdr place does not support deletion.
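
Examples:

Illustrative results, following the equivalences above:


  (nthcdr 0 '(1 2 3)) -> (1 2 3)
  (nthcdr 2 '(1 2 3 4)) -> (3 4)
  (nthcdr 5 '(1 2 3)) -> nil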

Dialect Note:

In Common Lisp, nthcdr is only a function, not an accessor; nthcdr forms do not denote places.

 

9.10.22 Accessors caar, cadr, cdar, cddr, ... cdddddr

Syntax:


  (caar
object)
  (cadr
object)
  (cdar
object)
  (cddr
object)
  ...
  (cdddr
object)
  (set (caar
object) new-value)
  (set (cadr
object) new-value)
  ...

Description:

The a-d accessors provide a shorthand notation for accessing two to five levels deep into a cons-cell-based tree structure. For instance, the equivalent of the nested function call expression (car (car (cdr object))) can be achieved using the single function call (caadr object). The symbol names of the a-d accessors are a generalization of the words "car" and "cdr". They encode the pattern of car and cdr traversal of the structure using a sequence of the letters a and d placed between c and r. The traversal is encoded in right-to-left order, so that cadr indicates a traversal of the cdr link, followed by the car. This order corresponds to the nested function call notation, which also encodes the traversal right-to-left. The following diagram illustrates the straightforward relationship:
  (cdr (car (cdr x)))
    ^    ^    ^
    |   /     |
    |  /     /
    | / ____/
    || /
  (cdadr x)

TXR Lisp provides all possible a-d accessors up to five levels deep, from caar all the way through cdddddr.

Expressions involving a-d accessors are places. For example, (caddr x) denotes the same place as (car (cddr x)), and (cdadr x) denotes the same place as (cdr (cadr x)).

The a-d accessor places support deletion, with semantics derived from the deletion semantics of the car and cdr places. For example, (del (caddr x)) means the same as (del (car (cddr x))).

 

9.10.23 Functions flatten and flatten*

Syntax:


  (flatten
list)
  (flatten*
list)

Description:

The flatten function produces a list whose elements are all of the non-nil atoms contained in the structure of list.

The flatten* function works like flatten except that it produces a lazy list. It can be used to lazily flatten an infinite lazy structure.

Examples:


  (flatten '(1 2 () (3 4))) -> (1 2 3 4)


  ;; equivalent to previous, since
  ;; nil is the same thing as ()
  (flatten '(1 2 nil (3 4))) -> (1 2 3 4)


  (flatten nil) -> nil


  (flatten '(((()) ()))) -> nil

 

9.10.24 Functions flatcar and flatcar*

Syntax:


  (flatcar
tree)
  (flatcar*
tree)

Description:

The flatcar function produces a list of all the atoms contained in the tree structure tree, in the order in which they appear, when the structure is traversed left to right.

This list includes those nil atoms which appear in car fields.

The list excludes nil atoms which appear in cdr fields.

The flatcar* function works like flatcar except that it produces a lazy list. It can be used to lazily flatten an infinite lazy structure.

Examples:


  (flatcar '(1 2 () (3 4))) -> (1 2 nil 3 4)


  (flatcar '(a (b . c) d (e) (((f)) . g) (nil . z) nil . h))


  --> (a b c d e f g nil z nil h)

 

9.10.25 Function tree-find

Syntax:


  (tree-find
obj tree test-function)

Description:

The tree-find function searches tree for an occurrence of obj. The tree argument can be any atom, or a cons. If tree is a cons, it is understood to be a proper list whose elements are also trees.

The equivalence test is performed by test-function which must take two arguments, and has conventions similar to eq, eql or equal.

tree-find works as follows. If tree is equivalent to obj under test-function, then t is returned to announce a successful finding. If this test fails, and tree is an atom, nil is returned immediately to indicate that the find failed. Otherwise, tree is taken to be a proper list, and tree-find is recursively applied to each element of the list in turn, using the same obj and test-function arguments, stopping at the first element which returns a non-nil value.
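
Example:

An illustrative search; the DWIM bracket notation is used so that the symbol eql resolves to the equality function:


  [tree-find 3 '(1 (2 3) 4) eql] -> t
  [tree-find 5 '(1 (2 3) 4) eql] -> nil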

 

9.10.26 Functions memq, memql and memqual

Syntax:


  (memq
object list)
  (memql
object list)
  (memqual
object list)

Description:

The memq, memql and memqual functions search list for a member which is, respectively, eq, eql or equal to object. (See the eq, eql and equal functions above.)

If no such element is found, nil is returned.

Otherwise, that suffix of list is returned whose first element is the matching object.
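
Examples:

Note that the return value is a suffix of the input list:


  (memq 'c '(a b c d)) -> (c d)
  (memql 3 '(1 2 3 4)) -> (3 4)
  (memqual "b" '("a" "b" "c")) -> ("b" "c")
  (memq 'z '(a b c)) -> nil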

 

9.10.27 Functions member and member-if

Syntax:


  (member
key sequence [testfun [keyfun]])
  (member-if
predfun sequence [keyfun])

Description:

The member and member-if functions search through sequence for an item which matches a key, or satisfies a predicate function, respectively.

The keyfun argument specifies a function which is applied to the elements of the sequence to produce the comparison key. If this argument is omitted, then the untransformed elements of the sequence themselves are examined.

The member function's testfun argument specifies the test function which is used to compare the comparison keys taken from the sequence to the search key. If this argument is omitted, then the equal function is used. If member does not find a matching element, it returns nil. Otherwise it returns the suffix of sequence which begins with the matching element.

The member-if function's predfun argument specifies a predicate function which is applied to the successive comparison keys pulled from the sequence by applying the key function to successive elements. If no match is found, then nil is returned, otherwise what is returned is the suffix of sequence which begins with the matching element.
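
Examples:

Illustrative results; the member-if call uses the DWIM brackets so that oddp resolves to a function:


  (member "b" '("a" "b" "c")) -> ("b" "c")
  (member 5 '(1 2 3)) -> nil
  [member-if oddp '(2 4 5 6)] -> (5 6)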

 

9.10.28 Functions rmemq, rmemql, rmemqual, rmember and rmember-if

Syntax:


  (rmemq
object list)
  (rmemql
object list)
  (rmemqual
object list)
  (rmember
key sequence [testfun [keyfun]])
  (rmember-if
predfun sequence [keyfun])

Description:

These functions are counterparts to memq, memql, memqual, member and member-if which look for the right-most element which matches object, rather than for the left-most element.

 

9.10.29 Functions conses and conses*

Syntax:


  (conses
list)
  (conses*
list)

Description:

These functions return a list whose elements are the conses which make up list. The conses* function does this in a lazy way, avoiding the computation of the entire list: it returns a lazy list of the conses of list. The conses function computes the entire list before returning.

The input list may be proper or improper.

The first cons of list is that list itself. The second cons is the rest of the list, or (cdr list). The third cons is (cdr (cdr list)) and so on.

Example:


  (conses '(1 2 3)) -> ((1 2 3) (2 3) (3))

Dialect Note:

These functions are useful for simulating the maplist function found in other dialects like Common Lisp.

TXR Lisp's (conses x) can be expressed in Common Lisp as (maplist #'identity x).

Conversely, the Common Lisp operation (maplist function list) can be computed in TXR Lisp as (mapcar function (conses list)).

More generally, the Common Lisp operation


  (maplist function list0 list1 ... listn)

can be expressed as:


  (mapcar function (conses list0)
                   (conses list1) ... (conses listn))

 

9.11 Association Lists

Association lists are ordinary lists formed according to a special convention. Firstly, any empty list is a valid association list. A non-empty association list contains only cons cells as elements. These cons cells are understood to represent key/value associations, hence the name "association list".

 

9.11.1 Function assoc

Syntax:


  (assoc
key alist)

Description:

The assoc function searches an association list alist for a cons cell whose car field is equivalent to key (with equality determined by the equal function). The first such cons is returned. If no such cons is found, nil is returned.
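
Examples:

For illustration:


  (assoc 'b '((a . 1) (b . 2) (c . 3))) -> (b . 2)
  (assoc 'd '((a . 1) (b . 2) (c . 3))) -> nil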

 

9.11.2 Function assql

Syntax:


  (assql
key alist)

Description:

The assql function is just like assoc, except that the equality test is determined using the eql function rather than equal.

 

9.11.3 Function acons

Syntax:


  (acons
car cdr alist)

Description:

The acons function constructs a new alist by consing a new cons to the front of alist. The following equivalence holds:


  (acons car cdr alist) <--> (cons (cons car cdr) alist)

 

9.11.4 Function acons-new

Syntax:


  (acons-new
car cdr alist)

Description:

The acons-new function searches alist, as if using the assoc function, for an existing cell which matches the key provided by the car argument. If such a cell exists, then its cdr field is overwritten with the cdr argument, and then the alist is returned. If no such cell exists, then a new list is returned by adding a new cell to the input list consisting of the car and cdr values, as if by the acons function.

 

9.11.5 Function aconsql-new

Syntax:


  (aconsql-new
car cdr alist)

Description:

This function is like acons-new, except that the eql function is used for equality testing. Thus, the list is searched for an existing cell as if using the assql function rather than assoc.

 

9.11.6 Function alist-remove

Syntax:


  (alist-remove
alist keys)

Description:

The alist-remove function takes association list alist and produces a duplicate from which cells matching the specified keys have been removed. The keys argument is a list of the keys not to appear in the output list.
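
Example:

An illustrative result, with the keys given as a list per the description above:


  (alist-remove '((a . 1) (b . 2) (c . 3)) '(b))
  -> ((a . 1) (c . 3))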

 

9.11.7 Function alist-nremove

Syntax:


  (alist-nremove
alist keys)

Description:

The alist-nremove function is like alist-remove, but potentially destructive. The input list alist may be destroyed and its structural material re-used to form the output list. The application should not retain references to the input list.

 

9.11.8 Function copy-alist

Syntax:


  (copy-alist
alist)

Description:

The copy-alist function duplicates alist. Unlike copy-list, which only duplicates list structure, copy-alist also duplicates each cons cell of the input alist. That is to say, each element of the output list is produced as if by the copy-cons function applied to the corresponding element of the input list.

 

9.12 Property Lists

 

9.12.1 Function prop

Syntax:


  (prop
plist key)

Description:

A property list is a flat list of even length, consisting of interleaved pairs of property names (usually symbols) and their values (arbitrary objects). An example property list is (:a 1 :b "two"), which contains two properties: :a having value 1, and :b having value "two".

The prop function searches property list plist for key key. If the key is found, then the value next to it is returned. Otherwise nil is returned.

It is ambiguous whether nil is returned due to the property not being found, or due to the property being present with a nil value.
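
Examples:

Using the property list shown above:


  (prop '(:a 1 :b "two") :b) -> "two"
  (prop '(:a 1 :b "two") :c) -> nil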

 

9.13 List Sorting

 

9.13.1 Function merge

Syntax:


  (merge
seq1 seq2 [lessfun [keyfun]])

Description:

The merge function merges two sorted sequences seq1 and seq2 into a single sorted sequence. The semantics and defaulting behavior of the lessfun and keyfun arguments are the same as those of the sort function.

The sequence which is returned is of the same kind as seq1.

This function is destructive of any inputs that are lists. If the output is a list, it is formed out of the structure of the input lists.
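
Example:

An illustrative result, assuming the default comparison used by sort (the less function); fresh lists are used because list inputs may be destructively reused:


  (merge (list 1 3 5) (list 2 4 6)) -> (1 2 3 4 5 6)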

 

9.13.2 Function multi-sort

Syntax:


  (multi-sort
columns less-funcs [key-funcs])

Description:

The multi-sort function regards a list of lists to be the columns of a database. The corresponding elements from each list constitute a record. These records are to be sorted, producing a new list of lists.

The columns argument supplies the list of lists which comprise the columns of the database. The lists should ideally be of the same length. If the lists are of different lengths, then the shortest list is taken to be the length of the database. Excess elements in the longer lists are ignored, and do not appear in the sorted output.

The less-funcs argument supplies a list of comparison functions which are applied to the columns. Successive functions correspond to successive columns. If less-funcs is an empty list, then the sorted database will emerge in the original order. If less-funcs contains exactly one function, then the rows of the database are sorted according to the first column; the remaining columns simply follow their row. If less-funcs contains more than one function, then additional columns are taken into consideration if the items in the previous columns compare equal. For instance, if two elements from column one compare equal, then the corresponding second-column elements are compared using the second-column comparison function.

The optional key-funcs argument supplies transformation functions through which column entries are converted to comparison keys, similarly to the single key function used in the sort function and others. If there are more key functions than less functions, the excess key functions are ignored.
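
Example:

An illustrative sort of a two-column database on its first column, passing the less function as the sole comparison function:


  (multi-sort (list '(3 1 2) '("c" "a" "b"))
              (list (fun less)))
  -> ((1 2 3) ("a" "b" "c"))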

 

9.14 Lazy Lists and Lazy Evaluation

 

9.14.1 Function make-lazy-cons

Syntax:


  (make-lazy-cons
function)

Description:

The function make-lazy-cons makes a special kind of cons cell called a lazy cons, or lcons. Lazy conses are useful for implementing lazy lists.

Lazy lists are lists which are not allocated all at once. Rather, their elements materialize when they are accessed, like magic stepping stones appearing under one's feet out of thin air.

A lazy cons has car and cdr fields like a regular cons, and those fields are initialized to nil when the lazy cons is created. A lazy cons also has an update function, the one which is provided as the function argument to make-lazy-cons.

When either the car or cdr field of the lazy cons is accessed for the first time, the function is automatically invoked first. That function has the opportunity to initialize the car and cdr fields. Once the function is called, it is removed from the lazy cons: the lazy cons no longer has an update function.

To continue a lazy list, the function can make another call to make-lazy-cons and install the resulting cons as the cdr of the lazy cons.

Example:


  ;;; lazy list of integers between min and max
  (defun integer-range (min max)
    (let ((counter min))
      ;; if min is greater than max, just return the empty list;
      ;; otherwise return a lazy list
      (if (> min max)
        nil
        (make-lazy-cons
          (lambda (lcons)
            ;; install next number into car
            (rplaca lcons counter)
            ;; now deal with the cdr field
            (cond
              ;; max reached, terminate list with nil!
              ((eql counter max)
               (rplacd lcons nil))
              ;; max not reached: increment counter
              ;; and extend with another lazy cons
              (t
                (inc counter)
                (rplacd lcons (make-lazy-cons
                                (lcons-fun lcons))))))))))

 

9.14.2 Function lconsp

Syntax:


  (lconsp
value)

Description:

The lconsp function returns t if value is a lazy cons cell. Otherwise it returns nil, even if value is an ordinary cons cell.

 

9.14.3 Function lcons-fun

Syntax:


  (lcons-fun
lazy-cons)

Description:

The lcons-fun function retrieves the update function of a lazy cons. Once a lazy cons has been accessed, it no longer has an update function and lcons-fun returns nil. While the update function of a lazy cons is executing, it is still accessible. This allows the update function to retrieve a reference to itself and propagate itself into another lazy cons (as in the example under make-lazy-cons).

 

9.14.4 Macro lcons

Syntax:


  (lcons
car-expression cdr-expression)

Description:

The lcons macro simplifies the construction of structures based on lazy conses. Syntactically, it resembles the cons function. However, the arguments are expressions rather than values. The macro generates code which, when evaluated, immediately produces a lazy cons. The expressions car-expression and cdr-expression are not immediately evaluated. Rather, when either the car or cdr field of the lazy cons cell is accessed, these expressions are both evaluated at that time, in the order that they appear in the lcons expression, and in the original lexical scope in which that expression was evaluated. The return values of these expressions are used, respectively, to initialize the corresponding fields of the lazy cons.

Note: the lcons macro may be understood in terms of the following reference implementation, as a syntactic sugar combining the make-lazy-cons constructor with a lexical closure provided by a lambda function:


  (defmacro lcons (car-form cdr-form)
    (let ((lc (gensym)))
       ^(make-lazy-cons (lambda (,lc)
                          (rplaca ,lc ,car-form)
                          (rplacd ,lc ,cdr-form)))))

Example:


  ;; Given the following function ...


  (defun fib-generator (a b)
    (lcons a (fib-generator b (+ a b))))


  ;; ... the following function call generates the Fibonacci
  ;; sequence as an infinite lazy list.


  (fib-generator 1 1) -> (1 1 2 3 5 8 13 ...)

 

9.14.5 Functions lazy-stream-cons and get-lines

Syntax:


  (lazy-stream-cons
stream)
  (get-lines [
stream])

Description:

The lazy-stream-cons and get-lines functions are synonyms, except that the stream argument is optional in get-lines and defaults to *stdin*. Thus, the following description of lazy-stream-cons also applies to get-lines.

The lazy-stream-cons function returns a lazy cons which generates a lazy list based on reading lines of text from the input stream stream; these lines form the elements of the list. The get-line function is called on demand to add elements to the list.

The lazy-stream-cons function itself makes the first call to get-line on the stream. If this returns nil, then the stream is closed and nil is returned. Otherwise, a lazy cons is returned whose update function will install that line into the car field of the lazy cons, and continue the lazy list by making another call to lazy-stream-cons, installing the result into the cdr field.

lazy-stream-cons inspects the real-time property of a stream as if by the real-time-stream-p function. This determines which of two styles of lazy list are returned. For an ordinary (non-real-time) stream, the lazy list treats the end-of-file condition accurately: an empty file turns into the empty list nil, a one line file into a one-element list which contains that line and so on. This accuracy requires one line of lookahead which is not acceptable in real-time streams, and so a different type of lazy list is used, which generates an extra nil item after the last line. Under this type of lazy list, an empty input stream translates to the list (nil); a one-line stream translates to ("line" nil) and so forth.
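
Example:

An illustrative call, assuming the make-string-input-stream function to supply a stream over literal text:


  (get-lines (make-string-input-stream "a\nb\nc\n"))
  -> ("a" "b" "c")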

 

9.14.6 Macro delay

Syntax:


  (delay
expression)

Description:

The delay operator arranges for the delayed (or "lazy") evaluation of expression. This means that the expression is not evaluated immediately. Rather, the delay expression produces a promise object.

The promise object can later be passed to the force function (described later in this document). The force function will trigger the evaluation of the expression and retrieve the value.

The expression is evaluated in the original scope, no matter where the force takes place.

The expression is evaluated at most once, by the first call to force. Additional calls to force only retrieve a cached value.

Example:


  ;; list is popped only once: the value is computed
  ;; just once when force is called on a given promise
  ;; for the first time.


  (defun get-it (promise)
    (format t "*list* is ~s\n" *list*)
    (format t "item is ~s\n" (force promise))
    (format t "item is ~s\n" (force promise))
    (format t "*list* is ~s\n" *list*))


  (defvar *list* '(1 2 3))


  (get-it (delay (pop *list*)))


  Output:


  *list* is (1 2 3)
  item is 1
  item is 1
  *list* is (2 3)

 

9.14.7 Accessor force

Syntax:


  (force
promise)
  (set (force
promise) new-value)

Description:

The force function accepts a promise object produced by the delay macro. The first time force is invoked, the expression which was wrapped inside promise by the delay macro is evaluated (in its original lexical environment, regardless of where in the program the force call takes place). The value of expression is cached inside promise and returned, becoming the return value of the force function call. If the force function is invoked additional times on the same promise, the cached value is retrieved.

A force form is a syntactic place, denoting the value cache location within promise.

Storing a value in a force place causes future accesses to the promise to return that value.

If the promise had not yet been forced, then storing a value into it prevents that from ever happening. The delayed expression will never be evaluated.

If, while a promise is being forced, the evaluation of expression itself causes an assignment to the promise, it is not specified whether the promise will take on the value of expression or the assigned value.

 

9.14.8 Function promisep

Syntax:


  (promisep
object)

Description:

The promisep function returns t if object is a promise object: an object created by the delay macro. Otherwise it returns nil.

Note: promise objects are conses. The typeof function applied to a promise returns cons.

 

9.14.9 Macro mlet

Syntax:


  (mlet ({
sym | (sym init-form)}*) body-form*)

Description:

The mlet macro ("magic let" or "mutual let") implements a variable binding construct similar to let and let*.

Under mlet, the scope of the bindings of the sym variables extends over the init-form-s, as well as the body-form-s.

Unlike the let* construct, each init-form has each sym in scope. That is to say, an init-form can refer not only to previous variables, but also to later variables as well as to its own variable.

The variables are not initialized until their values are accessed for the first time. Any sym whose value is not accessed is not initialized.

Furthermore, the evaluation of each init-form does not take place until the time when its value is needed to initialize the associated sym. This evaluation takes place once. If a given sym is not accessed during the evaluation of the mlet construct, then its init-form is never evaluated.

The bound variables may be assigned. If, before initialization, a variable is updated in such a way that its prior value is not needed, it is unspecified whether initialization takes place, and thus whether its init-form is evaluated.

Direct circular references are erroneous and are diagnosed. This takes place when the macro-expanded form is evaluated, not during the expansion of mlet.

Examples:


  ;; Dependent calculations in arbitrary order
  (mlet ((x (+ y 3))
         (z (+ x 1))
         (y 4))
    (+ z 4))  -->  12


  ;; Error: circular reference:
  ;; x depends on y, y on z, but z on x again.
  (mlet ((x (+ y 1))
         (y (+ z 1))
         (z (+ x 1)))
    z)


  ;; Okay: lazy circular reference because lcons is used
  (mlet ((list (lcons 1 list)))
    list)  -->  (1 1 1 1 1 ...) ;; circular list

In the last example, the list variable is accessed for the first time in the body of the mlet form. This causes the evaluation of the lcons form. This form evaluates its arguments lazily, which means that it is not a problem that list is not yet initialized. The form produces a lazy cons, which is then used to initialize list. When the car or cdr fields of the lazy cons are accessed, the list expression in the lcons argument is accessed. By that time, the variable is initialized and holds the lazy cons itself, which creates the circular reference, and a circular list.

 

9.14.10 Functions generate, giterate and ginterate

Syntax:


  (generate
while-fun gen-fun)
  (giterate
while-fun gen-fun [value])
  (ginterate
while-fun gen-fun [value])

Description:

The generate function produces a lazy list which dynamically produces items according to the following logic.

The arguments to generate are functions which do not take any arguments. The return value of generate is a lazy list.

When the lazy list is accessed, for instance with the functions car and cdr, it produces items on demand. Prior to producing each item, while-fun is called. If it returns a true Boolean value (any value other than nil), then the gen-fun function is called, and its return value is incorporated as the next item of the lazy list. But if while-fun yields nil, then the lazy list immediately terminates.

Prior to returning the lazy list, generate invokes while-fun one time. If while-fun yields nil, then generate returns the empty list nil instead of a lazy list. Otherwise, it instantiates a lazy list, and invokes gen-fun to populate it with the first item.

The giterate function is similar to generate, except that while-fun and gen-fun are functions of one argument rather than functions of no arguments. The optional value argument defaults to nil and is threaded through the function calls. That is to say, the lazy list returned is (value [gen-fun value] [gen-fun [gen-fun value]] ...).

The lazy list terminates when a value fails to satisfy while-fun. That is to say, prior to generating each value, the lazy list tests the value using while-fun. If that function returns nil, then the item is not added, and the sequence terminates.

Note: giterate could be written in terms of generate like this:


  (defun giterate (w g v)
     (generate (lambda () [w v])
               (lambda () (prog1 v (set v [g v])))))

The ginterate function is a variant of giterate which includes the test-failing item in the generated sequence. That is to say ginterate generates the next value and adds it to the lazy list. The value is then tested using while-fun. If that function returns nil, then the list is terminated, and no more items are produced.

Example:


  (giterate (op > 5) (op + 1) 0) -> (0 1 2 3 4)
  (ginterate (op > 5) (op + 1) 0) -> (0 1 2 3 4 5)

 

9.14.11 Function expand-right

Syntax:


  (expand-right
gen-fun value)

Description:

The expand-right function is a complement to reduce-right, with lazy semantics.

The gen-fun parameter is a function, which must accept a single argument, and return either a cons pair or nil.

The value parameter is any value.

The first call to gen-fun receives value.

The return value is interpreted as follows. If gen-fun returns a cons cell pair (elem . next) then elem specifies the element to be added to the lazy list, and next specifies the value to be passed to the next call to gen-fun. If gen-fun returns nil then the lazy list ends.

Examples:


  ;; Count down from 5 to 1 using explicit lambda
  ;; for gen-fun:


  (expand-right
    (lambda (item)
      (if (zerop item) nil
        (cons item (pred item))))
    5)
  --> (5 4 3 2 1)


  ;; Using functional combinators:
  [expand-right [iff zerop nilf [callf cons identity pred]] 5]
  --> (5 4 3 2 1)


  ;; Include zero:
  [expand-right
    [iff null
       nilf
       [callf cons identity [iff zerop nilf pred]]] 5]
  --> (5 4 3 2 1 0)

 

9.14.12 Functions expand-left and nexpand-left

Syntax:


  (expand-left
gen-fun value)
  (nexpand-left
gen-fun value)

Description:

The expand-left function is a companion to expand-right.

Unlike expand-right, it has eager semantics: it calls gen-fun repeatedly and accumulates an output list, not returning until gen-fun returns nil.

The semantics is as follows. expand-left initializes an empty accumulation list. Then gen-fun is called, with value as its argument.

If gen-fun returns a cons cell, then the car of that cons cell is pushed onto the accumulation list, and the procedure is repeated: gen-fun is called again, with the cell's cdr taking the place of value.

If gen-fun returns nil, then the accumulation list is returned.

If the expression (expand-right f v) produces a terminating list, then the following equivalence holds:


  (expand-left f v) <--> (reverse (expand-right f v))

Of course, the equivalence cannot hold for arguments to expand-left which produce an infinite list.

The nexpand-left function is a destructive version of expand-left.

The list returned by nexpand-left is composed of the cons cells returned by gen-fun whereas the list returned by expand-left is composed of freshly allocated cons cells.
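
Example:

Counting down as in the expand-right example above; expand-left accumulates the same items in reverse:


  (expand-left
    (lambda (item)
      (if (zerop item) nil
        (cons item (pred item))))
    5)
  --> (1 2 3 4 5)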

 

9.14.13 Function repeat

Syntax:


  (repeat
list [count])

Description:

If list is empty, then repeat returns an empty list.

If count is omitted, the repeat function produces an infinite lazy list formed by catenating together copies of list.

If count is specified and is zero or negative, then an empty list is returned.

Otherwise a list is returned consisting of count repetitions of list catenated together.
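
Examples:

For illustration:


  (repeat '(1 2) 3) -> (1 2 1 2 1 2)
  (repeat '(1 2) 0) -> nil
  (repeat '(1 2)) -> (1 2 1 2 1 2 ...)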

 

9.14.14 Function pad

Syntax:


  (pad
sequence object [count])

Description:

The pad function produces a lazy list which consists of all of the elements of sequence followed by repetitions of object.

If count is omitted, then the repetition of object is infinite. Otherwise the specified number of repetitions occur.

Note that sequence may be a lazy list which is infinite. In that case, the repetitions of object will never occur.
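
Examples:

For illustration:


  (pad '(1 2 3) 0 2) -> (1 2 3 0 0)
  (pad '(1 2 3) 0) -> (1 2 3 0 0 0 ...)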

 

9.14.15 Function weave

Syntax:


  (weave {
sequence}*)

Description:

The weave function interleaves elements from the sequences given as arguments.

If called with no arguments, it returns the empty list.

If called with a single sequence, it returns the elements of that sequence as a new lazy list.

When called with two or more sequences, weave returns a lazy list which draws elements from the sequences in a round-robin fashion, repeatedly scanning the sequences from left to right, and taking an item from each one, removing it from the sequence. Whenever a sequence runs out of items, it is deleted; the weaving then continues with the remaining sequences. The weaved sequence terminates when all sequences are eliminated. (If at least one of the sequences is an infinite lazy list, then the weaved sequence is infinite.)

Examples:


  ;; Weave negative integers with positive ones:
  (weave (range 1) (range -1 : -1)) -> (1 -1 2 -2 3 -3 ...)


  (weave "abcd" (range 1 3) '(x x x x x x x))
  --> (#\a 1 x #\b 2 x #\c 3 x #\d x x x x)

 

9.14.16 Macros gen and gun

Syntax:


  (gen
while-expression produce-item-expression)
  (gun
produce-item-expression)

Description:

The gen macro operator produces a lazy list, in a manner similar to the generate function. Whereas the generate function takes functional arguments, the gen operator takes two expressions, which is often more convenient.

The return value of gen is a lazy list. When the lazy list is accessed, for instance with the functions car and cdr, it produces items on demand. Prior to producing each item, the while-expression is evaluated, in its original lexical scope. If the expression yields a non-nil value, then produce-item-expression is evaluated, and its return value is incorporated as the next item of the lazy list. If the expression yields nil, then the lazy list immediately terminates.

The gen operator itself immediately evaluates while-expression before producing the lazy list. If the expression yields nil, then the operator returns the empty list nil. Otherwise, it instantiates the lazy list and invokes the produce-item-expression to force the first item.

The gun macro similarly creates a lazy list according to the following rules. Each successive item of the lazy list is obtained as a result of evaluating produce-item-expression. However, when produce-item-expression yields nil, then the list terminates (without adding that nil as an item).

Note 1: the gun form can be implemented as a macro expanding to an instance of the gen operator, like this:


  (defmacro gun (expr)
    (let ((var (gensym)))
      ^(let (,var)
         (gen (set ,var ,expr)
              ,var))))

This exploits the fact that the set operator returns the value that is assigned, so the set expression is tested as a condition by gen, while having the side effect of storing the next item temporarily in a hidden variable.

In turn, gen can be implemented as a macro expanding to some lambda functions which are passed to the generate function:


  (defmacro gen (while-expr produce-expr)
    ^(generate (lambda () ,while-expr) (lambda () ,produce-expr)))

Note 2: gen can be considered as an acronym for Generate, testing Expression before Next item, whereas gun stands for Generate Until Null.

Example:


  ;; Make a lazy list of integers up to 1000
  ;; access and print the first three.
  (let* ((counter 0)
         (list (gen (< counter 1000) (inc counter))))
    (format t "~s ~s ~s\n" (pop list) (pop list) (pop list)))


  Output:
  1 2 3

 

9.14.17 Functions range and range*

Syntax:


  (range [
from [to [step]]])
  (range* [
from [to [step]]])

Description:

The range and range* functions generate a lazy sequence of integers, with a fixed step between successive values.

The difference between range and range* is that range* excludes the endpoint. For instance (range 0 3) generates the list (0 1 2 3), whereas (range* 0 3) generates (0 1 2).

All arguments are optional. If the step argument is omitted, then it defaults to 1: each value in the sequence is greater than the previous one by 1. Positive or negative step sizes are allowed. There is no check for a step size of zero, or for a step direction which cannot meet the endpoint.

The to argument specifies the endpoint value, which, if it occurs in the sequence, is excluded from it by the range* function, but included by the range function. If to is missing, or specified as nil, then there is no endpoint, and the sequence which is generated is infinite, regardless of step.

If from is omitted, then the sequence begins at zero, otherwise from must be an integer which specifies the initial value.

The sequence stops if it reaches the endpoint value (which is included in the case of range, and excluded in the case of range*). However, a sequence with a stepsize greater than 1 or less than -1 might step over the endpoint value, and therefore never attain it. In this situation, the sequence also stops, and the excess value which surpasses the endpoint is excluded from the sequence.
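
Examples:

Illustrative results, including a step which skips over the endpoint:


  (range 1 5) -> (1 2 3 4 5)
  (range* 1 5) -> (1 2 3 4)
  (range 0 10 3) -> (0 3 6 9)
  (range 1) -> (1 2 3 4 5 ...)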

 

9.15 Ranges

 

9.15.1 Function rcons

Syntax:


  (rcons
from to)

Description:

The rcons function constructs a range object which holds the values from and to.

Though range objects are effectively binary cells like conses, they are atoms. They also aren't considered sequences, nor are they structures.

Range objects are used for indicating numeric ranges, such as substrings of lists, arrays and strings. The dotdot notation serves as a syntactic sugar for rcons. The syntax a..b denotes the expression (rcons a b).

Note that ranges are immutable, meaning that it is not possible to replace the values in a range.

 

9.15.2 Function rangep

Syntax:


  (rangep
value)

Description:

The rangep function returns t if value is a range. Otherwise it returns nil.

 

9.15.3 Functions from and to

Syntax:


  (from
range)
  (to
range)

Description:

The from and to functions retrieve, respectively, the from and to fields of a range.

Note that these functions are not accessors, which is because ranges are immutable.
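
Examples:

Illustrative results, using both rcons and the dotdot notation:


  (from (rcons 1 10)) -> 1
  (to (rcons 1 10)) -> 10
  (from 1..10) -> 1
  (to 1..10) -> 10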

 

9.16 Characters and Strings

 

9.16.1 Function mkstring

Syntax:


  (mkstring
length char)

Description:

The mkstring function constructs a string object of a length specified by the length parameter. Every position in the string is initialized with char, which must be a character value.
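
Example:

An illustrative result:


  (mkstring 5 #\x) -> "xxxxx"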

 

9.16.2 Function copy-str

Syntax:


  (copy-str
string)

Description:

The copy-str function constructs a new string whose contents are identical to string.

If string is a lazy string, then a lazy string is constructed with the same attributes as string. The new lazy string has its own copy of the prefix portion of string which has been forced so far. The unforced list and separator string are shared between string and the newly constructed lazy string.

 

9.16.3 Function upcase-str

Syntax:


  (upcase-str
string)

Description:

The upcase-str function produces a copy of string such that all lower-case characters of the English alphabet are mapped to their upper case counterparts.

 

9.16.4 Function downcase-str

Syntax:


  (downcase-str
string)

Description:

The downcase-str function produces a copy of string such that all upper case characters of the English alphabet are mapped to their lower case counterparts.

 

9.16.5 Function string-extend

Syntax:


  (string-extend
string tail)

Description:

The string-extend function destructively increases the length of string, which must be an ordinary dynamic string. It is an error to invoke this function on a literal string or a lazy string.

The tail argument can be a character, string or integer. If it is a string or character, it specifies material which is to be added to the end of the string: either a single character or a sequence of characters. If it is an integer, it specifies the number of characters to be added to the string.

If tail is an integer, the newly added characters have indeterminate contents. The string appears to be the original one, because an internal terminating null character remains in place, but the characters beyond that terminating null are indeterminate.
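
Example:

An illustrative use on a modifiable string obtained via copy-str; the string is returned explicitly, since the extension happens in place:


  (let ((s (copy-str "abc")))
    (string-extend s "def")
    s) -> "abcdef"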

 

9.16.6 Function stringp

Syntax:


  (stringp
obj)

Description:

The stringp function returns t if obj is one of the several kinds of strings. Otherwise it returns nil.

 

9.16.7 Function length-str

Syntax:


  (length-str
string)

Description:

The length-str function returns the length of string in characters. The argument must be a string.

 

9.16.8 Function search-str

Syntax:


  (search-str
haystack needle [start [from-end]])

Description:

The search-str function finds an occurrence of the string needle inside the haystack string and returns its position. If no such occurrence exists, it returns nil.

If a start argument is not specified, it defaults to zero. If it is a non-negative integer, it specifies the starting character position for the search. Negative values of start indicate positions from the end of the string, such that -1 is the last character of the string.

If the from-end argument is specified and is not nil, it means that the search is conducted right-to-left. If multiple matches are possible, it will find the rightmost one rather than the leftmost one.
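
For instance, the following results would be expected:


  (search-str "banana" "an")      ->  1
  (search-str "banana" "an" 2)    ->  3
  (search-str "banana" "an" 0 t)  ->  3
  (search-str "banana" "xyz")     ->  nil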

 

9.16.9 Function search-str-tree

Syntax:


  (search-str-tree
haystack tree [start [from-end]])

Description:

The search-str-tree function is similar to search-str, except that instead of searching haystack for the occurrence of a single needle string, it searches for the occurrence of numerous strings at the same time. These search strings are specified, via the tree argument, as an arbitrarily structured tree whose leaves are strings.

The function finds the earliest possible match, in the given search direction, from among all of the needle strings.

If tree is a single string, the semantics is equivalent to search-str.

 

9.16.10 Function match-str

Syntax:


  (match-str
bigstring littlestring [start])

Description:

Without the start argument, the match-str function determines whether littlestring is a prefix of bigstring, returning a t or nil indication.

If the start argument is specified, and is a non-negative integer, then the function tests whether littlestring matches a prefix of that portion of bigstring which starts at the given position.

If the start argument is a negative integer, then match-str determines whether littlestring is a suffix of bigstring, ending on that position of bigstring, where -1 denotes the last character of bigstring, -2 the second last one and so on.

If start is -1, then this corresponds to testing whether littlestring is a suffix of bigstring.
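
The description above implies results along these lines:


  (match-str "firewood" "fire")     ->  t
  (match-str "firewood" "wood")     ->  nil
  (match-str "firewood" "wood" 4)   ->  t
  (match-str "firewood" "wood" -1)  ->  t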

 

9.16.11 Function match-str-tree

Syntax:


  (match-str-tree
bigstring tree [start])

Description:

The match-str-tree function is a generalization of match-str which matches multiple test strings against bigstring at the same time. The value reported is the longest match from among any of the strings.

The strings are specified as an arbitrarily shaped tree structure which has strings at the leaves.

If tree is a single string atom, then the function behaves exactly like match-str.

 

9.16.12 Function sub-str

Syntax:


  (sub-str
string [from [to]])

Description:

The sub-str function is like the more generic function sub, except that it operates only on strings. For a description of the arguments and semantics, refer to the sub function.

 

9.16.13 Function replace-str

Syntax:


  (replace-str
string item-sequence [from [to]])

Description:

The replace-str function is like the replace function, except that the first argument must be a string.

For a description of the arguments, semantics and return value, refer to the replace function.

 

9.16.14 Function cat-str

Syntax:


  (cat-str
string-list [sep-string])

Description:

The cat-str function catenates a list of strings given by string-list into a single string. The optional sep-string argument specifies a separator string which is interposed between the catenated strings.
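
For example:


  (cat-str '("a" "b" "c"))      ->  "abc"
  (cat-str '("a" "b" "c") "-")  ->  "a-b-c"
  (cat-str '("one") "-")        ->  "one"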

 

9.16.15 Function split-str

Syntax:


  (split-str
string sep [keep-between])

Description:

The split-str function breaks the string into pieces, returning a list thereof. The sep argument must be either a string or a regular expression. It specifies the separator character sequence within string.

All non-overlapping matches for sep within string are identified in left to right order, and are removed from string. The string is broken into pieces according to the gaps left behind by the removed separators, and a list of the remaining pieces is returned.

If sep is the empty string, then the separator pieces removed from the string are considered to be the empty strings between its characters. In this case, if string is of length one or zero, then it is considered to have no such pieces, and a list of one element is returned containing the original string. These remarks also apply to the situation when sep is a regular expression which matches only an empty substring of string.

If a match for sep is not found in the string at all (not even an empty match), then the string is not split at all: a list of one element is returned containing the original string.

If sep matches the entire string, then a list of two empty strings is returned, except in the case that the original string is empty, in which case a list of one element is returned, containing the empty string.

Whenever two adjacent matches for sep occur, they are considered separate cuts with an empty piece between them.

This operation is nondestructive: string is not modified in any way.

If the optional keep-between argument is specified and is not nil, then split-str incorporates the matching separator pieces of string into the resulting list, such that if the resulting list is catenated, a string equivalent to the original string is produced.

Note: To split a string into pieces of length one such that an empty string produces nil rather than (""), use the (tok-str string #/./) pattern.
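
The cases described above suggest results such as the following:


  (split-str "a,b,,c" ",")   ->  ("a" "b" "" "c")
  (split-str "a,b" "," t)    ->  ("a" "," "b")
  (split-str "abc" "")       ->  ("a" "b" "c")
  (split-str "abc" "x")      ->  ("abc")
  (split-str "" ",")         ->  ("")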

 

9.16.16 Function split-str-set

Syntax:


  (split-str-set
string set)

Description:

The split-str-set function breaks the string into pieces, returning a list thereof. The set argument must be a string. It specifies a set of characters. All occurrences of any of these characters within string are identified, and are removed from string. The string is broken into pieces according to the gaps left behind by the removed separators.

Adjacent occurrences of characters from set within string are considered to be separate gaps which come between empty strings.

This operation is nondestructive: string is not modified in any way.
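
For instance:


  (split-str-set "a1b2c" "12")  ->  ("a" "b" "c")
  (split-str-set "a12b" "12")   ->  ("a" "" "b")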

 

9.16.17 Functions tok-str and tok-where

Syntax:


  (tok-str
string regex [keep-between])
  (tok-where
string regex)

Description:

The tok-str function searches string for tokens, which are defined as substrings of string which match the regular expression regex in the longest possible way, and do not overlap. These tokens are extracted from the string and returned as a list.

Whenever regex matches an empty string, then an empty token is returned, and the search for another token within string resumes after advancing by one character position. So for instance, (tok-str "abc" #/a?/) returns the list ("a" "" "" ""). After the token "a" is extracted from a non-empty match for the regex, the regex is considered to match three more times: before the "b", between "b" and "c", and after the "c".

If the keep-between argument is specified, and is not nil, then the behavior of tok-str changes in the following way. The pieces of string which are skipped by the search for tokens are included in the output. If no token is found in string, then a list of one element is returned, containing string. Generally, if N tokens are found, then the returned list consists of 2N + 1 elements. The first element of the list is the (possibly empty) substring which had to be skipped to find the first token. Then the token follows. The next element is the next skipped substring and so on. The last element is the substring of string between the last token and the end.

The tok-where function works similarly to tok-str, but instead of returning the extracted tokens themselves, it returns a list of the character position ranges within string where matches for regex occur. The ranges are pairs of numbers, represented as cons cells, where the first number of the pair gives the starting character position, and the second number is one position past the end of the match. If a match is empty, then the two numbers are equal.

The tok-where function does not support the keep-between parameter.
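
Going by the description above, results such as these would be expected:


  (tok-str "12 345 6" #/[0-9]+/)    ->  ("12" "345" "6")
  (tok-str "12 345 6" #/[0-9]+/ t)  ->  ("" "12" " " "345" " " "6" "")
  (tok-where "12 345 6" #/[0-9]+/)  ->  ((0 . 2) (3 . 6) (7 . 8))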

 

9.16.18 Function list-str

Syntax:


  (list-str
string)

Description:

The list-str function converts a string into a list of characters.

 

9.16.19 Function trim-str

Syntax:


  (trim-str
string)

Description:

The trim-str function produces a copy of string from which leading and trailing tabs, spaces and newlines are removed.

 

9.16.20 Function chrp

Syntax:


  (chrp
obj)

Description:

Returns t if obj is a character, otherwise nil.

 

9.16.21 Function chr-isalnum

Syntax:


  (chr-isalnum
char)

Description:

Returns t if char is an alpha-numeric character, otherwise nil. Alpha-numeric means one of the upper or lower case letters of the English alphabet found in ASCII, or an ASCII digit. This function is not affected by locale.

 

9.16.22 Function chr-isalpha

Syntax:


  (chr-isalpha
char)

Description:

Returns t if char is an alphabetic character, otherwise nil. Alphabetic means one of the upper or lower case letters of the English alphabet found in ASCII. This function is not affected by locale.

 

9.16.23 Function chr-isascii

Syntax:


  (chr-isascii
char)

Description:

This function returns t if the code of character char is in the range 0 to 127 inclusive. For characters outside of this range, it returns nil.

 

9.16.24 Function chr-iscntrl

Syntax:


  (chr-iscntrl
char)

Description:

This function returns t if the character char is a character whose code ranges from 0 to 31, or is 127. In other words, any non-printable ASCII character. For other characters, it returns nil.

 

9.16.25 Functions chr-isdigit and chr-digit

Syntax:


  (chr-isdigit
char)
  (chr-digit
char)

Description:

If char is an ASCII decimal digit character, chr-isdigit returns the value t and chr-digit returns the integer value corresponding to that digit character, a value in the range 0 to 9. Otherwise, both functions return nil.
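
For example:


  (chr-isdigit #\7)  ->  t
  (chr-digit #\7)    ->  7
  (chr-digit #\a)    ->  nil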

 

9.16.26 Function chr-isgraph

Syntax:


  (chr-isgraph
char)

Description:

This function returns t if char is a non-space printable ASCII character. It returns nil if it is a space or control character.

It also returns nil for non-ASCII characters: Unicode characters with a code above 127.

 

9.16.27 Function chr-islower

Syntax:


  (chr-islower
char)

Description:

This function returns t if char is an ASCII lower case letter. Otherwise it returns nil.

 

9.16.28 Function chr-isprint

Syntax:


  (chr-isprint
char)

Description:

This function returns t if char is an ASCII character which is not a control character. It also returns nil for all non-ASCII characters: Unicode characters with a code above 127.

 

9.16.29 Function chr-ispunct

Syntax:


  (chr-ispunct
char)

Description:

This function returns t if char is an ASCII punctuation character: a printable character which is neither a space nor an alphanumeric character. It returns nil for all other characters, including non-ASCII characters: Unicode characters with a code above 127.

 

9.16.30 Function chr-isspace

Syntax:


  (chr-isspace
char)

Description:

This function returns t if char is an ASCII whitespace character: any of the characters in the set #\space, #\tab, #\linefeed, #\newline, #\return, #\vtab and #\page. For all other characters, it returns nil.

 

9.16.31 Function chr-isblank

Syntax:


  (chr-isblank
char)

Description:

This function returns t if char is a space or tab: the character #\space or #\tab. For all other characters, it returns nil.

 

9.16.32 Function chr-isunisp

Syntax:


  (chr-isunisp
char)

Description:

This function returns t if char is a Unicode whitespace character. This is the case for all the characters for which chr-isspace returns t. It also returns t for these additional characters: #\xa0, #\x1680, #\x180e, #\x2000, #\x2001, #\x2002, #\x2003, #\x2004, #\x2005, #\x2006, #\x2007, #\x2008, #\x2009, #\x200a, #\x2028, #\x2029, #\x205f, and #\x3000. For all other characters, it returns nil.

 

9.16.33 Function chr-isupper

Syntax:


  (chr-isupper
char )

Description:

This function returns t if char is an ASCII upper case letter. Otherwise it returns nil.

 

9.16.34 Function chr-isxdigit and chr-xdigit

Syntax:


  (chr-isxdigit
char)
  (chr-xdigit
char)

Description:

If char is a hexadecimal digit character, chr-isxdigit returns the value t and chr-xdigit returns the integer value corresponding to that digit character, a value in the range 0 to 15. Otherwise, both functions return nil.

A hexadecimal digit is one of the ASCII digit characters 0 through 9, or else one of the letters A through F or their lower-case equivalents a through f denoting the values 10 to 15.

 

9.16.35 Function chr-toupper

Syntax:


  (chr-toupper
char)

Description:

If character char is a lower case ASCII letter character, this function returns the upper case equivalent character. If it is some other character, then it just returns char.

 

9.16.36 Function chr-tolower

Syntax:


  (chr-tolower
char)

Description:

If character char is an upper case ASCII letter character, this function returns the lower case equivalent character. If it is some other character, then it just returns char.

 

9.16.37 Functions int-chr and chr-int

Syntax:


  (int-chr
char)
  (chr-int
num)

Description:

The argument char must be a character. The int-chr function returns that character's Unicode code point value as an integer.

The argument num must be a fixnum integer in the range 0 to #\x10FFFF. The chr-int function takes num to be a Unicode code point value and returns the corresponding character object.

Note: these functions are also known by the obsolescent names num-chr and chr-num.
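
For instance, for ASCII characters one would expect:


  (int-chr #\A)  ->  65
  (chr-int 97)   ->  #\a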

 

9.16.38 Accessor chr-str

Syntax:


  (chr-str
str idx)
  (set (chr-str
str idx) new-value)

Description:

The chr-str function performs random access on string str to retrieve the character whose position is given by integer idx, which must be within range of the string.

The index value 0 corresponds to the first (leftmost) character of the string and so non-negative values up to one less than the length are possible.

Negative index values are also allowed, such that -1 corresponds to the last (rightmost) character of the string, and so negative values down to the additive inverse of the string length are possible.

An empty string cannot be indexed. A string of length one supports index 0 and index -1. A string of length two is indexed left to right by the values 0 and 1, and from right to left by -1 and -2.

If the element idx of string str exists, and the string is modifiable, then the chr-str form denotes a place.

A chr-str place supports deletion. When a deletion takes place, then the character at idx is removed from the string. Any characters after that position move by one position to close the gap, and the length of the string decreases by one.

Notes:

Direct use of chr-str is equivalent to the DWIM bracket notation except that str must be a string. The following relation holds:


  (chr-str s i) --> [s i]

since [s i] <--> (ref s i), this also holds:


  (chr-str s i) --> (ref s i)

However, note the following difference. When the expression [s i] is used as a place, then the subexpression s must be a place. When (chr-str s i) is used as a place, s need not be a place.
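
A sketch of expected behavior, using copy-str to obtain a modifiable string for the place-based forms:


  (chr-str "abcd" 1)   ->  #\b
  (chr-str "abcd" -1)  ->  #\d

  (let ((s (copy-str "abcd")))
    (set (chr-str s 1) #\z)   ;; overwrite second character
    (del (chr-str s 3))       ;; delete last character
    s)
  ->  "azc"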

 

9.16.39 Function chr-str-set

Syntax:


  (chr-str-set
str idx char)

Description:

The chr-str-set function performs random access on string str to overwrite the character whose position is given by integer idx, which must be within range of the string. The character at idx is overwritten with character char.

The idx argument works exactly as in chr-str.

The str argument must be a modifiable string.

Notes:

Direct use of chr-str-set is equivalent to assignment via the DWIM bracket notation, except that str must be a string. The following relation holds:


  (chr-str-set s i c) --> (set [s i] c)

since (set [s i] c) <--> (refset s i c), this also holds:


  (chr-str-set s i c) --> (refset s i c)

 

9.16.40 Function span-str

Syntax:


  (span-str
str set)

Description:

The span-str function determines the longest prefix of string str which consists only of the characters in string set, in any combination.

 

9.16.41 Function compl-span-str

Syntax:


  (compl-span-str
str set)

Description:

The compl-span-str function determines the longest prefix of string str which consists only of the characters which do not appear in set, in any combination.

 

9.16.42 Function break-str

Syntax:


  (break-str
str set)

Description:

The break-str function returns an integer which represents the position of the first character in string str which appears in string set.

If there is no such character, then nil is returned.

 

9.17 Lazy Strings

Lazy strings are objects that were developed for the TXR pattern matching language, and are exposed via TXR Lisp. Lazy strings behave much like strings, and can be substituted for strings. However, unlike regular strings, which exist in their entirety, first to last character, from the moment they are created, lazy strings do not exist all at once, but are created on demand. If the character at index N of a lazy string is accessed, then characters 0 through N of that string are forced into existence. However, characters at indices beyond N need not necessarily exist.

A lazy string dynamically grows by acquiring new text from a list of strings which is attached to that lazy string object. When the lazy string is accessed beyond the end of its hitherto materialized prefix, it takes enough strings from the list in order to materialize the index. If the list doesn't have enough material, then the access fails, just like an access beyond the end of a regular string. A lazy string always takes whole strings from the attached list.

Lazy string growth is achieved via the lazy-str-force-upto function which forces a string to exist up to a given character position. This function is used internally to handle various situations.

The lazy-str-force function forces the entire string to materialize. If the string is connected to an infinite lazy list, this will exhaust all memory.

Lazy strings are specially recognized in many of the regular string functions, which do the right thing with lazy strings. For instance when sub-str is invoked on a lazy string, a special version of the sub-str logic is used which handles various lazy string cases, and can potentially return another lazy string. Taking a sub-str of a lazy string from a given character position to the end does not force the entire lazy string to exist, and in fact the operation will work on a lazy string that is infinite.

Furthermore, special lazy string functions are provided which allow programs to be written carefully to take better advantage of lazy strings. What carefully means is code that avoids unnecessarily forcing the lazy string. For instance, in many situations it is necessary to obtain the length of a string, only to test it for equality or inequality with some number. But it is not necessary to compute the length of a string in order to know that it is greater than some value.

 

9.17.1 Function lazy-str

Syntax:


  (lazy-str
string-list [terminator [limit-count]])

Description:

The lazy-str function constructs a lazy string which draws material from string-list which is a list of strings.

If the optional terminator argument is given, then it specifies a string which is appended to every string from string-list, before that string is incorporated into the lazy string. If terminator is not given, then it defaults to the string "\n", and so the strings from string-list are effectively treated as lines which get terminated by newlines as they accumulate into the growing prefix of the lazy string. To avoid the use of a terminator string, a null string terminator argument must be explicitly passed. In that case, the lazy string grows simply by catenating elements from string-list.

If the limit-count argument is specified, it must be a positive integer. It expresses a maximum limit on how many elements will be consumed from string-list in order to feed the lazy string. Once that many elements are drawn, the string ends, even if the list has not been exhausted.
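
The following sketch uses lazy-str-force, described below, to reveal the fully materialized content; the results shown follow from the description above:


  ;; default "\n" terminator treats the strings as lines
  (lazy-str-force (lazy-str '("abc" "def")))       ->  "abc\ndef\n"

  ;; null terminator: plain catenation
  (lazy-str-force (lazy-str '("abc" "def") ""))    ->  "abcdef"

  ;; limit-count of 1: only one element is consumed
  (lazy-str-force (lazy-str '("abc" "def") "" 1))  ->  "abc"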

 

9.17.2 Function lazy-stringp

Syntax:


  (lazy-stringp
obj)

Description:

The lazy-stringp function returns t if obj is a lazy string. Otherwise it returns nil.

 

9.17.3 Function lazy-str-force-upto

Syntax:


  (lazy-str-force-upto
lazy-str index)

Description:

The lazy-str-force-upto function tries to instantiate the lazy string such that the position given by index materializes. The index is a character position, exactly as used in the chr-str function.

Some positions beyond index may also materialize, as a side effect.

If the string is already materialized through to at least index, or if it is possible to materialize the string that far, then the value t is returned to indicate success.

If there is insufficient material to force the lazy string through to the index position, then nil is returned.

It is an error if the lazy-str argument isn't a lazy string.

 

9.17.4 Function lazy-str-force

Syntax:


  (lazy-str-force
lazy-str)

Description:

The lazy-str argument must be a lazy string. The lazy string is forced to fully materialize.

The return value is an ordinary, non-lazy string equivalent to the fully materialized lazy string.

 

9.17.5 Function lazy-str-get-trailing-list

Syntax:


  (lazy-str-get-trailing-list
string index)

Description:

The lazy-str-get-trailing-list function can be considered, in some way, an inverse operation to the production of the lazy string from its associated list.

First, string is forced up through the position index. That is the only extent to which string is modified by this function.

Next, the suffix of the materialized part of the lazy string starting at position index, is split into pieces on occurrences of the terminator character (which had been given as the terminator argument in the lazy-str constructor, and defaults to newline). If the index position is beyond the part of the string which can be materialized (in adherence with the lazy string's limit-count constructor parameter), then the list of pieces is considered to be empty.

Finally, a list is returned consisting of the pieces produced by the split, to which is appended the remaining list of the string which has not yet been forced to materialize.

 

9.17.6 Functions length-str->, length-str->=, length-str-< and length-str-<=

Syntax:


  (length-str->
string len)
  (length-str->=
string len)
  (length-str-<
string len)
  (length-str-<=
string len)

Description:

These functions compare the length of string with the integer len. The following equivalences hold, as far as the resulting value is concerned:


  (length-str-> s l) <--> (> (length-str s) l)
  (length-str->= s l) <--> (>= (length-str s) l)
  (length-str-< s l) <--> (< (length-str s) l)
  (length-str-<= s l) <--> (<= (length-str s) l)

The difference between the functions and the equivalent forms is that if the string is lazy, the length-str function will fully force it in order to calculate and return its length.

These functions only force a string up to position len, so they are not only more efficient, but on infinitely long lazy strings they are usable.

length-str cannot compute the length of a lazy string with an unbounded length; it will exhaust all memory trying to force the string.

These functions can be used to test such a string to determine whether it is longer or shorter than a given length, without forcing the string beyond that length.
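
For instance, the following sketch tests an unbounded lazy string, assuming the repeat function to supply an endlessly repeating list:


  ;; s is an infinitely long lazy string
  (let ((s (lazy-str (repeat '("abc")))))
    (length-str-> s 100))
  ->  t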

 

9.17.7 Function cmp-str

Syntax:


  (cmp-str
left-string right-string)

Description:

The cmp-str function returns a negative integer if left-string is lexicographically prior to right-string, and a positive integer if the reverse situation is the case. Otherwise the strings are equal and zero is returned.

If either or both of the strings are lazy, then they are only forced to the minimum extent necessary for the function to reach a conclusion and return the appropriate value, since there is no need to look beyond the first character position in which they differ.

The lexicographic ordering is naive, based on the character code point values in Unicode taken as integers, without regard for locale-specific collation orders.

 

9.17.8 Functions str=, str<, str>, str>= and str<=

Syntax:


  (str=
left-string right-string)
  (str<
left-string right-string)
  (str>
left-string right-string)
  (str<=
left-string right-string)
  (str>=
left-string right-string)

Description:

These functions compare left-string and right-string lexicographically, as if by the cmp-str function.

The str= function returns t if the two strings are exactly the same, character for character, otherwise it returns nil.

The str< function returns t if left-string is lexicographically before right-string, otherwise nil.

The str> function returns t if left-string is lexicographically after right-string, otherwise nil.

The str<= function returns t if left-string is lexicographically before right-string, or if the two strings are exactly the same, otherwise nil.

The str>= function returns t if left-string is lexicographically after right-string, or if the two strings are exactly the same, otherwise nil.

 

9.17.9 Function string-lt

Syntax:


  (string-lt
left-str right-str)

Description:

The string-lt function is a deprecated alias for str<.

 

9.18 Vectors

 

9.18.1 Function vector

Syntax:


  (vector
length [initval])

Description:

The vector function creates and returns a vector object of the specified length. The elements of the vector are initialized to initval, or to nil if initval is omitted.

 

9.18.2 Function vec

Syntax:


  (vec
arg*)

Description:

The vec function creates a vector out of its arguments.
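
For example, the vector and vec functions might be used as follows:


  (vector 3)    ->  #(nil nil nil)
  (vector 3 0)  ->  #(0 0 0)
  (vec 1 2 3)   ->  #(1 2 3)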

 

9.18.3 Function vectorp

Syntax:


  (vectorp
obj)

Description:

The vectorp function returns t if obj is a vector, otherwise it returns nil.

 

9.18.4 Function vec-set-length

Syntax:


  (vec-set-length
vec len)

Description:

The vec-set-length function modifies the length of vec, making it longer or shorter. If the vector is made longer, then the newly added elements are initialized to nil. The len argument must be nonnegative.

The return value is vec.

 

9.18.5 Accessor vecref

Syntax:


  (vecref
vec idx)
  (set (vecref
vec idx) new-value)

Description:

The vecref function performs indexing into a vector. It retrieves an element of vec at position idx, counted from zero. The idx value must range from 0 to one less than the length of the vector. The specified element is returned.

If the element idx of vector vec exists, then the vecref form denotes a place.

A vecref place supports deletion. When a deletion takes place, then if idx denotes the last element in the vector, the vector's length is decreased by one, so that the vector no longer has that element. Otherwise, if idx isn't the last element, then each element at an index higher than idx shifts by one position to the adjacent lower index. Then, the length of the vector is decreased by one, so that the last element position disappears.
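
A brief sketch of vecref used as a place, including deletion:


  (let ((v (vec 1 2 3 4)))
    (set (vecref v 0) 10)  ;; replace first element
    (del (vecref v 2))     ;; delete third element; 4 shifts down
    v)
  ->  #(10 2 4)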

 

9.18.6 Function vec-push

Syntax:


  (vec-push
vec elem)

Description:

The vec-push function extends the length of a vector vec by one element, and sets the new element to the value elem.

The previous length of the vector (which is also the position of elem) is returned.

This function performs similarly to the generic function ref, except that the first argument must be a vector.
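
For instance, one would expect:


  (let ((v (vec 1 2)))
    (list (vec-push v 3) v))
  ->  (2 #(1 2 3))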

 

9.18.7 Function length-vec

Syntax:


  (length-vec
vec)

Description:

The length-vec function returns the length of vector vec. It performs similarly to the generic length function, except that the argument must be a vector.

 

9.18.8 Function size-vec

Syntax:


  (size-vec
vec)

Description:

The size-vec function returns the number of elements for which storage is reserved in the vector vec.

Notes:

The length of the vector can be extended up to this size without any memory allocation operations having to be performed.

 

9.18.9 Function vec-list

Syntax:


  (vec-list
list)

Description:

This function returns a vector which contains all of the same elements, in the same order, as the list list.

Note: this function is also known by the obsolescent name vector-list.

 

9.18.10 Function list-vec

Syntax:


  (list-vec
vec)

Description:

The list-vec function returns a list of the elements of vector vec.

Note: this function is also known by the obsolescent name list-vector.

 

9.18.11 Function copy-vec

Syntax:


  (copy-vec
vec)

Description:

The copy-vec function returns a new vector object of the same length as vec and containing the same elements in the same order.

 

9.18.12 Function sub-vec

Syntax:


  (sub-vec
vec [from [to]])

Description:

The sub-vec function is like the more generic function sub, except that it operates only on vectors.

For a description of the arguments and semantics, refer to the sub function.

 

9.18.13 Function replace-vec

Syntax:


  (replace-vec
vec item-sequence [from [to]])

Description:

The replace-vec function is like the replace function, except that the first argument must be a vector.

For a description of the arguments, semantics and return value, refer to the replace function.

 

9.18.14 Function cat-vec

Syntax:


  (cat-vec
vec-list)

Description:

The vec-list argument is a list of vectors. The cat-vec function produces a catenation of the vectors listed in vec-list. It returns a single large vector formed by catenating those vectors together in order.

 

9.19 Structures

TXR supports a structure data type. Structures are objects which hold multiple storage locations called slots, which are named by symbols. Structures can be related to each other by inheritance.

The type of a structure is itself an object, of type struct-type.

When the program defines a new structure type, it does so by creating a new struct-type instance, with properties which describe the new structure type: its name, its list of slots, its initialization and "boa constructor" functions, and the structure type it inherits from (the "super").

The struct-type object is then used to generate instances.

Structure instances are not only containers which hold named slots; they also indicate their struct type. Two structures which have the same number of slots having the same names are not necessarily of the same type.

Structure types and structures may be created and manipulated using a programming interface based on functions.

For more convenient and clutter-free expression of structure-based program code, macros are also provided.

Furthermore, concise and expressive slot access syntax is provided courtesy of the referencing dot syntax, a syntactic sugar for the qref macro.

Structure types have a name, which is a symbol. The typeof function, when applied to any struct type, returns the symbol struct-type. When typeof is applied to a struct instance, it returns the name of the struct type. Effectively, struct names are types.

The consequences are unspecified if an existing struct name is reused for a different struct type, or if the name of an existing non-struct type is used as the name of a struct type.

 

9.19.1 Static Slots

Structure slots can be of two kinds: they can be the ordinary instance slots or they can be static slots. The instances of a given structure type have their own instance of a given instance slot. However, they all share a single instance of a static slot.

Static slots are allocated in a global area associated with a structure type and are initialized when the structure type is created. They are useful for efficiently representing properties which have the same value for all instances of a struct. These properties don't have to occupy space in each instance, and time doesn't have to be wasted initializing them each time a new instance is created. Static slots are also useful for struct-specific global variables. Lastly, static slots are also useful for holding methods and functions. Although structures can have methods and functions in their instances, usually, all structures of the same type share the same functions. The defstruct macro supports a special syntax for defining methods and struct-specific functions at the same time when a new structure type is defined. The defmeth macro can be used for adding new methods and functions to an existing structure and its descendants.

Static slots may be assigned just like instance slots. Changing a static slot, of course, changes that slot in every structure of the same type.

Static slots are not listed in the #S(...) notation when a structure is printed. When the structure notation is read from a stream, if static slots are present, they will be processed and their values stored in the static locations they represent, thus changing their values for all instances.

Static slots are inherited just like instance slots. However, when one structure type inherits a static slot from another, that structure type has its own storage location for that slot.

The slot type can be overridden. A structure type deriving from another type can introduce slots which have the same names as the supertype, but are of a different kind: an instance slot in the supertype can be replaced by a static slot in the derived type or vice versa.

A structure type is associated with a static initialization function which may be used to store initial values into static slots. This function is invoked once in a type's life time, when the type is created. The function is also inherited by derived struct types and invoked when they are created.

If a newly introduced (that is to say, non-inherited) static slot isn't initialized by the static initialization function, its value defaults to nil. If an inherited slot isn't initialized by its supertype's initialization function, then its initial value in the new type is a copy of the current value of the supertype's corresponding slot.
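
The following sketch, using a hypothetical ship structure, illustrates the sharing of a static slot among instances and its omission from the printed notation:


  (defstruct ship nil
    name
    (:static fleet 'red))

  ;; the static slot doesn't appear in the #S notation
  (new ship)  ->  #S(ship name nil)

  ;; assigning the static slot via one instance
  ;; changes it for all instances
  (let ((s1 (new ship))
        (s2 (new ship)))
    (set s1.fleet 'blue)
    s2.fleet)
  ->  blue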

 

9.19.2 Macro defstruct

Syntax:


  (defstruct {
name | (name arg*)} super
    
slot-specifier*)

Description:

The defstruct macro defines a new structure type and registers it under name, which must be a bindable symbol, according to the bindable function. Likewise, the name of every slot must also be a bindable symbol.

The super argument must either be nil or a symbol which names an existing struct type. The newly defined struct type will inherit all slots, as well as initialization behaviors from this type.

The defstruct macro is implemented using the make-struct-type function, which is more general. The macro analyzes the defstruct argument syntax, and synthesizes arguments which are then used to call the function. Some remarks in the description of defstruct only apply to structure types defined using that macro.

Slots are specified using zero or more slot specifiers. Slot specifiers come in the following variety:

name
The simplest slot specifier is just a name, which must be a bindable symbol, as defined by the bindable function. This form is a short form for the (:instance name nil) syntax.
(symbol init-form)
This syntax is a short form for the (:instance name init-form) syntax.
(:instance name init-form)
This syntax specifies an instance slot called name whose initial value is obtained by evaluating init-form whenever a new instance of the structure is created. This evaluation takes place in the original lexical environment in which the defstruct form occurs.
(:static name init-form)
This syntax specifies a static slot called name whose initial value is obtained by evaluating init-form once, during the evaluation of the defstruct form in which it occurs.
(:method name (param+) body-form*)
This syntax creates a static slot called name which is initialized with an anonymous function. The anonymous function is created during the evaluation of the defstruct form. The function takes the arguments specified by the param symbols, and its body consists of the body-form-s. There must be at least one param. When the function is invoked as a method, as intended, the leftmost param receives the structure instance. The body-form-s are evaluated in a context in which a block named name is visible. Consequently, return-from may be used to terminate the execution of a method and return a value. Methods are invoked using the instance.(name arg ...) syntax, which implicitly inserts the instance into the argument list.
(:function name (param*) body-form*)
This syntax creates a static slot called name which is initialized with an anonymous function. The anonymous function is created during the evaluation of the defstruct form. The function takes the arguments specified by the param symbols, and its body consists of the body-form-s. This specifier differs from :method only in one respect: there may be zero parameters. A structure function defined this way is intended to be used as a utility function which doesn't receive the structure instance as an argument. The body-form-s are evaluated in a context in which a block named name is visible. Consequently, return-from may be used to terminate the execution of the function and return a value. Such functions are called using the instance.[name arg ...] syntax which doesn't insert the instance into the argument list.
(:init (param) body-form*)
The :init specifier doesn't describe a slot. Rather, it specifies code which is executed when a structure is instantiated, after the slot initializations specific to the structure type are performed. The code consists of body-form-s which are evaluated in order in a lexical scope in which the variable param is bound to the structure object.

The :init specifier may not appear more than once in a given defstruct form.

When an object with one or more levels of inheritance is instantiated, the :init code of a base structure type, if any, is executed before any initializations specific to a derived structure type.

The :init initializations are executed before any other slot initializations. The argument values passed to the new or lnew operator or the make-struct function are not yet stored in the object's slots, and are not accessible. Initialization code which needs these values to be stable can be defined with :postinit.

Initializers in base structures must be careful about assumptions about slot kinds, because derived structures can alter static slots to instance slots or vice versa. To avoid an unwanted initialization being applied to the wrong kind of slot, initialization code can be made conditional on the outcome of static-slot-p applied to the slot. (Code generated by defstruct for initializing instance slots performs this kind of check).

The body-form-s of an :init specifier are not surrounded by an implicit block.

(:postinit (param) body-form*)
The :postinit specifier is very similar to :init. Both specify forms which are evaluated during object instantiation. The difference is that the body-form-s of a :postinit are evaluated after other initializations have taken place, including the :init initializations, as a second pass. By the time :postinit initialization runs, the argument material from the make-struct, new or lnew invocation has already been processed and stored into slots. Like :init actions, :postinit actions registered at different levels of the type's inheritance hierarchy are invoked in the base-to-derived order.
(:fini (param) body-form*)
The :fini specifier doesn't describe a slot. Rather, it specifies a finalization function which is associated with the structure instance, as if by use of the finalize function. This finalization registration takes place as the first step when an instance of the structure is created, before the slots are initialized and the :init code, if any, has been executed. The registration takes place as if by the evaluation of the form (finalize obj (lambda (param) body-form...) t) where obj denotes the structure instance. Note the t argument which requests reverse order of registration, ensuring that if an object has multiple finalizers registered at different levels of inheritance hierarchy, the finalizers specified for a derived structure type are called before inherited finalizers.

The body-form-s of a :fini specifier are not surrounded by an implicit block.

Note that an object's finalizers can be called explicitly with call-finalizers.

The with-objects macro arranges for finalizers to be called on objects when the execution of a scope terminates by any means.

The slot names given in a defstruct must all be unique among themselves, but they may match the names of existing slots in the super base type.

A given structure type can have only one slot under a given symbolic name. If a newly specified slot matches the name of an existing slot in the super type or that type's chain of ancestors, it is called a repeated slot.

A repeated slot inherits initialization forms from all of its ancestors.

The kind of the repeated slot (static or instance) is not inherited; it is established by the defstruct and may be different from the type of the same-named slot in the supertype or its ancestors.

A repeated slot only inherits the initializations which correspond to its kind. If a repeated slot is introduced as a static slot, then all of the static initializations in the ancestry chain are performed on that slot, which takes place during the evaluation of the defstruct form. If that slot is an instance slot in any of the ancestor structure types, their initializations do not apply and are not evaluated.

If a repeated slot is introduced as an instance slot then none of the static initializations in the ancestry chain are performed on it; none of the forms are evaluated. Those initializations target a static slot, which the derived type doesn't have. When an instance of the structure is created, then the instance initializations are performed on that slot from all of the ancestor structure types in which that slot is also an instance slot.

The initialization for slots which are specified using the :method or :function specifiers is re-ordered with regard to :static slots. Regardless of their placement in the defstruct form, :method and :function slots are initialized before :static slots. This ordering is useful, because it means that when the initialization expression for a given static slot constructs an instance of the struct type, any instance initialization code executing for that instance can use all functions and methods of the struct type. However, note that the static slots which follow that slot in the defstruct syntax are not yet initialized. If it is necessary for a structure's initialization code to have access to all static slots, even when the structure is instantiated during the initialization of a static slot, a possible solution may be to use lazy instantiation via the lnew operator, rather than ordinary eager instantiation via new. It is also necessary to ensure that the instance isn't accessed until all static initializations are complete, since access to the instance slots of a lazily instantiated structure triggers its initialization.

The structure name is specified using one of two forms: a plain name, or the syntax (name arg*). If the second form is used, then the structure type will support "boa construction", where "boa" stands for "by order of arguments". The arg-s specify the list of slot names which are to be initialized in the by-order-of-arguments style. For instance, if three slot names are given, then those slots can be optionally initialized by giving three arguments in the new macro or the make-struct function.

Slots are first initialized according to their init-form-s, regardless of whether they are involved in boa construction.

A slot initialized in this style still has an init-form which is processed independently of the existence of, and prior to, boa construction.

The boa constructor syntax can specify optional parameters, delimited by a colon, similarly to the lambda syntax. However, the optional parameters may not be arbitrary symbols; they must be symbols which name slots. Moreover, the (name init-form [present-p]) optional parameter syntax isn't supported.

When boa construction is invoked with optional arguments missing, the default values for those arguments come from the init-form-s in the remaining defstruct syntax.

Examples:


  (defvar *counter* 0)


  ;; New struct type foo with no super type:
  ;; Slots a and b initialize to nil.
  ;; Slot c is initialized by value of (inc *counter*).
  (defstruct foo nil a b (c (inc *counter*)))


  (new foo) -> #S(foo a nil b nil c 1)
  (new foo) -> #S(foo a nil b nil c 2)


  ;; New struct bar inheriting from foo.
  (defstruct bar foo (c 0) (d 100))


  (new bar) -> #S(bar a nil b nil c 0 d 100)
  (new bar) -> #S(bar a nil b nil c 0 d 100)


  ;; counter was still incremented during
  ;; construction of d:
  *counter* -> 4


  ;; override slots with new arguments
  (new foo a "str" c 17) -> #S(foo a "str" b nil c 17)


  *counter* -> 5


  ;; boa initialization
  (defstruct (point x : y) nil (x 0) (y 0))


  (new point) -> #S(point x 0 y 0)
  (new (point 1 1)) -> #S(point x 1 y 1)


  ;; property list style initialization
  ;; can always be used:
  (new point x 4 y 5) -> #S(point x 4 y 5)


  ;; boa applies last:
  (new (point 1 1) x 4 y 5) -> #S(point x 1 y 1)


  ;; boa with optional argument omitted:
  (new (point 1)) -> #S(point x 1 y 0)


  ;; boa with optional argument omitted and
  ;; with property list style initialization:
  (new (point 1) x 5 y 5) -> #S(point x 1 y 5)

 

9.19.3 Macro defmeth

Syntax:


  (defmeth
type-name name param-list body-form*)

Description:

The defmeth macro installs a function into the static slot named by the symbol name in the struct type indicated by type-name.

If the structure type doesn't already have such a static slot, it is first added, as if by the static-slot-ensure function, subject to the same checks.

If the function has at least one argument, it can be used as a method. In that situation, the leftmost argument passes the structure instance on which the method is being invoked.

The function takes the arguments specified by the param-list symbols, and its body consists of the body-form-s.

The body-form-s are placed into a block named name.

A method named lambda allows a structure to be used as if it were a function. When arguments are applied to the structure as if it were a function, the lambda method is invoked with those arguments, with the object itself inserted into the leftmost argument position.

If defmeth is used to redefine an existing method, the semantics can be inferred from that of static-slot-ensure. In particular, the method will be imposed into all subtypes which do not override the method using an instance slot, overwriting any subtype-specific methods stored in static slots of the same name. These subtype methods have to be individually reinstated, if they are required.
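
A minimal sketch, using a hypothetical dog structure:


  (defstruct dog nil
    name)

  (defmeth dog speak (self)
    (list self.name 'woof))

  (let ((d (new dog name "Rex")))
    d.(speak))
  ->  ("Rex" woof)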

 

9.19.4 Macros new and lnew

Syntax:


  (new {
name | (name arg*)} {slot init-form}*)
  (lnew {
name | (name arg*)} {slot init-form}*)

Description:

The new macro creates a new instance of the structure type named by name.

If the structure supports "boa construction", then, optionally, the arguments may be given using the syntax (name arg*) instead of name.

Slot values may also be specified by the slot and init-form arguments.

Note: the evaluation order in new is surprising: namely, init-form-s are evaluated before arg-s if both are present.

When the object is constructed, all default initializations take place first. If the object's structure type has a supertype, then the supertype initializations take place. Then the type's initializations take place, followed by the slot init-form overrides from the new macro, and lastly the "boa constructor" overrides.

If any of the initializations abandon the evaluation of new by a non-local exit such as an exception throw, the object's finalizers, if any, are invoked.

The macro lnew differs from new in that it specifies the construction of a lazy struct, as if by the make-lazy-struct function. When lnew is used to construct an instance, a lazy struct is returned immediately, without evaluating any of the arg and init-form expressions. The expressions are evaluated when any of the object's instance slots is accessed for the first time. At that time, these expressions are evaluated (in the same order as under new) and initialization proceeds in the same way.

If any of the initializations abandon the delayed initializations steps arranged by lnew by a non-local exit such as an exception throw, the object's finalizers, if any, are invoked.

Lazy initialization does not detect cycles. Immediately prior to the lazy initialization of a struct, the struct is marked as no longer requiring initialization. Thus, during initialization, its instance slots may be freely accessed. Slots not yet initialized evaluate as nil.

 

9.19.5 Macro with-slots

Syntax:


  (with-slots ({
slot | (sym slot)}*) struct-expr
    
body-form*)

Description:

The with-slots macro binds lexical macros to serve as aliases for the slots of a structure.

The struct-expr argument is expected to be an expression which evaluates to a struct object. It is evaluated once, and its value is retained. The aliases are then established to the slots of the resulting struct value.

The aliases are specified as zero or more expressions which consist of either a single symbol slot or a (sym slot) pair. The simple form binds a macro named slot to a slot also named slot. The pair form binds a macro named sym to a slot named slot.

The lexical aliases are syntactic places: assigning to an alias causes the value to be stored into the slot which it denotes.

After evaluating struct-expr the with-slots macro arranges for the evaluation of body-form-s in the lexical scope in which the aliases are visible.

Dialect Notes:

The intent of the with-slots macro is to help reduce the verbosity of code which makes multiple references to the same slot. Use of with-slots is less necessary in TXR Lisp than other Lisp dialects thanks to the dot operator for accessing struct slots.

Lexical aliases to struct places can also be arranged with considerable convenience using the placelet operator. However, placelet will not bind multiple aliases to multiple slots of the same object such that the expression which produces the object is evaluated only once.

Example:


  (defstruct point nil x y)


  ;; Here, with-slots introduces verbosity because
  ;; each slot is accessed only once. The function
  ;; is equivalent to:
  ;;
  ;; (defun point-delta (p0 p1)
  ;;   (new point x (- p1.x p0.x) y (- p1.y p0.y)))
  ;;
  ;; Also contrast with the use of placelet:
  ;;
  ;; (defun point-delta (p0 p1)
  ;;   (placelet ((x0 p0.x) (y0 p0.y)
  ;;              (x1 p1.x) (y1 p1.y))
  ;;     (new point x (- x1 x0) y (- y1 y0)))))


  (defun point-delta (p0 p1)
    (with-slots ((x0 x) (y0 y)) p0
      (with-slots ((x1 x) (y1 y)) p1
        (new point x (- x1 x0) y (- y1 y0)))))

 

9.19.6 Macro qref

Syntax:


  (qref
object-form
     {
slot | (slot arg*) | [slot arg*]}+)

Description:

The qref macro performs structure slot access. Structure slot access is more conveniently expressed using the referencing dot notation, which works by translating to qref syntax, according to the following equivalence:


  a.b.c.d <--> (qref a b c d)  ;; a b c d must not be numbers

(See the Referencing Dot section under Additional Syntax.)

The leftmost argument of qref is an expression which is evaluated. This argument is followed by one or more reference designators. If there are two or more designators, the following equivalence applies:


  (qref obj d1 d2 ...)  <---> (qref (qref obj d1) d2 ...)

That is to say, qref is applied to the object and a single designator. This must yield an object, to which the next designator is applied as if by another qref operation, and so forth.

Thus, qref can be understood entirely in terms of the semantics of the binary form (qref object-form designator).

Designators come in three forms: a lone symbol, an ordinary compound expression consisting of a symbol followed by arguments, or a DWIM expression consisting of a symbol followed by arguments.

A lone symbol designator indicates the slot of that name. That is to say, the following equivalence applies:


  (qref o n)  <-->  (slot o 'n)

Where slot is the structure slot accessor function. Because slot is an accessor, this form denotes the slot as a syntactic place; slots can be modified via assignment to the qref form and the referencing dot syntax.

A compound designator indicates that the named slot is a function, and arguments are to be applied to it. The following equivalence applies in this case, except that o is evaluated only once:


  (qref o (n arg ...)) <--> (call (slot o 'n) o arg ...)

A DWIM designator indicates that the named slot is a function or an indexable or callable object. The following equivalence applies:


  (qref obj [name arg ...])  <-->  [(slot obj 'name) arg ...]

Example:


  (defstruct foo nil
    (array (vec 1 2 3))
    (increment (lambda (self index delta)
                 (inc [self.array index] delta))))


  (defvarl s (new foo))


  ;; access third element of s.array:
  s.[array 2]  -->  3


  ;; increment first element of array by 42
  s.(increment 0 42)  -->  43


  ;; access array member
  s.array  -->  #(43 2 3)

Note how increment behaves much like a single-argument-dispatch object-oriented method. Firstly, the syntax s.(increment 0 42) effectively selects the increment function which is particular to the s object. Secondly, the object is passed to the selected function as the leftmost argument, so that the function has access to the object.

 

9.19.7 Macro meth

Syntax:


  (meth
struct slot)

Description:

The meth macro binds struct as the leftmost argument of the function stored in slot, returning a function which takes the remaining arguments. That is to say, it returns a function f such that (f arg ... ) calls (struct.slot struct arg ... ) except that struct is evaluated only once.

The argument struct must be an expression which evaluates to a struct. The slot argument is not evaluated, and must be a symbol denoting a slot. The syntax can be understood as a translation to a call of the method function:


  (meth a b)  <-->  (method a 'b)

The meth macro allows indirection upon a method-like function stored in a function slot.

Example:


  ;; struct for counting atoms eq to key
  (defstruct (counter key) nil
    key
    (count 0)
    (:method increment (self key)
      (if (eq self.key key)
        (inc self.count))))


  ;; pass all atoms in tree to func
  (defun map-tree (tree func)
    (if (atom tree)
      [func tree]
      (progn (map-tree (car tree) func)
             (map-tree (cdr tree) func))))


  ;; count occurrences of symbol a
  ;; using increment method of counter,
  ;; passed as func argument to map-tree.
  (let ((c (new (counter 'a)))
        (tr '(a (b (a a)) c a d)))
    (map-tree tr (meth c increment))
    c)
  --> #S(counter key a count 4
                 increment #<function: type 0>)

 

9.19.8 Macro umeth

Syntax:


  (umeth
slot)

Description:

The umeth macro binds the symbol slot to a function and returns that function.

When that function is called, it expects at least one argument. The leftmost argument must be an object of struct type.

The slot named slot is retrieved from that object, and is expected to be a function. That function is called with the same arguments.

The syntax can be understood as a translation to a call of the umethod function:


  (umeth s)  <-->  (umethod 's)

The macro merely provides the syntactic sugar of not having to quote the symbol.

Example:


   ;; seal and dog are variables which hold structures of
   ;; different types. Both have a method called bark.


   (let ((bark-fun (umeth bark)))
     [bark-fun dog]     ;; same effect as dog.(bark)
     [bark-fun seal])   ;; same effect as seal.(bark)

The u in umeth stands for "unbound". The function produced by umeth is not bound to any specific object; it binds to an object whenever it is invoked by retrieving the actual method from the object's slot at call time.

 

9.19.9 Macro usl

Syntax:


  (usl
slot)

Description:

The usl macro binds the symbol slot to a function and returns that function.

When that function is called, it expects exactly one argument. That argument must be an object of struct type. The slot named slot is retrieved from that object and returned.

The name usl stands for "unbound slot". The term "unbound" refers to the returned function not being bound to a particular object. The binding of the slot to an object takes place whenever the function is called.
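
For instance, assuming the point structure from the earlier defstruct examples:


  (defvarl p (new point x 3 y 4))

  [(usl x) p]  ->  3
  [(usl y) p]  ->  4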

 

9.19.10 Function make-struct-type

Syntax:


  (make-struct-type name super static-slots slots
                    static-initfun initfun boactor
                    postinitfun)

Description:

The make-struct-type function creates a new struct type.

The name argument must be a bindable symbol, according to the bindable function. It specifies the name property of the struct type as well as the name under which the struct type is globally registered.

The super argument indicates the supertype for the struct type. It must be either a value of type struct-type, a symbol which names a struct type, or else nil, indicating that the newly created struct type has no supertype.

The static-slots argument is a list of symbols which specify static slots. The symbols must be bindable and the list must not contain duplicates.

The slots argument is a list of symbols which specifies the instance slots. The symbols must be bindable and there must not be any duplicates within the list, or against entries in the static-slots list.

The new struct type's effective list of slots is formed by appending together static-slots and slots, and then appending that to the list of the supertype's slots, and de-duplicating the resulting list as if by the uniq function. Thus, any slots which are already present in the supertype are removed. If the structure has no supertype, then the list of supertype slots is taken to be empty. When a structure is instantiated, it shall have all the slots specified in the effective list of slots. Each instance slot shall be initialized to the value nil, prior to the invocation of initfun and boactor.

The static-initfun argument either specifies an initialization function, or is nil, which is equivalent to specifying a function which does nothing.

Prior to the invocation of static-initfun, each new static slot shall be initialized to the value nil and each inherited static slot shall be initialized to the current value which the corresponding static slot holds in the supertype.

If specified, static-initfun function must accept one argument. When the structure type is created (before the make-struct-type function returns) all of the static-initfun functions in the chain of supertype ancestry are invoked, in order of inheritance. Each is passed the structure type as an argument. The purpose is to initialize the static slots.

The initfun argument either specifies an initialization function, or is nil, which is equivalent to specifying a function which does nothing. If specified, this function must accept one argument. When a structure is instantiated, every initfun in its chain of supertype ancestry is invoked, in order of inheritance, so that the root supertype's initfun is called first and the structure's own specific initfun is called last. These calls occur before the slots are initialized from the arg arguments or the slot-init-plist of make-struct. Each function is passed the newly created structure object, and may alter its slots.

The boactor argument either specifies a by-order-of-arguments initialization function ("boa constructor") or is nil, which is equivalent to specifying a constructor which does nothing. If specified, it must be a function which takes at least one argument. When a structure is instantiated, and boa arguments are given, the boactor is invoked, with the structure as the leftmost argument, and the boa arguments as additional arguments. This takes place after the processing of initfun functions, and after the processing of the slot-init-plist specified in the make-struct call. Note that the boactor functions of the supertypes are not called, only the boactor specific to the type being constructed.

The postinitfun argument either specifies an initialization function, or is nil, which is equivalent to specifying a function which does nothing. If specified, this function must accept one argument. The postinitfun function is similar to initfun. The difference is that postinitfun functions are called after all other initialization processing, rather than before. Unlike initfun functions, they are also called in the opposite order of inheritance, so that the structure type's own specific postinitfun is called first, and the root supertype's postinitfun is called last.
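
The following sketch defines, under the illustrative name point, a type roughly equivalent to (defstruct point nil (x 0) (y 0)), passing nil for the functions which are not needed:


  (make-struct-type 'point nil nil '(x y)
                    nil
                    (lambda (s)
                      (slotset s 'x 0)
                      (slotset s 'y 0))
                    nil nil)

  (new point)  -->  #S(point x 0 y 0)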

 

9.19.11 Function find-struct-type

Syntax:


  (find-struct-type
name)

Description:

The find-struct-type function returns a struct-type object corresponding to the symbol name.

If no struct type is registered under name, then it returns nil.

 

9.19.12 Function struct-type-p

Syntax:


  (struct-type-p
obj)

Description:

The struct-type-p function returns t if obj is a structure type, otherwise it returns nil.

 

9.19.13 Function super

Syntax:


  (super
type)

Description:

The super function returns the struct type object which is the supertype of type, or returns nil if type has no supertype.

The type argument must be either a struct type object, a symbol which names a struct type (which is resolved to that type), or else a structure instance (which is resolved to its structure type).

 

9.19.14 Function make-struct

Syntax:


  (make-struct
type slot-init-plist arg*)

Description:

The make-struct function returns a new object which is an instance of the structure type type.

The type argument must either be a struct-type object, or else a symbol which is the name of a structure.

The slot-init-plist argument gives a list of slot initializations in the style of a property list, as defined by the prop function. It may be empty, in which case it has no effect. Otherwise, it specifies slot names and their values. Each slot name which is given must be a slot of the structure type. The corresponding value will be stored into the slot of the newly created object. If a slot is repeated, it is unspecified which value takes effect.

The optional arg-s specify arguments to the structure type's boa constructor. If the arguments are omitted, the boa constructor is not invoked. Otherwise the boa constructor is invoked on the structure object and those arguments. The argument list must match the trailing parameters of the boa constructor (the remaining parameters which follow the leftmost argument which passes the structure to the boa constructor).

When a new structure is instantiated by make-struct, its slot values are first initialized by the structure type's registered functions as described under make-struct-type. Then, the slot-init-plist is processed, if not empty, and finally, the arg-s are processed, if present, and passed to the boa constructor.

If any of the initializations abandon the evaluation of make-struct by a non-local exit such as an exception throw, the object's finalizers, if any, are invoked.
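
For instance, assuming the illustrative point type sketched under make-struct-type:


  (make-struct 'point '(x 3 y 4))  -->  #S(point x 3 y 4)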

 

9.19.15 Function make-lazy-struct

Syntax:


  (make-lazy-struct
type argfun)

Description:

The make-lazy-struct function returns a new object which is an instance of the structure type type.

The type argument must either be a struct-type object, or else a symbol which is the name of a structure.

The argfun argument should be a function which can be called with no parameters and returns a cons cell. More requirements are specified below.

The object returned by make-lazy-struct is a lazily-initialized struct instance, or lazy struct.

A lazy struct remains uninitialized until just before the first access to any of its instance slots. Just before an instance slot is accessed, initialization takes place as follows. The argfun function is invoked with no arguments. Its return value must be a cons cell. The car of the cons cell is taken to be a property list, as defined by the prop function. The cdr field is taken to be a list of arguments. These values are treated as if they were, respectively, the slot-init-plist and the boa constructor arguments given in a make-struct invocation. Initialization of the structure proceeds as described in the description of make-struct.
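
The following sketch, again using the illustrative point type, shows that argfun is not called until a slot is first accessed:


  (let ((p (make-lazy-struct 'point
                             (lambda ()
                               (put-line "initializing")
                               '((x 1 y 2))))))
    (put-line "created")
    p.x)

  ;; output:
  ;;   created
  ;;   initializing
  ;; result: 1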

 

9.19.16 Function copy-struct

Syntax:


  (copy-struct
struct-obj)

Description:

The copy-struct function creates and returns a new object which is a duplicate of struct-obj, which must be a structure.

The duplicate object is a structure of the same type as struct-obj and has the same slot values.

The creation of a duplicate does not involve calling any of the struct type's initialization functions.

Only instance slots participate in the duplication. Since the original structure and copy are of the same structure type, they already share static slots.

 

9.19.17 Accessor slot

Syntax:


  (slot
struct-obj slot-name)
  (set (slot
struct-obj slot-name) new-value)

Description:

The slot function retrieves a structure's slot. The struct-obj argument must be a structure, and slot-name must be a symbol which names a slot in that structure.

Because slot is an accessor, a slot form is a syntactic place which denotes the slot's storage location.

A syntactic place expressed by slot does not support deletion.
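
For instance, assuming the illustrative point type sketched under make-struct-type:


  (let ((p (make-struct 'point '(x 1 y 2))))
    (set (slot p 'y) 10)
    (list (slot p 'x) p.y))
  -->  (1 10)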

 

9.19.18 Function slotset

Syntax:


  (slotset
struct-obj slot-name new-value)

Description:

The slotset function stores a value in a structure's slot.
 The struct-obj argument must be a structure, and slot-name must be a symbol which names a slot in that structure.

The new-value argument specifies the value to be stored in the slot.

 

9.19.19 Function structp

Syntax:


  (structp
obj)

Description:

The structp function returns t if obj is a structure, otherwise it returns nil.

 

9.19.20 Function struct-type

Syntax:


  (struct-type
struct-obj)

Description:

The struct-type function returns the structure type object which defines the type of the structure object struct-obj.

 

9.19.21 Function clear-struct

Syntax:


  (clear-struct
struct-obj [value])

Description:

The clear-struct function replaces all instance slots of struct-obj with value, which defaults to nil if omitted.

Note that finalizers are not executed prior to replacing the slot values.

 

9.19.22 Function reset-struct

Syntax:


  (reset-struct
struct-obj)

Description:

The reset-struct function reinitializes the structure object struct-obj as if it were being newly created. First, all the slots are set to nil as if by the clear-struct function. Then the slots are initialized by invoking the initialization functions, in order of the supertype ancestry, just as would be done for a new structure object created by make-struct with an empty slot-init-plist and no boa arguments.

Note that finalizers registered against struct-obj are not invoked, and remain registered. If the structure has state which is cleaned up by finalizers, it is advisable to invoke them using call-finalizers prior to using reset-struct, or to take other measures to deal with the situation.

 

9.19.23 Function replace-struct

Syntax:


  (replace-struct
target-obj source-obj)

Description:

The replace-struct function causes target-obj to take on the attributes of source-obj without changing its identity.

The type of target-obj is changed to that of source-obj.

All instance slots of target-obj are discarded, and it is given new slots, which are copies of the instance slots of source-obj.

Because of the type change, target-obj implicitly loses all of its original static slots, and acquires those of source-obj.

Note that finalizers registered against target-obj are not invoked, and remain registered. If target-obj has state which is cleaned up by finalizers, it is advisable to invoke them using call-finalizers prior to using replace-struct, or to take other measures to handle the situation.

 

9.19.24 Function method

Syntax:


  (method
struct-obj slot-name)

Description:

The method function retrieves a function from a structure's slot and binds that function's left argument to the structure.

The struct-obj argument must be a structure, and slot-name must be a symbol denoting a slot in that structure. The slot must hold a function of at least one argument.

The method function returns a function which, when invoked, calls the function previously retrieved from the object's slot, passing to that function struct-obj as the leftmost argument, followed by the function's own arguments.

Note: the meth macro is an alternative interface which is suitable if the slot name isn't a computed value.
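
The following sketch assumes the counter struct type defined in the meth example above:


  (let ((c (new (counter 'a)))
        (inc-fun (method c 'increment)))
    [inc-fun 'a]   ;; same effect as c.(increment 'a)
    [inc-fun 'b]   ;; no effect: b is not eq to the key a
    c.count)
  -->  1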

 

9.19.25 Function super-method

Syntax:


  (super-method
struct-obj slot-name)

Description:

The super-method function retrieves a function from a static slot belonging to the supertype of the structure type of struct-obj.

It then returns a function which binds that function's left argument to the structure.

The struct-obj argument must be a structure which has a supertype, and slot-name must be a symbol denoting a static slot in that supertype. The slot must hold a function of at least one argument.

The super-method function returns a function which, when invoked, calls the function previously retrieved from the supertype's static slot, passing to that function struct-obj as the leftmost argument, followed by the function's own arguments.

 

9.19.26 Function umethod

Syntax:


  (umethod
slot-name)

Description:

The umethod function returns a function which represents the set of all methods named by the slot slot-name in all structure types, including ones not yet defined. The slot-name argument must be a symbol.

The returned function must be called with at least one argument. The leftmost argument must be an object of structure type which has a slot named slot-name. The function retrieves the value of the slot from that object, expecting it to be a function, and calls it, passing to it all of its arguments.

Note: the umethod name stands for "unbound method". Unlike the method function, umethod doesn't return a method whose leftmost argument is already bound to an object; the binding occurs at call time.

 

9.19.27 Function uslot

Syntax:


  (uslot
slot-name)

Description:

The uslot function returns a function which represents all slots named slot-name in all structure types, including ones not yet defined. The slot-name argument must be a symbol.

The returned function must be called with exactly one argument. The argument must be a structure which has a slot named slot-name. The function will retrieve the value of the slot from that object and return it.

Note: the uslot name stands for "unbound slot". The returned function isn't bound to a particular object. The binding of slot-name to a slot in the structure object occurs when the function is called.

 

9.19.28 Function slotp

Syntax:


  (slotp
type name)

Description:

The slotp function returns t if name is a symbol which names a slot in the structure type type. Otherwise it returns nil.

The type argument must be a structure type, or else a symbol which names a structure type.

 

9.19.29 Function static-slot-p

Syntax:


  (static-slot-p
type name)

Description:

The static-slot-p function returns t if name is a symbol which names a slot in the structure type type, and if that slot is a static slot. Otherwise it returns nil.

The type argument must be a structure type, or else a symbol which names a structure type.

 

9.19.30 Function static-slot

Syntax:


  (static-slot
type name)

Description:

The static-slot function retrieves the value of the static slot named by symbol name of the structure type type.

The type argument must be a structure type or a symbol which names a structure type, and name must be a static slot of this type.

 

9.19.31 Function static-slot-set

Syntax:


  (static-slot-set
type name new-value)

Description:

The static-slot-set function stores new-value into the static slot named by symbol name of the structure type type.

It returns new-value.

The type argument must be a structure type or the name of a structure type, and name must be a static slot of this type.
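
The following sketch defines an illustrative widget type with a static slot count, then reads and updates that slot:


  (defstruct widget nil
    (:static count 0))

  (static-slot 'widget 'count)        -->  0
  (static-slot-set 'widget 'count 5)  -->  5
  (static-slot 'widget 'count)        -->  5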

 

9.19.32 Function static-slot-ensure

Syntax:


  (static-slot-ensure
type name new-value [no-error-p])

Description:

The static-slot-ensure function first ensures that the struct type type and all struct types derived from it have a static slot called name. The slot is added as a static slot to every eligible type which doesn't already have an instance or static slot by that name.

Then, new-value is stored into all of the name static slots of type and all its derived types.

If type itself already has an instance slot called name then an error is thrown, and the function has no effect. If the same situation is true of the subtypes of type then the situation is ignored: for those subtypes, no static slot is added, and new-value is not stored. If the no-error-p argument is present, and its value is true, then type is treated just like the subtypes: if it has a conflicting instance slot, then the situation is ignored and the subtypes are processed anyway.

 

9.19.33 Function call-super-method

Syntax:


  (call-super-method
struct-obj name argument*)

Description:

The call-super-method function retrieves the function stored in the slot name of the supertype of struct-obj and invokes it, passing to that function struct-obj as the leftmost argument, followed by the given argument-s, if any.

The struct-obj argument must be of structure type. Moreover, that structure type must be derived from another structure type, and name must name a static slot of that structure type.

The object retrieved from that static slot must be callable as a function, and accept the arguments.

 

9.19.34 Function call-super-fun

Syntax:


  (call-super-fun
type name argument*)

Description:

The call-super-fun function retrieves the function stored in the slot name of the supertype of type and invokes it, passing to that function the given argument-s, if any.

The type argument must be a structure type. Moreover, that structure type must be derived from another structure type, and name must name a static slot of that structure type.

The object retrieved from that static slot must be callable as a function, and accept the arguments.

 

9.19.35 Macro with-objects

Syntax:


  (with-objects ({(
sym init-form)}*) body-form*)

Description:

The with-objects macro provides a binding construct very similar to let*.

Each sym must be a symbol suitable for use as a variable name.

Each init-form is evaluated in sequence, and a binding is established for its corresponding sym which is initialized with the value of that form. The binding is visible to subsequent init-form-s.

Additionally, the values of the init-form-s are noted as they are produced. When the with-objects form terminates, by any means, the call-finalizers function is invoked on each value which was returned by an init-form and had been noted. These calls are performed in the reverse order relative to the original evaluation of the forms.

After the variables are established and initialized, the body-form-s are evaluated in the scope of the variables. The value of the last form is returned, or else nil if there are no forms. The invocations of call-finalizers take place just before the value of the last form is returned.
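
The following sketch uses an illustrative resource type, registering finalizers with the finalize function; the output shows that they run in reverse order of initialization:


  (defstruct resource nil name)

  (with-objects ((a (finalize (new resource name "res-a")
                              (lambda (obj) (put-line obj.name))))
                 (b (finalize (new resource name "res-b")
                              (lambda (obj) (put-line obj.name)))))
    (put-line "body"))

  ;; output:
  ;;   body
  ;;   res-b
  ;;   res-a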

 

9.20 Special Structure Functions

Special structure functions are user-defined methods or structure functions which are specially recognized by certain functions in TXR Lisp. They endow structure objects with the ability to participate in certain usage scenarios, or to participate in a customized way.

Special functions are required to be bound to static slots, which is the case if the defmeth macro is used, or when methods or functions are defined using syntax inside a defstruct form. If a special function or method is defined as an instance slot, then the behavior of library functions which depend on this method is unspecified.

 

9.20.1 Method equal

Syntax:


  
object.(equal)

Description:

Normally, two struct values are not considered the same under the equal function unless they are the same object.

However, if the equal method is defined for a structure type, then instances of that structure type support equality substitution.

The equal method must not take any arguments. Moreover, the method must never return nil.

When a struct which supports equality substitution is compared using equal, less or greater, its equal method is invoked, and the return value is used in place of that structure for the purposes of the comparison.

The same applies when a struct is hashed using the hash-equal function, or implicitly by an :equal-based hash table.

Note: if an equal method is defined or redefined with different semantics for a struct type whose instances have already been inserted as keys in an :equal-based hash table, searches for those keys will not work reliably.
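
For instance, the following illustrative pair type compares like a two-element list:


  (defstruct pair nil a b
    (:method equal (self) (list self.a self.b)))

  (equal (new pair a 1 b 2) '(1 2))              -->  t
  (equal (new pair a 1 b 2) (new pair a 1 b 2))  -->  t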

 

9.20.2 Method print

Syntax:


  
object.(print stream)

Description:

If a method named by the symbol print is defined for a structure type, then it is used for pretty-printing instances of that type.

The stream argument specifies the output stream to which the printed representation is to be written.

The value returned by the print method is ignored.
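
For instance, the following illustrative vec2 type prints in a custom notation:


  (defstruct vec2 nil x y
    (:method print (self stream)
      (format stream "<~a,~a>" self.x self.y)))

  (pprint (new vec2 x 1 y 2))  ;; outputs <1,2>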

 

9.20.3 Method lambda

Syntax:


  
object.(lambda arg*)

Description:

If a structure type provides a method called lambda then it can be used as a function.

Of course, this method can be called by name, using the syntax given in the above syntactic description.

However, the intended use is that it allows the structure instance itself to be used as a function. When arguments are applied to a structure object as if it were a function, this is erroneous, unless that object has a lambda method. In that case, the arguments are passed to the lambda method. Of course, the leftmost argument of the method is the structure instance itself.

That is to say, the following equivalences apply, except that s is evaluated only once:


  (call s args ...)  <-->  s.(lambda args ...)


  [s args ...]  <-->  [s.lambda s args ...]


  (mapcar s list)  <-->  (mapcar (meth s lambda) list)
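
For instance, the following illustrative adder type makes its instances callable:


  (defstruct adder nil amount
    (:method lambda (self x) (+ x self.amount)))

  (let ((add5 (new adder amount 5)))
    (list [add5 10] (mapcar add5 '(1 2 3))))
  -->  (15 (6 7 8))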

 

9.20.4 Methods car, cdr, and nullify

Syntax:


  
object.(car)
  
object.(cdr)
  
object.(nullify)

Description:

Structures may be treated as sequences if they define methods named by the symbols car, cdr, and nullify.

If a structure supports these methods, then these methods are used by the functions car, cdr, nullify, empty and various other sequence manipulating functions derived from them, when those functions are applied to that object.

An object which implements these three methods can be considered to denote an abstract sequence. The object's car method should return the first value in that abstract sequence, or else nil if that sequence is empty.

The object's cdr method should return an object denoting the remainder of the sequence, or else nil if the sequence is empty or contains only one value. This returned object can be of any type: it may be of the same structure type as that object, a different structure type, a list, or whatever else.

The nullify method should return nil if the object is considered to denote an empty sequence. Otherwise it should return that object itself.
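
For instance, the following illustrative span type denotes the integers from lo to hi, inclusive:


  (defstruct span nil lo hi
    (:method car (self) (if (<= self.lo self.hi) self.lo))
    (:method cdr (self) (if (< self.lo self.hi)
                          (new span lo (succ self.lo) hi self.hi)))
    (:method nullify (self) (if (<= self.lo self.hi) self)))

  (car (new span lo 1 hi 3))        -->  1
  (car (cdr (new span lo 1 hi 3)))  -->  2
  (empty (new span lo 5 hi 3))      -->  t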

 

9.20.5 Function from-list

Syntax:


  
object.[from-list list]

Description:

If a from-list structure function is defined for a structure type, it is called in certain situations with an argument which is a list object. The function's purpose is to construct a new instance of the structure type, derived from that list.

Note: the from-list function isn't a method; it doesn't receive object as an argument. In the style of call depicted by the syntax description above, object is used to identify the structure type whose from-list static slot provides the function definition.

The purpose of this function is to allow sequence processing operations such as mapcar and remove to operate on a structure object as if it were a sequence, and return a transformed sequence of the same type. This is analogous to the way such functions can operate on a vector or string, and return a vector or string.

If a structure object behaves as a sequence thanks to providing car, cdr and nullify methods, but does not have a from-list function, then those sequence-processing operations which return a sequence will always return a plain list of items.

 

9.21 Sequence Manipulation

 

9.21.1 Function seqp

Syntax:


  (seqp
object)

Description:

The function seqp returns t if object is a sequence, otherwise nil.

A sequence is defined as a list, vector or string. The object nil denotes the empty list and so is a sequence.

 

9.21.2 Function length

Syntax:


  (length
sequence)

Description:

The length function returns the number of items in sequence. sequence may be a hash, in which case (hash-count sequence) is returned.

 

9.21.3 Function empty

Syntax:


  (empty
sequence)

Description:

Returns t if (length sequence) is zero, otherwise nil.

 

9.21.4 Function copy

Syntax:


  (copy
object)

Description:

The copy function duplicates objects of various supported types: sequences, hashes, structures and random states. If object is nil, it returns nil. If object is a list, it returns (copy-list object). If object is a string, it returns (copy-str object). If object is a vector, it returns (copy-vec object). If object is a hash, it returns (copy-hash object). If object is a structure, it returns (copy-struct object). Lastly, if object is a random state, it returns (make-random-state object).

Except in the case when object is nil, copy returns a value that is distinct from (not eq to) object. This is different from the behavior of [object 0..t] or (sub object 0 t), which recognize that they need not make a copy of object, and just return it.

Note however, that the elements of the returned sequence may be eq to elements of the original sequence. In other words, copy is a deeper copy than just duplicating the sequence value itself, but it is not a deep copy.

 

9.21.5 Function sub

Syntax:


  (sub
sequence [from [to]])

Description:

The sub function extracts a slice from input sequence sequence. The slice is a sequence of the same type as sequence.

If the from argument is omitted, it defaults to 0. If the to parameter is omitted, it defaults to t. Thus (sub a) means (sub a 0 t).

The following equivalence holds between the sub function and the DWIM-bracket syntax:


  ;; from is not a list
  (sub seq from to) <--> [seq from..to]

The description of the dwim operator—in particular, the section on Range Indexing—explains the semantics of the range specification.

If the sequence is a list, the output sequence may share substructure with the input sequence.
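
For example, slices may be taken from lists, strings and vectors alike; a negative to index is relative to the end, per Range Indexing:


  (sub "hello" 1 3)      -> "el"
  (sub '(1 2 3 4) 1)     -> (2 3 4)
  (sub #(1 2 3 4) 0 -1)  -> #(1 2 3)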

 

9.21.6 Function replace

Syntax:


  (replace
sequence replacement-sequence [from [to]])
  (replace
sequence replacement-sequence index-list)

Description:

The replace function modifies sequence in the ways described below.

The operation is destructive: it may work "in place" by modifying the original sequence. The caller should retain the return value and stop relying on the original input sequence.

The return value of replace is the modified version of sequence. This may be the same object as sequence or it may be a newly allocated object.

Note that the form:


  (set seq (replace seq new fr to))

has the same effect on the variable seq as the form:


  (set [seq fr..to] new)

except that the former set form returns the entire modified sequence, whereas the latter returns the value of the new argument.

The replace function has two invocation styles, distinguished by the type of the third argument. If the third argument is a list or vector, then it is deemed to be the index-list parameter of the second form. Otherwise, if the third argument is missing, or is neither a list nor a vector, then it is deemed to be the from argument of the first form.

The first form of the replace function replaces a contiguous subsequence of the sequence with replacement-sequence. The replaced subsequence may be empty, in which case an insertion is performed. If replacement-sequence is empty (for example, the empty list nil), then a deletion is performed.

If the from and to arguments are omitted, their values default to 0 and t respectively.

The description of the dwim operator—in particular, the section on Range Indexing—explains the semantics of the range specification.

The second form of the replace function replaces a subsequence of elements from sequence given by index-list with their counterparts from replacement-sequence. This form of the replace function does not insert or delete; it simply overwrites elements. If replacement-sequence and index-list are of different lengths, then the shorter of the two determines the maximum number of elements which are overwritten. Furthermore, similar restrictions apply on index-list as under the select function. Namely, the replacement stops when an index value in index-list is encountered which is out of range for sequence. In addition, if sequence is a list, then index-list must be monotonically increasing.
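
For example:


  ;; first form: replace the range 1..3
  (replace "abcde" "XY" 1 3)          -> "aXYde"

  ;; second form: overwrite the elements at indices 0 and 2
  (replace '(1 2 3 4) '(a b) '(0 2))  -> (a 2 b 4)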

 

9.21.7 Function take

Syntax:


  (take
count sequence)

Description:

The take function returns sequence with all except the first count items removed.

If sequence is a list, then take returns a lazy list which produces the first count items of sequence.

For other kinds of sequences, including lazy strings, take works eagerly.

If count exceeds the length of sequence then a sequence is returned which has all the items. This object may be sequence itself, or a copy.

If count is negative, it is treated as zero.
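
For example:


  (take 2 '(1 2 3 4))  -> (1 2)
  (take 3 "abcde")     -> "abc"
  (take 10 #(1 2 3))   -> #(1 2 3)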

 

9.21.8 Functions take-while and take-until

Syntax:


  (take-while
predfun sequence [keyfun])
  (take-until
predfun sequence [keyfun])

Description:

The take-while and take-until functions return a prefix of sequence whose items satisfy certain conditions.

The take-while function returns the longest prefix of sequence whose elements, accessed through keyfun satisfy the function predfun.

The keyfun argument defaults to the identity function: the elements of sequence are examined themselves.

The take-until function returns the longest prefix of sequence which consists of elements, accessed through keyfun, that do not satisfy predfun followed by an element which does satisfy predfun. If sequence has no such prefix, then an empty sequence is returned of the same kind as sequence.

If sequence is a list, then these functions return a lazy list.
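
For example:


  (take-while evenp '(2 4 6 7 8))           -> (2 4 6)

  ;; keyfun example: examine the car of each element
  (take-while oddp '((1) (3) (4) (5)) car)  -> ((1) (3))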

 

9.21.9 Function drop

Syntax:


  (drop
count sequence)

Description:

The drop function returns sequence with the first count items removed.

If count is negative, it is treated as zero.

If count is zero, then sequence is returned.

If count exceeds the length of sequence then an empty sequence is returned of the same kind as sequence.
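
For example:


  (drop 2 '(1 2 3 4))  -> (3 4)
  (drop 2 "abcde")     -> "cde"
  (drop 9 #(1 2 3))    -> #()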

 

9.21.10 Functions drop-while and drop-until

Syntax:


  (drop-while
predfun sequence [keyfun])
  (drop-until
predfun sequence [keyfun])

Description:

The drop-while and drop-until functions return sequence with a prefix of that sequence removed, according to conditions involving predfun and keyfun.

The drop-while function removes the longest prefix of sequence whose elements, accessed through keyfun satisfy the function predfun, and returns the remaining sequence.

The keyfun argument defaults to the identity function: the elements of sequence are examined themselves.

The drop-until function removes the longest prefix of sequence which consists of elements, accessed through keyfun, that do not satisfy predfun followed by an element which does satisfy predfun. A sequence of the remaining elements is returned.

If sequence has no such prefix, then a sequence same as sequence is returned, which may be sequence itself or a copy.

 

9.21.11 Function butlast

Syntax:


  (butlast
sequence)

Description:

The butlast function returns the prefix of sequence consisting of a copy of it, with the last item omitted. If sequence is empty, an empty sequence is returned.

Dialect note: the Common Lisp function nbutlast is not provided. The TXR Lisp take function provides the same functionality for lists (only with the arguments reversed relative to nbutlast), and additionally provides lazy semantics, and works with vectors and strings.

 

9.21.12 Function search

Syntax:


  (search
haystack needle [testfun [keyfun]])

Description:

The search function determines whether the sequence needle occurs as a substring within haystack, under the given comparison function testfun and key function keyfun. If this is the case, then the zero-based position of the leftmost occurrence of needle within haystack is returned. Otherwise nil is returned to indicate that needle does not occur within haystack. If needle is empty, then zero is always returned.

The arguments haystack and needle are sequences: lists, vectors or strings, in any combination.

If needle is not empty, then it occurs at some position N within haystack if the first element of needle matches the element at position N of haystack, the second element of needle matches the element at position N+1 of haystack and so forth, for all elements of needle. A match between elements is determined by passing each element through keyfun, and then comparing the resulting values using testfun.

If testfun is supplied, it must be a function which can be called with two arguments. If it is not supplied, it defaults to eql.

If keyfun is supplied, it must be a function which can be called with one argument. If it is not supplied, it defaults to identity.

Examples:


  ;; fails because 3.0 doesn't match 3
  ;; under the default eql function
  [search #(1.0 3.0 4.0 7.0) '(3 4)] -> nil


  ;; occurrence found at position 1:
  ;; (3.0 4.0) matches (3 4) under =
  [search #(1.0 3.0 4.0 7.0) '(3 4) =] -> 1


  ;; "even odd odd odd even" pattern
  ;; matches at position 2
  [search #(1 1 2 3 5 7 8) '(2 1 1 1 2) : evenp] -> 2


  ;; Case insensitive string search
  [search "abcd" "CD" : chr-toupper] -> 2


  ;; Case insensitive string search
  ;; using vector of characters as key
  [search "abcd" #(#\C #\D) : chr-toupper] -> 2

 

9.21.13 Function rsearch

Syntax:


  (rsearch
haystack needle [testfun [keyfun]])

Description:

The rsearch function is like search except that if needle matches haystack in multiple places, rsearch returns the right-most matching position rather than the leftmost.

 

9.21.14 Functions ref and refset

Syntax:


  (ref
seq index)
  (refset
seq index new-value)

Description:

The ref and refset functions perform array-like indexing into sequences. If the seq parameter is a hash, then these functions perform hash retrieval and storage; in that case index isn't restricted to an integer value.

The ref function retrieves an element of seq, whereas refset overwrites an element of seq with a new value.

If seq is a sequence, then the index argument must be an integer. The first element of the sequence is indexed by zero. Negative values are permitted, denoting backward indexing from the end of the sequence, such that the last element is indexed by -1, the second last by -2 and so on. See also the Range Indexing section under the description of the dwim operator.

If seq is a list, then out-of-range indices, whether positive or negative, are treated leniently by ref: such accesses produce the value nil, rather than an error. For other sequence types, such accesses are erroneous. For hashes, accesses to nonexistent elements are treated leniently, and produce nil.

The refset function is strict for out-of-range indices over all sequences, including lists. In the case of hashes, a refset of a nonexistent key creates the key.

The refset function returns new-value.

The following equivalences hold between ref and refset, and the DWIM bracket syntax:


  (ref seq idx) <--> [seq idx]


  (refset seq idx new) <--> (set [seq idx] new)

The difference is that ref and refset are first class functions which can be used in functional programming as higher order functions, whereas the bracket notation is syntactic sugar, and set is an operator, not a function. Therefore the brackets cannot replace all uses of ref and refset.
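
For example:


  (ref '(a b c) 1)  -> b
  (ref "abc" -1)    -> #\c

  (let ((v (vec 1 2 3)))
    (refset v 0 10)
    v)
  -> #(10 2 3)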

 

9.21.15 Function update

Syntax:


  (update
sequence-or-hash function)

Description:

The update function replaces each element in a sequence, or each value in a hash table, with the value of function applied to that element or value.

The sequence or hash table is returned.
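
For example:


  (update (vec 1 2 3) (op * 10))  -> #(10 20 30)
  (update (list 1 2 3) succ)      -> (2 3 4)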

 

9.21.16 Functions remq, remql and remqual

Syntax:


  (remq
object list [key-function])
  (remql
object list [key-function])
  (remqual
object list [key-function])

Description:

The remq, remql and remqual functions produce a new list based on list, removing the elements whose associated keys are eq, eql or equal to object.

The input list is unmodified, but the returned list may share substructure with it. If no items are removed, it is possible that the return value is list itself.

If key-function is omitted, then the element keys compared to object are the elements themselves. Otherwise, key-function is applied to each element and the resulting value is that element's key which is compared to object.
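
For example:


  (remql 2 '(1 2 3 2 4))              -> (1 3 4)
  (remqual "b" '("a" "b" "c"))        -> ("a" "c")

  ;; keys are the cars of the elements
  (remql 2 '((1 a) (2 b) (3 c)) car)  -> ((1 a) (3 c))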

 

9.21.17 Functions remq*, remql* and remqual*

Syntax:


  (remq*
object list)
  (remql*
object list)
  (remqual*
object list)

Description:

The remq*, remql* and remqual* functions are lazy versions of remq, remql and remqual. Rather than computing the entire new list prior to returning, these functions return a lazy list.

Caution: these functions can still get into infinite looping behavior. For instance, in (remql* 0 (repeat '(0))), remql* will keep consuming the 0 values coming out of the infinite list, looking for the first item that does not have to be deleted, in order to instantiate the first lazy value.

Examples:


  ;; Return a list of all the natural numbers, excluding 13,
  ;; then take the first 100 of these.
  ;; If remql is used, it will loop until memory is exhausted,
  ;; because (range 1) is an infinite list.


  [(remql* 13 (range 1)) 0..100]

 

9.21.18 Functions keepq, keepql and keepqual

Syntax:


  (keepq
object list [key-function])
  (keepql
object list [key-function])
  (keepqual
object list [key-function])

Description:

The keepq, keepql and keepqual functions produce a new list based on list, removing the items whose keys are not eq, eql or equal to object.

The input list is unmodified, but the returned list may share substructure with it. If no items are removed, it is possible that the return value is list itself.

The optional key-function is applied to each element from the list to convert it to a key which is compared to object. If key-function is omitted, then each element itself of list is compared to object.

 

9.21.19 Functions remove-if, keep-if, remove-if* and keep-if*

Syntax:


  (remove-if
predicate-function list [key-function])
  (keep-if
predicate-function list [key-function])
  (remove-if*
predicate-function list [key-function])
  (keep-if*
predicate-function list [key-function])

Description:

The remove-if function produces a list whose contents are those of list but with those elements removed which satisfy predicate-function. Those elements which are not removed appear in the same order. The result list may share substructure with the input list, and may even be the same list object if no items are removed.

The optional key-function specifies how each element from the list is transformed to an argument to predicate-function. If this argument is omitted then the predicate function is applied to the elements directly, a behavior which is identical to key-function being (fun identity).

The keep-if function is exactly like remove-if, except the sense of the predicate is inverted. The function keep-if retains those items which remove-if will delete, and removes those that remove-if will preserve.

The remove-if* and keep-if* functions are like remove-if and keep-if, but produce lazy lists.

Examples:


  ;; remove any element numerically equal to 3.
  (remove-if (op = 3) '(1 2 3 4 3.0 5)) -> (1 2 4 5)


  ;; remove those pairs whose first element begins with "abc"
  [remove-if (op equal [@1 0..3] "abc")
             '(("abcd" 4) ("defg" 5))
             car]
  -> (("defg" 5))


  ;; equivalent, without key function
  (remove-if (op equal [(car @1) 0..3] "abc")
             '(("abcd" 4) ("defg" 5)))
  -> (("defg" 5))

 

9.21.20 Functions countqual, countql and countq

Syntax:


  (countq
object list)
  (countql
object list)
  (countqual
object list)

Description:

The countq, countql and countqual functions count the number of objects in list which are eq, eql or equal to object, and return the count.

 

9.21.21 Function count-if

Syntax:


  (count-if
predicate-function list [key-function])

Description:

The count-if function counts the number of elements of list which satisfy predicate-function and returns the count.

The optional key-function specifies how each element from the list is transformed to an argument to predicate-function. If this argument is omitted then the predicate function is applied to the elements directly, a behavior which is identical to key-function being (fun identity).
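
For example:


  (count-if evenp '(1 2 3 4 5 6))             -> 3

  ;; count elements whose length is even
  (count-if evenp '((1 2) (3) (4 5)) length)  -> 2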

 

9.21.22 Functions posq, posql and posqual

Syntax:


  (posq
object list)
  (posql
object list)
  (posqual
object list)

Description:

The posq, posql and posqual functions return the zero-based position of the first item in list which is, respectively, eq, eql or equal to object.

 

9.21.23 Functions pos and pos-if

Syntax:


  (pos
key list [testfun [keyfun]])
  (pos-if
predfun list [keyfun])

Description:

The pos and pos-if functions search through list for an item which matches key, or satisfies predicate function predfun, respectively. They return the zero-based position of the matching item.

The keyfun argument specifies a function which is applied to the elements of list to produce the comparison key. If this argument is omitted, then the untransformed elements of list are examined.

The pos function's testfun argument specifies the test function which is used to compare the comparison keys from list to key. If this argument is omitted, then the equal function is used. The position of the first element of list whose comparison key (as retrieved by keyfun) matches the search key (under testfun) is returned. If no such element is found, nil is returned.

The pos-if function's predfun argument specifies a predicate function which is applied to the successive comparison keys taken from list by applying keyfun to successive elements. The position of the first element for which predfun yields true is returned. If no such element is found, nil is returned.
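
For example:


  (pos 3 '(1 2 3 4))          -> 2
  (pos 3.0 '(1 2 3 4) =)      -> 2
  (pos-if oddp '(2 4 6 7 8))  -> 3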

 

9.21.24 Functions rposq, rposql, rposqual, rpos and rpos-if

Syntax:


  (rposq
object list)
  (rposql
object list)
  (rposqual
object list)
  (rpos
key list [testfun [keyfun]])
  (rpos-if
predfun list [keyfun])

Description:

These functions are counterparts of posq, posql, posqual, pos and pos-if which report the position of the right-most matching item, rather than the left-most.

 

9.21.25 Functions pos-max and pos-min

Syntax:


  (pos-max
sequence [testfun [keyfun]])
  (pos-min
sequence [testfun [keyfun]])

Description:

The pos-min and pos-max functions implement exactly the same algorithm; they differ only in their defaulting behavior with regard to the testfun argument. If testfun is not given, then the pos-max function defaults testfun to the greater function, whereas pos-min defaults it to the less function.

If sequence is empty, both functions return nil.

Without a testfun argument, the pos-max function finds the zero-based position index of the numerically maximum value occurring in sequence, whereas pos-min without a testfun argument finds the index of the minimum value.

If a testfun argument is given, the two functions are equivalent. The testfun function must be callable with two arguments. If testfun behaves like a greater-than comparison, then pos-max and pos-min return the index of the maximum element. If testfun behaves like a less-than comparison, then the functions return the index of the minimum element.

The keyfun argument defaults to the identity function. Each element from sequence is passed through this one-argument function, and the resulting value is used in its place.

 

9.21.26 Function where

Syntax:


  (where
function object)

Description:

If object is a sequence, the where function returns a list of the numeric indices of those of its elements which satisfy function. The numeric indices appear in increasing order.

If object is a hash, the where function returns an unordered list of keys which have values which satisfy function.

function must be a function that can be called with one argument. For each element of object, function is called with that element as an argument. If a non-nil value is returned, then the zero-based index of that element is added to a list. Finally, the list is returned.
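
For example:


  (where evenp '(1 2 3 4 6))  -> (1 3 4)
  (where oddp #(2 4 5))       -> (2)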

 

9.21.27 Function select

Syntax:


  (select
object {index-list |function})

Description:

The select function returns an object, of the same kind as object, which consists of those elements of object which are identified by the indices in index-list, which may be a list or a vector.

If function is given instead of index-list, then function is invoked with object as its argument. The return value is then taken as if it were the index-list argument.

If object is a sequence, then index-list consists of numeric indices. The select function stops processing object upon encountering an index inside index-list which is out of range. (Rationale: without this strict behavior, select would not be able to terminate if index-list is infinite.)

If object is a list, then index-list must contain monotonically increasing numeric values, even if no value is out of range, since the select function makes a single pass through the list based on the assumption that indices are ordered. (Rationale: optimization.)

If object is a hash, then index-list is a list of keys. A new hash is returned which contains those elements of object whose keys appear in index-list. All of index-list is processed, even if it contains keys which are not in object.
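
For example:


  (select '(a b c d e) '(0 2 4))          -> (a c e)
  (select "abcde" '(1 3))                 -> "bd"

  ;; function argument computes the index list
  [select '(1 2 3 4 5) (op where evenp)]  -> (2 4)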

 

9.21.28 Function in

Syntax:


  (in
sequence key [testfun [keyfun]])
  (in
hash key)

Description:

The in function tests whether key is found inside sequence or hash.

If the testfun argument is specified, it specifies the function which is used to compare keys from the sequence to key. Otherwise the equal function is used.

If the keyfun argument is specified, it specifies a function which is applied to the elements of sequence to produce the comparison keys. Without this argument, the elements themselves are taken as the comparison keys.

If the object being searched is a hash, then the keyfun and testfun arguments are ignored.

The in function returns t if it finds key in sequence or hash, otherwise nil.
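
For example:


  (in '(1 2 3) 2)                -> t
  (in "abc" #\z)                 -> nil

  ;; case-insensitive character search via keyfun
  [in "abcd" #\C : chr-toupper]  -> t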

 

9.21.29 Function partition

Syntax:


  (partition
sequence {index-list |index |function})

Description:

If sequence is empty, then partition returns an empty list, and the second argument is ignored; if it is function, it is not called.

Otherwise, partition returns a lazy list of partitions of sequence. Partitions are consecutive, non-overlapping, non-empty sub-strings of sequence, of the same kind as sequence, such that if these sub-strings are catenated together in their order of appearance, a sequence equal to the original is produced.

If the second argument is of the form index-list, it shall be a sequence of non-decreasing integers. First, any leading negative or zero values in this sequence are dropped. The partition function then divides sequence according to the indices in index-list. The first partition begins with the first element of sequence. The second partition begins at the position given by the first element of index-list, and so on. Indices beyond the length of the sequence are ignored.

If index-list is empty then a one-element list containing the entire sequence is returned.

If the second argument is a function, then this function is applied to sequence, and the return value of this call is then used in place of the second argument, which must be an index or index-list.

If the second argument is an atom other than a function, it is assumed to be an integer index, and is turned into an index-list of one element.

Examples:


  (partition '(1 2 3) 1) -> ((1) (2 3))


  ;; split the string where there is a "b"
  (partition "abcbcbd" (op where (op eql #\b))) -> ("a" "bc"
                                                    "bc" "bd")

 

9.21.30 Functions split and split*

Syntax:


  (split
sequence {index-list |index |function})
  (split*
sequence {index-list |index |function})

Description:

If sequence is empty, then both split and split* return an empty list, and the second argument is ignored; if it is function, it is not called.

Otherwise, split returns a lazy list of pieces of sequence: consecutive, non-overlapping, possibly empty sub-strings of sequence, of the same kind as sequence. A catenation of these pieces in the order they appear would produce a sequence that is equal to the original sequence.

The split* function differs from split in that the elements indicated by the split indices are removed.

If the second argument is of the form index-list, it shall be a sequence of increasing integers. The split function divides sequence according to the indices in index list. The first piece always begins with the first element of sequence. Each subsequent piece begins with the position indicated by an element of index-list. Negative indices are ignored. Repeated values give rise to empty pieces. If index-list includes index zero, then an empty first piece is generated. If index-list includes an index greater than or equal to the length of sequence (equivalently, an index beyond the last element of the sequence) then an additional empty last piece is generated.

If index-list is empty then a one-element list containing the entire sequence is returned.

If the second argument is a function, then this function is applied to sequence, and the return value of this call is then used in place of the second argument, which must be an index or index-list.

If the second argument is an atom other than a function, it is assumed to be an integer index, and is turned into an index-list of one element.

Examples:


  (split '(1 2 3) 1) -> ((1) (2 3))


  (split "abc" 0) -> ("" "abc")
  (split "abc" 3) -> ("abc" "")
  (split "abc" 1) -> ("a" "bc")
  (split "abc" '(0 1 2 3)) -> ("" "a" "b" "c" "")
  (split "abc" '(1 2)) -> ("a" "b" "c")


  (split "abc" '(-1 1 2 15)) -> ("a" "b" "c")


  ;; triple split at index 1 makes two additional empty pieces
  (split "abc" '(1 1 1)) -> ("a" "" "" "bc")


  (split* "abc" 0) -> ("" "bc") ;; "a" is removed
  (split* "abc" '(0 1 2)) -> ("" "" "" "") ;; all characters removed

 

9.21.31 Function partition*

Syntax:


  (partition*
sequence {index-list |index |function})

Description:

If sequence is empty, then partition* returns an empty list, and the second argument is ignored; if it is function, it is not called.

If the second argument is of the form index-list, which is a sequence of strictly increasing non-negative integers, then partition* produces a lazy list of pieces taken from sequence. The pieces are formed by deleting from sequence the elements at the positions given in index-list. The pieces are the non-empty sub-strings between the deleted elements.

If index-list is empty then a one-element list containing the entire sequence is returned.

If the second argument is a function, then this function is applied to sequence, and the return value of this call is then used in place of the second argument, which must be an index or index-list.

If the second argument is an atom other than a function, it is assumed to be an integer index, and is turned into an index-list of one element.

Examples:


  (partition* '(1 2 3 4 5) '(0 2 4)) -> ((2) (4))


  (partition* "abcd" '(0 3)) -> "bc"


  (partition* "abcd" '(0 1 2 3)) -> nil

 

9.21.32 Functions find and find-if

Syntax:


  (find
key sequence [testfun [keyfun]])
  (find-if
predfun sequence [keyfun])

Description:

The find and find-if functions search through a sequence for an item which matches a key, or satisfies a predicate function, respectively.

The keyfun argument specifies a function which is applied to the elements of sequence to produce the comparison key. If this argument is omitted, then the untransformed elements of the sequence are searched.

The find function's testfun argument specifies the test function which is used to compare the comparison keys from sequence to the search key. If this argument is omitted, then the equal function is used. The first element from the list whose comparison key (as retrieved by keyfun) matches the search (under testfun) is returned. If no such element is found, nil is returned.

The find-if function's predfun argument specifies a predicate function which is applied to the successive comparison keys pulled from the list by applying keyfun to successive elements. The first element for which predfun yields true is returned. If no such element is found, nil is returned.
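
For example:


  (find 3.0 '(1 2 3 4) =)    -> 3
  (find #\d "abcd")          -> #\d
  (find-if oddp '(2 4 5 6))  -> 5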

 

9.21.33 Functions rfind and rfind-if

Syntax:


  (rfind
key sequence [testfun [keyfun]])
  (rfind-if
predfun sequence [keyfun])

Description:

The rfind and rfind-if functions are almost exactly like find and find-if except that if there are multiple matches for key in sequence, they return the right-most element rather than the leftmost.

 

9.21.34 Functions find-max and find-min

Syntax:


  (find-max
sequence [testfun [keyfun]])
  (find-min
sequence [testfun [keyfun]])

Description:

The find-min and find-max functions implement exactly the same algorithm; they differ only in their defaulting behavior with regard to the testfun argument. If testfun is not given, then the find-max function defaults it to the greater function, whereas find-min defaults it to the less function.

Without a testfun argument, the find-max function finds the numerically maximum value occurring in sequence, whereas find-min without a testfun argument finds the minimum value.

If a testfun argument is given, the two functions are equivalent. The testfun function must be callable with two arguments. If testfun behaves like a greater-than comparison, then find-max and find-min both return the maximum element. If testfun behaves like a less-than comparison, then the functions return the minimum element.

The keyfun argument defaults to the identity function. Each element from sequence is passed through this one-argument function, and the resulting value is used in its place for the purposes of the comparison. However, the original element is returned.

 

9.21.35 Function set-diff

Syntax:


  (set-diff
seq1 seq2 [testfun [keyfun]])

Description:

The set-diff function treats the sequences seq1 and seq2 as if they were sets and computes the set difference: a sequence which contains those elements in seq1 which do not occur in seq2.

set-diff returns a sequence of the same kind as seq1.

Element equivalence is determined by a combination of testfun and keyfun. Elements are compared pairwise, and each element of a pair is passed through keyfun function to produce a comparison value. The comparison values are compared using testfun. If keyfun is omitted, then the untransformed elements themselves are compared, and if testfun is omitted, then the equal function is used.

If seq1 contains duplicate elements which do not occur in seq2 (and thus are preserved in the set difference) then these duplicates appear in the resulting sequence. Furthermore, the order of the items from seq1 is preserved.
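
For example:


  (set-diff '(1 2 3 4 5) '(2 4))  -> (1 3 5)
  (set-diff "abcd" "bx")          -> "acd"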

 

9.21.36 Functions mapcar, mappend, mapcar* and mappend*

Syntax:


  (mapcar
function sequence*)
  (mappend
function sequence*)
  (mapcar*
function sequence*)
  (mappend*
function sequence*)

Description:

When given only one argument, the mapcar function returns nil. function is never called.

When given two arguments, the mapcar function applies function to each element of sequence and returns a sequence of the resulting values in the same order as the original values. The returned sequence is the same kind as sequence, if possible. If the accumulated values cannot be elements of that type of sequence, then a list is returned.

When additional sequences are given as arguments, this behavior is generalized in the following way: mapcar traverses the sequences in parallel, taking a value from each sequence as an argument to the function. If there are two lists, function is called with two arguments and so forth. The traversal is limited by the length of the shortest sequence. The return values of the function are collected into a new sequence which is returned. The returned sequence is of the same kind as the leftmost input sequence, unless the accumulated values cannot be elements of that type of sequence, in which case a list is returned.

The mappend function works like mapcar, with the following difference. Rather than accumulating the values returned by the function into a sequence, mappend expects the items returned by the function to be sequences which are catenated with append, and the resulting sequence is returned. The returned sequence is of the same kind as the leftmost input sequence, unless the values cannot be elements of that type of sequence, in which case a list is returned.

The mapcar* and mappend* functions work like mapcar and mappend, respectively. However, they return lazy lists rather than generating the entire output list prior to returning.

Caveats:

Like mappend, mappend* must "consume" empty lists. For instance, if the function being mapped puts out a sequence of nil-s, then the result must be the empty list nil, because (append nil nil nil nil ...) is nil.

But suppose that mappend* is used on inputs which are infinite lazy lists, such that the function returns nil values indefinitely. For instance:


  ;; Danger: infinite loop!!!
  (mappend* (fun identity) (repeat '(nil))) 

The mappend* function is caught in a loop trying to consume and squash an infinite stream of nil-s, and so doesn't return.

Examples:


  ;; multiply every element by two
  (mapcar (lambda (item) (* 2 item)) '(1 2 3)) -> (2 4 6)


  ;; "zipper" two lists together
  (mapcar (lambda (le ri) (list le ri)) '(1 2 3) '(a b c)) -> ((1 a) (2 b) (3 c))


  ;; like append, mappend allows a lone atom or a trailing atom:
  (mappend (fun identity) 3) -> (3)
  (mappend (fun identity) '((1) 2)) -> (1 . 2)


  ;; take just the even numbers
  (mappend (lambda (item) (if (evenp item) (list item))) '(1 2 3 4 5))
  -> (2 4)

 

9.21.37 Function mapdo

Syntax:


  (mapdo
function sequence*)

Description:

The mapdo function is similar to mapcar, but always returns nil. It is useful when function performs some kind of side effect, hence the "do" in the name, which is a mnemonic for the execution of imperative actions.

When only the function argument is given, function is never called, and nil is returned.

If a single sequence argument is given, then mapdo iterates over sequence, invoking function on each element.

If two or more sequence arguments are given, then mapdo iterates over the sequences in parallel, extracting parallel tuples of items. These tuples are passed as arguments to function, which must accept as many arguments as there are sequences.
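
For instance, a typical side-effecting use might look like this (illustrative; prinl prints its argument followed by a newline):


  ;; print each element on its own line; mapdo returns nil
  (mapdo (fun prinl) '(1 2 3))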

 

9.21.38 Functions transpose and zip

Syntax:


  (transpose
sequence)
  (zip
sequence*)

Description:

The transpose function performs a transposition on sequence. This means that the elements of sequence must be sequences. These sequences are understood to be columns; transpose exchanges rows and columns, returning a sequence of the rows which make up the columns. The returned sequence is of the same kind as sequence, and the rows are also the same kind of sequence as the first column of the original sequence. The number of rows returned is limited by the shortest column among the sequences.

All of the input sequences (the elements of sequence) must have elements which are compatible with the first sequence. This means that if the first element of sequence is a string, then the remaining sequences must be strings, or else sequences of characters, or of strings.

The zip function takes variable arguments, and is equivalent to calling transpose on a list of the arguments. The following equivalences hold:

Syntax:


   (zip . x) <--> (transpose x)


   [apply zip x] <--> (transpose x)

Examples:


  ;; transpose list of lists
  (transpose '((a b c) (c d e))) ->  ((a c) (b d) (c e))


  ;; transpose vector of strings:
  ;; - string columns become string rows
  ;; - vector input becomes vector output
  (transpose #("abc" "def" "ghij")) -> #("adg" "beh" "cfi")


  ;; error: transpose wants to make a list of strings
  ;; but 1 is not a character
  (transpose #("abc" "def" '(1 2 3))) ;; error!


  ;; String elements are catenated:
  (transpose #("abc" "def" ("UV" "XY" "WZ"))) -> #("adUV" "beXY" "cfWZ")


  (zip '(a b c) '(c d e)) ->  ((a c) (b d) (c e))

 

9.21.39 Functions window-map and window-mappend

Syntax:


  (window-map
range boundary function sequence)
  (window-mappend
range boundary function sequence)

Description:

The window-map and window-mappend functions process the elements of sequence by passing arguments derived from each successive element to function. Both functions return, if possible, a sequence of the same kind as sequence, otherwise a list.

Under window-map, values returned by function are accumulated into a sequence of the same type as sequence and that sequence is returned. Under window-mappend, the values returned by the calls to function are expected to be sequences, which are appended together to form the output sequence.

These functions are analogous to mapcar and mappend. Unlike these, they operate only on a single sequence, and over this sequence they perform a sliding window mapping, whose description follows.

The argument to the range parameter must be a positive integer, not exceeding 512. This parameter specifies the amount of ahead/behind context on either side of each element which is processed. It indirectly determines the window size for the mapping. The window size is twice range, plus one. For instance if range is 2, then the window size is 5: the element being processed lies at the center of the window, flanked by two elements on either side, making five.

The function argument must specify a function which accepts a number of arguments corresponding to the window size. For instance if range is 2, making the window size 5, then function must accept 5 arguments. These arguments constitute the sliding window being processed. Each time function is called, the middle argument is the element being processed, and the arguments surrounding it are its window.

When an element is processed from somewhere in the interior of a sequence, where it is flanked on either side by at least range elements, then the window is populated by those flanking elements taken from sequence.

The boundary parameter specifies the window contents which are used for the processing of elements which are closer than range to either end of the sequence. The argument may be a sequence containing at least twice range number of elements (one less than the window size): if it has additional elements, they are not used. If it is a list, it may be shorter than twice range. The argument may also be one of the two keyword symbols :wrap or :reflect, described below.

If boundary is a sequence, it may be regarded as divided into two pieces of range length. If it is a list of insufficient length, then missing elements are supplied as nil to make two range's worth of elements. These two pieces then flank sequence on either end. The left half of boundary is effectively prepended to the sequence, and the right half effectively appended. When the sliding window extends beyond the boundary of sequence near its start or end, the window is populated from these flanking elements obtained from boundary.

If boundary is the keyword :wrap, then the sequence is effectively flanked by copies of itself on both ends, repeated enough times to satisfy the window. For instance if the sequence is (1 2 3) and the window size is 9 due to the value of range being 4, then the behavior of :wrap is as if a boundary were specified consisting of (3 1 2 3 1 2 3 1). The left flank is (3 1 2 3) and the right flank is (1 2 3 1), formed by repetitions of (1 2 3) surrounding it on either side, extending out to infinity, and chopped to range.

If boundary is the keyword :reflect, then the sequence is effectively flanked by reversed copies of itself on both ends, repeated enough times to satisfy the window. For instance if the sequence is (1 2 3) and the window size is 9, then the behavior of :reflect is as if a boundary were specified consisting of (1 3 2 1 3 2 1 3).
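
For instance, a three-element sliding sum over a list, using a zero-padded boundary, would be expected to behave as follows (illustrative sketch based on the semantics described above):


  ;; range 1 gives a window size of 3; the boundary (0 0)
  ;; supplies one zero of padding on each side of the sequence
  (window-map 1 '(0 0) (lambda (a b c) (+ a b c)) '(1 2 3 4 5))
  -> (3 6 9 12 9)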

 

9.21.40 Function interpose

Syntax:


  (interpose
sep sequence)

Description:

The interpose function returns a sequence of the same type as sequence, in which the elements from sequence appear with the sep value inserted between them.

If sequence is an empty sequence or a sequence of length 1, then a sequence identical to sequence is returned. It may be a copy of sequence or it may be sequence itself.

If sequence is a character string, then the value sep must be a character.

It is permissible for sequence, or for a suffix of sequence to be a lazy list, in which case interpose returns a lazy list, or a list with a lazy suffix.

Examples:


  (interpose #\- "xyz") -> "x-y-z"
  (interpose t nil) -> nil
  (interpose t #()) -> #()
  (interpose #\a "") -> ""
  (interpose t (range 0 0)) -> (0)
  (interpose t (range 0 1)) -> (0 t 1)
  (interpose t (range 0 2)) -> (0 t 1 t 2)

 

9.21.41 Functions apply and iapply

Syntax:


  (apply
function [arg* trailing-args])
  (iapply
function [arg* trailing-args])

Description:

The apply function invokes function, optionally passing to it an argument list. The return value of the apply call is that of function.

If no arguments are present after function, then function is invoked without arguments.

If one argument is present after function, then it is interpreted as trailing-args. If this is a sequence (a list, vector or string), then the elements of the sequence are passed as individual arguments to function. If trailing-args is not a sequence, then function is invoked with an improper argument list, terminated by the trailing-args atom.

If two or more arguments are present after function, then the last of these arguments is interpreted as trailing-args. The previous arguments represent leading arguments which are applied to function, prior to the arguments taken from trailing-args.

Note that if the trailing-args value is an atom or an improper list, the function is then invoked with an improper argument list. Only a variadic function may be invoked with an improper argument list. Moreover, all of the function's required and optional parameters must be satisfied by elements of the improper list, such that the terminating atom either matches the rest-param directly (see the lambda operator) or else the rest-param receives an improper list terminated by that atom. To treat the terminating atom of an improper list as an ordinary element which can satisfy a required or optional function parameter, the iapply function may be used, described next.

The iapply function ("improper apply") is similar to apply, except with regard to the treatment of trailing-args. Firstly, under iapply, if trailing-args is an atom other than nil (possibly a sequence, such as a vector or string), then it is treated as an ordinary argument: function is invoked with a proper argument list, whose last element is trailing-args. Secondly, if trailing-args is a list, but an improper list, then the terminating atom of trailing-args becomes an ordinary argument. Thus, in all possible cases, iapply treats an extra non-nil atom as an argument, and never calls function with an improper argument list.

Examples:


  ;; '(1 2 3) becomes arguments to list, thus (list 1 2 3).
  (apply (fun list) '(1 2 3)) -> (1 2 3)


  ;; this effectively invokes (list 1 2 3 4)
  (apply (fun list) 1 2 '(3 4)) -> (1 2 3 4)


  ;; this effectively invokes (list 1 2 . 3)
  (apply (fun list) 1 2 3) -> (1 2 . 3)


  ;; "abc" is separated into characters which become arguments of list
  (apply (fun list) "abc") -> (#\a #\b #\c)

Dialect Note:

Note that some uses of this function that are necessary in other Lisp dialects are not necessary in TXR Lisp. The reason is that in TXR Lisp, improper list syntax is accepted as a compound form, and performs application:


  (foo a b . x)

Here, the variables a and b supply the first two arguments for foo. In the dotted position, x must evaluate to a list or vector. The list or vector's elements are pulled out and treated as additional arguments for foo. Of course, this syntax can only be used if x is a symbolic form or an atom. It cannot be a compound form, because (foo a b . (x)) and (foo a b x) are equivalent structures.

 

9.21.42 Functions reduce-left and reduce-right

Syntax:


  (reduce-left
binary-function list
               [
init-value [key-function]])


  (reduce-right
binary-function list
                [
init-value [key-function]])

Description:

The reduce-left and reduce-right functions reduce lists of operands specified by list and init-value to a single value by the repeated application of binary-function.

An effective list of operands is formed by combining list and init-value. If key-function is specified, then the items of list are mapped to new values through key-function. If init-value is supplied, then in the case of reduce-left, the effective list of operands is formed by prepending init-value to list. In the case of reduce-right, the effective operand list is produced by appending init-value to list.

The production of the effective list can be expressed like this, though this is not to be understood as the actual implementation:


  (append (if init-value-present (list init-value))
          [mapcar (or key-function identity) list])

In the reduce-right case, the arguments to append are reversed.

If the effective list of operands is empty, then binary-function is called with no arguments at all, and its value is returned. This is the only case in which binary-function is called with no arguments; in all remaining cases, it is called with two arguments.

If the effective list contains one item, then that item is returned.

Otherwise, the effective list contains two or more items, and is decimated as follows.

Note that an init-value specified as nil is not the same as a missing init-value; this means that the initial value is the object nil. Omitting init-value is the same as specifying a value of : (the colon symbol). It is possible to specify key-function while omitting an init-value argument. This is achieved by explicitly specifying : as the init-value argument.

Under reduce-left, the leftmost pair of operands is removed from the list and passed as arguments to binary-function, in the same order that they appear in the list, and the resulting value initializes an accumulator. Then, for each remaining item in the list, binary-function is invoked on two arguments: the current accumulator value, and the next element from the list. After each call, the accumulator is updated with the return value of binary-function. The final value of the accumulator is returned.

Under reduce-right, the list is processed right to left. The rightmost pair of elements in the effective list is removed, and passed as arguments to binary-function, in the same order that they appear in the list. The resulting value initializes an accumulator. Then, for each remaining item in the list, binary-function is invoked on two arguments: the next element from the list, in right to left order, and the current accumulator value. After each call, the accumulator is updated with the return value of binary-function. The final value of the accumulator is returned.

Examples:


  ;;; effective list is (1) so 1 is returned
  (reduce-left (fun +) () 1 nil)  ->  1


  ;;; computes (- (- (- 0 1) 2) 3)
  (reduce-left (fun -) '(1 2 3) 0 nil) -> -6


  ;;; computes (- 1 (- 2 (- 3 0)))
  (reduce-right (fun -) '(1 2 3) 0 nil) -> 2


  ;;; computes (* 1 2 3)
  (reduce-left (fun *) '((1) (2) (3)) nil (fun first)) -> 6


  ;;; computes 1 because the effective list is empty
  ;;; and so * is called with no arguments, which yields 1.
  (reduce-left (fun *) nil) -> 1

 

9.21.43 Functions some, all and none

Syntax:


  (some
sequence [predicate-fun [key-fun]])
  (all
sequence [predicate-fun [key-fun]])
  (none
sequence [predicate-fun [key-fun]])

Description:

The some, all and none functions apply a predicate test function predicate-fun over the elements of sequence. If the argument key-fun is specified, then elements of sequence are passed into key-fun, and predicate-fun is applied to the resulting values. If key-fun is omitted, the behavior is as if key-fun is the identity function. If predicate-fun is omitted, the behavior is as if predicate-fun is the identity function.

These functions have short-circuiting semantics and return conventions similar to the and and or operators.

The some function applies predicate-fun to successive values produced by retrieving elements of sequence and processing them through key-fun. If the sequence is empty, it returns nil. Otherwise it returns the first non-nil value returned by a call to predicate-fun and stops evaluating more elements. If predicate-fun returns nil for all elements, it returns nil.

The all function applies predicate-fun to successive values produced by retrieving elements of sequence and processing them through key-fun. If the sequence is empty, it returns t. Otherwise, if predicate-fun yields nil for any value, the all function immediately returns nil without invoking predicate-fun on any more elements. If all the elements are processed, then the all function returns the value which predicate-fun yielded for the last element.

The none function applies predicate-fun to successive values produced by retrieving elements of sequence and processing them through key-fun. If the sequence is empty, it returns t. Otherwise, if predicate-fun yields non-nil for any value, the none function immediately returns nil. If predicate-fun yields nil for all values, the none function returns t.

Examples:


  ;; some of the integers are odd
  [some '(2 4 6 9) oddp] -> t


  ;; none of the integers are even
  [none '(1 3 5 7) evenp] -> t
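
An illustrative result for all, consistent with the description above:


  ;; all of the integers are even; the value which the predicate
  ;; yielded for the last element, t, is returned
  [all '(2 4 6) evenp] -> t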

 

9.21.44 Function multi

Syntax:


  (multi
function sequence*)

Description:

The multi function distributes an arbitrary list-processing function, given as the function argument, over multiple sequences given by the sequence arguments.

The sequence arguments are first transposed into a single list of tuples. Each successive element of this transposed list consists of a tuple of the successive items from the sequences. The length of the transposed list is that of the shortest sequence argument.

The transposed list is then passed to function as an argument.

The function is expected to produce a list of tuples, which are transposed again to produce a list of lists which is then returned.

Conceptually, the input sequences are columns and function is invoked on a list of the rows formed from these columns. The output of function is a transformed list of rows which is reconstituted into a list of columns.

Example:


  ;; Take three lists in parallel, and remove from all of them
  ;; the element at all positions where the third list
  ;; has an element of 20.


  (multi (op remove-if (op eql 20) @1 third)
         '(1 2 3)
         '(a b c)
         '(10 20 30))


  -> ((1 3) (a c) (10 30))


  ;; The (2 b 20) "row" is gone from the three "columns".


  ;; Note that the (op remove-if (op eql 20) @1 third)
  ;; expression can be simplified using the ap operator:
  ;;
  ;; (op remove-if (ap eql @3 20))

 

9.21.45 Function sort

Syntax:


  (sort
sequence [lessfun [keyfun]])

Description:

The sort function destructively sorts sequence, producing a sequence which is sorted according to the lessfun and keyfun arguments.

The keyfun argument specifies a function which is applied to elements of the sequence to obtain the key values which are then compared using the lessfun. If keyfun is omitted, the identity function is used by default: the sequence elements themselves are their own sort keys.

The lessfun argument specifies the comparison function which determines the sorting order. It must be a binary function which can be invoked on pairs of keys as produced by the key function. It must return a non-nil value if the left argument is considered to be lesser than the right argument. For instance, if the numeric function < is used on numeric keys, it produces an ascending sorted order. If the function > is used, then a descending sort is produced. If lessfun is omitted, then it defaults to the generic less function.

The sort function is stable for sequences which are lists. This means that the original order of items which are considered identical is preserved. For strings and vectors, sort is not stable.
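
For instance, based on the description above (illustrative; the list and vec calls construct fresh sequences, since sort is destructive, and the colon symbol requests the default lessfun so that a keyfun can be supplied):


  (sort (list 3 1 2)) -> (1 2 3)

  ;; descending order
  (sort (vec 3 1 2) (fun >)) -> #(3 2 1)

  ;; keyfun is length: sort strings from shortest to longest
  (sort (list "zebra" "ox" "cow") : length) -> ("ox" "cow" "zebra")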

 

9.21.46 Function shuffle

Syntax:


  (shuffle
sequence)

Description:

The shuffle function pseudo-randomly rearranges the elements of sequence. This is performed in place: the sequence object is modified.

The return value is sequence itself.

The rearrangement depends on pseudo-random numbers obtained from the rand function.

 

9.21.47 Function sort-group

Syntax:


  (sort-group
sequence [keyfun [lessfun]])

Description:

The sort-group function sorts sequence according to the keyfun and lessfun arguments, and then breaks the resulting sequence into groups, based on the equivalence of the elements under keyfun.

The following equivalence holds:


  (sort-group sq kf lf) <--> (partition-by kf (sort (copy sq) lf kf))

Note the reversed order of keyfun and lessfun arguments between sort and sort-group.
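
For instance, the description above implies results such as the following (illustrative):


  (sort-group '(3 1 2 3 1)) -> ((1 1) (2) (3 3))

  ;; group strings by length, shortest lengths first
  (sort-group '("apple" "fig" "kiwi" "pear") length)
  -> (("fig") ("kiwi" "pear") ("apple"))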

 

9.21.48 Function uniq

Syntax:


  (uniq
sequence)

Description:

The uniq function returns a sequence of the same kind as sequence, but with duplicates removed. Elements of sequence are considered equal under the equal function. The first occurrence of each element is retained, and the subsequent duplicates of that element, if any, are suppressed, such that the order of the elements is otherwise preserved.

The following equivalence holds between uniq and unique:


  (uniq s) <--> [unique s : :equal-based]

That is, uniq is like unique with the default keyfun argument (the identity function) and an equal-based hash table.
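
For instance, results such as the following would be expected (illustrative):


  (uniq '(1 2 1 3 2)) -> (1 2 3)

  ;; a string is returned for a string input
  (uniq "abracadabra") -> "abrcd"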

 

9.21.49 Function unique

Syntax:


  (unique
sequence [keyfun {hash-arg}* ])

Description:

The unique function is a generalization of uniq. It returns a sequence of the same kind as sequence, but with duplicates removed.

If neither keyfun nor hash-arg-s are specified, then elements of sequence are considered equal under the eql function. The first occurrence of each element is retained, and the subsequent duplicates of that element, if any, are suppressed, such that the order of the elements is otherwise preserved.

If keyfun is specified, then that function is applied to each element, and the resulting values are compared for equality. If keyfun is omitted, the behavior is as if keyfun were the identity function: the elements themselves are compared.

If one or more hash-arg-s are present, these specify the arguments for the construction of the internal hash table used by unique. The arguments are like those of the hash function. In particular, the argument :equal-based causes unique to use equal equality.
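
For instance, behavior consistent with the above description would be (illustrative):


  ;; eql-based: the two "a" strings are distinct objects
  (unique '("a" "b" "a")) -> ("a" "b" "a")

  ;; equal-based comparison treats them as duplicates
  [unique '("a" "b" "a") : :equal-based] -> ("a" "b")

  ;; keyfun: keep the first element for each distinct value of (mod x 3)
  [unique '(1 2 3 4 5 6) (op mod @1 3)] -> (1 2 3)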

 

9.21.50 Function tuples

Syntax:


  (tuples
length sequence [fill-value])

Description:

The tuples function produces a lazy list which represents a reorganization of the elements of sequence into tuples of length, where length must be a positive integer.

The length of the sequence might not be evenly divisible by the tuple length. In this case, if a fill-value argument is specified, then the last tuple is padded with enough repetitions of fill-value to make it have length elements. If fill-value is not specified, then the last tuple is left shorter than length.

The output of the function is a list, but the tuples themselves are sequences of the same kind as sequence. If sequence is any kind of list, they are lists, and not lazy lists.

Examples:


  (tuples 3 #(1 2 3 4 5 6 7 8) 0) -> (#(1 2 3) #(4 5 6) #(7 8 0))
  (tuples 3 "abc") -> ("abc")
  (tuples 3 "abcd") -> ("abc" "d")
  (tuples 3 "abcd" #\z) -> ("abc" "dzz")
  (tuples 3 (list 1 2) #\z) -> ((1 2 #\z))

 

9.21.51 Function partition-by

Syntax:


  (partition-by
function sequence)

Description:

If sequence is empty, then partition-by returns an empty list, and function is never called.

Otherwise, partition-by returns a lazy list of partitions of the sequence sequence. Partitions are consecutive, non-empty subsequences of sequence, of the same kind as sequence.

The partitioning begins with the first element of sequence being placed into a partition.

The subsequent partitioning is done according to function, which is applied to each element of sequence. Whenever, for the next element, the function returns the same value as it returned for the previous element, the element is placed into the same partition. Otherwise, the next element is placed into, and begins, a new partition.

The return values of the calls to function are compared using the equal function.

Examples:


  [partition-by identity '(1 2 3 3 4 4 4 5)] -> ((1) (2) (3 3)
                                                 (4 4 4) (5))


  (partition-by (op = 3) #(1 2 3 4 5 6 7)) -> (#(1 2) #(3)
                                               #(4 5 6 7))

 

9.21.52 Function make-like

Syntax:


  (make-like
list ref-sequence)

Description:

The list argument must be a list. If ref-sequence is a sequence type, then list is converted to the same type of sequence and returned. Otherwise the original list is returned.

Note: the make-like function is a helper which supports the development of unoptimized versions of a generic function that accepts any type of sequence as input, and produces a sequence of the same type as output. The implementation of such a function can internally accumulate a list, and then convert the resulting list to the same type as an input value by using make-like.
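
For instance, the description implies conversions like the following (illustrative):


  (make-like '(#\a #\b #\c) "z") -> "abc"
  (make-like '(1 2 3) #(0)) -> #(1 2 3)
  (make-like '(1 2 3) '(x)) -> (1 2 3)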

 

9.21.53 Function nullify

Syntax:


  (nullify
sequence)

Description:

The nullify function returns nil if sequence is an empty sequence. Otherwise it returns sequence itself.

Note: the nullify function is a helper to support unoptimized generic programming over sequences. Thanks to the generic behavior of cdr, any sequence can be traversed using cdr functions, checking for the nil value as a terminator. This, however, breaks for empty sequences which are not lists, because they are not equal to nil: to car and cdr they look like a one-element sequence containing nil. The nullify function reduces all empty sequences to nil, thereby correcting the behavior of code which traverses sequences using cdr, and tests for termination with nil.
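
For instance, behavior consistent with the above would be (illustrative):


  (nullify "") -> nil
  (nullify #()) -> nil
  (nullify '()) -> nil
  (nullify "abc") -> "abc"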

 

9.22 Procedural List Construction

TXR Lisp provides a structure type called list-builder which encapsulates state and methods for constructing lists procedurally. Among the advantages of using list-builder is that lists can be constructed in the left to right direction without requiring multiple traversals or reversal. For example, list-builder naturally combines with iteration or recursion: items visited in an iterative or recursive process can be collected easily using list-builder in the order they are visited.

The basic workflow begins with the instantiation of a list-builder object. This object may be initialized with a piece of list material which begins the to-be-constructed list, or it may be initialized to begin with an empty list. Methods such as add and pend are invoked on this object to extend the list with new elements. At any point, the list constructed so far is available using the get method, which is also how the final version of the list is eventually retrieved.

The build macro is provided which syntactically streamlines the process. It implicitly creates a list-builder instance and binds it to a hidden lexical variable. It then evaluates forms in a lexical scope in which short-hand macros are available for building the list.

 

9.22.1 Structure list-builder

Syntax:


  (defstruct list-builder nil
    head tail)

Description:

The list-builder structure encapsulates the state for a list building process. Programs should use the build-list function for creating an instance of list-builder. The head and tail members should be regarded as internal variables.

 

9.22.2 Function build-list

Syntax:


  (build-list [
initial-list])

Description:

The build-list function instantiates and returns an object of struct type list-builder.

If no initial-list argument is supplied, then the object is implicitly initialized with an empty list.

If the argument is supplied, then the effect is equivalent to calling build-list without an argument to produce an object obj and then invoking the method call obj.(ncon initial-list) on this object. The object produced by the initial-list expression is installed (without being copied) into the object as the prefix of the list to be constructed.

Example:


   ;; build the list (a b) trivially


   (let ((lb (build-list '(a b))))
     lb.(get))
   -> (a b)

 

9.22.3 Methods add and add*

Syntax:


  
list-builder.(add element*)
  
list-builder.(add* element*)

Description:

The add and add* methods extend the list being constructed by a list-builder object by adding individual elements to it. The add method adds elements at the tail of the list, whereas add* adds elements at the front.

Example:


  ;; Build the list (1 2 3 4)


  (let ((lb (build-list)))
    lb.(add 3 4)
    lb.(add* 1 2)
    lb.(get))
  -> (1 2 3 4)

 

9.22.4 Methods pend and pend*

Syntax:


  
list-builder.(pend list*)
  
list-builder.(pend* list*)

Description:

The pend and pend* methods extend the list being constructed by a list-builder object by adding lists to it. The pend method catenates the list arguments together as if by the append function, then appends the resulting list to the end of the list being constructed. The pend* method is similar, except it prepends the catenated lists to the front of the list being constructed.

Both methods have the property that the constructed list does not share structure with the input lists.

Example:


  ;; Build the list (1 2 3 4)


  (let ((lb (build-list)))
    lb.(pend '(3 4))
    lb.(pend* '(1 2))
    lb.(get))
  -> (1 2 3 4)

 

9.22.5 Methods ncon and ncon*

Syntax:


  
list-builder.(ncon list*)
  
list-builder.(ncon* list*)

Description:

The ncon and ncon* methods extend the list being constructed by a list-builder object by adding lists to it. The ncon method destructively catenates the list arguments as if by the nconc function. The resulting list is appended to the list being constructed. The ncon* method is similar, except it prepends the catenated lists to the front of the list being constructed.

These methods destructively manipulate the input lists. Moreover, they cause the list being constructed to share substructure with the input lists.

Additionally, the ncon method can be called with a single argument which is an atom. This atom will simply be installed as the terminating atom of the list being constructed.

Example:


  ;; Build the list (1 2 3 4 . 5)


  (let ((lb (build-list)))
    lb.(ncon* (list 1 2))
    lb.(ncon (list 3 4))
    lb.(ncon 5)
    lb.(get))
  -> (1 2 3 4 . 5)

 

9.22.6 Method get

Syntax:


  
list-builder.(get)

Description:

The get method retrieves the list constructed so far by a list-builder object. It doesn't change the state of the object. The retrieved list may be passed as an argument into the construction methods on the same object.

Examples:


  ;; Build the circular list (1 1 1 1 ...)
  ;; by appending (1) to itself destructively:


  (let ((lb (build-list '(1))))
    lb.(ncon* lb.(get))
    lb.(get))
  -> (1 1 1 1 ...)


  ;; build the list (1 2 1 2 1 2 1 2)
  ;; by doubling (1 2) twice:


  (let ((lb (build-list)))
    lb.(add 1 2)
    lb.(pend lb.(get))
    lb.(pend lb.(get))
    lb.(get))
  -> (1 2 1 2 1 2 1 2)

 

9.22.7 Macro build

Syntax:


  (build
form*)

Description:

The build macro provides a shorthand notation for constructing lists using the list-builder structure. It eliminates the explicit call to the build-list function to construct the object, and eliminates the explicit references to the object.

Instead, build creates a lexical environment in which a list-builder object is implicitly constructed and bound to a hidden variable. Local macros which mimic the list-builder methods operate implicitly on this hidden variable, so that the object need not be mentioned as an argument.

Examples:


  ;; Build the circular list (1 1 1 1 ...)
  ;; by appending (1) to itself destructively:


  (build
    (add 1)
    (ncon* (get))) -> (1 1 1 1 ...)


  ;; build the list (1 2 1 2 1 2 1 2)
  ;; by doubling (1 2) twice:


  (build
    (add 1 2)
    (pend (get))
    (pend (get))) -> (1 2 1 2 1 2 1 2)

 

9.23 Permutations and Combinations

 

9.23.1 Function perm

Syntax:


  (perm
seq [len])

Description:

The perm function returns a lazy list which consists of all permutations of length len formed by items taken from seq. The permutations do not use any element of seq more than once.

Argument len, if present, must be a nonnegative integer, and seq must be a sequence.

If len is not present, then its value defaults to the length of seq: the list of the full permutations of the entire sequence is returned.

The permutations in the returned list are sequences of the same kind as seq.

If len is zero, then a list containing one permutation is returned, and that permutation is of zero length.

If len exceeds the length of seq, then an empty list is returned, since it is impossible to make a single non-repeating permutation that requires more items than are available.

The permutations are lexicographically ordered.
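
For instance, results consistent with the above description would be (illustrative):


  (perm '(1 2 3)) -> ((1 2 3) (1 3 2) (2 1 3) (2 3 1) (3 1 2) (3 2 1))

  ;; length-2 permutations of a string yield strings
  (perm "abc" 2) -> ("ab" "ac" "ba" "bc" "ca" "cb")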

 

9.23.2 Function rperm

Syntax:


  (rperm
seq len)

Description:

The rperm function returns a lazy list which consists of all the repeating permutations of length len formed by items taken from seq. "Repeating" means that the items from seq can appear more than once in the permutations.

The permutations which are returned are sequences of the same kind as seq.

Argument len must be a nonnegative integer, and seq must be a sequence.

If len is zero, then a single permutation is returned, of zero length. This is true regardless of whether seq is itself empty.

If seq is empty and len is greater than