Trie-Gen: Trie Lookup Code Generator

The Trie-Gen Program

Trie-Gen's main vocation is that of generating the text of trie lookup C/C++ functions associated to a given set of keywords. For example, in the case of the following key/value pairs:

value  key
-----  ---
  1    pot
  2    potato
  3    pottery

the generated functions would look like the ones below. Note that the trie lookup function on the right accepts all unique prefixes of the input keywords.

int lookup(const char* p)                  int lookup(const char* p)
{                                          {
    if (*p ++ == 'p' &&                        if (*p ++ == 'p' &&
        *p ++ == 'o' &&                            *p ++ == 'o' &&
        *p ++ == 't') {                            *p ++ == 't') {
        if (*p == 0)                               if (*p == 0)
            return 1;                                  return 1;
        switch (*p ++) {                           switch (*p ++) {
        case 'a':                                  case 'a':
            if (*p ++ == 't' &&                        if ((*p == 0 || (*p ++ == 't' &&
                *p ++ == 'o' &&                            (*p == 0 || (*p ++ == 'o' &&
                *p == 0)                                   (*p == 0))))))
                return 2;                                  return 2;
            return error;                              return error;
        case 't':                                  case 't':
            if (*p ++ == 'e' &&                        if ((*p == 0 || (*p ++ == 'e' &&
                *p ++ == 'r' &&                            (*p == 0 || (*p ++ == 'r' &&
                *p ++ == 'y' &&                            (*p == 0 || (*p ++ == 'y' &&
                *p == 0)                                   (*p == 0))))))))
                return 3;                                  return 3;
        }                                          }
    }                                          }
    return error;                              return error;
}                                          }

Trie-Gen can also produce a contracted form of the above trie lookup functions:

int lookup(const char* p)                  int lookup(const char* p)
{                                          {
    if (prefix("pot", p)) {                    if (prefix("pot", p)) {
        p += 3;                                    p += 3;
        if (*p == 0)                               if (*p == 0)
            return 1;                                  return 1;
        switch (*p ++) {                           switch (*p ++) {
        case 'a':                                  case 'a':
            if (equal(p, "to"))                        if (prefix(p, "to"))
                return 2;                                  return 2;
            return error;                              return error;
        case 't':                                  case 't':
            if (equal(p, "ery"))                       if (prefix(p, "ery"))
                return 3;                                  return 3;
        }                                          }
    }                                          }
    return error;                              return error;
}                                          }

The functions prefix and equal used by these trie lookup functions are defined simply as:

int prefix(const char* p, const char* q)
{
    while (*p && *p == *q)
         ++ p, ++ q;
    return *p == 0;
}

int equal(const char* p, const char* q)
{
    while (*p && *p == *q)
         ++ p, ++ q;
    return *p == *q;
}

Note that an expression of form equal(p, q) is equivalent with !strcmp(p, q).

The unique prefixes trie lookup functions produced by Trie-Gen might be of use when one wants to improve the speed of long command line options parsing of C library function getopt_long.

Internally, when getopt_long gets a long command line option for to parse it, the function loops sequentially over an array encoding long options specifications using strcmp to find the entry to which the given long option refers to. This sequential loop over a potentially large array (maybe of hundreds of entries) can be alleviated by using a unique prefixes trie lookup function like the ones above. For an implementation of such improvement of getopt_long one should look in file src/opt.cpp for the function getopt_long2 and in file src/opt-impl.hpp for the generated trie lookup function lookup_long_opt.

In addition to what has been seen above, Trie-Gen offers a few more interesting use cases. For more details see the documentation section below. Trie-Gen is one of the meta-tools of Json-Type.

Downloading

The source code tarballs for all releases of Trie-Gen are available from project's download area on Savannah. The current development sources can be browsed using cgit or gitweb or can be retrieved locally with the command:

$ git clone git://git.sv.nongnu.org/trie-gen

Trie-Gen releases are signed by Stefan Vargyas. His key is available on Savannah. Each released tarball is paired by a .sig file which contains the digital signature created by GnuPG for the respective tarball. The procedure of verifying a signature is straightforward (see the GnuPG FAQ).

Documentation

The README file—to be found in the top directory of the source tree—describes how to configure, build and test Trie-Gen.

The file trie-gen.txt—located in the doc directory of the source tree—provides comprehensive details of the internal and external aspects of Trie-Gen. For example, section 4, The Use Cases of Trie-Gen, of this document accounts for the most important use cases of Trie-Gen's main program trie and, respectively, of trie's associated shell function gen-func.

Mailing Etiquette

Please do not send messages encoded as HTML, nor encoded as base64 MIME, nor included as multiple formats. Please send messages as plain text. Please include a descriptive subject line. Please avoid sending large messages, without prior contact.

Licensing

Trie-Gen is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.

Author

The author of this software is Stefan Vargyas <stvar@yahoo.com>.

Copyright © 2016, 2018   Stefan Vargyas