Baroque's A Readability-Oriented Quasi-Universal Engine


Baroque is an application that highlights the lexical elements of source files using colors and/or fonts.

Baroque is essentially a collection of two different sorts of modules:

  • input modules, which translate from source languages (such as C++ or Lisp) to an intermediate language, called TML.
  • output modules, which translate from TML to output languages (such as Troff, TeX or PostScript).

    Baroque can already generate something like this:

    SELECT B.bookID, B.title
    FROM Books B
    WHERE POSITION('C++' IN B.title) > 0;

    Input and output modules work as a pipeline: in this case the SQL input module translates the SQL source code into intermediate TML code; the HTML output module takes the TML code and produces the HTML that you see rendered above.
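
    The two-stage pipeline described above can be illustrated with a minimal sketch. This is not Baroque's code: the real modules are flex scanners, and the TML format is not described on this page, so a plain list of (category, text) pairs stands in for it here. The names sql_input_module, html_output_module and SQL_KEYWORDS are hypothetical.

```python
import html
import re

# Hypothetical keyword set, just large enough for the SQL sample above.
SQL_KEYWORDS = {"SELECT", "FROM", "WHERE", "IN"}

# One alternative per lexical category; "other" catches everything else,
# one character at a time, so no input text is ever dropped.
TOKEN = re.compile(
    r"""
    (?P<string>'[^']*')
  | (?P<number>\d+)
  | (?P<word>[A-Za-z_][A-Za-z_0-9]*)
  | (?P<other>.)
    """,
    re.VERBOSE | re.DOTALL,
)


def sql_input_module(source):
    """Translate SQL text into (category, text) pairs (a stand-in for TML)."""
    for match in TOKEN.finditer(source):
        kind = match.lastgroup
        text = match.group()
        if kind == "word":
            kind = "keyword" if text.upper() in SQL_KEYWORDS else "identifier"
        yield kind, text


def html_output_module(tokens):
    """Translate (category, text) pairs into HTML with one class per category."""
    parts = []
    for kind, text in tokens:
        escaped = html.escape(text)
        if kind == "other":
            parts.append(escaped)
        else:
            parts.append('<span class="%s">%s</span>' % (kind, escaped))
    return "<pre>" + "".join(parts) + "</pre>"
```

    Chaining the two, as in html_output_module(sql_input_module(source)), mirrors the pipe: the output side never looks at the SQL, only at the categories, which is what would let one style configuration serve every input language.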

    It's quite easy to write input and output modules: they are essentially lexical analyzers which can be generated automatically using flex, and most languages share a large part of their lexical definitions.
    The real challenge is to support all languages (I eventually intend to write or maintain modules for any language having at least one free implementation) in a coherent way, making user configuration essentially independent of the language. This is the one big issue.

    Configuration files for output modules (I will extend this to input modules too if needed) are Scheme programs that bind values to variables, which output modules can easily read. Scheme support is implemented with the excellent GNU Guile.

    Current status

    Baroque does exist and does work, but I must clean up the code and write documentation before adding important new features.

    Working versions of the following modules already exist:

    Input modules    Output modules
                     text with ANSI terminal sequences

    I also plan to write modules for:

    Input modules    Output modules
    BASIC (?)        Dot-matrix and ink-jet text printer
    lex (hard)
    make (hard?)
    yacc (hard)

    Input modules for flex and bison (and perhaps make) would be very attractive, but they would need syntactic as well as lexical knowledge: they can't be created with flex alone and would require both flex and bison. I'm going to investigate whether it's worth the trouble.

    There are many interesting and widely used languages that I don't know, or don't know yet; I need help with these. If the language doesn't require syntactic analysis (no common programming language does), then helping is very easy: a flex scanner is all that's needed. Extensive documentation is coming soon.
    I need help to support:

    Input modules                       Output modules
    Assembly languages (AT&T syntax)    PostScript (via GNU Enscript?)


    The software is distributed under the GNU GPL.

    Plans and wishes

    In the medium term I plan to:

  • Support as many input and output languages (for which free implementations exist) as possible; code contributions will be much appreciated, but I'm not ready for them yet: some implementation choices have to be made first.
  • Write documentation in GNU Texinfo.
  • Automatically recognize the language used in an input document. How? I could use Emacs-style comments in the first line, but I fear (I may be wrong) that this would cause problems with flex, because the input stream would have to be reset; I could use the extension of the source file name, but the mapping from extensions to languages is not injective: for example .pl stands for both Prolog and Perl, and .h is used for both C and C++ headers.
  • A CGI interface using the HTML output module.
  • Find a nicer recursive acronym for Baroque?
  • Easily support languages which need both scanning and parsing, i.e. both flex and GNU Bison support. Most probably the first language of this series will be flex itself.

    I would also very much like to:

  • Apply for inclusion of this package in the GNU Project.
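
    The language-recognition question raised in the plans above can be made concrete with a small sketch. It is only an illustration, not part of Baroque: it tries an Emacs-style "-*- ... -*-" comment on the first line and falls back to the file extension, returning every candidate language when the extension is ambiguous. The table EXTENSION_CANDIDATES and both function names are hypothetical.

```python
import os
import re

# Hypothetical extension table; ".pl" and ".h" are ambiguous on purpose,
# because the mapping from extensions to languages is not injective.
EXTENSION_CANDIDATES = {
    ".c": ["C"],
    ".cc": ["C++"],
    ".h": ["C", "C++"],
    ".pl": ["Perl", "Prolog"],
    ".scm": ["Scheme"],
    ".sql": ["SQL"],
}

# Matches the text between the two "-*-" markers of an Emacs-style comment.
MODE_LINE = re.compile(r"-\*-(.*?)-\*-")


def mode_from_first_line(first_line):
    """Return the mode named in an Emacs-style first line, or None."""
    match = MODE_LINE.search(first_line)
    if match is None:
        return None
    body = match.group(1).strip()
    if ":" not in body:
        return body or None  # short form, e.g. -*- C++ -*-
    for field in body.split(";"):  # long form, e.g. -*- mode: c; coding: utf-8 -*-
        name, _, value = field.partition(":")
        if name.strip().lower() == "mode":
            return value.strip() or None
    return None


def candidate_languages(filename, first_line=""):
    """Return the possible languages of a file, most reliable evidence first."""
    mode = mode_from_first_line(first_line)
    if mode is not None:
        return [mode]
    extension = os.path.splitext(filename)[1].lower()
    return list(EXTENSION_CANDIDATES.get(extension, []))
```

    With this shape, a file named foo.pl with no first-line comment yields two candidates, so the caller still has to ask the user or inspect the content; the sketch does not address the flex input-reset concern mentioned above.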

    Download

    There is no official release yet, but you can fetch the latest CVS tree from GNU Savannah.

    Luca Saiu,

    $Author: positrone $
    $Date: 2003/06/11 22:04:06 $