BNF Parser Generator - Homepage

[ Summary - Homepage - Support - CVS - Download ]

Node:Top, Next:, Up:(dir)

The BNF parser generator takes a syntax not unlike BNF and generates a "C" parser for it, a parser that can parse either strings or files. This is a flexible tool, ment for smaller parsing tasks where bison+flex are just to big to use.


Node:grammar, Next:, Previous:Top, Up:Top

BNF input grammar

The BNF input grammar, as the name suggests follows the estabishled BNF syntax quite closely, and if you're familliar with that grammar you won't have very much trouble understanding the following. All grammar rules are written in the format as the bnf program recognizes them, so they also function as examples.

grammar ::= [{ assignment }] eoi;

A BNF grammar is an optional repetition of assignments, followed by eoi which matches with the end of the input.

assignment ::= name ('::=' | '=') expression;

An assignment is a name followed by '=' or '::=' followed by an expression. The expression is the meat of the grammar.

expression = term [{ '|' term }];

An expression is a list of terms, seperated by a '|', meaning or.

term = factor [{ white factor }];

A term is a list of whitespace seperated factors.

factor = IO | name | '[' expression ']' | '&{' expression '&}';

A factor is either an IO term, or its a name, or its an optional expression, or its an repetition of an expression.

IO = '\'' string '\'' | '"' string '"' | '`' string '`';

With the IO rule we handle the input that the generated parser reads and the output it produces. IO are strings, quoted in three different ways, either forward quoted to denote that it matches a string in the input only, doublequoted meaning that the string should be copied verbatim to the output stream, or backward quoted meaning that this string should be copied to the output by the parser.


Node:example, Next:, Previous:grammar, Up:Top

A simple example

As a first example we want a program that reads a repetition of the word "foo" and replaces it with a single word "bar". Here is the grammar you feed to the BNF program.

sample = [{ {'foo'} `bar` | . }] eoi;

There is something new here, the dot, which simply means 'match and echo any character'. When you place the above grammar in a sample.bnf file and generate the output "C" parser, bnf defines a "C" function called sample() and the body of that function parses the input we want. We need to write an additional "C" file containing a call to this function. That "C" file looks like this:

#include <stdio.h>

int __BNF_fil_io_parser(FILE* input,FILE* output,int (*syntax)());
int sample(void);

int main() {
return __BNF_fil_io_parser(stdin,stdout,sample);
}

The __BNF_fil_io_parser() function, like the sample() function are defined in the generated "C" file. Now all there is left to do is compile the two files and link them together. There is also a __BNF_str_io_parser() function that uses strings instead of files, so you can also use the parser in the situation that files are unapproperiate.

A fundamental property of parsers generated with BNF is that they are simple "C" functions, so you can intermix calls to ordinairy "C" functions that you define yourself within the grammar. This is the fundamental way of making parsers that do more than simple echoing of the input. You can make stuff like:

foo = somefunc { 'foo' `bar` | . } eoi;
-----
int somefunc() { /* do something */ return 0; }

Returning 0 in such a function denotes OK, which means to to parser can continue with processing input. If you return a non-zero value, the parser will 'reject' it. Ofcourse you could make the call an optional element in the grammar like this: foo = [somefunc] "foo";. Now the returnvalue of somefunc() is always ignored.


Node:infix, Next:, Previous:example, Up:Top

More eleborate example: an infix calculator

Any decent parser generator should be able to parse infix algebraic expressions, and BNF is no exception, so here follows the grammar:

input	::= ws expr ws eoi;

expr	::= ws powterm [{ws '^' ws powterm}];
powterm	::= ws factor [{ws ('*'|'/') ws factor}];
factor	::= ws term [{ws ('+'|'-') ws term}];
term	::= '(' ws expr ws ')' | '-' ws expr | number;

number	::= {dgt} ['.' {dgt}] [('e'|'E') ['-'] {dgt}];
dgt	::= '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9';
ws	::= [{' '|'\t'|'\n'|'\r'}];

A simple compile body like the one above could be used and then you'll be able to verify weather a certain expression is correct or not. Now that's exciting, isn't it? We would also like to calculate the value of the expression ofcourse, and that's exactly where things get ugly with calls to "C" functions cluttered everywhere around the above syntax.

Therefore, before actually start cluttering your code, always check if the bare grammar parses what you want and as a next stage, start implementing 'what to do' with the grammar. To play with this you should know that there is an important (global) variable: extern char* __BNF_input that always points to 'where you are' in the input stream/string.


Node:Concept Index, Previous:infix, Up:Top

Concept Index

Table of Contents


Want to contribute? Join the project, get source from cvs, code away, and publish the release.