Libann: Example Programs

A. Example Programs

This section describes some of the demonstration programs distributed with Libann. They do not form part of the library, but are a useful source of reference for anyone wanting to implement similar functionality.

The examples are distributed in the `demos' directory of the distribution. They are deliberately oversimplified; in real applications, a better choice of feature vector and a more careful selection of training parameters would be required.

A.1 Natural Language Selection Using a Kohonen Network  Natural Language Selection
A.2 Character Recognition using a Multi-Layer Perceptron  Optical Character Recognition
A.3 Style Classification using a Multi-Layer Perceptron  Deciding the Literary Style of Texts
A.4 Hopfield Network  Recalling Noisy Images
A.5 The Boltzmann Machine as a Classifier  Another Character Recognition Program



A.1 Natural Language Selection Using a Kohonen Network

The Kohonen network is useful when classifying data for which you do not have prior knowledge of the classes or the distribution of features. The wordFreq example program shows how a Kohonen network can classify written texts according to language, based upon their word frequencies.

In the directory `demos/data/texts' there are several files downloaded from http://promo.net/pg. These are text files written in

  • English
  • French
  • German
  • Spanish
  • Latin

One obvious way a Kohonen network might classify these is according to their language. The output from the wordFreq program confirms this expectation. Note that all of the texts, regardless of their primary language, contain approximately 1800 words of copyright information written in English. One of the advantages of neural network classifiers is their tolerance of this sort of `noisy' data.

The first step in classifying the texts is to define some sort of feature vector. In this case, the vector holds the relative frequencies of the most common words in the texts. The program first examines all the texts and identifies the most common words among them. We have a priori knowledge of the languages used, so for best results the feature vector would have at least as many elements as there are classes (languages). However, the Kohonen network is used most commonly where this information is not known, so the program uses the less-than-optimal vector size of 3. Running the program displays the following output:

 
Creating word vector
Using file /tmp/libann1.2.D007/demos/data/texts//1drll10.txt
Using file /tmp/libann1.2.D007/demos/data/texts//81agt10.txt
Using file /tmp/libann1.2.D007/demos/data/texts//8bern11.txt
Using file /tmp/libann1.2.D007/demos/data/texts//8cinq10.txt
Using file /tmp/libann1.2.D007/demos/data/texts//8cnci07.txt
Using file /tmp/libann1.2.D007/demos/data/texts//8fau110.txt
Using file /tmp/libann1.2.D007/demos/data/texts//8hrmr10.txt
Using file /tmp/libann1.2.D007/demos/data/texts//8trdi10.txt
Using file /tmp/libann1.2.D007/demos/data/texts//alad10.txt
Using file /tmp/libann1.2.D007/demos/data/texts//auglg10.txt
Using file /tmp/libann1.2.D007/demos/data/texts//civil10.txt
Using file /tmp/libann1.2.D007/demos/data/texts//lazae11.txt

The most common words are: the, de, und

 . 
 .
 .

The next thing the program does is to take each file individually and calculate the occurrence of each of the words `the', `de' and `und'. The frequencies are normalised relative to the total number of words in the text; otherwise the network's behaviour would vary with the length of the text. The program uses a C++ class called WordFreq, which inherits from ann::ExtInput. This makes it easy to create the feature vectors and to train the network.
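
The normalisation step itself is simple. As a rough illustration, independent of the library (the function below is hypothetical and not part of Libann or its WordFreq class), relative frequencies for a fixed set of words could be computed like this:

 
// Illustrative sketch only: count how often each of the chosen words
// occurs in a text, and divide by the total word count so that the
// resulting feature vector does not depend on the length of the text.
#include <fstream>
#include <map>
#include <string>
#include <vector>

std::vector<double>
relativeFrequencies(const std::string &path,
                    const std::vector<std::string> &words)
{
  std::map<std::string, long> counts;
  long total = 0;

  std::ifstream in(path.c_str());
  std::string token;
  while (in >> token) {        // crude whitespace tokenisation
    ++counts[token];
    ++total;
  }

  std::vector<double> features;
  for (std::vector<std::string>::size_type i = 0; i < words.size(); ++i)
    features.push_back(total ? double(counts[words[i]]) / total : 0.0);

  return features;
}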

 
// Create frequency counts and put them into a set
typedef set<string>::const_iterator CI;
for (CI ci = files.begin(); ci != files.end() ; ci++) {

  FreqCount fc(*ci,wv);

  trainingData.insert(fc);

}

// Create the network
ann::Kohonen net(vectorSize,7);

// Train the network
net.train(trainingData,0.3,0.8,0.1,0.40);

After training, each feature vector is presented to the network. The program creates a directory for each class it detects, and copies the text into it.
 
bash-2.05a$ ls 1*
1111111111111100111001111110111111111111011111010:
1drll10.txt  alad10.txt  civil10.txt

1111111111111101111011111111111110011111011111001:
auglg10.txt

1111111111111101111011111111111110011111011111011:
81agt10.txt  8bern11.txt  8fau110.txt  8hrmr10.txt

1111111111111111111010111100111110111111011111100:
8cinq10.txt

1111111111111111111010111100111111111111011111100:
8cnci07.txt  8trdi10.txt  lazae11.txt

There are several interesting points about this result:

  • The first directory contains only English texts.
  • The second directory contains a single file which has both Latin and German text (and the English copyright information).
  • The third directory contains only German texts.
  • The network has been unable to clearly discriminate between French and Spanish text. This is a result of too few dimensions in the feature vector. The word `de' is common in both languages, whereas `the' and `und' are not common in either of them. The best it could do was to create two classes, one containing both French and Spanish texts, the other containing only French.



A.2 Character Recognition using a Multi-Layer Perceptron

Optical character recognition is a common application for neural networks. This example program demonstrates how a multi-layer perceptron can be used to recognise and classify printed characters. In a character recognition application, we know what classes to expect [a--z] and we can manually classify some of the samples. This makes the problem suitable for supervised learning using a multi-layer perceptron network.

The mlp-char program uses a multi-layer perceptron network to classify bitmap glyphs. The glyphs concerned are in the directory `demos/data/glyphs'. There are six instances of glyphs for each of the characters [a--e]. The mlp-char program uses a C++ class called Glyph, inherited from ann::ExtInput. A Glyph is a feature vector of the same length as the number of pixels in the bitmap; a black pixel is represented by 1 and a white pixel by 0. The first thing the program does, therefore, is to create a feature map from the glyphs and their respective classes.
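
To make the encoding concrete, the sketch below shows one way a bitmap could be turned into such a 0/1 vector. It is illustrative only: the textual file format assumed here (`#' for a black pixel, `.' for a white one) is an assumption, and the library's Glyph class is the authoritative implementation.

 
// Illustrative sketch only: read a glyph stored as '#' (black) and
// '.' (white) characters and produce a 0/1 feature vector.  An 8x8
// glyph yields a 64-element vector.
#include <fstream>
#include <string>
#include <vector>

std::vector<double> glyphToVector(const std::string &path)
{
  std::vector<double> v;
  std::ifstream in(path.c_str());
  char c;
  while (in.get(c)) {
    if (c == '#')      v.push_back(1.0);  // black pixel
    else if (c == '.') v.push_back(0.0);  // white pixel
    // anything else (newlines etc.) is ignored
  }
  return v;
}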

 
// Populate the feature map from the files
ann::FeatureMap fm;

for (CI ci = de.begin() ; ci != de.end() ; ++ci ) {
  const string filename(*ci);

  // Reserve files ending in "6.char" for recall;
  // don't train with them
  if ( filename.size() >= 6
       && "6.char" == filename.substr(filename.size() - 6) )
    continue;

  // The classname is the first letter of the filename
  const string className(filename.substr(0,1));


  // Create the glyph and add it to the map
  const Glyph g(filename);
  fm.addFeature(className,g);

}

Note that one sample from each class is not put into the feature map and will therefore not be used for training.

The glyphs happen to be of resolution 8x8, and therefore the feature vector (and hence the input layer of the network) is of size 64. There are 5 classes, which can be represented by a network output layer of size 3, since three binary outputs can distinguish up to 2^3 = 8 classes. The next task therefore is to train the network.

 
// Set up the network and train it.

const int outputSize=3;
const int inputSize=fm.featureSize();

ann::Mlp net(inputSize,outputSize,1,0);

net.train(fm);
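
To see why three output units are enough for five classes, note that n binary outputs can distinguish 2^n patterns. The fragment below (illustrative only, not part of Libann) expresses that counting argument:

 
// Illustrative only: the smallest number of binary outputs able to
// distinguish nClasses classes is the smallest n with 2^n >= nClasses.
int outputUnitsNeeded(int nClasses)
{
  int n = 0;
  for (int capacity = 1; capacity < nClasses; capacity *= 2)
    ++n;
  return n;
}
// outputUnitsNeeded(5) == 3: three outputs give 2^3 = 8 patterns,
// more than enough for the five glyph classes.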

Finally, we want to use the recall method to classify glyphs. The program does this with a loop, recalling all glyphs (including those used for training).
 
// Recall  all the glyphs
for (CI ci = de.begin() ; ci != de.end() ; ++ci ) {
  const string pathname(*ci);

  const Glyph g(pathname);

  cout << pathname << " has class " << net.recall(g) << endl;

}

The following shows the results of running the program, filtering for the samples ending in `6.char' (the ones that were not used for training).
 
bash-2.05a$ ./mlp-char ../data/glyphs/ | grep 6.char
../data/glyphs//a6.char has class a
../data/glyphs//b6.char has class b
../data/glyphs//c6.char has class c
../data/glyphs//d6.char has class d
../data/glyphs//e6.char has class e
bash-2.05a$ 

All of these happen to be correctly classified. Running the program again, this time without the filter, showed 2 samples (out of a total of 30) incorrectly classified. The ratio of correctly classified samples to the total number of samples (in our case 28/30 ≈ 0.93) is called the precision of the classifier. A precision in this region is a reasonable figure for most applications. Adjusting the training parameters and increasing the size of the hidden layer can improve the precision.



A.3 Style Classification using a Multi-Layer Perceptron

This program is a slightly more ambitious application of a multi-layer perceptron. It attempts to classify different types of document according to their grammatical style. To do this, it uses the style program published by the Free Software Foundation (http://www.gnu.org/software/diction/diction.html). This program takes an English text file and produces 9 different metrics describing the author's grammatical style.

Our program takes a set of files, runs the style program over each of them, postprocesses the output, and then reads that output to create objects of class DictStyle, which inherits from ann::ExtInput. Each DictStyle is then entered into a FeatureMap as before. In this program, the first part of the filename is assumed to be the name of the class for training purposes.
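
The details of running style and postprocessing its output are internal to the demo, but reading nine numeric metrics into a vector might look roughly like the sketch below. It assumes, purely for illustration, that the postprocessed output is nine whitespace-separated numbers; the real DictStyle class may work quite differently.

 
// Illustrative sketch only: read up to nine numeric style metrics
// from a postprocessed output file.
#include <fstream>
#include <string>
#include <vector>

std::vector<double> readStyleMetrics(const std::string &path)
{
  std::vector<double> metrics;
  std::ifstream in(path.c_str());
  double value;
  while (metrics.size() < 9 && in >> value)
    metrics.push_back(value);
  return metrics;          // one element per style metric
}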

 
 // Create a feature map from the files given on the command line
 for ( int i = 3; i < argc ; ++i ) { 

     const string pathname(argv[i]);


     // extract the filename from the full pathname
     const string filename(pathname.substr(pathname.rfind("/")+1,
					   string::npos));

     // The classname is the filename up to the first digit
     const string className(filename.substr(0,filename.find_first_of("0123456789")));

     // Create a DictStyle object from the text file
     DictStyle ds(pathname);

     fm.addFeature(className,ds);

   }

The Libann source comes with some examples of text files which can be used to test this classifier. These are located in `demos/data/text/style' and comprise five extracts from each of:

  • Novels.
  • User Manuals.
  • Legal Documents.

We would expect these types of document to exhibit quite different styles of language, and therefore we should be able to classify them accordingly.

Training the classifier and recalling from it is simple:
 
   ann::Mlp classifier(9,2,0.4,6,0.45,1,3);


   cout << "Training the  classifier\n";

   classifier.train(fm);

   cout  << "Writing to " << netfile << endl;


   .
   .
   .

   // Recall files
  for ( int i = 3; i < argc ; ++i ) { 

   const string pathname(argv[i]);

   DictStyle ds(pathname);

   cout << classifier.recall(ds) << endl;
  }

Investigating the precision of this classifier is left as an exercise for the reader.
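
As a starting point for that exercise, a helper along the following lines (hypothetical, not part of the demo) could compare the recalled class names against the classes implied by the filenames:

 
// Illustrative sketch only: precision is the number of correctly
// classified samples divided by the total number of samples.
#include <string>
#include <vector>

double precision(const std::vector<std::string> &predicted,
                 const std::vector<std::string> &actual)
{
  int correct = 0;
  for (std::vector<std::string>::size_type i = 0;
       i < predicted.size(); ++i)
    if (predicted[i] == actual[i])
      ++correct;
  return predicted.empty() ? 0.0 : double(correct) / predicted.size();
}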



A.4 Hopfield Network

In the directory `demos/data/glyphs/numerals' there are 5 files, each containing a bitmap pattern of one of the numerals from 1 to 5. This program demonstrates how a Hopfield network can learn these patterns, and then how a noisy pattern can be presented to the network and identified as the original.

The network is created from a set of all the patterns it is to hold, as described in section 4.4 Hopfield Networks.

 
    // Create a training set
    set<ExtInput> inset;
    
    for ( CI ci = filenames.begin() ; ci != filenames.end() ; ++ci ) { 
      Glyph g(*ci, true);
      inset.insert(g);
    }

   // Instantiate a Hopfield net trained with our patterns
   Hopfield h(inset);

Having done this, the program mutates a few of the bits in each pattern, using a function called mutate.

 
   for ( CI ci = recallNames.begin() ; ci != recallNames.end() ; ++ci ) { 
     Glyph g(*ci);
    
     mutate(g);

     ann::vector result = h.recall(g);
   }
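
The mutate function itself is not listed here. A plausible sketch, flipping a few randomly chosen elements of a pattern, is shown below; it operates on a plain vector so as not to assume anything about the Glyph interface, and the demo's actual mutate may well differ.

 
// Illustrative sketch only: flip nBits randomly chosen elements of a
// 0/1 pattern.  (For a +1/-1 representation, negate the element
// instead.)
#include <cstdlib>
#include <vector>

void mutate(std::vector<double> &pattern, int nBits = 5)
{
  if (pattern.empty())
    return;
  for (int i = 0; i < nBits; ++i) {
    const std::vector<double>::size_type pos =
      std::rand() % pattern.size();
    pattern[pos] = 1.0 - pattern[pos];   // flip the bit
  }
}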

Results of running the program show correct recall for the numerals 1--4. However, the network has problems identifying a noisy number 5. This is because of the similarity between a `3' and a `5', and because patterns begin to overlap when the network is given too many of them to learn. The Boltzmann machine can overcome these limitations.



A.5 The Boltzmann Machine as a Classifier

One application of a Boltzmann machine is its use as a classifier. It is not as fast as the multi-layer perceptron, but this example shows how a simple classification task can be achieved. The demonstration program is located in `demos/boltzmann/boltzmann-char.cc', and the data we'll use are found in `demos/data/glyphs/xo': a number of bitmaps representing the characters `+', `o' and `x'. The program creates a Boltzmann machine which learns what each of these characters looks like, and then presents it with another, similar glyph from each class.

Like the multi-layer perceptron example, the program creates an ann::FeatureMap containing the glyphs it is to be trained with.

 
   for (CRI ci = de.rbegin() ; ci != de.rend() ; ++ci ) {
     const string pathname(*ci);

     // Get the filename from the pathname
     const string filename(pathname.substr(pathname.rfind("/")+1,
					   string::npos));

     // Save these ones for recall purposes
     if ( filename.find_first_of("23") != std::string::npos)
       continue;

     // The classname is the first letter of the filename
     const string className(filename.substr(0,1));

     // Create the glyph and add it to the map
     const Glyph g(pathname,true);

     fm.addFeature(className,g);

   }

Note that files which have a `2' or a `3' in their names are not added to the feature map; they are not used in training, but only for recall.

Now the Boltzmann machine itself is created:

 
   ann::Boltzmann net(fm,10,10,0.9);

The parameters after fm are the number of hidden units, the initial temperature and the cooling rate respectively.

Looking up values in the Boltzmann machine is simply a matter of presenting the value to the recall method; this method returns a string representing the class to which the feature belongs.
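
A recall loop, mirroring the one in the multi-layer perceptron example, might then look like this (a sketch reusing the names from the fragments above; the demo's exact code may differ):

 
// Recall all the glyphs, including those held back from training
for (CI ci = de.begin() ; ci != de.end() ; ++ci ) {
  const string pathname(*ci);

  const Glyph g(pathname, true);

  cout << pathname << " has class " << net.recall(g) << endl;
}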

In this case, all 9 glyphs in `demos/data/glyphs/xo' are correctly classified, despite the network having been trained with only one from each class.

