Chapter 17. Regular Expression

Besides applications, GNUstep also supports libraries, bundles, and tools. This chapter demonstrates them: I'll write a regular expression library, which will then be used by a tool.

A library is very simple: it's just a collection of classes. Unix systems usually have a built-in POSIX regular expression engine. Here, I wrap it in a library.
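For orientation, the POSIX engine being wrapped can be sketched in plain C, which Objective-C accepts unchanged. This is only a minimal sketch, not the library's code: the helper name find_first_match is hypothetical, while regcomp, regexec, regerror, and regfree are the standard calls from regex.h.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: find the first match of an extended POSIX pattern
 * in text. Returns 0 and fills *start / *length on a match, 1 when
 * regexec() reports no match (or fails), and -1 when the pattern does
 * not compile. */
int find_first_match(const char *pattern, const char *text,
                     int *start, int *length)
{
  regex_t preg;
  regmatch_t pmatch[1];
  char errbuf[255];
  int result;

  assert(pattern != NULL && text != NULL && start != NULL && length != NULL);

  result = regcomp(&preg, pattern, REG_EXTENDED);
  if (result != 0)
    {
      regerror(result, &preg, errbuf, sizeof(errbuf));
      fprintf(stderr, "Couldn't compile regex %s: %s\n", pattern, errbuf);
      return -1; /* nothing to regfree() after a failed regcomp() */
    }

  result = regexec(&preg, text, 1, pmatch, 0);
  regfree(&preg); /* the compiled pattern is no longer needed */
  if (result != 0)
    return 1;

  /* rm_so / rm_eo are byte offsets of the match in text */
  *start = (int) pmatch[0].rm_so;
  *length = (int) (pmatch[0].rm_eo - pmatch[0].rm_so);
  return 0;
}
```

Called with the pattern "middle" and the text "head middle end", the helper reports a match starting at offset 5 with length 6.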

GNUmakefile

include $(GNUSTEP_MAKEFILES)/common.make

LIBRARY_NAME = libRegEx

libRegEx_OBJC_FILES = RegEx.m 

libRegEx_HEADERS = RegEx.h 

# The Headers that are to be installed
libRegEx_HEADER_FILES = RegEx.h 

libRegEx_HEADER_FILES_DIR = .
libRegEx_HEADER_FILES_INSTALL_DIR = RegEx

include $(GNUSTEP_MAKEFILES)/library.make

This is almost the same as the GNUmakefile for an application, except that it uses LIBRARY_NAME instead of APP_NAME. libRegEx_HEADER_FILES lists the files to be installed as headers. libRegEx_HEADER_FILES_DIR is where the header files are located in the source tree. libRegEx_HEADER_FILES_INSTALL_DIR is the directory the headers are installed into. Once this library is installed, the headers should be in GNUstep/Local/Library/Headers/RegEx/. See the documentation of the GNUstep Makefile Package for details.

Here is the source code of the RegEx classes, which use the C library:

RegEx.h

#ifndef _RegEx_H_
#define _RegEx_H_

#include <Foundation/Foundation.h>
#include <regex.h>

@interface RegExPattern : NSObject
{
  regex_t *preg;
  /* Mask could be regex options. For example: REG_ICASE, REG_NEWLINE*/
  unsigned int _mask;
}

+ (RegExPattern *) regexPattern: (NSString *) pattern;

- (id) initWithPattern: (NSString *) pattern options: (unsigned int) mask;
- (regex_t *) pattern;

@end

@interface RegExParser: NSObject
{
}

/* The returned range is relative to the whole string,
 * not to the given range.
 */
+ (NSRange) rangeOfString:(NSString *) pattern
                 inString: (NSString *) string;

+ (NSRange) rangeOfPattern: (RegExPattern *) pattern
                  inString: (NSString *) string;

+ (NSRange) rangeOfString: (NSString *) pattern
                 inString: (NSString *) string
                    range: (NSRange) range;

+ (NSRange) rangeOfPattern: (RegExPattern *) pattern
                  inString: (NSString *) string
                     range: (NSRange) range;
@end

#endif /* _RegEx_H_ */

RegEx.m

#include "RegEx.h"

@implementation RegExPattern

+ (RegExPattern *) regexPattern: (NSString *) pattern
{
  id object = [[RegExPattern alloc] initWithPattern: pattern
                                            options: REG_EXTENDED];

  return AUTORELEASE(object);
}

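- (void) dealloc
{
  /* preg is NULL when compilation failed in -initWithPattern:options: */
  if (preg != NULL)
    {
      regfree(preg);
      free(preg); /* preg was allocated with malloc(), so free() is correct */
    }
  [super dealloc];
}

- (id) initWithPattern: (NSString *) pattern options: (unsigned int) mask
{
  int result;
  char errbuf[255];

  /* Initialize the superclass first */
  self = [super init];
  if (self == nil)
    return nil;

  _mask = mask;

  preg = malloc(sizeof(regex_t));
  if (preg == NULL)
    {
      RELEASE(self);
      return nil;
    }

  result = regcomp(preg, [pattern cString], mask);

  if (result != 0)
    {
      regerror(result, preg, errbuf, sizeof(errbuf));
      NSLog(@"RegEx Error: Couldn't compile regex %@: %s", pattern, errbuf);

      /* A pattern that failed to compile must not be passed to regfree();
       * just release the storage and the half-initialized object. */
      free(preg);
      preg = NULL;
      RELEASE(self);
      return nil;
    }

  return self;
}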

- (regex_t *) pattern
{
  return preg;
}
@end

static  regmatch_t pmatch[1];
static  char errbuf[255];

@implementation RegExParser

+ (NSRange) rangeOfString:(NSString *) pattern
                 inString: (NSString *) string
{
  return [RegExParser rangeOfString: pattern
                           inString: string
                              range: NSMakeRange(0, [string length])];
}

+ (NSRange) rangeOfPattern: (RegExPattern *) pattern
                  inString: (NSString *) string
{
  return [RegExParser rangeOfPattern: pattern
                            inString: string
                               range: NSMakeRange(0, [string length])];
}

+ (NSRange) rangeOfString: (NSString *) pattern
                 inString: (NSString *) string
                    range: (NSRange) range
{
  return [RegExParser rangeOfPattern: [RegExPattern regexPattern: pattern]
                            inString: string
                               range: range];
}

+ (NSRange) rangeOfPattern: (RegExPattern *) pattern
                  inString: (NSString *) string
                     range: (NSRange) range
{
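  int result;
  int location, length;
  int mask = 0;

  /* A pattern that failed to compile is nil; report "not found"
   * instead of passing a NULL regex_t to regexec(). */
  if (pattern == nil)
    return NSMakeRange(NSNotFound, 0);

  /* If the range does not start at the beginning of the string,
   * '^' must not match at the start of the range. Likewise, '$' must
   * not match at the end of a range that stops before the string ends. */
  if (range.location != 0)
    mask = mask | REG_NOTBOL;
  if ((range.location + range.length) != [string length])
    mask = mask | REG_NOTEOL;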
   
  result = regexec([pattern pattern], 
                   [[string substringWithRange: range] cString],
                   1, pmatch, mask);
  if (result != 0)
    {
      if (result != REG_NOMATCH) 
        {
          regerror(result, [pattern pattern], errbuf, 255);
          NSLog(@"RegEx Error: Couldn't match RegEx %s", errbuf);
        }
      return NSMakeRange(NSNotFound, 0);
    }

  location = range.location + pmatch->rm_so;
  length = pmatch->rm_eo - pmatch->rm_so;

  return NSMakeRange(location, length);
}

@end

There is nothing special in the RegEx source code; it just uses a C library from Objective-C source. After compilation and installation, I can use it as a library.
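The one subtle part of the parser is the range handling: the search runs on a copy of the sub-range, REG_NOTBOL and REG_NOTEOL keep '^' and '$' from matching at artificial boundaries, and the match offset is shifted back so that it is relative to the whole string. A minimal plain C sketch of that logic follows. The helper name find_in_range is hypothetical, and the sketch works on byte offsets of a C string rather than on NSString indexes.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: search only text[location .. location + length)
 * and return the match start relative to the whole text, or -1 if
 * nothing matched or an error occurred. */
long find_in_range(const char *pattern, const char *text,
                   size_t location, size_t length)
{
  regex_t preg;
  regmatch_t pmatch[1];
  size_t total = strlen(text);
  char *sub;
  int eflags = 0;
  int result;

  assert(location <= total && length <= total - location);

  /* '^' and '$' may only match at the real ends of the whole text */
  if (location != 0)
    eflags |= REG_NOTBOL;
  if (location + length != total)
    eflags |= REG_NOTEOL;

  if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
    return -1;

  /* Copy the sub-range into its own NUL-terminated buffer */
  sub = malloc(length + 1);
  if (sub == NULL)
    {
      regfree(&preg);
      return -1;
    }
  memcpy(sub, text + location, length);
  sub[length] = '\0';

  result = regexec(&preg, sub, 1, pmatch, eflags);
  free(sub);
  regfree(&preg);

  if (result != 0)
    return -1;

  /* rm_so is relative to the sub-range; shift it back to the whole text */
  return (long) location + (long) pmatch[0].rm_so;
}
```

Searching "head middle end" for "middle" in the range starting at 5 with length 10 still returns 5, because the offset is translated back to the whole string. The pattern "^middle" finds nothing in that range, since REG_NOTBOL is set.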

Here is a little tool, called regex_test, to test the library:

GNUmakefile

include $(GNUSTEP_MAKEFILES)/common.make

TOOL_NAME = regex_test

regex_test_OBJC_FILES = \
        main.m

regex_HEADERS =

ADDITIONAL_TOOL_LIBS += -lRegEx

include $(GNUSTEP_MAKEFILES)/tool.make

Again, it uses TOOL_NAME instead of APP_NAME. ADDITIONAL_TOOL_LIBS links in the new libRegEx library.

main.m

#include <Foundation/Foundation.h>
#include <RegEx/RegEx.h>

int main (int argc, const char **argv)
{
  NSRange range;
  NSAutoreleasePool *pool = [NSAutoreleasePool new];

  range = [RegExParser rangeOfString: @"middle"
                            inString: @"head middle end"];
  NSLog(@"%@", NSStringFromRange(range));

  RELEASE(pool);
  return 0;
}

Generally, GNUstep tools behave like ordinary Unix commands. You can type the command name directly if your path environment variable is set correctly, or you can use opentool to launch GNUstep tools. opentool avoids path problems, if there are any.

If you want to keep all the tools and libraries in one project rather than in many separate directories, you can use the subproject feature of GNUstep-make.

Say I want to put everything in the ~/foo/ directory. The tool is in ~/foo/, and the library is in ~/foo/RegEx/. Then the GNUmakefile of the tool can be:

GNUmakefile

include $(GNUSTEP_MAKEFILES)/common.make

SUBPROJECTS = RegEx

TOOL_NAME = regex_test

regex_test_OBJC_FILES = \
        main.m

regex_HEADERS =

ADDITIONAL_TOOL_LIBS += -lRegEx
ADDITIONAL_LIB_DIRS += -LRegEx/$(GNUSTEP_OBJ_DIR)

include $(GNUSTEP_MAKEFILES)/aggregate.make
include $(GNUSTEP_MAKEFILES)/tool.make

SUBPROJECTS indicates that the subproject is in the directory RegEx. ADDITIONAL_LIB_DIRS points the linker at the compiled library under the directory RegEx, so that I don't need to install libRegEx first and then compile the tool. aggregate.make tells gmake that there are subprojects, so that it descends into the subdirectories.

By doing this, all the source code is under the ~/foo/ directory. Here is the source code: RegEx-1-src.tar.gz

As an alternative to libraries, GNUstep supports dynamically loaded bundles, which act as plug-ins in other applications. A bundle can be loaded at any time while the program runs, whereas a library has to be linked when the program is built.

Now, change the RegEx library into a bundle.

GNUmakefile

include $(GNUSTEP_MAKEFILES)/common.make

BUNDLE_NAME = RegEx
BUNDLE_EXTENSION = .bundle
BUNDLE_INSTALL_DIR = $(GNUSTEP_INSTALLATION_DIR)/Library/Bundles

RegEx_OBJC_FILES = RegEx.m 

RegEx_HEADERS = RegEx.h 

RegEx_PRINCIPAL_CLASS = RegExParser

include $(GNUSTEP_MAKEFILES)/bundle.make

Just change LIBRARY to BUNDLE. The most important variable is RegEx_PRINCIPAL_CLASS: you can get the principal class from a bundle without knowing its name. That's all for the bundle; there is no change in the source code.

You have to access the bundle by its path, and then get a class inside it either by class name or as the principal class. Here is an example:

GNUmakefile

include $(GNUSTEP_MAKEFILES)/common.make

SUBPROJECTS = RegEx

TOOL_NAME = regex_test

regex_test_OBJC_FILES = main.m

regex_HEADERS =

include $(GNUSTEP_MAKEFILES)/aggregate.make
include $(GNUSTEP_MAKEFILES)/tool.make

main.m

#include <Foundation/Foundation.h>
#include <RegEx/RegEx.h>

int main (int argc, const char **argv)
{
  NSRange range;
  NSAutoreleasePool *pool;
  NSArray *paths;
  NSFileManager *fileManager;
  NSString *path;
  NSBundle *bundle;
  Class RegExClass;
  int i;

  pool = [NSAutoreleasePool new];

Search for the bundle first:

  fileManager = [NSFileManager defaultManager];

  /* Search for the bundles */
  paths = NSSearchPathForDirectoriesInDomains(NSLibraryDirectory,
                                              NSLocalDomainMask, YES);

  for (i = 0; i < [paths count]; i++)
    {
      path = [[paths objectAtIndex: i] stringByAppendingPathComponent: @"Bundles/RegEx.bundle"];
      if ([fileManager fileExistsAtPath: path])
        break;
    }

Get the bundle by its path. Then get the principal class, which is the regular expression parser:

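  bundle = [NSBundle bundleWithPath: path];
  RegExClass = [bundle principalClass];

  /* The bundle may be missing or fail to load; stop instead of
   * sending messages to a Nil class. */
  if (RegExClass == Nil)
    {
      NSLog(@"Couldn't load RegEx.bundle");
      RELEASE(pool);
      return 1;
    }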

Once you have the class, use it like a normal class from a library:

  range = [RegExClass rangeOfString: @"middle"
                           inString: @"head middle end"];

  NSLog(@"%@", NSStringFromRange(range));
  
  RELEASE(pool);
  return 0;
}

Here is the source code: RegEx-2-src.tar.gz

The advantage of a bundle is that it is loaded dynamically. For example, if the application can't find the RegEx bundle, it can disable the corresponding functions at runtime. In this way, many functions can be put into bundles, and depending on which bundles are installed, an application can offer different kinds of functions. Bundles can also be used as plug-ins. For example, an application can have many bundles for different file formats. They only need to share a common header, either by adopting the same protocol or by inheriting from the same superclass. The application can give a file to all the bundles and ask which one can handle it, then use that one to process the file. Headers of bundles are installed the same way as for a library: use _HEADER_FILES in the GNUmakefile to specify the headers to install.
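The "ask every plug-in which one can handle the file" idea can be sketched without any bundle machinery. The following plain C sketch is an analogy only: the FormatHandler struct, the handler functions, and the static handlers table are all hypothetical stand-ins for the shared header and for whatever bundles happen to be installed. A real application would obtain the handlers from loaded bundles instead of a fixed table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical plug-in interface: the "common header" that every
 * file format handler agrees on. */
typedef struct
{
  const char *name;
  int (*can_handle)(const char *filename); /* non-zero if the handler applies */
} FormatHandler;

/* Return non-zero if filename ends with suffix */
static int has_suffix(const char *filename, const char *suffix)
{
  size_t n = strlen(filename);
  size_t m = strlen(suffix);

  return n >= m && strcmp(filename + n - m, suffix) == 0;
}

static int png_can_handle(const char *filename)
{
  return has_suffix(filename, ".png");
}

static int text_can_handle(const char *filename)
{
  return has_suffix(filename, ".txt");
}

/* Stand-in for the set of installed plug-ins */
static const FormatHandler handlers[] = {
  { "png", png_can_handle },
  { "text", text_can_handle },
};

/* Ask each handler in turn; return the first one that accepts the file,
 * or NULL when no handler does, in which case the application can
 * disable the feature. */
const FormatHandler *handler_for_file(const char *filename)
{
  size_t i;

  assert(filename != NULL);
  for (i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++)
    {
      if (handlers[i].can_handle(filename))
        return &handlers[i];
    }
  return NULL;
}
```

The caller only depends on the shared interface, so adding a new file format means adding one more handler without touching the dispatch code.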

Bundles are really useful in GNUstep. You should take some time to read the NSBundle header. You can even keep resources (images, sounds) in a bundle.