Unbounded by reference

This section documents a GNU Modula-2 compiler switch which implements a language optimisation surrounding the implementation of unbounded arrays. In GNU Modula-2 the unbounded array is implemented by utilising an internal structure struct {dataType *address, unsigned int high}. So given the Modula-2 procedure declaration:

PROCEDURE foo (VAR a: ARRAY OF dataType) ;
BEGIN
   IF a[2]= (* etc *)
END foo ;

it is translated into GCC trees, which can be represented in their C form thus:

void foo (struct {dataType *address, unsigned int high} a)
{
   if (a.address[2] == /* etc */
}

Whereas if the procedure foo was declared as:

PROCEDURE foo (a: ARRAY OF dataType) ;
BEGIN
   IF a[2]= (* etc *)
END foo ;

then it is implemented by being translated into the following GCC trees, which can be represented in their C form thus:

void foo (struct {dataType *address, unsigned int high} a)
{
   dataType *copyContents = (dataType *)alloca (a.high+1);
   memcpy(copyContents, a.address, a.high+1);
   a.address = copyContents;

   if (a.address[2] == /* etc */
}

This implementation works, but it makes a copy of each non VAR unbounded array when a procedure is entered. If the unbounded array is not changed during procedure foo then this implementation will be very inefficient. In effect Modula-2 lacks the REF keyword of Ada. Consequently the programmer maybe tempted to sacrifice semantic clarity for greater efficiency by declaring the parameter using the VAR keyword in place of REF.

The -funbounded-by-reference switch instructs the compiler to check and see if the programmer is modifying the content of any unbounded array. If it is modified then a copy will be made upon entry into the procedure. Conversely if the content is only read and never modified then this non VAR unbounded array is a candidate for being passed by reference. It is only a candidate as it is still possible that passing this parameter by reference could alter the meaning of the source code. For example consider the following case:

PROCEDURE StrConCat (VAR a: ARRAY OF CHAR; b, c: ARRAY OF CHAR) ;
BEGIN
   (* code which performs string a := b + c *)
END StrConCat ;

PROCEDURE foo ;
VAR
   a: ARRAY [0..3] OF CHAR ;
BEGIN
   a := 'q' ;
   StrConCat(a, a, a)
END foo ;

In the code above we see that the same parameter, a, is being passed three times to StrConCat. Clearly even though parameters b and c are never modified it would be incorrect to implement them as pass by reference. Therefore the compiler checks to see if any non VAR parameter is type compatible with any VAR parameter and if so it generates runtime procedure entry checks to determine whether the contents of parameters b or c matches the contents of a. If a match is detected then a copy is made and the address in the unbounded structure is modified.

The compiler will check the address range of each candidate against the address range of any VAR parameter, providing they are type compatible. For example consider:

PROCEDURE foo (a: ARRAY OF BYTE; VAR f: REAL) ;
BEGIN
   f := 3.14 ;
   IF a[0]=BYTE(0)
   THEN
      (* etc *)
   END
END foo ;

PROCEDURE bar ;
BEGIN
   r := 2.0 ;
   foo(r, r)
END bar ;

Here we see that although parameter, a, is a candidate for the passing by reference, it would be incorrect to use this transformation. Thus the compiler detects that parameters, a and f are type compatible and will produce runtime checking code to test whether the address range of their respective contents intersect.

The GNU Modula-2 front end to GCC

Unbounded by reference