 # POOMA Tutorial 5Pointwise Operations

## Introduction

One of the major goals of POOMA is to support stencil-based computation on large arrays. A stencil is a pattern of weights used to update each value in an array based on the values of its neighbors. Stencils are usually represented as a grid of weights around a central point, which indicates the location of the value being updated.

Since the simplest possible stencil consists of just a single point, and since a major goal of POOMA is to make simple operations simple to implement, this tutorial starts by showing how to implement functions that are to be applied pointwise to each element of an array in turn. More general stencils, which combine the values in regular or irregular neighborhoods, are then described.

## Pointwise Functions

The program below (included in the release as examples/UserFunction/CosTimes) shows how to create a pointwise function by instantiating a templated POOMA class called UserFunction. The class used as a template argument to UserFunction must supply an operator()() function, which must take a single value as an argument, and return the result of applying the desired function to that element. In this case, the function we wish to apply returns the cosine of a constant times its input value:

```01  #include "Pooma/Arrays.h"
02
03  class CosTimes
04  {
05  public:
06    CosTimes(double x) : x_m(x) {}
07    double operator()(double y) const { return cos(x_m*y); }
08    double &constant() { return x_m; }
09    const double &constant() const { return x_m; }
10  private:
11    double x_m;
12  };
13
14  int main(int argc, char *argv[])
15  {
16    Pooma::initialize(argc,argv);
17
18    int N=5;
19    Array<2> x(N,N);
20    const double Pi = 2.0 * asin(1.0);
21
22    // Need to guard evaluation of scalar code.
23    Pooma::blockAndEvaluate();
24
25    // Fill the array with some data.
26    for (int i=0; i<N; ++i)
27      for (int j=0; j<N; ++j)
28        x(i,j) = i+0.1*j;
29
30    // Build an object that calculates cos(2*pi*x).
31    UserFunction<CosTimes> cos2pi( 2*Pi );
32
33    // An array to test the output.
34    Array<2> y(N,N);
35
36    // Make sure the difference is zero.
37    y = cos2pi(x) - cos(2*Pi*x);
38    double diff = sum(y*y);
39    cout << cos2pi.function().constant() << ": " << diff << endl;
40
41    // Clean up and exit.
42    Pooma::finalize();
43    return 0;
44  }
```

Note that the name of the functional class CosTimes is passed as a template argument to UserFunction so that that operator()() does not incur the performance penalty of a virtual method call. Note also how the scaling constant is passed as a constructor argument to UserFunction<CosTimes> on line 31, which invokes the CosTimes constructor on line 06. This works because UserFunction defines templated constructors taking up to seven arguments, each of which calls through to a constructor of the template argument class. An example is the constructor on lines 8-11 of the abbreviated class definition shown below:

```001  template<class Func>
002  class UserFunction
003  {
004    UserFunction(const Func &func)
005      : function_m(func)
006    { }
007
008    template<class I1, class I2>
009    UserFunction(const I1 &i1, const I2 &i2)
010      : function_m(i1,i2)
011    { }
012
013    other constructors and class body
014
015  private:
016    Func function_m;
017  };
```

The UserFunction class stores an instance of its template argument class so that data such as lookup tables can be embedded in such classes. UserFunction overloads the method function() to return either a reference or a constant reference to this member. The listing of CosTimes.cpp uses this method on line 39 to access the value of the multiplier embedded in the function object.

Note that it may be more efficient in some cases to create a UserFunction by creating an instance of the application-specific function class, and then passing it as a constructor argument to UserFunction. This uses the templated constructor defined on lines 4-6 of the abbreviated class definition above.

## Stencils

The first tutorial introduced Laplace's Equation in two dimensions, and described the way in which repeated application of the function:

V(i, j) = (V(i+1, j) + V(i, j+1) + V(i-1, j) + V(i, j-1)) / 4

could be used to calculate the electric potential in a flat plate. The implementation in that tutorial, and the ones that followed, was based on whole-array manipulation. However, the same calculation can be done using a stencil by implementing the function shown above.

The first step is to represent the stencil as a class, which can then be used as a template parameter to the generic stencil class Stencil. Like UserFunction, this stencil-specific class must provide a templated member function called operator()(), which must take an array and one or more indices as arguments, and return the stencil's value at the point indicated by those indices. In the case of the Laplacian stencil, this method can be implemented as shown in examples/Stencil/Laplace/LaplaceStencil.h:

```class LaplaceStencil
{
public:
LaplaceStencil() {}

template <class A>
inline
typename A::Element_t
operator()(const A& x, int i, int j) const
{
return 0.25*(x(i+1, j) + x(i-1, j) + x(i, j+1) + x(i, j-1));
}

other methods
};
```

The stencil's operator()() method is templated on the type of the array to which the stencil is being applied, so that a single stencil can be applied to arrays of float, complex<double>, or user-defined values. The method's return type is declared to be A::Element_t, i.e. the element type of the given array, for the same reason. Finally, operator()() must take one or more integer indices as arguments, which tell it where it is being applied.

The computation implemented by the stencil can be very complex, but must not rely on accumulated state information. Setting every array element to the maximum value seen so far, for example, is easy to code up, but almost certain to produce incorrect results when executed by multiple threads, or on distributed-memory platforms.

Classes used as arguments to Stencil must also implement two methods called lowerExtent() and upperExtent(). These methods tell the framework the extent of the border that the stencil requires. In the case of LaplaceStencil, the border is one unit in all directions, so these methods are:

```class LaplaceStencil
{
public:
constructor

calculation function

inline Loc<2> lowerExtent() const { return Loc<2>(1,1); }
inline Loc<2> upperExtent() const { return Loc<2>(1,1); }
};
```

As the names suggest, the Loc returned by lowerExtent() specifies how much data in the negative direction along each axis the stencil reads, while the Loc returned by upperExtent() specifies the corresponding requirement in the positive direction.

Once this class has been created, the stencil can be applied to an array with the correct number of dimensions simply by calling it like a function (since the generic class Stencil overloads the parentheses operator). The code that does this is included in the release in examples/Stencil/Laplace/Laplace.cpp, and is:

```01  int main(int argc, char* argv[])
02  {
03    Pooma::initialize(argc,argv);
04
05    // get problem size, number of iterations, and RNG seed
06    if (argc != 3){
07      std::cerr << "usage: Life worldSize numIter" << endl;
08      exit(1);
09    }
10    int worldSize = atoi(argv);
11    int numIter   = atoi(argv);
12
13    // create the world array, and the stencil used to update it
14    Array<2> world(worldSize, worldSize);
15    Stencil<LaplaceStencil> stencil;
16
17    // describe the interior of the world
18    Interval<1> interior_1(1, worldSize-2);
19    Interval<2> interior_2(interior_1, interior_1);
20
21    // initialize array element values
22    world = 0.0;
23    Pooma::blockAndEvaluate();
24    world(worldSize/2, worldSize/2) = 2.0 * numIter;
25    std::cout << "start" << endl << world << endl << endl;
26
27    // update elements
28    for (int i=0; i<numIter; ++i) {
29      world(interior_2) = stencil(world);
30      Pooma::blockAndEvaluate();
31      std::cout << i << endl << world << endl << endl;
32    }
33
34    Pooma::finalize();
35    return 0;
36  }
```

The key statements are on lines 15 and 29. The first of these creates an instance of the customized stencil class; the second uses that instance as a function object by invoking its overloaded operator(). As explained in the previous tutorial, the Pooma::blockAndEvaluate() calls are needed to ensure that multi-threaded evaluation has completed before the central element of the array is overwritten (line 24), and again before the whole array is written out (line 31).

Note that this code assigns the result of the stencil calculation back into the array from which values are being read. POOMA automatically creates a temporary array to ensure that values are not overwritten while the update is in progress, so that this assignment is equivalent to:

```temp = stencil(world);
world = temp;
```

If many iterations were being done, it would be more efficient to use the usual "back and forth" update method:

```for (int i=0; i<numIter; i+=2) {
temp  = stencil(world);
world = stencil(temp);
}
```

(Note that blockAndEvaluate() calls are not needed in the loop above, since both of the enclosed statements are data-parallel. blockAndEvaluate() is only needed to ensure correct execution when data-parallel and sequential statements are mixed.)

Another important thing to note about stencils is that the whole of the stencil object is copied in iterative expressions. If, for example, a cellular automaton stencil contains a lookup table, as in the following code fragment:

```class BoghosianDiffusion
{
public:
BoghosianDiffusion() {}

template <class A>
inline
typename A::Element_t
operator()(const A& x, int i, int j) const
{
return StateTable[x(i,j)];
}

other methods

protected:
const unsigned int StateBits      = 12;
const unsigned int StateTableSize = 1 << StateBits;
const int StateTable[StateTableSize] = {
4096 integer constants
};
};
```

then the whole 4096-element state transition table is copied each time the stencil is instantiated. In this case, it would be better for each stencil instance to use the same state table, either by having it be static to the class or by having each instance contain a pointer to the table.

## Managing Boundaries by Hand

Managing a stencil's boundary conditions is often the most difficult part of creating a stencil-based application. Part of the responsibility for doing this lies on the stencil writer, who must give the framework information about the stencil's extent by implementing the lowerExtent() and upperExtent() methods described in the previous section. The rest of the responsibility lies with the stencil user, who must ensure that the domain to which the stencil is applied is large enough.

Stencils can be applied to arrays in two different ways. The first, which has already been shown, uses the syntax:

```B(I) = stencil(A);
```

where A and B are arrays, and I is an Interval. When this form is used, the framework reads from the whole of the source array, but only writes to a subset of its elements. That subset is determined by the left and right extents returned by the stencil. If A is a 1000×1000 array, for example, and the left and right extents were both 2 elements wide, then only the interior 996×996 elements of the array would be written to.

The other way to apply a stencil to an array is to explicitly specify the elements that are to be overwritten. If I is once again an Interval object, then the statement:

```B(I) = stencil(A, I);
```

will read from both the elements specified by I, and the elements in the border around them. Once again, the values returned by the stencil's lowerExtent() and upperExtent() methods are used to determine how wide a border to read from. It is up to the user in this case to ensure that there are actually enough elements bordering I, i.e. that reading above or below the specified area of the array does not cause an index error.

Stencil application is implemented in this way so that a single stencil object can be applied in a variety of ways, as shown by the example program included in the examples/Stencil/Life directory. The stencil itself, which is in the file LifeStencil.h, implements a well-known cellular automaton called the Game of Life:

```01  class LifeStencil : public Stencil<LifeStencil> {
02  public:
03    LifeStencil() {}
04
05    template <class A>
06    inline
07    typename A::Element_t
08    operator()(const A& x, int i, int j) const
09    {
10      int count =
11        x(i-1, j+1) + x(i, j+1) + x(i+1, j+1) +
12        x(i-1, j  ) +             x(i+1, j  ) +
13        x(i-1, j-1) + x(i, j-1) + x(i+1, j-1) ;
14
15      int result = 0;
16      if ((count == 3) || (x(i,j) && (count == 2))){
17        result = 1;
18      }
19      return result;
20    }
21
22    inline Loc<2> lowerExtent()  const { return Loc<2>(1,1); }
23    inline Loc<2> upperExtent() const { return Loc<2>(1,1); }
24  };
```

The update rule, on lines 10-19, is that an element is "alive" in iteration i+1 if it has three live neighbors, or if it is already alive and still has two live neighbors. The program in Life.cpp applies this stencil to the interior elements of an array world in the same way as the Laplacian stencil was applied in the previous example:

```  Interval<1> interior_1(1, worldSize-2);
Interval<2> interior_2(interior_1, interior_1);
for (int i=0; i<numIter; ++i) {
world(interior_2) = stencil(world);
Pooma::blockAndEvaluate();
}
```

The Life program then re-initializes world, and repeatedly applies the stencil once again, but this time makes the domain from which it reads explicit:

```  for (int i=0; i<numIter; ++i) {
world(interior_2) = stencil(world, interior_2);
Pooma::blockAndEvaluate();
}
```

As the output of the program shows, both techniques generate the same result. However, if the following erroneous code was executed:

```  // wrong!
Interval<1> whole_1(0, worldSize-1);          // wrong interval
Interval<2> whole_2(interior_1, interior_1);
for (int i=0; i<numIter; ++i) {
world = stencil(world, whole_2);
Pooma::blockAndEvaluate();
}
```

then a run-time array bounds error would occur, since the library would be trying to read from non-existent array elements with indices such as (-1,-1).

## Managing Boundaries Automatically

Keeping track of the intervals that stencils read and write is tedious and error-prone. As the tutorial on fields shows, POOMA's Field class represents the combination of an interior domain and its guard elements. When a stencil is applied to a Field, the library automatically reads from both the interior and the guard elements, while writing to the interior alone.

Another way to reduce the amount of bookkeeping in programs is to query stencils in order to determine the interval on which they might apply. The Stencil class from which all stencils are derived has a method insetDomain(), which takes as its argument the Interval over which the stencil is to read, and calculates the Interval over which that stencil will write. For example, given an Array<2> called world the statement:

```Interval<2> interior = stencil.insetDomain(world.domain());
```

creates a (possibly empty) Interval that can be used in the update of world:

```world(interior) = stencil(world);
```

If the stencil was to be applied to only a portion of world specified by an Interval called portion, the stencil itself could calculate the right range of indices:

```Interval<2> interior = stencil.insetDomain(portion);
world(interior) = stencil(world(portion));
```

It is important to note the difference between this update statement, and the one used earlier, i.e. the difference between:

```world(interior) = stencil(world(portion));
```

and:

```world(interior) = stencil(world, interior);
```

The first of these, in which an Interval is used as an argument to the array world, reads from the specified Interval portion, and generates a result which is smaller. The second form, in which both the array world and the interval interior are arguments to stencil, generates a result which spans the specified interval by reading a larger set of values. For this reason, the statement:

```world(interior) = stencil(world, portion);    // wrong!
```

could result in a run-time bounds error, since the library would read from an interval even larger than the specified portion, and then try to assign the result to the smaller interval specified by interior.

## Summary

Many important and useful computations can be expressed as updates to individual elements of arrays, which may or may not involve manipulation of neighboring elements. POOMA therefore provides efficient support for both pure pointwise operations, and multi-element stencils in an arbitrary number of dimensions.