POOMA Tutorial 2Red/Black Update

Introduction

This tutorial shows how Range objects can be used to specify more general multi-valued array indices. It also introduces the ConstArray class, and delves a bit more deeply into the similarities between POOMA's arrays and the Standard Template Library's iterators.

Red-Black Update

Jacobi iteration is a good general-purpose relaxation method, but there are several ways to speed up its convergence rate. One of these, called red-black updating, can also reduce the amount of memory that a program requires. Imagine that the array's elements are alternately colored red and black, like the squares on a checkerboard. In even-numbered iterations, the red squares are updated using the values of their black neighbors; on odd-numbered iterations, the black squares are updated using the red squares' values. These updates can clearly be done in place, without any need for temporaries, and yield faster convergence for an equivalent number of calculations than simple Jacobi iteration.

A complete program that implements this is shown below (and is included in the release as examples/Solvers/RBJacobi). Its key elements are the declaration and initialization of two Range objects on line 37, the definition of the function that applies Jacobi relaxation on a specified domain on lines 9-17, and the four calls to that function on lines 43-47. The sections following the program source discuss each of these points in turn.

```01  #include "Pooma/Arrays.h"
02
03  #include <iostream>
04
05  // The size of each side of the domain.
06  const int N = 20;
07
08  // Apply a Jacobi iteration on the given domain.
09  void
10  ApplyJacobi(
11      const Array<2>       & V,       // to be relaxed
12      const ConstArray<2>  & b,       // fixed term
13      const Range<1>       & I,       // range on first axis
14      const Range<1>       & J        // range on second axis
15  ){
16      V(I,J) = 0.25 * (V(I+1,J) + V(I-1,J) + V(I,J+1) + V(I,J-1) - b(I,J));
17  }
18
19  int
20  main(
21      int                 argc,           // argument count
22      char*               argv[]          // argument list
23  ){
24      // Initialize POOMA.
25      Pooma::initialize(argc, argv);
26
27      // The array we'll be solving for.
28      Array<2> V(N, N);
29      V = 0.0;
30
31      // The right hand side of the equation.
32      Array<2> b(N,N);
33      b = 0.0;
34      b(N/2, N/2) = -1.0;
35
36      // The interior domain, now with stride 2.
37      Range<1> I(1, N-3, 2), J(1, N-3, 2);
38
39      // Iterate 100 times.
40      for (int iteration=0; iteration<100; ++iteration)
41      {
42          // red
43          ApplyJacobi(V, b, I,   J);
44          ApplyJacobi(V, b, I+1, J+1);
45          // black
46          ApplyJacobi(V, b, I+1, J);
47          ApplyJacobi(V, b, I,   J+1);
48      }
49
50      // Print out the result.
51      std::cout << V << std::endl;
52
53      // Clean up and report success.
54      Pooma::finalize();
55      return 0;
56  }
```

Ranges

Our first requirement is a simple, efficient way to specify non-adjacent array elements. POOMA borrows the terminology of Fortran 90 and other data-parallel languages, referring to the spacing between a sequence of index values as the sequence's stride. For example, the sequence of indices {1,3,5,7} has a stride of 2, while the sequence {8,13,18,23,28} has a stride of 5, and the sequence {10,7,4,1} has a stride of -3.

POOMA programs represent index sequences with non-unit strides using Range objects. The templated class Range is a generalization of the Interval class seen in the previous tutorial (although for implementation reasons Interval is not derived from Range). When a Range is declared, the program must specify its rank (i.e., the number of dimensions it spans). The object's constructor parameters then specify the initial value of the sequence it represents, the upper bound on the sequence's value (or lower bound, if the stride is negative), and the actual stride. For example, the three sequences in the previous paragraph would be declared as:

```Range<1> first ( 1,  7,  2);
Range<1> second( 8, 30,  5);
Range<1> third (10,  0, -3);
```

Note that the range's bound does not have to be a value in the sequence: an upward range stops at the greatest sequence element less than or equal to the bound, while a downward range stops at the smallest sequence element greater than or equal to the bound. This conforms to the meaning of the Fortran 90 triplet notation.

It may seem redundant to define a separate class for Interval, since it is just a Range with a stride of 1. However, the use of an Interval is a signal that certain optimizations are possible during compilation that take advantage of Interval's unit stride. These optimizations cannot efficiently be deferred until the program is executing, since that would, in effect, require a conditional inside an inner loop. Another reason for making Interval and Range different classes is that Intervals can be used when declaring Array dimensions, but Ranges cannot, since Arrays must always have unit stride.

Engines

The previous tutorial said that the use of a non-scalar index as an array subscript selected a section from that array. The way this is implemented is tied into POOMA's notion of an engine. Arrays are just handles on engines, which are entities that give the appearance of managing actual data. Engines come in two types: storage engines, which actually contain data, and proxy engines, which can alias storage engines' data areas, calculate data values on demand, or do just about anything else in order to give the appearance there's an actual data area in there somewhere.

When an Array is declared, a storage engine is created to store that array's elements. When that array is subscripted with an Interval or a Range, the temporary Array that is created is bound to a view engine, which aliases the memory of the storage engine. Similarly, when an Array or ConstArray is passed by value to a function, the parameter is given a view engine, so that the values in the argument are aliased, rather than being copied. This happens in the calls to ApplyJacobi(), which is discussed below.

POOMA's engine-based architecture allows it to implement a number of useful tools efficiently. One of the simplest of these is the ConstantFunction engine, which provides a way to make a scalar behave like an array. For example, the following statements:

```ConstArray<1, double, ConstantFunction> c(10);
c.engine().setConstant(3.14);
```

produce a full-featured read-only array that returns 3.14 for all elements. This is more efficient and uses less storage than making a Brick array with constant values. Engines that select components from arrays of structured types, or present arrays whose values are calculated on the fly as simple functions of their indices, are discussed in Tutorial 4 and Tutorial 6.

Passing Arrays to Functions

Lines 9-17 of this program define a function that applies Jacobi relaxation to a specified subset of the elements of an array. The actual calculation appears identical to that seen in the previous tutorial. However, the function's parameter declarations specify that I and J are Range objects, instead of Intervals. This means that the set of elements being read or written is guaranteed to be regularly spaced, although the actual spacing is not known until the program is run.

Another new feature in this function declaration is the use of the class ConstArray. Declaring something to be of type ConstArray is not the same as declaring it to be a const Array. As mentioned earlier, POOMA's Array classes are handles on actual data storage areas. If something is declared to be a const Array, it cannot itself be modified, but the data it refers to can be. This is illustrated in line 16, which modifies the elements of V even though it is declared const. Put another way, the following is perfectly legal:

```Array<1> original(10);
const Array<1>& reference = original;
reference(4) = 3.14159;
```

If an immutable array is really desired, the program must use the class ConstArray. This class overloads the element access method operator() to return a constant reference to an underlying data element, rather than a mutable reference. As a result, the following code would fail to compile:

```Array<1> original(10);
ConstArray<1>& reference = original;
reference(4) = 3.14159;
```

since the assignment on its third line is attempting to overwrite a const reference. In fact, Array is derived from ConstArray by adding assignment and indexing that return mutable references. This allows an Array to be used as a ConstArray, but not vice versa. There is a subtle issue here though. One cannot initialize a ConstArray object with an Array object. The following code would fail to compile:

```Array<1> a(10);
ConstArray<1> ca(a);
```

This problem results from a design decision to allow a ConstArray to be constructed with an arbitrary domain:

```template<class Sub1>
ConstArray(const Sub1 & s1);
```

While an Array is a ConstArray, this function will be chosen by C++ compilers over the copy constructor because an exact match is preferred over a promotion to a base class. To avoid this problem, pass arrays by reference.

It is good programming practice to use ConstArray wherever possible, both because it documents the way the particular array is being used, and because it makes it harder (although not impossible) for functions to have inadvertent side effects.

It is important to note that the Range arguments to ApplyJacobi() must be defined as const references. The reason for this is that C++ does not allow programs to bind non-const references to temporary variables. For example, the following code is illegal:

```void fxn(int& i)
{
....
}

void caller()
{
int a = 5;
fxn(a + 3);
}
```

Similarly, when the main body of the relaxation program adds offsets to the Range objects I and J on lines 44, 46, and 47, the overloaded addition operator creates a temporary object. ApplyJacobi() must therefore declare its corresponding arguments to be const Range<1>&.

The bottom line is that if a routine can get a temporary object, arguments should be passed by value or by const reference. If there is no possibility of the routine getting a temporary, arguments can be declared to be non-const reference. For example:

```template<int D, class T, class E>
void f(const Array<D, T, E>& a);

template<int D, class T, class E>
void g(Array<D, T, E> a);

template<int D, class T, class E>
void h(Array<D, T, E>& a);

void example()
{
Interval<3> I(...);
Array<3> x(...);

f(x);                               // OK
g(x);                               // OK
h(x);                               // OK
f(x(I));                            // OK
g(x(I));                            // OK
h(x(I));                            // Bad, x(I) generates a temporary.
}
```

Note again that in the functions f(), g(), and h(), the array argument a can appear on the left hand side of an assignment. This is because Array is like an STL iterator: a const iterator or const Array can be dereferenced, it just can't be modified itself. If you want to ensure that the array itself can't be changed, use ConstArray.

Calling the Function

Lines 43-47 bring all of this together by passing the arrays V and b by value to ApplyJacobi(). The program makes four calls to this function; the first pair update the red array elements, while the second pair update the black array elements.

To see why two calls are needed to update each pair, consider the fact that each Range object specifies one half of the array's elements. The use of two orthogonal Ranges therefore specifies (1/2)2=1/4 of the array's elements. Simple counting rules of this kind are a useful check on the correctness of complicated subscript expressions.

As discussed above, each call to ApplyJacobi() constructs one temporary Array and one temporary ConstArray, each of which is bound to a view engine instead of a storage engine. Since these temporary objects are allocated automatically, they are also automatically destroyed when the function returns. POOMA uses reference counting to determine when the last handle on an actual area of array storage has been destroyed, and releases that area's memory at that time. Note that in this case, both arrays are bound to view engines, which do not have data storage areas of their own, so creating and destroying ApplyJacobi()'s arguments is very fast.

A Note on Expressions

As you may have guessed from the preceding discussion, POOMA expressions are first-class non-writable Arrays with an expression engine. As a consequence, expressions can be subscripted directly, as in:

```Array<1> a(Interval<1>(-4, 0)), b(5), c(5);
for (int i = 0; i < 5; i++)
c(i) = (a + 2.0 * b)(i);
```

This is equivalent, both semantically and in performance, to the loop:

```for (int i = 0; i < 5; i++)
c(i) = a(i - 4) + 2.0 * b(i);
```

Note that the offsetting of the non-zero-based arrays in expressions is handled automatically by POOMA.

POOMA also now includes a function called iota(), which allows applications to initialize array elements in parallel using expressions that depend on elements' indices. Instead of writing a sequential loop, such as:

```for (i = 0; i < n1; ++i)
{
for (j = 0; j < n2; ++j)
{
a(i,j) = sin(i)+j*5;
}
}
```

a program could simply use:

```a = sin(iota(n1,n2).comp(0)) + iota(n1,n2).comp(1)*5;
```

In general, iota(domain) returns an Array whose elements are vectors, such that iota(domain)(i,j) is Vector<2,int>(i,j). These values can be used in expressions, or stored in objects, as in:

```Iota<2>::Index_t I(iota(n1,n2).comp(0));
Iota<2>::Index_t J(iota(n1,n2).comp(1));
a = sin(I*0.2) + J*5;
```

Using Two-Dimensional Ranges

As a general rule, whenever a set of objects are always used together, they should be combined into a single larger structure. If we examine the example program shown at the start of this tutorial, we can see that the two Range objects used to subscript arrays along their first and second axes are created in the same place, passed as parameters to the same function, and always used as a pair. We could therefore improve this program by combining these two objects in some way.

In POOMA, that way is to use a 2-dimensional Interval or Range instead of a pair of 1-dimensional Intervals or Ranges. A 2-dimensional Interval is just the cross-product of its 1-dimensional constituents: it specifies a dense rectangular patch of an array. Similarly, a 2-dimensional Range is a generalization of the red or black squares on a checkerboard: the elements it specifies are regularly spaced, but need not have the same spacing along different axes.

An N-dimensionalInterval is declared in the same way as its 1-dimensional cousin. An N-dimensional Interval is usually initialized by giving its constructor N 1-dimensional Intervals as arguments, as shown in the following example:

```Interval<2> calc( Interval<1>(1, N), Interval<1>(1, N) );
```

Multi-dimensional POOMA arrays can be subscripted with any combination of 1-, 2-, and higher-dimensional indices, so long as the total dimensionality of those indices equals the dimension of the array. Thus, a 4-dimensional array can be subscripted using:

• four 1-dimensional indices
• a 2-dimensional index and a pair of 1-dimensional indices (in any order)
• a pair of 2-dimensional indices
• one 3-dimensional index and one 1-dimensional index (in any order); or
• a single 4-dimensional index.

If only a single array element is required, a new templated index class called Loc can be used as an index. Like other domain classes, this class can specify up to seven dimensions; unlike other domain classes, it only specifies a single location along each axis. Thus, the declaration:

```Loc<2> origin(0, 0);
```

specifies the origin of a grid, while the declaration:

```Loc<3> centerBottom(N/2, N/2, 0);
```

specifies the center of the bottom face of an N×N×N rectangular block. Loc objects are typically used to specify key points in an array, or as offsets for specifying shifted domains. The latter of these uses is shown in the function ApplyJacobi() in the program below (which is included in the release as examples/Solvers/RBJacobi). This program re-implements the red/black relaxation scheme introduced at the start of this tutorial using 2-dimensional subscripting:

```01  #include "Pooma/Arrays.h"
02
03  #include <iostream>
04
05  // The size of each side of the domain. Must be even.
06  const int N = 20;
07
08  // Apply a Jacobi iteration on the given domain.
09  void
10  ApplyJacobi(
11      const Array<2>       & V,       // to be relaxed
12      const ConstArray<2>  & b,       // fixed term
13      const Range<2>       & IJ       // region of calculation
14  ){
15    V(IJ) = 0.25 * (V(IJ+Loc<2>(1, 0)) + V(IJ+Loc<2>(-1,  0)) +
16                    V(IJ+Loc<2>(0, 1)) + V(IJ+Loc<2>( 0, -1)) - b(IJ));
17  }
18
19  int
20  main(
21      int                 argc,           // argument count
22      char*               argv[]          // argument vector
23  ){
24      // Initialize POOMA.
25      Pooma::initialize(argc, argv);
26
27      // The calculation domain.
28      Interval<2> calc( Interval<1>(1, N-2), Interval<1>(1, N-2) );
29
30      // The domain with guard elements on the boundary.
31      Interval<2> guarded( Interval<1>(0, N-1) , Interval<1>(0, N-1) );
32
33      // The array we'll be solving for.
34      Array<2> V(guarded);
35      V = 0.0;
36
37      // The right hand side of the equation.
38      Array<2> b(calc);
39      b = 0.0;
40      b(N/2, N/2) = -1.0;
41
42      // The interior domain, now with stride 2.
43      Range<2> IJ( Range<1>(1, N-3, 2), Range<1>(1, N-3, 2) );
44
45      // Iterate 100 times.
46      for (int i=0; i<100; ++i)
47      {
48          ApplyJacobi(V, b, IJ);
49          ApplyJacobi(V, b, IJ+Loc<2>(1, 1));
50          ApplyJacobi(V, b, IJ+Loc<2>(1, 0));
51          ApplyJacobi(V, b, IJ+Loc<2>(0, 1));
52      }
53
54      // Print out the result.
55      std::cout << V << std::endl;
56
57      // Clean up and report success
58      Pooma::finalize();
59      return 0;
60  }
```

The keys to this version of red/black relaxation are the Interval declarations on lines 28 and 31, and the array declarations on lines 34 and 38. The first Interval declaration defines the N-2 × N-2 region on which the calculation is actually done; the region defined by the second declaration pads the first with an extra column on each side, and an extra row on the top and the bottom. These extra elements are not part of the problem domain proper, but instead are used to ensure zero boundary conditions. Any other arbitrary boundary condition could be represented equally well by assigning values to these padding elements.

Using Interval objects that run from 1 to N-2 to specify the dimensions of the Interval object calc defined on line 28 means that when the array b is defined (line 38), its legal indices also run from 1 to N-2 along each axis. While POOMA uses 0..N-1 indexing by default, any array can have arbitrary lower and upper bounds along any axis, as this example shows. This is particularly useful when the natural representation for a problem uses a domain whose indices are in -N..N.

Note that line 31 could equally well have been written:

```Interval<2> guarded(N, N);
```

In other words, integers work inside of Domain declarations the same way they do in Array declarations. If a program needs to declare a point, it can use:

```Interval<2> x(Interval<1>(2, 2), Interval<1>(3, 2));
```

The declaration of calc on line 28 does need to be written as it is because the axes start at 1.

Examination of the update loop on lines 48-51, and the update assignment statement on lines 15-16, shows that the padding elements are never assigned to. Instead, the assignment on lines 15-16 only overwrites the interior of the array V. Note also that the domain used for the array b, which represents the fixed term in the Laplace equation, is only defined on the inner N-2 × N-2 domain. While the memory this saves is inconsequential in this 20×20 case, the savings grow quickly as the size and dimension of the problems being tackled increase.

Periodic Boundary Conditions

Our last look at red/black updating replaces the zero boundary condition of the previous examples with periodic boundaries in both directions. As is usual in programs of this kind, this is implemented by copying the values on one edge of the array into the padding elements next to the array's opposite edge after each relaxation iteration. For example, the padding elements to the right of the last column of the array are filled with the values from the first actual column of the array, and so on. In the program shown below (included in the release as examples/Solvers/PeriodicJacobi), the "actual" values of the array V are stored in the region [1..N]×[1..N]. Elements with an index of either 0 or N+1 on either axis are padding, and are to be overwritten during each iteration.

The function that actually updates the periodic boundary conditions is called ApplyPeriodic(), and is shown on lines 20-33 below. The key to understanding this code is that when a "naked" integer is used to subscript a POOMA array, the result of that subscripting operation is reduced by one dimension in relation to that of the subscripted array. Thus, if a 2-dimensional array is subscripted using two specific integers, the result is a scalar value; if that same array is subscripted using an integer and a Interval or Range, the result is a 1-dimensional array.

Note that subscripting an Array with a Loc<2> yields a single scalar value, just as subscripting with two integers does, while subscripting with an Interval or Range that happens to refer to just one point yields an Array with just one element. There isn't a zero-dimensional Array (at least not in this release of POOMA), which is what the Loc<2> would have returned. The reduction in rank has to come from compile-time information, so Loc and integers reduce dimensionality, but Interval and Range do not.

```01  #include "Pooma/Arrays.h"
02
03  #include <iostream>
04
05  // The size of each side of the domain. Must be even.
06  const int N = 18;
07
08  // Apply a Jacobi iteration on the given domain.
09  void
10  ApplyJacobi(
11      const Array<2>       & V,             // to be relaxed
12      const ConstArray<2>  & b,             // fixed term
13      const Range<2>       & IJ             // region of calculation
14  ){
15      V(IJ) = 0.25 * (V(IJ+Loc<2>(1,0)) + V(IJ+Loc<2>(-1,0)) +
16                      V(IJ+Loc<2>(0,1)) + V(IJ+Loc<2>(0,-1)) - b(IJ));
17  }
18
19  // Apply periodic boundary conditions by copying each slice in turn.
20  void
21  ApplyPeriodic(
22      const Array<2>       & V               // to be wrapped
23  ){
24      // Get the horizontal and vertical extents of the domain.
25      Interval<1> I = V.domain()[0],
26                  J = V.domain()[1];
27
28      // Copy each of the four slices in turn.
29      V(0,    J) = V(N, J);
30      V(N+1,  J) = V(1, J);
31      V(I,    0) = V(I, N);
32      V(I,  N+1) = V(I, 1);
33  }
34
35  int main(
36      int                 argc,           // argument count
37      char*               argv[]          // argument vector
38  ){
39      // Initialize POOMA.
40      Pooma::initialize(argc, argv);
41
42      // The calculation domain.
43      Interval<2> calc( Interval<1>(1, N), Interval<1>(1, N) );
44
45      // The domain with guard elements on the boundary.
46      Interval<2> guarded( Interval<1>(0, N+1), Interval<1>(0, N+1) );
47
48      // The array we'll be solving for.
49      Array<2> V(guarded);
50      V = 0.0;
51
52      // The right hand side of the equation.
53      Array<2> b(calc);
54      b = 0.0;
55      b(3*N/4,   N/4) = -1.0;
56      b(  N/4, 3*N/4) =  1.0;
57
58      // The interior domain, now with stride 2.
59      Range<2> IJ( Range<1>(1, N-1, 2), Range<1>(1, N-1, 2) );
60
61      // Iterate 200 times.
62      for (int i=0; i<200; ++i)
63      {
64          ApplyJacobi(V, b, IJ);
65          ApplyJacobi(V, b, IJ+Loc<2>(1,0));
66          ApplyJacobi(V, b, IJ+Loc<2>(0,1));
67          ApplyJacobi(V, b, IJ+Loc<2>(1,1));
68          ApplyPeriodic(V);
69      }
70
71      // Print out the result.
72      std::cout << V << std::endl;
73
74      // Clean up and report success.
75      Pooma::finalize();
76      return 0;
77  }
```

Note that, as we shall see in the next tutorial, the body of ApplyPeriodic() could more generally be written:

```29  V(I.first(), J)         = V(I.last()-1,  J);
30  V(I.last(),  J)         = V(I.first()+1, J);
31  V(I,         J.first()) = V(I,           J.last()-1);
32  V(I,         J.last())  = V(I,           J.first()+1);
```

Operations and Their Results

One of the primary features of the POOMA array concept is the notion that "everything is an Array". For example, if you take a view of an Array, the result is a full-featured array. If you add two Arrays together, the result is an Array. The table below illustrates this, using the declarations:

 Array<2,Vector<2>> a Array<2> b Interval<2> I Interval<1> J Range<2> R