Indirect Addressing

**Contents:**

Introduction

Notation

Example

Summary

Indirect addressing is a fundamental operation in many numerical and scientific algorithms. Instead of accessing array elements with loop indices directly, indirect addressing uses the element of one array (sometimes called an index table or indirection table) as indices for another array. These index tables can store either static information (such as the neighbors of points in an unstructured mesh), or dynamic information (such as the sorting order of the elements in a vector).

This tutorial shows how to perform indirect addressing in POOMA, and discusses some of the subtleties and complexities that arise when indirection and multithreading are combined.

Suppose that the array `X` contains the following values:

a | b | c | d |

while the array `J` contains:

3 | 0 | 1 | 2 |

A program could re-order the elements of `X` while copying
them into another array `Y` using the following loop:

for (int i=0; i<4; ++i) { Y = X(J(i)); }

The effect of this would be to fill `Y` with the following
values:

d | a | b | c |

POOMA allows this operation to be expressed more
economically, simply by using the array `J` as a subscript on
the array `X` directly:

Y = X(J);

Indirect addressing on the source side of an assignment is sometimes called "pull" addressing, since the index array's values are being used to "pull" values from the source into the destination. POOMA also supports "push" addressing, in which the index array is used on the destination side of the assignment. The syntax for this is simply:

Y(J) = X;

which is equivalent to the loop:

for (int i=0; i<4; ++i) { Y(J(i)) = X; }

This operation would fill `Y` with:

b | c | d | a |

So long as `J` is a strict permutation of the indices
*0...N-1*, this will have the same effect as the loop shown
above. The effects of this statement if `J` has repeated or
missing elements are discussed in the next section.

One-dimensional arrays of integers can be used as subscripts to
one-dimensional arrays of any other type, but how are
multi-dimensional arrays to be subscripted? In POOMA, the
answer is to use arrays whose elements are of type
`Loc`. An `Array<Loc<2>>`, for
example, can be used to re-order the elements of a two-dimensional
array, since each element of the index array can act as a coordinate
in the data array. Similarly, an `Array<Loc<3>>`
can be used to subscript a 3-dimensional array of any type. Future
releases of POOMA will support higher-dimensional indirect
addressing as well.

Indirect addressing is a very powerful tool, but must be used carefully. The most important consideration is that the order of data movement during indirection is not defined. If indirection is performed using an index table that sends many values to a single location, for example, then there is no way to predict which of those values will be written into that location.

However, indirect addressing can always be used to *read*
values safely, and to thereby perform a scatter operation.
Suppose a source array `S` contains the values
`[3.14, 2.71]`, while an index array `IA` contains
the values [0,1,0,0,1,1]. The expression `S(IA)` yields:

[3.14, 2.71, 3.14, 3.14, 2.71, 2.71]

and can always be used safely on the right-hand side of an
expression. This works because the domain of `a(b)` is the
domain of `b`. In expressions like
`a(b) = c`, the domains of `c` and `b`
have to match, but the domain of `a` can be arbitrary.

The example for this tutorial is a 1-dimensional Fast Fourier Transform (FFT) that shuffles data using indirect addressing. This FFT implementation is not efficient---it recalculates trigonometric constants repeatedly, for example, rather than pre-calculating and storing them---but it does illustrate the power of indirection.

The source for this example is included in the release in the file
`examples/Indirection/FFT/FFT.cpp`. The main body of this
program initializes POOMA, creates and fills an array of complex
values, transforms it, and prints the result of that transform, as
shown below:

138 int main(int argc, char* argv[]) 139 { 140 Pooma::initialize(argc, argv); 141 142 int size = 16; 143 144 Array<1, complex<double>, Brick> a(size); 145 146 int i; 147 for (i = 0; i < size; ++i) 148 { 149 a(i) = sin(4*i*Pi/size); 150 } 151 152 std::cout << a << std::endl; 153 154 fft(a); 155 156 std::cout << a << std::endl; 157 158 Pooma::finalize(); 159 return 0; 160 }

The key statement is on line 154, where the `fft()`
function is invoked. The program contains two overloaded versions of
this function. The first version, shown below, determines the level
of the FFT (i.e. the number of recursive steps the calculation
requires), then invokes the second version, which actually performs
the calculations. (Note that in this simple example, the input
array's length is required to be a power of two. Also, `Pi`
is defined as a `static const double` equal to the value of
pi computed from the expression `acos(-1.0)` because some
compilers do not define mathematical constants such as `M_PI`
in the <math.h> header file.)

117 void fft(const Array<1, complex<double>, Brick> &array) 118 { 119 int size = array.domain().size(); 120 121 //Determine size as power of 2122 int level = -1; 123 while (size > 1) 124 { 125 PAssert(!(size & 1)); 126 ++level; 127 size /= 2; 128 } 129 130 if (level >= 0) 131 fft(array, level); 132 }

The second version of `fft()` does the real
number-crunching. If the computation has reached its final stage, odd
and even elements are combined directly (lines 106-111). If the
computation is still recurring, the elements are shuffled, a
half-sized transform is applied on each subsection, and the results
are combined (lines 100-102). All of these operations use indirect
addressing to move data values around. Most of the rest of the
program can be viewed as infrastructure needed to make this data
movement simple and efficient.

083 void fft(const Array<1,complex<double>,Brick> &array, int level) 084 { 085 Interval<1> domain = array.domain(); 086 087 if (level > 0) 088 { 089 ConstArray<1, int, IndexFunction<LeftMap> > left(domain); 090 ConstArray<1, int, IndexFunction<RightMap> > right(domain); 091 ConstArray<1, int, IndexFunction<ShuffleMap> > shuffle(domain); 092 ConstArray<1, complex<double>, IndexFunction<TrigFactor> > trig(domain); 093 094 left.engine().setFunctor(LeftMap(level)); 095 right.engine().setFunctor(RightMap(level)); 096 shuffle.engine().setFunctor(ShuffleMap(level)); 097 trig.engine().setFunctor(TrigFactor(level)); 098 099 //Shuffle values, compute n/2 length ffts, combine results.100 array = array(shuffle); 101 fft(array, level-1); 102 array = array(left) + trig*array(right); 103 } 104 else 105 { 106 int size = domain.size(); 107 Range<1> left (0, size-2, 2), 108 right(1, size-1, 2); 109 110 array(left) += array(right); 111 array(right) = array(left) - 2.0 * array(right); 112 } 113 }

The shuffling step on line 100 uses an indirection array
called `shuffle` to pull values into the right positions. This
array, which is declared on line 91, is a
`ConstArray` of integers. Instead of storing the
values, however, the array calculates them on the fly using an
`IndexFunction` engine, which is bound to the array on
line 96. The `IndexFunction` engine works as one would
expect: having been specialized with a user-defined class with an
overloaded `operator()`, the engine transforms an index
`i` into some new value by calling that `operator()`.
In this case, the specializing class is `ShuffleMap`, which is
shown below:

003 struct ShuffleMap 004 { 005 ShuffleMap(int n = 0) 006 : degree_m(n) 007 { 008 nbit_m = 1 << n; 009 mask1_m = nbit_m - 1; 010 mask2_m = ~(nbit_m | mask1_m); 011 } 012 013 int operator()(int i) const 014 { 015 return 016 (mask2_m & i) 017 | ( (mask1_m & i) << 1 ) 018 | ( (nbit_m & i) ? 1 : 0 ); 019 } 020 021 int nbit_m, mask1_m, mask2_m, degree_m; 022 };

Similar engines are used to select and combine elements of the
arrays after the sub-FFTs have been performed. These use the
overloaded `operator()` in the classes `LeftMap` and
`RightMap`, shown below:

029 struct LeftMap 030 { 031 LeftMap(int n = 0) 032 : nbit_m(~(1 << n)) 033 { } 034 035 int operator()(int i) const 036 { 037 return (nbit_m & i); 038 } 039 040 int nbit_m; 041 }; 042 043 struct RightMap 044 { 045 RightMap(int n = 0) 046 : nbit_m(1 << n) 047 { } 048 049 int operator()(int i) const 050 { 051 return (nbit_m | i); 052 } 053 054 int nbit_m; 055 };

Finally, an `IndexFunction` engine is also used to calculate
the trigonometric weights used in combining. This
`IndexFunction` is an extreme example of trading time for
space: it does not store anything, but repeatedly recalculates factors
on demand.

065 struct TrigFactor 066 { 067 TrigFactor(int n = 0) 068 : n_m(1 << n) 069 { } 070 071 complex<double> operator()(int i) const 072 { 073 return complex<double>(cos(Pi*i/n_m), sin(Pi*i/n_m)); 074 } 075 076 int n_m; 077 };

Efficient support for indirect addressing---the use of the values
in one array to change the way another array's elements are
accessed---is one of the features that characterizes full-strength
numerical libraries. This release of POOMA supports indirect
addressing in both "push" and "pull" modes using conventional
data-parallel syntax, without compromising the performance of regular
index operations.

[Prev] | [Home] | [Next] |