# Implementation

The kernel of the PDE1 benchmark is a stencil operation, which is iteratively applied to a 3-dimensional grid for a given number of times. The stencil operation itself is rather trivial: for each inner grid element at some index position *$(i,j,k)$*, the values of the 6 directly adjacent elements are summed up, are added to a fixed number $h^2f_{i,j,k}$ and are subsequently multiplied by a constant factor. However, in the case of red/black relaxation, this stencil operation is not applied to all inner grid elements in a single step, but is restricted to two disjoint subsets in two consecutive steps. These sets are called the *red* set and the *black* set. For the PDE1 benchmark, the red set consists of those elements with an even index in the outermost dimension whereas the black set refers to the remaining elements with an odd index.

DO NREL=1,ITER ! ! RELAXATION OF THE RED POINTS ! WHERE(RED(2:NX-1,2:NY-1,2:NZ-1)) U1(2:NX-1,2:NY-1,2:NZ-1) = & & FACTOR*(HSQ*F(2:NX-1,2:NY-1,2:NZ-1)+ & & U(1:NX-2,2:NY-1,2:NZ-1)+U(3:NX,2:NY-1,2:NZ-1)+ & & U(2:NX-1,1:NY-2,2:NZ-1)+U(2:NX-1,3:NY,2:NZ-1)+ & & U(2:NX-1,2:NY-1,1:NZ-2)+U(2:NX-1,2:NY-1,3:NZ)) END WHERE WHERE(RED(2:NX-1,2:NY-1,2:NZ-1)) U (2:NX-1,2:NY-1,2:NZ-1) = U1 (2:NX-1,2:NY-1,2:NZ-1) END WHERE ! ! RELAXATION OF THE BLACK POINTS ! WHERE(.not. RED(2:NX-1,2:NY-1,2:NZ-1)) U1(2:NX-1,2:NY-1,2:NZ-1) = & & FACTOR*(HSQ*F(2:NX-1,2:NY-1,2:NZ-1)+ & & U(1:NX-2,2:NY-1,2:NZ-1)+U(3:NX,2:NY-1,2:NZ-1)+ & & U(2:NX-1,1:NY-2,2:NZ-1)+U(2:NX-1,3:NY,2:NZ-1)+ & & U(2:NX-1,2:NY-1,1:NZ-2)+U(2:NX-1,2:NY-1,3:NZ)) END WHERE WHERE(.not. RED(2:NX-1,2:NY-1,2:NZ-1)) U (2:NX-1,2:NY-1,2:NZ-1) = U1 (2:NX-1,2:NY-1,2:NZ-1) END WHERE ENDDO

The source code shows the relevant part of the HPF reference implementation, where NX, NY, and NZ denote the extents of the grid in the three dimensions and U represents the grid itself. The relaxation step is specified by aggregate array operations using the HPF triple notation. The restriction of the stencil operation to either the red set or the black set of elements is realized by embedding it within a where-construct. An additional 3-dimensional constant array of booleans called RED serves as a mask for the selection of either red or black elements. However, two where-constructs are needed for specifying each of the two relaxation steps. In fact, re-computed grid elements are temporarily stored in an auxiliary array U1 and, hence, must be copied back to U after each relaxation operation.

This HPF implementation of PDE1 can be carried over to SAC more or less straightforwardly. The stencil operation may be implemented in SAC by means of a single WITH-loop, as illustrated by the definition of the function Relax. It is specified from the perspective of a single grid element at some index position iv and then mapped to all inner grid elements by means of the WITH-loop generator.

double[.,.,.] Relax( double factor, double hsq, double[.,.,.] f, double[.,.,.] u) { u1 = with (. < iv < .) { sum = u[iv+[1,0,0]] + u[iv-[1,0,0]] + u[iv+[0,1,0]] + u[iv-[0,1,0]] + u[iv+[0,0,1]] + u[iv-[0,0,1]]; val = factor * (hsq * f[iv] + sum); } modarray (u, iv, val); return( u1); }

However, this implementation of the relaxation step has a drawback just as the HPF code: it is tailor-made for the given stencil and for the given problem dimensionality. Although stencil operations are fairly common, these restrictions make opportunities for reusing the given implementation in contexts other than the PDE1 benchmark rather unlikely. With the explicit specification of constant offset vectors this SAC implementation does not significantly exceed the level of abstraction achieved by using the triple notation in HPF.

double[+] Relax2( double factor, double hsq, double[+] f, double[+] u, double[+] weights) { u1 = with (. < iv < .) { blk = tile( shape(weights), iv-1, u); sum = sum( weights * blk); val = factor * (hsq * f[iv] + sum); } modarray( u, iv, val); return( u1); }

To raise the level of abstraction, and thus to improve opportunities for code reuse, it is desirable to abstract as far as possible from the problem specific part of the relaxation operation, in particular from dimensionality and from stencil layout. This can be achieved by providing the function Relax with an additional parameter which yields an abstract specification of the stencil by means of an array of weights. This alternative implementation is shown above. For the specific needs of the PDE1 benchmark, weights is an array of shape [3,3,3] with all elements being set to 0 except for the six elements directly adjacent to the central element, which are set to 1. In SAC, this can be specified as

weights = reshape( [3,3,3], [ 0d,0d,0d, 0d,1d,0d, 0d,0d,0d, 0d,1d,0d, 1d,0d,1d, 0d,1d,0d, 0d,0d,0d, 0d,1d,0d, 0d,0d,0d ]);

As shown the function Relax2, extracts a subarray blk from the argument array u, whose shape is identical to that of the given array of weights. This is done by means of the library function tile( *shp, offset, array*), which creates an array of shape *shp*, whose elements are taken from *array* starting at index position *offset*. The computation of the weighted sum of adjacent elements thus turns into sum( weights * block), where ( *array * array* ) refers to an element-wise multiplication of *arrays*, and sum( *array*) sums up all elements of array. As outlined in the next example, these library functions can all be implemented by a single WITH-loop each.

double[+] tile( int[.] shp, int[.] offset, double[+] array) { blk = with (. <= iv <= .) genarray( shp, array[pos+iv]); return( blk); } double sum( double[+] array) { s = with (0*shape(array) <= iv < shape(array)) fold( +, 0d, array[iv]); return( s); } double[+] * (double[+] a, double[+] b) { m = with (. <= iv <= .) genarray( min( shape(a), shape(b)), a[iv] * b[iv]); return( m); }

Abstracting from a concrete stencil actually has two benefits. On the one hand, the resulting specification supports different stencil layouts without altering the definition of Relax2. On the other hand, the function Relax2 may be applied to arrays of any dimensionality, again without modifying its implementation. This is reflected by changing the respective type declarations from double[.,.,.] to double[+] (see Representation of Arrays for a discussion of SAC array types).

Nevertheless, the specification of Relax2 still is specific to the PDE1 benchmark as it explicitly implements the arithmetic operations involving factor, hsq, and f on an element-wise basis. However, to improve opportunities for code reuse, it is desirable to clearly separate application specific code from more generally applicable code. This is achieved by the definition of the function RelaxPDE1, as shown below. Whereas RelaxPDE1 itself contains the benchmark specific part of the relaxation step, the more general relaxation kernel of summing up some adjacent array elements based on an array of weights is isolated within a separate function definition.

double[+] RelaxPDE1( double factor, double hsq, double[+] f, double[+] u, double[+] weights) { rel = factor * (hsq * f + RelaxKernel( u, weights)); rel = CombineInnerOuter( rel, u); return( rel); } double[+] RelaxKernel( double[+] u, double[+] weights) { rel = with (. < iv < .) { blk = tile( shape(weights), iv-1, u); sum = sum( weights * blk); } modarray( u, iv, sum); return( rel); } double[+] CombineInnerOuter( double[+] inner, double[+] outer) { res = with (. < iv < .) modarray( outer, iv, inner[iv]); return( res); }

Rather than being specified element-wise as in preceding implementations of the PDE1 relaxation step, the arithmetic operations involving factor, hsq, and f are realized by means of the respective aggregate array operations, which are provided by the SAC array library. However, they effect all elements of the involved arrays in a uniform way rather than being limited to the inner ones, as is required by PDE1 to keep boundary elements unchanged. Therefore, the additional function CombineInnerOuter is employed for their subsequent correction. As shown above, this function takes two identically shaped arrays inner and outer as arguments and creates a new array whose inner elements are taken from inner whereas its boundary elements are taken from outer. Similar to the implementation of RelaxKernel, the function CombineInnerOuter is completely unspecific to the PDE1 benchmark and may be reused in any suitable context.

With an appropriate implementation of the PDE1 relaxation step at hand we may now focus on its extension to red/black relaxation. Actually, the effect of relaxation is to be restricted to two disjoint subsets of grid elements, i.e. to the red set and to the black set, as discussed in the beginning of this section. In analogy to the HPF implementation they are represented by a single boolean array red, which is initialized according to the specifications of the PDE1 benchmark. Whereas in HPF the restriction of the stencil operation to either the red or to the black elements is specified by means of the built-in where-construct, exactly the same functionality may be provided in SAC by means of a function where, as defined below. Being a generalization of the previously defined function CombineInnerOuter, it takes as arguments a boolean array mask and two arrays a and b. Assuming that all three arguments are of equal shapes, it creates a new array whose elements are taken either from a or from b, depending on the respective value of mask.

double[+] where( bool[+] mask, double[+] a, double[+] b) { c = with (. <= iv <= .) genarray( shape(mask), mask[iv] ? a[iv] : b[iv]); return( c); } double[+] RedBlack( double factor, double hsq, double[+] f, double[+] u, double[+] weights, bool[+] red, int iter) { for (nrel=1; nrel<=iter; nrel+=1) { u = where( red, RelaxPDE1( factor, hsq, f, u, weights), u); u = where( !red, RelaxPDE1( factor, hsq, f, u, weights), u); } return( u); }

Based on this implementation of where, red/black successive over-relaxation may be specified as shown. Note that the resulting function RedBlack, which basically implements the PDE1 benchmark, is invariant against the problem dimensionality as well as against variations of the stencil layout. In its definition, the black set is referred to by !red, i.e., by using the element-wise extension of the negation operator, as defined in the SAC array library.