Lead Developers: Miguel Sousa Diogo, TBD

The ultimate goal is automatic heterogeneous computation, in which a host and an accelerator cooperate in computing with-loops. For now this would be achieved by essentially merging the MT and CUDA backends of the SaC compiler, so at the moment one can think of the host as a multicore machine and the accelerator as an NVIDIA graphics card.

Several issues follow from this. CUDA kernels are unsuited for MT execution, and vice versa, so each with-loop needs a version for both. Dividing with-loops between host and device leaves computation results distributed across device and host memory. The available SaC primitives can only copy whole arrays, and are thus unsuitable for moving partial results. Literature suggests that dynamic scheduling is more effective on heterogeneous systems, but with dynamic scheduling we can no longer define all data transfers statically, as the CUDA backend does. Memory transfers between host and device are expensive, so we want as few of them as possible, and each as small as possible.
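To illustrate the dynamic scheduling idea, here is a minimal sketch in C, not the SaC runtime's actual scheduler. It assumes the iteration space of a with-loop is flattened to the range [0, total), and that each worker (an MT thread or the thread driving CUDA) repeatedly claims a chunk from a shared counter. The names `sched_t`, `chunk_t`, `sched_init` and `sched_next` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scheduler state for one with-loop over indices [0, total). */
typedef struct {
    atomic_size_t next;  /* first index not yet handed out */
    size_t total;        /* size of the flattened iteration space */
} sched_t;

/* Half-open index range [lo, hi) assigned to one worker. */
typedef struct {
    size_t lo, hi;
} chunk_t;

static void sched_init(sched_t *s, size_t total) {
    atomic_init(&s->next, 0);
    s->total = total;
}

/* Claim up to `want` iterations. Returns false once the range is exhausted.
 * Each worker may ask for a different chunk size, e.g. larger for the GPU. */
static bool sched_next(sched_t *s, size_t want, chunk_t *out) {
    assert(want > 0);
    size_t lo = atomic_fetch_add(&s->next, want);
    if (lo >= s->total) {
        return false;
    }
    out->lo = lo;
    /* Clamp the last chunk to the end of the iteration space. */
    out->hi = (s->total - lo < want) ? s->total : lo + want;
    return true;
}
```

Because chunks are only assigned at run time, which side computes which part of the result is not known statically, which is why the data transfers cannot be fixed at compile time either.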

Current Status: As of 29/03/2012, generating CUDA and MT versions of with-loops is implemented by duplicating the with-loops early. The copy is unmarked for CUDA, and the two are placed inside a conditional, each on a different branch. The condition itself is a dummy for now. The regular transformations of the CUDA and MT backends then operate on their own separate copies, resulting in two different versions of the same with-loop.

Needed Work:

  • Implement distributed variables and their control structure. For simplicity, arrays are to be split into blocks of some fixed size, with each block tracked separately using a simple MSI (Modified/Shared/Invalid) scheme.
  • Implement conversion functions for distributed variables.
  • Introduce conversion functions before array accesses, and after writes to arrays.
  • Assign a worker thread to manage CUDA.
  • Place whole CUDA/host with-loop conditionals inside SPMD functions.
  • Change dummy condition to a check for the CUDA management thread.
  • Place the dynamic scheduler loop inside each branch, encapsulating the with-loop and required conversion functions.
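The block-wise MSI tracking and the conversion functions from the list above could look roughly like the following C sketch. It rests on several assumptions that do not come from the compiler: the block size `BLOCK_ELEMS`, the limit `MAX_BLOCKS`, and all type and function names are hypothetical, plain host buffers stand in for device memory, and `memcpy` stands in for the real host/device transfer. It also assumes a writer either overwrites a whole block or has read it first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_ELEMS 4   /* hypothetical fixed block size, in elements */
#define MAX_BLOCKS  64  /* hypothetical upper bound on blocks per array */

typedef enum { INVALID, SHARED, MODIFIED } msi_t;
typedef enum { HOST = 0, DEVICE = 1 } side_t;

/* Control structure of one distributed array. */
typedef struct {
    size_t nblocks;
    msi_t state[2][MAX_BLOCKS];  /* state[side][block] */
    double *mem[2];              /* mem[HOST], mem[DEVICE] (stand-in buffers) */
    size_t transfers;            /* number of block copies performed */
} dist_var_t;

static side_t other_side(side_t s) {
    return s == HOST ? DEVICE : HOST;
}

/* All data initially lives on the host; the device copy is invalid. */
static void dist_init(dist_var_t *v, double *host, double *dev, size_t n) {
    assert(n % BLOCK_ELEMS == 0 && n / BLOCK_ELEMS <= MAX_BLOCKS);
    v->nblocks = n / BLOCK_ELEMS;
    v->mem[HOST] = host;
    v->mem[DEVICE] = dev;
    v->transfers = 0;
    for (size_t b = 0; b < v->nblocks; b++) {
        v->state[HOST][b] = MODIFIED;
        v->state[DEVICE][b] = INVALID;
    }
}

/* Conversion before a read: fetch the block only if the local copy is invalid. */
static void before_read(dist_var_t *v, side_t side, size_t b) {
    assert(b < v->nblocks);
    if (v->state[side][b] != INVALID) {
        return;  /* local copy is up to date, no transfer needed */
    }
    side_t o = other_side(side);
    assert(v->state[o][b] != INVALID);  /* some side must hold valid data */
    memcpy(v->mem[side] + b * BLOCK_ELEMS, v->mem[o] + b * BLOCK_ELEMS,
           BLOCK_ELEMS * sizeof(double));
    v->state[side][b] = SHARED;
    v->state[o][b] = SHARED;
    v->transfers++;
}

/* Conversion after a write: the local copy becomes the only valid one. */
static void after_write(dist_var_t *v, side_t side, size_t b) {
    assert(b < v->nblocks);
    v->state[side][b] = MODIFIED;
    v->state[other_side(side)][b] = INVALID;
}
```

In this sketch a block is transferred only when a side reads it while holding an invalid copy, so repeated reads on the same side cost nothing extra and untouched blocks never move.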

For more information, see the paper in the tex repository under projects/cudahybrid/tfp12.