Lead Developers: Miguel Sousa Diogo, TBD
The ultimate goal is automatic heterogeneous computation, in which a host and an accelerator cooperate in computing with-loops. Currently this would be achieved by essentially merging the MT and CUDA backends of the SaC compiler, so for now the host can be thought of as a multicore machine and the accelerator as an Nvidia graphics card.

Several issues make this merge non-trivial. Code generated as CUDA kernels is unsuited for the MT backend, and vice versa. Dividing a with-loop between host and device leaves its results distributed across host and device memory, and the available SaC primitives, which can only copy whole arrays, are unsuitable for moving such partial results. Literature suggests that dynamic scheduling is more effective on heterogeneous systems, but with dynamic scheduling the data transfers can no longer all be defined statically, as they are in the CUDA backend. Since memory transfers are expensive, we want them to be as few and as short as possible.
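Because dynamic scheduling decides at run time which side computes each part of the iteration space, the runtime has to record where each part's results live and derive the transfers from that record. The following is a minimal Python simulation of that idea, not the SaC implementation. It assumes a one-dimensional with-loop split into fixed-size chunks and fixed per-chunk costs for host and device; the names `Chunk`, `schedule`, and `transfers_to` are hypothetical. Merging adjacent ranges illustrates one way to keep transfers few: it reduces the number of copies without increasing the amount of data moved.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    start: int  # first index of the chunk (inclusive)
    stop: int   # last index of the chunk (exclusive)
    owner: str  # hypothetical: "host" or "device", the memory holding this chunk's results


def schedule(n, chunk_size, cost_per_chunk):
    """Simulate dynamic scheduling over the index range [0, n).

    Whichever worker becomes free first takes the next chunk; ties are broken
    by worker name. cost_per_chunk maps a worker name to its (assumed) time per chunk.
    """
    free_at = {worker: 0.0 for worker in cost_per_chunk}
    chunks = []
    for start in range(0, n, chunk_size):
        worker = min(free_at, key=lambda w: (free_at[w], w))
        free_at[worker] += cost_per_chunk[worker]
        chunks.append(Chunk(start, min(start + chunk_size, n), worker))
    return chunks


def transfers_to(chunks, target):
    """Return the index ranges that must be copied into `target` memory.

    Chunks are assumed to be in ascending index order; ranges that are adjacent
    in index space are merged into a single transfer.
    """
    ranges = []
    for c in chunks:
        if c.owner == target:
            continue  # already resident in target memory, no copy needed
        if ranges and ranges[-1][1] == c.start:
            ranges[-1] = (ranges[-1][0], c.stop)
        else:
            ranges.append((c.start, c.stop))
    return ranges


chunks = schedule(10, 2, {"host": 3.0, "device": 1.0})
print([c.owner for c in chunks])
print(transfers_to(chunks, "host"))
```

With a device three times faster than the host, the device computes four of the five chunks, so gathering the full result on the host takes two merged transfers, (0, 2) and (4, 10), instead of four per-chunk copies.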
Current Status: As of 29/03/2012, generating CUDA and MT versions of a with-loop is implemented by duplicating the with-loop early in compilation. The copy is unmarked for CUDA, and the two versions are placed on different branches of a conditional. The condition itself is a dummy for now. The regular transformations of the CUDA and MT backends then operate on their own separate copies, producing two different versions of the same with-loop.
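The duplication step can be pictured as a small tree rewrite. This Python sketch uses hypothetical node classes (`WithLoop`, `Cond`) and a hypothetical `cuda` flag standing in for the compiler's CUDA marking; it is not the SaC compiler's actual AST or API. Which branch holds which version is an assumption made here for illustration.

```python
from copy import deepcopy
from dataclasses import dataclass


@dataclass
class WithLoop:
    body: str
    cuda: bool = False  # hypothetical flag: True if marked for the CUDA backend


@dataclass
class Cond:
    pred: str
    then_branch: WithLoop
    else_branch: WithLoop


def duplicate_for_hybrid(wl):
    """Duplicate a CUDA-marked with-loop and wrap both versions in a conditional.

    The copy has its CUDA marking removed, so later phases treat it as an MT
    with-loop. The predicate is a placeholder, like the dummy condition in the
    current implementation.
    """
    copy = deepcopy(wl)
    copy.cuda = False
    return Cond("dummy", then_branch=wl, else_branch=copy)


def backend_for(wl):
    """Hypothetical dispatch: which backend's transformations process this copy."""
    return "cuda" if wl.cuda else "mt"


cond = duplicate_for_hybrid(WithLoop("a + b", cuda=True))
print(backend_for(cond.then_branch), backend_for(cond.else_branch))
```

Because the copy is a deep copy, each backend can transform its own version without affecting the other.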
For more information, see the paper in the tex repository under projects/cudahybrid/tfp12.