This work presents a framework for the automatic generation of code for medical imaging algorithms on GPU accelerators. A domain-specific language (DSL) decouples the algorithm from its schedule. Together with domain knowledge and an architecture model, this decoupling enables an efficient mapping of the algorithm to the deep memory hierarchy found in today's GPUs. Tailored code variants are generated for different target architectures, significantly improving programmer productivity. The generated low-level CUDA and OpenCL implementations achieve performance competitive with hand-written code on GPU accelerators from NVIDIA and AMD while preserving high portability.
Richard Membarth