HPC Domain Decomposition GPU Acceleration OpenMP ASIC FPGA Cuda Optimizing x86 code LLVM MCA MPI Bindings CUDA Direct Thrust