Thrust
Introduction
Thrust is a C++ library that can use CUDA or OpenMP as its backend. Thrust is intended to increase productivity compared to writing raw CUDA, and it is possible to interoperate with CUDA directly.
Overview
Data representation
Data on the host is represented as host_vectors and data on the device as device_vectors. Data transfer from host to device and from device to host can be done directly through assignment.
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <cstdlib>

int N = 10;
thrust::host_vector<float> arr(N);
// Fill the host vector with random values
thrust::generate(arr.begin(), arr.end(), rand);
thrust::device_vector<float> arr_d(N);
// Data transfer from host to device
arr_d = arr;
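The device-to-host direction mentioned above works the same way; continuing the snippet, an assignment in the other direction copies the data back:
// Data transfer from device to host, again by assignment
arr = arr_d;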
CUDA Device Memory Interoperation
Memory allocated with the CUDA runtime can be converted into Thrust device pointers and used in Thrust APIs.
#include <thrust/device_ptr.h>
#include <cuda_runtime.h>

float* ptr;
cudaMalloc(&ptr, sizeof(float) * 1000);
// Wrap the raw CUDA pointer so Thrust can use it
thrust::device_ptr<float> d_ptr = thrust::device_pointer_cast(ptr);
d_ptr[0] = 1;
cudaFree(ptr);
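As a minimal sketch of "used in Thrust APIs", the wrapped pointer behaves like a device iterator; thrust::fill, thrust::reduce, and the 1000-element size are illustrative choices, not from the original:
#include <thrust/device_ptr.h>
#include <thrust/fill.h>
#include <thrust/reduce.h>
#include <cuda_runtime.h>

float* ptr;
cudaMalloc(&ptr, sizeof(float) * 1000);
thrust::device_ptr<float> d_ptr = thrust::device_pointer_cast(ptr);
// The device_ptr acts as a random-access iterator for Thrust algorithms
thrust::fill(d_ptr, d_ptr + 1000, 1.0f);
float sum = thrust::reduce(d_ptr, d_ptr + 1000);  // sum == 1000.0f
cudaFree(ptr);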
Challenges
- Creation of host_vector: there needs to be a way to convert an existing vector or memory already allocated on the host into a host_vector (see the sketch after this list).
- async: CUDA API calls are added to a CUDA stream and run asynchronously; the host does not wait unless a synchronize is issued. Thrust is not asynchronous by default. With THRUST_CPP_DIALECT >= 2011, std::async can be used to achieve asynchrony (see the sketch after this list).
- Stream synchronization: Thrust synchronizes the stream after an algorithm returns; this behavior can be customized, but synchronizing the stream is the default.
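A minimal sketch of the first two points above, assuming an existing std::vector of floats on the host; the names vec, h, d, fut and the choice of thrust::reduce are illustrative, not taken from the original:
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <future>
#include <vector>

std::vector<float> vec(1000, 1.0f);
// host_vector has a range constructor, so existing host data can be copied in
thrust::host_vector<float> h(vec.begin(), vec.end());
thrust::device_vector<float> d = h;
// Run the (blocking) Thrust algorithm on another host thread with std::async
auto fut = std::async(std::launch::async,
                      [&d] { return thrust::reduce(d.begin(), d.end()); });
// ... other host work can overlap here ...
float sum = fut.get();  // sum == 1000.0f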
References
- https://nvidia.github.io/thrust/
- https://docs.nvidia.com/cuda/thrust/index.html
- https://rocmdocs.amd.com/en/latest/ROCm_API_References/Thrust.html