
HelloCuda Series: CUDA Thrust Basics

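  • Thrust: The C++ Parallel Algorithms Library. Its two most important components are:
    • thrust::device_vector: a container whose storage lives in GPU (device) memory
    • thrust::host_vector: a container whose storage lives in CPU (host) memory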
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#include <thrust/copy.h>        // thrust::copy
#include <thrust/functional.h>  // thrust::plus
#include <iostream>

void thrustBasic() {
   // Unsorted input, so the effect of thrust::sort is visible in the output
   int host_data[] = {5, 3, 1, 4, 2};

   // Transfer data to device using Thrust
   thrust::device_vector<int> d_data(host_data, host_data + 5);

   // Sort data on the device
   thrust::sort(d_data.begin(), d_data.end());

   // Sum of the data on the device
   int sum = thrust::reduce(d_data.begin(), d_data.end(), 0, thrust::plus<int>());
   std::cout << "Sum of sorted data: " << sum << std::endl;

   // Transfer data back to host
   thrust::copy(d_data.begin(), d_data.end(), host_data);

   std::cout << "Sorted data: ";
   for (int i = 0; i < 5; ++i) {
      std::cout << host_data[i] << " ";
   }

   std::cout << std::endl;
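}

// Entry point, so the file builds into the hello_world executable
int main() {
   thrustBasic();
   return 0;
}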
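
The listing above includes thrust/host_vector.h but copies from a raw array. As a minimal sketch, the same sort-and-reduce flow could use both containers named at the top, thrust::host_vector and thrust::device_vector. The helper name sortAndSum is hypothetical, not something from Thrust. The Thrust path is compiled only under nvcc (which defines __CUDACC__); the std:: fallback is an assumption added purely so the sketch also builds with a plain C++ compiler.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

#ifdef __CUDACC__
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#endif

// Hypothetical helper: sorts `values` in place and returns their sum.
int sortAndSum(std::vector<int>& values) {
#ifdef __CUDACC__
    // Stage the input in a host-side Thrust container
    thrust::host_vector<int> h_data(values.begin(), values.end());

    // Assigning a host_vector to a device_vector copies host -> device
    thrust::device_vector<int> d_data = h_data;

    // Sort and reduce on the device (reduce defaults to addition)
    thrust::sort(d_data.begin(), d_data.end());
    int sum = thrust::reduce(d_data.begin(), d_data.end(), 0);

    // Assigning the other way copies device -> host
    h_data = d_data;
    std::copy(h_data.begin(), h_data.end(), values.begin());
    return sum;
#else
    // CPU fallback with the matching standard-library algorithms
    std::sort(values.begin(), values.end());
    return std::accumulate(values.begin(), values.end(), 0);
#endif
}
```

Calling sortAndSum on a vector holding 5, 3, 1, 4, 2 should leave it as 1, 2, 3, 4, 5 and return 15. The vector assignments replace the explicit thrust::copy calls, which tends to read more clearly when both ends are Thrust containers.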

Caveat:

nvcc -Wno-deprecated-gpu-targets -rdc=true hello_world.cu -arch=sm_61 -o hello_world && ./hello_world

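Thrust code needs to be compiled with -arch=sm_61 (the architecture of my GPU). Without that flag the program fails at runtime with the following error: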

terminate called after throwing an instance of 'thrust::THRUST_200802_SM_520_NS::system::system_error'
  what():  radix_sort: failed on 1st step: cudaErrorInvalidDeviceFunction: invalid device function
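
The number after sm_ is the GPU's compute capability with the dot removed, so 6.1 becomes sm_61. On a different GPU, the value could be looked up with the CUDA runtime call cudaGetDeviceProperties. The sketch below assumes device 0 is the GPU of interest; the helper names archFlag and archFlagForDevice are hypothetical. The device query is guarded by __CUDACC__ so that the formatting helper also builds with a plain C++ compiler.

```cpp
#include <string>

#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

// Hypothetical helper: formats a compute capability (e.g. 6.1) as an nvcc flag.
std::string archFlag(int major, int minor) {
    return "-arch=sm_" + std::to_string(major) + std::to_string(minor);
}

#ifdef __CUDACC__
// Hypothetical helper: returns the flag for the given device,
// or an empty string if the device cannot be queried.
std::string archFlagForDevice(int device = 0) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return "";
    }
    return archFlag(prop.major, prop.minor);
}
#endif
```

Printing archFlagForDevice() from a .cu file built with nvcc gives the flag to pass on the command line; for a compute capability 6.1 GPU that is -arch=sm_61.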
