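Thrust automatically selects the GPU backend when I give an algorithm iterators from thrust::device_vector, because the vector's data lives on the GPU. But when I pass only thrust::counting_iterator arguments to an algorithm, how does it choose which backend to execute on?
In the thrust::find call below there are no device_vector iterator arguments, so how does Thrust choose which backend (CPU, OMP, TBB, CUDA) to use?
How can I control which backend this algorithm executes on without using thrust::device_vector<>?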
thrust::counting_iterator<uint64_t> first(i);
thrust::counting_iterator<uint64_t> last = first + step_size;
auto iter = thrust::find(
    thrust::make_transform_iterator(first, functor),
    thrust::make_transform_iterator(last, functor),
    true);
Update 23.01.14: with MSVS2012, CUDA 5.5 and Thrust 1.7, the following compiles successfully!
#include <cstdint>
#include <iostream>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/find.h>
#include <thrust/functional.h>
#include <thrust/execution_policy.h>
// Predicate: true for odd values; callable on both host and device.
struct is_odd : public thrust::unary_function<uint64_t, bool> {
    __host__ __device__ bool operator()(uint64_t const& x) const {
        return x & 1;
    }
};
int main() {
    thrust::counting_iterator<uint64_t> first(0);
    thrust::counting_iterator<uint64_t> last = first + 100;
    // Passing thrust::device as the first argument selects the device backend explicitly.
    auto iter = thrust::find(thrust::device,
        thrust::make_transform_iterator(first, is_odd()),
        thrust::make_transform_iterator(last, is_odd()),
        true);
    int bbb; std::cin >> bbb;  // wait for input before exiting
    return 0;
}
Posted 2013-12-30 04:46:36
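Sometimes it is ambiguous where a Thrust algorithm should execute, as in your counting_iterator example, because the iterator's associated "backend system" is thrust::any_system_tag (a counting_iterator can be dereferenced anywhere because it is not backed by data). In cases like this, Thrust uses the device backend, which is CUDA by default. However, you can explicitly control where execution happens in a couple of ways.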
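To picture the default described above, here is a minimal tag-dispatch sketch in plain C++. It is not Thrust's actual implementation, and every name in it (host_tag, device_tag, any_tag, resolve, counting, where_runs) is hypothetical. It only illustrates the idea that an iterator with an "any" tag resolves to the device system unless a tag or an explicit policy says otherwise.

```cpp
#include <string>

// Hypothetical system tags, loosely modelled on the idea of Thrust's system tags.
struct host_tag {};
struct device_tag {};
struct any_tag {};  // the iterator has no backing storage, so it can live anywhere

// Map an iterator's tag to the system an algorithm would run on.
// Assumption for this sketch: "any" falls back to the device system.
template <typename Tag> struct resolve { using type = Tag; };
template <> struct resolve<any_tag> { using type = device_tag; };

// One overload per system; each just reports where it "ran".
inline std::string run(host_tag) { return "host"; }
inline std::string run(device_tag) { return "device"; }

// Toy counting iterator that carries a system tag as a template parameter.
template <typename Tag = any_tag>
struct counting {
    using system = Tag;
    long value;
};

// Without a policy, the system is derived from the iterator's tag.
template <typename It>
std::string where_runs(It) {
    return run(typename resolve<typename It::system>::type{});
}

// With an explicit policy as the first argument, the policy decides.
template <typename Policy, typename It>
std::string where_runs(Policy policy, It) {
    return run(policy);
}
```

In this sketch, where_runs(counting<>{0}) picks the device overload, where_runs(counting<host_tag>{0}) picks the host overload because the tag is fixed on the iterator, and where_runs(host_tag{}, counting<>{0}) picks the host overload because the explicit policy takes precedence.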
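You can specify the system explicitly through the template parameter, as in ngimel's answer, or you can pass the thrust::device execution policy as the first argument to thrust::find in your example: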
#include <thrust/execution_policy.h>
...
thrust::counting_iterator<uint64_t> first(i);
thrust::counting_iterator<uint64_t> last = first + step_size;
auto iter = thrust::find(thrust::device,
    thrust::make_transform_iterator(first, functor),
    thrust::make_transform_iterator(last, functor),
    true);
This technique requires Thrust 1.7 or later.
Posted 2013-12-24 03:18:13
You must specify the System template parameter when instantiating counting_iterator:
typedef thrust::device_system_tag System;
thrust::counting_iterator<uint64_t, System> first(i);
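Posted 2014-01-09 09:18:49
If you are using a current version of Thrust, follow the approach Jared Hoberock describes. But if you might be using an older version (the system you work on may have an old CUDA release), the example below may help.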
#include <thrust/version.h>
#if THRUST_MINOR_VERSION > 6
#include <thrust/execution_policy.h>
#elif THRUST_MINOR_VERSION == 6
#include <thrust/iterator/retag.h>
#else
#endif
...
#if THRUST_MINOR_VERSION > 6
total =
thrust::transform_reduce(
thrust::host
, thrust::counting_iterator<unsigned int>(0)
, thrust::counting_iterator<unsigned int>(N)
, AFunctor(), 0, thrust::plus<unsigned int>());
#elif THRUST_MINOR_VERSION == 6
total =
thrust::transform_reduce(
thrust::retag<thrust::host_system_tag>(thrust::counting_iterator<unsigned int>(0))
, thrust::retag<thrust::host_system_tag>(thrust::counting_iterator<unsigned int>(N))
, AFunctor(), 0, thrust::plus<unsigned int>());
#else
total =
thrust::transform_reduce(
thrust::counting_iterator<unsigned int, thrust::host_space_tag>(0)
, thrust::counting_iterator<unsigned int, thrust::host_space_tag>(N)
, AFunctor(), 0, thrust::plus<unsigned int>());
#endif
See also: Thrust: How to directly control where an algorithm invocation executes?
https://stackoverflow.com/questions/20753011