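OpenCL is often criticized by software engineers as being hard to learn, but I think that is unfair. The generality of the OpenCL API is what makes it rather verbose. Once you have written some OpenCL code, you realize that much of the code running on the host processor is actually boilerplate.
I will use PyOpenCL, a neat Python module written by Andreas Klöckner. (If you are reading this, Andreas, keep up the good work!)
Just install PyOpenCL and NumPy and you are ready to go!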
In [1]:
import pyopencl as cl
import numpy as np
Trivial example
Suppose we want to create an array containing integers from 0 to 15.
In [2]:
N = 16
That's trivial using NumPy:
In [3]:
np_range = np.arange(N, dtype=np.int32)
np_range
Out[3]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], dtype=int32)
but our PyOpenCL example will fill in a similar array using OpenCL:
In [4]:
cl_range = np.zeros(N, dtype=np.int32)
cl_range
Out[4]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
Gimme some context!
Creating a context could hardly be easier:
In [5]:
context = cl.create_some_context()
Ditto creating a command queue:
In [6]:
queue = cl.CommandQueue(context)
Building and running
An OpenCL C program equivalent to np.arange(N) follows:
In [7]:
source = '''
kernel void arange(global int * buffer)
{
    const size_t gid = get_global_id(0);
    buffer[gid] = convert_int(gid);
}
'''
The kernel will be launched as N work-items over a one-dimensional range [0, N-1]. Each work-item will get its unique index gid in the range (that is, an integer between 0 and N-1 inclusive) and write it into argument buffer at offset gid.
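To make the execution model concrete, here is a minimal pure-Python sketch of what the N work-items do. It is only an emulation: the loop runs sequentially on the host, whereas OpenCL is free to run work-items in parallel and in any order. The names work_item and emulate_arange are made up for illustration and are not part of PyOpenCL.

```python
def work_item(gid, buffer):
    """Body of the arange kernel for a single work-item (illustrative)."""
    # Each work-item writes its own global index at offset gid.
    buffer[gid] = int(gid)


def emulate_arange(n):
    """Sequentially emulate launching the kernel over the range [0, n-1]."""
    buffer = [0] * n          # stands in for the device buffer
    for gid in range(n):      # one loop iteration per work-item
        work_item(gid, buffer)
    return buffer


result = emulate_arange(16)
print(result)
```

Because every work-item writes to a different element, the order in which they run does not affect the result.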
Let's build the program:
In [8]:
program = cl.Program(context, source).build()
allocate a memory buffer:
In [9]:
memory_flags = cl.mem_flags.WRITE_ONLY | cl.mem_flags.ALLOC_HOST_PTR
memory = cl.Buffer(context, flags=memory_flags, size=cl_range.nbytes)
launch the kernel:
In [10]:
# Calling the kernel enqueues it and returns an event, stored here as `kernel`.
kernel = program.arange(queue, [N], None, memory)
and copy the results from the buffer to cl_range:
In [11]:
cl.enqueue_copy(queue, cl_range, memory, wait_for=[kernel])
cl_range
Out[11]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], dtype=int32)
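The buffer was sized with cl_range.nbytes, so the buffer and the host array hold the same number of bytes. As a quick sanity check that needs only NumPy and no OpenCL device, the byte count is the number of elements times the element size. The variable names below mirror the ones used in this post; element_size and buffer_size are introduced here for illustration.

```python
import numpy as np

N = 16
cl_range = np.zeros(N, dtype=np.int32)

# An int32 element occupies 4 bytes, matching the 32-bit int of OpenCL C.
element_size = cl_range.dtype.itemsize
# Total size in bytes, the value passed as size= when creating the buffer.
buffer_size = cl_range.nbytes

print(element_size, buffer_size)
```

If the host array used a different dtype (say np.int64), the byte count would no longer match N 32-bit ints, so the dtype has to agree with the kernel's argument type.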
Finally, let's confirm that arrays np_range and cl_range match element-wise:
In [12]:
np.all(np_range == cl_range)
Out[12]:
True
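Notes
I would be happy to get feedback. I am mainly targeting software engineers who are already familiar with OpenCL, so that I can jump straight to more advanced topics. But if you feel you would like a little more explanation, just ask: I will be happy to explain, or point you to some excellent learning resources out there, or consider covering the topic in a future post...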
Also, I prepared this post using the wonderful IPython Notebook environment. I haven't seen much in the way of using PyOpenCL and IPython together, so would be very grateful for any links.
In the next post, I will cover how to optimize OpenCL kernels on the ARM® Mali-T7600 GPU.