[Tutorial] Series Report: Introducing PyOpenCL

GPUS Lady
Published 2018-03-30 11:54:51
Collected in the column: GPUS开发者

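Software engineers often criticize OpenCL as hard to learn, but I think that is unfair. The generality of the OpenCL API is what makes it verbose. Once you have written some OpenCL code, you realize that much of the code running on the host processor is actually boilerplate.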

I will use PyOpenCL, a neat Python module written by Andreas Klöckner. (If you are reading this, Andreas, keep up the good work!)

Install PyOpenCL and NumPy, and you are ready to go!

In [1]:

import pyopencl as cl

import numpy as np

Trivial example

Suppose we want to create an array containing integers from 0 to 15.

In [2]: N = 16

That's trivial using NumPy:

In [3]: np_range = np.arange(N, dtype=np.int32)

np_range

Out[3]:

array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], dtype=int32)

but our PyOpenCL example will fill in a similar array using OpenCL:

In [4]: cl_range = np.zeros(N, dtype=np.int32)

cl_range

Out[4]:array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)

Gimme some context!

Creating a context could hardly be easier:

In [5]: context = cl.create_some_context()

Ditto creating a command queue:

In [6]: queue = cl.CommandQueue(context)

Building and running

An OpenCL C program equivalent to np.arange(N) follows:

In [7]: source = '''
kernel void arange(global int * buffer)
{
    const size_t gid = get_global_id(0);
    buffer[gid] = convert_int(gid);
}
'''

The kernel will be launched as N work-items over a one-dimensional range [0, N-1]. Each work-item gets its unique index gid in that range (that is, an integer between 0 and N-1 inclusive) and writes it into the argument buffer at offset gid.
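To make the launch model concrete, here is a minimal pure-Python sketch of what the one-dimensional launch does. The names arange_kernel and launch_1d are hypothetical helpers invented for this illustration; they are not part of PyOpenCL, and a real device runs the work-items in parallel rather than in a loop.

```python
def arange_kernel(gid, buffer):
    """Body of one work-item: mirrors `buffer[gid] = convert_int(gid);`."""
    buffer[gid] = int(gid)


def launch_1d(kernel_body, global_size, *args):
    """Hypothetical stand-in for a 1-D kernel launch.

    A real device executes the work-items in parallel and in no guaranteed
    order. Running them one by one gives the same result here because each
    work-item writes only to its own element of the buffer.
    """
    for gid in range(global_size):
        kernel_body(gid, *args)


N = 16
emulated = [0] * N          # plays the role of the zero-filled buffer
launch_1d(arange_kernel, N, emulated)
print(emulated)
```

Because no two work-items touch the same element, the order of execution does not matter, which is why this kernel needs no synchronization.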

Let's build the program:

In [8]: program = cl.Program(context, source).build()

allocate a memory buffer:

In [9]: memory_flags = cl.mem_flags.WRITE_ONLY | cl.mem_flags.ALLOC_HOST_PTR

memory = cl.Buffer(context, flags=memory_flags, size=cl_range.nbytes)

launch the kernel:

In [10]: kernel = program.arange(queue, [N], None, memory)

and copy the results from the buffer to cl_range:

In [11]: cl.enqueue_copy(queue, cl_range, memory, wait_for=[kernel])

cl_range

Out[11]:array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], dtype=int32)
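Looking back at the buffer allocation step, two details may deserve a closer look: the flags are bit fields combined with `|`, and the size is given in bytes. The sketch below models both with the standard library only. The MemFlags class is a hypothetical stand-in for cl.mem_flags, and its numeric values are an assumption taken from my reading of the Khronos cl.h header; in real code, use cl.mem_flags directly.

```python
import enum
import struct


class MemFlags(enum.IntFlag):
    """Illustrative model of OpenCL memory flags (values assumed from cl.h)."""
    READ_WRITE = 1 << 0
    WRITE_ONLY = 1 << 1
    READ_ONLY = 1 << 2
    USE_HOST_PTR = 1 << 3
    ALLOC_HOST_PTR = 1 << 4
    COPY_HOST_PTR = 1 << 5


# Same combination as in the allocation cell: the kernel only writes to the
# buffer, and the runtime is asked to allocate host-accessible memory.
flags = MemFlags.WRITE_ONLY | MemFlags.ALLOC_HOST_PTR

# The buffer size is in bytes: N elements times the size of a 32-bit int.
N = 16
nbytes = N * struct.calcsize("=i")

print(int(flags), nbytes)
```

With N = 16 and 4-byte integers, the size matches what cl_range.nbytes reports for an int32 array of 16 elements.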

Finally, let's confirm that arrays np_range and cl_range match element-wise:

In [12]: np.all(np_range == cl_range)

Out[12]: True
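For readers who would rather run this outside a notebook, the cells above can be gathered into one script. This is a sketch, not the original notebook: the run_arange wrapper, the interactive=False argument and the fallback to None when PyOpenCL, NumPy or an OpenCL device is missing are my additions.

```python
def run_arange(n=16):
    """Fill an array with 0..n-1 using OpenCL.

    Returns a list of ints, or None if PyOpenCL/NumPy are not installed
    or no OpenCL platform/device can be found.
    """
    try:
        import numpy as np
        import pyopencl as cl
    except ImportError:
        return None

    try:
        # interactive=False avoids a device-selection prompt in scripts.
        context = cl.create_some_context(interactive=False)
    except Exception:
        return None  # no usable OpenCL platform or device

    queue = cl.CommandQueue(context)

    source = """
    kernel void arange(global int *buffer)
    {
        const size_t gid = get_global_id(0);
        buffer[gid] = convert_int(gid);
    }
    """
    program = cl.Program(context, source).build()

    result = np.zeros(n, dtype=np.int32)
    flags = cl.mem_flags.WRITE_ONLY | cl.mem_flags.ALLOC_HOST_PTR
    memory = cl.Buffer(context, flags=flags, size=result.nbytes)

    # Calling the kernel enqueues it and returns an event to wait on.
    event = program.arange(queue, (n,), None, memory)
    cl.enqueue_copy(queue, result, memory, wait_for=[event])
    return result.tolist()


values = run_arange()
print(values if values is not None else "OpenCL not available")
```

Note that the object returned by the kernel call is an event, which is why it can be passed to wait_for in the copy.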

Notes

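I would be glad to receive feedback. My target audience is mainly software engineers who are already familiar with OpenCL, so that I can jump straight to more advanced topics. But if you would like a little more explanation, just ask: I will be happy to explain, point you to some of the excellent learning resources out there, or consider covering it in a future post...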

Also, I prepared this post using the wonderful IPython Notebook environment. I haven't seen much in the way of using PyOpenCL and IPython together, so would be very grateful for any links.

In my next post, I will describe how to optimize OpenCL kernels for the ARM® Mali-T7600 GPU.

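Originally published 2014-03-30 and shared from the GPUS开发者 WeChat official account through the Tencent Cloud self-media sharing program. In case of infringement, contact cloudcommunity@tencent.com for removal.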
