DAY21: Reading CUDA Arrays

3.2.11.3. CUDA Arrays

CUDA arrays are opaque memory layouts optimized for texture fetching. They are one dimensional, two dimensional, or three-dimensional and composed of elements, each of which has 1, 2 or 4 components that may be signed or unsigned 8-, 16-, or 32-bit integers, 16-bit floats, or 32-bit floats. CUDA arrays are only accessible by kernels through texture fetching as described in Texture Memory or surface reading and writing as described in Surface Memory.

3.2.11.4. Read/Write Coherency

The texture and surface memory is cached (see Device Memory Accesses) and within the same kernel call, the cache is not kept coherent with respect to global memory writes and surface memory writes, so any texture fetch or surface read to an address that has been written to via a global write or a surface write in the same kernel call returns undefined data. In other words, a thread can safely read some texture or surface memory location only if this memory location has been updated by a previous kernel call or memory copy, but not if it has been previously updated by the same thread or another thread from the same kernel call.

3.2.12. Graphics Interoperability

Some resources from OpenGL and Direct3D may be mapped into the address space of CUDA, either to enable CUDA to read data written by OpenGL or Direct3D, or to enable CUDA to write data for consumption by OpenGL or Direct3D.

A resource must be registered to CUDA before it can be mapped using the functions mentioned in OpenGL Interoperability and Direct3D Interoperability. These functions return a pointer to a CUDA graphics resource of type struct cudaGraphicsResource. Registering a resource is potentially high-overhead and therefore typically called only once per resource. A CUDA graphics resource is unregistered using cudaGraphicsUnregisterResource(). Each CUDA context which intends to use the resource is required to register it separately.

Once a resource is registered to CUDA, it can be mapped and unmapped as many times as necessary using cudaGraphicsMapResources() and cudaGraphicsUnmapResources(). cudaGraphicsResourceSetMapFlags() can be called to specify usage hints (write-only, read-only) that the CUDA driver can use to optimize resource management.

A mapped resource can be read from or written to by kernels using the device memory address returned by cudaGraphicsResourceGetMappedPointer() for buffers and cudaGraphicsSubResourceGetMappedArray() for CUDA arrays.
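The map, access, unmap cycle described above can be sketched as follows. This is only a minimal sketch: `fill` and `write_mapped_buffer` are hypothetical names, and the resource is assumed to be a buffer of floats that was already registered elsewhere (for example through the OpenGL interoperability functions, which are not shown here because they need a live graphics context).

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: writes `value` into every element of the mapped buffer.
__global__ void fill(float* data, size_t n, float value) {
    size_t i = static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Hypothetical helper. Assumes `resource` was registered once beforehand and
// refers to a buffer (not a CUDA array) that holds floats.
cudaError_t write_mapped_buffer(cudaGraphicsResource_t resource,
                                float value, cudaStream_t stream) {
    // Map: from here until the unmap, the graphics API must not touch the resource.
    cudaError_t err = cudaGraphicsMapResources(1, &resource, stream);
    if (err != cudaSuccess) return err;

    // For buffers, the mapped resource is exposed as a device pointer plus a size.
    void* ptr = nullptr;
    size_t bytes = 0;
    err = cudaGraphicsResourceGetMappedPointer(&ptr, &bytes, resource);
    if (err == cudaSuccess) {
        size_t n = bytes / sizeof(float);
        if (n > 0) {
            unsigned int blocks = static_cast<unsigned int>((n + 255) / 256);
            fill<<<blocks, 256, 0, stream>>>(static_cast<float*>(ptr), n, value);
            err = cudaGetLastError();
        }
    }

    // Unmap: always attempted, so the graphics API can use the resource again.
    cudaError_t unmap_err = cudaGraphicsUnmapResources(1, &resource, stream);
    return (err != cudaSuccess) ? err : unmap_err;
}
```

Registration stays outside the helper because, as the text says, it is potentially expensive and is normally done once per resource, while mapping and unmapping can happen every frame.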

Accessing a resource through OpenGL, Direct3D, or another CUDA context while it is mapped produces undefined results. OpenGL Interoperability and Direct3D Interoperability give specifics for each graphics API and some code samples. SLI Interoperability gives specifics for when the system is in SLI mode.

Notes and experience sharing:

CUDA Array:

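A CUDA Array is storage whose layout has been optimized for texture fetching; the actual layout is opaque to the user. It is made up of elements (texels) arranged in 1, 2, or 3 dimensions, and each element can have 1, 2, or 4 components (note that there are no 3-component elements). Each component is an 8-bit, 16-bit, or 32-bit integer (signed or unsigned), or a 16-bit or 32-bit float. Inside a kernel, a CUDA Array can only be accessed through texture fetches (reads) or through surface reads and writes, as described earlier in the Texture Memory and Surface Memory sections.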

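Is a CUDA Array an ordinary array? No, it is not. The layout of an ordinary array is known (one element after another, rows first, then columns), whereas NVIDIA does not disclose the layout of a CUDA Array. All you need to know is that it is an optimized, undisclosed layout. This is the biggest difference from an ordinary array. If you want to know what goes on inside, there is third-party material online (AMD's material in particular) that describes how such layouts actually work.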
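The points above can be illustrated with a small sketch. It is not taken from the original article: `copy_from_texture` and `roundtrip_through_cuda_array` are hypothetical names, and a 2D single-channel float array is assumed. Because the layout is opaque, the host never gets a pointer into the array; data goes in through a copy function and comes out in the kernel through a texture fetch.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread fetches one texel through the texture object.
__global__ void copy_from_texture(cudaTextureObject_t tex, float* out,
                                  int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        // Unnormalized coordinates; +0.5f addresses the texel center.
        out[y * width + x] = tex2D<float>(tex, x + 0.5f, y + 0.5f);
    }
}

// Hypothetical helper: copies width*height floats into a CUDA array, reads
// them back through a texture, and returns the result (empty on any error).
std::vector<float> roundtrip_through_cuda_array(const std::vector<float>& host_in,
                                                int width, int height) {
    std::vector<float> host_out;
    const size_t count = static_cast<size_t>(width) * height;
    if (host_in.size() != count) return host_out;
    const size_t row_bytes = width * sizeof(float);

    cudaArray_t array = nullptr;
    cudaTextureObject_t tex = 0;
    float* dev_out = nullptr;

    // One 32-bit float component per element.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    bool ok = cudaMallocArray(&array, &desc, width, height) == cudaSuccess;
    // The runtime converts from the linear host layout to the opaque layout.
    ok = ok && cudaMemcpy2DToArray(array, 0, 0, host_in.data(), row_bytes,
                                   row_bytes, height,
                                   cudaMemcpyHostToDevice) == cudaSuccess;

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = array;

    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;      // no interpolation
    td.readMode = cudaReadModeElementType;    // return the stored float as-is
    td.normalizedCoords = 0;

    ok = ok && cudaCreateTextureObject(&tex, &res, &td, nullptr) == cudaSuccess;
    ok = ok && cudaMalloc(&dev_out, count * sizeof(float)) == cudaSuccess;

    if (ok) {
        dim3 block(16, 16);
        dim3 grid((width + 15) / 16, (height + 15) / 16);
        copy_from_texture<<<grid, block>>>(tex, dev_out, width, height);
        host_out.resize(count);
        // cudaMemcpy waits for the kernel and also reports launch errors.
        if (cudaMemcpy(host_out.data(), dev_out, count * sizeof(float),
                       cudaMemcpyDeviceToHost) != cudaSuccess) {
            host_out.clear();
        }
    }

    if (tex) cudaDestroyTextureObject(tex);
    cudaFree(dev_out);
    if (array) cudaFreeArray(array);
    return host_out;
}
```

Calling `roundtrip_through_cuda_array(data, 3, 2)` with six floats should hand back the same six values, since point filtering at texel centers returns the stored element unchanged.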

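Read/Write Coherency: texture and surface memory goes through a cache, and within a single kernel launch this (read) cache is not kept coherent with ordinary global memory writes, nor with surface writes. So if, within the same kernel launch, you fetch through a texture or read through a surface an address that has just been written through an ordinary global memory pointer or through a surface write, the result is undefined. (This is what we said before: a write made in this launch only takes effect in the next launch.) Note that the backing store of a texture may be either ordinary linear memory or a CUDA Array, while the backing store of a surface is a CUDA Array, so the contents can be changed either by an ordinary write or by a surface write; that is why both cases are mentioned here. Either way, if the written value is read back immediately in the same launch, it is undefined: you may read the new value you wrote, you may read the old value from before the write, or you may even get a mix of the two.

In other words, a (device-side) thread can safely read a texture or surface location only if that location was last updated by one of the cudaMemcpy*() functions or by a previous kernel, and not if it was updated during the same kernel call by this thread itself or by another thread.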
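The safe pattern described above, writing in one launch and reading in the next, can be sketched like this. It is only an illustration under assumptions: `write_pass`, `read_pass`, and `write_then_read` are hypothetical names, and the array is assumed to be a 2D float array created with the `cudaArraySurfaceLoadStore` flag so that it can back both a surface and a texture.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Launch 1: writes through the surface. Nothing reads the array in this launch.
__global__ void write_pass(cudaSurfaceObject_t surf, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        // The x coordinate of a surface access is given in bytes.
        surf2Dwrite(static_cast<float>(y * width + x), surf,
                    x * static_cast<int>(sizeof(float)), y);
    }
}

// Launch 2: reads through the texture. The data was written by a previous
// kernel call, so the fetch is well defined.
__global__ void read_pass(cudaTextureObject_t tex, float* out,
                          int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        out[y * width + x] = tex2D<float>(tex, x + 0.5f, y + 0.5f);
    }
}

// Hypothetical helper: returns what the second launch read (empty on error).
std::vector<float> write_then_read(int width, int height) {
    std::vector<float> host_out;
    const size_t count = static_cast<size_t>(width) * height;
    cudaArray_t array = nullptr;
    cudaSurfaceObject_t surf = 0;
    cudaTextureObject_t tex = 0;
    float* dev_out = nullptr;

    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    // The flag is required for the array to be bound to a surface.
    bool ok = cudaMallocArray(&array, &desc, width, height,
                              cudaArraySurfaceLoadStore) == cudaSuccess;

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = array;

    cudaTextureDesc td = {};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;

    ok = ok && cudaCreateSurfaceObject(&surf, &res) == cudaSuccess;
    ok = ok && cudaCreateTextureObject(&tex, &res, &td, nullptr) == cudaSuccess;
    ok = ok && cudaMalloc(&dev_out, count * sizeof(float)) == cudaSuccess;

    if (ok) {
        dim3 block(16, 16);
        dim3 grid((width + 15) / 16, (height + 15) / 16);
        // Two separate launches on the same stream run in order.
        write_pass<<<grid, block>>>(surf, width, height);
        read_pass<<<grid, block>>>(tex, dev_out, width, height);
        host_out.resize(count);
        if (cudaMemcpy(host_out.data(), dev_out, count * sizeof(float),
                       cudaMemcpyDeviceToHost) != cudaSuccess) {
            host_out.clear();
        }
    }

    if (tex) cudaDestroyTextureObject(tex);
    if (surf) cudaDestroySurfaceObject(surf);
    cudaFree(dev_out);
    if (array) cudaFreeArray(array);
    return host_out;
}
```

Merging the two kernels into one, so that a thread writes through the surface and then fetches the same location through the texture, would be exactly the case the text warns about: the fetch could return the old value, the new value, or a mix.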

Graphics Interoperability: this part covers interoperability with OpenGL and Direct3D. We are not very familiar with it, so we will not go through it here. Apologies, everyone!

If anything is unclear, please leave a comment below this article

or post on our technical forum, bbs.gpuworld.cn

Originally published on the WeChat official account 吉浦迅科技 (gpusolution)

Original publication date: 2018-05-28
