DAY19: Reading About Texture Memory: Texture Gather

GPUS Lady

Published 2018-06-22 18:15:05

3.2.11.1.5. Cubemap Textures

A cubemap texture is a special type of two-dimensional layered texture that has six layers representing the faces of a cube:

· The width of a layer is equal to its height.

· The cubemap is addressed using three texture coordinates x, y, and z that are interpreted as a direction vector emanating from the center of the cube and pointing to one face of the cube and a texel within the layer corresponding to that face. More specifically, the face is selected by the coordinate with largest magnitude m and the corresponding layer is addressed using coordinates (s/m+1)/2 and (t/m+1)/2 where s and t are defined in Table 1.

Table 1. Cubemap Fetch
Condition	Sign	face	m	s	t
|x| > |y| and |x| > |z|	x > 0	0	x	-z	-y
|x| > |y| and |x| > |z|	x < 0	1	-x	z	-y
|y| > |x| and |y| > |z|	y > 0	2	y	x	z
|y| > |x| and |y| > |z|	y < 0	3	-y	x	-z
|z| > |x| and |z| > |y|	z > 0	4	z	x	-y
|z| > |x| and |z| > |y|	z < 0	5	-z	-x	-y

A cubemap texture can only be a CUDA array created by calling cudaMalloc3DArray() with the cudaArrayCubemap flag.

Cubemap textures are fetched using the device function texCubemap().

Cubemap textures are only supported on devices of compute capability 2.0 and higher.
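
The face selection in Table 1 can be sketched on the host as follows. This is only a reference model of the addressing rule, not device code: `CubemapAddress` and `resolveCubemap` are hypothetical names introduced here, and the handling of ties between magnitudes is an assumption, since the table uses strict inequalities.

```cpp
#include <cassert>
#include <cmath>

// Result of resolving a direction vector to a cubemap face.
struct CubemapAddress {
    int face;   // layer index 0..5, per Table 1
    float u;    // (s/m + 1) / 2, in [0, 1]
    float v;    // (t/m + 1) / 2, in [0, 1]
};

// Host-side reference of the face selection in Table 1 (hypothetical helper,
// not a CUDA API). Ties between magnitudes are resolved x, then y, then z;
// that tie-breaking is an assumption of this sketch.
inline CubemapAddress resolveCubemap(float x, float y, float z) {
    const float ax = std::fabs(x), ay = std::fabs(y), az = std::fabs(z);
    assert(ax > 0.0f || ay > 0.0f || az > 0.0f);  // a zero vector has no direction
    int face;
    float m, s, t;
    if (ax >= ay && ax >= az) {
        if (x > 0) { face = 0; m = x;  s = -z; t = -y; }
        else       { face = 1; m = -x; s = z;  t = -y; }
    } else if (ay >= az) {
        if (y > 0) { face = 2; m = y;  s = x; t = z; }
        else       { face = 3; m = -y; s = x; t = -z; }
    } else {
        if (z > 0) { face = 4; m = z;  s = x;  t = -y; }
        else       { face = 5; m = -z; s = -x; t = -y; }
    }
    return {face, (s / m + 1.0f) / 2.0f, (t / m + 1.0f) / 2.0f};
}
```

For example, `resolveCubemap(1.0f, 0.0f, 0.0f)` selects face 0 and addresses its center, while a direction such as (0, -2, 1) is dominated by y and lands on face 3.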

3.2.11.1.6. Cubemap Layered Textures

A cubemap layered texture is a layered texture whose layers are cubemaps of the same dimension.

A cubemap layered texture is addressed using an integer index and three floating-point texture coordinates; the index denotes a cubemap within the sequence and the coordinates address a texel within that cubemap.

A cubemap layered texture can only be a CUDA array created by calling cudaMalloc3DArray() with the cudaArrayLayered and cudaArrayCubemap flags.

Cubemap layered textures are fetched using the device function texCubemapLayered(). Texture filtering (see Texture Fetching) is done only within a layer, not across layers.

Cubemap layered textures are only supported on devices of compute capability 2.0 and higher.

3.2.11.1.7. Texture Gather

Texture gather is a special texture fetch that is available for two-dimensional textures only. It is performed by the tex2Dgather() function, which has the same parameters as tex2D(), plus an additional comp parameter equal to 0, 1, 2, or 3 (see tex2Dgather()). It returns four 32-bit numbers that correspond to the value of the component comp of each of the four texels that would have been used for bilinear filtering during a regular texture fetch. For example, if these texels are of values (253, 20, 31, 255), (250, 25, 29, 254), (249, 16, 37, 253), (251, 22, 30, 250), and comp is 2, tex2Dgather() returns (31, 29, 37, 30).
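
The example above can be reproduced with a small host-side model. This is a sketch of the semantics only, not a call to tex2Dgather(): `Texel` and `gatherComponent` are hypothetical names, and the order of the four texels simply follows the order in which the guide lists them, which is an assumption rather than a statement about the hardware.

```cpp
#include <array>
#include <cassert>

using Texel = std::array<int, 4>;  // the four components of one texel

// Host-side model of a gather result: component `comp` of each of the four
// texels in the bilinear footprint. The order of `footprint` is the order
// used in the guide's example; the hardware's ordering is not modeled here.
inline std::array<int, 4> gatherComponent(const std::array<Texel, 4>& footprint,
                                          int comp) {
    assert(comp >= 0 && comp <= 3);  // comp must be 0, 1, 2, or 3
    return {footprint[0][comp], footprint[1][comp],
            footprint[2][comp], footprint[3][comp]};
}
```

Passing the four texels from the example with comp equal to 2 yields (31, 29, 37, 30), matching the guide.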

Note that texture coordinates are computed with only 8 bits of fractional precision. tex2Dgather() may therefore return unexpected results for cases where tex2D() would use 1.0 for one of its weights (α or β, see Linear Filtering). For example, with an x texture coordinate of 2.49805: xB=x-0.5=1.99805, however the fractional part of xB is stored in an 8-bit fixed-point format. Since 0.99805 is closer to 256.f/256.f than it is to 255.f/256.f, xB has the value 2. A tex2Dgather() in this case would therefore return indices 2 and 3 in x, instead of indices 1 and 2.
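
The arithmetic in this note can be checked with a short host-side sketch. Both function names are hypothetical, and round-to-nearest quantization is an assumption chosen to match the guide's "closer to 256.f/256.f" wording; the actual hardware behavior may differ in detail.

```cpp
#include <cassert>
#include <cmath>

// Integer texel index after quantizing the shifted coordinate xB = x - 0.5
// to 8 fractional bits. Round-to-nearest is an assumption of this sketch.
inline long gatherBaseIndexQuantized(double x) {
    assert(x >= 0.5);  // keep xB non-negative so the shift below is well defined
    const long fixed = std::lround((x - 0.5) * 256.0);  // 8 fractional bits
    return fixed >> 8;                                  // integer part of xB
}

// Index one would expect from exact arithmetic: floor(x - 0.5).
inline long gatherBaseIndexExact(double x) {
    return static_cast<long>(std::floor(x - 0.5));
}
```

With x = 2.49805, the quantized index is 2 while the exact index is 1, which is why the gather covers indices 2 and 3 instead of 1 and 2.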

Texture gather is only supported on devices of compute capability 2.0 and higher.

(Scan the QR code to view Table 14.)

Notes and lessons learned:

We do not use Cubemap Textures and Cubemap Layered Textures very often either. Many books skip cubemaps altogether.

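Texture Gather is a very nice feature. Suppose a point p has to be interpolated from its four neighbors, p0, p1, p2, and p3: p.x is a function of p0.x, p1.x, p2.x, p3.x; p.y is a function of p0.y, p1.y, p2.y, p3.y; and so on. As mentioned earlier, a point is usually stored as a vector of four consecutive components. That layout triggers something the manual does not describe but other sources do: register bank conflicts, which are analogous to shared memory bank conflicts. They cut the throughput of the arithmetic instructions to 1/4, a 4x slowdown. Texture gather turns the original layout (each row is how the values sit in registers):

p0.x, p0.y, p0.z, p0.w
p1.x, p1.y, p1.z, p1.w
p2.x, p2.y, p2.z, p2.w
p3.x, p3.y, p3.z, p3.w

into:

p0.x, p1.x, p2.x, p3.x
p0.y, p1.y, p2.y, p3.y
p0.z, p1.z, p2.z, p3.z
p0.w, p1.w, p2.w, p3.w

In effect the rows and columns are transposed for free, and this access pattern is efficient in registers. So gather is not only convenient (one call fetches exactly one row of the second layout, that is, one chosen component of all four texels), it also sidesteps register bank conflicts right away. That makes it both handy and important. Even if you are not doing bilinear interpolation, any code with a similar access pattern can use it. Without this feature you would need extra code of your own, for example to change the data layout. If you neither do that work nor use texture gather, the resulting bank conflicts cannot be hidden by the reuse flags on Maxwell/Pascal.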
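
To make the per-component access pattern concrete, here is a minimal host-side sketch of how one gathered row (for instance all four .x values) could feed a manual bilinear interpolation. `Row` and `interpolateRow` are hypothetical names, and the texel order p0=(i,j), p1=(i+1,j), p2=(i,j+1), p3=(i+1,j+1) is an assumption made for illustration; check the order your gather actually returns before reusing the weights.

```cpp
#include <array>
#include <cassert>

using Row = std::array<float, 4>;  // one component of p0..p3, e.g. all four .x values

// Combine one gathered row with bilinear weights alpha (x) and beta (y).
// Assumed texel order for this sketch:
// r[0]=(i,j), r[1]=(i+1,j), r[2]=(i,j+1), r[3]=(i+1,j+1).
inline float interpolateRow(const Row& r, float alpha, float beta) {
    assert(alpha >= 0.0f && alpha <= 1.0f && beta >= 0.0f && beta <= 1.0f);
    return (1.0f - alpha) * (1.0f - beta) * r[0]
         + alpha * (1.0f - beta) * r[1]
         + (1.0f - alpha) * beta * r[2]
         + alpha * beta * r[3];
}
```

Calling this once per component, each time with the row for that component, computes p.x, p.y, p.z, and p.w without ever touching the interleaved per-point layout.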

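The note about 8 bits of fractional precision concerns the precision of texture coordinates. As mentioned earlier, both texture coordinates and the built-in texture interpolation have very low precision; if you need high precision, you have to do the computation yourself. The point here is that texture coordinates carry only 8 bits of fractional precision, and that this can lead to a series of problems.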

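Texture gather is only supported for CUDA arrays created with the cudaArrayTextureGather flag and of width and height less than the maximum specified in Table 14 for texture gather, which is smaller than for regular texture fetch. This means exactly what it says. The first half says that you must use a CUDA array, created with that specific flag. The second half says that the width and height of the texture are more restricted than for a regular texture. The limit can be queried from the device properties: the maxTexture2DGather property reports the limit on the current device.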

If anything is unclear, leave a comment below this article or post on our technical forum, bbs.gpuworld.cn.

This article is part of the Tencent Cloud self-media syndication program and is shared from a WeChat official account.

Originally published: 2018-05-25. In case of infringement, contact cloudcommunity@tencent.com for removal.

This article is shared from the GPUS开发者 WeChat official account.