
DAY19: Reading Texture Memory: Texture Gather

3.2.11.1.5. Cubemap Textures

A cubemap texture is a special type of two-dimensional layered texture that has six layers representing the faces of a cube:

· The width of a layer is equal to its height.

· The cubemap is addressed using three texture coordinates x, y, and z that are interpreted as a direction vector emanating from the center of the cube and pointing to one face of the cube and a texel within the layer corresponding to that face. More specifically, the face is selected by the coordinate with largest magnitude m and the corresponding layer is addressed using coordinates (s/m+1)/2 and (t/m+1)/2 where s and t are defined in Table 1.

Table 1. Cubemap Fetch

                                     face   m     s     t
|x| > |y| and |x| > |z|    x > 0     0      x     -z    -y
                           x < 0     1      -x    z     -y
|y| > |x| and |y| > |z|    y > 0     2      y     x     z
                           y < 0     3      -y    x     -z
|z| > |x| and |z| > |y|    z > 0     4      z     x     -y
                           z < 0     5      -z    -x    -y
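As a rough illustration of the face selection and coordinate mapping in Table 1, here is a host-side C++ sketch. `CubemapCoord` and `cubemapFetchCoord` are hypothetical names made up for this example, not CUDA APIs. Table 1 uses strict inequalities and does not say what happens when two magnitudes are equal, so the tie-breaking order (x, then y, then z) is an assumption.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type: the selected face and the 2D coordinates
// (s/m + 1)/2 and (t/m + 1)/2 within that face.
struct CubemapCoord {
    int face;  // layer index 0..5
    float u;   // (s/m + 1) / 2
    float v;   // (t/m + 1) / 2
};

// Maps a direction vector (x, y, z) to a face and in-face coordinates,
// following Table 1 row by row. Ties between magnitudes are resolved in
// x, y, z priority order, which is an assumption of this sketch.
inline CubemapCoord cubemapFetchCoord(float x, float y, float z) {
    const float ax = std::fabs(x), ay = std::fabs(y), az = std::fabs(z);
    assert(ax > 0.0f || ay > 0.0f || az > 0.0f);  // a zero vector has no direction

    int face;
    float m, s, t;
    if (ax >= ay && ax >= az) {          // x has the largest magnitude
        if (x > 0) { face = 0; m = x;  s = -z; t = -y; }
        else       { face = 1; m = -x; s = z;  t = -y; }
    } else if (ay >= az) {               // y has the largest magnitude
        if (y > 0) { face = 2; m = y;  s = x;  t = z;  }
        else       { face = 3; m = -y; s = x;  t = -z; }
    } else {                             // z has the largest magnitude
        if (z > 0) { face = 4; m = z;  s = x;  t = -y; }
        else       { face = 5; m = -z; s = -x; t = -y; }
    }
    return {face, (s / m + 1.0f) / 2.0f, (t / m + 1.0f) / 2.0f};
}
```

For example, the direction (0.5, -2, 1) is dominated by y with y < 0, so it selects face 3 with m = 2, s = 0.5, t = -1, giving in-face coordinates (0.625, 0.25).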

A cubemap texture can only be a CUDA array created by calling cudaMalloc3DArray() with the cudaArrayCubemap flag.

Cubemap textures are fetched using the device function described in texCubemap().

Cubemap textures are only supported on devices of compute capability 2.0 and higher.

3.2.11.1.6. Cubemap Layered Textures

A cubemap layered texture is a layered texture whose layers are cubemaps of same dimension.

A cubemap layered texture is addressed using an integer index and three floating-point texture coordinates; the index denotes a cubemap within the sequence and the coordinates address a texel within that cubemap.

A cubemap layered texture can only be a CUDA array created by calling cudaMalloc3DArray() with the cudaArrayLayered and cudaArrayCubemap flags.

Cubemap layered textures are fetched using the device function described in texCubemapLayered(). Texture filtering (see Texture Fetching) is done only within a layer, not across layers.

Cubemap layered textures are only supported on devices of compute capability 2.0 and higher.

3.2.11.1.7. Texture Gather

Texture gather is a special texture fetch that is available for two-dimensional textures only. It is performed by the tex2Dgather() function, which has the same parameters as tex2D(), plus an additional comp parameter equal to 0, 1, 2, or 3 (see tex2Dgather()). It returns four 32-bit numbers that correspond to the value of the component comp of each of the four texels that would have been used for bilinear filtering during a regular texture fetch. For example, if these texels are of values (253, 20, 31, 255), (250, 25, 29, 254), (249, 16, 37, 253), (251, 22, 30, 250), and comp is 2, tex2Dgather() returns (31, 29, 37, 30).
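The component selection in the example above can be mimicked on the host with a few lines of C++. This is only a sketch of what the result looks like, not the CUDA intrinsic: `Texel` and `gatherComponent` are hypothetical names, and the order of the four results simply follows the order in which the texels are listed in the example.

```cpp
#include <array>
#include <cassert>

// One four-component texel, stored as (x, y, z, w).
using Texel = std::array<int, 4>;

// Host-side stand-in for the selection performed by a texture gather:
// picks component `comp` from each of the four texels in the bilinear
// footprint. The footprint itself is supplied by the caller here.
inline std::array<int, 4> gatherComponent(const std::array<Texel, 4>& footprint,
                                          int comp) {
    assert(comp >= 0 && comp <= 3);
    std::array<int, 4> out{};
    for (int i = 0; i < 4; ++i) {
        out[i] = footprint[i][comp];
    }
    return out;
}
```

Feeding it the four texels from the example with comp set to 2 gives (31, 29, 37, 30), matching the values quoted in the guide.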

Note that texture coordinates are computed with only 8 bits of fractional precision. tex2Dgather() may therefore return unexpected results for cases where tex2D() would use 1.0 for one of its weights (α or β, see Linear Filtering). For example, with an x texture coordinate of 2.49805: xB=x-0.5=1.99805, however the fractional part of xB is stored in an 8-bit fixed-point format. Since 0.99805 is closer to 256.f/256.f than it is to 255.f/256.f, xB has the value 2. A tex2Dgather() in this case would therefore return indices 2 and 3 in x, instead of indices 1 and 2.
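The arithmetic in this example can be reproduced with a small host-side C++ sketch. `FixedCoord` and `quantizeCoord` are hypothetical names, and the round-to-nearest behavior is an assumption inferred from the guide's wording ("closer to 256.f/256.f than it is to 255.f/256.f"); the guide does not spell out the exact hardware rounding.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type: xB split into an integer texel index and a
// fraction expressed in units of 1/256.
struct FixedCoord {
    int index;    // integer part of xB after rounding to the 1/256 grid
    int frac256;  // fractional part in 1/256 steps, range 0..255
};

// Computes xB = x - 0.5 and snaps it to the nearest multiple of 1/256.
// Round-to-nearest is an assumption of this sketch.
inline FixedCoord quantizeCoord(float x) {
    const float xB = x - 0.5f;
    const long steps = std::lround(xB * 256.0f);  // nearest 1/256 step
    const long index =
        static_cast<long>(std::floor(static_cast<double>(steps) / 256.0));
    return {static_cast<int>(index), static_cast<int>(steps - index * 256)};
}
```

With x = 2.49805, xB * 256 is about 511.5008, which rounds to 512, so the index becomes 2 with a zero fraction. A plain floor of xB would give index 1, which is the discrepancy the guide warns about: the gather uses texels 2 and 3 in x rather than 1 and 2.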

Texture gather is only supported for CUDA arrays created with the cudaArrayTextureGather flag and of width and height less than the maximum specified in Table 14 for texture gather, which is smaller than for regular texture fetch.

Texture gather is only supported on devices of compute capability 2.0 and higher.

(Scan the QR code to view Table 14.)

Notes and practical experience:

Cubemap Textures and Cubemap Layered Textures are features we rarely use ourselves. Many books skip cubemaps altogether.

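Texture Gather is a nice feature. Suppose a point p has to be interpolated from the four surrounding points, call them p0, p1, p2, p3. Then p.x is a function of p0.x, p1.x, p2.x, p3.x, p.y is likewise a function of p0.y, p1.y, p2.y, p3.y, and so on. As mentioned before, a point is usually represented as a vector of four consecutive components. That layout happens to trigger something the manual does not describe but other sources do: register bank conflicts, similar to shared memory bank conflicts. They can cut the throughput of compute instructions to 1/4, a 4x loss in speed. With texture gather, the original layout:

p0.x, p0.y, p0.z, p0.w
p1.x, p1.y, p1.z, p1.w
p2.x, p2.y, p2.z, p2.w
p3.x, p3.y, p3.z, p3.w

(each row shows the arrangement in registers) is converted into:

p0.x, p1.x, p2.x, p3.x
p0.y, p1.y, p2.y, p3.y
p0.z, p1.z, p2.z, p3.z
p0.w, p1.w, p2.w, p3.w

In other words, rows and columns are transposed for free, and this access pattern is efficient in registers. So the gather operation is not only convenient (a single call fetches exactly one row of a given component, as in the second layout above), it also sidesteps register bank conflicts right away. That makes it both useful and important.

Even if you are not doing bilinear interpolation, any code with a similar access pattern can use it. Without this feature, the user needs extra code to handle the problem, for example by changing the data layout. If you neither do that work yourself nor use texture gather, the resulting bank conflicts cannot be hidden by the reuse flags on Maxwell/Pascal. So this is a very good feature.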
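To make the layout change concrete, here is a host-side C++ sketch of the transposition a user would otherwise write by hand. `Float4` is a stand-in for CUDA's float4 and `transposeComponents` is a hypothetical helper; on the GPU, each output row would correspond to what one gather with comp = 0, 1, 2, or 3 delivers, without this extra code.

```cpp
#include <array>
#include <cassert>

// Stand-in for CUDA's float4: one point with four consecutive components.
struct Float4 {
    float x, y, z, w;
};

// Converts four points stored point-by-point into four rows stored
// component-by-component: rows[c][i] is component c of point i.
inline std::array<std::array<float, 4>, 4>
transposeComponents(const std::array<Float4, 4>& p) {
    std::array<std::array<float, 4>, 4> rows{};
    for (int i = 0; i < 4; ++i) {
        rows[0][i] = p[i].x;  // all x components end up in row 0
        rows[1][i] = p[i].y;  // all y components end up in row 1
        rows[2][i] = p[i].z;  // all z components end up in row 2
        rows[3][i] = p[i].w;  // all w components end up in row 3
    }
    return rows;
}
```

After the call, rows[0] holds p0.x, p1.x, p2.x, p3.x, which is the layout the interpolation of p.x wants to consume.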

Note that texture coordinates are computed with only 8 bits of fractional precision. tex2Dgather() may therefore return unexpected results for cases where tex2D() would use 1.0 for one of its weights (α or β, see Linear Filtering)

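This is about the precision of texture coordinates. As mentioned before, texture coordinates and the texture unit's built-in interpolation have very low precision; if you need high precision, you have to do the computation yourself. The passage says that texture coordinates carry only 8 bits of fractional precision, and describes the problems that can follow from that.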

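Texture gather is only supported for CUDA arrays created with the cudaArrayTextureGather flag and of width and height less than the maximum specified in Table 14 for texture gather, which is smaller than for regular texture fetch.

This means exactly what it says. The first half says that you must use a CUDA array, created with that specific flag. The second half says that the width and height of the texture are more restricted than for a regular texture, and that the limit can be found by querying the device properties: there is a MaxTexture2DGather property (the maxTexture2DGather field of cudaDeviceProp) that tells you the limit on the current device.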

If anything is unclear, please leave a comment below this article, or post on our technical forum at bbs.gpuworld.cn.

This article was shared from the WeChat official account 吉浦迅科技 (gpusolution). Author: GPU世界论坛


Originally published: 2018-05-25
