DAY19: Reading Texture Memory, Texture Gather

3.2.11.1.5. Cubemap Textures

A cubemap texture is a special type of two-dimensional layered texture that has six layers representing the faces of a cube:

· The width of a layer is equal to its height.

· The cubemap is addressed using three texture coordinates x, y, and z that are interpreted as a direction vector emanating from the center of the cube and pointing to one face of the cube and a texel within the layer corresponding to that face. More specifically, the face is selected by the coordinate with largest magnitude m and the corresponding layer is addressed using coordinates (s/m+1)/2 and (t/m+1)/2 where s and t are defined in Table 1.

Table 1. Cubemap Fetch

Condition                  Sign     face   m     s     t
|x| > |y| and |x| > |z|    x > 0    0      x     -z    -y
                           x < 0    1      -x    z     -y
|y| > |x| and |y| > |z|    y > 0    2      y     x     z
                           y < 0    3      -y    x     -z
|z| > |x| and |z| > |y|    z > 0    4      z     x     -y
                           z < 0    5      -z    -x    -y
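The face selection in Table 1 can be sketched as a small host-side reference model. This is not the hardware implementation; the function name cubemap_lookup is hypothetical, and raising an error when two magnitudes tie is an assumption, since Table 1 does not cover that case.

```python
def cubemap_lookup(x, y, z):
    """Map a direction vector to (face, u, v) following Table 1.

    Hypothetical reference model: ties between magnitudes are rejected
    because Table 1 does not say how they are resolved.
    """
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax > ay and ax > az:
        face, m, s, t = (0, x, -z, -y) if x > 0 else (1, -x, z, -y)
    elif ay > ax and ay > az:
        face, m, s, t = (2, y, x, z) if y > 0 else (3, -y, x, -z)
    elif az > ax and az > ay:
        face, m, s, t = (4, z, x, -y) if z > 0 else (5, -z, -x, -y)
    else:
        raise ValueError("tie between magnitudes is not covered by Table 1")
    # Layer coordinates within the selected face, each in [0, 1].
    return face, (s / m + 1) / 2, (t / m + 1) / 2


# A direction along +x hits the centre of face 0.
center = cubemap_lookup(1.0, 0.0, 0.0)
# |y| is largest and y < 0, so face 3 is selected.
other = cubemap_lookup(0.5, -2.0, 1.0)
```

Here center is (0, 0.5, 0.5) and other is (3, 0.625, 0.25): with m = 2, s = x = 0.5 and t = -z = -1.0.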

A cubemap texture can only be a CUDA array created by calling cudaMalloc3DArray() with the cudaArrayCubemap flag.

Cubemap textures are fetched using the device function described in texCubemap().

Cubemap textures are only supported on devices of compute capability 2.0 and higher.

3.2.11.1.6. Cubemap Layered Textures

A cubemap layered texture is a layered texture whose layers are cubemaps of the same dimension.

A cubemap layered texture is addressed using an integer index and three floating-point texture coordinates; the index denotes a cubemap within the sequence and the coordinates address a texel within that cubemap.

A cubemap layered texture can only be a CUDA array created by calling cudaMalloc3DArray() with the cudaArrayLayered and cudaArrayCubemap flags.

Cubemap layered textures are fetched using the device function described in texCubemapLayered(). Texture filtering (see Texture Fetching) is done only within a layer, not across layers.

Cubemap layered textures are only supported on devices of compute capability 2.0 and higher.

3.2.11.1.7. Texture Gather

Texture gather is a special texture fetch that is available for two-dimensional textures only. It is performed by the tex2Dgather() function, which has the same parameters as tex2D(), plus an additional comp parameter equal to 0, 1, 2, or 3 (see tex2Dgather()). It returns four 32-bit numbers that correspond to the value of the component comp of each of the four texels that would have been used for bilinear filtering during a regular texture fetch. For example, if these texels are of values (253, 20, 31, 255), (250, 25, 29, 254), (249, 16, 37, 253), (251, 22, 30, 250), and comp is 2, tex2Dgather() returns (31, 29, 37, 30).
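The example in the paragraph above can be reproduced with a tiny host-side model. This only mimics the selection of one component from four texels; the name gather is hypothetical, and the texel order is simply the order used in the quoted example, not a statement about the order the hardware returns.

```python
def gather(texels, comp):
    """Pick component `comp` (0..3) from each of four 4-component texels."""
    if comp not in (0, 1, 2, 3):
        raise ValueError("comp must be 0, 1, 2, or 3")
    if len(texels) != 4:
        raise ValueError("a gather footprint has exactly four texels")
    return tuple(texel[comp] for texel in texels)


# The four texels from the example, in the order given in the text.
footprint = [
    (253, 20, 31, 255),
    (250, 25, 29, 254),
    (249, 16, 37, 253),
    (251, 22, 30, 250),
]
result = gather(footprint, 2)
```

With comp set to 2, result is (31, 29, 37, 30), matching the values quoted from the guide.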

Note that texture coordinates are computed with only 8 bits of fractional precision. tex2Dgather() may therefore return unexpected results for cases where tex2D() would use 1.0 for one of its weights (α or β, see Linear Filtering). For example, with an x texture coordinate of 2.49805: xB = x - 0.5 = 1.99805; however, the fractional part of xB is stored in an 8-bit fixed-point format. Since 0.99805 is closer to 256.f/256.f than it is to 255.f/256.f, xB has the value 2. A tex2Dgather() in this case would therefore return indices 2 and 3 in x, instead of indices 1 and 2.
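The arithmetic in this example can be checked with a short sketch. It assumes the fractional part is rounded to the nearest multiple of 1/256, which is what the "closer to" wording suggests; the exact hardware rounding is not stated in the text, and the function name bilinear_x_indices is hypothetical.

```python
import math


def bilinear_x_indices(x, frac_bits=8):
    """Return (left index, right index, weight) along x.

    Assumption: the fraction of xB = x - 0.5 is rounded to the nearest
    multiple of 1 / 2**frac_bits, and a fraction that rounds up to 1.0
    carries into the integer part.
    """
    scale = 1 << frac_bits
    xb = x - 0.5
    base = math.floor(xb)
    quantized = math.floor((xb - base) * scale + 0.5)
    if quantized == scale:  # fraction rounded up to 256/256
        base += 1
        quantized = 0
    return base, base + 1, quantized / scale


quantized_case = bilinear_x_indices(2.49805)
exact_left_index = math.floor(2.49805 - 0.5)
```

Under this assumption quantized_case is (2, 3, 0.0), while the unquantized left index exact_left_index is 1, which is the shift from indices 1 and 2 to indices 2 and 3 described above.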

Texture gather is only supported for CUDA arrays created with the cudaArrayTextureGather flag and of width and height less than the maximum specified in Table 14 for texture gather, which is smaller than for regular texture fetch.

Texture gather is only supported on devices of compute capability 2.0 and higher.

(Scan the QR code to view Table 14.)

Notes and practical experience:

We rarely use the Cubemap Textures and Cubemap Layered Textures material ourselves. Many books skip cubemaps altogether.

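Texture Gather is a nice feature. Suppose a point p has to be interpolated from its four neighbours, p0, p1, p2, and p3. Then p.x is a function of p0.x, p1.x, p2.x, p3.x, p.y is a function of p0.y, p1.y, p2.y, p3.y, and so on. As mentioned earlier, a point is commonly stored as a vector of four consecutive components. That layout happens to cause something the manual does not mention but other sources describe: register bank conflicts, analogous to shared memory bank conflicts. They can cut the throughput of the arithmetic instructions to 1/4, a 4x slowdown. With texture gather, the original layout (each row shows the arrangement in registers):

p0.x, p0.y, p0.z, p0.w
p1.x, p1.y, p1.z, p1.w
p2.x, p2.y, p2.z, p2.w
p3.x, p3.y, p3.z, p3.w

is converted into:

p0.x, p1.x, p2.x, p3.x
p0.y, p1.y, p2.y, p3.y
p0.z, p1.z, p2.z, p3.z
p0.w, p1.w, p2.w, p3.w

In effect you get a free row/column transpose, and this access pattern is efficient in registers. So the gather operation is not only convenient (one fetch returns exactly the row of one component that you need, as in the second layout above), it also avoids register bank conflicts right away. That makes it both handy and important.

Even if you are not doing bilinear interpolation, any code with a similar access pattern can use it. Without this feature the user has to write extra code, for example to change the data layout. If you neither do that work nor use texture gather, the resulting bank conflicts cannot be hidden by the reuse flag on Maxwell/Pascal. So this is a very good feature.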
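The layout change described here amounts to a transpose from one-point-per-row to one-component-per-row. A minimal host-side sketch follows; the point values and the averaging step are made up for illustration, the name gather_all_components is hypothetical, and the code says nothing about register banks, it only shows the data arrangement.

```python
def gather_all_components(points):
    """Transpose four (x, y, z, w) points into four per-component rows.

    Row c is what one gather with comp = c would hand back for these points.
    """
    return [tuple(point[comp] for point in points) for comp in range(4)]


# Hypothetical neighbours p0..p3, one (x, y, z, w) tuple per point.
points = [
    (1.0, 10.0, 100.0, 1000.0),
    (2.0, 20.0, 200.0, 2000.0),
    (3.0, 30.0, 300.0, 3000.0),
    (4.0, 40.0, 400.0, 4000.0),
]
rows = gather_all_components(points)

# Each output component depends only on the matching row, e.g. a plain average.
averages = [sum(row) / 4 for row in rows]
```

After the transpose, rows[0] holds (p0.x, p1.x, p2.x, p3.x), so computing p.x touches a single row instead of one element from each of four vectors.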

Note that texture coordinates are computed with only 8 bits of fractional precision. tex2Dgather() may therefore return unexpected results for cases where tex2D() would use 1.0 for one of its weights (α or β, see Linear Filtering)

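This passage concerns the precision of texture coordinates. As mentioned before, texture coordinates and the built-in texture interpolation have quite low precision; if you need high precision, you have to do the interpolation yourself. Here the guide says that texture coordinates have only 8 bits of fractional precision and describes the problems that can follow.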

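Texture gather is only supported for CUDA arrays created with the cudaArrayTextureGather flag and of width and height less than the maximum specified in Table 14 for texture gather, which is smaller than for regular texture fetch.

This means exactly what it says. The first part says that you must use a CUDA array, created with the specific flag. The second part says that the width and height of the texture are more tightly limited than for a regular texture. The limit can be queried from the device properties: there is a property named MaxTexture2DGather (the maxTexture2DGather field of cudaDeviceProp) that gives the limit on the current device.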

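If anything is unclear, leave a comment below this article or post on our technical forum, bbs.gpuworld.cn.

Originally published on the WeChat official account 吉浦迅科技 (gpusolution)

Original publication date: 2018-05-25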
