DAY19: Reading the Texture Memory Sections: Texture Gather

3.2.11.1.5. Cubemap Textures

A cubemap texture is a special type of two-dimensional layered texture that has six layers representing the faces of a cube:

· The width of a layer is equal to its height.

· The cubemap is addressed using three texture coordinates x, y, and z that are interpreted as a direction vector emanating from the center of the cube and pointing to one face of the cube and a texel within the layer corresponding to that face. More specifically, the face is selected by the coordinate with the largest magnitude m, and the corresponding layer is addressed using coordinates (s/m+1)/2 and (t/m+1)/2, where s and t are defined in Table 1.

Table 1. Cubemap Fetch

Largest magnitude          Sign    face   m    s    t
|x| > |y| and |x| > |z|    x > 0   0      x    -z   -y
                           x < 0   1      -x   z    -y
|y| > |x| and |y| > |z|    y > 0   2      y    x    z
                           y < 0   3      -y   x    -z
|z| > |x| and |z| > |y|    z > 0   4      z    x    -y
                           z < 0   5      -z   -x   -y
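The face selection and coordinate mapping in Table 1 can be modeled on the host. The sketch below is a plain C++ model written for illustration, not the hardware implementation. The names cubemapFetch and CubemapCoord are hypothetical. Ties between equal magnitudes, which Table 1 does not cover, are broken in x, y, z order by assumption.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type: the cubemap layer (face) that is hit, plus the
// normalized coordinates inside that layer.
struct CubemapCoord {
    int face;  // 0..5, see Table 1
    float u;   // (s/m + 1) / 2
    float v;   // (t/m + 1) / 2
};

// Host-side model of Table 1 (illustration only, not the hardware path).
// Assumption: ties between equal magnitudes are broken in x, y, z order.
CubemapCoord cubemapFetch(float x, float y, float z) {
    const float ax = std::fabs(x), ay = std::fabs(y), az = std::fabs(z);
    int face;
    float m, s, t;
    if (ax >= ay && ax >= az) {            // x has the largest magnitude
        if (x > 0) { face = 0; m = x;  s = -z; t = -y; }
        else       { face = 1; m = -x; s = z;  t = -y; }
    } else if (ay >= az) {                 // y has the largest magnitude
        if (y > 0) { face = 2; m = y;  s = x;  t = z;  }
        else       { face = 3; m = -y; s = x;  t = -z; }
    } else {                               // z has the largest magnitude
        if (z > 0) { face = 4; m = z;  s = x;  t = -y; }
        else       { face = 5; m = -z; s = -x; t = -y; }
    }
    assert(m > 0.0f && "direction vector must be non-zero");
    // Map s/m and t/m from [-1, 1] to [0, 1] within the selected layer.
    return {face, (s / m + 1.0f) / 2.0f, (t / m + 1.0f) / 2.0f};
}
```

For example, the direction (1, 2, 8) is z-major with z > 0. That selects face 4 with m = 8, s = 1 and t = -2, which gives layer coordinates (0.5625, 0.375).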

A cubemap texture can only be a CUDA array created by calling cudaMalloc3DArray() with the cudaArrayCubemap flag.

Cubemap textures are fetched using the device function texCubemap(), described in both the texture object API and the texture reference API sections.

Cubemap textures are only supported on devices of compute capability 2.0 and higher.

3.2.11.1.6. Cubemap Layered Textures

A cubemap layered texture is a layered texture whose layers are cubemaps of the same dimension.

A cubemap layered texture is addressed using an integer index and three floating-point texture coordinates; the index denotes a cubemap within the sequence and the coordinates address a texel within that cubemap.
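A cubemap layered texture can be pictured as a flat sequence of 2D layers. The cudaMalloc3DArray() reference describes a cubemap layered array as layers in groups of six: the first six layers form the first cubemap, the next six form the second, and so on. Under that description, each (cubemap index, face) pair maps to one flat layer. The helpers below are a hypothetical host-side illustration of that bookkeeping, not CUDA API functions.

```cpp
#include <cassert>

// Number of faces (layers) in one cubemap.
constexpr int kFacesPerCubemap = 6;

// Hypothetical helper: flat 2D-layer index of face `face` (0..5, see Table 1)
// of cubemap number `cubemapIndex` inside a cubemap layered array.
int flatLayerIndex(int cubemapIndex, int face) {
    assert(cubemapIndex >= 0);
    assert(face >= 0 && face < kFacesPerCubemap);
    return cubemapIndex * kFacesPerCubemap + face;
}

// Hypothetical helper: depth (layer count) needed for `numCubemaps` cubemaps.
// For a cubemap layered array this is always a multiple of six.
int layeredDepth(int numCubemaps) {
    assert(numCubemaps > 0);
    return numCubemaps * kFacesPerCubemap;
}
```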

A cubemap layered texture can only be a CUDA array created by calling cudaMalloc3DArray() with the cudaArrayLayered and cudaArrayCubemap flags.

Cubemap layered textures are fetched using the device function texCubemapLayered(), described in both the texture object API and the texture reference API sections. Texture filtering (see Texture Fetching) is done only within a layer, not across layers.

Cubemap layered textures are only supported on devices of compute capability 2.0 and higher.

3.2.11.1.7. Texture Gather

Texture gather is a special texture fetch that is available for two-dimensional textures only. It is performed by the tex2Dgather() function, which has the same parameters as tex2D(), plus an additional comp parameter equal to 0, 1, 2, or 3 (see tex2Dgather() in the texture object API and texture reference API sections). It returns four 32-bit numbers that correspond to the value of the component comp of each of the four texels that would have been used for bilinear filtering during a regular texture fetch. For example, if these texels are of values (253, 20, 31, 255), (250, 25, 29, 254), (249, 16, 37, 253), (251, 22, 30, 250), and comp is 2, tex2Dgather() returns (31, 29, 37, 30).
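The value tex2Dgather() returns can be modeled as picking one component from each of the four texels in the bilinear footprint. The plain C++ sketch below reproduces the example above on the host. Texel4 and gatherComponent are hypothetical names. The texels are taken in the order listed in the example, not in any hardware-defined order.

```cpp
#include <array>
#include <cassert>

// Hypothetical 4-component texel, e.g. the contents of a uchar4 texel.
using Texel4 = std::array<int, 4>;

// Model of tex2Dgather(): component `comp` (0..3) of each of the four texels
// that bilinear filtering would have used. Texel order is kept as given.
std::array<int, 4> gatherComponent(const std::array<Texel4, 4>& footprint, int comp) {
    assert(comp >= 0 && comp <= 3);
    std::array<int, 4> out{};
    for (int i = 0; i < 4; ++i) {
        out[i] = footprint[i][comp];
    }
    return out;
}
```

With the four texels from the example and comp = 2, gatherComponent returns {31, 29, 37, 30}.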

Note that texture coordinates are computed with only 8 bits of fractional precision. tex2Dgather() may therefore return unexpected results for cases where tex2D() would use 1.0 for one of its weights (α or β, see Linear Filtering). For example, with an x texture coordinate of 2.49805: xB=x-0.5=1.99805, however the fractional part of xB is stored in an 8-bit fixed-point format. Since 0.99805 is closer to 256.f/256.f than it is to 255.f/256.f, xB has the value 2. A tex2Dgather() in this case would therefore return indices 2 and 3 in x, instead of indices 1 and 2.
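The arithmetic in this example can be reproduced with a small host-side model. The sketch assumes that xB = x - 0.5 is rounded to the nearest multiple of 1/256 before the integer texel index is taken, which is how the example above reads. The names baseIndexExact and baseIndexFixed8 are hypothetical. This is not a description of the actual hardware datapath.

```cpp
#include <cassert>
#include <cmath>

// Lower texel index of the 2-texel bilinear footprint along one axis,
// computed in full float precision: floor(x - 0.5).
int baseIndexExact(float x) {
    assert(std::isfinite(x));
    return static_cast<int>(std::floor(x - 0.5f));
}

// Same index, but with xB = x - 0.5 first rounded to 8 fractional bits
// (the nearest multiple of 1/256), as in the 2.49805 example.
int baseIndexFixed8(float x) {
    assert(std::isfinite(x));
    const long q = std::lround((x - 0.5f) * 256.0f);  // xB in units of 1/256
    return static_cast<int>(std::floor(q / 256.0));   // floor also handles q < 0
}
```

With x = 2.49805, baseIndexExact gives 1 (texels 1 and 2) and baseIndexFixed8 gives 2 (texels 2 and 3). This matches the discrepancy described above. For a coordinate such as 2.3, the two functions agree.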

Texture gather is only supported for CUDA arrays created with the cudaArrayTextureGather flag and of width and height less than the maximum specified in Table 14 for texture gather, which is smaller than for regular texture fetch.

Texture gather is only supported on devices of compute capability 2.0 and higher.

(Scan the QR code in the original post to view Table 14.)

Author's notes and practical experience:

We don't use Cubemap Textures and Cubemap Layered Textures much either. Many books skip cubemaps entirely.

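Texture Gather is a very useful feature. Suppose a point p has to be interpolated from its four neighbors, denoted p0, p1, p2 and p3. Then p.x is a function of p0.x, p1.x, p2.x and p3.x, p.y happens to be a function of p0.y, p1.y, p2.y and p3.y, and so on. As mentioned earlier, a point is usually represented as a vector of four consecutive components.

This layout causes something the programming guide does not mention but other sources describe: register bank conflicts, analogous to shared memory bank conflicts. They can cut arithmetic instruction throughput to 1/4, a 4x slowdown. Texture gather turns the original layout (each row shows the arrangement in registers):

p0.x, p0.y, p0.z, p0.w
p1.x, p1.y, p1.z, p1.w
p2.x, p2.y, p2.z, p2.w
p3.x, p3.y, p3.z, p3.w

into:

p0.x, p1.x, p2.x, p3.x
p0.y, p1.y, p2.y, p3.y
p0.z, p1.z, p2.z, p3.z
p0.w, p1.w, p2.w, p3.w

In other words, you get a row/column transpose for free, and this access pattern is efficient in registers. Gather is convenient to use: each call fetches exactly the one component you need from all four texels, which is one row of the layout above. It also immediately avoids register bank conflicts, so it is both useful and important. Even if you are not doing bilinear interpolation, any code with a similar access pattern can benefit from it.

Without this feature, you would have to write extra code to handle the problem, for example by changing the data layout. If you neither do that work nor use texture gather, the resulting bank conflicts cannot be hidden by the register reuse flags on Maxwell/Pascal. That is why this feature is so valuable.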
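The layout change described above can be modeled on the host. Four gathers over the same footprint (comp = 0, 1, 2, 3) amount to a 4x4 transpose. Each component of p is then interpolated from one contiguous group of four values. The C++ below is an illustrative sketch only, and the names gatherAll and bilerpComponent are hypothetical. The footprint order p0 = (i, j), p1 = (i+1, j), p2 = (i, j+1), p3 = (i+1, j+1) is an assumption made for this example, not the order tex2Dgather() documents.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;  // one point: x, y, z, w

// Row k of the result holds component k of p0..p3, i.e. what a gather
// with comp = k would return. Equivalent to transposing the 4x4 block.
std::array<Vec4, 4> gatherAll(const std::array<Vec4, 4>& p) {
    std::array<Vec4, 4> g{};
    for (int comp = 0; comp < 4; ++comp)
        for (int i = 0; i < 4; ++i)
            g[comp][i] = p[i][comp];
    return g;
}

// Bilinear interpolation of one component from its four gathered values,
// assuming the (hypothetical) order p0=(i,j), p1=(i+1,j), p2=(i,j+1), p3=(i+1,j+1).
float bilerpComponent(const Vec4& c, float alpha, float beta) {
    assert(alpha >= 0.0f && alpha <= 1.0f && beta >= 0.0f && beta <= 1.0f);
    const float row0 = (1.0f - alpha) * c[0] + alpha * c[1];  // row j
    const float row1 = (1.0f - alpha) * c[2] + alpha * c[3];  // row j+1
    return (1.0f - beta) * row0 + beta * row1;
}
```

On a GPU, each row of g would come from one tex2Dgather() call. The interpolation then reads four values of the same component instead of striding across four-component vectors.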

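Note that texture coordinates are computed with only 8 bits of fractional precision. tex2Dgather() may therefore return unexpected results for cases where tex2D() would use 1.0 for one of its weights (α or β, see Linear Filtering).

This is about the precision of texture coordinates. As mentioned before, texture coordinates and the texture unit's built-in interpolation have quite low precision. If you need high precision, you have to implement it yourself. This passage says that texture coordinates carry only 8 bits of fractional precision, and it describes the problems that can result.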
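The sketch below shows what implementing it yourself buys. It compares linear interpolation using a full-precision float weight with the same interpolation using a weight rounded to 8 fractional bits. The guide's Linear Filtering section states that the hardware stores the filtering weights in 9-bit fixed point with 8 fractional bits. Modeling that as round-to-nearest on multiples of 1/256 is an assumption here, and lerpFloat and lerpFixed8 are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Linear interpolation with a full-precision weight, done "by hand".
float lerpFloat(float v0, float v1, float alpha) {
    assert(alpha >= 0.0f && alpha <= 1.0f);
    return (1.0f - alpha) * v0 + alpha * v1;
}

// Same interpolation with the weight rounded to the nearest multiple of
// 1/256 (8 fractional bits), as a rough model of hardware filtering.
float lerpFixed8(float v0, float v1, float alpha) {
    assert(alpha >= 0.0f && alpha <= 1.0f);
    const float a = static_cast<float>(std::lround(alpha * 256.0f)) / 256.0f;
    return (1.0f - a) * v0 + a * v1;
}
```

With v0 = 0, v1 = 1000 and alpha = 0.3, the float version gives 300. The 8-bit-weight version uses 77/256 as the weight and gives 300.78125, an error of almost one unit.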

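Texture gather is only supported for CUDA arrays created with the cudaArrayTextureGather flag and of width and height less than the maximum specified in Table 14 for texture gather, which is smaller than for regular texture fetch. This means exactly what it says. The first half says you must use a CUDA array created with a specific flag. The second half says that the texture's width and height are more restricted than for ordinary textures, and this limit can be queried from the device properties. If you query the device properties, the maxTexture2DGather field of cudaDeviceProp tells you the limit on the current device.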

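If anything is unclear, please leave a comment below this article

or post on our technical forum bbs.gpuworld.cn.

Originally published on the WeChat official account 吉浦迅科技 (gpusolution).

Original publication date: 2018-05-25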
