GPUS Developer - Tencent Cloud Developer Community

GPUS Developer

Focused on NVIDIA Jetson product development.

Column members

1106

Articles

1830170

Views

207

Subscribers

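What You Need to Know When Developing on NVIDIA Jetson

https, network security, kernel, html

This Tuesday evening, NVIDIA held an internal training session for developers taking part in the NVIDIA Jetson development contest. We have organized the lecture and highlighted a few key points (in particular several spec comparison charts, which are well worth saving):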

2023-01-04

9120

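Little-Known CUDA Optimization Facts 23 | How to Optimize the Execution Configuration and How It Affects Performance Tuning

kernel, deep learning, microcontroller, vr, video solutions, api

This series walks CUDA developers through the CUDA C Best Practices Guide. Previous article: Little-Known CUDA Optimization Facts 22 | Three Ways to Measure Occupancy. Today we cover the rest of Chapter 10 of the CUDA Best Practices Guide (https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#occupancy), picking up after the occupancy discussion in the previous article and continuing with latency hiding for registers, blocks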
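As a rough illustration of what tuning the execution configuration can look like in code, the sketch below asks the CUDA runtime for a suggested block size and derives the grid size from it. The kernel `addOne`, the helper `gridSizeFor`, and the problem size are hypothetical names chosen for this example. The suggested block size is only a starting point for measurement, not a guaranteed optimum.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, used only to give the runtime something to analyze.
__global__ void addOne(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Number of blocks needed to cover n elements (ceiling division).
int gridSizeFor(int n, int blockSize) {
    return (n + blockSize - 1) / blockSize;
}

int main() {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
        std::printf("No CUDA device found; skipping.\n");
        return 0;
    }

    const int n = 1 << 20;  // assumed problem size
    float *data = nullptr;
    cudaMalloc(&data, n * sizeof(float));
    cudaMemset(data, 0, n * sizeof(float));

    // Ask the runtime for a block size that maximizes theoretical occupancy
    // for this kernel (no dynamic shared memory, no block size limit).
    int minGridSize = 0, blockSize = 0;
    cudaOccupancyMaxPotentialBlockSize(&minGridSize, &blockSize, addOne, 0, 0);

    int gridSize = gridSizeFor(n, blockSize);
    addOne<<<gridSize, blockSize>>>(data, n);
    cudaDeviceSynchronize();

    std::printf("suggested block size: %d, grid size: %d, min grid size: %d\n",
                blockSize, gridSize, minGridSize);
    cudaFree(data);
    return 0;
}
```

Timing the kernel at a few block sizes around the suggested value is usually more informative than trusting a single number.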

2022-08-31

1.2K0

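Little-Known CUDA Optimization Facts 22 | Three Ways to Measure Occupancy

kernel, deep learning, microcontroller, programming algorithms

This series walks CUDA developers through the CUDA C Best Practices Guide. Previous articles: Little-Known CUDA Optimization Facts 21 | Is Higher Occupancy Always Better? and Little-Known CUDA Optimization Facts 20 | How Can You Improve Performance Without Changing the Code Itself? Generally speaking, occupancy has a sweet spot: performance suffers when it is too high or too low (much as doing too little work and working yourself to exhaustion are both bad). Now that we have the concept of occupancy and know there is no need to chase it blindly, that is already a big win. Next we will look specifically at how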
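The CUDA runtime can report how many blocks of a kernel fit on one multiprocessor, which is enough to compute a theoretical occupancy figure. The sketch below is a minimal example of that calculation, not necessarily one of the methods the article goes on to describe. The kernel `scale` and the helper `theoreticalOccupancy` are hypothetical names introduced for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel used only as the subject of the occupancy query.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Theoretical occupancy = resident warps per SM / maximum warps per SM.
double theoreticalOccupancy(int activeBlocksPerSM, int blockSize,
                            int warpSize, int maxThreadsPerSM) {
    int warpsPerBlock = (blockSize + warpSize - 1) / warpSize;
    int activeWarps = activeBlocksPerSM * warpsPerBlock;
    int maxWarps = maxThreadsPerSM / warpSize;
    return static_cast<double>(activeWarps) / maxWarps;
}

int main() {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
        std::printf("No CUDA device found; skipping.\n");
        return 0;
    }
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    const int blockSizes[] = {64, 128, 256, 512, 1024};
    for (int blockSize : blockSizes) {
        int activeBlocks = 0;
        // Blocks of `scale` that can be resident on one SM at this block size,
        // with no dynamic shared memory.
        cudaOccupancyMaxActiveBlocksPerMultiprocessor(&activeBlocks, scale,
                                                      blockSize, 0);
        std::printf("block size %4d: %2d blocks/SM, occupancy %.2f\n",
                    blockSize, activeBlocks,
                    theoreticalOccupancy(activeBlocks, blockSize, prop.warpSize,
                                         prop.maxThreadsPerMultiProcessor));
    }
    return 0;
}
```

This figure is an upper bound derived from resource limits; achieved occupancy at run time can be lower and has to be measured with a profiler.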

2022-08-29

5500

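NVIDIA JetPack 4.6 Is Here

The newly upgraded JetPack 4.6 supports all Jetson modules, including the Jetson AGX Xavier Industrial module. JetPack 4.6 includes support for Triton Inference Server, new versions of CUDA, cuDNN, and TensorRT, VPI 1.1 with support for new computer vision algorithms and Python bindings, over-the-air update capability, security features, and the new L4T 32.6.1 flashing tools.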

2021-09-22

2.1K0

Little-Known CUDA Optimization Facts 21 | Is Higher Occupancy Always Better?

kernel, deep learning

This series walks CUDA developers through the CUDA C Best Practices Guide.

2021-03-12

1.6K0

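Little-Known CUDA Optimization Facts 20 | How Can You Improve Performance Without Changing the Code Itself?

kernel, microcontroller, packaging, ide, deep learning

This series walks CUDA developers through the CUDA C Best Practices Guide.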

2021-03-12

4520

Little-Known CUDA Optimization Facts 19 | constant and Registers

microcontroller, deep learning, ide, kernel

Read the original at https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html.

2021-02-05

6270

Little-Known CUDA Optimization Facts 16 | Advantages of Texture Memory (2)

programming algorithms, kernel, deep learning

Read the original at https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html.

2021-02-05

4790

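Little-Known CUDA Optimization Facts 7 | A Key Feature of GPU-Side Event Timing

kernel, cuda, event, gpu, jobs

As we said earlier, CUDA event timing has a rich set of features. You have already seen that it times correctly without getting in the way of the boss (the CPU) scheduling work well ahead of time, even in the middle of the night. The next feature we will cover is that it can conveniently time across streams and across a whole batch of tasks. Before getting to that feature, though, we need to correct one statement in the manual.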
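A minimal sketch of event timing that spans two streams, assuming a hypothetical kernel `busyWork` and helper `timeAcrossStreams`. The start event is recorded in one stream and the stop event in another, with `cudaStreamWaitEvent` ordering the second stream after the first. The host queues everything up front and only blocks at the final `cudaEventSynchronize`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel that just gives the GPU some work to do.
__global__ void busyWork(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < 1000; ++k) v = v * 1.0001f + 0.5f;
        data[i] = v;
    }
}

// Returns the GPU time in milliseconds for work spanning two streams,
// or -1.0f if no CUDA device is available.
float timeAcrossStreams(int n) {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0)
        return -1.0f;

    float *a = nullptr, *b = nullptr;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));
    cudaMemset(a, 0, n * sizeof(float));
    cudaMemset(b, 0, n * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);
    cudaEvent_t start, mid, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&mid);
    cudaEventCreate(&stop);

    const int block = 256;
    const int grid = (n + block - 1) / block;

    cudaEventRecord(start, s1);              // timestamp when s1 reaches this point
    busyWork<<<grid, block, 0, s1>>>(a, n);
    cudaEventRecord(mid, s1);
    cudaStreamWaitEvent(s2, mid, 0);         // s2 waits for the work queued in s1
    busyWork<<<grid, block, 0, s2>>>(b, n);
    cudaEventRecord(stop, s2);               // timestamp when s2 reaches this point

    cudaEventSynchronize(stop);              // the only point where the host blocks
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(mid);
    cudaEventDestroy(stop);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(a);
    cudaFree(b);
    return ms;
}

int main() {
    float ms = timeAcrossStreams(1 << 20);
    if (ms < 0.0f) std::printf("No CUDA device found; skipping.\n");
    else std::printf("elapsed across two streams: %.3f ms\n", ms);
    return 0;
}
```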

2021-01-06

6500

Little-Known CUDA Optimization Facts 6 | GPU-Side CUDA Event Timing

deep learning, kernel

Read the original at https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html.

2021-01-06

1.2K0

Little-Known CUDA Optimization Facts 5 | Timing Methods That Only Look Right

deep learning, windows, kernel, linux

Read the original at https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html.

2021-01-06

9910

Little-Known CUDA Optimization Facts 4 | How a Worker's Time Gets Counted

kernel, deep learning

Read the original at https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html.

2021-01-06

7940

Little-Known CUDA Optimization Facts | What Is the APOD Development Model?

https, kernel, deep learning, network security

You can read the original at https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html.

2020-12-21

8230

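Experts, boot is stuck at using random host ethernet address. What do I do?

ssh, kernel, ethernet, host, nvidia

At noon today, a user asked for help in our technical support group. With his Xavier developer kit connected to a monitor (through an adapter), the NVIDIA logo flashed once and then nothing followed: no signal. Connected through the HDMI port (to a projector, the only HDMI display he had), it got stuck at using random host ethernet address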

2020-12-08

1.5K0

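Today We Learn How to Use TLT and TensorRT for Road Sign Recognition Training and Inference. Have You Got It Yet?

deep learning, kernel, programming algorithms

Starting at 9:30 this morning (September 20), nearly 200 people, including the student teams entered in the second Sky Hackathon, their mentors, and developers sitting in, attended the pre-contest online training course held by NVIDIA.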

2020-09-25

1.4K0

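Questions CUDA Beginners Should Settle First

kernel, https, deep learning, network security, programming algorithms

1. Q: When the next new GPU architecture is released, do I have to rewrite my CUDA kernels? A: No rewrite is needed. CUDA offers a high level of abstraction, and the PTX code generated by the CUDA compiler is not tied to specific hardware. As a result, at run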

2019-11-11

1.8K0

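How to Customize the Jetson NANO 40-pin Expansion Header

linux, windows, kernel

By default, all interface signal pins are configured as GPIO inputs, except pins 3 and 5 and pins 27 and 28 (I2C SDA and SCL), and pins 8 and 10 (UART TX and RX).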

2019-08-09

3.9K0

DAY 96: Reading Stream Association Examples

kernel, artificial intelligence

Associating data with a stream allows fine-grained control over CPU + GPU concurrency, but what data is visible to which streams must be kept in mind when using devices of compute capability lower than 6.x. Looking at the earlier synchronization example:
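A minimal sketch of such an example, written in the spirit of the CUDA C Programming Guide's stream association sample. The wrapper function `runStreamAssociation`, the device check, and the final synchronization are additions made for this illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ __managed__ int x, y = 2;

__global__ void kernel() { x = 10; }

// Returns x + y after the run, or -1 if no CUDA device is available.
int runStreamAssociation() {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0)
        return -1;

    cudaStream_t stream1;
    cudaStreamCreate(&stream1);

    // Attach y to the host so the CPU may touch it while the GPU is busy.
    cudaStreamAttachMemAsync(stream1, &y, 0, cudaMemAttachHost);
    cudaDeviceSynchronize();            // wait for the attachment to take effect

    kernel<<<1, 1, 0, stream1>>>();     // launches into stream1
    y = 20;                             // allowed: the kernel may still be running,
                                        // but y is not visible to any GPU stream

    cudaStreamSynchronize(stream1);     // x must not be read before the kernel finishes
    int result = x + y;
    cudaStreamDestroy(stream1);
    return result;
}

int main() {
    int result = runStreamAssociation();
    if (result < 0) std::printf("No CUDA device found; skipping.\n");
    else std::printf("x + y = %d\n", result);
    return 0;
}
```

On devices of compute capability lower than 6.x, writing `y` without the attachment step while the kernel is active would be an error, because `y` would still be visible to the GPU.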

2018-12-27

6510

DAY 94: Reading Explicit Synchronization and Logical GPU Activity

Note that explicit synchronization is required even if kernel runs quickly and finishes before the CPU touches y in the above example. Unified Memory uses logical activity to determine whether the GPU is idle. This aligns with the CUDA programming model, which specifies that a kernel can run at any time following a launch and is not guaranteed to have finished until the host issues a synchronization call.
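The example this passage refers to is not reproduced here. A minimal sketch of the pattern it describes, assuming managed variables `x` and `y` and a trivial kernel, could look like the following. The wrapper `runWithExplicitSync` and the device check are hypothetical additions for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ __managed__ int x, y = 2;

__global__ void kernel() { x = 10; }

// Returns x + y after the run, or -1 if no CUDA device is available.
int runWithExplicitSync() {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0)
        return -1;

    kernel<<<1, 1>>>();
    // However quickly the kernel finishes, the GPU counts as logically active
    // until the host synchronizes, so the CPU must not touch y before this call.
    cudaDeviceSynchronize();
    y = 20;
    return x + y;
}

int main() {
    int result = runWithExplicitSync();
    if (result < 0) std::printf("No CUDA device found; skipping.\n");
    else std::printf("x + y = %d\n", result);
    return 0;
}
```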

2018-12-26

4440

DAY 93: Reading Coherency and Concurrency

kernel, deep learning

Simultaneous access to managed memory on devices of compute capability lower than 6.x is not possible, because coherence could not be guaranteed if the CPU accessed a Unified Memory allocation while a GPU kernel was active. However, devices of compute capability 6.x on supporting operating systems allow the CPUs and GPUs to access Unified Memory allocations simultaneously via the new page faulting mechanism. A program can query whether a device supports concurrent access to managed memory by checking a new concurrentManagedAccess property. Note, as with any parallel application, developers need to ensure correct synchronization to avoid data hazards between processors.
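The property query mentioned above can be sketched as follows, using the runtime's device attribute API. The helper name `concurrentManagedAccess` and the choice of device 0 are assumptions made for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Returns 1 if the device reports concurrent managed access, 0 if it does not,
// and -1 if the query fails (for example, when no CUDA device is present).
int concurrentManagedAccess(int device) {
    int value = 0;
    cudaError_t err = cudaDeviceGetAttribute(
        &value, cudaDevAttrConcurrentManagedAccess, device);
    return err == cudaSuccess ? value : -1;
}

int main() {
    int supported = concurrentManagedAccess(0);
    if (supported < 0)
        std::printf("Query failed; no usable CUDA device.\n");
    else if (supported)
        std::printf("CPU and GPU may access managed memory concurrently.\n");
    else
        std::printf("Synchronize before the CPU touches managed memory.\n");
    return 0;
}
```

The same information is available as the `concurrentManagedAccess` field of `cudaDeviceProp`.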

2018-12-25

6620
