CreateAMind

Column author

948 articles

588718 views

55 subscribers

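Introduction to the Qzero Algorithm

Algorithms, Reinforcement Learning, Linux

1. The AlphaZero algorithm: the core idea is to combine MCTS with deep reinforcement learning (DRL), using MCTS as the policy improvement mechanism for RL. Two changes were made to achieve this: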

2020-09-28

1.3K1

Introduction to the GTrXL Architecture (Transformers in RL)

HTTPS, Network Security, Linux, Algorithms

Paper link: https://arxiv.org/pdf/1910.06764.pdf

2020-07-08

1.6K0

Introduction to the IMPALA Reinforcement Learning Framework

Linux, Algorithms

In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction

2020-02-25

1K0

Analyzing VAE from a More Unified Perspective

Linux, Algorithms, Python

Taking an overall view of the VAE, we can decompose the standard VAE loss into a reconstruction loss and a regularization loss. For the original VAE the regularizer is the KL term, but many variants exist, and we can treat them as different combinations of regularization methods

2019-11-07

4360
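The decomposition described in this excerpt (reconstruction loss plus a regularization term, which is the KL term in the original VAE) can be sketched in a few lines. This is only an illustrative sketch, not the article's code: it assumes a diagonal Gaussian posterior with a standard normal prior, uses squared error as the reconstruction loss, and introduces a hypothetical weight `beta` on the regularizer. The function names are likewise made up for this example.

```python
import math


def gaussian_kl(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def squared_error(x, x_hat):
    """Reconstruction loss (assumption: squared error, i.e. a Gaussian decoder up to constants)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Reconstruction loss plus a weighted regularization term.

    beta is a hypothetical knob: beta = 1.0 gives the original VAE objective
    under the assumptions above; other regularizers could replace gaussian_kl.
    """
    return squared_error(x, x_hat) + beta * gaussian_kl(mu, log_var)
```

With a perfect reconstruction and a posterior equal to the prior (`mu` and `log_var` all zero), both terms vanish and the loss is zero; swapping `gaussian_kl` for another penalty is one way to read the "different combinations of regularization methods" the excerpt mentions.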

Full of Memorable Lines: Transfer and Exploration via the Information Bottleneck; Key States

We present a hierarchical reinforcement learning (HRL) or options framework for identifying ‘decision states’. Informally speaking, these are states considered ‘important’ by the agent’s policy – e.g., for navigation, decision states would be crossroads or doors where an agent needs to make strategic decisions. While previous work (most notably Goyal et al., 2019) discovers decision states in a task/goal specific (or ‘supervised’) manner, we do so in a goal-independent (or

2019-08-09

3870

State Representation Learning for Control: An Overview

Linux, Algorithms, Python

As an example, infants expect inertial objects to follow principles of persistence, continuity, cohesion and solidity before appearance-based elements such as color, texture and perceptual goodness. At the same time, these principles help guide later learning, such as objects' rigidity, softness and liquid properties.

2019-04-28

6840

Transfer and Exploration via the Information Bottleneck

Transfer and Exploration via the Information Bottleneck

2019-04-28

5240

Transfer and Exploration via the Information Bottleneck

Transfer and Exploration via the Information Bottleneck

2019-04-28

3820

Four Cutting-Edge Reinforcement Learning Papers

HIERARCHICAL VISUOMOTOR CONTROL OF HUMANOIDS

2018-12-26

5060

A Tutorial on Energy-Based Learning

Linux, Websites, Algorithms

Yann LeCun, Sumit Chopra, Raia Hadsell, Marc’Aurelio Ranzato, and Fu Jie Huang. The Courant Institute of Mathematical Sciences, New York University. {yann,sumit,raia,ranzato,jhuangfu}@cs.nyu.edu, http://yann.lecun.com. v1.0, August 19, 2006. To appear in “Predicting Structured Data”, G. Bakir, T. Hofman, B. Schölkopf, A. Smola, B. Taskar (eds), MIT Press, 2006

2018-12-26

7150

A Discussion of the Assumptions Behind Disentanglement

Francesco Locatello, Stefan Bauer, Mario Lucic, Sylvain Gelly, Bernhard Schölkopf, Olivier Bachem

2018-12-26

7570

Visualizing DiDi Data with ROS rviz

Data Visualization, HTTPS, Linux, Windows

https://aws.amazon.com/cn/premiumsupport/knowledge-center/connect-to-linux-desktop-from-windows/

2018-07-24

1.3K0
