CreateAMind

Column author
948 articles
588,718 views
55 subscribers
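Introduction to the Qzero Algorithm
1. The AlphaZero algorithm: its core idea is to combine MCTS with deep reinforcement learning (DRL), using MCTS as the policy improvement mechanism for RL. Two changes were made to achieve this: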
用户1908973
2020-09-28
1.3K1
Introduction to the GTrXL Architecture (Transformers Applied to RL)
Paper link: https://arxiv.org/pdf/1910.06764.pdf
用户1908973
2020-07-08
1.6K0
Introduction to the IMPALA Reinforcement Learning Framework
In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction
用户1908973
2020-02-25
1K0
Analyzing VAEs from a More Unified Perspective
Taking an overall view of the VAE, we can decompose the standard VAE loss into a reconstruction loss and a regularization loss. For the original VAE the regularizer is the KL term, but the many variants can be treated as different combinations of regularization methods
用户1908973
2019-11-07
4360
Full of Quotable Lines: Transfer and Exploration via the Information Bottleneck; Key States
We present a hierarchical reinforcement learning (HRL) or options framework for identifying ‘decision states’. Informally speaking, these are states considered ‘important’ by the agent’s policy – e.g., for navigation, decision states would be crossroads or doors where an agent needs to make strategic decisions. While previous work (most notably Goyal et al., 2019) discovers decision states in a task/goal specific (or ‘supervised’) manner, we do so in a goal-independent (or
用户1908973
2019-08-09
3870
State Representation Learning for Control: An Overview
As an example, infants expect inertial objects to follow principles of persistence, continuity, cohesion and solidity before appearance-based elements such as color, texture and perceptual goodness. At the same time, these principles help guide later learning, such as of an object's rigidity, softness and liquid properties.
用户1908973
2019-04-28
6840
Transfer and Exploration via the Information Bottleneck
Transfer and Exploration via the Information Bottleneck
用户1908973
2019-04-28
5240
Transfer and Exploration via the Information Bottleneck
Transfer and Exploration via the Information Bottleneck
用户1908973
2019-04-28
3820
4 Cutting-Edge Reinforcement Learning Papers
HIERARCHICAL VISUOMOTOR CONTROL OF HUMANOIDS
用户1908973
2018-12-26
5060
A Tutorial on Energy-Based Learning
Yann LeCun, Sumit Chopra, Raia Hadsell, Marc’Aurelio Ranzato, and Fu Jie Huang. The Courant Institute of Mathematical Sciences, New York University. {yann,sumit,raia,ranzato,jhuangfu}@cs.nyu.edu http://yann.lecun.com v1.0, August 19, 2006. To appear in “Predicting Structured Data”, G. Bakir, T. Hofman, B. Schölkopf, A. Smola, B. Taskar (eds), MIT Press, 2006
用户1908973
2018-12-26
7150
A Discussion of the Assumptions Behind Disentanglement
Francesco Locatello, Stefan Bauer, Mario Lucic, Sylvain Gelly, Bernhard Schölkopf, Olivier Bachem
用户1908973
2018-12-26
7570
Visualizing DiDi Data with ROS rviz
https://aws.amazon.com/cn/premiumsupport/knowledge-center/connect-to-linux-desktop-from-windows/
用户1908973
2018-07-24
1.3K0