Reinforcement Learning: Universal Planning Networks

https://arxiv.org/abs/1804.00645

https://sites.google.com/view/upn-public/home

Abstract:

A key challenge in complex visuomotor control is learning abstract representations that are effective for specifying goals, planning, and generalization. To this end, we introduce universal planning networks (UPN). UPNs embed differentiable planning within a goal-directed policy. This planning computation unrolls a forward model in a latent space and infers an optimal action plan through gradient descent trajectory optimization. The plan-by-gradient-descent process and its underlying representations are learned end-to-end to directly optimize a supervised imitation learning objective. We find that the representations learned are not only effective for goal-directed visual imitation via gradient-based trajectory optimization, but can also provide a metric for specifying goals using images. The learned representations can be leveraged to specify distance-based rewards to reach new target states for model-free reinforcement learning, resulting in substantially more effective learning when solving new tasks described via image-based goals. We were able to achieve successful transfer of visuomotor planning strategies across robots with significantly different morphologies and actuation capabilities.
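
The core inner loop, planning by gradient descent through a forward model unrolled in latent space, can be illustrated with a minimal Python/NumPy sketch. It is not the UPN implementation and rests on several assumptions. The encoder `E` and the latent forward model `A`, `B` are fixed linear maps; in UPN they are learned end-to-end. The planning loss is the squared Euclidean distance between the final latent state and the goal latent. The gradient is derived by hand rather than with autodiff. All function names and the toy matrices below are hypothetical.

```python
import numpy as np


def rollout(z0, actions, A, B):
    """Unroll the (assumed linear) latent forward model z_{t+1} = A z_t + B a_t."""
    zs = [z0]
    for a in actions:
        zs.append(A @ zs[-1] + B @ a)
    return zs


def planning_loss_and_grad(z0, z_goal, actions, A, B):
    """Assumed planning loss ||z_T - z_goal||^2 and its gradient w.r.t. every action."""
    zs = rollout(z0, actions, A, B)
    diff = zs[-1] - z_goal
    loss = float(diff @ diff)
    grads = np.zeros_like(actions)
    g = 2.0 * diff                      # dL/dz_T
    for t in reversed(range(len(actions))):
        grads[t] = B.T @ g              # dL/da_t = B^T dL/dz_{t+1}
        g = A.T @ g                     # dL/dz_t = A^T dL/dz_{t+1}
    return loss, grads


def plan_by_gradient_descent(o_init, o_goal, E, A, B, horizon=10, steps=300, lr=2.0):
    """Encode both observations, then refine the action plan with fixed-step gradient descent."""
    z0, z_goal = E @ o_init, E @ o_goal
    actions = np.zeros((horizon, B.shape[1]))
    for _ in range(steps):
        _, grads = planning_loss_and_grad(z0, z_goal, actions, A, B)
        actions -= lr * grads
    final_loss, _ = planning_loss_and_grad(z0, z_goal, actions, A, B)
    return actions, final_loss


def latent_reward(o, o_goal, E):
    """Hypothetical distance-based reward in latent space: r = -||E o - E o_goal||^2."""
    d = E @ o - E @ o_goal
    return -float(d @ d)


# Toy setup (hypothetical): 3-D observations, 2-D latent (position, velocity), 1-D action.
E = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])         # this encoder simply drops the third observation feature
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])              # double-integrator-style latent dynamics
B = np.array([[0.0],
              [0.1]])
o_init = np.array([0.0, 0.0, 5.0])
o_goal = np.array([1.0, 0.0, 7.0])

actions, final_loss = plan_by_gradient_descent(o_init, o_goal, E, A, B)
print(f"final planning loss: {final_loss:.2e}")
print("reward at start:", latent_reward(o_init, o_goal, E))  # prints -1.0
```

In UPN, an outer supervised imitation loss on the planned actions would be backpropagated through this unrolled loop to update the encoder and forward model; that outer loop is omitted here. `latent_reward` sketches the second use the abstract describes: the negated latent distance to an image goal serves as a shaped reward for model-free reinforcement learning.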

This article was shared from the WeChat official account CreateAMind (createamind).

Original publication date: 2018-05-15
