
NIPS Best Paper: Value Iteration Networks for Reinforcement Learning, with Code

Author: 用户1908973
Published: 2018-07-24 18:04:13
Included in the column: CreateAMind

TensorFlow implementation: https://github.com/TheAbhiKumar/tensorflow-value-iteration-networks

Author of the article below: https://www.zhihu.com/people/ikerpeng/

Introduction to the code implementation:

Value Iteration Networks in TensorFlow

Tamar, A., Wu, Y., Thomas, G., Levine, S., and Abbeel, P. Value Iteration Networks. Neural Information Processing Systems (NIPS) 2016

This repository contains a TensorFlow implementation of Value Iteration Networks, the paper that won the Best Paper Award at NIPS 2016. The code is based on the authors' original Theano implementation.
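For readers unfamiliar with the algorithm the network is named after, here is a minimal sketch of classic tabular value iteration on a small grid world. It is not taken from the repository: the grid encoding (0 for free cells, 1 for obstacles), the reward of 1.0 for entering the goal, and all function names are assumptions made for illustration. The paper's VI module is a differentiable approximation of this kind of loop, built from convolution and max-pooling layers.

```python
# Illustrative sketch only; names and reward scheme are assumptions,
# not the repository's code.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def q_values(grid, values, goal, state, gamma):
    """Return one Q-value per action for a single grid cell."""
    rows, cols = len(grid), len(grid[0])
    r, c = state
    qs = []
    for dr, dc in ACTIONS:
        nr, nc = r + dr, c + dc
        in_bounds = 0 <= nr < rows and 0 <= nc < cols
        if not in_bounds or grid[nr][nc] == 1:
            nr, nc = r, c  # blocked moves leave the agent in place
        reward = 1.0 if (nr, nc) == goal else 0.0
        qs.append(reward + gamma * values[nr][nc])
    return qs


def value_iteration(grid, goal, gamma=0.9, iterations=50):
    """Run a fixed number of synchronous value-iteration sweeps."""
    rows, cols = len(grid), len(grid[0])
    values = [[0.0] * cols for _ in range(rows)]
    for _ in range(iterations):
        new_values = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == 1 or (r, c) == goal:
                    continue  # obstacles and the absorbing goal keep value 0
                new_values[r][c] = max(q_values(grid, values, goal, (r, c), gamma))
        values = new_values
    return values


def greedy_action(grid, values, goal, state, gamma=0.9):
    """Return the index into ACTIONS with the highest Q-value (first on ties)."""
    qs = q_values(grid, values, goal, state, gamma)
    return qs.index(max(qs))


if __name__ == "__main__":
    toy_grid = [[0, 0, 0],
                [0, 1, 0],
                [0, 0, 0]]
    toy_values = value_iteration(toy_grid, goal=(2, 2))
    for row in toy_values:
        print(["%.3f" % v for v in row])
```

With this reward scheme, a cell that is d steps from the goal converges to a value of gamma to the power d - 1, so the greedy policy follows the shortest obstacle-free path.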

Training

  • Download the 16x16 and 28x28 GridWorld datasets from the author's repository. The 8x8 GridWorld dataset is already included in this repository, for convenience and because it is small.

python3 train.py

To monitor training progress, set config.log to True and launch tensorboard --logdir /tmp/vintf/. The log directory defaults to /tmp/vintf/ and can be changed via config.logdir. By default, the code trains the 8x8 GridWorld model.

The 8x8 GridWorld model converges in under 30 epochs with about 98.5% accuracy. The paper reports around 99.6%, which I was able to reproduce with the Theano code. The TensorFlow model is not perfect: training on the 16x16 and 28x28 domains with the same parameters as the Theano implementation produces NaNs.

Dependencies

  • Python >= 3.5
  • TensorFlow >= 0.12
  • SciPy >= 0.18.1 (to load the data)
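As a rough illustration of why SciPy is listed, the sketch below reads a MATLAB .mat file with scipy.io.loadmat. The file name and the variable names "images" and "labels" are hypothetical and are not the keys used by the actual GridWorld files; the example writes its own toy file first so that it runs without the dataset.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat


def load_mat_arrays(path):
    """Load a .mat file and return only the user-defined variables."""
    contents = loadmat(path)
    # loadmat adds metadata entries such as '__header__'; drop them.
    return {k: v for k, v in contents.items() if not k.startswith("__")}


def demo_round_trip():
    """Write a toy .mat file and read it back (hypothetical layout)."""
    images = np.zeros((4, 64))               # assumed: 4 flattened 8x8 maps
    labels = np.array([[0], [1], [2], [3]])  # assumed: one action label per map
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy_gridworld.mat")
        savemat(path, {"images": images, "labels": labels})
        return load_mat_arrays(path)


if __name__ == "__main__":
    for name, array in demo_round_trip().items():
        print(name, array.shape)
```

To inspect a real dataset file, call load_mat_arrays on its path and list the returned keys rather than assuming the names used here.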

Datasets

  • The GridWorld dataset comes from the author's repository, which also contains Matlab scripts to generate the dataset. The code that processes the dataset is taken from the original repository with minor modifications, under this license.
  • The paper also tested the model on three other domains, and the author's original code for them will be released eventually:
    • Mars Rover Navigation
    • Continuous control
    • WebNav

Resources

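This article is part of the Tencent Cloud self-media sharing program and was shared from the CreateAMind WeChat official account.
Originally published: 2017-02-18. In case of infringement, contact cloudcommunity@tencent.com for removal.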
