MontezumaRevengeNoFrameskip-v4'
https://github.com/openai/random-network-distillation
https://blog.openai.com/reinforcement-learning-with-prediction-based-rewards/
Our team running a new experiment on MontezumaRevenge with RND model, which successfully reach 17 rooms. Watching that agent collecting keys and using the sword. You got to question, what is the real definition of intelligence.
使用replayer.py运行默认在/tmp下面的日志目录即可出现下面的训练记录plot video show or save video
up is actual is videos but this is only one frame
Demo
Start training