Teaching an AI to Play CartPole in 50 Lines of Python

Reinforcement Learning

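A well-known example of a reinforcement learning (RL) agent is AlphaGo, an agent that learned to play Go so as to maximize its reward (winning the game). In this tutorial we will build an agent that solves the problem of balancing a pole on a cart by pushing the cart left or right.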
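To make the agent, action and reward vocabulary concrete before gym enters the picture, here is a minimal sketch of the interaction loop. `ToyEnv` and `run_episode` are hypothetical names invented for this illustration and are not part of gym; the environment is a one-number toy, not CartPole.

```python
import random

class ToyEnv:
    """Hypothetical stand-in for an environment (not part of gym).

    The state is a single number. The episode ends when it leaves
    the interval [-1, 1] or after max_steps steps.
    """
    def __init__(self, seed=0, max_steps=50):
        self.rng = random.Random(seed)
        self.max_steps = max_steps

    def reset(self):
        self.state = 0.0
        self.steps = 0
        return self.state

    def step(self, action):
        # Action 1 pushes the state up, action 0 pushes it down.
        push = 0.1 if action == 1 else -0.1
        self.state += push + self.rng.uniform(-0.05, 0.05)
        self.steps += 1
        done = abs(self.state) > 1.0 or self.steps >= self.max_steps
        reward = 1.0  # one point per step taken, similar in spirit to CartPole
        return self.state, reward, done

def run_episode(env, choose_action):
    """Generic loop: observe, act, collect the reward, repeat until done."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = choose_action(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
    return total_reward

# An agent that pushes back toward zero versus one that always pushes up.
balancing = run_episode(ToyEnv(), lambda obs: 0 if obs > 0 else 1)
careless = run_episode(ToyEnv(), lambda obs: 1)
print(balancing, careless)
```

The balancing agent survives until the step limit, while the careless one drifts out of bounds early and collects far less reward. The CartPole code in this tutorial follows the same loop shape.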

Basics

`import gym`
`import numpy as np`
`# Assumes the classic gym API (before 0.26): reset() returns the observation, step() returns four values`
`env = gym.make('CartPole-v1')`

`def play(env, policy):`
`  observation = env.reset()`
`  done = False`
`  score = 0`
`  observations = []`
`  for _ in range(5000):`
`    observations += [observation.tolist()]  # Record the observations for normalization and replay`
`    if done:  # If the simulation was over last iteration, exit loop`
`      break`
`    # Pick an action according to the policy matrix`
`    outcome = np.dot(policy, observation)`
`    action = 1 if outcome > 0 else 0`
`    # Make the action, record reward`
`    observation, reward, done, info = env.step(action)`
`    score += reward`
`  return score, observations`

`outcome = np.dot(policy, observation)`
`action = 1 if outcome > 0 else 0`
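The two lines that pick the action are the whole "brain" of the agent: a weighted sum of the four observation values, thresholded at zero. The sketch below reproduces that rule in plain Python so it can be tried without gym or NumPy. The weights and observations are made up for illustration, and the policy is a flat list rather than the 1x4 matrix used in the tutorial code.

```python
def choose_action(policy, observation):
    """Return 1 if the weighted sum is positive, else 0.

    In CartPole, action 0 pushes the cart left and action 1 pushes it right.
    """
    outcome = sum(w * x for w, x in zip(policy, observation))
    return 1 if outcome > 0 else 0

# CartPole observations have four entries:
# cart position, cart velocity, pole angle, pole angular velocity.
policy = [0.0, 0.0, 1.0, 1.0]  # illustrative weights, chosen by hand

positive_tilt = [0.0, 0.0, 0.05, 0.2]
negative_tilt = [0.0, 0.0, -0.05, -0.2]

print(choose_action(policy, positive_tilt))
print(choose_action(policy, negative_tilt))
```

With these hand-picked weights the action simply follows the sign of (pole angle + angular velocity) and ignores the cart's position and velocity.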


Playing the First Game

`policy = np.random.rand(1, 4)`

`score, observations = play(env, policy)`
`print('Policy Score', score)`

The Agent

`from flask import Flask`
`import json`
`app = Flask(__name__, static_folder='.')`
`@app.route("/data")`
`def data():`
`    # Serve the recorded observations as JSON`
`    return json.dumps(observations)`
`@app.route('/')`
`def root():`
`    return app.send_static_file('index.html')`
`app.run(host='0.0.0.0', port=3000)`
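The `index.html` page that consumes `/data` is not reproduced here, so the following is only a sketch of what the route sends and what a client would see after parsing it. The two frames are made-up values in CartPole's observation layout.

```python
import json

# Two made-up frames in CartPole's observation layout:
# [cart position, cart velocity, pole angle, pole angular velocity]
observations = [
    [0.01, -0.02, 0.03, 0.04],
    [0.0096, 0.17, 0.031, -0.25],
]

payload = json.dumps(observations)  # the string the /data route would return
frames = json.loads(payload)        # what a client gets back after parsing

for step, (x, x_dot, theta, theta_dot) in enumerate(frames):
    print(f"step {step}: cart at {x:+.4f}, pole angle {theta:+.3f} rad")
```

Because each frame is a plain list of four floats, a replay page needs only the first and third entries to draw the cart and the pole.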

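Policy Search

`# Best (score, observations, policy) seen so far; note that this name shadows the built-in max()`
`max = (0, [], [])`

`for _ in range(10):`
`  policy = np.random.rand(1, 4)`
`  score, observations = play(env, policy)`
`  if score > max[0]:`
`    max = (score, observations, policy)`
`print('Max Score', max[0])`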
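The search above is plain random search: sample a policy, score it, keep the best. The sketch below isolates that idea with a stand-in scoring function so it runs without gym. `toy_score` and `random_search` are hypothetical helpers for illustration; in the tutorial the score comes from `play(env, policy)`. The best score starts at negative infinity because the toy scores are negative, whereas the tutorial can start at 0 because CartPole scores are positive.

```python
import random

def toy_score(policy):
    """Hypothetical stand-in for play(): higher is better, peak at target."""
    target = [0.2, 0.4, 0.6, 0.8]
    return -sum((w - t) ** 2 for w, t in zip(policy, target))

def random_search(score_fn, trials, rng):
    """Sample random policies and keep the best-scoring one."""
    best_score, best_policy = float("-inf"), None
    for _ in range(trials):
        policy = [rng.random() for _ in range(4)]
        score = score_fn(policy)
        if score > best_score:
            best_score, best_policy = score, policy
    return best_score, best_policy

# Same seed for both runs, so the 1000-trial run sees the first 10 samples too.
score_10, _ = random_search(toy_score, 10, random.Random(0))
score_1000, policy_1000 = random_search(toy_score, 1000, random.Random(0))
print(score_10, score_1000)
```

More trials can only help here, since the longer run includes every candidate the shorter run saw. The price is one full game per candidate.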

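`# Before: serves the observations of the most recent game`
`@app.route("/data")`
`def data():`
`    return json.dumps(observations)`

`# After: serves the observations of the best game found by the search`
`@app.route("/data")`
`def data():`
`    return json.dumps(max[1])`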

`import gym`
`import numpy as np`
`# Assumes the classic gym API (before 0.26): reset() returns the observation, step() returns four values`
`env = gym.make('CartPole-v1')`
`def play(env, policy):`
`  observation = env.reset()`
`  done = False`
`  score = 0`
`  observations = []`
`  for _ in range(5000):`
`    observations += [observation.tolist()]  # Record the observations for normalization and replay`
`    if done:  # If the simulation was over last iteration, exit loop`
`      break`
`    # Pick an action according to the policy matrix`
`    outcome = np.dot(policy, observation)`
`    action = 1 if outcome > 0 else 0`
`    # Make the action, record reward`
`    observation, reward, done, info = env.step(action)`
`    score += reward`
`  return score, observations`
`# Best (score, observations, policy) seen so far; this name shadows the built-in max()`
`max = (0, [], [])`
`for _ in range(10):`
`  policy = np.random.rand(1, 4)`
`  score, observations = play(env, policy)`
`  if score > max[0]:`
`    max = (score, observations, policy)`
`print('Max Score', max[0])`
`from flask import Flask`
`import json`
`app = Flask(__name__, static_folder='.')`
`@app.route("/data")`
`def data():`
`    return json.dumps(max[1])`
`@app.route('/')`
`def root():`
`    return app.send_static_file('index.html')`
`app.run(host='0.0.0.0', port=3000)`

Addendum

`import gym`
`import numpy as np`
`# Assumes the classic gym API (before 0.26): reset() returns the observation, step() returns four values`
`env = gym.make('CartPole-v1')`
`def play(env, policy):`
`  observation = env.reset()`
`  done = False`
`  score = 0`
`  observations = []`
`  for _ in range(5000):`
`    observations += [observation.tolist()]  # Record the observations for normalization and replay`
`    if done:  # If the simulation was over last iteration, exit loop`
`      break`
`    # Pick an action according to the policy matrix`
`    outcome = np.dot(policy, observation)`
`    action = 1 if outcome > 0 else 0`
`    # Make the action, record reward`
`    observation, reward, done, info = env.step(action)`
`    score += reward`
`  return score, observations`
`# Best (score, observations, policy) seen so far; this name shadows the built-in max()`
`max = (0, [], [])`
`# We changed the next two lines!`
`for _ in range(100):`
`  policy = np.random.rand(1, 4) - 0.5`
`  score, observations = play(env, policy)`
`  if score > max[0]:`
`    max = (score, observations, policy)`
`print('Max Score', max[0])`
`from flask import Flask`
`import json`
`app = Flask(__name__, static_folder='.')`
`@app.route("/data")`
`def data():`
`    return json.dumps(max[1])`
`@app.route('/')`
`def root():`
`    return app.send_static_file('index.html')`
`app.run(host='0.0.0.0', port=3000)`
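One way to see why subtracting 0.5 matters: `np.random.rand` draws from [0, 1), so without the shift every weight is non-negative, and an observation whose entries are all positive always produces the same action. The sketch below demonstrates this in plain Python with `random.random()` standing in for `np.random.rand`. The observation values are made up, and `sample_policy` is a hypothetical helper.

```python
import random

def sample_policy(rng, shift):
    """Four random weights drawn from [0 - shift, 1 - shift)."""
    return [rng.random() - shift for _ in range(4)]

def choose_action(policy, observation):
    outcome = sum(w * x for w, x in zip(policy, observation))
    return 1 if outcome > 0 else 0

rng = random.Random(0)

# An observation whose four entries are all positive (made-up values).
observation = [0.02, 0.5, 0.03, 0.4]

unshifted = {choose_action(sample_policy(rng, 0.0), observation) for _ in range(1000)}
shifted = {choose_action(sample_policy(rng, 0.5), observation) for _ in range(1000)}

print(unshifted)  # only action 1 is ever chosen
print(shifted)    # both actions appear
```

With weights centered on zero, the search can reach policies that react in either direction to each observation value. That is presumably why the shifted version tends to find better policies.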

Addendum 2

`scores = []`
`for _ in range(100):`
`  score, _ = play(env, max[2])`
`  scores += [score]`
`print('Average Score (100 trials)', np.mean(scores))`
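A single high score can be luck, because CartPole starts every episode from a slightly different random state. Averaging over many episodes gives a fairer picture of a policy. The sketch below shows the same bookkeeping with the standard library and also reports the spread. `noisy_score` is a hypothetical stand-in for `play(env, policy)` and its numbers are arbitrary.

```python
import random
import statistics

def noisy_score(rng):
    """Hypothetical stand-in for play(env, policy): the score varies per episode."""
    return rng.randint(50, 500)

rng = random.Random(0)
scores = [noisy_score(rng) for _ in range(100)]

mean = statistics.mean(scores)
spread = statistics.stdev(scores)
print(f"mean {mean:.1f}, stdev {spread:.1f}, min {min(scores)}, max {max(scores)}")
```

A large standard deviation relative to the mean suggests the policy only works from some starting states, which the maximum score alone would hide.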
