
Learning AlphaZero-Style AI in Python: A TensorFlow Reinforcement Learning Game AI in 100 Lines of Code

Beating the world champion? How does AlphaGo Zero work?

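That's right: this article uses about 100 lines of code to show how to write the core of a very simple deep reinforcement learning game AI with the TensorFlow framework. I hope readers come away understanding how a DQN network works. No more worrying about mom saying that studying machine learning is just grunt work!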

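Deep Q Network (DQN) was proposed by DeepMind in 2013. It was the first model to successfully combine deep learning with reinforcement learning, and it inspired a series of follow-up work, the best known being Double DQN, Prioritized Replay, and Dueling Network. The AlphaGo program that defeated Go world champion Ke Jie, and its successor AlphaGo Zero, belong to the same deep reinforcement learning family, although they rely on policy and value networks combined with Monte Carlo tree search rather than on DQN itself.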
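As a rough illustration of the idea behind DQN, the sketch below computes the one-step Q-learning target, r + gamma * max Q(s', a'), that the network is trained to regress toward. The function name `td_targets`, the discount value, and the sample numbers are assumptions made for illustration only; terminal-state handling is left out to keep the example short.

```python
import numpy as np

def td_targets(rewards, next_q_values, gamma=0.99):
    """One-step Q-learning targets: r + gamma * max_a' Q(s', a').

    rewards:        shape (batch,)
    next_q_values:  shape (batch, num_actions), Q estimates for the next state
    gamma:          discount factor (hypothetical default)
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    next_q_values = np.asarray(next_q_values, dtype=np.float64)
    return rewards + gamma * next_q_values.max(axis=1)

# Two made-up transitions with three possible actions each.
rewards = [1.0, 0.0]
next_q = [[0.5, 2.0, 1.0],
          [0.0, 0.0, 0.0]]
targets = td_targets(rewards, next_q, gamma=0.5)
print(targets)
```

The training step then minimizes the squared difference between these targets and the Q value predicted for the action that was actually taken.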

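Game controls:

Hold the left mouse button to move the paddle left, and hold the right mouse button to move it right. Each time the paddle catches a small block, you score one point. The deep reinforcement learning algorithm lets the computer play the game automatically.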

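Install the Python dependencies

pip install pygame

pip install numpy

The code below also uses TensorFlow (tf) and OpenCV (cv2), which need to be installed as well.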

Core code

Define the convolutional neural network (CNN):

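import tensorflow as tf  # written against the TensorFlow 1.x graph API

# `output` is the number of actions; it is defined elsewhere in the full script.
def convolutional_neural_network(input_image):
    # The source initializes the weights with tf.zeros; small random values are
    # used here because all-zero weights give the ReLU layers no gradient.
    def init(shape):
        return tf.Variable(tf.truncated_normal(shape, stddev=0.01))

    weights = {'w_conv1': init([8, 8, 4, 32]),
               'w_conv2': init([4, 4, 32, 64]),
               'w_conv3': init([3, 3, 64, 64]),
               'w_fc4': init([3456, 784]),
               'w_out': init([784, output])}

    biases = {'b_conv1': tf.Variable(tf.zeros([32])),
              'b_conv2': tf.Variable(tf.zeros([64])),
              'b_conv3': tf.Variable(tf.zeros([64])),
              'b_fc4': tf.Variable(tf.zeros([784])),
              'b_out': tf.Variable(tf.zeros([output]))}

    # The convolution and fc4 lines are missing from the source. They are
    # reconstructed here assuming strides 4, 2, 1 with VALID padding, which
    # yields the 3456-element flatten size for an 80x100x4 input.
    conv1 = tf.nn.relu(tf.nn.conv2d(input_image, weights['w_conv1'], strides=[1, 4, 4, 1], padding="VALID") + biases['b_conv1'])
    conv2 = tf.nn.relu(tf.nn.conv2d(conv1, weights['w_conv2'], strides=[1, 2, 2, 1], padding="VALID") + biases['b_conv2'])
    conv3 = tf.nn.relu(tf.nn.conv2d(conv2, weights['w_conv3'], strides=[1, 1, 1, 1], padding="VALID") + biases['b_conv3'])
    conv3_flat = tf.reshape(conv3, [-1, 3456])
    fc4 = tf.nn.relu(tf.matmul(conv3_flat, weights['w_fc4']) + biases['b_fc4'])

    # One Q value per action, with no activation on the output layer
    output_layer = tf.matmul(fc4, weights['w_out']) + biases['b_out']
    return output_layer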

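The network contains three convolutional layers and one fully connected layer, each passing its result to the next layer through a ReLU activation, followed by a linear output layer.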
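The flatten size 3456 in the network can be checked with a little arithmetic. The sketch below assumes VALID padding and strides of 4, 2, and 1 for the three convolutions; those strides are not shown in the article and are an assumption, chosen because they reproduce 3456 for the 80x100 frames used in training. The helper names are hypothetical.

```python
def valid_conv_size(size, kernel, stride):
    """Output length of a VALID (no padding) convolution along one axis."""
    return (size - kernel) // stride + 1

def flattened_size(height, width, layers, channels):
    """Apply each (kernel, stride) pair in turn, then flatten H * W * C."""
    for kernel, stride in layers:
        height = valid_conv_size(height, kernel, stride)
        width = valid_conv_size(width, kernel, stride)
    return height * width * channels

# (kernel, stride) for conv1..conv3; kernels come from the weight shapes,
# strides are assumed.
layers = [(8, 4), (4, 2), (3, 1)]
flat = flattened_size(80, 100, layers, channels=64)
print(flat)  # 3456
```

The spatial size shrinks from 80x100 to 19x24, then 8x11, then 6x9, and 6 * 9 * 64 = 3456.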

Training the neural network:

import random
from collections import deque

import cv2
import numpy as np

# Game, MOVE_STAY, output, INITIAL_EPSILON, FINAL_EPSILON, EXPLORE, OBSERVE,
# REPLAY_MEMORY, BATCH, LEARNING_RATE and the input_image placeholder are
# defined elsewhere in the full script and are not shown in this article.
def train_neural_network(input_image):
    predict_action = convolutional_neural_network(input_image)

    argmax = tf.placeholder("float", [None, output])  # one-hot action taken
    gt = tf.placeholder("float", [None])              # target Q value

    # Q value predicted for the action that was actually taken
    action = tf.reduce_sum(tf.multiply(predict_action, argmax), axis=1)
    cost = tf.reduce_mean(tf.square(action - gt))
    optimizer = tf.train.AdamOptimizer(1e-6).minimize(cost)

    game = Game()
    D = deque()  # replay memory

    _, image = game.step(MOVE_STAY)
    # Convert to grayscale
    image = cv2.cvtColor(cv2.resize(image, (100, 80)), cv2.COLOR_BGR2GRAY)
    # Convert to a binary image
    ret, image = cv2.threshold(image, 1, 255, cv2.THRESH_BINARY)
    # Stack the first frame four times to form the initial state
    input_image_data = np.stack((image, image, image, image), axis=2)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        saver = tf.train.Saver()

        n = 0
        epsilon = INITIAL_EPSILON
        while True:
            # The feed_dict arguments are empty in the source; they are
            # reconstructed here from the placeholders defined above.
            action_t = predict_action.eval(feed_dict={input_image: [input_image_data]})[0]

            argmax_t = np.zeros([output], dtype=int)
            # The comparison is truncated in the source; epsilon-greedy is assumed.
            if random.random() <= epsilon:
                maxIndex = random.randrange(output)
            else:
                maxIndex = np.argmax(action_t)
            argmax_t[maxIndex] = 1
            if epsilon > FINAL_EPSILON:
                epsilon -= (INITIAL_EPSILON - FINAL_EPSILON) / EXPLORE

            # if event.type == QUIT:
            #     pygame.quit()
            #     sys.exit()
            reward, image = game.step(list(argmax_t))

            image = cv2.cvtColor(cv2.resize(image, (100, 80)), cv2.COLOR_BGR2GRAY)
            ret, image = cv2.threshold(image, 1, 255, cv2.THRESH_BINARY)
            image = np.reshape(image, (80, 100, 1))
            # Newest frame first, followed by the three most recent older frames
            input_image_data1 = np.append(image, input_image_data[:, :, 0:3], axis=2)

            D.append((input_image_data, argmax_t, reward, input_image_data1))

            if len(D) > REPLAY_MEMORY:
                D.popleft()

            if n > OBSERVE:
                minibatch = random.sample(D, BATCH)
                input_image_data_batch = [d[0] for d in minibatch]
                argmax_batch = [d[1] for d in minibatch]
                reward_batch = [d[2] for d in minibatch]
                input_image_data1_batch = [d[3] for d in minibatch]

                gt_batch = []

                out_batch = predict_action.eval(feed_dict={input_image: input_image_data1_batch})

                for i in range(0, len(minibatch)):
                    # LEARNING_RATE plays the role of the discount factor here
                    gt_batch.append(reward_batch[i] + LEARNING_RATE * np.max(out_batch[i]))

                optimizer.run(feed_dict={gt: gt_batch, argmax: argmax_batch, input_image: input_image_data_batch})

            input_image_data = input_image_data1
            n = n + 1

            if n % 10000 == 0:
                saver.save(sess, 'game.cpk', global_step=n)  # save the model

            print(n, "epsilon:", epsilon, " ", "action:", maxIndex, " ", "reward:", reward)


train_neural_network(input_image)
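The exploration and replay-memory bookkeeping in the training loop can be tried out without TensorFlow or the game. The following is a minimal sketch under stated assumptions: `anneal` and `choose_action` are hypothetical helper names, the Q values are made up, `deque(maxlen=...)` stands in for the manual `popleft()` check, and the final clamp on epsilon is an addition that the article's loop does not have.

```python
import random
from collections import deque

import numpy as np

def anneal(epsilon, initial, final, explore_steps):
    """Step epsilon linearly from `initial` toward `final`, never below it."""
    if epsilon > final:
        epsilon -= (initial - final) / explore_steps
    return max(epsilon, final)

def choose_action(q_values, epsilon):
    """One-hot action: random with probability epsilon, otherwise greedy."""
    one_hot = np.zeros(len(q_values), dtype=int)
    if random.random() <= epsilon:
        index = random.randrange(len(q_values))
    else:
        index = int(np.argmax(q_values))
    one_hot[index] = 1
    return one_hot

# A bounded deque discards the oldest transition automatically.
memory = deque(maxlen=3)
epsilon = 1.0
for step in range(5):
    action = choose_action([0.1, 0.7, 0.2], epsilon)
    memory.append((step, action))
    epsilon = anneal(epsilon, initial=1.0, final=0.05, explore_steps=4)

print(len(memory), epsilon)
```

Early on, almost every action is random; as epsilon shrinks, the agent relies more and more on the network's own Q estimates, while the bounded memory keeps only the most recent transitions for minibatch sampling.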

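Training result:

The AI clumsily tries to play the game on its own, learning by trial and error and thoroughly enjoying itself.

Project summary:

This project uses about 100 lines of Python code to demonstrate deep reinforcement learning with the TensorFlow framework.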

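  • Original link: http://kuaibao.qq.com/s/20171227G0D1JZ00?refer=cp_1026
  • Tencent Cloud Developer Community is one of the distribution channels for Tencent Content Open Platform accounts (Penguin accounts); this content is republished under the Tencent Content Open Platform Service Agreement.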