Reinforcement Learning in Practice (1): The TensorLayer Pong Tutorial

Date: 2024-04-13 09:43:36

For details, see the official TensorLayer documentation: http://tensorlayercn.readthedocs.io/zh/latest/

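Running the Pong example
In the second part of the official tutorial, we run a deep reinforcement learning example that is described in Karpathy's blog post Deep Reinforcement Learning: Pong from Pixels.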

python tutorial_atari_pong.py
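Before running the tutorial code, you need to install the OpenAI gym environment, which provides a large collection of game environments commonly used in reinforcement learning. If everything works, you will see output like the following: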

[2016-07-12 09:31:59,760] Making new env: Pong-v0
[TL] InputLayer input_layer (?, 6400)
[TL] DenseLayer relu1: 200, relu
[TL] DenseLayer output_layer: 3, identity
param 0: (6400, 200) (mean: -0.000009 median: -0.000018 std: 0.017393)
param 1: (200,) (mean: 0.000000 median: 0.000000 std: 0.000000)
param 2: (200, 3) (mean: 0.002239 median: 0.003122 std: 0.096611)
param 3: (3,) (mean: 0.000000 median: 0.000000 std: 0.000000)
num of params: 1280803
layer 0: Tensor("Relu:0", shape=(?, 200), dtype=float32)
layer 1: Tensor("add_1:0", shape=(?, 3), dtype=float32)
episode 0: game 0 took 0.17381s, reward: -1.000000
episode 0: game 1 took 0.12629s, reward: 1.000000 !!!!!!!!
episode 0: game 2 took 0.17082s, reward: -1.000000
episode 0: game 3 took 0.08944s, reward: -1.000000
episode 0: game 4 took 0.09446s, reward: -1.000000
episode 0: game 5 took 0.09440s, reward: -1.000000
episode 0: game 6 took 0.32798s, reward: -1.000000
episode 0: game 7 took 0.74437s, reward: -1.000000
episode 0: game 8 took 0.43013s, reward: -1.000000
episode 0: game 9 took 0.42496s, reward: -1.000000
episode 0: game 10 took 0.37128s, reward: -1.000000
episode 0: game 11 took 0.08979s, reward: -1.000000
episode 0: game 12 took 0.09138s, reward: -1.000000
episode 0: game 13 took 0.09142s, reward: -1.000000
episode 0: game 14 took 0.09639s, reward: -1.000000
episode 0: game 15 took 0.09852s, reward: -1.000000
episode 0: game 16 took 0.09984s, reward: -1.000000
episode 0: game 17 took 0.09575s, reward: -1.000000
episode 0: game 18 took 0.09416s, reward: -1.000000
episode 0: game 19 took 0.08674s, reward: -1.000000
episode 0: game 20 took 0.09628s, reward: -1.000000
resetting env. episode reward total was -20.000000. running mean: -20.000000
episode 1: game 0 took 0.09910s, reward: -1.000000
episode 1: game 1 took 0.17056s, reward: -1.000000
episode 1: game 2 took 0.09306s, reward: -1.000000
episode 1: game 3 took 0.09556s, reward: -1.000000
episode 1: game 4 took 0.12520s, reward: 1.000000 !!!!!!!!
episode 1: game 5 took 0.17348s, reward: -1.000000
episode 1: game 6 took 0.09415s, reward: -1.000000
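The "num of params" line in the log can be checked by hand from the layer shapes it prints. The following is a small sketch of that arithmetic; the helper name count_dense_params is made up for illustration and is not part of TensorLayer.

```python
# Layer sizes read from the log above: 6400 inputs -> 200 hidden units -> 3 outputs.
layer_sizes = [6400, 200, 3]


def count_dense_params(sizes):
    """Count weights and biases of a stack of fully connected layers.

    Hypothetical helper for illustration only.
    """
    total = 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        total += n_in * n_out  # weight matrix of shape (n_in, n_out)
        total += n_out         # bias vector of shape (n_out,)
    return total


num_params = count_dense_params(layer_sizes)
print(num_params)  # 6400*200 + 200 + 200*3 + 3 = 1280803, matching the log
```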
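In this example a neural network learns to play Pong like a human, directly from the game frames. The network plays again and again against the game's built-in scripted opponent and eventually learns to beat it. After 15,000 episodes of training, the network wins about 20% of the games. After 20,000 episodes it wins about 35%. The network learns faster and faster, because it has more and more winning data to train on. After 30,000 episodes of training, the network no longer loses.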
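The input layer in the log has 6400 units, which corresponds to an 80x80 image. As a rough sketch of how a raw 210x160x3 Atari frame might be reduced to that size, the function below follows the preprocessing described in Karpathy's post. The crop rows and the background colour values 144 and 109 are taken from that post and are assumptions about what this tutorial script does.

```python
import numpy as np


def prepro(frame):
    """Turn a 210x160x3 uint8 Pong frame into a flat float vector of length 6400."""
    img = frame[35:195]                 # crop away the scoreboard and bottom border
    img = img[::2, ::2, 0].copy()       # downsample by 2, keep one colour channel -> 80x80
    img[img == 144] = 0                 # erase background (assumed value, type 1)
    img[img == 109] = 0                 # erase background (assumed value, type 2)
    img[img != 0] = 1                   # paddles and ball become 1
    return img.astype(np.float32).ravel()


# Synthetic frame for demonstration: black screen with a small bright patch.
frame = np.zeros((210, 160, 3), dtype=np.uint8)
frame[100:104, 80:82, 0] = 236
vec = prepro(frame)
print(vec.shape)
```

The `.copy()` keeps the function from modifying the frame that was passed in, since NumPy slicing returns a view.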

render = False
resume = False
To display the game while it runs, set render to True. When you run the code again, you can set resume to True, and the code will load the existing model and continue training from it.

The rest of this post walks through installing and running the demo.
pip install gym

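OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms.
Reinforcement learning is concerned with making good decisions, whereas supervised and unsupervised learning are mainly concerned with making predictions.
Reinforcement learning has two basic concepts: the environment (the outside world) and the agent (the algorithm you are writing). The agent sends actions to the environment, and the environment replies with observations and rewards (that is, a score).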
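OpenAI Gym consists of two parts:
1. The gym open-source library: a collection of test problems, called environments, that you can use to develop your own reinforcement learning algorithms. The environments share a common interface, which lets you write general algorithms.
2. The OpenAI Gym service: a site and API that lets users compare the performance of the algorithms they have trained.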
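The agent-environment loop described above can be sketched without installing Gym at all. The toy environment below, CoinFlipEnv, is entirely hypothetical. It only imitates the reset()/step() shape of the classic Gym interface so that the exchange of actions, observations and rewards is visible.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: the agent is rewarded for matching the coin it observes."""

    def __init__(self, episode_length=10, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0
        self.coin = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.t = 0
        self.coin = self.rng.randint(0, 1)
        return self.coin

    def step(self, action):
        """Apply the action and return (observation, reward, done, info)."""
        reward = 1.0 if action == self.coin else -1.0
        self.t += 1
        self.coin = self.rng.randint(0, 1)  # next observation
        done = self.t >= self.episode_length
        return self.coin, reward, done, {}


def run_episode(env, policy):
    """Generic loop: works with any object exposing reset() and step() as above."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(observation)                         # agent -> environment
        observation, reward, done, info = env.step(action)  # environment -> agent
        total_reward += reward
    return total_reward


print(run_episode(CoinFlipEnv(), policy=lambda obs: obs))
```

Because run_episode only relies on reset() and step(), the same loop could drive any environment with that interface, which is the point of the shared interface.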

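Run a simple example: move the cart so that the pole does not fall over.
import gym
from gym.wrappers import Monitor

env = gym.make('CartPole-v0')
# Record episode statistics to a local directory; video recording is disabled.
# The raw string keeps the backslashes in the Windows path from being read as escapes.
env = Monitor(env, directory=r'D:\其他\技术文献\强化_深度学习\gym\cartpole-experiment-1', video_callable=False, write_upon_reset=True)
for i_episode in range(20):
    observation = env.reset()
    for t in range(100):
        env.render()
        print(observation)
        # Pick a random action from the environment's action space
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t + 1))
            break
env.close()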
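The loop above uses the classic Gym interface, where step() returns four values. Gym 0.26 and later, and its successor Gymnasium, return five values from step() (splitting done into terminated and truncated) and an (observation, info) pair from reset(). If you need the same loop to work on either version, one possible approach is a pair of small adapters. The function names below are made up for illustration and are not part of Gym.

```python
def unpack_step(result):
    """Normalise the return value of env.step() to (observation, reward, done, info).

    Hypothetical helper: accepts both the classic 4-tuple and the newer 5-tuple.
    """
    if len(result) == 5:
        observation, reward, terminated, truncated, info = result
        return observation, reward, terminated or truncated, info
    observation, reward, done, info = result
    return observation, reward, done, info


def unpack_reset(result):
    """Return only the observation from env.reset().

    Newer releases return (observation, info); older ones return the observation alone.
    """
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result[0]
    return result


# Demonstration with plain tuples standing in for real environment output.
print(unpack_step(([0.1, 0.2], 1.0, False, {})))
print(unpack_step(([0.1, 0.2], 1.0, False, True, {})))
```

In the loop, this would be used as `observation, reward, done, info = unpack_step(env.step(action))`.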

Now on to the TensorLayer Pong tutorial:
python tutorial_atari_pong.py

Error: No module named 'atari_py'

gym is installed, but the atari_py module is missing.
A search online turned up this command:
pip install --no-index -f https://github.com/Kojoley/atari-py/releases atari_py

C:\Users\23683>pip install --no-index -f https://github.com/Kojoley/atari-py/releases atari_py
Looking in links: https://github.com/Kojoley/atari-py/releases
Collecting atari_py
Downloading https://github.com/Kojoley/atari-py/releases/download/0.1.1/atari_py-0.1.1-cp36-cp36m-win_amd64.whl (666kB)
100% |████████████████████████████████| 675kB 133kB/s
Requirement already satisfied: numpy in c:\users\23683\anaconda3\lib\site-packages (from atari_py) (1.14.5)
Requirement already satisfied: six in c:\users\23683\anaconda3\lib\site-packages (from atari_py) (1.11.0)
Installing collected packages: atari-py
Successfully installed atari-py-0.1.1
The installation succeeded.
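Before rerunning the tutorial, one way to confirm that the required modules can be imported is a quick check with the standard library. This is only a convenience sketch. The helper name is_installed is made up, and the list of module names is an assumption about what the tutorial script imports.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


# Assumed dependencies of the tutorial script; adjust as needed.
for name in ("gym", "atari_py", "tensorlayer"):
    print(name, "OK" if is_installed(name) else "MISSING")
```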

Run python tutorial_atari_pong.py again.
Success!