Unverified commit 84fc0804, authored by Yibing Liu, committed by GitHub

Merge pull request #1011 from zenghsh3/develop

Add Chinese version of README and provide saved models
## Reproduce DQN, DoubleDQN, DuelingDQN model with Fluid version of PaddlePaddle
[中文版](README_cn.md)
Based on PaddlePaddle's next-generation API, Fluid, this repository reproduces the DQN model from deep reinforcement learning and matches the metrics reported in the original papers on classic Atari games. The model takes raw game frames as input and predicts the next action end to end. The repository contains the following three models; a brief sketch of how their learning targets differ is given after the list.
+ DQN in:
[Human-level Control Through Deep Reinforcement Learning](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html)
+ DoubleDQN in:
[Deep Reinforcement Learning with Double Q-Learning](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12389)
+ DuelingDQN in:
[Dueling Network Architectures for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/wangf16.html)
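The three variants differ mainly in how the training target is formed (and, for DuelingDQN, in how Q-values are assembled from two streams). The following NumPy sketch is only an illustration of those differences; it is not code from this repository, and all array names are made up:

```python
import numpy as np

def dqn_target(reward, done, next_q_target, gamma=0.99):
    # DQN: bootstrap from the max Q-value given by the target network.
    return reward + gamma * (1.0 - done) * np.max(next_q_target, axis=1)

def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    # DoubleDQN: the online network picks the action, the target network
    # evaluates it, which reduces over-estimation of Q-values.
    best_actions = np.argmax(next_q_online, axis=1)
    chosen_q = next_q_target[np.arange(len(best_actions)), best_actions]
    return reward + gamma * (1.0 - done) * chosen_q

def dueling_q(state_value, advantage):
    # DuelingDQN: combine a state-value stream (batch, 1) with an advantage
    # stream (batch, num_actions); subtracting the mean advantage keeps the
    # decomposition identifiable.
    return state_value + advantage - advantage.mean(axis=1, keepdims=True)
```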
## Atari benchmark & performance
### [Atari games introduction](https://gym.openai.com/envs/#atari)
### Pong game result
The average game reward obtained by each of the three models as training progresses is shown below (training takes roughly 3 hours per 1 million steps):
![DQN result](assets/dqn.png)
## How to use
### Dependencies:
+ python2.7
+ gym
+ tqdm
+ opencv-python
+ paddlepaddle-gpu>=0.12.0
+ ale_python_interface
### Install Dependencies:
+ Install PaddlePaddle:
It is recommended to compile and install PaddlePaddle from source.
+ Install other dependencies:
```
pip install -r requirement.txt
pip install gym[atari]
```
To install ale_python_interface, refer to: https://github.com/mgbellemare/Arcade-Learning-Environment
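As a quick sanity check that the Atari dependencies installed correctly, you can try creating a standard gym Atari environment (this snippet is only for verification; the training script itself loads the ROMs in `./rom_files` through ALE):

```python
import gym

# Any Atari environment will do; Pong-v0 ships with gym[atari].
env = gym.make('Pong-v0')
obs = env.reset()
print('raw frame shape:', obs.shape)  # (210, 160, 3) RGB frame
env.close()
```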
### Start Training:
```
# To train a model for Pong game with gpu (use DQN model as default)
python train.py --rom ./rom_files/pong.bin --use_cuda

# To train a model for Pong with DoubleDQN
python train.py --rom ./rom_files/pong.bin --use_cuda --alg DoubleDQN

# To train a model for Pong with DuelingDQN
python train.py --rom ./rom_files/pong.bin --use_cuda --alg DuelingDQN
```
To train on more games, download additional ROM files from [here](https://github.com/openai/atari-py/tree/master/atari_py/atari_roms).
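For example, assuming you have downloaded `breakout.bin` into `./rom_files` (the file name here is just an illustration), the same script can be reused:

```
# Train a model for Breakout with the default DQN algorithm
python train.py --rom ./rom_files/breakout.bin --use_cuda --alg DQN
```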
### Start Testing:
```
# Play the game with saved best model and calculate the average rewards
python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong
# Play the game with visualization
python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong --viz 0.01
```
[Here](https://pan.baidu.com/s/1gIsbNw5V7tMeb74ojx-TMA) are saved models for the Pong and Breakout games; you can use them to play directly.
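If you want to load a saved model directly rather than going through `play.py`, a rough sketch with the Fluid inference API could look like this (the model directory and the 1x4x84x84 state shape are assumptions inferred from the training setup, not verified against this repository):

```python
import numpy as np
import paddle.fluid as fluid

place = fluid.CUDAPlace(0)  # use fluid.CPUPlace() if no GPU is available
exe = fluid.Executor(place)

# Load the program saved earlier with fluid.io.save_inference_model
infer_program, feed_names, fetch_targets = fluid.io.load_inference_model(
    dirname='./saved_model/DQN-pong', executor=exe)

# Dummy preprocessed state: 4 stacked 84x84 grayscale frames (assumed shape)
state = np.zeros((1, 4, 84, 84), dtype='float32')
q_values = exe.run(infer_program,
                   feed={feed_names[0]: state},
                   fetch_list=fetch_targets)[0]
print('greedy action:', int(np.argmax(q_values)))
```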
## Reproducing the DQN, DoubleDQN, and DuelingDQN models with the Fluid version of PaddlePaddle
Based on PaddlePaddle's next-generation API, Fluid, this repository reproduces the DQN model from deep reinforcement learning and matches the metrics reported in the papers on classic Atari games. The model takes game frames as input and predicts the control signal for the next step end to end. The repository contains the following three models.
+ DQN model:
[Human-level Control Through Deep Reinforcement Learning](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html)
+ DoubleDQN model:
[Deep Reinforcement Learning with Double Q-Learning](https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/viewPaper/12389)
+ DuelingDQN model:
[Dueling Network Architectures for Deep Reinforcement Learning](http://proceedings.mlr.press/v48/wangf16.html)
## Model performance on Atari games
### [Introduction to Atari games](https://gym.openai.com/envs/#atari)
### Pong training results
The average game reward obtained by the three models as training progresses is shown below (roughly 3 hours per 1 million steps):
![DQN result](assets/dqn.png)
## Usage
### Dependencies:
+ python2.7
+ gym
+ tqdm
+ opencv-python
+ paddlepaddle-gpu>=0.12.0
+ ale_python_interface
### Install dependencies:
+ Install PaddlePaddle:
It is recommended to compile and install PaddlePaddle from source.
+ Install other dependencies:
```
pip install -r requirement.txt
pip install gym[atari]
```
To install ale_python_interface, refer to: https://github.com/mgbellemare/Arcade-Learning-Environment
### Training:
```
# Train on the Pong game with a GPU (uses the DQN model by default)
python train.py --rom ./rom_files/pong.bin --use_cuda
# Train the DoubleDQN model
python train.py --rom ./rom_files/pong.bin --use_cuda --alg DoubleDQN
# Train the DuelingDQN model
python train.py --rom ./rom_files/pong.bin --use_cuda --alg DuelingDQN
```
To train on more games, game ROM files can be downloaded from [here](https://github.com/openai/atari-py/tree/master/atari_py/atari_roms).
### Testing:
```
# Play the game with the best model saved during training, and compute the average reward
python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong
# Play the game with visualization
python play.py --rom ./rom_files/pong.bin --use_cuda --model_path ./saved_model/DQN-pong --viz 0.01
```
[Here](https://pan.baidu.com/s/1gIsbNw5V7tMeb74ojx-TMA) are trained models for the Pong and Breakout games, which can be used directly for testing.
@@ -11,7 +11,7 @@ from tqdm import tqdm
 def predict_action(exe, state, predict_program, feed_names, fetch_targets,
                    action_dim):
-    if np.random.randint(100) == 0:
+    if np.random.random() < 0.01:
         act = np.random.randint(action_dim)
     else:
         state = np.expand_dims(state, axis=0)
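Both the old and the new condition in this hunk trigger a random action roughly 1% of the time; the rewrite only expresses the exploration rate as an explicit probability instead of an integer comparison. A standalone illustration (not repository code):

```python
import numpy as np

trials = 100000
# Old form: randint(100) == 0 fires with probability exactly 1/100.
old_hits = np.sum(np.random.randint(0, 100, size=trials) == 0)
# New form: random() < 0.01 fires with the same 1% probability,
# expressed as a tunable float.
new_hits = np.sum(np.random.random(size=trials) < 0.01)
print(old_hits / float(trials), new_hits / float(trials))  # both about 0.01
```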
......
numpy
gym
tqdm
opencv-python
paddlepaddle-gpu==0.12.0
@@ -120,6 +120,9 @@ def train_agent():
     pbar = tqdm(total=1e8)
     recent_100_reward = []
     total_step = 0
+    max_reward = None
+    save_path = os.path.join(args.model_dirname, '{}-{}'.format(
+        args.alg, os.path.basename(args.rom).split('.')[0]))
     while True:
         # start epoch
         total_reward, step = run_train_episode(agent, env, exp)
@@ -134,14 +137,11 @@ def train_agent():
             print("eval_agent done, (steps, eval_reward): ({}, {})".format(
                 total_step, eval_reward))
-            if total_step // args.save_every_steps == save_flag:
-                save_flag += 1
-                save_path = os.path.join(args.model_dirname, '{}-{}'.format(
-                    args.alg, os.path.basename(args.rom).split('.')[0]),
-                    'step{}'.format(total_step))
-                fluid.io.save_inference_model(save_path, ['state'],
-                                              agent.pred_value, agent.exe,
-                                              agent.predict_program)
+            if max_reward is None or eval_reward > max_reward:
+                max_reward = eval_reward
+                fluid.io.save_inference_model(save_path, ['state'],
+                                              agent.pred_value, agent.exe,
+                                              agent.predict_program)
     pbar.close()
@@ -173,11 +173,6 @@ if __name__ == '__main__':
         type=str,
         default='saved_model',
         help='dirname to save model')
-    parser.add_argument(
-        '--save_every_steps',
-        type=int,
-        default=100000,
-        help='every steps number to save model')
     parser.add_argument(
         '--test_every_steps',
         type=int,
......