README.md 2.0 KB
Newer Older
H
Hongsheng Zeng 已提交
1 2 3 4 5 6 7 8 9 10 11 12 13
## Reproduce GA3C with PARL
Based on PARL, the GA3C algorithm of deep reinforcement learning has been reproduced, reaching the same level of indicators as the paper in Atari benchmarks.

Original paper: [GA3C: GPU-based A3C for Deep Reinforcement Learning](https://www.researchgate.net/profile/Iuri_Frosio2/publication/310610848_GA3C_GPU-based_A3C_for_Deep_Reinforcement_Learning/links/583c6c0b08ae502a85e3dbb9/GA3C-GPU-based-A3C-for-Deep-Reinforcement-Learning.pdf)

A hybrid CPU/GPU version of the [Asynchronous Advantage Actor-Critic (A3C)](https://arxiv.org/abs/1602.01783) algorithm.

### Atari games introduction
Please see [here](https://gym.openai.com/envs/#atari) to know more about Atari games.

### Benchmark result
Results with one learner (in a P40 GPU) and 24 simulators (in 12 CPU) in 10 million sample steps.
<img src=".benchmark/GA3C_Pong.jpg" width = "400" height ="300" alt="GA3C_Pong" /> <img src=".benchmark/GA3C_Breakout.jpg" width = "400" height ="300" alt="GA3C_Breakout"/>
H
Hongsheng Zeng 已提交
14 15
<img src=".benchmark/GA3C_BeamRider.jpg" width = "400" height ="300" alt="GA3C_BeamRider" /> <img src=".benchmark/GA3C_Qbert.jpg" width = "400" height ="300" alt="GA3C_Qbert"/>
<img src=".benchmark/GA3C_SpaceInvaders.jpg" width = "400" height ="300" alt="GA3C_SpaceInvaders" />
H
Hongsheng Zeng 已提交
16 17 18 19 20 21 22

## How to use
### Dependencies
+ python2.7 or python3.5+
+ [paddlepaddle>=1.3.0](https://github.com/PaddlePaddle/Paddle)
+ [parl](https://github.com/PaddlePaddle/PARL)
+ gym
B
Bo Zhou 已提交
23
+ atari-py
H
Hongsheng Zeng 已提交
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47


### Distributed Training

#### Learner
```sh
python train.py 
```

#### Simulators (Suggest: 24 simulators in 12+ CPUs)
```sh
for i in $(seq 1 24); do
    python simulator.py &
done;
wait
```

You can change training settings (e.g. `env_name`, `server_ip`) in `ga3c_config.py`.
Training result will be saved in `log_dir/train/result.csv`.

[Tips] The performance can be influenced dramatically in a slower computational environment, especially when training with low-speed CPUs. It may be caused by the policy-lag problem.

### Reference
+ [tensorpack](https://github.com/tensorpack/tensorpack)