## Reproduce PPO with PARL
Based on PARL, we reproduce the PPO algorithm of deep reinforcement learning and achieve the same level of performance as reported in the paper on classic Mujoco benchmarks.
It includes the following two approaches:
+ Clipped Surrogate Objective
+ Adaptive KL Penalty Coefficient

> PPO in
[Proximal Policy Optimization Algorithms](https://arxiv.org/abs/1707.06347)
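The two loss variants listed above can be sketched as follows. This is a minimal NumPy illustration of the formulas from the paper, not the PARL implementation itself; the function names and the default `epsilon`/`kl_target` values are illustrative assumptions.

```python
import numpy as np

def clipped_surrogate_loss(ratio, advantage, epsilon=0.2):
    # Clipped Surrogate Objective (Eq. 7 in the paper):
    # L^CLIP = E[ min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t) ]
    # ratio r_t = pi_new(a|s) / pi_old(a|s); advantage A_t is an estimate.
    surr1 = ratio * advantage
    surr2 = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # Negated so that minimizing this loss maximizes the objective.
    return -np.mean(np.minimum(surr1, surr2))

def adaptive_kl_penalty_coeff(observed_kl, beta, kl_target=0.01):
    # Adaptive KL Penalty Coefficient (Section 4 of the paper):
    # shrink or grow the penalty weight beta depending on how far the
    # observed KL divergence is from the target.
    if observed_kl < kl_target / 1.5:
        beta /= 2.0
    elif observed_kl > kl_target * 1.5:
        beta *= 2.0
    return beta
```

The `--loss_type` flag in the training command below selects between these two objectives (`CLIP` or `KLPEN`).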

### Mujoco games introduction
Please see [here](https://github.com/openai/mujoco-py) to learn more about Mujoco games.

### Benchmark result

<img src=".benchmark/PPO_HalfCheetah-v2.png" width = "400" height ="300" alt="PPO_HalfCheetah-v2" />  

## How to use
### Dependencies:
+ python3.5+
+ [paddlepaddle>=1.0.0](https://github.com/PaddlePaddle/Paddle)
+ [parl](https://github.com/PaddlePaddle/PARL)
+ gym
+ tqdm
+ mujoco-py>=1.50.1.0

### Start Training:
```
# To train an agent for HalfCheetah-v2 game (default: CLIP loss)
python train.py

# To train for a different game and loss type
# python train.py --env [ENV_NAME] --loss_type [CLIP|KLPEN]
```