## Reproduce SAC with PARL
This example reproduces the Soft Actor-Critic (SAC) deep reinforcement learning algorithm with PARL and reaches performance comparable to the results reported in the paper on the Mujoco benchmarks.

It includes the following approaches:
+ DDPG Style with Stochastic Policy
+ Maximum Entropy
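
The two ideas above can be sketched in plain Python. This is a minimal illustration and not PARL's implementation: the helper names `sample_squashed_gaussian` and `soft_value`, the temperature `alpha`, and the scalar (one-dimensional) action are assumptions made for this example. The stochastic policy samples an action from a tanh-squashed Gaussian, and the maximum-entropy term subtracts `alpha * log pi(a|s)` from the Q-value.

```python
import math
import random


def sample_squashed_gaussian(mean, log_std, rng):
    """Sample a = tanh(u) with u ~ N(mean, std^2); return (a, log pi(a))."""
    std = math.exp(log_std)
    eps = rng.gauss(0.0, 1.0)  # reparameterization noise
    u = mean + std * eps       # Gaussian sample before squashing
    a = math.tanh(u)           # squash the action into (-1, 1)
    # Log-density of u under N(mean, std^2); note (u - mean) / std == eps.
    log_prob_u = -0.5 * eps * eps - log_std - 0.5 * math.log(2.0 * math.pi)
    # Change of variables for tanh; 1e-6 is a numerical guard (assumption).
    log_prob_a = log_prob_u - math.log(1.0 - a * a + 1e-6)
    return a, log_prob_a


def soft_value(q_value, log_prob, alpha=1.0):
    """Single-sample estimate of V(s) = Q(s, a) - alpha * log pi(a|s)."""
    return q_value - alpha * log_prob


rng = random.Random(0)
action, log_prob = sample_squashed_gaussian(mean=0.0, log_std=-1.0, rng=rng)
value = soft_value(q_value=1.0, log_prob=log_prob, alpha=0.2)
print(action, log_prob, value)
```

A lower `log_prob` (a less likely, more exploratory action) raises the soft value, which is how the entropy bonus encourages exploration.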

> Paper: SAC in [Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor](https://arxiv.org/abs/1801.01290)

### Mujoco games introduction
See [mujoco-py](https://github.com/openai/mujoco-py) to learn more about the Mujoco games.

### Benchmark result

<img src=".benchmark/merge.png" width = "1500" height ="260" alt="Performance" />

## How to use
### Dependencies:
+ python3.5+
+ [paddlepaddle>=1.6.1](https://github.com/PaddlePaddle/Paddle)
+ [parl](https://github.com/PaddlePaddle/PARL)
+ gym
+ mujoco-py>=1.50.1.0
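
Before training, it can help to confirm that the packages listed above are importable. The sketch below is a hypothetical helper, not part of this repository; it assumes the import names `paddle`, `parl`, `gym`, and `mujoco_py`, and it does not check version numbers.

```python
import importlib.util

# Import names assumed for the packages listed above.
REQUIRED_MODULES = ["paddle", "parl", "gym", "mujoco_py"]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing dependencies:", ", ".join(missing))
else:
    print("All listed dependencies are importable.")
```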

### Start Training:
```
# To train an agent for HalfCheetah-v2 game
python train.py

# To train for different games
# python train.py --env [ENV_NAME]