## Reproduce PPO with PARL
Based on PARL, the PPO algorithm of deep reinforcement learning is reproduced, reaching the same level of performance as reported in the paper on classic Mujoco benchmarks.

Includes the following approaches (a reference sketch of both objectives appears at the end of this document):
+ Clipped Surrogate Objective
+ Adaptive KL Penalty Coefficient

> PPO in
[Proximal Policy Optimization Algorithms](https://arxiv.org/abs/1707.06347)

### Mujoco games introduction
Please see [here](https://github.com/openai/mujoco-py) to learn more about Mujoco games.

### Benchmark result
- HalfCheetah-v2

<img src=".benchmark/PPO_HalfCheetah-v2.png"/>

## How to use
### Dependencies:
+ python2.7 or python3.5+
+ [paddlepaddle>=1.0.0](https://github.com/PaddlePaddle/Paddle)
+ gym
+ tqdm
+ mujoco-py>=1.50.1.0

### Start Training:
```
# To train an agent for the HalfCheetah-v2 game (default: CLIP loss)
python train.py

# To train for a different game or a different loss type
# python train.py --env [ENV_NAME] --loss_type [CLIP|KLPEN]
```
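
### Loss functions (reference sketch)
For orientation, here is a minimal NumPy sketch of the two objectives selected by `--loss_type`. This is an illustration of the formulas from the PPO paper, not this repository's implementation; the function names and the defaults `epsilon=0.2` and `kl_target=0.01` are assumptions taken from the paper rather than from this repo's code.

```python
import numpy as np

def clipped_surrogate_loss(ratio, advantage, epsilon=0.2):
    """CLIP loss: minimum of the unclipped and clipped surrogate
    objectives, negated so it can be minimized by gradient descent."""
    clipped_ratio = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon)
    return -np.mean(np.minimum(ratio * advantage, clipped_ratio * advantage))

def kl_penalty_loss(ratio, advantage, kl, beta):
    """KLPEN loss: surrogate objective minus a KL penalty weighted by beta."""
    return -np.mean(ratio * advantage - beta * kl)

def adapt_beta(kl, beta, kl_target=0.01):
    """Adaptive KL penalty coefficient update from the PPO paper:
    shrink beta when the measured KL is well below target, grow it
    when the KL overshoots."""
    if kl < kl_target / 1.5:
        beta /= 2.0
    elif kl > kl_target * 1.5:
        beta *= 2.0
    return beta

if __name__ == "__main__":
    ratio = np.array([0.9, 1.3, 1.05])   # pi_new / pi_old per sample (illustrative)
    adv = np.array([1.0, -0.5, 2.0])     # advantage estimates (illustrative)
    print(clipped_surrogate_loss(ratio, adv))
```

With the KLPEN variant, `beta` is typically updated once per training iteration from the measured KL divergence between the old and new policies, so the penalty strength tracks the target KL automatically.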