update readme (#346)

fa93980e · rical730 · GitHub · fb43d292
8 changed files
--- a/examples/DDPG/README.md
+++ b/examples/DDPG/README.md
 ## Reproduce DDPG with PARL
 Based on PARL, the DDPG algorithm of deep reinforcement learning has been reproduced, reaching the same level of indicators as the paper in Atari benchmarks.
-> DDPG in
+> Paper: DDPG in [Continuous control with deep reinforcement learning](https://arxiv.org/abs/1509.02971)
-[Continuous control with deep reinforcement learning](https://arxiv.org/abs/1509.02971)
 ### Mujoco games introduction
 Please see [here](https://github.com/openai/mujoco-py) to know more about Mujoco games.

--- a/examples/DQN/README.md
+++ b/examples/DQN/README.md
 ## Reproduce DQN with PARL
 Based on PARL, we provide a simple demonstration of DQN.
-> DQN in
+ Paper: DQN in [Human-level Control Through Deep Reinforcement Learning](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html)
-[Human-level Control Through Deep Reinforcement Learning](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html)
 ### Result

--- a/examples/DQN_variant/README.md
+++ b/examples/DQN_variant/README.md
@@ -4,7 +4,9 @@ Based on PARL, the DQN algorithm of deep reinforcement learning has been reprodu
 + Papers: 
 > DQN in [Human-level Control Through Deep Reinforcement Learning](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html)
 > DDQN in [Deep Reinforcement Learning with Double Q-learning](https://arxiv.org/abs/1509.06461)
 > Dueling DQN in [Dueling Network Architectures for Deep Reinforcement Learning](https://arxiv.org/abs/1511.06581)
 ### Atari games introduction

--- a/examples/IMPALA/README.md
+++ b/examples/IMPALA/README.md
 ## Reproduce IMPALA with PARL
 Based on PARL, the IMPALA algorithm of deep reinforcement learning is reproduced, and the same level of indicators of the paper is reproduced in the classic Atari game.
-> IMPALA in
+> Paper: IMPALA in [Impala: Scalable distributed deep-rl with importance weighted actor-learner architectures](https://arxiv.org/abs/1802.01561)
-[Impala: Scalable distributed deep-rl with importance weighted actor-learner architectures](https://arxiv.org/abs/1802.01561)
 ### Atari games introduction
 Please see [here](https://gym.openai.com/envs/#atari) to know more about Atari games.

--- a/examples/MADDPG/README.md
+++ b/examples/MADDPG/README.md
 ## Reproduce MADDPG with PARL
 Based on PARL, the MADDPG algorithm of deep reinforcement learning has been reproduced.
-> MADDPG in 
+> Paper: MADDPG in [ Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments](https://arxiv.org/abs/1706.02275)
-[ Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments](https://arxiv.org/abs/1706.02275)
 ### Multi-agent particle environment introduction
 A simple multi-agent particle world based on gym. Please see [here](https://github.com/openai/multiagent-particle-envs) to install and know more about the environment.

--- a/examples/PPO/README.md
+++ b/examples/PPO/README.md
@@ -5,8 +5,7 @@ Include following approach:
 + Clipped Surrogate Objective
 + Adaptive KL Penalty Coefficient
-> PPO in
+> Paper: PPO in [Proximal Policy Optimization Algorithms](https://arxiv.org/abs/1707.06347)
-[Proximal Policy Optimization Algorithms](https://arxiv.org/abs/1707.06347)
 ### Mujoco games introduction
 Please see [here](https://github.com/openai/mujoco-py) to know more about Mujoco games.

--- a/examples/SAC/README.md
+++ b/examples/SAC/README.md
@@ -5,8 +5,7 @@ Include following approaches:
 + DDPG Style with Stochastic Policy
 + Maximum Entropy
-> SAC in
+> Paper: SAC in [Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor](https://arxiv.org/abs/1801.01290)
-[Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor](https://arxiv.org/abs/1801.01290)
 ### Mujoco games introduction
 Please see [here](https://github.com/openai/mujoco-py) to know more about Mujoco games.

--- a/examples/TD3/README.md
+++ b/examples/TD3/README.md
@@ -6,8 +6,7 @@ Include following approaches:
 + Target Networks and Delayed Policy Update
 + Target Policy Smoothing Regularization
-> TD3 in
+> Paper: TD3 in [Addressing Function Approximation Error in Actor-Critic Methods](https://arxiv.org/abs/1802.09477)
-[Addressing Function Approximation Error in Actor-Critic Methods](https://arxiv.org/abs/1802.09477)
 ### Mujoco games introduction
 Please see [here](https://github.com/openai/mujoco-py) to know more about Mujoco games.