From fa93980e8306c9984f840e5e9c253b8ebe29f499 Mon Sep 17 00:00:00 2001
From: rical730
Date: Tue, 14 Jul 2020 10:19:28 +0800
Subject: [PATCH] update readme (#346)

---
 examples/DDPG/README.md        | 3 +--
 examples/DQN/README.md         | 3 +--
 examples/DQN_variant/README.md | 2 ++
 examples/IMPALA/README.md      | 3 +--
 examples/MADDPG/README.md      | 3 +--
 examples/PPO/README.md         | 3 +--
 examples/SAC/README.md         | 3 +--
 examples/TD3/README.md         | 3 +--
 8 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/examples/DDPG/README.md b/examples/DDPG/README.md
index 1fb5496..a69a49f 100644
--- a/examples/DDPG/README.md
+++ b/examples/DDPG/README.md
@@ -1,8 +1,7 @@
 ## Reproduce DDPG with PARL
 Based on PARL, the DDPG algorithm of deep reinforcement learning has been reproduced, reaching the same level of indicators as the paper in Atari benchmarks.
-> DDPG in
-[Continuous control with deep reinforcement learning](https://arxiv.org/abs/1509.02971)
+> Paper: DDPG in [Continuous control with deep reinforcement learning](https://arxiv.org/abs/1509.02971)
 
 ### Mujoco games introduction
 Please see [here](https://github.com/openai/mujoco-py) to know more about Mujoco games.
 
diff --git a/examples/DQN/README.md b/examples/DQN/README.md
index 14646f5..59fe688 100644
--- a/examples/DQN/README.md
+++ b/examples/DQN/README.md
@@ -1,8 +1,7 @@
 ## Reproduce DQN with PARL
 Based on PARL, we provide a simple demonstration of DQN.
 
-> DQN in
-[Human-level Control Through Deep Reinforcement Learning](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html)
++ Paper: DQN in [Human-level Control Through Deep Reinforcement Learning](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html)
 
 ### Result
 
diff --git a/examples/DQN_variant/README.md b/examples/DQN_variant/README.md
index 6de3a24..010334d 100644
--- a/examples/DQN_variant/README.md
+++ b/examples/DQN_variant/README.md
@@ -4,7 +4,9 @@ Based on PARL, the DQN algorithm of deep reinforcement learning has been reprodu
 + Papers:
 > DQN in [Human-level Control Through Deep Reinforcement Learning](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html)
+
 > DDQN in [Deep Reinforcement Learning with Double Q-learning](https://arxiv.org/abs/1509.06461)
+
 > Dueling DQN in [Dueling Network Architectures for Deep Reinforcement Learning](https://arxiv.org/abs/1511.06581)
 
 ### Atari games introduction
diff --git a/examples/IMPALA/README.md b/examples/IMPALA/README.md
index cec4eaa..e97069a 100755
--- a/examples/IMPALA/README.md
+++ b/examples/IMPALA/README.md
@@ -1,8 +1,7 @@
 ## Reproduce IMPALA with PARL
 Based on PARL, the IMPALA algorithm of deep reinforcement learning is reproduced, and the same level of indicators of the paper is reproduced in the classic Atari game.
-> IMPALA in
-[Impala: Scalable distributed deep-rl with importance weighted actor-learner architectures](https://arxiv.org/abs/1802.01561)
+> Paper: IMPALA in [Impala: Scalable distributed deep-rl with importance weighted actor-learner architectures](https://arxiv.org/abs/1802.01561)
 
 ### Atari games introduction
 Please see [here](https://gym.openai.com/envs/#atari) to know more about Atari games.
 
diff --git a/examples/MADDPG/README.md b/examples/MADDPG/README.md
index b20d459..b7a0535 100644
--- a/examples/MADDPG/README.md
+++ b/examples/MADDPG/README.md
@@ -1,8 +1,7 @@
 ## Reproduce MADDPG with PARL
 Based on PARL, the MADDPG algorithm of deep reinforcement learning has been reproduced.
-> MADDPG in
-[ Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments](https://arxiv.org/abs/1706.02275)
+> Paper: MADDPG in [ Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments](https://arxiv.org/abs/1706.02275)
 
 ### Multi-agent particle environment introduction
 A simple multi-agent particle world based on gym. Please see [here](https://github.com/openai/multiagent-particle-envs) to install and know more about the environment.
diff --git a/examples/PPO/README.md b/examples/PPO/README.md
index 9a87c11..35cc060 100644
--- a/examples/PPO/README.md
+++ b/examples/PPO/README.md
@@ -5,8 +5,7 @@ Include following approach:
 + Clipped Surrogate Objective
 + Adaptive KL Penalty Coefficient
 
-> PPO in
-[Proximal Policy Optimization Algorithms](https://arxiv.org/abs/1707.06347)
+> Paper: PPO in [Proximal Policy Optimization Algorithms](https://arxiv.org/abs/1707.06347)
 
 ### Mujoco games introduction
 Please see [here](https://github.com/openai/mujoco-py) to know more about Mujoco games.
diff --git a/examples/SAC/README.md b/examples/SAC/README.md
index 05854e5..fc7c1cd 100644
--- a/examples/SAC/README.md
+++ b/examples/SAC/README.md
@@ -5,8 +5,7 @@ Include following approaches:
 + DDPG Style with Stochastic Policy
 + Maximum Entropy
 
-> SAC in
-[Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor](https://arxiv.org/abs/1801.01290)
+> Paper: SAC in [Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor](https://arxiv.org/abs/1801.01290)
 
 ### Mujoco games introduction
 Please see [here](https://github.com/openai/mujoco-py) to know more about Mujoco games.
diff --git a/examples/TD3/README.md b/examples/TD3/README.md
index a1aee0c..4916266 100644
--- a/examples/TD3/README.md
+++ b/examples/TD3/README.md
@@ -6,8 +6,7 @@ Include following approaches:
 + Target Networks and Delayed Policy Update
 + Target Policy Smoothing Regularization
 
-> TD3 in
-[Addressing Function Approximation Error in Actor-Critic Methods](https://arxiv.org/abs/1802.09477)
+> Paper: TD3 in [Addressing Function Approximation Error in Actor-Critic Methods](https://arxiv.org/abs/1802.09477)
 
 ### Mujoco games introduction
 Please see [here](https://github.com/openai/mujoco-py) to know more about Mujoco games.
--
GitLab
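The DQN_variant README above links the DQN and Double DQN (DDQN) papers. As a minimal sketch of the distinction those two papers draw (not PARL's implementation — all names and toy values here are illustrative assumptions), the two bootstrap targets can be contrasted in a few lines of numpy:

```python
# Hedged sketch: DQN vs Double DQN (DDQN) bootstrap targets, per the papers
# linked in examples/DQN_variant/README.md. Not PARL code; names are illustrative.
import numpy as np

def dqn_target(reward, q_next_target, gamma=0.99, done=False):
    """DQN: the target network both selects and evaluates the next action."""
    return reward + (0.0 if done else gamma * float(np.max(q_next_target)))

def ddqn_target(reward, q_next_online, q_next_target, gamma=0.99, done=False):
    """DDQN: the online network selects the action; the target network evaluates it."""
    a = int(np.argmax(q_next_online))  # action chosen by the online network
    return reward + (0.0 if done else gamma * float(q_next_target[a]))

# Toy next-state Q-values over 3 actions:
q_online = np.array([1.0, 3.0, 2.0])  # online net prefers action 1
q_target = np.array([2.5, 0.5, 1.5])  # target net's own evaluations

print(dqn_target(1.0, q_target))                 # 1 + 0.99 * max(q_target) = 3.475
print(ddqn_target(1.0, q_online, q_target))      # 1 + 0.99 * q_target[1]   = 1.495
```

Decoupling action selection from evaluation is what reduces DQN's max-operator overestimation bias, which is the motivation for the DDQN paper referenced above.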