From 12a727cd25f217c190e6abd4ec079b09017ba7a1 Mon Sep 17 00:00:00 2001 From: niuyazhe Date: Mon, 6 Sep 2021 13:43:37 +0800 Subject: [PATCH] style(nyz): add algorithm list in README --- README.md | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 53 insertions(+) diff --git a/README.md b/README.md index d73ae9f..7bfd838 100644 --- a/README.md +++ b/README.md @@ -85,7 +85,60 @@ The detailed documentation are hosted on [doc](https://opendilab.github.io/DI-en ding -m serial -e cartpole -p dqn -s 0 ``` +## Feature + +### Algorithm Versatility + +| No | Algorithm | Label | Implementation | Runnable Demo | +| :--: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | +| 1 | [DQN](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | [policy/dqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/dqn.py) | python3 -u cartpole_dqn_main.py / ding -m serial -c cartpole_dqn_config.py -s 0 | +| 2 | [C51](https://arxiv.org/pdf/1707.06887.pdf) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | [policy/c51](https://github.com/opendilab/DI-engine/blob/main/ding/policy/c51.py) | ding -m serial -c cartpole_c51_config.py -s 0 | +| 3 | [QRDQN](https://arxiv.org/pdf/1806.06923v1.pdf) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | [policy/qrdqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qrdqn.py) | ding -m serial -c cartpole_qrdqn_config.py -s 0 | +| 4 | [IQN](https://arxiv.org/pdf/1806.06923.pdf) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | [policy/iqn](https://github.com/opendilab/DI-engine/blob/main/ding/policy/iqn.py) | ding -m serial -c cartpole_iqn_config.py -s 0 | +| 5 | [Rainbow](https://arxiv.org/abs/1710.02298) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | [policy/rainbow](https://github.com/opendilab/DI-engine/blob/main/ding/policy/rainbow.py) | ding -m serial -c cartpole_rainbow_config.py -s 0 | +| 6 | [SQL]([A2C](https://arxiv.org/abs/1602.01783)[PPO](https://arxiv.org/abs/1707.06347)[PPG](https://arxiv.org/pdf/2009.04416.pdf)) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen)![continuous](https://img.shields.io/badge/-continous-green) | [policy/sql](https://github.com/opendilab/DI-engine/blob/main/ding/policy/sql.py) | ding -m serial -c cartpole_sql_config.py -s 0 | +| 7 | [R2D2](https://openreview.net/forum?id=r1lyTjAqYX) | ![dist](https://img.shields.io/badge/-distributed-blue)![discrete](https://img.shields.io/badge/-discrete-brightgreen) | [policy/r2d2](https://github.com/opendilab/DI-engine/blob/main/ding/policy/r2d2.py) | ding -m serial -c cartpole_r2d2_config.py -s 0 | +| 8 | [A2C](https://arxiv.org/abs/1602.01783) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | [policy/a2c](https://github.com/opendilab/DI-engine/blob/main/ding/policy/a2c.py) | ding -m serial -c cartpole_a2c_config.py -s 0 | +| 9 | [PPO](https://arxiv.org/abs/1707.06347) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen)![continuous](https://img.shields.io/badge/-continous-green) | [policy/ppo](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppo.py) | python3 -u cartpole_ppo_main.py / ding -m serial_onpolicy -c cartpole_ppo_config.py -s 0 | +| 10 | [PPG](https://arxiv.org/pdf/2009.04416.pdf) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen) | [policy/ppg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ppg.py) | python3 -u cartpole_ppg_main.py | +| 11 | [ACER](https://arxiv.org/pdf/1611.01224.pdf) | ![discrete](https://img.shields.io/badge/-discrete-brightgreen)![continuous](https://img.shields.io/badge/-continous-green) | [policy/acer](https://github.com/opendilab/DI-engine/blob/main/ding/policy/acer.py) | ding -m serial -c cartpole_acer_config.py -s 0 | +| 12 | [IMPALA](https://arxiv.org/abs/1802.01561) | ![dist](https://img.shields.io/badge/-distributed-blue)![discrete](https://img.shields.io/badge/-discrete-brightgreen) | [policy/impala](https://github.com/opendilab/DI-engine/blob/main/ding/policy/impala.py) | ding -m serial -c cartpole_impala_config.py -s 0 | +| 13 | [DDPG](![dist](https://img.shields.io/badge/-distributed-blue)) | ![continuous](https://img.shields.io/badge/-continous-green) | [policy/ddpg](https://github.com/opendilab/DI-engine/blob/main/ding/policy/ddpg.py) | ding -m serial -c pendulum_ddpg_config.py -s 0 | +| 14 | [TD3](https://arxiv.org/pdf/1802.09477.pdf) | ![continuous](https://img.shields.io/badge/-continous-green) | [policy/td3](https://github.com/opendilab/DI-engine/blob/main/ding/policy/td3.py) | python3 -u pendulum_td3_main.py / ding -m serial -c pendulum_td3_config.py -s 0 | +| 15 | [SAC](https://arxiv.org/abs/1801.01290) | ![continuous](https://img.shields.io/badge/-continous-green) | [policy/sac](https://github.com/opendilab/DI-engine/blob/main/ding/policy/sac.py) | ding -m serial -c pendulum_sac_config.py -s 0 | +| 16 | [QMIX](https://arxiv.org/abs/1801.01290) | ![MARL](https://img.shields.io/badge/-MARL-yellow) | [policy/qmix](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qmix.py) | ding -m serial -c smac_3s5z_qmix_config.py -s 0 | +| 17 | [COMA](https://arxiv.org/pdf/1705.08926.pdf) | ![MARL](https://img.shields.io/badge/-MARL-yellow) | [policy/coma](https://github.com/opendilab/DI-engine/blob/main/ding/policy/coma.py) | ding -m serial -c smac_3s5z_coma_config.py -s 0 | +| 18 | [QTran](https://arxiv.org/abs/1905.05408) | ![MARL](https://img.shields.io/badge/-MARL-yellow) | [policy/qtran](https://github.com/opendilab/DI-engine/blob/main/ding/policy/qtran.py) | ding -m serial -c smac_3s5z_qtran_config.py -s 0 | +| 19 | [WQMIX](https://arxiv.org/abs/2006.10800) | ![MARL](https://img.shields.io/badge/-MARL-yellow) | [policy/wqmix](https://github.com/opendilab/DI-engine/blob/main/ding/policy/wqmix.py) | ding -m serial -c smac_3s5z_wqmix_config.py -s 0 | +| 20 | [CollaQ](https://arxiv.org/pdf/2010.08531.pdf) | ![MARL](https://img.shields.io/badge/-MARL-yellow) | [policy/collaq](https://github.com/opendilab/DI-engine/blob/main/ding/policy/collaq.py) | ding -m serial -c smac_3s5z_collaq_config.py -s 0 | +| 21 | [GAIL](https://arxiv.org/abs/2006.10800) | ![IL](https://img.shields.io/badge/-IL-purple) | [reward_model/gail](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/gail_irl_model.py) | ding -m serial_reward_model -c cartpole_dqn_config.py -s 0 | +| 22 | [SQIL](https://arxiv.org/pdf/1905.11108.pdf) | ![IL](https://img.shields.io/badge/-IL-purple) | [entry/sqil](https://github.com/opendilab/DI-engine/blob/main/ding/entry/serial_entry_sqil.py) | ding -m serial_sqil -c cartpole_sql_config.py -s 0 | +| 23 | [HER](https://arxiv.org/abs/2006.10800) | ![exp](https://img.shields.io/badge/-exploration-orange) | [reward_model/her](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/her_reward_model.py) | python3 -u bitflip_her_dqn.py | +| 24 | [RND](https://arxiv.org/abs/1810.12894) | ![exp](https://img.shields.io/badge/-exploration-orange) | [reward_model/rnd](https://github.com/opendilab/DI-engine/blob/main/ding/reward_model/rnd_reward_model.py) | python3 -u cartpole_ppo_rnd_main.py | +| 25 | [CQL](https://arxiv.org/pdf/2006.04779.pdf) | ![offline](https://img.shields.io/badge/-offlineRL-darkblue) | [policy/cql](https://github.com/opendilab/DI-engine/blob/main/ding/policy/cql.py) | ding -m serial_offline -c cartpole_cql_config.py -s 0 | +| 26 | [PER](https://arxiv.org/pdf/1511.05952.pdf) | ![other](https://img.shields.io/badge/-other-lightgrey) | [worker/replay_buffer](https://github.com/opendilab/DI-engine/blob/main/ding/worker/replay_buffer/advanced_buffer.py) | `rainbow demo` | +| 27 | [GAE](https://arxiv.org/pdf/1506.02438.pdf) | ![other](https://img.shields.io/badge/-other-lightgrey) | [rl_utils/gae](https://github.com/opendilab/DI-engine/blob/main/ding/rl_utils/gae.py) | `ppo demo` | + +![discrete](https://img.shields.io/badge/-discrete-brightgreen) means discrete action space, which is only label in normal DQL algorithms(1-15) + +![continuous](https://img.shields.io/badge/-continous-green) means continuous action space, which is only label in normal DQL algorithms(1-15) + +![dist](https://img.shields.io/badge/-distributed-blue) means distributed training (collector-learner parallel) RL algorithm + +![MARL](https://img.shields.io/badge/-MARL-yellow)means multi-agent RL algorithm + +![exp](https://img.shields.io/badge/-exploration-orange) means RL algorithm which is related to exploration and sparse reward + +![IL](https://img.shields.io/badge/-IL-purple) Means Imitation Learning, including Behaviour Cloning, Inverse RL, Adversarial Structured IL + +![offline](https://img.shields.io/badge/-offlineRL-darkblue) means offline RL algorithm + +![other](https://img.shields.io/badge/-other-lightgrey) means other sub-direction algorithm, usually as plugin-in in the whole pipeline + +P.S: The `.py` file in `Runnable Demo` can be found in `dizoo` + ## Contributing + We appreciate all contributions to improve DI-engine, both algorithms and system designs. Please refer to CONTRIBUTING.md for more guides. And our roadmap can be accessed by [this link](https://github.com/opendilab/DI-engine/projects/1). And users can join our [slack communication channel](https://join.slack.com/t/opendilab/shared_invite/zt-v9tmv4fp-nUBAQEH1_Kuyu_q4plBssQ) for more detailed discussion. -- GitLab