1. 30 Dec, 2021 · 1 commit
  2. 21 Dec, 2021 · 2 commits
  3. 15 Dec, 2021 · 1 commit
  4. 14 Dec, 2021 · 1 commit
  5. 09 Dec, 2021 · 1 commit
  6. 08 Dec, 2021 · 1 commit
    • feature(nyp): add Trex algorithm (#119) · 63105fef
      Committed by Will-Nie
      * add trex algorithm for pong
      
      * sort style
      
      * add atari, ll,cp; fix device, collision; add_ppo
      
      * add accuracy evaluation
      
      * correct style
      
      * add seed to make sure results are replicable
      
      * remove useless part in cum return of model part
      
      * add mujoco onppo training pipeline; ppo config
      
      * improve style
      
      * add sac training config for mujoco
      
      * add log, add save data; polish config
      
      * logger; hyperparameter;walker
      
      * correct style
      
      * modify else condition
      
      * change rnd to trex
      
      * revise according to comments, add episode collect
      
      * new collect mode for trex, fix all bugs, comments
      
      * final change
      
      * polish after the final comment
      
      * add readme/test
      
      * add test for serial entry of trex/gcl
      
      * sort style
  7. 06 Dec, 2021 · 1 commit
  8. 03 Dec, 2021 · 2 commits
    • v0.2.2 · 312f274d
      Committed by niuyazhe
    • feature(lk): implement multi pass DQN (#131) · f087d2c7
      Committed by Ke Li
      * feature(lk): add initial version of MP-PDQN
      
      * fix(lk): fix expand function bug
      
      * refactor(nyz): refactor mpdqn continuous args inputs module
      
      * fix(nyz): fix pdqn scatter index generation
      
      * fix(lk): fix pdqn scatter assignment bug
      
      * feature(lk): polish mpdqn code and style format
      
      * feature(lk): add mpdqn config and test file
      
      * feature(lk): polish mpdqn code and style format
      
      * fix(lk): fix import bug
      
      * polish(lk): add test for mpdqn
      
      * polish(lk): polish code style and format
      
      * polish(lk): rm print debug info
      
      * polish(lk): rm print debug info
      
      * polish(lk): polish code style and format
      
      * polish(lk): add MPDQN in readme.md
      Co-authored-by: niuyazhe <niuyazhe@sensetime.com>
  9. 01 Dec, 2021 · 1 commit
  10. 26 Nov, 2021 · 1 commit
    • polish(pu): add loss statistics and polish r2d3 pong config (#126) · 81602ce9
      Committed by 蒲源
      * fix(pu): fix adam weight decay bug
      
      * feature(pu): add pitfall offppo config
      
      * feature(pu): add qbert spaceinvaders pitfall r2d3 config
      
      * fix(pu): fix expert offppo config in r2d3
      
      * fix(pu): fix pong config
      
      * polish(pu): add loss statistics
      
      * fix(pu): fix loss statistics bug
      
      * polish(pu): polish pong r2d3 config
      
      * polish(pu): polish r2d3 pong and lunarlander config
      
      * polish(pu): delete unused files
  11. 25 Nov, 2021 · 1 commit
    • feature(zt): add curiosity icm algorithm (#41) · b50e8aea
      Committed by timothijoe
      * curisity_icm_v1
      
      * modified version1
      
      * modified v2
      
      * one_hot function change
      
      * add paper information
      
      * format minigrid ppo curiosity
      
      * flake8 ding checked
      
      * 6th-Oct-gpu-modified
      
      * reset configs in minigrid files
      
      * minigird-env-doorkey88-100-300
      
      * use modulelist instead of list in icm module
      
      * change icm reward model
      
      * delete origin curiosit_reward model and add icm_reward model
      
      * modified icm reward model
      
      * polish icm model by zt: (1) polish ding/reward_model/icm_reward_model.py and related __init__.py; (2) add config files for pong (dizoo/atari/config/serial/pong/pong_ppo_offpolicy_icm.py) and minigrid env (dizoo/minigrid/config/doorkey8_icm_config.py, fourroom_icm_config.py, minigrid_icm_config.py); (3) add element icm in README
      
      * remove some useless config files in minigrid
      
      * remove redundant part in ppo.py, add cartpole_ppo_icm_config.py, changed test_icm.py and Readme
  12. 22 Nov, 2021 · 2 commits
    • feature(wyh): add guided cost algorithm (#57) · ffe8d7c0
      Committed by Weiyuhong-1998
      * guided_cost
      
      * max_e
      
      * guided_cost
      
      * fix(wyh):fix guided cost recompute bug
      
      * fix(wyh):add model save
      
      * feature(wyh):polish guided cost
      
      * feature(wyh):on guided cost
      
      * fix(wyh):gcl-modify
      
      * fix(wyh):gcl sac config
      
      * fix(wyh):gcl style
      
      * fix(wyh):modify comments
      
      * fix(wyh):masac_5m6m best config
      
      * fix(wyh):sac bug
      
      * fix(wyh):GCL readme
      
      * fix(wyh):GCL readme conflicts
    • v0.2.1 · cf8ad134
      Committed by niuyazhe
  13. 20 Nov, 2021 · 1 commit
  14. 19 Nov, 2021 · 2 commits
    • feature(lk): add PDQN algorithm for hybrid action spaces (#118) · 39a7cfe3
      Committed by Ke Li
      * add_pdqn_model
      
      * modify_model_structure
      
      * initial_version_PDQN
      
      * bug_free_PDQN_no_test_convergence
      
      * update_pdqn_config
      
      * add_noise_to_continuous_args
      
      * polish(nyz): polish code style and add noise in pdqn
      
      * seperate_dis_and_cont_model
      
      * fix_bug_for_separation
      
      * fix(pu): current q value use the data action, fix cont loss detach bug, 1 encoder, dist and cont learning rate
      
      * polish(pu): actor delay update
      
      * fix(pu): fix disc cont update frequency
      
      * polish(pu): polish pdqn config
      
      * polish(lk): add comments and typelint for pdqn and dqn
      
      * feature(lk): add test file for pdqn model and policy
      
      * polish(lk): code style
      
      * polish(lk): rm the modify of unrelated files
      
      * polish(lk): rm useless commented code in pdqn
      Co-authored-by: niuyazhe <niuyazhe@sensetime.com>
      Co-authored-by: puyuan1996 <2402552459@qq.com>
    • d8115c50
  15. 15 Nov, 2021 · 1 commit
  16. 29 Oct, 2021 · 2 commits
    • feature(lcm): add MBPO algorithm (#113) · b1e9b4ea
      Committed by Swain
      * feature(lcm): add MBPO algorithm (#87)
      
      * add model-based rl
      
      * fix yazhe's comments
      
      * format
      
      * pass flake8 test
      
      * polish(nyz): polish mbpo import, name and test
      Co-authored-by: lichuming <lichuming@lichumingdeMacBook-Pro.local>
    • feature(nyz): add PADDPG for hybrid action space as baseline (#109) · d2f79536
      Committed by Swain
      * fix(nyz): fix gym_hybrid env not scale action bug
      
      * feature(nyz): add PADDPG basic implementation for hybrid action space
      
      * fix(nyz): fix td3/d4pg compatibility bug with new modifications
      
      * fix(nyz): fix hybrid ddpg action type grad bug and update config
      
      * feature(nyz): add eps greedy + multinomial wrapper and gym_hybrid ddpg convergence config
      
      * style(nyz): update PADDPG in README
      
      * test_model_hybrid_qac
      
      * fix_typo_in_README
      
      * test_policy_hybrid_qac
      
      * polish(nyz): polish hybrid action space to dict structure and polish unittest
      
      * fix(nyz): fix td3bc compatibility bug
      Co-authored-by: 李可 <like2@CN0014008466M.local>
  17. 28 Oct, 2021 · 1 commit
    • feature(nyz): add gobigger baseline (#95) · a8fec8bb
      Committed by Swain
      * feature(nyz): add gobigger baseline
      
      * style(nyz): add gobigger env info
      
      * feature(nyz): add ignore prefix in default collate
      
      * feature(nyz): add vsbot training baseline
      
      * fix(nyz): fix to_tensor empty list bug and polish gobigger baseline
      
      * style(nyz): split gobigger baseline code
  18. 22 Oct, 2021 · 1 commit
    • feature(zym): add offlineRL algo td3_bc and polish policy comments (#88) · 7c1b5e95
      Committed by Yinmin.Zhang
      * feature(zym): add offlineRL algo td3_bc.
      
      * feature(zym): add offlineRL algo td3_bc.
      
      * feature(zym): add offlineRL algo td3_bc.
      
      * polish(zym): polish some annotations in td3/ddpg/sac/ppo; polish `_forward_collect` and `_foward_eval`.
      
      * fix(lj): fix dimension bug in cql for continuous env.
      
      * fix(zym): fix dimension bug in cql for continuous env.
      
      * fix(zym): fix dimension bug in cql for continuous env.
      
      * polish(zym): update README.md.
  19. 21 Oct, 2021 · 1 commit
    • feature(lk): add gym-soccer (HFO) env (#94) · 8f47f4cb
      Committed by Ke Li
      * add_soccer_env
      
      * add_info
      
      * close
      
      * format
      
      * test_gym_soccer
      
      * rm_torch
      
      * replay_log
      
      * format_style
      
      * add_gym_soccer_to_readme
      
      * separate render_func
      
      * add_gif_file
      
      * scale_action
      
      * flake_style_format
      
      * resolve_review_comments
      
      * add branch info for gym hybrid
  20. 19 Oct, 2021 · 1 commit
  21. 16 Oct, 2021 · 1 commit
    • feature(nyp): add DQfD algorithm (#48) · e2ca8738
      Committed by Will-Nie
      * add_dqfd
      
      * Is_expert to is_expert
      
      * modify according to the last comments
      
      * value_gamma; done; marginloss; sqil compatibility
      
      * finally shorten the code, revise config
      
      * revise config, style
      
      * add_readme/two_more_config
      
      * correct format
      Co-authored-by: niuyazhe <niuyazhe@sensetime.com>
  22. 12 Oct, 2021 · 1 commit
  23. 08 Oct, 2021 · 1 commit
    • feature(zlx): add vs bot training and self-play training with slime volley env (#23) · dbf432cd
      Committed by LuciusMos
      * slime volley env in dizoo, first commit
      
      * fix bug in slime volley env
      
      * modify volley env to satisfy ding 1v1 requirements; add naive self-play and league training pipeline (evaluator is not finished, now use a very naive one)
      
      * adopt volley builtin ai as default eval opponent
      
      * polish(nyz): polish slime_volley_env and its test
      
      * feature(nyz): add slime_volley vs bot ppo demo
      
      * feature(nyz): add battle_sample_serial_collector and adapt abnormal check in subprocess env manager
      
      * feature(nyz): add slime volley self-play demo
      
      * style(nyz): add slime_volleyball env gif and split MARL and selfplay label
      
      * feature(nyz): add save replay function in slime volleyball env
      Co-authored-by: zlx-sensetime <zhaoliangxuan@sensetime.com>
      Co-authored-by: niuyazhe <niuyazhe@sensetime.com>
  24. 01 Oct, 2021 · 1 commit
  25. 30 Sep, 2021 · 4 commits
  26. 24 Sep, 2021 · 1 commit
  27. 23 Sep, 2021 · 1 commit
  28. 17 Sep, 2021 · 2 commits
  29. 14 Sep, 2021 · 1 commit
  30. 08 Sep, 2021 · 2 commits