1. 03 Dec 2021, 2 commits
    • feature(lk): implement multi pass DQN (#131) · f087d2c7
      Ke Li committed
      * feature(lk): add initial version of MP-PDQN (the multi-pass trick is sketched after this entry)
      
      * fix(lk): fix expand function bug
      
      * refactor(nyz): refactor mpdqn continuous args inputs module
      
      * fix(nyz): fix pdqn scatter index generation
      
      * fix(lk): fix pdqn scatter assignment bug
      
      * feature(lk): polish mpdqn code and style format
      
      * feature(lk): add mpdqn config and test file
      
      * feature(lk): polish mpdqn code and style format
      
      * fix(lk): fix import bug
      
      * polish(lk): add test for mpdqn
      
      * polish(lk): polish code style and format
      
      * polish(lk): rm print debug info
      
      * polish(lk): rm print debug info
      
      * polish(lk): polish code style and format
      
      * polish(lk): add MPDQN in readme.md
      Co-authored-by: niuyazhe <niuyazhe@sensetime.com>
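      Neither the PR title nor the commit list spells out what "multi pass" means. Below is a minimal PyTorch sketch of the multi-pass trick from the MP-DQN paper: evaluate the Q-network once per discrete action, zeroing every other action's continuous args, so each Q_k receives gradients only from its own args. The class name, layer sizes, and equal-slot assumption are illustrative; DI-engine's actual module builds the per-pass indices with scatter (see the fix(nyz)/fix(lk) commits above), not with this mask loop.

      ```python
      import torch
      import torch.nn as nn

      class MultiPassQ(nn.Module):
          """Q(s, x) over K discrete actions, each owning a slot of the continuous args.

          One forward pass per discrete action, with every other action's args
          zeroed, so dQ_k/dx_j = 0 for j != k; Q_k is read from the k-th pass.
          """

          def __init__(self, obs_dim: int, num_actions: int, args_dim: int):
              super().__init__()
              assert args_dim % num_actions == 0  # equally sized slots (assumption)
              self.K, self.args_dim = num_actions, args_dim
              self.net = nn.Sequential(
                  nn.Linear(obs_dim + args_dim, 64), nn.ReLU(),
                  nn.Linear(64, num_actions),
              )
              slot = args_dim // num_actions
              mask = torch.zeros(num_actions, args_dim)
              for k in range(num_actions):
                  mask[k, k * slot:(k + 1) * slot] = 1.0
              self.register_buffer('mask', mask)  # (K, args_dim)

          def forward(self, obs: torch.Tensor, args: torch.Tensor) -> torch.Tensor:
              # obs: (B, obs_dim), args: (B, args_dim) -> q: (B, K)
              B, K = obs.shape[0], self.K
              args_k = args.unsqueeze(1) * self.mask      # (B, K, args_dim), one masked copy per action
              obs_k = obs.unsqueeze(1).expand(B, K, -1)   # (B, K, obs_dim)
              q_all = self.net(torch.cat([obs_k, args_k], dim=-1))  # (B, K, K)
              idx = torch.arange(K, device=q_all.device)
              return q_all[:, idx, idx]                   # diagonal: Q_k from pass k
      ```

      In single-pass PDQN all args are fed together, so dQ_k/dx_j is generally non-zero for j != k and unrelated actions' args receive spurious gradients; the diagonal read-out above is what removes that coupling.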
    • benchmark(davide): Bsuite memory benchmark (#138) · 5ee17ad1
      Davide Liu committed
      * added r2d2 + a2c configs
      
      * changed convergence reward for some envs
      
      * removed configs that don't converge
      
      * removed 'on_policy' param in r2d2 configs
  2. 26 Nov 2021, 1 commit
    • polish(pu): add loss statistics and polish r2d3 pong config (#126) · 81602ce9
      蒲源 committed
      * fix(pu): fix adam weight decay bug
      
      * feature(pu): add pitfall offppo config
      
      * feature(pu): add qbert spaceinvaders pitfall r2d3 config
      
      * fix(pu): fix expert offppo config in r2d3
      
      * fix(pu): fix pong config
      
      * polish(pu): add loss statistics
      
      * fix(pu): fix loss statistics bug
      
      * polish(pu): polish pong r2d3 config
      
      * polish(pu): polish r2d3 pong and lunarlander config
      
      * polish(pu): delete unused files
  3. 25 Nov 2021, 3 commits
    • polish(nyz): polish impala atari config · 41dce176
      niuyazhe committed
    • feature(nyp): add apple key to door treasure env (#128) · 4157cdae
      Will-Nie committed
      * add apple key to door treasure and polish
      
      * add test, revise reward, build four envs
      
      * add 7x7-1 ADTKT
    • feature(zt): add curiosity icm algorithm (#41) · b50e8aea
      timothijoe committed
      * curiosity_icm_v1
      
      * modified version1
      
      * modified v2
      
      * one_hot function change
      
      * add paper information
      
      * format minigrid ppo curiosity
      
      * flake8 ding checked
      
      * 6th-Oct-gpu-modified
      
      * reset configs in minigrid files
      
      * minigrid-env-doorkey88-100-300
      
      * use modulelist instead of list in icm module
      
      * change icm reward model
      
      * delete original curiosity_reward model and add icm_reward model
      
      * modified icm reward model (the ICM bonus is sketched after this entry)
      
      * polish icm model by zt: (1) polish ding/reward_model/icm_reward_model.py and related __init__.py; (2) add config files for pong (dizoo/atari/config/serial/pong/pong_ppo_offpolicy_icm.py) and the minigrid env (dizoo/minigrid/config/doorkey8_icm_config.py, fourroom_icm_config.py, minigrid_icm_config.py); (3) add ICM entry in README
      
      * remove some useless config files in minigrid
      
      * remove redundant part in ppo.py, add cartpole_ppo_icm_config.py, change test_icm.py and README
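      For readers tracing this PR: ICM derives an intrinsic reward from the prediction error of a learned forward model in a learned embedding space, while an inverse model keeps that embedding focused on features the agent can actually control. A minimal sketch, assuming a discrete action space; names and sizes are illustrative, not the actual ding/reward_model/icm_reward_model.py API.

      ```python
      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class ICM(nn.Module):
          """Intrinsic reward = forward-model error in a learned embedding space."""

          def __init__(self, obs_dim: int, num_actions: int, feat_dim: int = 64):
              super().__init__()
              self.num_actions = num_actions
              self.encoder = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.ReLU())
              self.inverse = nn.Linear(2 * feat_dim, num_actions)          # (phi, phi') -> action logits
              self.dynamics = nn.Linear(feat_dim + num_actions, feat_dim)  # (phi, a) -> predicted phi'

          def forward(self, obs, next_obs, action):
              phi, next_phi = self.encoder(obs), self.encoder(next_obs)
              a_onehot = F.one_hot(action, self.num_actions).float()
              pred_next_phi = self.dynamics(torch.cat([phi, a_onehot], dim=-1))
              # intrinsic reward: per-sample forward prediction error
              r_int = 0.5 * (pred_next_phi - next_phi).pow(2).mean(dim=-1)
              # training loss: forward MSE + inverse cross-entropy
              inv_logits = self.inverse(torch.cat([phi, next_phi], dim=-1))
              loss = F.mse_loss(pred_next_phi, next_phi.detach()) + \
                     F.cross_entropy(inv_logits, action)
              return r_int.detach(), loss
      ```

      The "use modulelist instead of list in icm module" commit reflects a real PyTorch pitfall: submodules stored in a plain Python list are never registered, so nn.Module.parameters() (and therefore the optimizer) cannot see them; nn.ModuleList fixes that.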
  4. 24 Nov 2021, 1 commit
  5. 22 Nov 2021, 4 commits
    • feature(wyh): add guided cost algorithm (#57) · ffe8d7c0
      Weiyuhong-1998 committed
      * guided_cost
      
      * max_e
      
      * guided_cost
      
      * fix(wyh): fix guided cost recompute bug
      
      * fix(wyh): add model save
      
      * feature(wyh): polish guided cost
      
      * feature(wyh): on guided cost
      
      * fix(wyh): gcl-modify
      
      * fix(wyh): gcl sac config
      
      * fix(wyh): gcl style
      
      * fix(wyh): modify comments
      
      * fix(wyh): masac_5m6m best config
      
      * fix(wyh): sac bug
      
      * fix(wyh): GCL readme
      
      * fix(wyh): GCL readme conflicts
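      Guided cost learning trains a cost (negative reward) network with a maximum-entropy IRL objective: lower the cost of expert samples while estimating the partition function from policy samples. A rough sketch, with uniform importance weights for brevity (the paper weights samples by their policy density); all names here are hypothetical, not this PR's API:

      ```python
      import torch
      import torch.nn as nn

      class GuidedCost(nn.Module):
          """Learned cost c_theta(s, a); low cost should mean expert-like."""

          def __init__(self, obs_dim: int, act_dim: int):
              super().__init__()
              self.net = nn.Sequential(
                  nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1)
              )

          def forward(self, obs, act):
              return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

      def gcl_loss(cost, expert_obs, expert_act, sample_obs, sample_act):
          """Max-ent IRL objective: E_expert[c] + log E_samples[exp(-c)]."""
          c_expert = cost(expert_obs, expert_act)
          c_sample = cost(sample_obs, sample_act)
          n = c_sample.shape[0]
          # log of the Monte-Carlo partition estimate over policy samples
          log_z = torch.logsumexp(-c_sample, dim=0) - torch.log(torch.tensor(float(n)))
          return c_expert.mean() + log_z
      ```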
    • polish(pu): polish value norm and fix get_gae · 7992b5d3
      puyuan1996 committed
    • fix(nyz): simplify onppo with traj_flag · 7e51de4f
      niuyazhe committed
    • fix(pu): fix recompute advantage in on-policy ppo and polish rnd_onppo algorithm (#124) · 0b46dd24
      蒲源 committed
      * test rnd
      
      * fix mz config
      
      * fix config
      
      * fix config
      
      * fix(pu): fix r2d2
      
      * fix(pu): fix ppo-onpolicy-rnd adv bug
      
      * fix(puyuan): fix r2d2
      
      * feature(puyuan): add minigrid r2d2 config
      
      * polish minigrid config
      
      * dev-ppo-onpolicy-rnd
      
      * fix(pu): fix rnd reward normalize bug
      
      * feature(pu): add minigrid fourrooms and doorkey env info
      
      * feature(pu): add serial_entry_onpolicy
      
      * fix(pu): fix config params of onpolicy ppo
      
      * feature(pu): add obs normalization
      
      * polish(pu): polish rnd intrinsic reward normalization
      
      * fix(pu): fix clear data bug
      
      * test(pu): add off-policy ppo config
      
      * polish(pu): polish minigrid onppo-rnd config
      
      * polish(pu): polish rnd reward model and minigrid config for rnd_onppo
      
      * polish(pu): polish minigrid rnd_onppo config
      
      * feature(pu): add gym-minigrid
      
      * fix(pu): fix ISerialEvaluator bug
      
      * fix(pu): fix cuda device compatibility
      
      * fix(pu): fix MiniGrid-ObstructedMaze-2Dlh-v0 env_id bug
      
      * polish(pu): squash rnd intrinsic reward to [0,1] according to the batch min and max (sketched after this entry)
      
      * style(pu): yapf format
      
      * polish(pu): polish pitfall offppo config
      
      * polish(pu): polish rnd-onppo and onppo config
      
      * polish(pu): polish config and weight last reward
      
      * polish(pu): polish rnd-onppo config
      
      * fix(pu)" fix mujoco onppo config
      
      * fix(pu): fix continuous version of dict_data_split_traj_and_compute_adv
      
      * polish(pu): polish config
      
      * fix(pu): add key traj_flag in data to split traj correctly when ignore_done is True in halfcheetah
      
      * polish(pu): polish annotation
      
      * polish(pu): withdraw files submitted wrongly
      
      * polish(pu): withdraw files deleted wrongly
      
      * polish(pu): polish onppo config
      
      * fix(pu): fix remaining_traj_data recompute adv bug and polish rnd onppo code
      
      * style(pu): yapf format
      
      * polish(pu): polish gae_traj_flag function
      
      * polish(pu): delete redundant function in onppo
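      Several commits above tweak how the RND intrinsic reward is normalized, ending with "squash rnd intrinsic reward to [0,1] according to the batch min and max". A minimal sketch of that scheme, assuming flat observations; the class is illustrative, not the actual ding reward-model API:

      ```python
      import torch
      import torch.nn as nn

      class RNDReward(nn.Module):
          """Intrinsic reward = prediction error of a trained predictor against
          a fixed, randomly initialized target network."""

          def __init__(self, obs_dim: int, feat_dim: int = 64):
              super().__init__()
              self.target = nn.Linear(obs_dim, feat_dim)
              self.predictor = nn.Linear(obs_dim, feat_dim)
              for p in self.target.parameters():
                  p.requires_grad = False  # the target stays random and frozen

          def intrinsic_reward(self, obs: torch.Tensor) -> torch.Tensor:
              err = (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)
              # squash to [0, 1] with the batch min/max, as in the commit above
              return (err - err.min()) / (err.max() - err.min() + 1e-8)
      ```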
  6. 19 Nov 2021, 2 commits
    • polish(davide): add example of GAIL entry + config for Mujoco and Cartpole (#114) · d1bc1387
      Davide Liu committed
      * added gail entry
      
      * added lunarlander and cartpole config
      
      * added gail mujoco config
      
      * added mujoco exp
      
      * update22-10
      
      * added third exp
      
      * added metric to evaluate policies
      
      * added GAIL entry and config for Cartpole and Walker2d
      
      * checked style and unittest
      
      * restored lunarlander env
      
      * style problems
      
      * bug correction
      
      * Delete expert_data_train.pkl
      
      * changed loss of GAIL
      
      * Update walker2d_ddpg_gail_config.py
      
      * changed gail reward from -D(s, a) to -log(D(s, a)) (see the sketch after this entry)
      
      * added small constant to reward function
      
      * added comment to clarify config
      
      * Update walker2d_ddpg_gail_config.py
      
      * added lunarlander entry + config
      
      * Added Atari discriminator + Pong entry config
      
      * Update gail_irl_model.py
      
      * Update gail_irl_model.py
      
      * added gail serial pipeline and one-hot actions for gail atari
      
      * related to previous commit
      
      * removed main files
      
      * removed old comment
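      The reward change noted above ("from -D(s, a) to -log(D(s, a))", plus a small constant) is the interesting technical bit. A sketch under the convention that the discriminator outputs a probability in (0, 1) and is trained to score policy samples high and expert samples low, so expert-like pairs earn high reward; the epsilon corresponds to the "added small constant" commit:

      ```python
      import torch

      def gail_reward(discriminator, obs: torch.Tensor, act: torch.Tensor,
                      eps: float = 1e-8) -> torch.Tensor:
          """r(s, a) = -log(D(s, a) + eps); eps guards against log(0)."""
          with torch.no_grad():
              d = discriminator(obs, act)  # hypothetical callable -> (B,) in (0, 1)
          return -torch.log(d + eps)
      ```

      Under this convention -D is bounded in [-1, 0], while -log(D) has unbounded upside for expert-like behaviour, which typically gives a stronger learning signal early in training.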
    • feature(lk): add PDQN algorithm for hybrid action spaces (#118) · 39a7cfe3
      Ke Li committed
      * add_pdqn_model
      
      * modify_model_structure
      
      * initial_version_PDQN
      
      * bug_free_PDQN_no_test_convergence
      
      * update_pdqn_config
      
      * add_noise_to_continuous_args
      
      * polish(nyz): polish code style and add noise in pdqn
      
      * separate_dis_and_cont_model
      
      * fix_bug_for_separation
      
      * fix(pu): current q value uses the data action; fix cont loss detach bug; 1 encoder; disc and cont learning rates (sketched after this entry)
      
      * polish(pu): actor delay update
      
      * fix(pu): fix disc cont update frequency
      
      * polish(pu): polish pdqn config
      
      * polish(lk): add comments and typelint for pdqn and dqn
      
      * feature(lk): add test file for pdqn model and policy
      
      * polish(lk): code style
      
      * polish(lk): rm the modify of unrelated files
      
      * polish(lk): rm useless commented code in pdqn
      Co-authored-by: niuyazhe <niuyazhe@sensetime.com>
      Co-authored-by: puyuan1996 <2402552459@qq.com>
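      The fix(pu)/polish(pu) commits above describe PDQN's two coupled updates without showing them. A sketch of the coupling under assumed interfaces (q_net(obs, args) -> (B, K) Q-values, arg_net(obs) -> (B, args_dim) continuous args); the real policy adds target networks, the actor-delay schedule, and the separate learning rates mentioned above:

      ```python
      import torch
      import torch.nn.functional as F

      def pdqn_losses(q_net, arg_net, obs, disc_act, cont_arg, target_q):
          """(1) TD regression on the discrete Q head, evaluated at the *data*
          action and the *data* continuous args (per the 'current q value use
          the data action' fix above); (2) continuous-args loss that pushes
          arg_net toward args maximizing Q, while the Q-network's own
          parameters stay out of the gradient path."""
          q = q_net(obs, cont_arg)                              # (B, K)
          q_taken = q.gather(1, disc_act.unsqueeze(1)).squeeze(1)
          q_loss = F.mse_loss(q_taken, target_q)

          for p in q_net.parameters():                          # freeze critic params
              p.requires_grad_(False)
          cont_loss = -q_net(obs, arg_net(obs)).sum(dim=1).mean()
          for p in q_net.parameters():                          # unfreeze afterwards
              p.requires_grad_(True)
          return q_loss, cont_loss
      ```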
  7. 18 Nov 2021, 2 commits
  8. 16 Nov 2021, 1 commit
  9. 15 Nov 2021, 1 commit
  10. 03 Nov 2021, 2 commits
  11. 01 Nov 2021, 2 commits
    • fix(nyz): fix r2d2 and dqfd error unittest bug · 28930a86
      niuyazhe committed
    • feature(pu): add NGU algorithm (#40) · 286ea243
      蒲源 committed
      * test rnd
      
      * fix mz config
      
      * fix config
      
      * feature(pu): fix r2d2, add beta to actor
      
      * feature(pu): add ngu-dev
      
      * fix(pu): fix r2d2
      
      * fix(puyuan): fix r2d2
      
      * feature(puyuan): add minigrid r2d2 config
      
      * polish minigrid config
      
      * dev-ngu
      
      * feature(pu): add action and reward as inputs of q network
      
      * feature(pu): add episodic reward model (sketched after this entry)
      
      * feature(pu): add episodic reward model, modify r2d2 and collector for ngu
      
      * fix(pu): recover files that were changed by mistake
      
      * fix(pu): fix tblogger cnt bug
      
      * add_dqfd
      
      * Is_expert to is_expert
      
      * fix(pu): fix r2d2 bug
      
      * fix(pu): fix beta index to gamma bug
      
      * fix(pu): fix numerical stability problem
      
      * style(pu): flake8 format
      
      * fix(pu): fix rnd reward model train times
      
      * polish(pu): polish r2d2 reset problem
      
      * fix(pu): fix episodic reward normalize bug
      
      * polish(pu): polish config params and episodic_reward init value
      
      * modify according to the last comments
      
      * value_gamma; done; margin loss; sqil compatibility
      
      * feature(pu): add r2d3 algorithm and config of lunarlander and pong
      
      * fix(pu): fix demo path bug
      
      * fix(pu): fix cuda bug at function get_gae in adder.py
      
      * feature(pu): add pong r2d2 config
      
      * polish(pu): r2d2 uses the mixture priority, episodic_reward transforms to mean 0 std 1
      
      * polish(pu): polish r2d2 config
      
      * test(pu): test cuda compatibility of dqfd_nstep_td_error in r2d3
      
      * polish(pu): polish config
      
      * polish(pu): polish config and annotation
      
      * fix(pu): fix r2d2 target net update bug and done bug
      
      * polish(pu): polish pong r2d2 config and add montezuma r2d2 config
      
      * polish(pu): add some logs for debugging in r2d2
      
      * polish(pu): recover config deleted by mistake
      
      * fix(pu): fix r2d3 config of lunarlander and pong
      
      * fix(pu): fix the r2d2 bug in r2d3
      
      * fix(pu): fix r2d3 cpu device bug in function dqfd_nstep_td_error of td.py
      
      * fix(pu): fix n_sample bug in serial_entry_r2d3
      
      * polish(pu): polish minigrid r2d2 config
      
      * fix(pu): add info dict of fourrooms doorkey in minigrid_env
      
      * polish(pu): polish r2d2 config
      
      * fix(pu): fix expert policy collect traj bug, now we use the argmax_sample wrapper
      
      * fix(pu): fix r2d2 done and target update bug, polish config
      
      * fix(pu): fix null_padding transition obs to zeros
      
      * fix(pu): episodic_reward transforms to [0,1]
      
      * fix(pu): fix the value_gamma bug
      
      * fix(pu): fix device bug in ngu_reward_model.py
      
      * fix(pu): fix null_padding problem in rnd and episodic reward model
      
      * polish(pu): polish config
      
      * fix(pu): use the deepcopy train_data to add bonus reward
      
      * polish(pu): enlarge the last reward of the whole episode by seq_length times
      
      * fix(pu): fix the episode length 1 bug and weight intrinsic reward bug
      
      * feature(pu): add montezuma ngu config
      
      * fix(pu): fix lunarlander ngu unroll_len to 998 so that the sequence length is equal to the max step 1000
      
      * test(pu): episodic reward transforms to [0,1]
      
      * fix(pu): fix r2d3 one-step rnn init bug and add r2d2_collect_traj
      
      * fix(pu): fix r2d2_collect_traj.py
      
      * feature(pu): add pong_r2d3_r2d2expert_config
      
      * polish(pu): yapf format
      
      * polish(pu): fix td.py conflict
      
      * polish(pu): flake8 format
      
      * polish(pu): add lambda_one_step_td key in dqfd error
      
      * test(pu): set key lambda_one_step_td and lambda_supervised_loss as 0
      
      * style(pu): yapf format
      
      * style(pu): format
      
      * polish(nyz): fix ngu detailed compatibility error
      
      * fix(nyz): fix dqfd one_step td lambda bug
      
      * fix(pu): fix test_acer and test_rnd compatibility error
      Co-authored-by: Swain <niuyazhe314@outlook.com>
      Co-authored-by: Will_Nie <nieyunpengwill@hotmail.com>
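      NGU combines two bonuses: an episodic novelty from a k-nearest-neighbour kernel over the embeddings seen in the current episode, and a life-long RND novelty that modulates it. A rough sketch of both pieces, following the NGU paper's formulas rather than ding's ngu_reward_model.py; names and constants are illustrative:

      ```python
      import torch

      def episodic_bonus(memory: torch.Tensor, emb: torch.Tensor, k: int = 10,
                         eps: float = 1e-3, c: float = 1e-3) -> torch.Tensor:
          """Inverse kernel-sum over the k nearest neighbours of the current
          embedding within this episode's memory (memory: (N, D), emb: (D,))."""
          dist2 = ((memory - emb) ** 2).sum(dim=-1)                 # (N,)
          knn = torch.topk(dist2, k=min(k, dist2.numel()), largest=False).values
          knn = knn / (knn.mean() + eps)        # distance normalization (paper uses a running mean)
          kernel = eps / (knn + eps)
          return 1.0 / (kernel.sum().sqrt() + c)

      def ngu_reward(r_ext: torch.Tensor, r_episodic: torch.Tensor,
                     rnd_err_norm: torch.Tensor, L: float = 5.0,
                     beta: float = 0.3) -> torch.Tensor:
          """Life-long (RND) novelty modulates the episodic bonus, clipped to [1, L]."""
          modulator = torch.clamp(1.0 + rnd_err_norm, 1.0, L)
          return r_ext + beta * r_episodic * modulator
      ```

      The long tail of normalization commits above ("mean 0 std 1", then "[0,1]") corresponds to how rnd_err_norm and the episodic term are standardized before entering these formulas.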
  12. 31 Oct 2021, 1 commit
  13. 29 Oct 2021, 2 commits
    • feature(lcm): add MBPO algorithm (#113) · b1e9b4ea
      Swain committed
      * feature(lcm): add MBPO algorithm (#87)
      
      * add model-based rl
      
      * address yazhe's comments
      
      * format
      
      * pass flake8 test
      
      * polish(nyz): polish mbpo import, name and test
      Co-authored-by: lichuming <lichuming@lichumingdeMacBook-Pro.local>
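      MBPO's core loop: train an ensemble dynamics model on real data, then branch short imagined rollouts from real replay states and train the policy (SAC in the paper) on the mixed real/imagined buffer. A very loose sketch of the branched rollout under assumed interfaces (model_ensemble and policy are hypothetical callables), not this PR's actual implementation:

      ```python
      import torch

      def branched_rollout(model_ensemble, policy, real_obs: torch.Tensor,
                           horizon: int = 1) -> list:
          """Branch short imagined rollouts from real replay states.
          model_ensemble(obs, act) -> (next_obs, reward, done) is assumed to
          sample one ensemble member per call; done is a bool tensor (B,)."""
          imagined, obs = [], real_obs
          for _ in range(horizon):
              act = policy(obs)
              next_obs, reward, done = model_ensemble(obs, act)
              imagined.append((obs, act, reward, next_obs, done))
              keep = ~done                    # drop branches predicted terminal
              if not keep.any():
                  break
              obs = next_obs[keep]
          return imagined  # mixed into the replay buffer alongside real transitions
      ```

      Keeping the horizon short bounds model-error compounding, which is the design argument the MBPO paper makes for branched rollouts over full imagined episodes.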
    • feature(nyz): add PADDPG for hybrid action space as baseline (#109) · d2f79536
      Swain committed
      * fix(nyz): fix gym_hybrid env not scale action bug
      
      * feature(nyz): add PADDPG basic implementation for hybrid action space
      
      * fix(nyz): fix td3/d4pg compatibility bug with new modifications
      
      * fix(nyz): fix hybrid ddpg action type grad bug and update config
      
      * feature(nyz): add eps greedy + multinomial wrapper and gym_hybrid ddpg convergence config (wrapper sketched after this entry)
      
      * style(nyz): update PADDPG in README
      
      * test_model_hybrid_qac
      
      * fix_typo_in_README
      
      * test_policy_hybrid_qac
      
      * polish(nyz): polish hybrid action space to dict structure and polish unittest
      
      * fix(nyz): fix td3bc compatibility bug
      Co-authored-by: 李可 <like2@CN0014008466M.local>
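      The "eps greedy + multinomial wrapper" commit and the dict-structured hybrid action are the two notable details here. A sketch of how such a wrapper could sample the discrete part while passing the continuous args through; the dict keys mirror common hybrid-action conventions and are assumptions, not the wrapper's verified API:

      ```python
      import torch

      def hybrid_eps_greedy(logits: torch.Tensor, args: torch.Tensor, eps: float) -> dict:
          """eps-greedy + multinomial exploration for a hybrid action.
          logits: (B, K) discrete-type logits; args: (B, D) continuous args."""
          greedy = logits.argmax(dim=-1)
          sampled = torch.multinomial(torch.softmax(logits, dim=-1), 1).squeeze(-1)
          explore = torch.rand(logits.shape[0], device=logits.device) < eps
          action_type = torch.where(explore, sampled, greedy)
          # dict-structured hybrid action, as the last commits converge on
          return {'action_type': action_type, 'action_args': args}
      ```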
  14. 28 Oct 2021, 1 commit
    • feature(nyz): add gobigger baseline (#95) · a8fec8bb
      Swain committed
      * feature(nyz): add gobigger baseline
      
      * style(nyz): add gobigger env info
      
      * feature(nyz): add ignore prefix in default collate
      
      * feature(nyz): add vsbot training baseline
      
      * fix(nyz): fix to_tensor empty list bug and polish gobigger baseline
      
      * style(nyz): split gobigger baseline code
  15. 25 Oct 2021, 1 commit
  16. 22 Oct 2021, 2 commits
    • feature(zym): add offlineRL algo td3_bc and polish policy comments (#88) · 7c1b5e95
      Yinmin.Zhang committed
      * feature(zym): add offlineRL algo td3_bc.
      
      * feature(zym): add offlineRL algo td3_bc.
      
      * feature(zym): add offlineRL algo td3_bc.
      
      * polish(zym): polish some annotations in td3/ddpg/sac/ppo; polish `_forward_collect` and `_forward_eval`.
      
      * fix(lj): fix dimension bug in cql for continuous env.
      
      * fix(zym): fix dimension bug in cql for continuous env.
      
      * fix(zym): fix dimension bug in cql for continuous env.
      
      * polish(zym): update README.md.
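      TD3+BC's whole contribution fits in one line of actor loss: the TD3 deterministic policy-gradient term scaled by lambda = alpha / mean|Q|, plus a behaviour-cloning MSE to the dataset action. A sketch with hypothetical actor/critic callables, following the TD3+BC paper rather than this PR's exact code:

      ```python
      import torch
      import torch.nn.functional as F

      def td3_bc_actor_loss(critic, actor, obs: torch.Tensor,
                            data_act: torch.Tensor, alpha: float = 2.5) -> torch.Tensor:
          """pi loss = -lambda * Q(s, pi(s)) + MSE(pi(s), a),
          with lambda = alpha / mean|Q| (detached) normalizing the Q scale."""
          pi = actor(obs)                      # hypothetical deterministic actor
          q = critic(obs, pi)                  # hypothetical critic -> (B,) or (B, 1)
          lam = alpha / q.abs().mean().detach()
          return -lam * q.mean() + F.mse_loss(pi, data_act)
      ```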
    • polish(nyz): fix ppo bugs and update atari ppo offpolicy config (#108) · 2d5ec7c3
      Swain committed
      * fix(nyz): fix ppo cuda bug and random collect bug
      
      * config(nyz): add pong ppo off policy better config
      
      * fix(nyz): fix ppo device bug in get_train_sample and update ppo offpolicy config
      
      * style(nyz): correct yapf format
  17. 21 Oct 2021, 2 commits
  18. 19 Oct 2021, 1 commit
  19. 16 Oct 2021, 1 commit
    • feature(nyp): add DQfD algorithm (#48) · e2ca8738
      Will-Nie committed
      * add_dqfd
      
      * Is_expert to is_expert
      
      * modify according to the last comments
      
      * value_gamma; done; margin loss; sqil compatibility (the margin term is sketched after this entry)
      
      * finally shorten the code, revise config
      
      * revise config, style
      
      * add_readme/two_more_config
      
      * correct format
      Co-authored-by: niuyazhe <niuyazhe@sensetime.com>
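      The "margin loss" in these commits is DQfD's large-margin supervised term, applied to expert transitions only; the full DQfD loss additionally mixes n-step and one-step TD errors weighted by the lambda_one_step_td / lambda_supervised_loss keys mentioned in the NGU PR above. A sketch of the margin term alone (averaging over the whole batch for brevity):

      ```python
      import torch
      import torch.nn.functional as F

      def dqfd_margin_loss(q: torch.Tensor, expert_act: torch.Tensor,
                           is_expert: torch.Tensor, margin: float = 0.8) -> torch.Tensor:
          """max_a [Q(s,a) + margin * 1(a != a_E)] - Q(s, a_E), expert samples only.
          q: (B, K) Q-values; expert_act: (B,) long; is_expert: (B,) bool."""
          num_actions = q.shape[1]
          margins = margin * (1.0 - F.one_hot(expert_act, num_actions).float())
          per_sample = (q + margins).max(dim=1).values - \
                       q.gather(1, expert_act.unsqueeze(1)).squeeze(1)
          return (per_sample * is_expert.float()).mean()
      ```

      The term is zero exactly when the expert action's Q-value beats every other action by at least the margin, which is what pushes the pretrained policy toward imitating the demonstrations.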
  20. 15 Oct 2021, 1 commit
  21. 12 Oct 2021, 2 commits
  22. 08 Oct 2021, 1 commit
    • feature(zlx): add vs bot training and self-play training with slime volley env (#23) · dbf432cd
      LuciusMos committed
      * slime volley env in dizoo, first commit
      
      * fix bug in slime volley env
      
      * modify volley env to satisfy ding 1v1 requirements; add naive self-play and league training pipeline (evaluator is not finished; a very naive one is used for now)
      
      * adopt volley builtin ai as default eval opponent
      
      * polish(nyz): polish slime_volley_env and its test
      
      * feature(nyz): add slime_volley vs bot ppo demo
      
      * feature(nyz): add battle_sample_serial_collector and adapt abnormal check in subprocess env manager
      
      * feature(nyz): add slime volley self-play demo
      
      * style(nyz): add slime_volleyball env gif and split MARL and selfplay label
      
      * feature(nyz): add save replay function in slime volleyball env
      Co-authored-by: zlx-sensetime <zhaoliangxuan@sensetime.com>
      Co-authored-by: niuyazhe <niuyazhe@sensetime.com>
  23. 02 Oct 2021, 2 commits
  24. 01 Oct 2021, 2 commits