1. 01 Jan, 2022 1 commit
  2. 24 Dec, 2021 1 commit
    • feature(nyz): add H-PPO hybrid action space algorithm (#140) · 0b71fc4e
      Swain committed
      * feature(nyz): add hybrid ppo, unify action_space field and use dict type mu sigma
      
      * polish(nyz): polish ppo config continuous field, move to action_space field
      
      * fix(nyz): fix ppo action_space field compatibility bug
      
      * fix(nyz): fix ppg/sac/cql action_space field compatibility bug
      
      * demo(nyz): update gym hybrid hppo config
      
      * polish(pu): polish hppo hyperparameters: use tanh and a fixed sigma of 0.3 in actor_action_args; in the collect phase, after sampling from the pi distribution, clamp acceleration_value to [0,1] and rotation_value to [-1,1]
      
      * polish(pu):polish as review
      
      * polish(pu): polish hppo config
      
      * polish(pu): entropy weight=0.03 performs best empirically
      
      * fix(nyz): fix unittest compatibility bugs
      
      * polish(nyz): remove atari env unused print(ci skip)
      Co-authored-by: puyuan1996 <2402552459@qq.com>
  3. 16 Dec, 2021 1 commit
    • feature(nyp): add residual in R2D2 (#150) · ab94376c
      Will-Nie committed
      * add comments for r2d2
      
      * sort style
      
      * revise according to the comments
      
      * fix style
      
      * add r2d2 residual link + comments
      
      * revise according to comments, add spaceinvader
      
      * add test for the model and fix test bugs
  4. 14 Dec, 2021 2 commits
    • polish(nyp): fix unittest for trex training and collecting (#144) · f089d02a
      Will-Nie committed
      * add trex algorithm for pong
      
      * sort style
      
      * add atari, ll,cp; fix device, collision; add_ppo
      
      * add accuracy evaluation
      
      * correct style
      
      * add seed to make sure results are replicable
      
      * remove useless part in cum return of model part
      
      * add mujoco onppo training pipeline; ppo config
      
      * improve style
      
      * add sac training config for mujoco
      
      * add log, add save data; polish config
      
      * logger; hyperparameter;walker
      
      * correct style
      
      * modify else condition
      
      * change rnd to trex
      
      * revise according to comments, add episode collect
      
      * new collect mode for trex, fix all bugs, comments
      
      * final change
      
      * polish after the final comment
      
      * add readme/test
      
      * add test for serial entry of trex/gcl
      
      * sort style
      
      * change mujoco to cartpole for test for trex_onppo
      
      * remove files generated by testing
      
      * revise tests for entry
      
      * sort style
      
      * revise tests
      
      * modify pytest
      
      * fix(nyz): speed up ppg/ppo and marl algo unittest
      
      * polish(nyz): speed up trex unittest and fix trex entry default config bug
      
      * fix(nyz): fix same name bug
      
      * fix(nyz): fix remove conflict bug(ci skip)
      Co-authored-by: niuyazhe <niuyazhe@sensetime.com>
    • polish(nyp): add R2d2 comments (#149) · a2edf6a2
      Will-Nie committed
      * add comments for r2d2
      
      * sort style
      
      * revise according to the comments
      
      * fix style
  5. 09 Dec, 2021 1 commit
    • feature(xjx): refactor buffer (#129) · a490729f
      Xu Jingxin committed
      * Init base buffer and storage
      
      * Use ratelimit as middleware
      
      * Pass style check
      
      * Keep the original return value
      
      * Add buffer.view
      
      * Add replace flag on sample, rewrite middleware processing
      
      * Test slicing
      
      * Add buffer copy middleware
      
      * Add update/delete api in buffer, rename middleware
      
      * Implement update and delete api of buffer
      
      * add naive use time count middleware in buffer
      
      * Rename next to chain
      
      * feature(nyz): add staleness check middleware and polish buffer
      
      * feature(nyz): add naive priority experience replay
      
      * Sample by indices
      
      * Combine buffer and storage layers
      
      * Support indices when deleting items from the queue
      
      * Use dataclass to save buffered data, remove return_index and return_meta
      
      * Add ignore_insufficient
      
      * polish(nyz): add return index in push and copy same data in sample
      
      * Drop useless import
      
      * Fix sample with indices, ensure return size is equal to input size or indices size
      
      * Make sure sampled data in buffer is different from each other
      
      * Support sample by grouped meta key
      
      * Support sample by rolling window
      
      * Add import/export data in buffer
      
      * Padding after sampling from buffer
      
      * Polish use_time_check
      
      * Use buffer as dataset
      
      * Set collate_fn in buffer test
      
      * feature(nyz): add deque buffer compatibility wrapper and demo
      
      * polish(nyz): polish code style and add pong dqn new deque buffer demo
      
      * feature(nyz): add use_time_count compatibility in wrapper
      
      * feature(nyz): add priority replay buffer compatibility in wrapper
      
      * Improve performance of buffer.update
      
      * polish(nyz): add priority max limit and correct flake8
      
      * Use __call__ to rewrite middleware
      
      * Rewrite buffer index
      
      * Fix buffer delete
      
      * Skip first item
      
      * Rewrite buffer delete
      
      * Use caller
      
      * Use caller in priority
      
      * Add group sample
      Co-authored-by: niuyazhe <niuyazhe@sensetime.com>
  6. 08 Dec, 2021 2 commits
    • feature(nyp): add Trex algorithm (#119) · 63105fef
      Will-Nie committed
      * add trex algorithm for pong
      
      * sort style
      
      * add atari, ll,cp; fix device, collision; add_ppo
      
      * add accuracy evaluation
      
      * correct style
      
      * add seed to make sure results are replicable
      
      * remove useless part in cum return of model part
      
      * add mujoco onppo training pipeline; ppo config
      
      * improve style
      
      * add sac training config for mujoco
      
      * add log, add save data; polish config
      
      * logger; hyperparameter;walker
      
      * correct style
      
      * modify else condition
      
      * change rnd to trex
      
      * revise according to comments, add episode collect
      
      * new collect mode for trex, fix all bugs, comments
      
      * final change
      
      * polish after the final comment
      
      * add readme/test
      
      * add test for serial entry of trex/gcl
      
      * sort style
    • feature(wyh): add masac algorithms (#112) · 18b3720a
      Weiyuhong-1998 committed
      * fix(wyh):masac
      
      * feature(wyh):single agent discrete sac
      
      * feature(wyh):single agent discrete sac td
      
      * fix(wyh):fix pong bug
      
      * fix(wyh):fix smac bug
      
      * fix(wyh):masac_5m6m best config
      
      * env(wyh):allow SMAC env return ippo/isac obs
      
      * fix(wyh):masac polish
      
      * fix(wyh):masac style
      
      * fix(wyh):masac test
  7. 25 Nov, 2021 1 commit
    • feature(zt): add curiosity icm algorithm (#41) · b50e8aea
      timothijoe committed
      * curisity_icm_v1
      
      * modified version1
      
      * modified v2
      
      * one_hot function change
      
      * add paper information
      
      * format minigrid ppo curiosity
      
      * flake8 ding checked
      
      * 6th-Oct-gpu-modified
      
      * reset configs in minigrid files
      
      * minigird-env-doorkey88-100-300
      
      * use modulelist instead of list in icm module
      
      * change icm reward model
      
      * delete origin curiosit_reward model and add icm_reward model
      
      * modified icm reward model
      
      * polish icm model by zt: (1) polish ding/reward_model/icm_reward_model.py and related __init__.py; (2) add config files for pong (dizoo/atari/config/serial/pong/pong_ppo_offpolicy_icm.py) and the minigrid env (dizoo/minigrid/config/doorkey8_icm_config.py, fourroom_icm_config.py, minigrid_icm_config.py); (3) add element icm in README
      
      * remove some useless config files in minigrid
      
      * remove redundant part in ppo.py, add cartpole_ppo_icm_config.py, changed test_icm.py and Readme
  8. 22 Nov, 2021 2 commits
    • feature(wyh): add guided cost algorithm (#57) · ffe8d7c0
      Weiyuhong-1998 committed
      * guided_cost
      
      * max_e
      
      * guided_cost
      
      * fix(wyh):fix guided cost recompute bug
      
      * fix(wyh):add model save
      
      * feature(wyh):polish guided cost
      
      * feature(wyh):on guided cost
      
      * fix(wyh):gcl-modify
      
      * fix(wyh):gcl sac config
      
      * fix(wyh):gcl style
      
      * fix(wyh):modify comments
      
      * fix(wyh):masac_5m6m best config
      
      * fix(wyh):sac bug
      
      * fix(wyh):GCL readme
      
      * fix(wyh):GCL readme conflicts
    • fix(nyz): simplify onppo with traj_flag · 7e51de4f
      niuyazhe committed
  9. 19 Nov, 2021 1 commit
    • polish(davide): add example of GAIL entry + config for Mujoco and Cartpole (#114) · d1bc1387
      Davide Liu committed
      * added gail entry
      
      * added lunarlander and cartpole config
      
      * added gail mujoco config
      
      * added mujoco exp
      
      * update22-10
      
      * added third exp
      
      * added metric to evaluate policies
      
      * added GAIL entry and config for Cartpole and Walker2d
      
      * checked style and unittest
      
      * restored lunarlander env
      
      * style problems
      
      * bug correction
      
      * Delete expert_data_train.pkl
      
      * changed loss of GAIL
      
      * Update walker2d_ddpg_gail_config.py
      
      * changed gail reward from -D(s, a) to -log(D(s, a))
      
      * added small constant to reward function
      
      * added comment to clarify config
      
      * Update walker2d_ddpg_gail_config.py
      
      * added lunarlander entry + config
      
      * Added Atari discriminator + Pong entry config
      
      * Update gail_irl_model.py
      
      * Update gail_irl_model.py
      
      * added gail serial pipeline and onehot actions for gail atari
      
      * related to previous commit
      
      * removed main files
      
      * removed old comment
  10. 18 Nov, 2021 1 commit
  11. 16 Nov, 2021 1 commit
  12. 01 Nov, 2021 2 commits
    • fix(nyz): fix r2d2 and dqtd error unittest bug · 28930a86
      niuyazhe committed
    • feature(pu): add NGU algorithm (#40) · 286ea243
      蒲源 committed
      * test rnd
      
      * fix mz config
      
      * fix config
      
      * feature(pu): fix r2d2, add beta to actor
      
      * feature(pu): add ngu-dev
      
      * fix(pu): fix r2d2
      
      * fix(puyuan): fix r2d2
      
      * feature(puyuan): add minigrid r2d2 config
      
      * polish minigrid config
      
      * dev-ngu
      
      * feature(pu): add action and reward as inputs of q network
      
      * feature(pu): add episodic reward model
      
      * feature(pu): add episodic reward model, modify r2d2 and collector for ngu
      
      * fix(pu): recover files that were changed by mistake
      
      * fix(pu): fix tblogger cnt bug
      
      * add_dqfd
      
      * Is_expert to is_expert
      
      * fix(pu): fix r2d2 bug
      
      * fix(pu): fix beta index to gamma bug
      
      * fix(pu): fix numerical stability problem
      
      * style(pu): flake8 format
      
      * fix(pu): fix rnd reward model train times
      
      * polish(pu): polish r2d2 reset problem
      
      * fix(pu): fix episodic reward normalize bug
      
      * polish(pu): polish config params and episodic_reward init value
      
      * modify according to the last comments
      
      * value_gamma; done; marginloss; sqil compatibility
      
      * feature(pu): add r2d3 algorithm and config of lunarlander and pong
      
      * fix(pu): fix demo path bug
      
      * fix(pu): fix cuda bug at function get_gae in adder.py
      
      * feature(pu): add pong r2d2 config
      
      * polish(pu): r2d2 uses the mixture priority, episodic_reward transforms to mean 0 std1
      
      * polish(pu): polish r2d2 config
      
      * test(pu): test cuda compatibility of dqfd_nstep_td_error in r2d3
      
      * polish(pu): polish config
      
      * polish(pu): polish config and annotation
      
      * fix(pu): fix r2d2 target net update bug and done bug
      
      * polish(pu): polish pong r2d2 config and add montezuma r2d2 config
      
      * polish(pu): add some logs for debugging in r2d2
      
      * polish(pu): recover config deleted by mistake
      
      * fix(pu): fix r2d3 config of lunarlander and pong
      
      * fix(pu): fix the r2d2 bug in r2d3
      
      * fix(pu): fix r2d3 cpu device bug in fun dqfd_nstep_td_error of td.py
      
      * fix(pu): fix n_sample bug in serial_entry_r2d3
      
      * polish(pu): polish minigrid r2d2 config
      
      * fix(pu): add info dict of fourrooms doorkey in minigrid_env
      
      * polish(pu): polish r2d2 config
      
      * fix(pu): fix expert policy collect traj bug, now we use the argmax_sample wrapper
      
      * fix(pu): fix r2d2 done and target update bug, polish config
      
      * fix(pu): fix null_padding transition obs to zeros
      
      * fix(pu): episodic_reward transform to [0,1]
      
      * fix(pu): fix the value_gamma bug
      
      * fix(pu): fix device bug in ngu_reward_model.py
      
      * fix(pu): fix null_padding problem in rnd and episodic reward model
      
      * polish(pu): polish config
      
      * fix(pu): use the deepcopy train_data to add bonus reward
      
      * polish(pu): add the operation of enlarging seq_length times to the last reward of the whole episode
      
      * fix(pu): fix the episode length 1 bug and weight intrinsic reward bug
      
      * feature(pu): add montezuma ngu config
      
      * fix(pu): fix lunarlander ngu unroll_len to 998 so that the sequence length is equal to the max step 1000
      
      * test(pu): episodic reward transforms to [0,1]
      
      * fix(pu): fix r2d3 one-step rnn init bug and add r2d2_collect_traj
      
      * fix(pu): fix r2d2_collect_traj.py
      
      * feature(pu): add pong_r2d3_r2d2expert_config
      
      * polish(pu): yapf format
      
      * polish(pu): fix td.py conflict
      
      * polish(pu): flake8 format
      
      * polish(pu): add lambda_one_step_td key in dqfd error
      
      * test(pu): set key lambda_one_step_td and lambda_supervised_loss as 0
      
      * style(pu): yapf format
      
      * style(pu): format
      
      * polish(nyz): fix ngu detailed compatibility error
      
      * fix(nyz): fix dqfd one_step td lambda bug
      
      * fix(pu): fix test_acer and test_rnd compatibility error
      Co-authored-by: Swain <niuyazhe314@outlook.com>
      Co-authored-by: Will_Nie <nieyunpengwill@hotmail.com>
  13. 31 Oct, 2021 1 commit
  14. 25 Oct, 2021 1 commit
  15. 22 Oct, 2021 1 commit
    • feature(zym): add offlineRL algo td3_bc and polish policy comments (#88) · 7c1b5e95
      Yinmin.Zhang committed
      * feature(zym): add offlineRL algo td3_bc.
      
      * feature(zym): add offlineRL algo td3_bc.
      
      * feature(zym): add offlineRL algo td3_bc.
      
      * polish(zym): polish some annotations in td3/ddpg/sac/ppo; polish `_forward_collect` and `_forward_eval`.
      
      * fix(lj): fix dimension bug in cql for continuous env.
      
      * fix(zym): fix dimension bug in cql for continuous env.
      
      * fix(zym): fix dimension bug in cql for continuous env.
      
      * polish(zym): update README.md.
  16. 21 Oct, 2021 1 commit
  17. 16 Oct, 2021 1 commit
    • feature(nyp): add DQfD algorithm (#48) · e2ca8738
      Will-Nie committed
      * add_dqfd
      
      * Is_expert to is_expert
      
      * modify according to the last comments
      
      * value_gamma; done; marginloss; sqil compatibility
      
      * finally shorten the code, revise config
      
      * revise config, style
      
      * add_readme/two_more_config
      
      * correct format
      Co-authored-by: niuyazhe <niuyazhe@sensetime.com>
  18. 15 Oct, 2021 1 commit
  19. 12 Oct, 2021 1 commit
  20. 02 Oct, 2021 1 commit
  21. 01 Oct, 2021 1 commit
  22. 30 Sep, 2021 2 commits
    • feature(davide): Implementation of D4PG (#76) · 16a89c35
      Davide Liu committed
      * added experience replay and n-step
      
      * implementing distributional q value
      
      * added distributional q-value
      
      * added overview in qac_dist and d4pg
      
      * derived D4PG from DDPG
      
      * fixed a bug when action shape >1
      
      * benchmark D4PG mujoco + minor fixes
      
      -entry for DDPG mujoco
      -entry for D4PG mujoco
      -config for D4PG mujoco
      -fixed style D4PG code
      -unittests for QAC distributional
      
      * formatted code
      
      * minor updates (read description)
      
      -added d4pg serial_entry test
      -updated comments in QACDIST
      -added d4pg in commander register
      -added q_value in d4pg return dict
      -added priority update in d4pg entry
      -added assertion in QACDIST
    • feature(zym): add offlineRL algo Discrete CQL; add hdf5 dataset for offlineRL. (#68) · 206186f1
      Yinmin.Zhang committed
      * feature(zym): add offlineRL algo Discrete CQL.
      
      * feature(zym): add offlineRL algo Discrete CQL; add hdf5 dataset for offlineRL.
  23. 17 Sep, 2021 1 commit
    • fix(pu): fix r2d2 done slice bug and LSTM hidden state reset bug (#52) · 2ffff07e
      蒲源 committed
      * test rnd
      
      * fix mz config
      
      * fix config
      
      * fix(pu): fix r2d2
      
      * fix(puyuan): fix r2d2
      
      * feature(puyuan): add minigrid r2d2 config
      
      * polish minigrid config
      
      * modified as review
      
      * fix(pu): fix bug for compatibility
      
      * polish(pu): add annotations and polish slice operation
      
      * style(pu): run format.sh
      
      * style(pu): correct yapf format
      
      * fix(pu): fix config
      
      * fix(pu): fix done slice bug and lstm reset bug
      
      * style(pu): format config
      
      * polish(pu): polish config params for cartpole, lunarlander and minigrid
      
      * polish(pu): polish minigrid config params
      
      * Update r2d2.py
      
      * polish(pu): polish rnn reset problem
      
      * fix(pu): fix merge error
      
      * polish(pu): polish cartpole config
      
      * polish(nyz): polish cartpole r2d2 config for faster convergence
      
      * test(nyz): enable r2d2 algotest
      Co-authored-by: niuyazhe <niuyazhe@sensetime.com>
  24. 13 Sep, 2021 1 commit
  25. 08 Sep, 2021 2 commits
    • feature(nyz): add supervised learning image classification training demo (#27) · 11cc97e8
      Swain committed
      * feature(nyz): add resnet for cv sl task
      
      * feature(nyz): add imagenet classification dataset and adapt compile config for sl
      
      * feature(nyz): add naive image training entry demo
      
      * style(nyz): polish image cls train log
      
      * polish(nyz): polish multi gpu training setting
      
      * feature(nyz): add nn training bp and update async execution
      
      * feature(nyz): add distributed sampler for different dist backend
      
      * fix(nyz): fix compile config collector and buffer compatibility problem
      
      * style(nyz): correct yapf format
      
      * fix(nyz): fix env manager compile config compatibility bug
      
      * refactor(nyz): abstract ISerialEvaluator and rename serial evaluation implementation
      
      * refactor(nyz): refactor collector name
      
      * feature(nyz): add metric evaluator and image cls acc metric eval demo
      
      * fix(nyz): fix cuda and multi gpu bug in image cls demo
    • style(wyh): add env information in readme (#46) · fa453ef0
      Weiyuhong-1998 committed
      * env-list
      
      * env-list-fix-grammmer
      
      * env-only-test
      
      * modify-gif
      
      * modify-gif-pendulum
      
      * modify-gif-delect-maze
  26. 06 Sep, 2021 2 commits
    • feature(zym): add offlineRL algo CQL; add offlineRL env D4RL (#37) · 69828ed5
      Yinmin.Zhang committed
      * feature(zym): add pybullet env info; add entropy type in sac.
      
      * feature(zym): add cql; add serial entry for offlineRL.
      
      * feature/polish(zym): add generation entry in mujoco env for offlineRL; polish cql/serial entry for offlineRL.
      
      * feature(lj): add d4rl env for offlineRL.
      
      * polish(zym): polish cql.
      
      * feature/polish(zym): add dataset registry; polish offlineRL pipeline.
      
      * fix(zym): fix bug in d4rl/mujoco config; fix bug in dataset for offlineRL.
      
      * style(zym): add pybulletgym and d4rl requirements in setup.
      
      * fix/polish(zym): support str in NaiveRLDataset; polish cql.
      
      * polish(zym): polish command policy.
      
      * feature(zym): add cql in pendulum env; add unittest/algotest for cql.
      
      * fix(zym): fix cql bug in unittest/algotest for cql.
    • fix(pu): fix r2d2 bug (#36) · c8dac674
      蒲源 committed
      * test rnd
      
      * fix mz config
      
      * fix config
      
      * fix(pu): fix r2d2
      
      * feature(puyuan): add minigrid r2d2 config
      
      * polish minigrid config
      
      * modified as review
      
      * fix(pu): fix bug for compatibility
      
      * polish(pu): add annotations and polish slice operation
      
      * style(pu): run format.sh
      
      * style(pu): correct yapf format
  27. 02 Sep, 2021 1 commit
  28. 27 Aug, 2021 1 commit
  29. 25 Aug, 2021 1 commit
  30. 24 Aug, 2021 1 commit
  31. 20 Aug, 2021 1 commit
    • SQIL (#25) · 9929dc37
      Will-Nie committed
      * add sqil
      
      * conceal all the personal info
      
      * revise according to the comments
      
      * correct_format
      
      * add_comment to hardcodes part
      
      * pass flake8
      
      * add force_reproducibility = True; device, ex_model
      
      * check format
  32. 01 Aug, 2021 1 commit
    • add ACER algorithm(szj) (#14) · dd4de1a0
      simonat2011 committed
      * add enduro env config; add enduro's ppo, dqn, drdqn, rainbow, impala configs.
      
      * modified as reviewer mentions
      
      * add qacd network
      
      * fix bugs
      
      * fix bugs
      
      * update acer algorithm
      
      * update ACER code
      
      * update acer config
      
      * fix bug
      
      * update pong acer's config
      
      * edit commit
      
      * update code as mention
      
      * fix the comment table and trust region
      
      * fix format
      
      * fix typing lint
      
      * fix format,flake8
      
      * fix format
      
      * fix whitespace problem
      
      * test(nyz): add acer unittest and algotest
      
      * style(nyz): correct flake8 style
      Co-authored-by: shenziju <simonshen2011@foxmail.com>
      Co-authored-by: Swain <niuyazhe314@outlook.com>
  33. 29 Jul, 2021 1 commit