Commits · ae6ab6c7d6f595b0ed53edc97df58ac0733786cc · OpenDILab Open Source Decision Intelligence Platform / DI-engine

01 Jan, 2022 1 commit

fix(nyz): fix exp_name seedx name bug with data generation path · ae6ab6c7

Committed by niuyazhe on Jan 01, 2022

ae6ab6c7

24 Dec, 2021 1 commit

feature(nyz): add H-PPO hybrid action space algorithm (#140) · 0b71fc4e

Committed by Swain on Dec 24, 2021

* feature(nyz): add hybrid ppo, unify action_space field and use dict type mu sigma

* polish(nyz): polish ppo config continuous field, move to action_space field

* fix(nyz): fix ppo action_space field compatibility bug

* fix(nyz): fix ppg/sac/cql action_space field compatibility bug

* demo(nyz): update gym hybrid hppo config

* polish(pu): polish hppo hyperparameters: use tanh and a fixed sigma of 0.3 in actor_action_args; in the collect phase, clamp acceleration_value to [0,1] and rotation_value to [-1,1] after sampling from the policy distribution

* polish(pu):polish as review

* polish(pu): polish hppo config

* polish(pu): entropy weight=0.03 performs best empirically

* fix(nyz): fix unittest compatibility bugs

* polish(nyz): remove atari env unused print(ci skip)
Co-authored-by: puyuan1996 <2402552459@qq.com>

0b71fc4e
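
The collect-phase clamping described in this commit can be illustrated with a minimal sketch. This is not the DI-engine implementation: the function name `sample_action_args`, the use of Python's `random` module, and sampling each argument from an independent Gaussian are assumptions made for illustration. Only the fixed sigma of 0.3 and the clamp ranges come from the commit message.

```python
import random

FIXED_SIGMA = 0.3  # fixed standard deviation mentioned in the commit message


def clamp(x: float, lo: float, hi: float) -> float:
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def sample_action_args(mu_acceleration: float, mu_rotation: float,
                       rng: random.Random) -> tuple:
    """Hypothetical helper: sample continuous action arguments, then clamp.

    mu_* are assumed to be the (tanh-squashed) means produced by the actor.
    """
    acceleration = rng.gauss(mu_acceleration, FIXED_SIGMA)
    rotation = rng.gauss(mu_rotation, FIXED_SIGMA)
    # Clamp ranges as stated in the commit: [0, 1] and [-1, 1].
    return clamp(acceleration, 0.0, 1.0), clamp(rotation, -1.0, 1.0)
```

Calling `sample_action_args(0.9, -0.9, random.Random(0))` returns a pair that always lies inside the two ranges, even when the raw Gaussian sample overshoots.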

16 Dec, 2021 1 commit

feature(nyp): add residual in R2D2(#150) · ab94376c

Committed by Will-Nie on Dec 16, 2021

* add comments for r2d2

* sort style

* revise according to the comments

* fix style

* add r2d2 residual link + comments

* revise according to comments, add spaceinvader

* add test for the model and fix test bugs

ab94376c

14 Dec, 2021 2 commits

polish(nyp): fix unittest for trex training and collecting (#144) · f089d02a

Committed by Will-Nie on Dec 14, 2021

* add trex algorithm for pong

* sort style

* add atari, ll,cp; fix device, collision; add_ppo

* add accuracy evaluation

* correct style

* add seed to make sure results are replicable

* remove useless part in cum return  of model part

* add mujoco onppo training pipeline; ppo config

* improve style

* add sac training config for mujoco

* add log, add save data; polish config

* logger; hyperparameter;walker

* correct style

* modify else condition

* change rnd to trex

* revise according to comments, add episode collect

* new collect mode for trex, fix all bugs, comments

* final change

* polish after the final comment

* add readme/test

* add test for serial entry of trex/gcl

* sort style

* change mujoco to cartpole for test for trex_onppo

* remove files generated by testing

* revise tests for entry

* sort style

* revise tests

* modify pytest

* fix(nyz): speed up ppg/ppo and marl algo unittest

* polish(nyz): speed up trex unittest and fix trex entry default config bug

* fix(nyz): fix same name bug

* fix(nyz): fix remove conflict bug(ci skip)
Co-authored-by: niuyazhe <niuyazhe@sensetime.com>

f089d02a

polish(nyp):add R2d2 comments (#149) · a2edf6a2

Committed by Will-Nie on Dec 14, 2021

* add comments for r2d2

* sort style

* revise according to the comments

* fix style

a2edf6a2

09 Dec, 2021 1 commit

feature(xjx): refactor buffer (#129) · a490729f

Committed by Xu Jingxin on Dec 09, 2021

* Init base buffer and storage

* Use ratelimit as middleware

* Pass style check

* Keep the original return value

* Add buffer.view

* Add replace flag on sample, rewrite middleware processing

* Test slicing

* Add buffer copy middleware

* Add update/delete api in buffer, rename middleware

* Implement update and delete api of buffer

* add naive use time count middleware in buffer

* Rename next to chain

* feature(nyz): add staleness check middleware and polish buffer

* feature(nyz): add naive priority experience replay

* Sample by indices

* Combine buffer and storage layers

* Support indices when deleting items from the queue

* Use dataclass to save buffered data, remove return_index and return_meta

* Add ignore_insufficient

* polish(nyz): add return index in push and copy same data in sample

* Drop useless import

* Fix sample with indices, ensure return size is equal to input size or indices size

* Make sure sampled data in buffer is different from each other

* Support sample by grouped meta key

* Support sample by rolling window

* Add import/export data in buffer

* Padding after sampling from buffer

* Polish use_time_check

* Use buffer as dataset

* Set collate_fn in buffer test

* feature(nyz): add deque buffer compatibility wrapper and demo

* polish(nyz): polish code style and add pong dqn new deque buffer demo

* feature(nyz): add use_time_count compatibility in wrapper

* feature(nyz): add priority replay buffer compatibility in wrapper

* Improve performance of buffer.update

* polish(nyz): add priority max limit and correct flake8

* Use __call__ to rewrite middleware

* Rewrite buffer index

* Fix buffer delete

* Skip first item

* Rewrite buffer delete

* Use caller

* Use caller in priority

* Add group sample
Co-authored-by: niuyazhe <niuyazhe@sensetime.com>

a490729f

08 Dec, 2021 2 commits

feature(nyp): add Trex algorithm (#119) · 63105fef

Committed by Will-Nie on Dec 08, 2021

* add trex algorithm for pong

* sort style

* add atari, ll,cp; fix device, collision; add_ppo

* add accuracy evaluation

* correct style

* add seed to make sure results are replicable

* remove useless part in cum return  of model part

* add mujoco onppo training pipeline; ppo config

* improve style

* add sac training config for mujoco

* add log, add save data; polish config

* logger; hyperparameter;walker

* correct style

* modify else condition

* change rnd to trex

* revise according to comments, add episode collect

* new collect mode for trex, fix all bugs, comments

* final change

* polish after the final comment

* add readme/test

* add test for serial entry of trex/gcl

* sort style

63105fef

feature(wyh):add masac algorithms (#112) · 18b3720a

Committed by Weiyuhong-1998 on Dec 08, 2021

* fix(wyh):masac

* feature(wyh):single agent discrete sac

* feature(wyh):single agent discrete sac td

* fix(wyh):fix pong bug

* fix(wyh):fix smac bug

* fix(wyh):masac_5m6m best config

* env(wyh):allow SMAC env return ippo/isac obs

* fix(wyh):masac polish

* fix(wyh):masac style

* fix(wyh):masac test

18b3720a

25 Nov, 2021 1 commit

feature(zt): add curiosity icm algorithm (#41) · b50e8aea

Committed by timothijoe on Nov 25, 2021

* curisity_icm_v1

* modified version1

* modified v2

* one_hot function change

* add paper information

* format minigrid ppo curiosity

* flake8 ding checked

* 6th-Oct-gpu-modified

* reset configs in minigrid files

* minigird-env-doorkey88-100-300

* use modulelist instead of list in icm module

* change icm reward model

* delete origin curiosit_reward model and add icm_reward model

* modified icm reward model

* polish icm model by zt: (1) polish ding/reward_model/icm_reward_model.py and related __init__.py; (2) add config files for pong (dizoo/atari/config/serial/pong/pong_ppo_offpolicy_icm.py) and the minigrid env (dizoo/minigrid/config/doorkey8_icm_config.py, fourroom_icm_config.py, minigrid_icm_config.py); (3) add the icm element in README

* remove some useless config files in minigrid

* remove redundant part in ppo.py, add cartpole_ppo_icm_config.py, changed test_icm.py and Readme

b50e8aea

22 Nov, 2021 2 commits

feature(wyh): add guided cost algorithm (#57) · ffe8d7c0

Committed by Weiyuhong-1998 on Nov 22, 2021

* guided_cost

* max_e

* guided_cost

* fix(wyh):fix guided cost recompute bug

* fix(wyh):add model save

* feature(wyh):polish guided cost

* feature(wyh):on guided cost

* fix(wyh):gcl-modify

* fix(wyh):gcl sac config

* fix(wyh):gcl style

* fix(wyh):modify comments

* fix(wyh):masac_5m6m best config

* fix(wyh):sac bug

* fix(wyh):GCL readme

* fix(wyh):GCL readme conflicts

ffe8d7c0

fix(nyz): simplify onppo with traj_flag · 7e51de4f

Committed by niuyazhe on Nov 22, 2021

7e51de4f

19 Nov, 2021 1 commit

polish(davide) add example of GAIL entry + config for Mujoco and Cartpole (#114) · d1bc1387

Committed by Davide Liu on Nov 19, 2021

* added gail entry

* added lunarlander and cartpole config

* added gail mujoco config

* added mujoco exp

* update22-10

* added third exp

* added metric to evaluate policies

* added GAIL entry and config for Cartpole and Walker2d

* checked style and unittest

* restored lunarlander env

* style problems

* bug correction

* Delete expert_data_train.pkl

* changed loss of GAIL

* Update walker2d_ddpg_gail_config.py

* changed gail reward from -D(s, a) to -log(D(s, a))

* added small constant to reward function

* added comment to clarify config

* Update walker2d_ddpg_gail_config.py

* added lunarlander entry + config

* Added Atari discriminator + Pong entry config

* Update gail_irl_model.py

* Update gail_irl_model.py

* added gail serial pipeline and onehot actions for gail atari

* related to previous commit

* removed main files

* removed old comment

d1bc1387
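
The reward change noted in this commit (from -D(s, a) to -log(D(s, a)), plus a small constant) can be sketched as follows. This is an illustration only, not the code in gail_irl_model.py: the function names, the value of `eps`, and placing the constant inside the logarithm are assumptions, since the commit message does not say where the constant is added.

```python
import math


def gail_reward_linear(d_value: float) -> float:
    """Earlier form named in the commit: reward = -D(s, a)."""
    return -d_value


def gail_reward_log(d_value: float, eps: float = 1e-8) -> float:
    """Later form named in the commit: reward = -log(D(s, a)).

    d_value is the discriminator output, assumed to lie in [0, 1].
    eps is a hypothetical small constant that keeps log() finite at 0.
    """
    return -math.log(d_value + eps)
```

Under these assumptions the log form grows quickly as the discriminator output approaches 0, while the linear form stays within [-1, 0].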

18 Nov, 2021 1 commit

feature(nyz): add registry force_overwrite argument and polish cartpole · cbee45b4

Committed by niuyazhe on Nov 18, 2021

qrdqn config

cbee45b4

16 Nov, 2021 1 commit

polish(nyz): add torch1.1.0 compatibility for torch.utils.data · 171dddc4

Committed by niuyazhe on Nov 16, 2021

171dddc4

01 Nov, 2021 2 commits

fix(nyz): fix r2d2 and dqtd error unittest bug · 28930a86

Committed by niuyazhe on Nov 01, 2021

28930a86

feature(pu): add NGU algorithm (#40) · 286ea243

Committed by 蒲源 on Nov 01, 2021

* test rnd

* fix mz config

* fix config

* feature(pu): fix r2d2, add beta to actor

* feature(pu): add ngu-dev

* fix(pu): fix r2d2

* fix(puyuan): fix r2d2

* feature(puyuan): add minigrid r2d2 config

* polish minigrid config

* dev-ngu

* feature(pu): add action and reward as inputs of q network

* feature(pu): add episodic reward model

* feature(pu): add episodic reward model, modify r2d2 and collector for ngu

* fix(pu): recover files that were changed by mistake

* fix(pu): fix tblogger cnt bug

* add_dqfd

* Is_expert to is_expert

* fix(pu): fix r2d2 bug

* fix(pu): fix beta index to gamma bug

* fix(pu): fix numerical stability problem

* style(pu): flake8 format

* fix(pu): fix rnd reward model train times

* polish(pu): polish r2d2 reset problem

* fix(pu): fix episodic reward normalize bug

* polish(pu): polish config params and episodic_reward init value

* modify according to the last comments

* value_gamma; done; marginloss; sqil compatibility

* feature(pu): add r2d3 algorithm and config of lunarlander and pong

* fix(pu): fix demo path bug

* fix(pu): fix cuda bug at function get_gae in adder.py

* feature(pu): add pong r2d2 config

* polish(pu): r2d2 uses the mixture priority, episodic_reward transforms to mean 0 std1

* polish(pu): polish r2d2 config

* test(pu): test cuda compatibility of dqfd_nstep_td_error in r2d3

* polish(pu): polish config

* polish(pu): polish config and annotation

* fix(pu): fix r2d2 target net update bug and done bug

* polish(pu): polish pong r2d2 config and add montezuma r2d2 config

* polish(pu): add some logs for debugging in r2d2

* polish(pu): recover config deleted by mistake

* fix(pu): fix r2d3 config of lunarlander and pong

* fix(pu): fix the r2d2 bug in r2d3

* fix(pu): fix r2d3 cpu device bug in fun dqfd_nstep_td_error of td.py

* fix(pu): fix n_sample bug in serial_entry_r2d3

* polish(pu): polish minigrid r2d2 config

* fix(pu): add info dict of fourrooms doorkey in minigrid_env

* polish(pu): polish r2d2 config

* fix(pu): fix expert policy collect traj bug, now we use the argmax_sample wrapper

* fix(pu): fix r2d2 done and target update bug, polish config

* fix(pu): fix null_padding transition obs to zeros

* fix(pu): episodic_reward transform to [0,1]

* fix(pu): fix the value_gamma bug

* fix(pu): fix device bug in ngu_reward_model.py

* fix(pu): fix null_padding problem in rnd and episodic reward model

* polish(pu): polish config

* fix(pu): use the deepcopy train_data to add bonus reward

* polish(pu): add the operation of enlarging seq_length times to the last reward of the whole episode

* fix(pu): fix the episode length 1 bug and weight intrinsic reward bug

* feature(pu): add montezuma ngu config

* fix(pu): fix lunarlander ngu unroll_len to 998 so that the sequence length is equal to the max step 1000

* test(pu): episodic reward transforms to [0,1]

* fix(pu): fix r2d3 one-step rnn init bug and add r2d2_collect_traj

* fix(pu): fix r2d2_collect_traj.py

* feature(pu): add pong_r2d3_r2d2expert_config

* polish(pu): yapf format

* polish(pu): fix td.py conflict

* polish(pu): flake8 format

* polish(pu): add lambda_one_step_td key in dqfd error

* test(pu): set key lambda_one_step_td and lambda_supervised_loss as 0

* style(pu): yapf format

* style(pu): format

* polish(nyz): fix ngu detailed compatibility error

* fix(nyz): fix dqfd one_step td lambda bug

* fix(pu): fix test_acer and test_rnd compatibility error
Co-authored-by: Swain <niuyazhe314@outlook.com>
Co-authored-by: Will_Nie <nieyunpengwill@hotmail.com>

286ea243

31 Oct, 2021 1 commit

polish(nyz): remove on_policy option in dizoo config and entry · a6aa2c65

Committed by niuyazhe on Oct 30, 2021

a6aa2c65

25 Oct, 2021 1 commit

test(wyh): add more unittest for ppo and sac policy (#104) · c5af1cf2

Committed by Weiyuhong-1998 on Oct 25, 2021

* fix(wyh):reward model test

* fix(wyh):sac ppo test

* fix(wyh):ppo_continuous test

* fix(wyh):style

* fix(wyh):ppo test
Co-authored-by: Swain <niuyazhe314@outlook.com>

c5af1cf2

22 Oct, 2021 1 commit

feature(zym): add offlineRL algo td3_bc and polish policy comments(#88) · 7c1b5e95

Committed by Yinmin.Zhang on Oct 22, 2021

* feature(zym): add offlineRL algo td3_bc.

* feature(zym): add offlineRL algo td3_bc.

* feature(zym): add offlineRL algo td3_bc.

* polish(zym): polish some annotations in td3/ddpg/sac/ppo; polish `_forward_collect` and `_forward_eval`.

* fix(lj): fix dimension bug in cql for continuous env.

* fix(zym): fix dimension bug in cql for continuous env.

* fix(zym): fix dimension bug in cql for continuous env.

* polish(zym): update README.md.

7c1b5e95

21 Oct, 2021 1 commit

polish(nyz): modify dizoo test mark to envtest(enable docker, smac docker) · f04b9eb7

Committed by niuyazhe on Oct 21, 2021

f04b9eb7

16 Oct, 2021 1 commit

feature(nyp): add DQfD algorithm (#48) · e2ca8738

Committed by Will-Nie on Oct 16, 2021

* add_dqfd

* Is_expert to is_expert

* modify according to the last comments

* value_gamma; done; marginloss; sqil compatibility

* finally shorten the code, revise config

* revise config, style

* add_readme/two_more_config

* correct format
Co-authored-by: niuyazhe <niuyazhe@sensetime.com>

e2ca8738

15 Oct, 2021 1 commit

polish(nyz): remove torch in env and correct dizoo yapf format · 4b7e50c4

Committed by niuyazhe on Oct 15, 2021

4b7e50c4

12 Oct, 2021 1 commit

polish(nyz): polish sac and cql policy · f537adf0

Committed by niuyazhe on Oct 12, 2021

f537adf0

02 Oct, 2021 1 commit

fix(nyz): fix test discrete cql config mismatch bug(enable docker, smac docker) · c500a2e5

Committed by niuyazhe on Oct 02, 2021

c500a2e5

01 Oct, 2021 1 commit

fix(nyz): fix test discrete cql unittest bug · e173e663

Committed by niuyazhe on Oct 01, 2021

e173e663

30 Sep, 2021 2 commits

feature(davide): Implementation of D4PG (#76) · 16a89c35

Committed by Davide Liu on Sep 30, 2021

* added experience replay and n-step

* implementing distributional q value

* added distributional q-value

* added overview in qac_dist and d4pg

* derived D4PG from DDPG

* fixed a bug when action shape >1

* benchmark D4PG mujoco + minor fixes

-entry for DDPG mujoco
-entry for D4PG mujoco
-config for D4PG mujoco
-fixed style D4PG code
-unittests for QAC distributional

* formatted code

* minor updates (read description)

-added d4pg serial_entry test
-updated comments in QACDIST
-added d4pg in commander register
-added q_value in d4pg return dict
-added priority update in d4pg entry
-added assertion in QACDIST

16a89c35

feature(zym): add offlineRL algo Discrete CQL; add hdf5 dataset for offlineRL. (#68) · 206186f1

Committed by Yinmin.Zhang on Sep 30, 2021

* feature(zym): add offlineRL algo Discrete CQL.

* feature(zym): add offlineRL algo Discrete CQL; add hdf5 dataset for offlineRL.

206186f1

17 Sep, 2021 1 commit

fix(pu): fix r2d2 done slice bug and LSTM hidden state reset bug (#52) · 2ffff07e

Committed by 蒲源 on Sep 17, 2021

* test rnd

* fix mz config

* fix config

* fix(pu): fix r2d2

* fix(puyuan): fix r2d2

* feature(puyuan): add minigrid r2d2 config

* polish minigrid config

* modified as review

* fix(pu): fix bug for compatibility

* polish(pu): add annotations and polish slice operation

* style(pu): run format.sh

* style(pu): correct yapf format

* fix(pu): fix config

* fix(pu): fix done slice bug and lstm reset bug

* style(pu): format config

* polish(pu): polish config params for cartpole, lunarlander and minigrid

* polish(pu): polish minigrid config params

* Update r2d2.py

* polish(pu): polish rnn reset problem

* fix(pu): fix merge error

* polish(pu): polish cartpole config

* polish(nyz): polish cartpole r2d2 config for faster convergence

* test(nyz): enable r2d2 algotest
Co-authored-by: niuyazhe <niuyazhe@sensetime.com>

2ffff07e

13 Sep, 2021 1 commit

fix(wyh):formatted config no eval bug (#53) · b4e4cabe

Committed by Weiyuhong-1998 on Sep 13, 2021

* fix_formatted_config_bug_eval

* fix(wyh):add config pytest

b4e4cabe

08 Sep, 2021 2 commits

feature(nyz): add supervised learning image classification training demo (#27) · 11cc97e8

Committed by Swain on Sep 08, 2021

* feature(nyz): add resnet for cv sl task

* feature(nyz): add imagenet classification dataset and adapt compile config for sl

* feature(nyz): add naive image training entry demo

* style(nyz): polish image cls train log

* polish(nyz): polish multi gpu training setting

* feature(nyz): add nn training bp and update async execution

* feature(nyz): add distributed sampler for different dist backend

* fix(nyz): fix compile config collector and buffer compatibility problem

* style(nyz): correct yapf format

* fix(nyz): fix env manager compile config compatibility bug

* refactor(nyz): abstract ISerialEvaluator and rename serial evaluation implementation

* refactor(nyz): refactor collector name

* feature(nyz): add metric evaluator and image cls acc metric eval demo

* fix(nyz): fix cuda and multi gpu bug in image cls demo

11cc97e8

style(wyh): add env information in readme (#46) · fa453ef0

Committed by Weiyuhong-1998 on Sep 08, 2021

* env-list

* env-list-fix-grammmer

* env-only-test

* modify-gif

* modify-gif-pendulum

* modify-gif-delect-maze

fa453ef0

06 Sep, 2021 2 commits

feature(zym): add offlineRL algo CQL; add offlineRL env D4RL (#37) · 69828ed5

Committed by Yinmin.Zhang on Sep 06, 2021

* feature(zym): add pybullet env info; add entropy type in sac.

* feature(zym): add cql; add serial entry for offlineRL.

* feature/polish(zym): add generation entry in mujoco env for offlineRL; polish cql/serial entry for offlineRL.

* feature(lj): add d4rl env for offlineRL.

* polish(zym): polish cql.

* feature/polish(zym): add dataset registry; polish offlineRL pipeline.

* fix(zym): fix bug in d4rl/mujoco config; fix bug in dataset for offlineRL.

* style(zym): add pybulletgym and d4rl requirements in setup.

* fix/polish(zym): support str in NaiveRLDataset; polish cql.

* polish(zym): polish command policy.

* feature(zym): add cql in pendulum env; add unittest/algotest for cql.

* fix(zym): fix cql bug in unittest/algotest for cql.

69828ed5

fix(pu): fix r2d2 bug (#36) · c8dac674

Committed by 蒲源 on Sep 06, 2021

* test rnd

* fix mz config

* fix config

* fix(pu): fix r2d2

* feature(puyuan): add minigrid r2d2 config

* polish minigrid config

* modified as review

* fix(pu): fix bug for compatibility

* polish(pu): add annotations and polish slice operation

* style(pu): run format.sh

* style(pu): correct yapf format

c8dac674

02 Sep, 2021 1 commit

hotfix(nyz): fix cartpole ppg value buffer sample typo · da19fdbd

Committed by niuyazhe on Sep 02, 2021

da19fdbd

27 Aug, 2021 1 commit

polish(nyz): polish cartpole dqn visualize demo and add solo eval demo · 020eba28

Committed by niuyazhe on Aug 27, 2021

020eba28

25 Aug, 2021 1 commit

style(nyz): rename advanced_buffer register name to advanced · 84583d44

Committed by niuyazhe on Aug 25, 2021

84583d44

24 Aug, 2021 1 commit

test(nyz): add sqil unittest and algotest, remove adder comment in policy, polish sqil config · 42e31ea2

Committed by niuyazhe on Aug 24, 2021

42e31ea2

20 Aug, 2021 1 commit

SQIL (#25) · 9929dc37

Committed by Will-Nie on Aug 20, 2021

* add sqil

* conceal all the personal info

* revise according to the comments

* correct_format

* add_comment to hardcodes part

* pass flake8

* add force_reproducibility = True; device, ex_model

* check format

9929dc37

01 Aug, 2021 1 commit

add ACER algorithm(szj) (#14) · dd4de1a0

Committed by simonat2011 on Aug 01, 2021

* add enduro env config. add enduro's ppo, dqn, drdqn, rainbow, impala config.

* modified as reviewer mentions

* add qacd network

* fix bugs

* fix bugs

* update acer algorithm

* update ACER code

* update acer config

* fix bug

* update pong acer's config

* edit commit

* update code as mention

* fix the comment table and trust region

* fix format

* fix typing lint

* fix format,flake8

* fix format

* fix whitespace problem

* test(nyz): add acer unittest and algotest

* style(nyz): correct flake8 style
Co-authored-by: shenziju <simonshen2011@foxmail.com>
Co-authored-by: Swain <niuyazhe314@outlook.com>

dd4de1a0

29 Jul, 2021 1 commit

polish(nyz): polish cartpole ppo demo and related unittest · 4e833da2

Committed by niuyazhe on Jul 29, 2021

4e833da2

OpenDILab Open Source Decision Intelligence Platform / DI-engine, last synced more than 2 years ago