Commits · 2b181eda47444cbccb74af4f53897add0f9c0a00 · OpenDILab Open-Source Decision Intelligence Platform / DI-engine

03 Dec, 2021 (4 commits)

fix(nyz): rename sum keepdims to keepdim for compatiblity and remove sql wrapper · 2b181eda
Committed by niuyazhe on Dec 03, 2021

2b181eda

feature(lk): implement multi pass DQN (#131) · f087d2c7

Committed by Ke Li on Dec 03, 2021

* feature(lk): add initial version of MP-PDQN

* fix(lk): fix expand function bug

* refactor(nyz): refactor mpdqn continuous args inputs module

* fix(nyz): fix pdqn scatter index generation

* fix(lk): fix pdqn scatter assignment bug

* feature(lk): polish mpdqn code and style format

* feature(lk): add mpdqn config and test file

* feature(lk): polish mpdqn code and style format

* fix(lk): fix import bug

* polish(lk): add test for mpdqn

* polish(lk): polish code style and format

* polish(lk): rm print debug info

* polish(lk): rm print debug info

* polish(lk): polish code style and format

* polish(lk): add MPDQN in readme.md

Co-authored-by: niuyazhe <niuyazhe@sensetime.com>

f087d2c7

benchmark(davide): Bsuite memory benchmark (#138) · 5ee17ad1

Committed by Davide Liu on Dec 03, 2021

* added r2d2 + a2c configs

* changed convergence reward for some env

* removed configs that don't converge

* removed 'on_policy' param in 2rd2 configs

5ee17ad1

fix(crb): fix subenvmanager reset bug (#137) · 0cfa4235

Committed by Robin Chen on Dec 03, 2021

* update base env manager and test

* add test reset once

* update subprecess env manager and test

* format code

* update picking error

* add unpickle catch for sync

* fix reset waitingenv bug

0cfa4235

02 Dec, 2021 (1 commit)

fix(crb): add unpickling error catch for sync env manager (#134) · 4b6e6a7d

Committed by Robin Chen on Dec 02, 2021

* update base env manager and test

* add test reset once

* update subprecess env manager and test

* format code

* update picking error

* add unpickle catch for sync

4b6e6a7d

01 Dec, 2021 (1 commit)

- style(nyz): add supporters in README(ci skip) · e8e1d09d
  Committed by niuyazhe on Dec 01, 2021

  e8e1d09d

30 Nov, 2021 (1 commit)

- fix(nyz): fix hidden state wrapper h compatibility(smac docker) · c6763f8e
  Committed by niuyazhe on Nov 30, 2021

  c6763f8e

26 Nov, 2021 (2 commits)

polish(pu): add loss statistics and polish r2d3 pong config (#126) · 81602ce9

Committed by 蒲源 on Nov 26, 2021

* fix(pu): fix adam weight decay bug

* feature(pu): add pitfall offppo config

* feature(pu): add qbert spaceinvaders pitfall r2d3 config

* fix(pu): fix expert offfppo config in r2d3

* fix(pu): fix pong connfig

* polish(pu): add loss statistics

* fix(pu): fix loss statistics bug

* polish(pu): polish pong r2d3 config

* polish(pu): polish r2d3 pong and lunarlander config

* polish(pu): delete unused files

81602ce9

fix(crb): add renew for env manager; update retry and timeout logit for subprecess env manager (#127) · f88bc0e0

Committed by Robin Chen on Nov 26, 2021

* update base env manager and test

* add test reset once

* update subprecess env manager and test

* format code

* update picking error

f88bc0e0

25 Nov, 2021 (4 commits)

polish(nyz): polish impala atrai config · 41dce176
Committed by niuyazhe on Nov 25, 2021

41dce176

feature(nyp): add apple key to door treasure env(#128) · 4157cdae

Committed by Will-Nie on Nov 25, 2021

* add apple key to door treasure and polish

* add test, revise reward, build four envs

* add 7x7-1 ADTKT

4157cdae

style(nyz): modify style and test workflow trigger(ci skip) · 045937e3
Committed by niuyazhe on Nov 25, 2021

045937e3

feature(zt): add curiosity icm algorithm (#41) · b50e8aea

Committed by timothijoe on Nov 25, 2021

* curisity_icm_v1

* modified version1

* modified v2

* one_hot function change

* add paper information

* format minigrid ppo curiosity

* flake8 ding checked

* 6th-Oct-gpu-modified

* reset configs in minigrid files

* minigird-env-doorkey88-100-300

* use modulelist instead of list in icm module

* change icm reward model

* delete origin curiosit_reward model and add icm_reward model

* modified icm reward model

* polish icm model by zt, (1) polish ding/reward_model/icm_reward_model.py and related __init__.py (2) add config files for pong:dizoo/atari/config/serial/pong/pong_ppo_offpolicy_icm.py and minigrid env: dizoo/minigrid/config/doorkey8_icm_config.py,fourroom_icm_config.py,minigrid_icm_config.py  (3) add element icm in README

* remove some useless config files in minigrid

* remove redundant part in ppo.py, add cartpole_ppo_icm_config.py, changed test_icm.py and Readme

b50e8aea

24 Nov, 2021 (3 commits)

- fix(nyz): fix naive buffer auto create bug · 5216fb31
  Committed by niuyazhe on Nov 24, 2021

  5216fb31

- style(nyz): polish dqn config table · 5963d076
  Committed by niuyazhe on Nov 24, 2021

  5963d076

- fix(nyz): fix gym_soccer env install and test bugs · 51cb4a0e
  Committed by niuyazhe on Nov 24, 2021

  51cb4a0e

22 Nov, 2021 (8 commits)

fix(nyz): fix rnd and gae unittest bugs · 7359054c
Committed by niuyazhe on Nov 22, 2021

7359054c

Merge branch 'fix-ppo-adv' · 39e8671d
Committed by niuyazhe on Nov 22, 2021

39e8671d

style(nyz): correct format · 7ae259a6
Committed by niuyazhe on Nov 22, 2021

7ae259a6

feature(wyh): add guided cost algorithm (#57) · ffe8d7c0

Committed by Weiyuhong-1998 on Nov 22, 2021

* guided_cost

* max_e

* guided_cost

* fix(wyh):fix guided cost recompute bug

* fix(wyh):add model save

* feature(wyh):polish guided cost

* feature(wyh):on guided cost

* fix(wyh):gcl-modify

* fix(wyh):gcl sac config

* fix(wyh):gcl style

* fix(wyh):modify comments

* fix(wyh):masac_5m6m best config

* fix(wyh):sac bug

* fix(wyh):GCL readme

* fix(wyh):GCL readme conflicts

ffe8d7c0

polish(pu): polish value norm and fix get_gae · 7992b5d3
Committed by puyuan1996 on Nov 22, 2021

7992b5d3

v0.2.1 · cf8ad134
Committed by niuyazhe on Nov 22, 2021

cf8ad134

fix(nyz): simplify onppo with traj_flag · 7e51de4f
Committed by niuyazhe on Nov 22, 2021

7e51de4f

fix(pu): fix recompute advantage in on policy ppo and polish rnd_onppo algorithm (#124) · 0b46dd24

Committed by 蒲源 on Nov 22, 2021

* test rnd

* fix mz config

* fix config

* fix config

* fix(pu): fix r2d2

* fix(pu): fix ppo-onpolicy-rnd adv bug

* fix(puyuan): fix r2d2

* feature(puyuan): add minigrid r2d2 config

* polish minigrid config

* dev-ppo-onpolicy-rnd

* fix(pu): fix rnd reward normalize bug

* feature(pu): add minigrid fourrooms and doorkey env info

* feature(pu): add serial_entry_onpolicy

* fix(pu): fix config params of onpolicy ppo

* feature(pu): add obs normalization

* polish(pu): polish rnd intrinsic reward normalization

* fix(pu): fix clear data bug

* test(pu): add off-policy ppo config

* polish(pu): polish minigrid onppo-rnd config

* polish(pu): polish rnd reward model and minigrid config for rnd_onppo

* polish(pu): polish minigrid rnd_onppo config

* feature(pu): add gym-minigrid

* fix(pu): fix ISerialEvaluator bug

* fix(pu): fix cuda device compatibility

* fix(pu): fix MiniGrid-ObstructedMaze-2Dlh-v0 env_id bug

* polish(pu): squash rnd intrinsic reward to [0,1] according to the batch min and max

* style(pu): yapf format

* polich(pu):polish pitfall offppo config

* polish(pu): polish rnd-onppo and onppo config

* polish(pu): polish config and weight last reward

* polish(pu):polish rnd-onppo config

* fix(pu)" fix mujoco onppo config

* fix(pu): fix continous version of  dict_data_split_traj_and_compute_adv

* polish(pu):polish config

* fix(pu): add key traj_flag in data to split traj correctly  when ignore_done is True in halfcheetah

* polish(pu): polish annatation

* polish(pu): withdraw files submitted wrongly

* polish(pu): withdraw files deleted wrongly

* polish(pu): polish onppo config

* fix(pu): fix remaining_traj_data recompute adv bug and polish rnd onppo code

* style(pu): yapf format

* polish(pu): polish gae_traj_flag function

* polish(pu): delete redundant function in onppo

0b46dd24
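One item in commit 0b46dd24 above mentions squashing the RND intrinsic reward to [0,1] according to the batch min and max. A minimal sketch of that kind of min-max rescaling is shown below in plain Python. The function name `squash_to_unit_interval` and the `eps` guard against a zero range are assumptions made for illustration; this is not code taken from DI-engine.

```python
from typing import List


def squash_to_unit_interval(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Min-max rescale one batch of intrinsic rewards into [0, 1].

    `eps` (an assumed guard, not from the commit) avoids division by zero
    when every reward in the batch is identical.
    """
    lo, hi = min(rewards), max(rewards)
    span = hi - lo
    # The smallest reward maps to 0; the largest maps to just under 1.
    return [(r - lo) / (span + eps) for r in rewards]


if __name__ == "__main__":
    batch = [0.2, 1.0, 0.6]
    print(squash_to_unit_interval(batch))
```

Because the scale is recomputed per batch, the same raw reward can map to different values in different batches; whether that matters depends on how the intrinsic reward is later weighted against the extrinsic one.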

20 Nov, 2021 (1 commit)

- style(nyz): add PDQN/MAPPO link, DQN doc zh link and correct format · 96103e9b
  Committed by niuyazhe on Nov 20, 2021

  96103e9b

19 Nov, 2021 (4 commits)

polish(davide) add example of GAIL entry + config for Mujoco and Cartpole (#114) · d1bc1387

Committed by Davide Liu on Nov 19, 2021

* added gail entry

* added lunarlander and cartpole config

* added gail mujoco config

* added mujoco exp

* update22-10

* added third exp

* added metric to evaluate policies

* added GAIL entry and config for Cartpole and Walker2d

* checked style and unittest

* restored lunarlander env

* style problems

* bug correction

* Delete expert_data_train.pkl

* changed loss of GAIL

* Update walker2d_ddpg_gail_config.py

* changed gail reward from -D(s, a) to -log(D(s, a))

* added small constant to reward function

* added comment to clarify config

* Update walker2d_ddpg_gail_config.py

* added lunarlander entry + config

* Added Atari discriminator + Pong entry config

* Update gail_irl_model.py

* Update gail_irl_model.py

* added gail serial pipeline and onehot actions for gail atari

* related to previous commit

* removed main files

* removed old comment

d1bc1387
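Two items in commit d1bc1387 above describe changing the GAIL reward from -D(s, a) to -log(D(s, a)) and adding a small constant to the reward function. The sketch below illustrates both forms in plain Python. Placing the constant inside the logarithm, the value `1e-8`, and the function names are assumptions for illustration only; the commit messages do not say where the constant is applied.

```python
import math


def gail_reward_linear(d_prob: float) -> float:
    """Earlier form named in the commit list: reward = -D(s, a)."""
    return -d_prob


def gail_reward_log(d_prob: float, eps: float = 1e-8) -> float:
    """Later form: reward = -log(D(s, a) + eps).

    `d_prob` is the discriminator output in [0, 1]. `eps` is an assumed
    small constant that keeps the logarithm finite when d_prob is 0.
    """
    return -math.log(d_prob + eps)


if __name__ == "__main__":
    for d in (0.0, 0.1, 0.5, 0.9, 1.0):
        print(d, gail_reward_linear(d), gail_reward_log(d))
```

Compared with the linear form, the logarithmic form grows much faster as D(s, a) approaches 0, so samples the discriminator scores near zero dominate the reward signal.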

feature(lk): add PDQN algorithm for hybrid action spaces (#118) · 39a7cfe3

Committed by Ke Li on Nov 19, 2021

* add_pdqn_model

* modify_model_structure

* initial_version_PDQN

* bug_free_PDQN_no_test_convergence

* update_pdqn_config

* add_noise_to_continuous_args

* polish(nyz): polish code style and add noise in pdqn

* seperate_dis_and_cont_model

* fix_bug_for_separation

* fix(pu): current q value use the data action, fix cont loss detach bug, 1 encoder, dist and cont learning rate

* polish(pu): actor delay update

* fix(pu): fix disc cont update frequency

* polish(pu): polish pdqn config

* polish(lk): add comments and typelint for pdqn and dqn

* feature(lk): add test file for pdqn model and policy

* polish(lk): code style

* polish(lk): rm the modify of unrelated files

* polish(lk): rm useless commentes code in pdqn

Co-authored-by: niuyazhe <niuyazhe@sensetime.com>
Co-authored-by: puyuan1996 <2402552459@qq.com>

39a7cfe3

style(nyz): modify supported PyTorch version and correct format · d8115c50
Committed by niuyazhe on Nov 19, 2021

d8115c50

test(zjow): verified the compatibility of PyTorch==1.10.0 · 3922ffb5
Committed by zjowowen on Nov 19, 2021

3922ffb5

18 Nov, 2021 (3 commits)

- polish(nyz): add naive buffer periodic thruput seconds argument · abc4b3a2
  Committed by niuyazhe on Nov 18, 2021

  abc4b3a2

- polish(yzj): add DataParallel and DataDistributedParallel (#123) · d1188d71
  Committed by jayyoung0802 on Nov 18, 2021

  * add spaceinvaders multi gpu

  * add dp and ddp

  * Update __init__.py

  * recover init

  d1188d71

- feature(nyz): add registry force_overwrite argument and polish cartpole · cbee45b4
  Committed by niuyazhe on Nov 18, 2021

  qrdqn config

  cbee45b4

17 Nov, 2021 (1 commit)
- Merge pull request #122 from opendilab/dev-torch1.1.0 · f0014586
  Committed by Xu Jingxin on Nov 17, 2021

  feature(nyz): extend torch1.1.0 support

  f0014586

16 Nov, 2021 (4 commits)

- refactor(nyz): add new compatibility file in ding top level · 495e2f1a
  Committed by niuyazhe on Nov 16, 2021

  495e2f1a

- polish(nyz): add torch1.1.0 compatibility for nn.Flatten · 7acdb671
  Committed by niuyazhe on Nov 16, 2021

  7acdb671

- polish(nyz): add torch1.1.0 compatibility for torch.utils.data · 171dddc4
  Committed by niuyazhe on Nov 16, 2021

  171dddc4

- style(nyz): add torch1.1.0 support · 8df82e01
  Committed by niuyazhe on Nov 16, 2021

  8df82e01

15 Nov, 2021 (2 commits)
- feature(jrn): add the bipedalwalker config of sac and ppo (#121) · 38480a5b
  Committed by Jia Ruonan on Nov 15, 2021

  * commit bipedalwalkere_ppo_config

  * commit bipedalwalker_sac_config

  38480a5b

- style(nyz): add mbrl badge and env doc link · 12bc041d
  Committed by niuyazhe on Nov 15, 2021

  12bc041d

07 Nov, 2021 (1 commit)

- feature(nyz): enable arbitrary policy num in serial sample collector and evaluator, add git in docker(smac docker) · 3a91c429
  Committed by niuyazhe on Nov 07, 2021

  3a91c429

OpenDILab Open-Source Decision Intelligence Platform / DI-engine · last synced more than 2 years ago