- 08 Nov 2021, 2 commits
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
- 07 Nov 2021, 1 commit
  - Committed by niuyazhe
- 05 Nov 2021, 19 commits
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
  - Committed by niuyazhe
  - Committed by niuyazhe
  - Committed by Xu Jingxin
  - Committed by niuyazhe
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
- 03 Nov 2021, 3 commits
  - Committed by niuyazhe
  - Committed by niuyazhe
  - Committed by Davide Liu
    * small fix
    * added bsuite env version
    * modified test
- 01 Nov 2021, 3 commits
  - Committed by niuyazhe
  - Committed by niuyazhe
  - Committed by 蒲源 (Pu Yuan)
    * test rnd
    * fix mz config
    * fix config
    * feature(pu): fix r2d2, add beta to actor
    * feature(pu): add ngu-dev
    * fix(pu): fix r2d2
    * fix(puyuan): fix r2d2
    * feature(puyuan): add minigrid r2d2 config
    * polish minigrid config
    * dev-ngu
    * feature(pu): add action and reward as inputs of q network
    * feature(pu): add episodic reward model
    * feature(pu): add episodic reward model, modify r2d2 and collector for ngu
    * fix(pu): recover files that were changed by mistake
    * fix(pu): fix tblogger cnt bug
    * add_dqfd
    * Is_expert to is_expert
    * fix(pu): fix r2d2 bug
    * fix(pu): fix beta index to gamma bug
    * fix(pu): fix numerical stability problem
    * style(pu): flake8 format
    * fix(pu): fix rnd reward model train times
    * polish(pu): polish r2d2 reset problem
    * fix(pu): fix episodic reward normalize bug
    * polish(pu): polish config params and episodic_reward init value
    * modify according to the last comments
    * value_gamma; done; marginloss; SQIL adaptation
    * feature(pu): add r2d3 algorithm and config of lunarlander and pong
    * fix(pu): fix demo path bug
    * fix(pu): fix cuda bug at function get_gae in adder.py
    * feature(pu): add pong r2d2 config
    * polish(pu): r2d2 uses the mixture priority; episodic_reward transforms to mean 0, std 1
    * polish(pu): polish r2d2 config
    * test(pu): test cuda compatibility of dqfd_nstep_td_error in r2d3
    * polish(pu): polish config
    * polish(pu): polish config and annotation
    * fix(pu): fix r2d2 target net update bug and done bug
    * polish(pu): polish pong r2d2 config and add montezuma r2d2 config
    * polish(pu): add some logs for debugging in r2d2
    * polish(pu): recover config deleted by mistake
    * fix(pu): fix r2d3 config of lunarlander and pong
    * fix(pu): fix the r2d2 bug in r2d3
    * fix(pu): fix r2d3 cpu device bug in function dqfd_nstep_td_error of td.py
    * fix(pu): fix n_sample bug in serial_entry_r2d3
    * polish(pu): polish minigrid r2d2 config
    * fix(pu): add info dict of fourrooms doorkey in minigrid_env
    * polish(pu): polish r2d2 config
    * fix(pu): fix expert policy collect traj bug; now we use the argmax_sample wrapper
    * fix(pu): fix r2d2 done and target update bug, polish config
    * fix(pu): fix null_padding transition obs to zeros
    * fix(pu): episodic_reward transform to [0,1]
    * fix(pu): fix the value_gamma bug
    * fix(pu): fix device bug in ngu_reward_model.py
    * fix(pu): fix null_padding problem in rnd and episodic reward model
    * polish(pu): polish config
    * fix(pu): use the deepcopy train_data to add bonus reward
    * polish(pu): add the operation of enlarging seq_length times to the last reward of the whole episode
    * fix(pu): fix the episode length 1 bug and weight intrinsic reward bug
    * feature(pu): add montezuma ngu config
    * fix(pu): fix lunarlander ngu unroll_len to 998 so that the sequence length is equal to the max step 1000
    * test(pu): episodic reward transforms to [0,1]
    * fix(pu): fix r2d3 one-step rnn init bug and add r2d2_collect_traj
    * fix(pu): fix r2d2_collect_traj.py
    * feature(pu): add pong_r2d3_r2d2expert_config
    * polish(pu): yapf format
    * polish(pu): fix td.py conflict
    * polish(pu): flake8 format
    * polish(pu): add lambda_one_step_td key in dqfd error
    * test(pu): set key lambda_one_step_td and lambda_supervised_loss as 0
    * style(pu): yapf format
    * style(pu): format
    * polish(nyz): fix ngu detailed compatibility error
    * fix(nyz): fix dqfd one_step td lambda bug
    * fix(pu): fix test_acer and test_rnd compatibility error
    Co-authored-by: Swain <niuyazhe314@outlook.com>
    Co-authored-by: Will_Nie <nieyunpengwill@hotmail.com>
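The NGU commits above mention transforming the episodic intrinsic reward to [0, 1] (alongside an earlier mean-0/std-1 variant). As a hedged illustration only, plain min-max scaling produces that range; the actual transform in DI-engine's `ngu_reward_model.py` may differ, and the function name here is hypothetical:

```python
import numpy as np


def minmax_normalize(reward: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Map an episodic intrinsic reward array into [0, 1] via min-max scaling.

    The small eps keeps the division finite when all rewards are equal.
    """
    rmin, rmax = reward.min(), reward.max()
    return (reward - rmin) / (rmax - rmin + eps)
```

Such a rescaling keeps the relative ordering of intrinsic bonuses while bounding their magnitude before they are mixed into the extrinsic reward.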
- 31 Oct 2021, 1 commit
  - Committed by niuyazhe
- 29 Oct 2021, 4 commits
  - Committed by Swain
    * feature(lcm): add MBPO algorithm (#87)
    * add model-based rl
    * fix yazhe's comments
    * format
    * pass flake8 test
    * polish(nyz): polish mbpo import, name and test
    Co-authored-by: lichuming <lichuming@lichumingdeMacBook-Pro.local>
  - Committed by niuyazhe
  - Committed by niuyazhe
  - Committed by Swain
    * fix(nyz): fix gym_hybrid env not scale action bug
    * feature(nyz): add PADDPG basic implementation for hybrid action space
    * fix(nyz): fix td3/d4pg compatibility bug with new modifications
    * fix(nyz): fix hybrid ddpg action type grad bug and update config
    * feature(nyz): add eps greedy + multinomial wrapper and gym_hybrid ddpg convergence config
    * style(nyz): update PADDPG in README
    * test_model_hybrid_qac
    * fix_typo_in_README
    * test_policy_hybrid_qac
    * polish(nyz): polish hybrid action space to dict structure and polish unittest
    * fix(nyz): fix td3bc compatibility bug
    Co-authored-by: 李可 (Li Ke) <like2@CN0014008466M.local>
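The hybrid-action commit above adds an "eps greedy + multinomial" wrapper for sampling the discrete action-type head of PADDPG. A minimal sketch of that sampling rule, assuming it mixes greedy argmax with softmax (multinomial) sampling at rate eps; the name and exact behavior of DI-engine's wrapper are assumptions, not its real API:

```python
import numpy as np


def eps_greedy_multinomial_sample(logits: np.ndarray, eps: float, rng=None) -> int:
    """Pick a discrete action type from logits.

    With probability eps, sample from the softmax (multinomial) distribution
    over the logits; otherwise take the greedy argmax action.
    """
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < eps:
        # Numerically stable softmax over the logits.
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return int(rng.choice(len(logits), p=probs))
    return int(np.argmax(logits))
```

In a hybrid action space the sampled type index would then select which continuous-argument head's output is executed.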
- 28 Oct 2021, 1 commit
  - Committed by Swain
    * feature(nyz): add gobigger baseline
    * style(nyz): add gobigger env info
    * feature(nyz): add ignore prefix in default collate
    * feature(nyz): add vsbot training baseline
    * fix(nyz): fix to_tensor empty list bug and polish gobigger baseline
    * style(nyz): split gobigger baseline code
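One commit above adds an "ignore prefix" option to the default collate function. A hedged sketch of the idea: fields whose keys match an ignored prefix are passed through as plain lists instead of being stacked into a batch array. Both the function name and the `raw_` prefix here are hypothetical, not DI-engine's actual `default_collate` signature:

```python
import numpy as np


def collate_ignore_prefix(batch: list, ignore_prefix: tuple = ('raw_',)) -> dict:
    """Collate a list of dict samples field-wise.

    Keys starting with any ignored prefix are kept as plain Python lists
    (useful for ragged or non-tensor data); all other fields are stacked.
    """
    result = {}
    for key in batch[0]:
        values = [sample[key] for sample in batch]
        if key.startswith(ignore_prefix):
            result[key] = values  # pass through untouched
        else:
            result[key] = np.stack(values)  # regular batching
    return result
```

This pattern is handy for envs like GoBigger whose observations mix fixed-shape arrays with variable-length structures that cannot be stacked.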
- 26 Oct 2021, 2 commits
  - Committed by niuyazhe
  - Committed by jayyoung0802
    * add 4 pytest files: dataset.py, learner_aggregator.py, learner_hook.py, metric_serial_evaluator.py
    * fix yapf and flake8, and remove invalid self._env
    * fix fake_cls_config.py flake8
- 25 Oct 2021, 1 commit
  - Committed by Weiyuhong-1998
    * fix(wyh): reward model test
    * fix(wyh): sac ppo test
    * fix(wyh): ppo_continuous test
    * fix(wyh): style
    * fix(wyh): ppo test
    Co-authored-by: Swain <niuyazhe314@outlook.com>
- 22 Oct 2021, 3 commits
  - Committed by Yinmin.Zhang
    * feature(zym): add offlineRL algo td3_bc.
    * feature(zym): add offlineRL algo td3_bc.
    * feature(zym): add offlineRL algo td3_bc.
    * polish(zym): polish some annotations in td3/ddpg/sac/ppo; polish `_forward_collect` and `_forward_eval`.
    * fix(lj): fix dimension bug in cql for continuous env.
    * fix(zym): fix dimension bug in cql for continuous env.
    * fix(zym): fix dimension bug in cql for continuous env.
    * polish(zym): update README.md.
  - Committed by Swain
    * fix(nyz): fix ppo cuda bug and random collect bug
    * config(nyz): add pong ppo off policy better config
    * fix(nyz): fix ppo device bug in get_train_sample and update ppo offpolicy config
    * style(nyz): correct yapf format
  - Committed by niuyazhe
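The td3_bc commits above add TD3+BC, whose published actor loss (Fujimoto & Gu, 2021) scales the deterministic policy-gradient term by lambda = alpha / mean(|Q|) and adds a behavior-cloning MSE term toward the dataset action. A minimal NumPy sketch of that loss; DI-engine's torch implementation will differ in detail:

```python
import numpy as np


def td3_bc_actor_loss(q_values: np.ndarray,
                      pi_actions: np.ndarray,
                      behavior_actions: np.ndarray,
                      alpha: float = 2.5) -> float:
    """TD3+BC actor objective: -lambda * Q(s, pi(s)) + (pi(s) - a)^2.

    lambda = alpha / mean(|Q|) normalizes the Q term so the BC regularizer
    stays on a comparable scale regardless of the Q-value magnitude.
    """
    lam = alpha / (np.abs(q_values).mean() + 1e-8)
    bc_term = ((pi_actions - behavior_actions) ** 2).sum(axis=-1)
    return float((-lam * q_values + bc_term).mean())
```

With pi(s) equal to the dataset action the BC term vanishes and only the rescaled Q term drives the update, which is what lets TD3+BC stay close to the behavior policy on offline data.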