Commits · 7e51de4f3d35fba11c4801c95b00158ff330c438 · OpenDILab open-source decision intelligence platform / DI-engine

22 Nov, 2021 · 2 commits

fix(nyz): simplify onppo with traj_flag · 7e51de4f
Committed by niuyazhe on Nov 22, 2021

fix(pu): fix recompute advantage in on policy ppo and polish rnd_onppo algorithm (#124) · 0b46dd24

Committed by 蒲源 on Nov 22, 2021

* test rnd

* fix mz config

* fix config

* fix config

* fix(pu): fix r2d2

* fix(pu): fix ppo-onpolicy-rnd adv bug

* fix(puyuan): fix r2d2

* feature(puyuan): add minigrid r2d2 config

* polish minigrid config

* dev-ppo-onpolicy-rnd

* fix(pu): fix rnd reward normalize bug

* feature(pu): add minigrid fourrooms and doorkey env info

* feature(pu): add serial_entry_onpolicy

* fix(pu): fix config params of onpolicy ppo

* feature(pu): add obs normalization

* polish(pu): polish rnd intrinsic reward normalization

* fix(pu): fix clear data bug

* test(pu): add off-policy ppo config

* polish(pu): polish minigrid onppo-rnd config

* polish(pu): polish rnd reward model and minigrid config for rnd_onppo

* polish(pu): polish minigrid rnd_onppo config

* feature(pu): add gym-minigrid

* fix(pu): fix ISerialEvaluator bug

* fix(pu): fix cuda device compatibility

* fix(pu): fix MiniGrid-ObstructedMaze-2Dlh-v0 env_id bug

* polish(pu): squash rnd intrinsic reward to [0,1] according to the batch min and max

* style(pu): yapf format

* polish(pu): polish pitfall offppo config

* polish(pu): polish rnd-onppo and onppo config

* polish(pu): polish config and weight last reward

* polish(pu):polish rnd-onppo config

* fix(pu): fix mujoco onppo config

* fix(pu): fix continuous version of dict_data_split_traj_and_compute_adv

* polish(pu):polish config

* fix(pu): add key traj_flag in data to split traj correctly when ignore_done is True in halfcheetah

* polish(pu): polish annotation

* polish(pu): withdraw files submitted wrongly

* polish(pu): withdraw files deleted wrongly

* polish(pu): polish onppo config

* fix(pu): fix remaining_traj_data recompute adv bug and polish rnd onppo code

* style(pu): yapf format

* polish(pu): polish gae_traj_flag function

* polish(pu): delete redundant function in onppo

25 Oct, 2021 · 1 commit

test(wyh): add more unittest for ppo and sac policy (#104) · c5af1cf2

Committed by Weiyuhong-1998 on Oct 25, 2021

* fix(wyh):reward model test

* fix(wyh):sac ppo test

* fix(wyh):ppo_continuous test

* fix(wyh):style

* fix(wyh):ppo test

Co-authored-by: Swain <niuyazhe314@outlook.com>

22 Oct, 2021 · 2 commits

feature(zym): add offlineRL algo td3_bc and polish policy comments(#88) · 7c1b5e95

Committed by Yinmin.Zhang on Oct 22, 2021

* feature(zym): add offlineRL algo td3_bc.

* feature(zym): add offlineRL algo td3_bc.

* feature(zym): add offlineRL algo td3_bc.

* polish(zym): polish some annotations in td3/ddpg/sac/ppo; polish `_forward_collect` and `_forward_eval`.

* fix(lj): fix dimension bug in cql for continuous env.

* fix(zym): fix dimension bug in cql for continuous env.

* fix(zym): fix dimension bug in cql for continuous env.

* polish(zym): update README.md.

polish(nyz): fix ppo bugs and update atari ppo offpolicy config (#108) · 2d5ec7c3

Committed by Swain on Oct 22, 2021

* fix(nyz): fix ppo cuda bug and random collect bug

* config(nyz): add pong ppo off policy better config

* fix(nyz): fix ppo device bug in get_train_sample and update ppo offpolicy config

* style(nyz): correct yapf format

13 Sep, 2021 · 1 commit
- fix(wyh): mappo nan bug and dict obs cannot unsqueeze bug (#54) · 6341684a
  Committed by Weiyuhong-1998 on Sep 13, 2021
```
* fix_mappo_bug_masknan_and_dict_cannot_unsqueeze

* squeeze_bug
```
07 Sep, 2021 · 1 commit
- feature(wyh): add mappo algorithm for SMAC · f1bf66d0
  Committed by niuyazhe on Sep 07, 2021
24 Aug, 2021 · 1 commit
- test(nyz): add sqil unittest and algotest, remove adder comment in policy, polish sqil config · 42e31ea2
  Committed by niuyazhe on Aug 24, 2021
29 Jul, 2021 · 1 commit
- polish(nyz): polish cartpole ppo demo and related unittest · 4e833da2
  Committed by niuyazhe on Jul 29, 2021
23 Jul, 2021 · 1 commit
- modify the unittest for the gae; format code. · f4440650
  Committed by zhangyinmin on Jul 23, 2021
21 Jul, 2021 · 2 commits
- modify the gae recomputation; add/update ppo config/entry. · e30a3d3c
  Committed by zhangyinmin on Jul 14, 2021
- feature(nyz): add minigrid ppo config and fix ppo adv norm location bug · 4723a633
  Committed by niuyazhe on Jul 21, 2021
16 Jul, 2021 · 1 commit

polish(nyz): codestyle optimization by lgtm (#7) · f361bd3b

Committed by Swain on Jul 16, 2021

* refactor(nyz): refactor read_config to 3 different function interface

* feature(nyz): enable env_setting param in entry

* polish(nyz): remove redundant code and global declaration

* polish(nyz): remove flag in import_helper

* polish(nyz): remove unused import

* style(nyz): correct format

13 Jul, 2021 · 1 commit
- add on policy ppo; modify ddpg/td3 config. · 8fffde51
  Committed by zhangyinmin on Jul 13, 2021
08 Jul, 2021 · 1 commit
- v0.1.0 · 09050dba
  Committed by niuyazhe on Jul 08, 2021

OpenDILab open-source decision intelligence platform / DI-engine · last synced more than 2 years ago
