Commits · 2b181eda47444cbccb74af4f53897add0f9c0a00 · OpenDILab Open-Source Decision Intelligence Platform / DI-engine

03 Dec, 2021 1 commit

fix(nyz): rename sum keepdims to keepdim for compatibility and remove sql wrapper · 2b181eda

Committed by niuyazhe on Dec 03, 2021

2b181eda

30 Nov, 2021 1 commit

fix(nyz): fix hidden state wrapper h compatibility(smac docker) · c6763f8e

Committed by niuyazhe on Nov 30, 2021

c6763f8e

19 Nov, 2021 1 commit

feature(lk): add PDQN algorithm for hybrid action spaces (#118) · 39a7cfe3

Committed by Ke Li on Nov 19, 2021

* add_pdqn_model

* modify_model_structure

* initial_version_PDQN

* bug_free_PDQN_no_test_convergence

* update_pdqn_config

* add_noise_to_continuous_args

* polish(nyz): polish code style and add noise in pdqn

* seperate_dis_and_cont_model

* fix_bug_for_separation

* fix(pu): current q value use the data action, fix cont loss detach bug, 1 encoder, dist and cont learning rate

* polish(pu): actor delay update

* fix(pu): fix disc cont update frequency

* polish(pu): polish pdqn config

* polish(lk): add comments and typelint for pdqn and dqn

* feature(lk): add test file for pdqn model and policy

* polish(lk): code style

* polish(lk): rm the modify of unrelated files

* polish(lk): rm useless commented code in pdqn

Co-authored-by: niuyazhe <niuyazhe@sensetime.com>
Co-authored-by: puyuan1996 <2402552459@qq.com>

39a7cfe3

01 Nov, 2021 2 commits

fix(nyz): fix target model wrapper hard reset bug · fd11c88f

Committed by niuyazhe on Nov 01, 2021

fd11c88f

feature(pu): add NGU algorithm (#40) · 286ea243

Committed by 蒲源 on Nov 01, 2021

* test rnd

* fix mz config

* fix config

* feature(pu): fix r2d2, add beta to actor

* feature(pu): add ngu-dev

* fix(pu): fix r2d2

* fix(puyuan): fix r2d2

* feature(puyuan): add minigrid r2d2 config

* polish minigrid config

* dev-ngu

* feature(pu): add action and reward as inputs of q network

* feature(pu): add episodic reward model

* feature(pu): add episodic reward model, modify r2d2 and collector for ngu

* fix(pu): recover files that were changed by mistake

* fix(pu): fix tblogger cnt bug

* add_dqfd

* Is_expert to is_expert

* fix(pu): fix r2d2 bug

* fix(pu): fix beta index to gamma bug

* fix(pu): fix numerical stability problem

* style(pu): flake8 format

* fix(pu): fix rnd reward model train times

* polish(pu): polish r2d2 reset problem

* fix(pu): fix episodic reward normalize bug

* polish(pu): polish config params and episodic_reward init value

* modify according to the last comments

* value_gamma; done; marginloss; sqil adaptation

* feature(pu): add r2d3 algorithm and config of lunarlander and pong

* fix(pu): fix demo path bug

* fix(pu): fix cuda bug at function get_gae in adder.py

* feature(pu): add pong r2d2 config

* polish(pu): r2d2 uses the mixture priority, episodic_reward transforms to mean 0 std1

* polish(pu): polish r2d2 config

* test(pu): test cuda compatibility of dqfd_nstep_td_error in r2d3

* polish(pu): polish config

* polish(pu): polish config and annotation

* fix(pu): fix r2d2 target net update bug and done bug

* polish(pu): polish pong r2d2 config and add montezuma r2d2 config

* polish(pu): add some logs for debugging in r2d2

* polish(pu): recover config deleted by mistake

* fix(pu): fix r2d3 config of lunarlander and pong

* fix(pu): fix the r2d2 bug in r2d3

* fix(pu): fix r2d3 cpu device bug in fun dqfd_nstep_td_error of td.py

* fix(pu): fix n_sample bug in serial_entry_r2d3

* polish(pu): polish minigrid r2d2 config

* fix(pu): add info dict of fourrooms doorkey in minigrid_env

* polish(pu): polish r2d2 config

* fix(pu): fix expert policy collect traj bug, now we use the argmax_sample wrapper

* fix(pu): fix r2d2 done and target update bug, polish config

* fix(pu): fix null_padding transition obs to zeros

* fix(pu): episodic_reward transform to [0,1]

* fix(pu): fix the value_gamma bug

* fix(pu): fix device bug in ngu_reward_model.py

* fix(pu): fix null_padding problem in rnd and episodic reward model

* polish(pu): polish config

* fix(pu): use the deepcopy train_data to add bonus reward

* polish(pu): add the operation of enlarging seq_length times to the last reward of the whole episode

* fix(pu): fix the episode length 1 bug and weight intrinsic reward bug

* feature(pu): add montezuma ngu config

* fix(pu): fix lunarlander ngu unroll_len to 998 so that the sequence length is equal to the max step 1000

* test(pu): episodic reward transforms to [0,1]

* fix(pu): fix r2d3 one-step rnn init bug and add r2d2_collect_traj

* fix(pu): fix r2d2_collect_traj.py

* feature(pu): add pong_r2d3_r2d2expert_config

* polish(pu): yapf format

* polish(pu): fix td.py conflict

* polish(pu): flake8 format

* polish(pu): add lambda_one_step_td key in dqfd error

* test(pu): set key lambda_one_step_td and lambda_supervised_loss as 0

* style(pu): yapf format

* style(pu): format

* polish(nyz): fix ngu detailed compatibility error

* fix(nyz): fix dqfd one_step td lambda bug

* fix(pu): fix test_acer and test_rnd compatibility error

Co-authored-by: Swain <niuyazhe314@outlook.com>
Co-authored-by: Will_Nie <nieyunpengwill@hotmail.com>

286ea243

29 Oct, 2021 1 commit

feature(nyz): add PADDPG for hybrid action space as baseline (#109) · d2f79536

Committed by Swain on Oct 29, 2021

* fix(nyz): fix gym_hybrid env not scale action bug

* feature(nyz): add PADDPG basic implementation for hybrid action space

* fix(nyz): fix td3/d4pg compatibility bug with new modifications

* fix(nyz): fix hybrid ddpg action type grad bug and update config

* feature(nyz): add eps greedy + multinomial wrapper and gym_hybrid ddpg convergence config

* style(nyz): update PADDPG in README

* test_model_hybrid_qac

* fix_typo_in_README

* test_policy_hybrid_qac

* polish(nyz): polish hybrid action space to dict structure and polish unittest

* fix(nyz): fix td3bc compatibility bug

Co-authored-by: 李可 <like2@CN0014008466M.local>

d2f79536

20 Aug, 2021 1 commit

SQIL (#25) · 9929dc37

Committed by Will-Nie on Aug 20, 2021

* add sqil

* conceal all the personal info

* revise according to the comments

* correct_format

* add_comment to hardcodes part

* pass flake8

* add force_reproducibility = True; device, ex_model

* check format

9929dc37

16 Jul, 2021 1 commit

polish(nyz): codestyle optimization by lgtm (#7) · f361bd3b

Committed by Swain on Jul 16, 2021

* refactor(nyz): refactor read_config to 3 different function interface

* feature(nyz): enable env_setting param in entry

* polish(nyz): remove redundant code and global declaration

* polish(nyz): remove flag in import_helper

* polish(nyz): remove unused import

* style(nyz): correct format

f361bd3b

08 Jul, 2021 1 commit

v0.1.0 · 09050dba

Committed by niuyazhe on Jul 08, 2021

09050dba

OpenDILab Open-Source Decision Intelligence Platform / DI-engine · last synced more than 2 years ago
