1. 17 Nov, 2021 1 commit
  2. 16 Nov, 2021 4 commits
  3. 15 Nov, 2021 2 commits
  4. 07 Nov, 2021 1 commit
  5. 03 Nov, 2021 3 commits
  6. 01 Nov, 2021 3 commits
    • fix(nyz): fix target model wrapper hard reset bug · fd11c88f
      niuyazhe committed
    • fix(nyz): fix r2d2 and dqtd error unittest bug · 28930a86
      niuyazhe committed
    • feature(pu): add NGU algorithm (#40) · 286ea243
      蒲源 committed
      * test rnd
      
      * fix mz config
      
      * fix config
      
      * feature(pu): fix r2d2, add beta to actor
      
      * feature(pu): add ngu-dev
      
      * fix(pu): fix r2d2
      
      * fix(puyuan): fix r2d2
      
      * feature(puyuan): add minigrid r2d2 config
      
      * polish minigrid config
      
      * dev-ngu
      
      * feature(pu): add action and reward as inputs of q network
      
      * feature(pu): add episodic reward model
      
      * feature(pu): add episodic reward model, modify r2d2 and collector for ngu
      
      * fix(pu): recover files that were changed by mistake
      
      * fix(pu): fix tblogger cnt bug
      
      * add_dqfd
      
      * Is_expert to is_expert
      
      * fix(pu): fix r2d2 bug
      
      * fix(pu): fix beta index to gamma bug
      
      * fix(pu): fix numerical stability problem
      
      * style(pu): flake8 format
      
      * fix(pu): fix rnd reward model train times
      
      * polish(pu): polish r2d2 reset problem
      
      * fix(pu): fix episodic reward normalize bug
      
      * polish(pu): polish config params and episodic_reward init value
      
      * modify according to the last comments
      
      * value_gamma; done; marginloss; sqil adaptation
      
      * feature(pu): add r2d3 algorithm and config of lunarlander and pong
      
      * fix(pu): fix demo path bug
      
      * fix(pu): fix cuda bug at function get_gae in adder.py
      
      * feature(pu): add pong r2d2 config
      
      * polish(pu): r2d2 uses the mixture priority, episodic_reward transforms to mean 0 std1
      
      * polish(pu): polish r2d2 config
      
      * test(pu): test cuda compatibility of dqfd_nstep_td_error in r2d3
      
      * polish(pu): polish config
      
      * polish(pu): polish config and annotation
      
      * fix(pu): fix r2d2 target net update bug and done bug
      
      * polish(pu): polish pong r2d2 config and add montezuma r2d2 config
      
      * polish(pu): add some logs for debugging in r2d2
      
      * polish(pu): recover config deleted by mistake
      
      * fix(pu): fix r2d3 config of lunarlander and pong
      
      * fix(pu): fix the r2d2 bug in r2d3
      
      * fix(pu): fix r2d3 cpu device bug in fun dqfd_nstep_td_error of td.py
      
      * fix(pu): fix n_sample bug in serial_entry_r2d3
      
      * polish(pu): polish minigrid r2d2 config
      
      * fix(pu): add info dict of fourrooms doorkey in minigrid_env
      
      * polish(pu): polish r2d2 config
      
      * fix(pu): fix expert policy collect traj bug, now we use the argmax_sample wrapper
      
      * fix(pu): fix r2d2 done and target update bug, polish config
      
      * fix(pu): fix null_padding transition obs to zeros
      
      * fix(pu): episodic_reward transform to [0,1]
      
      * fix(pu): fix the value_gamma bug
      
      * fix(pu): fix device bug in ngu_reward_model.py
      
      * fix(pu): fix null_padding problem in rnd and episodic reward model
      
      * polish(pu): polish config
      
      * fix(pu): use the deepcopy train_data to add bonus reward
      
      * polish(pu): add the operation of enlarging seq_length times to the last reward of the whole episode
      
      * fix(pu): fix the episode length 1 bug and weight intrinsic reward bug
      
      * feature(pu): add montezuma ngu config
      
      * fix(pu): fix lunarlander ngu unroll_len to 998 so that the sequence length is equal to the max step 1000
      
      * test(pu): episodic reward transforms to [0,1]
      
      * fix(pu): fix r2d3 one-step rnn init bug and add r2d2_collect_traj
      
      * fix(pu): fix r2d2_collect_traj.py
      
      * feature(pu): add pong_r2d3_r2d2expert_config
      
      * polish(pu): yapf format
      
      * polish(pu): fix td.py conflict
      
      * polish(pu): flake8 format
      
      * polish(pu): add lambda_one_step_td key in dqfd error
      
      * test(pu): set key lambda_one_step_td and lambda_supervised_loss as 0
      
      * style(pu): yapf format
      
      * style(pu): format
      
      * polish(nyz): fix ngu detailed compatibility error
      
      * fix(nyz): fix dqfd one_step td lambda bug
      
      * fix(pu): fix test_acer and test_rnd compatibility error
      Co-authored-by: Swain <niuyazhe314@outlook.com>
      Co-authored-by: Will_Nie <nieyunpengwill@hotmail.com>
  7. 31 Oct, 2021 1 commit
  8. 29 Oct, 2021 4 commits
  9. 28 Oct, 2021 1 commit
    • feature(nyz): add gobigger baseline (#95) · a8fec8bb
      Swain committed
      * feature(nyz): add gobigger baseline
      
      * style(nyz): add gobigger env infor
      
      * feature(nyz): add ignore prefix in default collate
      
      * feature(nyz): add vsbot training baseline
      
      * fix(nyz): fix to_tensor empty list bug and polish gobigger baseline
      
      * style(nyz): split gobigger baseline code
  10. 26 Oct, 2021 2 commits
  11. 25 Oct, 2021 1 commit
  12. 22 Oct, 2021 4 commits
  13. 21 Oct, 2021 3 commits
  14. 20 Oct, 2021 1 commit
  15. 19 Oct, 2021 1 commit
  16. 17 Oct, 2021 1 commit
  17. 16 Oct, 2021 3 commits
  18. 15 Oct, 2021 3 commits
  19. 12 Oct, 2021 1 commit