- 08 Nov 2021, 2 commits
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
- 07 Nov 2021, 1 commit
  - Committed by niuyazhe
- 05 Nov 2021, 19 commits
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
  - Committed by niuyazhe
  - Committed by niuyazhe
  - Committed by Xu Jingxin
  - Committed by niuyazhe
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
  - Committed by Xu Jingxin
- 03 Nov 2021, 3 commits
  - Committed by niuyazhe
  - Committed by niuyazhe
  - Committed by Davide Liu
    * small fix
    * added bsuite env version
    * modified test
- 01 Nov 2021, 3 commits
  - Committed by niuyazhe
  - Committed by niuyazhe
  - Committed by 蒲源 (Pu Yuan)
    * test rnd
    * fix mz config
    * fix config
    * feature(pu): fix r2d2, add beta to actor
    * feature(pu): add ngu-dev
    * fix(pu): fix r2d2
    * fix(puyuan): fix r2d2
    * feature(puyuan): add minigrid r2d2 config
    * polish minigrid config
    * dev-ngu
    * feature(pu): add action and reward as inputs of q network
    * feature(pu): add episodic reward model
    * feature(pu): add episodic reward model, modify r2d2 and collector for ngu
    * fix(pu): recover files that were changed by mistake
    * fix(pu): fix tblogger cnt bug
    * add_dqfd
    * Is_expert to is_expert
    * fix(pu): fix r2d2 bug
    * fix(pu): fix beta index to gamma bug
    * fix(pu): fix numerical stability problem
    * style(pu): flake8 format
    * fix(pu): fix rnd reward model train times
    * polish(pu): polish r2d2 reset problem
    * fix(pu): fix episodic reward normalize bug
    * polish(pu): polish config params and episodic_reward init value
    * modify according to the last comments
    * value_gamma; done; marginloss; SQIL adaptation
    * feature(pu): add r2d3 algorithm and config of lunarlander and pong
    * fix(pu): fix demo path bug
    * fix(pu): fix cuda bug at function get_gae in adder.py
    * feature(pu): add pong r2d2 config
    * polish(pu): r2d2 uses the mixture priority; episodic_reward transforms to mean 0, std 1
    * polish(pu): polish r2d2 config
    * test(pu): test cuda compatibility of dqfd_nstep_td_error in r2d3
    * polish(pu): polish config
    * polish(pu): polish config and annotation
    * fix(pu): fix r2d2 target net update bug and done bug
    * polish(pu): polish pong r2d2 config and add montezuma r2d2 config
    * polish(pu): add some logs for debugging in r2d2
    * polish(pu): recover config deleted by mistake
    * fix(pu): fix r2d3 config of lunarlander and pong
    * fix(pu): fix the r2d2 bug in r2d3
    * fix(pu): fix r2d3 cpu device bug in function dqfd_nstep_td_error of td.py
    * fix(pu): fix n_sample bug in serial_entry_r2d3
    * polish(pu): polish minigrid r2d2 config
    * fix(pu): add info dict of fourrooms doorkey in minigrid_env
    * polish(pu): polish r2d2 config
    * fix(pu): fix expert policy collect traj bug; now we use the argmax_sample wrapper
    * fix(pu): fix r2d2 done and target update bug, polish config
    * fix(pu): fix null_padding transition obs to zeros
    * fix(pu): episodic_reward transform to [0,1]
    * fix(pu): fix the value_gamma bug
    * fix(pu): fix device bug in ngu_reward_model.py
    * fix(pu): fix null_padding problem in rnd and episodic reward model
    * polish(pu): polish config
    * fix(pu): use the deepcopy train_data to add bonus reward
    * polish(pu): add the operation of enlarging seq_length times to the last reward of the whole episode
    * fix(pu): fix the episode length 1 bug and weight intrinsic reward bug
    * feature(pu): add montezuma ngu config
    * fix(pu): fix lunarlander ngu unroll_len to 998 so that the sequence length is equal to the max step 1000
    * test(pu): episodic reward transforms to [0,1]
    * fix(pu): fix r2d3 one-step rnn init bug and add r2d2_collect_traj
    * fix(pu): fix r2d2_collect_traj.py
    * feature(pu): add pong_r2d3_r2d2expert_config
    * polish(pu): yapf format
    * polish(pu): fix td.py conflict
    * polish(pu): flake8 format
    * polish(pu): add lambda_one_step_td key in dqfd error
    * test(pu): set key lambda_one_step_td and lambda_supervised_loss as 0
    * style(pu): yapf format
    * style(pu): format
    * polish(nyz): fix ngu detailed compatibility error
    * fix(nyz): fix dqfd one_step td lambda bug
    * fix(pu): fix test_acer and test_rnd compatibility error
    Co-authored-by: Swain <niuyazhe314@outlook.com>
    Co-authored-by: Will_Nie <nieyunpengwill@hotmail.com>
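The NGU commits above mention transforming the episodic intrinsic reward to [0, 1] (alongside an earlier mean-0/std-1 variant). As a hedged illustration only, plain min-max scaling produces that range; the actual transform in DI-engine's `ngu_reward_model.py` may differ, and the function name here is hypothetical:

```python
import numpy as np


def minmax_normalize(reward: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Map an episodic intrinsic reward array into [0, 1] via min-max scaling.

    The small eps keeps the division finite when all rewards are equal.
    """
    rmin, rmax = reward.min(), reward.max()
    return (reward - rmin) / (rmax - rmin + eps)
```

Such a rescaling keeps the relative ordering of intrinsic bonuses while bounding their magnitude before they are mixed into the extrinsic reward.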
- 31 Oct 2021, 1 commit
  - Committed by niuyazhe
- 29 Oct 2021, 4 commits
  - Committed by Swain
    * feature(lcm): add MBPO algorithm (#87)
    * add model-based rl
    * fix yazhe's comments
    * format
    * pass flake8 test
    * polish(nyz): polish mbpo import, name and test
    Co-authored-by: lichuming <lichuming@lichumingdeMacBook-Pro.local>
  - Committed by niuyazhe
  - Committed by niuyazhe
  - Committed by Swain
    * fix(nyz): fix gym_hybrid env not scale action bug
    * feature(nyz): add PADDPG basic implementation for hybrid action space
    * fix(nyz): fix td3/d4pg compatibility bug with new modifications
    * fix(nyz): fix hybrid ddpg action type grad bug and update config
    * feature(nyz): add eps greedy + multinomial wrapper and gym_hybrid ddpg convergence config
    * style(nyz): update PADDPG in README
    * test_model_hybrid_qac
    * fix_typo_in_README
    * test_policy_hybrid_qac
    * polish(nyz): polish hybrid action space to dict structure and polish unittest
    * fix(nyz): fix td3bc compatibility bug
    Co-authored-by: 李可 (Li Ke) <like2@CN0014008466M.local>
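The hybrid-action commit above adds an "eps greedy + multinomial" wrapper for sampling the discrete action-type head of PADDPG. A minimal sketch of that sampling rule, assuming it mixes greedy argmax with softmax (multinomial) sampling at rate eps; the name and exact behavior of DI-engine's wrapper are assumptions, not its real API:

```python
import numpy as np


def eps_greedy_multinomial_sample(logits: np.ndarray, eps: float, rng=None) -> int:
    """Pick a discrete action type from logits.

    With probability eps, sample from the softmax (multinomial) distribution
    over the logits; otherwise take the greedy argmax action.
    """
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < eps:
        # Numerically stable softmax over the logits.
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return int(rng.choice(len(logits), p=probs))
    return int(np.argmax(logits))
```

In a hybrid action space the sampled type index would then select which continuous-argument head's output is executed.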
- 28 Oct 2021, 1 commit
  - Committed by Swain
    * feature(nyz): add gobigger baseline
    * style(nyz): add gobigger env info
    * feature(nyz): add ignore prefix in default collate
    * feature(nyz): add vsbot training baseline
    * fix(nyz): fix to_tensor empty list bug and polish gobigger baseline
    * style(nyz): split gobigger baseline code
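One commit above adds an "ignore prefix" option to the default collate function. A hedged sketch of the idea: fields whose keys match an ignored prefix are passed through as plain lists instead of being stacked into a batch array. Both the function name and the `raw_` prefix here are hypothetical, not DI-engine's actual `default_collate` signature:

```python
import numpy as np


def collate_ignore_prefix(batch: list, ignore_prefix: tuple = ('raw_',)) -> dict:
    """Collate a list of dict samples field-wise.

    Keys starting with any ignored prefix are kept as plain Python lists
    (useful for ragged or non-tensor data); all other fields are stacked.
    """
    result = {}
    for key in batch[0]:
        values = [sample[key] for sample in batch]
        if key.startswith(ignore_prefix):
            result[key] = values  # pass through untouched
        else:
            result[key] = np.stack(values)  # regular batching
    return result
```

This pattern is handy for envs like GoBigger whose observations mix fixed-shape arrays with variable-length structures that cannot be stacked.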
- 26 Oct 2021, 2 commits
  - Committed by niuyazhe
  - Committed by jayyoung0802
    * add 4 pytest files: dataset.py, learner_aggregator.py, learner_hook.py, metric_serial_evaluator.py
    * fix yapf and flake8, and remove invalid self._env
    * fix fake_cls_config.py flake8
- 25 Oct 2021, 1 commit
  - Committed by Weiyuhong-1998
    * fix(wyh): reward model test
    * fix(wyh): sac ppo test
    * fix(wyh): ppo_continuous test
    * fix(wyh): style
    * fix(wyh): ppo test
    Co-authored-by: Swain <niuyazhe314@outlook.com>
- 22 Oct 2021, 3 commits
  - Committed by Yinmin.Zhang
    * feature(zym): add offlineRL algo td3_bc.
    * feature(zym): add offlineRL algo td3_bc.
    * feature(zym): add offlineRL algo td3_bc.
    * polish(zym): polish some annotations in td3/ddpg/sac/ppo; polish `_forward_collect` and `_forward_eval`.
    * fix(lj): fix dimension bug in cql for continuous env.
    * fix(zym): fix dimension bug in cql for continuous env.
    * fix(zym): fix dimension bug in cql for continuous env.
    * polish(zym): update README.md.
  - Committed by Swain
    * fix(nyz): fix ppo cuda bug and random collect bug
    * config(nyz): add pong ppo off policy better config
    * fix(nyz): fix ppo device bug in get_train_sample and update ppo offpolicy config
    * style(nyz): correct yapf format
  - Committed by niuyazhe
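The td3_bc commits above add TD3+BC, whose published actor loss (Fujimoto & Gu, 2021) scales the deterministic policy-gradient term by lambda = alpha / mean(|Q|) and adds a behavior-cloning MSE term toward the dataset action. A minimal NumPy sketch of that loss; DI-engine's torch implementation will differ in detail:

```python
import numpy as np


def td3_bc_actor_loss(q_values: np.ndarray,
                      pi_actions: np.ndarray,
                      behavior_actions: np.ndarray,
                      alpha: float = 2.5) -> float:
    """TD3+BC actor objective: -lambda * Q(s, pi(s)) + (pi(s) - a)^2.

    lambda = alpha / mean(|Q|) normalizes the Q term so the BC regularizer
    stays on a comparable scale regardless of the Q-value magnitude.
    """
    lam = alpha / (np.abs(q_values).mean() + 1e-8)
    bc_term = ((pi_actions - behavior_actions) ** 2).sum(axis=-1)
    return float((-lam * q_values + bc_term).mean())
```

With pi(s) equal to the dataset action the BC term vanishes and only the rescaled Q term drives the update, which is what lets TD3+BC stay close to the behavior policy on offline data.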