- 30 Dec, 2021 1 commit
Committed by niuyazhe
- 21 Dec, 2021 2 commits
- 15 Dec, 2021 1 commit
Committed by Weiyuhong-1998
* ma mujoco env and masac code
* env(wyh): ma mujoco agent id
* feature(wyh): maqac continuous
* fix(wyh): multi-mujoco add readme
* fix(wyh): td error
* fix(wyh): style
* fix(wyh): multi agent mujoco test
- 14 Dec, 2021 1 commit
Committed by niuyazhe
- 09 Dec, 2021 1 commit
Committed by niuyazhe
- 08 Dec, 2021 1 commit
Committed by Will-Nie
* add trex algorithm for pong
* sort style
* add atari, ll, cp; fix device, collision; add_ppo
* add accuracy evaluation
* correct style
* add seed to make sure results are replicable
* remove useless part in cum return of model part
* add mujoco onppo training pipeline; ppo config
* improve style
* add sac training config for mujoco
* add log, add save data; polish config
* logger; hyperparameter; walker
* correct style
* modify else condition
* change rnd to trex
* revise according to comments, add episode collect
* new collect mode for trex, fix all bugs, comments
* final change
* polish after the final comment
* add readme/test
* add test for serial entry of trex/gcl
* sort style
- 06 Dec, 2021 1 commit
Committed by niuyazhe
- 03 Dec, 2021 2 commits
Committed by niuyazhe
Committed by Ke Li
* feature(lk): add initial version of MP-PDQN
* fix(lk): fix expand function bug
* refactor(nyz): refactor mpdqn continuous args inputs module
* fix(nyz): fix pdqn scatter index generation
* fix(lk): fix pdqn scatter assignment bug
* feature(lk): polish mpdqn code and style format
* feature(lk): add mpdqn config and test file
* feature(lk): polish mpdqn code and style format
* fix(lk): fix import bug
* polish(lk): add test for mpdqn
* polish(lk): polish code style and format
* polish(lk): rm print debug info
* polish(lk): rm print debug info
* polish(lk): polish code style and format
* polish(lk): add MPDQN in readme.md
Co-authored-by: niuyazhe <niuyazhe@sensetime.com>
- 01 Dec, 2021 1 commit
Committed by niuyazhe
- 26 Nov, 2021 1 commit
Committed by 蒲源
* fix(pu): fix adam weight decay bug
* feature(pu): add pitfall offppo config
* feature(pu): add qbert spaceinvaders pitfall r2d3 config
* fix(pu): fix expert offppo config in r2d3
* fix(pu): fix pong config
* polish(pu): add loss statistics
* fix(pu): fix loss statistics bug
* polish(pu): polish pong r2d3 config
* polish(pu): polish r2d3 pong and lunarlander config
* polish(pu): delete unused files
- 25 Nov, 2021 1 commit
Committed by timothijoe
* curisity_icm_v1
* modified version1
* modified v2
* one_hot function change
* add paper information
* format minigrid ppo curiosity
* flake8 ding checked
* 6th-Oct-gpu-modified
* reset configs in minigrid files
* minigird-env-doorkey88-100-300
* use modulelist instead of list in icm module
* change icm reward model
* delete origin curiosit_reward model and add icm_reward model
* modified icm reward model
* polish icm model by zt, (1) polish ding/reward_model/icm_reward_model.py and related __init__.py (2) add config files for pong: dizoo/atari/config/serial/pong/pong_ppo_offpolicy_icm.py and minigrid env: dizoo/minigrid/config/doorkey8_icm_config.py, fourroom_icm_config.py, minigrid_icm_config.py (3) add element icm in README
* remove some useless config files in minigrid
* remove redundant part in ppo.py, add cartpole_ppo_icm_config.py, changed test_icm.py and Readme
- 22 Nov, 2021 2 commits
Committed by Weiyuhong-1998
* guided_cost
* max_e
* guided_cost
* fix(wyh): fix guided cost recompute bug
* fix(wyh): add model save
* feature(wyh): polish guided cost
* feature(wyh): on guided cost
* fix(wyh): gcl-modify
* fix(wyh): gcl sac config
* fix(wyh): gcl style
* fix(wyh): modify comments
* fix(wyh): masac_5m6m best config
* fix(wyh): sac bug
* fix(wyh): GCL readme
* fix(wyh): GCL readme conflicts
Committed by niuyazhe
- 20 Nov, 2021 1 commit
Committed by niuyazhe
- 19 Nov, 2021 2 commits
Committed by Ke Li
* add_pdqn_model
* modify_model_structure
* initial_version_PDQN
* bug_free_PDQN_no_test_convergence
* update_pdqn_config
* add_noise_to_continuous_args
* polish(nyz): polish code style and add noise in pdqn
* seperate_dis_and_cont_model
* fix_bug_for_separation
* fix(pu): current q value use the data action, fix cont loss detach bug, 1 encoder, dist and cont learning rate
* polish(pu): actor delay update
* fix(pu): fix disc cont update frequency
* polish(pu): polish pdqn config
* polish(lk): add comments and typelint for pdqn and dqn
* feature(lk): add test file for pdqn model and policy
* polish(lk): code style
* polish(lk): rm the modify of unrelated files
* polish(lk): rm useless commented code in pdqn
Co-authored-by: niuyazhe <niuyazhe@sensetime.com>
Co-authored-by: puyuan1996 <2402552459@qq.com>
Committed by niuyazhe
- 15 Nov, 2021 1 commit
Committed by niuyazhe
- 29 Oct, 2021 2 commits
Committed by Swain
* feature(lcm): add MBPO algorithm (#87)
* add model-based rl
* fix yazhe's comments
* format
* pass flake8 test
* polish(nyz): polish mbpo import, name and test
Co-authored-by: lichuming <lichuming@lichumingdeMacBook-Pro.local>
Committed by Swain
* fix(nyz): fix gym_hybrid env not scale action bug
* feature(nyz): add PADDPG basic implementation for hybrid action space
* fix(nyz): fix td3/d4pg compatibility bug with new modifications
* fix(nyz): fix hybrid ddpg action type grad bug and update config
* feature(nyz): add eps greedy + multinomial wrapper and gym_hybrid ddpg convergence config
* style(nyz): update PADDPG in README
* test_model_hybrid_qac
* fix_typo_in_README
* test_policy_hybrid_qac
* polish(nyz): polish hybrid action space to dict structure and polish unittest
* fix(nyz): fix td3bc compatibility bug
Co-authored-by: 李可 <like2@CN0014008466M.local>
- 28 Oct, 2021 1 commit
Committed by Swain
* feature(nyz): add gobigger baseline
* style(nyz): add gobigger env info
* feature(nyz): add ignore prefix in default collate
* feature(nyz): add vsbot training baseline
* fix(nyz): fix to_tensor empty list bug and polish gobigger baseline
* style(nyz): split gobigger baseline code
- 22 Oct, 2021 1 commit
Committed by Yinmin.Zhang
* feature(zym): add offlineRL algo td3_bc.
* feature(zym): add offlineRL algo td3_bc.
* feature(zym): add offlineRL algo td3_bc.
* polish(zym): polish some annotations in td3/ddpg/sac/ppo; polish `_forward_collect` and `_forward_eval`.
* fix(lj): fix dimension bug in cql for continuous env.
* fix(zym): fix dimension bug in cql for continuous env.
* fix(zym): fix dimension bug in cql for continuous env.
* polish(zym): update README.md.
- 21 Oct, 2021 1 commit
Committed by Ke Li
* add_soccer_env
* add_info
* close
* format
* test_gym_soccer
* rm_torch
* replay_log
* format_style
* add_gym_soccer_to_readme
* separate render_func
* add_gif_file
* scale_action
* flake_style_format
* resolve_review_comments
* add branch info for gym hybrid
- 19 Oct, 2021 1 commit
Committed by Will-Nie
- 16 Oct, 2021 1 commit
Committed by Will-Nie
* add_dqfd
* Is_expert to is_expert
* modify according to the last comments
* value_gamma; done; marginloss; sqil compatibility
* finally shorten the code, revise config
* revise config, style
* add_readme/two_more_config
* correct format
Co-authored-by: niuyazhe <niuyazhe@sensetime.com>
- 12 Oct, 2021 1 commit
Committed by Swain
* feature(nyz): add gym-hybrid hybrid action space env
* style(nyz): update readme for gym_hybrid env
- 08 Oct, 2021 1 commit
Committed by LuciusMos
* slime volley env in dizoo, first commit
* fix bug in slime volley env
* modify volley env to satisfy ding 1v1 requirements; add naive self-play and league training pipeline (evaluator is not finished, now use a very naive one)
* adopt volley builtin ai as default eval opponent
* polish(nyz): polish slime_volley_env and its test
* feature(nyz): add slime_volley vs bot ppo demo
* feature(nyz): add battle_sample_serial_collector and adapt abnormal check in subprocess env manager
* feature(nyz): add slime volley self-play demo
* style(nyz): add slime_volleyball env gif and split MARL and selfplay label
* feature(nyz): add save replay function in slime volleyball env
Co-authored-by: zlx-sensetime <zhaoliangxuan@sensetime.com>
Co-authored-by: niuyazhe <niuyazhe@sensetime.com>
- 01 Oct, 2021 1 commit
Committed by niuyazhe
- 30 Sep, 2021 4 commits
Committed by niuyazhe
Committed by Davide Liu
* added experience replay and n-step
* implementing distributional q value
* added distributional q-value
* added overview in qac_dist and d4pg
* derived D4PG from DDPG
* fixed a bug when action shape >1
* benchmark D4PG mujoco + minor fixes
  - entry for DDPG mujoco
  - entry for D4PG mujoco
  - config for D4PG mujoco
  - fixed style D4PG code
  - unittests for QAC distributional
* formatted code
* minor updates (read description)
  - added d4pg seria_entry test
  - updated comments in QACDIST
  - added d4pg in commander register
  - added q_value in d4pg return dict
  - added priority update in d4pg entry
  - added assertion in QACDIST
Committed by niuyazhe
Committed by niuyazhe
- 24 Sep, 2021 1 commit
Committed by Swain
- 23 Sep, 2021 1 commit
Committed by Swain
- 17 Sep, 2021 2 commits
Committed by Davide Liu
* start implementing bsuite env
* add bsuite env
* Implemented
* removed unused file
* added cartpole_swing environment
* Update test_bsuite_env.py
* added env in readme and in setup.py
* Create bsuite.png
Committed by niuyazhe
- 14 Sep, 2021 1 commit
Committed by Xu Jingxin
- 08 Sep, 2021 2 commits