- 30 Dec, 2021: 1 commit
  - Committed by niuyazhe
- 24 Dec, 2021: 1 commit
  - Committed by Swain
    * feature(nyz): add hybrid ppo, unify action_space field and use dict type mu sigma
    * polish(nyz): polish ppo config continuous field, move to action_space field
    * fix(nyz): fix ppo action_space field compatibility bug
    * fix(nyz): fix ppg/sac/cql action_space field compatibility bug
    * demo(nyz): update gym hybrid hppo config
    * polish(pu): polish hppo hyper-parameters; use tanh and fixed sigma 0.3 in actor_action_args; in the collect phase, after sampling from the pi distribution, clamp acceleration_value to [0,1] and rotation_value to [-1,1]
    * polish(pu): polish as per review
    * polish(pu): polish hppo config
    * polish(pu): entropy weight=0.03 performs best empirically
    * fix(nyz): fix unittest compatibility bugs
    * polish(nyz): remove unused print in atari env (ci skip)
    Co-authored-by: puyuan1996 <2402552459@qq.com>
- 09 Dec, 2021: 1 commit
  - Committed by Xu Jingxin
    * Init base buffer and storage
    * Use ratelimit as middleware
    * Pass style check
    * Keep the original return value
    * Add buffer.view
    * Add replace flag on sample, rewrite middleware processing
    * Test slicing
    * Add buffer copy middleware
    * Add update/delete api in buffer, rename middleware
    * Implement update and delete api of buffer
    * add naive use time count middleware in buffer
    * Rename next to chain
    * feature(nyz): add staleness check middleware and polish buffer
    * feature(nyz): add naive priority experience replay
    * Sample by indices
    * Combine buffer and storage layers
    * Support indices when deleting items from the queue
    * Use dataclass to save buffered data, remove return_index and return_meta
    * Add ignore_insufficient
    * polish(nyz): add return index in push and copy same data in sample
    * Drop useless import
    * Fix sample with indices, ensure return size is equal to input size or indices size
    * Make sure sampled data in buffer are distinct from each other
    * Support sample by grouped meta key
    * Support sample by rolling window
    * Add import/export data in buffer
    * Padding after sampling from buffer
    * Polish use_time_check
    * Use buffer as dataset
    * Set collate_fn in buffer test
    * feature(nyz): add deque buffer compatibility wrapper and demo
    * polish(nyz): polish code style and add pong dqn new deque buffer demo
    * feature(nyz): add use_time_count compatibility in wrapper
    * feature(nyz): add priority replay buffer compatibility in wrapper
    * Improve performance of buffer.update
    * polish(nyz): add priority max limit and correct flake8
    * Use __call__ to rewrite middleware
    * Rewrite buffer index
    * Fix buffer delete
    * Skip first item
    * Rewrite buffer delete
    * Use caller
    * Use caller in priority
    * Add group sample
    Co-authored-by: niuyazhe <niuyazhe@sensetime.com>
- 08 Dec, 2021: 2 commits
  - Committed by Will-Nie
    * add trex algorithm for pong
    * sort style
    * add atari, ll, cp; fix device, collision; add_ppo
    * add accuracy evaluation
    * correct style
    * add seed to make sure results are replicable
    * remove useless part in cum return of model part
    * add mujoco onppo training pipeline; ppo config
    * improve style
    * add sac training config for mujoco
    * add log, add save data; polish config
    * logger; hyperparameter; walker
    * correct style
    * modify else condition
    * change rnd to trex
    * revise according to comments, add episode collect
    * new collect mode for trex, fix all bugs, comments
    * final change
    * polish after the final comment
    * add readme/test
    * add test for serial entry of trex/gcl
    * sort style
  - Committed by Weiyuhong-1998
    * fix(wyh): masac
    * feature(wyh): single agent discrete sac
    * feature(wyh): single agent discrete sac td
    * fix(wyh): fix pong bug
    * fix(wyh): fix smac bug
    * fix(wyh): masac_5m6m best config
    * env(wyh): allow SMAC env to return ippo/isac obs
    * fix(wyh): masac polish
    * fix(wyh): masac style
    * fix(wyh): masac test
- 26 Nov, 2021: 1 commit
  - Committed by 蒲源
    * fix(pu): fix adam weight decay bug
    * feature(pu): add pitfall offppo config
    * feature(pu): add qbert, spaceinvaders and pitfall r2d3 configs
    * fix(pu): fix expert offppo config in r2d3
    * fix(pu): fix pong config
    * polish(pu): add loss statistics
    * fix(pu): fix loss statistics bug
    * polish(pu): polish pong r2d3 config
    * polish(pu): polish r2d3 pong and lunarlander config
    * polish(pu): delete unused files
- 22 Nov, 2021: 2 commits
  - Committed by Weiyuhong-1998
    * guided_cost
    * max_e
    * guided_cost
    * fix(wyh): fix guided cost recompute bug
    * fix(wyh): add model save
    * feature(wyh): polish guided cost
    * feature(wyh): on guided cost
    * fix(wyh): gcl-modify
    * fix(wyh): gcl sac config
    * fix(wyh): gcl style
    * fix(wyh): modify comments
    * fix(wyh): masac_5m6m best config
    * fix(wyh): sac bug
    * fix(wyh): GCL readme
    * fix(wyh): GCL readme conflicts
  - Committed by 蒲源
    * test rnd
    * fix mz config
    * fix config
    * fix config
    * fix(pu): fix r2d2
    * fix(pu): fix ppo-onpolicy-rnd adv bug
    * fix(puyuan): fix r2d2
    * feature(puyuan): add minigrid r2d2 config
    * polish minigrid config
    * dev-ppo-onpolicy-rnd
    * fix(pu): fix rnd reward normalize bug
    * feature(pu): add minigrid fourrooms and doorkey env info
    * feature(pu): add serial_entry_onpolicy
    * fix(pu): fix config params of onpolicy ppo
    * feature(pu): add obs normalization
    * polish(pu): polish rnd intrinsic reward normalization
    * fix(pu): fix clear data bug
    * test(pu): add off-policy ppo config
    * polish(pu): polish minigrid onppo-rnd config
    * polish(pu): polish rnd reward model and minigrid config for rnd_onppo
    * polish(pu): polish minigrid rnd_onppo config
    * feature(pu): add gym-minigrid
    * fix(pu): fix ISerialEvaluator bug
    * fix(pu): fix cuda device compatibility
    * fix(pu): fix MiniGrid-ObstructedMaze-2Dlh-v0 env_id bug
    * polish(pu): squash rnd intrinsic reward to [0,1] according to the batch min and max
    * style(pu): yapf format
    * polish(pu): polish pitfall offppo config
    * polish(pu): polish rnd-onppo and onppo config
    * polish(pu): polish config and weight last reward
    * polish(pu): polish rnd-onppo config
    * fix(pu): fix mujoco onppo config
    * fix(pu): fix continuous version of dict_data_split_traj_and_compute_adv
    * polish(pu): polish config
    * fix(pu): add key traj_flag in data to split traj correctly when ignore_done is True in halfcheetah
    * polish(pu): polish annotation
    * polish(pu): withdraw files submitted wrongly
    * polish(pu): withdraw files deleted wrongly
    * polish(pu): polish onppo config
    * fix(pu): fix remaining_traj_data recompute adv bug and polish rnd onppo code
    * style(pu): yapf format
    * polish(pu): polish gae_traj_flag function
    * polish(pu): delete redundant function in onppo
- 19 Nov, 2021: 1 commit
  - Committed by Davide Liu
    * added gail entry
    * added lunarlander and cartpole config
    * added gail mujoco config
    * added mujoco exp
    * update22-10
    * added third exp
    * added metric to evaluate policies
    * added GAIL entry and config for Cartpole and Walker2d
    * checked style and unittest
    * restored lunarlander env
    * style problems
    * bug correction
    * Delete expert_data_train.pkl
    * changed loss of GAIL
    * Update walker2d_ddpg_gail_config.py
    * changed gail reward from -D(s, a) to -log(D(s, a))
    * added small constant to reward function
    * added comment to clarify config
    * Update walker2d_ddpg_gail_config.py
    * added lunarlander entry + config
    * Added Atari discriminator + Pong entry config
    * Update gail_irl_model.py
    * Update gail_irl_model.py
    * added gail serial pipeline and onehot actions for gail atari
    * related to previous commit
    * removed main files
    * removed old comment
- 15 Nov, 2021: 1 commit
  - Committed by Jia Ruonan
    * commit bipedalwalker_ppo_config
    * commit bipedalwalker_sac_config
- 01 Nov, 2021: 1 commit
  - Committed by 蒲源
    * test rnd
    * fix mz config
    * fix config
    * feature(pu): fix r2d2, add beta to actor
    * feature(pu): add ngu-dev
    * fix(pu): fix r2d2
    * fix(puyuan): fix r2d2
    * feature(puyuan): add minigrid r2d2 config
    * polish minigrid config
    * dev-ngu
    * feature(pu): add action and reward as inputs of q network
    * feature(pu): add episodic reward model
    * feature(pu): add episodic reward model, modify r2d2 and collector for ngu
    * fix(pu): recover files that were changed by mistake
    * fix(pu): fix tblogger cnt bug
    * add_dqfd
    * Is_expert to is_expert
    * fix(pu): fix r2d2 bug
    * fix(pu): fix beta index to gamma bug
    * fix(pu): fix numerical stability problem
    * style(pu): flake8 format
    * fix(pu): fix rnd reward model train times
    * polish(pu): polish r2d2 reset problem
    * fix(pu): fix episodic reward normalize bug
    * polish(pu): polish config params and episodic_reward init value
    * modify according to the last comments
    * value_gamma; done; marginloss; sqil compatibility
    * feature(pu): add r2d3 algorithm and config of lunarlander and pong
    * fix(pu): fix demo path bug
    * fix(pu): fix cuda bug at function get_gae in adder.py
    * feature(pu): add pong r2d2 config
    * polish(pu): r2d2 uses the mixture priority, episodic_reward transforms to mean 0, std 1
    * polish(pu): polish r2d2 config
    * test(pu): test cuda compatibility of dqfd_nstep_td_error in r2d3
    * polish(pu): polish config
    * polish(pu): polish config and annotation
    * fix(pu): fix r2d2 target net update bug and done bug
    * polish(pu): polish pong r2d2 config and add montezuma r2d2 config
    * polish(pu): add some logs for debugging in r2d2
    * polish(pu): recover config deleted by mistake
    * fix(pu): fix r2d3 config of lunarlander and pong
    * fix(pu): fix the r2d2 bug in r2d3
    * fix(pu): fix r2d3 cpu device bug in function dqfd_nstep_td_error of td.py
    * fix(pu): fix n_sample bug in serial_entry_r2d3
    * polish(pu): polish minigrid r2d2 config
    * fix(pu): add info dict of fourrooms doorkey in minigrid_env
    * polish(pu): polish r2d2 config
    * fix(pu): fix expert policy collect traj bug, now we use the argmax_sample wrapper
    * fix(pu): fix r2d2 done and target update bug, polish config
    * fix(pu): fix null_padding transition obs to zeros
    * fix(pu): episodic_reward transform to [0,1]
    * fix(pu): fix the value_gamma bug
    * fix(pu): fix device bug in ngu_reward_model.py
    * fix(pu): fix null_padding problem in rnd and episodic reward model
    * polish(pu): polish config
    * fix(pu): use the deepcopy train_data to add bonus reward
    * polish(pu): add the operation of enlarging seq_length times to the last reward of the whole episode
    * fix(pu): fix the episode length 1 bug and weight intrinsic reward bug
    * feature(pu): add montezuma ngu config
    * fix(pu): fix lunarlander ngu unroll_len to 998 so that the sequence length is equal to the max step 1000
    * test(pu): episodic reward transforms to [0,1]
    * fix(pu): fix r2d3 one-step rnn init bug and add r2d2_collect_traj
    * fix(pu): fix r2d2_collect_traj.py
    * feature(pu): add pong_r2d3_r2d2expert_config
    * polish(pu): yapf format
    * polish(pu): fix td.py conflict
    * polish(pu): flake8 format
    * polish(pu): add lambda_one_step_td key in dqfd error
    * test(pu): set key lambda_one_step_td and lambda_supervised_loss as 0
    * style(pu): yapf format
    * style(pu): format
    * polish(nyz): fix ngu detailed compatibility error
    * fix(nyz): fix dqfd one_step td lambda bug
    * fix(pu): fix test_acer and test_rnd compatibility error
    Co-authored-by: Swain <niuyazhe314@outlook.com>
    Co-authored-by: Will_Nie <nieyunpengwill@hotmail.com>
- 31 Oct, 2021: 1 commit
  - Committed by niuyazhe
- 21 Oct, 2021: 1 commit
  - Committed by niuyazhe
- 19 Oct, 2021: 1 commit
  - Committed by Will-Nie
- 16 Oct, 2021: 1 commit
  - Committed by Will-Nie
    * add_dqfd
    * Is_expert to is_expert
    * modify according to the last comments
    * value_gamma; done; marginloss; sqil compatibility
    * finally shorten the code, revise config
    * revise config, style
    * add_readme/two_more_config
    * correct format
    Co-authored-by: niuyazhe <niuyazhe@sensetime.com>
- 15 Oct, 2021: 1 commit
  - Committed by niuyazhe
- 17 Sep, 2021: 1 commit
  - Committed by 蒲源
    * test rnd
    * fix mz config
    * fix config
    * fix(pu): fix r2d2
    * fix(puyuan): fix r2d2
    * feature(puyuan): add minigrid r2d2 config
    * polish minigrid config
    * modified as per review
    * fix(pu): fix bug for compatibility
    * polish(pu): add annotations and polish slice operation
    * style(pu): run format.sh
    * style(pu): correct yapf format
    * fix(pu): fix config
    * fix(pu): fix done slice bug and lstm reset bug
    * style(pu): format config
    * polish(pu): polish config params for cartpole, lunarlander and minigrid
    * polish(pu): polish minigrid config params
    * Update r2d2.py
    * polish(pu): polish rnn reset problem
    * fix(pu): fix merge error
    * polish(pu): polish cartpole config
    * polish(nyz): polish cartpole r2d2 config for faster convergence
    * test(nyz): enable r2d2 algotest
    Co-authored-by: niuyazhe <niuyazhe@sensetime.com>
- 08 Sep, 2021: 1 commit
  - Committed by Weiyuhong-1998
    * env-list
    * env-list-fix-grammar
    * env-only-test
    * modify-gif
    * modify-gif-pendulum
    * modify-gif-delete-maze
- 07 Sep, 2021: 1 commit
  - Committed by niuyazhe
- 06 Sep, 2021: 1 commit
  - Committed by 蒲源
    * test rnd
    * fix mz config
    * fix config
    * fix(pu): fix r2d2
    * feature(puyuan): add minigrid r2d2 config
    * polish minigrid config
    * modified as per review
    * fix(pu): fix bug for compatibility
    * polish(pu): add annotations and polish slice operation
    * style(pu): run format.sh
    * style(pu): correct yapf format
- 24 Aug, 2021: 1 commit
  - Committed by niuyazhe
- 23 Aug, 2021: 1 commit
  - Committed by niuyazhe
- 20 Aug, 2021: 1 commit
  - Committed by Will-Nie
    * add sqil
    * conceal all the personal info
    * revise according to the comments
    * correct_format
    * add comments to hard-coded part
    * pass flake8
    * add force_reproducibility = True; device, ex_model
    * check format
- 11 Aug, 2021: 2 commits
- 10 Aug, 2021: 1 commit
  - Committed by garyzhang99
    * init runnable ppo
    * init overcooked env
    * overcooked ppo in place
    * runnable ppo with shaped rewards
    * modified config
    * feature(nyz): modify win rate calculation with draws
    * remove redundant code, modified baseline model
    * Update __init__.py
    * Update config.py
    * modify temp_config_file.close() position in config.py to work on Windows
    * remove redundant comments and rename files
    * fix name bug and use namedlist
    * add simple readme and remove redundant comments from copies
    * resolve threads
    * remove debug comments
    Co-authored-by: niuyazhe <niuyazhe314@outlook.com>
- 03 Aug, 2021: 1 commit
  - Committed by niuyazhe
- 01 Aug, 2021: 1 commit
  - Committed by simonat2011
    * add enduro env config; add enduro's ppo, dqn, drdqn, rainbow and impala configs
    * modified as the reviewer mentioned
    * add qacd network
    * fix bugs
    * fix bugs
    * update acer algorithm
    * update ACER code
    * update acer config
    * fix bug
    * update pong acer's config
    * edit commit
    * update code as mentioned
    * fix the comment table and trust region
    * fix format
    * fix typing lint
    * fix format, flake8
    * fix format
    * fix whitespace problem
    * test(nyz): add acer unittest and algotest
    * style(nyz): correct flake8 style
    Co-authored-by: shenziju <simonshen2011@foxmail.com>
    Co-authored-by: Swain <niuyazhe314@outlook.com>
- 08 Jul, 2021: 1 commit
  - Committed by niuyazhe