2022.01.01(v0.2.3)
- env: add multi-agent mujoco env (#146)
- env: add delay reward mujoco env (#145)
- env: fix port conflict in gym_soccer (#139)
- algo: MASAC algorithm (#112)
- algo: TREX algorithm (#119) (#144)
- algo: H-PPO hybrid action space algorithm (#140)
- algo: residual link in R2D2 (#150)
- algo: gumbel softmax (#169)
- algo: move actor_head_type to action_space field
- feature: new main pipeline and async/parallel framework (#142) (#166) (#168)
- feature: refactor buffer, separate algorithm and storage (#129)
- feature: cli in new pipeline (ditask) (#160)
- feature: add multiprocess tblogger, fix circular reference problem (#156)
- feature: add multiple seed cli
- feature: polish eps_greedy_multinomial_sample in model_wrapper (#154)
- fix: R2D3 abs priority problem (#158) (#161)
- fix: multi-discrete action space policies random action bug (#167)
- fix: doc generate bug with enum_tools (#155)
- style: more comments about R2D2 (#149)
- style: add doc about how to migrate a new env
- style: add doc about env tutorial in dizoo
- style: add conda auto release (#148)
- style: update zh doc link
- style: update kaggle tutorial link

2021.12.03(v0.2.2)
- env: apple key to door treasure env (#128)
- env: add bsuite memory benchmark (#138)
- env: polish atari impala config
- algo: Guided Cost IRL algorithm (#57)
- algo: ICM exploration algorithm (#41)
- algo: MP-DQN hybrid action space algorithm (#131)
- algo: add loss statistics and polish r2d3 pong config (#126)
- feature: add renew env mechanism in env manager and update timeout mechanism (#127) (#134)
- fix: async subprocess env manager reset bug (#137)
- fix: keepdims name bug in model wrapper
- fix: on-policy ppo value norm bug
- fix: GAE and RND unittest bug
- fix: hidden state wrapper h tensor compatibility
- fix: naive buffer auto config create bug
- style: add supporters list

2021.11.22(v0.2.1)
- env: gym-hybrid env (#86)
- env: gym-soccer (HFO) env (#94)
- env: Go-Bigger env baseline (#95)
- env: add the bipedalwalker config of sac and ppo (#121)
- algo: DQfD Imitation Learning algorithm (#48) (#98)
- algo: TD3BC offline RL algorithm (#88)
- algo: MBPO model-based RL algorithm (#113)
- algo: PADDPG hybrid action space algorithm (#109)
- algo: PDQN hybrid action space algorithm (#118)
- algo: fix R2D2 bugs and produce benchmark, add naive NGU (#40)
- algo: self-play training demo in slime_volley env (#23)
- algo: add example of GAIL entry + config for mujoco (#114)
- feature: enable arbitrary policy num in serial sample collector
- feature: add torch DataParallel for single-machine multi-GPU training (see the sketch after this list)
- feature: add registry force_overwrite argument
- feature: add naive buffer periodic thruput seconds argument
- test: add pure docker setting test (#103)
- test: add unittest for dataset and evaluator (#107)
- test: add unittest for on-policy algorithm (#92)
- test: add unittest for ppo and td (MARL case) (#89)
- test: polish collector benchmark test
- fix: target model wrapper hard reset bug
- fix: learn state_dict target model bug
- fix: ppo bugs and update atari ppo offpolicy config (#108)
- fix: pyyaml version bug (#99)
- fix: small fix on bsuite environment (#117)
- fix: discrete cql unittest bug
- fix: release workflow bug
- fix: base policy model state_dict overlap bug
- fix: remove on_policy option in dizoo config and entry
- fix: remove torch in env
- style: gym version > 0.20.0
- style: torch version >= 1.1.0, <= 1.10.0
- style: ale-py == 0.7.0
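The torch DataParallel item in v0.2.1 above refers to standard single-machine multi-GPU data parallelism. A minimal standalone sketch of torch.nn.DataParallel usage is given below; the model, batch sizes, and optimizer settings are illustrative placeholders, not DI-engine code:

    import torch
    import torch.nn as nn

    # Toy network; any nn.Module can be wrapped the same way.
    model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
    if torch.cuda.device_count() > 1:
        # Replicates the module on each visible GPU, splits the input batch
        # along dim 0, and gathers outputs back on the default device.
        model = nn.DataParallel(model)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    obs = torch.randn(256, 8, device=device)     # dummy observation batch
    target = torch.randn(256, 4, device=device)  # dummy regression target
    loss = nn.functional.mse_loss(model(obs), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

On a single GPU (or CPU) the wrapper is skipped and the code runs unchanged, which is why this pattern needs no configuration beyond the device count check.
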
2021.9.30(v0.2.0)
- env: overcooked env (#20)
- env: procgen env (#26)
- env: modified predator env (#30)
- env: d4rl env (#37)
- env: imagenet dataset (#27)
- env: bsuite env (#58)
- env: move atari_py to ale-py
- algo: SQIL algorithm (#25) (#44)
- algo: CQL algorithm (discrete/continuous) (#37) (#68)
- algo: MAPPO algorithm (#62)
- algo: WQMIX algorithm (#24)
- algo: D4PG algorithm (#76)
- algo: update multi-discrete policy (dqn, ppo, rainbow) (#51) (#72)
- feature: image classification training pipeline (#27)
- feature: add force_reproducibility option in subprocess env manager
- feature: add/delete/restart replicas via cli for k8s
- feature: add league metric (trueskill and elo) (#22)
- feature: add tb in naive buffer and modify tb in advanced buffer (#39)
- feature: add k8s launcher and di-orchestrator launcher, add related unittest (#45) (#49)
- feature: add hyper-parameter scheduler module (#38)
- feature: add plot function (#59)
- fix: acer bug and update atari result (#21)
- fix: mappo nan bug and dict obs cannot unsqueeze bug (#54)
- fix: r2d2 hidden state and obs arange bug (#36) (#52)
- fix: ppo bug when using dual_clip with adv > 0 (see the note at the end of this changelog)
- fix: qmix double_q hidden state bug
- fix: spawn context problem in interaction unittest (#69)
- fix: formatted config no eval bug (#53)
- fix: catch statements that will never succeed and system proxy bug (#71) (#79)
- fix: lunarlander config
- fix: c51 head dimension mismatch bug
- fix: mujoco config typo bug
- fix: ppg atari config bug
- fix: max use and priority update special branch bug in advanced_buffer
- style: add docker deploy in github workflow (#70) (#78) (#80)
- style: support PyTorch 1.9.0
- style: add algo/env list in README
- style: rename advanced_buffer register name to advanced

2021.8.3(v0.1.1)
- env: selfplay/league demo (#12)
- env: pybullet env (#16)
- env: minigrid env (#13)
- env: atari enduro config (#11)
- algo: on-policy PPO (#9)
- algo: ACER algorithm (#14)
- feature: polish experiment directory structure (#10)
- refactor: split doc to new repo (#4)
- fix: atari env info action space bug
- fix: env manager retry wrapper raise exception info bug
- fix: dist entry disable-flask-log typo
- style: codestyle optimization by lgtm (#7)
- style: code/comment statistics badge
- style: github CI workflow

2021.7.8(v0.1.0)
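Note on the PPO dual_clip fix in v0.2.0 above: dual-clip PPO (Ye et al., 2020) adds a second bound to the standard clipped surrogate, but only for negative advantages. A sketch of both objectives, where r_t(\theta) is the importance ratio and c > 1 is the dual-clip constant:

    % standard PPO clipped surrogate (to be maximized)
    L^{clip}(\theta) = \mathbb{E}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t\big)\Big]

    % dual-clip variant, used only when \hat{A}_t < 0
    L^{dual}(\theta) = \mathbb{E}_t\Big[\max\big(\min\big(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t\big),\ c\,\hat{A}_t\big)\Big]

For \hat{A}_t > 0 the extra max(...) bound would loosen rather than tighten the clip, which is presumably the condition the fix guards against.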