DI-engine v0.2.1
API Change
- remove torch in all envs (numpy array is the basic data format in env)
- remove on_policy field in all the configs
- change eval_freq from 50 to 1000
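The first API change means envs exchange plain numpy arrays rather than torch tensors. A minimal sketch of the expected data format (the env class, observation shape, and values below are illustrative, not DI-engine's actual API):

```python
import numpy as np

# Illustrative gym-style env: observations are numpy arrays and rewards
# are plain Python floats -- no torch tensors cross the env boundary.
class NumpyEnv:
    def reset(self):
        return np.zeros(4, dtype=np.float32)  # obs as numpy array

    def step(self, action):
        obs = np.random.rand(4).astype(np.float32)
        reward = float(np.random.rand())
        done = False
        return obs, reward, done, {}

env = NumpyEnv()
obs = env.reset()
assert isinstance(obs, np.ndarray)  # numpy, not torch.Tensor
```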
Tutorial and Doc
Env (dizoo)
- gym-hybrid env (#86)
- gym-soccer (HFO) env (#94)
- Go-Bigger env baseline (#95)
- SAC and PPO config for bipedalwalker env (#121)
Algorithm
- DQfD Imitation Learning algorithm (#48) (#98)
- TD3BC offline RL algorithm (#88)
- MBPO model-based RL algorithm (#113)
- PADDPG hybrid action space algorithm (#109)
- PDQN hybrid action space algorithm (#118)
- fix R2D2 bugs and produce benchmark, add naive NGU (#40)
- self-play training demo in slime_volley env (#23)
- add example of GAIL entry + config for mujoco (#114)
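PADDPG and PDQN both target hybrid (parameterized) action spaces, where each action pairs a discrete action type with continuous parameters, as in gym-hybrid. A sketch of the idea (the dict layout and helper below are illustrative, not DI-engine's exact action format):

```python
import numpy as np

# A hybrid action: a discrete action type plus continuous parameters
# for that type (structure illustrative, not DI-engine's exact format).
hybrid_action = {
    'action_type': 1,                                        # discrete choice
    'action_args': np.array([0.5, -0.2], dtype=np.float32),  # its parameters
}

def is_valid(action, num_types=3, args_dim=2):
    """Check a hybrid action: type in range, args of the right shape."""
    return (0 <= action['action_type'] < num_types
            and action['action_args'].shape == (args_dim,))

assert is_valid(hybrid_action)
```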
Enhancement
- enable arbitrary policy num in serial sample collector
- add torch DataParallel for single-machine multi-GPU training
- add registry force_overwrite argument
- add naive buffer periodic throughput seconds argument
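The registry force_overwrite argument lets a later registration replace an earlier one under the same key instead of raising. A minimal sketch of the semantics (the Registry class and names are illustrative, not DI-engine's actual registry):

```python
class Registry:
    """Minimal name -> class registry with an opt-in overwrite flag."""

    def __init__(self):
        self._store = {}

    def register(self, name, cls, force_overwrite=False):
        # Duplicate names are an error unless overwriting is forced.
        if name in self._store and not force_overwrite:
            raise KeyError(f"'{name}' is already registered")
        self._store[name] = cls

    def get(self, name):
        return self._store[name]

reg = Registry()
reg.register('dqn', object)
reg.register('dqn', int, force_overwrite=True)  # replaces without error
assert reg.get('dqn') is int
```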
Fix
- target model wrapper hard reset bug
- learn state_dict target model bug
- ppo bugs and update atari ppo offpolicy config (#108)
- pyyaml version bug (#99)
- small fix on bsuite environment (#117)
- discrete cql unittest bug
- release workflow bug
- base policy model state_dict overlap bug
- remove on_policy option in dizoo config and entry
- remove torch in env
Test
- add pure docker setting test (#103)
- add unittest for dataset and evaluator (#107)
- add unittest for on-policy algorithm (#92)
- add unittest for ppo and td (MARL case) (#89)
Style
- gym version == 0.20.0
- torch version >= 1.1.0, <= 1.10.0
- ale-py == 0.7.0
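For convenience, the pins above can be expressed as requirements-style constraints (format only; the versions listed above are authoritative):

```text
gym==0.20.0
torch>=1.1.0,<=1.10.0
ale-py==0.7.0
```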
New Repo
- Go-Bigger: OpenDILab Multi-Agent Decision Intelligence Environment
- GoBigger-Challenge-2021: Basic code and description for GoBigger challenge 2021
Contributors: @PaParaZz1 @puyuan1996 @Will-Nie @YinminZhang @Weiyuhong-1998 @LikeJulia @sailxjx @davide97l @jayyoung0802 @lichuminglcm @yifan123 @RobinC94 @zjowowen