[DRAFT] Tentative implementation of MiCS (#2964)
* include mics config and optimizer
* change private vars to public vars so the child class can initialize these vars
* Port the init function from stage3
* adding a model test file for mics
* adapt to get_accelerator API and fp16 group defrag
* WIP: porting mics modification to ms master
* WIP: included gradient all-reduce among replication groups
* WIP: ported hierarchical all gather part; did basic loss test on a simple MLP model
* [Bug fix] using the comm group attached on the param
* torch2.0 support
* remove print
* delegate wait op
* [Bug] fix naming
* adding doc string
* resolving recursive import
* fix formatting, typo and license
* fix license and unit test error

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-14-191.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-7-70.us-west-2.compute.internal>
Co-authored-by: Zhen Zhang <zhzhn@amazon.com>
Co-authored-by: zhzhn <zhzhn@ip-10-2-57-114.us-west-2.compute.internal>
Showing
deepspeed/runtime/zero/mics.py
0 → 100755