[model weights] zero_to_fp32 multiple improvements (#1181)
* add live zero checkpoint to fp32 consolidation version
* some more docs
* zero2 model states use a different filename
* fix
* make debug mode cli configurable
* copy the script only on node 0 process 0
* validate that we have the right number of files
* revamp _get_zero_param_shapes, instrument with easier debug
* correct assertion
* rename API; add even simpler API
* style
* improve docs
* update the docs
* revert the unpartitioned_params detection and report as it's most likely persistent params
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Changed file: deepspeed/utils/zero_to_fp32.py (mode 100644 → 100755)
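
One item in the commit message, "validate that we have the right number of files", can be illustrated with a minimal sketch. This is not the code from `zero_to_fp32.py`. The helper names (`get_checkpoint_files`, `validate_file_count`, `make_dummy_checkpoint`), the `rank_XX_optim_states.pt` file naming, and the one-file-per-rank rule are all assumptions made for illustration.

```python
import glob
import os
import tempfile


def get_checkpoint_files(checkpoint_dir, glob_pattern):
    """Return the sorted list of files in checkpoint_dir matching glob_pattern."""
    files = sorted(glob.glob(os.path.join(checkpoint_dir, glob_pattern)))
    if not files:
        raise FileNotFoundError(
            f"no files matching '{glob_pattern}' found under '{checkpoint_dir}'")
    return files


def validate_file_count(files, expected_count):
    """Raise ValueError unless exactly expected_count files were found."""
    if len(files) != expected_count:
        raise ValueError(
            f"expected {expected_count} checkpoint files, found {len(files)}")
    return files


def make_dummy_checkpoint(num_ranks):
    """Create a temporary directory holding one empty placeholder file per rank."""
    tmp_dir = tempfile.mkdtemp()
    for rank in range(num_ranks):
        # Hypothetical file name; the real checkpoint layout may differ.
        path = os.path.join(tmp_dir, f"rank_{rank:02d}_optim_states.pt")
        open(path, "w").close()
    return tmp_dir
```

Failing early on a count mismatch means an incomplete copy of a checkpoint directory is reported before any consolidation work starts, rather than surfacing later as a shape or key error.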