feat(activation_checkpointing): add `non_reentrant_checkpoint` to support inputs require no grad (#4118)

* feat: add `non_reentrant_checkpoint`

* feat: add missing output postprocess and change the hook to record leaf forward tensor refs

* fix: make the multi_grad_hook registered after graph construction

* fix: backward compatibility for multi_tensor_hook

* fix: nonlocal reference error of deepspeed_saved_tensors

* fix: reduce repeating hook registration

* test: add test for `activation_checkpointing.checkpointing.non_reentrant_checkpoint`

* Pass correct node size for ZeRO++ (#4085)

* Pass correct node size

* formatting

---------

Co-authored-by: Connor Holmes <development@cmikeh2.me>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

* add deepspeed chat arxiv report (#4110)

* add deepspeed chat arxiv report

* add zeroquant v2 and fp

* add selective enhancement

* add ignore for 'Youn' in spell checker

---------

Co-authored-by: yaozhewei <zheweiy@berkeley.edu>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

* style: change flake8 detected style mismatch

* test: hack to clone the `test_activation_checkpointing` module for reuse and add regression tests

* doc: explain the introduction of `non_reentrant_checkpoint`

* doc: explain the test of `non_reentrant_checkpoint`

---------

Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
Co-authored-by: Connor Holmes <development@cmikeh2.me>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: yaozhewei <zheweiy@berkeley.edu>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>