- 29 May 2020, 1 commit
-
-
Committed by Chunyang Wen
* fix: typo in code docs
* more Pythonic code
-
- 28 May 2020, 5 commits
-
-
Committed by Chunyang Wen
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
* add support for predivide as a flag
* add predivide json config; remove allgather_disable (it is no longer used)
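The predivide behavior can be sketched in plain Python; the function and flag names below are illustrative, not DeepSpeed's actual API. Pre-dividing each rank's gradients by the world size before the allreduce-style sum keeps intermediate values close to the final average, which reduces fp16 overflow risk compared with summing first and dividing afterwards.

```python
# Hypothetical sketch of gradient averaging with a predivide flag;
# names are illustrative, not DeepSpeed's actual API.
def average_gradients(rank_grads, predivide=True):
    world_size = len(rank_grads)
    if predivide:
        # divide first, then sum: partial sums stay near the final
        # average, so fp16 intermediates are less likely to overflow
        rank_grads = [[g / world_size for g in grads] for grads in rank_grads]
        return [sum(gs) for gs in zip(*rank_grads)]
    # sum first, then divide: the intermediate sum can be
    # world_size times larger than the final average
    summed = [sum(gs) for gs in zip(*rank_grads)]
    return [g / world_size for g in summed]

# two ranks, two gradient values each; both orders agree numerically here
grads = [[2.0, 4.0], [6.0, 8.0]]
assert average_gradients(grads, predivide=True) == [4.0, 6.0]
assert average_gradients(grads, predivide=False) == [4.0, 6.0]
```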
-
Committed by Samyam Rajbhandari
Contiguous gradients should be set to false by default. It's not useful unless the model is very large.
-
Committed by Samyam Rajbhandari
Fix for a CPU memory bloating issue caused by PyTorch backward-graph creation in allgather. Fixed by calling detach on tensors before calling all_gather.
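The fix can be sketched in a few lines of PyTorch; the helper name below is hypothetical, not the actual DeepSpeed code. Detaching a tensor before gathering keeps the gathered copies off the autograd graph, so backward-graph bookkeeping stops accumulating with every all_gather.

```python
import torch

# Hypothetical sketch, not the actual DeepSpeed fix: detach partitions
# before gathering so the copies carry no autograd history.
def prepare_for_allgather(partitions):
    # detach() shares storage with the original tensor but records no
    # history, so downstream collectives do not extend the backward graph
    return [p.detach() for p in partitions]

x = torch.ones(4, requires_grad=True)
safe = prepare_for_allgather([x])
assert not safe[0].requires_grad
assert x.requires_grad  # the original tensor still participates in autograd
```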
-
- 27 May 2020, 2 commits
-
-
Committed by Jeff Rasley
* updates to support fp32 grad clipping and disable max_grad_norm
-
Committed by Shaden Smith
-
- 25 May 2020, 1 commit
-
-
Committed by Chunyang Wen
-
- 22 May 2020, 2 commits
-
-
Committed by Shaden Smith
-
Committed by Shaden Smith
-
- 21 May 2020, 1 commit
-
-
Committed by Jeff Rasley
-
- 19 May 2020, 7 commits
-
-
Committed by Shaden Smith
-
Committed by Shaden Smith
-
Committed by Shaden Smith
* BERT title
-
Committed by Shaden Smith
-
Committed by Shaden Smith
-
Committed by Jeff Rasley
Updates for ZeRO stage 2 + ZeRO stage 1 w. RS
Co-authored-by: Tunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: Elton Zheng <eltonz@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: yuxionghe <yuxhe@microsoft.com>
Co-authored-by: Arash Ashari <arashari@microsoft.com>
-
Committed by Arash Ashari
* adding Bing SQuAD e2e test
* updating the draft test; bring the final step under the try section
* finalizing test for base DeepSpeed and DeepSpeed with ZeRO
* applying the comment (thanks Jeff); fixed formatting
-
- 15 May 2020, 1 commit
-
-
Committed by Shaden Smith
-
- 14 May 2020, 1 commit
-
-
Committed by Jeff Rasley
-
- 13 May 2020, 1 commit
-
-
Committed by Shaden Smith
-
- 12 May 2020, 1 commit
-
-
Committed by Olatunji Ruwase
* Support dynamic loss scale args in fp16 optimizers
* Update names
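Dynamic loss scaling of the kind these args configure can be sketched in plain Python; the class and parameter names below are illustrative, not the fp16 optimizer's actual interface. The scale is halved whenever an overflow is detected and raised again after a window of clean steps.

```python
# Illustrative dynamic loss scaler; names are hypothetical, not the
# actual fp16 optimizer interface.
class DynamicLossScaler:
    def __init__(self, init_scale=2.0**16, scale_factor=2.0, scale_window=1000):
        self.scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window
        self.good_steps = 0

    def update(self, overflow):
        if overflow:
            # gradients overflowed: shrink the scale (the step is skipped)
            self.scale = max(self.scale / self.scale_factor, 1.0)
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps % self.scale_window == 0:
                # a full window of clean steps: try a larger scale
                self.scale *= self.scale_factor

scaler = DynamicLossScaler(init_scale=8.0, scale_window=2)
scaler.update(overflow=True)   # overflow: 8.0 -> 4.0
scaler.update(overflow=False)
scaler.update(overflow=False)  # two clean steps: 4.0 -> 8.0
assert scaler.scale == 8.0
```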
-
- 07 May 2020, 1 commit
-
-
Committed by Shaden Smith
-
- 06 May 2020, 2 commits
-
-
Committed by Shaden Smith
-
Committed by Jeff Rasley
* add basic post-install test
-
- 05 May 2020, 1 commit
-
-
Committed by Jeff Rasley
-
- 01 May 2020, 2 commits
-
-
Committed by Jeff Rasley
* update apex version to the Feb 5th commit
* use gradient clipping instead of max grad norm in tests
* add a warning when the user provides max_grad_norm
* update examples commit
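Clipping by global gradient norm, as used in the tests in place of max_grad_norm, can be sketched without any framework; the helper below is an illustrative version of the usual recipe, not DeepSpeed's actual implementation.

```python
import math

# Illustrative clip-by-global-norm over plain lists of gradient values;
# mirrors the standard recipe, not DeepSpeed's actual implementation.
def clip_grad_norm(grads, max_norm):
    # global L2 norm across all gradient values
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        # rescale uniformly so the global norm equals max_norm,
        # preserving the gradient direction
        coef = max_norm / total_norm
        grads = [g * coef for g in grads]
    return grads, total_norm

grads, norm = clip_grad_norm([3.0, 4.0], max_norm=1.0)
assert norm == 5.0                                   # pre-clip norm
assert abs(sum(g * g for g in grads) - 1.0) < 1e-9   # clipped to unit norm
```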
-
Committed by Jeff Rasley
-
- 29 April 2020, 1 commit
-
-
Committed by Samyam Rajbhandari
1) CSR parameter names should end with .weight. 2) When using the basic optimizer directly, DeepSpeed should handle zero_grad. Letting the basic optimizer do the zero_grad resulted in residual gradients in the embedding layer for unknown reasons.
-
- 28 April 2020, 1 commit
-
-
Committed by Shaden Smith
-
- 25 April 2020, 2 commits
-
-
Committed by Jeff Rasley
Remove the explicit torch version requirement so that we can more easily support other versions.
-
Committed by Olatunji Ruwase
-
- 23 April 2020, 1 commit
-
-
Committed by Shaden Smith
-
- 22 April 2020, 1 commit
-
-
Committed by Shaden Smith
-
- 21 April 2020, 2 commits
-
-
Committed by Olatunji Ruwase
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
-
Committed by marload
-
- 16 April 2020, 1 commit
-
-
Committed by Jeff Rasley
-
- 12 April 2020, 1 commit
-
-
Committed by Samyam Rajbhandari
-
- 11 April 2020, 1 commit
-
-
Committed by Shaden Smith
-