- 15 Jun, 2022 (2 commits)
-
-
Committed by Jeff Rasley
-
Committed by Conglong Li
-
- 14 Jun, 2022 (1 commit)
-
-
Committed by Quentin Anthony
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 11 Jun, 2022 (1 commit)
-
-
Committed by Ammar Ahmad Awan
Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 08 Jun, 2022 (2 commits)
-
-
Committed by Jeff Rasley
-
Committed by Jerry Mannil
Add '-S' argument to pdsh command to return the largest error code from the ssh sessions
-
- 07 Jun, 2022 (2 commits)
-
-
Committed by Michael Wyatt
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Michael Wyatt
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 01 Jun, 2022 (4 commits)
-
-
Committed by Cheng Li
-
Committed by Michael Wyatt
-
Committed by Michael Wyatt
* added unit test for various HF model families and tasks
* formatting
* added missing import
* fixed broken pytest global vars
* modified test to conform to other test structure
* removed gpt-j. it cannot run on V100s (OOM)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Reza Yazdani
-
- 26 May, 2022 (2 commits)
-
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
-
- 25 May, 2022 (1 commit)
-
-
Committed by Reza Yazdani
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 23 May, 2022 (1 commit)
-
-
Committed by Mikhail Druzhinin
* Fix do not updated sparse grads
* Remove call .data for sparse grads
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 21 May, 2022 (1 commit)
-
-
Committed by Quentin Anthony
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 20 May, 2022 (1 commit)
-
-
Committed by Jeff Rasley
-
- 19 May, 2022 (5 commits)
-
-
Committed by Jeff Rasley
-
Committed by Quentin Anthony
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by dependabot[bot]
Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.13.4 to 1.13.6. - [Release notes](https://github.com/sparklemotion/nokogiri/releases) - [Changelog](https://github.com/sparklemotion/nokogiri/blob/main/CHANGELOG.md) - [Commits](https://github.com/sparklemotion/nokogiri/compare/v1.13.4...v1.13.6) --- updated-dependencies: - dependency-name: nokogiri dependency-type: indirect ...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
Committed by liamcli
-
Committed by Quentin Anthony
-
- 18 May, 2022 (1 commit)
-
-
Committed by Reza Yazdani
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 16 May, 2022 (1 commit)
-
-
Committed by kisseternity
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 14 May, 2022 (1 commit)
-
-
Committed by Olatunji Ruwase
* DeepSpeed needs to start cleaning up
* Remove debug prints
-
- 12 May, 2022 (1 commit)
-
-
Committed by Jeff Rasley
-
- 10 May, 2022 (3 commits)
-
-
Committed by Stas Bekman
* [pipe] prevent deadlock with multiple evals sequence
* style
* style
* style
* align DSE commit w. latest master
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Shuai Zheng
* fix step in adam
* fix backward compatibility and add unittest
* add unittest
* fix unbounded error when there are more than 1 param groups
* fix typo
* remove trailing whitespace
* fix end of file
Co-authored-by: Shuai Zheng <shzheng@amazon.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Manuel R. Ciosici
-
- 07 May, 2022 (1 commit)
-
-
Committed by Stas Bekman
* GatheredParameters - accept any iterable
* torch tensor is an iterable, so can't use collections.abc.Iterable
* fix
-
- 06 May, 2022 (2 commits)
-
-
Committed by Jeff Rasley
-
Committed by Olatunji Ruwase
* Fix OOM and type mismatch
* Toggle prefetching
* Disable z3 prefetching for inference (temp workaround)
* Fix zero3 tracing issues
* Remove debug prints
* Enable prefetch for inference
* Code clarity
* Invalidate trace cache
* Trace cache invalidation when needed Separate nvme prefetch from all-gather prefetch
* Track last used step id
* Use debug name in error message
* Construct param trace from module trace
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 04 May, 2022 (1 commit)
-
-
Committed by Zhengqiang Yin
-
- 03 May, 2022 (1 commit)
-
-
Committed by Jeff Rasley
-
- 30 Apr, 2022 (1 commit)
-
-
Committed by kisseternity
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 29 Apr, 2022 (2 commits)
-
-
Committed by Olatunji Ruwase
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Ramya Ramineni
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 28 Apr, 2022 (2 commits)
-
-
Committed by Jeff Rasley
-
Committed by Michael Wyatt
-