- 08 Dec, 2020: 1 commit

Committed by Stas Bekman

- 05 Dec, 2020: 1 commit

Committed by Zhun
* 1) Register the layout as a module buffer so that it is saved to and loaded from checkpoints; 2) broadcast the layout at the beginning so that all processes use a consistent layout during distributed training.
* Add a docstring for the `max_seq_length` argument in `SparseSelfAttention`.

Co-authored-by: Zhun Liu <zhunliu@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

- 03 Dec, 2020: 5 commits

Committed by Stas Bekman

Committed by Jeff Rasley

Committed by Stas Bekman

Committed by Jeff Rasley

Committed by Stas Bekman
* [cifar tutorial] improve readability

- 02 Dec, 2020: 2 commits

Committed by Reza Yazdani
* track optimizer step in cpu-adam when loading checkpoint
* add warning/error message for updating optimizer step count
* resolve build issue
* support state update from the python side
* track step from python in all cases
* remove comma
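Why the step count has to survive a checkpoint round trip: Adam divides its moment estimates by the bias-correction terms `1 - beta1**t` and `1 - beta2**t`, so restoring the moments without the step count `t` changes the next update. The following is a minimal plain-Python sketch of that effect, not the CPU-Adam implementation; `AdamState`, `adam_update`, `save_state` and `load_state` are hypothetical names, and weight decay is omitted.

```python
import math
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class AdamState:
    """Hypothetical optimizer state; `step` feeds Adam's bias correction."""
    step: int = 0
    exp_avg: List[float] = field(default_factory=list)
    exp_avg_sq: List[float] = field(default_factory=list)


def adam_update(params: List[float], grads: List[float], state: AdamState,
                lr: float = 1e-3, beta1: float = 0.9, beta2: float = 0.999,
                eps: float = 1e-8) -> List[float]:
    """Apply one Adam step (no weight decay) and return the new parameters."""
    if not state.exp_avg:
        state.exp_avg = [0.0] * len(params)
        state.exp_avg_sq = [0.0] * len(params)
    state.step += 1
    bias1 = 1 - beta1 ** state.step  # bias correction depends on the step count
    bias2 = 1 - beta2 ** state.step
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state.exp_avg[i] = beta1 * state.exp_avg[i] + (1 - beta1) * g
        state.exp_avg_sq[i] = beta2 * state.exp_avg_sq[i] + (1 - beta2) * g * g
        m_hat = state.exp_avg[i] / bias1
        v_hat = state.exp_avg_sq[i] / bias2
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params


def save_state(state: AdamState) -> Dict[str, Any]:
    """Serialize the state, including the step count, as a checkpoint would."""
    return {"step": state.step, "exp_avg": list(state.exp_avg),
            "exp_avg_sq": list(state.exp_avg_sq)}


def load_state(ckpt: Dict[str, Any]) -> AdamState:
    """Restore the moments *and* the step count so bias correction resumes correctly."""
    return AdamState(step=int(ckpt["step"]), exp_avg=list(ckpt["exp_avg"]),
                     exp_avg_sq=list(ckpt["exp_avg_sq"]))
```

Resuming from `load_state(save_state(s))` produces the same next update as continuing without interruption; resetting `step` to 0 on load does not.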

Committed by Reza Yazdani
* support different hidden dimensions
* add support for larger hidden dimensions (greater than 8K)
* remove empty line
* add loop unrolling factor for dropout kernels
* update different kernels based on the reviews

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

- 28 Nov, 2020: 1 commit

Committed by Stas Bekman
This PR:
* fixes a misspelled method name
* also, `( () )` doesn't read well until one reads the code and understands that it's not a formatting bug, so I proposed simply saying that it's a callable object.

- 26 Nov, 2020: 4 commits

Committed by Jeff Rasley

Committed by Jeff Rasley

Committed by Jeff Rasley

Committed by Shaden Smith

- 23 Nov, 2020: 1 commit

Committed by Jeff Rasley

- 25 Nov, 2020: 6 commits

Committed by Jeff Rasley

Committed by Ammar Ahmad Awan
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Committed by Olatunji Ruwase
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Committed by Jeff Rasley

Committed by Jeff Rasley

Committed by Jeff Rasley

- 24 Nov, 2020: 1 commit

Committed by Samyam Rajbhandari
In the absence of a model parallel group, `model_parallel_allreduce` should not do any reduction. This commit fixes a bug where a model-parallel allreduce was performed across the world group when the model parallel group was `None`.
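The fixed behavior can be illustrated with a pure-Python simulation in which each rank's tensor is a list and process groups are lists of rank indices. The function names echo the commit message, but the signatures are hypothetical; real code would call `torch.distributed.all_reduce` with a process group.

```python
from typing import List, Optional


def allreduce_sum(values: List[List[float]], groups: List[List[int]]) -> List[List[float]]:
    """Simulate a sum-allreduce: every rank gets the element-wise sum over its group."""
    out = [list(v) for v in values]
    for group in groups:
        summed = [sum(col) for col in zip(*(values[r] for r in group))]
        for r in group:
            out[r] = list(summed)
    return out


def model_parallel_allreduce(values: List[List[float]],
                             mp_groups: Optional[List[List[int]]]) -> List[List[float]]:
    """Reduce within model-parallel groups; with no model-parallel group, do nothing."""
    if mp_groups is None:
        # No model parallelism: return the inputs unchanged instead of
        # reducing across the world group (the behavior the commit fixes).
        return [list(v) for v in values]
    return allreduce_sum(values, mp_groups)
```

The `None` check matters because in `torch.distributed`, passing `group=None` to a collective selects the default (world) group, so forwarding a missing model-parallel group unchanged is one way to end up reducing across every rank.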

- 23 Nov, 2020: 1 commit

Committed by Samyam Rajbhandari

- 22 Nov, 2020: 1 commit

Committed by Olatunji Ruwase

- 21 Nov, 2020: 1 commit

Committed by Olatunji Ruwase
* Use zero tensors for missing gradients to avoid a size mismatch
* Unit test for unbalanced gradients in ZeRO
* Formatting fixes
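A minimal sketch of why zero-filling helps: ranks flatten their gradients into contiguous buffers before reducing, and a rank whose parameter received no gradient would otherwise produce a shorter buffer. Plain Python lists stand in for tensors; `flatten_grads` and `average_across_ranks` are hypothetical helpers, not DeepSpeed functions.

```python
from typing import List, Optional


def flatten_grads(param_sizes: List[int],
                  grads: List[Optional[List[float]]]) -> List[float]:
    """Flatten per-parameter gradients into one buffer of a fixed total size."""
    flat: List[float] = []
    for size, grad in zip(param_sizes, grads):
        if grad is None:
            # No gradient on this rank (e.g., the parameter was unused in this
            # forward pass): pad with zeros so every rank's buffer matches.
            flat.extend([0.0] * size)
        else:
            if len(grad) != size:
                raise ValueError(f"expected {size} elements, got {len(grad)}")
            flat.extend(grad)
    return flat


def average_across_ranks(buffers: List[List[float]]) -> List[float]:
    """Element-wise mean, as a data-parallel gradient reduction would compute."""
    if len({len(b) for b in buffers}) != 1:
        raise ValueError("gradient buffers differ in size across ranks")
    return [sum(col) / len(buffers) for col in zip(*buffers)]
```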

- 20 Nov, 2020: 6 commits

Committed by Jeff Rasley

Committed by Ammar Ahmad Awan
* Use the AML (Azure Machine Learning) method to set environment variables instead of using mpi4py.

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
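The idea of reading rank information from the launcher's environment, rather than querying it through mpi4py, can be sketched as follows. This assumes the job is launched with OpenMPI, which exports `OMPI_COMM_WORLD_RANK`, `OMPI_COMM_WORLD_SIZE` and `OMPI_COMM_WORLD_LOCAL_RANK`. The commit does not describe how the master address is found on AML, so it is a parameter here. The function name and the port default are hypothetical.

```python
from typing import Dict, Mapping


def torch_dist_env_from_openmpi(env: Mapping[str, str], master_addr: str,
                                master_port: str = "29500") -> Dict[str, str]:
    """Translate OpenMPI's per-process variables into the names read by
    torch.distributed's env:// initialization (plus LOCAL_RANK)."""
    return {
        "RANK": env["OMPI_COMM_WORLD_RANK"],
        "WORLD_SIZE": env["OMPI_COMM_WORLD_SIZE"],
        "LOCAL_RANK": env["OMPI_COMM_WORLD_LOCAL_RANK"],
        "MASTER_ADDR": master_addr,  # assumption: discovered elsewhere
        "MASTER_PORT": master_port,  # assumption: a conventional default
    }
```

A launcher script might then call `os.environ.update(torch_dist_env_from_openmpi(os.environ, addr))` before `torch.distributed.init_process_group(backend="nccl", init_method="env://")`.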

Committed by Seunghwan Hong
* Add a guard against using `torch.version.cuda` in a no-CUDA environment.
* Fix several typos in setup.py.

Signed-off-by: Seunghwan Hong <seunghwan@scatterlab.co.kr>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
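On CPU-only PyTorch builds, `torch.version.cuda` is `None`, so code that parses it must check for that first. The sketch below shows such a guard as a hypothetical helper that takes the version string as an argument. The commit itself does not show the guard's code.

```python
from typing import Optional, Tuple


def cuda_major_minor(cuda_version: Optional[str]) -> Optional[Tuple[int, int]]:
    """Parse a CUDA version string such as "10.2"; return None without CUDA."""
    if cuda_version is None:
        # CPU-only PyTorch builds report torch.version.cuda as None,
        # so calling .split() on it would raise AttributeError.
        return None
    major, minor = cuda_version.split(".")[:2]
    return int(major), int(minor)
```

In a setup script, the argument would be `torch.version.cuda`.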

Committed by Jeff Rasley

Committed by Jeff Rasley

Committed by Jeff Rasley
* zero-1 memory fix
* auto-tune max elems per comm to reduce padding/comm intervals
* clean up and add previously missing reduction options
* fix testing backing to work with torch 1.7

- 19 Nov, 2020: 2 commits

Committed by Jeff Rasley

Committed by Jeff Rasley

- 18 Nov, 2020: 1 commit

Committed by Olatunji Ruwase
* Fix layout bug in ZeRO stage 1 checkpoint logic
* Add elastic checkpoint option for ZeRO stage 1, defaulting to True
* Format fixes
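An elastic checkpoint allows ZeRO stage 1 optimizer state saved at one data-parallel degree to be loaded at another. The sketch below illustrates only the re-partitioning idea on a single flat list. The helper names are hypothetical, and the sketch does not reflect DeepSpeed's checkpoint format. It assumes the unpadded element count is saved alongside the per-rank slices.

```python
from typing import List


def partition(flat: List[float], world_size: int) -> List[List[float]]:
    """Split a flat state buffer into equal per-rank slices, zero-padding the tail."""
    part_size = -(-len(flat) // world_size)  # ceiling division
    padded = flat + [0.0] * (part_size * world_size - len(flat))
    return [padded[r * part_size:(r + 1) * part_size] for r in range(world_size)]


def elastic_reload(saved_parts: List[List[float]], numel: int,
                   new_world_size: int) -> List[List[float]]:
    """Merge the saved slices, drop the padding, and re-split for a new world size."""
    merged = [x for part in saved_parts for x in part][:numel]
    return partition(merged, new_world_size)
```

Dropping the padding before re-splitting matters because the padding length depends on the world size the checkpoint was saved with.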

- 14 Nov, 2020: 2 commits

Committed by Jeff Rasley

Committed by Jeff Rasley
* remove cpu-feature
* remove psutils requirement

- 13 Nov, 2020: 3 commits

Committed by Shaden Smith
* Add torch install requirement to documentation
* Build ops documentation

Committed by Jeff Rasley
* on a CPU box, error gracefully if CUDA home doesn't exist
* guard against torch import issue
* fix syntax error
* fix import
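Failing gracefully when no CUDA installation exists can be sketched as below. The function names are hypothetical. The sketch checks the `CUDA_HOME` and `CUDA_PATH` environment variables, the same variables PyTorch's `torch.utils.cpp_extension` consults, and PyTorch's own detected `CUDA_HOME` there is `None` when nothing is found. The environment is passed in as a mapping so the logic is easy to test.

```python
import os
from typing import Mapping, Optional


def find_cuda_home(env: Mapping[str, str]) -> Optional[str]:
    """Return an existing CUDA install directory from CUDA_HOME/CUDA_PATH, else None."""
    path = env.get("CUDA_HOME") or env.get("CUDA_PATH")
    if path and os.path.isdir(path):
        return path
    return None


def require_cuda_home(env: Mapping[str, str]) -> str:
    """Fail with an actionable message instead of an obscure build error."""
    home = find_cuda_home(env)
    if home is None:
        raise RuntimeError(
            "CUDA home not found: set CUDA_HOME to build CUDA ops, "
            "or skip them on a CPU-only machine.")
    return home
```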

Committed by Jeff Rasley
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>

- 12 Nov, 2020: 1 commit

Committed by Samyam Rajbhandari
* Update zero.md: update the ZeRO tutorial to specify the use of activation checkpointing
* Update zero-offload.md: use activation checkpointing with ZeRO-Offload

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>