- 18 Dec 2020 (1 commit)
-
-
Committed by Reza Yazdani
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 16 Dec 2020 (2 commits)
-
-
Committed by Jeff Rasley
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
-
Committed by Stas Bekman
* [doc] xref to hostfile discussion: it wasn't clear where to find what was meant by `hostfile`, so this adds a link to where it's discussed.
* remove whitespace
-
- 15 Dec 2020 (1 commit)
-
-
Committed by Stas Bekman
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 12 Dec 2020 (5 commits)
-
-
Committed by Jeff Rasley
* Update launch.py
* formatting
-
Committed by carefree0910
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Stas Bekman
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Stas Bekman
* fix arch flags, add PTX
* bug fix
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Jeff Rasley
-
- 10 Dec 2020 (4 commits)
-
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
-
- 09 Dec 2020 (1 commit)
-
-
Committed by Shaden Smith
* Switch from deprecated allreduce interface.
* Make pipeline checkpoint files portable.
-
- 08 Dec 2020 (2 commits)
-
-
Committed by Stas Bekman
RTX-30 series are compute_86
```
python -c "import torch; print(torch.cuda.get_device_capability())"
```
This PR adds support for this compute capability.
Reference: https://developer.nvidia.com/cuda-gpus
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Stas Bekman
-
- 05 Dec 2020 (1 commit)
-
-
Committed by Zhun
* 1) Register the layout as a module buffer so that checkpoints can be saved and loaded; 2) broadcast the layout at the beginning so that all processes have a consistent layout during distributed training.
* Add docstring for the max_seq_length argument in SparseSelfAttention
Co-authored-by: Zhun Liu <zhunliu@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 03 Dec 2020 (5 commits)
-
-
Committed by Stas Bekman
-
Committed by Jeff Rasley
-
Committed by Stas Bekman
-
Committed by Jeff Rasley
-
Committed by Stas Bekman
* [cifar tutorial] improve readability
-
- 02 Dec 2020 (2 commits)
-
-
Committed by Reza Yazdani
* tracking optimizer step in cpu-adam when loading checkpoint
* add warning/error message for updating optimizer step count
* resolve build issue
* supporting state update from the python side
* track step from python in all cases
* remove comma
-
Committed by Reza Yazdani
* supporting different hidden dimensions
* add support for larger hidden dimensions (greater than 8K)
* remove empty line
* add loop unrolling factor for dropout kernels
* update different kernels based on the reviews
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 28 Nov 2020 (1 commit)
-
-
Committed by Stas Bekman
This PR:
* fixes a misspelled method name
* also, `( () )` doesn't read well until one reads the code and understands that it's not a formatting bug; I propose to simply say that it's a callable object.
-
- 26 Nov 2020 (4 commits)
-
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
-
Committed by Shaden Smith
-
- 23 Nov 2020 (1 commit)
-
-
Committed by Jeff Rasley
-
- 25 Nov 2020 (6 commits)
-
-
Committed by Jeff Rasley
-
Committed by Ammar Ahmad Awan
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Olatunji Ruwase
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
-
- 24 Nov 2020 (1 commit)
-
-
Committed by Samyam Rajbhandari
In the absence of a model parallel group, model_parallel_allreduce should not do any reduction. This commit fixes a bug that performed a model parallel allreduce across the world group when the model parallel group is None.
-
- 23 Nov 2020 (1 commit)
-
-
Committed by Samyam Rajbhandari
-
- 22 Nov 2020 (1 commit)
-
-
Committed by Olatunji Ruwase
-
- 21 Nov 2020 (1 commit)
-
-
Committed by Olatunji Ruwase
* Use zero-tensors for missing gradients to avoid size mismatch
* Unit test for unbalanced gradients in ZeRO
* Formatting fixes
-