- 13 Nov, 2021 1 commit
-
-
Committed by Olatunji Ruwase
-
- 12 Nov, 2021 6 commits
-
-
Committed by Baizhou Huang
* Add warmup_type arguments in WarmupLR and WarmupDecayLR
* Add warmup_type unit test
* Replace hardcoded constants with global vars
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
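As a rough illustration of what a `warmup_type` switch on a warmup scheduler can look like, here is a minimal sketch. The function name `warmup_factor`, the accepted values `"log"` and `"linear"`, and both formulas are assumptions made for this example; they are not taken from the commit and may differ from what WarmupLR and WarmupDecayLR actually do.

```python
import math


def warmup_factor(step, warmup_num_steps, warmup_type="log"):
    """Return a multiplier in [0, 1] for the learning rate during warmup.

    Hypothetical helper: the names and formulas are assumptions for
    illustration, not DeepSpeed's implementation.
    """
    if warmup_num_steps < 2:
        # log(1) == 0 would make the log schedule divide by zero.
        raise ValueError("warmup_num_steps must be at least 2")
    if step >= warmup_num_steps:
        # Warmup is over; use the full learning rate.
        return 1.0
    if warmup_type == "log":
        # Rises quickly at first, then flattens as it approaches 1.
        return math.log(step + 1) / math.log(warmup_num_steps)
    if warmup_type == "linear":
        # Rises at a constant rate from 0 to 1.
        return step / warmup_num_steps
    raise ValueError(f"unknown warmup_type: {warmup_type!r}")
```

Multiplying a target learning rate by this factor at each step gives the warmed-up rate; rejecting unknown values early keeps a misspelled option from silently falling back to a default.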
-
Committed by Reza Yazdani
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Reza Yazdani
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Conglong Li
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
-
- 11 Nov, 2021 1 commit
-
-
Committed by Olatunji Ruwase
-
- 10 Nov, 2021 1 commit
-
-
Committed by Reza Yazdani
* Fix the softmax masking when using triangular masking
* Move the TILE declaration outside of the SIMD loop
* Remove unrelated changes
* Fix Adagrad compile issue
-
- 09 Nov, 2021 3 commits
-
-
Committed by Chunyang Wen
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Jeff Rasley
-
Committed by Chunyang Wen
-
- 08 Nov, 2021 1 commit
-
-
Committed by Stas Bekman
* [docs] fix 404
  This PR fixes a few broken links
* fix 404
-
- 06 Nov, 2021 3 commits
-
-
Committed by Nathan Frey
Fix typos in Flops Profiler message
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by alexandremuzio
Co-authored-by: Alex Muzio <alferre@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Chunyang Wen
* Use fstr in launcher
* Fix wrong condition for word_info
* Fix typo
-
- 05 Nov, 2021 1 commit
-
-
Committed by Cheng Li
-
- 03 Nov, 2021 3 commits
-
-
Committed by Jeff Rasley
-
Committed by Alex Hedges
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Chunyang Wen
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 02 Nov, 2021 3 commits
-
-
Committed by Stas Bekman
This PR suggests a small improvement to code readability.

--------------------------

I was puzzling over this code: https://github.com/microsoft/DeepSpeed/blob/85ce85dd5f4b18c0019a5121b06900e3a2c3933b/deepspeed/runtime/pipe/module.py#L381-L385

I had no idea this construct existed. After reading up on it, it appears to be used incorrectly: the only point of a loop `else` clause is to use it together with `break`. It's explained here: https://docs.python.org/3/tutorial/controlflow.html#break-and-continue-statements-and-else-clauses-on-loops

So I'm proposing to remove the `else` clause and just run the code in its branch normally, since it *always* gets executed as there is no `break` statement. And the objective of this code is to always be run, if I understand it correctly. So let's make it loud and clear.

Here is a quick proof:

```
for i in []:
    print(i)
else:
    print("loop did not finish via break")  # runs!

for i in [0]:
    print(i)
else:
    print("loop did not finish via break")  # runs!

for i in [0]:
    print(i)
    break
else:
    print("loop did not finish via break")  # does not run
```

@tjruwase
-
Committed by Chunyang Wen
-
Committed by Rana Ali Amjad
* Changes for bfloat16 Zero2
* Cleaned up additional comments and debugging code
* Adapted fp16_master_weights_and_grads option to cover BF16
* Reverted fp16_master_weights_and_gradients extension to BFloat16 and minor cleanup
* Fixed formatting and variable naming errors recognized in testing
* Added relevant unit tests for bfloat16 with ZeRO-2
* Updated conditions for skipping BFloat16 unit tests
* Added check for NCCL inconsistent version naming convention
* Updated skip message for BFloat16 tests to mention additional checks
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 01 Nov, 2021 1 commit
-
-
Committed by Reza Yazdani
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 31 Oct, 2021 1 commit
-
-
Committed by Zhen Zhang
* remove norm(), avoid memcpy after allgather
  1) Removing the norm computation in debug printing
  2) Changing _all_gather to be a sync op in fetch_sub_module. Reason: the async version is not async at all, because each all_gather calls torch.cuda.synchronize() to guarantee that the previous communication op has completed
  3) Adding new function _allgather_params_split_launch. The existing _allgather_params has an explicit memcpy after the all-gather op. We can avoid the explicit memory copy on the Python side to improve performance.
  Known issue: `torch.distributed.all_gather` will do an implicit memcpy at the end of each `ncclAllgather`.
* WIP: wrapped ncclAllgather as customized op in DS
  Micro benchmark shows that the improvement for the allgather of a transformer layer with 9834560 elements in half precision is about 1.1ms on an aws-p4d instance.
* WIP: integrated into partition_parameters
  Performance improvement of 5.1B bert on aws-p4d: fwd: 300ms -> 200ms, bwd: 680ms -> 610ms
* Fix format
* cleaned dead code, modified unit test
* removed customized c++ extension, reverted back to using the torch distributed API
* change torch.ones to torch.empty
* typo
* warn if not cuda tensor for allgather
* fix formatting
* fix: move ds_tensor to cuda device, but it is strange that the ds_tensor hasn't been moved to cuda
* remove try clause on the path for fetching params
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 30 Oct, 2021 6 commits
-
-
Committed by Conglong Li
* update CL doc
* doc fix
-
Committed by Jeff Rasley
-
Committed by Reza Yazdani
-
Committed by Olatunji Ruwase
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Anes Benmerzoug
* Add regression test for onecyclelr zerodivision error
* Wrap computation of lr and mom decay factors in try except block
  This handles the case when decay_step_size is set to zero, which is the default case, and prevents a zero division error
* Use boolean attributes instead of try/except block
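The last two bullets describe swapping a try/except around the division for a flag computed once. A minimal sketch of that pattern follows, assuming a simplified schedule; the class `DecaySchedule`, its attribute `skip_decay` and the decay formula are hypothetical and do not reproduce DeepSpeed's OneCycle scheduler.

```python
class DecaySchedule:
    """Hypothetical stand-in for a scheduler with an optional decay phase."""

    def __init__(self, base_lr, decay_rate, decay_step_size=0):
        self.base_lr = base_lr
        self.decay_rate = decay_rate
        self.decay_step_size = decay_step_size
        # Decide once whether decay is enabled, instead of catching
        # ZeroDivisionError every time the learning rate is computed.
        self.skip_decay = decay_step_size == 0

    def lr_at(self, step):
        if self.skip_decay:
            # decay_step_size == 0 means "no decay": never divide by it.
            return self.base_lr
        decay_interval = step / self.decay_step_size
        return self.base_lr / (1 + self.decay_rate * decay_interval)
```

Compared with a try/except, the flag makes the "decay disabled" case explicit and cannot hide an unrelated `ZeroDivisionError` raised elsewhere in the computation.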
-
Committed by Manuel R. Ciosici
-
- 29 Oct, 2021 1 commit
-
-
Committed by Ammar Ahmad Awan
* Add a flag to enable/disable token dropping in moe/top-1 gating.
* Fix syntax and formatting.
-
- 28 Oct, 2021 2 commits
-
-
Committed by Olatunji Ruwase
* Synchronize folder creation; Single latest file writer
* Address PR feedback
-
-
- 27 Oct, 2021 2 commits
-
-
Committed by Mikhail Druzhinin
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Wenhao Hu
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
-
- 23 Oct, 2021 1 commit
-
-
Committed by Ammar Ahmad Awan
* Add unit test to check moe+zero checkpoints
* Fix zero stage2 checkpoint loading logic to deal with expert-related state dicts.
-
- 22 Oct, 2021 3 commits
-
-
Committed by Conglong Li
-
Committed by Conglong Li
* fix pp
* better fix
-
Committed by Olatunji Ruwase
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-