- 10 Sep, 2020 1 commit
-
-
Committed by Ammar Ahmad Awan
* 1-bit adam (#353) Co-authored-by: Jeff Rasley <jerasley@microsoft.com> Co-authored-by: Your Name <you@example.com> Co-authored-by: tanghl1994 <htang14@ur.rochester.edu> Co-authored-by: Hank <tanghl1994@gmail.com> Co-authored-by: root <root@node2x12b.cs.rochester.edu> Co-authored-by: Ammar Ahmad Awan <awan.ammar@microsoft.com>
-
- 06 Sep, 2020 1 commit
-
-
Committed by Arash Ashari
-
- 02 Sep, 2020 1 commit
-
-
Committed by Jeff Rasley
* Sparse attn + ops/runtime refactor + v0.3.0 Co-authored-by: Arash Ashari <arashari@microsoft.com>
-
- 01 Sep, 2020 2 commits
-
-
Committed by Samyam Rajbhandari
* Adding gradient accumulation support for ZeRO Stage 2. Changing all Megatron-LM tests to also test gradient accumulation * Gradient Accumulation support for Stage 2. Model tests added to test the feature * formatting * Update deepspeed_light.py removing comment * Update ds_config_func_bs8_zero1.json reverting this file back. It's not needed for this PR * defining baseline prefix Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Samyam Rajbhandari
* Update deepspeed_checkpointing.py * formatting Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 19 Aug, 2020 1 commit
-
-
Committed by Jeff Rasley
* turn off multi-node launch if only 1 node
-
- 14 Aug, 2020 2 commits
-
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
* update fan out flag for pdsh
-
- 13 Aug, 2020 1 commit
-
-
Committed by Jeff Rasley
-
- 11 Aug, 2020 1 commit
-
-
Committed by Jeff Rasley
* add fix and tests for get_lr from lr_scheduler before training starts
-
- 08 Aug, 2020 1 commit
-
-
Committed by Shaden Smith
The parentheses alter the evaluation of the assert(), so it always evaluates to True.
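The pitfall this commit message describes can be sketched as follows. This is an illustration only, not the code changed in the commit; the function names and the condition are hypothetical. Writing `assert (cond, "msg")` asserts a two-element tuple, and a non-empty tuple is always truthy. The sketch builds the tuple on a separate line so the behaviour is visible without the `SyntaxWarning` that recent Python versions emit for the literal form.

```python
def buggy_check(value):
    """Hypothetical example: same effect as `assert (value > 0, "msg")`."""
    # The parentheses turn the condition and the message into one tuple.
    condition_and_message = (value > 0, "value must be positive")
    # A non-empty tuple is truthy, so this assertion can never fail.
    assert condition_and_message
    return "passed"


def fixed_check(value):
    """Hypothetical example: the intended form without parentheses."""
    # Condition and message are separate operands of the assert statement.
    assert value > 0, "value must be positive"
    return "passed"
```

With these definitions, `buggy_check(-1)` returns normally even though the condition is false, while `fixed_check(-1)` raises `AssertionError` (assuming Python is not run with `-O`, which disables assertions).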
-
- 01 Aug, 2020 1 commit
-
-
Committed by Emmanuel Kahembwe
The mpu object is bound to the class instance. The if statement uses `self.mpu`, but just `mpu` is called in the following lines. This raises a NameError.
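A minimal sketch of the kind of bug this message describes, under stated assumptions: the class `Engine`, the stand-in `FakeMpu`, and both method names are hypothetical and are not taken from the DeepSpeed source. The check goes through `self.mpu`, but the call uses a bare `mpu`, which is not defined in the method's scope.

```python
class FakeMpu:
    """Hypothetical stand-in for a model-parallel unit object."""

    def get_model_parallel_rank(self):
        return 0


class Engine:
    """Hypothetical class that stores the mpu object on the instance."""

    def __init__(self, mpu=None):
        self.mpu = mpu

    def buggy_rank(self):
        if self.mpu is not None:
            # Bug: `mpu` is neither a local nor a global name here,
            # so this line raises NameError when it runs.
            return mpu.get_model_parallel_rank()
        return None

    def fixed_rank(self):
        if self.mpu is not None:
            # Fix: use the attribute that the guard actually checked.
            return self.mpu.get_model_parallel_rank()
        return None
```

The error only appears when an mpu object is actually supplied: with `mpu=None` the guarded branch is skipped, so neither method fails.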
-
- 24 Jul, 2020 2 commits
-
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
* updates to amp to support grad clip and grad accumulation * zero grad using optimizer if in amp mode
-
- 23 Jul, 2020 1 commit
-
-
Committed by Olatunji Ruwase
* Avoid deadlock for unsynchronized non-zero checkpointing * Fix formatting issues Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 22 Jul, 2020 1 commit
-
-
Committed by Arash Ashari
-
- 16 Jul, 2020 2 commits
-
-
Committed by Jeff Rasley
* empty grad fix * add unit tests for empty grad
-
Committed by Olatunji Ruwase
-
- 14 Jul, 2020 1 commit
-
-
Committed by Olatunji Ruwase
* Support saving and loading ZeRO checkpoints on different data parallelism degree. * Fix formatting * Support checkpoint with varying GPU count in ZeRO stage 1 * Fix formatting * Formatting fixes * Update model tests * Remove pprint * Minor fix * Fix formatting * Update model tests Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 12 Jul, 2020 1 commit
-
-
Committed by Jeff Rasley
* add amp support for deepspeed (non-ZeRO) * tests for amp mode
-
- 07 Jul, 2020 1 commit
-
-
Committed by Olatunji Ruwase
* Load non-DeepSpeed checkpoints into ZeRO optimizer * Handle parameters smaller than DP * Formatting fixes * Handle empty partitions * Fix perf bug Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 24 Jun, 2020 1 commit
-
-
Committed by Olatunji Ruwase
* Load non-DeepSpeed checkpoints into ZeRO optimizer * Handle parameters smaller than DP * Formatting fixes
-
- 20 Jun, 2020 3 commits
-
-
Committed by Samyam Rajbhandari
* Removing handle_overflow debugging code in deepspeed_utils.py * Removing handle_overflow debugging code in deepspeed_zero_optimizer.py Removing unnecessary overflow handle code. Not sure why it was there in the first place.
-
Committed by Shaden Smith
This reverts commit 54c0267e.
-
Committed by Tunji Ruwase
-
- 18 Jun, 2020 1 commit
-
-
Committed by Jeff Rasley
-
- 12 Jun, 2020 1 commit
-
-
Committed by Chunyang Wen
-
- 09 Jun, 2020 1 commit
-
-
Committed by eltonzheng
-
- 06 Jun, 2020 1 commit
-
-
Committed by Olatunji Ruwase
* Debugging * Fix step() bug; Make step timing optional * Remove unnecessary changes * Format fixes * Replace list with scalar variable * Remove redundant code * Fix typo
-
- 05 Jun, 2020 2 commits
-
-
Committed by Vidush Vishwanath
-
Committed by Chunyang Wen
* Add log util * replace all occurrences of print and logging * address format * disable propagate to avoid duplicate log
-
- 04 Jun, 2020 1 commit
-
-
Committed by eltonzheng
-
- 30 May, 2020 1 commit
-
-
Committed by Jeff Rasley
* Transformer kernels release Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com> Co-authored-by: Elton Zheng <eltonz@microsoft.com> Co-authored-by: Reza Yazdani <reyazda@microsoft.com> Co-authored-by: RezaYazdaniAminabadi <44502768+RezaYazdaniAminabadi@users.noreply.github.com> Co-authored-by: Tunji Ruwase <olruwase@microsoft.com> Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com> Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com> Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 29 May, 2020 1 commit
-
-
Committed by Chunyang Wen
* fix: typo in code docs * more pythonic code
-
- 28 May, 2020 5 commits
-
-
Committed by Chunyang Wen
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com> Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
* add support for predivide as a flag * add predivide json config, remove allgather_disable (as it's not currently used anymore)
-
Committed by Samyam Rajbhandari
Contiguous Gradients should be set to false by default. It's not useful unless the model is very large.
-
Committed by Samyam Rajbhandari
* Fix for CPU memory bloating issue caused by PyTorch backward graph creation in allgather. Fixed by calling detach on tensors before calling all_gather
-
- 27 May, 2020 1 commit
-
-
Committed by Jeff Rasley
* updates to support fp32 grad clipping and disable max_grad_norm
-