- 01 May, 2021 3 commits
-
-
Committed by Cheng Li
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Stas Bekman
-
Committed by Stas Bekman
-
- 30 Apr, 2021 3 commits
-
-
Committed by Olatunji Ruwase
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Samyam Rajbhandari
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Reza Yazdani
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 29 Apr, 2021 2 commits
-
-
Committed by Stas Bekman
* support param groups
* terrible autoformatter
-
Committed by Sean Naren
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 28 Apr, 2021 1 commit
-
-
Committed by Cheng Li
Co-authored-by: Sean Naren <sean@grid.ai>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 27 Apr, 2021 1 commit
-
-
Committed by Stas Bekman
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 25 Apr, 2021 1 commit
-
-
Committed by hamlet
* Add find_unused_parameters option. Unused parameters in modules are sometimes unexpected, so add an explicit error message when they occur and an option to avoid the error: https://github.com/microsoft/DeepSpeed/issues/707
* Add find_unused_parameters option. Unused parameters in modules are sometimes unexpected, so add an explicit error message when they occur and an option to avoid the error: https://github.com/microsoft/DeepSpeed/issues/707
* Fix syntax error
* Fix yapf error
* Fix yapf error
* Fix yapf error
* Fix yapf error
* Move stage2 find_unused_parameters to config file
* Add stage2 find_unused_parameters
* Add stage2 find_unused_parameters
* Add stage2_find_unused_parameters option
* Change error msg to reflect zero_optimization config change
* Fix yapf error
* Fix yapf errors
* Change find_unused_parameters option name
* Change find_unused_parameters option name
* Change find_unused_parameters option name
* Change find_unused_parameters option name
* Change find_unused_parameters option name
* Add UnusedParametersModel for test option find_unused_parameters
* Add unit test for stage2 find_unused_parameters
* Add cpu-adam compatible check
* Remove duplicate imports
* Trim spaces
* Fix yapf errors
* Trim spaces
* Add False Positive test check
* Fix find_unused_parameters test
* Trim spaces
* Fix yapf error
-
- 24 Apr, 2021 2 commits
-
-
Committed by Olatunji Ruwase
* Add nvme unit/perf tests
* Minor tweaks/fixes
* Format fixes
* Address PR feedback
-
Committed by Olatunji Ruwase
* Use amp autocast in ZeRO3 linear
* Fix typo
* Handle specific exceptions
* CI breaks on torch.distributed
* Add autocast unit test
* Format fixes
* Fix skip logic
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 23 Apr, 2021 4 commits
-
-
Committed by Olatunji Ruwase
* Fix docstring
* Make screenshots clickable for easier viewing
* Navigation menu in alphabetical order; more clickable screenshots
* Rename 1Cycle doc
* Tweak naming
* Remove no longer used flag
* ZeRO3 Offload release
* Single GPU results
* Rearrange figures
* Single GPU text
* Tweak intro
* zero3-offload section
* Add asynchronous i/o docs
-
Committed by Stas Bekman
* `offload_param` was missing `pin_memory`
* Also moved the entry in `offload_optimizer` so that it sits in the same place.
-
Committed by William Buchwalter
* Fix issue where `gradient_predivide_factor` was called as a function. `gradient_predivide_factor` is a `float`, so it must not be called; the call crashes when the `reduce_scatter` flag is set to `False`.
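The failure mode described in this commit can be illustrated with a minimal sketch. It is not DeepSpeed's code: the class `EngineSketch` and the helpers `predivide_buggy` and `predivide_fixed` are hypothetical names chosen for illustration, and the only thing assumed from the commit message is that `gradient_predivide_factor` is a plain `float` attribute.

```python
class EngineSketch:
    """Hypothetical stand-in for an engine that stores the factor as a float."""

    def __init__(self, gradient_predivide_factor: float = 1.0):
        # Plain float attribute, not a method or property returning a callable.
        self.gradient_predivide_factor = gradient_predivide_factor


def predivide_buggy(engine: EngineSketch, grad: float) -> float:
    # Bug pattern: the float attribute is called like a function,
    # which raises "TypeError: 'float' object is not callable".
    return grad / engine.gradient_predivide_factor()


def predivide_fixed(engine: EngineSketch, grad: float) -> float:
    # Fix pattern: read the attribute directly.
    return grad / engine.gradient_predivide_factor
```

Because the faulty call only sits on one code path, an error of this kind stays hidden until that path runs, which matches the commit's note that the crash appears only when `reduce_scatter` is `False`.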
-
Committed by Olatunji Ruwase
-
- 22 Apr, 2021 3 commits
-
-
Committed by sdtblck
-
Committed by Olatunji Ruwase
* Make reduce scatter optional for ZeRO-1 as workaround
* Make allreduce default for ZeRO 1
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Cheng Li
* Use weird-shaped tensor to avoid silent failures when external params are not registered
* Fix typo
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 21 Apr, 2021 5 commits
-
-
Committed by Conglong Li
1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed.
Author: @conglongli, @awan-10, @samyam, Hanlin Tang, Yuxiong He
Paper: https://arxiv.org/abs/2104.06069
Co-authored-by: sdtblck <46172032+sdtblck@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Stas Bekman
-
Committed by Stas Bekman
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Sean Naren
* Add check to see if json file is already loaded
* Update doc
* Address review
* Remove doc comment
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Jeff Rasley
-
- 20 Apr, 2021 1 commit
-
-
Committed by Shaden Smith
-
- 19 Apr, 2021 2 commits
-
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
-
- 20 Apr, 2021 1 commit
-
-
Committed by Shaden Smith
* zinf tutorial
* more megatron integration docs
* ZInf + tiling docs
-
- 19 Apr, 2021 5 commits
-
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
-
Committed by Shaden Smith
* zinf tutorial
* more megatron integration docs
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
-
- 17 Apr, 2021 1 commit
-
-
Committed by Olatunji Ruwase
* Fix UnboundLocalError
* Get full partition size
-
- 15 Apr, 2021 3 commits
-
-
Committed by Cheng Li
* update lr scheduler doc for doing per step or epoch update
* work
* trigger build
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Stas Bekman
* faster flatten/unflatten with apex
* switch to cpp flatten/unflatten
* style
* better comment
* missing import
* switch to build ops at run time
* fixes
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Stas Bekman
-
- 14 Apr, 2021 2 commits
-
-
Committed by Stas Bekman
* e-notation for large floats
* handle ints too
* readability
* handle bool
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Stas Bekman
-