- 29 Jun, 2021 1 commit
-
-
Committed by Stas Bekman
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 26 Jun, 2021 1 commit
-
-
Committed by Stas Bekman
* undo noise
* another
-
- 24 Jun, 2021 2 commits
-
-
Committed by Stas Bekman
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Stas Bekman
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 17 Jun, 2021 1 commit
-
-
Committed by Samyam Rajbhandari
* largest_partitioned_params calculation fix: the largest partitioned params value was being calculated incorrectly
* Update stage3.py
* Update stage3.py
* formatting fix
* changing sub-group size default to 1e9
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 09 Jun, 2021 1 commit
-
-
Committed by Stas Bekman
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 08 Jun, 2021 1 commit
-
-
Committed by Stas Bekman
* fix missed subclassed partitioning bug
* fix on exit
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 22 May, 2021 1 commit
-
-
Committed by Meng, Peng
* fix Reduce Scatter default value
* Update constants.py
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 21 May, 2021 1 commit
-
-
Committed by Olatunji Ruwase
* Align fp16 param swap buffers
* Integrating swap buffer manager for fp16 params
* Support swapping misaligned fp16 parameters
* Support swap into unaligned fp16 buffer
-
- 20 May, 2021 1 commit
-
-
Committed by Jeff Rasley
-
- 19 May, 2021 1 commit
-
-
Committed by Olatunji Ruwase
* Align fp16 param swap buffers
* Integrating swap buffer manager for fp16 params
* Support swapping misaligned fp16 parameters
-
- 16 May, 2021 1 commit
-
-
Committed by Olatunji Ruwase
* Round robin partitioning to improve ZeRO-2 Offload CPU copy
* Formatting fixes
* Fix index issues in debug dumps
* Remove debug prints
* Code cleanup
* Remove unintended stage3.py changes
* Add TODO
-
- 14 May, 2021 2 commits
-
-
Committed by Olatunji Ruwase
-
Committed by Stas Bekman
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 08 May, 2021 1 commit
-
-
Committed by Olatunji Ruwase
* Unused parameters assert should be disabled by default
* Fix message
* Invert assert logic in unit test
* Change option for ignoring unused parameters
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 01 May, 2021 2 commits
-
-
Committed by Sean Naren
* Add additional conditions when checking types of output from the model
* Add test
* Modify test to use torch.tensor as well
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Stas Bekman
-
- 30 Apr, 2021 2 commits
-
-
Committed by Olatunji Ruwase
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Samyam Rajbhandari
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 29 Apr, 2021 1 commit
-
-
Committed by Sean Naren
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 25 Apr, 2021 1 commit
-
-
Committed by hamlet
* Add find_unused_parameters option. Because unused parameters in a module may be unexpected, add an explicit error message when they occur and an option to suppress the error: https://github.com/microsoft/DeepSpeed/issues/707
* Add find_unused_parameters option. Because unused parameters in a module may be unexpected, add an explicit error message when they occur and an option to suppress the error: https://github.com/microsoft/DeepSpeed/issues/707
* Fix syntax error
* Fix yapf error
* Fix yapf error
* Fix yapf error
* Fix yapf error
* Move stage2 find_unused_parameters to config file
* Add stage2 find_unused_parameters
* Add stage2 find_unused_parameters
* Add stage2_find_unused_parameters option
* Change error msg to reflect zero_optimization config change
* Fix yapf error
* Fix yapf errors
* Change find_unused_parameters option name
* Change find_unused_parameters option name
* Change find_unused_parameters option name
* Change find_unused_parameters option name
* Change find_unused_parameters option name
* Add UnusedParametersModel for test option find_unused_parameters
* Add unit test for stage2 find_unused_parameters
* Add cpu-adam compatible check
* Remove dups import
* Trim spaces
* Fix yapf errors
* Trim spaces
* Add False Positive test check
* Fix find_unused_parameters test
* Trim spaces
* Fix yapf error
-
- 24 Apr, 2021 1 commit
-
-
Committed by Olatunji Ruwase
* Use amp autocast in ZeRO3 linear
* Fix typo
* Handle specific exceptions
* CI breaks on torch.distributed
* Add autocast unit test
* Format fixes
* Fix skip logic
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 23 Apr, 2021 1 commit
-
-
Committed by William Buchwalter
* Fix issue where `gradient_predivide_factor` was called as a function. `gradient_predivide_factor` is a `float`, so it must not be called. The call crashed when the `reduce_scatter` flag was set to `False`.
-
- 22 Apr, 2021 2 commits
-
-
Committed by Olatunji Ruwase
* Make reduce scatter optional for ZeRO-1 as workaround
* Make allreduce default for ZeRO 1
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Cheng Li
* use weird-shaped tensor to avoid silent failures when external params are not registered
* fix typo
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 21 Apr, 2021 2 commits
-
-
Committed by Stas Bekman
-
Committed by Sean Naren
* Add check to see if json file is already loaded
* Update doc
* Address review
* Remove doc comment
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 20 Apr, 2021 1 commit
-
-
Committed by Shaden Smith
* zinf tutorial
* more megatron integration docs
* ZInf + tiling docs
-
- 19 Apr, 2021 1 commit
-
-
Committed by Jeff Rasley
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
-
- 17 Apr, 2021 1 commit
-
-
Committed by Olatunji Ruwase
* Fix UnboundLocalError
* Get full partition size
-
- 15 Apr, 2021 1 commit
-
-
Committed by Stas Bekman
* faster flatten/unflatten with apex
* switch to cpp flatten/unflatten
* style
* better comment
* missing import
* switch to build ops at run time
* fixes
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 14 Apr, 2021 1 commit
-
-
Committed by Stas Bekman
-
- 08 Apr, 2021 4 commits
-
-
Committed by Samyam Rajbhandari
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Stas Bekman
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Stas Bekman
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Samyam Rajbhandari
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 02 Apr, 2021 1 commit
-
-
Committed by Stas Bekman
* zero.Init() clarification: clarify that if `model.half()` can't fit into GPU memory, `zero.Init()` is a must. This proposal is via @samyam's clarification shared elsewhere. Thank you.
* style
* add clarity
* style
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 27 Mar, 2021 2 commits
-
-
Committed by hamlet
* Fix zero stage2 cpu_offload when some trainable model parameters are skipped in training, as in https://github.com/microsoft/DeepSpeed/issues/707. When trainable parameters are skipped in training, their backward hooks in self.create_reduce_and_remove_grad_hooks() do not run, so they have no norm_for_param_grads
* Trim space
* Trim space
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Stas Bekman
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 17 Mar, 2021 1 commit
-
-
Committed by Olatunji Ruwase
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-