- 22 May, 2021 1 commit
-
-
Committed by Meng, Peng
* fix Reduce Scatter default value
* Update constants.py
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 21 May, 2021 1 commit
-
-
Committed by Olatunji Ruwase
* Align fp16 param swap buffers
* Integrating swap buffer manager for fp16 params
* Support swapping misaligned fp16 parameters
* Support swap into unaligned fp16 buffer
-
- 20 May, 2021 3 commits
-
-
Committed by Mark
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by dependabot[bot]
Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.11.0 to 1.11.4.
- [Release notes](https://github.com/sparklemotion/nokogiri/releases)
- [Changelog](https://github.com/sparklemotion/nokogiri/blob/main/CHANGELOG.md)
- [Commits](https://github.com/sparklemotion/nokogiri/compare/v1.11.0...v1.11.4)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
Committed by Jeff Rasley
-
- 19 May, 2021 1 commit
-
-
Committed by Olatunji Ruwase
* Align fp16 param swap buffers
* Integrating swap buffer manager for fp16 params
* Support swapping misaligned fp16 parameters
-
- 16 May, 2021 1 commit
-
-
Committed by Olatunji Ruwase
* Round robin partitioning to improve ZeRO-2 Offload CPU copy
* Formatting fixes
* Fix index issues in debug dumps
* Remove debug prints
* Code cleanup
* Remove unintended stage3.py changes
* Add TODO
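The commit above mentions round-robin partitioning. As a generic illustration of what round-robin assignment means (this is not DeepSpeed's implementation; the helper name `round_robin_partition` and the sample item names are hypothetical), a minimal sketch:

```python
def round_robin_partition(items, num_partitions):
    """Hypothetical helper: deal items out to partitions in turn."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    partitions = [[] for _ in range(num_partitions)]
    for index, item in enumerate(items):
        # Item i goes to partition i mod num_partitions.
        partitions[index % num_partitions].append(item)
    return partitions


# Illustrative placeholder names standing in for parameter tensors.
tensors = ["t0", "t1", "t2", "t3", "t4"]
parts = round_robin_partition(tensors, 2)
```

With two partitions, consecutive items alternate between them, so neighbouring items never land in the same partition.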
-
- 14 May, 2021 5 commits
-
-
Committed by Shaden Smith
* is not -> !=
* Use pytest-randomly to seed unit tests.
-
Committed by Olatunji Ruwase
-
Committed by Stas Bekman
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Stas Bekman
* [configure_distributed_model] improve assert
This PR changes the 2 asserts to actually print the names of the params that are wrong, e.g.:
```
fp16 is enabled but the following parameters have dtype that is not fp16: wav2vec2.masked_spec_embed
```
* style
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Olatunji Ruwase
* Fix docstring
* Make screenshots clickable for easier viewing
* Navigation menu in alphabetical order; More clickable screenshots
* Rename 1Cycle doc
* Tweak naming
* Remove no longer used flag
* ZeRO3 Offload release
* Single GPU results
* Rearrange figures
* Single GPU text
* tweak intro
* zero3-offload section
* Add asynchronous i/o docs
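The `configure_distributed_model` commit in this group describes asserts that name the offending parameters. A minimal sketch of that idea, assuming parameters are given as `(name, dtype_name)` pairs with dtypes written as plain strings; the function names are hypothetical and this is not the code from the PR:

```python
def non_fp16_param_names(named_dtypes):
    """Hypothetical helper: names of parameters whose dtype is not fp16."""
    return [name for name, dtype in named_dtypes if dtype != "float16"]


def assert_all_fp16(named_dtypes):
    """Fail with a message that lists every non-fp16 parameter by name."""
    bad = non_fp16_param_names(named_dtypes)
    assert not bad, (
        "fp16 is enabled but the following parameters have dtype that is "
        "not fp16: " + ", ".join(bad)
    )


# Illustrative input; only the second name comes from the commit message.
example = [
    ("encoder.weight", "float16"),
    ("wav2vec2.masked_spec_embed", "float32"),
]
offending = non_fp16_param_names(example)
```

Collecting the names first and asserting once means a single failure reports all mismatched parameters rather than only the first.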
-
- 13 May, 2021 2 commits
-
-
Committed by Cheng Li
* use the original function's name as the key to old_functions dict
* update profile output format
* print at global rank 0
* add flops calculation in bwd pass using time from ds timers
* improve aggregated profiling output to show all depths
* print samples/second
* update readme and examples
* update docs
* fix typo and reorder printing
* fix format
-
Committed by William Buchwalter
* rename train_step_batch_size to train_micro_batch_size_per_gpu
* clarify batch_size related doc
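The batch-size commit above refers to `train_micro_batch_size_per_gpu`. DeepSpeed's configuration documentation relates it to the overall batch size as micro batch size per GPU times gradient accumulation steps times number of GPUs. A small sketch of that arithmetic; the function name and the `world_size` argument are illustrative, not part of any DeepSpeed API:

```python
def effective_train_batch_size(train_micro_batch_size_per_gpu,
                               gradient_accumulation_steps,
                               world_size):
    """Illustrative helper: samples consumed per optimizer step overall."""
    return (train_micro_batch_size_per_gpu
            * gradient_accumulation_steps
            * world_size)


# 4 samples per GPU per micro-step, 8 accumulation steps, 2 GPUs.
total = effective_train_batch_size(4, 8, 2)
```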
-
- 11 May, 2021 1 commit
-
-
Committed by Olatunji Ruwase
* Validate env; remove write size from logs
* Performance scripts for auto-tuning/auto-generating aio params of deepspeed config.
* Formatting fixes
* Address feedback
-
- 08 May, 2021 1 commit
-
-
Committed by Olatunji Ruwase
* Unused parameters assert should be disabled by default
* Fix message
* Invert assert logic in unit test
* Change option for ignoring unused parameters
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 07 May, 2021 1 commit
-
-
Committed by Mark Saroufim
* Added community tutorials to README
* Update README.md
-
- 06 May, 2021 1 commit
-
-
Committed by Olatunji Ruwase
* NVMe intra-request validation should be on entire file
Optimizer swap buffer sizes should be aligned
* Add fix message for missing aio lib error.
-
- 05 May, 2021 1 commit
-
-
Committed by Jeff Rasley
Previously we bumped the release version and then pushed the release, so any unreleased commits in master were versioned as the previous release, which isn't correct. Now we push to PyPI first and then bump to the next version for unreleased commits.
-
- 04 May, 2021 2 commits
-
-
Committed by Stas Bekman
* fix assert
The current assert "Model must initialized in fp16 mode for ZeRO Stage 3." needs TLC - I rewrote it completely to match its cousin assert, so now we have 2 consistent matching asserts:
- f"fp16 is enabled but one or several model parameters have dtype that is not fp16"
- f"fp16 is not enabled but one or several model parameters have dtype of fp16"
* remove f
-
Committed by janEbert
Fix #1032
-
- 03 May, 2021 1 commit
-
-
Committed by Cheng Li
-
- 01 May, 2021 3 commits
-
-
Committed by Sean Naren
* Add additional conditions when checking types of output from the model
* Add test
* Modify test to use torch.tensor as well
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by dependabot[bot]
Bumps [rexml](https://github.com/ruby/rexml) from 3.2.4 to 3.2.5.
- [Release notes](https://github.com/ruby/rexml/releases)
- [Changelog](https://github.com/ruby/rexml/blob/master/NEWS.md)
- [Commits](https://github.com/ruby/rexml/compare/v3.2.4...v3.2.5)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Olatunji Ruwase
-
- 30 April, 2021 1 commit
-
-
Committed by Jeff Rasley
-
- 01 May, 2021 5 commits
-
-
Committed by Jiangang Zhu
Co-authored-by: Jiangang Zhu <jiangazh@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Stas Bekman
* make it easier to run tests
* cleanup
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Cheng Li
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Stas Bekman
-
Committed by Stas Bekman
-
- 30 April, 2021 3 commits
-
-
Committed by Olatunji Ruwase
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Samyam Rajbhandari
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Reza Yazdani
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 29 April, 2021 2 commits
-
-
Committed by Stas Bekman
* support param groups
* terrible autoformatter
-
Committed by Sean Naren
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 28 April, 2021 1 commit
-
-
Committed by Cheng Li
Co-authored-by: Sean Naren <sean@grid.ai>
Co-authored-by: Sean Naren <sean@grid.ai>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 27 April, 2021 1 commit
-
-
Committed by Stas Bekman
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 25 April, 2021 1 commit
-
-
Committed by hamlet
* Add find_unused_parameters option
As unused parameters in modules may sometimes be unexpected, add an explicit error message when they occur and an option to avoid the error: https://github.com/microsoft/DeepSpeed/issues/707
* Add find_unused_parameters option
As unused parameters in modules may sometimes be unexpected, add an explicit error message when they occur and an option to avoid the error: https://github.com/microsoft/DeepSpeed/issues/707
* Fix syntax error
* Fix yapf error
* Fix yapf error
* Fix yapf error
* Fix yapf error
* Move stage2 find_unused_parameters to config file
* Add stage2 find_unused_parameters
* Add stage2 find_unused_parameters
* Add stage2_find_unused_parameters option
* Change error msg to reflect zero_optimization config change
* Fix yapf error
* Fix yapf errors
* Change find_unused_parameters option name
* Change find_unused_parameters option name
* Change find_unused_parameters option name
* Change find_unused_parameters option name
* Change find_unused_parameters option name
* Add UnusedParametersModel for test option find_unused_parameters
* Add unit test for stage2 find_unused_parameters
* Add cpu-adam compatible check
* Remove dups import
* Trim spaces
* Fix yapf errors
* Trim spaces
* Add False Positive test check
* Fix find_unused_parameters test
* Trim spaces
* Fix yapf error
-
- 24 April, 2021 1 commit
-
-
Committed by Olatunji Ruwase
* Add nvme unit/perf tests
* Minor tweaks/fixes
* Format fixes
* Address PR feedback
-