- 26 Jul, 2022 4 commits
-
-
Committed by Alex Hedges
-
Committed by Jeff Rasley
-
Committed by Quentin Anthony
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Juan Villamizar
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
-
- 23 Jul, 2022 4 commits
-
-
Committed by Quentin Anthony
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by dependabot[bot]
Bumps [tzinfo](https://github.com/tzinfo/tzinfo) from 1.2.9 to 1.2.10.
- [Release notes](https://github.com/tzinfo/tzinfo/releases)
- [Changelog](https://github.com/tzinfo/tzinfo/blob/master/CHANGES.md)
- [Commits](https://github.com/tzinfo/tzinfo/compare/v1.2.9...v1.2.10)
---
updated-dependencies:
- dependency-name: tzinfo
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Ammar Ahmad Awan
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Ammar Ahmad Awan
-
- 22 Jul, 2022 3 commits
-
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: yaozhewei <zheweiy@berkeley.edu>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
-
Committed by Quentin Anthony
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
-
- 21 Jul, 2022 2 commits
-
-
Committed by Olatunji Ruwase
* unit test, remove exception, add notes
* Move param_shapes to model files
* Remove hard-coded constants
* Conditioned to zero optimizer
* Add zero checkpoint merging
* Print checkpoint version
* Reshape zero_* ckpt files
* Merge zero* files contraction
* Utils for 3D contraction reshaping
* Remove bogus import
* Support bf16_zero ckpts
* Add param slice mappings
* Load universal checkpoints
* Per group mappings from Stas
* Hack to load bf16 zero files
* Param attributes
* WIP
* Fix api bug
* Update lp with local/remote hp
* Disable vocab padding handling
* Update z2 checkpoint
* Remove debug prints
* Remove debug prints; Rebase unit test
* Add reshape assert
* Padding
* Typo
* Catch nonexistent checkpoint path
* Cleanup
* Restore checkpoint state comparisons
* Add torch version guards
* More precise avoidance of false positives.
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Conglong Li
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Xiaoxia (Shirley) Wu <94406484+xiaoxiawu-microsoft@users.noreply.github.com>
-
- 20 Jul, 2022 8 commits
-
-
Committed by Jeff Rasley
-
Committed by Jianfeng Liu
Thanks a lot for finding this issue and fixing it :)
Co-authored-by: Zhewei Yao <zheweiyao@gmail.com>
-
Committed by Michael Wyatt
* fix hard-coded rocm install path
* added fix for newest torch+rocm install
* added backup for not detecting rocm at all
-
Committed by Zhewei Yao
Co-authored-by: yaozhewei <zheweiy@berkeley.edu>
Co-authored-by: xiaoxiawu <yxiaoxiawu@microsoft.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: Xiaoxia (Shirley) Wu <94406484+xiaoxiawu-microsoft@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
-
Committed by Jeff Rasley
-
Committed by Michael Wyatt
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Aman Sanger
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Stas Bekman
* [ds-inference] checkpoint loading => tqdm
solve 2 issues:
- less noise using tqdm progress bar
- more informative - tell users how much to wait and how many shards to load
New way:
```
Loading 72 checkpoints: 12%|█▎ | 9/72 [01:12<08:39, 8.25s/it]
```
* write only from one process
* style
-
- 19 Jul, 2022 2 commits
-
-
Committed by Reza Yazdani
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
-
Committed by Jeff Rasley
-
- 14 Jul, 2022 4 commits
-
-
Committed by Siddharth Singh
* Shards expert parameter groups
* Do upscaling, optimizer and deletion of fp32 grads one-by-one on each parameter group in zero-2
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Zeyu
Co-authored-by: Cheng Li <pistasable@gmail.com>
-
Committed by Manuel R. Ciosici
Co-authored-by: Conglong Li <conglong.li@gmail.com>
-
Committed by Cheng Li
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 09 Jul, 2022 1 commit
-
-
Committed by Sam Ade Jacobs
* Add github username to CODEOWNERS
* add import torch.distributed to small model debugging test script
* Replace torch.dist with deepspeed.comm
-
- 08 Jul, 2022 1 commit
-
-
Committed by Siddharth Singh
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 07 Jul, 2022 4 commits
-
-
Committed by Alex Hedges
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
-
Committed by Jeff Rasley
-
Committed by kisseternity
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Jeff Rasley
-
- 30 Jun, 2022 2 commits
-
-
Committed by Quentin Anthony
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Alex Hedges
* Fix missing import in replace_module.py
* Change import from torch.distributed to deepspeed.comm
-
- 28 Jun, 2022 1 commit
-
-
Committed by Siddharth Singh
-
- 24 Jun, 2022 1 commit
-
-
Committed by Michael Wyatt
* assert no FP16 with AMD CPUs
* add unit test for AMD assert error
* missing import
* downgrade assert to warning
-
- 23 Jun, 2022 3 commits
-
-
Committed by Reza Yazdani
* Fix the half-precision version of CPU-Adam
* remove unexpected return
* fix the increase width (fp32/fp16)
* support fp16 tests for cpu-adam
* fix the fp16 data-loading
* change unit-test for fp16 check & slight change to parameter size
* fix for numpy error
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
-
Committed by Conglong Li
-
Committed by Michael Wyatt
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-