- 30 Jun, 2023 1 commit
  - Committed by Alexander Jipa
    Co-authored-by: Alexander Jipa <azzhipa@amazon.com>
    Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
- 15 Jun, 2023 1 commit
  - Committed by mzl
    Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 03 Jun, 2023 1 commit
  - Committed by digger yu
    Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
- 12 May, 2023 1 commit
  - Committed by digger-yu
    Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
- 03 May, 2023 1 commit
  - Committed by Joe Mayer
    * Adding torch.optim.Adagrad
    * adding adagrad for zero 1 2
    * Adding Adagrad support to zero 3.
    * Adding documentation and DeepSpeedCPUAdagrad to list.
    Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
- 26 Apr, 2023 2 commits
  - Committed by hablb
    No usage of extra_large_param_to_reduce if contiguous_gradients is False. It keeps reference of the param for the lifetime of the application.
    Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
    Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
  - Committed by 郭叶军
    Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
- 31 Mar, 2023 1 commit
  - Committed by Michael Wyatt
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 30 Mar, 2023 1 commit
  - Committed by Olatunji Ruwase
    * Make fp32 default communication data type
    * Fix asserts
- 29 Mar, 2023 1 commit
  - Committed by Michael Wyatt
    * disable CPUAdam pathways in optimizer copy/step
    * Update stage_1_and_2.py
    Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
- 27 Mar, 2023 1 commit
  - Committed by Jeff Rasley
- 15 Mar, 2023 1 commit
  - Committed by Quentin Anthony
    * Improve overflow logs
    * Trigger CI
    Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
- 07 Mar, 2023 1 commit
  - Committed by Olatunji Ruwase
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 28 Feb, 2023 1 commit
  - Committed by Olatunji Ruwase
    * Enable tensor fragments for zero 2
    * Update deepspeed/utils/tensor_fragment.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
    * Update deepspeed/utils/tensor_fragment.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
    * Support offload
    * Support multi-gpu
    * Cleanup
    * WIP
    * Update deepspeed/runtime/zero/stage3.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
    * Support padding
    * Update deepspeed/runtime/zero/stage3.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
    * z3 optimizer state support; aligned api
    * Support frozen z3 params
    * Unit tests
    * Check NVMe offload capability
    * Formatting
    * Docs
    * More docs
    * More docs
    * Update docs/code-docs/source/zero3.rst Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
    * More docs
    * Update docs/code-docs/source/zero3.rst Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
    * More docs
    * More docs
    * Update docs/code-docs/source/zero3.rst Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
    * Update deepspeed/utils/tensor_fragment.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
    * More docs
    * Support unsharded fp32 grad
    * Remove debug prints
    * Fix off-by-one detection of empty grads
    * Update deepspeed/utils/tensor_fragment.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
    * Update deepspeed/utils/tensor_fragment.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
    * Update deepspeed/utils/tensor_fragment.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
    * Update deepspeed/runtime/zero/stage3.py Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
    * Fix off-by-one error
    * Skip ranks with no gradient data
    * Formatting
    * Add license
    * Fix license
    Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
    Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
- 24 Feb, 2023 1 commit
  - Committed by Yasyf Mohamedali
    * Remove deprecated `torch._six` imports Closes #2845.
    * Support older versions of PyTorch as well.
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
    Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
- 26 Jan, 2023 1 commit
  - Committed by Ma, Guokai
    * Integrate accelerator abstraction interface into deepspeed/
    * Fix error message in fp16/fused_optimizer
    * fix error message in fp16/unfused_optimizer.py
    * assign get_accelerator().pin_memory() result to input Tensor name
    * no need to check cuda and whether nvtx supported
    * move try-except into inner most block
    * call Event() and Stream() in get_accelerator() for data type
    * Make Stream and Event as properties of abstract interface so they can be used as data type in deepspeed
    * Apply op_builder backend api change from #2705 from @jeffra
    * fix tests where Builder NAME is used
    * keep original ...Builder.NAME interface instead of ...Builder().NAME interface
    * fix builder closure for installation
    * fix randomltd builder
    * add comments to clarify create_op_builder and get_op_builder
    * fix compatibility with pip install -e
    Co-authored-by: Cheng Li <pistasable@gmail.com>
    Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
- 25 Jan, 2023 1 commit
  - Committed by loadams
- 14 Jan, 2023 1 commit
  - Committed by Joe Mayer
    Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
- 11 Jan, 2023 1 commit
  - Committed by JackieWu
    Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
- 09 Jan, 2023 2 commits
  - Committed by JackieWu
    Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
  - Committed by li-yi-dong
    * Remove unnecessary device synchronization for stage 2
    * Remove unnecessary device synchronization for stage 2
    Co-authored-by: liyidong.lyd <liyidong.lyd@alibaba-inc.com>
    Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
    Co-authored-by: Joe Mayer <114769929+jomayeri@users.noreply.github.com>
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 17 Dec, 2022 1 commit
  - Committed by 郭叶军
    Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
- 10 Nov, 2022 1 commit
  - Committed by 郭叶军
- 18 Oct, 2022 1 commit
  - Committed by Olatunji Ruwase
    * Refactor universal checkpointing and tensor fragments
    * Formatting
    * Support zero stage1; Expand TP dim
    * Remove debug prints
    * Detect sharded optimizer state
    * Format fixes
    * Encode reshaping guide
    * More symbolic constants
    Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
- 04 Aug, 2022 1 commit
  - Committed by Olatunji Ruwase
    * Match compute and reduce dtype
    * Unit tests
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 27 Jul, 2022 1 commit
  - Committed by Michael Wyatt
    * first pass at pydanticifying Zero Configs
    * added pydantic to reqs
    * fixed bug with deprecated values not being type-checked
    * fixing zero config bugs from unit tests
    * fixed access of Config values
    * removing zero constants
    * formatting/fix broken import
    * fixed bad merge
    * fixed issue with missing aliased field
    * fix for failing tests
    * fix how deprecated fields are processed
    * only process dep params when they are set
    * fix mistyped field name
    * fixes, docs, removed more constants
    * fix merge
    * more fixes after merge w master
    * added unit tests
    * formatting
    * added fix for transformers unit tests
    * separated offload config from zero config
    * fixed bad import
    * formatting and flake fixes
    * implement suggestion from review
    Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 21 Jul, 2022 1 commit
  - Committed by Olatunji Ruwase
    * unit test, remove exception, add notes
    * Move param_shapes to model files
    * Remove hard-coded constants
    * Conditioned to zero optimizer
    * Add zero checkpoint merging
    * Print checkpoint version
    * Reshape zero_* ckpt files
    * Merge zero* files contraction
    * Utils for 3D contraction reshaping
    * Remove bogus import
    * Support bf16_zero ckpts
    * Add param slice mappings
    * Load universal checkpoints
    * Per group mappings from Stas
    * Hack to load bf16 zero files
    * Param attributes
    * WIP
    * Fix api bug
    * Update lp with local/remote hp
    * Disable vocab padding handling
    * Update z2 checkpoint
    * Remove debug prints
    * Remove debug prints; Rebase unit test
    * Add reshape assert
    * Padding
    * Typo
    * Catch nonexistent checkpoint path
    * Cleanup
    * Restore checkpoint state comparisons
    * Add torch version guards
    * More precise avoidance of false positives.
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 14 Jul, 2022 1 commit
  - Committed by Siddharth Singh
    * Shards expert parameter groups
    * Do upscaling, optimizer and deletion of fp32 grads one-by-one on each parameter group in zero-2
    Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
- 08 Jul, 2022 1 commit
  - Committed by Siddharth Singh
    Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
- 07 Jul, 2022 1 commit
  - Committed by kisseternity
    Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 28 Jun, 2022 1 commit
  - Committed by Siddharth Singh
- 21 Jun, 2022 1 commit
  - Committed by Karim Foda
    Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
- 14 Jun, 2022 1 commit
  - Committed by Quentin Anthony
    Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
- 11 Jun, 2022 1 commit
  - Committed by Ammar Ahmad Awan
    Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
    Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 16 May, 2022 1 commit
  - Committed by kisseternity
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 12 May, 2022 1 commit
  - Committed by Jeff Rasley
- 20 Apr, 2022 1 commit
  - Committed by Olatunji Ruwase
    * bf16 updates
    * Got bf16 working
    * fp32 reduction; flattened tensors
    * bf16+zero_stage_1 first cut
    * finish zero_stage 1 sharding
    * Matching fp16 with debugging codes
    * Matching loss with fp16
    * Fix gradient clipping
    * bf16 gradient clipping fix bf16 checkpoint save/load
    * Unscale grad norm
    * Fix grad norm scaling
    * Enable loading fp16_zero_1 into bf16_zero_1 engine and vice versa
    * Fix clip_grad key error
    * Reduce tied weight gradients
    * Fix grad norm for moe
    * Reduce specified gradients
    * Use O(n) instead of O(n^2)
    * Remove optimizer restriction for bf16
    * Link bf16 & fp32 params
    * Clip gradients of last stage tied weights
    * Simplify tied weights reduction logic
    * Also clip all tp rank parameters
    * lp to hp mapping
    * Link lp/hp/optim state; Refresh links after checkpoint load
    * Remove debug print
    * Remove debug print
    * Simplify zero_grad logic
    * fp32 accessors
    * Fix update bug
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 23 Mar, 2022 1 commit
  - Committed by Ammar Ahmad Awan
- 18 Mar, 2022 1 commit
  - Committed by Olatunji Ruwase
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 17 Mar, 2022 1 commit
  - Committed by Jeff Rasley