- 10 Jun, 2023 3 commits
-
-
Committed by Logan Adams
* Add non-interactive prompt, causing issues for some users
* Update pytorch version too
-
Committed by Abhilash Majumder
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
-
Committed by Ma, Guokai
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 09 Jun, 2023 3 commits
-
-
Committed by Olatunji Ruwase
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
-
Committed by Logan Adams
* Fix typo in name of hybrid engine function
* Fix
-
Committed by hablb
* Remove dead code: params_already_reduced is not used.
* Prevent evaluation of debug strings: debug strings are evaluated even when logging is disabled.
* Use contiguous gradients tensor reduce scatter between ranks: use allreduce instead of reduce scatter, for lower CPU overhead.
* Move overflow tracker to optimizer.step: don't check overflow in gradients for every bucket. Do the overflow check once on the grad flat buffer just before the optimizer step.
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 08 Jun, 2023 6 commits
-
-
Committed by Conglong Li
* DeepSpeed overview in Japanese
* DeepSpeed overview in Japanese
-
Committed by john li
* Small tweak on cuda version mismatch documentation
* Clarify that minor versions should also match
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
-
Committed by Michael Wyatt
* fix typo and missing epsilon value
* Touch file to re-build
* revert changes
* Touch file to re-build
* Format
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Logan Adams <loadams@microsoft.com>
-
Committed by digger yu
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
-
Committed by Reza Yazdani
* fix gpt-j inference issue for mlp_gemm_func call
* bring back the gpt-j inference-test
* fix formatting
* fix the neox and pythia injection issue
-
Committed by Logan Adams
This reverts commit f2f5f21b.
-
- 07 Jun, 2023 5 commits
-
-
Committed by tensor-tang
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
-
Committed by Logan Adams
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
-
Committed by Abhilash Majumder
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
-
Committed by Byungsoo Oh
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Ramya Ramineni
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
-
- 06 Jun, 2023 3 commits
-
-
Committed by Siddharth Singh
-
Committed by Olatunji Ruwase
* Use logger in accelerator
* Handle pre-build cases
* Explain possible import failure
-
Committed by digger yu
* fix typo deepspeed/runtime
* fix some typos
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 05 Jun, 2023 1 commit
-
-
Committed by Zhen Zhang
* fix mics save checkpoint hanging
* MiCS load_checkpoint
* copyright
* fix for torch-1.9.0 all_reduce_coalesced api does not support nccl backend
* Naming alignment
* adding more test conditions for mics shard size
* test with different shard sizes
* adding assertion for better error msg
Co-authored-by: Zhen Zhang <zhzhn@amazon.com>
-
- 03 Jun, 2023 3 commits
-
-
Committed by Jeff Rasley
-
Committed by Buğra
* Refactor check_enabled root validator in DeepSpeedMonitorConfig
* formatting
* formatting
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
-
Committed by digger yu
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 02 Jun, 2023 5 commits
-
-
Committed by 郭叶军
When activation checkpointing is enabled, most of the forward pass is recomputed, so the FLOPS calculation should be updated with recompute_fwd_factor=1.0. I could not find a way to pass the option from the model script to the DeepSpeed engine, so the option is added directly to flops_profiler.
Co-authored-by: Cheng Li <pistasable@gmail.com>
-
Committed by digger yu
* fix spelling error with deepspeed/runtime/
* fix typo docs/
* fix typo in comments with deepspeed/
* fix typo deepspeed/
* Update constants.py: remove the space after nebula
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
-
Committed by Michael Wyatt
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Haodong Lyu
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by 郭叶军
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
-
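The flops_profiler commit by 郭叶军 listed above can be illustrated with a small sketch. It assumes the common rule of thumb that a backward pass costs about twice a forward pass; the function name `training_flops` and the `bwd_to_fwd_ratio` parameter are hypothetical and are not DeepSpeed's API. Only the name `recompute_fwd_factor` comes from the commit message.

```python
def training_flops(fwd_flops, recompute_fwd_factor=0.0, bwd_to_fwd_ratio=2.0):
    """Estimate FLOPs for one training step from the forward-pass FLOPs.

    recompute_fwd_factor: fraction of the forward pass that is re-run
        during backward because of activation checkpointing
        (0.0 = no recomputation, 1.0 = the whole forward pass is re-run).
    bwd_to_fwd_ratio: assumed cost of backward relative to forward
        (hypothetical parameter, rule-of-thumb default of 2.0).
    """
    return fwd_flops * (1.0 + bwd_to_fwd_ratio + recompute_fwd_factor)


# Without checkpointing: forward + backward = 3x forward.
no_ckpt = training_flops(100.0)
# With full recomputation: one extra forward pass is counted.
full_ckpt = training_flops(100.0, recompute_fwd_factor=1.0)
```

Under these assumptions, ignoring the recomputed forward pass would undercount the work done per step by a quarter when checkpointing covers the whole model.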
- 01 Jun, 2023 3 commits
-
-
Committed by Micah Zoltu
Code (in this context) is a mass noun, and thus has no plural form.
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
-
Committed by Will Jessup
Grammar fix.
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
-
Committed by Michael Wyatt
* skip test for docs-only changes
* add missing skip to blog changes
-
- 31 May, 2023 3 commits
-
-
Committed by CurryRice233
* add Ascend NPU accelerator support
* clean code
Co-authored-by: jializheng <jializheng@huawei.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by 郭叶军
This change also aligns with the logic before reduce_scatter_coalesced.
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Jeff Rasley
* tmp remove launcher args
* add exclude list for env variables on aisc
* add comment
-
- 27 May, 2023 1 commit
-
-
Committed by Danny Semiat
* Align InferenceEngine to store ms in _model_times
When using cuda_events, the measured model time is stored in ms. When not using cuda_events, the measured model time was stored in seconds. This commit fixes the units and aligns them to store ms, the same as the elapsed() function.
This was observed when running the following pytest:
unit/inference/test_model_profiling.py::TestModelProfiling::test[False-True-roberta-base-fill-mask]
Returned values were:
count=0 e2e_t=895.174312 model_t=0.8529715538024902
count=1 e2e_t=7.500252 model_t=0.0041310787200927734
count=2 e2e_t=3.887346 model_t=0.0018568038940429688
count=3 e2e_t=3.577845 model_t=0.0016334056854248047
count=4 e2e_t=3.43976 model_t=0.0016703605651855469
count=5 e2e_t=3.310903 model_t=0.0016107559204101562
count=6 e2e_t=3.299556 model_t=0.001603841781616211
count=7 e2e_t=3.605722 model_t=0.0015969276428222656
count=8 e2e_t=3.273741 model_t=0.0015516281127929688
count=9 e2e_t=3.46306 model_t=0.0016617774963378906
The unit difference is visible here: model_t is on the order of 10e-3 compared to e2e_t.
* Update engine.py
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
-
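The unit mismatch described in the InferenceEngine commit above can be sketched in a few lines. This is only an illustration, not DeepSpeed's code: `timed_ms` is a hypothetical helper. It relies on the fact that `time.perf_counter()` returns seconds, so a wall-clock measurement has to be multiplied by 1000 to be comparable with timings reported in milliseconds.

```python
import time


def timed_ms(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock time in milliseconds).

    Hypothetical helper for illustration only. perf_counter() returns
    seconds, so the difference is scaled by 1000 to get milliseconds.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


# Usage: both values below are in milliseconds, so they can be
# stored in the same list and compared directly.
_, sleep_ms = timed_ms(time.sleep, 0.02)
total, sum_ms = timed_ms(sum, [1, 2, 3])
```

Storing every measurement in one unit at the point of capture avoids the factor-of-1000 discrepancy seen in the test output quoted in the commit message.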
- 26 May, 2023 3 commits
-
-
Committed by Quentin Anthony
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Olatunji Ruwase
-
Committed by Conglong Li
-
- 25 May, 2023 1 commit
-
-
Committed by Nikita Shulga
-