- 26 Apr, 2023 2 commits
-
-
Committed by Zhen Zhang
* include mics config and optimizer
* change private vars to public vars so the child class can initialize these vars
* Port the init function from stage3
* adding a model test file for mics
* adapt to get_accelerator api and fp16 group defrag
* WIP: porting mics modification to ms master
* WIP: included gradient all-reduce among replication groups
* WIP: ported hierarchical all gather part; did basic loss test on a simple MLP model
* [Bug fix] using the comm group attached on the param
* torch2.0 support
* remove print
* delegate wait op
* [Bug] fix naming
* adding doc string
* resolving recursive import
* fix formatting, typo and license
* fix license and unit test error
---------
Co-authored-by: Ubuntu <ubuntu@ip-172-31-14-191.us-west-2.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-7-70.us-west-2.compute.internal>
Co-authored-by: Zhen Zhang <zhzhn@amazon.com>
Co-authored-by: zhzhn <zhzhn@ip-10-2-57-114.us-west-2.compute.internal>
-
Committed by Alexander Jipa
Co-authored-by: Alexander Jipa <azzhipa@amazon.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
-
- 21 Apr, 2023 2 commits
-
-
Committed by Michael Wyatt
* move dist init out of Engine
-
Committed by Olatunji Ruwase
* zero3 checkpoint frozen params
* Remove debug prints
* Move to cpu
* WIP
* WIP
* WIP
* Cleanup
* Cleanup
* Extend unit test for frozen params
* API fix
-
- 12 Apr, 2023 1 commit
-
-
Committed by Olatunji Ruwase
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: yaozhewei <zheweiy@berkeley.edu>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
Co-authored-by: Lok Chand Koppaka <lokoppak@microsoft.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
-
- 06 Apr, 2023 1 commit
-
-
Committed by Stas Bekman
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 31 Mar, 2023 1 commit
-
-
Committed by Michael Wyatt
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 30 Mar, 2023 1 commit
-
-
Committed by Olatunji Ruwase
* Make fp32 default communication data type
* Fix asserts
-
- 27 Mar, 2023 1 commit
-
-
Committed by Jeff Rasley
-
- 24 Mar, 2023 2 commits
-
-
Committed by Olatunji Ruwase
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by FreyaRao
Co-authored-by: Qinghuan Rao <qinghuanrao@microsoft.com>
-
- 22 Mar, 2023 1 commit
-
-
Committed by Molly Smith
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 15 Mar, 2023 2 commits
-
-
Committed by Joe Mayer
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Joe Mayer
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 13 Mar, 2023 1 commit
-
-
Committed by Adam Moody
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 10 Mar, 2023 1 commit
-
-
Committed by Jeff Rasley
-
- 28 Feb, 2023 1 commit
-
-
Committed by Mayank Mishra
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 22 Feb, 2023 2 commits
-
-
Committed by Olatunji Ruwase
* Make z3 respect comm dtype
* Support fp32 comm dtype
* Remove obsolete assert
* Code cleanup
-
Committed by Conglong Li
* data efficiency library update
* data efficiency library update
* data efficiency update
* data efficiency update
-
- 07 Feb, 2023 1 commit
-
-
Committed by Stas Bekman
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 01 Feb, 2023 1 commit
-
-
Committed by Logan Adams
* Remove hardcoded instances to fp16 in log messages.
* Add model_dtype to print the correct format
* Respond to PR feedback
---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 31 Jan, 2023 1 commit
-
-
Committed by Bing Xie
* modify engine.py for formatting
* commit formatting changes on engine.py
-
- 27 Jan, 2023 2 commits
-
-
Committed by Jeff Rasley
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Dashiell Stander
* Fix how autotuning reports TFLOPS so that they are reported in FLOPS per second, not millisecond
Co-authored-by: Nick Sarkauskas <nsarka00@gmail.com>
Co-authored-by: Quentin Anthony <anthony.301@osu.edu>
Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Actually it is microseconds -> seconds
Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Actually it is microseconds -> seconds
Signed-off-by: Dashiell Stander <dstander@protonmail.com>
Signed-off-by: Dashiell Stander <dstander@protonmail.com>
Co-authored-by: Nick Sarkauskas <nsarka00@gmail.com>
Co-authored-by: Quentin Anthony <anthony.301@osu.edu>
-
- 26 Jan, 2023 1 commit
-
-
Committed by Ma, Guokai
* Integrate accelerator abstraction interface into deepspeed/
* Fix error message in fp16/fused_optimizer
* fix error message in fp16/unfused_optimizer.py
* assign get_accelerator().pin_memory() result to input Tensor name
* no need to check cuda and whether nvtx supported
* move try-except into innermost block
* call Event() and Stream() in get_accelerator() for data type
* Make Stream and Event as properties of abstract interface so they can be used as data type in deepspeed
* Apply op_builder backend api change from #2705 from @jeffra
* fix tests where Builder NAME is used
* keep original ...Builder.NAME interface instead of ...Builder().NAME interface
* fix builder closure for installation
* fix randomltd builder
* add comments to clarify create_op_builder and get_op_builder
* fix compatibility with pip install -e
Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 25 Jan, 2023 1 commit
-
-
Committed by Joe Mayer
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 20 Jan, 2023 1 commit
-
-
Committed by Ammar Ahmad Awan
Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 19 Jan, 2023 1 commit
-
-
Committed by Joe Mayer
* BF16 optimizer only with ZeRO stage 1.
* Updating to grad accum of fp32 for BF16 ZeRO1 case.
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 18 Jan, 2023 1 commit
-
-
Committed by Jeff Rasley
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 17 Dec, 2022 1 commit
-
-
Committed by Alexander Jipa
taking gradient accumulation steps into account for throughput calculation
Co-authored-by: Alexander Jipa <azzhipa@amazon.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 13 Dec, 2022 1 commit
-
-
Committed by Conglong Li
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 06 Dec, 2022 1 commit
-
-
Committed by Ma, Guokai
* allow bf16 model with fp32 gradient accumulation datatype
* allow fp32 gradient accumulation and bfloat16 model in amp mode
* alternative fix for grad accumulation type mismatch. In the case of zero optimizer we should have grad accum type == model data type
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 30 Nov, 2022 1 commit
-
-
Committed by Cheng Li
* rollback ds config changes
* fix format
* Fix error when output_file is a relative path without a prefix (#2397)
Co-authored-by: Benjamin Steenhoek <benjaminjsteenhoek@gmail.com>
* fix results and exprs path to use absolute path
* use base64 encoded ds config as cmd arg
* fix format
* remove assert
* write out optimal config after tuning
* fix format
* no need to update ds config path when encoding ds config
* update
* do not use abs path for result and expr dir
* fix conflicts
* fix run mode
* fix format
* fix format
Co-authored-by: Benjamin Steenhoek <benjaminjsteenhoek@gmail.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 29 Nov, 2022 1 commit
-
-
Committed by ShijieZZZZ
* report progress at gradient accumulation boundary
* format
* format
-
- 28 Nov, 2022 1 commit
-
-
Committed by Joe Mayer
* Adding gradient accumulation dtype config.
* Switching to new DtypeEnum
* Adding standalone check function, and unit tests
* Variable disambiguation
* Adding checks for unsupported states.
* Updating for PR comments.
* Reorganizing unit test.
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 11 Nov, 2022 1 commit
-
-
Committed by Olatunji Ruwase
-
- 25 Oct, 2022 1 commit
-
-
Committed by Joe Mayer
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 22 Oct, 2022 1 commit
-
-
Committed by Adam Moody
* parallelize layer checkpoints across data parallel groups
* use partition_uniform to determine start/end index values
* formatting fix
* config: add option for parallel write of layer checkpoints in pipeline stage
* yapf fixes
* enable parallel layer write according to config param
* avoid extraneous makedir when rank 0 writes all layers
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 18 Oct, 2022 2 commits
-
-
Committed by Olatunji Ruwase
* Refactor universal checkpointing and tensor fragments
* Formatting
* Support zero stage1; Expand TP dim
* Remove debug prints
* Detect sharded optimizer state
* Format fixes
* Encode reshaping guide
* More symbolic constants
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
-
Committed by Joe Mayer
* fixing bug 2361
* adding pytest for config initialization
* changing expected output to FusedAdam
* remove print statement
* running yapf on modified files
* running pre-commit formatting
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-