- 25 Oct, 2022: 1 commit
-
-
Committed by Joe Mayer
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 22 Oct, 2022: 3 commits
-
-
Committed by Jeff Rasley
-
Committed by lekurile
Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
-
Committed by Adam Moody
* parallelize layer checkpoints across data parallel groups
* use partition_uniform to determine start/end index values
* formatting fix
* config: add option for parallel write of layer checkpoints in pipeline stage
* yapf fixes
* enable parallel layer write according to config param
* avoid extraneous makedir when rank 0 writes all layers
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
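The commit above describes splitting the layers of a pipeline stage across data-parallel ranks so that each rank writes its own slice of layer checkpoints. A minimal sketch of that idea follows. The helpers `uniform_bounds` and `layers_for_rank` are hypothetical names written for illustration; they are not DeepSpeed's `partition_uniform`, whose handling of the remainder may differ.

```python
def uniform_bounds(num_items, num_parts):
    """Return num_parts + 1 boundaries that split range(num_items) into
    contiguous, near-equal chunks (hypothetical helper, not DeepSpeed's)."""
    if num_parts <= 0:
        raise ValueError("num_parts must be positive")
    base, extra = divmod(num_items, num_parts)
    bounds = [0]
    for part in range(num_parts):
        # Assumption: the first `extra` parts each take one additional item.
        bounds.append(bounds[-1] + base + (1 if part < extra else 0))
    return bounds


def layers_for_rank(num_layers, dp_rank, dp_size):
    """Layer indices that a given data-parallel rank would write."""
    bounds = uniform_bounds(num_layers, dp_size)
    return list(range(bounds[dp_rank], bounds[dp_rank + 1]))


if __name__ == "__main__":
    # Show which of 10 layers each of 4 data-parallel ranks would write.
    for rank in range(4):
        print(rank, layers_for_rank(10, rank, 4))
```

Every layer lands on exactly one rank, so no two ranks write the same file and no layer is skipped.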
-
- 20 Oct, 2022: 1 commit
-
-
Committed by Stas Bekman
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 19 Oct, 2022: 2 commits
-
-
Committed by lekurile
This PR adds a TestInjectionPolicy inference unittest class for testing custom injection policies. It differs from the existing tests in that the injection_policy dictionary is specified explicitly when calling the DeepSpeed init_inference API. Two models are added as tests with an explicit injection policy: google/t5-v1_1-small (text2text-generation) and roberta-large (fill-mask). This expands unittest coverage to the path where the replace_wo_policy function is invoked (see GH-2387).
Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
-
Committed by Jeff Rasley
-
- 18 Oct, 2022: 3 commits
-
-
Committed by Olatunji Ruwase
* Refactor universal checkpointing and tensor fragments
* Formatting
* Support zero stage1; Expand TP dim
* Remove debug prints
* Detect sharded optimizer state
* Format fixes
* Encode reshaping guide
* More symbolic constants
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
-
Committed by Joe Mayer
* fixing bug 2361
* adding pytest for config initialization
* changing expected output to FusedAdam
* remove print statement
* running yapf on modified files
* running pre-commit formatting
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Michael Wyatt
* fix for gpt-j failing due to tokenizer error
* limit number of gpt-j tokens generated due to low memory
-
- 15 Oct, 2022: 3 commits
-
-
Committed by Alexander Jipa
truncating expert param storage for checkpointing
Co-authored-by: Alexander Jipa <azzhipa@amazon.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
-
Committed by Molly Smith
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Reza Yazdani
-
- 14 Oct, 2022: 9 commits
-
-
Committed by Jeff Rasley
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
-
Committed by Samyam Rajbhandari
-
Committed by Andrey Chernykh
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Samyam Rajbhandari
-
Committed by Samyam Rajbhandari
-
Committed by Andrey Chernykh
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Dashiell Stander
Signed-off-by: Dashiell Stander <dstander@protonmail.com>
Co-authored-by: Dashiell Stander <dashiell@ip-172-31-45-20.ec2.internal>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 13 Oct, 2022: 4 commits
-
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
-
Committed by Ammar Ahmad Awan
-
Committed by Jeff Rasley
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
-
- 12 Oct, 2022: 1 commit
-
-
Committed by Connor Holmes
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
-
- 09 Oct, 2022: 1 commit
-
-
Committed by Olatunji Ruwase
-
- 08 Oct, 2022: 2 commits
-
-
Committed by lekurile
Update the isinstance check inside the `replace_wo_policy` function to test for `tuple` and `str` instead of `dict`, since the layers are provided as a `tuple`.
Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Co-authored-by: Molly Smith <mosm@microsoft.com>
Co-authored-by: Lok Chand Koppaka <lokoppak@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
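The type check described in this commit can be illustrated with a small, self-contained sketch. The function `normalize_layer_names` is a hypothetical stand-in written for this example; it is not the body of `replace_wo_policy`, and the layer names shown are only sample strings.

```python
def normalize_layer_names(policy_value):
    """Accept a policy entry given as a str or a tuple of str and return a tuple.

    Hypothetical helper for illustration only; a dict is rejected because the
    layers are expected to arrive as a tuple (or a single string).
    """
    if isinstance(policy_value, str):
        return (policy_value,)
    if isinstance(policy_value, tuple):
        return policy_value
    raise TypeError(
        f"expected str or tuple of layer names, got {type(policy_value).__name__}"
    )


if __name__ == "__main__":
    print(normalize_layer_names("output.dense"))
    print(normalize_layer_names(("SelfAttention.o", "DenseReluDense.wo")))
```

Checking `str` before `tuple` keeps a single layer name from being treated as a sequence of characters further down the line.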
-
Committed by Michael Wyatt
-
- 06 Oct, 2022: 1 commit
-
-
Committed by Thomas-MMJ
-
- 05 Oct, 2022: 2 commits
-
-
Committed by Arash Bakhtiari
-
Committed by Michael Wyatt
* batch of refactored tests
* more test refactoring
* fp16 test refactor
* more refactors
* added DistributedFixture class
* applied DistributedFixture to first batch of tests as a trial
* added DistributedFixture test and documentation
* last tests
* fixes for refactored tests
* remove subdirs in workflow files
* fix pytest syntax error
* fix another syntax error
* update imports
* use DistFixture with elastic checkpoint test
* missing import
* update to shared class tmpdir for elastic test
* moved test files
* avoid duplicate test file name
* last refactor and moving test files
* formatting
* fix broken import
* testing forked AMD tests
* update abstract method
* use blob storage for accelerate and transformers tests
* upgrade torch for accelerate CI
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 30 Sep, 2022: 1 commit
-
-
Committed by Matt Smith
-
- 28 Sep, 2022: 4 commits
-
-
Committed by Molly Smith
* Collect error messages in results.csv
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Arash Bakhtiari
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Arash Bakhtiari
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Arash Bakhtiari
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
-
- 27 Sep, 2022: 2 commits
-
-
Committed by Guanhua Wang
* format
* remove round fn
-
Committed by Jeff Rasley
-