- 27 Oct, 2022 1 commit
  - Committed by Cheng Li
    * rollback ds config changes * fix format * Fix error when output_file is a relative path without a prefix (#2397) Co-authored-by: Benjamin Steenhoek <benjaminjsteenhoek@gmail.com> * fix restuls and exprs path to use absolute path * write out optimial config after tuning * fix format * assert tuning result dir creation Co-authored-by: Benjamin Steenhoek <benjaminjsteenhoek@gmail.com> Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
- 14 Oct, 2022 1 commit
  - Committed by Dashiell Stander
    Signed-off-by: Dashiell Stander <dstander@protonmail.com> Co-authored-by: Dashiell Stander <dashiell@ip-172-31-45-20.ec2.internal> Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 30 Jul, 2022 1 commit
  - Committed by Arpan Jain
    Co-authored-by: Arpan Jain <t-arpanjain@microsoft.com> Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 28 Jul, 2022 1 commit
  - Committed by trajep
    * enable checkpoint engine * seprated nebula config * add __init__.py for nebula importing * linter fix * fix: ds_config is None * fix: ds config * fix: get sd loader fix * align the API with torch raw code * linter fix * remove duplicate tag params * make checkpoint_engine as required args * fix args * extract parameters out to config * fix: load state dict * separate load engine * linter fix * extract checkpoint engine to abstract calss * linter fix * construct function args fix * add docs for dev/customers * linter fix * remove load engine * print->log_dist * linter fix * add tag flag to distinguish the loading order Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com> Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com> Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 19 May, 2022 1 commit
  - Committed by liamcli
- 15 Mar, 2022 1 commit
  - Committed by Jeff Rasley
- 09 Feb, 2022 1 commit
  - Committed by liamcli
- 28 Jan, 2022 1 commit
  - Committed by Jeff Rasley
- 27 Jan, 2022 1 commit
  - Committed by Jeff Rasley
- 20 Jan, 2022 1 commit
  - Committed by Jeff Rasley
- 13 Jan, 2022 1 commit
  - Committed by liamcli
- 18 Nov, 2021 1 commit
  - Committed by Stas Bekman
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com> Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
- 13 Nov, 2021 1 commit
  - Committed by Cheng Li
    * [squash] Staging autotuning v4 Co-authored-by: Cheng Li <pistasable@gmail.com> Co-authored-by: Minjia Zhang <minjiaz@microsoft.com> Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com> Co-authored-by: Jeff Rasley <jerasley@microsoft.com> * add new extra, guard xgboost, cleanup dead files (#268) * Fix autotuning docs (#1553) * fix docs * rewording the goal * fix typos * fix typos (#1556) * fix typos * fix format * fix bug (#1557) * fix bug Co-authored-by: Jeff Rasley <jerasley@microsoft.com> Co-authored-by: Minjia Zhang <minjiaz@microsoft.com> Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
- 03 Nov, 2021 1 commit
  - Committed by Chunyang Wen
    Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
- 02 Oct, 2021 1 commit
  - Committed by Alex Hedges
    * Fix typos in docs/ * Fix typos in code comments and output strings * Fix typos in the code itself * Fix typos in tests/ Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
- 21 Apr, 2021 1 commit
  - Committed by Jeff Rasley
- 19 Apr, 2021 2 commits
  - Committed by Jeff Rasley
  - Committed by Jeff Rasley
    Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com> Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com> Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
- 14 Apr, 2021 1 commit
  - Committed by Takuya Makino
- 07 Apr, 2021 1 commit
  - Committed by Takuya Makino
- 17 Mar, 2021 1 commit
  - Committed by Stas Bekman
    Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 10 Mar, 2021 1 commit
  - Committed by Jeff Rasley
- 09 Mar, 2021 1 commit
  - Committed by Samyam Rajbhandari
    * Squash stage3 v1 (#146) Co-authored-by: Samyam <samyamr@microsoft.com> Co-authored-by: Jeff Rasley <jerasley@microsoft.com> Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com> Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com> Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com> Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com> Co-authored-by: eltonzheng <eltonz@microsoft.com> * Fix correctness bug (#147) * formatting fix (#150) * stage3 bugfix (API) update and simplified FP16 Z3 tests (#151) * fp16 Z3 API update and bugfix * revert debug change * ZeRO-3 detach and race condition bugfixes (#149) * trying out ZeRO-3 race condition fix * CUDA sync instead of stream * reduction stream sync * remove commented code * Fix optimizer state_dict KeyError (#148) Co-authored-by: Jeff Rasley <jerasley@microsoft.com> * fix for smaller SGS sizes, ensures each grad is backed by unique tensors (#152) * Simplifying the logic for getting averaged gradients (#153) * skip for now * Z3 Docs redux (#154) * removing some TODOs and commented code (#155) * New Z3 defaults (#156) Co-authored-by: Jeff Rasley <jerasley@microsoft.com> * formatting * megatron external params Co-authored-by: Jeff Rasley <jerasley@microsoft.com> Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com> Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com> Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com> Co-authored-by: eltonzheng <eltonz@microsoft.com>
- 16 Jan, 2021 1 commit
  - Committed by Jeff Rasley
- 18 Dec, 2020 1 commit
  - Committed by Jeff Rasley
- 10 Sep, 2020 1 commit
  - Committed by Ammar Ahmad Awan
    * 1-bit adam (#353) Co-authored-by: Jeff Rasley <jerasley@microsoft.com> Co-authored-by: Your Name <you@example.com> Co-authored-by: tanghl1994 <htang14@ur.rochester.edu> Co-authored-by: Hank <tanghl1994@gmail.com> Co-authored-by: root <root@node2x12b.cs.rochester.edu> Co-authored-by: Ammar Ahmad Awan <awan.ammar@microsoft.com>
- 02 Sep, 2020 1 commit
  - Committed by Jeff Rasley
    * Sparse attn + ops/runtime refactor + v0.3.0 Co-authored-by: Arash Ashari <arashari@microsoft.com> Co-authored-by: Arash Ashari <arashari@microsoft.com>
- 19 Aug, 2020 1 commit
  - Committed by Jeff Rasley
    * turn off multi-node launch if only 1 node
- 14 Aug, 2020 1 commit
  - Committed by Jeff Rasley
    * update fan out flag for pdsh
- 18 Jun, 2020 1 commit
  - Committed by Jeff Rasley
- 05 Jun, 2020 1 commit
  - Committed by Chunyang Wen
    * Add log util * replace all occurrences of print and logging * address format * disable propagate to avoid duplicate log
- 30 May, 2020 1 commit
  - Committed by Jeff Rasley
    * Transformer kernels release Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com> Co-authored-by: Elton Zheng <eltonz@microsoft.com> Co-authored-by: Reza Yazdani <reyazda@microsoft.com> Co-authored-by: RezaYazdaniAminabadi <44502768+RezaYazdaniAminabadi@users.noreply.github.com> Co-authored-by: Tunji Ruwase <olruwase@microsoft.com> Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com> Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com> Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com> Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com> Co-authored-by: Jeff Rasley <jerasley@microsoft.com> Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com> Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com> Co-authored-by: Elton Zheng <eltonz@microsoft.com> Co-authored-by: Reza Yazdani <reyazda@microsoft.com> Co-authored-by: RezaYazdaniAminabadi <44502768+RezaYazdaniAminabadi@users.noreply.github.com> Co-authored-by: Tunji Ruwase <olruwase@microsoft.com> Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com> Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
- 23 Mar, 2020 1 commit
  - Committed by Olatunji Ruwase
- 11 Mar, 2020 1 commit
  - Committed by Shaden Smith
- 04 Mar, 2020 1 commit
  - Committed by Jeff Rasley
    * add support for deepspeed env file to pass custom env values * simplify deepspeed config example
- 27 Feb, 2020 1 commit
  - Committed by Jeff Rasley
    * add mpirun support for openmpi 4.0 * add master addr support from args * switch mpi detection to use mpi4py * set constant for default distributed port * Make sure deepspeed_mpi exits in args
- 21 Feb, 2020 1 commit
  - Committed by Jeff Rasley
    Also a fix for #94
- 20 Feb, 2020 1 commit
  - Committed by Jeff Rasley
    Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
- 04 Feb, 2020 1 commit
  - Committed by Shaden Smith
    Fixing file permissions.
- 01 Feb, 2020 1 commit
  - Committed by Shaden Smith