- 25 Jan, 2023 1 commit
-
-
Committed by loadams
-
- 09 Jan, 2023 1 commit
-
-
Committed by JackieWu
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 26 Sep, 2022 1 commit
-
-
Committed by Saeyeol Lee
Co-authored-by: Saeyeol Lee <sylee@si-anlaytics.ai>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 26 Jul, 2022 1 commit
-
-
Committed by Alex Hedges
-
- 21 Jun, 2022 1 commit
-
-
Committed by Karim Foda
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
-
- 11 Jun, 2022 1 commit
-
-
Committed by Ammar Ahmad Awan
Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 19 May, 2022 1 commit
-
-
Committed by Quentin Anthony
-
- 12 May, 2022 1 commit
-
-
Committed by Jeff Rasley
-
- 20 Apr, 2022 1 commit
-
-
Committed by Olatunji Ruwase
* bf16 updates
* Got bf16 working
* fp32 reduction; flattened tensors
* bf16+zero_stage_1 first cut
* finish zero_stage 1 sharding
* Matching fp16 with debugging codes
* Matching loss with fp16
* Fix gradient clipping
* bf16 gradient clipping fix bf16 checkpoint save/load
* Unscale grad norm
* Fix grad norm scaling
* Enable loading fp16_zero_1 into bf16_zero_1 engine and vice versa
* Fix clip_grad key error
* Reduce tied weight gradients
* Fix grad norm for moe
* Reduce specified gradients
* Use O(n) instead of O(n^2)
* Remove optimizer restriction for bf16
* Link bf16 & fp32 params
* Clip gradients of last stage tied weights
* Simplify tied weights reduction logic
* Also clip all tp rank parameters
* lp to hp mapping
* Link lp/hp/optim state; Refresh links after checkpoint load
* Remove debug print
* Remove debug print
* Simplify zero_grad logic
* fp32 accessors
* Fix update bug
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 11 Mar, 2022 1 commit
-
-
Committed by Yucheng Lu
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 01 Mar, 2022 1 commit
-
-
Committed by Ammar Ahmad Awan
Co-authored-by: yaozhewei <zheweiy@berkeley.edu>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
-
- 08 Feb, 2022 1 commit
-
-
Committed by Olatunji Ruwase
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 23 Jan, 2022 1 commit
-
-
Committed by Alex Hedges
-
- 19 Jan, 2022 1 commit
-
-
Committed by Jeff Rasley
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Zhewei Yao <zheweiy@berkeley.edu>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
-
- 15 Jan, 2022 1 commit
-
-
Committed by Jeff Rasley
[ZeRO] Default disable elastic ckpt in stage 1+2 and reduce CPU memory overhead during ckpt load (#1525)
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 02 Oct, 2021 1 commit
-
-
Committed by Alex Hedges
* Fix typos in docs/
* Fix typos in code comments and output strings
* Fix typos in the code itself
* Fix typos in tests/
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 30 Sep, 2021 1 commit
-
-
Committed by Jeff Rasley
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Shaden Smith <shaden.smith@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
-
- 17 Aug, 2021 1 commit
-
-
Committed by Ammar Ahmad Awan
Co-authored-by: Alex Muzio <Alex.Muzio@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: Felipe Cruz Salinas <Andres.Cruz@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <shaden.smith@microsoft.com>
Co-authored-by: Young Jin Kim <youki@microsoft.com>
Co-authored-by: bapatra <bapatra@microsoft.com>
-
- 06 Aug, 2021 1 commit
-
-
Committed by Denis Tarasov
Make add operation inplace. Without it momentum decays to zero and training has no effect on corresponding parameters
-
- 24 May, 2021 1 commit
-
-
Committed by Reza Yazdani
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Elton Zheng <eltonz@microsoft.com>
Co-authored-by: Arash Ashari <arashari@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: niumanar <60243342+niumanar@users.noreply.github.com>
-
- 21 Apr, 2021 1 commit
-
-
Committed by Conglong Li
1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed.
Author: @conglongli, @awan-10, @samyam, Hanlin Tang, Yuxiong He
Paper: https://arxiv.org/abs/2104.06069
Co-authored-by: sdtblck <46172032+sdtblck@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 17 Mar, 2021 1 commit
-
-
Committed by Conglong Li
Authors: @awan-10 @conglongli @samyam @jeffra
What's new:
NCCL-based implementation which provides better performance and usability compared to the MPI-based implementation.
Add support to momentum masks for those parameters with constant zero gradients during training.
Bug fixes (e.g., #813).
* NCCL-based 1-bit Adam + Code Refactor for Comm. Backends (#594)
* NCCL based 1-bit Implementation + Refactor to add communication backends (#593)
* add nccl 1-bit optim.
* temporary commit to save stuff.
* Use dist collectives instead of mpi routines.
* remove old code for comm.
* Fix bugs. still does not work.
* modify to test the nccl side code path
* Initial gather impl. Works intra-node.
* Updates to comm. phase 2. nccl comm. passed the tests.
* refactor code to introduce nccl/mpi as backends for onebit adam.
* Refactor updates to test/engine.
* Fix compile/runtime errors.
* simplify support for nccl/mpi backends.
* Add missign file
* Add compression backend in constructor. Revert later.
* modify test with some perf counting.
* Implement a true non-blocking gather for nccl side.
* Revert "Add compression backend in constructor. Revert later." This reverts commit df8c40d3.
* improve the 1-bit adam test.
* Refactor comm. and compression backend in 1-bit adam.
* Fix the test.
* Fix runtime errors and typos in nccl backend
* fix mpi backend. modify tests.
* modify nccl perf test.
* fix mpi side errors.
* Add an mpi perf test
* Sync DSE.
* Remove old collectives file.
* Undo a typo.
* Graceful failure for torch versions that don't support nccl pt2pt.
* Revert "Merge branch 'master' into staging-1bit-nccl-v2" This reverts commit 78400850, reversing changes made to a6dba72a.
* Revert "Revert "Merge branch 'master' into staging-1bit-nccl-v2"" This reverts commit 6dbdd985.
* comm optimization + 1-bit lamb
* Saving/debugging commit.
* finalizing 1-bit lamb
* finalizing 1-bit lamb
* add momentum mask and chkpt handling for 1-bit adam
* Cleanup and modify nccl test to be runnable with deepspeed launcher.
* Fix format.
* fix formatting again.
* make test runnable without mpi4py
* Add dist.alltoall and dist.allgather instead of custom functions.
* remove debug prints.
* formatting and renaming
* renaming
* renaming
* add unit test, fix existing tests
* skip unit test when torch < 1.8
* revert 1-bit lamb
* flatten momentum when dimension is more than 1
* add warning message for 1-bit adam under fp32
* improve version check
* add fp32 test
* 1-bit adam doc
* fix file name
* doc fix
* torch 1.8 is released
* doc fix
* fix tests
* update news
* add doc for momentum mask
* fix checkpoing handling, add unit test
* checkpoint handling doc
* doc final cleanup
* bump dates
* update tests
* url change
* doc fix
* fix test
* doc update
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 11 Mar, 2021 1 commit
-
-
Committed by Stas Bekman
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 12 Feb, 2021 1 commit
-
-
Committed by sdtblck
-
- 17 Sep, 2020 1 commit
-
-
Committed by Shaden Smith
* Switches fused_optimizer overflow calculation
-
- 10 Sep, 2020 2 commits
-
-
Committed by Shaden Smith
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Ammar Ahmad Awan
* 1-bit adam (#353)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: tanghl1994 <htang14@ur.rochester.edu>
Co-authored-by: Hank <tanghl1994@gmail.com>
Co-authored-by: root <root@node2x12b.cs.rochester.edu>
Co-authored-by: Ammar Ahmad Awan <awan.ammar@microsoft.com>
-
- 02 Sep, 2020 1 commit
-
-
Committed by Jeff Rasley
* Sparse attn + ops/runtime refactor + v0.3.0
Co-authored-by: Arash Ashari <arashari@microsoft.com>
-