1. 25 Jan 2023, 1 commit
  2. 09 Jan 2023, 1 commit
  3. 26 Sep 2022, 1 commit
  4. 26 Jul 2022, 1 commit
  5. 21 Jun 2022, 1 commit
  6. 11 Jun 2022, 1 commit
  7. 19 May 2022, 1 commit
  8. 12 May 2022, 1 commit
  9. 20 Apr 2022, 1 commit
    • bf16+pipeline parallelism (#1801) · 56c52238
      Authored by Olatunji Ruwase
      * bf16 updates
      
      * Got bf16 working
      
      * fp32 reduction; flattened tensors
      
      * bf16+zero_stage_1 first cut
      
      * finish zero_stage 1 sharding
      
      * Matching fp16 with debugging codes
      
      * Matching loss with fp16
      
      * Fix gradient clipping
      
      * bf16 gradient clipping fix
      bf16 checkpoint save/load
      
      * Unscale grad norm
      
      * Fix grad norm scaling
      
      * Enable loading fp16_zero_1 into bf16_zero_1 engine and vice versa
      
      * Fix clip_grad key error
      
      * Reduce tied weight gradients
      
      * Fix grad norm for moe
      
      * Reduce specified gradients
      
      * Use O(n) instead of O(n^2)
      
      * Remove optimizer restriction for bf16
      
      * Link bf16 & fp32 params
      
      * Clip gradients of last stage tied weights
      
      * Simplify tied weights reduction logic
      
      * Also clip all tp rank parameters
      
      * lp to hp mapping
      
      * Link lp/hp/optim state; Refresh links after checkpoint load
      
      * Remove debug print
      
      * Remove debug print
      
      * Simplify zero_grad logic
      
      * fp32 accessors
      
      * Fix update bug
      Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
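The lp/hp mapping steps in this commit (low-precision bf16 parameters linked to fp32 master copies, with links refreshed after checkpoint load) can be sketched in miniature. This is a hypothetical toy model, not DeepSpeed's implementation; `to_lp` is a stand-in rounding function playing the role of a real bf16 cast:

```python
def to_lp(x):
    # Stand-in for a bf16 cast: keep only ~2 decimal digits of precision.
    return round(x, 2)

class LinkedParam:
    """Toy sketch of the lp->hp link: the fp32 master ("hp") copy receives
    the optimizer update in full precision; the low-precision ("lp") copy
    used in forward/backward is refreshed from it after every step."""

    def __init__(self, value):
        self.hp = value           # fp32 master weight
        self.lp = to_lp(value)    # low-precision working copy

    def step(self, grad, lr=0.1):
        self.hp -= lr * grad      # update accumulates in full precision
        self.lp = to_lp(self.hp)  # refresh the linked lp copy

p = LinkedParam(1.0)
for _ in range(10):
    p.step(0.003)
# each individual update is below lp resolution, yet hp still accumulates
```

The point of the link is visible here: ten tiny updates leave `p.lp` unchanged at its rounded value while `p.hp` has moved, which is why the optimizer must own an fp32 master copy rather than updating the bf16 weights directly.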
  10. 11 Mar 2022, 1 commit
  11. 01 Mar 2022, 1 commit
  12. 08 Feb 2022, 1 commit
  13. 23 Jan 2022, 1 commit
  14. 19 Jan 2022, 1 commit
  15. 15 Jan 2022, 1 commit
  16. 02 Oct 2021, 1 commit
  17. 30 Sep 2021, 1 commit
  18. 17 Aug 2021, 1 commit
  19. 06 Aug 2021, 1 commit
    • Update adam.py (#1278) · 00320a9b
      Authored by Denis Tarasov
      Make the add operation in-place. Without it, the momentum buffer decays to zero and training has no effect on the corresponding parameters.
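The fix above is the difference between PyTorch's out-of-place `add` and in-place `add_` in the Adam momentum update. A minimal stand-alone sketch of the failure mode, using plain Python floats in place of tensor operations:

```python
BETA1 = 0.9  # Adam's first-moment decay coefficient

def buggy_step(m, g):
    # Mimics `exp_avg.mul_(beta1).add(grad, alpha=1 - beta1)`: the
    # out-of-place add returns a new value that is discarded, so only
    # the decay survives and momentum shrinks toward zero.
    m = m * BETA1
    _ = m + (1 - BETA1) * g  # computed, but never written back
    return m

def fixed_step(m, g):
    # Mimics `exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)`: the
    # result is stored back, so gradients accumulate into the momentum.
    return m * BETA1 + (1 - BETA1) * g

m_bug = m_fix = 0.0
for _ in range(100):
    m_bug = buggy_step(m_bug, 1.0)  # constant gradient of 1.0
    m_fix = fixed_step(m_fix, 1.0)
# m_bug stays pinned at 0.0; m_fix converges toward 1.0
```

With a constant gradient, the correct update converges to the gradient value, while the buggy version never moves, matching the symptom described in the commit.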
  20. 24 May 2021, 1 commit
  21. 21 Apr 2021, 1 commit
  22. 17 Mar 2021, 1 commit
    • 1-bit Adam v2 (#817) · 68c8481b
      Authored by Conglong Li
      Authors: @awan-10 @conglongli @samyam @jeffra
      
      What's new:
      
      An NCCL-based implementation that provides better performance and usability than the MPI-based implementation.
      Support for momentum masks on parameters with constant zero gradients during training.
      Bug fixes (e.g., #813).
      
      * NCCL-based 1-bit Adam + Code Refactor for Comm. Backends (#594)
      
      * NCCL based 1-bit Implementation + Refactor to add communication backends (#593)
      
      * add nccl 1-bit optim.
      
      * temporary commit to save stuff.
      
      * Use dist collectives instead of mpi routines.
      
      * remove old code for comm.
      
      * Fix bugs. still does not work.
      
      * modify to test the nccl side code path
      
      * Initial gather impl. Works intra-node.
      
      * Updates to comm. phase 2. nccl comm. passed the tests.
      
      * refactor code to introduce nccl/mpi as backends for onebit adam.
      
      * Refactor updates to test/engine.
      
      * Fix compile/runtime errors.
      
      * simplify support for nccl/mpi backends.
      
      * Add missing file
      
      * Add compression backend in constructor. Revert later.
      
      * modify test with some perf counting.
      
      * Implement a true non-blocking gather for nccl side.
      
      * Revert "Add compression backend in constructor. Revert later."
      
      This reverts commit df8c40d3.
      
      * improve the 1-bit adam test.
      
      * Refactor comm. and compression backend in 1-bit adam.
      
      * Fix the test.
      
      * Fix runtime errors and typos in nccl backend
      
      * fix mpi backend. modify tests.
      
      * modify nccl perf test.
      
      * fix mpi side errors.
      
      * Add an mpi perf test
      
      * Sync DSE.
      
      * Remove old collectives file.
      
      * Undo a typo.
      
      * Graceful failure for torch versions that don't support nccl pt2pt.
      
      * Revert "Merge branch 'master' into staging-1bit-nccl-v2"
      
      This reverts commit 78400850, reversing
      changes made to a6dba72a.
      
      * Revert "Revert "Merge branch 'master' into staging-1bit-nccl-v2""
      
      This reverts commit 6dbdd985.
      
      * comm optimization + 1-bit lamb
      
      * Saving/debugging commit.
      
      * finalizing 1-bit lamb
      
      * finalizing 1-bit lamb
      
      * add momentum mask and chkpt handling for 1-bit adam
      
      * Cleanup and modify nccl test to be runnable with deepspeed launcher.
      
      * Fix format.
      
      * fix formatting again.
      
      * make test runnable without mpi4py
      
      * Add dist.alltoall and dist.allgather instead of custom functions.
      
      * remove debug prints.
      
      * formatting and renaming
      
      * renaming
      
      * renaming
      
      * add unit test, fix existing tests
      
      * skip unit test when torch < 1.8
      
      * revert 1-bit lamb
      
      * flatten momentum when dimension is more than 1
      
      * add warning message for 1-bit adam under fp32
      
      * improve version check
      
      * add fp32 test
      
      * 1-bit adam doc
      
      * fix file name
      
      * doc fix
      
      * torch 1.8 is released
      
      * doc fix
      
      * fix tests
      
      * update news
      
      * add doc for momentum mask
      
      * fix checkpoint handling, add unit test
      
      * checkpoint handling doc
      
      * doc final cleanup
      
      * bump dates
      
      * update tests
      
      * url change
      
      * doc fix
      
      * fix test
      
      * doc update
      Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
      Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
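The momentum mask added in this commit protects parameters whose gradients are always zero (for example, unused padding embeddings): under 1-bit Adam's compressed communication, accumulated compression error could otherwise drift their momentum away from zero and update parameters that should stay frozen. A hypothetical sketch of the masking step, with illustrative names rather than DeepSpeed's actual API:

```python
def masked_momentum_update(exp_avg, grad, mask, beta1=0.9):
    """Apply the Adam first-moment update elementwise, then zero out
    masked entries. mask[i] == 0 marks a parameter whose gradient is
    constantly zero; forcing its momentum to zero keeps accumulated
    compression error from ever updating it."""
    return [mask[i] * (beta1 * exp_avg[i] + (1 - beta1) * grad[i])
            for i in range(len(exp_avg))]

# Entry 1 starts with spurious momentum (e.g. leaked compression error)
# even though its gradient is always zero; the mask cleans it up.
m = masked_momentum_update(exp_avg=[0.5, 0.02],
                           grad=[1.0, 0.0],
                           mask=[1, 0])
# active entry keeps accumulating; masked entry is forced back to zero
```

Without the mask, the second entry would retain a small nonzero momentum and keep nudging a parameter that receives no gradient signal at all.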
  23. 11 Mar 2021, 1 commit
  24. 12 Feb 2021, 1 commit
  25. 17 Sep 2020, 1 commit
  26. 10 Sep 2020, 2 commits
  27. 02 Sep 2020, 1 commit