Commits · 34a11688c4b3dd98fe23d6222fc4c54e7ac74fe3 · Greenplum / DeepSpeed

25 Jan, 2023 1 commit

Change zero_grad() argument to match pytorch (#2741) · 34a11688

Committed by loadams on Jan 24, 2023

09 Jan, 2023 1 commit

[Bug Fixed] use torch.cuda.is_available() (#2661) · 323c266c

Committed by JackieWu on Jan 09, 2023

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

26 Sep, 2022 1 commit

Add Onebit Optimzers in __init__ (#2340) · f210256a

Committed by Saeyeol Lee on Sep 26, 2022

Co-authored-by: Saeyeol Lee <sylee@si-anlaytics.ai>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

26 Jul, 2022 1 commit

Add flake8 to pre-commit checks (#2051) · 316c4a43

Committed by Alex Hedges on Jul 25, 2022

21 Jun, 2022 1 commit

fix import errors (#2026) · 735406e5

Committed by Karim Foda on Jun 20, 2022

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

11 Jun, 2022 1 commit

DeepSpeed comm backend v1 (#1985) · 36ad3119

Committed by Ammar Ahmad Awan on Jun 10, 2022

Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

19 May, 2022 1 commit

Add loss scale guard to avoid inf loop (#1958) · 44085856

Committed by Quentin Anthony on May 18, 2022

12 May, 2022 1 commit

Fairseq support (#1915) · 50893458

Committed by Jeff Rasley on May 11, 2022

20 Apr, 2022 1 commit

bf16+pipeline parallelism (#1801) · 56c52238

Committed by Olatunji Ruwase on Apr 19, 2022

* bf16 updates

* Got bf16 working

* fp32 reduction; flattened tensors

* bf16+zero_stage_1 first cut

* finish zero_stage 1 sharding

* Matching fp16 with debugging codes

* Matching loss with fp16

* Fix gradient clipping

* bf16 gradient clipping fix
bf16 checkpoint save/load

* Unscale grad norm

* Fix grad norm scaling

* Enable loading fp16_zero_1 into bf16_zero_1 engine and vice versa

* Fix clip_grad key error

* Reduce tied weight gradients

* Fix grad norm for moe

* Reduce specified gradients

* Use O(n) instead of O(n^2)

* Remove optimizer restriction for bf16

* Link bf16 & fp32 params

* Clip gradients of last stage tied weights

* Simplify tied weights reduction logic

* Also clip all tp rank parameters

* lp to hp mapping

* Link lp/hp/optim state; Refresh links after checkpoint load

* Remove debug print

* Remove debug print

* Simplify zero_grad logic

* fp32 accessors

* Fix update bug

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

11 Mar, 2022 1 commit

01 adam optimizer (#1790) · b80e5624

Committed by Yucheng Lu on Mar 10, 2022

Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

01 Mar, 2022 1 commit

Refactor MoE and Groups API to simplify model creation and mangement (#1798) · c0af6d90

Committed by Ammar Ahmad Awan on Feb 28, 2022

Co-authored-by: yaozhewei <zheweiy@berkeley.edu>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>

08 Feb, 2022 1 commit

Move param_shapes to model files (#1732) · 135a6256

Committed by Olatunji Ruwase on Feb 07, 2022

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

23 Jan, 2022 1 commit

Add codespell to pre-commit checks (#1717) · 4cf970e6

Committed by Alex Hedges on Jan 22, 2022

19 Jan, 2022 1 commit

MoE inference + PR-MoE model support (#1705) · e46d808a

Committed by Jeff Rasley on Jan 18, 2022

Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Zhewei Yao <zheweiy@berkeley.edu>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>

15 Jan, 2022 1 commit

[ZeRO] Default disable elastic ckpt in stage 1+2 and reduce CPU memory... · 3293cf72

Committed by Jeff Rasley on Jan 14, 2022

[ZeRO] Default disable elastic ckpt in stage 1+2 and reduce CPU memory overhead during ckpt load (#1525)

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

02 Oct, 2021 1 commit

Fix many typos (#1423) · be789b16

Committed by Alex Hedges on Oct 01, 2021

* Fix typos in docs/

* Fix typos in code comments and output strings

* Fix typos in the code itself

* Fix typos in tests/

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

30 Sep, 2021 1 commit

Big science related changes (#1407) · e2fdd254

Committed by Jeff Rasley on Sep 29, 2021

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Shaden Smith <shaden.smith@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

17 Aug, 2021 1 commit

DeepSpeed MoE (#1310) · f2843244

Committed by Ammar Ahmad Awan on Aug 16, 2021

Co-authored-by: Alex Muzio <Alex.Muzio@microsoft.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: Felipe Cruz Salinas <Andres.Cruz@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <shaden.smith@microsoft.com>
Co-authored-by: Young Jin Kim <youki@microsoft.com>
Co-authored-by: bapatra <bapatra@microsoft.com>

06 Aug, 2021 1 commit

Update adam.py (#1278) · 00320a9b

Committed by Denis Tarasov on Aug 05, 2021

Make add operation inplace. Without it momentum decays to zero and training has no effect on corresponding parameters
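
The failure mode described in this commit message can be illustrated with a small, self-contained sketch. It does not reproduce the actual code in adam.py, which is not shown here; `Vec`, `run`, `buggy_update`, and `fixed_update` are hypothetical names used only for illustration. The sketch assumes the update was written as a chained call and mimics the PyTorch naming convention in which a trailing underscore marks an in-place operation.

```python
class Vec:
    """Tiny stand-in for a tensor (hypothetical, for illustration only)."""

    def __init__(self, values):
        self.values = list(values)

    def mul_(self, scalar):
        # In-place: mutates self and returns it so calls can be chained.
        self.values = [v * scalar for v in self.values]
        return self

    def add(self, other, alpha=1.0):
        # Out-of-place: returns a NEW Vec and leaves self untouched.
        return Vec(v + alpha * o for v, o in zip(self.values, other.values))

    def add_(self, other, alpha=1.0):
        # In-place: mutates self.
        self.values = [v + alpha * o for v, o in zip(self.values, other.values)]
        return self


def buggy_update(exp_avg, grad, beta1):
    # The result of add() is discarded, so only the decay by beta1 sticks.
    exp_avg.mul_(beta1).add(grad, alpha=1 - beta1)


def fixed_update(exp_avg, grad, beta1):
    # add_() writes the gradient contribution back into exp_avg.
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)


def run(update, initial=0.5, steps=50, beta1=0.9):
    """Apply `update` repeatedly with a constant gradient of 1.0."""
    exp_avg = Vec([initial])
    grad = Vec([1.0])
    for _ in range(steps):
        update(exp_avg, grad, beta1)
    return exp_avg.values[0]


buggy_momentum = run(buggy_update)  # shrinks toward 0
fixed_momentum = run(fixed_update)  # approaches the gradient value 1.0
print(buggy_momentum, fixed_momentum)
```

With the out-of-place call the momentum is only ever multiplied by `beta1`, so it shrinks geometrically toward zero regardless of the gradient, which matches the symptom the commit message reports.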

24 May, 2021 1 commit

Quantization + inference release (#1091) · ed3de0c2

Committed by Reza Yazdani on May 24, 2021

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Elton Zheng <eltonz@microsoft.com>
Co-authored-by: Arash Ashari <arashari@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: niumanar <60243342+niumanar@users.noreply.github.com>

21 Apr, 2021 1 commit

1-bit LAMB optimizer (#970) · 67a48aaa

Committed by Conglong Li on Apr 20, 2021

1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed.
Author: @conglongli, @awan-10, @samyam, Hanlin Tang, Yuxiong He
Paper: https://arxiv.org/abs/2104.06069

Co-authored-by: sdtblck <46172032+sdtblck@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

17 Mar, 2021 1 commit

1-bit Adam v2 (#817) · 68c8481b

Committed by Conglong Li on Mar 16, 2021

Authors: @awan-10 @conglongli @samyam @jeffra

What's new:

- NCCL-based implementation which provides better performance and usability compared to the MPI-based implementation.
- Add support to momentum masks for those parameters with constant zero gradients during training.
- Bug fixes (e.g., #813).

* NCCL-based 1-bit Adam + Code Refactor for Comm. Backends (#594)

* NCCL based 1-bit Implementation + Refactor to add communication backends (#593)

* add nccl 1-bit optim.

* temporary commit to save stuff.

* Use dist collectives instead of mpi routines.

* remove old code for comm.

* Fix bugs. still does not work.

* modify to test the nccl side code path

* Initial gather impl. Works intra-node.

* Updates to comm. phase 2. nccl comm. passed the tests.

* refactor code to introduce nccl/mpi as backends for onebit adam.

* Refactor updates to test/engine.

* Fix compile/runtime errors.

* simplify support for nccl/mpi backends.

* Add missign file

* Add compression backend in constructor. Revert later.

* modify test with some perf counting.

* Implement a true non-blocking gather for nccl side.

* Revert "Add compression backend in constructor. Revert later."

This reverts commit df8c40d3.

* improve the 1-bit adam test.

* Refactor comm. and compression backend in 1-bit adam.

* Fix the test.

* Fix runtime errors and typos in nccl backend

* fix mpi backend. modify tests.

* modify nccl perf test.

* fix mpi side errors.

* Add an mpi perf test

* Sync DSE.

* Remove old collectives file.

* Undo a typo.

* Graceful failure for torch versions that don't support nccl pt2pt.

* Revert "Merge branch 'master' into staging-1bit-nccl-v2"

This reverts commit 78400850, reversing
changes made to a6dba72a.

* Revert "Revert "Merge branch 'master' into staging-1bit-nccl-v2""

This reverts commit 6dbdd985.

* comm optimization + 1-bit lamb

* Saving/debugging commit.

* finalizing 1-bit lamb

* add momentum mask and chkpt handling for 1-bit adam

* Cleanup and modify nccl test to be runnable with deepspeed launcher.

* Fix format.

* fix formatting again.

* make test runnable without mpi4py

* Add dist.alltoall and dist.allgather instead of custom functions.

* remove debug prints.

* formatting and renaming

* renaming

* add unit test, fix existing tests

* skip unit test when torch < 1.8

* revert 1-bit lamb

* flatten momentum when dimension is more than 1

* add warning message for 1-bit adam under fp32

* improve version check

* add fp32 test

* 1-bit adam doc

* fix file name

* doc fix

* torch 1.8 is released

* doc fix

* fix tests

* update news

* add doc for momentum mask

* fix checkpoing handling, add unit test

* checkpoint handling doc

* doc final cleanup

* bump dates

* update tests

* url change

* doc fix

* fix test

* doc update

Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

11 Mar, 2021 1 commit

less scary overflow notice (#833) · 29853c3e

Committed by Stas Bekman on Mar 10, 2021

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

12 Feb, 2021 1 commit

fix spelling mistake (#749) · 1b8ca8ec

Committed by sdtblck on Feb 11, 2021

17 Sep, 2020 1 commit

Overflow fix (#416) · f5cce75e

Committed by Shaden Smith on Sep 16, 2020

* Switches fused_optimizer overflow calculation

10 Sep, 2020 2 commits

Pipeline parallel training engine. (#392) · 65c2f974

Committed by Shaden Smith on Sep 09, 2020

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Add 1-bit Adam support to DeepSpeed (#380) · 01726ce2

Committed by Ammar Ahmad Awan on Sep 09, 2020

* 1-bit adam (#353)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: tanghl1994 <htang14@ur.rochester.edu>
Co-authored-by: Hank <tanghl1994@gmail.com>
Co-authored-by: root <root@node2x12b.cs.rochester.edu>
Co-authored-by: Ammar Ahmad Awan <awan.ammar@microsoft.com>

02 Sep, 2020 1 commit

Sparse attn + ops/runtime refactor + v0.3.0 (#343) · e5bbc2e5

Committed by Jeff Rasley on Sep 01, 2020

* Sparse attn + ops/runtime refactor + v0.3.0

Co-authored-by: Arash Ashari <arashari@microsoft.com>

Greenplum / DeepSpeed · Last synced about 1 year ago