Commits · 79093d74aa1632e5eac937480774b6fd29084228 · Greenplum / DeepSpeed

10 Sep, 2020 1 commit

Add 1-bit Adam support to DeepSpeed (#380) · 01726ce2

Ammar Ahmad Awan authored Sep 09, 2020

* 1-bit adam (#353)

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: tanghl1994 <htang14@ur.rochester.edu>
Co-authored-by: Hank <tanghl1994@gmail.com>
Co-authored-by: root <root@node2x12b.cs.rochester.edu>
Co-authored-by: Ammar Ahmad Awan <awan.ammar@microsoft.com>

06 Sep, 2020 1 commit

- fixed a typo; this was fixed before but seems like it has been lost in the refactor (#364) · a64b0abb
  Arash Ashari authored Sep 05, 2020

02 Sep, 2020 1 commit

Sparse attn + ops/runtime refactor + v0.3.0 (#343) · e5bbc2e5

Jeff Rasley authored Sep 01, 2020

* Sparse attn + ops/runtime refactor + v0.3.0

Co-authored-by: Arash Ashari <arashari@microsoft.com>

01 Sep, 2020 2 commits

Samyamr/grad acc stage2 (#338) · 7240abf3

Samyam Rajbhandari authored Aug 31, 2020

* Adding gradient accumulation support for ZeRO Stage 2. Changing all Megatron-LM tests to also test gradient accumulation

* Gradient Accumulation support for Stage 2. Model tests added to test the feature

* formatting

* Update deepspeed_light.py

removing comment

* Update ds_config_func_bs8_zero1.json

reverting this file back. It's not needed for this PR

* defining baseline prefix

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Update deepspeed_checkpointing.py (#336) · 458c0d92

Samyam Rajbhandari authored Aug 31, 2020

* Update deepspeed_checkpointing.py

* formatting

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

19 Aug, 2020 1 commit

- Turn off multi-node launch if only 1 node (#322) · e69b1eec
  Jeff Rasley authored Aug 18, 2020
  ```
  * turn off multi-node launch if only 1 node
  ```

14 Aug, 2020 2 commits
- attach empty grad to its param to ensure it's copied after reduction (#316) · e1bea67f
  Jeff Rasley authored Aug 13, 2020
- Update fan out flag for pdsh (#315) · 6855ba1c
  Jeff Rasley authored Aug 13, 2020
  ```
  * update fan out flag for pdsh
  ```

13 Aug, 2020 1 commit
- Update deepspeed_lr_schedules.py (#314) · 3437342c
  Jeff Rasley authored Aug 12, 2020

11 Aug, 2020 1 commit
- Fix+tests for get_lr from lr_scheduler before training starts (#310) · cd68e6e5
  Jeff Rasley authored Aug 10, 2020
  ```
  * add fix and tests for get_lr from lr_scheduler before training starts
  ```

08 Aug, 2020 1 commit
- Removing () from assertion. (#307) · c35e9441
  Shaden Smith authored Aug 07, 2020
  ```
  The parentheses alter the evaluation of the assert() and it will always evaluate to True.
  ```
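A minimal Python sketch of the pitfall this commit message describes. The function names and the checked condition are hypothetical, since the actual DeepSpeed assertion is not shown in this listing. The point is only that `assert(cond, msg)` tests a non-empty tuple, which is always truthy.

```python
def buggy_check(value):
    # Equivalent to `assert(value > 0, "value must be positive")`:
    # the parentheses build a 2-tuple, and a non-empty tuple is truthy,
    # so this assertion can never fail.
    condition_and_message = (value > 0, "value must be positive")
    assert condition_and_message
    return True


def fixed_check(value):
    # Without the parentheses, the first expression is the condition
    # and the second is the failure message.
    assert value > 0, "value must be positive"
    return True


if __name__ == "__main__":
    print(buggy_check(-1))  # passes even though the value is negative
    try:
        fixed_check(-1)
    except AssertionError as err:
        print("AssertionError:", err)
```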
01 Aug, 2020 1 commit

NameError: name 'mpu' is not defined (#305) · 9d07d756

Emmanuel Kahembwe authored Aug 01, 2020

The mpu object is bound to the class instance.

The if statement uses `self.mpu`, but just `mpu` is called in the following lines.

This raises a NameError.
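The failure mode described in this commit can be reproduced with a small sketch. The `Engine` and `FakeMpu` classes and the `get_size` method are hypothetical stand-ins, not DeepSpeed's code. They only mirror the pattern of checking `self.mpu` and then calling the bare name `mpu`.

```python
class FakeMpu:
    """Hypothetical model-parallel unit with a single illustrative method."""

    def get_size(self):
        return 4


class Engine:
    """Hypothetical class that stores an mpu object on the instance."""

    def __init__(self, mpu=None):
        self.mpu = mpu

    def size_buggy(self):
        if self.mpu is not None:
            # Bug: `mpu` is neither a local nor a global name here,
            # so this line raises NameError when it is reached.
            return mpu.get_size()  # noqa: F821
        return 1

    def size_fixed(self):
        if self.mpu is not None:
            # Fix: use the attribute that the condition already checked.
            return self.mpu.get_size()
        return 1


if __name__ == "__main__":
    engine = Engine(FakeMpu())
    try:
        engine.size_buggy()
    except NameError as err:
        print("NameError:", err)
    print(engine.size_fixed())
```

Note that the buggy path only fails when an mpu object is actually supplied, which may be why such a bug can go unnoticed in runs without one.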

24 Jul, 2020 2 commits

- pass steps_per_print to tput timer (#299) · ec943410
  Jeff Rasley authored Jul 23, 2020
- updates to amp to support grad clip and grad accumulation (#290) · eb74c3f1
  Jeff Rasley authored Jul 23, 2020
  ```
  * updates to amp to support grad clip and grad accumulation
  * zero grad using optimizer if in amp mode
  ```

23 Jul, 2020 1 commit

Avoid deadlock for unsynchronized non-zero checkpointing (#297) · 3cc96e17

Olatunji Ruwase authored Jul 22, 2020

* Avoid deadlock for unsynchronized non-zero checkpointing

* Fix formatting issues

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

22 Jul, 2020 1 commit

- only global rank 0 can log tensorboard data; avoid multi gpu/node race for the log directory (#296) · 1f972427
  Arash Ashari authored Jul 21, 2020

16 Jul, 2020 2 commits
- Empty grad fix (#291) · 376818ef
  Jeff Rasley authored Jul 15, 2020
  ```
  * empty grad fix
  * add unit tests for empty grad
  ```
- Fix bug in fp32 optimizer state loading (#289) · 607814fe
  Olatunji Ruwase authored Jul 15, 2020

14 Jul, 2020 1 commit

Support loading and saving ZeRO checkpoints with changing DP degree (#240) · 7ccc9daf

Olatunji Ruwase authored Jul 14, 2020

* Support saving and loading ZeRO checkpoints on different data
parallelism degree.

* Fix formatting

* Support checkpoint with varying GPU count in ZeRO stage 1

* Fix formatting

* Formatting fixes

* Update model tests

* Remove pprint

* Minor fix

* Fix formatting

* Update model tests

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

12 Jul, 2020 1 commit

- Support amp deepspeed backend (#286) · f5453124
  Jeff Rasley authored Jul 11, 2020
  ```
  * add amp support for deepspeed (non-ZeRO)
  * tests for amp mode
  ```

07 Jul, 2020 1 commit

ZeRO-2: Handle gradients of empty partitions (#275) · 4a3234e0

Olatunji Ruwase authored Jul 06, 2020

* Load non-DeepSpeed checkpoints into ZeRO optimizer

* Handle parameters smaller than DP

* Formatting fixes

* Handle empty partitions

* Fix perf bug

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

24 Jun, 2020 1 commit

Handle parameter groups smaller than DP (#273) · 88c319aa

Olatunji Ruwase authored Jun 23, 2020

* Load non-DeepSpeed checkpoints into ZeRO optimizer

* Handle parameters smaller than DP

* Formatting fixes


20 Jun, 2020 3 commits

- Update deepspeed_utils.py (#270) · 224494bd
  Samyam Rajbhandari authored Jun 19, 2020
  ```
  * Removing handle_overflow debugging code in deepspeed_utils.py

  * Removing handle_overflow debugging code in deepspeed_zero_optimizer.py

  Removing unnecessary overflow handle code. Not sure why it was there in the first place.
  ```
- Revert "Load non-DeepSpeed checkpoints into ZeRO optimizer" · 02c392f0
  Shaden Smith authored Jun 19, 2020
  ```
  This reverts commit 54c0267e.
  ```
- Load non-DeepSpeed checkpoints into ZeRO optimizer · 54c0267e
  Tunji Ruwase authored Jun 20, 2020

18 Jun, 2020 1 commit
- add quotes around user argument values (#267) · fedd0386
  Jeff Rasley authored Jun 17, 2020

12 Jun, 2020 1 commit
- minor refactor loss scaler (#261) · 96c4daab
  Chunyang Wen authored Jun 12, 2020

09 Jun, 2020 1 commit
- fix transformer kernel preln config (#257) · fc1de4ff
  eltonzheng authored Jun 08, 2020

06 Jun, 2020 1 commit

Support migration to FP16 optimizer (#249) · 2312f04b

Olatunji Ruwase authored Jun 05, 2020

* Debugging

* Fix step() bug; Make step timing optional

* Remove unnecessary changes

* Format fixes

* Replace list with scalar variable

* Remove redundant code

* Fix typo


05 Jun, 2020 2 commits

- Specify num_replicas and rank when creating sampler (#216) · 0f72988d
  Vidush Vishwanath authored Jun 04, 2020
- Add log util (#230) · e1ad8803
  Chunyang Wen authored Jun 05, 2020
  ```
  * Add log util

  * replace all occurrences of print and logging

  * address format

  * disable propagate to avoid duplicate log
  ```
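The last bullet of this commit ("disable propagate to avoid duplicate log") refers to a standard behaviour of Python's `logging` module. The sketch below is not DeepSpeed's log util. The logger name and helper functions are hypothetical, and it only illustrates why a record can be emitted twice when both a named logger and the root logger have handlers.

```python
import io
import logging


def make_logger(name, stream, propagate):
    """Create a logger with its own stream handler (illustrative helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # start from a clean state on repeated calls
    logger.addHandler(logging.StreamHandler(stream))
    logger.propagate = propagate
    return logger


def count_emitted_lines(propagate):
    """Log one message and count how many handlers wrote it."""
    root_stream, own_stream = io.StringIO(), io.StringIO()
    root = logging.getLogger()
    root_handler = logging.StreamHandler(root_stream)
    root.addHandler(root_handler)
    try:
        logger = make_logger("demo_logger", own_stream, propagate)
        logger.info("hello")
    finally:
        root.removeHandler(root_handler)
    lines = root_stream.getvalue().splitlines()
    lines += own_stream.getvalue().splitlines()
    return len(lines)


if __name__ == "__main__":
    print(count_emitted_lines(propagate=True))   # record reaches both handlers
    print(count_emitted_lines(propagate=False))  # record stays with its own handler
```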
04 Jun, 2020 1 commit

- reduce memcpy between host and device (#248) · 8353c594
  eltonzheng authored Jun 03, 2020

30 May, 2020 1 commit

Transformer kernel release (#242) · 734d8991

Jeff Rasley authored May 29, 2020

* Transformer kernels release

Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: Elton Zheng <eltonz@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: RezaYazdaniAminabadi <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Tunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

29 May, 2020 1 commit

- fix: typo (#238) · b652395e
  Chunyang Wen authored May 29, 2020
  ```
  * fix: typo in code docs

  * more pythonic code
  ```

28 May, 2020 5 commits

remove redundant init code (#234) · 6fe0edb8

Chunyang Wen authored May 28, 2020

Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

if using lamb force legacy_fusion=True (#236) · a2b75527

Jeff Rasley authored May 27, 2020

add support for predivide as a config option (#235) · bc36b91d

Jeff Rasley authored May 27, 2020

* add support for predivide as a flag
* add predivide json config, remove allgather_disable (as it's not currently used anymore)


Default Contiguous Gradients False (#239) · 01e848b3

Samyam Rajbhandari authored May 27, 2020

```
Contiguous Gradients should be set to false by default. It's not useful unless the model is very large
```

Samyamr/cpu memory bloat fix zero (#233) · d24d3de9

Samyam Rajbhandari authored May 27, 2020

* Fix for CPU memory bloating issue caused by PyTorch backward graph creation in allgather. Fixed by calling detach on tensors before calling all_gather

27 May, 2020 1 commit

- Support fp32 grad clipping and fix max_grad_norm confusion (#232) · abe2204d
  Jeff Rasley authored May 26, 2020
  ```
  * updates to support fp32 grad clipping and disable max_grad_norm
  ```

Greenplum / DeepSpeed · last synced about 1 year ago
