- 29 May 2020, 1 commit
-
-
Committed by Chunyang Wen
* fix: typo in code docs
* more Pythonic code
-
- 28 May 2020, 5 commits
-
-
Committed by Chunyang Wen
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
* add support for predivide as a flag
* add predivide json config; remove allgather_disable (it is no longer used)
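The predivide behavior can be sketched in plain Python; the function and flag names below are illustrative, not DeepSpeed's actual API. Pre-dividing each rank's gradients by the world size before the allreduce-style sum keeps intermediate values close to the final average, which reduces fp16 overflow risk compared with summing first and dividing afterwards.

```python
# Hypothetical sketch of gradient averaging with a predivide flag;
# names are illustrative, not DeepSpeed's actual API.
def average_gradients(rank_grads, predivide=True):
    world_size = len(rank_grads)
    if predivide:
        # divide first, then sum: partial sums stay near the final
        # average, so fp16 intermediates are less likely to overflow
        rank_grads = [[g / world_size for g in grads] for grads in rank_grads]
        return [sum(gs) for gs in zip(*rank_grads)]
    # sum first, then divide: the intermediate sum can be
    # world_size times larger than the final average
    summed = [sum(gs) for gs in zip(*rank_grads)]
    return [g / world_size for g in summed]

# two ranks, two gradient values each; both orders agree numerically here
grads = [[2.0, 4.0], [6.0, 8.0]]
assert average_gradients(grads, predivide=True) == [4.0, 6.0]
assert average_gradients(grads, predivide=False) == [4.0, 6.0]
```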
-
Committed by Samyam Rajbhandari
Contiguous gradients should be set to false by default. It's not useful unless the model is very large.
-
Committed by Samyam Rajbhandari
Fix for a CPU memory bloating issue caused by PyTorch backward-graph creation in allgather. Fixed by calling detach on tensors before calling all_gather.
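The fix can be sketched in a few lines of PyTorch; the helper name below is hypothetical, not the actual DeepSpeed code. Detaching a tensor before gathering keeps the gathered copies off the autograd graph, so backward-graph bookkeeping stops accumulating with every all_gather.

```python
import torch

# Hypothetical sketch, not the actual DeepSpeed fix: detach partitions
# before gathering so the copies carry no autograd history.
def prepare_for_allgather(partitions):
    # detach() shares storage with the original tensor but records no
    # history, so downstream collectives do not extend the backward graph
    return [p.detach() for p in partitions]

x = torch.ones(4, requires_grad=True)
safe = prepare_for_allgather([x])
assert not safe[0].requires_grad
assert x.requires_grad  # the original tensor still participates in autograd
```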
-
- 27 May 2020, 2 commits
-
-
Committed by Jeff Rasley
* updates to support fp32 grad clipping and disable max_grad_norm
-
Committed by Shaden Smith
-
- 25 May 2020, 1 commit
-
-
Committed by Chunyang Wen
-
- 22 May 2020, 2 commits
-
-
Committed by Shaden Smith
-
Committed by Shaden Smith
-
- 21 May 2020, 1 commit
-
-
Committed by Jeff Rasley
-
- 19 May 2020, 7 commits
-
-
Committed by Shaden Smith
-
Committed by Shaden Smith
-
Committed by Shaden Smith
* BERT title
-
Committed by Shaden Smith
-
Committed by Shaden Smith
-
Committed by Jeff Rasley
Updates for ZeRO stage 2 + ZeRO stage 1 w. RS
Co-authored-by: Tunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: Elton Zheng <eltonz@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: yuxionghe <yuxhe@microsoft.com>
Co-authored-by: Arash Ashari <arashari@microsoft.com>
-
Committed by Arash Ashari
* adding Bing SQuAD e2e test
* updating the draft test; bring the final step under the try section
* finalizing test for base DeepSpeed and DeepSpeed with ZeRO
* applying the comment (thanks Jeff); fixed formatting
-
- 15 May 2020, 1 commit
-
-
Committed by Shaden Smith
-
- 14 May 2020, 1 commit
-
-
Committed by Jeff Rasley
-
- 13 May 2020, 1 commit
-
-
Committed by Shaden Smith
-
- 12 May 2020, 1 commit
-
-
Committed by Olatunji Ruwase
* Support dynamic loss scale args in fp16 optimizers
* Update names
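Dynamic loss scaling of the kind these args configure can be sketched in plain Python; the class and parameter names below are illustrative, not the fp16 optimizer's actual interface. The scale is halved whenever an overflow is detected and raised again after a window of clean steps.

```python
# Illustrative dynamic loss scaler; names are hypothetical, not the
# actual fp16 optimizer interface.
class DynamicLossScaler:
    def __init__(self, init_scale=2.0**16, scale_factor=2.0, scale_window=1000):
        self.scale = init_scale
        self.scale_factor = scale_factor
        self.scale_window = scale_window
        self.good_steps = 0

    def update(self, overflow):
        if overflow:
            # gradients overflowed: shrink the scale (the step is skipped)
            self.scale = max(self.scale / self.scale_factor, 1.0)
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps % self.scale_window == 0:
                # a full window of clean steps: try a larger scale
                self.scale *= self.scale_factor

scaler = DynamicLossScaler(init_scale=8.0, scale_window=2)
scaler.update(overflow=True)   # overflow: 8.0 -> 4.0
scaler.update(overflow=False)
scaler.update(overflow=False)  # two clean steps: 4.0 -> 8.0
assert scaler.scale == 8.0
```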
-
- 07 May 2020, 1 commit
-
-
Committed by Shaden Smith
-
- 06 May 2020, 2 commits
-
-
Committed by Shaden Smith
-
Committed by Jeff Rasley
* add basic post-install test
-
- 05 May 2020, 1 commit
-
-
Committed by Jeff Rasley
-
- 01 May 2020, 2 commits
-
-
Committed by Jeff Rasley
* update apex version to the Feb 5th commit
* use gradient clipping instead of max grad norm in tests
* add a warning when the user provides max_grad_norm
* update examples commit
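Clipping by global gradient norm, as used in the tests in place of max_grad_norm, can be sketched without any framework; the helper below is an illustrative version of the usual recipe, not DeepSpeed's actual implementation.

```python
import math

# Illustrative clip-by-global-norm over plain lists of gradient values;
# mirrors the standard recipe, not DeepSpeed's actual implementation.
def clip_grad_norm(grads, max_norm):
    # global L2 norm across all gradient values
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        # rescale uniformly so the global norm equals max_norm,
        # preserving the gradient direction
        coef = max_norm / total_norm
        grads = [g * coef for g in grads]
    return grads, total_norm

grads, norm = clip_grad_norm([3.0, 4.0], max_norm=1.0)
assert norm == 5.0                                   # pre-clip norm
assert abs(sum(g * g for g in grads) - 1.0) < 1e-9   # clipped to unit norm
```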
-
Committed by Jeff Rasley
-
- 29 April 2020, 1 commit
-
-
Committed by Samyam Rajbhandari
1) CSR parameter names should end with .weight. 2) When using the basic optimizer directly, DeepSpeed should handle zero_grad. Letting the basic optimizer do the zero_grad resulted in residual gradients in the embedding layer for unknown reasons.
-
- 28 April 2020, 1 commit
-
-
Committed by Shaden Smith
-
- 25 April 2020, 2 commits
-
-
Committed by Jeff Rasley
Remove the explicit torch version requirement so that we can more easily support other versions.
-
Committed by Olatunji Ruwase
-
- 23 April 2020, 1 commit
-
-
Committed by Shaden Smith
-
- 22 April 2020, 1 commit
-
-
Committed by Shaden Smith
-
- 21 April 2020, 2 commits
-
-
Committed by Olatunji Ruwase
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
-
Committed by marload
-
- 16 April 2020, 1 commit
-
-
Committed by Jeff Rasley
-
- 12 April 2020, 1 commit
-
-
Committed by Samyam Rajbhandari
-
- 11 April 2020, 1 commit
-
-
Committed by Shaden Smith
-