Commits · fd2f970bdfd3eada289b4e19a3adcf2c352a4d8f · Greenplum / DeepSpeed

18 Dec, 2020 · 1 commit
- Transformer-kernel - supporting any arbitrary sequence-length (#587) · fd2f970b
  Committed by Reza Yazdani on Dec 17, 2020
```
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
```
16 Dec, 2020 · 2 commits

Fixes for RTD build errors (#606) · 6380ee35
Committed by Jeff Rasley on Dec 15, 2020
```
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
```

[doc] xref to hostfile discussion (#604) · 007466e5

Committed by Stas Bekman on Dec 15, 2020

* [doc] xref to hostfile discussion

wasn't clear where to find what was meant by `hostfile` - so adding a link to where it's discussed.

* remove whitespace

15 Dec, 2020 · 1 commit
- implement missing get_last_lr (#595) · 9f8e8f38
  Committed by Stas Bekman on Dec 14, 2020
```
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
```
12 Dec, 2020 · 5 commits
- Update launcher to set local rank environ variable (#597) · c5a449f9
  Committed by Jeff Rasley on Dec 11, 2020
```
* Update launch.py

* formatting
```
- Supported customizing kwargs for lr_scheduler (#584) · a4763f55
  Committed by carefree0910 on Dec 12, 2020
```
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
```
- add DeepSpeedZeroConfig repr method (#596) · 66268bd3
  Committed by Stas Bekman on Dec 11, 2020
```
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
```
- [build] fix computer capability arch flags, add PTX, handle PTX (#591) · 8a184b6b
  Committed by Stas Bekman on Dec 11, 2020
```
* fix arch flags, add PTX

* bug fix

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
```
- add manual workflow to run tests with precompiled ops · 0518252d
  Committed by Jeff Rasley on Dec 11, 2020
10 Dec, 2020 · 4 commits
- Add AML video link · 7300f3e3
  Committed by Jeff Rasley on Dec 09, 2020
- Add papers/videos to readme/website (#592) · 19acd6cf
  Committed by Jeff Rasley on Dec 09, 2020
- bump to 0.3.8 · cb7c7da6
  Committed by Jeff Rasley on Dec 09, 2020
- Pin triton to 0.2.3 for now, 0.3.0 is broken · d901a6d2
  Committed by Jeff Rasley on Dec 09, 2020
09 Dec, 2020 · 1 commit
- Pipeline warnings and checkpoint portability (#588) · 2f626978
  Committed by Shaden Smith on Dec 08, 2020
```
* Switch from deprecated allreduce interface.

* Make pipeline checkpoint files portable.
```
08 Dec, 2020 · 2 commits

[build] add compute_86 (#577) · e8b126d9

Committed by Stas Bekman on Dec 07, 2020

RTX-30 series are compute_86
```
python -c "import torch; print(torch.cuda.get_device_capability())"
```
This PR adds support for this compute capability.

Reference: https://developer.nvidia.com/cuda-gpus

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
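
As a rough illustration of how a `(major, minor)` capability such as the `(8, 6)` reported for RTX-30 cards maps onto nvcc architecture flags, here is a minimal sketch. The helper name `gencode_flag` and its `ptx` parameter are hypothetical and are not taken from the DeepSpeed builder; only the `-gencode=arch=compute_XX,code=sm_XX` flag format is standard nvcc syntax.

```python
def gencode_flag(capability, ptx=False):
    """Build an nvcc -gencode flag for a (major, minor) compute capability.

    Hypothetical helper for illustration; not the DeepSpeed builder's code.
    """
    major, minor = capability
    num = f"{major}{minor}"
    # code=sm_XX emits a binary for that architecture;
    # code=compute_XX embeds PTX that newer GPUs can JIT-compile.
    code = f"compute_{num}" if ptx else f"sm_{num}"
    return f"-gencode=arch=compute_{num},code={code}"


# (8, 6) is what torch.cuda.get_device_capability() reports on an RTX-30 card.
print(gencode_flag((8, 6)))  # -gencode=arch=compute_86,code=sm_86
```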

[build] make builder smarter and configurable wrt compute capabilities + docs (#578) · ce363d0e
Committed by Stas Bekman on Dec 07, 2020

05 Dec, 2020 · 1 commit

Fix potential random layout inconsistency issues in sparse attention modules (#534) · 1e44d48d

Committed by Zhun on Dec 04, 2020

* 1) Register layout as buffer of module so that we can save/load checkpoint; 2) Add a broadcast of layout at the beginning to ensure different processes will have consistent layout during distributed training.

* Add docstring for max_seq_length argument in SparseSelfAttention

Co-authored-by: Zhun Liu <zhunliu@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

03 Dec, 2020 · 5 commits
- [build] build against installed cuda-11.1 while torch built w/ cuda-11.0 (#570) · ff58fa7e
  Committed by Stas Bekman on Dec 02, 2020
- Add compute capability 8.0 if on cuda 11+ (#572) · be33bea4
  Committed by Jeff Rasley on Dec 02, 2020
- [engine] train should be able to get `mode` arg (#571) · 2d1f7c01
  Committed by Stas Bekman on Dec 02, 2020
- Add 'latest' checkpoint save/load support (#569) · 845921b3
  Committed by Jeff Rasley on Dec 02, 2020
- [cifar tutorial] improve readability (#567) · 7a75f8b3
  Committed by Stas Bekman on Dec 02, 2020
```
* [cifar tutorial] improve readability
```
02 Dec, 2020 · 2 commits

tracking optimizer step in cpu-adam when loading checkpoint (#564) · 9f52a36f

Committed by Reza Yazdani on Dec 01, 2020

* tracking optimizer step in cpu-adam when loading checkpoint

* add warning/error message for updating optimizer step count

* resolve build issue

* supporting state update from the python side

* track step from python in all cases

* remove comma

supporting different hidden dimensions (#559) · c78c29f9

Committed by Reza Yazdani on Dec 01, 2020

* supporting different hidden dimensions

* add support for larger hidden dimensions (greater than 8K)

* remove empty line

* add loop unrolling factor for dropout kernels

* update different kernels based on the reviews

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

28 Nov, 2020 · 1 commit

[doc] typo fix and clarification (#563) · 17f36f1b

Committed by Stas Bekman on Nov 27, 2020

This PR:
* fixes a misspelled method name
* also `( () )` doesn't read too well, until one reads the code and understands that it's not a formatting bug. I proposed to simply say that it's a callable object.

26 Nov, 2020 · 4 commits
- bump to 0.3.7 · c51fa65d
  Committed by Jeff Rasley on Nov 25, 2020
- update manifest · e4e20662
  Committed by Jeff Rasley on Nov 25, 2020
- bump to 0.3.6 and fix manifest to include reqs (#561) · 73c3262d
  Committed by Jeff Rasley on Nov 25, 2020
- Adds long_description to setup.py (#560) · 60097136
  Committed by Shaden Smith on Nov 25, 2020
23 Nov, 2020 · 1 commit
- bump to 0.3.5 · 16313a96
  Committed by Jeff Rasley on Nov 23, 2020
25 Nov, 2020 · 6 commits
- Turn back on PP tests (#558) · eec44af1
  Committed by Jeff Rasley on Nov 24, 2020
- Simplify dist init and only init if needed. (#553) · 0e831e23
  Committed by Ammar Ahmad Awan on Nov 24, 2020
```
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
```
- Deprecate client ability to disable gradient reduction (#552) · 6e65c2cc
  Committed by Olatunji Ruwase on Nov 24, 2020
```
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
```
- Update badges and CI name (#557) · 1ef5cd23
  Committed by Jeff Rasley on Nov 24, 2020
- Switch to CI to GitHub Actions (#556) · 3347460e
  Committed by Jeff Rasley on Nov 24, 2020
- Create main.yml · c18fb0de
  Committed by Jeff Rasley on Nov 24, 2020
24 Nov, 2020 · 1 commit

Bug fix for norm calculation in absence of model parallel group (#551) · 00c3a254

Committed by Samyam Rajbhandari on Nov 23, 2020

In the absence of a model parallel group, model_parallel_allreduce should not do any reduction. This commit fixes a bug that performed a model parallel allreduce across the world group when the model parallel group is None.
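
The intended behavior can be sketched in a few lines. This is only an illustration of the guard described above, not the DeepSpeed implementation: the function signature, the injected `all_reduce` callable, and the `fake_all_reduce` stand-in are all assumptions made so the example runs without a distributed backend.

```python
def model_parallel_allreduce(value, mp_group, all_reduce):
    """Reduce `value` across the model parallel group, if there is one.

    Hypothetical signature for illustration only.
    """
    # With no model parallel group there are no peers holding other shards,
    # so the local value is already the final result. Falling through to a
    # reduction here would reduce across the default (world) group instead.
    if mp_group is None:
        return value
    return all_reduce(value, mp_group)


def fake_all_reduce(value, group):
    # Stand-in for a real collective: `group` is the list of values that the
    # ranks in the group contribute, and the result is their sum.
    return sum(group)


print(model_parallel_allreduce(3.0, None, fake_all_reduce))        # 3.0
print(model_parallel_allreduce(3.0, [3.0, 4.0], fake_all_reduce))  # 7.0
```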


23 Nov, 2020 · 1 commit
- Adding static_loss_scale to unfused optimizer (#546) · bcd56f97
  Committed by Samyam Rajbhandari on Nov 22, 2020
22 Nov, 2020 · 1 commit
- Support non-tensor state in checkpoint (#548) · 6021b702
  Committed by Olatunji Ruwase on Nov 21, 2020
21 Nov, 2020 · 1 commit

Fix unbalanced gradients bug in ZeRO-2 gradient accumulation (#545) · 0178e6cc

Committed by Olatunji Ruwase on Nov 20, 2020

* Use zero-tensors for missing gradients to avoid size mismatch

* Unit test for unbalanced gradients in ZeRO

* Formatting fixes
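
The first bullet can be illustrated with a small sketch. It assumes a simplified model in which each parameter is a `(numel, grad)` pair and gradients are plain Python lists; the name `flatten_gradients` is hypothetical and this is not the ZeRO-2 code itself. The point is that substituting zeros for a missing gradient keeps the flattened buffer the same length on every rank.

```python
def flatten_gradients(params):
    """Flatten per-parameter gradients into one buffer.

    `params` is a list of (numel, grad) pairs, where grad is a list of
    floats or None when the parameter received no gradient on this rank.
    """
    flat = []
    for numel, grad in params:
        if grad is None:
            # Use zeros of the parameter's size so buffer lengths match
            # across ranks even when different ranks miss different grads.
            grad = [0.0] * numel
        flat.extend(grad)
    return flat


rank_a = [(2, [1.0, 2.0]), (3, None)]
rank_b = [(2, None), (3, [1.0, 1.0, 1.0])]
print(flatten_gradients(rank_a))  # [1.0, 2.0, 0.0, 0.0, 0.0]
print(len(flatten_gradients(rank_a)) == len(flatten_gradients(rank_b)))  # True
```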


Greenplum / DeepSpeed · last synced about 1 year ago
