Commits · a029239812e15cf35334514449ed3127b915780a · Greenplum / DeepSpeed

29 Jun 2021 · 1 commit

clean up logging (#1190) · a0292398
Committed by Stas Bekman on Jun 28, 2021

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

26 Jun 2021 · 1 commit

undo noise (#1191) · bc019a53
Committed by Stas Bekman on Jun 25, 2021

* undo noise

* another

24 Jun 2021 · 4 commits

Fix bugs about non-contiguous tensor broadcasting (#1168) · 429cbc89
Committed by Hyunwoong Ko on Jun 24, 2021

* Fix bugs about non-contiguous tensor broadcasting

* Fix typo
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>

introduce debug utils (#1136) · c0c4ebf1
Committed by Stas Bekman on Jun 23, 2021

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

ZeRO 2+3 memory estimators (#965) · 0c1802cc
Committed by Stas Bekman on Jun 23, 2021

Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

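The memory estimators in #965 are easier to read next to the model-state arithmetic from the ZeRO paper. The sketch below is not DeepSpeed's estimator API. It only applies the paper's per-GPU formulas for mixed-precision Adam: 2 bytes per parameter for fp16 weights, 2 bytes for fp16 gradients, and K = 12 bytes of fp32 optimizer state. It ignores activations, temporary buffers and fragmentation. The function name `zero_model_state_bytes` is hypothetical.

```python
def zero_model_state_bytes(num_params: float, num_gpus: int, stage: int, k: int = 12) -> float:
    """Per-GPU bytes for model states under ZeRO, following the ZeRO paper's formulas.

    Assumes mixed-precision Adam: 2 bytes/param fp16 weights, 2 bytes/param fp16
    gradients, k bytes/param fp32 optimizer state. Activations are not included.
    """
    psi, n = num_params, num_gpus
    if stage == 0:  # plain data parallelism: everything replicated on every GPU
        return (2 + 2 + k) * psi
    if stage == 1:  # optimizer states partitioned across GPUs
        return (2 + 2) * psi + k * psi / n
    if stage == 2:  # optimizer states + gradients partitioned
        return 2 * psi + (2 + k) * psi / n
    if stage == 3:  # optimizer states + gradients + parameters partitioned
        return (2 + 2 + k) * psi / n
    raise ValueError(f"unsupported ZeRO stage: {stage}")


if __name__ == "__main__":
    # 7.5B-parameter model on 64 GPUs, the worked example used in the ZeRO paper
    for s in range(4):
        print(s, round(zero_model_state_bytes(7.5e9, 64, s) / 1e9, 1), "GB")
```

For the 7.5B / 64-GPU case this reproduces the paper's 120 GB, 31.4 GB, 16.6 GB and 1.9 GB per-GPU figures for stages 0 to 3.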

zero_to_fp32: restore persistent buffers (#1146) · df8b1f88
Committed by Stas Bekman on Jun 23, 2021

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

18 Jun 2021 · 2 commits

Avoid partitioning small activations (#1154) · b1669c0d
Committed by Olatunji Ruwase on Jun 17, 2021

add assert to make sure zero isn't enabled with 1bit-* (#1169) · 1ba3e8e3
Committed by Jeff Rasley on Jun 17, 2021

17 Jun 2021 · 2 commits

patch in ompi local rank if local_rank isn't set (#1164) · 4c114a27
Committed by Jeff Rasley on Jun 16, 2021

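#1164 falls back to the Open MPI launcher's environment when `local_rank` is not set. The sketch below shows that fallback pattern under stated assumptions. It uses an explicit value only if it is non-negative, since an unset CLI flag is assumed to default to -1. Otherwise it tries `LOCAL_RANK`, then `OMPI_COMM_WORLD_LOCAL_RANK`, which Open MPI exports to each launched process. The helper name `resolve_local_rank` and this lookup order are illustrative, not DeepSpeed's actual code.

```python
import os
from typing import Mapping, Optional


def resolve_local_rank(cli_local_rank: Optional[int] = None,
                       env: Optional[Mapping[str, str]] = None) -> int:
    """Return the local rank, falling back to Open MPI's environment variable.

    Hypothetical helper: the lookup order (explicit value >= 0, LOCAL_RANK,
    OMPI_COMM_WORLD_LOCAL_RANK, then 0) is an assumption for illustration.
    """
    env = os.environ if env is None else env
    if cli_local_rank is not None and cli_local_rank >= 0:
        return cli_local_rank
    for key in ("LOCAL_RANK", "OMPI_COMM_WORLD_LOCAL_RANK"):
        if key in env:
            return int(env[key])
    return 0  # single-process default
```

Passing `env` explicitly keeps the helper testable without touching `os.environ`.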

Samyamr/largest partitioned params calculation fix (#1150) · 4eaf9106
Committed by Samyam Rajbhandari on Jun 16, 2021

* largest_partitioned_params calculation fix

largest partitioned params were being calculated incorrectly

* Update stage3.py

* Update stage3.py

* formatting fix

* changing sub-group size default to 1e9
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

09 Jun 2021 · 2 commits

Add local attention for GPT-Neo model architecture (#1114) · aca7fc54
Committed by Reza Yazdani on Jun 08, 2021

* fix links for inference tutorial

* Fix automatic injection. Add local attention for GPT-Neo

* fix the inference for generation of large sequences (>1K & <32K)

* fix format
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

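GPT-Neo alternates global and local attention layers. In the local layers each token attends only to a fixed window of preceding tokens; the Hugging Face `GPTNeoConfig` defaults to `window_size=256`. Below is a minimal sketch of a causal sliding-window mask in plain Python rather than DeepSpeed's CUDA kernels. The window convention used here is an assumption: the current token plus the `window - 1` tokens before it.

```python
from typing import List


def local_causal_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Causal: j <= i. Local: only the last `window` positions (including i itself).
    """
    return [[(j <= i) and (i - j < window) for j in range(seq_len)]
            for i in range(seq_len)]


if __name__ == "__main__":
    # Print the mask: "x" = may attend, "." = masked out
    for row in local_causal_mask(5, 2):
        print("".join("x" if allowed else "." for allowed in row))
```

When `window >= seq_len` the mask reduces to an ordinary causal (lower-triangular) mask, which is what the global layers use.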

correct cpu_offload deprecation (#1140) · a8d6dfe8
Committed by Stas Bekman on Jun 08, 2021

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

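#1140 concerns the deprecated `cpu_offload` flag in the `zero_optimization` section. Newer configs express the same intent with an `offload_optimizer` block whose `device` is `"cpu"`. The helper below is a hypothetical migration sketch over a plain config dict and assumes that one-to-one mapping. It is not a DeepSpeed function, and the library's own deprecation handling may differ in details such as other keys it sets.

```python
import copy
import warnings
from typing import Any, Dict


def migrate_cpu_offload(ds_config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of ds_config with the legacy `cpu_offload` flag rewritten.

    Hypothetical helper: assumes `"cpu_offload": true` is equivalent to
    `"offload_optimizer": {"device": "cpu"}` inside `zero_optimization`.
    """
    cfg = copy.deepcopy(ds_config)  # never mutate the caller's config
    zero = cfg.get("zero_optimization", {})
    if "cpu_offload" in zero:
        warnings.warn("'cpu_offload' is deprecated; use 'offload_optimizer' instead",
                      DeprecationWarning)
        if zero.pop("cpu_offload"):
            # Keep an explicit offload_optimizer block if the user already wrote one.
            zero.setdefault("offload_optimizer", {"device": "cpu"})
    return cfg
```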

08 Jun 2021 · 2 commits

missing f-string (#1134) · 1b187191
Committed by Stas Bekman on Jun 07, 2021

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

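#1134 adds a missing `f` prefix. Without it, Python treats the braces as literal text, so the message shows the placeholder instead of the value. The generic illustration below uses a made-up variable and message, not text from the DeepSpeed source:

```python
stage = 3

broken = "ZeRO stage {stage} is enabled"  # plain string: braces are kept literally
fixed = f"ZeRO stage {stage} is enabled"  # f-string: {stage} is interpolated

print(broken)  # ZeRO stage {stage} is enabled
print(fixed)   # ZeRO stage 3 is enabled
```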
[zero] fix missed subclasses partitioning bug (#1135) · 5ca81678
Committed by Stas Bekman on Jun 07, 2021

* fix missed subclassed partitioning bug

* fix on exit
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

04 Jun 2021 · 1 commit

fix config name (#1103) · c697d7ae
Committed by Stas Bekman on Jun 03, 2021

fixes: s/micro_batch_per_gpu/train_micro_batch_size_per_gpu/
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

03 Jun 2021 · 1 commit

Change the sparse attention API to be compatible with latest changes of triton (#902) · 26e3841c
Committed by Reza Yazdani on Jun 02, 2021

* Change the sparse attention API to be compatible with latest changes on the triton side

* remove compatibility checks for CUDA 11

* Update requirements-sparse_attn.txt
Co-authored-by: Arash Ashari <arashari@microsoft.com>

26 May 2021 · 1 commit

release inference quantized kernels (#1104) · d2cf66a6
Committed by Reza Yazdani on May 25, 2021

25 May 2021 · 2 commits

delay imports for replace policies and fix missing req (#1100) · 96eb5b12
Committed by Jeff Rasley on May 24, 2021

* delay imports for replace policies and fix missing req

* fix issue with _orig_layer_class always being None

Fix inference Api (#1095) · f8a65cb5
Committed by Reza Yazdani on May 24, 2021

* Fix Inference and Quantization tutorial links

* fix inference api

* use correct attention scaling
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

24 May 2021 · 1 commit

Quantization + inference release (#1091) · ed3de0c2
Committed by Reza Yazdani on May 24, 2021

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Elton Zheng <eltonz@microsoft.com>
Co-authored-by: Arash Ashari <arashari@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: niumanar <60243342+niumanar@users.noreply.github.com>

22 May 2021 · 1 commit

fix ZERO_OPTIMIZATION_REDUCE_SCATTER_DEFAULT default value (#1058) · 093e59ec
Committed by Meng, Peng on May 22, 2021

* fix Reduce Scatter default value

* Update constants.py
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

21 May 2021 · 1 commit

ZeRO-Infinity: Swap into unaligned fp16 buffer (#1086) · e9e9d5b8
Committed by Olatunji Ruwase on May 20, 2021

* Align fp16 param swap buffers

* Integrating swap buffer manager for fp16 params

* Support swapping misaligned fp16 parameters

* Support swap into unaligned fp16 buffer

20 May 2021 · 1 commit

ZeRO stage 1 refresh (#1042) · cfa63f5d
Committed by Jeff Rasley on May 19, 2021

19 May 2021 · 1 commit

ZeRO-Infinity: support swapping misaligned sized fp16 tensors (#1076) · d88d9279
Committed by Olatunji Ruwase on May 18, 2021

* Align fp16 param swap buffers

* Integrating swap buffer manager for fp16 params

* Support swapping misaligned fp16 parameters

16 May 2021 · 1 commit

ZeRO2-Offload: Load balance gradient copying to CPU (#1067) · ee4deabd
Committed by Olatunji Ruwase on May 15, 2021

* Round robin partitioning to improve ZeRO-2 Offload CPU copy

* Formatting fixes

* Fix index issues in debug dumps

* Remove debug prints

* Code cleanup

* Remove unintended stage3.py changes

* Add TODO

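The round-robin partitioning in #1067 can be illustrated on tensor indices alone. One plausible reading of the motivation is that gradients become ready in reverse layer order during backward. If each partition owns a contiguous run of tensors, the CPU-copy work reaches one partition at a time, whereas dealing tensors out round-robin interleaves every stretch of work across partitions. This is a generic sketch of round-robin assignment compared with a contiguous split. It is not DeepSpeed's partitioning code, which operates on flattened buffers.

```python
from typing import List


def round_robin_partition(num_tensors: int, num_partitions: int) -> List[List[int]]:
    """Assign tensor i to partition i % num_partitions."""
    parts: List[List[int]] = [[] for _ in range(num_partitions)]
    for idx in range(num_tensors):
        parts[idx % num_partitions].append(idx)
    return parts


def contiguous_partition(num_tensors: int, num_partitions: int) -> List[List[int]]:
    """Baseline: consecutive runs of at most ceil(num_tensors / num_partitions) tensors."""
    per_part = max(1, -(-num_tensors // num_partitions))  # ceiling division
    parts: List[List[int]] = [[] for _ in range(num_partitions)]
    for idx in range(num_tensors):
        parts[idx // per_part].append(idx)
    return parts


if __name__ == "__main__":
    print(contiguous_partition(8, 2))   # [[0, 1, 2, 3], [4, 5, 6, 7]]
    print(round_robin_partition(8, 2))  # [[0, 2, 4, 6], [1, 3, 5, 7]]
```

With the contiguous split, the last four gradients produced in backward all belong to partition 1. With round robin, both partitions receive work from every pair of consecutive tensors.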

14 May 2021 · 4 commits

Seeded unit tests (#1072) · 46f4573b
Committed by Shaden Smith on May 13, 2021

* is not -> !=

* Use pytest-randomly to seed unit tests.

Get correct fp16 reuse buffer size (#1071) · 6b49b60e
Committed by Olatunji Ruwase on May 13, 2021

ensure only ds params are gathered (#1044) · 29b444b6
Committed by Stas Bekman on May 13, 2021

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

[configure_distributed_model] improve assert (#1053) · 1eb9b218
Committed by Stas Bekman on May 13, 2021

* [configure_distributed_model] improve assert

This PR changes the two asserts to print the names of the offending parameters, e.g.:
```
fp16 is enabled but the following parameters have dtype that is not fp16: wav2vec2.masked_spec_embed
```

* style
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

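The improved assert in #1053 names the parameters with the wrong dtype instead of only failing. The minimal sketch below assumes parameters are given as `(name, dtype)` pairs with dtypes as plain strings, so it runs without PyTorch. With real modules one would iterate over `model.named_parameters()` and compare against `torch.float16`. The helper name `check_fp16_params` is hypothetical. The message wording follows the example in the commit message and the paired asserts from #1040.

```python
from typing import Iterable, Tuple


def check_fp16_params(named_dtypes: Iterable[Tuple[str, str]], fp16_enabled: bool) -> None:
    """Fail with a message listing every parameter whose dtype disagrees with the fp16 setting.

    Hypothetical helper; dtypes are strings such as "fp16" or "fp32".
    """
    named_dtypes = list(named_dtypes)
    if fp16_enabled:
        bad = [name for name, dtype in named_dtypes if dtype != "fp16"]
        assert not bad, ("fp16 is enabled but the following parameters have dtype "
                         f"that is not fp16: {', '.join(bad)}")
    else:
        bad = [name for name, dtype in named_dtypes if dtype == "fp16"]
        assert not bad, ("fp16 is not enabled but the following parameters have dtype "
                         f"of fp16: {', '.join(bad)}")
```

Like the original asserts, these checks disappear under `python -O`. Raising `ValueError` instead would keep them active in optimized runs.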

13 May 2021 · 1 commit

Improve flops profiler functionality (#1065) · 4544b7d2
Committed by Cheng Li on May 12, 2021

* use the original function's name as the key to old_functions dict

* update profile output format

* print at global rank 0

* add flops calculation in bwd pass using time from ds timers

* improve aggregated profiling output to show all depths

* print samples/second

* update readme and examples

* update docs

* fix typo and reorder printing

* fix format

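The first item in #1065, keying saved originals by the function's own name, is a common pattern when a profiler temporarily wraps library functions. The self-contained sketch below uses `math.sqrt` as a stand-in for the operations the flops profiler patches. `patch`, `restore` and `call_counts` are illustrative names, not the profiler's API; only the `old_functions` dict name comes from the commit message.

```python
import functools
import math
from collections import Counter
from typing import Callable, Dict

old_functions: Dict[str, Callable] = {}  # original callables keyed by their __name__
call_counts: Counter = Counter()


def patch(module, fn: Callable) -> None:
    """Replace module.<fn.__name__> with a counting wrapper and remember the original."""
    name = fn.__name__
    old_functions[name] = fn

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        call_counts[name] += 1
        return fn(*args, **kwargs)

    setattr(module, name, wrapper)


def restore(module) -> None:
    """Put every saved original back on the module and forget it."""
    for name, fn in old_functions.items():
        setattr(module, name, fn)
    old_functions.clear()


if __name__ == "__main__":
    patch(math, math.sqrt)
    math.sqrt(16.0)
    math.sqrt(9.0)
    restore(math)
    print(call_counts["sqrt"])  # 2
```

Keying by `__name__` lets `restore` find the original without holding a reference to the wrapper. It also assumes names are unique across everything patched, so patching same-named functions from two modules would need a `(module, name)` key.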

08 May 2021 · 1 commit

Avoid unused parameters assert by default (#1039) · 5b393f15
Committed by Olatunji Ruwase on May 07, 2021

* Unused parameters assert should be disabled by default

* Fix message

* Invert assert logic in unit test

* Change option for ignoring unused parameters
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

06 May 2021 · 1 commit

Align optimizer swap buffer sizes (#1036) · be5cd76a
Committed by Olatunji Ruwase on May 05, 2021

* NVMe intra-request validation should be on the entire file
Optimizer swap buffer sizes should be aligned

* Add fix message for missing aio lib error.

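#1036 aligns optimizer swap buffer sizes. Alignment matters for NVMe swapping because direct I/O (`O_DIRECT` on Linux) typically requires transfer sizes that are multiples of the device's logical block size. The minimal sketch below rounds a byte count up to an alignment boundary. The 4096-byte default and the helper names are assumptions for illustration, not values taken from DeepSpeed's aio configuration.

```python
def align_up(num_bytes: int, alignment: int = 4096) -> int:
    """Round num_bytes up to the next multiple of alignment (assumed 4096 by default)."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return -(-num_bytes // alignment) * alignment  # ceiling division, then scale back


def padding_needed(num_bytes: int, alignment: int = 4096) -> int:
    """Bytes of padding required to reach the aligned size."""
    return align_up(num_bytes, alignment) - num_bytes
```

For example, a 1,000-element fp16 tensor occupies 2,000 bytes, so its swap buffer would be padded by 2,096 bytes to 4,096.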

04 May 2021 · 2 commits

fix assert message (#1040) · 47ec97eb
Committed by Stas Bekman on May 03, 2021

* fix assert

The current assert "Model must initialized in fp16 mode for ZeRO Stage 3." needs TLC, so I rewrote it completely to match its cousin assert. Now we have two consistent, matching asserts:

- f"fp16 is enabled but one or several model parameters have dtype that is not fp16"
- f"fp16 is not enabled but one or several model parameters have dtype of fp16"

* remove f

Change methods to be static (#1038) · 21047072
Committed by janEbert on May 03, 2021

Fix #1032

03 May 2021 · 1 commit

fix format (#1032) · 962dbc63
Committed by Cheng Li on May 03, 2021

01 May 2021 · 5 commits

[Stage][Fix] Add additional conditions when checking types of output from the model (#1026) · b3870363
Committed by Sean Naren on May 01, 2021

* Add additional conditions when checking types of output from the model

* Add test

* Modify test to use torch.tensor as well
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

Remove duplicated _initialize_parameter_parallel_groups definition in engine (#709) · 6da4fccc
Committed by Jiangang Zhu on May 01, 2021

Co-authored-by: Jiangang Zhu <jiangazh@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Relax dataset type check in deepspeed io (#1012) · c5700bc0
Committed by Cheng Li on Apr 30, 2021

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

improve assert message (#1024) · c4485e2f
Committed by Stas Bekman on Apr 30, 2021

[fp32] fix default dtype (#1023) · 18a26e86
Committed by Stas Bekman on Apr 30, 2021

Greenplum / DeepSpeed · last synced about 1 year ago
