1. 03 Aug, 2021 1 commit
  2. 31 Jul, 2021 1 commit
  3. 30 Jul, 2021 1 commit
    • [Doc] round_robin_gradients (#1261) · 40c381df
      Olatunji Ruwase authored
      * Fix docstring
      
      * Make screenshots clickable for easier viewing
      
      * Navigation menu in alphabetical order; more clickable screenshots
      
      * Rename 1Cycle doc
      
      * Tweak naming
      
      * Remove no longer used flag
      
      * ZeRO3 Offload release
      
      * Single GPU results
      
      * Rearrange figures
      
      * Single GPU text
      
      * tweak intro
      
      * zero3-offload section
      
      * Add asynchronous i/o docs
      
      * Fix print_per_steps doc
      
      * Document round_robin_gradients
      
      * Tweak description
      
      * Trigger CI
      40c381df
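      The flag documented here is a `zero_optimization` setting. A minimal, hedged sketch of a config that turns it on (the batch size and offload values are placeholders, not taken from this PR):
      
      ```
      # Hedged sketch: round_robin_gradients round-robins gradient partitioning
      # across ranks to parallelize gradient copies to CPU during offload
      # (ZeRO stage 1/2); see the DeepSpeed config docs for the exact semantics.
      ds_config = {
          "train_micro_batch_size_per_gpu": 8,          # placeholder value
          "zero_optimization": {
              "stage": 2,
              "offload_optimizer": {"device": "cpu"},   # the flag targets offload setups
              "round_robin_gradients": True,            # the option this PR documents
          },
      }
      # Typically handed to deepspeed.initialize(model=..., config=ds_config, ...)
      ```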
  4. 29 Jul, 2021 4 commits
  5. 27 Jul, 2021 2 commits
  6. 25 Jul, 2021 1 commit
  7. 21 Jul, 2021 1 commit
  8. 20 Jul, 2021 1 commit
  9. 16 Jul, 2021 2 commits
  10. 14 Jul, 2021 3 commits
  11. 13 Jul, 2021 7 commits
  12. 12 Jul, 2021 2 commits
  13. 10 Jul, 2021 2 commits
    • [zero.Init] post_init partitioning is to be run only by a child module (#1202) · 497b741f
      Stas Bekman authored
      * post_init to be run only by a child module
      
      * better solution
      
      * add test
      
      * safer attr name
      
      * wants half()
      
      * improve doc
      Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
      497b741f
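      For context, a minimal sketch of constructing a model under `deepspeed.zero.Init`, the context manager this PR touches; the module is a made-up toy, and the comment states a reading of the commit title rather than the patch itself:
      
      ```
      # Hedged sketch, not DeepSpeed internals. Every submodule's __init__ fires
      # the post-init hook; per the commit title, the partitioning step should be
      # driven only by the (child) module whose __init__ just completed, not
      # repeated by its parents.
      import torch
      import deepspeed

      class ToyNet(torch.nn.Module):          # made-up module for illustration
          def __init__(self, hidden=64):
              super().__init__()
              self.proj = torch.nn.Linear(hidden, hidden)
              self.head = torch.nn.Linear(hidden, 2)

      # In practice a ZeRO-3 config is usually passed via config_dict_or_path.
      with deepspeed.zero.Init():
          model = ToyNet()                    # parameters created already partitioned
      ```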
    • [zero3] params_to_reduce isn't always there (#1214) · 91f58c06
      Stas Bekman authored
      * [zero3] params_to_reduce isn't always there
      
      While trying to port HF's Electra model to DeepSpeed, I'm getting this on the very first backward step (with some extra debug):
      
      ```
      Incrementing with parameter id 42
      ------ Before allocating allgather param name=generator_lm_head.weight id=41 shape=torch.Size([1]) status=ZeroParamStatus.NOT_AVAILABLE partition size=327680
      ------allgather param with name=generator_lm_head.weight id=41 shape=torch.Size([1]) status=ZeroParamStatus.NOT_AVAILABLE partition size=327680
      ------ Before allocating allgather param name=generator_lm_head.bias id=42 shape=torch.Size([1]) status=ZeroParamStatus.NOT_AVAILABLE partition size=5120
      ------allgather param with name=generator_lm_head.bias id=42 shape=torch.Size([1]) status=ZeroParamStatus.NOT_AVAILABLE partition size=5120
      Backward name=generator_lm_head.weight id=41 shape=torch.Size([5120, 64])
      Inside reduce ipg buckets. name=generator_lm_head.weight id=41 shape=torch.Size([5120, 64]), ipg elements 0, reduce bucket size 4096
      Params in ipg bucket []
      Reducing []
      GOT 1
      torch.Size([4096])
      Traceback (most recent call last):
        File "examples/pytorch/language-modeling/run_mlm.py", line 533, in <module>
          main()
        File "examples/pytorch/language-modeling/run_mlm.py", line 484, in main
          train_result = trainer.train(resume_from_checkpoint=checkpoint)
        File "/mnt/nvme1/code/huggingface/transformers-ds-zero_to_fp32-tests/src/transformers/trainer.py", line 1269, in train
          tr_loss += self.training_step(model, inputs)
        File "/mnt/nvme1/code/huggingface/transformers-ds-zero_to_fp32-tests/src/transformers/trainer.py", line 1778, in training_step
          loss = self.deepspeed.backward(loss)
        File "/mnt/nvme1/code/github/00optimize/DeepSpeed-zero-init-child-only-post_init/deepspeed/runtime/engine.py", line 1188, in backward
          self.optimizer.backward(loss)
        File "/mnt/nvme1/code/github/00optimize/DeepSpeed-zero-init-child-only-post_init/deepspeed/runtime/zero/stage3.py", line 2964, in backward
          self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
        File "/mnt/nvme1/code/github/00optimize/DeepSpeed-zero-init-child-only-post_init/deepspeed/runtime/fp16/loss_scaler.py", line 53, in backward
          scaled_loss.backward(retain_graph=retain_graph)
        File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward
          torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
        File "/home/stas/anaconda3/envs/py38-pt19/lib/python3.8/site-packages/torch/autograd/__init__.py", line 147, in backward
          Variable._execution_engine.run_backward(
        File "/mnt/nvme1/code/github/00optimize/DeepSpeed-zero-init-child-only-post_init/deepspeed/runtime/zero/stage3.py", line 1867, in reduce_partition_and_remove_grads
          self.reduce_ready_partitions_and_remove_grads(param, i)
        File "/mnt/nvme1/code/github/00optimize/DeepSpeed-zero-init-child-only-post_init/deepspeed/runtime/zero/stage3.py", line 2212, in reduce_ready_partitions_and_remove_grads
          self.reduce_independent_p_g_buckets_and_remove_grads(param, i)
        File "/mnt/nvme1/code/github/00optimize/DeepSpeed-zero-init-child-only-post_init/deepspeed/runtime/zero/stage3.py", line 1897, in reduce_independent_p_g_buckets_and_remove_grads
          self.reduce_ipg_grads()
        File "/mnt/nvme1/code/github/00optimize/DeepSpeed-zero-init-child-only-post_init/deepspeed/runtime/zero/stage3.py", line 2193, in reduce_ipg_grads
          self.average_tensor(reduction_list, params_to_reduce)
        File "/mnt/nvme1/code/github/00optimize/DeepSpeed-zero-init-child-only-post_init/deepspeed/runtime/zero/stage3.py", line 1972, in average_tensor
          params_to_reduce[0].reduce_gradients_at_owner(
      ```
      
      Is `params_to_reduce` always populated?
      
      If I add this check, the problem seems to go away.
      
      * real fix
      91f58c06
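      The check Stas describes amounts to guarding against an empty bucket before indexing it. A self-contained illustration of that pattern (stand-in class and names, not the actual stage3.py patch):
      
      ```
      # Minimal, self-contained illustration of the guard, not DeepSpeed's code:
      # when the independent-parameter-gradient bucket is empty ("Params in ipg
      # bucket []" above), skip the owner-side reduce instead of indexing [0].
      from typing import List

      class FakeParam:
          """Stand-in for a ZeRO-3 parameter with a reduce_gradients_at_owner hook."""
          def __init__(self, name: str):
              self.name = name

          def reduce_gradients_at_owner(self, params: List["FakeParam"]) -> None:
              print(f"reducing {[p.name for p in params]} at the owner of {self.name}")

      def average_tensor(params_to_reduce: List[FakeParam]) -> None:
          if not params_to_reduce:      # the bucket can legitimately be empty
              return                    # nothing to reduce for this backward hook
          params_to_reduce[0].reduce_gradients_at_owner(params_to_reduce)

      average_tensor([])                                      # no-op, no IndexError
      average_tensor([FakeParam("generator_lm_head.bias")])   # normal path
      ```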
  14. 08 Jul, 2021 1 commit
  15. 02 Jul, 2021 2 commits
  16. 29 Jun, 2021 1 commit
  17. 26 Jun, 2021 1 commit
  18. 24 Jun, 2021 5 commits
  19. 22 Jun, 2021 1 commit
  20. 18 Jun, 2021 1 commit