Commits · 86477538a62f8b192d3c4978a8da747f67f4a3d3 · Greenplum / DeepSpeed

01 Feb, 2023 1 commit

Fix hardcoded instances to fp16 in optimizer creation log messages to the correct dtype. (#2743) · 86477538

Committed by Logan Adams on Jan 31, 2023

* Remove hardcoded instances to fp16 in log messages.

* Add model_dtype to print the correct format

* Respond to PR feedback

---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

86477538

31 Jan, 2023 2 commits
- Add links to new azureML examples (#2756) · 1db4ade3
  Committed by cassieesvelt on Jan 30, 2023
  Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
  1db4ade3
- Bing/formatting correction (#2764) · 8d3b42c2
  Committed by Bing Xie on Jan 30, 2023
  * modify engine.py for formatting
  * commit formatting changes on engine.py
  8d3b42c2
29 Jan, 2023 1 commit
- Add environment variable to make nvcc compilation more verbose (#2759) · 258d2831
  Committed by Connor Holmes on Jan 28, 2023
  258d2831
27 Jan, 2023 5 commits

Skip test_bias_gelu unit test if torch < 1.12 (#2754) · cc3d7cb9

Committed by Lev Kurilenko on Jan 26, 2023

This PR adds a torch version check in the test_bias_gelu unit test to skip if the torch version < 1.12. This is due to gelu implementation differences in versions prior to 1.12.

cc3d7cb9
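
The version gate described in this commit can be pictured with a small sketch. This is an illustration only, not the test code from the PR: `parse_major_minor` and `should_skip_bias_gelu` are hypothetical helper names, and the version string is passed in as an argument (rather than read from `torch.__version__`) so the snippet runs without torch installed.

```python
import re

# Threshold named in the commit message: skip when torch < 1.12.
MIN_TORCH = (1, 12)


def parse_major_minor(version: str) -> tuple:
    """Extract (major, minor) from strings such as '1.11.0+cu113'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def should_skip_bias_gelu(torch_version: str) -> bool:
    """Return True when the given version is older than MIN_TORCH.

    Hypothetical helper: the real test may perform this check differently.
    """
    return parse_major_minor(torch_version) < MIN_TORCH
```

In a pytest test, a check like this would typically be followed by a call to `pytest.skip(...)` when it returns `True` for `torch.__version__`. Comparing integer tuples avoids the string-comparison trap where `"1.9"` sorts after `"1.12"`.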

Fix softmax backward (#2709) · 0b06e0cb

Committed by Reza Yazdani on Jan 26, 2023

* Reset KV-cache at the beginning of text-generation

* Add new backward kernel to handle large softmax-length

* remove unrelated changes
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Connor Holmes <connorholmes@microsoft.com>

0b06e0cb

[zero] remove misleading dtype log (#2732) · a60e31a7

Committed by Jeff Rasley on Jan 26, 2023

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

a60e31a7

fix a mispelled attribute (#2750) · 30d3f5df

Committed by Stas Bekman on Jan 26, 2023

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

30d3f5df

Fix autotuning so that it records Floating Point Operations per second, not microsecond (#2711) · d4bfae41

Committed by Dashiell Stander on Jan 26, 2023

* Fix how autotuning reports TFLOPS so that they are reported in FLOPS per second, not millisecond
Co-authored-by: Nick Sarkauskas <nsarka00@gmail.com>
Co-authored-by: Quentin Anthony <anthony.301@osu.edu>
Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Actually it is microseconds -> seconds
Signed-off-by: Dashiell Stander <dstander@protonmail.com>

* Actually it is microseconds -> seconds
Signed-off-by: Dashiell Stander <dstander@protonmail.com>
Signed-off-by: Dashiell Stander <dstander@protonmail.com>
Co-authored-by: Nick Sarkauskas <nsarka00@gmail.com>
Co-authored-by: Quentin Anthony <anthony.301@osu.edu>

d4bfae41
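
The unit fix this commit describes (microseconds to seconds) comes down to a factor of one million. The sketch below is not the autotuner's code: the function names and the assumption that elapsed time is measured in microseconds are illustrative only.

```python
MICROSECONDS_PER_SECOND = 1_000_000


def flops_per_second(flop_count: float, elapsed_us: float) -> float:
    """Convert an operation count over a duration in microseconds
    into floating point operations per second.

    Hypothetical helper: assumes elapsed_us is measured in microseconds.
    """
    if elapsed_us <= 0:
        raise ValueError("elapsed_us must be positive")
    # Dividing by microseconds alone would give FLOPs per microsecond,
    # so scale by 1e6 to express the rate per second.
    return flop_count * MICROSECONDS_PER_SECOND / elapsed_us


def to_tflops(flops: float) -> float:
    """Express a FLOPS figure in units of 10**12 (TFLOPS)."""
    return flops / 1e12
```

Omitting the scaling step understates throughput by six orders of magnitude, which is the kind of discrepancy the commit title points at.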

26 Jan, 2023 2 commits

Abstract accelerator (step 3) (#2677) · 98cc35b6

Committed by Ma, Guokai on Jan 26, 2023

* Integrate accelerator abstraction interface into deepspeed/

* Fix error message in fp16/fused_optimizer

* fix error message in fp16/unfused_optimizer.py

* assign get_accelerator().pin_memory() result to input Tensor name

* no need to check cuda and whether nvtx supported

* move try-except into inner most block

* call Event() and Stream() in get_accelerator() for data type

* Make Stream and Event as properties of abstract interface so they can be used as data type in deepspeed

* Apply op_builder backend api change from #2705 from @jeffra

* fix tests where Builder NAME is used

* keep original ...Builder.NAME interface instead of ...Builder().NAME interface

* fix builder closure for installation

* fix randomltd builder

* add comments to clarify create_op_builder and get_op_builder

* fix compatibility with pip install -e
Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

98cc35b6

[GatheredParameters] fix memory leak (#2665) · ddd48b36

Committed by Stas Bekman on Jan 26, 2023

* [GatheredParameters] fix memory leak

* simplify

* cleanup and move

* style

* Formatting

* fix test

* fix test

* fix test take 2

* Trigger CI
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Joe Mayer <114769929+jomayeri@users.noreply.github.com>

ddd48b36

25 Jan, 2023 3 commits

fixing optimizer sanity check (#2742) · 4be8df72

Committed by Joe Mayer on Jan 25, 2023

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

4be8df72

Automatic tensor parallelism v2 (#2670) · d59b5729

Committed by Molly Smith on Jan 24, 2023

* loop through pipe.model

* tp_parser first draft

* client_module must be type object

* Simplify layernorm tracking. Add unittest.

* cleanup

* Add more models to unittest

* cleanup inference pytest for merging

* Add unittest

* cleanup

* pre-commit

* unittest id and pytest marker

* try marian for unittest

* precommit

* Move tp code to separate file

* Add new auto tp file

* pre-commit and type

* Update deepspeed/module_inject/auto_tp.py
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

* Update deepspeed/module_inject/auto_tp.py
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

* Update tests/unit/inference/test_inference.py
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

* remove unused fillmask function
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

d59b5729

Change zero_grad() argument to match pytorch (#2741) · 34a11688

Committed by loadams on Jan 24, 2023

34a11688

20 Jan, 2023 1 commit

Inference Refactor (replace_with_policy, model_implementations) (#2554) · 867da307

Committed by Ammar Ahmad Awan on Jan 19, 2023

Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

867da307

19 Jan, 2023 4 commits
- fix typo (#2718) · 8df50a26
  Committed by Michael Wyatt on Jan 18, 2023
  Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
  8df50a26
- BF16 optimizer for BF16+ZeRO Stage 1 (#2706) · 8d87c89e
  Committed by Joe Mayer on Jan 18, 2023
  * BF16 optimizer only with ZeRO stage 1.
  * Updating to grad accum of fp32 for BF16 ZeRO1 case.
  Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
  8d87c89e
- update for lm-eval==0.3.0 (#2713) · 23e5133c
  Committed by Michael Wyatt on Jan 18, 2023
  Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
  23e5133c
- [install] only add deepspeed pkg at install (#2714) · 0b549ad7
  Committed by Jeff Rasley on Jan 18, 2023
  Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
  0b549ad7
18 Jan, 2023 6 commits

remove master branch from CI triggers (#2712) · df2495ca

Committed by Michael Wyatt on Jan 17, 2023

df2495ca

CUDA optional deepspeed ops (#2507) · 3f210c97

Committed by Olatunji Ruwase on Jan 17, 2023

* CPU-Adam: add compile-flag to enable param-copy from CPU to GPU

* guard the CUDA-related include files and variables

* remove CUDA dependency from op_builder when building against CPU

* fixing the builder issues

* fix formatting

* return true when there is no mismatch on the cuda version

* guard for when cuda is not available & test with cpu-only environment

* Update cpu_adam and cpu_adagrad

* Format fixes

* Add configurable half precision type; Build/run in CUDA environment

* Run cpu_adam and cpu_adagrad in cpu only environment

* Mark CUDA only unit tests

* CPU environment CI

* Format fixes

* Remove --forked

* Add --forked

* CPU only CI should pass

* Format fixes

* Format fixes

* Remove scattered pytest.skip

* Fix cpu_adam unit test

* Update .github/workflows/nv-torch-latest-cpu.yml
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

* Update .github/workflows/nv-torch-latest-cpu.yml
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

* Address PR feedback

* OpenMP linking

* Fix unit tests
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

3f210c97

bump to 0.8.1 · 7d0e4270

Committed by Jeff Rasley on Jan 17, 2023

7d0e4270

ZeRO3 handling frozen weights (#2653) · bf6b9802

Committed by Olatunji Ruwase on Jan 17, 2023

bf6b9802

remove print side effect from importing deepspeed (#2704) · 35575bce

Committed by Jeff Rasley on Jan 17, 2023

35575bce

non-MoE stage 1 requires CG disabled (#2703) · e4ba7222

Committed by Jeff Rasley on Jan 17, 2023

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

e4ba7222

14 Jan, 2023 4 commits
- using correct loss scale in zero step (#2695) · fe728e3e
  Committed by Joe Mayer on Jan 14, 2023
  Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
  fe728e3e
- exclude benchmarks during install (#2698) · cd271a4a
  Committed by Jeff Rasley on Jan 13, 2023
  cd271a4a
- fix for latest diffusers (#2699) · c9c6ab9e
  Committed by Michael Wyatt on Jan 13, 2023
  Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
  c9c6ab9e
- [GatheredParameters] add support for any iterator (#2664) · 217cc07b
  Committed by Stas Bekman on Jan 13, 2023
  Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
  217cc07b
13 Jan, 2023 1 commit

Extend quantization utils features (#2683) · aef8a856

Committed by LOK CHAND KOPPAKA on Jan 12, 2023

* Extend quantization utils features

* remove unwanted files

* fix cache setting
Co-authored-by: Connor Holmes <connorholmes@microsoft.com>

aef8a856

12 Jan, 2023 2 commits
- Pass training flag to forward call from Eval (#2604) · e7c14026
  Committed by LOK CHAND KOPPAKA on Jan 11, 2023
  Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
  Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
  e7c14026
- fix import path to op_builder (#2687) · cf9e433f
  Committed by Masahiro Tanaka on Jan 11, 2023
  Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
  cf9e433f
11 Jan, 2023 2 commits

Add mlflow logging for aml (#2495) · a3d7f106

Committed by cassieesvelt on Jan 10, 2023

* add logging changes

* try w/out abspath

* undo last change

* start mlflow debug

* remove mlflow from export_envs

* add mlflow logging for reversed

* remove mlflow.start_run

* add back start run

* don't clean cmd

* print os environment variables

* remove first start run

* add run_id to mlflow star

* remove context managers

* move last end run

* add extra parent start_runs

* add run id logging

* add logging to run_ds_config

* change run_id to run_name

* add back context managers and run_id logs

* remove context mng

* debug environment variable

* reset environment variables

* add env variable deletion

* clean up

* remove unused import

* fix yapf/whitespace errors
Co-authored-by: Cheng Li <pistasable@gmail.com>

a3d7f106

remove duplicated code in ZeRO (#2655) · 89da037e

Committed by JackieWu on Jan 11, 2023

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

89da037e

10 Jan, 2023 2 commits
- real_accelerator validation check for both accelerator and deepspeed.accelerator path (#2685) · 62c071e0
  Committed by Ma, Guokai on Jan 10, 2023
  62c071e0
- [inference] ds-mlp refactor w.r.t. ops (#2668) · c702b64c
  Committed by Jeff Rasley on Jan 09, 2023
  c702b64c
09 Jan, 2023 4 commits

fix Tensor contiguous bug in model_compression (#2671) · be6d19f0

Committed by Xiaoxia (Shirley) Wu on Jan 09, 2023

double check the unit tests

be6d19f0

[fp16] lower initial_scale_power (#2663) · f30a0308

Committed by Stas Bekman on Jan 09, 2023

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

f30a0308

[Bug Fixed] use torch.cuda.is_available() (#2661) · 323c266c

Committed by JackieWu on Jan 09, 2023

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

323c266c

Remove unnecessary device synchronization for stage 2 (#2500) · 97deaaec

Committed by li-yi-dong on Jan 09, 2023

* Remove unnecessary device synchronization for stage 2

* Remove unnecessary device synchronization for stage 2
Co-authored-by: liyidong.lyd <liyidong.lyd@alibaba-inc.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Joe Mayer <114769929+jomayeri@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

97deaaec

Greenplum / DeepSpeed · last synced about 1 year ago
