Commits · 2ded2ff0be6ef5f1e592af7d2f7e05c1151a8d44 · Greenplum / DeepSpeed

30 Jun, 2023 1 commit
- checking process_group before merging bucket ranges (#3521) (#3577) · 2ded2ff0
  Authored by Alexander Jipa on Jun 30, 2023
```
Co-authored-by: Alexander Jipa <azzhipa@amazon.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
```
15 Jun, 2023 1 commit

remove UtilsBuilder load, use torch (un)flatten ops (#3728) · 5a5340d0

Authored by mzl on Jun 15, 2023

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

03 Jun, 2023 1 commit
- fix typo deepspeed/runtime (#3663) · 5d14afd2
  Authored by digger yu on Jun 03, 2023
```
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
```
12 May, 2023 1 commit
- fix spelling error with deepspeed/runtime/ (#3509) · 254663a2
  Authored by digger-yu on May 12, 2023
```
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
```
03 May, 2023 1 commit

Adagrad support in ZeRO (#3401) · d3550dc8

Authored by Joe Mayer on May 03, 2023

* Adding torch.optim.Adagrad

* adding adagrad for zero 1 2

* Adding Adagrad support to zero 3.

* Adding documentation and DeepSpeedCPUAdagrad to list.

---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

26 Apr, 2023 2 commits

Fix memory leak in zero2 contiguous gradients (#3306) · 01d17492

Authored by hablb on Apr 26, 2023

No usage of extra_large_param_to_reduce if contiguous_gradients is False.
It keeps reference of the param for the lifetime of the application.
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>

stage_1_and_2.py: do gradient scale only for fp16 (#3166) · 0e357666
Authored by 郭叶军 on Apr 26, 2023
```
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
```

31 Mar, 2023 1 commit
- Update DeepSpeed copyright license to Apache 2.0 (#3111) · b361c727
  Authored by Michael Wyatt on Mar 30, 2023
```
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
```
30 Mar, 2023 1 commit
- Make fp32 default communication data type (#2970) · 261d6370
  Authored by Olatunji Ruwase on Mar 30, 2023
```
* Make fp32 default communication data type

* Fix asserts
```
29 Mar, 2023 1 commit

Disable Stage 1&2 CPUAdam pathways (#3097) · 4b6d7c15

Authored by Michael Wyatt on Mar 28, 2023

* disable CPUAdam pathways in optimizer copy/step

* Update stage_1_and_2.py

---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

27 Mar, 2023 1 commit
- update formatter version and style settings (#3098) · 91d63e02
  Authored by Jeff Rasley on Mar 27, 2023
15 Mar, 2023 1 commit

Improve loss overflow logs (#3008) · ac2c9ffa

Authored by Quentin Anthony on Mar 15, 2023

* Improve overflow logs

* Trigger CI

---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

07 Mar, 2023 1 commit
- Improve overflow handling (#2944) · 80d8fcbd
  Authored by Olatunji Ruwase on Mar 06, 2023
```
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
```
28 Feb, 2023 1 commit

Enable tensor fragments for zero 2 & 3 (#2727) · 541e423a

Authored by Olatunji Ruwase on Feb 27, 2023

* Enable tensor fragments for zero 2

* Update deepspeed/utils/tensor_fragment.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update deepspeed/utils/tensor_fragment.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Support offload

* Support multi-gpu

* Cleanup

* WIP

* Update deepspeed/runtime/zero/stage3.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Support padding

* Update deepspeed/runtime/zero/stage3.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* z3 optimizer state support; aligned api

* Support frozen z3 params

* Unit tests

* Check NVMe offload capability

* Formatting

* Docs

* More docs

* More docs

* Update docs/code-docs/source/zero3.rst
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* More docs

* Update docs/code-docs/source/zero3.rst
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* More docs

* More docs

* Update docs/code-docs/source/zero3.rst
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update deepspeed/utils/tensor_fragment.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* More docs

* Support unsharded fp32 grad

* Remove debug prints

* Fix off-by-one detection of empty grads

* Update deepspeed/utils/tensor_fragment.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update deepspeed/utils/tensor_fragment.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update deepspeed/utils/tensor_fragment.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Update deepspeed/runtime/zero/stage3.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Fix off-by-one error

* Skip ranks with no gradient data

* Formatting

* Add license

* Fix license

---------
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

24 Feb, 2023 1 commit

Remove deprecated `torch._six` imports (#2863) · d3de7375

Authored by Yasyf Mohamedali on Feb 23, 2023

* Remove deprecated `torch._six` imports

Closes #2845.

* Support older versions of PyTorch as well.

---------
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

26 Jan, 2023 1 commit

Abstract accelerator (step 3) (#2677) · 98cc35b6

Authored by Ma, Guokai on Jan 26, 2023

* Integrate accelerator abstraction interface into deepspeed/

* Fix error message in fp16/fused_optimizer

* fix error message in fp16/unfused_optimizer.py

* assign get_accelerator().pin_memory() result to input Tensor name

* no need to check cuda and whether nvtx supported

* move try-except into inner most block

* call Event() and Stream() in get_accelerator() for data type

* Make Stream and Event as properties of abstract interface so they can be used as data type in deepspeed

* Apply op_builder backend api change from #2705 from @jeffra

* fix tests where Builder NAME is used

* keep original ...Builder.NAME interface instead of ...Builder().NAME interface

* fix builder closure for installation

* fix randomltd builder

* add comments to clarify create_op_builder and get_op_builder

* fix compatibility with pip install -e
Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

25 Jan, 2023 1 commit
- Change zero_grad() argument to match pytorch (#2741) · 34a11688
  Authored by loadams on Jan 24, 2023
14 Jan, 2023 1 commit
- using correct loss scale in zero step (#2695) · fe728e3e
  Authored by Joe Mayer on Jan 14, 2023
```
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
```
11 Jan, 2023 1 commit
- remove duplicated code in ZeRO (#2655) · 89da037e
  Authored by JackieWu on Jan 11, 2023
```
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
```
09 Jan, 2023 2 commits

[Bug Fixed] use torch.cuda.is_available() (#2661) · 323c266c
Authored by JackieWu on Jan 09, 2023
```
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
```

Remove unnecessary device synchronization for stage 2 (#2500) · 97deaaec

Authored by li-yi-dong on Jan 09, 2023

* Remove unnecessary device synchronization for stage 2

* Remove unnecessary device synchronization for stage 2
Co-authored-by: liyidong.lyd <liyidong.lyd@alibaba-inc.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Joe Mayer <114769929+jomayeri@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

17 Dec, 2022 1 commit
- call empty_cache to really free up GPU memory as described in comment (#2620) · d0dbc95a
  Authored by 郭叶军 on Dec 17, 2022
```
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
```
10 Nov, 2022 1 commit
- stage_1_and_2.py: no allreduce needed when mp size is 1 (#2494) · 3ca9878d
  Authored by 郭叶军 on Nov 10, 2022
18 Oct, 2022 1 commit

Universal checkpoint for zero stage 1 (#2284) · 799120e7

Authored by Olatunji Ruwase on Oct 18, 2022

* Refactor universal checkpointing and tensor fragments

* Formatting

* Support zero stage1; Expand TP dim

* Remove debug prints

* Detect sharded optimizer state

* Format fixes

* Encode reshaping guide

* More symbolic constants
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

04 Aug, 2022 1 commit

Match compute and reduce dtype (#2145) · e419f7cb

Authored by Olatunji Ruwase on Aug 04, 2022

* Match compute and reduce dtype

* Unit tests
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

27 Jul, 2022 1 commit

Refactor ZeRO configs to use Pydantic (#2004) · 59975896

Authored by Michael Wyatt on Jul 27, 2022

* first pass at pydanticifying Zero Configs

* added pydantic to reqs

* fixed bug with deprecated values not being type-checked

* fixing zero config bugs from unit tests

* fixed access of Config values

* removing zero constants

* formatting/fix broken import

* fixed bad merge

* fixed issue with missing aliased field

* fix for failing tests

* fix how deprecated fields are processed

* only process dep params when they are set

* fix mistyped field name

* fixes, docs, removed more constants

* fix merge

* more fixes after merge w master

* added unit tests

* formatting

* added fix for transformers unit tests

* separated offload config from zero config

* fixed bad import

* formatting and flake fixes

* implement suggestion from review
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

21 Jul, 2022 1 commit

Checkpoint reshaping (#1953) · 80d0a32f

Authored by Olatunji Ruwase on Jul 20, 2022

* unit test, remove exception, add notes

* Move param_shapes to model files

* Remove hard-coded constants

* Conditioned to zero optimizer

* Add zero checkpoint merging

* Print checkpoint version

* Reshape zero_* ckpt files

* Merge zero* files contraction

* Utils for 3D contraction reshaping

* Remove bogus import

* Support bf16_zero ckpts

* Add param slice mappings

* Load universal checkpoints

* Per group mappings from Stas

* Hack to load bf16 zero files

* Param attributes

* WIP

* Fix api bug

* Update lp with local/remote hp

* Disable vocab padding handling

* Update z2 checkpoint

* Remove debug prints

* Remove debug prints; Rebase unit test

* Add reshape assert

* Padding

* Typo

* Catch nonexistent checkpoint path

* Cleanup

* Restore checkpoint state comparisons

* Add torch version guards

* More precise avoidance of false positives.
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

14 Jul, 2022 1 commit

Improving memory utilization of Z2+MoE (#2079) · c1af73f7

Authored by Siddharth Singh on Jul 13, 2022

* Shards expert parameter groups
* Do upscaling, optimizer and deletion of fp32 grads one-by-one on each parameter group in zero-2
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

08 Jul, 2022 1 commit
- Fix partition id in the fp32->fp16 param copying step for z2+cpu-offload (#2059) · b3388e14
  Authored by Siddharth Singh on Jul 07, 2022
```
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
```
07 Jul, 2022 1 commit

Comments for better understanding of zero stage1_2 (#2027) · 9305916d

Authored by kisseternity on Jul 07, 2022

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

28 Jun, 2022 1 commit
- correct partition_id in fp32 param -> fp16 param for MoE+z2 (#2058) · 38a00bee
  Authored by Siddharth Singh on Jun 27, 2022
21 Jun, 2022 1 commit
- fix import errors (#2026) · 735406e5
  Authored by Karim Foda on Jun 20, 2022
```
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
```
14 Jun, 2022 1 commit
- Relax assertion to allow Megatron-DeepSpeed MoE to use ZeRO-1 (#2007) · 25b2fc29
  Authored by Quentin Anthony on Jun 13, 2022
```
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
```
11 Jun, 2022 1 commit

DeepSpeed comm backend v1 (#1985) · 36ad3119

Authored by Ammar Ahmad Awan on Jun 10, 2022

Co-authored-by: Quentin Anthony <qganthony@yahoo.com>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

16 May, 2022 1 commit
- trivial fix (#1954) · 5053217e
  Authored by kisseternity on May 16, 2022
```
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
```
12 May, 2022 1 commit
- Fairseq support (#1915) · 50893458
  Authored by Jeff Rasley on May 11, 2022
20 Apr, 2022 1 commit

bf16+pipeline parallelism (#1801) · 56c52238

Authored by Olatunji Ruwase on Apr 19, 2022

* bf16 updates

* Got bf16 working

* fp32 reduction; flattened tensors

* bf16+zero_stage_1 first cut

* finish zero_stage 1 sharding

* Matching fp16 with debugging codes

* Matching loss with fp16

* Fix gradient clipping

* bf16 gradient clipping fix
bf16 checkpoint save/load

* Unscale grad norm

* Fix grad norm scaling

* Enable loading fp16_zero_1 into bf16_zero_1 engine and vice versa

* Fix clip_grad key error

* Reduce tied weight gradients

* Fix grad norm for moe

* Reduce specified gradients

* Use O(n) instead of O(n^2)

* Remove optimizer restriction for bf16

* Link bf16 & fp32 params

* Clip gradients of last stage tied weights

* Simplify tied weights reduction logic

* Also clip all tp rank parameters

* lp to hp mapping

* Link lp/hp/optim state; Refresh links after checkpoint load

* Remove debug print

* Remove debug print

* Simplify zero_grad logic

* fp32 accessors

* Fix update bug
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

23 Mar, 2022 1 commit
- deepscale --> deepspeed in prints. (#1854) · 788e1c40
  Authored by Ammar Ahmad Awan on Mar 22, 2022
18 Mar, 2022 1 commit
- Track only trainable parameters (#1780) · b84edef2
  Authored by Olatunji Ruwase on Mar 18, 2022
```
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
```
17 Mar, 2022 1 commit
- [ZeRO-1] fix bug w. cpu-offload + > 1 GPU (#1841) · 28434c00
  Authored by Jeff Rasley on Mar 16, 2022

Greenplum / DeepSpeed: last synced about 1 year ago