Commits · aef6c65ce39d191ca31618b2a995599942574fd9 · Greenplum / DeepSpeed

Jul 12, 2023 · 1 commit

Reduce Unit Test Times (Part 3) (#3850) · aef6c65c

Committed by Michael Wyatt on Jul 11, 2023

* add coverage report

* define env vars in shared action

* reduce time for longest running tests

* fix broken shared action

* reduce test time

* reducing Pipeline test times

* further reducing test times

* rework Z3 test

* testing new mp.pool and persistent dist envs

* fix import

* reuse distributed environment for tests with lots of param combos

* fix for dist teardown

* fix pickling issue with pool cache

* actually fix pickling problem

* avoid running pool cache stuff on non-distributed tests

* fix issues with nested mp.pool

* fix for nested pools in Pipeline Engine

* re-add params

* update workflows with pytest opts

* implement feedback

* resolve race condition with port selection

* Update tests/unit/common.py

---------
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

Jul 7, 2023 · 1 commit

[ROCm] Enable TestCUDABackward::test_backward unit tests (#3849) · d24629f4

Committed by Ramya Ramineni on Jul 6, 2023

* Workaround to pass unit/ops/accelerators/test_accelerator_backward.py unit tests on ROCm

* Rearranged is_rocm_pytorch()

* Introduced is_rocm_pytorch() for ROCm

* Fixed formatting errors

* Function call

* formatting fix

---------
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Logan Adams <loadams@microsoft.com>

Mar 31, 2023 · 1 commit

Update DeepSpeed copyright license to Apache 2.0 (#3111) · b361c727

Committed by Michael Wyatt on Mar 30, 2023

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Mar 27, 2023 · 1 commit

update formatter version and style settings (#3098) · 91d63e02

Committed by Jeff Rasley on Mar 27, 2023

Mar 8, 2023 · 1 commit

[RFC] add device abstraction to allow other device than CUDA be used (#2221) · 0acf7e9c

Committed by Ma, Guokai on Mar 8, 2023

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Feb 28, 2023 · 1 commit

add missing license info to top of all source code (#2889) · da84e60d

Committed by Jeff Rasley on Feb 27, 2023

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

Jan 18, 2023 · 1 commit

CUDA optional deepspeed ops (#2507) · 3f210c97

Committed by Olatunji Ruwase on Jan 17, 2023

* CPU-Adam: add compile-flag to enable param-copy from CPU to GPU

* guard the CUDA-related include files and variables

* remove CUDA dependency from op_builder when building against CPU

* fixing the builder issues

* fix formatting

* return true when there is no mismatch on the cuda version

* guard for when cuda is not available & test with cpu-only environment

* Update cpu_adam and cpu_adagrad

* Format fixes

* Add configurable half precision type; Build/run in CUDA environment

* Run cpu_adam and cpu_adagrad in cpu only environment

* Mark CUDA only unit tests

* CPU environment CI

* Format fixes

* Remove --forked

* Add --forked

* CPU only CI should pass

* Format fixes

* Format fixes

* Remove scattered pytest.skip

* Fix cpu_adam unit test

* Update .github/workflows/nv-torch-latest-cpu.yml
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

* Update .github/workflows/nv-torch-latest-cpu.yml
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

* Address PR feedback

* OpenMP linking

* Fix unit tests
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

Oct 5, 2022 · 1 commit

Refactor remaining distributed tests (#2216) · ff427438

Committed by Michael Wyatt on Oct 4, 2022

* batch of refactored tests

* more test refactoring

* fp16 test refactor

* more refactors

* added DistributedFixture class

* applied DistributedFixture to first batch of tests as a trial

* added DistributedFixture test and documentation

* last tests

* fixes for refactored tests

* remove subdirs in workflow files

* fix pytest syntax error

* fix another syntax error

* update imports

* use DistFixture with elastic checkpoint test

* missing import

* update to shared class tmpdir for elastic test

* moved test files

* avoid duplicate test file name

* last refactor and moving test files

* formatting

* fix broken import

* testing forked AMD tests

* update abstract method

* use blob storage for accelerate and transformers tests

* upgrade torch for accelerate CI
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

Jul 26, 2022 · 1 commit

Add flake8 to pre-commit checks (#2051) · 316c4a43

Committed by Alex Hedges on Jul 25, 2022

Jan 21, 2022 · 1 commit

Various ZeRO Stage3 Optimizations + Improvements (including bfloat16 support) (#1453) · 4912e0ad

Committed by Justin Chiu on Jan 20, 2022

* Changes for bfloat16 Zero2

* ZeRO stage3 optimizations, with some bug fixes

optimizations for stage3:
- prefetching improvements
- batching allgather calls to amortize fixed overhead and improve
  bandwidth utilization
- batching reduce_scatter calls to amortize fixed overhead and
  improve bandwidth utilization
- using *_base variants of allgather and reduce scatter to reduce memory
  allocations and data movement
- more fine grained synchronization for communication that allows
  blocking on less work
- precomputation of fetching code - using a fetch queue rather than
  deciding what to (pre)fetch at each iteration
- limiting queued coalesced communication ops to reduce memory pressure
  on pytorch cuda caching allocator (not elegant solution)

optimizations for stage3-offload:
- made some host-device tensor copies async to improve performance

bug fixes and qol improvements:
- fix init context method when parent modules modify child weights
- speed up model initialization by moving model to GPU before weight
  initialization
- fixed unit test imports so that unit tests can be run from any
  directory
- change performance logging to include memory consumption
- add logging w/ model size when done partitioning model

new features
- bfloat16 support for ZeRO 3

* fix import in ut

* ran yapf

* improvements to cache flush warn log

* backwards compatibility with older versions of pytorch

* handle edge case where reduced tensor smaller than world size

* moved event synchronization to allgather handle wait() call

* removed unnecessary barrier call

* formatting fix after resolving merge conflict

* skip nvme prefetch when trace not complete

* opportunistically avoid memory allocation in allgather coalesced where possible

* fix indentation after merge

* fixes to account for parameter offload

* accounting for torch.cuda.memory_stats not being available

* moved partition_all_params to optimizer step

* allgathering on params before item gets called

* fix param status checks

needed after moving partition_all_parameters call to optimizer step

* fix grad accumulation with optimizer offload

* grad norm computation fix for optimizer offload

* change post divide in reduce-scatter to pre divide

* fix gradient race condition w/ optimizer offload

* improve inf/nan gradient tracking

* don't prefetch when not in training mode

* format fix after merging

* fix prefetching issue when using NVME offload

* improved defragmentation for fp16 parameters

* relative imports for bf16 tests

* changes for bwd compatibility with pytorch 1.2

* remove buffered_reduce_fallback

* removed unused parameter offset bookkeeping

* fixed tracking for multiple param groups

* unbroke bfloat16 config after merge conflict

* using base allgather params when only 1 param

* cleanup/fixes for fp16 partition defragmentation

* switch to CRLF

* convert to same new-line style as master

* align new line with master

* Fix merge issues

* switch to CRLF

* fix to LF line endings

* minor merge fixes

* remove extra bfloat16_enabled definition

* asserting params inflight for AllGatherHandle

* remove get_cuda_mem_allocated_str

* Format fixes

* fix bfloat16 zero stage check (broken after merge commit)

* +self.communication_data_type, -self.allreduce_always_fp32; delete dead code

* Add self.reduce_scatter

* Format fix

* Fix merge issues

* iterate over params_to_fetch rather than make another iterator

* add some TODOs

* remove unnecessary division by micro_step_id

* rename config keys "bfloat16" -> "bf16"

* rename stage3_gather_fp16_weights_on_model_save -> stage3_gather_16bit_weights_on_model_save

* add unit test to check backwards compatibility for gather_16bit_weights

* added test to confirm bf16 key bwd compatibility

* Format fixes
Co-authored-by: Rana Ali Amjad <raamjad@amazon.com>
Co-authored-by: Justin Chiu <justchiu@amazon.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Dec 2, 2021 · 1 commit

Transformer kernel/fix layer norm (#1587) · 8e891aa5

Committed by Reza Yazdani on Dec 1, 2021

* fixing the softmax masking when using triangular masking

* fix a bug in the layernorm backward kernels

* revert back some changes & remove debug code

* change the constants to a macro
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

Nov 1, 2021 · 1 commit

Transformer kernel - fix unit test (#1503) · 2c5bba6d

Committed by Reza Yazdani on Oct 31, 2021

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Oct 2, 2021 · 1 commit

Fix many typos (#1423) · be789b16

Committed by Alex Hedges on Oct 1, 2021

* Fix typos in docs/

* Fix typos in code comments and output strings

* Fix typos in the code itself

* Fix typos in tests/
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

May 24, 2021 · 1 commit

Quantization + inference release (#1091) · ed3de0c2

Committed by Reza Yazdani on May 24, 2021

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Elton Zheng <eltonz@microsoft.com>
Co-authored-by: Arash Ashari <arashari@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: niumanar <60243342+niumanar@users.noreply.github.com>

Apr 8, 2021 · 1 commit

Supporting different hidden dimensions for transformer kernels-v2 (#934) · e721cb69

Committed by Reza Yazdani on Apr 7, 2021

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Mar 9, 2021 · 1 commit

ZeRO 3 Offload (#834) · 599258f9

Committed by Samyam Rajbhandari on Mar 8, 2021

* Squash stage3 v1 (#146)
Co-authored-by: Samyam <samyamr@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>

* Fix correctness bug (#147)

* formatting fix (#150)

* stage3 bugfix (API) update and simplified FP16 Z3 tests (#151)

* fp16 Z3 API update and bugfix

* revert debug change

* ZeRO-3 detach and race condition bugfixes (#149)

* trying out ZeRO-3 race condition fix

* CUDA sync instead of stream

* reduction stream sync

* remove commented code

* Fix optimizer state_dict KeyError (#148)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* fix for smaller SGS sizes, ensures each grad is backed by unique tensors (#152)

* Simplifying the logic for getting averaged gradients (#153)

* skip for now

* Z3 Docs redux (#154)

* removing some TODOs and commented code (#155)

* New Z3 defaults (#156)
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

* formatting

* megatron external params
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>

Jan 30, 2021 · 1 commit

Dist testing backend fixes, etc. (#708) · 2e2dd861

Committed by Jeff Rasley on Jan 29, 2021

Jan 28, 2021 · 1 commit

[transformer-kernel] turn off unit test printing (#701) · 91b1b7f3

Committed by Jeff Rasley on Jan 27, 2021

Jan 7, 2021 · 1 commit

Module replacement support (#586) · 44bd538b

Committed by Jeff Rasley on Jan 6, 2021

Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

Dec 18, 2020 · 1 commit

Transformer-kernel - supporting any arbitrary sequence-length (#587) · fd2f970b

Committed by Reza Yazdani on Dec 17, 2020

Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Dec 2, 2020 · 1 commit

supporting different hidden dimensions (#559) · c78c29f9

Committed by Reza Yazdani on Dec 1, 2020

* supporting different hidden dimensions

* add support for larger hidden dimensions (greater than 8K)

* remove empty line

* add loop unrolling factor for dropout kernels

* update different kernels based on the reviews
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Nov 13, 2020 · 1 commit

DeepSpeed JIT op + PyPI support (#496) · 31f46fee

Committed by Jeff Rasley on Nov 12, 2020

Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>

Sep 22, 2020 · 1 commit

support dynamic sequence length in transformer kernels (#424) · f0f2a702

Committed by RezaYazdaniAminabadi on Sep 21, 2020

Co-authored-by: Conglong Li <conglong.li@gmail.com>

Sep 21, 2020 · 1 commit

Add configurable intermediate size to transformer kernels (#423) · a148bd33

Committed by RezaYazdaniAminabadi on Sep 21, 2020

Sep 15, 2020 · 1 commit

pytest skips for tests requiring certain ops (#411) · 91b4a93d

Committed by Jeff Rasley on Sep 15, 2020

* add pytest skips around tests that require certain ops to be installed

Sep 12, 2020 · 1 commit

Revert "supporting different intermediate sizes other than 4 * hidden_dim (#389)" (#404) · 4ac9bf60

Committed by Jeff Rasley on Sep 11, 2020

This reverts commit e549be60.

Sep 11, 2020 · 1 commit

supporting different intermediate sizes other than 4 * hidden_dim (#389) · e549be60

Committed by RezaYazdaniAminabadi on Sep 11, 2020

* supporting different intermediate sizes other than 4*hidden_dim

* run precommit

* uncomment the unit tests
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

May 30, 2020 · 2 commits

update tests · bbd8cd7d

Committed by Jeff Rasley on May 29, 2020

Transformer kernel release (#242) · 734d8991

Committed by Jeff Rasley on May 29, 2020

* Transformer kernels release
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: Elton Zheng <eltonz@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: RezaYazdaniAminabadi <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Tunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Greenplum / DeepSpeed · Last synced about 1 year ago