- 12 Feb 2022, 4 commits

Committed by Jeff Rasley

Committed by Olatunji Ruwase
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Committed by Reza Yazdani
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

Committed by Cheng Li
* remove force=multi and fix None val check in base tuner
* fix format
* ignoring optimizer dict when generating combinations of tuning configs
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 11 Feb 2022, 3 commits

Committed by Reza Yazdani

Committed by Jeff Rasley

Committed by Du Li
- 10 Feb 2022, 1 commit

Committed by Jeff Rasley
- 09 Feb 2022, 1 commit

Committed by liamcli
- 08 Feb 2022, 1 commit

Committed by Olatunji Ruwase
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 05 Feb 2022, 1 commit

Committed by Cheng Li
* separate add and mul flops compute function
* fix format
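The commit above splits the flops profiler's add and mul counting apart. As an illustrative sketch only (not DeepSpeed's actual profiler code), separating the two operation counts for a plain matrix multiply might look like this: each of the m*n output elements of an (m x k) @ (k x n) product is a length-k dot product, i.e. k multiplications and k - 1 additions.

```python
def matmul_flops(m: int, k: int, n: int) -> tuple[int, int]:
    """Count multiply and add flops separately for an (m x k) @ (k x n) matmul.

    Each of the m * n output elements is a dot product of length k:
    k multiplications and k - 1 additions.
    """
    muls = m * n * k
    adds = m * n * (k - 1)
    return muls, adds
```

Summing the pair recovers the usual 2*m*n*k - m*n total-flops figure for a matmul.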
- 04 Feb 2022, 1 commit

Committed by Reza Yazdani
* use the right tensor-copy function when adding tensor-slicing
* small fix in inference tutorial
- 31 Jan 2022, 2 commits

Committed by Stas Bekman
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Committed by Stas Bekman
- 30 Jan 2022, 2 commits

Committed by Stas Bekman
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

Committed by Stas Bekman
- 28 Jan 2022, 1 commit

Committed by Jeff Rasley
- 27 Jan 2022, 1 commit

Committed by Jeff Rasley
- 26 Jan 2022, 1 commit

Committed by Sean Naren
* Add a very simple PyTorch Lightning test
* Run for just one epoch
* Swap to using the plugin API till the strategy API makes it to 1.6
- 23 Jan 2022, 1 commit

Committed by Alex Hedges
- 22 Jan 2022, 1 commit

Committed by Manuel R. Ciosici
- 21 Jan 2022, 2 commits

Committed by Olatunji Ruwase

Committed by Justin Chiu
* Changes for bfloat16 Zero2
* ZeRO stage3 optimizations, with some bug fixes

  optimizations for stage3:
  - prefetching improvements
  - batching allgather calls to amortize fixed overhead and improve bandwidth utilization
  - batching reduce_scatter calls to amortize fixed overhead and improve bandwidth utilization
  - using *_base variants of allgather and reduce scatter to reduce memory allocations and data movement
  - more fine grained synchronization for communication that allows blocking on less work
  - precomputation of fetching code - using a fetch queue rather than deciding what to (pre)fetch at each iteration
  - limiting queued coalesced communication ops to reduce memory pressure on pytorch cuda caching allocator (not elegant solution)

  optimizations for stage3-offload:
  - made some host-device tensor copies async to improve performance

  bug fixes and qol improvements:
  - fix init context method when parent modules modify child weights
  - speed up model initialization by moving model to GPU before weight initialization
  - fixed unit test imports so that unit tests can be run from any directory
  - change performance logging to include memory consumption
  - add logging w/ model size when done partitioning model

  new features:
  - bfloat16 support for ZeRO 3

* fix import in ut
* ran yapf
* improvements to cache flush warn log
* backwards compatibility with older versions of pytorch
* handle edge case where reduced tensor smaller than world size
* moved event synchronization to allgather handle wait() call
* removed unnecessary barrier call
* formatting fix after resolving merge conflict
* skip nvme prefetch when trace not complete
* opportunistically avoid memory allocation in allgather coalesced where possible
* fix indentation after merge
* fixes to account for parameter offload
* accounting for torch.cuda.memory_stats not being available
* moved partition_all_params to optimizer step
* allgathering on params before item gets called
* fix param status checks needed after moving partition_all_parameters call to optimizer step
* fix grad accumulation with optimizer offload
* grad norm computation fix for optimizer offload
* change post divide in reduce-scatter to pre divide
* fix gradient race condition w/ optimizer offload
* improve inf/nan gradient tracking
* don't prefetch when not in training mode
* format fix after merging
* fix prefetching issue when using NVME offload
* improved defragmentation for fp16 parameters
* relative imports for bf16 tests
* changes for bwd compatibility with pytorch 1.2
* remove buffered_reduce_fallback
* removed unused parameter offset bookkeeping
* fixed tracking for multiple param groups
* unbroke bfloat16 config after merge conflict
* using base allgather params when only 1 param
* cleanup/fixes for fp16 partition defragmentation
* switch to CRLF
* convert to same new-line style as master
* align new line with master
* Fix merge issues
* switch to CRLF
* fix to LF line endings
* minor merge fixes
* remove extra bfloat16_enabled definition
* asserting params inflight for AllGatherHandle
* remove get_cuda_mem_allocated_str
* Format fixes
* fix bfloat16 zero stage check (broken after merge commit)
* +self.communication_data_type, -self.allreduce_always_fp32; delete dead code
* Add self.reduce_scatter
* Format fix
* Fix merge issues
* iterate over params_to_fetch rather than make another iterator
* add some TODOs
* remove unnecessary division by micro_step_id
* rename config keys "bfloat16" -> "bf16"
* rename stage3_gather_fp16_weights_on_model_save -> stage3_gather_16bit_weights_on_model_save
* add unit test to check backwards compatibility for gather_16bit_weights
* added test to confirm bf16 key bwd compatibility
* Format fixes

Co-authored-by: Rana Ali Amjad <raamjad@amazon.com>
Co-authored-by: Justin Chiu <justchiu@amazon.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
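The stage-3 prefetching described above (a fetch queue recorded from the trace of first use, plus a cap on queued coalesced communication ops to limit allocator pressure) can be sketched in miniature. Everything below is illustrative pure Python, not DeepSpeed's actual implementation; `FetchQueue`, `prefetch`, and `complete` are hypothetical names.

```python
from collections import deque


class FetchQueue:
    """Toy sketch of a parameter fetch queue: the order in which partitioned
    parameters will be needed is recorded once (the "trace"), then prefetching
    pops from the front instead of re-deciding what to fetch every iteration."""

    def __init__(self, trace, max_inflight=2):
        self.queue = deque(trace)      # parameter ids in first-use order
        self.max_inflight = max_inflight
        self.inflight = []             # ids whose gather has been launched

    def prefetch(self):
        """Launch gathers for the next few parameters, bounding how many
        communication ops are queued at once (memory-pressure limit)."""
        launched = []
        while self.queue and len(self.inflight) < self.max_inflight:
            pid = self.queue.popleft()
            self.inflight.append(pid)  # stand-in for an async allgather
            launched.append(pid)
        return launched

    def complete(self, pid):
        """Mark a gather as finished, freeing a prefetch slot."""
        self.inflight.remove(pid)
```

The `max_inflight` bound plays the role of the commit's limit on queued coalesced ops: prefetching stalls until an earlier gather completes, trading a little overlap for bounded memory.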

- 20 Jan 2022, 4 commits

Committed by Jeff Rasley

Committed by Reza Yazdani

Committed by Jeff Rasley

Committed by Ammar Ahmad Awan
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
- 19 Jan 2022, 4 commits

Committed by Reza Yazdani
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Committed by Zhewei Yao
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Committed by Reza Yazdani
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Committed by Jeff Rasley
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Zhewei Yao <zheweiy@berkeley.edu>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
- 15 Jan 2022, 1 commit

Committed by Jeff Rasley
[ZeRO] Default disable elastic ckpt in stage 1+2 and reduce CPU memory overhead during ckpt load (#1525)
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
- 14 Jan 2022, 1 commit

Committed by Jeff Rasley
- 13 Jan 2022, 1 commit

Committed by liamcli
- 11 Jan 2022, 3 commits

Committed by Jeff Rasley

Committed by Jeff Rasley

Committed by Reza Yazdani
- 08 Jan 2022, 3 commits

Committed by Reza Yazdani
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Committed by Jeff Rasley

Committed by Jeff Rasley