Commits · 6fff50f52b85dda0686bbe365fcc27be986de4a5 · Greenplum / DeepSpeed

21 Dec, 2022 1 commit

Unit tests setup own venv (#2628) · 6fff50f5

Authored by Michael Wyatt on Dec 20, 2022

add reusable workflow that sets up fresh venv for each test and prints relevant environment info

6fff50f5

20 Dec, 2022 2 commits
- [launcher] parse hostfile via regex and added error checks (#2626) · 8c56c25d
  Authored by Jeff Rasley on Dec 19, 2022
  
  8c56c25d
- [inference] check for unsupported model generate args (#2627) · 5676f5ec
  Authored by Jeff Rasley on Dec 19, 2022
  
  5676f5ec
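
As an illustration of the hostfile change in #2626 listed above, here is a minimal sketch of regex-based hostfile parsing with error checks. It assumes the documented `<hostname> slots=<count>` line format; the function name, the regex, and the specific checks are hypothetical and are not taken from the commit itself.

```python
import re
from collections import OrderedDict

# Assumed line format: "<hostname> slots=<count>", e.g. "worker-1 slots=4".
HOSTFILE_LINE = re.compile(r"^(\S+)\s+slots=(\d+)$")


def parse_hostfile_lines(lines):
    """Return an ordered mapping of hostname -> slot count (hypothetical helper)."""
    resources = OrderedDict()
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = HOSTFILE_LINE.match(line)
        if match is None:
            raise ValueError(f"Malformed hostfile line: {line!r}")
        host, slots = match.group(1), int(match.group(2))
        if host in resources:
            raise ValueError(f"Duplicate host in hostfile: {host!r}")
        resources[host] = slots
    if not resources:
        raise ValueError("Hostfile contains no hosts")
    return resources
```

Rejecting malformed and duplicate lines up front keeps a typo in the hostfile from silently shrinking the set of launched workers.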
18 Dec, 2022 1 commit

Add Megatron CI workflow (#2614) · df985fac

Authored by Michael Wyatt on Dec 17, 2022

* added megatron unit test

* Update nv-megatron.yml
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

df985fac

17 Dec, 2022 4 commits

Update AVX512 Detection (#2621) · a25c31b6

Authored by Connor Holmes on Dec 17, 2022

* Update cpuinfo AVX512 detection

* Missing conversion from `_mm256` to `_mm256i`
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

a25c31b6

fixes #2498 (#2603) · 0f0e38c5

Authored by Alexander Jipa on Dec 16, 2022

taking gradient accumulation steps into account for throughput calculation
Co-authored-by: Alexander Jipa <azzhipa@amazon.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

0f0e38c5
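
One plausible reading of the throughput fix in #2603 above is that samples processed per optimizer step must include the gradient accumulation factor. The sketch below shows that arithmetic only; the function and parameter names are hypothetical and do not reflect DeepSpeed's actual timer code.

```python
def samples_per_second(micro_batch_size, grad_accum_steps, world_size, step_seconds):
    """Throughput across one optimizer step (hypothetical helper).

    Each optimizer step consumes `grad_accum_steps` micro-batches on every rank,
    so omitting that factor would under-report throughput.
    """
    if step_seconds <= 0:
        raise ValueError("step_seconds must be positive")
    samples_per_step = micro_batch_size * grad_accum_steps * world_size
    return samples_per_step / step_seconds
```

For example, a micro-batch of 4 with 8 accumulation steps on 2 ranks processes 64 samples per optimizer step, not 8.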

Remove GatheredParameters context from replace_with_policy (#2591) · 503706ac

Authored by Lev Kurilenko on Dec 16, 2022

This PR removes the zero-inference GatheredParameters context from replace_with_policy, because zero-inference is no longer needed after the introduction of meta tensor support for BLOOM.

503706ac

call empty_cache to really free up GPU memory as described in comment (#2620) · d0dbc95a
Authored by 郭叶军 on Dec 17, 2022
```
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
```
d0dbc95a

16 Dec, 2022 3 commits
- skip torch.zeros and tensor.copy_ when model parallel is not used (#2479) · b08cf416
  Authored by 郭叶军 on Dec 16, 2022
```
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
```
  b08cf416
- Use rocm/pytorch:latest (#2613) · 7e2103f8
  Authored by Jithun Nair on Dec 15, 2022
  
  7e2103f8
- add fix for older pydantic versions (#2611) · 2e596e67
  Authored by Michael Wyatt on Dec 15, 2022
  
  2e596e67
14 Dec, 2022 3 commits

[deepspeed/autotuner] Bug fix for binary search for batch size (#2162) · 384f17b0
Authored by Rahil Bathwal on Dec 13, 2022
```
* bug fix for binary search for batch size

* fix binary search termination condition
```
384f17b0
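
For context on the autotuner fix in #2162 above, the following is a generic sketch of a binary search for the largest feasible batch size, with a termination condition that cannot loop forever. It is not the autotuner's code: the function name and the `fits` callback are hypothetical, and the search assumes feasibility is monotone (if a batch size fits, every smaller one fits too).

```python
def max_feasible_batch_size(low, high, fits):
    """Largest b in [low, high] with fits(b) true, or None if none fits.

    Assumes `fits` is monotone. The loop ends once low > high, and the
    interval shrinks on every iteration because mid is always excluded.
    """
    best = None
    while low <= high:
        mid = (low + high) // 2
        if fits(mid):
            best = mid      # mid works; look for something larger
            low = mid + 1
        else:
            high = mid - 1  # mid fails; look for something smaller
    return best
```

Moving the bounds to `mid + 1` and `mid - 1`, rather than to `mid`, is what guarantees termination.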

Move layer norm to new schedule (#2590) · 3a3dfe66

Authored by lokoppakmsft on Dec 13, 2022

* Move layer norm to new schedule

* Pre-commit fixes

* fix comments

* format fixes

* Merge unrolls

* format fixes

* camelCase

* format fixes

* revert unwanted file

* move pow2 function

* format fixes
Co-authored-by: Connor Holmes <connorholmes@microsoft.com>

3a3dfe66

Migrate ops tests to new inference_ops marker (#2599) · 7425a365

Authored by Connor Holmes on Dec 13, 2022

* Migrate ops tests to new inference_ops marker

* Disable by default

* Add missing test cases

* Reorder such that inference_ops will run[fail] first

7425a365

13 Dec, 2022 3 commits
- fix blog link (#2600) · acde873c
  Authored by Conglong Li on Dec 12, 2022
  
  acde873c
- DeepSpeed Data Efficiency Library (#2585) · ef869377
  Authored by Conglong Li on Dec 12, 2022
```
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
```
  ef869377
- bump to 0.7.8 · 2600db54
  Authored by Jeff Rasley on Dec 12, 2022
  
  2600db54
10 Dec, 2022 2 commits
- get mask token from tokenizer (#2592) · 2076bf23
  Authored by Jeff Rasley on Dec 09, 2022
  
  2076bf23
- Fix issues w. python 3.6 + add py-version checks to CI (#2589) · 35eabb0a
  Authored by Jeff Rasley on Dec 09, 2022
  
  35eabb0a
09 Dec, 2022 3 commits
- Updating API docs (#2586) · 18713c68
  Authored by Joe Mayer on Dec 08, 2022
```
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
```
  18713c68
- Updating docs README (#2587) · 377c770a
  Authored by Joe Mayer on Dec 08, 2022
```
* Updating docs README with API update procedure.

* Addressing comments.
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
```
  377c770a
- Add checkpoint sharding unit tests (#2561) · ccb8eb81
  Authored by Michael Wyatt on Dec 08, 2022
```
* added checkpoint sharding tests
```
  ccb8eb81
08 Dec, 2022 3 commits

Support N-dimension input in quantization kernel (#2575) · 591744eb

Authored by lokoppakmsft on Dec 07, 2022

* Add support for inputs > 2D

* use vec

* Add N-Dim support to Dequant kernel

* merge master and fix format

* Bug Fix

* fix num_bits

* Fix dequant
Co-authored-by: Connor Holmes <connorholmes@microsoft.com>

591744eb

Update barrier and reduce_scatter_base to conform to PyTorch signatures (#2570) · 18d55e54
Authored by Quentin Anthony on Dec 07, 2022
```
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
```
18d55e54

Fix MegatronLayerPolicy to have megatron_v2=True (#2579) · 731965db

Authored by Lev Kurilenko on Dec 07, 2022

This PR updates the MegatronLayerPolicy to set megatron_v2=True, which is required in order to properly transpose in the replace_with_policy() function.

After the change in this PR, in conjunction with PR #99 in the Megatron-DeepSpeed fork, the Megatron text-generation example works with DS inference.

731965db

07 Dec, 2022 2 commits

Fix quantized-inference & Add generic support of checkpoint loading (#2547) · 35b350b2

Authored by Reza Yazdani on Dec 06, 2022

* fix checkpoint loading when it is a dictionary

* fix some issues with saving ckpt & int8 inference

* fix quantized-inference & add generic support of checkpoint loading

* remove int8 hard-coded flag

* fix mlp return tensors

* fix several issues when loading checkpoints of GPT-J, GPT-NEOX, and OPT with different TP-size

* add more comments & description for checkpoint-loading module
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

35b350b2

Drop Maxwell Support (#2574) · b8416282
Authored by Connor Holmes on Dec 06, 2022
```
* Officially drop Maxwell support

* Formatting

* Comparison mismatch fix
```
b8416282

06 Dec, 2022 2 commits

Support fp32 gradaccum for bf16 model (#2566) · 06938835

Authored by Ma, Guokai on Dec 06, 2022

* allow bf16 model with fp32 gradient accumulation datatype

* allow fp32 gradient accumulation and bfloat16 model in amp mode

* alternative fix for grad accumulation type mismatch.  In the case of zero optimizer we should have grad accum type == model data type
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

06938835

Add Determined to open-source DL frameworks (#2573) · 2d8f3f56
Authored by Hayden on Dec 05, 2022

2d8f3f56

03 Dec, 2022 1 commit
- use get_global_rank if available (#2567) · d4cab2ce
  Authored by Jeff Rasley on Dec 02, 2022
  
  d4cab2ce
02 Dec, 2022 2 commits
- docs: Update the recent url for Megatron-LM (#2564) · bbe030c5
  Authored by Jeongseok Kang on Dec 02, 2022
  
  bbe030c5
- bump to 0.7.7 · c77d42dc
  Authored by Jeff Rasley on Dec 01, 2022
  
  c77d42dc
01 Dec, 2022 1 commit
- Fix invalid check of recorded parameter orders in zero stage3. (#2550) · aeda7f9f
  Authored by AGUL on Dec 01, 2022
```
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
```
  aeda7f9f
30 Nov, 2022 3 commits

Abstract accelerator (step 1) (#2504) · ffcf3846

Authored by Ma, Guokai on Nov 30, 2022

* Establish building block of abstract accelerator

* Change .*Tensor variable to @property

* [op builder] add op builder reflection to allow enumerate of builders in all_ops.py and builder_names.py

* change @abstractproperty to @property @abstractmethod
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

ffcf3846

add missing moe deprecated fields to inference config (#2556) · c5f85858
Authored by Michael Wyatt on Nov 29, 2022

c5f85858

encoded ds config into command line argument when launching child processes in autotuning (#2524) · abe4fc6b

Authored by Cheng Li on Nov 29, 2022

* rollback ds config changes

* fix format

* Fix error when output_file is a relative path without a prefix (#2397)
Co-authored-by: Benjamin Steenhoek <benjaminjsteenhoek@gmail.com>

* fix results and exprs path to use absolute path

* use base64 encoded ds config as cmd arg

* fix format

* remove assert

* write out optimal config after tuning

* fix format

* no need to update ds config path when encoding ds config

* update

* do not use abs path for result and expr dir

* fix conflicts

* fix run mode

* fix format

* fix format
Co-authored-by: Benjamin Steenhoek <benjaminjsteenhoek@gmail.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

abe4fc6b
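
The "use base64 encoded ds config as cmd arg" step in #2524 above can be sketched as follows. This is a minimal illustration of the general technique using only the standard library; the helper names are hypothetical, and the exact encoding variant and argument name used by the autotuner are not stated in the commit message.

```python
import base64
import json


def encode_config(config):
    """Serialize a config dict into a shell-safe token (hypothetical helper)."""
    raw = json.dumps(config, sort_keys=True).encode("utf-8")
    # URL-safe base64 avoids quotes, spaces, and braces that would need shell escaping.
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_config(arg):
    """Inverse of encode_config, as a child process might apply it."""
    raw = base64.urlsafe_b64decode(arg.encode("ascii"))
    return json.loads(raw.decode("utf-8"))
```

Passing the encoded string directly on the command line means the child process does not depend on a config file path being valid in its working directory.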

29 Nov, 2022 1 commit
- Report progress at gradient accumulation boundary (#2553) · 340fc0cf
  Authored by ShijieZZZZ on Nov 28, 2022
```
* report progress at gradient accumulation boundary

* format

* format
```
  340fc0cf
28 Nov, 2022 1 commit

Adding Gradient Accumulation Data Type Config (#2512) · 21c28029

Authored by Joe Mayer on Nov 27, 2022

* Adding gradient accumulation dtype config.

* Switching to new DtypeEnum

* Adding standalone check function, and unit tests

* Variable disambiguation

* Adding checks for unsupported states.

* Updating for PR comments.

* Reorganizing unit test.
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

21c28029

24 Nov, 2022 2 commits

Pass down the new DS inference config to replace_transformer_layer. (#2539) · 90ae6884

Authored by Ammar Ahmad Awan on Nov 23, 2022

* pass down the new DS inference config to replace_transformer_layer.

* remove quantize_settings and rename the ep_mp_group.

* Fix model_config passing. Fixes gptj issue with wrong output.

* fix small bug in gpt-neo.

Co-authored-by: Reza Yazdani and Michael Wyatt

90ae6884

Change Where DS/Triton is Used in Stable Diffusion (#2536) · 5df1eea7

Authored by Connor Holmes on Nov 23, 2022

* Change utilization of DS/Triton kernels

* add config at Clip-encoder
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>

5df1eea7

Greenplum / DeepSpeed · last synced about 1 year ago