- 25 August 2023 (1 commit)

Committed by Dino Chen
Co-authored-by: Molly Smith <112220543+molly-smith@users.noreply.github.com>

- 05 August 2023 (1 commit)

Committed by digger yu

- 04 August 2023 (1 commit)

Committed by Lev Kurilenko
* Initial commit
* Clean up
* Fix formatting

- 01 August 2023 (1 commit)

Committed by Molly Smith
* Refactor autoTP inference for HE
* Formatting
* Move redundant functions to autotp
* Remove self from loading class
* formatting
* Some gpt2 autotp path fixes
* precommit

- 28 July 2023 (3 commits)

Committed by mzl
* autoTP for fused qkv weight
* fix format
* clean up
* update
* move the logic flow to a util and move it to its own file
* fix formatting
* remove empty line
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
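The "autoTP for fused qkv weight" change above addresses a subtlety worth illustrating: a fused [Q; K; V] weight cannot be split into one contiguous block per tensor-parallel rank, because each rank needs its slice of Q, K, and V separately. The sketch below is illustrative only (names and signature are hypothetical, not DeepSpeed's implementation); rows are modeled as a flat list.

```python
def shard_fused_qkv(fused_rows, hidden, tp_size, rank):
    """Split a fused [Q; K; V] weight (3 * hidden output rows) so that each
    TP rank keeps its own slice of Q, K and V, not one contiguous block."""
    assert len(fused_rows) == 3 * hidden and hidden % tp_size == 0
    per_rank = hidden // tp_size
    shard = []
    for part in range(3):  # Q, K, V stored in order along the output dim
        start = part * hidden + rank * per_rank
        shard.extend(fused_rows[start:start + per_rank])
    return shard

# With hidden=4 and tp_size=2, rank 0 keeps rows 0-1 (Q), 4-5 (K), 8-9 (V).
print(shard_fused_qkv(list(range(12)), hidden=4, tp_size=2, rank=0))
# [0, 1, 4, 5, 8, 9]
```

A naive contiguous split would instead give rank 0 rows 0-5, i.e. all of Q plus half of K, which breaks attention.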

Committed by Wang, Yi
* enable autoTP for MPT
* add model-specific func to auto_tp_model_utils.py
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

Committed by Wang, Yi
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>

- 25 July 2023 (1 commit)

Committed by Dino Chen

- 12 July 2023 (1 commit)

Committed by digger yu

- 06 July 2023 (1 commit)

Committed by Reza Yazdani
* Add FALCON auto-tp support
* added (skipped) unit test, refactored code to be more readable
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

- 16 May 2023 (1 commit)

Committed by Ma, Guokai
* add fallback path for kernels used in megatron
* temporary numactl WA for SPR 56core
* adapt core allocation according to number of ranks
* add switch to turn on numactl
* detect number of cores on the system
* allow selecting a subset of the cores on the system to bind
* remove unneeded changes
* add ccl backend
* change nccl to ccl
* remove unused code
* add comm/ccl to ops
* initial ccl comm support
* first broadcast case passed
* add CCL_Backend to DeepSpeed
* support comm timer for CPU
* support barrier for comm backend
* support specifying master address from deepspeed command line
* support pytorch 2.0
* remove 'block' from api
* Tweak for debug
* Remove unnecessary directory
* Add bf16 kernel support for inference
* Add temporary torch implementation for cpu inference
* Add softmax ops cpu fallback for inference
* bind cores to numa domain as well
* merge latest change in gma/numactl
* initial bf16 kernel support with fallback path
* initial fallback path for bloom kernel injection
* fix softmax attn mask
* check KMP_AFFINITY to avoid conflict with numactl
* New CCLBackend which utilizes TorchBackend for initialization
* rollback last change because there is a result error
* fix issue where bloom injection policy TP could not work: injection_policy={BloomBlock: ("self_attention.dense", "mlp.dense_4h_to_h")}
* Use TorchBackend to initialize CCLBackend, make behavior consistent
* remove comm under deepspeed/ops
* add license header
* code clean up
* fix format issue
* remove magic number in main address
* add caching support, but not turned on by default
* change name of inference_cuda_module to inference_module
* Check for is_synchronized_device in accelerator before get Event
* fix typo
* Fix fallback path of softmax kernel on CUDA device for BF16 data type: because CUDA tril does not support BF16, enforce fp32 data type
* add cpu backend files
* change CPU_Accelerator op_builder_dir
* remove cpu_kernel_path
* use CPU_Accelerator on non-cuda device
* fix deepspeed.op_builder => deepspeed.ops.op_builder
* add alias for num_gpus: num_accelerators
* allow loading cpu_builder in build stage
* Assume cuda available if torch not installed
* add oneccl_binding_pt to requirements
* move oneccl-binding-pt to separate requirements-cpu.txt
* add missing file
* use dependency_links in setuptools.setup() call for additional dependency links
* install oneccl_bind_pt in workflows
* change oneccl_bind_pt's version from 1.13 to 2.0
* use intel_extension_for_pytorch as indicator that CPU_Accelerator should be used
* Add indicator for Accelerator used
* change foo.c to foo.cpp
* exclude 'cpu' directory in CUDA op builder reflection
* add a cpu-inference workflow
* run cpu-inference workflow on self-hosted instance
* change cpu runs-on node to v100 node
* print out python version in workflow
* add verbose in pip command to understand oneccl_bind_pt install issue
* update cpu-inference workflow
* add a stage to detect instance instruction sets
* add back bf16 support for CPU inference
* enable autoTP for bloom
* update workflow to detect cpu instruction sets
* temporary WA for Intel Extension for PyTorch AVX2 instruction set detection
* change cpu-inference workflow machine to ubuntu-20.04
* add sharded checkpoint loading for AutoTP path to reduce the peak memory in initialization stage
* enable policy for llama
* use a special build of ipex to test avx2 detection fix
* fix format
* fix test fail issue
* fix gptj sharded checkpoint loading problem
* return a not-implemented build in get_op_builder in cpu_backend
* support cpu device in tests
* use cpuinfo to extract number of CPUs
* use ~/tmp as transformer cache rather than /blob/
* Add support for mpich launcher with prefer_deepspeed_comm
* add missing modification in accelerator
* enable IMPI launcher
* remove unused file and fix formatting
* clean up ccl.cpp
* Less confusing error message when certain op builders are not implemented
* Fix license header
* Add license header
* add license headers
* add license header
* fix cuda-specific code in test
* update CPU workflow
* use numactl to bind to core
* allow bind_cores_to_rank in multi-node impi runner
* fix format error
* Remove InferenceBuilder
* fix format error in numa.py
* check whether op is in installed ops in ds_report.py
* allow overriding accelerator with DS_ACCELERATOR='cuda', 'cpu' or 'xpu'
* lazy init class_dict in CUDA_Accelerator to avoid cyclic initialization of CUDA_Accelerator
* put short path at the beginning in real_accelerator.py
* device_count returns number of NUMA nodes
* fix typo
* install numactl in cpu workflow
* Follow comments
* Better implementation of device_count() and current_device()
* remove dependency_link for Intel Extension for DeepSpeed
* check is_synchronized_device in timer only once
* remove env mapping WA in cpu_accelerator
* fix duplicate definition
* fix format error
* refine ccl backend selection
* move comments to the right place
* remove prefer_deepspeed_comm, use CCLBackend by default
* refactor fallback path
* Fix execution failure in kernel injection path
* do not refactor kernel injection fallback path in residual_add, because it contains a function call with side effects
* guard residual_add fallback path with environ DS_KI_FALLBACK=True
* fix format error
* add test for allreduce on CPU workflow
* fix format error
* Fall back to TorchBackend if CCLBackend kernels are not implemented
* Update Intel Extension for PyTorch installation link
* Don't specify version number of Intel Extension for PyTorch
* install oneCCL for CCLBackend
* fix link path for CPU comm kernels
* fix source oneCCL environment
* source oneCCL env before running UT
* Give more specific instruction when CCL_ROOT not defined
Signed-off-by: Cao, Zhong Z <zhong.z.cao@intel.com>
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: sdp <sdp@aia-sdp-spr-108864.jf.intel.com>
Co-authored-by: Cao, Zhong Z <zhong.z.cao@intel.com>
Co-authored-by: Zhenhuan Chen <zhenhuan.chen@intel.com>
Co-authored-by: baodii <di.bao@intel.com>
Co-authored-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: jianan-gu <jianan.gu@intel.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
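The accelerator-selection behavior described above (an explicit DS_ACCELERATOR override beats auto-detection, intel_extension_for_pytorch signals the CPU backend, and CUDA is assumed when torch is not installed) can be sketched as a priority order. This is a hypothetical simplification, not DeepSpeed's real_accelerator.py; the function name and boolean parameters are illustrative.

```python
def pick_accelerator(env, ipex_installed, cuda_available):
    """Illustrative selection order for the accelerator backend:
    1. honor an explicit DS_ACCELERATOR override,
    2. otherwise prefer CUDA when available,
    3. otherwise use the CPU backend when intel_extension_for_pytorch
       is present,
    4. otherwise assume CUDA (per the commit: assume cuda if torch
       is not installed)."""
    override = env.get("DS_ACCELERATOR")
    if override in ("cuda", "cpu", "xpu"):
        return override
    if cuda_available:
        return "cuda"
    if ipex_installed:
        return "cpu"
    return "cuda"

# Override wins even on a CUDA-capable machine:
print(pick_accelerator({"DS_ACCELERATOR": "cpu"}, False, True))  # cpu
```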
- 10 May 2023 (1 commit)

Committed by Wang, Yi
Fix a regression in sharded checkpoint loading on the AutoTP path caused by the deletion of qkv_copy(), and add a UT case for sharded checkpoint loading in AutoTP (#3457)
* add UT case for shard checkpoint loading in AutoTP
* autoTP path also supports shard loading
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

- 05 May 2023 (1 commit)

Committed by Wang, Yi
* add sharded checkpoint loading for the AutoTP path to reduce peak memory in the initialization stage
* fix gptj sharded checkpoint loading problem
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

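The peak-memory benefit of sharded checkpoint loading mentioned in this entry comes from holding only one shard in memory at a time instead of materializing the whole checkpoint. The sketch below is hypothetical (names and the "peak" accounting are illustrative, not DeepSpeed code); it counts live entries to make the effect measurable.

```python
def load_sharded(shard_paths, load_fn, apply_fn):
    """Load checkpoint shards one at a time: each shard is read
    (e.g. torch.load in real code), its tensors are copied into the
    partitioned model, and it is freed before the next shard is read.
    Returns the peak number of checkpoint entries held at once."""
    peak = 0
    for path in shard_paths:
        state = load_fn(path)   # only this shard is resident now
        peak = max(peak, len(state))
        apply_fn(state)         # consume the shard's parameters
        del state               # free before loading the next shard
    return peak
```

For a checkpoint split into shards of 2 and 1 entries, the sharded path peaks at 2 live entries, whereas loading everything up front would peak at 3.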
- 04 May 2023 (1 commit)

Committed by Connor Holmes
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

- 02 May 2023 (1 commit)

Committed by Reza Yazdani
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

- 22 April 2023 (1 commit)

Committed by Molly Smith
* diffusers 0.15.0 cross attention class check
* revert diffusers_attention.py

- 12 April 2023 (1 commit)

Committed by Olatunji Ruwase
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: yaozhewei <zheweiy@berkeley.edu>
Co-authored-by: Ammar Ahmad Awan <ammar.awan@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
Co-authored-by: Lok Chand Koppaka <lokoppak@microsoft.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

- 31 March 2023 (1 commit)

Committed by Michael Wyatt
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

- 27 March 2023 (1 commit)

Committed by Jeff Rasley

- 22 March 2023 (1 commit)

Committed by Molly Smith
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

- 28 February 2023 (2 commits)

Committed by Heyang Qin
This PR updates the replace_fn function when loading inference checkpoints. The container is now passed to load_model_with_checkpoint() so that load_params() can be called from there. load_params() is also updated to access the variables in the policy.

Committed by Jeff Rasley
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Conglong Li <conglong.li@gmail.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

- 17 February 2023 (1 commit)

Committed by Lev Kurilenko
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Heyang Qin <heyangqin@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

- 16 February 2023 (1 commit)

Committed by Molly Smith
* Fix auto TP for duplicate modules with different gems
* precommit and comments
* Comment
* Combine gem lists of same-named modules
* remove duplicates from gem_list before updating policy
* Add module attribute with name variation for ProphetNet
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

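The fix above ("combine gem list of same named modules", "remove duplicates from gem_list before updating policy") amounts to merging and de-duplicating per-name lists before the policy update. A minimal sketch, with a hypothetical function name and data shape (pairs of module name and gem list):

```python
def combine_gem_lists(named_gems):
    """Merge the gem lists of same-named modules into one de-duplicated
    list per name, preserving first-seen order, so duplicate modules
    with different gems produce a single combined policy entry."""
    combined = {}
    for name, gems in named_gems:
        merged = combined.setdefault(name, [])
        for gem in gems:
            if gem not in merged:
                merged.append(gem)
    return combined

# Two modules named "attn" with overlapping gems collapse to one entry:
print(combine_gem_lists([("attn", ["q", "k"]), ("attn", ["k", "v"])]))
# {'attn': ['q', 'k', 'v']}
```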
- 08 February 2023 (1 commit)

Committed by Lev Kurilenko
This PR refactors the organization of meta tensor checkpoint loading as follows:
- Move the get_param_names() abstract method definition from TransformerPolicy into MetaTensorContainer
- Move model-specific get_param_names() definitions from the policy into the model-specific container
- Replace the selected_policy_g, megatron_v2_g, and transformer_config_g globals with a single container_g global, since the container holds all of the information those globals previously captured
- Add a ckpt_load_enabled flag to containers, set to False by default in the base.py container and set to True when the MetaTensorContainer feature is inherited
- Add an assertion to replace_transformer_layer, before checkpoint loading, that checks ckpt_load_enabled == True; otherwise an error message is printed saying that the container does not support meta tensor checkpoint loading

The aim of these changes is to couple meta tensor checkpoint loading code more closely to the MetaTensorContainer and to allow better error reporting when checkpoint loading is used with model types that don't support this feature.

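The ckpt_load_enabled gating described in this entry reduces to a default-off class flag that the meta-tensor feature flips on, plus an assertion at load time. A minimal sketch, assuming simplified class and function names (the real containers carry much more state):

```python
class BaseContainer:
    # Default per the base container: meta tensor checkpoint
    # loading is not supported unless a feature enables it.
    ckpt_load_enabled = False

class MetaTensorContainer(BaseContainer):
    # Inheriting the meta tensor feature flips the flag on.
    ckpt_load_enabled = True

def load_checkpoint_into(container):
    """Guard checkpoint loading on the container's capability flag,
    mirroring the assertion added to replace_transformer_layer."""
    assert container.ckpt_load_enabled, \
        "container does not support meta tensor checkpoint loading"
    return "loaded"
```

Models whose containers lack the feature now fail fast with a clear message instead of failing obscurely mid-load.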
- 04 February 2023 (2 commits)

Committed by Lev Kurilenko
This PR cleans up some container items and removes an unused qkv_merging parameter:
- Remove qkv_merging=True from BERT containers
- Change the containers' config object to ds_model_config
- Remove the qkv_merging param

Committed by Reza Yazdani
Co-authored-by: Martin Cai <martincai@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

- 26 January 2023 (1 commit)

Committed by Ma, Guokai
* Integrate accelerator abstraction interface into deepspeed/
* Fix error message in fp16/fused_optimizer
* fix error message in fp16/unfused_optimizer.py
* assign get_accelerator().pin_memory() result to input Tensor name
* no need to check cuda and whether nvtx is supported
* move try-except into innermost block
* call Event() and Stream() in get_accelerator() for data type
* Make Stream and Event properties of the abstract interface so they can be used as data types in deepspeed
* Apply op_builder backend api change from #2705 from @jeffra
* fix tests where Builder NAME is used
* keep original ...Builder.NAME interface instead of ...Builder().NAME interface
* fix builder closure for installation
* fix randomltd builder
* add comments to clarify create_op_builder and get_op_builder
* fix compatibility with pip install -e
Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>

- 20 January 2023 (1 commit)

Committed by Ammar Ahmad Awan
Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>

- 29 December 2022 (1 commit)

Committed by Jeff Rasley

- 23 December 2022 (1 commit)

Committed by Jeff Rasley

- 17 December 2022 (1 commit)

Committed by Lev Kurilenko
This PR removes the zero-inference GatheredParameters context from replace_with_policy, since zero-inference is no longer needed after the introduction of meta tensor support for BLOOM.

- 10 December 2022 (1 commit)

Committed by Jeff Rasley

- 09 December 2022 (1 commit)

Committed by Michael Wyatt
* added checkpoint sharding tests

- 07 December 2022 (1 commit)

Committed by Reza Yazdani
* fix checkpoint loading when it is a dictionary
* fix some issues with saving ckpt & int8 inference
* fix quantized inference & add generic support for checkpoint loading
* remove int8 hard-coded flag
* fix mlp return tensors
* fix several issues loading checkpoints of GPT-J, GPT-NeoX, and OPT with different TP sizes
* add more comments & description for the checkpoint-loading module
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

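"fix checkpoint loading when it is a dictionary" suggests the loader must accept both a bare state_dict and a dict that wraps one. A hedged sketch of such normalization follows; the function name and the "module" wrapper key are assumptions for illustration, not necessarily the key DeepSpeed checks.

```python
def unwrap_checkpoint(ckpt):
    """Return the parameter state_dict whether the checkpoint is a bare
    state_dict or a wrapping dict (the 'module' key is an illustrative
    assumption; real checkpoints may use other wrapper keys)."""
    if isinstance(ckpt, dict) and isinstance(ckpt.get("module"), dict):
        return ckpt["module"]
    return ckpt

# Both shapes yield the same parameters:
print(unwrap_checkpoint({"module": {"w": 1.0}, "step": 10}))  # {'w': 1.0}
print(unwrap_checkpoint({"w": 1.0}))                          # {'w': 1.0}
```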
- 24 November 2022 (1 commit)

Committed by Ammar Ahmad Awan
* pass down the new DS inference config to replace_transformer_layer
* remove quantize_settings and rename the ep_mp_group
* Fix model_config passing; fixes a gptj issue with wrong output
* fix small bug in gpt-neo
Co-authored-by: Reza Yazdani and Michael Wyatt

- 15 November 2022 (1 commit)

Committed by Ammar Ahmad Awan
Changes the inference API to accept a config dict and cleans up the Inference Engine to utilize the newly added inference config.
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

- 12 November 2022 (1 commit)

Committed by lokoppakmsft
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>

- 10 November 2022 (2 commits)

Committed by Connor Holmes
Co-authored-by: cmikeh2 <connorholmes@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>

Committed by Kevin Ko
* Add scale_attn_by_inverse_layer_idx feature
* Fix layer_id bug
* Fix scaling value
Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>