- 08 Feb, 2023 2 commits
-
-
Committed by Lev Kurilenko
This PR refactors the organization of meta tensor checkpoint loading as follows:
  - Move the get_param_names() abstract method definition from TransformerPolicy into MetaTensorContainer.
  - Move the model-specific get_param_names() definitions from the policy into the model-specific container.
  - Replace the selected_policy_g, megatron_v2_g, and transformer_config_g globals with a single container_g global, since the container holds all of the information those globals previously captured.
  - Add a ckpt_load_enabled flag to containers. It is set to False by default in the base.py container and is set to True when the MetaTensorContainer feature is inherited.
  - Add an assertion to replace_transformer_layer, before checkpoint loading, that checks ckpt_load_enabled == True. Otherwise an error message is printed saying that the container does not support meta tensor checkpoint loading.
The aim of these changes is to couple the meta tensor checkpoint loading code more closely to the MetaTensorContainer and to allow better error reporting when checkpoint loading is used on model types that don't support this feature.
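The flag-and-assertion pattern described in this commit can be sketched roughly as follows. This is a minimal illustration, not the DeepSpeed code: BaseContainer, ExampleModelContainer, LegacyContainer, load_checkpoint_guard, and the parameter names are hypothetical stand-ins; only ckpt_load_enabled, get_param_names(), and MetaTensorContainer are names taken from the commit message.

```python
class BaseContainer:
    # Hypothetical base container: checkpoint loading is off by default.
    ckpt_load_enabled = False

    def __init__(self, name):
        self.name = name


class MetaTensorContainer:
    # Feature mixin: inheriting it switches the flag on.
    ckpt_load_enabled = True

    def get_param_names(self):
        # Model-specific containers are expected to override this.
        raise NotImplementedError


class ExampleModelContainer(MetaTensorContainer, BaseContainer):
    # Hypothetical model-specific container with made-up parameter names.
    def get_param_names(self):
        return ("attention.qkv.weight", "mlp.fc.weight")


class LegacyContainer(BaseContainer):
    # A container that never opted in to meta tensor checkpoint loading.
    pass


def load_checkpoint_guard(container):
    # Fail early with a readable message instead of deep inside the loader.
    assert container.ckpt_load_enabled, (
        f"{type(container).__name__} does not support "
        "meta tensor checkpoint loading"
    )
    return container.get_param_names()
```

Listing the mixin first in the bases means its ckpt_load_enabled value wins in the method resolution order, so opting in needs no extra code in the model-specific container.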
-
Committed by Olatunji Ruwase
* Enable page-locked memory in cpu only env
* Enable page-locked memory in cpu only env
* Formatting
* Add TODOs; Release page-locked memory
* Update perf microbenchmark; Reduce unit test memory
* Reduce CI mem usage
-
- 07 Feb, 2023 2 commits
-
-
Committed by Stas Bekman
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Razvan Tanase
-
- 05 Feb, 2023 1 commit
-
-
Committed by Olatunji Ruwase
* Common location to install libaio-dev
* Update .github/workflows/setup-venv/action.yml
  Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
-
- 04 Feb, 2023 2 commits
-
-
Committed by Lev Kurilenko
This PR cleans up some container items and removes an unused qkv_merging parameter:
  - Remove qkv_merging=True from BERT containers
  - Change the containers' config object to ds_model_config
  - Remove the qkv_merging param
-
Committed by Reza Yazdani
Co-authored-by: Martin Cai <martincai@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 03 Feb, 2023 2 commits
-
-
Committed by Michael Wyatt
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Reza Yazdani
* Reset KV-cache at the beginning of text-generation
* Pass the ckpt-loading arguments to work with meta-tensor
* remove unrelated changes
-
- 02 Feb, 2023 3 commits
-
-
Committed by Molly Smith
-
Committed by Cheng Li
* fix upsample flops compute by skipping unused kwargs
* fix format
-
Committed by swli
* Bugs in profiler:
  1. Tensor.bmm is missing from the _patch_tensor_methods function.
  2. Functions are missing from the _reload_functionals and _reload_tensor_methods functions.
  3. torch.mm and torch.Tensor.mm have the same __name__ in wrapFunc; the suggestion is to use __str__ instead.
* formatting
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Cheng Li <pistasable@gmail.com>
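The third profiler bug (two different callables sharing one __name__) can be reproduced without torch. The sketch below is an analogy built on the standard library, assuming a registry keyed by a string derived from each function; key_by_name and key_by_str are hypothetical helpers, and str.join and os.path.join stand in for torch.Tensor.mm and torch.mm.

```python
import os.path


def key_by_name(func):
    # Keys on __name__: distinct callables can share the same short name.
    return func.__name__


def key_by_str(func):
    # Keys on str(func): includes the owning type or the function identity.
    return str(func)


# Two unrelated callables that are both called "join".
funcs = [str.join, os.path.join]

# Keyed by __name__, the second entry silently overwrites the first.
by_name = {key_by_name(f): f for f in funcs}

# Keyed by str(), both entries survive.
by_str = {key_by_str(f): f for f in funcs}
```

With a name-keyed table the original of one function is lost when the other is patched, which is the kind of collision the commit message describes.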
-
- 01 Feb, 2023 4 commits
-
-
Committed by Michael Wyatt
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Carlos Mocholí
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Michael Wyatt
* pydantify monitoring configs
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Logan Adams
* Remove hardcoded instances of fp16 in log messages.
* Add model_dtype to print the correct format
* Respond to PR feedback
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 31 Jan, 2023 2 commits
-
-
Committed by cassieesvelt
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Bing Xie
* modify engine.py for formatting
* commit formatting changes on engine.py
-
- 29 Jan, 2023 1 commit
-
-
Committed by Connor Holmes
-
- 27 Jan, 2023 5 commits
-
-
Committed by Lev Kurilenko
This PR adds a torch version check to the test_bias_gelu unit test so that it is skipped when the torch version is < 1.12. This is due to gelu implementation differences in versions prior to 1.12.
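A version gate of this kind has to compare versions numerically, since a plain string comparison would rank "1.9" above "1.12". The following is a minimal sketch of such a check, not the code from the PR; parse_major_minor and should_skip_bias_gelu are hypothetical helper names, and the version string is passed in so the example runs without torch installed.

```python
import re


def parse_major_minor(version):
    # Extract the leading "major.minor" pair, ignoring suffixes like "+cu113".
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def should_skip_bias_gelu(torch_version, minimum=(1, 12)):
    # Tuples compare element by element, so (1, 9) < (1, 12) as intended.
    return parse_major_minor(torch_version) < minimum
```

In a test file, the result would typically feed a skip condition, for example by passing torch.__version__ to should_skip_bias_gelu and skipping the test when it returns True.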
-
Committed by Reza Yazdani
* Reset KV-cache at the beginning of text-generation
* Add new backward kernel to handle large softmax-length
* remove unrelated changes
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Connor Holmes <connorholmes@microsoft.com>
-
Committed by Jeff Rasley
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Stas Bekman
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Dashiell Stander
* Fix how autotuning reports TFLOPS so that they are reported in FLOPS per second, not per millisecond
  Co-authored-by: Nick Sarkauskas <nsarka00@gmail.com>
  Co-authored-by: Quentin Anthony <anthony.301@osu.edu>
  Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Actually it is microseconds -> seconds
  Signed-off-by: Dashiell Stander <dstander@protonmail.com>
* Actually it is microseconds -> seconds
  Signed-off-by: Dashiell Stander <dstander@protonmail.com>
Signed-off-by: Dashiell Stander <dstander@protonmail.com>
Co-authored-by: Nick Sarkauskas <nsarka00@gmail.com>
Co-authored-by: Quentin Anthony <anthony.301@osu.edu>
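The unit correction in this commit amounts to converting a latency measured in microseconds into seconds before dividing. A minimal sketch of that arithmetic, assuming the operation count and the latency are already known; tflops_from_latency_us is a hypothetical helper, not a function from the autotuner.

```python
def tflops_from_latency_us(flops, latency_us):
    # flops: floating-point operations performed in one measured step.
    # latency_us: duration of that step in microseconds.
    if latency_us <= 0:
        raise ValueError("latency_us must be positive")
    seconds = latency_us / 1e6          # microseconds -> seconds
    flops_per_second = flops / seconds  # FLOPS
    return flops_per_second / 1e12      # FLOPS -> TFLOPS
```

Skipping the microsecond-to-second conversion would understate the reported throughput by a factor of one million, which is why the unit of the timer matters here.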
-
- 26 Jan, 2023 2 commits
-
-
Committed by Ma, Guokai
* Integrate accelerator abstraction interface into deepspeed/
* Fix error message in fp16/fused_optimizer
* fix error message in fp16/unfused_optimizer.py
* assign get_accelerator().pin_memory() result to input Tensor name
* no need to check cuda and whether nvtx supported
* move try-except into innermost block
* call Event() and Stream() in get_accelerator() for data type
* Make Stream and Event properties of the abstract interface so they can be used as data types in deepspeed
* Apply op_builder backend api change from #2705 from @jeffra
* fix tests where Builder NAME is used
* keep original ...Builder.NAME interface instead of ...Builder().NAME interface
* fix builder closure for installation
* fix randomltd builder
* add comments to clarify create_op_builder and get_op_builder
* fix compatibility with pip install -e
Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
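The idea of an accelerator abstraction whose Stream type is exposed as a property, so callers can use it as a data type, can be sketched as below. This is an illustration under stated assumptions, not DeepSpeed's interface: Accelerator, CpuAccelerator, and _CpuStream are hypothetical classes, and the get_accelerator() shown here is a toy stand-in that only mirrors the name used in the commit message.

```python
from abc import ABC, abstractmethod


class Accelerator(ABC):
    # Hypothetical abstract interface that backend-specific classes implement.

    @abstractmethod
    def device_name(self):
        ...

    @abstractmethod
    def pin_memory(self, tensor):
        ...

    @property
    @abstractmethod
    def Stream(self):
        # Returns the backend's stream class, usable in isinstance checks.
        ...


class _CpuStream:
    # Placeholder stream type for a backend without real streams.
    pass


class CpuAccelerator(Accelerator):
    def device_name(self):
        return "cpu"

    def pin_memory(self, tensor):
        # Returns the buffer so callers can rebind the input name to it.
        return tensor

    @property
    def Stream(self):
        return _CpuStream


_accelerator = None


def get_accelerator():
    # Lazily create one shared accelerator object for the process.
    global _accelerator
    if _accelerator is None:
        _accelerator = CpuAccelerator()
    return _accelerator
```

A caller would write buf = get_accelerator().pin_memory(buf), rebinding the input name to the result as one of the bullets above describes, and would check isinstance(s, get_accelerator().Stream) instead of naming a backend-specific class.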
-
Committed by Stas Bekman
* [GatheredParameters] fix memory leak
* simplify
* cleanup and move
* style
* Formatting
* fix test
* fix test
* fix test take 2
* Trigger CI
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Joe Mayer <114769929+jomayeri@users.noreply.github.com>
-
- 25 Jan, 2023 3 commits
-
-
Committed by Joe Mayer
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Molly Smith
* loop through pipe.model
* tp_parser first draft
* client_module must be type object
* Simplify layernorm tracking. Add unittest.
* cleanup
* Add more models to unittest
* cleanup inference pytest for merging
* Add unittest
* cleanup
* pre-commit
* unittest id and pytest marker
* try marian for unittest
* precommit
* Move tp code to separate file
* Add new auto tp file
* pre-commit and type
* Update deepspeed/module_inject/auto_tp.py
  Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
* Update deepspeed/module_inject/auto_tp.py
  Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
* Update tests/unit/inference/test_inference.py
  Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
* remove unused fillmask function
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
-
Committed by loadams
-
- 20 Jan, 2023 1 commit
-
-
Committed by Ammar Ahmad Awan
Co-authored-by: Lev Kurilenko <lekurile@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
- 19 Jan, 2023 4 commits
-
-
Committed by Michael Wyatt
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Joe Mayer
* BF16 optimizer only with ZeRO stage 1.
* Updating to grad accum of fp32 for BF16 ZeRO1 case.
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
Committed by Michael Wyatt
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
-
Committed by Jeff Rasley
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-
- 18 Jan, 2023 6 commits
-
-
Committed by Michael Wyatt
-
Committed by Olatunji Ruwase
* CPU-Adam: add compile-flag to enable param-copy from CPU to GPU
* guard the CUDA-related include files and variables
* remove CUDA dependency from op_builder when building against CPU
* fixing the builder issues
* fix formatting
* return true when there is no mismatch on the cuda version
* guard for when cuda is not available & test with cpu-only environment
* Update cpu_adam and cpu_adagrad
* Format fixes
* Add configurable half precision type; Build/run in CUDA environment
* Run cpu_adam and cpu_adagrad in cpu only environment
* Mark CUDA only unit tests
* CPU environment CI
* Format fixes
* Remove --forked
* Add --forked
* CPU only CI should pass
* Format fixes
* Format fixes
* Remove scattered pytest.skip
* Fix cpu_adam unit test
* Update .github/workflows/nv-torch-latest-cpu.yml
  Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
* Update .github/workflows/nv-torch-latest-cpu.yml
  Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
* Address PR feedback
* OpenMP linking
* Fix unit tests
Co-authored-by: Reza Yazdani <reyazda@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
-
Committed by Jeff Rasley
-
Committed by Olatunji Ruwase
-
Committed by Jeff Rasley
-
Committed by Jeff Rasley
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
-