What's Changed
- CUDA optional deepspeed ops by @tjruwase in https://github.com/microsoft/DeepSpeed/pull/2507
- Remove CI trigger for push to master by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/2712
- [install] only add deepspeed pkg at install by @jeffra in https://github.com/microsoft/DeepSpeed/pull/2714
- Fix nightly tests for new lm-eval release by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/2713
- BF16 optimizer for BF16+ZeRO Stage 1 by @jomayeri in https://github.com/microsoft/DeepSpeed/pull/2706
- Fix typo in diffusers transformer block by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/2718
- Inference Refactor (replace_with_policy, model_implementations) by @awan-10 in https://github.com/microsoft/DeepSpeed/pull/2554
- Change zero_grad() argument to match pytorch by @loadams in https://github.com/microsoft/DeepSpeed/pull/2741
- Automatic tensor parallelism v2 by @molly-smith in https://github.com/microsoft/DeepSpeed/pull/2670
- Fixing Optimizer Sanity Check by @jomayeri in https://github.com/microsoft/DeepSpeed/pull/2742
- [GatheredParameters] fix memory leak by @stas00 in https://github.com/microsoft/DeepSpeed/pull/2665
- Abstract accelerator (step 3) by @delock in https://github.com/microsoft/DeepSpeed/pull/2677
- Fix autotuning so that it records Floating Point Operations per second, not microsecond by @dashstander in https://github.com/microsoft/DeepSpeed/pull/2711
- fix a misspelled attribute by @stas00 in https://github.com/microsoft/DeepSpeed/pull/2750
- [zero] remove misleading dtype log by @jeffra in https://github.com/microsoft/DeepSpeed/pull/2732
- Fix softmax backward by @RezaYazdaniAminabadi in https://github.com/microsoft/DeepSpeed/pull/2709
- Skip test_bias_gelu unit test if torch < 1.12 by @lekurile in https://github.com/microsoft/DeepSpeed/pull/2754
- Conditionally Make Op Building More Verbose by @cmikeh2 in https://github.com/microsoft/DeepSpeed/pull/2759
- Bing/formatting correction by @xiexbing in https://github.com/microsoft/DeepSpeed/pull/2764
- Add links to new azureML examples by @cassieesvelt in https://github.com/microsoft/DeepSpeed/pull/2756
- Fix hardcoded instances to fp16 in optimizer creation log messages to the correct dtype. by @loadams in https://github.com/microsoft/DeepSpeed/pull/2743
- Refactor/Pydantify monitoring config by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/2640
- Pin minimum `packaging` requirement by @carmocca in https://github.com/microsoft/DeepSpeed/pull/2771
- Fix for diffusers v0.12.0 by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/2753
- some fix in flops_profiler by @lucasleesw in https://github.com/microsoft/DeepSpeed/pull/2068
- fix upsample flops compute by skipping unused kargs by @cli99 in https://github.com/microsoft/DeepSpeed/pull/2773
- Fix broken kernel inject bug by @molly-smith in https://github.com/microsoft/DeepSpeed/pull/2776
- Fix Checkpoint-loading with Meta-tensor by @RezaYazdaniAminabadi in https://github.com/microsoft/DeepSpeed/pull/2781
- Add hjson support for user configs by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/2783
- Reset KV-cache at the beginning of text-generation by @RezaYazdaniAminabadi in https://github.com/microsoft/DeepSpeed/pull/2669
- Container param cleanup + remove qkv_merging by @lekurile in https://github.com/microsoft/DeepSpeed/pull/2780
- Common location to install libaio-dev by @tjruwase in https://github.com/microsoft/DeepSpeed/pull/2779
- Fixing broken link to azureml-examples recipes by @rtanase in https://github.com/microsoft/DeepSpeed/pull/2795
- remove outdated comment by @stas00 in https://github.com/microsoft/DeepSpeed/pull/2786
- Enable page-locked tensors without CUDA by @tjruwase in https://github.com/microsoft/DeepSpeed/pull/2775
- Add container load checkpoint error reporting + refactor by @lekurile in https://github.com/microsoft/DeepSpeed/pull/2792
- Add user defined launcher args for PDSH launcher by @loadams in https://github.com/microsoft/DeepSpeed/pull/2804
- Fix Slurm launcher user args by @loadams in https://github.com/microsoft/DeepSpeed/pull/2806
- Handle hanged tests in CI by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/2808
- Fix inference CI device error by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/2824
- Fix permissions issue with pip upgrade by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/2823
- Fix cpu-only CI hangs by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/2825
- Fix Pipeline Parallel resize unit test by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/2833
- Fix auto TP for duplicate modules with different gems by @molly-smith in https://github.com/microsoft/DeepSpeed/pull/2784
- Refactor DS inference API. No longer need replace_method. by @awan-10 in https://github.com/microsoft/DeepSpeed/pull/2831
- Port Reza's INT8-quantization fix to container architecture by @lekurile in https://github.com/microsoft/DeepSpeed/pull/2725
- Fix gpt-Neox rotary embedding implementation by @RezaYazdaniAminabadi in https://github.com/microsoft/DeepSpeed/pull/2782
- Fix for CI failure on system upgrade by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/2849
New Contributors
- @loadams made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2741
- @xiexbing made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2764
- @carmocca made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2771
- @lucasleesw made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2068
- @rtanase made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2795
Full Changelog: https://github.com/microsoft/DeepSpeed/compare/v0.8.0...v0.8.1