What's Changed
- Enable auto TP policy for llama model by @jianan-gu in https://github.com/microsoft/DeepSpeed/pull/3170
- Allow users to use mismatched CUDA versions by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/3436
- Hybrid Engine Refactor and Llama Inference Support by @cmikeh2 in https://github.com/microsoft/DeepSpeed/pull/3425
- add sharded checkpoint loading for AutoTP path to reduce the peak mem… by @sywangyi in https://github.com/microsoft/DeepSpeed/pull/3102
- launcher/multinode_runner.py: mapping env variables by @YizhouZ in https://github.com/microsoft/DeepSpeed/pull/3372
- Update automatic-tensor-parallelism.md by @sywangyi in https://github.com/microsoft/DeepSpeed/pull/3198
- Build: Update license in setup by @PabloEmidio in https://github.com/microsoft/DeepSpeed/pull/3484
- Doc corrections by @goodship1 in https://github.com/microsoft/DeepSpeed/pull/3435
- Fix spelling errors in comments and documents by @digger-yu in https://github.com/microsoft/DeepSpeed/pull/3486
- Fix spelling error in function GetMaxTokenLength() by @luliyucoordinate in https://github.com/microsoft/DeepSpeed/pull/3482
- Fix a type error on bf16+Pipeline Parallelism by @ys950902 in https://github.com/microsoft/DeepSpeed/pull/3441
- Fix spelling errors in DeepSpeed codebase by @digger-yu in https://github.com/microsoft/DeepSpeed/pull/3494
- Fix spelling error in docs/index.md by @digger-yu in https://github.com/microsoft/DeepSpeed/pull/3443
- delete the line to keep user_zero_stages by @MrZhengXin in https://github.com/microsoft/DeepSpeed/pull/3473
- Update Inference Engine checkpoint loading + meta tensor assertions by @lekurile in https://github.com/microsoft/DeepSpeed/pull/2940
- Fix regression in sharded checkpoint loading in the AutoTP path caused by the deletion of qkv_copy(), and add a unit test for sharded checkpoint loading in AutoTP by @sywangyi in https://github.com/microsoft/DeepSpeed/pull/3457
- Add snip_momentum structured pruning which supports higher sparse ratio by @ftian1 in https://github.com/microsoft/DeepSpeed/pull/3300
- Update README.md by @goodship1 in https://github.com/microsoft/DeepSpeed/pull/3504
- Hybrid Engine Fix Llama by @lekurile in https://github.com/microsoft/DeepSpeed/pull/3505
- fix spelling error with deepspeed/runtime/ by @digger-yu in https://github.com/microsoft/DeepSpeed/pull/3509
- Skip autoTP if tp_size is 1 by @molly-smith in https://github.com/microsoft/DeepSpeed/pull/3449
- Changing monitor loss to aggregate loss over gradient accumulation steps by @jomayeri in https://github.com/microsoft/DeepSpeed/pull/3428
- change actions/checkout@v2 to v3 by @digger-yu in https://github.com/microsoft/DeepSpeed/pull/3526
- fix typo with docs/ by @digger-yu in https://github.com/microsoft/DeepSpeed/pull/3523
- Doc updates by @goodship1 in https://github.com/microsoft/DeepSpeed/pull/3520
- Fix bug in Hybrid Engine by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/3497
- Fix wrong passing of offload_optimizer_config to DeepSpeedZeRoOffload by @mmhab in https://github.com/microsoft/DeepSpeed/pull/3420
- Fix broadcast error on multi-node training with ZeroStage3 and TensorParallel=2 by @YizhouZ in https://github.com/microsoft/DeepSpeed/pull/2999
- share inflight registry between PartitionedParameterCoordinators by @HeyangQin in https://github.com/microsoft/DeepSpeed/pull/3462
- Syncing FusedAdam with new Apex features by @jomayeri in https://github.com/microsoft/DeepSpeed/pull/3434
- fix typo in comments with deepspeed/ by @digger-yu in https://github.com/microsoft/DeepSpeed/pull/3537
- [ROCm] Hip headers fix by @rraminen in https://github.com/microsoft/DeepSpeed/pull/3532
- [CPU] Support Intel CPU inference by @delock in https://github.com/microsoft/DeepSpeed/pull/3041
- Clone tensors to avoid torch.save bloat by @tjruwase in https://github.com/microsoft/DeepSpeed/pull/3348
- Fix attribute error when loading FusedAdamBuilder() by @rraminen in https://github.com/microsoft/DeepSpeed/pull/3527
- fix typo by @inkcherry in https://github.com/microsoft/DeepSpeed/pull/3559
- Fixing bf16 test by @jomayeri in https://github.com/microsoft/DeepSpeed/pull/3551
- Fix Hybrid Engine for BLOOM by @lekurile in https://github.com/microsoft/DeepSpeed/pull/3580
- Fix op_builder against PyTorch nightly by @malfet in https://github.com/microsoft/DeepSpeed/pull/3596
- data efficiency bug fix, avoid invalid range step size by @conglongli in https://github.com/microsoft/DeepSpeed/pull/3609
- DS init should not broadcast or move zero.Init models by @tjruwase in https://github.com/microsoft/DeepSpeed/pull/3611
- Expose Consecutive Hysteresis to Users by @Quentin-Anthony in https://github.com/microsoft/DeepSpeed/pull/3553
- Align InferenceEngine to store ms in _model_times by @HolyFalafel in https://github.com/microsoft/DeepSpeed/pull/3501
- AISC launcher fixes by @jeffra in https://github.com/microsoft/DeepSpeed/pull/3637
- stage3.py: do not scale if gradient_predivide_factor is 1.0 by @guoyejun in https://github.com/microsoft/DeepSpeed/pull/3630
- Add Ascend NPU accelerator support by @CurryRice233 in https://github.com/microsoft/DeepSpeed/pull/3595
- Skip tests on docs-only changes by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/3651
- Update megatron.md by @wjessup in https://github.com/microsoft/DeepSpeed/pull/3641
- Typo Correction by @MicahZoltu in https://github.com/microsoft/DeepSpeed/pull/3621
- deepspeed/comm/comm.py: fix typo of warning message by @guoyejun in https://github.com/microsoft/DeepSpeed/pull/3636
- Fix RuntimeError when using ZeRO Stage3 with mpu: #3564 by @eggiter in https://github.com/microsoft/DeepSpeed/pull/3565
- Allow dict datatype for checkpoints (inference) by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/3007
- fix typo with deepspeed/ by @digger-yu in https://github.com/microsoft/DeepSpeed/pull/3547
- flops_profiler: add option recompute_fwd_factor for the case of activation c… by @guoyejun in https://github.com/microsoft/DeepSpeed/pull/3362
- fix typo deepspeed/runtime by @digger-yu in https://github.com/microsoft/DeepSpeed/pull/3663
- Refactor check_enabled root validator in DeepSpeedMonitorConfig by @bgr8 in https://github.com/microsoft/DeepSpeed/pull/3616
New Contributors
- @jianan-gu made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3170
- @YizhouZ made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3372
- @PabloEmidio made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3484
- @luliyucoordinate made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3482
- @ys950902 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3441
- @MrZhengXin made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3473
- @ftian1 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3300
- @mmhab made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3420
- @malfet made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3596
- @HolyFalafel made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3501
- @CurryRice233 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3595
- @wjessup made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3641
- @MicahZoltu made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3621
- @eggiter made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3565
- @bgr8 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3616
Full Changelog: https://github.com/microsoft/DeepSpeed/compare/v0.9.2...v0.9.3