What's Changed
- [MiCS] [Fix] saving and loading model checkpoint logic for MiCS sharding by @zarzen in https://github.com/microsoft/DeepSpeed/pull/3440
- fix some typos by @digger-yu in https://github.com/microsoft/DeepSpeed/pull/3675
- Use logger in accelerator by @tjruwase in https://github.com/microsoft/DeepSpeed/pull/3682
- Update README to add ICS'23 paper on Tensor Parallel MoEs by @siddharth9820 in https://github.com/microsoft/DeepSpeed/pull/3687
- non-JIT build fix on ROCm by @rraminen in https://github.com/microsoft/DeepSpeed/pull/3638
- Fix local rank mismatch error when training on nodes with different number of GPUs by @byungsoo-oh in https://github.com/microsoft/DeepSpeed/pull/3409
- Correct world_size/backend for mpi by @abhilash1910 in https://github.com/microsoft/DeepSpeed/pull/3694
- Fix incorrectly formatted f-string in hostfile checking by @loadams in https://github.com/microsoft/DeepSpeed/pull/3698
- fix typo name of hybrid engine func by @tensor-tang in https://github.com/microsoft/DeepSpeed/pull/3689
- Revert "fix typo name (#3689)" by @loadams in https://github.com/microsoft/DeepSpeed/pull/3702
- Fix gpt-j inference issue by @RezaYazdaniAminabadi in https://github.com/microsoft/DeepSpeed/pull/3639
- change partititon_name to partition_name by @digger-yu in https://github.com/microsoft/DeepSpeed/pull/3700
- Fix unit test typo in tests/unit/ops/transformer/inference by @mrwyattii in https://github.com/microsoft/DeepSpeed/pull/3697
- Small tweak on cuda version mismatch documentation by @jli in https://github.com/microsoft/DeepSpeed/pull/3706
- DeepSpeed overview in Japanese by @conglongli in https://github.com/microsoft/DeepSpeed/pull/3709
- zero3 performance optimizations by @hablb in https://github.com/microsoft/DeepSpeed/pull/3622
- Fix typo in name of hybrid engine function by @loadams in https://github.com/microsoft/DeepSpeed/pull/3704
- Increase tensor creator coverage by @tjruwase in https://github.com/microsoft/DeepSpeed/pull/3684
- [Bugfix][CPU] Remove C++ version in CPU OpBuilder by @delock in https://github.com/microsoft/DeepSpeed/pull/3643
- Single Node is using unreferenced pdsh kill cmd while terminating by @abhilash1910 in https://github.com/microsoft/DeepSpeed/pull/3730
- Update Dockerfile with newer cuda and torch. by @loadams in https://github.com/microsoft/DeepSpeed/pull/3716
New Contributors
- @byungsoo-oh made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3409
- @abhilash1910 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3694
- @tensor-tang made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3689
- @jli made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3706
Full Changelog: https://github.com/microsoft/DeepSpeed/compare/v0.9.3...v0.9.4