DeepSpeed 0.1.0 Release Notes

Features

  • Distributed Training with Mixed Precision
    • 16-bit mixed precision
    • Single-GPU/Multi-GPU/Multi-Node
  • Model Parallelism
    • Support for Custom Model Parallelism
    • Integration with Megatron-LM
  • Memory and Bandwidth Optimizations
    • Zero Redundancy Optimizer (ZeRO) stage 1 with all-reduce
    • Constant Buffer Optimization (CBO)
    • Smart Gradient Accumulation
  • Training Features
    • Simplified training API (see the training-loop sketch after this list)
    • Gradient Clipping
    • Automatic loss scaling with mixed precision (configured in the JSON sketch after this list)
  • Training Optimizers
    • Fused Adam optimizer and arbitrary torch.optim.Optimizer
    • Memory-bandwidth-optimized FP16 Optimizer
    • Large Batch Training with LAMB Optimizer
    • Memory-efficient Training with ZeRO Optimizer
  • Training-Agnostic Checkpointing (see the checkpointing sketch below)
  • Advanced Parameter Search (a scheduler config sketch follows this list)
    • Learning Rate Range Test
    • 1Cycle Learning Rate Schedule
  • Simplified Data Loader
  • Performance Analysis and Debugging
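
Most of the features above are switched on declaratively from a single JSON configuration file rather than from code. A minimal sketch, assuming the 0.1.0-era config schema (in early releases `zero_optimization` is a boolean toggle; later releases use an object with a `stage` field); all values are illustrative:

```json
{
  "train_batch_size": 2048,
  "gradient_accumulation_steps": 2,
  "optimizer": {
    "type": "Lamb",
    "params": {
      "lr": 0.002,
      "weight_decay": 0.01
    }
  },
  "fp16": {
    "enabled": true,
    "loss_scale": 0
  },
  "gradient_clipping": 1.0,
  "zero_optimization": true
}
```

Here `"loss_scale": 0` requests dynamic (automatic) loss scaling, `"gradient_clipping"` bounds the global gradient norm, and the `Lamb` optimizer type selects the large-batch LAMB implementation.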
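
The simplified training API wraps a plain PyTorch model with `deepspeed.initialize`, which returns an engine that owns the optimizer, fp16 loss scaling, gradient clipping, and (when a dataset is passed) a distributed data loader. A minimal sketch; the model, dataset, and loss here are placeholders:

```python
# train.py
import argparse
import torch
import torch.nn.functional as F
import deepspeed

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=-1)
parser = deepspeed.add_config_arguments(parser)  # adds --deepspeed_config etc.
args = parser.parse_args()

model = torch.nn.Linear(1024, 2)                 # placeholder model
dataset = torch.utils.data.TensorDataset(        # placeholder data
    torch.randn(512, 1024), torch.randint(0, 2, (512,)))

# The engine applies distributed setup, mixed precision, and ZeRO
# according to the JSON config named on the command line.
model_engine, optimizer, trainloader, _ = deepspeed.initialize(
    args=args,
    model=model,
    model_parameters=model.parameters(),
    training_data=dataset,
)

for inputs, labels in trainloader:
    inputs = inputs.to(model_engine.device)
    labels = labels.to(model_engine.device)
    if model_engine.fp16_enabled():
        inputs = inputs.half()        # inputs must match the fp16 weights
    loss = F.cross_entropy(model_engine(inputs), labels)
    model_engine.backward(loss)       # scales the loss under fp16
    model_engine.step()               # clips, steps optimizer and schedule
```

Such a script is started through the launcher, e.g. `deepspeed train.py --deepspeed --deepspeed_config ds_config.json`, which spawns one process per GPU and fills in `--local_rank`.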
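
Checkpointing is training-agnostic in the sense that it goes through the engine rather than raw `torch.save`, so model weights, optimizer state, and the fp16 loss-scaler state are saved and restored together. A minimal sketch, assuming a `model_engine` returned by `deepspeed.initialize`; the directory, tag, and `client_state` contents are placeholders:

```python
# Every rank calls save_checkpoint; the engine coordinates the shards.
model_engine.save_checkpoint("checkpoints", tag="step1000",
                             client_state={"step": 1000})

# Later, after deepspeed.initialize in a fresh process:
load_path, client_state = model_engine.load_checkpoint("checkpoints",
                                                       tag="step1000")
resume_step = client_state["step"]   # user bookkeeping travels alongside
```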
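
The learning rate range test and the 1Cycle schedule are likewise driven from the `scheduler` block of the same JSON config. A sketch assuming the documented scheduler names and parameter keys, with illustrative values; changing `"type"` to `"OneCycle"` (with `cycle_min_lr`, `cycle_max_lr`, and step-size parameters) selects the 1Cycle schedule instead:

```json
{
  "scheduler": {
    "type": "LRRangeTest",
    "params": {
      "lr_range_test_min_lr": 0.00001,
      "lr_range_test_step_size": 200,
      "lr_range_test_step_rate": 5,
      "lr_range_test_staircase": false
    }
  }
}
```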
