Commit 6cd5f87b authored by Jeff Rasley

[docs] update moe features and news post

Parent 058ab819
......
@@ -153,7 +153,7 @@ overview](https://www.deepspeed.ai/features/) for descriptions and usage.
* Stable and 2.6x faster GPT-2 pre-training with 8x/4x larger batch size/learning rate while maintaining token-wise convergence speed
* Complementary to many other DeepSpeed features
* [Performance Analysis and Debugging](https://www.deepspeed.ai/features/#performance-analysis-and-debugging)
* [Mixture of Experts (MoE)](https://www.deepspeed.ai/tutorials/mixture-of-experts/)
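
The Mixture of Experts tutorial linked above wraps an ordinary expert module with DeepSpeed's `deepspeed.moe.layer.MoE` layer. Below is a minimal sketch of that pattern, assuming the API described in the tutorial; the expert definition, hidden size, expert count, and top-k value are illustrative choices, and the script is intended to be started with the `deepspeed` launcher (e.g. `deepspeed --num_gpus=1 moe_sketch.py`) so the distributed process group the layer expects is available.

```python
# Sketch of wrapping an expert feed-forward block with DeepSpeed's MoE layer,
# following the linked tutorial. Values below are illustrative assumptions,
# not values taken from this commit.
import torch
import deepspeed
from deepspeed.moe.layer import MoE


class ExpertFFN(torch.nn.Module):
    """A plain feed-forward block used as the per-expert module."""

    def __init__(self, hidden_size):
        super().__init__()
        self.fc1 = torch.nn.Linear(hidden_size, 4 * hidden_size)
        self.fc2 = torch.nn.Linear(4 * hidden_size, hidden_size)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))


def main():
    # Set up the process group the MoE layer expects (launcher provides env vars).
    deepspeed.init_distributed()

    hidden_size = 512
    moe = MoE(hidden_size=hidden_size,
              expert=ExpertFFN(hidden_size),
              num_experts=8,  # illustrative expert count
              k=1)            # top-1 gating

    tokens = torch.randn(4, 16, hidden_size)
    # The forward pass returns the routed output, the auxiliary load-balancing
    # loss, and the per-expert token counts.
    output, aux_loss, exp_counts = moe(tokens)
    print(output.shape, aux_loss.item())


if __name__ == "__main__":
    main()
```
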
# Further Reading
......
---
layout: single
title: "DeepSpeed powers 8x larger MoE model training with high performance"
excerpt: ""
categories: news
link: https://www.microsoft.com/en-us/research/blog/deepspeed-powers-8x-larger-moe-model-training-with-high-performance/
new_post: true
date: 2021-08-18 00:00:00
---
......
@@ -209,7 +209,7 @@ Below we provide a brief feature list, see our detailed [feature overview](https
* Efficient and robust compressed training
* Up to 2.5x convergence speedup for pre-training
* [Performance Analysis and Debugging](https://www.deepspeed.ai/features/#performance-analysis-and-debugging)
* [Mixture of Experts (MoE)](https://www.deepspeed.ai/tutorials/mixture-of-experts/)
# Contributing
DeepSpeed welcomes your contributions! Please see our
......