Unverified commit 085981bf authored by Conglong Li and committed by GitHub

add deepspeed chat blog links, add tags (#3369)

Parent 3031eec4
@@ -3,5 +3,5 @@ title: "ZeRO & DeepSpeed: New system optimizations enable training models with o
date: 2020-02-13
link: https://www.microsoft.com/en-us/research/blog/ZeRO-deepspeed-new-system-optimizations-enable-training-models-with-over-100-billion-parameters/
excerpt: ""
-tags: training ZeRO
+tags: training ZeRO English
---
@@ -3,5 +3,5 @@ title: "Turing-NLG: A 17-billion-parameter language model by Microsoft"
date: 2020-02-13
link: https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/
excerpt: "DeepSpeed was used to train the world's largest language model."
-tags: training
+tags: training English
---
---
title: "ZeRO stage 1 with reduced communication"
sneak_preview: true
+tags: training ZeRO English
excerpt: "Partition-aware ZeRO with up to 2x reduction in communication time!"
---
......
---
title: "The Fastest and Most Efficient BERT Training through Optimized Transformer Kernels"
excerpt: ""
-tags: training
date: 2020-05-19 00:00:00
toc: false
-tags: training
+tags: training English
---
We introduce new technology to accelerate single GPU performance via kernel
......
@@ -2,6 +2,6 @@
title: "ZeRO-2 & DeepSpeed: Shattering Barriers of Deep Learning Speed & Scale"
excerpt: ""
link: https://www.microsoft.com/en-us/research/blog/ZeRO-2-deepspeed-shattering-barriers-of-deep-learning-speed-scale/
-tags: training ZeRO
+tags: training ZeRO English
date: 2020-05-19 02:00:00
---
---
title: "An Order-of-Magnitude Larger and Faster Training with ZeRO-2"
excerpt: ""
-tags: training ZeRO
+tags: training ZeRO English
date: 2020-05-19 01:00:00
toc: false
---
......
---
title: "Microsoft DeepSpeed achieves the fastest BERT training time"
excerpt: ""
-tags: training
+tags: training English
date: 2020-05-28 00:00:00
---
......
---
title: "DeepSpeed Microsoft Research Webinar on August 6th, 2020"
excerpt: ""
-tags: presentations
+tags: presentations English
link: https://note.microsoft.com/MSR-Webinar-DeepSpeed-Registration-On-Demand.html
image: /assets/images/webinar-aug2020.png
date: 2020-07-24 00:00:00
......
---
title: "DeepSpeed Microsoft Research Webinar is now on-demand"
excerpt: ""
-tags: presentations
+tags: presentations English
link: https://note.microsoft.com/MSR-Webinar-DeepSpeed-Registration-On-Demand.html
date: 2020-08-07 00:00:00
---
---
title: "Powering 10x longer sequences and 6x faster execution through DeepSpeed Sparse Attention"
excerpt: ""
-tags: training
+tags: training English
date: 2020-09-09 00:00:00
toc: false
---
......
@@ -2,7 +2,7 @@
title: "10x bigger model training on a single GPU with ZeRO-Offload"
excerpt: ""
date: 2020-09-09 00:00:00
-tags: training ZeRO
+tags: training ZeRO English
toc: false
---
......
@@ -2,7 +2,7 @@
title: "DeepSpeed with 1-bit Adam: 5x less communication and 3.4x faster training"
excerpt: ""
date: 2020-09-09 00:00:00
-tags: training
+tags: training English
---
## 1. Introduction
......
@@ -2,7 +2,7 @@
title: "Up to 5x less communication and 3.4x faster training through 1-bit Adam"
excerpt: ""
date: 2020-09-09 00:00:00
-tags: training
+tags: training English
toc: false
---
......
@@ -2,7 +2,7 @@
title: "Training a Trillion Parameters with Pipeline Parallelism"
excerpt: ""
date: 2020-09-09 00:00:00
-tags: training
+tags: training English
---
DeepSpeed includes new support for pipeline parallelism! DeepSpeed's training
......
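For context on the pipeline-parallelism support announced in that post, here is a minimal sketch of how a model is typically expressed with `deepspeed.pipe.PipelineModule`; the layer shapes, stage count, and config file name are illustrative assumptions, not values taken from the post.

```python
# Sketch only: wrap a stack of layers as a pipeline and let DeepSpeed schedule
# micro-batches across stages. All sizes and the config path are placeholders.
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule

# Express the network as a flat list of layers so DeepSpeed can cut it into stages.
layers = [nn.Linear(1024, 1024) for _ in range(24)] + [nn.Linear(1024, 10)]
model = PipelineModule(layers=layers, num_stages=4, loss_fn=nn.CrossEntropyLoss())

engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",  # hypothetical DeepSpeed config with batch sizes, fp16, etc.
)

# Each call pulls micro-batches from an iterator and runs the pipeline schedule:
# loss = engine.train_batch(data_iter=train_iter)
```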
@@ -2,7 +2,7 @@
title: "DeepSpeed Sparse Attention"
excerpt: ""
date: 2020-09-09 01:00:00
-tags: training inference
+tags: training inference English
---
Attention-based deep learning models such as the transformers are highly effective in capturing relationship between tokens in an input sequence, even across long distances. As a result, they are used with text, image, and sound-based inputs, where the sequence length can be in thousands of tokens. However, despite the effectiveness of attention modules to capture long term dependencies, in practice, their application to long sequence input is limited by compute and memory requirements of the attention computation that grow quadratically, `O(n^2)`, with the sequence length `n`.
......
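To make the quadratic growth described in that excerpt concrete, here is a small back-of-the-envelope calculation of the attention score matrix alone; the fp16 precision and 16 heads are illustrative assumptions rather than figures from the post.

```python
# Rough arithmetic for the O(n^2) term: the score matrix holds n*n entries per head.
def attention_score_bytes(seq_len: int, heads: int = 16, bytes_per_elem: int = 2) -> int:
    """Memory for one layer's attention scores (assumed fp16, per batch element)."""
    return heads * seq_len * seq_len * bytes_per_elem

for n in (1_024, 4_096, 16_384):
    print(f"n={n:>6}: ~{attention_score_bytes(n) / 2**30:.2f} GiB of attention scores")
# Quadrupling the sequence length multiplies this term by 16, which is why dense
# attention becomes impractical for long inputs and why sparse patterns help.
```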
@@ -2,7 +2,7 @@
title: "Progressive Layer Dropping"
excerpt: ""
date: 2020-10-29 00:00:00
-tags: training
+tags: training English
toc: false
---
......
@@ -2,7 +2,7 @@
title: "DeepSpeed ZeRO-3 Offload"
excerpt: ""
date: 2021-03-08 00:00:00
-tags: training ZeRO
+tags: training ZeRO English
---
Today we are announcing the release of ZeRO-3 Offload, a highly efficient and easy to use implementation of ZeRO Stage 3 and ZeRO Offload combined, geared towards our continued goal of democratizing AI by making efficient large-scale DL training available to everyone. The key benefits of ZeRO-3 Offload are:
......
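As a rough companion to that announcement, the following is a minimal sketch of a DeepSpeed configuration that enables ZeRO Stage 3 with CPU offload; the batch size and fp16 settings are placeholders, and tuned values belong in the ZeRO-3 Offload tutorial rather than here.

```python
# Sketch only: enable ZeRO Stage 3 with parameter and optimizer offload to CPU.
# All numeric values are placeholders.
import deepspeed

ds_config = {
    "train_batch_size": 16,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                              # partition params, grads, and optimizer states
        "offload_optimizer": {"device": "cpu"},  # keep optimizer states in host memory
        "offload_param": {"device": "cpu"},      # fetch params to GPU only when needed
    },
}

# `model` would be a torch.nn.Module defined elsewhere:
# engine, optimizer, _, _ = deepspeed.initialize(
#     model=model, model_parameters=model.parameters(), config=ds_config)
```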
@@ -2,7 +2,7 @@
title: "Mixture-of-Quantization: A novel quantization approach for reducing model size with minimal accuracy impact"
excerpt: ""
date: 2021-05-05 00:00:00
-tags: inference
+tags: inference English
---
## A unified suite for quantization-aware training and inference
......
@@ -2,7 +2,7 @@
title: "DeepSpeed Inference: Multi-GPU inference with customized inference kernels and quantization support"
excerpt: ""
date: 2021-03-16 00:00:00
-tags: inference
+tags: inference English
---
While DeepSpeed supports training advanced large-scale models, using these trained models in the desired application scenarios is still challenging due to three major limitations in existing inference solutions: 1) lack of support for multi-GPU inference to fit large models and meet latency requirements, 2) limited GPU kernel performance when running inference with small batch sizes, and 3) difficulties in exploiting quantization, which includes both quantizing the model to reduce the model size and latency as well as supporting high-performance inference of quantized models without specialized hardware.
......
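For orientation, a minimal sketch of the multi-GPU inference path that post describes, using `deepspeed.init_inference`; the GPT-2 stand-in model, the parallelism degree, and the dtype are illustrative assumptions.

```python
# Sketch only: load a Hugging Face model and hand it to DeepSpeed Inference.
# The model choice and mp_size are placeholders; launch with the deepspeed runner
# so each rank owns one GPU.
import torch
import deepspeed
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")   # stand-in model
engine = deepspeed.init_inference(
    model,
    mp_size=2,                        # split the model across 2 GPUs via tensor slicing
    dtype=torch.half,                 # run in fp16
    replace_with_kernel_inject=True,  # swap in DeepSpeed's fused inference kernels
)
# outputs = engine(input_ids)  # the engine wraps the model's forward pass
```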
@@ -3,5 +3,5 @@ title: "DeepSpeed: Accelerating large-scale model inference and training via sys
date: 2021-05-14
link: https://www.microsoft.com/en-us/research/blog/deepspeed-accelerating-large-scale-model-inference-and-training-via-system-optimizations-and-compression/
excerpt: ""
-tags: inference
+tags: inference English
---
@@ -3,5 +3,5 @@ title: "DeepSpeed powers 8x larger MoE model training with high performance"
excerpt: ""
link: https://www.microsoft.com/en-us/research/blog/deepspeed-powers-8x-larger-moe-model-training-with-high-performance/
date: 2021-08-18 00:00:00
-tags: training
+tags: training English
---
@@ -2,7 +2,7 @@
title: "Autotuning: Automatically discover the optimal DeepSpeed configuration that delivers good training speed"
excerpt: ""
date: 2021-11-16 10:00:00
-tags: training
+tags: training English
toc: false
---
......
@@ -2,7 +2,7 @@
title: "DeepSpeed-MoE for NLG: Reducing the training cost of language models by 5 times"
excerpt: ""
date: 2021-12-09 22:00:00
-tags: training
+tags: training English
---
Autoregressive transformer-based natural language generation (referred to as
......
@@ -3,5 +3,5 @@ title: "DeepSpeed: Advancing MoE inference and training to power next-generation
excerpt: ""
link: https://www.microsoft.com/en-us/research/blog/deepspeed-advancing-moe-inference-and-training-to-power-next-generation-ai-scale/
date: 2022-01-19 00:00:00
-tags: inference
+tags: inference English
---
@@ -3,5 +3,5 @@ title: "Supporting efficient large model training on AMD Instinct GPUs with Deep
excerpt: ""
link: https://cloudblogs.microsoft.com/opensource/2022/03/21/supporting-efficient-large-model-training-on-amd-instinct-gpus-with-deepspeed/
date: 2022-03-21 00:00:00
-tags: training ZeRO
+tags: training ZeRO English
---
@@ -2,7 +2,7 @@
title: "Azure empowers easy-to-use, high-performance, and hyperscale model training using DeepSpeed"
excerpt: ""
date: 2022-07-26 00:09:00
-tags: training azure
+tags: training azure English
---
## Introduction
......
@@ -2,7 +2,7 @@
title: "ZeRO-Inference: Democratizing massive model inference"
excerpt: ""
date: 2022-09-10 00:09:00
-tags: inference ZeRO
+tags: inference ZeRO English
---
## Introduction
......
@@ -2,7 +2,7 @@
title: "DeepSpeed-MII: instant speedup on 24,000+ open-source DL models with up to 40x cheaper inference"
excerpt: ""
date: 2022-10-11 00:09:00
-tags: inference
+tags: inference English
---
[ ![Text Generation Models](/assets/images/mii/hero.png) ](/assets/images/mii/hero.png){: .align-center}
......
@@ -2,7 +2,7 @@
title: "DeepSpeed Data Efficiency: A composable library that makes better use of data, increases training efficiency, and improves model quality"
excerpt: ""
date: 2022-12-12 00:09:00
-tags: training
+tags: training English
---
[ ![DeepSpeed Data Efficiency](/assets/images/data_efficiency/data_efficiecy_fig0.png) ](/assets/images/data_efficiency/data_efficiecy_fig0.png){: .align-center}
......
@@ -2,7 +2,7 @@
title: "Scaling Large-Scale Generative Mixture-of-Expert Multimodal Model With VL-MoE "
excerpt: ""
date: 2023-03-31 00:09:00
-tags: training
+tags: training English
---
The field of Artificial Intelligence-Generated Content (AIGC) is rapidly growing, with the goal of making content creation more efficient and accessible. One of the most exciting areas of AIGC is the development of large-scale multi-modal models like [Flamingo](https://arxiv.org/abs/2204.14198), [BLIP](https://arxiv.org/abs/2301.12597), and [GPT4](https://arxiv.org/abs/2303.08774), which can accept inputs from multiple resources, e.g., image, text, audio, etc., and generate a variety of formats as outputs. For example, image creation can be made through stable diffusion and DALLE using the prompt text, and the new feature in the coming Office can create slides with texts, images, animations, etc., by leveraging the power of the new Microsoft Office Copilot.
......
---
title: "DeepSpeed Chat: 一键式RLHF训练,让你的类ChatGPT千亿大模型提速省钱15倍"
excerpt: ""
link: https://github.com/microsoft/DeepSpeed/blob/master/blogs/deepspeed-chat/chinese/README.md
date: 2023-04-24 00:00:00
tags: training ZeRO RLHF Chinese
---
---
title: "DeepSpeed Chat: ChatGPTライクなモデルを簡単・高速・低コストに、あらゆるスケールで学習"
excerpt: ""
link: https://github.com/microsoft/DeepSpeed/blob/master/blogs/deepspeed-chat/japanese/README.md
date: 2023-04-24 00:00:00
tags: training ZeRO RLHF Japanese
---
---
title: "DeepSpeed Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales"
excerpt: ""
link: https://github.com/microsoft/DeepSpeed/blob/master/blogs/deepspeed-chat/README.md
date: 2023-04-24 00:00:00
tags: training ZeRO RLHF English
---