Unverified commit 085981bf authored by Conglong Li, committed by GitHub

add deepspeed chat blog links, add tags (#3369)

Parent 3031eec4
@@ -3,5 +3,5 @@ title: "ZeRO & DeepSpeed: New system optimizations enable training models with o
date: 2020-02-13
link: https://www.microsoft.com/en-us/research/blog/ZeRO-deepspeed-new-system-optimizations-enable-training-models-with-over-100-billion-parameters/
excerpt: ""
tags: training ZeRO
tags: training ZeRO English
---
@@ -3,5 +3,5 @@ title: "Turing-NLG: A 17-billion-parameter language model by Microsoft"
date: 2020-02-13
link: https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/
excerpt: "DeepSpeed was used to train the world's largest language model."
tags: training
tags: training English
---
---
title: "ZeRO stage 1 with reduced communication"
sneak_preview: true
tags: training ZeRO English
excerpt: "Partition-aware ZeRO with up to 2x reduction in communication time!"
---
---
title: "The Fastest and Most Efficient BERT Training through Optimized Transformer Kernels"
excerpt: ""
tags: training
date: 2020-05-19 00:00:00
toc: false
tags: training
tags: training English
---
We introduce new technology to accelerate single GPU performance via kernel
@@ -2,6 +2,6 @@
title: "ZeRO-2 & DeepSpeed: Shattering Barriers of Deep Learning Speed & Scale"
excerpt: ""
link: https://www.microsoft.com/en-us/research/blog/ZeRO-2-deepspeed-shattering-barriers-of-deep-learning-speed-scale/
tags: training ZeRO
tags: training ZeRO English
date: 2020-05-19 02:00:00
---
---
title: "An Order-of-Magnitude Larger and Faster Training with ZeRO-2"
excerpt: ""
tags: training ZeRO
tags: training ZeRO English
date: 2020-05-19 01:00:00
toc: false
---
---
title: "Microsoft DeepSpeed achieves the fastest BERT training time"
excerpt: ""
tags: training
tags: training English
date: 2020-05-28 00:00:00
---
---
title: "DeepSpeed Microsoft Research Webinar on August 6th, 2020"
excerpt: ""
tags: presentations
tags: presentations English
link: https://note.microsoft.com/MSR-Webinar-DeepSpeed-Registration-On-Demand.html
image: /assets/images/webinar-aug2020.png
date: 2020-07-24 00:00:00
---
title: "DeepSpeed Microsoft Research Webinar is now on-demand"
excerpt: ""
tags: presentations
tags: presentations English
link: https://note.microsoft.com/MSR-Webinar-DeepSpeed-Registration-On-Demand.html
date: 2020-08-07 00:00:00
---
---
title: "Powering 10x longer sequences and 6x faster execution through DeepSpeed Sparse Attention"
excerpt: ""
tags: training
tags: training English
date: 2020-09-09 00:00:00
toc: false
---
@@ -2,7 +2,7 @@
title: "10x bigger model training on a single GPU with ZeRO-Offload"
excerpt: ""
date: 2020-09-09 00:00:00
tags: training ZeRO
tags: training ZeRO English
toc: false
---
@@ -2,7 +2,7 @@
title: "DeepSpeed with 1-bit Adam: 5x less communication and 3.4x faster training"
excerpt: ""
date: 2020-09-09 00:00:00
tags: training
tags: training English
---
## 1. Introduction
@@ -2,7 +2,7 @@
title: "Up to 5x less communication and 3.4x faster training through 1-bit Adam"
excerpt: ""
date: 2020-09-09 00:00:00
tags: training
tags: training English
toc: false
---
@@ -2,7 +2,7 @@
title: "Training a Trillion Parameters with Pipeline Parallelism"
excerpt: ""
date: 2020-09-09 00:00:00
tags: training
tags: training English
---
DeepSpeed includes new support for pipeline parallelism! DeepSpeed's training
@@ -2,7 +2,7 @@
title: "DeepSpeed Sparse Attention"
excerpt: ""
date: 2020-09-09 01:00:00
tags: training inference
tags: training inference English
---
Attention-based deep learning models such as transformers are highly effective at capturing relationships between tokens in an input sequence, even across long distances. As a result, they are used with text, image, and sound-based inputs, where the sequence length can be in the thousands of tokens. However, despite the effectiveness of attention modules at capturing long-term dependencies, in practice their application to long-sequence inputs is limited by the compute and memory requirements of the attention computation, which grow quadratically, `O(n^2)`, with the sequence length `n`.
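To make the quadratic growth concrete, here is a minimal sketch (not part of the original post; the function name, shapes, and sequence lengths are illustrative only) showing that the dense attention score matrix alone is `n x n` per head, so its memory footprint scales with `n^2`:

```python
import torch

def dense_attention(q, k, v):
    """Plain dense attention; q, k, v have shape [batch, heads, n, head_dim]."""
    scores = q @ k.transpose(-1, -2) / (q.shape[-1] ** 0.5)  # [batch, heads, n, n]
    return torch.softmax(scores, dim=-1) @ v

# fp16 footprint of a single head's score matrix as n grows:
for n in (1_024, 8_192, 65_536):
    print(f"n={n:>6}: {n * n * 2 / 2**30:.3f} GiB per head")
# n=  1024: 0.002 GiB, n=  8192: 0.125 GiB, n= 65536: 8.000 GiB
```

Sparse attention replaces the dense `n x n` score matrix with a structured sparse pattern, which is what breaks this quadratic scaling.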
@@ -2,7 +2,7 @@
title: "Progressive Layer Dropping"
excerpt: ""
date: 2020-10-29 00:00:00
tags: training
tags: training English
toc: false
---
@@ -2,7 +2,7 @@
title: "DeepSpeed ZeRO-3 Offload"
excerpt: ""
date: 2021-03-08 00:00:00
tags: training ZeRO
tags: training ZeRO English
---
Today we are announcing the release of ZeRO-3 Offload, a highly efficient and easy-to-use implementation that combines ZeRO Stage 3 and ZeRO-Offload, geared towards our continued goal of democratizing AI by making efficient large-scale DL training available to everyone. The key benefits of ZeRO-3 Offload are:
@@ -2,7 +2,7 @@
title: "Mixture-of-Quantization: A novel quantization approach for reducing model size with minimal accuracy impact"
excerpt: ""
date: 2021-05-05 00:00:00
tags: inference
tags: inference English
---
## A unified suite for quantization-aware training and inference
@@ -2,7 +2,7 @@
title: "DeepSpeed Inference: Multi-GPU inference with customized inference kernels and quantization support"
excerpt: ""
date: 2021-03-16 00:00:00
tags: inference
tags: inference English
---
While DeepSpeed supports training advanced large-scale models, using these trained models in the desired application scenarios is still challenging due to three major limitations in existing inference solutions: 1) lack of support for multi-GPU inference to fit large models and meet latency requirements, 2) limited GPU kernel performance when running inference with small batch sizes, and 3) difficulties in exploiting quantization, which includes both quantizing the model to reduce its size and latency, and supporting high-performance inference of quantized models without specialized hardware.
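As a rough illustration of how these capabilities are exposed, here is a hypothetical usage sketch (not taken from the post; the model choice and argument names such as `mp_size` are assumptions and may differ across DeepSpeed versions):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

# Shard the model across 2 GPUs, inject DeepSpeed's optimized inference
# kernels, and run with int8 (quantized) weights. Typically launched via
# the deepspeed launcher, e.g. `deepspeed --num_gpus 2 infer.py`.
engine = deepspeed.init_inference(
    model,
    mp_size=2,                        # tensor-parallel degree across GPUs
    dtype=torch.int8,                 # quantized inference
    replace_with_kernel_inject=True,  # swap in DeepSpeed inference kernels
)
# `engine` can then be used like the original model for generation.
```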
@@ -3,5 +3,5 @@ title: "DeepSpeed: Accelerating large-scale model inference and training via sys
date: 2021-05-14
link: https://www.microsoft.com/en-us/research/blog/deepspeed-accelerating-large-scale-model-inference-and-training-via-system-optimizations-and-compression/
excerpt: ""
tags: inference
tags: inference English
---
@@ -3,5 +3,5 @@ title: "DeepSpeed powers 8x larger MoE model training with high performance"
excerpt: ""
link: https://www.microsoft.com/en-us/research/blog/deepspeed-powers-8x-larger-moe-model-training-with-high-performance/
date: 2021-08-18 00:00:00
tags: training
tags: training English
---
@@ -2,7 +2,7 @@
title: "Autotuning: Automatically discover the optimal DeepSpeed configuration that delivers good training speed"
excerpt: ""
date: 2021-11-16 10:00:00
tags: training
tags: training English
toc: false
---
@@ -2,7 +2,7 @@
title: "DeepSpeed-MoE for NLG: Reducing the training cost of language models by 5 times"
excerpt: ""
date: 2021-12-09 22:00:00
tags: training
tags: training English
---
Autoregressive transformer-based natural language generation (referred to as
@@ -3,5 +3,5 @@ title: "DeepSpeed: Advancing MoE inference and training to power next-generation
excerpt: ""
link: https://www.microsoft.com/en-us/research/blog/deepspeed-advancing-moe-inference-and-training-to-power-next-generation-ai-scale/
date: 2022-01-19 00:00:00
tags: inference
tags: inference English
---
@@ -3,5 +3,5 @@ title: "Supporting efficient large model training on AMD Instinct GPUs with Deep
excerpt: ""
link: https://cloudblogs.microsoft.com/opensource/2022/03/21/supporting-efficient-large-model-training-on-amd-instinct-gpus-with-deepspeed/
date: 2022-03-21 00:00:00
tags: training ZeRO
tags: training ZeRO English
---
@@ -2,7 +2,7 @@
title: "Azure empowers easy-to-use, high-performance, and hyperscale model training using DeepSpeed"
excerpt: ""
date: 2022-07-26 00:09:00
tags: training azure
tags: training azure English
---
## Introduction
@@ -2,7 +2,7 @@
title: "ZeRO-Inference: Democratizing massive model inference"
excerpt: ""
date: 2022-09-10 00:09:00
tags: inference ZeRO
tags: inference ZeRO English
---
## Introduction
@@ -2,7 +2,7 @@
title: "DeepSpeed-MII: instant speedup on 24,000+ open-source DL models with up to 40x cheaper inference"
excerpt: ""
date: 2022-10-11 00:09:00
tags: inference
tags: inference English
---
[ ![Text Generation Models](/assets/images/mii/hero.png) ](/assets/images/mii/hero.png){: .align-center}
@@ -2,7 +2,7 @@
title: "DeepSpeed Data Efficiency: A composable library that makes better use of data, increases training efficiency, and improves model quality"
excerpt: ""
date: 2022-12-12 00:09:00
tags: training
tags: training English
---
[ ![DeepSpeed Data Efficiency](/assets/images/data_efficiency/data_efficiecy_fig0.png) ](/assets/images/data_efficiency/data_efficiecy_fig0.png){: .align-center}
@@ -2,7 +2,7 @@
title: "Scaling Large-Scale Generative Mixture-of-Expert Multimodal Model With VL-MoE "
excerpt: ""
date: 2023-03-31 00:09:00
tags: training
tags: training English
---
The field of Artificial Intelligence-Generated Content (AIGC) is rapidly growing, with the goal of making content creation more efficient and accessible. One of the most exciting areas of AIGC is the development of large-scale multi-modal models like [Flamingo](https://arxiv.org/abs/2204.14198), [BLIP](https://arxiv.org/abs/2301.12597), and [GPT4](https://arxiv.org/abs/2303.08774), which can accept inputs from multiple modalities, e.g., image, text, and audio, and generate outputs in a variety of formats. For example, images can be created from prompt text with Stable Diffusion and DALL-E, and the upcoming Microsoft Office Copilot can create slides with text, images, animations, and more.
---
title: "DeepSpeed Chat: 一键式RLHF训练,让你的类ChatGPT千亿大模型提速省钱15倍"
excerpt: ""
link: https://github.com/microsoft/DeepSpeed/blob/master/blogs/deepspeed-chat/chinese/README.md
date: 2023-04-24 00:00:00
tags: training ZeRO RLHF Chinese
---
---
title: "DeepSpeed Chat: ChatGPTライクなモデルを簡単・高速・低コストに、あらゆるスケールで学習"
excerpt: ""
link: https://github.com/microsoft/DeepSpeed/blob/master/blogs/deepspeed-chat/japanese/README.md
date: 2023-04-24 00:00:00
tags: training ZeRO RLHF Japanese
---
---
title: "DeepSpeed Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales"
excerpt: ""
link: https://github.com/microsoft/DeepSpeed/blob/master/blogs/deepspeed-chat/README.md
date: 2023-04-24 00:00:00
tags: training ZeRO RLHF English
---