Unverified commit 41321180, authored by Gavin Goodship, committed by GitHub

Doc corrections (#3435)

* Update 2020-09-09-sparse-attention.md

* Update MoQ-tutorial.md

---------
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Parent f3f4c449
@@ -3,7 +3,7 @@ title: "DeepSpeed Mixture-of-Quantization (MoQ)"
tags: training quantization
---
-DeepSpeed introduces new support for model compression using quantization, called Mixture-of-Quantization (MoQ). MoQ is designed on top of QAT (Quantization-Aware Training), with the difference that it schedules various data precisions across the training process. It starts with quantizing the model with a high precision, such as FP16 or 16-bit quantization, and reduce the precision through a pre-defined schedule until reaching the target quantization bits (like 8-bit). Moreover, we use second-order information of the model parameters to dynamically adjust the quantization schedule for each of layer of the network separately. We have seen that by adding such schedule and using various data precision in the training process, we can quantize the model with better quality and preserve accuracy. For a better understanding of MoQ methodology, please refer to MoQ deep-dive, [here](https://www.deepspeed.ai/2021/05/04/MoQ.html).
+DeepSpeed introduces new support for model compression using quantization, called Mixture-of-Quantization (MoQ). MoQ is designed on top of QAT (Quantization-Aware Training), with the difference that it schedules various data precisions across the training process. It starts with quantizing the model with a high precision, such as FP16 or 16-bit quantization, and reduce the precision through a pre-defined schedule until reaching the target quantization bits (like 8-bit). Moreover, we use second-order information of the model parameters to dynamically adjust the quantization schedule for each layer of the network separately. We have seen that by adding such schedule and using various data precision in the training process, we can quantize the model with better quality and preserve accuracy. For a better understanding of MoQ methodology, please refer to MoQ deep-dive, [here](https://www.deepspeed.ai/2021/05/04/MoQ.html).
Below, we use fine-tune for the GLUE tasks as an illustration of how to use MoQ.
@@ -71,7 +71,7 @@ Before fine-tuning the GLUE tasks using DeepSpeed MoQ, you need:
### DeepSpeed Configuration File
-Prepare a config file `test.json` as below, please note following important parameters for quantization training:
+Prepare a config file `test.json` as below, please note the following important parameters for quantization training:
```
{
...
```
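The body of `test.json` is collapsed in the diff above. For orientation, below is a minimal sketch of the kind of quantization configuration the tutorial describes; the `quantize_training` section and its keys (`quantize_bits`, `quantize_schedule`, `quantize_groups`, plus the surrounding optimizer and FP16 settings) are recalled from the DeepSpeed MoQ documentation rather than taken from this commit, so treat the exact names and values as illustrative assumptions and consult the tutorial itself for the authoritative schema.

```
{
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": 2e-5,
      "weight_decay": 0.0
    }
  },
  "fp16": {
    "enabled": true,
    "initial_scale_power": 16
  },
  "quantize_training": {
    "enabled": true,
    "quantize_verbose": true,
    "quantizer_kernel": true,
    "quantize_bits": {
      "start_bits": 16,
      "target_bits": 8
    },
    "quantize_schedule": {
      "quantize_period": 400,
      "schedule_offset": 0
    },
    "quantize_groups": 8
  }
}
```

In a setup like this, training begins at `start_bits` of precision and the schedule lowers it toward `target_bits` as governed by `quantize_period`, which is the behavior described in the changed paragraph above; the second-order adjustment mentioned there is configured through an additional `eigenvalue` section under `quantize_training` (again, per the MoQ documentation, not this diff).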