未验证 提交 508fcbb4 编写于 作者: Z zzjjay 提交者: GitHub

fix up wrong link (#762)

上级 c46067f4
......@@ -82,19 +82,19 @@ pip install paddleslim==1.2.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
## Usage
- [QuickStart](https://paddlepaddle.github.io/PaddleSlim/quick_start/index_en.html): Introduce how to use PaddleSlim by simple examples.
- [QuickStart](https://paddleslim.readthedocs.io/en/latest/quick_start/index_en.html): Introduce how to use PaddleSlim by simple examples.
- Dynamic graph
- Pruning: [Tutorial](dygraph_docs/), [Demo](demo/dygraph/pruning)
- Pruning: [Tutorial](https://paddleslim.readthedocs.io/en/latest/tutorials/image_classification_sensitivity_analysis_tutorial_en.html), [Demo](demo/dygraph/pruning)
- Quantization: [Demo](demo/dygraph/quant)
- [Advanced Tutorials](https://paddlepaddle.github.io/PaddleSlim/tutorials/index_en.html):Tutorials about advanced usage of PaddleSlim.
- [Advanced Tutorials](https://paddleslim.readthedocs.io/en/latest/tutorials/index_en.html):Tutorials about advanced usage of PaddleSlim.
- [Model Zoo](https://paddlepaddle.github.io/PaddleSlim/model_zoo_en.html):Benchmark and pretrained models.
- [Model Zoo](https://paddleslim.readthedocs.io/en/latest/model_zoo_en.html):Benchmark and pretrained models.
- [API Documents](https://paddlepaddle.github.io/PaddleSlim/api_en/index_en.html)
- [API Documents](https://paddleslim.readthedocs.io/en/latest/api_en/index_en.html)
- [Algorithm Background](https://paddlepaddle.github.io/PaddleSlim/algo/algo.html): Introduce the background of quantization, pruning, distillation, NAS.
- [Algorithm Background](https://paddleslim.readthedocs.io/en/latest/intro_en.html): Introduce the background of quantization, pruning, distillation, NAS.
- [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection/tree/master/slim): Introduce how to use PaddleSlim in PaddleDetection library.
......
......@@ -4,7 +4,7 @@
## 接口介绍
请参考 [知识蒸馏API文档](https://paddlepaddle.github.io/PaddleSlim/api/single_distiller_api/)
请参考 [知识蒸馏API文档](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/dist/single_distiller_api.html)
### 1. 蒸馏训练配置
......@@ -37,4 +37,4 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 python distill.py
经过120轮的蒸馏训练,MobileNet模型的Top-1/Top-5准确率达到72.77%/90.68%, Top-1/Top-5性能提升+1.78%/+1.00%
详细实验数据请参见[PaddleSlim模型库蒸馏部分](https://paddlepaddle.github.io/PaddleSlim/model_zoo/#13)
详细实验数据请参见[PaddleSlim模型库蒸馏部分](https://paddleslim.readthedocs.io/zh_CN/latest/model_zoo.html#id5)
......@@ -38,15 +38,15 @@ import numpy as np
#### 2.1 量化训练
量化训练流程可以参考 [分类模型的量化训练流程](https://paddlepaddle.github.io/PaddleSlim/tutorials/quant_aware_demo/)
量化训练流程可以参考 [分类模型的量化训练流程](https://paddleslim.readthedocs.io/zh_CN/latest/tutorials/quant/static/quant_aware_tutorial.html)
**量化训练过程中config参数:**
- **quantize_op_types:** 目前CPU上量化支持的算子为 `depthwise_conv2d`, `conv2d`, `mul`, `matmul`, `transpose2`, `reshape2`, `pool2d`, `scale`, `concat`。但是在量化训练阶段插入fake_quantize/fake_dequantize算子时,只需在前四种op前后插入fake_quantize/fake_dequantize 算子,因为后面四种算子 `transpose2`, `reshape2`, `pool2d`, `scale`, `concat`的scales将从其他op的`out_threshold`属性获取。所以,在使用PaddleSlim量化训练时,只可以对 `depthwise_conv2d`, `conv2d`, `mul`, `matmul`进行量化,不支持其他op。
- **其他参数:** 请参考 [PaddleSlim quant_aware API](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/#quant_aware)
- **其他参数:** 请参考 [PaddleSlim quant_aware API](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/quant/quantization_api.html#quant-aware)
#### 2.2 离线量化
离线量化模型产出可以参考[分类模型的静态离线量化流程](https://paddlepaddle.github.io/PaddleSlim/tutorials/quant_post_demo/#_1)
离线量化模型产出可以参考[分类模型的静态离线量化流程](https://paddleslim.readthedocs.io/zh_CN/latest/tutorials/quant/static/quant_post_tutorial.html)
在使用PaddleSlim离线量化时,只可以对 `depthwise_conv2d`, `conv2d`, `mul`, `matmul`进行量化,不支持其他op。
......
......@@ -13,7 +13,7 @@ The process comprises the following steps:
#### Install PaddleSlim
For PaddleSlim installation, please see [Paddle Installation Document](https://paddlepaddle.github.io/PaddleSlim/install.html)
For PaddleSlim installation, please see [Paddle Installation Document](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/quant/quantization_api.html#quant-aware)
```
git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd PaddleSlim
......@@ -34,15 +34,15 @@ One can generate fake-quantized model with post-training or quant-aware strategy
#### 2.1 Quant-aware training
To generate fake quantized model with quant-aware strategy, see [Quant-aware training tutorial](https://paddlepaddle.github.io/PaddleSlim/tutorials/quant_aware_demo/)
To generate fake quantized model with quant-aware strategy, see [Quant-aware training tutorial](https://paddleslim.readthedocs.io/en/latest/quick_start/quant_aware_tutorial_en.html)
**The parameters during quant-aware training:**
- **quantize_op_types:** A list of operators to insert `fake_quantize` and `fake_dequantize` ops around them. In PaddlePaddle, quantization of following operators is supported for CPU: `depthwise_conv2d`, `conv2d`, `fc`, `matmul`, `transpose2`, `reshape2`, `pool2d`, `scale`, `concat`. However, inserting fake_quantize/fake_dequantize operators during training is needed only for the first four of them (`depthwise_conv2d`, `conv2d`, `fc`, `matmul`), so setting the `quantize_op_types` parameter to the list of those four ops is enough. Scala data needed for quantization of the other five operators is reused from the fake ops or gathered from the `out_threshold` attributes of the operators.
- **Other parameters:** Please read [PaddleSlim quant_aware API](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/#quant_aware)
- **Other parameters:** Please read [PaddleSlim quant_aware API](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/quant/quantization_api.html#quant-aware)
#### 2.2 Post-training quantization
To generate post-training fake quantized model, see [Offline post-training quantization tutorial](https://paddlepaddle.github.io/PaddleSlim/tutorials/quant_post_demo/#_1)
To generate post-training fake quantized model, see [Offline post-training quantization tutorial](https://paddleslim.readthedocs.io/en/latest/quick_start/index_en.html)
## 3. Convert the fake quantized model to DNNL INT8 model
In order to deploy an INT8 model on the CPU, we need to collect scales, remove all fake_quantize/fake_dequantize operators, optimize the graph and quantize it, turning it into the final DNNL INT8 model. This is done by the script [save_quant_model.py](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/contrib/slim/tests/save_quant_model.py). Copy the script to the directory where the demo is located: `/PATH_TO_PaddleSlim/demo/mkldnn_quant/` and run it as follows:
......@@ -55,7 +55,7 @@ python save_quant_model.py --quant_model_path=/PATH/TO/SAVE/FLOAT32/quant/MODEL
- **int8_model_save_path:** The final INT8 model output path after the quant model is optimized and quantized by DNNL.
- **ops_to_quantize:** A comma separated list of specified op types to be quantized. It is optional. If the option is skipped, all quantizable operators will be quantized. Skipping the option is recommended in the first approach as it usually yields best performance and accuracy for image classification models and NLP models listed in the Benchmark..
- **--op_ids_to_skip:** "A comma-separated list of operator ID numbers. It is optional. Default value is none. The op ids in this list will not be quantized and will adopt FP32 type. To get the ID of a specific op, first run the script using the `--debug` option, and open the generated file `int8_<number>_cpu_quantize_placement_pass.dot` to find the op that does not need to be quantified, and the ID number is in parentheses after the Op name.
- **--debug:** Generate models graph or not. If this option is present, .dot files with graphs of the model will be generated after each optimization step that modifies the graph. For the description of DOT format, please read [DOT](https://graphviz.gitlab.io/_pages/doc/info/lang.html). To open the `*.dot` file, please use any Graphviz tool available on the system(such as the `xdot` tool on Linux or the `dot` tool on Windows. For Graphviz documentation, see [Graphviz](http://www. graphviz.org/documentation/).
- **--debug:** Generate models graph or not. If this option is present, .dot files with graphs of the model will be generated after each optimization step that modifies the graph. For the description of DOT format, please read [DOT](https://graphviz.gitlab.io/_pages/doc/info/lang.html). To open the `*.dot` file, please use any Graphviz tool available on the system(such as the `xdot` tool on Linux or the `dot` tool on Windows. For Graphviz documentation, see [Graphviz](http://www.graphviz.org/documentation/).
  
- **Note:**
- The DNNL supported quantizable ops are `conv2d`, `depthwise_conv2d`, `fc`, `matmul`, `pool2d`, `reshape2`, `transpose2`, `scale`, `concat`.
......
[English](README_en.md) | 简体中文
English | 简体中文
# SlimOCR模型库
......
# OFA压缩PaddleNLP-BERT模型
BERT-base模型是一个迁移能力很强的通用语义表示模型,但是模型中也有一些参数冗余。本教程将介绍如何使用PaddleSlim对[PaddleNLP](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/)中BERT-base模型进行压缩。
BERT-base模型是一个迁移能力很强的通用语义表示模型,但是模型中也有一些参数冗余。本教程将介绍如何使用PaddleSlim对[PaddleNLP](https://paddlenlp.readthedocs.io/zh/latest/)中BERT-base模型进行压缩。
本教程只会演示如何快速启动相应训练,详细教程请参考: [BERT](https://github.com/PaddlePaddle/PaddleSlim/blob/release/2.0.0/docs/zh_cn/nlp/paddlenlp_slim_ofa_tutorial.md)
## 1. 压缩结果
......@@ -79,8 +79,8 @@ BERT-base模型是一个迁移能力很强的通用语义表示模型,但是
<span style="font-size:18px">14.93</span>
</td>
</tr>
<tr>
<td rowspan=4 align=center> 40 </td>
<tr>
<td rowspan=4 align=center> 40 </td>
<td rowspan=2 align=center> BERT </td>
<td style="text-align:center">
<span style="font-size:18px">N</span>
......@@ -185,7 +185,7 @@ pip install paddlepaddle_gpu>=2.0rc1
```
### 2.2 Fine-tuing
首先需要对Pretrain-Model在实际的下游任务上进行Fine-tuning,得到需要压缩的模型。Fine-tuning流程参考[Fine-tuning教程](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/examples/bert)
首先需要对Pretrain-Model在实际的下游任务上进行Fine-tuning,得到需要压缩的模型。Fine-tuning流程参考[Fine-tuning教程](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/language_model/bert)
Fine-tuning 在dev上的结果如压缩结果表1-1『Baseline』那一列所示。
### 2.3 压缩训练
......@@ -268,10 +268,10 @@ python -u ./run_glue_ofa.py --model_type bert \
python3.7 -u ./export_model.py --model_type bert \
--model_name_or_path ${PATH_OF_QQP_MODEL_AFTER_OFA} \
--max_seq_length 128 \
--sub_model_output_dir ./tmp/$TASK_NAME/dynamic_model \
--sub_model_output_dir ./tmp/$TASK_NAME/dynamic_model \
--static_sub_model ./tmp/$TASK_NAME/static_model \
--n_gpu 1 \
--width_mult 0.6666666666666666
--n_gpu 1 \
--width_mult 0.6666666666666666
```
其中参数释义如下:
......
......@@ -11,7 +11,7 @@ BiGRU is to train a BiGRU based LAC model from scratch; BERT fine-tuned is to fi
## Introduction
Lexical Analysis of Chinese, or LAC for short, is a lexical analysis model that completes the tasks of Chinese word segmentation, part-of-speech tagging, and named entity recognition in a single model. We conduct an overall evaluation of word segmentation, part-of-speech tagging, and named entity recognition on a self-built dataset. We use the finetuned [ERNIE](https://github.com/PaddlePaddle/LARK/tree/develop/ERNIE) model as the Teacher model and GRU as the Student model, which are needed by the Pantheon framework for online distillation.
Lexical Analysis of Chinese, or LAC for short, is a lexical analysis model that completes the tasks of Chinese word segmentation, part-of-speech tagging, and named entity recognition in a single model. We conduct an overall evaluation of word segmentation, part-of-speech tagging, and named entity recognition on a self-built dataset. We use the finetuned ERNIE model as the Teacher model and GRU as the Student model, which are needed by the Pantheon framework for online distillation.
#### 1. Download the training data set
......@@ -37,4 +37,4 @@ bash run_teacher.sh
bash run_student.sh
```
> If you want to learn more about LAC, you can refer to this repo: https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/lexical_analysis
\ No newline at end of file
> If you want to learn more about LAC, you can refer to this repo: https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/lexical_analysis
......@@ -12,7 +12,7 @@ BiGRU 是使用双向GRU网络从头训练LAC任务;BERT fine-tuned 是在BERT
## 简介
Lexical Analysis of Chinese,简称 LAC,是一个联合的词法分析模型,在单个模型中完成中文分词、词性标注、专名识别任务。我们在自建的数据集上对分词、词性标注、专名识别进行整体的评估效果。我们使用经过finetune的 [ERNIE](https://github.com/PaddlePaddle/LARK/tree/develop/ERNIE) 模型作为Teacher模型,使用GRU作为Student模型,使用Pantheon框架进行在线蒸馏。
Lexical Analysis of Chinese,简称 LAC,是一个联合的词法分析模型,在单个模型中完成中文分词、词性标注、专名识别任务。我们在自建的数据集上对分词、词性标注、专名识别进行整体的评估效果。我们使用经过finetune的 ERNIE 模型作为Teacher模型,使用GRU作为Student模型,使用Pantheon框架进行在线蒸馏。
#### 1. 下载训练数据集
......
......@@ -69,7 +69,7 @@ python eval.py \
## 5. 接口介绍
该示例使用了`paddleslim.Pruner`工具类,用户接口使用介绍请参考:[API文档](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/)
该示例使用了`paddleslim.Pruner`工具类,用户接口使用介绍请参考:[API文档](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/index.html)
在调用`paddleslim.Pruner`工具类时,需要指定待裁卷积层的参数名称。不同模型的参数命名不同,
`train.py`脚本中,提供了`get_pruned_params`方法,根据用户设置的选项`--model`确定要裁剪的参数。
......@@ -79,7 +79,7 @@ LIB_ROOT/
### 2.1 将模型导出为inference model
* 可以参考[量化训练教程](https://paddleslim.readthedocs.io/zh_CN/latest/quick_start/quant_aware_tutorial.html#id9),在训练完成后导出inference model。
* 可以参考[量化训练教程](https://paddleslim.readthedocs.io/zh_CN/latest/tutorials/quant/index.html),在训练完成后导出inference model。
```
inference/
......
......@@ -134,7 +134,7 @@ Dataset:WIDER-FACE
| BlazeFace | quant_post | 8 | 640 | 87.8/85.1/74.9 (-3.7/-4.1/-4.8) | 228 | [model](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_origin_quant_post.tar) |
| BlazeFace | quant_aware | 8 | 640 | 90.5/87.9/77.6 (-1.0/-1.3/-2.1) | 228 | [model](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_origin_quant_aware.tar) |
| BlazeFace-Lite | - | 8 | 640 | 90.9/88.5/78.1 | 711 | [model](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_lite.tar) |
| BlazeFace-Lite | quant_post | 8 | 640 | 89.4/86.7/75.7 (-1.5/-1.8/-2.4) | 211 | [model]((https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_lite_quant_post.tar)) |
| BlazeFace-Lite | quant_post | 8 | 640 | 89.4/86.7/75.7 (-1.5/-1.8/-2.4) | 211 | [model](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_lite_quant_post.tar) |
| BlazeFace-Lite | quant_aware | 8 | 640 | 89.7/87.3/77.0 (-1.2/-1.2/-1.1) | 211 | [model](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_lite_quant_aware.tar) |
| BlazeFace-NAS | - | 8 | 640 | 83.7/80.7/65.8 | 244 | [model](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_nas.tar) |
| BlazeFace-NAS | quant_post | 8 | 640 | 81.6/78.3/63.6 (-2.1/-2.4/-2.2) | 71 | [model](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_nas_quant_post.tar) |
......
# Nerual Architecture Search for Image Classification
This tutorial shows how to use [API](../api/nas_api.md) about SANAS in PaddleSlim. We start experiment based on MobileNetV2 as example. The tutorial contains follow section.
This tutorial shows how to use [API](https://paddleslim.readthedocs.io/en/latest/api_en/paddleslim.nas.html) about SANAS in PaddleSlim. We start experiment based on MobileNetV2 as example. The tutorial contains follow section.
1. necessary imports
2. initial SANAS instance
......
# Training-aware Quantization of image classification model - quick start
This tutorial shows how to do training-aware quantization using [API](https://paddlepaddle.github.io/PaddleSlim/api_en/paddleslim.quant.html#paddleslim.quant.quanter.quant_aware) in PaddleSlim. We use MobileNetV1 to train image classification model as example. The tutorial contains follow sections:
This tutorial shows how to do training-aware quantization using [API](https://paddleslim.readthedocs.io/en/latest/api_en/index_en.html) in PaddleSlim. We use MobileNetV1 to train image classification model as example. The tutorial contains follow sections:
1. Necessary imports
2. Model architecture
......@@ -90,7 +90,7 @@ test(val_program)
## 4. Quantization
We call ``quant_aware`` API to add quantization and dequantization operators in ``train_program`` and ``val_program`` according to [default configuration](https://paddlepaddle.github.io/PaddleSlim/api_cn/quantization_api.html#id2).
We call ``quant_aware`` API to add quantization and dequantization operators in ``train_program`` and ``val_program`` according to [default configuration](https://paddleslim.readthedocs.io/en/latest/api_en/paddleslim.quant.html).
```python
quant_program = slim.quant.quant_aware(train_program, exe.place, for_test=False)
......@@ -115,7 +115,7 @@ test(val_quant_program)
## 6. Save model after quantization
The model in ``4. Quantization`` after calling ``slim.quant.quant_aware`` API is only suitable to train. To get the inference model, we should use [slim.quant.convert](https://paddlepaddle.github.io/PaddleSlim/api_en/paddleslim.quant.html#paddleslim.quant.quanter.convert) API to change model architecture and use [fluid.io.save_inference_model](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/io_cn/save_inference_model_cn.html#save-inference-model) to save model. ``float_prog``'s parameters are float32 dtype but in int8's range which can be used in ``fluid`` or ``paddle-lite``. ``paddle-lite`` will change the parameters' dtype from float32 to int8 first when loading the inference model. ``int8_prog``'s parameters are int8 dtype and we can get model size after quantization by saving it. ``int8_prog`` cannot be used in ``fluid`` or ``paddle-lite``.
The model in ``4. Quantization`` after calling ``slim.quant.quant_aware`` API is only suitable to train. To get the inference model, we should use [slim.quant.convert](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/quant/quantization_api.html#convert) API to change model architecture and use [fluid.io.save_inference_model](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/io_cn/save_inference_model_cn.html#save-inference-model) to save model. ``float_prog``'s parameters are float32 dtype but in int8's range which can be used in ``fluid`` or ``paddle-lite``. ``paddle-lite`` will change the parameters' dtype from float32 to int8 first when loading the inference model. ``int8_prog``'s parameters are int8 dtype and we can get model size after quantization by saving it. ``int8_prog`` cannot be used in ``fluid`` or ``paddle-lite``.
```python
......
# Post-training Quantization of image classification model - quick start
This tutorial shows how to do post training quantization using [API](https://paddlepaddle.github.io/PaddleSlim/api_en/paddleslim.quant.html#paddleslim.quant.quanter.quant_post) in PaddleSlim. We use MobileNetV1 to train image classification model as example. The tutorial contains follow sections:
This tutorial shows how to do post training quantization using [API](https://paddleslim.readthedocs.io/en/latest/api_en/index_en.html) in PaddleSlim. We use MobileNetV1 to train image classification model as example. The tutorial contains follow sections:
1. Necessary imports
2. Model architecture
......
# Pruning of image classification model - sensitivity
In this tutorial, you will learn how to use [sensitivity API of PaddleSlim](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/#sensitivity) by a demo of MobileNetV1 model on MNIST dataset。
In this tutorial, you will learn how to use [sensitivity API of PaddleSlim](https://paddleslim.readthedocs.io/en/latest/api_en/index_en.html) by a demo of MobileNetV1 model on MNIST dataset。
This tutorial following workflow:
1. Import dependency
......@@ -107,7 +107,7 @@ params = params[:5]
### 7.1 Compute in single process
Apply sensitivity analysis on pretrained model by calling [sensitivity API](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/#sensitivity).
Apply sensitivity analysis on pretrained model by calling [sensitivity API](https://paddleslim.readthedocs.io/en/latest/api_en/index_en.html).
The sensitivities will be appended into the file given by option `sensitivities_file` during computing.
The information in this file won`t be computed repeatedly.
......@@ -197,7 +197,7 @@ Pruning model according to the sensitivities generated in section 7.3.3.
### 8.1 Get pruning ratios
Get a group of ratios by calling [get_ratios_by_loss](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/#get_ratios_by_loss) fuction:
Get a group of ratios by calling [get_ratios_by_loss](https://paddleslim.readthedocs.io/en/latest/api_en/index_en.html) fuction:
```python
......@@ -223,7 +223,7 @@ print("FLOPs after pruning: {}".format(slim.analysis.flops(pruned_program)))
### 8.3 Pruning test network
Note:The `only_graph` should be set to True while pruning test network. [Pruner API](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/#pruner)
Note:The `only_graph` should be set to True while pruning test network. [Pruner API](https://paddleslim.readthedocs.io/en/latest/api_en/index_en.html)
```python
......
......@@ -18,9 +18,9 @@
- 如果量化模型在ARM上线,则需要使用[Paddle-Lite](https://paddle-lite.readthedocs.io/zh/latest/index.html).
- Paddle-Lite会对量化模型进行模型转化和优化,转化方法见[链接](https://paddle-lite.readthedocs.io/zh/latest/user_guides/model_quantization.html#paddle-lite)。
- Paddle-Lite会对量化模型进行模型转化和优化,转化方法见[链接](https://paddle-lite.readthedocs.io/zh/latest/index.html#sec-user-guides)。
- 转化之后可以像非量化模型一样使用[Paddle-Lite API](https://paddle-lite.readthedocs.io/zh/latest/user_guides/tutorial.html#lite)进行加载预测。
- 转化之后可以像非量化模型一样使用[Paddle-Lite API](https://paddle-lite.readthedocs.io/zh/latest/index.html)进行加载预测。
- 如果量化模型在GPU上线,则需要使用[Paddle-TensorRT 预测接口](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_guide/performance_improving/inference_improving/paddle_tensorrt_infer.html).
......@@ -35,7 +35,7 @@ config->EnableTensorRtEngine(1 << 20 /* workspace_size*/,
false /* use_calib_mode*/);
```
- 如果量化模型在x86上线,需要使用[INT8 MKL-DNN](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/contrib/slim/tests/slim_int8_mkldnn_post_training_quantization.md)
- 如果量化模型在x86上线,需要使用[INT8 MKL-DNN](https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/fluid/contrib/slim/tests)
- 首先对模型进行转化,可以参考[脚本](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/contrib/slim/tests/save_quant_model.py)
......
......@@ -16,9 +16,9 @@
| MobileNetV1 | ImageNet | post | 608 | 27.9 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_mobilenetv1_coco_quant_post.tar) |
| MobileNetV1 | ImageNet | post | 416 | 28.0 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_mobilenetv1_coco_quant_post.tar) |
| MobileNetV1 | ImageNet | post | 320 | 26.0 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_mobilenetv1_coco_quant_post.tar) |
| MobileNetV1 | ImageNet | aware | 608 | 28.1 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_mobilenetv1_coco_quant_aware.tar) |
| MobileNetV1 | ImageNet | aware | 416 | 28.2 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_mobilenetv1_coco_quant_aware.tar) |
| MobileNetV1 | ImageNet | aware | 320 | 25.8 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_mobilenetv1_coco_quant_aware.tar) |
| MobileNetV1 | ImageNet | aware | 608 | 28.1 | 下载链接 |
| MobileNetV1 | ImageNet | aware | 416 | 28.2 | 下载链接 |
| MobileNetV1 | ImageNet | aware | 320 | 25.8 | 下载链接 |
| ResNet34 | ImageNet | post | 608 | 35.7 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_r34_coco_quant_post.tar) |
| ResNet34 | ImageNet | aware | 608 | 35.2 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_r34_coco_quant_aware.tar) |
| ResNet34 | ImageNet | aware | 416 | 33.3 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_r34_coco_quant_aware.tar) |
......
../../../demo/quant/deploy/TensorRT/README.md
\ No newline at end of file
../../../demo/quant/deploy/TensorRT/README.md
......@@ -142,7 +142,7 @@ Note: MobileNetV2_NAS 的token是:[4, 4, 5, 1, 1, 2, 1, 1, 0, 2, 6,
| BlazeFace | quant_post | 8 | 640 | 87.8/85.1/74.9 (-3.7/-4.1/-4.8) | 228 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_origin_quant_post.tar) |
| BlazeFace | quant_aware | 8 | 640 | 90.5/87.9/77.6 (-1.0/-1.3/-2.1) | 228 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_origin_quant_aware.tar) |
| BlazeFace-Lite | - | 8 | 640 | 90.9/88.5/78.1 | 711 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_lite.tar) |
| BlazeFace-Lite | quant_post | 8 | 640 | 89.4/86.7/75.7 (-1.5/-1.8/-2.4) | 211 | [下载链接]((https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_lite_quant_post.tar)) |
| BlazeFace-Lite | quant_post | 8 | 640 | 89.4/86.7/75.7 (-1.5/-1.8/-2.4) | 211 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_lite_quant_post.tar) |
| BlazeFace-Lite | quant_aware | 8 | 640 | 89.7/87.3/77.0 (-1.2/-1.2/-1.1) | 211 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_lite_quant_aware.tar) |
| BlazeFace-NAS | - | 8 | 640 | 83.7/80.7/65.8 | 244 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_nas.tar) |
| BlazeFace-NAS | quant_post | 8 | 640 | 81.6/78.3/63.6 (-2.1/-2.4/-2.2) | 71 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_nas_quant_post.tar) |
......
......@@ -116,7 +116,7 @@ quanter.save_quantized_model(
导出的量化模型相比原始FP32模型,模型体积没有明显差别,这是因为量化预测模型中的权重依旧保存为FP32类型。在部署时,使用PaddleLite opt工具转换量化预测模型后,模型体积才会真实减小。
部署参考文档:
* 部署[文档](../../deploy/index.html)
* 部署[文档](https://paddleslim.readthedocs.io/zh_CN/latest/deploy/index.html)
* PaddleLite部署量化模型[文档](https://paddle-lite.readthedocs.io/zh/latest/user_guides/quant_aware.html)
* PaddleInference Intel CPU部署量化模型[文档](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_x86_cpu_int8.html)
* PaddleInference NV GPU部署量化模型[文档](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html)
......@@ -91,7 +91,7 @@ paddleslim.quant.quant_post_static(
导出的量化模型相比原始FP32模型,模型体积没有明显差别,这是因为量化预测模型中的权重依旧保存为FP32类型。在部署时,使用PaddleLite opt工具转换量化预测模型后,模型体积才会真实减小。
部署参考文档:
* 部署[文档](../../deploy/index.html)
* 部署[文档](https://paddleslim.readthedocs.io/zh_CN/latest/deploy/index.html)
* PaddleLite部署量化模型[文档](https://paddle-lite.readthedocs.io/zh/latest/user_guides/quant_aware.html)
* PaddleInference Intel CPU部署量化模型[文档](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_x86_cpu_int8.html)
* PaddleInference NV GPU部署量化模型[文档](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html)
......@@ -176,7 +176,7 @@ paddle.static.save_inference_model(
保存的量化模型相比原始FP32模型,模型体积没有明显差别,这是因为量化预测模型中的权重依旧保存为FP32类型。在部署时,使用PaddleLite opt工具转换量化预测模型后,模型体积才会真实减小。
部署参考文档:
* 部署[简介](../../deploy/index.html)
* 部署[简介](https://paddleslim.readthedocs.io/zh_CN/latest/deploy/index.html)
* PaddleLite部署量化模型[文档](https://paddle-lite.readthedocs.io/zh/latest/user_guides/quant_aware.html)
* PaddleInference Intel CPU部署量化模型[文档](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_x86_cpu_int8.html)
* PaddleInference NV GPU部署量化模型[文档](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html)
......@@ -168,7 +168,7 @@ test(quant_post_static_prog, fetch_targets)
保存的量化模型相比原始FP32模型,模型体积没有明显差别,这是因为量化预测模型中的权重依旧保存为FP32类型。在部署时,使用PaddleLite opt工具转换量化预测模型后,模型体积才会真实减小。
部署参考文档:
* 部署[简介](../../deploy/index.html)
* 部署[简介](https://paddleslim.readthedocs.io/zh_CN/latest/deploy/index.html)
* PaddleLite部署量化模型[文档](https://paddle-lite.readthedocs.io/zh/latest/user_guides/quant_aware.html)
* PaddleInference Intel CPU部署量化模型[文档](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_x86_cpu_int8.html)
* PaddleInference NV GPU部署量化模型[文档](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html)
......@@ -7,7 +7,7 @@ PaddleSlim提供了4种网络结构搜索的方法:基于模拟退火进行网
| [Once-For-All](https://paddleslim.readthedocs.io/zh_CN/latest/tutorials/nas/dygraph/nas_ofa.html) | OFA是一种基于One-Shot NAS的压缩方案。这种方式比较高效,其优势是只需要训练一个超网络就可以从中选择满足不同延时要求的子模型。 | Once-For-All |
| [SANAS](https://paddleslim.readthedocs.io/zh_CN/latest/quick_start/static/nas_tutorial.html) | SANAS是基于模拟退火的方式进行网络结构搜索,在机器资源不多的情况下,选择这种方式一般能得到比强化学习更好的模型。 | \ |
| [RLNAS](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/nas/nas_api.html#rlnas) | RLNAS是基于强化学习的方式进行网络结构搜索,这种方式需要耗费大量机器资源。 | ENAS、NasNet、MNasNet |
| [DARTS](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/darts.html) | DARTS是基于梯度的方式进行网络结构搜索,可以大大缩短搜索时长。 | DARTS、PCDARTS |
| [DARTS](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/index.html) | DARTS是基于梯度的方式进行网络结构搜索,可以大大缩短搜索时长。 | DARTS、PCDARTS |
## 参考文献
[1] H. Cai, C. Gan, T. Wang, Z. Zhang, and S. Han. Once for all: Train one network and specialize it for efficient deployment. In International Conference on Learning Representations, 2020.
......
......@@ -232,7 +232,7 @@ exe.run(startup_program)
```
#### 9.5 定义输入数据
由于本示例中对cifar10中的图片进行了一些额外的预处理操作,和[快速开始](https://paddlepaddle.github.io/PaddleSlim/quick_start/nas_tutorial.html)示例中的reader不同,所以需要自定义cifar10的reader,不能直接调用paddle中封装好的`paddle.dataset.cifar10`的reader。自定义cifar10的reader文件位于[demo/nas](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/demo/nas/darts_cifar10_reader.py)中。
由于本示例中对cifar10中的图片进行了一些额外的预处理操作,和[快速开始](https://paddleslim.readthedocs.io/zh_CN/latest/deploy/index.html)示例中的reader不同,所以需要自定义cifar10的reader,不能直接调用paddle中封装好的`paddle.dataset.cifar10`的reader。自定义cifar10的reader文件位于[demo/nas](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/demo/nas/darts_cifar10_reader.py)中。
**注意:**本示例为了简化代码直接调用`paddle.dataset.cifar10`定义训练数据和预测数据,实际训练需要使用自定义cifar10文件中的reader。
```python
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册