diff --git a/README_en.md b/README_en.md
index 10785bb637a1aa992a98cbcb2ae05e3b34bda51f..0c46a65fce02508ad26ae0eb9b483bfb08ac47fd 100755
--- a/README_en.md
+++ b/README_en.md
@@ -82,19 +82,19 @@ pip install paddleslim==1.2.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
 ## Usage
-- [QuickStart](https://paddlepaddle.github.io/PaddleSlim/quick_start/index_en.html): Introduce how to use PaddleSlim by simple examples.
+- [QuickStart](https://paddleslim.readthedocs.io/en/latest/quick_start/index_en.html): Introduces how to use PaddleSlim through simple examples.
 - Dynamic graph
-  - Pruning: [Tutorial](dygraph_docs/), [Demo](demo/dygraph/pruning)
+  - Pruning: [Tutorial](https://paddleslim.readthedocs.io/en/latest/tutorials/image_classification_sensitivity_analysis_tutorial_en.html), [Demo](demo/dygraph/pruning)
   - Quantization: [Demo](demo/dygraph/quant)
-- [Advanced Tutorials](https://paddlepaddle.github.io/PaddleSlim/tutorials/index_en.html):Tutorials about advanced usage of PaddleSlim.
+- [Advanced Tutorials](https://paddleslim.readthedocs.io/en/latest/tutorials/index_en.html): Tutorials about advanced usage of PaddleSlim.
-- [Model Zoo](https://paddlepaddle.github.io/PaddleSlim/model_zoo_en.html):Benchmark and pretrained models.
+- [Model Zoo](https://paddleslim.readthedocs.io/en/latest/model_zoo_en.html): Benchmark and pretrained models.
-- [API Documents](https://paddlepaddle.github.io/PaddleSlim/api_en/index_en.html)
+- [API Documents](https://paddleslim.readthedocs.io/en/latest/api_en/index_en.html)
-- [Algorithm Background](https://paddlepaddle.github.io/PaddleSlim/algo/algo.html): Introduce the background of quantization, pruning, distillation, NAS.
+- [Algorithm Background](https://paddleslim.readthedocs.io/en/latest/intro_en.html): Introduces the background of quantization, pruning, distillation, and NAS.
 - [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection/tree/master/slim): Introduces how to use PaddleSlim in the PaddleDetection library.
diff --git a/demo/distillation/README.md b/demo/distillation/README.md
index ce3bc6fa71a82af4b9ec3fdfe5006da7b2719d64..3951475d151d15b89171816883644799d2b6075d 100644
--- a/demo/distillation/README.md
+++ b/demo/distillation/README.md
@@ -4,7 +4,7 @@
 ## 接口介绍
-请参考 [知识蒸馏API文档](https://paddlepaddle.github.io/PaddleSlim/api/single_distiller_api/)。
+请参考 [知识蒸馏API文档](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/dist/single_distiller_api.html)。
 ### 1. 蒸馏训练配置
@@ -37,4 +37,4 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 python distill.py
 经过120轮的蒸馏训练,MobileNet模型的Top-1/Top-5准确率达到72.77%/90.68%, Top-1/Top-5性能提升+1.78%/+1.00%
-详细实验数据请参见[PaddleSlim模型库蒸馏部分](https://paddlepaddle.github.io/PaddleSlim/model_zoo/#13)
+详细实验数据请参见[PaddleSlim模型库蒸馏部分](https://paddleslim.readthedocs.io/zh_CN/latest/model_zoo.html#id5)
diff --git a/demo/mkldnn_quant/README.md b/demo/mkldnn_quant/README.md
index 0d392075f39ea70ecc580992eb27c66ddf7002e4..a7188155f56f6984737cec9c9f2f175f592ebafc 100644
--- a/demo/mkldnn_quant/README.md
+++ b/demo/mkldnn_quant/README.md
@@ -38,15 +38,15 @@ import numpy as np
 #### 2.1 量化训练
-量化训练流程可以参考 [分类模型的量化训练流程](https://paddlepaddle.github.io/PaddleSlim/tutorials/quant_aware_demo/)
+量化训练流程可以参考 [分类模型的量化训练流程](https://paddleslim.readthedocs.io/zh_CN/latest/tutorials/quant/static/quant_aware_tutorial.html)
 **量化训练过程中config参数:**
 - **quantize_op_types:** 目前CPU上量化支持的算子为 `depthwise_conv2d`, `conv2d`, `mul`, `matmul`, `transpose2`, `reshape2`, `pool2d`, `scale`, `concat`。但是在量化训练阶段插入fake_quantize/fake_dequantize算子时,只需在前四种op前后插入fake_quantize/fake_dequantize 算子,因为后面五种算子 `transpose2`, `reshape2`, `pool2d`, `scale`, `concat`的scales将从其他op的`out_threshold`属性获取。所以,在使用PaddleSlim量化训练时,只可以对 `depthwise_conv2d`, `conv2d`, `mul`, `matmul`进行量化,不支持其他op。
-- **其他参数:** 请参考 [PaddleSlim quant_aware API](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/#quant_aware)
+- **其他参数:** 请参考 [PaddleSlim quant_aware API](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/quant/quantization_api.html#quant-aware)
 #### 2.2 离线量化
-离线量化模型产出可以参考[分类模型的静态离线量化流程](https://paddlepaddle.github.io/PaddleSlim/tutorials/quant_post_demo/#_1)
+离线量化模型产出可以参考[分类模型的静态离线量化流程](https://paddleslim.readthedocs.io/zh_CN/latest/tutorials/quant/static/quant_post_tutorial.html)
 在使用PaddleSlim离线量化时,只可以对 `depthwise_conv2d`, `conv2d`, `mul`, `matmul`进行量化,不支持其他op。
diff --git a/demo/mkldnn_quant/README_en.md b/demo/mkldnn_quant/README_en.md
index 8031b9a15c9200b51ef0843cc8fe92e060d4157d..1c344cf8b0fbfcb36ed5f2d4f9be05ceda71925b 100644
--- a/demo/mkldnn_quant/README_en.md
+++ b/demo/mkldnn_quant/README_en.md
@@ -13,7 +13,7 @@ The process comprises the following steps:
 #### Install PaddleSlim
-For PaddleSlim installation, please see [Paddle Installation Document](https://paddlepaddle.github.io/PaddleSlim/install.html)
+For PaddleSlim installation, please see [PaddleSlim Installation Document](https://paddleslim.readthedocs.io/en/latest/install_en.html)
 ```
 git clone https://github.com/PaddlePaddle/PaddleSlim.git
 cd PaddleSlim
@@ -34,15 +34,15 @@ One can generate fake-quantized model with post-training or quant-aware strategy
 #### 2.1 Quant-aware training
-To generate fake quantized model with quant-aware strategy, see [Quant-aware training tutorial](https://paddlepaddle.github.io/PaddleSlim/tutorials/quant_aware_demo/)
+To generate a fake quantized model with the quant-aware strategy, see [Quant-aware training tutorial](https://paddleslim.readthedocs.io/en/latest/quick_start/quant_aware_tutorial_en.html)
 **The parameters during quant-aware training:**
 - **quantize_op_types:** A list of operators around which to insert `fake_quantize` and `fake_dequantize` ops. In PaddlePaddle, quantization of the following operators is supported for CPU: `depthwise_conv2d`, `conv2d`, `fc`, `matmul`, `transpose2`, `reshape2`, `pool2d`, `scale`, `concat`. However, inserting fake_quantize/fake_dequantize operators during training is needed only for the first four of them (`depthwise_conv2d`, `conv2d`, `fc`, `matmul`), so setting the `quantize_op_types` parameter to the list of those four ops is enough. Scale data needed for quantization of the other five operators is reused from the fake ops or gathered from the `out_threshold` attributes of the operators.
-- **Other parameters:** Please read [PaddleSlim quant_aware API](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/#quant_aware)
+- **Other parameters:** Please read [PaddleSlim quant_aware API](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/quant/quantization_api.html#quant-aware)
 #### 2.2 Post-training quantization
-To generate post-training fake quantized model, see [Offline post-training quantization tutorial](https://paddlepaddle.github.io/PaddleSlim/tutorials/quant_post_demo/#_1)
+To generate a post-training fake quantized model, see [Offline post-training quantization tutorial](https://paddleslim.readthedocs.io/en/latest/quick_start/index_en.html)
 ## 3. Convert the fake quantized model to DNNL INT8 model
 In order to deploy an INT8 model on the CPU, we need to collect scales, remove all fake_quantize/fake_dequantize operators, optimize the graph and quantize it, turning it into the final DNNL INT8 model. This is done by the script [save_quant_model.py](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/contrib/slim/tests/save_quant_model.py). Copy the script to the directory where the demo is located: `/PATH_TO_PaddleSlim/demo/mkldnn_quant/` and run it as follows:
@@ -55,7 +55,7 @@ python save_quant_model.py --quant_model_path=/PATH/TO/SAVE/FLOAT32/quant/MODEL
 - **int8_model_save_path:** The final INT8 model output path after the quant model is optimized and quantized by DNNL.
 - **ops_to_quantize:** A comma-separated list of specified op types to be quantized. It is optional. If the option is skipped, all quantizable operators will be quantized. Skipping the option is recommended in the first approach as it usually yields the best performance and accuracy for the image classification models and NLP models listed in the Benchmark.
 - **--op_ids_to_skip:** A comma-separated list of operator ID numbers. It is optional. Default value is none. The op ids in this list will not be quantized and will adopt FP32 type. To get the ID of a specific op, first run the script using the `--debug` option, and open the generated file `int8__cpu_quantize_placement_pass.dot` to find the op that does not need to be quantized; the ID number is in parentheses after the op name.
-- **--debug:** Generate models graph or not. If this option is present, .dot files with graphs of the model will be generated after each optimization step that modifies the graph. For the description of DOT format, please read [DOT](https://graphviz.gitlab.io/_pages/doc/info/lang.html). To open the `*.dot` file, please use any Graphviz tool available on the system(such as the `xdot` tool on Linux or the `dot` tool on Windows. For Graphviz documentation, see [Graphviz](http://www. graphviz.org/documentation/).
+- **--debug:** Whether to generate model graphs. If this option is present, .dot files with graphs of the model will be generated after each optimization step that modifies the graph. For the description of DOT format, please read [DOT](https://graphviz.gitlab.io/_pages/doc/info/lang.html). To open the `*.dot` file, please use any Graphviz tool available on the system (such as the `xdot` tool on Linux or the `dot` tool on Windows). For Graphviz documentation, see [Graphviz](http://www.graphviz.org/documentation/).
 - **Note:**
   - The DNNL supported quantizable ops are `conv2d`, `depthwise_conv2d`, `fc`, `matmul`, `pool2d`, `reshape2`, `transpose2`, `scale`, `concat`.
diff --git a/demo/ocr/README.md b/demo/ocr/README.md
index 959c066fd02699e00bba729773c4b62c14208f23..ca6e5d792aeb33f4e1dfdc1ec60bbba1c3c5f1b4 100755
--- a/demo/ocr/README.md
+++ b/demo/ocr/README.md
@@ -1,4 +1,4 @@
-[English](README_en.md) | 简体中文
+English | 简体中文
 # SlimOCR模型库
diff --git a/demo/ofa/bert/README.md b/demo/ofa/bert/README.md
index 7114b4749d904477901fd906ace7237366dea837..dd092b13edf844d6e7fae96098bb24e8fe92c480 100644
--- a/demo/ofa/bert/README.md
+++ b/demo/ofa/bert/README.md
@@ -1,6 +1,6 @@
 # OFA压缩PaddleNLP-BERT模型
-BERT-base模型是一个迁移能力很强的通用语义表示模型,但是模型中也有一些参数冗余。本教程将介绍如何使用PaddleSlim对[PaddleNLP](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/)中BERT-base模型进行压缩。
+BERT-base模型是一个迁移能力很强的通用语义表示模型,但是模型中也有一些参数冗余。本教程将介绍如何使用PaddleSlim对[PaddleNLP](https://paddlenlp.readthedocs.io/zh/latest/)中BERT-base模型进行压缩。
 本教程只会演示如何快速启动相应训练,详细教程请参考: [BERT](https://github.com/PaddlePaddle/PaddleSlim/blob/release/2.0.0/docs/zh_cn/nlp/paddlenlp_slim_ofa_tutorial.md)
 ## 1. 压缩结果
@@ -79,8 +79,8 @@ BERT-base模型是一个迁移能力很强的通用语义表示模型,但是
 14.93
-
-        40
+
+        40
 BERT
 N
@@ -185,7 +185,7 @@ pip install paddlepaddle_gpu>=2.0rc1
 ```
 ### 2.2 Fine-tuning
-首先需要对Pretrain-Model在实际的下游任务上进行Fine-tuning,得到需要压缩的模型。Fine-tuning流程参考[Fine-tuning教程](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/examples/bert)
+首先需要对Pretrain-Model在实际的下游任务上进行Fine-tuning,得到需要压缩的模型。Fine-tuning流程参考[Fine-tuning教程](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/language_model/bert)
 Fine-tuning 在dev上的结果如压缩结果表1-1『Baseline』那一列所示。
 ### 2.3 压缩训练
@@ -268,10 +268,10 @@ python -u ./run_glue_ofa.py --model_type bert \
 python3.7 -u ./export_model.py --model_type bert \
     --model_name_or_path ${PATH_OF_QQP_MODEL_AFTER_OFA} \
     --max_seq_length 128 \
-    --sub_model_output_dir ./tmp/$TASK_NAME/dynamic_model \
+    --sub_model_output_dir ./tmp/$TASK_NAME/dynamic_model \
     --static_sub_model ./tmp/$TASK_NAME/static_model \
-    --n_gpu 1 \
-    --width_mult 0.6666666666666666
+    --n_gpu 1 \
+    --width_mult 0.6666666666666666
 ```
 其中参数释义如下:
diff --git a/demo/pantheon/lexical_anlysis/README.md b/demo/pantheon/lexical_anlysis/README.md
index ec3af05d28e42c9d3b0efac962ba9a8d8c283646..1287e17a9980f72c42bab32be2d02cfff5f1fdf1 100644
--- a/demo/pantheon/lexical_anlysis/README.md
+++ b/demo/pantheon/lexical_anlysis/README.md
@@ -11,7 +11,7 @@ BiGRU is to train a BiGRU based LAC model from scratch; BERT fine-tuned is to fi
 ## Introduction
-Lexical Analysis of Chinese, or LAC for short, is a lexical analysis model that completes the tasks of Chinese word segmentation, part-of-speech tagging, and named entity recognition in a single model. We conduct an overall evaluation of word segmentation, part-of-speech tagging, and named entity recognition on a self-built dataset. We use the finetuned [ERNIE](https://github.com/PaddlePaddle/LARK/tree/develop/ERNIE) model as the Teacher model and GRU as the Student model, which are needed by the Pantheon framework for online distillation.
+Lexical Analysis of Chinese, or LAC for short, is a lexical analysis model that completes the tasks of Chinese word segmentation, part-of-speech tagging, and named entity recognition in a single model. We conduct an overall evaluation of word segmentation, part-of-speech tagging, and named entity recognition on a self-built dataset. We use the fine-tuned ERNIE model as the Teacher model and GRU as the Student model, which are needed by the Pantheon framework for online distillation.
 #### 1. Download the training data set
@@ -37,4 +37,4 @@ bash run_teacher.sh
 bash run_student.sh
 ```
-> If you want to learn more about LAC, you can refer to this repo: https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/lexical_analysis
\ No newline at end of file
+> If you want to learn more about LAC, you can refer to this repo: https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/lexical_analysis
diff --git a/demo/pantheon/lexical_anlysis/README_cn.md b/demo/pantheon/lexical_anlysis/README_cn.md
index 77e4a944012482e8a0b8ca26cbd4c088e6b969a4..98691aaa794abd54b39da2f6b348789fcf9459e9 100644
--- a/demo/pantheon/lexical_anlysis/README_cn.md
+++ b/demo/pantheon/lexical_anlysis/README_cn.md
@@ -12,7 +12,7 @@ BiGRU 是使用双向GRU网络从头训练LAC任务;BERT fine-tuned 是在BERT
 ## 简介
-Lexical Analysis of Chinese,简称 LAC,是一个联合的词法分析模型,在单个模型中完成中文分词、词性标注、专名识别任务。我们在自建的数据集上对分词、词性标注、专名识别进行整体的评估效果。我们使用经过finetune的 [ERNIE](https://github.com/PaddlePaddle/LARK/tree/develop/ERNIE) 模型作为Teacher模型,使用GRU作为Student模型,使用Pantheon框架进行在线蒸馏。
+Lexical Analysis of Chinese,简称 LAC,是一个联合的词法分析模型,在单个模型中完成中文分词、词性标注、专名识别任务。我们在自建的数据集上对分词、词性标注、专名识别进行整体的评估。我们使用经过finetune的 ERNIE 模型作为Teacher模型,使用GRU作为Student模型,使用Pantheon框架进行在线蒸馏。
 #### 1. 下载训练数据集
diff --git a/demo/prune/README.md b/demo/prune/README.md
index c31f1791b9e1a496b74754e30e3b1f4b14e92483..691919d341749fda8db6fafc052114abeb37fd36 100755
--- a/demo/prune/README.md
+++ b/demo/prune/README.md
@@ -69,7 +69,7 @@ python eval.py \
 ## 5. 接口介绍
-该示例使用了`paddleslim.Pruner`工具类,用户接口使用介绍请参考:[API文档](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/)
+该示例使用了`paddleslim.Pruner`工具类,用户接口使用介绍请参考:[API文档](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/index.html)
 在调用`paddleslim.Pruner`工具类时,需要指定待裁卷积层的参数名称。不同模型的参数命名不同, 在`train.py`脚本中,提供了`get_pruned_params`方法,根据用户设置的选项`--model`确定要裁剪的参数。
diff --git a/demo/quant/deploy/TensorRT/README.md b/demo/quant/deploy/TensorRT/README.md
index 2f133b01e8f5b1a5b5256ba43e81e4d36bc144f4..ba3eb0a5f37544331f21a24062172f21fe820449 100644
--- a/demo/quant/deploy/TensorRT/README.md
+++ b/demo/quant/deploy/TensorRT/README.md
@@ -79,7 +79,7 @@ LIB_ROOT/
 ### 2.1 将模型导出为inference model
-* 可以参考[量化训练教程](https://paddleslim.readthedocs.io/zh_CN/latest/quick_start/quant_aware_tutorial.html#id9),在训练完成后导出inference model。
+* 可以参考[量化训练教程](https://paddleslim.readthedocs.io/zh_CN/latest/tutorials/quant/index.html),在训练完成后导出inference model。
 ```
 inference/
diff --git a/docs/en/model_zoo_en.md b/docs/en/model_zoo_en.md
index 89a83c635561c5d251d26dd5612f9aa0f0a42917..46608ea0288876b464815b92eb70eaa3cc34d4af 100644
--- a/docs/en/model_zoo_en.md
+++ b/docs/en/model_zoo_en.md
@@ -134,7 +134,7 @@ Dataset:WIDER-FACE
 | BlazeFace | quant_post | 8 | 640 | 87.8/85.1/74.9 (-3.7/-4.1/-4.8) | 228 | [model](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_origin_quant_post.tar) |
 | BlazeFace | quant_aware | 8 | 640 | 90.5/87.9/77.6 (-1.0/-1.3/-2.1) | 228 | [model](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_origin_quant_aware.tar) |
 | BlazeFace-Lite | - | 8 | 640 | 90.9/88.5/78.1 | 711 | [model](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_lite.tar) |
-| BlazeFace-Lite | quant_post | 8 | 640 | 89.4/86.7/75.7 (-1.5/-1.8/-2.4) | 211 | [model]((https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_lite_quant_post.tar)) |
+| BlazeFace-Lite | quant_post | 8 | 640 | 89.4/86.7/75.7 (-1.5/-1.8/-2.4) | 211 | [model](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_lite_quant_post.tar) |
 | BlazeFace-Lite | quant_aware | 8 | 640 | 89.7/87.3/77.0 (-1.2/-1.2/-1.1) | 211 | [model](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_lite_quant_aware.tar) |
 | BlazeFace-NAS | - | 8 | 640 | 83.7/80.7/65.8 | 244 | [model](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_nas.tar) |
 | BlazeFace-NAS | quant_post | 8 | 640 | 81.6/78.3/63.6 (-2.1/-2.4/-2.2) | 71 | [model](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_nas_quant_post.tar) |
diff --git a/docs/en/quick_start/nas_tutorial_en.md b/docs/en/quick_start/nas_tutorial_en.md
index 622c8224d9199665c7b679f6333e0a7d7a100e60..51065bf965c28534cf9fc4e85adedc1465f50de5 100644
--- a/docs/en/quick_start/nas_tutorial_en.md
+++ b/docs/en/quick_start/nas_tutorial_en.md
@@ -1,6 +1,6 @@
 # Neural Architecture Search for Image Classification
-This tutorial shows how to use [API](../api/nas_api.md) about SANAS in PaddleSlim. We start experiment based on MobileNetV2 as example. The tutorial contains follow section.
+This tutorial shows how to use the SANAS [API](https://paddleslim.readthedocs.io/en/latest/api_en/paddleslim.nas.html) in PaddleSlim. We take MobileNetV2 as the example. The tutorial contains the following sections.
 1. necessary imports
 2. initial SANAS instance
diff --git a/docs/en/quick_start/quant_aware_tutorial_en.md b/docs/en/quick_start/quant_aware_tutorial_en.md
index ada6e6ea78fb8630f14bf5c9eca86c1208352bfc..0f44f489381674d3c7c41b61b0228e7b9178fd26 100644
--- a/docs/en/quick_start/quant_aware_tutorial_en.md
+++ b/docs/en/quick_start/quant_aware_tutorial_en.md
@@ -1,6 +1,6 @@
 # Training-aware Quantization of image classification model - quick start
-This tutorial shows how to do training-aware quantization using [API](https://paddlepaddle.github.io/PaddleSlim/api_en/paddleslim.quant.html#paddleslim.quant.quanter.quant_aware) in PaddleSlim. We use MobileNetV1 to train image classification model as example. The tutorial contains follow sections:
+This tutorial shows how to do training-aware quantization using the [API](https://paddleslim.readthedocs.io/en/latest/api_en/index_en.html) in PaddleSlim. We use MobileNetV1 to train an image classification model as the example. The tutorial contains the following sections:
 1. Necessary imports
 2. Model architecture
@@ -90,7 +90,7 @@ test(val_program)
 ## 4. Quantization
-We call ``quant_aware`` API to add quantization and dequantization operators in ``train_program`` and ``val_program`` according to [default configuration](https://paddlepaddle.github.io/PaddleSlim/api_cn/quantization_api.html#id2).
+We call the ``quant_aware`` API to insert quantization and dequantization operators into ``train_program`` and ``val_program`` according to the [default configuration](https://paddleslim.readthedocs.io/en/latest/api_en/paddleslim.quant.html).
 ```python
 quant_program = slim.quant.quant_aware(train_program, exe.place, for_test=False)
@@ -115,7 +115,7 @@ test(val_quant_program)
 ## 6. Save model after quantization
-The model in ``4. Quantization`` after calling ``slim.quant.quant_aware`` API is only suitable to train. To get the inference model, we should use [slim.quant.convert](https://paddlepaddle.github.io/PaddleSlim/api_en/paddleslim.quant.html#paddleslim.quant.quanter.convert) API to change model architecture and use [fluid.io.save_inference_model](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/io_cn/save_inference_model_cn.html#save-inference-model) to save model. ``float_prog``'s parameters are float32 dtype but in int8's range which can be used in ``fluid`` or ``paddle-lite``. ``paddle-lite`` will change the parameters' dtype from float32 to int8 first when loading the inference model. ``int8_prog``'s parameters are int8 dtype and we can get model size after quantization by saving it. ``int8_prog`` cannot be used in ``fluid`` or ``paddle-lite``.
+The model obtained in ``4. Quantization`` after calling the ``slim.quant.quant_aware`` API is only suitable for training. To get the inference model, we should use the [slim.quant.convert](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/quant/quantization_api.html#convert) API to change the model architecture and [fluid.io.save_inference_model](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/io_cn/save_inference_model_cn.html#save-inference-model) to save the model. ``float_prog``'s parameters are of float32 dtype but within the int8 range, so it can be used in ``fluid`` or ``paddle-lite``; ``paddle-lite`` will first change the parameters' dtype from float32 to int8 when loading the inference model. ``int8_prog``'s parameters are of int8 dtype, and saving it shows the model size after quantization; ``int8_prog`` cannot be used in ``fluid`` or ``paddle-lite``.
 ```python
diff --git a/docs/en/quick_start/quant_post_static_tutorial_en.md b/docs/en/quick_start/quant_post_static_tutorial_en.md
index fd7f850875a8d9fdca155e5a6d70f1ffde49f7c5..6272dccc1082124d779ccdec0303e504232af1b0 100644
--- a/docs/en/quick_start/quant_post_static_tutorial_en.md
+++ b/docs/en/quick_start/quant_post_static_tutorial_en.md
@@ -1,6 +1,6 @@
 # Post-training Quantization of image classification model - quick start
-This tutorial shows how to do post training quantization using [API](https://paddlepaddle.github.io/PaddleSlim/api_en/paddleslim.quant.html#paddleslim.quant.quanter.quant_post) in PaddleSlim. We use MobileNetV1 to train image classification model as example. The tutorial contains follow sections:
+This tutorial shows how to do post-training quantization using the [API](https://paddleslim.readthedocs.io/en/latest/api_en/index_en.html) in PaddleSlim. We use MobileNetV1 to train an image classification model as the example. The tutorial contains the following sections:
 1. Necessary imports
 2. Model architecture
diff --git a/docs/en/tutorials/image_classification_sensitivity_analysis_tutorial_en.md b/docs/en/tutorials/image_classification_sensitivity_analysis_tutorial_en.md
index 043e144a1f122fd9abd1598f35c688f5bc7b6f71..1df6817de5f36d8db391454a17b7bb9536995b98 100644
--- a/docs/en/tutorials/image_classification_sensitivity_analysis_tutorial_en.md
+++ b/docs/en/tutorials/image_classification_sensitivity_analysis_tutorial_en.md
@@ -1,6 +1,6 @@
 # Pruning of image classification model - sensitivity
-In this tutorial, you will learn how to use [sensitivity API of PaddleSlim](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/#sensitivity) by a demo of MobileNetV1 model on MNIST dataset。
+In this tutorial, you will learn how to use the [sensitivity API of PaddleSlim](https://paddleslim.readthedocs.io/en/latest/api_en/index_en.html) through a demo of the MobileNetV1 model on the MNIST dataset.
 This tutorial follows this workflow:
 1. Import dependency
@@ -107,7 +107,7 @@ params = params[:5]
 ### 7.1 Compute in single process
-Apply sensitivity analysis on pretrained model by calling [sensitivity API](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/#sensitivity).
+Apply sensitivity analysis to the pretrained model by calling the [sensitivity API](https://paddleslim.readthedocs.io/en/latest/api_en/index_en.html).
 The sensitivities will be appended into the file given by option `sensitivities_file` during computing. The information in this file won't be computed repeatedly.
@@ -197,7 +197,7 @@ Pruning model according to the sensitivities generated in section 7.3.3.
 ### 8.1 Get pruning ratios
-Get a group of ratios by calling [get_ratios_by_loss](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/#get_ratios_by_loss) fuction:
+Get a group of ratios by calling the [get_ratios_by_loss](https://paddleslim.readthedocs.io/en/latest/api_en/index_en.html) function:
 ```python
@@ -223,7 +223,7 @@ print("FLOPs after pruning: {}".format(slim.analysis.flops(pruned_program)))
 ### 8.3 Pruning test network
-Note:The `only_graph` should be set to True while pruning test network. [Pruner API](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/#pruner)
+Note: The `only_graph` option should be set to True while pruning the test network. [Pruner API](https://paddleslim.readthedocs.io/en/latest/api_en/index_en.html)
 ```python
diff --git a/docs/zh_cn/FAQ/quantization_FAQ.md b/docs/zh_cn/FAQ/quantization_FAQ.md
index 8ac10e43b3ab5bdea3ae0005c1ba76adfaa53195..24a9dead9767d5ebbe83d19e74cb5a221899f617 100644
--- a/docs/zh_cn/FAQ/quantization_FAQ.md
+++ b/docs/zh_cn/FAQ/quantization_FAQ.md
@@ -18,9 +18,9 @@
 - 如果量化模型在ARM上线,则需要使用[Paddle-Lite](https://paddle-lite.readthedocs.io/zh/latest/index.html).
-  - Paddle-Lite会对量化模型进行模型转化和优化,转化方法见[链接](https://paddle-lite.readthedocs.io/zh/latest/user_guides/model_quantization.html#paddle-lite)。
+  - Paddle-Lite会对量化模型进行模型转化和优化,转化方法见[链接](https://paddle-lite.readthedocs.io/zh/latest/index.html#sec-user-guides)。
-  - 转化之后可以像非量化模型一样使用[Paddle-Lite API](https://paddle-lite.readthedocs.io/zh/latest/user_guides/tutorial.html#lite)进行加载预测。
+  - 转化之后可以像非量化模型一样使用[Paddle-Lite API](https://paddle-lite.readthedocs.io/zh/latest/index.html)进行加载预测。
 - 如果量化模型在GPU上线,则需要使用[Paddle-TensorRT 预测接口](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_guide/performance_improving/inference_improving/paddle_tensorrt_infer.html).
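下面补充一个在GPU上加载量化模型的最小Python示意(假设使用 Paddle Inference 2.x 的 Python API;模型文件路径与各参数均为占位示例,仅供参考,具体请以上述 Paddle-TensorRT 文档及下方的C++配置为准):

```python
import paddle.inference as paddle_infer

# 加载量化预测模型(路径为占位示例)
config = paddle_infer.Config("model/__model__", "model/__params__")
config.enable_use_gpu(1000, 0)  # 初始显存池大小(MB)与GPU卡号
# 打开TensorRT子图引擎:量化训练产出的模型使用INT8精度,且无需再做离线校准
config.enable_tensorrt_engine(
    workspace_size=1 << 20,
    max_batch_size=1,
    min_subgraph_size=5,
    precision_mode=paddle_infer.PrecisionType.Int8,
    use_static=False,
    use_calib_mode=False)
predictor = paddle_infer.create_predictor(config)
```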
@@ -35,7 +35,7 @@ config->EnableTensorRtEngine(1 << 20 /* workspace_size*/,
                              false /* use_calib_mode*/);
 ```
-- 如果量化模型在x86上线,需要使用[INT8 MKL-DNN](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/contrib/slim/tests/slim_int8_mkldnn_post_training_quantization.md)
+- 如果量化模型在x86上线,需要使用[INT8 MKL-DNN](https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/fluid/contrib/slim/tests)
 - 首先对模型进行转化,可以参考[脚本](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/contrib/slim/tests/save_quant_model.py)
diff --git a/docs/zh_cn/cv/detection/static/paddledetection_slim_quantization_tutorial.md b/docs/zh_cn/cv/detection/static/paddledetection_slim_quantization_tutorial.md
index 63e17a15746caeab3f83eed934dee7b996e1c869..eb59c4e815ce6d8502d4485381995f5b78d8a618 100644
--- a/docs/zh_cn/cv/detection/static/paddledetection_slim_quantization_tutorial.md
+++ b/docs/zh_cn/cv/detection/static/paddledetection_slim_quantization_tutorial.md
@@ -16,9 +16,9 @@
 | MobileNetV1 | ImageNet | post | 608 | 27.9 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_mobilenetv1_coco_quant_post.tar) |
 | MobileNetV1 | ImageNet | post | 416 | 28.0 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_mobilenetv1_coco_quant_post.tar) |
 | MobileNetV1 | ImageNet | post | 320 | 26.0 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_mobilenetv1_coco_quant_post.tar) |
-| MobileNetV1 | ImageNet | aware | 608 | 28.1 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_mobilenetv1_coco_quant_aware.tar) |
-| MobileNetV1 | ImageNet | aware | 416 | 28.2 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_mobilenetv1_coco_quant_aware.tar) |
-| MobileNetV1 | ImageNet | aware | 320 | 25.8 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_mobilenetv1_coco_quant_aware.tar) |
+| MobileNetV1 | ImageNet | aware | 608 | 28.1 | 下载链接 |
+| MobileNetV1 | ImageNet | aware | 416 | 28.2 | 下载链接 |
+| MobileNetV1 | ImageNet | aware | 320 | 25.8 | 下载链接 |
 | ResNet34 | ImageNet | post | 608 | 35.7 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_r34_coco_quant_post.tar) |
 | ResNet34 | ImageNet | aware | 608 | 35.2 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_r34_coco_quant_aware.tar) |
 | ResNet34 | ImageNet | aware | 416 | 33.3 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_r34_coco_quant_aware.tar) |
diff --git a/docs/zh_cn/deploy/deploy_cls_model_on_nvidia_gpu.md b/docs/zh_cn/deploy/deploy_cls_model_on_nvidia_gpu.md
index 3bb6cc51ee13aa6a533f40ccfc2b0a27781aa84d..999bbe861a81c7145fd683180118852955eadea7 120000
--- a/docs/zh_cn/deploy/deploy_cls_model_on_nvidia_gpu.md
+++ b/docs/zh_cn/deploy/deploy_cls_model_on_nvidia_gpu.md
@@ -1 +1 @@
-../../../demo/quant/deploy/TensorRT/README.md
\ No newline at end of file
+../../../demo/quant/deploy/TensorRT/README.md
diff --git a/docs/zh_cn/model_zoo.md b/docs/zh_cn/model_zoo.md
index a662fb2ac05a78f49dce6fb976777abf494103a4..a2fa07cbbd27af81eb1f820956f268e98ec5fd38 100644
--- a/docs/zh_cn/model_zoo.md
+++ b/docs/zh_cn/model_zoo.md
@@ -142,7 +142,7 @@ Note: MobileNetV2_NAS 的token是:[4, 4, 5, 1, 1, 2, 1, 1, 0, 2, 6,
 | BlazeFace | quant_post | 8 | 640 | 87.8/85.1/74.9 (-3.7/-4.1/-4.8) | 228 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_origin_quant_post.tar) |
 | BlazeFace | quant_aware | 8 | 640 | 90.5/87.9/77.6 (-1.0/-1.3/-2.1) | 228 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_origin_quant_aware.tar) |
 | BlazeFace-Lite | - | 8 | 640 | 90.9/88.5/78.1 | 711 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_lite.tar) |
-| BlazeFace-Lite | quant_post | 8 | 640 | 89.4/86.7/75.7 (-1.5/-1.8/-2.4) | 211 | [下载链接]((https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_lite_quant_post.tar)) |
+| BlazeFace-Lite | quant_post | 8 | 640 | 89.4/86.7/75.7 (-1.5/-1.8/-2.4) | 211 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_lite_quant_post.tar) |
 | BlazeFace-Lite | quant_aware | 8 | 640 | 89.7/87.3/77.0 (-1.2/-1.2/-1.1) | 211 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_lite_quant_aware.tar) |
 | BlazeFace-NAS | - | 8 | 640 | 83.7/80.7/65.8 | 244 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_nas.tar) |
 | BlazeFace-NAS | quant_post | 8 | 640 | 81.6/78.3/63.6 (-2.1/-2.4/-2.2) | 71 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_nas_quant_post.tar) |
diff --git a/docs/zh_cn/quick_start/dygraph/dygraph_quant_aware_training_tutorial.md b/docs/zh_cn/quick_start/dygraph/dygraph_quant_aware_training_tutorial.md
index 8dda87f79513919a256ba838d2f664099a6040e3..887d8f41a4ca1585ddbaa672c253cbde19d6dfbd 100644
--- a/docs/zh_cn/quick_start/dygraph/dygraph_quant_aware_training_tutorial.md
+++ b/docs/zh_cn/quick_start/dygraph/dygraph_quant_aware_training_tutorial.md
@@ -116,7 +116,7 @@ quanter.save_quantized_model(
 导出的量化模型相比原始FP32模型,模型体积没有明显差别,这是因为量化预测模型中的权重依旧保存为FP32类型。在部署时,使用PaddleLite opt工具转换量化预测模型后,模型体积才会真实减小。
 部署参考文档:
-* 部署[文档](../../deploy/index.html)
+* 部署[文档](https://paddleslim.readthedocs.io/zh_CN/latest/deploy/index.html)
 * PaddleLite部署量化模型[文档](https://paddle-lite.readthedocs.io/zh/latest/user_guides/quant_aware.html)
 * PaddleInference Intel CPU部署量化模型[文档](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_x86_cpu_int8.html)
 * PaddleInference NV GPU部署量化模型[文档](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html)
diff --git a/docs/zh_cn/quick_start/dygraph/dygraph_quant_post_tutorial.md b/docs/zh_cn/quick_start/dygraph/dygraph_quant_post_tutorial.md
index 4281d0a6f9ab4cbaac797737038cdaffefc2a557..77cd49aa170b944b1d6e65629136e516cd14f33b 100644
--- a/docs/zh_cn/quick_start/dygraph/dygraph_quant_post_tutorial.md
+++ b/docs/zh_cn/quick_start/dygraph/dygraph_quant_post_tutorial.md
@@ -91,7 +91,7 @@ paddleslim.quant.quant_post_static(
 导出的量化模型相比原始FP32模型,模型体积没有明显差别,这是因为量化预测模型中的权重依旧保存为FP32类型。在部署时,使用PaddleLite opt工具转换量化预测模型后,模型体积才会真实减小。
 部署参考文档:
-* 部署[文档](../../deploy/index.html)
+* 部署[文档](https://paddleslim.readthedocs.io/zh_CN/latest/deploy/index.html)
 * PaddleLite部署量化模型[文档](https://paddle-lite.readthedocs.io/zh/latest/user_guides/quant_aware.html)
 * PaddleInference Intel CPU部署量化模型[文档](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_x86_cpu_int8.html)
 * PaddleInference NV GPU部署量化模型[文档](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html)
diff --git a/docs/zh_cn/quick_start/static/quant_aware_tutorial.md b/docs/zh_cn/quick_start/static/quant_aware_tutorial.md
index 998e8d82590eef64ecd4b6b5e39875204575a434..f4cd4b711b1960999e3c3bd91d80a4f6d1c328c4 100644
--- a/docs/zh_cn/quick_start/static/quant_aware_tutorial.md
+++ b/docs/zh_cn/quick_start/static/quant_aware_tutorial.md
@@ -176,7 +176,7 @@ paddle.static.save_inference_model(
 保存的量化模型相比原始FP32模型,模型体积没有明显差别,这是因为量化预测模型中的权重依旧保存为FP32类型。在部署时,使用PaddleLite opt工具转换量化预测模型后,模型体积才会真实减小。
 部署参考文档:
-* 部署[简介](../../deploy/index.html)
+* 部署[简介](https://paddleslim.readthedocs.io/zh_CN/latest/deploy/index.html)
 * PaddleLite部署量化模型[文档](https://paddle-lite.readthedocs.io/zh/latest/user_guides/quant_aware.html)
 * PaddleInference Intel CPU部署量化模型[文档](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_x86_cpu_int8.html)
 * PaddleInference NV GPU部署量化模型[文档](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html)
diff --git a/docs/zh_cn/quick_start/static/quant_post_static_tutorial.md b/docs/zh_cn/quick_start/static/quant_post_static_tutorial.md
index f01649873e46ea137c26ebfd3422d49ea7ed118b..e083af5f892cf4152af2051f57fe1cac62432448 100755
--- a/docs/zh_cn/quick_start/static/quant_post_static_tutorial.md
+++ b/docs/zh_cn/quick_start/static/quant_post_static_tutorial.md
@@ -168,7 +168,7 @@ test(quant_post_static_prog, fetch_targets)
 保存的量化模型相比原始FP32模型,模型体积没有明显差别,这是因为量化预测模型中的权重依旧保存为FP32类型。在部署时,使用PaddleLite opt工具转换量化预测模型后,模型体积才会真实减小。
 部署参考文档:
-* 部署[简介](../../deploy/index.html)
+* 部署[简介](https://paddleslim.readthedocs.io/zh_CN/latest/deploy/index.html)
 * PaddleLite部署量化模型[文档](https://paddle-lite.readthedocs.io/zh/latest/user_guides/quant_aware.html)
 * PaddleInference Intel CPU部署量化模型[文档](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_x86_cpu_int8.html)
 * PaddleInference NV GPU部署量化模型[文档](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html)
diff --git a/docs/zh_cn/tutorials/nas/overview.md b/docs/zh_cn/tutorials/nas/overview.md
index 10556c21f12fa3d30ceb0e13de0e27373ff8ed6e..67d084c5afb9bf3f6fcdb22131470bf16f4b36d1 100644
--- a/docs/zh_cn/tutorials/nas/overview.md
+++ b/docs/zh_cn/tutorials/nas/overview.md
@@ -7,7 +7,7 @@ PaddleSlim提供了4种网络结构搜索的方法:基于模拟退火进行网
 | [Once-For-All](https://paddleslim.readthedocs.io/zh_CN/latest/tutorials/nas/dygraph/nas_ofa.html) | OFA是一种基于One-Shot NAS的压缩方案。这种方式比较高效,其优势是只需要训练一个超网络就可以从中选择满足不同延时要求的子模型。 | Once-For-All |
 | [SANAS](https://paddleslim.readthedocs.io/zh_CN/latest/quick_start/static/nas_tutorial.html) | SANAS是基于模拟退火的方式进行网络结构搜索,在机器资源不多的情况下,选择这种方式一般能得到比强化学习更好的模型。 | \ |
 | [RLNAS](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/nas/nas_api.html#rlnas) | RLNAS是基于强化学习的方式进行网络结构搜索,这种方式需要耗费大量机器资源。 | ENAS、NasNet、MNasNet |
-| [DARTS](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/darts.html) | DARTS是基于梯度的方式进行网络结构搜索,可以大大缩短搜索时长。 | DARTS、PCDARTS |
+| [DARTS](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/index.html) | DARTS是基于梯度的方式进行网络结构搜索,可以大大缩短搜索时长。 | DARTS、PCDARTS |
 ## 参考文献
 [1] H. Cai, C. Gan, T. Wang, Z. Zhang, and S. Han. Once for all: Train one network and specialize it for efficient deployment. In International Conference on Learning Representations, 2020.
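作为上表中SANAS用法的一个最小示意(基于快速开始教程的静态图接口;搜索空间、端口与搜索步数均为示例假设,`train_and_eval` 为假设的占位函数):

```python
import paddleslim as slim

# 初始化SANAS:以MobileNetV2搜索空间为例,端口与步数为示例假设
sanas = slim.nas.SANAS(
    configs=[('MobileNetV2Space')],
    server_addr=("", 8337),
    search_steps=100,
    is_server=True)

for step in range(100):
    archs = sanas.next_archs()      # 取出一组候选网络的构建函数
    score = train_and_eval(archs)   # 占位:用archs搭建program并训练、评估,返回得分
    sanas.reward(float(score))      # 回传得分,驱动模拟退火继续搜索
```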
diff --git a/docs/zh_cn/tutorials/nas/static/sanas_darts_space.md b/docs/zh_cn/tutorials/nas/static/sanas_darts_space.md
index b280db4ef88c5e2e63e398379f771976c03a3d7b..90ee4750a3456d1addcd490541ed62db2a4885f6 100644
--- a/docs/zh_cn/tutorials/nas/static/sanas_darts_space.md
+++ b/docs/zh_cn/tutorials/nas/static/sanas_darts_space.md
@@ -232,7 +232,7 @@ exe.run(startup_program)
 ```
 #### 9.5 定义输入数据
-由于本示例中对cifar10中的图片进行了一些额外的预处理操作,和[快速开始](https://paddlepaddle.github.io/PaddleSlim/quick_start/nas_tutorial.html)示例中的reader不同,所以需要自定义cifar10的reader,不能直接调用paddle中封装好的`paddle.dataset.cifar10`的reader。自定义cifar10的reader文件位于[demo/nas](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/demo/nas/darts_cifar10_reader.py)中。
+由于本示例中对cifar10中的图片进行了一些额外的预处理操作,和[快速开始](https://paddleslim.readthedocs.io/zh_CN/latest/quick_start/static/nas_tutorial.html)示例中的reader不同,所以需要自定义cifar10的reader,不能直接调用paddle中封装好的`paddle.dataset.cifar10`的reader。自定义cifar10的reader文件位于[demo/nas](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/demo/nas/darts_cifar10_reader.py)中。
 **注意:**本示例为了简化代码直接调用`paddle.dataset.cifar10`定义训练数据和预测数据,实际训练需要使用自定义cifar10文件中的reader。
 ```python
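# 示意代码:按上文说明,直接调用 paddle.dataset.cifar10 的 reader 定义训练/预测数据
# (此处假设使用 Paddle 1.x 静态图 API;batch_size 为示例假设值;
#  实际训练请改用 demo/nas/darts_cifar10_reader.py 中自定义的 reader)
import paddle

batch_size = 64
train_reader = paddle.batch(
    paddle.reader.shuffle(paddle.dataset.cifar.train10(), buf_size=1024),
    batch_size=batch_size)
test_reader = paddle.batch(paddle.dataset.cifar.test10(), batch_size=batch_size)
```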