From 508fcbb4b57d38059bbdbe0eebfb81c6cd573dc0 Mon Sep 17 00:00:00 2001
From: zzjjay <41366441+zzjjay@users.noreply.github.com>
Date: Thu, 20 May 2021 13:06:46 +0800
Subject: [PATCH] fix up wrong link (#762)

---
 README_en.md | 12 ++++++------
 demo/distillation/README.md | 4 ++--
 demo/mkldnn_quant/README.md | 6 +++---
 demo/mkldnn_quant/README_en.md | 10 +++++-----
 demo/ocr/README.md | 2 +-
 demo/ofa/bert/README.md | 14 +++++++-------
 demo/pantheon/lexical_anlysis/README.md | 4 ++--
 demo/pantheon/lexical_anlysis/README_cn.md | 2 +-
 demo/prune/README.md | 2 +-
 demo/quant/deploy/TensorRT/README.md | 2 +-
 docs/en/model_zoo_en.md | 2 +-
 docs/en/quick_start/nas_tutorial_en.md | 2 +-
 docs/en/quick_start/quant_aware_tutorial_en.md | 6 +++---
 .../quick_start/quant_post_static_tutorial_en.md | 2 +-
 ...ssification_sensitivity_analysis_tutorial_en.md | 8 ++++----
 docs/zh_cn/FAQ/quantization_FAQ.md | 6 +++---
 .../paddledetection_slim_quantization_tutorial.md | 6 +++---
 .../zh_cn/deploy/deploy_cls_model_on_nvidia_gpu.md | 2 +-
 docs/zh_cn/model_zoo.md | 2 +-
 .../dygraph_quant_aware_training_tutorial.md | 2 +-
 .../dygraph/dygraph_quant_post_tutorial.md | 2 +-
 .../quick_start/static/quant_aware_tutorial.md | 2 +-
 .../static/quant_post_static_tutorial.md | 2 +-
 docs/zh_cn/tutorials/nas/overview.md | 2 +-
 .../tutorials/nas/static/sanas_darts_space.md | 2 +-
 25 files changed, 53 insertions(+), 53 deletions(-)

diff --git a/README_en.md b/README_en.md
index 10785bb6..0c46a65f 100755
--- a/README_en.md
+++ b/README_en.md
@@ -82,19 +82,19 @@ pip install paddleslim==1.2.0 -i https://pypi.tuna.tsinghua.edu.cn/simple

 ## Usage

-- [QuickStart](https://paddlepaddle.github.io/PaddleSlim/quick_start/index_en.html): Introduce how to use PaddleSlim by simple examples.
+- [QuickStart](https://paddleslim.readthedocs.io/en/latest/quick_start/index_en.html): Introduce how to use PaddleSlim by simple examples.
- Dynamic graph - - Pruning: [Tutorial](dygraph_docs/), [Demo](demo/dygraph/pruning) + - Pruning: [Tutorial](https://paddleslim.readthedocs.io/en/latest/tutorials/image_classification_sensitivity_analysis_tutorial_en.html), [Demo](demo/dygraph/pruning) - Quantization: [Demo](demo/dygraph/quant) -- [Advanced Tutorials](https://paddlepaddle.github.io/PaddleSlim/tutorials/index_en.html):Tutorials about advanced usage of PaddleSlim. +- [Advanced Tutorials](https://paddleslim.readthedocs.io/en/latest/tutorials/index_en.html):Tutorials about advanced usage of PaddleSlim. -- [Model Zoo](https://paddlepaddle.github.io/PaddleSlim/model_zoo_en.html):Benchmark and pretrained models. +- [Model Zoo](https://paddleslim.readthedocs.io/en/latest/model_zoo_en.html):Benchmark and pretrained models. -- [API Documents](https://paddlepaddle.github.io/PaddleSlim/api_en/index_en.html) +- [API Documents](https://paddleslim.readthedocs.io/en/latest/api_en/index_en.html) -- [Algorithm Background](https://paddlepaddle.github.io/PaddleSlim/algo/algo.html): Introduce the background of quantization, pruning, distillation, NAS. +- [Algorithm Background](https://paddleslim.readthedocs.io/en/latest/intro_en.html): Introduce the background of quantization, pruning, distillation, NAS. - [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection/tree/master/slim): Introduce how to use PaddleSlim in PaddleDetection library. diff --git a/demo/distillation/README.md b/demo/distillation/README.md index ce3bc6fa..3951475d 100644 --- a/demo/distillation/README.md +++ b/demo/distillation/README.md @@ -4,7 +4,7 @@ ## 接口介绍 -请参考 [知识蒸馏API文档](https://paddlepaddle.github.io/PaddleSlim/api/single_distiller_api/)。 +请参考 [知识蒸馏API文档](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/dist/single_distiller_api.html)。 ### 1. 
蒸馏训练配置 @@ -37,4 +37,4 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 python distill.py 经过120轮的蒸馏训练,MobileNet模型的Top-1/Top-5准确率达到72.77%/90.68%, Top-1/Top-5性能提升+1.78%/+1.00% -详细实验数据请参见[PaddleSlim模型库蒸馏部分](https://paddlepaddle.github.io/PaddleSlim/model_zoo/#13) +详细实验数据请参见[PaddleSlim模型库蒸馏部分](https://paddleslim.readthedocs.io/zh_CN/latest/model_zoo.html#id5) diff --git a/demo/mkldnn_quant/README.md b/demo/mkldnn_quant/README.md index 0d392075..a7188155 100644 --- a/demo/mkldnn_quant/README.md +++ b/demo/mkldnn_quant/README.md @@ -38,15 +38,15 @@ import numpy as np #### 2.1 量化训练 -量化训练流程可以参考 [分类模型的量化训练流程](https://paddlepaddle.github.io/PaddleSlim/tutorials/quant_aware_demo/) +量化训练流程可以参考 [分类模型的量化训练流程](https://paddleslim.readthedocs.io/zh_CN/latest/tutorials/quant/static/quant_aware_tutorial.html) **量化训练过程中config参数:** - **quantize_op_types:** 目前CPU上量化支持的算子为 `depthwise_conv2d`, `conv2d`, `mul`, `matmul`, `transpose2`, `reshape2`, `pool2d`, `scale`, `concat`。但是在量化训练阶段插入fake_quantize/fake_dequantize算子时,只需在前四种op前后插入fake_quantize/fake_dequantize 算子,因为后面四种算子 `transpose2`, `reshape2`, `pool2d`, `scale`, `concat`的scales将从其他op的`out_threshold`属性获取。所以,在使用PaddleSlim量化训练时,只可以对 `depthwise_conv2d`, `conv2d`, `mul`, `matmul`进行量化,不支持其他op。 -- **其他参数:** 请参考 [PaddleSlim quant_aware API](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/#quant_aware) +- **其他参数:** 请参考 [PaddleSlim quant_aware API](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/quant/quantization_api.html#quant-aware) #### 2.2 离线量化 -离线量化模型产出可以参考[分类模型的静态离线量化流程](https://paddlepaddle.github.io/PaddleSlim/tutorials/quant_post_demo/#_1) +离线量化模型产出可以参考[分类模型的静态离线量化流程](https://paddleslim.readthedocs.io/zh_CN/latest/tutorials/quant/static/quant_post_tutorial.html) 在使用PaddleSlim离线量化时,只可以对 `depthwise_conv2d`, `conv2d`, `mul`, `matmul`进行量化,不支持其他op。 diff --git a/demo/mkldnn_quant/README_en.md b/demo/mkldnn_quant/README_en.md index 8031b9a1..1c344cf8 100644 --- a/demo/mkldnn_quant/README_en.md +++ b/demo/mkldnn_quant/README_en.md @@ 
-13,7 +13,7 @@ The process comprises the following steps: #### Install PaddleSlim -For PaddleSlim installation, please see [Paddle Installation Document](https://paddlepaddle.github.io/PaddleSlim/install.html) +For PaddleSlim installation, please see [Paddle Installation Document](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/quant/quantization_api.html#quant-aware) ``` git clone https://github.com/PaddlePaddle/PaddleSlim.git cd PaddleSlim @@ -34,15 +34,15 @@ One can generate fake-quantized model with post-training or quant-aware strategy #### 2.1 Quant-aware training -To generate fake quantized model with quant-aware strategy, see [Quant-aware training tutorial](https://paddlepaddle.github.io/PaddleSlim/tutorials/quant_aware_demo/) +To generate fake quantized model with quant-aware strategy, see [Quant-aware training tutorial](https://paddleslim.readthedocs.io/en/latest/quick_start/quant_aware_tutorial_en.html) **The parameters during quant-aware training:** - **quantize_op_types:** A list of operators to insert `fake_quantize` and `fake_dequantize` ops around them. In PaddlePaddle, quantization of following operators is supported for CPU: `depthwise_conv2d`, `conv2d`, `fc`, `matmul`, `transpose2`, `reshape2`, `pool2d`, `scale`, `concat`. However, inserting fake_quantize/fake_dequantize operators during training is needed only for the first four of them (`depthwise_conv2d`, `conv2d`, `fc`, `matmul`), so setting the `quantize_op_types` parameter to the list of those four ops is enough. Scala data needed for quantization of the other five operators is reused from the fake ops or gathered from the `out_threshold` attributes of the operators. 
-- **Other parameters:** Please read [PaddleSlim quant_aware API](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/#quant_aware) +- **Other parameters:** Please read [PaddleSlim quant_aware API](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/quant/quantization_api.html#quant-aware) #### 2.2 Post-training quantization -To generate post-training fake quantized model, see [Offline post-training quantization tutorial](https://paddlepaddle.github.io/PaddleSlim/tutorials/quant_post_demo/#_1) +To generate post-training fake quantized model, see [Offline post-training quantization tutorial](https://paddleslim.readthedocs.io/en/latest/quick_start/index_en.html) ## 3. Convert the fake quantized model to DNNL INT8 model In order to deploy an INT8 model on the CPU, we need to collect scales, remove all fake_quantize/fake_dequantize operators, optimize the graph and quantize it, turning it into the final DNNL INT8 model. This is done by the script [save_quant_model.py](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/contrib/slim/tests/save_quant_model.py). Copy the script to the directory where the demo is located: `/PATH_TO_PaddleSlim/demo/mkldnn_quant/` and run it as follows: @@ -55,7 +55,7 @@ python save_quant_model.py --quant_model_path=/PATH/TO/SAVE/FLOAT32/quant/MODEL - **int8_model_save_path:** The final INT8 model output path after the quant model is optimized and quantized by DNNL. - **ops_to_quantize:** A comma separated list of specified op types to be quantized. It is optional. If the option is skipped, all quantizable operators will be quantized. Skipping the option is recommended in the first approach as it usually yields best performance and accuracy for image classification models and NLP models listed in the Benchmark.. - **--op_ids_to_skip:** "A comma-separated list of operator ID numbers. It is optional. Default value is none. The op ids in this list will not be quantized and will adopt FP32 type. 
To get the ID of a specific op, first run the script using the `--debug` option, and open the generated file `int8__cpu_quantize_placement_pass.dot` to find the op that does not need to be quantified, and the ID number is in parentheses after the Op name. -- **--debug:** Generate models graph or not. If this option is present, .dot files with graphs of the model will be generated after each optimization step that modifies the graph. For the description of DOT format, please read [DOT](https://graphviz.gitlab.io/_pages/doc/info/lang.html). To open the `*.dot` file, please use any Graphviz tool available on the system(such as the `xdot` tool on Linux or the `dot` tool on Windows. For Graphviz documentation, see [Graphviz](http://www. graphviz.org/documentation/). +- **--debug:** Generate models graph or not. If this option is present, .dot files with graphs of the model will be generated after each optimization step that modifies the graph. For the description of DOT format, please read [DOT](https://graphviz.gitlab.io/_pages/doc/info/lang.html). To open the `*.dot` file, please use any Graphviz tool available on the system(such as the `xdot` tool on Linux or the `dot` tool on Windows. For Graphviz documentation, see [Graphviz](http://www.graphviz.org/documentation/).    - **Note:** - The DNNL supported quantizable ops are `conv2d`, `depthwise_conv2d`, `fc`, `matmul`, `pool2d`, `reshape2`, `transpose2`, `scale`, `concat`. 
diff --git a/demo/ocr/README.md b/demo/ocr/README.md index 959c066f..ca6e5d79 100755 --- a/demo/ocr/README.md +++ b/demo/ocr/README.md @@ -1,4 +1,4 @@ -[English](README_en.md) | 简体中文 +English | 简体中文 # SlimOCR模型库 diff --git a/demo/ofa/bert/README.md b/demo/ofa/bert/README.md index 7114b474..dd092b13 100644 --- a/demo/ofa/bert/README.md +++ b/demo/ofa/bert/README.md @@ -1,6 +1,6 @@ # OFA压缩PaddleNLP-BERT模型 -BERT-base模型是一个迁移能力很强的通用语义表示模型,但是模型中也有一些参数冗余。本教程将介绍如何使用PaddleSlim对[PaddleNLP](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/)中BERT-base模型进行压缩。 +BERT-base模型是一个迁移能力很强的通用语义表示模型,但是模型中也有一些参数冗余。本教程将介绍如何使用PaddleSlim对[PaddleNLP](https://paddlenlp.readthedocs.io/zh/latest/)中BERT-base模型进行压缩。 本教程只会演示如何快速启动相应训练,详细教程请参考: [BERT](https://github.com/PaddlePaddle/PaddleSlim/blob/release/2.0.0/docs/zh_cn/nlp/paddlenlp_slim_ofa_tutorial.md) ## 1. 压缩结果 @@ -79,8 +79,8 @@ BERT-base模型是一个迁移能力很强的通用语义表示模型,但是 14.93 - - 40 + + 40 BERT N @@ -185,7 +185,7 @@ pip install paddlepaddle_gpu>=2.0rc1 ``` ### 2.2 Fine-tuing -首先需要对Pretrain-Model在实际的下游任务上进行Fine-tuning,得到需要压缩的模型。Fine-tuning流程参考[Fine-tuning教程](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/examples/bert) +首先需要对Pretrain-Model在实际的下游任务上进行Fine-tuning,得到需要压缩的模型。Fine-tuning流程参考[Fine-tuning教程](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/language_model/bert) Fine-tuning 在dev上的结果如压缩结果表1-1『Baseline』那一列所示。 ### 2.3 压缩训练 @@ -268,10 +268,10 @@ python -u ./run_glue_ofa.py --model_type bert \ python3.7 -u ./export_model.py --model_type bert \ --model_name_or_path ${PATH_OF_QQP_MODEL_AFTER_OFA} \ --max_seq_length 128 \ - --sub_model_output_dir ./tmp/$TASK_NAME/dynamic_model \ + --sub_model_output_dir ./tmp/$TASK_NAME/dynamic_model \ --static_sub_model ./tmp/$TASK_NAME/static_model \ - --n_gpu 1 \ - --width_mult 0.6666666666666666 + --n_gpu 1 \ + --width_mult 0.6666666666666666 ``` 其中参数释义如下: diff --git a/demo/pantheon/lexical_anlysis/README.md b/demo/pantheon/lexical_anlysis/README.md index 
ec3af05d..1287e17a 100644 --- a/demo/pantheon/lexical_anlysis/README.md +++ b/demo/pantheon/lexical_anlysis/README.md @@ -11,7 +11,7 @@ BiGRU is to train a BiGRU based LAC model from scratch; BERT fine-tuned is to fi ## Introduction -Lexical Analysis of Chinese, or LAC for short, is a lexical analysis model that completes the tasks of Chinese word segmentation, part-of-speech tagging, and named entity recognition in a single model. We conduct an overall evaluation of word segmentation, part-of-speech tagging, and named entity recognition on a self-built dataset. We use the finetuned [ERNIE](https://github.com/PaddlePaddle/LARK/tree/develop/ERNIE) model as the Teacher model and GRU as the Student model, which are needed by the Pantheon framework for online distillation. +Lexical Analysis of Chinese, or LAC for short, is a lexical analysis model that completes the tasks of Chinese word segmentation, part-of-speech tagging, and named entity recognition in a single model. We conduct an overall evaluation of word segmentation, part-of-speech tagging, and named entity recognition on a self-built dataset. We use the finetuned ERNIE model as the Teacher model and GRU as the Student model, which are needed by the Pantheon framework for online distillation. #### 1. 
Download the training data set @@ -37,4 +37,4 @@ bash run_teacher.sh bash run_student.sh ``` -> If you want to learn more about LAC, you can refer to this repo: https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/lexical_analysis \ No newline at end of file +> If you want to learn more about LAC, you can refer to this repo: https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/lexical_analysis diff --git a/demo/pantheon/lexical_anlysis/README_cn.md b/demo/pantheon/lexical_anlysis/README_cn.md index 77e4a944..98691aaa 100644 --- a/demo/pantheon/lexical_anlysis/README_cn.md +++ b/demo/pantheon/lexical_anlysis/README_cn.md @@ -12,7 +12,7 @@ BiGRU 是使用双向GRU网络从头训练LAC任务;BERT fine-tuned 是在BERT ## 简介 -Lexical Analysis of Chinese,简称 LAC,是一个联合的词法分析模型,在单个模型中完成中文分词、词性标注、专名识别任务。我们在自建的数据集上对分词、词性标注、专名识别进行整体的评估效果。我们使用经过finetune的 [ERNIE](https://github.com/PaddlePaddle/LARK/tree/develop/ERNIE) 模型作为Teacher模型,使用GRU作为Student模型,使用Pantheon框架进行在线蒸馏。 +Lexical Analysis of Chinese,简称 LAC,是一个联合的词法分析模型,在单个模型中完成中文分词、词性标注、专名识别任务。我们在自建的数据集上对分词、词性标注、专名识别进行整体的评估效果。我们使用经过finetune的 ERNIE 模型作为Teacher模型,使用GRU作为Student模型,使用Pantheon框架进行在线蒸馏。 #### 1. 下载训练数据集 diff --git a/demo/prune/README.md b/demo/prune/README.md index c31f1791..691919d3 100755 --- a/demo/prune/README.md +++ b/demo/prune/README.md @@ -69,7 +69,7 @@ python eval.py \ ## 5. 
接口介绍 -该示例使用了`paddleslim.Pruner`工具类,用户接口使用介绍请参考:[API文档](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/) +该示例使用了`paddleslim.Pruner`工具类,用户接口使用介绍请参考:[API文档](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/index.html) 在调用`paddleslim.Pruner`工具类时,需要指定待裁卷积层的参数名称。不同模型的参数命名不同, 在`train.py`脚本中,提供了`get_pruned_params`方法,根据用户设置的选项`--model`确定要裁剪的参数。 diff --git a/demo/quant/deploy/TensorRT/README.md b/demo/quant/deploy/TensorRT/README.md index 2f133b01..ba3eb0a5 100644 --- a/demo/quant/deploy/TensorRT/README.md +++ b/demo/quant/deploy/TensorRT/README.md @@ -79,7 +79,7 @@ LIB_ROOT/ ### 2.1 将模型导出为inference model -* 可以参考[量化训练教程](https://paddleslim.readthedocs.io/zh_CN/latest/quick_start/quant_aware_tutorial.html#id9),在训练完成后导出inference model。 +* 可以参考[量化训练教程](https://paddleslim.readthedocs.io/zh_CN/latest/tutorials/quant/index.html),在训练完成后导出inference model。 ``` inference/ diff --git a/docs/en/model_zoo_en.md b/docs/en/model_zoo_en.md index 89a83c63..46608ea0 100644 --- a/docs/en/model_zoo_en.md +++ b/docs/en/model_zoo_en.md @@ -134,7 +134,7 @@ Dataset:WIDER-FACE | BlazeFace | quant_post | 8 | 640 | 87.8/85.1/74.9 (-3.7/-4.1/-4.8) | 228 | [model](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_origin_quant_post.tar) | | BlazeFace | quant_aware | 8 | 640 | 90.5/87.9/77.6 (-1.0/-1.3/-2.1) | 228 | [model](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_origin_quant_aware.tar) | | BlazeFace-Lite | - | 8 | 640 | 90.9/88.5/78.1 | 711 | [model](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_lite.tar) | -| BlazeFace-Lite | quant_post | 8 | 640 | 89.4/86.7/75.7 (-1.5/-1.8/-2.4) | 211 | [model]((https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_lite_quant_post.tar)) | +| BlazeFace-Lite | quant_post | 8 | 640 | 89.4/86.7/75.7 (-1.5/-1.8/-2.4) | 211 | [model](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_lite_quant_post.tar) | | BlazeFace-Lite | quant_aware | 8 | 640 | 89.7/87.3/77.0 (-1.2/-1.2/-1.1) | 211 | 
[model](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_lite_quant_aware.tar) | | BlazeFace-NAS | - | 8 | 640 | 83.7/80.7/65.8 | 244 | [model](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_nas.tar) | | BlazeFace-NAS | quant_post | 8 | 640 | 81.6/78.3/63.6 (-2.1/-2.4/-2.2) | 71 | [model](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_nas_quant_post.tar) | diff --git a/docs/en/quick_start/nas_tutorial_en.md b/docs/en/quick_start/nas_tutorial_en.md index 622c8224..51065bf9 100644 --- a/docs/en/quick_start/nas_tutorial_en.md +++ b/docs/en/quick_start/nas_tutorial_en.md @@ -1,6 +1,6 @@ # Nerual Architecture Search for Image Classification -This tutorial shows how to use [API](../api/nas_api.md) about SANAS in PaddleSlim. We start experiment based on MobileNetV2 as example. The tutorial contains follow section. +This tutorial shows how to use [API](https://paddleslim.readthedocs.io/en/latest/api_en/paddleslim.nas.html) about SANAS in PaddleSlim. We start experiment based on MobileNetV2 as example. The tutorial contains follow section. 1. necessary imports 2. initial SANAS instance diff --git a/docs/en/quick_start/quant_aware_tutorial_en.md b/docs/en/quick_start/quant_aware_tutorial_en.md index ada6e6ea..0f44f489 100644 --- a/docs/en/quick_start/quant_aware_tutorial_en.md +++ b/docs/en/quick_start/quant_aware_tutorial_en.md @@ -1,6 +1,6 @@ # Training-aware Quantization of image classification model - quick start -This tutorial shows how to do training-aware quantization using [API](https://paddlepaddle.github.io/PaddleSlim/api_en/paddleslim.quant.html#paddleslim.quant.quanter.quant_aware) in PaddleSlim. We use MobileNetV1 to train image classification model as example. The tutorial contains follow sections: +This tutorial shows how to do training-aware quantization using [API](https://paddleslim.readthedocs.io/en/latest/api_en/index_en.html) in PaddleSlim. We use MobileNetV1 to train image classification model as example. 
The tutorial contains follow sections: 1. Necessary imports 2. Model architecture @@ -90,7 +90,7 @@ test(val_program) ## 4. Quantization -We call ``quant_aware`` API to add quantization and dequantization operators in ``train_program`` and ``val_program`` according to [default configuration](https://paddlepaddle.github.io/PaddleSlim/api_cn/quantization_api.html#id2). +We call ``quant_aware`` API to add quantization and dequantization operators in ``train_program`` and ``val_program`` according to [default configuration](https://paddleslim.readthedocs.io/en/latest/api_en/paddleslim.quant.html). ```python quant_program = slim.quant.quant_aware(train_program, exe.place, for_test=False) @@ -115,7 +115,7 @@ test(val_quant_program) ## 6. Save model after quantization -The model in ``4. Quantization`` after calling ``slim.quant.quant_aware`` API is only suitable to train. To get the inference model, we should use [slim.quant.convert](https://paddlepaddle.github.io/PaddleSlim/api_en/paddleslim.quant.html#paddleslim.quant.quanter.convert) API to change model architecture and use [fluid.io.save_inference_model](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/io_cn/save_inference_model_cn.html#save-inference-model) to save model. ``float_prog``'s parameters are float32 dtype but in int8's range which can be used in ``fluid`` or ``paddle-lite``. ``paddle-lite`` will change the parameters' dtype from float32 to int8 first when loading the inference model. ``int8_prog``'s parameters are int8 dtype and we can get model size after quantization by saving it. ``int8_prog`` cannot be used in ``fluid`` or ``paddle-lite``. +The model in ``4. Quantization`` after calling ``slim.quant.quant_aware`` API is only suitable to train. 
To get the inference model, we should use [slim.quant.convert](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/quant/quantization_api.html#convert) API to change model architecture and use [fluid.io.save_inference_model](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/api_cn/io_cn/save_inference_model_cn.html#save-inference-model) to save model. ``float_prog``'s parameters are float32 dtype but in int8's range which can be used in ``fluid`` or ``paddle-lite``. ``paddle-lite`` will change the parameters' dtype from float32 to int8 first when loading the inference model. ``int8_prog``'s parameters are int8 dtype and we can get model size after quantization by saving it. ``int8_prog`` cannot be used in ``fluid`` or ``paddle-lite``. ```python diff --git a/docs/en/quick_start/quant_post_static_tutorial_en.md b/docs/en/quick_start/quant_post_static_tutorial_en.md index fd7f8508..6272dccc 100644 --- a/docs/en/quick_start/quant_post_static_tutorial_en.md +++ b/docs/en/quick_start/quant_post_static_tutorial_en.md @@ -1,6 +1,6 @@ # Post-training Quantization of image classification model - quick start -This tutorial shows how to do post training quantization using [API](https://paddlepaddle.github.io/PaddleSlim/api_en/paddleslim.quant.html#paddleslim.quant.quanter.quant_post) in PaddleSlim. We use MobileNetV1 to train image classification model as example. The tutorial contains follow sections: +This tutorial shows how to do post training quantization using [API](https://paddleslim.readthedocs.io/en/latest/api_en/index_en.html) in PaddleSlim. We use MobileNetV1 to train image classification model as example. The tutorial contains follow sections: 1. Necessary imports 2. 
Model architecture diff --git a/docs/en/tutorials/image_classification_sensitivity_analysis_tutorial_en.md b/docs/en/tutorials/image_classification_sensitivity_analysis_tutorial_en.md index 043e144a..1df6817d 100644 --- a/docs/en/tutorials/image_classification_sensitivity_analysis_tutorial_en.md +++ b/docs/en/tutorials/image_classification_sensitivity_analysis_tutorial_en.md @@ -1,6 +1,6 @@ # Pruning of image classification model - sensitivity -In this tutorial, you will learn how to use [sensitivity API of PaddleSlim](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/#sensitivity) by a demo of MobileNetV1 model on MNIST dataset。 +In this tutorial, you will learn how to use [sensitivity API of PaddleSlim](https://paddleslim.readthedocs.io/en/latest/api_en/index_en.html) by a demo of MobileNetV1 model on MNIST dataset。 This tutorial following workflow: 1. Import dependency @@ -107,7 +107,7 @@ params = params[:5] ### 7.1 Compute in single process -Apply sensitivity analysis on pretrained model by calling [sensitivity API](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/#sensitivity). +Apply sensitivity analysis on pretrained model by calling [sensitivity API](https://paddleslim.readthedocs.io/en/latest/api_en/index_en.html). The sensitivities will be appended into the file given by option `sensitivities_file` during computing. The information in this file won`t be computed repeatedly. @@ -197,7 +197,7 @@ Pruning model according to the sensitivities generated in section 7.3.3. 
### 8.1 Get pruning ratios -Get a group of ratios by calling [get_ratios_by_loss](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/#get_ratios_by_loss) fuction: +Get a group of ratios by calling [get_ratios_by_loss](https://paddleslim.readthedocs.io/en/latest/api_en/index_en.html) fuction: ```python @@ -223,7 +223,7 @@ print("FLOPs after pruning: {}".format(slim.analysis.flops(pruned_program))) ### 8.3 Pruning test network -Note:The `only_graph` should be set to True while pruning test network. [Pruner API](https://paddlepaddle.github.io/PaddleSlim/api/prune_api/#pruner) +Note:The `only_graph` should be set to True while pruning test network. [Pruner API](https://paddleslim.readthedocs.io/en/latest/api_en/index_en.html) ```python diff --git a/docs/zh_cn/FAQ/quantization_FAQ.md b/docs/zh_cn/FAQ/quantization_FAQ.md index 8ac10e43..24a9dead 100644 --- a/docs/zh_cn/FAQ/quantization_FAQ.md +++ b/docs/zh_cn/FAQ/quantization_FAQ.md @@ -18,9 +18,9 @@ - 如果量化模型在ARM上线,则需要使用[Paddle-Lite](https://paddle-lite.readthedocs.io/zh/latest/index.html). - - Paddle-Lite会对量化模型进行模型转化和优化,转化方法见[链接](https://paddle-lite.readthedocs.io/zh/latest/user_guides/model_quantization.html#paddle-lite)。 + - Paddle-Lite会对量化模型进行模型转化和优化,转化方法见[链接](https://paddle-lite.readthedocs.io/zh/latest/index.html#sec-user-guides)。 - - 转化之后可以像非量化模型一样使用[Paddle-Lite API](https://paddle-lite.readthedocs.io/zh/latest/user_guides/tutorial.html#lite)进行加载预测。 + - 转化之后可以像非量化模型一样使用[Paddle-Lite API](https://paddle-lite.readthedocs.io/zh/latest/index.html)进行加载预测。 - 如果量化模型在GPU上线,则需要使用[Paddle-TensorRT 预测接口](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/advanced_guide/performance_improving/inference_improving/paddle_tensorrt_infer.html). 
@@ -35,7 +35,7 @@ config->EnableTensorRtEngine(1 << 20 /* workspace_size*/, false /* use_calib_mode*/); ``` -- 如果量化模型在x86上线,需要使用[INT8 MKL-DNN](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/contrib/slim/tests/slim_int8_mkldnn_post_training_quantization.md) +- 如果量化模型在x86上线,需要使用[INT8 MKL-DNN](https://github.com/PaddlePaddle/Paddle/tree/develop/python/paddle/fluid/contrib/slim/tests) - 首先对模型进行转化,可以参考[脚本](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/contrib/slim/tests/save_quant_model.py) diff --git a/docs/zh_cn/cv/detection/static/paddledetection_slim_quantization_tutorial.md b/docs/zh_cn/cv/detection/static/paddledetection_slim_quantization_tutorial.md index 63e17a15..eb59c4e8 100644 --- a/docs/zh_cn/cv/detection/static/paddledetection_slim_quantization_tutorial.md +++ b/docs/zh_cn/cv/detection/static/paddledetection_slim_quantization_tutorial.md @@ -16,9 +16,9 @@ | MobileNetV1 | ImageNet | post | 608 | 27.9 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_mobilenetv1_coco_quant_post.tar) | | MobileNetV1 | ImageNet | post | 416 | 28.0 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_mobilenetv1_coco_quant_post.tar) | | MobileNetV1 | ImageNet | post | 320 | 26.0 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_mobilenetv1_coco_quant_post.tar) | -| MobileNetV1 | ImageNet | aware | 608 | 28.1 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_mobilenetv1_coco_quant_aware.tar) | -| MobileNetV1 | ImageNet | aware | 416 | 28.2 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_mobilenetv1_coco_quant_aware.tar) | -| MobileNetV1 | ImageNet | aware | 320 | 25.8 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_mobilenetv1_coco_quant_aware.tar) | +| MobileNetV1 | ImageNet | aware | 608 | 28.1 | 下载链接 | +| MobileNetV1 | ImageNet | aware | 416 | 28.2 | 下载链接 | +| MobileNetV1 | ImageNet | aware | 320 | 25.8 | 下载链接 | | ResNet34 | ImageNet | post | 608 | 35.7 | 
[下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_r34_coco_quant_post.tar) |
 | ResNet34 | ImageNet | aware | 608 | 35.2 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_r34_coco_quant_aware.tar) |
 | ResNet34 | ImageNet | aware | 416 | 33.3 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/yolov3_r34_coco_quant_aware.tar) |
diff --git a/docs/zh_cn/deploy/deploy_cls_model_on_nvidia_gpu.md b/docs/zh_cn/deploy/deploy_cls_model_on_nvidia_gpu.md
index 3bb6cc51..999bbe86 120000
--- a/docs/zh_cn/deploy/deploy_cls_model_on_nvidia_gpu.md
+++ b/docs/zh_cn/deploy/deploy_cls_model_on_nvidia_gpu.md
@@ -1 +1 @@
-../../../demo/quant/deploy/TensorRT/README.md
\ No newline at end of file
+../../../demo/quant/deploy/TensorRT/README.md
diff --git a/docs/zh_cn/model_zoo.md b/docs/zh_cn/model_zoo.md
index a662fb2a..a2fa07cb 100644
--- a/docs/zh_cn/model_zoo.md
+++ b/docs/zh_cn/model_zoo.md
@@ -142,7 +142,7 @@ Note: MobileNetV2_NAS 的token是:[4, 4, 5, 1, 1, 2, 1, 1, 0, 2, 6,
 | BlazeFace | quant_post | 8 | 640 | 87.8/85.1/74.9 (-3.7/-4.1/-4.8) | 228 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_origin_quant_post.tar) |
 | BlazeFace | quant_aware | 8 | 640 | 90.5/87.9/77.6 (-1.0/-1.3/-2.1) | 228 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_origin_quant_aware.tar) |
 | BlazeFace-Lite | - | 8 | 640 | 90.9/88.5/78.1 | 711 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_lite.tar) |
-| BlazeFace-Lite | quant_post | 8 | 640 | 89.4/86.7/75.7 (-1.5/-1.8/-2.4) | 211 | [下载链接]((https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_lite_quant_post.tar)) |
+| BlazeFace-Lite | quant_post | 8 | 640 | 89.4/86.7/75.7 (-1.5/-1.8/-2.4) | 211 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_lite_quant_post.tar) |
 | BlazeFace-Lite | quant_aware | 8 | 640 | 89.7/87.3/77.0 (-1.2/-1.2/-1.1) | 211 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_lite_quant_aware.tar) |
 | BlazeFace-NAS | - | 8 | 640 | 83.7/80.7/65.8 | 244 | [下载链接](https://paddlemodels.bj.bcebos.com/object_detection/blazeface_nas.tar) |
 | BlazeFace-NAS | quant_post | 8 | 640 | 81.6/78.3/63.6 (-2.1/-2.4/-2.2) | 71 | [下载链接](https://paddlemodels.bj.bcebos.com/PaddleSlim/blazeface_nas_quant_post.tar) |
diff --git a/docs/zh_cn/quick_start/dygraph/dygraph_quant_aware_training_tutorial.md b/docs/zh_cn/quick_start/dygraph/dygraph_quant_aware_training_tutorial.md
index 8dda87f7..887d8f41 100644
--- a/docs/zh_cn/quick_start/dygraph/dygraph_quant_aware_training_tutorial.md
+++ b/docs/zh_cn/quick_start/dygraph/dygraph_quant_aware_training_tutorial.md
@@ -116,7 +116,7 @@ quanter.save_quantized_model(
 导出的量化模型相比原始FP32模型,模型体积没有明显差别,这是因为量化预测模型中的权重依旧保存为FP32类型。在部署时,使用PaddleLite opt工具转换量化预测模型后,模型体积才会真实减小。
 部署参考文档:
-* 部署[文档](../../deploy/index.html)
+* 部署[文档](https://paddleslim.readthedocs.io/zh_CN/latest/deploy/index.html)
 * PaddleLite部署量化模型[文档](https://paddle-lite.readthedocs.io/zh/latest/user_guides/quant_aware.html)
 * PaddleInference Intel CPU部署量化模型[文档](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_x86_cpu_int8.html)
 * PaddleInference NV GPU部署量化模型[文档](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html)
diff --git a/docs/zh_cn/quick_start/dygraph/dygraph_quant_post_tutorial.md b/docs/zh_cn/quick_start/dygraph/dygraph_quant_post_tutorial.md
index 4281d0a6..77cd49aa 100644
--- a/docs/zh_cn/quick_start/dygraph/dygraph_quant_post_tutorial.md
+++ b/docs/zh_cn/quick_start/dygraph/dygraph_quant_post_tutorial.md
@@ -91,7 +91,7 @@ paddleslim.quant.quant_post_static(
 导出的量化模型相比原始FP32模型,模型体积没有明显差别,这是因为量化预测模型中的权重依旧保存为FP32类型。在部署时,使用PaddleLite opt工具转换量化预测模型后,模型体积才会真实减小。
 部署参考文档:
-* 部署[文档](../../deploy/index.html)
+* 部署[文档](https://paddleslim.readthedocs.io/zh_CN/latest/deploy/index.html)
 * PaddleLite部署量化模型[文档](https://paddle-lite.readthedocs.io/zh/latest/user_guides/quant_aware.html)
 * PaddleInference Intel CPU部署量化模型[文档](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_x86_cpu_int8.html)
 * PaddleInference NV GPU部署量化模型[文档](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html)
diff --git a/docs/zh_cn/quick_start/static/quant_aware_tutorial.md b/docs/zh_cn/quick_start/static/quant_aware_tutorial.md
index 998e8d82..f4cd4b71 100644
--- a/docs/zh_cn/quick_start/static/quant_aware_tutorial.md
+++ b/docs/zh_cn/quick_start/static/quant_aware_tutorial.md
@@ -176,7 +176,7 @@ paddle.static.save_inference_model(
 保存的量化模型相比原始FP32模型,模型体积没有明显差别,这是因为量化预测模型中的权重依旧保存为FP32类型。在部署时,使用PaddleLite opt工具转换量化预测模型后,模型体积才会真实减小。
 部署参考文档:
-* 部署[简介](../../deploy/index.html)
+* 部署[简介](https://paddleslim.readthedocs.io/zh_CN/latest/deploy/index.html)
 * PaddleLite部署量化模型[文档](https://paddle-lite.readthedocs.io/zh/latest/user_guides/quant_aware.html)
 * PaddleInference Intel CPU部署量化模型[文档](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_x86_cpu_int8.html)
 * PaddleInference NV GPU部署量化模型[文档](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html)
diff --git a/docs/zh_cn/quick_start/static/quant_post_static_tutorial.md b/docs/zh_cn/quick_start/static/quant_post_static_tutorial.md
index f0164987..e083af5f 100755
--- a/docs/zh_cn/quick_start/static/quant_post_static_tutorial.md
+++ b/docs/zh_cn/quick_start/static/quant_post_static_tutorial.md
@@ -168,7 +168,7 @@ test(quant_post_static_prog, fetch_targets)
 保存的量化模型相比原始FP32模型,模型体积没有明显差别,这是因为量化预测模型中的权重依旧保存为FP32类型。在部署时,使用PaddleLite opt工具转换量化预测模型后,模型体积才会真实减小。
 部署参考文档:
-* 部署[简介](../../deploy/index.html)
+* 部署[简介](https://paddleslim.readthedocs.io/zh_CN/latest/deploy/index.html)
 * PaddleLite部署量化模型[文档](https://paddle-lite.readthedocs.io/zh/latest/user_guides/quant_aware.html)
 * PaddleInference Intel CPU部署量化模型[文档](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_x86_cpu_int8.html)
 * PaddleInference NV GPU部署量化模型[文档](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html)
diff --git a/docs/zh_cn/tutorials/nas/overview.md b/docs/zh_cn/tutorials/nas/overview.md
index 10556c21..67d084c5 100644
--- a/docs/zh_cn/tutorials/nas/overview.md
+++ b/docs/zh_cn/tutorials/nas/overview.md
@@ -7,7 +7,7 @@ PaddleSlim提供了4种网络结构搜索的方法:基于模拟退火进行网
 | [Once-For-All](https://paddleslim.readthedocs.io/zh_CN/latest/tutorials/nas/dygraph/nas_ofa.html) | OFA是一种基于One-Shot NAS的压缩方案。这种方式比较高效,其优势是只需要训练一个超网络就可以从中选择满足不同延时要求的子模型。 | Once-For-All |
 | [SANAS](https://paddleslim.readthedocs.io/zh_CN/latest/quick_start/static/nas_tutorial.html) | SANAS是基于模拟退火的方式进行网络结构搜索,在机器资源不多的情况下,选择这种方式一般能得到比强化学习更好的模型。 | \ |
 | [RLNAS](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/nas/nas_api.html#rlnas) | RLNAS是基于强化学习的方式进行网络结构搜索,这种方式需要耗费大量机器资源。 | ENAS、NasNet、MNasNet |
-| [DARTS](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/darts.html) | DARTS是基于梯度的方式进行网络结构搜索,可以大大缩短搜索时长。 | DARTS、PCDARTS |
+| [DARTS](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/index.html) | DARTS是基于梯度的方式进行网络结构搜索,可以大大缩短搜索时长。 | DARTS、PCDARTS |
 ## 参考文献
 [1] H. Cai, C. Gan, T. Wang, Z. Zhang, and S. Han. Once for all: Train one network and specialize it for efficient deployment. In International Conference on Learning Representations, 2020.
diff --git a/docs/zh_cn/tutorials/nas/static/sanas_darts_space.md b/docs/zh_cn/tutorials/nas/static/sanas_darts_space.md
index b280db4e..90ee4750 100644
--- a/docs/zh_cn/tutorials/nas/static/sanas_darts_space.md
+++ b/docs/zh_cn/tutorials/nas/static/sanas_darts_space.md
@@ -232,7 +232,7 @@ exe.run(startup_program)
 ```
 #### 9.5 定义输入数据
-由于本示例中对cifar10中的图片进行了一些额外的预处理操作,和[快速开始](https://paddlepaddle.github.io/PaddleSlim/quick_start/nas_tutorial.html)示例中的reader不同,所以需要自定义cifar10的reader,不能直接调用paddle中封装好的`paddle.dataset.cifar10`的reader。自定义cifar10的reader文件位于[demo/nas](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/demo/nas/darts_cifar10_reader.py)中。
+由于本示例中对cifar10中的图片进行了一些额外的预处理操作,和[快速开始](https://paddleslim.readthedocs.io/zh_CN/latest/deploy/index.html)示例中的reader不同,所以需要自定义cifar10的reader,不能直接调用paddle中封装好的`paddle.dataset.cifar10`的reader。自定义cifar10的reader文件位于[demo/nas](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/demo/nas/darts_cifar10_reader.py)中。
 **注意:**本示例为了简化代码直接调用`paddle.dataset.cifar10`定义训练数据和预测数据,实际训练需要使用自定义cifar10文件中的reader。
 ```python
-- 
GitLab