PaddlePaddle / models
Commit fa7227e5 (unverified)
Authored by Guanghua Yu on Dec 31, 2021; committed via GitHub on Dec 31, 2021
Parent commit: 4f732271

add ptq docs and demo (#5451)

* add ptq docs and demo
* fix readme
* update readme

8 changed files: +665 additions, −7 deletions
Changed files:
- tutorials/mobilenetv3_prod/Step6/deploy/ptq_python/README.md (+173, −0)
- tutorials/mobilenetv3_prod/Step6/deploy/ptq_python/eval.py (+139, −0)
- tutorials/mobilenetv3_prod/Step6/deploy/ptq_python/post_quant.py (+113, −0)
- tutorials/tipc/images/post_training_quant_guide.png (+0, −0)
- tutorials/tipc/kl_infer_python/kl_infer_python.md (+0, −7)
- tutorials/tipc/ptq_infer_python/README.md (+0, −0)
- tutorials/tipc/ptq_infer_python/ptq_infer_python.md (+240, −0)
- tutorials/tipc/ptq_infer_python/test_ptq_infer_python.md (+0, −0)
tutorials/mobilenetv3_prod/Step6/deploy/ptq_python/README.md
# MobileNetV3

## Table of Contents

- [1. Introduction](#1)
- [2. Post-Training Quantization](#2)
    - [2.1 Prepare the Inference Model and Environment](#2.1)
    - [2.2 Run Post-Training Quantization](#2.2)
    - [2.3 Verify the Inference Results](#2.3)
- [3. FAQ](#3)
<a name="1"></a>

## 1. Introduction

Static post-training quantization in Paddle uses a small amount of calibration data to compute quantization factors, so an FP32 model can be quickly quantized into a low-bit model (most commonly int8). Running inference with the quantized model reduces computation, memory usage, and model size.

This document walks through post-training quantization of Paddle's MobileNetV3 model.

For more details on post-training quantization of Paddle models, see the [official Paddle post-training quantization tutorial](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/api_cn/static/quant/quantization_api.rst#quant_post_static).
<a name="2"></a>

## 2. Post-Training Quantization

<a name="2.1"></a>

### 2.1 Prepare the Inference Model and Environment

Post-training quantization works directly on an inference model and does not depend on the model-building code, so an inference model must be prepared in advance.

A MobileNetV3-small inference model (exported via dynamic-to-static conversion) is provided and can be downloaded directly from [mobilenet_v3_small_infer](https://paddle-model-ecology.bj.bcebos.com/model/mobilenetv3_reprod/mobilenet_v3_small_infer.tar):

```shell
wget https://paddle-model-ecology.bj.bcebos.com/model/mobilenetv3_reprod/mobilenet_v3_small_infer.tar
tar -xf mobilenet_v3_small_infer.tar
```

Alternatively, you can convert the MobileNetV3-small model into an inference model yourself by following the [MobileNetV3 dynamic-to-static guide](xxx).
Environment setup:

- Install PaddleSlim:

```shell
pip install paddleslim==2.2.1
```

- Install PaddlePaddle:

```shell
pip install paddlepaddle-gpu==2.2.1.post112 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
```

- Prepare the data:

See the [data preparation guide](https://github.com/PaddlePaddle/models/tree/release/2.2/tutorials/mobilenetv3_prod/Step6#32-%E5%87%86%E5%A4%87%E6%95%B0%E6%8D%AE).
<a name="2.2"></a>

### 2.2 Run Post-Training Quantization

Start post-training quantization:

```bash
python post_quant.py \
    --model_path=mobilenet_v3_small_infer/ \
    --model_filename=inference.pdmodel \
    --params_filename=inference.pdiparams \
    --data_dir=/path/dataset/ILSVRC2012/ \
    --use_gpu=True \
    --batch_size=32 \
    --batch_num=20
```
Part of the post-training quantization log is shown below:
```
Thu Dec 30 12:36:17-INFO: Collect quantized variable names ...
Thu Dec 30 12:36:17-INFO: Preparation stage ...
Thu Dec 30 12:36:27-INFO: Run batch: 0
Thu Dec 30 12:37:10-INFO: Run batch: 5
Thu Dec 30 12:37:43-INFO: Finish preparation stage, all batch:10
Thu Dec 30 12:37:43-INFO: Sampling stage ...
Thu Dec 30 12:38:10-INFO: Run batch: 0
Thu Dec 30 12:39:03-INFO: Run batch: 5
Thu Dec 30 12:39:46-INFO: Finish sampling stage, all batch: 10
Thu Dec 30 12:39:46-INFO: Calculate hist threshold ...
Thu Dec 30 12:39:47-INFO: Update the program ...
Thu Dec 30 12:39:49-INFO: The quantized model is saved in output/mv3_int8_infer
```
After post-training quantization finishes, the quantized inference model is generated under `output_dir` (here, `output/mv3_int8_infer`).

<a name="2.3"></a>
### 2.3 Verify the Inference Results

- Rename the quantized inference model files:

The quantized model is saved as `__model__` and `__params__`; rename `__model__` to `inference.pdmodel` and `__params__` to `inference.pdiparams`.

The correct layout looks like this:

```shell
output/mv3_int8_infer/
    |----inference.pdiparams : model parameters file (originally __params__)
    |----inference.pdmodel   : model structure file (originally __model__)
```
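If you prefer to script the renaming, the following is a minimal sketch (it assumes the quantized model was written to `output/mv3_int8_infer`, as in the log above):

```python
# A minimal sketch of the renaming step; adjust quant_dir if you used a different output_dir.
import os

quant_dir = "output/mv3_int8_infer"
os.rename(os.path.join(quant_dir, "__model__"),
          os.path.join(quant_dir, "inference.pdmodel"))
os.rename(os.path.join(quant_dir, "__params__"),
          os.path.join(quant_dir, "inference.pdiparams"))
```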
- Use Paddle Inference to check that the quantized model produces correct predictions:

For the detailed test procedure, see the [Paddle Inference guide](https://github.com/PaddlePaddle/models/blob/release/2.2/tutorials/mobilenetv3_prod/Step6/deploy/inference_python/README.md).

If you also want to verify the accuracy of the quantized model on the full validation set, follow the steps below.

Evaluate the accuracy of the MobileNetV3-small models with the following commands:

- FP32 model:
```bash
python eval.py \
    --model_path=mobilenet_v3_small_infer/ \
    --model_filename=inference.pdmodel \
    --params_filename=inference.pdiparams \
    --data_dir=/path/dataset/ILSVRC2012/ \
    --batch_size=128 \
    --use_gpu=True
```
The FP32 model accuracy log looks like this:
```
batch_id 300, acc1 0.602, acc5 0.825, avg time 0.00005 sec/img
batch_id 310, acc1 0.602, acc5 0.825, avg time 0.00005 sec/img
batch_id 320, acc1 0.602, acc5 0.825, avg time 0.00005 sec/img
batch_id 330, acc1 0.602, acc5 0.825, avg time 0.00005 sec/img
batch_id 340, acc1 0.601, acc5 0.825, avg time 0.00005 sec/img
batch_id 350, acc1 0.601, acc5 0.825, avg time 0.00005 sec/img
batch_id 360, acc1 0.602, acc5 0.826, avg time 0.00005 sec/img
batch_id 370, acc1 0.602, acc5 0.826, avg time 0.00005 sec/img
batch_id 380, acc1 0.602, acc5 0.825, avg time 0.00005 sec/img
batch_id 390, acc1 0.601, acc5 0.825, avg time 0.00005 sec/img
End test: test image 50000.0
test_acc1 0.6015, test_acc5 0.8253, avg time 0.00005 sec/img
```
- Quantized model:
```shell
python eval.py \
    --model_path=output/mv3_int8_infer/ \
    --model_filename=__model__ \
    --params_filename=__params__ \
    --data_dir=/path/dataset/ILSVRC2012/ \
    --batch_size=128 \
    --use_gpu=True
```
The quantized model accuracy log looks like this:
```
batch_id 300, acc1 0.564, acc5 0.800, avg time 0.00006 sec/img
batch_id 310, acc1 0.562, acc5 0.798, avg time 0.00006 sec/img
batch_id 320, acc1 0.560, acc5 0.796, avg time 0.00006 sec/img
batch_id 330, acc1 0.556, acc5 0.792, avg time 0.00006 sec/img
batch_id 340, acc1 0.554, acc5 0.792, avg time 0.00006 sec/img
batch_id 350, acc1 0.552, acc5 0.790, avg time 0.00006 sec/img
batch_id 360, acc1 0.550, acc5 0.789, avg time 0.00006 sec/img
batch_id 370, acc1 0.551, acc5 0.789, avg time 0.00006 sec/img
batch_id 380, acc1 0.551, acc5 0.789, avg time 0.00006 sec/img
batch_id 390, acc1 0.553, acc5 0.790, avg time 0.00006 sec/img
End test: test image 50000.0
test_acc1 0.5530, test_acc5 0.7905, avg time 0.00006 sec/img
```
<a name="3"></a>

## 3. FAQ
tutorials/mobilenetv3_prod/Step6/deploy/ptq_python/eval.py
new file (mode 100644)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import numpy as np
import time
import sys
import argparse
import math

sys.path[0] = os.path.join(
    os.path.dirname("__file__"), os.path.pardir, os.path.pardir)

import paddle
import paddle.inference as paddle_infer
from presets import ClassificationPresetEval
import paddlevision


def eval():
    # create predictor
    model_file = os.path.join(FLAGS.model_path, FLAGS.model_filename)
    params_file = os.path.join(FLAGS.model_path, FLAGS.params_filename)
    config = paddle_infer.Config(model_file, params_file)
    if FLAGS.use_gpu:
        config.enable_use_gpu(1000, 0)
    if not FLAGS.ir_optim:
        config.switch_ir_optim(False)

    predictor = paddle_infer.create_predictor(config)

    input_names = predictor.get_input_names()
    input_handle = predictor.get_input_handle(input_names[0])
    output_names = predictor.get_output_names()
    output_handle = predictor.get_output_handle(output_names[0])

    # prepare data
    resize_size, crop_size = (256, 224)
    val_dataset = paddlevision.datasets.ImageFolder(
        os.path.join(FLAGS.data_dir, 'val'),
        ClassificationPresetEval(
            crop_size=crop_size, resize_size=resize_size))

    eval_loader = paddle.io.DataLoader(
        val_dataset, batch_size=FLAGS.batch_size, num_workers=5)

    cost_time = 0.
    total_num = 0.
    correct_1_num = 0
    correct_5_num = 0
    for batch_id, data in enumerate(eval_loader()):
        # set input
        img_np = np.array([tensor.numpy() for tensor in data[0]])
        label_np = np.array([tensor.numpy() for tensor in data[1]])

        input_handle.reshape(img_np.shape)
        input_handle.copy_from_cpu(img_np)

        # run
        t1 = time.time()
        predictor.run()
        t2 = time.time()
        cost_time += (t2 - t1)

        output_data = output_handle.copy_to_cpu()

        # calculate accuracy
        for i in range(len(label_np)):
            label = label_np[i][0]
            result = output_data[i, :]
            index = result.argsort()
            total_num += 1
            if index[-1] == label:
                correct_1_num += 1
            if label in index[-5:]:
                correct_5_num += 1

        if batch_id % 10 == 0:
            acc1 = correct_1_num / total_num
            acc5 = correct_5_num / total_num
            avg_time = cost_time / total_num
            print("batch_id {}, acc1 {:.3f}, acc5 {:.3f}, avg time {:.5f} sec/img".
                  format(batch_id, acc1, acc5, avg_time))

    acc1 = correct_1_num / total_num
    acc5 = correct_5_num / total_num
    avg_time = cost_time / total_num
    print("End test: test image {}".format(total_num))
    print("test_acc1 {:.4f}, test_acc5 {:.4f}, avg time {:.5f} sec/img".format(
        acc1, acc5, avg_time))
    print("\n")


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        '--model_path', type=str, default="", help="The inference model path.")
    parser.add_argument(
        '--model_filename',
        type=str,
        default="model.pdmodel",
        help="model filename")
    parser.add_argument(
        '--params_filename',
        type=str,
        default="model.pdiparams",
        help="params filename")
    parser.add_argument(
        '--data_dir',
        type=str,
        default="dataset/ILSVRC2012/",
        help="The ImageNet dataset root dir.")
    parser.add_argument(
        '--batch_size', type=int, default=10, help="Batch size.")
    parser.add_argument(
        '--use_gpu', type=bool, default=False, help="Whether use gpu or not.")
    parser.add_argument(
        '--ir_optim', type=bool, default=False, help="Enable ir optim.")

    FLAGS = parser.parse_args()
    eval()
tutorials/mobilenetv3_prod/Step6/deploy/ptq_python/post_quant.py
new file (mode 100644)
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import division
from __future__ import print_function

import argparse
import os
import sys
import numpy as np
from PIL import Image

sys.path[0] = os.path.join(
    os.path.dirname("__file__"), os.path.pardir, os.path.pardir)

import paddle
import paddlevision
from presets import ClassificationPresetEval

from paddleslim.quant import quant_post_static


def sample_generator(loader):
    def __reader__():
        for indx, data in enumerate(loader):
            images = np.array(data[0])
            yield images

    return __reader__


def main():
    paddle.enable_static()
    place = paddle.CUDAPlace(0) if FLAGS.use_gpu else paddle.CPUPlace()

    resize_size, crop_size = (256, 224)
    val_dataset = paddlevision.datasets.ImageFolder(
        os.path.join(FLAGS.data_dir, 'val'),
        ClassificationPresetEval(
            crop_size=crop_size, resize_size=resize_size))
    data_loader = paddle.io.DataLoader(
        val_dataset, places=place, batch_size=FLAGS.batch_size)

    quant_output_dir = os.path.join(FLAGS.output_dir, "mv3_int8_infer")

    exe = paddle.static.Executor(place)
    quant_post_static(
        executor=exe,
        model_dir=FLAGS.model_path,
        quantize_model_path=quant_output_dir,
        sample_generator=sample_generator(data_loader),
        model_filename=FLAGS.model_filename,
        params_filename=FLAGS.params_filename,
        batch_size=FLAGS.batch_size,
        batch_nums=FLAGS.batch_num,
        algo=FLAGS.algo,
        hist_percent=FLAGS.hist_percent)


if __name__ == '__main__':
    parser = argparse.ArgumentParser("Quantization on ImageNet")

    parser.add_argument(
        "--model_path", type=str, default=None, help="Inference model path")
    parser.add_argument(
        "--model_filename",
        type=str,
        default=None,
        help="Inference model model_filename")
    parser.add_argument(
        "--params_filename",
        type=str,
        default=None,
        help="Inference model params_filename")
    parser.add_argument(
        "--output_dir", type=str, default='output', help="save dir")
    parser.add_argument(
        '--data_dir',
        default="/dataset/ILSVRC2012",
        help='path to dataset (should have subdirectories named "train" and "val")')
    parser.add_argument(
        '--use_gpu', default=True, type=bool, help='Whether to use GPU or not.')

    # train
    parser.add_argument(
        "--batch_num", default=10, type=int, help="batch num for quant")
    parser.add_argument(
        "--batch_size", default=10, type=int, help="batch size for quant")
    parser.add_argument(
        '--algo', default='hist', type=str, help="calibration algorithm")
    parser.add_argument(
        '--hist_percent',
        default=0.999,
        type=float,
        help="The percentile of algo:hist")

    FLAGS = parser.parse_args()

    assert FLAGS.data_dir, "error: must provide data path"
    main()
tutorials/tipc/images/post_training_quant_guide.png
new file (mode 100644), 324.9 KB
tutorials/tipc/kl_infer_python/kl_infer_python.md
deleted file (mode 100644 → 0)
# Linux GPU/CPU Post-Training Quantization Development Guide

# Table of Contents

- [1. Introduction](#1---)
- [2. Development Workflow](#2---)
- [3. FAQ](#3---)
tutorials/tipc/kl_infer_python/README.md → tutorials/tipc/ptq_infer_python/README.md
File moved.
tutorials/tipc/ptq_infer_python/ptq_infer_python.md
new file (mode 100644)
# Linux GPU/CPU Post-Training Quantization Development Guide

# Table of Contents

- [1. Introduction](#1)
- [2. Developing Post-Training Quantization](#2)
    - [2.1 Prepare Calibration Data and Environment](#2.1)
    - [2.2 Prepare the Inference Model](#2.2)
    - [2.3 Prepare the Post-Training Quantization Code](#2.3)
    - [2.4 Run Post-Training Quantization](#2.4)
    - [2.5 Verify the Correctness of Inference Results](#2.5)
- [3. FAQ](#3)
    - [3.1 General Questions](#3.1)
<a name="1"></a>

## 1. Introduction

Static post-training quantization in Paddle uses a small amount of calibration data to compute quantization factors, so an FP32 model can be quickly quantized into a low-bit model (most commonly int8). Running inference with the quantized model reduces computation, memory usage, and model size.

For more details on post-training quantization of Paddle models, see the [official Paddle post-training quantization tutorial](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/api_cn/static/quant/quantization_api.rst#quant_post_static).
<a name="2"></a>

## 2. Developing Post-Training Quantization

Post-training quantization development in Paddle can be divided into four steps, as shown in the figure below.

<div align="center">
    <img src="../images/post_training_quant_guide.png" width="600">
</div>

Two checkpoints are set along the way:

* Prepare the inference model
* Verify that the quantized model's inference results are correct
<a name="2.1"></a>

### 2.1 Prepare Calibration Data and Environment

**[Prepare calibration data]**

Post-training quantization needs the scale of each layer's activations in order to map value ranges, so a moderate amount of data must be run through the network's forward pass. A calibration dataset therefore has to be prepared in advance.

Taking the ImageNet-1k dataset as an example, see the [data preparation guide](https://github.com/PaddlePaddle/models/tree/release/2.2/tutorials/mobilenetv3_prod/Step6#32-%E5%87%86%E5%A4%87%E6%95%B0%E6%8D%AE).
**[Prepare the development environment]**

- Make sure paddle is installed. The pip command for the Linux build is shown below; for other installation options, see the [PaddlePaddle website](https://www.paddlepaddle.org.cn/).
- Make sure paddleslim is installed. The pip command for the Linux build is shown below; for other installation options, see [PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim).

```
pip install paddlepaddle-gpu==2.2.1.post112 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
pip install paddleslim==2.2.1
```
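To confirm that the environment is usable before moving on, a quick sanity check such as the following can be run (a sketch; it assumes only the two packages installed above):

```python
# Sanity-check the installation from the pip commands above.
import paddle
import paddleslim  # raises ImportError if PaddleSlim is not installed

paddle.utils.run_check()   # verifies the PaddlePaddle installation (including GPU visibility)
print(paddle.__version__)  # expected: 2.2.1
```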
<a name="2.2"></a>

### 2.2 Prepare the Inference Model

**[Basic workflow]**

Preparing the inference model takes three steps:

- Step 1: Define a network model that inherits from `paddle.nn.Layer`.
- Step 2: Use the `paddle.jit.save` API to convert the model from dynamic to static graph and export it as an inference model.
- Step 3: Check that `model.pdmodel` and `model.pdiparams` files were generated in the export path.
**[Hands-on]**

The model definition can be found in [mobilenet_v3](https://github.com/PaddlePaddle/models/blob/release/2.2/tutorials/mobilenetv3_prod/Step6/paddlevision/models/mobilenet_v3.py):

```python
fp32_model = mobilenet_v3_small()
fp32_model.eval()
```
Then convert the model from dynamic to static graph:

```python
# save inference model
input_spec = paddle.static.InputSpec(
    shape=[None, 3, 224, 224], dtype='float32')
fp32_output_model_path = os.path.join("mv3_fp32_infer", "model")
paddle.jit.save(fp32_model, fp32_output_model_path, [input_spec])
```

This generates the `model.pdmodel` and `model.pdiparams` files under the `mv3_fp32_infer` directory.
<a name="2.3"></a>

### 2.3 Prepare the Post-Training Quantization Code

**[Basic workflow]**

Post-training quantization is performed with the PaddleSlim API ``paddleslim.quant.quant_post_static``:

- Step 1: Define a `sample_generator` that wraps a `paddle.io.DataLoader` instance and iterates over the calibration dataset.
- Step 2: Define an Executor. Because the model being quantized is an inference model, calibration also runs in static-graph mode, so a static-graph Executor is needed to execute the calibration.

**[Hands-on]**

1) Define the dataset; see the [Datasets definition](https://github.com/PaddlePaddle/models/blob/release/2.2/tutorials/mobilenetv3_prod/Step6/paddlevision/datasets/vision.py).

2) Define the `sample_generator`:
```python
def sample_generator(loader):
    def __reader__():
        for indx, data in enumerate(loader):
            images = np.array(data[0])
            yield images

    return __reader__
```
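For reference, the `paddle.io.DataLoader` wrapped by `sample_generator` can be built as in this commit's `post_quant.py`; the sketch below assumes the ImageNet-style directory layout and the `ClassificationPresetEval` preset from this tutorial:

```python
# A sketch of building the calibration DataLoader passed to sample_generator,
# mirroring post_quant.py; the dataset path is a placeholder.
import os
import paddle
import paddlevision
from presets import ClassificationPresetEval

resize_size, crop_size = 256, 224
val_dataset = paddlevision.datasets.ImageFolder(
    os.path.join('/path/dataset/ILSVRC2012/', 'val'),
    ClassificationPresetEval(crop_size=crop_size, resize_size=resize_size))
data_loader = paddle.io.DataLoader(val_dataset, batch_size=32)
```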
3) Define the Executor:
```python
use_gpu = True
place = paddle.CUDAPlace(0) if use_gpu else paddle.CPUPlace()
exe = paddle.static.Executor(place)
```
<a name="2.4"></a>

### 2.4 Run Post-Training Quantization

**[Basic workflow]**

Run post-training quantization with PaddleSlim's `quant_post_static` API:

- Step 1: Import the `quant_post_static` API.
```python
from paddleslim.quant import quant_post_static
```
- Step 2: Fill in the `quant_post_static` arguments and start post-training quantization.
```python
fp32_model_dir = 'mv3_fp32_infer'
quant_output_dir = 'quant_model'
quant_post_static(
    executor=exe,
    model_dir=fp32_model_dir,
    quantize_model_path=quant_output_dir,
    sample_generator=sample_generator(data_loader),
    model_filename='model.pdmodel',
    params_filename='model.pdiparams',
    batch_size=32,
    batch_nums=10,
    algo='KL')
```
- Step 3: Check the output and make sure the `__model__` and `__params__` files were generated after quantization; a minimal check sketch follows this list.
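The Step 3 check can be scripted, for example, as follows (a sketch that assumes the `quant_output_dir` used above):

```python
# A minimal sketch of the Step 3 check, assuming quant_output_dir = 'quant_model' as above.
import os

quant_output_dir = 'quant_model'
for name in ('__model__', '__params__'):
    path = os.path.join(quant_output_dir, name)
    assert os.path.isfile(path), "missing quantized model file: {}".format(path)
```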
**[Hands-on]**

Run post-training quantization; for a complete example, see the MobileNetV3 [post-training quantization code](https://github.com/PaddlePaddle/models/tree/release/2.2/tutorials/mobilenetv3_prod/Step6/deploy/ptq_python/post_quant.py).
<a name="2.5"></a>

### 2.5 Verify the Correctness of Inference Results

**[Basic workflow]**

Test the quantized model with the Paddle Inference library and make sure its accuracy meets expectations.

- Step 1: Initialize the `paddle.inference` library and configure it:
```python
import paddle.inference as paddle_infer

model_file = os.path.join('quant_model', '__model__')
params_file = os.path.join('quant_model', '__params__')
config = paddle_infer.Config(model_file, params_file)
if FLAGS.use_gpu:
    config.enable_use_gpu(1000, 0)
if not FLAGS.ir_optim:
    config.switch_ir_optim(False)

predictor = paddle_infer.create_predictor(config)
```
- Step 2: Configure the predictor's inputs and outputs:
python
``
`
python
input_names
=
predictor
.
get_input_names
()
input_handle
=
predictor
.
get_input_handle
(
input_names
[
0
])
output_names
=
predictor
.
get_output_names
()
output_handle
=
predictor
.
get_output_handle
(
output_names
[
0
])
```
- Step 3: Run prediction and check that the results are correct:
```python
input_handle.copy_from_cpu(img_np)
predictor.run()
output_data = output_handle.copy_to_cpu()
```
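To turn `output_data` into a quick correctness check, something like the following can be used (a sketch that assumes `label_np` holds the batch's ground-truth labels, shaped as in `eval.py`):

```python
# A sketch of a per-batch top-1/top-5 check; label_np is assumed to have shape (batch, 1).
import numpy as np

top5 = np.argsort(output_data, axis=1)[:, -5:]  # indices of the 5 highest scores per image
acc1 = float(np.mean(top5[:, -1] == label_np[:, 0]))
acc5 = float(np.mean([label in row for label, row in zip(label_np[:, 0], top5)]))
print("batch acc1 {:.3f}, acc5 {:.3f}".format(acc1, acc5))
```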
**[Hands-on]**

1) Initialize the `paddle.inference` library and configure it; see the MobileNetV3 [inference model test code](https://github.com/PaddlePaddle/models/tree/release/2.2/tutorials/mobilenetv3_prod/Step6/deploy/ptq_python/eval.py).

2) Configure the predictor's inputs and outputs; see the MobileNetV3 [inference model test code](https://github.com/PaddlePaddle/models/tree/release/2.2/tutorials/mobilenetv3_prod/Step6/deploy/ptq_python/eval.py).

3) Run prediction; see the MobileNetV3 [inference model test code](https://github.com/PaddlePaddle/models/tree/release/2.2/tutorials/mobilenetv3_prod/Step6/deploy/ptq_python/eval.py).

4) Check that the prediction for a single image is correct; see the [inference guide](https://github.com/PaddlePaddle/models/blob/release/2.2/docs/tipc/train_infer_python/infer_python.md).

5) You can also compare the accuracy of the quantized model and the FP32 model to make sure the accuracy loss after quantization is acceptable; see the [MobileNet quantized model accuracy verification guide](https://github.com/PaddlePaddle/models/tree/release/2.2/tutorials/mobilenetv3_prod/Step6/deploy/ptq_python/README.md).
<a name="3"></a>

## 3. FAQ

If you run into problems while following this guide, please open an issue [here](https://github.com/PaddlePaddle/PaddleSlim/issues) and we will follow up with high priority.
### 3.1 General Questions

- How do I choose a post-training quantization method?

Pick a suitable calibration algorithm, such as `KL`, `hist`, or `mse`. For guidance on choosing, see the [quant_post_static API documentation](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/api_cn/static/quant/quantization_api.rst#quant_post_static).
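When it is unclear which algorithm suits a given model, one practical approach is to quantize with several algorithms and evaluate each result. A sketch (reusing `exe`, `sample_generator`, `data_loader`, and the `quant_post_static` import from sections 2.3 and 2.4) might look like this:

```python
# A sketch of comparing calibration algorithms; evaluate each output model afterwards
# (e.g. with eval.py) and keep the one with the best accuracy.
for algo in ['KL', 'hist', 'mse']:
    quant_post_static(
        executor=exe,
        model_dir='mv3_fp32_infer',
        quantize_model_path='quant_model_{}'.format(algo),  # one output dir per algorithm
        sample_generator=sample_generator(data_loader),
        model_filename='model.pdmodel',
        params_filename='model.pdiparams',
        batch_size=32,
        batch_nums=10,
        algo=algo)
```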
tutorials/tipc/kl_infer_python/test_kl_infer_python.md → tutorials/tipc/ptq_infer_python/test_ptq_infer_python.md
File moved.