Unverified commit 306f4592 authored by zhouzj, committed by GitHub

Add qdrop to ACT (#1474)

* add qdrop to ACT.

* Add tests for ACT.

* Adjust example of PTQ&HPO.

* Add readme for QuantPost.
Parent ef9ab9c7
......@@ -177,7 +177,7 @@ ac = AutoCompression(
model_filename="inference.pdmodel",
params_filename="inference.pdiparams",
save_dir="MobileNetV1_quant",
config={"Quantization": {}, "HyperParameterOptimization": {'ptq_algo': ['avg'], 'max_quant_count': 3}},
config={"QuantPost": {}, "HyperParameterOptimization": {'ptq_algo': ['avg'], 'max_quant_count': 3}},
### config={"Quantization": {}, "Distillation": {}}, ### 如果您的系统为Windows系统, 请使用当前这一行配置
train_dataloader=train_loader,
eval_dataloader=train_loader)
......
......@@ -3,7 +3,7 @@
## 1.1 Hyperparameters of each compression method
### 1.1.1 Quantization (quantization)
### 1.1.1 Quantization-aware training (quantization)
The quantization parameters mainly set the number of quantization bits and the op types to quantize; quantizable ops include convolution layers (conv2d, depthwise_conv2d) and fully connected layers (mul, matmul_v2). The following example quantizes only the convolution layers:
```yaml
......@@ -50,7 +50,53 @@ print(TENSORRT_OP_TYPES)
- is_full_quantize: whether to quantize all supported op types. Default: False.
### 1.1.2 Knowledge distillation (knowledge distillation)
### 1.1.2 Post-training quantization (post-training quantization)
The basic quantization parameters for post-training quantization are the same as for quantization-aware training and are not repeated here. The parameters specific to post-training quantization are described below:
```yaml
QuantPost:
batch_size: 32
batch_nums: None
algo: 'hist'
hist_percent: 0.999
bias_correct: False
recon_level: None
regions: None
epochs: 20
lr: 0.1
simulate_activation_quant: False
skip_tensor_list: None
```
The configuration items above are described as follows (a usage sketch follows the list):
- batch_size: the number of images in each batch. Default: 32.
- batch_nums: the number of post-training quantization iterations. If set to None, iteration continues until all training data has been consumed; otherwise the number of iterations is batch_nums, i.e. batch_nums * batch_size samples are used to calibrate the scales (for example, batch_size 32 with batch_nums 10 gives 320 calibration samples).
- algo: the name of the quantization algorithm; one of 'KL', 'mse', 'hist', 'avg' or 'abs_max'. With 'abs_max', the maximum absolute activation value of the calibration data is used as the scale; with 'KL', the scale is computed via KL divergence; with 'avg', the average of the per-batch maximum absolute activation values is used as the scale; with 'hist', the scale is computed from a percentile-based histogram; with 'mse', the scale is found by searching for the minimum MSE loss. Default: 'hist'.
- hist_percent: the percentile used by the 'hist' method. Default: 0.999.
- bias_correct: whether to use the bias correction algorithm. Default: False.
- recon_level: if set, region-by-region reconstruction training is performed after post-training quantization; 'layer-wise' and 'region-wise' are currently supported. With 'layer-wise', reconstruction is trained layer by layer; with 'region-wise', each block listed in `regions` is trained as a unit; with None, no reconstruction training is performed. Default: None.
- regions(list[list]): required when recon_level is 'region-wise'. Each element consists of the input and output variable names of one region; see this [example](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/example/post_training_quantization/pytorch_yolo_series/configs/yolov6s_fine_tune.yaml#L11).
- epochs: the number of epochs of reconstruction training. Each epoch covers batch_nums * batch_size samples. Default: 20.
- lr: the learning rate for reconstruction training.
- simulate_activation_quant: whether to inject activation-quantization noise during reconstruction training. Default: False.
- skip_tensor_list: a list of Tensors to exclude from quantization, given by Tensor name; names can be inspected with a visualization tool. Default: None.
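For reference, a minimal Python sketch of launching ACT with a `QuantPost` config dict; the model paths and `train_loader` are placeholders that must point to your own inference model and calibration dataloader:
```python
from paddleslim.auto_compression import AutoCompression

# Minimal sketch: a "QuantPost"-only config triggers post-training
# quantization; the keys mirror the YAML items documented above.
# model_dir and train_loader are placeholders for your own model
# and calibration data.
ac = AutoCompression(
    model_dir="./MobileNetV1_infer",
    model_filename="inference.pdmodel",
    params_filename="inference.pdiparams",
    save_dir="MobileNetV1_quant",
    config={"QuantPost": {"algo": "hist", "hist_percent": 0.999, "batch_size": 32}},
    train_dataloader=train_loader,
    eval_dataloader=train_loader)
ac.compress()
```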
### 1.1.3 Post-training quantization hyperparameter optimization (hyper parameter optimization)
Hyperparameter optimization searches over the hyperparameters of post-training quantization to select the combination that gives the best quantization result. It requires setting both `QuantPost` and `HyperParameterOptimization`:
```yaml
HyperParameterOptimization:
ptq_algo: ["KL", "hist", "avg", "mse"]
bias_correct: [True, False]
hist_percent: [0.98, 0.999]
batch_num: [10, 30]
```
The configuration items above are described as follows (a combined config sketch follows the list):
- ptq_algo: the post-training quantization algorithms to search over.
- bias_correct: whether to use the bias correction algorithm.
- hist_percent: the lower and upper bounds of the 'hist' algorithm's percentile; the actual percentile is sampled uniformly from this range.
- batch_num: the lower and upper bounds of 'batch_num'; the actual value is sampled uniformly from this range.
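A combined configuration sketch, assuming the same dict-style `config` accepted by `AutoCompression` as in the earlier example; the search-space values are the ones listed above:
```python
# Sketch of a PTQ + HPO strategy config; passing both sections enables
# hyperparameter search over the post-training quantization settings.
config = {
    "QuantPost": {},
    "HyperParameterOptimization": {
        "ptq_algo": ["KL", "hist", "avg", "mse"],
        "bias_correct": [True, False],
        "hist_percent": [0.98, 0.999],
        "batch_num": [10, 30],
    },
}
```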
### 1.1.4 Knowledge distillation (knowledge distillation)
The distillation parameters mainly set the distillation nodes (`node`) and the path of the teacher inference model, as shown below:
```yaml
......@@ -96,7 +142,7 @@ Distillation:
- teacher_params_filename: the filename of the teacher model's parameters, in *.pdiparams or __params__ format. Only takes effect when `teacher_model_dir` is set.
### 1.1.3 Structured sparsity (sparsity)
### 1.1.5 Structured sparsity (sparsity)
The structured sparsity parameters are set as follows:
```yaml
......@@ -126,7 +172,7 @@ for var_ in inference_program.list_vars():
- criterion: the metric used to evaluate the importance of convolution channels. Options: "l1_norm", "bn_scale", "geometry_median". For definitions and usage, see the [structured sparsity API docs](https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/prune/prune_api.html).
### 1.1.4 ASP semi-structured sparsity
### 1.1.6 ASP semi-structured sparsity
The semi-structured sparsity parameters are set as follows:
```yaml
......@@ -151,7 +197,7 @@ for var_ in inference_program.list_vars():
Alternatively, visualize the `*.pdmodel` model file with the [Netron tool](https://netron.app/) and choose suitable convolution layers to prune.
### 1.1.5 Structured pruning for Transformers
### 1.1.7 Structured pruning for Transformers
The structured pruning parameters for Transformer architectures are set as follows:
```yaml
......@@ -160,7 +206,7 @@ TransformerPrune:
```
- pruned_ratio: the fraction of each fully connected layer to prune (a dict-style sketch follows).
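A sketch of a strategy config combining Transformer pruning with the other strategies, mirroring the example YAML configs updated later in this commit; the empty dicts fall back to default settings:
```python
# Mirrors the updated example configs: Transformer pruning plus
# hyperparameter-optimized post-training quantization with distillation.
config = {
    "TransformerPrune": {"pruned_ratio": 0.25},
    "HyperParameterOptimization": {},
    "Distillation": {},
    "QuantPost": {},
}
```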
### 1.1.6 Unstructured sparsity strategy
### 1.1.8 Unstructured sparsity strategy
The unstructured sparsity parameters are set as follows:
```yaml
......@@ -200,7 +246,7 @@ UnstructurePrune:
- local_sparsity: the scope to which the pruning ratio (ratio) applies; only effective in 'ratio' mode. When enabled, every pruned parameter matrix reaches sparsity 'ratio'; when disabled, only the model's overall sparsity is guaranteed to reach 'ratio', and individual matrices may differ. Keeping per-matrix sparsity uniform makes sparse acceleration more pronounced.
- For the meaning of further unstructured sparsity parameters, see the [unstructured sparsity API docs](https://github.com/PaddlePaddle/PaddleSlim/blob/develop/docs/zh_cn/api_cn/dygraph/pruners/unstructured_pruner.rst); a dict-style sketch follows.
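A hypothetical dict-style sketch; the key names `prune_mode`, `ratio`, and `local_sparsity` are inferred from the items above, so check the linked API docs before relying on them:
```python
# Hypothetical sketch: prune 75% of the weights in 'ratio' mode, forcing
# every pruned parameter matrix to reach the target sparsity.
config = {
    "UnstructurePrune": {
        "prune_mode": "ratio",   # assumed key name, per the 'ratio' mode above
        "ratio": 0.75,
        "local_sparsity": True,  # uniform per-matrix sparsity accelerates better
    },
    "Distillation": {},
}
```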
### 1.1.7 Training hyperparameters
### 1.1.9 Training hyperparameters
The training parameters mainly set the learning rate, the number of training epochs, the optimizer, and so on.
```yaml
......
......@@ -157,7 +157,9 @@ Prune:
pruned_ratio: 0.25
```
- Optimization parameters
- Hyperparameter search for post-training quantization
Post-training quantization in this example uses a hyperparameter search strategy to select the optimal hyperparameters and achieve a better quantization result. First, configure the parameters to search:
```yaml
HyperParameterOptimization:
......@@ -177,12 +179,12 @@ HyperParameterOptimization:
- channel_wise_abs_max
```
- Quantization parameters
Next, configure the post-training quantization parameters:
The quantization parameters mainly set the number of quantization bits and the op types to quantize; quantizable ops include convolution layers (conv2d, depthwise_conv2d) and fully connected layers (mul, matmul_v2).
```yaml
Quantization:
QuantPost:
activation_bits: 8
quantize_op_types:
- conv2d
......
......@@ -10,7 +10,7 @@ TransformerPrune:
pruned_ratio: 0.25
HyperParameterOptimization:
Distillation:
Quantization:
QuantPost:
TrainConfig:
epochs: 6
eval_iter: 1070
......
......@@ -10,7 +10,7 @@ TransformerPrune:
pruned_ratio: 0.25
HyperParameterOptimization:
Distillation:
Quantization:
QuantPost:
TrainConfig:
epochs: 100
eval_iter: 70
......
......@@ -10,7 +10,7 @@ TransformerPrune:
pruned_ratio: 0.25
HyperParameterOptimization:
Distillation:
Quantization:
QuantPost:
TrainConfig:
epochs: 6
eval_iter: 2000
......
......@@ -10,7 +10,7 @@ TransformerPrune:
pruned_ratio: 0.25
HyperParameterOptimization:
Distillation:
Quantization:
QuantPost:
TrainConfig:
epochs: 16
eval_iter: 1000
......
......@@ -10,7 +10,7 @@ TransformerPrune:
pruned_ratio: 0.25
HyperParameterOptimization:
Distillation:
Quantization:
QuantPost:
TrainConfig:
epochs: 12
eval_iter: 750
......
......@@ -10,7 +10,7 @@ TransformerPrune:
pruned_ratio: 0.25
HyperParameterOptimization:
Distillation:
Quantization:
QuantPost:
TrainConfig:
epochs: 20
eval_iter: 1050
......
......@@ -10,7 +10,7 @@ TransformerPrune:
pruned_ratio: 0.25
HyperParameterOptimization:
Distillation:
Quantization:
QuantPost:
TrainConfig:
epochs: 6
eval_iter: 1110
......
......@@ -44,9 +44,9 @@
## 3. Auto-compression workflow
#### 3.1 Prepare the environment
### 3.1 Prepare the environment
- PaddlePaddle >= 2.3.2 (install per the instructions for your environment on the [Paddle website](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html))
- PaddleSlim develop version
- PaddleSlim >= 2.3.3
(1) Install PaddlePaddle
```
......@@ -56,13 +56,19 @@ pip install paddlepaddle==2.3.2
python -m pip install paddlepaddle-gpu==2.3.2.post112 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
```
(2) Install paddleslim >= 2.3.3
```shell
pip install paddleslim==2.3.3
```
#### Version alignment
#### 3.2 Prepare the dataset
| PaddleSlim | x2paddle |
| :-----------: | :------------: |
| 2.3.x | 1.3.8 |
| develop / 2.4 | 1.3.9 |
### 3.2 Prepare the dataset
**Prepare the data using either method (1) or (2).**
......@@ -107,7 +113,7 @@ pip install paddleslim==2.3.3
```
#### 3.3 Prepare the inference model
### 3.3 Prepare the inference model
(1) Prepare the ONNX model:
......@@ -130,7 +136,7 @@ pip install paddleslim==2.3.3
**Note**: ACT currently supports models **without NMS**; simply export with the command above. You can also directly download the prepared [yolov7.onnx](https://paddle-slim-models.bj.bcebos.com/act/yolov7-tiny.onnx).
#### 3.4 Auto-compress and produce the model
### 3.4 Auto-compress and produce the model
The distillation-quantization auto-compression example is launched via the run.py script, which uses the ```paddleslim.auto_compression.AutoCompression``` interface to automatically compress the model. Configure the model path, distillation, quantization, and training parameters in the config file; once configured, the model can be quantized and distilled.
......@@ -160,7 +166,7 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 python -m paddle.distributed.launch --log_dir=log -
│ ├── calibration.cache # Calibration table that TensorRT can load directly
```
#### Deployment test with Paddle Inference
### Deployment test with Paddle Inference
The quantized model can be accelerated with TensorRT on GPU and with MKLDNN on CPU.
......@@ -219,7 +225,7 @@ bash compile.sh
./build/trt_run --model_file yolov7_quant/model.pdmodel --params_file yolov7_quant/model.pdiparams --run_mode=trt_int8
```
#### Export to ONNX and deploy with TensorRT
### Export to ONNX and deploy with TensorRT
Load `quant_model.onnx` and `calibration.cache`; they can be verified directly with the TensorRT test script. For detailed code, see [TensorRT deployment](./TensorRT).
......
......@@ -128,7 +128,7 @@ def create_strategy_config(strategy_str, model_type):
quant_config = Quantization(**default_quant_config)
hpo_config = HyperParameterOptimization(**hpo_config_tester)
configs.append({
'Quantization': quant_config,
'QuantPost': quant_config,
'HyperParameterOptimization': hpo_config
})
else:
......@@ -251,7 +251,7 @@ def get_final_quant_config(ptq_loss, model_type=None):
quant_config = Quantization(**default_quant_config)
hpo_config = HyperParameterOptimization(**default_hpo_config)
configs = [{
'Quantization': quant_config,
'QuantPost': quant_config,
'HyperParameterOptimization': hpo_config
}]
......
......@@ -26,6 +26,7 @@ import paddle
import itertools
import paddle.distributed.fleet as fleet
from ..quant.quanter import convert, quant_post
from ..quant.reconstruction_quantization import quant_recon_static
from ..common.recover_program import recover_inference_program
from ..common import get_logger
from ..common.patterns import get_patterns, find_final_nodes
......@@ -88,27 +89,29 @@ class AutoCompression:
to None. Default: None.
strategy_config(dict, list(dict), optional): The strategy config. You can set a single config to get a multi-strategy config, such as
1. set ``Quantization`` and ``Distillation`` to get quant_aware and distillation compress config.
The Quantization config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L24`_ .
The Distillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L39`_ .
2. set ``Quantization`` and ``HyperParameterOptimization`` to get quant_post and hyperparameter optimization compress config.
The Quantization config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L24`_ .
The HyperParameterOptimization config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L73`_ .
The Quantization config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L55`_ .
The Distillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L107`_ .
2. set ``QuantPost`` and ``HyperParameterOptimization`` to get quant_post and hyperparameter optimization compress config.
The QuantPost config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L187`_ .
The HyperParameterOptimization config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L160`_ .
3. set ``ChannelPrune`` and ``Distillation`` to get channel prune and distillation compress config.
The ChannelPrune config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L82`_ .
The Distillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L39`_ .
The ChannelPrune config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L254`_ .
The Distillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L107`_ .
4. set ``ASPPrune`` and ``Distillation`` to get asp prune and distillation compress config.
The ASPPrune config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L82`_ .
The Distillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L39`_ .
The ASPPrune config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L268`_ .
The Distillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L107`_ .
5. set ``TransformerPrune`` and ``Distillation`` to get transformer prune and distillation compress config.
The TransformerPrune config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L82`_ .
The Distillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L39`_ .
The TransformerPrune config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L278`_ .
The Distillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L107`_ .
6. set ``UnstructurePrune`` and ``Distillation`` to get unstructureprune and distillation compress config.
The UnstructurePrune config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L91`_ .
The Distillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L39`_ .
The UnstructurePrune config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L288`_ .
The Distillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L107`_ .
7. set ``Distillation`` to use one teacher model to distill the student model.
The Distillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L39`_ .
The Distillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L107`_ .
8. set ``MultiTeacherDistillation`` to use multiple teachers to distill the student model.
The MultiTeacherDistillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L56`_ .
The MultiTeacherDistillation config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L134`_ .
9. set ``QuantPost`` to get quant_post compress config.
The QuantPost config can reference `https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/auto_compression/strategy_config.py#L187`_ .
If set to None, will choose a strategy automatically. Default: None.
target_speedup(float, optional): target speedup ratio by the way of auto compress. Default: None.
......@@ -344,6 +347,7 @@ class AutoCompression:
for strategy_c in strategy_config:
quant_config = strategy_c.get("Quantization", None)
hpo_config = strategy_c.get("HyperParameterOptimization", None)
ptq_config = strategy_c.get("QuantPost", None)
prune_config = strategy_c.get("ChannelPrune", None)
asp_config = strategy_c.get("ASPPrune", None)
transformer_prune_config = strategy_c.get("TransformerPrune", None)
......@@ -394,10 +398,10 @@ class AutoCompression:
self._distill_config))
### case5: quant_config & hpo_config ==> PTQ & HPO
if quant_config is not None and hpo_config is not None:
if ptq_config is not None and hpo_config is not None:
only_distillation = False
strategy.append('ptq_hpo')
config.append(merge_config(quant_config, hpo_config))
config.append(merge_config(ptq_config, hpo_config))
### case6: quant_config & distill config ==> QAT & Distill
if quant_config is not None and self._distill_config is not None and 'ptq_hpo' not in strategy:
......@@ -414,6 +418,11 @@ class AutoCompression:
strategy.append('multi_teacher_dis')
config.append(multi_teacher_distill_config)
### case8: only ptq_config ==> PTQ
if ptq_config is not None and hpo_config is None:
strategy.append('quant_post')
config.append(ptq_config)
### NOTE: keep quantization in the last step
idx = -1
if 'qat_dis' in strategy and strategy.index('qat_dis') != (
......@@ -572,6 +581,7 @@ class AutoCompression:
config = None
train_config = None
strategy_idx = None
self.final_metric = -1.0
for strategy_idx, (
strategy, config, train_config
) in enumerate(zip(self._strategy, self._config, self.train_config)):
......@@ -599,6 +609,19 @@ class AutoCompression:
if os.path.isfile(_file_path):
shutil.copy(_file_path, final_model_path)
shutil.rmtree(self.tmp_dir)
if self.eval_function is not None and self.final_metric < 0.0:
[inference_program, feed_target_names, fetch_targets] = load_inference_model( \
final_model_path, \
model_filename=self.model_filename, params_filename=self.params_filename,
executor=self._exe)
self.final_metric = self.eval_function(
self._exe, inference_program, feed_target_names,
fetch_targets)
if self.eval_function is not None:
_logger.info("==> The metric of final model is {:.4f}".format(
self.final_metric))
_logger.info(
"==> The ACT compression has been completed and the final model is saved in `{}`".
format(final_model_path))
......@@ -621,41 +644,64 @@ class AutoCompression:
params_filename=self.params_filename,
executor=self._exe)
if strategy == 'quant_post':
quant_post(
self._exe,
model_dir=model_dir,
quantize_model_path=os.path.join(
self.tmp_dir, 'strategy_{}'.format(str(strategy_idx + 1))),
data_loader=self.train_dataloader,
model_filename=self.model_filename,
params_filename=self.params_filename,
save_model_filename=self.model_filename,
save_params_filename=self.params_filename,
batch_size=1,
batch_nums=config.batch_num,
algo=config.ptq_algo,
round_type='round',
bias_correct=config.bias_correct,
hist_percent=config.hist_percent,
quantizable_op_type=config.quantize_op_types,
is_full_quantize=config.is_full_quantize,
weight_bits=config.weight_bits,
activation_bits=config.activation_bits,
activation_quantize_type='range_abs_max',
weight_quantize_type=config.weight_quantize_type,
onnx_format=False)
if config.recon_level is None:
quant_post(
self._exe,
model_dir=self.updated_model_dir,
quantize_model_path=os.path.join(
self.tmp_dir,
'strategy_{}'.format(str(strategy_idx + 1))),
data_loader=self.train_dataloader,
model_filename=self.model_filename,
params_filename=self.params_filename,
save_model_filename=self.model_filename,
save_params_filename=self.params_filename,
batch_size=config.batch_size,
batch_nums=config.batch_nums,
algo=config.algo,
bias_correction=config.bias_correction,
hist_percent=config.hist_percent,
quantizable_op_type=config.quantize_op_types,
is_full_quantize=config.is_full_quantize,
weight_bits=config.weight_bits,
activation_bits=config.activation_bits,
activation_quantize_type=config.activation_quantize_type,
weight_quantize_type=config.weight_quantize_type,
onnx_format=config.onnx_format)
else:
quant_recon_static(
executor=self._exe,
model_dir=self.updated_model_dir,
quantize_model_path=os.path.join(
self.tmp_dir,
'strategy_{}'.format(str(strategy_idx + 1))),
data_loader=self.train_dataloader,
model_filename=self.model_filename,
params_filename=self.params_filename,
batch_size=config.batch_size,
batch_nums=config.batch_nums,
algo=config.algo,
hist_percent=config.hist_percent,
quantizable_op_type=config.quantize_op_types,
is_full_quantize=config.is_full_quantize,
bias_correction=config.bias_correction,
onnx_format=config.onnx_format,
weight_bits=config.weight_bits,
activation_bits=config.activation_bits,
weight_quantize_type=config.weight_quantize_type,
activation_quantize_type=config.activation_quantize_type,
recon_level=config.recon_level,
simulate_activation_quant=config.simulate_activation_quant,
regions=config.regions,
region_weights_names=config.region_weights_names,
skip_tensor_list=config.skip_tensor_list,
epochs=config.epochs,
lr=config.lr)
elif strategy == 'ptq_hpo':
if platform.system().lower() != 'linux':
raise NotImplementedError(
"post-quant-hpo is not support in system other than linux")
if self.updated_model_dir != model_dir:
# If model is ONNX, convert it to inference model firstly.
load_inference_model(
model_dir,
model_filename=self.model_filename,
params_filename=self.params_filename,
executor=self._exe)
if self.eval_function is None:
# If eval function is None, ptq_hpo will use EMD distance to evaluate the quantized model, so a dataloader without labels is needed
eval_dataloader = self.train_dataloader
......@@ -664,7 +710,7 @@ class AutoCompression:
post_quant_hpo.quant_post_hpo(
self._exe,
self._places,
model_dir=model_dir,
model_dir=self.updated_model_dir,
quantize_model_path=os.path.join(
self.tmp_dir, 'strategy_{}'.format(str(strategy_idx + 1))),
train_dataloader=self.train_dataloader,
......@@ -781,7 +827,7 @@ class AutoCompression:
self.metric_before_compressed)
) / self.metric_before_compressed <= 0.005:
_logger.info(
"The error rate between the compressed model and original model is less than 5%. The training process ends."
"The error rate between the compressed model and original model is less than 0.5%. The training process ends."
)
stop_training = True
break
......@@ -803,8 +849,9 @@ class AutoCompression:
)
if (train_config.train_iter and total_train_iter >=
train_config.train_iter) or stop_training:
stop_training = True
break
self.final_metric = best_metric
if 'unstructure' in self._strategy or train_config.sparse_model:
self._pruner.update_params()
......
......@@ -29,6 +29,7 @@ __all__ = [
"TrainConfig",
"SUPPORTED_CONFIG",
"TRAIN_CONFIG_NAME",
"QuantPost",
]
SUPPORTED_CONFIG = [
......@@ -40,6 +41,7 @@ SUPPORTED_CONFIG = [
"UnstructurePrune",
"TransformerPrune",
"ASPPrune",
"QuantPost",
]
TRAIN_CONFIG_NAME = "TrainConfig"
......@@ -182,6 +184,73 @@ class HyperParameterOptimization(BaseStrategy):
self.max_quant_count = max_quant_count
class QuantPost(BaseStrategy):
def __init__(self,
batch_size=32,
batch_nums=None,
epochs=20,
lr=0.1,
algo='hist',
hist_percent=0.999,
regions=None,
region_weights_names=None,
recon_level=None,
is_full_quantize=False,
bias_correction=False,
weight_quantize_type='channel_wise_abs_max',
activation_quantize_type='range_abs_max',
simulate_activation_quant=False,
skip_tensor_list=None,
onnx_format=False,
quantize_op_types=[
"conv2d", "depthwise_conv2d", "mul", "matmul", "matmul_v2"
],
weight_bits=8,
activation_bits=8):
"""
QuantPost Config.
Args:
batch_size(int, optional): The batch size of DataLoader. Default: 32.
batch_nums(int, optional): If batch_nums is not None, the number of calibrate data is 'batch_size*batch_nums'. If batch_nums is None, use all data generated by sample_generator as calibrate data. Default: None.
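epochs(int, optional): The number of epochs of reconstruction training. Default: 20.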
lr(float, optional): The learning rate of Reconstruction Quanter. Default: 0.1.
algo(str, optional): Post-Training Quantization algorithm; the available algorithms are listed at `<https://paddleslim.readthedocs.io/zh_CN/latest/api_cn/static/quant/quantization_api.html#quant-post-static>`_. Default: 'hist'.
hist_percent(float, optional): The percentile of histogram for algo hist. Default: 0.999.
regions(list[list], optional): The list of regions; each region is a subgraph of the fp32 program with exactly one input operation and one output operation. When recon_level is 'region-wise', the reconstruction loss of each region is minimized. Default: None.
region_weights_names(list[list], optional): The weight names inside every region. Default: None.
recon_level(str, optional): The granularity of reconstruction. Currently supports 'layer-wise' and 'region-wise'. The Reconstruction Quanter is used only when recon_level is not None. Default: None.
is_full_quantize(bool): If True, 'quantize_op_types' will be TRANSFORM_PASS_OP_TYPES + QUANT_DEQUANT_PASS_OP_TYPES. Default: False.
bias_correction(bool): Whether to use the bias correction method of https://arxiv.org/abs/1810.05723. Default: False.
weight_quantize_type(str): Weight quantize type. Default: 'channel_wise_abs_max'.
activation_quantize_type(str): Activation quantize type. Default: 'range_abs_max'.
simulate_activation_quant(bool, optional): Whether we need the noise caused by activation quantization during the reconstruction process. Default: False.
skip_tensor_list(list): Names of tensors to skip during quantization. Default: None.
onnx_format(bool): Whether to export the quantized model with format of ONNX. Default: False.
quantize_op_types(list(str)): Ops of type in quantize_op_types, will be quantized. Default: ['conv2d', 'depthwise_conv2d', 'mul', 'matmul', 'matmul_v2'].
weight_bits(int): Weight quantize bit num. Default: 8.
activation_bits(int): Activation quantize bit num. Default: 8.
"""
super(QuantPost, self).__init__("PTQ")
self.batch_size = batch_size
self.batch_nums = batch_nums
self.epochs = epochs
self.lr = lr
self.algo = algo
self.hist_percent = hist_percent
self.regions = regions
self.region_weights_names = region_weights_names
self.recon_level = recon_level
self.is_full_quantize = is_full_quantize
self.bias_correction = bias_correction
self.weight_quantize_type = weight_quantize_type
self.activation_quantize_type = activation_quantize_type
self.simulate_activation_quant = simulate_activation_quant
self.skip_tensor_list = skip_tensor_list
self.onnx_format = onnx_format
self.quantize_op_types = quantize_op_types
self.weight_bits = weight_bits
self.activation_bits = activation_bits
class ChannelPrune:
def __init__(self, pruned_ratio, prune_params_name, criterion='l1_norm'):
"""
......
......@@ -153,5 +153,55 @@ class TestLoadONNXModel(ACTBase):
deploy_backend='tensorrt')
class TestDictPTQ(ACTBase):
def __init__(self, *args, **kwargs):
super(TestDictPTQ, self).__init__(*args, **kwargs)
def test_compress(self):
image = paddle.static.data(
name='data', shape=[-1, 3, 32, 32], dtype='float32')
train_loader = paddle.io.DataLoader(
self.eval_dataset,
feed_list=[image],
batch_size=4,
return_list=False)
ac = AutoCompression(
model_dir=self.tmpdir.name,
model_filename="infer.pdmodel",
params_filename="infer.pdiparams",
save_dir="output",
config={'QuantPost': {}},
train_dataloader=train_loader,
eval_dataloader=train_loader
) # eval_function to verify accuracy
ac.compress()
class TestDictPTQRecon(ACTBase):
def __init__(self, *args, **kwargs):
super(TestDictPTQRecon, self).__init__(*args, **kwargs)
def test_compress(self):
image = paddle.static.data(
name='data', shape=[-1, 3, 32, 32], dtype='float32')
train_loader = paddle.io.DataLoader(
self.eval_dataset,
feed_list=[image],
batch_size=4,
return_list=False)
ac = AutoCompression(
model_dir=self.tmpdir.name,
model_filename="infer.pdmodel",
params_filename="infer.pdiparams",
save_dir="output",
config={'QuantPost': {
'recon_level': 'layer-wise'
}},
train_dataloader=train_loader,
eval_dataloader=train_loader
) # eval_function to verify accuracy
ac.compress()
if __name__ == '__main__':
unittest.main()
......@@ -56,7 +56,7 @@ class ACTDemo(unittest.TestCase):
params_filename="inference.pdiparams",
save_dir="MobileNetV1_quant",
config={
'Quantization': {},
'QuantPost': {},
"HyperParameterOptimization": {
'ptq_algo': ['avg'],
'max_quant_count': 3
......