Commit 0f1a59d5 authored by dongshuilong

add slim docs

Parent 3703f63f
## Introduction to Slim features
Complex models help improve accuracy, but they also introduce redundancy. This module provides tools for slimming models and covers two techniques: model quantization (quantization-aware training and post-training quantization) and model pruning.

Model quantization reduces full-precision parameters to fixed-point numbers to remove this redundancy, lowering the computational complexity of the model and improving inference performance.
With essentially no loss of accuracy, quantization converts FP32 model parameters to INT8, shrinking the model size and accelerating computation, which gives the quantized model a clear speed advantage when deployed on mobile and similar devices.
Model pruning removes unimportant convolution kernels from the CNN, reducing the number of parameters and therefore the computational complexity of the model.
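To make the idea concrete, the toy sketch below shows symmetric per-tensor INT8 quantization. It only illustrates the principle of mapping FP32 values onto an 8-bit range; it is not how PaddleSlim implements quantization.

```python
import numpy as np

# Toy illustration only: symmetric per-tensor INT8 quantization.
def quantize_int8(w):
    scale = np.abs(w).max() / 127.0                      # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale                  # approximate FP32 values

w = np.random.randn(64, 3, 3, 3).astype(np.float32)      # e.g. a conv kernel
q, scale = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, scale)).max())
```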
This tutorial shows how to compress PaddleClas models with PaddleSlim, PaddlePaddle's model compression library.
[PaddleSlim](https://github.com/PaddlePaddle/PaddleSlim) integrates pruning, quantization (both quantization-aware training and post-training quantization), knowledge distillation, neural architecture search, and other widely used, leading model compression techniques; feel free to explore it further.

Before starting this tutorial, it is recommended to read the [PaddleClas training guide](../../docs/zh_CN/tutorials/getting_started.md) and the [PaddleSlim documentation](https://paddleslim.readthedocs.io/zh_CN/latest/index.html).
## Quick start
After training a model, if you want to further compress its size and speed up prediction, you can compress it with quantization or pruning.

Model compression involves five steps:

1. Install PaddleSlim
2. Prepare a trained model
3. Compress the model
4. Export the inference model
5. Deploy the inference model
### 1. Install PaddleSlim
* Install via pip:
```bash
pip install paddleslim -i https://pypi.tuna.tsinghua.edu.cn/simple
```
* To get the latest features of PaddleSlim, install from source:
```bash
git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd PaddleSlim
python3.7 setup.py install
```
### 2. Prepare a trained model
PaddleClas provides a series of trained [models](../../docs/zh_CN/models/models_intro.md). If the model you want to compress is not in the list, train one first following the [regular training](../../docs/zh_CN/tutorials/getting_started.md) guide.
### 3. Model compression
Enter the PaddleClas root directory:
```bash
cd PaddleClas
```
#### 3.1 Model quantization
Quantization comes in two forms: post-training (offline) quantization and quantization-aware (online) training. Quantization-aware training generally gives better results; it requires loading a pretrained model, and once the quantization strategy is defined the model can be quantized. (A sketch of the underlying PaddleSlim API follows the launch commands below.)

##### 3.1.1 Quantization-aware training
The training command is as follows:
* CPU / single-machine single-GPU:
```bash
python3.7 deploy/slim/slim.py -m train -c ppcls/configs/slim/ResNet50_vd_quantalization.yaml -o Global.device=cpu
```
The structure of the `yaml` config file is explained in the [config documentation](../../docs/zh_CN/tutorials/config_description.md).

`-m` selects the mode of `slim.py`. Four modes are supported: `train`, `val`, `infer`, and `export`, corresponding to training, evaluation, dygraph prediction, and exporting an `inference model`, respectively.
* Single-machine single-GPU / single-machine multi-GPU / multi-machine multi-GPU:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3.7 -m paddle.distributed.launch \
    --gpus="0,1,2,3" \
    deploy/slim/slim.py \
        -m train \
        -c ppcls/configs/slim/ResNet50_vd_quantalization.yaml
```
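For reference, the sketch below shows how PaddleSlim's dygraph QAT API is typically used for this kind of quantization-aware training. The backbone, config values, paths, and input shape are illustrative assumptions, not the exact contents of `slim.py`.

```python
import paddle
from paddleslim.dygraph.quant import QAT

# quantization strategy: 8-bit weights/activations on Conv2D and Linear layers
quant_config = {
    "weight_quantize_type": "channel_wise_abs_max",
    "activation_quantize_type": "moving_average_abs_max",
    "weight_bits": 8,
    "activation_bits": 8,
    "quantizable_layer_type": ["Conv2D", "Linear"],
}

model = paddle.vision.models.resnet50(num_classes=1000)  # stand-in for the PaddleClas model

quanter = QAT(config=quant_config)
quanter.quantize(model)   # insert fake-quant ops; the model is then fine-tuned as usual

# after fine-tuning, export the quantized inference model
quanter.save_quantized_model(
    model,
    "./inference/inference",
    input_spec=[paddle.static.InputSpec(shape=[None, 3, 224, 224], dtype="float32")])
```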
##### 3.1.2 Post-training (offline) quantization

**Note**: Post-training quantization currently requires an `inference model` exported from an already trained model; see the [export tutorial](../../docs/zh_CN/inference.md) for how to export an `inference model`.

Once the `inference model` has been generated, run post-training quantization as follows:
```bash
python3.7 deploy/slim/quant_post_static.py -c ppcls/configs/ImageNet/ResNet/ResNet50_vd.yaml -o Global.save_inference_dir=./deploy/models/class_ResNet50_vd_ImageNet_infer
```
Here `Global.save_inference_dir` is the directory holding the `inference model`. On success, a `quant_post_static_model` folder is created under `Global.save_inference_dir`; it contains the post-training-quantized model, which can be deployed for prediction directly, without exporting the model again.
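For reference, the following is a simplified sketch of what the script does with PaddleSlim's post-training quantization API. The calibration reader, file names, and batch counts below are illustrative assumptions rather than the script's exact contents.

```python
import numpy as np
import paddle
from paddleslim.quant import quant_post_static

paddle.enable_static()
exe = paddle.static.Executor(paddle.CPUPlace())

def calib_reader():
    # yields calibration samples; PaddleClas feeds its training dataloader here
    for _ in range(160):
        yield [np.random.rand(3, 224, 224).astype("float32")]

quant_post_static(
    executor=exe,
    model_dir="./deploy/models/class_ResNet50_vd_ImageNet_infer",
    quantize_model_path="./deploy/models/class_ResNet50_vd_ImageNet_infer/quant_post_static_model",
    sample_generator=calib_reader,
    model_filename="inference.pdmodel",      # assumed exported file names
    params_filename="inference.pdiparams",
    batch_nums=10)
```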
#### 3.2 Model pruning
The training command is as follows:

- CPU / single-machine single-GPU:
```bash
python3.7 deploy/slim/slim.py -m train -c ppcls/configs/slim/ResNet50_vd_prune.yaml -o Global.device=cpu
```
- Single-machine single-GPU / single-machine multi-GPU / multi-machine multi-GPU:
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3.7 -m paddle.distributed.launch \
    --gpus="0,1,2,3" \
    deploy/slim/slim.py \
        -m train \
        -c ppcls/configs/slim/ResNet50_vd_prune.yaml
```
### 4. Export the model
After a model has been saved by quantization-aware training or by pruning, it can be exported as an inference model for deployment. Taking pruning as an example:
```bash
python3.7 deploy/slim/slim.py \
    -m export \
    -c ppcls/configs/slim/ResNet50_vd_prune.yaml \
    -o Global.save_inference_dir=./inference
```
### 5. Model deployment
The model exported in the step above can be converted with the opt model conversion tool of Paddle-Lite.
For deployment of the compressed model, refer to [deployment on mobile devices](../lite/readme.md).
## Suggested training hyperparameters
* For quantization-aware training, it is recommended to load a pretrained model obtained from regular training to speed up convergence.
* For quantization-aware training, it is recommended to set the initial learning rate to `1/20~1/10` of the regular training value and the number of epochs to `1/5~1/2` of the regular schedule, and to add warmup to the learning rate strategy; other configuration options are best left unchanged.
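As a concrete illustration of these suggestions, the hypothetical helper below derives a quantization-aware training schedule from a regular-training configuration; the function name and values are examples only.

```python
def suggest_qat_schedule(base_lr, base_epochs):
    """Hypothetical helper: scale regular-training hyperparameters for QAT."""
    return {
        "lr_range": (base_lr / 20, base_lr / 10),             # 1/20 ~ 1/10 of the regular LR
        "epoch_range": (base_epochs // 5, base_epochs // 2),  # 1/5 ~ 1/2 of the epochs
        "use_warmup": True,                                   # keep a warmup phase
    }

# e.g. regular training with lr=0.1 for 200 epochs
print(suggest_qat_schedule(0.1, 200))
# {'lr_range': (0.005, 0.01), 'epoch_range': (40, 100), 'use_warmup': True}
```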
# ----- deploy/slim/quant/export_model.py -----
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import os
import sys
__dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(__dir__)
sys.path.append(os.path.abspath(os.path.join(__dir__, '..', '..', '..')))
sys.path.append(os.path.abspath(os.path.join(__dir__, '..', '..', '..', 'tools')))
from ppcls.arch import backbone
from ppcls.utils.save_load import load_dygraph_pretrain
import paddle
import paddle.nn.functional as F
from paddle.jit import to_static
from paddleslim.dygraph.quant import QAT
from pact_helper import get_default_quant_config


def parse_args():
    def str2bool(v):
        return v.lower() in ("true", "t", "1")

    parser = argparse.ArgumentParser()
    parser.add_argument("-m", "--model", type=str)
    parser.add_argument("-p", "--pretrained_model", type=str)
    parser.add_argument("-o", "--output_path", type=str, default="./inference")
    parser.add_argument("--class_dim", type=int, default=1000)
    parser.add_argument("--load_static_weights", type=str2bool, default=False)
    parser.add_argument("--img_size", type=int, default=224)
    return parser.parse_args()


class Net(paddle.nn.Layer):
    """Wrap a PaddleClas backbone and apply softmax for inference."""

    def __init__(self, net, class_dim, model=None):
        super(Net, self).__init__()
        self.pre_net = net(class_dim=class_dim)
        self.model = model

    def forward(self, inputs):
        x = self.pre_net(inputs)
        if self.model == "GoogLeNet":
            # GoogLeNet returns multiple outputs; keep the main branch only
            x = x[0]
        x = F.softmax(x)
        return x


def main():
    args = parse_args()
    net = backbone.__dict__[args.model]
    model = Net(net, args.class_dim, args.model)

    # get QAT model
    quant_config = get_default_quant_config()
    # TODO(littletomatodonkey): add PACT for export model
    # quant_config["activation_preprocess_type"] = "PACT"
    quanter = QAT(config=quant_config)
    quanter.quantize(model)

    load_dygraph_pretrain(
        model.pre_net,
        path=args.pretrained_model,
        load_static_weights=args.load_static_weights)
    model.eval()

    save_path = os.path.join(args.output_path, "inference")
    quanter.save_quantized_model(
        model,
        save_path,
        input_spec=[
            paddle.static.InputSpec(
                shape=[None, 3, args.img_size, args.img_size], dtype='float32')
        ])
    print('inference QAT model is saved to {}'.format(save_path))


if __name__ == "__main__":
    main()
# ----- deploy/slim/quant/pact_helper.py -----
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle


def get_default_quant_config():
    """Return the default PaddleSlim QAT configuration used for PaddleClas models."""
    quant_config = {
        # weight preprocess type, default is None and no preprocessing is performed.
        'weight_preprocess_type': None,
        # activation preprocess type, default is None and no preprocessing is performed.
        'activation_preprocess_type': None,
        # weight quantize type, default is 'channel_wise_abs_max'
        'weight_quantize_type': 'channel_wise_abs_max',
        # activation quantize type, default is 'moving_average_abs_max'
        'activation_quantize_type': 'moving_average_abs_max',
        # weight quantize bit num, default is 8
        'weight_bits': 8,
        # activation quantize bit num, default is 8
        'activation_bits': 8,
        # data type after quantization, such as 'uint8', 'int8', etc. default is 'int8'
        'dtype': 'int8',
        # window size for 'range_abs_max' quantization. default is 10000
        'window_size': 10000,
        # The decay coefficient of moving average, default is 0.9
        'moving_rate': 0.9,
        # for dygraph quantization, layers of type in quantizable_layer_type will be quantized
        'quantizable_layer_type': ['Conv2D', 'Linear'],
    }
    return quant_config
# ----- deploy/slim/quant/quant.py -----
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import sys
__dir__ = os.path.dirname(os.path.abspath(__file__))
sys.path.append(__dir__)
sys.path.append(os.path.abspath(os.path.join(__dir__, '..', '..', '..')))
sys.path.append(os.path.abspath(os.path.join(__dir__, '..', '..', '..', 'tools')))
import paddle
from paddleslim.dygraph.quant import QAT
from ppcls.data import Reader
from ppcls.utils.config import get_config
from ppcls.utils.save_load import init_model, save_model
from ppcls.utils import logger
import program
from pact_helper import get_default_quant_config


def parse_args():
    parser = argparse.ArgumentParser("PaddleClas train script")
    parser.add_argument(
        '-c',
        '--config',
        type=str,
        default='configs/ResNet/ResNet50.yaml',
        help='config file path')
    parser.add_argument(
        '-o',
        '--override',
        action='append',
        default=[],
        help='config options to be overridden')
    args = parser.parse_args()
    return args


def main(args):
    paddle.seed(12345)

    config = get_config(args.config, overrides=args.override, show=True)
    # assign the place
    use_gpu = config.get("use_gpu", True)
    place = paddle.set_device('gpu' if use_gpu else 'cpu')

    trainer_num = paddle.distributed.get_world_size()
    use_data_parallel = trainer_num != 1
    config["use_data_parallel"] = use_data_parallel
    if config["use_data_parallel"]:
        paddle.distributed.init_parallel_env()

    net = program.create_model(config.ARCHITECTURE, config.classes_num)

    # prepare to quant
    quant_config = get_default_quant_config()
    quant_config["activation_preprocess_type"] = "PACT"
    quanter = QAT(config=quant_config)
    quanter.quantize(net)

    optimizer, lr_scheduler = program.create_optimizer(
        config, parameter_list=net.parameters())

    init_model(config, net, optimizer)

    if config["use_data_parallel"]:
        net = paddle.DataParallel(net)

    train_dataloader = Reader(config, 'train', places=place)()

    if config.validate:
        valid_dataloader = Reader(config, 'valid', places=place)()

    last_epoch_id = config.get("last_epoch", -1)
    best_top1_acc = 0.0  # best top1 acc record
    best_top1_epoch = last_epoch_id
    for epoch_id in range(last_epoch_id + 1, config.epochs):
        net.train()
        # 1. train with train dataset
        program.run(train_dataloader, config, net, optimizer, lr_scheduler,
                    epoch_id, 'train')

        # 2. validate with validate dataset
        if config.validate and epoch_id % config.valid_interval == 0:
            net.eval()
            with paddle.no_grad():
                top1_acc = program.run(valid_dataloader, config, net, None,
                                       None, epoch_id, 'valid')
            if top1_acc > best_top1_acc:
                best_top1_acc = top1_acc
                best_top1_epoch = epoch_id
                model_path = os.path.join(config.model_save_dir,
                                          config.ARCHITECTURE["name"])
                save_model(net, optimizer, model_path, "best_model")
            message = "The best top1 acc {:.5f}, in epoch: {:d}".format(
                best_top1_acc, best_top1_epoch)
            logger.info(message)

        # 3. save the persistable model
        if epoch_id % config.save_interval == 0:
            model_path = os.path.join(config.model_save_dir,
                                      config.ARCHITECTURE["name"])
            save_model(net, optimizer, model_path, epoch_id)


if __name__ == '__main__':
    args = parse_args()
    main(args)
# ----- deploy/slim/quant_post_static.py (excerpt from main(); batch_nums raised from 5 to 10) -----
        quantize_model_path=os.path.join(
            config["Global"]["save_inference_dir"], "quant_post_static_model"),
        sample_generator=sample_generator(train_dataloader),
        batch_nums=10)


if __name__ == "__main__":
    # ...
# global configs
Global:
  checkpoints: null
  pretrained_model: null
  output_dir: ./output/
  device: gpu
  save_interval: 1
  eval_during_train: True
  eval_interval: 1
  epochs: 200
  print_batch_step: 10
  use_visualdl: False
  # used for static mode and model export
  # ...

Slim:
  # ...

Arch:
  name: ResNet50_vd
  class_num: 1000
  pretrained: True

# loss function config for training/eval process
Loss:
  # ...
# global configs
Global:
  checkpoints: null
  pretrained_model: null
  output_dir: ./output/
  device: gpu
  # ...

Slim:
  # ...

Arch:
  name: ResNet50_vd
  class_num: 1000
  pretrained: True

# loss function config for training/eval process
Loss:
  # ...
## Introduction
A more complex model generally achieves better performance on the task, but it also introduces redundancy.
Quantization reduces this redundancy by converting full-precision data to fixed-point numbers,
which lowers the computational complexity of the model and improves inference performance.

This example uses the [quantization APIs](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/) provided by PaddleSlim to compress PaddleClas models.
It is recommended that you read the following pages before working through this example:
- [The training strategy of PaddleClas models](../../../docs/en/tutorials/quick_start_en.md)
- [PaddleSlim documentation](https://paddlepaddle.github.io/PaddleSlim/api/quantization_api/)
## Quick Start
Quantization is mostly suitable for deploying lightweight models on mobile devices.
After training a model, if you want to further compress its size and accelerate prediction, you can quantize it following the steps below.
1. Install PaddleSlim
2. Prepare a trained model
3. Quantization-Aware Training
4. Export inference model
5. Deploy quantization inference model
### 1. Install PaddleSlim
* Install via pip:
```bash
pip3.7 install paddleslim==2.0.0
```
* Install from source to get the latest features:
```bash
git clone https://github.com/PaddlePaddle/PaddleSlim.git
cd PaddleSlim
python setup.py install
```
### 2. Prepare a trained model
PaddleClas provides a series of trained [models](../../../docs/en/models/models_intro_en.md).
If the model to be quantized is not in the list, you need to follow the [regular training](../../../docs/en/tutorials/getting_started_en.md) guide to get a trained model.
### 3. Quantization-Aware Training
Quantization includes post-training (offline) quantization and quantization-aware (online) training.
Quantization-aware training is more effective; it requires loading a pretrained model,
and once the quantization strategy is defined, the model can be quantized.
The code for quantization training is located in `deploy/slim/quant/quant.py`. The training command is as follows:
* CPU/Single GPU training
```bash
python3.7 deploy/slim/quant/quant.py \
-c configs/MobileNetV3/MobileNetV3_large_x1_0.yaml \
-o pretrained_model="./MobileNetV3_large_x1_0_pretrained"
```
* Distributed training
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python3.7 -m paddle.distributed.launch \
--gpus="0,1,2,3,4,5,6,7" \
deploy/slim/quant/quant.py \
-c configs/MobileNetV3/MobileNetV3_large_x1_0.yaml \
-o pretrained_model="./MobileNetV3_large_x1_0_pretrained"
```
* The following commands quantize the `MobileNetV3_large_x1_0` model:
```bash
# download pre-trained model
wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x1_0_pretrained.pdparams
# run training
python3.7 -m paddle.distributed.launch \
--gpus="0,1,2,3,4,5,6,7" \
deploy/slim/quant/quant.py \
-c configs/MobileNetV3/MobileNetV3_large_x1_0.yaml \
-o pretrained_model="./MobileNetV3_large_x1_0_pretrained"
-o LEARNING_RATE.params.lr=0.13 \
-o epochs=100
```
### 4. Export inference model
After quantization-aware training, the model can be exported as an inference model for deployment:
```bash
python3.7 deploy/slim/quant/export_model.py \
-m MobileNetV3_large_x1_0 \
-p output/MobileNetV3_large_x1_0/best_model/ppcls \
-o ./MobileNetV3_large_x1_0_infer/ \
--img_size=224 \
--class_dim=1000
```
### 5. Deploy
The parameters of the quantized model exported in the steps above are still stored as FP32, but their values fall within the int8 range.
The exported model can be converted with Paddle-Lite's `opt` conversion tool.
For deployment of the quantized model, please refer to [mobile deployment](../../lite/readme_en.md).
## Notes:
* During quantization-aware training, it is suggested to load a pretrained model obtained from regular training to accelerate convergence.
* During quantization-aware training, it is suggested to set the initial learning rate to `1/20~1/10` of the regular value and the number of training epochs to `1/5~1/2` of the regular schedule. For the learning rate strategy, it is better to train with warmup; other configuration options are best left unchanged.