Merge branch 'develop' into add_multilabel

a90881c9 · cuicheng01 · GitHub · 5992be4a · c4f38454 · a90881c9
39 changed files
--- a/deploy/paddleserving/README.md
+++ b/deploy/paddleserving/README.md
@@ -4,9 +4,9 @@

 PaddleClas provides two service deployment methods:
 - Based on **PaddleHub Serving**: Code path is "`./deploy/hubserving`". Please refer to the [tutorial](../../deploy/hubserving/readme_en.md)
- Based on **PaddleServing**: Code path is "`./deploy/paddleserving`". Please follow this tutorial.
+- Based on **PaddleServing**: Code path is "`./deploy/paddleserving`". For the retrieval-based image recognition service, please refer to this [tutorial](./recognition/README.md); for the image classification service, please follow this tutorial.

-# Service deployment based on PaddleServing  
+# Image Classification Service deployment based on PaddleServing  

 This document will introduce how to use the [PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README.md) to deploy the ResNet50_vd model as a pipeline online service.

@@ -131,7 +131,7 @@ fetch_var {
    config.yml                # configuration file of starting the service
    pipeline_http_client.py   # script to send pipeline prediction request by http
    pipeline_rpc_client.py    # script to send pipeline prediction request by rpc
-    resnet50_web_service.py   # start the script of the pipeline server
+    classification_web_service.py   # script to start the pipeline server
    ```

 2. Run the following command to start the service.

--- a/deploy/paddleserving/README_CN.md
+++ b/deploy/paddleserving/README_CN.md
@@ -4,9 +4,9 @@

 PaddleClas提供2种服务部署方式：
 - 基于PaddleHub Serving的部署：代码路径为"`./deploy/hubserving`"，使用方法参考[文档](../../deploy/hubserving/readme.md)；
- 基于PaddleServing的部署：代码路径为"`./deploy/paddleserving`"，按照本教程使用。
+- 基于PaddleServing的部署：代码路径为"`./deploy/paddleserving`"， 基于检索方式的图像识别服务参考[文档](./recognition/README_CN.md)， 图像分类服务按照本教程使用。

-# 基于PaddleServing的服务部署
+# 基于PaddleServing的图像分类服务部署

 本文档以经典的ResNet50_vd模型为例，介绍如何使用[PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md)工具部署PaddleClas
 动态图模型的pipeline在线服务。
@@ -127,7 +127,7 @@ fetch_var {
    config.yml                 # 启动服务的配置文件
    pipeline_http_client.py    # http方式发送pipeline预测请求的脚本
    pipeline_rpc_client.py     # rpc方式发送pipeline预测请求的脚本
-    resnet50_web_service.py    # 启动pipeline服务端的脚本
+    classification_web_service.py    # 启动pipeline服务端的脚本
    ```

 2. 启动服务可运行如下命令：

--- a/deploy/paddleserving/imgs/results_recog.png
+++ b/deploy/paddleserving/imgs/results_recog.png
--- a/deploy/paddleserving/imgs/start_server_recog.png
+++ b/deploy/paddleserving/imgs/start_server_recog.png
--- a/deploy/paddleserving/recognition/README.md
+++ b/deploy/paddleserving/recognition/README.md
+# Product Recognition Service deployment based on PaddleServing  
+
+(English|[简体中文](./README_CN.md))
+
+This document describes how to use [PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README.md) to deploy the retrieval-based product recognition model as an online pipeline service.
+
+Some key features of Paddle Serving:
+- Integrates seamlessly with the Paddle training pipeline; most Paddle models can be deployed with a one-line command.
+- Supports industrial serving features such as model management, online loading, and online A/B testing.
+- Supports highly concurrent and efficient communication between clients and servers.
+
+For an introduction to the Paddle Serving deployment framework and its tutorials, refer to this [document](https://github.com/PaddlePaddle/Serving/blob/develop/README.md).
+
+## Contents
+- [Environmental preparation](#environmental-preparation)
+- [Model conversion](#model-conversion)
+- [Paddle Serving pipeline deployment](#paddle-serving-pipeline-deployment)
+- [FAQ](#faq)
+
+<a name="environmental-preparation"></a>
+## Environmental preparation
+
+Both the PaddleClas and PaddleServing operating environments are required.
+
+1. Prepare the PaddleClas operating environment by following this [link](../../../docs/zh_CN/tutorials/install.md).
+   Download the paddle whl package that matches your environment; version 2.1.0 is recommended.
+
+2. Prepare the PaddleServing operating environment as follows:
+
+    Install the serving package, which is used to start the service:
+    ```
+    pip3 install paddle-serving-server==0.6.1 # for CPU
+    pip3 install paddle-serving-server-gpu==0.6.1 # for GPU
+    # For other GPU environments, confirm your environment first and then run the matching command below
+    pip3 install paddle-serving-server-gpu==0.6.1.post101 # GPU with CUDA10.1 + TensorRT6
+    pip3 install paddle-serving-server-gpu==0.6.1.post11 # GPU with CUDA11 + TensorRT7
+    ```
+
+3. Install the client, which is used to send requests to the service.
+    Find the client installation package that matches your Python version at the [download link](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md).
+    Python 3.7 is recommended here:
+
+    ```
+    wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp37-none-any.whl
+    pip3 install paddle_serving_client-0.0.0-cp37-none-any.whl
+    ```
+
+4. Install serving-app
+    ```
+    pip3 install paddle-serving-app==0.6.1
+    ```
+
+   **Note:** To install the latest version of PaddleServing, refer to this [link](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md).
+
+
+<a name="model-conversion"></a>
+## Model conversion
+When using PaddleServing for service deployment, you need to convert the saved inference model into a serving model that is easy to deploy.
+The following assumes that the current working directory is the PaddleClas root directory.
+
+First, download the ResNet50_vd product recognition inference model:
+```
+cd deploy
+# Download and unzip the ResNet50_vd model
+wget -P models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar
+cd models
+tar -xf product_ResNet50_vd_aliproduct_v1.0_infer.tar
+```
+
+Then use the installed paddle_serving_client tool to convert the inference model into a serving model:
+```
+#  Product recognition model conversion
+python3 -m paddle_serving_client.convert --dirname ./product_ResNet50_vd_aliproduct_v1.0_infer/ \
+                                         --model_filename inference.pdmodel  \
+                                         --params_filename inference.pdiparams \
+                                         --serving_server ./product_ResNet50_vd_aliproduct_v1.0_serving/ \
+                                         --serving_client ./product_ResNet50_vd_aliproduct_v1.0_client/
+```
+
+After the ResNet50_vd inference model is converted, two additional folders, `product_ResNet50_vd_aliproduct_v1.0_serving` and `product_ResNet50_vd_aliproduct_v1.0_client`, appear in the current folder with the following structure:
+```
+|- product_ResNet50_vd_aliproduct_v1.0_serving/
+  |- __model__  
+  |- __params__
+  |- serving_server_conf.prototxt  
+  |- serving_server_conf.stream.prototxt
+
+|- product_ResNet50_vd_aliproduct_v1.0_client
+  |- serving_client_conf.prototxt  
+  |- serving_client_conf.stream.prototxt
+```
+
+Once you have the model files for deployment, change the alias name in `serving_server_conf.prototxt`: set `alias_name` in `fetch_var` to `features`.
+The modified `serving_server_conf.prototxt` file is as follows:
+```
+feed_var {
+  name: "x"
+  alias_name: "x"
+  is_lod_tensor: false
+  feed_type: 1
+  shape: 3
+  shape: 224
+  shape: 224
+}
+fetch_var {
+  name: "save_infer_model/scale_0.tmp_1"
+  alias_name: "features"
+  is_lod_tensor: true
+  fetch_type: 1
+  shape: -1
+}
+```
+
+Next, download and unpack the prebuilt index of the product gallery:
+```
+cd ../
+wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognition_demo_data_v1.1.tar && tar -xf recognition_demo_data_v1.1.tar
+```
+
+
+<a name="paddle-serving-pipeline-deployment"></a>
+## Paddle Serving pipeline deployment
+
+1. Download the PaddleClas code. If you have already downloaded it, skip this step.
+    ```
+    git clone https://github.com/PaddlePaddle/PaddleClas
+
+    # Enter the working directory  
+    cd PaddleClas/deploy/paddleserving/recognition
+    ```
+
+    The `paddleserving/recognition` directory contains the code to start the pipeline service and send prediction requests, including:
+    ```
+    __init__.py
+    config.yml                # configuration file of starting the service
+    pipeline_http_client.py   # script to send pipeline prediction request by http
+    pipeline_rpc_client.py    # script to send pipeline prediction request by rpc
+    recognition_web_service.py   # script to start the pipeline server
+    ```
+
+2. Run the following command to start the service.
+    ```
+    # Start the service and save the running log in log.txt
+    python3 recognition_web_service.py &>log.txt &
+    ```
+    After the service starts successfully, a log similar to the following is printed in log.txt:
+    ![](../imgs/start_server_recog.png)
+
+3. Send a service request:
+    ```
+    python3 pipeline_http_client.py
+    ```
+    After a successful run, the model's prediction result is printed in the terminal. An example of the result is:
+    ![](../imgs/results_recog.png)  
+
+    Adjust the `concurrency` value in config.yml to maximize QPS:
+
+    ```
+    op:
+        concurrency: 8
+        ...
+    ```
+
+    Multiple service requests can be sent at the same time if necessary.
+
+    The prediction performance data is automatically written to the `PipelineServingLogs/pipeline.tracer` file.
+
+<a name="faq"></a>
+## FAQ
+**Q1**: No result is returned after sending the request.
+
+**A1**: Do not set a proxy when starting the service or when sending the request. Disable the proxy before both steps with:
+```
+unset https_proxy
+unset http_proxy
+```  
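
The `alias_name` change described in the model conversion step of the README above can also be scripted. The following is a minimal sketch, assuming the prototxt has the flat layout shown in the README. The helper name `set_fetch_alias` is hypothetical and not part of PaddleServing, and the pre-edit alias in the sample string is an assumption.

```python
import re

def set_fetch_alias(prototxt_text, new_alias="features"):
    """Rewrite alias_name inside every fetch_var block; feed_var blocks stay untouched."""
    def rewrite_block(match):
        # Replace only the alias_name entry of this fetch_var block.
        return re.sub(r'alias_name:\s*"[^"]*"',
                      lambda m: 'alias_name: "%s"' % new_alias,
                      match.group(0))
    # A fetch_var block runs from "fetch_var {" to the first closing brace.
    return re.sub(r'fetch_var\s*\{[^}]*\}', rewrite_block, prototxt_text)

# Sample input; the original alias value here is assumed for illustration.
example = '''feed_var {
  name: "x"
  alias_name: "x"
}
fetch_var {
  name: "save_infer_model/scale_0.tmp_1"
  alias_name: "save_infer_model/scale_0.tmp_1"
}
'''
updated = set_fetch_alias(example)
print(updated)
```

Scoping the substitution to `fetch_var` blocks keeps the input alias `x` intact, which matters because `RecOp.preprocess` feeds the tensor under the key `x`.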
--- a/deploy/paddleserving/recognition/README_CN.md
+++ b/deploy/paddleserving/recognition/README_CN.md
+# 基于PaddleServing的商品识别服务部署
+
+([English](./README.md)|简体中文)
+
+本文以商品识别为例，介绍如何使用[PaddleServing](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md)工具部署PaddleClas动态图模型的pipeline在线服务。
+
+相比较于hubserving部署，PaddleServing具备以下优点：
+- 支持客户端和服务端之间高并发和高效通信
+- 支持 工业级的服务能力 例如模型管理，在线加载，在线A/B测试等
+- 支持 多种编程语言 开发客户端，例如C++, Python和Java
+
+更多有关PaddleServing服务化部署框架介绍和使用教程参考[文档](https://github.com/PaddlePaddle/Serving/blob/develop/README_CN.md)。
+
+## 目录
+- [环境准备](#环境准备)
+- [模型转换](#模型转换)
+- [Paddle Serving pipeline部署](#部署)
+- [FAQ](#FAQ)
+
+<a name="环境准备"></a>
+## 环境准备
+
+需要准备PaddleClas的运行环境和PaddleServing的运行环境。
+
+- 准备PaddleClas的[运行环境](../../docs/zh_CN/tutorials/install.md), 根据环境下载对应的paddle whl包，推荐安装2.1.0版本
+
+- 准备PaddleServing的运行环境，步骤如下
+
+1. 安装serving，用于启动服务
+    ```
+    pip3 install paddle-serving-server==0.6.1 # for CPU
+    pip3 install paddle-serving-server-gpu==0.6.1 # for GPU
+    # 其他GPU环境需要确认环境再选择执行如下命令
+    pip3 install paddle-serving-server-gpu==0.6.1.post101 # GPU with CUDA10.1 + TensorRT6
+    pip3 install paddle-serving-server-gpu==0.6.1.post11 # GPU with CUDA11 + TensorRT7
+    ```
+
+2. 安装client，用于向服务发送请求
+    在[下载链接](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md)中找到对应python版本的client安装包，这里推荐python3.7版本：
+
+    ```
+    wget https://paddle-serving.bj.bcebos.com/test-dev/whl/paddle_serving_client-0.0.0-cp37-none-any.whl
+    pip3 install paddle_serving_client-0.0.0-cp37-none-any.whl
+    ```
+
+3. 安装serving-app
+    ```
+    pip3 install paddle-serving-app==0.6.1
+    ```
+    **Note:** 如果要安装最新版本的PaddleServing参考[链接](https://github.com/PaddlePaddle/Serving/blob/develop/doc/LATEST_PACKAGES.md)。
+
+<a name="模型转换"></a>
+## 模型转换
+
+使用PaddleServing做服务化部署时，需要将保存的inference模型转换为serving易于部署的模型。 
+以下内容假定当前工作目录为PaddleClas根目录。
+
+首先，下载商品识别的inference模型
+```
+cd deploy
+
+# 下载并解压商品识别模型
+wget -P models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/product_ResNet50_vd_aliproduct_v1.0_infer.tar
+cd models
+tar -xf product_ResNet50_vd_aliproduct_v1.0_infer.tar
+```
+
+接下来，用安装的paddle_serving_client把下载的inference模型转换成易于server部署的模型格式。
+
+```
+# 转换商品识别模型
+python3 -m paddle_serving_client.convert --dirname ./product_ResNet50_vd_aliproduct_v1.0_infer/ \
+                                         --model_filename inference.pdmodel  \
+                                         --params_filename inference.pdiparams \
+                                         --serving_server ./product_ResNet50_vd_aliproduct_v1.0_serving/ \
+                                         --serving_client ./product_ResNet50_vd_aliproduct_v1.0_client/
+```
+商品识别推理模型转换完成后，会在当前文件夹多出`product_ResNet50_vd_aliproduct_v1.0_serving` 和`product_ResNet50_vd_aliproduct_v1.0_client`的文件夹，具备如下格式：
+```
+|- product_ResNet50_vd_aliproduct_v1.0_serving/
+  |- __model__  
+  |- __params__
+  |- serving_server_conf.prototxt  
+  |- serving_server_conf.stream.prototxt
+
+|- product_ResNet50_vd_aliproduct_v1.0_client
+  |- serving_client_conf.prototxt  
+  |- serving_client_conf.stream.prototxt
+
+```
+得到模型文件之后，需要修改serving_server_conf.prototxt中的alias名字： 将`fetch_var`中的`alias_name`改为`features`, 
+修改后的serving_server_conf.prototxt内容如下：
+```
+feed_var {
+  name: "x"
+  alias_name: "x"
+  is_lod_tensor: false
+  feed_type: 1
+  shape: 3
+  shape: 224
+  shape: 224
+}
+fetch_var {
+  name: "save_infer_model/scale_0.tmp_1"
+  alias_name: "features"
+  is_lod_tensor: true
+  fetch_type: 1
+  shape: -1
+}
+```
+
+接下来，下载并解压已经构建后的商品库index
+```
+cd ../
+wget https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/recognition_demo_data_v1.1.tar && tar -xf recognition_demo_data_v1.1.tar
+```
+
+
+<a name="部署"></a>
+## Paddle Serving pipeline部署
+
+1. 下载PaddleClas代码，若已下载可跳过此步骤
+    ```
+    git clone https://github.com/PaddlePaddle/PaddleClas
+
+    # 进入到工作目录
+    cd PaddleClas/deploy/paddleserving/recognition
+    ```
+    paddleserving目录包含启动pipeline服务和发送预测请求的代码，包括：
+    ```
+    __init__.py
+    config.yml                    # 启动服务的配置文件
+    pipeline_http_client.py       # http方式发送pipeline预测请求的脚本
+    pipeline_rpc_client.py        # rpc方式发送pipeline预测请求的脚本
+    recognition_web_service.py    # 启动pipeline服务端的脚本
+    ```
+
+2. 启动服务可运行如下命令：
+    ```
+    # 启动服务，运行日志保存在log.txt
+    python3 recognition_web_service.py &>log.txt &
+    ```
+    成功启动服务后，log.txt中会打印类似如下日志
+    ![](../imgs/start_server_recog.png)
+
+3. 发送服务请求：
+    ```
+    python3 pipeline_http_client.py
+    ```
+    成功运行后，模型预测的结果会打印在cmd窗口中，结果示例为：
+    ![](../imgs/results_recog.png)
+
+    调整 config.yml 中的并发个数可以获得最大的QPS
+    ```
+    op:
+        #并发数，is_thread_op=True时，为线程并发；否则为进程并发
+        concurrency: 8
+        ...
+    ```
+    有需要的话可以同时发送多个服务请求
+
+    预测性能数据会被自动写入 `PipelineServingLogs/pipeline.tracer` 文件中。
+
+<a name="FAQ"></a>
+## FAQ
+**Q1**： 发送请求后没有结果返回或者提示输出解码报错
+
+**A1**： 启动服务和发送请求时不要设置代理，可以在启动服务前和发送请求前关闭代理，关闭代理的命令是：
+```
+unset https_proxy
+unset http_proxy
+```
--- a/deploy/paddleserving/recognition/__init__.py
+++ b/deploy/paddleserving/recognition/__init__.py
--- a/deploy/paddleserving/recognition/config.yml
+++ b/deploy/paddleserving/recognition/config.yml
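+#worker_num: maximum concurrency. When build_dag_each_worker=True, the framework creates worker_num processes, each of which builds its own gRPC server and DAG
+##When build_dag_each_worker=False, the framework sets max_workers=worker_num for the gRPC thread pool of the main thread
+worker_num: 1
+
+#HTTP port. rpc_port and http_port must not both be empty. When rpc_port is available and http_port is empty, http_port is not generated automatically
+http_port: 18081
+rpc_port: 9994
+
+dag:
+    #op resource type: True for the thread model, False for the process model
+    is_thread_op: False
+op:
+    rec:
+        #concurrency: thread concurrency when is_thread_op=True, process concurrency otherwise
+        concurrency: 1
+
+        #when the op config has no server_endpoints, the local service config is read from local_service_conf
+        local_service_conf:
+
+            #model path
+            model_config: ../../models/product_ResNet50_vd_aliproduct_v1.0_serving
+
+            #compute device type: decided by devices (CPU/GPU) when omitted; 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
+            device_type: 1
+
+            #compute device IDs: CPU prediction when devices is "" or omitted; GPU prediction when devices is "0" or "0,1,2", which lists the GPU cards to use
+            devices: "0" # "0,1"
+
+            #client type: brpc, grpc, or local_predictor. local_predictor does not start a Serving service and predicts in-process
+            client_type: local_predictor
+
+            #list of fetch results, named by the alias_name of fetch_var in client_config
+            fetch_list: ["features"]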
+            
+    det:
+        concurrency: 1
+        local_service_conf:
+            client_type: local_predictor
+            device_type: 1
+            devices: '0'
+            fetch_list:
+            - save_infer_model/scale_0.tmp_1
+            model_config: ../../models/ppyolov2_r50vd_dcn_mainbody_v1.0_serving/
\ No newline at end of file
--- a/deploy/paddleserving/recognition/daoxiangcunjinzhubing_6.jpg
+++ b/deploy/paddleserving/recognition/daoxiangcunjinzhubing_6.jpg
--- a/deploy/paddleserving/recognition/label_list.txt
+++ b/deploy/paddleserving/recognition/label_list.txt
+foreground
+background
\ No newline at end of file
--- a/deploy/paddleserving/recognition/pipeline_http_client.py
+++ b/deploy/paddleserving/recognition/pipeline_http_client.py
+import requests
+import json
+import base64
+import os
+
+imgpath = "daoxiangcunjinzhubing_6.jpg"
+
+def cv2_to_base64(image):
+    return base64.b64encode(image).decode('utf8')
+
+if __name__ == "__main__":
+    url = "http://127.0.0.1:18081/recognition/prediction"
+
+    with open(os.path.join(".",  imgpath), 'rb') as file:
+        image_data1 = file.read()
+    image = cv2_to_base64(image_data1)
+    data = {"key": ["image"], "value": [image]}
+
+    for i in range(1):
+        r = requests.post(url=url, data=json.dumps(data))
+        print(r.json())
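
The HTTP client above base64-encodes the raw image bytes and wraps them in a `key`/`value` JSON body, and `DetOp.preprocess` later reverses the encoding with `base64.b64decode`. The following is a minimal sketch of that round trip, using placeholder bytes instead of a real JPEG. The helper names `build_payload` and `decode_payload` are hypothetical, and pairing the `key` and `value` lists by position is an assumption about how the server side maps them.

```python
import base64
import json

def build_payload(image_bytes):
    """Build a JSON body in the same shape pipeline_http_client.py posts."""
    encoded = base64.b64encode(image_bytes).decode("utf8")
    return json.dumps({"key": ["image"], "value": [encoded]})

def decode_payload(body):
    """Recover the raw bytes per field name (positional key/value pairing is assumed)."""
    data = json.loads(body)
    return {k: base64.b64decode(v.encode("utf8"))
            for k, v in zip(data["key"], data["value"])}

# Placeholder bytes stand in for the contents of an image file.
fake_image = b"\xff\xd8\xff\xe0 placeholder bytes, not a real JPEG"
body = build_payload(fake_image)
restored = decode_payload(body)
print(restored["image"] == fake_image)
```

Base64 is used because JSON cannot carry arbitrary binary data directly; the encoded string is plain ASCII and survives `json.dumps` unchanged.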
--- a/deploy/paddleserving/recognition/pipeline_rpc_client.py
+++ b/deploy/paddleserving/recognition/pipeline_rpc_client.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+try:
+    from paddle_serving_server_gpu.pipeline import PipelineClient
+except ImportError:
+    from paddle_serving_server.pipeline import PipelineClient
+import base64
+
+client = PipelineClient()
+client.connect(['127.0.0.1:9994'])
+imgpath = "daoxiangcunjinzhubing_6.jpg"
+
+def cv2_to_base64(image):
+    return base64.b64encode(image).decode('utf8')
+
+if __name__ == "__main__":
+    with open(imgpath, 'rb') as file:
+        image_data = file.read()
+    image = cv2_to_base64(image_data)
+
+    for i in range(1):
+        ret = client.predict(feed_dict={"image": image}, fetch=["result"])
+        print(ret)
--- a/deploy/paddleserving/recognition/recognition_web_service.py
+++ b/deploy/paddleserving/recognition/recognition_web_service.py
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from paddle_serving_server.web_service import WebService, Op
+import logging
+import numpy as np
+import sys
+import cv2
+from paddle_serving_app.reader import *
+import base64
+import os
+import faiss
+import pickle
+import json
+
+class DetOp(Op):
+    def init_op(self):
+        self.img_preprocess = Sequential([
+            BGR2RGB(), Div(255.0),
+            Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], False),
+            Resize((640, 640)), Transpose((2, 0, 1))
+        ])
+
+        self.img_postprocess = RCNNPostprocess("label_list.txt", "output")
+        self.threshold = 0.2
+        self.max_det_results = 5
+
+    def generate_scale(self, im):
+        """
+        Args:
+            im (np.ndarray): decoded image with shape (H, W, C)
+        Returns:
+            im_scale_y: the resize ratio along Y (height)
+            im_scale_x: the resize ratio along X (width)
+        """
+        target_size = [640, 640]
+        origin_shape = im.shape[:2]
+        resize_h, resize_w = target_size
+        im_scale_y = resize_h / float(origin_shape[0])
+        im_scale_x = resize_w / float(origin_shape[1])
+        return im_scale_y, im_scale_x
+
+    def preprocess(self, input_dicts, data_id, log_id):
+        (_, input_dict), = input_dicts.items()
+        imgs = []
+        raw_imgs = []
+        for key in input_dict.keys():
+            data = base64.b64decode(input_dict[key].encode('utf8'))
+            raw_imgs.append(data)
+            data = np.frombuffer(data, np.uint8)
+            raw_im = cv2.imdecode(data, cv2.IMREAD_COLOR)
+
+            im_scale_y, im_scale_x = self.generate_scale(raw_im)
+            im = self.img_preprocess(raw_im)
+            
+            imgs.append({
+              "image": im[np.newaxis, :],
+              "im_shape": np.array(list(im.shape[1:])).reshape(-1)[np.newaxis,:],
+              "scale_factor": np.array([im_scale_y, im_scale_x]).astype('float32'),
+            })
+        self.raw_img = raw_imgs
+
+        feed_dict = {
+            "image":        np.concatenate([x["image"] for x in imgs], axis=0),
+            "im_shape":     np.concatenate([x["im_shape"] for x in imgs], axis=0),
+            "scale_factor": np.concatenate([x["scale_factor"] for x in imgs], axis=0)
+        }
+        return feed_dict, False,  None,  ""
+
+    def postprocess(self, input_dicts, fetch_dict, log_id):
+        boxes = self.img_postprocess(fetch_dict, visualize=False)
+        boxes.sort(key = lambda x: x["score"], reverse = True)
+        boxes = filter(lambda x: x["score"] >= self.threshold, boxes[:self.max_det_results])
+        boxes = list(boxes)
+        for i in range(len(boxes)):
+            boxes[i]["bbox"][2] += boxes[i]["bbox"][0] - 1
+            boxes[i]["bbox"][3] += boxes[i]["bbox"][1] - 1
+        result = json.dumps(boxes)
+        res_dict = {"bbox_result": result, "image": self.raw_img}
+        return res_dict,  None,  ""
+
+class RecOp(Op):
+    def init_op(self):
+        self.seq = Sequential([
+            BGR2RGB(), Resize((224, 224)), 
+            Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225],
+                                False), Transpose((2, 0, 1))
+        ])
+
+        index_dir = "../../recognition_demo_data_v1.1/gallery_product/index"
+        assert os.path.exists(os.path.join(
+            index_dir, "vector.index")), "vector.index not found ..."
+        assert os.path.exists(os.path.join(
+            index_dir, "id_map.pkl")), "id_map.pkl not found ... "
+        
+        self.searcher = faiss.read_index(
+            os.path.join(index_dir, "vector.index"))
+                
+        with open(os.path.join(index_dir, "id_map.pkl"), "rb") as fd:
+            self.id_map = pickle.load(fd)
+
+        self.rec_nms_thresold = 0.05
+        self.rec_score_thres = 0.5
+        self.feature_normalize = True
+        self.return_k = 1
+
+    def preprocess(self, input_dicts, data_id, log_id):
+        (_, input_dict), = input_dicts.items()
+        raw_img = input_dict["image"][0]
+        data = np.frombuffer(raw_img, np.uint8)
+        origin_img = cv2.imdecode(data, cv2.IMREAD_COLOR)
+        dt_boxes = input_dict["bbox_result"]
+        boxes = json.loads(dt_boxes)
+        boxes.append({"category_id": 0,
+                      "score": 1.0,
+                      "bbox": [0, 0, origin_img.shape[1], origin_img.shape[0]]
+                     })
+        self.det_boxes = boxes
+
+        #construct batch images for rec
+        imgs = []
+        for box in boxes:
+            box = [int(x) for x in box["bbox"]]
+            im = origin_img[box[1]: box[3], box[0]: box[2]].copy()
+            img = self.seq(im)
+            imgs.append(img[np.newaxis, :].copy())
+
+        input_imgs = np.concatenate(imgs, axis=0)
+        return {"x": input_imgs},  False,  None,  ""
+
+    def nms_to_rec_results(self, results, thresh = 0.1):
+        filtered_results = []
+        x1 = np.array([r["bbox"][0] for r in results]).astype("float32")
+        y1 = np.array([r["bbox"][1] for r in results]).astype("float32")
+        x2 = np.array([r["bbox"][2] for r in results]).astype("float32")
+        y2 = np.array([r["bbox"][3] for r in results]).astype("float32")
+        scores = np.array([r["rec_scores"] for r in results])
+
+        areas = (x2 - x1 + 1) * (y2 - y1 + 1)
+        order = scores.argsort()[::-1]
+        while order.size > 0:
+            i = order[0]
+            xx1 = np.maximum(x1[i], x1[order[1:]])
+            yy1 = np.maximum(y1[i], y1[order[1:]])
+            xx2 = np.minimum(x2[i], x2[order[1:]])
+            yy2 = np.minimum(y2[i], y2[order[1:]])
+
+            w = np.maximum(0.0, xx2 - xx1 + 1)
+            h = np.maximum(0.0, yy2 - yy1 + 1)
+            inter = w * h
+            ovr = inter / (areas[i] + areas[order[1:]] - inter)
+            inds = np.where(ovr <= thresh)[0]
+            order = order[inds + 1]
+            filtered_results.append(results[i])
+        return filtered_results
+
+    def postprocess(self, input_dicts, fetch_dict, log_id):
+        batch_features = fetch_dict["features"]
+
+        if self.feature_normalize:
+            feas_norm = np.sqrt(
+                np.sum(np.square(batch_features), axis=1, keepdims=True))
+            batch_features = np.divide(batch_features, feas_norm)
+
+        scores, docs = self.searcher.search(batch_features,  self.return_k)
+
+        results = []
+        for i in range(scores.shape[0]):
+            pred = {}
+            if scores[i][0] >= self.rec_score_thres:
+                pred["bbox"] = [int(x) for x in self.det_boxes[i]["bbox"]]
+                pred["rec_docs"] = self.id_map[docs[i][0]].split()[1]
+                pred["rec_scores"] = scores[i][0]
+                results.append(pred)
+        
+        #do nms
+        results = self.nms_to_rec_results(results, self.rec_nms_thresold)
+        return {"result": str(results)}, None, ""
+
+class RecognitionService(WebService):
+    def get_pipeline_response(self, read_op):
+        det_op = DetOp(name="det", input_ops=[read_op])
+        rec_op = RecOp(name="rec", input_ops=[det_op])
+        return rec_op
+
+product_recog_service = RecognitionService(name="recognition")
+product_recog_service.prepare_pipeline_config("config.yml")
+product_recog_service.run_service()
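
The greedy suppression in `nms_to_rec_results` above can be illustrated without NumPy. The following is a minimal pure-Python sketch under the same conventions: boxes are `[x1, y1, x2, y2]` with inclusive pixel coordinates (hence the `+ 1`), and a candidate is kept only if its IoU with every previously kept box is at most the threshold. The function names `iou` and `nms` are illustrative and not part of the service code.

```python
def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes with inclusive pixel coordinates."""
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = w * h
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)

def nms(results, thresh=0.05):
    """Keep the best-scoring result, drop anything overlapping it above thresh, repeat."""
    remaining = sorted(results, key=lambda r: r["rec_scores"], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [r for r in remaining if iou(best["bbox"], r["bbox"]) <= thresh]
    return kept

candidates = [
    {"bbox": [0, 0, 99, 99], "rec_scores": 0.9},
    {"bbox": [10, 10, 109, 109], "rec_scores": 0.8},    # overlaps the first box heavily
    {"bbox": [200, 200, 299, 299], "rec_scores": 0.7},  # disjoint from both
]
kept = nms(candidates)
print([r["rec_scores"] for r in kept])
```

With the low threshold used by the service (`0.05`), almost any overlap between two recognized boxes suppresses the lower-scoring one, which helps when the whole-image box appended in `RecOp.preprocess` matches the same product as a detected box.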
--- a/deploy/python/preprocess.py
+++ b/deploy/python/preprocess.py
@@ -78,6 +78,9 @@ class UnifiedResize(object):
        if backend.lower() == "cv2":
            if isinstance(interpolation, str):
                interpolation = _cv2_interp_from_str[interpolation.lower()]
+            # compatible with opencv < version 4.4.0
+            elif not interpolation:
+                interpolation = cv2.INTER_LINEAR
            self.resize_func = partial(cv2.resize, interpolation=interpolation)
        elif backend.lower() == "pil":
            if isinstance(interpolation, str):
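
The fallback added in the hunk above can be sketched in isolation. This is a minimal illustration with stand-in integer constants so it runs without OpenCV; the values mirror OpenCV's interpolation flags, while `INTERP_FROM_STR`, `resolve_interpolation`, and `fake_resize` are hypothetical names. The sketch tests `is None` explicitly, because a plain truthiness test would also treat the integer `0` (nearest-neighbour) as missing.

```python
from functools import partial

# Stand-ins mirroring OpenCV's integer interpolation flags (no cv2 import here).
INTER_NEAREST, INTER_LINEAR, INTER_CUBIC = 0, 1, 2
INTERP_FROM_STR = {
    "nearest": INTER_NEAREST,
    "bilinear": INTER_LINEAR,
    "bicubic": INTER_CUBIC,
}

def resolve_interpolation(interpolation=None):
    """Map a string or a missing value to an integer flag; pass integers through."""
    if isinstance(interpolation, str):
        return INTERP_FROM_STR[interpolation.lower()]
    if interpolation is None:
        # Default to bilinear when nothing was configured.
        return INTER_LINEAR
    return interpolation

def fake_resize(img, size, interpolation):
    """Hypothetical stand-in for a resize function; just echoes its arguments."""
    return (img, size, interpolation)

resize_func = partial(fake_resize, interpolation=resolve_interpolation())
print(resize_func("img", (224, 224)))
```

Resolving the flag once and binding it with `partial` means every later call resizes with the same interpolation, matching how `self.resize_func` is built in the hunk.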

--- a/docs/en/tutorials/getting_started_en.md
+++ b/docs/en/tutorials/getting_started_en.md
@@ -14,13 +14,13 @@ After preparing the configuration file, The training process can be started in t

 ```
 python tools/train.py \
-    -c configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-    -o pretrained_model="" \
-    -o use_gpu=False
+    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+    -o Arch.pretrained=False \
+    -o Global.device=gpu
 ```

-Among them, `-c` is used to specify the path of the configuration file, `-o` is used to specify the parameters needed to be modified or added, `-o pretrained_model=""` means to not using pre-trained models.
-`-o use_gpu=True` means to use GPU for training. If you want to use the CPU for training, you need to set `use_gpu` to `False`.
+Among them, `-c` specifies the path of the configuration file, and `-o` specifies the parameters to be modified or added. `-o Arch.pretrained=False` means that no pre-trained model is used.
+`-o Global.device=gpu` means that the GPU is used for training. To train on the CPU, set `Global.device` to `cpu`.


 Of course, you can also directly modify the configuration file to update the configuration. For specific configuration parameters, please refer to [Configuration Document](config_description_en.md).
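
Dotted `-o` overrides such as `Arch.pretrained=False` address keys in the nested configuration. The following is a minimal sketch of how such an override could be applied to a nested dict. It is not PaddleClas's actual parser: `apply_override` and its value-coercion rules are assumptions for illustration.

```python
def apply_override(config, option):
    """Apply one 'A.B.key=value' override to a nested dict in place."""
    dotted_key, raw_value = option.split("=", 1)
    keys = dotted_key.split(".")
    node = config
    for key in keys[:-1]:
        # Walk down the hierarchy, creating missing sections on the way.
        node = node.setdefault(key, {})
    # Coerce a few literal spellings; everything else stays a string.
    literals = {"True": True, "False": False, "None": None}
    node[keys[-1]] = literals.get(raw_value, raw_value)
    return config

config = {
    "Global": {"device": "cpu", "checkpoints": None},
    "Arch": {"pretrained": True},
}
for opt in ["Arch.pretrained=False", "Global.device=gpu"]:
    apply_override(config, opt)
print(config)
```

Keys that are not overridden, such as `Global.checkpoints` here, keep the value from the configuration file.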
@@ -54,12 +54,12 @@ After configuring the configuration file, you can finetune it by loading the pre

 ```
 python tools/train.py \
-    -c configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-    -o pretrained_model="./pretrained/MobileNetV3_large_x1_0_pretrained" \
-    -o use_gpu=True
+    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+    -o Arch.pretrained=True \
+    -o Global.device=gpu
 ```

-Among them, `-o pretrained_model` is used to set the address to load the pretrained weights. When using it, you need to replace it with your own pretrained weights' path, or you can modify the path directly in the configuration file.
+Among them, `-o Arch.pretrained` is used to set the path from which the pretrained weights are loaded. When using it, replace it with the path to your own pretrained weights, or modify the path directly in the configuration file. You can also set it to `True` to use pretrained weights that were trained on ImageNet-1k.

 We also provide a lot of pre-trained models trained on the ImageNet-1k dataset. For the model list and download address, please refer to the [model library overview](../models/models_intro_en.md).

@@ -69,28 +69,26 @@ If the training process is terminated for some reasons, you can also load the ch

 ```
 python tools/train.py \
-    -c configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-    -o checkpoints="./output/MobileNetV3_large_x1_0/5/ppcls" \
-    -o last_epoch=5 \
-    -o use_gpu=True
+    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+    -o Global.checkpoints="./output/MobileNetV3_large_x1_0/epoch_5" \
+    -o Global.device=gpu
 ```

-The configuration file does not need to be modified. You only need to add the `checkpoints` parameter during training, which represents the path of the checkpoints. The parameter weights, learning rate, optimizer and other information will be loaded using this parameter.
+The configuration file does not need to be modified. You only need to add the `Global.checkpoints` parameter during training, which represents the path of the checkpoints. The parameter weights, learning rate, optimizer and other information will be loaded using this parameter.

 **Note**:
-* The parameter `-o last_epoch=5` means to record the number of the last training epoch as `5`, that is, the number of this training epoch starts from `6`, , and the parameter defaults to `-1`, which means the number of this training epoch starts from `0`.

-* The `-o checkpoints` parameter does not need to include the suffix of the checkpoints. The above training command will generate the checkpoints as shown below during the training process. If you want to continue training from the epoch `5`, Just set the `checkpoints` to `./output/MobileNetV3_large_x1_0_gpupaddle/5/ppcls`, PaddleClas will automatically fill in the `pdopt` and `pdparams` suffixes.
+* The `-o Global.checkpoints` parameter does not need to include the suffix of the checkpoint files. The above training command generates the checkpoints shown below during training. To continue training from epoch `5`, set `Global.checkpoints` to `./output/MobileNetV3_large_x1_0/epoch_5`; PaddleClas will automatically fill in the `pdopt` and `pdparams` suffixes.

    ```shell
-    output/
-    └── MobileNetV3_large_x1_0
-        ├── 0
-        │   ├── ppcls.pdopt
-        │   └── ppcls.pdparams
-        ├── 1
-        │   ├── ppcls.pdopt
-        │   └── ppcls.pdparams
+    output
+    ├── MobileNetV3_large_x1_0
+    │   ├── best_model.pdopt
+    │   ├── best_model.pdparams
+    │   ├── best_model.pdstates
+    │   ├── epoch_1.pdopt
+    │   ├── epoch_1.pdparams
+    │   ├── epoch_1.pdstates
        .
        .
        .
@@ -103,18 +101,15 @@ The model evaluation process can be started as follows.

 ```bash
 python tools/eval.py \
-    -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-    -o pretrained_model="./output/MobileNetV3_large_x1_0/best_model/ppcls"\
-    -o load_static_weights=False
+    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+    -o Global.pretrained_model=./output/MobileNetV3_large_x1_0/best_model
 ```

-The above command will use `./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml` as the configuration file to evaluate the model `./output/MobileNetV3_large_x1_0/best_model/ppcls`. You can also set the evaluation by changing the parameters in the configuration file, or you can update the configuration with the `-o` parameter, as shown above.
+The above command will use `./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml` as the configuration file to evaluate the model `./output/MobileNetV3_large_x1_0/best_model`. You can also configure the evaluation by changing the parameters in the configuration file, or override them with the `-o` parameter, as shown above.

 Some of the configurable evaluation parameters are described as follows:
-* `ARCHITECTURE.name`: Model name
-* `pretrained_model`: The path of the model file to be evaluated
-* `load_static_weights`: Whether the model to be evaluated is a static graph model
-
+* `Arch.name`: Model name
+* `Global.pretrained_model`: The path of the model file to be evaluated

 **Note:** If the model is a dygraph type, you only need to specify the prefix of the model file when loading the model, instead of specifying the suffix, such as [1.3 Resume Training](#13-resume-training).

@@ -125,26 +120,15 @@ If you want to run PaddleClas on Linux with GPU, it is highly recommended to use

 ### 2.1 Model training

-After preparing the configuration file, The training process can be started in the following way. `paddle.distributed.launch` specifies the GPU running card number by setting `selected_gpus`:
+After preparing the configuration file, the training process can be started in the following way. `paddle.distributed.launch` selects the GPU cards to run on through the `gpus` argument:

 ```bash
 export CUDA_VISIBLE_DEVICES=0,1,2,3

-python -m paddle.distributed.launch \
-    --selected_gpus="0,1,2,3" \
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
    tools/train.py \
-        -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml
-```
-
-The configuration can be updated by adding the `-o` parameter.
-
-```bash
-python -m paddle.distributed.launch \
-    --selected_gpus="0,1,2,3" \
-    tools/train.py \
-        -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-        -o pretrained_model="" \
-        -o use_gpu=True
+        -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml
 ```

 The format of output log information is the same as above, see [1.1 Model training](#11-model-training) for details.
@@ -156,14 +140,14 @@ After configuring the configuration file, you can finetune it by loading the pre
 ```
 export CUDA_VISIBLE_DEVICES=0,1,2,3

-python -m paddle.distributed.launch \
-    --selected_gpus="0,1,2,3" \
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
    tools/train.py \
-        -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-        -o pretrained_model="./pretrained/MobileNetV3_large_x1_0_pretrained"
+        -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+        -o Arch.pretrained=True
 ```

-Among them, `pretrained_model` is used to set the address to load the pretrained weights. When using it, you need to replace it with your own pretrained weights' path, or you can modify the path directly in the configuration file.
+Here, `Arch.pretrained` can be set to `True` or `False`. It can also be set to the path of the pretrained weights to load; in that case, replace it with the path to your own pretrained weights, or modify the path directly in the configuration file.

 There contains a lot of examples of model finetuning in [Quick Start](./quick_start_en.md). You can refer to this tutorial to finetune the model on a specific dataset.

@@ -175,26 +159,26 @@ If the training process is terminated for some reasons, you can also load the ch
 ```
 export CUDA_VISIBLE_DEVICES=0,1,2,3

-python -m paddle.distributed.launch \
-    --selected_gpus="0,1,2,3" \
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
    tools/train.py \
-        -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-        -o checkpoints="./output/MobileNetV3_large_x1_0/5/ppcls" \
-        -o last_epoch=5 \
-        -o use_gpu=True
+        -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+        -o Global.checkpoints="./output/MobileNetV3_large_x1_0/epoch_5" \
+        -o Global.device=gpu
 ```

-The configuration file does not need to be modified. You only need to add the `checkpoints` parameter during training, which represents the path of the checkpoints. The parameter weights, learning rate, optimizer and other information will be loaded using this parameter. About `last_epoch` parameter, please refer [1.3 Resume training](#13-resume-training) for details.
+The configuration file does not need to be modified. You only need to add the `Global.checkpoints` parameter during training, which represents the path of the checkpoints. The parameter weights, learning rate, optimizer and other information will be loaded using this parameter as described in [1.3 Resume training](#13-resume-training).

 ### 2.4 Model evaluation

 The model evaluation process can be started as follows.

 ```bash
-python tools/eval.py \
-    -c ./configs/quick_start/MobileNetV3_large_x1_0_finetune.yaml \
-    -o pretrained_model="./output/MobileNetV3_large_x1_0/best_model/ppcls"\
-    -o load_static_weights=False
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    tools/eval.py \
+        -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+        -o Global.pretrained_model=./output/MobileNetV3_large_x1_0/best_model
 ```

 About parameter description, see [1.4 Model evaluation](#14-model-evaluation) for details.
@@ -204,30 +188,16 @@ About parameter description, see [1.4 Model evaluation](#14-model-evaluation) fo
 After the training is completed, you can predict by using the pre-trained model obtained by the training, as follows:

 ```python
-python tools/infer/infer.py \
-    -i image path \
-    --model MobileNetV3_large_x1_0 \
-    --pretrained_model "./output/MobileNetV3_large_x1_0/best_model/ppcls" \
-    --use_gpu True \
-    --load_static_weights False
+python3 tools/infer.py \
+    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+    -o Infer.infer_imgs=dataset/flowers102/jpg/image_00001.jpg \
+    -o Global.pretrained_model=./output/MobileNetV3_large_x1_0/best_model
 ```

 Among them:
-+ `image_file`(i): The path of the image file to be predicted, such as `./test.jpeg`;
-+ `model`: Model name, such as `MobileNetV3_large_x1_0`;
-+ `pretrained_model`: Weight file path, such as `./pretrained/MobileNetV3_large_x1_0_pretrained/`;
-+ `use_gpu`: Whether to use the GPU, default by `True`;
-+ `load_static_weights`: Whether to load the pre-trained model obtained from static image training, default by `False`;
-+ `resize_short`: The length of the shortest side of the image that be scaled proportionally, default by `256`;
-+ `resize`: The side length of the image that be center cropped from resize_shorted image, default by `224`;
-+ `pre_label_image`: Whether to pre-label the image data, default value: `False`;
-+ `pre_label_out_idr`: The output path of pre-labeled image data. When `pre_label_image=True`, a lot of subfolders will be generated under the path, each subfolder represent a category, which stores all the images predicted by the model to belong to the category.
-
-**Note**: If you want to use `Transformer series models`, such as `DeiT_***_384`, `ViT_***_384`, etc., please pay attention to the input size of model, and need to set `resize_short=384`, `resize=384`.
-
-About more detailed infomation, you can refer to [infer.py](../../../tools/infer/infer.py).
+ `Infer.infer_imgs`: The path of the image file or folder to be predicted;
+ `Global.pretrained_model`: Weight file path, such as `./output/MobileNetV3_large_x1_0/best_model`;

-<a name="model_inference"></a>
 ## 4. Use the inference model to predict

 PaddlePaddle supports inference using prediction engines, which will be introduced next.
@@ -235,41 +205,38 @@ PaddlePaddle supports inference using prediction engines, which will be introduc
 Firstly, you should export inference model using `tools/export_model.py`.

 ```bash
-python tools/export_model.py \
-    --model MobileNetV3_large_x1_0 \
-    --pretrained_model ./output/MobileNetV3_large_x1_0/best_model/ppcls \
-    --output_path ./inference \
-    --class_dim 1000
+python3 tools/export_model.py \
+    -c ./ppcls/configs/quick_start/MobileNetV3_large_x1_0.yaml \
+    -o Global.pretrained_model=output/MobileNetV3_large_x1_0/best_model
 ```

-Among them, the `--model` parameter is used to specify the model name, `--pretrained_model` parameter is used to specify the model file path, the path does not need to include the model file suffix name, and `--output_path` is used to specify the storage path of the converted model, class_dim means number of class for the model, default as 1000.
-
-**Note**:
-1. If `--output_path=./inference`, then three files will be generated in the folder `inference`, they are `inference.pdiparams`, `inference.pdmodel` and `inference.pdiparams.info`.
-2. You can specify the `shape` of the model input image by setting the parameter `--img_size`, the default is `224`, which means the shape of input image is `224*224`. If you want to use `Transformer series models`, such as `DeiT_***_384`, `ViT_***_384`, you need to set `--img_size=384`.
+Here, the `Global.pretrained_model` parameter specifies the model file path, which does not need to include the file suffix.

 The above command will generate the model structure file (`inference.pdmodel`) and the model weight file (`inference.pdiparams`), and then the inference engine can be used for inference:

+Go to the deploy directory:
+
+```
+cd deploy
+```
+
+Run prediction with the inference engine. Because the label mapping file of the ImageNet1k dataset is used by default, set `PostProcess.Topk.class_id_map_file` to `None`.
+
 ```bash
-python tools/infer/predict.py \
-    --image_file image path \
-    --model_file "./inference/inference.pdmodel" \
-    --params_file "./inference/inference.pdiparams" \
-    --use_gpu=True \
-    --use_tensorrt=False
+python3 python/predict_cls.py \
+    -c configs/inference_cls.yaml \
+    -o Global.infer_imgs=../dataset/flowers102/jpg/image_00001.jpg \
+    -o Global.inference_model_dir=../inference/ \
+    -o PostProcess.Topk.class_id_map_file=None
 ```
 Among them:
-+ `image_file`: The path of the image file to be predicted, such as `./test.jpeg`;
-+ `model_file`: Model file path, such as `./MobileNetV3_large_x1_0/inference.pdmodel`;
-+ `params_file`: Weight file path, such as `./MobileNetV3_large_x1_0/inference.pdiparams`;
-+ `use_tensorrt`: Whether to use the TesorRT, default by `True`;
-+ `use_gpu`: Whether to use the GPU, default by `True`
-+ `enable_mkldnn`: Wheter to use `MKL-DNN`, default by `False`. When both `use_gpu` and `enable_mkldnn` are set to `True`, GPU is used to run and `enable_mkldnn` will be ignored.
-+ `resize_short`: The length of the shortest side of the image that be scaled proportionally, default by `256`;
-+ `resize`: The side length of the image that be center cropped from resize_shorted image, default by `224`;
-+ `enable_calc_topk`: Whether to calculate top-k accuracy of the predction, default by `False`. Top-k accuracy will be printed out when set as `True`.
-+ `gt_label_path`: Image name and label file, used when `enable_calc_topk` is `True` to get image list and labels.
+ `Global.infer_imgs`: The path of the image file to be predicted;
+ `Global.inference_model_dir`: Directory that contains the inference model files, such as `../inference/`;
+ `Global.use_tensorrt`: Whether to use TensorRT, defaults to `False`;
+ `Global.use_gpu`: Whether to use the GPU, defaults to `True`;
+ `Global.enable_mkldnn`: Whether to use `MKL-DNN`, defaults to `False`. It only takes effect when `Global.use_gpu` is `False`;
+ `Global.use_fp16`: Whether to enable FP16, defaults to `False`.

 **Note**: If you want to use `Transformer series models`, such as `DeiT_***_384`, `ViT_***_384`, etc., please pay attention to the input size of model, and need to set `resize_short=384`, `resize=384`.

-If you want to evaluate the speed of the model, it is recommended to use [predict.py](../../../tools/infer/predict.py), and enable TensorRT to accelerate.
+If you want to evaluate the speed of the model, it is recommended to enable TensorRT for acceleration on GPU and MKL-DNN on CPU.
--- a/docs/en/tutorials/getting_started_retrieval_en.md
+++ b/docs/en/tutorials/getting_started_retrieval_en.md
@@ -120,7 +120,7 @@ python3 tools/train.py \

 `-c` is used to specify the path to the configuration file, and `-o` is used to specify the parameters that need to be modified or added, where `-o Arch.Backbone.pretrained=True` indicates that the Backbone part uses the pre-trained model. In addition, `Arch.Backbone.pretrained` can also specify the address of a specific model weight file, which needs to be replaced with the path to your own pre-trained model weight file when using it. `-o Global.device=gpu` indicates that the GPU is used for training. If you want to use a CPU for training, you need to set `Global.device` to `cpu`.

-For more detailed training configuration, you can also modify the corresponding configuration file of the model directly. Refer to the [configuration document](config_en.md) for specific configuration parameters.
+For more detailed training configuration, you can also modify the corresponding configuration file of the model directly. Refer to the [configuration document](config_description_en.md) for specific configuration parameters.

 Run the above commands to check the output log, an example is as follows:


--- a/docs/images/wx_group.png
+++ b/docs/images/wx_group.png
--- a/docs/zh_CN/tutorials/getting_started_retrieval.md
+++ b/docs/zh_CN/tutorials/getting_started_retrieval.md
@@ -117,7 +117,7 @@ python3 tools/train.py \

 其中，`-c`用于指定配置文件的路径，`-o`用于指定需要修改或者添加的参数，其中`-o Arch.Backbone.pretrained=True`表示Backbone部分使用预训练模型，此外，`Arch.Backbone.pretrained`也可以指定具体的模型权重文件的地址，使用时需要换成自己的预训练模型权重文件的路径。`-o Global.device=gpu`表示使用GPU进行训练。如果希望使用CPU进行训练，则需要将`Global.device`设置为`cpu`。

-更详细的训练配置，也可以直接修改模型对应的配置文件。具体配置参数参考[配置文档](config.md)。
+更详细的训练配置，也可以直接修改模型对应的配置文件。具体配置参数参考[配置文档](config_description.md)。

 运行上述命令，可以看到输出日志，示例如下：


--- a/ppcls/arch/backbone/model_zoo/googlenet.py
+++ b/ppcls/arch/backbone/model_zoo/googlenet.py
@@ -131,7 +131,7 @@ class GoogLeNetDY(nn.Layer):
        self._ince5b = Inception(
            832, 832, 384, 192, 384, 48, 128, 128, name="ince5b")

-        self._pool_5 = AvgPool2D(kernel_size=7, stride=7)
+        self._pool_5 = AdaptiveAvgPool2D(1)

        self._drop = Dropout(p=0.4, mode="downscale_in_infer")
        self._fc_out = Linear(
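
Review note on the hunk above: replacing the fixed `AvgPool2D(kernel_size=7, stride=7)` with `AdaptiveAvgPool2D(1)` makes the pooled output 1x1 regardless of the incoming feature-map size. The following is a minimal pure-Python sketch of that difference, not the Paddle implementation; the helper names (`avg_pool2d`, `global_avg_pool2d`, `feature_map`) are hypothetical, and the 7x7 map for a 224x224 input is an assumption about this backbone.

```python
def avg_pool2d(x, kernel, stride):
    """Average-pool a 2-D list with a square window and no padding."""
    h, w = len(x), len(x[0])
    out = []
    for top in range(0, h - kernel + 1, stride):
        row = []
        for left in range(0, w - kernel + 1, stride):
            window = [
                x[i][j]
                for i in range(top, top + kernel)
                for j in range(left, left + kernel)
            ]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def global_avg_pool2d(x):
    """Adaptive average pooling to a 1x1 output: the mean over all positions."""
    values = [v for row in x for v in row]
    return [[sum(values) / len(values)]]


def feature_map(size):
    """Hypothetical square feature map filled with 0, 1, 2, ..."""
    return [[float(i * size + j) for j in range(size)] for i in range(size)]


small = feature_map(7)    # assumed size of the last feature map for a 224x224 input
large = feature_map(14)   # a larger input image yields a larger feature map

fixed_small = avg_pool2d(small, kernel=7, stride=7)   # 1x1, same as global pooling
fixed_large = avg_pool2d(large, kernel=7, stride=7)   # 2x2, no longer a single value
adaptive_large = global_avg_pool2d(large)             # still 1x1
```

With the fixed window, the flattened feature size seen by the following `Linear` layer changes with the input resolution; with the adaptive variant it stays constant.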

--- a/ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml
+++ b/ppcls/configs/GeneralRecognition/GeneralRecognition_PPLCNet_x2_5.yaml
+# global configs
+Global:
+  checkpoints: null
+  pretrained_model: null
+  output_dir: ./output/
+  device: gpu
+  save_interval: 1
+  eval_during_train: True
+  eval_interval: 1
+  epochs: 100
+  print_batch_step: 10
+  use_visualdl: False
+  # used for static mode and model export
+  image_shape: [3, 224, 224]
+  save_inference_dir: ./inference
+  eval_mode: retrieval
+  use_dali: False
+  to_static: False
+
+# model architecture
+Arch:
+  name: RecModel
+  infer_output_key: features
+  infer_add_softmax: False
+
+  Backbone: 
+    name: PPLCNet_x2_5
+    pretrained: True
+    use_ssld: True
+  BackboneStopLayer:
+    name: flatten_0
+  Neck:
+    name: FC
+    embedding_size: 1280
+    class_num: 512
+  Head:
+    name: ArcMargin 
+    embedding_size: 512
+    class_num: 185341
+    margin: 0.2
+    scale: 30
+
+# loss function config for training/eval process
+Loss:
+  Train:
+    - CELoss:
+        weight: 1.0
+  Eval:
+    - CELoss:
+        weight: 1.0
+
+Optimizer:
+  name: Momentum
+  momentum: 0.9
+  lr:
+    name: Cosine
+    learning_rate: 0.04
+    warmup_epoch: 5
+  regularizer:
+    name: 'L2'
+    coeff: 0.00001
+
+
+# data loader for train and eval
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/
+      cls_label_path: ./dataset/train_reg_all_data.txt
+      transform_ops:
+        - DecodeImage:
+            to_rgb: True
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+
+    sampler:
+      name: DistributedBatchSampler
+      batch_size: 256
+      drop_last: False
+      shuffle: True
+    loader:
+      num_workers: 4
+      use_shared_memory: True
+
+  Eval:
+    Query:
+      dataset: 
+        name: VeriWild
+        image_root: ./dataset/Aliproduct/
+        cls_label_path: ./dataset/Aliproduct/val_list.txt
+        transform_ops:
+          - DecodeImage:
+              to_rgb: True
+              channel_first: False
+          - ResizeImage:
+              size: 224
+          - NormalizeImage:
+              scale: 0.00392157
+              mean: [0.485, 0.456, 0.406]
+              std: [0.229, 0.224, 0.225]
+              order: ''
+      sampler:
+        name: DistributedBatchSampler
+        batch_size: 64
+        drop_last: False
+        shuffle: False
+      loader:
+        num_workers: 4
+        use_shared_memory: True
+
+    Gallery:
+      dataset: 
+        name: VeriWild
+        image_root: ./dataset/Aliproduct/
+        cls_label_path: ./dataset/Aliproduct/val_list.txt
+        transform_ops:
+          - DecodeImage:
+              to_rgb: True
+              channel_first: False
+          - ResizeImage:
+              size: 224
+          - NormalizeImage:
+              scale: 0.00392157
+              mean: [0.485, 0.456, 0.406]
+              std: [0.229, 0.224, 0.225]
+              order: ''
+      sampler:
+        name: DistributedBatchSampler
+        batch_size: 64
+        drop_last: False
+        shuffle: False
+      loader:
+        num_workers: 4
+        use_shared_memory: True
+
+Metric:
+  Eval:
+    - Recallk:
+        topk: [1, 5]
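
Review note on the `Metric.Eval.Recallk` entry above: with `eval_mode: retrieval`, the model is scored by whether a gallery item with the query's label appears among the top-k nearest neighbours. The sketch below illustrates the generic recall@k idea in pure Python under stated assumptions: dot-product similarity, tiny hand-made features, and a hypothetical function name `recall_at_k`. It is not the PaddleClas `Recallk` implementation, which may, for example, treat identical query and gallery sets differently.

```python
def recall_at_k(query_feats, query_labels, gallery_feats, gallery_labels, k):
    """Fraction of queries with at least one same-label gallery item in the top k."""
    hits = 0
    for feat, label in zip(query_feats, query_labels):
        # Dot-product similarity between the query and every gallery feature.
        scores = [sum(a * b for a, b in zip(feat, g)) for g in gallery_feats]
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if any(gallery_labels[i] == label for i in ranked[:k]):
            hits += 1
    return hits / len(query_feats)


# Toy data (made up for illustration only).
gallery_feats = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
gallery_labels = [0, 1, 1]
query_feats = [[0.9, 0.1], [0.9, 0.2]]
query_labels = [0, 1]

r1 = recall_at_k(query_feats, query_labels, gallery_feats, gallery_labels, k=1)
r2 = recall_at_k(query_feats, query_labels, gallery_feats, gallery_labels, k=2)
```

Here the second query's nearest gallery item has the wrong label, so it only counts as a hit once k is at least 2.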
--- a/ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_base_patch4_window12_384.yaml
+++ b/ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_base_patch4_window12_384.yaml
@@ -61,6 +61,8 @@ DataLoader:
            channel_first: False
        - RandCropImage:
            size: 384
+            interpolation: bicubic
+            backend: pil
        - RandFlipImage:
            flip_code: 1
        - TimmAutoAugment:
@@ -109,6 +111,8 @@ DataLoader:
            channel_first: False
        - ResizeImage:
            resize_short: 438
+            interpolation: bicubic
+            backend: pil
        - CropImage:
            size: 384
        - NormalizeImage:
@@ -134,6 +138,8 @@ Infer:
        channel_first: False
    - ResizeImage:
        resize_short: 438
+        interpolation: bicubic
+        backend: pil
    - CropImage:
        size: 384
    - NormalizeImage:

--- a/ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_base_patch4_window7_224.yaml
+++ b/ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_base_patch4_window7_224.yaml
@@ -61,6 +61,8 @@ DataLoader:
            channel_first: False
        - RandCropImage:
            size: 224
+            interpolation: bicubic
+            backend: pil
        - RandFlipImage:
            flip_code: 1
        - TimmAutoAugment:
@@ -109,6 +111,8 @@ DataLoader:
            channel_first: False
        - ResizeImage:
            resize_short: 256
+            interpolation: bicubic
+            backend: pil
        - CropImage:
            size: 224
        - NormalizeImage:
@@ -134,6 +138,8 @@ Infer:
        channel_first: False
    - ResizeImage:
        resize_short: 256
+        interpolation: bicubic
+        backend: pil
    - CropImage:
        size: 224
    - NormalizeImage:

--- a/ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_large_patch4_window12_384.yaml
+++ b/ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_large_patch4_window12_384.yaml
@@ -61,6 +61,8 @@ DataLoader:
            channel_first: False
        - RandCropImage:
            size: 384
+            interpolation: bicubic
+            backend: pil
        - RandFlipImage:
            flip_code: 1
        - TimmAutoAugment:
@@ -109,6 +111,8 @@ DataLoader:
            channel_first: False
        - ResizeImage:
            resize_short: 438
+            interpolation: bicubic
+            backend: pil
        - CropImage:
            size: 384
        - NormalizeImage:
@@ -134,6 +138,8 @@ Infer:
        channel_first: False
    - ResizeImage:
        resize_short: 438
+        interpolation: bicubic
+        backend: pil
    - CropImage:
        size: 384
    - NormalizeImage:

--- a/ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_large_patch4_window7_224.yaml
+++ b/ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_large_patch4_window7_224.yaml
@@ -61,6 +61,8 @@ DataLoader:
            channel_first: False
        - RandCropImage:
            size: 224
+            interpolation: bicubic
+            backend: pil
        - RandFlipImage:
            flip_code: 1
        - TimmAutoAugment:
@@ -109,6 +111,8 @@ DataLoader:
            channel_first: False
        - ResizeImage:
            resize_short: 256
+            interpolation: bicubic
+            backend: pil
        - CropImage:
            size: 224
        - NormalizeImage:
@@ -134,6 +138,8 @@ Infer:
        channel_first: False
    - ResizeImage:
        resize_short: 256
+        interpolation: bicubic
+        backend: pil
    - CropImage:
        size: 224
    - NormalizeImage:

--- a/ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_small_patch4_window7_224.yaml
+++ b/ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_small_patch4_window7_224.yaml
@@ -61,6 +61,8 @@ DataLoader:
            channel_first: False
        - RandCropImage:
            size: 224
+            interpolation: bicubic
+            backend: pil
        - RandFlipImage:
            flip_code: 1
        - TimmAutoAugment:
@@ -109,6 +111,8 @@ DataLoader:
            channel_first: False
        - ResizeImage:
            resize_short: 256
+            interpolation: bicubic
+            backend: pil
        - CropImage:
            size: 224
        - NormalizeImage:
@@ -134,6 +138,8 @@ Infer:
        channel_first: False
    - ResizeImage:
        resize_short: 256
+        interpolation: bicubic
+        backend: pil
    - CropImage:
        size: 224
    - NormalizeImage:

--- a/ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_tiny_patch4_window7_224.yaml
+++ b/ppcls/configs/ImageNet/SwinTransformer/SwinTransformer_tiny_patch4_window7_224.yaml
@@ -61,6 +61,8 @@ DataLoader:
            channel_first: False
        - RandCropImage:
            size: 224
+            interpolation: bicubic
+            backend: pil
        - RandFlipImage:
            flip_code: 1
        - TimmAutoAugment:
@@ -109,6 +111,8 @@ DataLoader:
            channel_first: False
        - ResizeImage:
            resize_short: 256
+            interpolation: bicubic
+            backend: pil
        - CropImage:
            size: 224
        - NormalizeImage:
@@ -134,6 +138,8 @@ Infer:
        channel_first: False
    - ResizeImage:
        resize_short: 256
+        interpolation: bicubic
+        backend: pil
    - CropImage:
        size: 224
    - NormalizeImage:

--- a/ppcls/configs/Logo/ResNet50_ReID.yaml
+++ b/ppcls/configs/Logo/ResNet50_ReID.yaml
@@ -54,7 +54,7 @@ Optimizer:
  momentum: 0.9
  lr:
    name: Cosine
-    learning_rate: 0.01
+    learning_rate: 0.04
  regularizer:
    name: 'L2'
    coeff: 0.0001
@@ -84,10 +84,10 @@ DataLoader:
          - RandomErasing:
              EPSILON: 0.5
    sampler:
-        name: DistributedRandomIdentitySampler
+        name: PKSampler
        batch_size: 128
-        num_instances: 2
-        drop_last: False
+        sample_per_id: 2
+        drop_last: True

    loader:
        num_workers: 6
@@ -97,7 +97,7 @@ DataLoader:
      dataset:
        name: LogoDataset
        image_root: "dataset/LogoDet-3K-crop/val/"
-        cls_label_path: "dataset/LogoDet-3K-crop/LogoDet-3K+query.txt"
+        cls_label_path: "dataset/LogoDet-3K-crop/LogoDet-3K+val.txt"
        transform_ops:
          - DecodeImage:
              to_rgb: True
@@ -122,7 +122,7 @@ DataLoader:
      dataset:
          name: LogoDataset
          image_root: "dataset/LogoDet-3K-crop/train/"
-          cls_label_path: "dataset/LogoDet-3K-crop/LogoDet-3K+gallery.txt"
+          cls_label_path: "dataset/LogoDet-3K-crop/LogoDet-3K+train.txt"
          transform_ops:
            - DecodeImage:
                to_rgb: True

--- a/ppcls/configs/Products/ResNet50_vd_Inshop.yaml
+++ b/ppcls/configs/Products/ResNet50_vd_Inshop.yaml
@@ -54,7 +54,7 @@ Optimizer:
  momentum: 0.9
  lr:
    name: MultiStepDecay
-    learning_rate: 0.01
+    learning_rate: 0.04
    milestones: [30, 60, 70, 80, 90, 100]
    gamma: 0.5
    verbose: False
@@ -90,10 +90,10 @@ DataLoader:
            r1: 0.3
            mean: [0., 0., 0.]
    sampler:
-      name: DistributedRandomIdentitySampler
+      name: PKSampler
      batch_size: 64
-      num_instances: 2
-      drop_last: False
+      sample_per_id: 2
+      drop_last: True
      shuffle: True
    loader:
      num_workers: 4

--- a/ppcls/configs/Vehicle/ResNet50_ReID.yaml
+++ b/ppcls/configs/Vehicle/ResNet50_ReID.yaml
@@ -53,7 +53,7 @@ Optimizer:
  momentum: 0.9
  lr:
    name: Cosine
-    learning_rate: 0.01
+    learning_rate: 0.04
  regularizer:
    name: 'L2'
    coeff: 0.0005
@@ -88,10 +88,10 @@ DataLoader:
              mean: [0., 0., 0.]

    sampler:
-        name: DistributedRandomIdentitySampler
+        name: PKSampler
        batch_size: 128
-        num_instances: 2
-        drop_last: False
+        sample_per_id: 2
+        drop_last: True
        shuffle: True
    loader:
        num_workers: 6
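
Review note on the sampler change in the three configs above: with `batch_size: 128` and `sample_per_id: 2`, each batch is expected to hold 64 identities with 2 images each. The `PKSampler` source is not part of this section, so the following is only a simplified pure-Python sketch of the P x K sampling idea; `pk_batches` and its `seed` argument are hypothetical and not the PaddleClas API.

```python
import random
from collections import defaultdict


def pk_batches(labels, batch_size, sample_per_id, seed=0):
    """Yield index batches with batch_size // sample_per_id ids, sample_per_id images each."""
    assert batch_size % sample_per_id == 0
    rng = random.Random(seed)

    # Group sample indices by their identity label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    ids = list(by_label)
    rng.shuffle(ids)
    ids_per_batch = batch_size // sample_per_id

    # Only full batches are emitted, mirroring drop_last: True.
    for start in range(0, len(ids) - ids_per_batch + 1, ids_per_batch):
        batch = []
        for label in ids[start:start + ids_per_batch]:
            pool = by_label[label]
            if len(pool) >= sample_per_id:
                batch.extend(rng.sample(pool, sample_per_id))
            else:
                # Too few images for this id: sample with replacement.
                batch.extend(rng.choices(pool, k=sample_per_id))
        yield batch


# Toy label list: 10 identities with 3 images each.
labels = [i // 3 for i in range(30)]
batches = list(pk_batches(labels, batch_size=8, sample_per_id=2))
```

Batches built this way always contain at least one positive pair per identity, which is what metric-learning losses typically rely on.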

--- a/ppcls/data/__init__.py
+++ b/ppcls/data/__init__.py
@@ -26,9 +26,12 @@ from ppcls.data.dataloader.common_dataset import create_operators
 from ppcls.data.dataloader.vehicle_dataset import CompCars, VeriWild
 from ppcls.data.dataloader.logo_dataset import LogoDataset
 from ppcls.data.dataloader.icartoon_dataset import ICartoonDataset
+from ppcls.data.dataloader.mix_dataset import MixDataset

 # sampler
 from ppcls.data.dataloader.DistributedRandomIdentitySampler import DistributedRandomIdentitySampler
+from ppcls.data.dataloader.pk_sampler import PKSampler
+from ppcls.data.dataloader.mix_sampler import MixSampler
 from ppcls.data import preprocess
 from ppcls.data.preprocess import transform


--- a/ppcls/data/dataloader/__init__.py
+++ b/ppcls/data/dataloader/__init__.py
+from ppcls.data.dataloader.imagenet_dataset import ImageNetDataset
+from ppcls.data.dataloader.multilabel_dataset import MultiLabelDataset
+from ppcls.data.dataloader.common_dataset import create_operators
+from ppcls.data.dataloader.vehicle_dataset import CompCars, VeriWild
+from ppcls.data.dataloader.logo_dataset import LogoDataset
+from ppcls.data.dataloader.icartoon_dataset import ICartoonDataset
+from ppcls.data.dataloader.mix_dataset import MixDataset
+from ppcls.data.dataloader.mix_sampler import MixSampler
+from ppcls.data.dataloader.pk_sampler import PKSampler
--- a/ppcls/data/dataloader/mix_dataset.py
+++ b/ppcls/data/dataloader/mix_dataset.py
+#   Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import print_function
+
+import numpy as np
+import os
+
+from paddle.io import Dataset
+from .. import dataloader
+
+
+class MixDataset(Dataset):
+    def __init__(self, datasets_config):
+        super().__init__()
+        self.dataset_list = []
+        start_idx = 0
+        end_idx = 0
+        for config_i in datasets_config:
+            dataset_name = config_i.pop('name')
+            dataset = getattr(dataloader, dataset_name)(**config_i)
+            end_idx += len(dataset)
+            self.dataset_list.append([end_idx, start_idx, dataset])
+            start_idx = end_idx
+
+        self.length = end_idx
+
+    def __getitem__(self, idx):
+        for dataset_i in self.dataset_list:
+            if dataset_i[0] > idx:
+                dataset_i_idx = idx - dataset_i[1]
+                return dataset_i[2][dataset_i_idx]
+
+    def __len__(self):
+        return self.length
+
+    def get_dataset_list(self):
+        return self.dataset_list
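
Review note on `MixDataset.__getitem__` above: the class maps a global index to a (dataset, local index) pair through cumulative end offsets. The sketch below shows the same mapping without Paddle, assuming plain Python lists stand in for datasets. `ConcatIndex` is a hypothetical name, and it deliberately differs from the diff in two ways: it uses `bisect` instead of a linear scan, and it raises `IndexError` for out-of-range indices.

```python
from bisect import bisect_right


class ConcatIndex:
    """Hypothetical stand-in for MixDataset's global-to-local index mapping."""

    def __init__(self, datasets):
        self.datasets = list(datasets)
        self.ends = []  # cumulative end offset of each dataset
        total = 0
        for ds in self.datasets:
            total += len(ds)
            self.ends.append(total)
        self.length = total

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        if not 0 <= idx < self.length:
            raise IndexError(idx)
        # Position of the first dataset whose end offset is strictly greater than idx.
        i = bisect_right(self.ends, idx)
        start = self.ends[i - 1] if i > 0 else 0
        return self.datasets[i][idx - start]


mixed = ConcatIndex([["a0", "a1", "a2"], ["b0", "b1"]])
first_of_second = mixed[3]  # global index 3 maps to local index 0 of the second list
```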
--- a/ppcls/data/dataloader/mix_sampler.py
+++ b/ppcls/data/dataloader/mix_sampler.py
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+
+from paddle.io import DistributedBatchSampler, Sampler
+
+from ppcls.utils import logger
+from ppcls.data.dataloader.mix_dataset import MixDataset
+from ppcls.data import dataloader
+
+
+class MixSampler(DistributedBatchSampler):
+    def __init__(self, dataset, batch_size, sample_configs, iter_per_epoch):
+        super().__init__(dataset, batch_size)
+        assert isinstance(dataset,
+                          MixDataset), "MixSampler only support MixDataset"
+        self.sampler_list = []
+        self.batch_size = batch_size
+        self.start_list = []
+        self.length = iter_per_epoch
+        dataset_list = dataset.get_dataset_list()
+        batch_size_left = self.batch_size
+        self.iter_list = []
+        for i, config_i in enumerate(sample_configs):
+            self.start_list.append(dataset_list[i][1])
+            sample_method = config_i.pop("name")
+            ratio_i = config_i.pop("ratio")
+            if i < len(sample_configs) - 1:
+                batch_size_i = int(self.batch_size * ratio_i)
+                batch_size_left -= batch_size_i
+            else:
+                batch_size_i = batch_size_left
+            assert batch_size_i <= len(dataset_list[i][2])
+            config_i["batch_size"] = batch_size_i
+            if sample_method == "DistributedBatchSampler":
+                sampler_i = DistributedBatchSampler(dataset_list[i][2],
+                                                    **config_i)
+            else:
+                sampler_i = getattr(dataloader, sample_method)(
+                    dataset_list[i][2], **config_i)
+            self.sampler_list.append(sampler_i)
+            self.iter_list.append(iter(sampler_i))
+            self.length += len(dataset_list[i][2]) * ratio_i
+        self.iter_counter = 0
+
+    def __iter__(self):
+        while self.iter_counter < self.length:
+            batch = []
+            for i, iter_i in enumerate(self.iter_list):
+                batch_i = next(iter_i, None)
+                if batch_i is None:
+                    iter_i = iter(self.sampler_list[i])
+                    self.iter_list[i] = iter_i
+                    batch_i = next(iter_i, None)
+                    assert batch_i is not None, "dataset {} return None".format(
+                        i)
+                batch += [idx + self.start_list[i] for idx in batch_i]
+            if len(batch) == self.batch_size:
+                self.iter_counter += 1
+                yield batch
+            else:
+                logger.info("Some dataset reaches end")
+        self.iter_counter = 0
+
+    def __len__(self):
+        return self.length
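
Review note on `MixSampler.__init__`: each sub-sampler gets `int(batch_size * ratio)` samples and the last one takes whatever is left, so the per-sampler sizes always add up to `batch_size`. A minimal sketch of that arithmetic, assuming a hypothetical standalone function `split_batch_size` (not part of this change):

```python
def split_batch_size(batch_size, ratios):
    """Split a batch size across samplers by ratio.

    Mirrors the arithmetic in MixSampler.__init__: every sampler except the
    last gets a truncated share, and the last absorbs the remainder.
    """
    sizes = []
    left = batch_size
    for i, ratio in enumerate(ratios):
        if i < len(ratios) - 1:
            size = int(batch_size * ratio)  # truncates toward zero
            left -= size
        else:
            size = left  # remainder goes to the last sampler
        sizes.append(size)
    return sizes


print(split_batch_size(64, [0.3, 0.3, 0.4]))  # [19, 19, 26]
```

Because of the truncation, the last sampler's share can differ from `batch_size * ratio`; here it receives 26 samples rather than 25.6 rounded down.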
--- a/ppcls/data/dataloader/pk_sampler.py
+++ b/ppcls/data/dataloader/pk_sampler.py
+# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from collections import defaultdict
+import numpy as np
+import random
+from paddle.io import DistributedBatchSampler
+
+from ppcls.utils import logger
+
+
+class PKSampler(DistributedBatchSampler):
+    """
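+    First, randomly sample P identities.
+    Then, for each identity, randomly sample K instances.
+    The batch size is therefore P*K, hence the name PKSampler.
+    Args:
+        dataset (paddle.io.Dataset): list of (img_path, pid, cam_id); it
+            must expose a `labels` attribute.
+        batch_size (int): number of examples in a batch.
+        sample_per_id (int): number of instances per identity in a batch.
+        shuffle (bool): whether to shuffle the label order before generating
+            batch indices. Default True.
+        drop_last (bool): whether to drop a batch whose size differs from
+            batch_size. Default True.
+        sample_method (str): "id_avg_prob" samples identities uniformly;
+            "sample_avg_prob" samples them in proportion to their number
+            of instances. Default "sample_avg_prob".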
+    """
+
+    def __init__(self,
+                 dataset,
+                 batch_size,
+                 sample_per_id,
+                 shuffle=True,
+                 drop_last=True,
+                 sample_method="sample_avg_prob"):
+        super().__init__(
+            dataset, batch_size, shuffle=shuffle, drop_last=drop_last)
+        assert batch_size % sample_per_id == 0, \
+            "PKSampler configs error, Sample_per_id must be a divisor of batch_size."
+        assert hasattr(self.dataset,
+                       "labels"), "Dataset must have labels attribute."
+        self.sample_per_label = sample_per_id
+        self.label_dict = defaultdict(list)
+        self.sample_method = sample_method
+        for idx, label in enumerate(self.dataset.labels):
+            self.label_dict[label].append(idx)
+        self.label_list = list(self.label_dict)
+        assert len(self.label_list) * self.sample_per_label > self.batch_size, \
+            "batch size should be smaller than the number of identities times sample_per_id."
+        if self.sample_method == "id_avg_prob":
+            self.prob_list = np.array([1 / len(self.label_list)] *
+                                      len(self.label_list))
+        elif self.sample_method == "sample_avg_prob":
+            counter = []
+            for label_i in self.label_list:
+                counter.append(len(self.label_dict[label_i]))
+            self.prob_list = np.array(counter) / sum(counter)
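+        else:
+            msg = ("PKSampler only supports id_avg_prob and sample_avg_prob "
+                   "sample methods, but received {}.".format(
+                       self.sample_method))
+            logger.error(msg)
+            # prob_list would be undefined below, so fail here explicitly.
+            raise ValueError(msg)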
+        # The probabilities must sum to 1; compare the sum, not each entry.
+        if abs(sum(self.prob_list) - 1) > 0.00000001:
+            self.prob_list[-1] = 1 - sum(self.prob_list[:-1])
+            if self.prob_list[-1] > 1 or self.prob_list[-1] < 0:
+                logger.error("PKSampler prob list error")
+            else:
+                logger.info(
+                    "PKSampler: sum of prob list not equal to 1, change the last prob"
+                )
+
+    def __iter__(self):
+        label_per_batch = self.batch_size // self.sample_per_label
+        if self.shuffle:
+            np.random.RandomState(self.epoch).shuffle(self.label_list)
+        for i in range(len(self)):
+            batch_index = []
+            batch_label_list = np.random.choice(
+                self.label_list,
+                size=label_per_batch,
+                replace=False,
+                p=self.prob_list)
+            for label_i in batch_label_list:
+                label_i_indexes = self.label_dict[label_i]
+                if self.sample_per_label <= len(label_i_indexes):
+                    batch_index.extend(
+                        np.random.choice(
+                            label_i_indexes,
+                            size=self.sample_per_label,
+                            replace=False))
+                else:
+                    batch_index.extend(
+                        np.random.choice(
+                            label_i_indexes,
+                            size=self.sample_per_label,
+                            replace=True))
+            if not self.drop_last or len(batch_index) == self.batch_size:
+                yield batch_index
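
Review note on `PKSampler.__iter__`: each batch is built by picking P identities and then K sample indices per identity, falling back to sampling with replacement when an identity has fewer than K instances. The sketch below shows that procedure in isolation. It is an illustration under assumptions: `pk_batch` is a hypothetical function, it uses the standard-library `random` module instead of `np.random.choice`, and it picks identities uniformly (the `id_avg_prob` case only).

```python
import random
from collections import defaultdict


def pk_batch(labels, p, k, rng):
    """Return P*K sample indices: P identities, K instances each."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    batch = []
    # Pick P distinct identities uniformly at random.
    for label in rng.sample(list(by_label), p):
        pool = by_label[label]
        if len(pool) >= k:
            batch.extend(rng.sample(pool, k))     # without replacement
        else:
            batch.extend(rng.choices(pool, k=k))  # with replacement
    return batch


labels = [0, 0, 0, 1, 2, 2]
batch = pk_batch(labels, p=2, k=2, rng=random.Random(0))
print(len(batch))  # 4
```

Identity 1 has a single instance here, so whenever it is selected its index appears twice in the batch.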
--- a/ppcls/data/preprocess/ops/operators.py
+++ b/ppcls/data/preprocess/ops/operators.py
@@ -59,6 +59,9 @@ class UnifiedResize(object):
        if backend.lower() == "cv2":
            if isinstance(interpolation, str):
                interpolation = _cv2_interp_from_str[interpolation.lower()]
+            # compatible with opencv < version 4.4.0
+            elif not interpolation:
+                interpolation = cv2.INTER_LINEAR
            self.resize_func = partial(cv2.resize, interpolation=interpolation)
        elif backend.lower() == "pil":
            if isinstance(interpolation, str):

--- a/ppcls/engine/evaluation/classification.py
+++ b/ppcls/engine/evaluation/classification.py
@@ -22,7 +22,7 @@ from ppcls.utils.misc import AverageMeter
 from ppcls.utils import logger


-def classification_eval(evaler, epoch_id=0):
+def classification_eval(engine, epoch_id=0):
    output_info = dict()
    time_info = {
        "batch_cost": AverageMeter(
@@ -30,21 +30,19 @@ def classification_eval(evaler, epoch_id=0):
        "reader_cost": AverageMeter(
            "reader_cost", ".5f", postfix=" s,"),
    }
-    print_batch_step = evaler.config["Global"]["print_batch_step"]
+    print_batch_step = engine.config["Global"]["print_batch_step"]

    metric_key = None
    tic = time.time()
-    eval_dataloader = evaler.eval_dataloader if evaler.use_dali else evaler.eval_dataloader(
-    )
-    max_iter = len(evaler.eval_dataloader) - 1 if platform.system(
-    ) == "Windows" else len(evaler.eval_dataloader)
-    for iter_id, batch in enumerate(eval_dataloader):
+    max_iter = len(engine.eval_dataloader) - 1 if platform.system(
+    ) == "Windows" else len(engine.eval_dataloader)
+    for iter_id, batch in enumerate(engine.eval_dataloader):
        if iter_id >= max_iter:
            break
        if iter_id == 5:
            for key in time_info:
                time_info[key].reset()
-        if evaler.use_dali:
+        if engine.use_dali:
            batch = [
                paddle.to_tensor(batch[0]['data']),
                paddle.to_tensor(batch[0]['label'])
@@ -55,17 +53,17 @@ def classification_eval(evaler, epoch_id=0):
-        if not evaler.config["Global"].get("use_multilabel", False):
+        if not engine.config["Global"].get("use_multilabel", False):
            batch[1] = batch[1].reshape([-1, 1]).astype("int64")
        # image input
-        out = evaler.model(batch[0])
+        out = engine.model(batch[0])
        # calc loss
-        if evaler.eval_loss_func is not None:
-            loss_dict = evaler.eval_loss_func(out, batch[1])
+        if engine.eval_loss_func is not None:
+            loss_dict = engine.eval_loss_func(out, batch[1])
            for key in loss_dict:
                if key not in output_info:
                    output_info[key] = AverageMeter(key, '7.5f')
                output_info[key].update(loss_dict[key].numpy()[0], batch_size)
        # calc metric
-        if evaler.eval_metric_func is not None:
-            metric_dict = evaler.eval_metric_func(out, batch[1])
+        if engine.eval_metric_func is not None:
+            metric_dict = engine.eval_metric_func(out, batch[1])
            if paddle.distributed.get_world_size() > 1:
                for key in metric_dict:
                    paddle.distributed.all_reduce(
@@ -98,18 +96,18 @@ def classification_eval(evaler, epoch_id=0):
            ])
            logger.info("[Eval][Epoch {}][Iter: {}/{}]{}, {}, {}".format(
                epoch_id, iter_id,
-                len(evaler.eval_dataloader), metric_msg, time_msg, ips_msg))
+                len(engine.eval_dataloader), metric_msg, time_msg, ips_msg))

        tic = time.time()
-    if evaler.use_dali:
-        evaler.eval_dataloader.reset()
+    if engine.use_dali:
+        engine.eval_dataloader.reset()
    metric_msg = ", ".join([
        "{}: {:.5f}".format(key, output_info[key].avg) for key in output_info
    ])
    logger.info("[Eval][Epoch {}][Avg]{}".format(epoch_id, metric_msg))

    # do not try to save best eval.model
-    if evaler.eval_metric_func is None:
+    if engine.eval_metric_func is None:
        return -1
    # return 1st metric in the dict
    return output_info[metric_key].avg
--- a/ppcls/engine/evaluation/retrieval.py
+++ b/ppcls/engine/evaluation/retrieval.py
@@ -20,21 +20,21 @@ import paddle
 from ppcls.utils import logger


-def retrieval_eval(evaler, epoch_id=0):
-    evaler.model.eval()
+def retrieval_eval(engine, epoch_id=0):
+    engine.model.eval()
    # step1. build gallery
-    if evaler.gallery_query_dataloader is not None:
+    if engine.gallery_query_dataloader is not None:
        gallery_feas, gallery_img_id, gallery_unique_id = cal_feature(
-            evaler, name='gallery_query')
+            engine, name='gallery_query')
        query_feas, query_img_id, query_query_id = gallery_feas, gallery_img_id, gallery_unique_id
    else:
        gallery_feas, gallery_img_id, gallery_unique_id = cal_feature(
-            evaler, name='gallery')
+            engine, name='gallery')
        query_feas, query_img_id, query_query_id = cal_feature(
-            evaler, name='query')
+            engine, name='query')

    # step2. do evaluation
-    sim_block_size = evaler.config["Global"].get("sim_block_size", 64)
+    sim_block_size = engine.config["Global"].get("sim_block_size", 64)
    sections = [sim_block_size] * (len(query_feas) // sim_block_size)
    if len(query_feas) % sim_block_size:
        sections.append(len(query_feas) % sim_block_size)
@@ -45,7 +45,7 @@ def retrieval_eval(evaler, epoch_id=0):
    image_id_blocks = paddle.split(query_img_id, num_or_sections=sections)
    metric_key = None

-    if evaler.eval_loss_func is None:
+    if engine.eval_loss_func is None:
        metric_dict = {metric_key: 0.}
    else:
        metric_dict = dict()
@@ -65,7 +65,7 @@ def retrieval_eval(evaler, epoch_id=0):
            else:
                keep_mask = None

-            metric_tmp = evaler.eval_metric_func(similarity_matrix,
+            metric_tmp = engine.eval_metric_func(similarity_matrix,
                                                 image_id_blocks[block_idx],
                                                 gallery_img_id, keep_mask)

@@ -88,32 +88,31 @@ def retrieval_eval(evaler, epoch_id=0):
    return metric_dict[metric_key]


-def cal_feature(evaler, name='gallery'):
+def cal_feature(engine, name='gallery'):
    all_feas = None
    all_image_id = None
    all_unique_id = None
    has_unique_id = False

    if name == 'gallery':
-        dataloader = evaler.gallery_dataloader
+        dataloader = engine.gallery_dataloader
    elif name == 'query':
-        dataloader = evaler.query_dataloader
+        dataloader = engine.query_dataloader
    elif name == 'gallery_query':
-        dataloader = evaler.gallery_query_dataloader
+        dataloader = engine.gallery_query_dataloader
    else:
        raise RuntimeError("Only support gallery or query dataset")

    max_iter = len(dataloader) - 1 if platform.system() == "Windows" else len(
        dataloader)
-    dataloader_tmp = dataloader if evaler.use_dali else dataloader()
-    for idx, batch in enumerate(dataloader_tmp):  # load is very time-consuming
+    for idx, batch in enumerate(dataloader):  # load is very time-consuming
        if idx >= max_iter:
            break
-        if idx % evaler.config["Global"]["print_batch_step"] == 0:
+        if idx % engine.config["Global"]["print_batch_step"] == 0:
            logger.info(
                f"{name} feature calculation process: [{idx}/{len(dataloader)}]"
            )
-        if evaler.use_dali:
+        if engine.use_dali:
            batch = [
                paddle.to_tensor(batch[0]['data']),
                paddle.to_tensor(batch[0]['label'])
@@ -123,20 +122,20 @@ def cal_feature(evaler, name='gallery'):
        if len(batch) == 3:
            has_unique_id = True
            batch[2] = batch[2].reshape([-1, 1]).astype("int64")
-        out = evaler.model(batch[0], batch[1])
+        out = engine.model(batch[0], batch[1])
        batch_feas = out["features"]

        # do norm
-        if evaler.config["Global"].get("feature_normalize", True):
+        if engine.config["Global"].get("feature_normalize", True):
            feas_norm = paddle.sqrt(
                paddle.sum(paddle.square(batch_feas), axis=1, keepdim=True))
            batch_feas = paddle.divide(batch_feas, feas_norm)

        # do binarize
-        if evaler.config["Global"].get("feature_binarize") == "round":
+        if engine.config["Global"].get("feature_binarize") == "round":
            batch_feas = paddle.round(batch_feas).astype("float32") * 2.0 - 1.0

-        if evaler.config["Global"].get("feature_binarize") == "sign":
+        if engine.config["Global"].get("feature_binarize") == "sign":
            batch_feas = paddle.sign(batch_feas).astype("float32")

        if all_feas is None:
@@ -150,8 +149,8 @@ def cal_feature(evaler, name='gallery'):
            if has_unique_id:
                all_unique_id = paddle.concat([all_unique_id, batch[2]])

-    if evaler.use_dali:
-        dataloader_tmp.reset()
+    if engine.use_dali:
+        dataloader.reset()

    if paddle.distributed.get_world_size() > 1:
        feat_list = []

--- a/ppcls/engine/train/train.py
+++ b/ppcls/engine/train/train.py
@@ -18,63 +18,61 @@ import paddle
 from ppcls.engine.train.utils import update_loss, update_metric, log_info


-def train_epoch(trainer, epoch_id, print_batch_step):
+def train_epoch(engine, epoch_id, print_batch_step):
    tic = time.time()
-
-    train_dataloader = trainer.train_dataloader if trainer.use_dali else trainer.train_dataloader(
-    )
-    for iter_id, batch in enumerate(train_dataloader):
-        if iter_id >= trainer.max_iter:
+    for iter_id, batch in enumerate(engine.train_dataloader):
+        if iter_id >= engine.max_iter:
            break
        if iter_id == 5:
-            for key in trainer.time_info:
-                trainer.time_info[key].reset()
-        trainer.time_info["reader_cost"].update(time.time() - tic)
-        if trainer.use_dali:
+            for key in engine.time_info:
+                engine.time_info[key].reset()
+        engine.time_info["reader_cost"].update(time.time() - tic)
+        if engine.use_dali:
            batch = [
                paddle.to_tensor(batch[0]['data']),
                paddle.to_tensor(batch[0]['label'])
            ]
        batch_size = batch[0].shape[0]
-        if not trainer.config["Global"].get("use_multilabel", False):
+        if not engine.config["Global"].get("use_multilabel", False):
            batch[1] = batch[1].reshape([-1, 1]).astype("int64")
-        trainer.global_step += 1
+        engine.global_step += 1
+
        # image input
-        if trainer.amp:
+        if engine.amp:
            with paddle.amp.auto_cast(custom_black_list={
                    "flatten_contiguous_range", "greater_than"
            }):
-                out = forward(trainer, batch)
-                loss_dict = trainer.train_loss_func(out, batch[1])
+                out = forward(engine, batch)
+                loss_dict = engine.train_loss_func(out, batch[1])
        else:
-            out = forward(trainer, batch)
+            out = forward(engine, batch)

        # calc loss
-        if trainer.config["DataLoader"]["Train"]["dataset"].get(
+        if engine.config["DataLoader"]["Train"]["dataset"].get(
                "batch_transform_ops", None):
-            loss_dict = trainer.train_loss_func(out, batch[1:])
+            loss_dict = engine.train_loss_func(out, batch[1:])
        else:
-            loss_dict = trainer.train_loss_func(out, batch[1])
+            loss_dict = engine.train_loss_func(out, batch[1])

        # step opt and lr
-        if trainer.amp:
-            scaled = trainer.scaler.scale(loss_dict["loss"])
+        if engine.amp:
+            scaled = engine.scaler.scale(loss_dict["loss"])
            scaled.backward()
-            trainer.scaler.minimize(trainer.optimizer, scaled)
+            engine.scaler.minimize(engine.optimizer, scaled)
        else:
            loss_dict["loss"].backward()
-            trainer.optimizer.step()
-        trainer.optimizer.clear_grad()
-        trainer.lr_sch.step()
+            engine.optimizer.step()
+        engine.optimizer.clear_grad()
+        engine.lr_sch.step()

        # below code just for logging
        # update metric_for_logger
-        update_metric(trainer, out, batch, batch_size)
+        update_metric(engine, out, batch, batch_size)
        # update_loss_for_logger
-        update_loss(trainer, loss_dict, batch_size)
-        trainer.time_info["batch_cost"].update(time.time() - tic)
+        update_loss(engine, loss_dict, batch_size)
+        engine.time_info["batch_cost"].update(time.time() - tic)
        if iter_id % print_batch_step == 0:
-            log_info(trainer, batch_size, epoch_id, iter_id)
+            log_info(engine, batch_size, epoch_id, iter_id)
        tic = time.time()



--- a/ppcls/optimizer/learning_rate.py
+++ b/ppcls/optimizer/learning_rate.py
@@ -11,12 +11,15 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+
 from __future__ import (absolute_import, division, print_function,
                        unicode_literals)

 from paddle.optimizer import lr
 from paddle.optimizer.lr import LRScheduler

+from ppcls.utils import logger
+

 class Linear(object):
    """
@@ -41,7 +44,11 @@ class Linear(object):
                 warmup_start_lr=0.0,
                 last_epoch=-1,
                 **kwargs):
-        super(Linear, self).__init__()
+        super().__init__()
+        if warmup_epoch >= epochs:
+            msg = f"When using warm up, the value of \"Global.epochs\" must be greater than value of \"Optimizer.lr.warmup_epoch\". The value of \"Optimizer.lr.warmup_epoch\" has been set to {epochs}."
+            logger.warning(msg)
+            warmup_epoch = epochs
        self.learning_rate = learning_rate
        self.steps = (epochs - warmup_epoch) * step_each_epoch
        self.end_lr = end_lr
@@ -56,7 +63,8 @@ class Linear(object):
            decay_steps=self.steps,
            end_lr=self.end_lr,
            power=self.power,
-            last_epoch=self.last_epoch)
+            last_epoch=self.
+            last_epoch) if self.steps > 0 else self.learning_rate
        if self.warmup_steps > 0:
            learning_rate = lr.LinearWarmup(
                learning_rate=learning_rate,
@@ -90,7 +98,11 @@ class Cosine(object):
                 warmup_start_lr=0.0,
                 last_epoch=-1,
                 **kwargs):
-        super(Cosine, self).__init__()
+        super().__init__()
+        if warmup_epoch >= epochs:
+            msg = f"When using warm up, the value of \"Global.epochs\" must be greater than value of \"Optimizer.lr.warmup_epoch\". The value of \"Optimizer.lr.warmup_epoch\" has been set to {epochs}."
+            logger.warning(msg)
+            warmup_epoch = epochs
        self.learning_rate = learning_rate
        self.T_max = (epochs - warmup_epoch) * step_each_epoch
        self.eta_min = eta_min
@@ -103,7 +115,8 @@ class Cosine(object):
            learning_rate=self.learning_rate,
            T_max=self.T_max,
            eta_min=self.eta_min,
-            last_epoch=self.last_epoch)
+            last_epoch=self.
+            last_epoch) if self.T_max > 0 else self.learning_rate
        if self.warmup_steps > 0:
            learning_rate = lr.LinearWarmup(
                learning_rate=learning_rate,
@@ -132,12 +145,17 @@ class Step(object):
                 learning_rate,
                 step_size,
                 step_each_epoch,
+                 epochs,
                 gamma,
                 warmup_epoch=0,
                 warmup_start_lr=0.0,
                 last_epoch=-1,
                 **kwargs):
-        super(Step, self).__init__()
+        super().__init__()
+        if warmup_epoch >= epochs:
+            msg = f"When using warm up, the value of \"Global.epochs\" must be greater than value of \"Optimizer.lr.warmup_epoch\". The value of \"Optimizer.lr.warmup_epoch\" has been set to {epochs}."
+            logger.warning(msg)
+            warmup_epoch = epochs
        self.step_size = step_each_epoch * step_size
        self.learning_rate = learning_rate
        self.gamma = gamma
@@ -177,11 +195,16 @@ class Piecewise(object):
                 step_each_epoch,
                 decay_epochs,
                 values,
+                 epochs,
                 warmup_epoch=0,
                 warmup_start_lr=0.0,
                 last_epoch=-1,
                 **kwargs):
-        super(Piecewise, self).__init__()
+        super().__init__()
+        if warmup_epoch >= epochs:
+            msg = f"When using warm up, the value of \"Global.epochs\" must be greater than value of \"Optimizer.lr.warmup_epoch\". The value of \"Optimizer.lr.warmup_epoch\" has been set to {epochs}."
+            logger.warning(msg)
+            warmup_epoch = epochs
        self.boundaries = [step_each_epoch * e for e in decay_epochs]
        self.values = values
        self.last_epoch = last_epoch
@@ -294,8 +317,7 @@ class MultiStepDecay(LRScheduler):
            raise ValueError('gamma should be < 1.0.')
        self.milestones = [x * step_each_epoch for x in milestones]
        self.gamma = gamma
-        super(MultiStepDecay, self).__init__(learning_rate, last_epoch,
-                                             verbose)
+        super().__init__(learning_rate, last_epoch, verbose)

    def get_lr(self):
        for i in range(len(self.milestones)):