Merge branch 'master' of https://github.com/PaddlePaddle/PaddleClas

8b8659af · WuHaobo · 45545955 · bb9b1533 · 8b8659af · 8b8659af
18 changed files
--- a/README.md
+++ b/README.md
 # PaddleClas

+**Documentation and tutorials**: https://paddleclas.readthedocs.io (under continuous update)
+
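 ## Introduction
 PaddleClas aims to provide industry and academia with a toolbox for image classification tasks. Its features are as follows:
-- Model zoo: classification network architectures and training tricks for 25 series such as ResNet_vd and MobileNetV3, together with the 117 corresponding pretrained classification models and their performance evaluation
+- Model zoo: classification network architectures and training tricks for 23 series such as ResNet_vd and MobileNetV3, together with the 117 corresponding pretrained classification models and their performance evaluation

-- Advanced usage: a practical, high-accuracy model distillation scheme (ResNet50_vd at 82.39% accuracy and MobileNetV3 at 78.9%), plus reproduction and validation of 8 data augmentation methods
+- Advanced usage: a practical, high-accuracy knowledge distillation scheme (ResNet50_vd at 82.39% accuracy and MobileNetV3 at 78.9%), plus reproduction and validation of 8 data augmentation methods

-- Application extensions: featured solutions for common vision tasks, including transfer learning for image classification (a 10w-class pretrained image classification model developed in-house at Baidu) and general object detection (a practical detection scheme with 47.8% mAP)
+- Application extensions: featured solutions for common vision tasks, including transfer learning for image classification (a 100,000-class pretrained image classification model developed in-house at Baidu) and general object detection (a practical detection scheme with 47.8% mAP)

-- Utilities: practical tools that ease industrial deployment, including TensorRT inference, mobile inference, INT8 quantization, multi-machine training, and PaddleHub
+- Utilities: practical tools that ease industrial deployment, including TensorRT inference, mobile inference, and model serving deployment

 - Competition support: helped achieve leading results in several global vision challenges, including first place in the 2018 Kaggle Open Images V4 object detection challenge and second place in the 2019 Kaggle landmark retrieval challenge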

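 ## Model zoo

+Based on the ImageNet1k classification dataset, PaddleClas provides brief introductions, configurations for reproducing the paper metrics, and the training tricks used during reproduction for 23 series of classification network architectures, including ResNet, ResNet_vd, EfficientNet, Res2Net, HRNet, and MobileNetV3. PaddleClas also provides the 117 corresponding pretrained image classification models, evaluates the GPU inference time of all models with TensorRT, and evaluates the CPU inference time and storage size of the mobile models on Snapdragon 855 (SD855). For the ***list of supported pretrained models, download links, and further information***, see the [**Model zoo chapter**](https://paddleclas.readthedocs.io/zh_CN/latest/zh_cn/models/models_intro.html) of the documentation.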
+
 <div align="center">
    <img src="docs/images/models/main_fps_top1.png" width="600">
 </div>

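-Based on the ImageNet1k classification dataset, PaddleClas provides brief introductions, configurations for reproducing the paper metrics, and the training tricks used during reproduction for 25 commonly used classification network architectures, including ResNet, ResNet_vd, EfficientNet, Res2Net, HRNet, and MobileNetV3. PaddleClas also provides 117 pretrained image classification models, evaluates the GPU inference time of all models with TensorRT, and evaluates the CPU inference time and storage size of the mobile models on Snapdragon 855 (SD855).
-
-The figure above shows the time that some models suited to server-side applications take to predict one batch on a V100 with FP16 and TensorRT, where batch_size=32. ResNet50_vd_ssld in the figure is a model trained with the SSLD distillation method provided by PaddleClas. Points with the same color and symbol represent models of different sizes from the same series. The FLOPS and Parameters of the models, their FP16 and FP32 inference times, and inference times for different batch_size values are being continuously updated.
+The figure above shows the time that some models suited to server-side applications take to predict one batch on a V100 with FP16 and TensorRT, where batch_size=32. ResNet50_vd_ssld in the figure is a model trained with the SSLD distillation method provided by PaddleClas. Points with the same color and symbol represent models of different sizes from the same series. For the FLOPS and Parameters of the models, their FP16 and FP32 inference times, and inference times for different batch_size values, see the [**Model zoo chapter**](https://paddleclas.readthedocs.io/zh_CN/latest/zh_cn/models/models_intro.html) of the documentation.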

 <div align="center">
 <img
-src="docs/images/models/mobile_arm_top1.png" width="600">
+src="docs/images/models/mobile_arm_top1.png" width="700">
 </div>

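-The figure above shows, for some models suited to mobile applications, the CPU time to predict one image on SD855 and the model storage size. In the figure, MV3_large_x1_0_ssld (M is short for MobileNet), MV3_small_x1_0_ssld, MV2_ssld, and MV1_ssld are models trained with the SSLD distillation method provided by PaddleClas. MV3_large_x1_0_ssld_int8 is a model further quantized to INT8. The FLOPS and Parameters of the models, and more GPU inference times, are being continuously updated.
+The figure above shows, for some models suited to mobile applications, the CPU time to predict one image on SD855. In the figure, MV3_large_x1_0_ssld (M is short for MobileNet), MV3_small_x1_0_ssld, MV2_ssld, and MV1_ssld are models trained with the SSLD distillation method provided by PaddleClas. MV3_large_x1_0_ssld_int8 is a model further quantized to INT8. For the FLOPS, Parameters, and storage size of the models, and more GPU inference times, see the [**Model zoo chapter**](https://paddleclas.readthedocs.io/zh_CN/latest/zh_cn/models/models_intro.html) of the documentation.

 - TODO
 - [ ] Reproduce paper metrics and evaluate performance for EfficientLite, GhostNet, and RegNet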

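 ## Advanced usage
 In addition to a rich set of classification network architectures and pretrained models, PaddleClas supports a range of algorithms and tools that help improve the accuracy and efficiency of image classification.
-### Model distillation
+### Knowledge distillation

-Model distillation uses a teacher model to guide a student model in learning a specific task, so that the small model, with its parameter count unchanged, improves substantially and can even reach accuracy similar to that of the large model. PaddleClas provides a simple semi-supervised label model distillation scheme (SSLD, Simple Semi-supervised Label Distillation), which substantially improves the ImageNet classification results of ResNet50_vd, MobileNetV1, and MobileNetV3. The framework diagram of the scheme and the results of the distilled models are shown in the figures below; a detailed introduction to the distillation method and its usage is being continuously updated.
+Knowledge distillation uses a teacher model to guide a student model in learning a specific task, so that the small model, with its parameter count unchanged, improves substantially and can even reach accuracy similar to that of the large model.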

 <div align="center">
 <img
-src="docs/images/distillation/ppcls_distillation_v1.png" width="600">
+src="docs/images/distillation/distillation_perform.png" width="500">
 </div>

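+PaddleClas provides a simple semi-supervised label knowledge distillation scheme (SSLD, Simple Semi-supervised Label Distillation), which substantially improves the ImageNet classification results of ResNet50_vd, MobileNetV1, MobileNetV2, and MobileNetV3, as shown in the figure above. The framework diagram of the scheme is shown below; a detailed introduction to the knowledge distillation method and its usage is being continuously updated.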
+
 <div align="center">
 <img
-src="docs/images/distillation/distillation_perform.png" width="500">
+src="docs/images/distillation/ppcls_distillation_v1.png" width="700">
 </div>

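 ### Data augmentation

-In image classification, image data augmentation is a commonly used regularization method that can effectively improve classification results, especially when data is scarce or the network is deep. PaddleClas supports the reproduction of 8 recent data augmentation algorithms and the evaluation of their effect in a unified experimental environment, as shown in the figure below. Detailed introductions to each method, the experimental environment used for the comparison, and usage are being continuously updated.
+In image classification, image data augmentation is a commonly used regularization method that can effectively improve classification results, especially when data is scarce or the network is large. PaddleClas supports the reproduction of 8 recent data augmentation algorithms and the evaluation of their effect in a unified experimental environment, as shown in the figure below. Detailed introductions to each method, the experimental environment used for the comparison, and usage are being continuously updated.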

 <div align="center">
 <img
@@ -66,7 +70,7 @@ src="docs/images/image_aug/main_image_aug.png" width="600">

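 ### Transfer learning for image classification

-In practice, because training data is scarce, a classification model trained on the ImageNet1K dataset is often used as the pretrained model for transfer learning in image classification. To further help solve real-world problems, PaddleClas plans to open-source a pretrained model developed in-house at Baidu, trained on 100,000 classes, more than 40 million labeled images, and to provide several hyperparameter search methods. This part is being continuously updated.
+In practice, because training data is scarce, a classification model trained on the ImageNet1K dataset is often used as the pretrained model for transfer learning in image classification. To further help solve real-world problems, PaddleClas plans to open-source a pretrained model developed in-house at Baidu, trained on 100,000 classes and more than 40 million labeled images, and to provide several hyperparameter search methods. This part is being continuously updated.

 ### General object detection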

@@ -82,16 +86,17 @@ src="docs/images/det/pssdet.png" width="500">
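 - [ ] Featured applications of PaddleClas in face detection and recognition

 ## Utilities
-PaddlePaddle provides a set of practical tools that ease industrial deployment of PaddleClas; see the documentation for detailed usage.
+PaddlePaddle provides a set of practical tools that ease industrial deployment of PaddleClas; for details, see the [**Utilities chapter**](https://paddleclas.readthedocs.io/zh_CN/latest/zh_cn/extension/index.html) of the documentation.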

 - TensorRT inference
-- Mobile inference
-- INT8 quantization
+- Paddle-Lite
+- Model serving deployment
+- Model quantization
 - Multi-machine training
-- PaddleHub
+- Paddle Hub

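 ## Competition support
-PaddleClas grew out of the refinement of Baidu's real-world vision applications and the exploration of frontier vision capabilities. It has helped achieve leading results in several major vision competitions and continues to advance the solution and practical adoption of more frontier vision problems.
+PaddleClas grew out of the refinement of Baidu's real-world vision applications and the exploration of frontier vision capabilities. It has helped achieve leading results in several major vision competitions and continues to advance the solution and practical adoption of more frontier vision problems. For more, see the [**Competition support chapter**](https://paddleclas.readthedocs.io/zh_CN/latest/zh_cn/competition_support.html) of the documentation.

 - First place, 2018 Kaggle Open Images V4 object detection challenge
 - Second place, 2019 Kaggle Open Images V5 object detection challenge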

--- a/docs/images/image_aug/hide-and-seek-visual.png
+++ b/docs/images/image_aug/hide-and-seek-visual.png
--- a/docs/images/image_aug/test_cutmix.png
+++ b/docs/images/image_aug/test_cutmix.png
--- a/docs/images/image_aug/test_mixup.png
+++ b/docs/images/image_aug/test_mixup.png
--- a/docs/zh_CN/models/models_intro.md
+++ b/docs/zh_CN/models/models_intro.md
@@ -7,6 +7,7 @@
 ![](../../images/models/main_fps_top1.png)
 ![](../../images/models/mobile_arm_top1.png)

+## Pretrained model list and download links
 - ResNet and its Vd series
  - ResNet series<sup>[[1](#ref1)]</sup>([paper](http://openaccess.thecvf.com/content_cvpr_2016/html/He_Deep_Residual_Learning_CVPR_2016_paper.html))
    - [ResNet18](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet18_pretrained.tar)

--- a/docs/zh_cn/advanced_tutorials/image_augmentation/ImageAugment.md
+++ b/docs/zh_cn/advanced_tutorials/image_augmentation/ImageAugment.md
--- a/docs/zh_cn/advanced_tutorials/image_augmentation/index.rst
+++ b/docs/zh_cn/advanced_tutorials/image_augmentation/index.rst
+Image augmentation
+================================
+
+.. toctree::
+   :maxdepth: 3
+
+   ImageAugment.md
--- a/docs/zh_cn/advanced_tutorials/index.rst
+++ b/docs/zh_cn/advanced_tutorials/index.rst
+Advanced usage
+================================
+
+.. toctree::
+   :maxdepth: 1
+
+   image_augmentation/index
+   distillation/index
+
--- a/docs/zh_cn/competition_support.md
+++ b/docs/zh_cn/competition_support.md
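+### Competition support
+
+PaddleClas grew out of the refinement of Baidu's real-world vision applications and the exploration of frontier vision capabilities. It has helped achieve leading results in several major vision competitions and continues to advance the solution and practical adoption of more frontier vision problems.
+
+* First place, 2018 Kaggle Open Images V4 object detection challenge
+
+* Second place, 2019 Kaggle Open Images V5 object detection challenge
+    * Technical report: [https://arxiv.org/pdf/1911.07171.pdf](https://arxiv.org/pdf/1911.07171.pdf)
+    * Detailed documentation and open-source models: [OIDV5 object detection on GitHub](https://github.com/PaddlePaddle/PaddleDetection/blob/master/docs/featured_model/OIDV5_BASELINE_MODEL.md)
+
+* Second place, 2019 Kaggle landmark retrieval challenge
+    * Technical report: [https://arxiv.org/abs/1906.03990](https://arxiv.org/abs/1906.03990)
+    * Detailed documentation and open-source models: [2019 landmark retrieval and recognition on GitHub](https://github.com/PaddlePaddle/Research/tree/master/CV/landmark)
+
+* Second place, 2019 Kaggle landmark recognition challenge
+    * Technical report: [https://arxiv.org/abs/1906.03990](https://arxiv.org/abs/1906.03990)
+    * Detailed documentation and open-source models: [2019 landmark retrieval and recognition on GitHub](https://github.com/PaddlePaddle/Research/tree/master/CV/landmark)
+
+* A-level certificates in three tasks (printed-text OCR, face recognition, and landmark recognition) at the first Multimedia Information Recognition Technology Competition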
--- a/ppcls/data/imaug/autoaugment.py
+++ b/ppcls/data/imaug/autoaugment.py
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-#This code is based on https://github.com/DeepVoltaire/AutoAugment/blob/master/autoaugment.py
+# This code is based on https://github.com/DeepVoltaire/AutoAugment/blob/master/autoaugment.py

 from PIL import Image, ImageEnhance, ImageOps
 import numpy as np

--- a/ppcls/data/imaug/cutout.py
+++ b/ppcls/data/imaug/cutout.py
@@ -12,6 +12,8 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

+# This code is based on https://github.com/uoguelph-mlrg/Cutout
+
 import numpy as np
 import random


--- a/ppcls/data/imaug/grid.py
+++ b/ppcls/data/imaug/grid.py
@@ -12,6 +12,8 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

+# This code is based on https://github.com/akuxcw/GridMask
+
 import numpy as np
 from PIL import Image
 import pdb

--- a/ppcls/data/imaug/hide_and_seek.py
+++ b/ppcls/data/imaug/hide_and_seek.py
@@ -12,6 +12,8 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

+# This code is based on https://github.com/kkanshul/Hide-and-Seek
+
 import numpy as np
 import random


--- a/ppcls/data/imaug/randaugment.py
+++ b/ppcls/data/imaug/randaugment.py
@@ -12,7 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-#This code is based on https://github.com/
+#This code is based on https://github.com/heartInsert/randaugment

 from PIL import Image, ImageEnhance, ImageOps
 import numpy as np

--- a/ppcls/data/imaug/random_erasing.py
+++ b/ppcls/data/imaug/random_erasing.py
@@ -12,6 +12,8 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

+#This code is based on https://github.com/zhunzhong07/Random-Erasing
+
 import math
 import random


--- a/ppcls/data/reader.py
+++ b/ppcls/data/reader.py
@@ -140,7 +140,8 @@ def get_file_list(params):
    full_lines = shuffle_lines(full_lines, params["shuffle_seed"])

    # use only partial data for each trainer in distributed training
-    full_lines = full_lines[trainer_id::trainers_num]
+    img_per_trainer = len(full_lines) // trainers_num
+    full_lines = full_lines[trainer_id::trainers_num][:img_per_trainer]

    return full_lines
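
Note on the hunk above: the added `[:img_per_trainer]` truncation appears intended to give every trainer the same number of samples when `len(full_lines)` is not divisible by `trainers_num`; the diff itself does not state the motivation. The following is a minimal standalone sketch of that slicing. The helper name `shard_lines` and the sample data are hypothetical and are not part of `ppcls/data/reader.py`.

```python
def shard_lines(full_lines, trainer_id, trainers_num):
    """Return the lines assigned to one trainer (hypothetical helper)."""
    # Floor division: the number of samples every trainer can receive.
    img_per_trainer = len(full_lines) // trainers_num
    # The strided slice takes every trainers_num-th line starting at
    # trainer_id; the trailing slice drops the one extra line that
    # low-numbered trainers would get when the split is uneven.
    return full_lines[trainer_id::trainers_num][:img_per_trainer]


# Illustrative data: 10 lines split across 3 trainers.
lines = ["img_{}".format(i) for i in range(10)]
shards = [shard_lines(lines, t, 3) for t in range(3)]
for trainer_id, shard in enumerate(shards):
    print(trainer_id, shard)
```

With 10 lines and 3 trainers, each shard holds 3 lines and the last line is left unused, which is the trade-off this approach makes for equal shard sizes.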


--- a/tools/infer/predict.py
+++ b/tools/infer/predict.py
@@ -12,7 +12,6 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-
 import utils
 import argparse
 import numpy as np
@@ -24,6 +23,7 @@ from paddle.fluid.core import create_paddle_predictor
 logging.basicConfig(level=logging.INFO)
 logger = logging.getLogger(__name__)

+
 def parse_args():
    def str2bool(v):
        return v.lower() in ("true", "t", "1")
@@ -47,19 +47,18 @@ def parse_args():
 def create_predictor(args):
    config = AnalysisConfig(args.model_file, args.params_file)

-
-
    if args.use_gpu:
        config.enable_use_gpu(args.gpu_mem, 0)
    else:
        config.disable_gpu()

    config.disable_glog_info()
-    config.switch_ir_optim(args.ir_optim) # default true
+    config.switch_ir_optim(args.ir_optim)  # default true
    if args.use_tensorrt:
        config.enable_tensorrt_engine(
-                precision_mode=AnalysisConfig.Precision.Half if args.use_fp16 else AnalysisConfig.Precision.Float32,
-                max_batch_size=args.batch_size)
+            precision_mode=AnalysisConfig.Precision.Half
+            if args.use_fp16 else AnalysisConfig.Precision.Float32,
+            max_batch_size=args.batch_size)

    config.enable_memory_optim()
    # use zero copy
@@ -79,7 +78,7 @@ def create_operators():
    resize_op = utils.ResizeImage(resize_short=256)
    crop_op = utils.CropImage(size=(size, size))
    normalize_op = utils.NormalizeImage(
-            scale=img_scale, mean=img_mean, std=img_std)
+        scale=img_scale, mean=img_mean, std=img_std)
    totensor_op = utils.ToTensor()

    return [decode_op, resize_op, crop_op, normalize_op, totensor_op]
@@ -104,38 +103,62 @@ def main():
        assert args.model_name is not None
        assert args.use_tensorrt == True
    # HALF precision predict only works when using tensorrt
-    if args.use_fp16==True:
+    if args.use_fp16 == True:
        assert args.use_tensorrt == True

    operators = create_operators()
    predictor = create_predictor(args)

    inputs = preprocess(args.image_file, operators)
-    inputs = np.expand_dims(inputs, axis=0).repeat(args.batch_size, axis=0).copy()
+    inputs = np.expand_dims(
+        inputs, axis=0).repeat(
+            args.batch_size, axis=0).copy()

    input_names = predictor.get_input_names()
    input_tensor = predictor.get_input_tensor(input_names[0])
-    input_tensor.copy_from_cpu(inputs)
+
+    output_names = predictor.get_output_names()
+    output_tensor = predictor.get_output_tensor(output_names[0])
+
+    test_num = 500
+    test_time = 0.0
    if not args.enable_benchmark:
+        inputs = preprocess(args.image_file, operators)
+        inputs = np.expand_dims(
+            inputs, axis=0).repeat(
+                args.batch_size, axis=0).copy()
+        input_tensor.copy_from_cpu(inputs)
+
        predictor.zero_copy_run()
+
+        output = output_tensor.copy_to_cpu()
+        output = output.flatten()
+        cls = np.argmax(output)
+        score = output[cls]
+        logger.info("class: {0}".format(cls))
+        logger.info("score: {0}".format(score))
    else:
-        for i in range(0,1010):
-            if i == 10:
-                start = time.time()
+        for i in range(0, test_num + 10):
+            inputs = np.random.rand(args.batch_size, 3, 224,
+                                    224).astype(np.float32)
+            start_time = time.time()
+            input_tensor.copy_from_cpu(inputs)
+
            predictor.zero_copy_run()

-        end = time.time()
-        fp_message = "FP16" if args.use_fp16 else "FP32"
-        logger.info("{0}\t{1}\tbatch size: {2}\ttime(ms): {3}".format(args.model_name, fp_message, args.batch_size, end-start))
+            output = output_tensor.copy_to_cpu()
+            output = output.flatten()
+            if i >= 10:
+                test_time += time.time() - start_time
+            cls = np.argmax(output)
+            score = output[cls]
+            logger.info("class: {0}".format(cls))
+            logger.info("score: {0}".format(score))

-    output_names = predictor.get_output_names()
-    output_tensor = predictor.get_output_tensor(output_names[0])
-    output = output_tensor.copy_to_cpu()
-    output = output.flatten()
-    cls = np.argmax(output)
-    score = output[cls]
-    logger.info("class: {0}".format(cls))
-    logger.info("score: {0}".format(score))
+        fp_message = "FP16" if args.use_fp16 else "FP32"
+        logger.info("{0}\t{1}\tbatch size: {2}\ttime(ms): {3}".format(
+            args.model_name, fp_message, args.batch_size, 1000 * test_time /
+            test_num))


 if __name__ == "__main__":
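
Note on the benchmark branch above: the new loop runs `test_num + 10` iterations, times each one individually, excludes the first 10 as warm-up, and reports `1000 * test_time / test_num` as the average in milliseconds. A minimal sketch of that pattern, independent of Paddle, might look like the following. The function `benchmark` and its `run_once` callback are hypothetical stand-ins for the copy, `zero_copy_run`, and copy-back sequence in the diff.

```python
import time


def benchmark(run_once, test_num=500, warmup=10):
    """Average latency of run_once in milliseconds, skipping warm-up calls."""
    test_time = 0.0
    for i in range(test_num + warmup):
        start_time = time.time()
        run_once()  # stand-in for input copy, inference, and output copy
        elapsed = time.time() - start_time
        # Only iterations after the warm-up phase count toward the total.
        if i >= warmup:
            test_time += elapsed
    # Convert the accumulated seconds to an average in milliseconds.
    return 1000 * test_time / test_num


calls = []
avg_ms = benchmark(lambda: calls.append(1), test_num=5, warmup=2)
print("calls: {0}, time(ms): {1}".format(len(calls), avg_ms))
```

Timing each iteration separately, as the diff does, keeps the random input generation out of the measured interval; `time.perf_counter()` would be a reasonable alternative clock on Python 3.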

--- a/tools/run_download.sh
+++ b/tools/run_download.sh
+#!/usr/bin/env bash
+
+export PYTHONPATH=$PWD:$PYTHONPATH
+
+python tools/download.py -a ResNet34 -p ./pretrained/ -d 1