diff --git a/.gitignore b/.gitignore
index 7fad702f3a1ccf3a95ef3856b48c22107a536931..4871ce649f83e49648b33fb27d62bf9a179341a2 100644
--- a/.gitignore
+++ b/.gitignore
@@ -8,3 +8,4 @@ output/
pretrained/
*.ipynb*
_build/
+nohup.out
diff --git a/.travis/precommit.sh b/.travis/precommit.sh
new file mode 100644
index 0000000000000000000000000000000000000000..369fa5101630431ca72bc630bb070c2e0084b7ca
--- /dev/null
+++ b/.travis/precommit.sh
@@ -0,0 +1,21 @@
+#!/bin/bash
+function abort(){
+ echo "Your commit does not fit PaddlePaddle code style" 1>&2
+ echo "Please use pre-commit scripts to auto-format your code" 1>&2
+ exit 1
+}
+
+trap 'abort' 0
+set -e
+cd "$(dirname "$0")"
+cd ..
+export PATH=/usr/bin:$PATH
+pre-commit install
+
+if ! pre-commit run -a ; then
+ ls -lh
+ git diff --exit-code
+ exit 1
+fi
+
+trap : 0
diff --git a/README.md b/README.md
index 3124743469116b35424066e2cb7f7cbf5388c9b9..c78e28f865fb30c218861ddbd2f98af858f37645 100644
--- a/README.md
+++ b/README.md
@@ -9,7 +9,7 @@
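PaddleClas, the PaddlePaddle image classification suite, is a toolset for image classification tasks built for industry and academia. It helps users train better vision models and put them into production.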
-
+
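## Rich model zoo
@@ -17,21 +17,20 @@
Based on the ImageNet1k classification dataset, PaddleClas provides brief introductions, configurations for reproducing the paper metrics, and the training tricks used during reproduction for 23 series of classification network architectures, including ResNet, ResNet_vd, Res2Net, HRNet, and MobileNetV3. It also provides the corresponding 117 pretrained image classification models, evaluates the GPU inference time of server-side models with TensorRT, and evaluates the CPU inference time and storage size of mobile models on Snapdragon 855 (SD855). For the ***list of supported pretrained models, download links, and more information***, see the [**model zoo chapter**](https://paddleclas.readthedocs.io/zh_CN/latest/models/models_intro.html) of the documentation tutorials.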
-
+
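-The figure above compares some of the latest server-side models by the time they take to predict one image with V100, FP32, and TensorRT, plotted against their accuracy. ResNet50_vd_ssld (82.4% accuracy) and ResNet101_vd_ssld (83.7% accuracy) are trained with the SSLD knowledge distillation scheme provided by PaddleClas. Points with the same color and symbol represent models of different sizes in the same series. For model introductions, FLOPS, Parameters, and detailed GPU inference times, see the [**model zoo chapter**](https://paddleclas.readthedocs.io/zh_CN/latest/models/models_intro.html) of the documentation tutorials.
+The figure above compares some of the latest server-side models by their inference time with V100, FP32, and TensorRT at batch size 1, plotted against their accuracy. ResNet50_vd_ssld (82.4% accuracy) and ResNet101_vd_ssld (83.7% accuracy) are trained with the SSLD knowledge distillation scheme provided by PaddleClas. Points with the same color and symbol represent models of different sizes in the same series. For model introductions, FLOPS, Parameters, and detailed GPU inference times (including T4 inference speed at different batch sizes), see the [**model zoo chapter**](https://paddleclas.readthedocs.io/zh_CN/latest/models/models_intro.html) of the documentation tutorials.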
+src="./docs/images/models/mobile_arm_top1.png" width="700">
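The figure above compares some of the latest mobile-oriented models by the time they take to predict one image on Snapdragon 855 (SD855), plotted against their accuracy. It covers the MobileNetV1, MobileNetV2, MobileNetV3, and ShuffleNetV2 series. MV3_large_x1_0_ssld (79% accuracy; M is short for MobileNet), MV3_small_x1_0_ssld (71.3%), MV2_ssld (76.74%), and MV1_ssld (77.89%) are trained with the SSLD distillation method provided by PaddleClas. MV3_large_x1_0_ssld_int8 is the same model after further INT8 quantization. For model introductions, FLOPS, Parameters, and model storage sizes, see the [**model zoo chapter**](https://paddleclas.readthedocs.io/zh_CN/latest/models/models_intro.html) of the documentation tutorials.
- TODO
- [ ] Reproduce the paper metrics and evaluate the performance of EfficientLite, GhostNet, and RegNet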
-- [ ] The speed benchmark on P4/T4
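## Advanced optimization support
Besides a rich set of classification network architectures and pretrained models, PaddleClas also supports a range of algorithms and tools that improve the accuracy and efficiency of image classification tasks.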
@@ -41,14 +40,14 @@ src="https://github.com/PaddlePaddle/PaddleClas/blob/master/docs/images/models/m
+src="./docs/images/distillation/distillation_perform_s.jpg" width="700">
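Taking distillation on ImageNet1K as an example, the framework of the SSLD knowledge distillation scheme is shown below. Its key points are the choice of teacher model, the loss computation, the number of training iterations, the use of unlabeled data, and ImageNet1k distillation finetuning. For a detailed introduction to each part and to the experiments, see the [**knowledge distillation chapter**](https://paddleclas.readthedocs.io/zh_CN/latest/advanced_tutorials/distillation/index.html) of the documentation tutorials.
+src="./docs/images/distillation/ppcls_distillation_s.jpg" width="700">
### Data augmentation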
@@ -57,14 +56,14 @@ src="https://github.com/PaddlePaddle/PaddleClas/blob/master/docs/images/distilla
+src="./docs/images/image_aug/image_aug_samples_s.jpg" width="800">
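PaddleClas reproduces the 8 data augmentation algorithms above and evaluates them in a unified experimental environment. The figure below shows how the different augmentation methods perform on ResNet50. Compared with the standard transformation, data augmentation improves recognition accuracy by up to 1%. For a detailed introduction to each augmentation method and the experimental environment used for comparison, see the [**data augmentation chapter**](https://paddleclas.readthedocs.io/zh_CN/latest/advanced_tutorials/image_augmentation/index.html) of the documentation tutorials.
+src="./docs/images/image_aug/main_image_aug_s.jpg" width="600">
## Get started with PaddleClas in 30 minutes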
@@ -80,7 +79,7 @@ PaddleClas的安装说明、模型训练、预测、评估以及模型微调(f
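### Pretrained model for 100,000-class image classification
In practice, because training data is scarce, a classification model trained on ImageNet1K is often used as the pretrained model for transfer learning in image classification. However, ImageNet1K has only 1,000 classes, so the feature transfer ability of such a pretrained model is limited. Baidu therefore built its own tag system at the scale of 100,000 tags, with a semantic hierarchy and both coarse and fine granularity, and has so far collected more than 55 million training images through manual or semi-supervised labeling. This system is the largest image classification taxonomy and training set in China and even worldwide. PaddleClas provides a ResNet50_vd model trained on this dataset. The table below compares the ImageNet pretrained model with the 100,000-class pretrained model in several practical scenarios. With the 100,000-class pretrained model, recognition accuracy improves by up to 30%.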
-
+
| Dataset | Statistics | ImageNet pretrained model | 100,000-class pretrained model |
|:--:|:--:|:--:|:--:|
| Flowers | class_num: 102<br/>train/val: 5789/2396 | 0.7779 | 0.9892 |
@@ -92,16 +91,12 @@ PaddleClas的安装说明、模型训练、预测、评估以及模型微调(f
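The download link for the 100,000-class pretrained model is given below. For more details, see the [**image classification transfer learning chapter**](https://paddleclas.readthedocs.io/zh_CN/latest/application/transfer_learning.html#id1) of the documentation tutorials.
-- [**Download link for the 100,000-class pretrained model**](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_10w_pretrained.tar)
+- **100,000-class pretrained model:** [**download link**](https://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_vd_10w_pretrained.tar)
### General object detection
In recent years, object detection in images has drawn wide attention from academia and industry, and the network architecture and pretrained model used for image classification directly affect detection quality. PaddleDetection uses PaddleClas's pretrained ResNet50_vd model with 82.39% accuracy, combined with its own rich set of detection operators, to provide a server-side object detection solution, PSS-DET (Practical Server Side Detection). The solution combines several strategies that add only a little computation but effectively improve two-stage Faster RCNN detection: detection model pruning, a pretrained model with better classification accuracy, DCNv2, Cascade RCNN, AutoAugment, Libra sampling, and multi-scale training. Using the 82.39% R50_vd_ssld pretrained model instead of the 79.12% R50_vd pretrained model improves detection by 1.5%. Tested on the COCO object detection dataset on a single V100, PSS-DET reaches 41.6% mAP at 61 FPS and 47.8% mAP at 20 FPS. For details, see the [**general object detection chapter**](https://paddleclas.readthedocs.io/zh_CN/latest/application/object_detection.html).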
-
-
-
- TODO
- [ ] Applying PaddleClas to OCR tasks
diff --git a/configs/CSPNet/CSPResNet50.yaml b/configs/CSPNet/CSPResNet50.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..78b56af93b1a77f05510092b8236f1b1bfb596a5
--- /dev/null
+++ b/configs/CSPNet/CSPResNet50.yaml
@@ -0,0 +1,76 @@
+mode: 'train'
+ARCHITECTURE:
+    name: 'CSPResNet50_leaky'
+
+pretrained_model: ""
+model_save_dir: "./output/"
+classes_num: 1000
+total_images: 1281167
+save_interval: 1
+validate: True
+valid_interval: 1
+epochs: 120
+topk: 5
+image_shape: [3, 256, 256]
+
+use_mix: False
+ls_epsilon: -1
+
+LEARNING_RATE:
+    function: 'Piecewise'
+    params:
+        lr: 0.1
+        decay_epochs: [30, 60, 90]
+        gamma: 0.1
+
+OPTIMIZER:
+    function: 'Momentum'
+    params:
+        momentum: 0.9
+    regularizer:
+        function: 'L2'
+        factor: 0.000100
+
+TRAIN:
+    batch_size: 256
+    num_workers: 4
+    file_list: "./dataset/ILSVRC2012/train_list.txt"
+    data_dir: "./dataset/ILSVRC2012/"
+    shuffle_seed: 0
+    transforms:
+        - DecodeImage:
+            to_rgb: True
+            to_np: False
+            channel_first: False
+        - RandCropImage:
+            size: 256
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1./255.
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+        - ToCHWImage:
+
+VALID:
+    batch_size: 64
+    num_workers: 4
+    file_list: "./dataset/ILSVRC2012/val_list.txt"
+    data_dir: "./dataset/ILSVRC2012/"
+    shuffle_seed: 0
+    transforms:
+        - DecodeImage:
+            to_rgb: True
+            to_np: False
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 256
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+        - ToCHWImage:
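Reviewer note on the `LEARNING_RATE` block in `CSPResNet50.yaml`: the sketch below shows what a `Piecewise` schedule with `lr: 0.1`, `decay_epochs: [30, 60, 90]`, and `gamma: 0.1` would produce under one common reading of step decay. It assumes epochs are counted from 0 and that each boundary multiplies the rate by `gamma` from that epoch onward. The function name `piecewise_lr` is hypothetical, and the scheduler PaddleClas actually uses may count steps rather than epochs.

```python
from bisect import bisect_right


def piecewise_lr(epoch, base_lr=0.1, decay_epochs=(30, 60, 90), gamma=0.1):
    """Return the learning rate for a 0-based epoch under step decay.

    Assumption: reaching a boundary epoch already counts as having passed it.
    """
    # Count how many decay boundaries are <= epoch.
    n_decays = bisect_right(decay_epochs, epoch)
    return base_lr * gamma ** n_decays


if __name__ == "__main__":
    for epoch in (0, 29, 30, 60, 90, 119):
        print(epoch, piecewise_lr(epoch))
```

Under these assumptions the 120-epoch run spends 30 epochs at each of four learning-rate levels.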
diff --git a/configs/ResNet/ResNet50_fp16.yml b/configs/ResNet/ResNet50_fp16.yml
new file mode 100644
index 0000000000000000000000000000000000000000..a952833221a193cefe32c004f40181eaa409188d
--- /dev/null
+++ b/configs/ResNet/ResNet50_fp16.yml
@@ -0,0 +1,81 @@
+mode: 'train'
+ARCHITECTURE:
+    name: 'ResNet50'
+
+pretrained_model: ""
+model_save_dir: "./output/"
+classes_num: 1000
+total_images: 1281167
+save_interval: 1
+validate: True
+valid_interval: 1
+epochs: 120
+topk: 5
+image_shape: [3, 224, 224]
+
+# mixed precision training
+use_fp16: True
+amp_scale_loss: 128.0
+use_dynamic_loss_scaling: True
+
+use_mix: False
+ls_epsilon: -1
+
+LEARNING_RATE:
+    function: 'Piecewise'
+    params:
+        lr: 0.1
+        decay_epochs: [30, 60, 90]
+        gamma: 0.1
+
+OPTIMIZER:
+    function: 'Momentum'
+    params:
+        momentum: 0.9
+    regularizer:
+        function: 'L2'
+        factor: 0.000100
+
+TRAIN:
+    batch_size: 256
+    num_workers: 4
+    file_list: "./dataset/ILSVRC2012/train_list.txt"
+    data_dir: "./dataset/ILSVRC2012/"
+    shuffle_seed: 0
+    transforms:
+        - DecodeImage:
+            to_rgb: True
+            to_np: False
+            channel_first: False
+        - RandCropImage:
+            size: 224
+        - RandFlipImage:
+            flip_code: 1
+        - NormalizeImage:
+            scale: 1./255.
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+        - ToCHWImage:
+
+VALID:
+    batch_size: 64
+    num_workers: 4
+    file_list: "./dataset/ILSVRC2012/val_list.txt"
+    data_dir: "./dataset/ILSVRC2012/"
+    shuffle_seed: 0
+    transforms:
+        - DecodeImage:
+            to_rgb: True
+            to_np: False
+            channel_first: False
+        - ResizeImage:
+            resize_short: 256
+        - CropImage:
+            size: 224
+        - NormalizeImage:
+            scale: 1.0/255.0
+            mean: [0.485, 0.456, 0.406]
+            std: [0.229, 0.224, 0.225]
+            order: ''
+        - ToCHWImage:
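Reviewer note on the mixed-precision keys in `ResNet50_fp16.yml` (`use_fp16`, `amp_scale_loss: 128.0`, `use_dynamic_loss_scaling: True`): the sketch below illustrates the general idea of dynamic loss scaling, not the implementation Paddle uses. Only the initial scale of 128.0 comes from the config. The class name `DynamicLossScaler` and the parameters `growth_interval`, `growth_factor`, and `backoff_factor`, with their defaults, are hypothetical.

```python
class DynamicLossScaler:
    """Toy dynamic loss scaler; hyperparameter names and defaults are hypothetical."""

    def __init__(self, init_scale=128.0, growth_interval=1000,
                 growth_factor=2.0, backoff_factor=0.5):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.good_steps = 0  # consecutive steps without overflow

    def update(self, found_overflow):
        """Adjust the scale after one step; return True if the step should be applied."""
        if found_overflow:
            # Gradients contained inf/NaN: shrink the scale and skip this step.
            self.scale = max(self.scale * self.backoff_factor, 1.0)
            self.good_steps = 0
            return False
        self.good_steps += 1
        if self.good_steps >= self.growth_interval:
            # A long run of clean steps: try a larger scale.
            self.scale *= self.growth_factor
            self.good_steps = 0
        return True
```

In a training loop the loss would be multiplied by `scale` before the backward pass and the gradients divided by it before the optimizer step; a fixed scale corresponds to never calling `update`.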
diff --git a/docs/images/models/DPN.png.flops.png b/docs/images/models/DPN.png.flops.png
deleted file mode 100644
index 72bb96f49812711035ec09ce0d8d44202d17cfcb..0000000000000000000000000000000000000000
Binary files a/docs/images/models/DPN.png.flops.png and /dev/null differ
diff --git a/docs/images/models/DPN.png.fp32.png b/docs/images/models/DPN.png.fp32.png
deleted file mode 100644
index bcd524c781d305fdacfd5a07886851fadbeeeddc..0000000000000000000000000000000000000000
Binary files a/docs/images/models/DPN.png.fp32.png and /dev/null differ
diff --git a/docs/images/models/DPN.png.params.png b/docs/images/models/DPN.png.params.png
deleted file mode 100644
index 818f1961bbaf98cd0aff0c5405113f107fa02e54..0000000000000000000000000000000000000000
Binary files a/docs/images/models/DPN.png.params.png and /dev/null differ
diff --git a/docs/images/models/EfficientNet.png b/docs/images/models/EfficientNet.png
deleted file mode 100644
index 5556481c960432cab6080c243644ab43783ceabb..0000000000000000000000000000000000000000
Binary files a/docs/images/models/EfficientNet.png and /dev/null differ
diff --git a/docs/images/models/EfficientNet.png.flops.png b/docs/images/models/EfficientNet.png.flops.png
deleted file mode 100644
index dd3c36ced2973133bdd0bb6b300125b25fdcefe4..0000000000000000000000000000000000000000
Binary files a/docs/images/models/EfficientNet.png.flops.png and /dev/null differ
diff --git a/docs/images/models/EfficientNet.png.fp32.png b/docs/images/models/EfficientNet.png.fp32.png
deleted file mode 100644
index eca753f7d84699cebee13d15484291d96f0a9b6f..0000000000000000000000000000000000000000
Binary files a/docs/images/models/EfficientNet.png.fp32.png and /dev/null differ
diff --git a/docs/images/models/EfficientNet.png.params.png b/docs/images/models/EfficientNet.png.params.png
deleted file mode 100644
index 2348c55013998bcb2ed2c3edff282493243ee37a..0000000000000000000000000000000000000000
Binary files a/docs/images/models/EfficientNet.png.params.png and /dev/null differ
diff --git a/docs/images/models/HRNet.png.flops.png b/docs/images/models/HRNet.png.flops.png
deleted file mode 100644
index 5f8ce9cd2c1c8ed9e8fb775bd89a8acd3a8e9402..0000000000000000000000000000000000000000
Binary files a/docs/images/models/HRNet.png.flops.png and /dev/null differ
diff --git a/docs/images/models/HRNet.png.fp32.png b/docs/images/models/HRNet.png.fp32.png
deleted file mode 100644
index 0e73fb4b57dc374560efa92429a2dd457c73369e..0000000000000000000000000000000000000000
Binary files a/docs/images/models/HRNet.png.fp32.png and /dev/null differ
diff --git a/docs/images/models/HRNet.png.params.png b/docs/images/models/HRNet.png.params.png
deleted file mode 100644
index e4443a770ac0ca910d1158fe9adaf6dd92e680aa..0000000000000000000000000000000000000000
Binary files a/docs/images/models/HRNet.png.params.png and /dev/null differ
diff --git a/docs/images/models/Inception.png.flops.png b/docs/images/models/Inception.png.flops.png
deleted file mode 100644
index 589f3931c1feef3e1c0245566cd0c7e0a22782d8..0000000000000000000000000000000000000000
Binary files a/docs/images/models/Inception.png.flops.png and /dev/null differ
diff --git a/docs/images/models/Inception.png.fp32.png b/docs/images/models/Inception.png.fp32.png
deleted file mode 100644
index b9245800a2d7ca6fad5ed7457e55356d579a81d0..0000000000000000000000000000000000000000
Binary files a/docs/images/models/Inception.png.fp32.png and /dev/null differ
diff --git a/docs/images/models/Inception.png.params.png b/docs/images/models/Inception.png.params.png
deleted file mode 100644
index 657c4360451b24905b9a1ce170e4da719d35d917..0000000000000000000000000000000000000000
Binary files a/docs/images/models/Inception.png.params.png and /dev/null differ
diff --git a/docs/images/models/ResNet.png.flops.png b/docs/images/models/ResNet.png.flops.png
deleted file mode 100644
index da1fd2eb359f57fbd545a5436b737fe23df6e891..0000000000000000000000000000000000000000
Binary files a/docs/images/models/ResNet.png.flops.png and /dev/null differ
diff --git a/docs/images/models/ResNet.png.fp32.png b/docs/images/models/ResNet.png.fp32.png
deleted file mode 100644
index 05020997f2b6eb2c926d2a8948ed69b393b6cd3b..0000000000000000000000000000000000000000
Binary files a/docs/images/models/ResNet.png.fp32.png and /dev/null differ
diff --git a/docs/images/models/ResNet.png.params.png b/docs/images/models/ResNet.png.params.png
deleted file mode 100644
index 6fcbb69cc1e1e9a3402f2849fe016a73312d9a79..0000000000000000000000000000000000000000
Binary files a/docs/images/models/ResNet.png.params.png and /dev/null differ
diff --git a/docs/images/models/SeResNeXt.png.flops.png b/docs/images/models/SeResNeXt.png.flops.png
deleted file mode 100644
index 51d6d6497e9cd582ad671a79b9e24bbdc1a9bdae..0000000000000000000000000000000000000000
Binary files a/docs/images/models/SeResNeXt.png.flops.png and /dev/null differ
diff --git a/docs/images/models/SeResNeXt.png.fp32.png b/docs/images/models/SeResNeXt.png.fp32.png
deleted file mode 100644
index 452488955096f896ecb9dafe07885f666c92d8ad..0000000000000000000000000000000000000000
Binary files a/docs/images/models/SeResNeXt.png.fp32.png and /dev/null differ
diff --git a/docs/images/models/SeResNeXt.png.params.png b/docs/images/models/SeResNeXt.png.params.png
deleted file mode 100644
index 9898f52fb0be6bdfb22d39a1f4a16c98e20ab510..0000000000000000000000000000000000000000
Binary files a/docs/images/models/SeResNeXt.png.params.png and /dev/null differ
diff --git a/docs/images/models/T4_benchmark/t4.fp16.bs1.EfficientNet.png b/docs/images/models/T4_benchmark/t4.fp16.bs1.EfficientNet.png
new file mode 100644
index 0000000000000000000000000000000000000000..1c87d711ea11f3d662cdf78e34959ddd1f355f76
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp16.bs1.EfficientNet.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp16.bs4.DPN.png b/docs/images/models/T4_benchmark/t4.fp16.bs4.DPN.png
new file mode 100644
index 0000000000000000000000000000000000000000..1eb393903bbf3b4e02b83962b33c0e0a3b4e341a
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp16.bs4.DPN.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp16.bs4.HRNet.png b/docs/images/models/T4_benchmark/t4.fp16.bs4.HRNet.png
new file mode 100644
index 0000000000000000000000000000000000000000..f21d63cd1a1e24481947875db23f8447af1a65ca
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp16.bs4.HRNet.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp16.bs4.Inception.png b/docs/images/models/T4_benchmark/t4.fp16.bs4.Inception.png
new file mode 100644
index 0000000000000000000000000000000000000000..8095a3c0253170c00d8ae74af4dec25c4f9544eb
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp16.bs4.Inception.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp16.bs4.ResNet.png b/docs/images/models/T4_benchmark/t4.fp16.bs4.ResNet.png
new file mode 100644
index 0000000000000000000000000000000000000000..53a603ecad2e36df580167e134ef036df14d5596
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp16.bs4.ResNet.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp16.bs4.SeResNeXt.png b/docs/images/models/T4_benchmark/t4.fp16.bs4.SeResNeXt.png
new file mode 100644
index 0000000000000000000000000000000000000000..99b8a039e0fda22053e7d7cb971d5e83b208ec6b
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp16.bs4.SeResNeXt.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs1.EfficientNet.png b/docs/images/models/T4_benchmark/t4.fp32.bs1.EfficientNet.png
new file mode 100644
index 0000000000000000000000000000000000000000..395a32f5c7e28ed09ee2b7e12e4a3ea2e9094154
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs1.EfficientNet.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.flops.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.flops.png
new file mode 100644
index 0000000000000000000000000000000000000000..24aabf8c3fb6607e4bb17f4d4dcc72d733476c48
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.flops.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.params.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.params.png
new file mode 100644
index 0000000000000000000000000000000000000000..689e73d31d70cf8566c356142080d79a0f64d6e3
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.params.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.png
new file mode 100644
index 0000000000000000000000000000000000000000..dc3922d2e2347f11b193057de9bbf730489b9cc1
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.flops.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.flops.png
new file mode 100644
index 0000000000000000000000000000000000000000..deacfaca6a279fed852dfcbf0006dc497191a7a8
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.flops.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.params.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.params.png
new file mode 100644
index 0000000000000000000000000000000000000000..7177bbc56b374dd71c66231132ead01c8b141732
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.params.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.flops.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.flops.png
new file mode 100644
index 0000000000000000000000000000000000000000..062ecd79d2fc3ab788212a238b8ca4627b0dd14f
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.flops.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.params.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.params.png
new file mode 100644
index 0000000000000000000000000000000000000000..4bb3f76caecbdbd43f04ed62507985476f5cac40
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.params.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.png
new file mode 100644
index 0000000000000000000000000000000000000000..3905f0b38b7cffa856d944d99ef0035cf6d4b489
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.flops.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.flops.png
new file mode 100644
index 0000000000000000000000000000000000000000..6fdc94b27130924d0e71a7a6954f239e632a4215
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.flops.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.params.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.params.png
new file mode 100644
index 0000000000000000000000000000000000000000..25a5d1648e1cf220a383377a8d9fd5ae3c9eceea
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.params.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.png
new file mode 100644
index 0000000000000000000000000000000000000000..7ef4f339ae2b9414b4cf71a8772cc4c9a92a0ad1
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.flops.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.flops.png
new file mode 100644
index 0000000000000000000000000000000000000000..755adb5684562c341816ca98856490d270735ac5
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.flops.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.params.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.params.png
new file mode 100644
index 0000000000000000000000000000000000000000..44e03fdd20df450573889e5c4eca85cf5b686d9b
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.params.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.png
new file mode 100644
index 0000000000000000000000000000000000000000..a461b9ec281c2340780da6d69e625a0947ffa42e
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.flops.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.flops.png
new file mode 100644
index 0000000000000000000000000000000000000000..197522d16401ef6430313d36ac235a7b37e1d7f9
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.flops.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.params.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.params.png
new file mode 100644
index 0000000000000000000000000000000000000000..6943fc056c1cd22b4b4d1767acf72fd94a209d0d
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.params.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.png
new file mode 100644
index 0000000000000000000000000000000000000000..8476efb33b73acd836993a5fec3967122da319ab
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.main_fps_top1.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.main_fps_top1.png
new file mode 100644
index 0000000000000000000000000000000000000000..965efff5c1a3bd32c9bc2da9c1b3034dc2cd55ef
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.main_fps_top1.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.flops.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.flops.png
new file mode 100644
index 0000000000000000000000000000000000000000..8c1be3ae9b32773ca56adac9dc1c15d9532e8f1a
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.flops.png differ
diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.params.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.params.png
new file mode 100644
index 0000000000000000000000000000000000000000..a41a325b5f5e9e85b86abfae2e62c9e2132fc7ec
Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.params.png differ
diff --git a/docs/images/models/V100_benchmark/v100.fp32.bs1.DPN.png b/docs/images/models/V100_benchmark/v100.fp32.bs1.DPN.png
new file mode 100644
index 0000000000000000000000000000000000000000..10f542b7fc989da80af12244cd45662bccfe677b
Binary files /dev/null and b/docs/images/models/V100_benchmark/v100.fp32.bs1.DPN.png differ
diff --git a/docs/images/models/V100_benchmark/v100.fp32.bs1.EfficientNet.png b/docs/images/models/V100_benchmark/v100.fp32.bs1.EfficientNet.png
new file mode 100644
index 0000000000000000000000000000000000000000..0491ff3e6f4795fec2e131a4df31b4a3b102900e
Binary files /dev/null and b/docs/images/models/V100_benchmark/v100.fp32.bs1.EfficientNet.png differ
diff --git a/docs/images/models/V100_benchmark/v100.fp32.bs1.HRNet.png b/docs/images/models/V100_benchmark/v100.fp32.bs1.HRNet.png
new file mode 100644
index 0000000000000000000000000000000000000000..284088ec5555e2128eab564cf8331532cfe08370
Binary files /dev/null and b/docs/images/models/V100_benchmark/v100.fp32.bs1.HRNet.png differ
diff --git a/docs/images/models/V100_benchmark/v100.fp32.bs1.Inception.png b/docs/images/models/V100_benchmark/v100.fp32.bs1.Inception.png
new file mode 100644
index 0000000000000000000000000000000000000000..0e0e42e7ccc319bd2c9842c3d99c849b547ac603
Binary files /dev/null and b/docs/images/models/V100_benchmark/v100.fp32.bs1.Inception.png differ
diff --git a/docs/images/models/V100_benchmark/v100.fp32.bs1.ResNet.png b/docs/images/models/V100_benchmark/v100.fp32.bs1.ResNet.png
new file mode 100644
index 0000000000000000000000000000000000000000..2332791866c06b9cdcfc576947280de27ba3667c
Binary files /dev/null and b/docs/images/models/V100_benchmark/v100.fp32.bs1.ResNet.png differ
diff --git a/docs/images/models/V100_benchmark/v100.fp32.bs1.SeResNeXt.png b/docs/images/models/V100_benchmark/v100.fp32.bs1.SeResNeXt.png
new file mode 100644
index 0000000000000000000000000000000000000000..610c846347f008216589ed080d834588d7fedfe0
Binary files /dev/null and b/docs/images/models/V100_benchmark/v100.fp32.bs1.SeResNeXt.png differ
diff --git a/docs/images/models/main_fps_top1_s.jpg b/docs/images/models/V100_benchmark/v100.fp32.bs1.main_fps_top1_s.jpg
similarity index 100%
rename from docs/images/models/main_fps_top1_s.jpg
rename to docs/images/models/V100_benchmark/v100.fp32.bs1.main_fps_top1_s.jpg
diff --git a/docs/images/models/main_fps_top1.png b/docs/images/models/main_fps_top1.png
deleted file mode 100755
index a7149d7de16c71dbd3359efdbcee519a486a2278..0000000000000000000000000000000000000000
Binary files a/docs/images/models/main_fps_top1.png and /dev/null differ
diff --git a/docs/images/models/mobile_arm_storage.png b/docs/images/models/mobile_arm_storage.png
old mode 100755
new mode 100644
index 350fd3ced05e802250a461a747f60c2e5cf1be0c..07e1f4f3fe95ed7e9a358b212f4d5d939fb94a02
Binary files a/docs/images/models/mobile_arm_storage.png and b/docs/images/models/mobile_arm_storage.png differ
diff --git a/docs/images/models/mobile_arm_top1.png b/docs/images/models/mobile_arm_top1.png
old mode 100755
new mode 100644
index 37091dd2c91a334c2888d5facec7598ad3219e84..06add75fe630510e6ab9af62b3fa93dc166b7944
Binary files a/docs/images/models/mobile_arm_top1.png and b/docs/images/models/mobile_arm_top1.png differ
diff --git a/docs/images/models/mobile_arm_top1_s.jpg b/docs/images/models/mobile_arm_top1_s.jpg
deleted file mode 100644
index e5a3e77e9f0aabf97f9687f2005be5e1baf16fd1..0000000000000000000000000000000000000000
Binary files a/docs/images/models/mobile_arm_top1_s.jpg and /dev/null differ
diff --git a/docs/images/models/mobile_trt.png b/docs/images/models/mobile_trt.png
deleted file mode 100644
index d722548bae56b9aeca081cffe3e7d34494864033..0000000000000000000000000000000000000000
Binary files a/docs/images/models/mobile_trt.png and /dev/null differ
diff --git a/docs/images/models/mobile_trt.png.flops.png b/docs/images/models/mobile_trt.png.flops.png
deleted file mode 100644
index 6e7010906c58ef9e1f4f286725442ca85e638c51..0000000000000000000000000000000000000000
Binary files a/docs/images/models/mobile_trt.png.flops.png and /dev/null differ
diff --git a/docs/images/models/mobile_trt.png.params.png b/docs/images/models/mobile_trt.png.params.png
deleted file mode 100644
index 35b78a65b390f9843def8870bbd63f656070cdcc..0000000000000000000000000000000000000000
Binary files a/docs/images/models/mobile_trt.png.params.png and /dev/null differ
diff --git a/docs/zh_CN/extension/paddle_mobile_inference.md b/docs/zh_CN/extension/paddle_mobile_inference.md
index e076fc7d471e8fd3ada051e3df171b23953884fe..833231f736ffc19cb3b705025d7511dc8eba6bc4 100644
--- a/docs/zh_CN/extension/paddle_mobile_inference.md
+++ b/docs/zh_CN/extension/paddle_mobile_inference.md
@@ -1,5 +1,119 @@
# Paddle-Lite
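+## 1. Introduction
+
[Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) is a full-featured, easy-to-use, high-performance lightweight inference engine from PaddlePaddle.
It is lightweight in that it represents network weights and activations with fewer bits, which greatly reduces model size and addresses the limited storage on end devices. Its overall inference performance is also better than that of other frameworks.
-[PaddleClas](https://github.com/PaddlePaddle/PaddleClas) uses Paddle-Lite for the [performance evaluation of mobile models](../models/Mobile.md). For the detailed procedure, see the [Paddle-Lite documentation](https://paddle-lite.readthedocs.io/zh/latest/).
+[PaddleClas](https://github.com/PaddlePaddle/PaddleClas) uses Paddle-Lite for the [performance evaluation of mobile models](../models/Mobile.md). This section takes the `MobileNetV1` model trained on `ImageNet1k` as an example and shows how to use `Paddle-Lite` to evaluate model speed on a mobile device (an Android development platform based on Snapdragon 855).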
+
+
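+## 2. Evaluation steps
+
+### 2.1 Export the inference model
+
+* First, the model saved during training must be stored as a frozen model for inference deployment. Use `tools/export_model.py` to export the inference model as follows.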
+
+```shell
+python tools/export_model.py -m MobileNetV1 -p pretrained/MobileNetV1_pretrained/ -o inference/MobileNetV1
+```
+
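+The `model` and `params` files are then saved in the `inference/MobileNetV1` folder.
+
+
+### 2.2 Download the benchmark binary
+
+* The adb (Android Debug Bridge) tool connects an Android phone to a PC for development and debugging. After installing adb and confirming that the PC and the phone are connected, use the following command to check the phone's ARM version, then choose the matching prebuilt library.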
+
+```shell
+adb shell getprop ro.product.cpu.abi
+```
+
+* Download the benchmark_bin file.
+
+```shell
+wget -c https://paddle-inference-dist.bj.bcebos.com/PaddleLite/benchmark_0/benchmark_bin_v8
+```
+
+If the ARM version reported is v7, download the v7 build of benchmark_bin instead, using the following command.
+
+```shell
+wget -c https://paddle-inference-dist.bj.bcebos.com/PaddleLite/benchmark_0/benchmark_bin_v7
+```
+
+### 2.3 Model speed benchmark
+
+Once the PC and the phone are connected, start the model evaluation with the following command.
+
+```
+sh tools/lite/benchmark.sh ./benchmark_bin_v8 ./inference result_armv8.txt true
+```
+
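+Here `./benchmark_bin_v8` is the path to the benchmark binary, `./inference` is the directory containing all models to be evaluated, `result_armv8.txt` is the result file to write, and the final argument `true` means the models are optimized before evaluation. The result file `result_armv8.txt` is written to the current folder, with contents as follows.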
+
+```
+PaddleLite Benchmark
+Threads=1 Warmup=10 Repeats=30
+MobileNetV1 min = 30.89100 max = 30.73600 average = 30.79750
+
+Threads=2 Warmup=10 Repeats=30
+MobileNetV1 min = 18.26600 max = 18.14000 average = 18.21637
+
+Threads=4 Warmup=10 Repeats=30
+MobileNetV1 min = 10.03200 max = 9.94300 average = 9.97627
+```
+
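+This reports the model's inference time under different thread counts, in milliseconds. With one thread, for example, the average inference time of MobileNetV1 on Snapdragon 855 is `30.79750 ms`.
+
+
+### 2.4 Model optimization and speed evaluation
+
+
+* Section 2.3 mentioned optimizing the model before evaluation. Alternatively, you can optimize the model first and then load the optimized model directly for speed evaluation.
+
+* Paddle-Lite provides several strategies to automatically optimize the original trained model, including quantization, subgraph fusion, hybrid scheduling, and kernel selection. To make optimization easier, Paddle-Lite provides the opt tool, which completes the optimization steps automatically and outputs a lightweight, optimal executable model. It can be downloaded from the [Paddle-Lite model optimization tool page](https://paddle-lite.readthedocs.io/zh/latest/user_guides/model_optimize_tool.html). Taking a `MacOS` development environment as an example, download the [opt_mac](https://paddlelite-data.bj.bcebos.com/model_optimize_tool/opt_mac) model optimization tool and optimize the model with the following commands.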
+
+
+
+```shell
+model_file="../MobileNetV1/model"
+param_file="../MobileNetV1/params"
+opt_models_dir="./opt_models"
+mkdir ${opt_models_dir}
+./opt_mac --model_file=${model_file} \
+ --param_file=${param_file} \
+ --valid_targets=arm \
+ --optimize_out_type=naive_buffer \
+ --prefer_int8_kernel=false \
+ --optimize_out=${opt_models_dir}/MobileNetV1
+```
+
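+Here `model_file` and `param_file` are the paths to the exported inference model's structure file and parameter file. After a successful conversion, the file `MobileNetV1.nb` is generated in the `opt_models` folder.
+
+
+
+
+Use the benchmark_bin file to load the optimized model for evaluation, with the following command.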
+
+```shell
+bash benchmark.sh ./benchmark_bin_v8 ./opt_models result_armv8.txt
+```
+
+The final results in `result_armv8.txt` are as follows.
+
+```
+PaddleLite Benchmark
+Threads=1 Warmup=10 Repeats=30
+MobileNetV1_lite min = 30.89500 max = 30.78500 average = 30.84173
+
+Threads=2 Warmup=10 Repeats=30
+MobileNetV1_lite min = 18.25300 max = 18.11000 average = 18.18017
+
+Threads=4 Warmup=10 Repeats=30
+MobileNetV1_lite min = 10.00600 max = 9.90000 average = 9.96177
+```
+
+
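+With one thread, for example, the average inference time of MobileNetV1 on Snapdragon 855 is `30.84173 ms`.
+
+
+For more detailed parameter descriptions and Paddle-Lite usage, see the [Paddle-Lite documentation](https://paddle-lite.readthedocs.io/zh/latest/).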
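Reviewer note on the benchmark result files shown in sections 2.3 and 2.4 of `paddle_mobile_inference.md`: a small parser makes it easier to compare runs across thread counts. The sketch below is only an illustration. It assumes the result file has exactly the layout quoted in the doc (a `Threads=… Warmup=… Repeats=…` header followed by one line per model), and the names `parse_benchmark` and `SAMPLE` are hypothetical.

```python
import re

# Header such as "Threads=1 Warmup=10 Repeats=30".
HEADER = re.compile(r"Threads=(\d+)\s+Warmup=(\d+)\s+Repeats=(\d+)")
# Result such as "MobileNetV1 min = 30.89100 max = 30.73600 average = 30.79750".
RESULT = re.compile(
    r"(\S+)\s+min\s*=\s*([\d.]+)\s+max\s*=\s*([\d.]+)\s+average\s*=\s*([\d.]+)"
)

SAMPLE = """\
PaddleLite Benchmark
Threads=1 Warmup=10 Repeats=30
MobileNetV1 min = 30.89100 max = 30.73600 average = 30.79750

Threads=2 Warmup=10 Repeats=30
MobileNetV1 min = 18.26600 max = 18.14000 average = 18.21637
"""


def parse_benchmark(text):
    """Return one dict per model line, tagged with the current thread count."""
    rows = []
    threads = None
    for line in text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            threads = int(header.group(1))
            continue
        result = RESULT.match(line)
        if result and threads is not None:
            rows.append({
                "model": result.group(1),
                "threads": threads,
                "average": float(result.group(4)),
            })
    return rows


if __name__ == "__main__":
    for row in parse_benchmark(SAMPLE):
        print(row["model"], row["threads"], row["average"])
```

Only the `average` column is kept here; the `min` and `max` values are captured by the pattern and could be added to each row the same way.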
diff --git a/docs/zh_CN/extension/paddle_serving.md b/docs/zh_CN/extension/paddle_serving.md
index 62b8cda8b74283fea701a86f6f05e5194b0a6da8..7b9102042b5018ab972bbd2542526186b5b402df 100644
--- a/docs/zh_CN/extension/paddle_serving.md
+++ b/docs/zh_CN/extension/paddle_serving.md
@@ -1,4 +1,65 @@
# 模型服务化部署
-[Paddle Serving](https://github.com/PaddlePaddle/Serving) 旨在帮助深度学习开发者轻易部署在线预测服务,支持一键部署工业级的服务能力、客户端和服务端之间高并发和高效通信、并支持多种编程语言开发客户端等特点,详细使用请参考 [Paddle Serving 相关文档](https://github.com/PaddlePaddle/Serving)。
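+## 1. Introduction
+[Paddle Serving](https://github.com/PaddlePaddle/Serving) aims to help deep learning developers deploy online inference services easily. It supports one-command deployment of industrial-grade serving, highly concurrent and efficient communication between client and server, and clients written in multiple programming languages.
+This section uses an HTTP inference service as an example to show how to deploy a model service with Paddle Serving in PaddleClas.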
+
+
+## 2. Installing Serving
+
+The Serving project recommends installing and deploying the Serving environment with Docker. First pull the Docker image and create a Serving-based container.
+
+```shell
+nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
+nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
+nvidia-docker exec -it test bash
+```
+
+Inside the container, install the Serving-related Python packages.
+
+```shell
+pip install paddlepaddle-gpu
+pip install paddle-serving-client
+pip install paddle-serving-server-gpu
+```
+
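+* If installation is too slow, switch the package index with `-i https://pypi.tuna.tsinghua.edu.cn/simple` to speed it up.
+
+* To deploy a CPU service instead, install the CPU version of serving-server with the following command.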
+
+```shell
+pip install paddle-serving-server
+```
+
+## 3. Exporting the model
+
+Use the `tools/export_serving_model.py` script to export the Serving model. Taking `ResNet50_vd` as an example, the usage is as follows.
+
+```shell
+python tools/export_serving_model.py -m ResNet50_vd -p ./pretrained/ResNet50_vd_pretrained/ -o serving
+```
+
+This generates two folders under `serving`: `ppcls_client_conf` and `ppcls_model`, which store the client configuration and the model parameter and structure files, respectively.
+
+
+## 4. Deploying the service and sending requests
+
+* Start the Serving service as follows.
+
+```shell
+python tools/serving/image_service_gpu.py serving/ppcls_model workdir 9292
+```
+
+Here `serving/ppcls_model` is the path of the Serving model saved above, `workdir` is the working directory, and `9292` is the service port.
+
+
+* Use the script below to send a recognition request to the Serving service and get the result back.
+
+```shell
+python tools/serving/image_http_client.py 9292 ./docs/images/logo.png
+```
+
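+`9292` is the port the request is sent to and must match the port used when starting the service; `./docs/images/logo.png` is the image file to classify. The response contains the class ID and probability of the top-1 prediction.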
+
+* For other deployment types, such as the `RPC inference service`, see the Serving GitHub repository: [https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet)
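
The client reports the top-1 class ID together with its probability. As an illustration only, the sketch below shows how such a pair can be derived from a list of per-class probabilities. It is not the code in `tools/serving/image_http_client.py`; the function name `top1` and the sample scores are hypothetical.

```python
def top1(probabilities):
    """Return (class_id, probability) for the highest-scoring class."""
    if not probabilities:
        raise ValueError("probabilities must not be empty")
    # Index of the largest probability; ties resolve to the lowest index.
    class_id = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_id, probabilities[class_id]


# Hypothetical scores for a four-class problem, not real model output.
scores = [0.05, 0.10, 0.70, 0.15]
class_id, prob = top1(scores)
print(f"class_id={class_id}, prob={prob:.2f}")
```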
diff --git a/docs/zh_CN/faq.md b/docs/zh_CN/faq.md
index e3037e45c7b0c19f39bdae4fb6558ea2eab19a8b..36aff3ec2458c80802a9da2c2c9244efa3387dd9 100644
--- a/docs/zh_CN/faq.md
+++ b/docs/zh_CN/faq.md
@@ -1,10 +1,5 @@
# FAQ
->>
-* Q: 启动训练后,为什么当前终端中的输出信息一直没有更新?
-* A: 启动运行后,日志会实时输出到`mylog/workerlog.*`中,可以在这里查看实时的日志。
-
-
>>
* Q: 多卡评估时,为什么每张卡输出的精度指标不相同?
* A: 目前PaddleClas基于fleet api使用多卡,在多卡评估时,每张卡都是单独读取各自part的数据,不同卡中计算的图片是不同的,因此最终指标也会有微量差异,如果希望得到准确的评估指标,可以使用单卡评估。
@@ -47,3 +42,7 @@ VALID:
order: ''
- ToCHWImage:
```
+
+>>
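+* Q: How can a saved `pdparams` parameter file be converted into the separate files used by earlier versions (before Paddle 1.7.0), where each file holds a single model parameter?
+* A: First load the `pdparams` model, then use the `fluid.io.save_vars` function to save the parameters as separate files.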
diff --git a/docs/zh_CN/models/DPN_DenseNet.md b/docs/zh_CN/models/DPN_DenseNet.md
index 53092bd74014c42fae872cd20e0ea24ff858989c..25f61476e43d752f2426c000885166963de7ebc4 100644
--- a/docs/zh_CN/models/DPN_DenseNet.md
+++ b/docs/zh_CN/models/DPN_DenseNet.md
@@ -4,13 +4,15 @@
DenseNet是2017年CVPR best paper提出的一种新的网络结构,该网络设计了一种新的跨层连接的block,即dense-block。相比ResNet中的bottleneck,dense-block设计了一个更激进的密集连接机制,即互相连接所有的层,每个层都会接受其前面所有层作为其额外的输入。DenseNet将所有的dense-block堆叠,组合成了一个密集连接型网络。密集的连接方式使得DenseNe更容易进行梯度的反向传播,使得网络更容易训练。
DPN的全称是Dual Path Networks,即双通道网络。该网络是由DenseNet和ResNeXt结合的一个网络,其证明了DenseNet能从靠前的层级中提取到新的特征,而ResNeXt本质上是对之前层级中已提取特征的复用。作者进一步分析发现,ResNeXt对特征有高复用率,但冗余度低,DenseNet能创造新特征,但冗余度高。结合二者结构的优势,作者设计了DPN网络。最终DPN网络在同样FLOPS和参数量下,取得了比ResNeXt与DenseNet更好的结果。
-该系列模型的FLOPS、参数量以及FP32预测耗时如下图所示。
+The FLOPS, parameter counts, and inference latency on a T4 GPU for this model family are shown in the figures below.
-![](../../images/models/DPN.png.flops.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs4.DPN.flops.png)
-![](../../images/models/DPN.png.params.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs4.DPN.params.png)
-![](../../images/models/DPN.png.fp32.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs4.DPN.png)
+
+![](../../images/models/T4_benchmark/t4.fp16.bs4.DPN.png)
目前PaddleClas开源的这两类模型的预训练模型一共有10个,其指标如上图所示,可以看到,在相同的FLOPS和参数量下,相比DenseNet,DPN拥有更高的精度。但是由于DPN有更多的分支,所以其推理速度要慢于DenseNet。由于DenseNet264的网络层数最深,所以该网络是DenseNet系列模型中参数量最大的网络,DenseNet161的网络的宽度最大,导致其是该系列中网络中计算量最大、精度最高的网络。从推理速度来看,计算量大且精度高的的DenseNet161比DenseNet264具有更快的速度,所以其比DenseNet264具有更大的优势。
@@ -34,9 +36,9 @@ DPN的全称是Dual Path Networks,即双通道网络。该网络是由DenseNet
-## FP32预测速度
+## Inference speed on V100 GPU
-| Models | Crop Size | Resize Short Size | Batch Size=1<br>(ms) |
+| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|-------------|-----------|-------------------|--------------------------|
| DenseNet121 | 224 | 256 | 4.371 |
| DenseNet161 | 224 | 256 | 8.863 |
@@ -48,3 +50,20 @@ DPN的全称是Dual Path Networks,即双通道网络。该网络是由DenseNet
| DPN98 | 224 | 256 | 21.057 |
| DPN107 | 224 | 256 | 28.685 |
| DPN131 | 224 | 256 | 28.083 |
+
+
+
+## Inference speed on T4 GPU
+
+| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
+|-------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
+| DenseNet121 | 224 | 256 | 4.16436 | 7.2126 | 10.50221 | 4.40447 | 9.32623 | 15.25175 |
+| DenseNet161 | 224 | 256 | 9.27249 | 14.25326 | 20.19849 | 10.39152 | 22.15555 | 35.78443 |
+| DenseNet169 | 224 | 256 | 6.11395 | 10.28747 | 13.68717 | 6.43598 | 12.98832 | 20.41964 |
+| DenseNet201 | 224 | 256 | 7.9617 | 13.4171 | 17.41949 | 8.20652 | 17.45838 | 27.06309 |
+| DenseNet264 | 224 | 256 | 11.70074 | 19.69375 | 24.79545 | 12.14722 | 26.27707 | 40.01905 |
+| DPN68 | 224 | 256 | 11.7827 | 13.12652 | 16.19213 | 11.64915 | 12.82807 | 18.57113 |
+| DPN92 | 224 | 256 | 18.56026 | 20.35983 | 29.89544 | 18.15746 | 23.87545 | 38.68821 |
+| DPN98 | 224 | 256 | 21.70508 | 24.7755 | 40.93595 | 21.18196 | 33.23925 | 62.77751 |
+| DPN107 | 224 | 256 | 27.84462 | 34.83217 | 60.67903 | 27.62046 | 52.65353 | 100.11721 |
+| DPN131 | 224 | 256 | 28.58941 | 33.01078 | 55.65146 | 28.33119 | 46.19439 | 89.24904 |
diff --git a/docs/zh_CN/models/EfficientNet_and_ResNeXt101_wsl.md b/docs/zh_CN/models/EfficientNet_and_ResNeXt101_wsl.md
index eadd17683d0219b4ee72d7aba71dc04743896ea5..3cbe4009dfd17cbc88ead5f3bb91d4ed9cc7470d 100644
--- a/docs/zh_CN/models/EfficientNet_and_ResNeXt101_wsl.md
+++ b/docs/zh_CN/models/EfficientNet_and_ResNeXt101_wsl.md
@@ -6,13 +6,15 @@ EfficientNet是Google于2019年发布的一个基于NAS的轻量级网络,其
ResNeXt是facebook于2016年提出的一种对ResNet的改进版网络。在2019年,facebook通过弱监督学习研究了该系列网络在ImageNet上的精度上限,为了区别之前的ResNeXt网络,该系列网络的后缀为wsl,其中wsl是弱监督学习(weakly-supervised-learning)的简称。为了能有更强的特征提取能力,研究者将其网络宽度进一步放大,其中最大的ResNeXt101_32x48d_wsl拥有8亿个参数,将其在9.4亿的弱标签图片下训练并在ImageNet-1k上做finetune,最终在ImageNet-1k的top-1达到了85.4%,这也是迄今为止在ImageNet-1k的数据集上以224x224的分辨率下精度最高的网络。Fix-ResNeXt中,作者使用了更大的图像分辨率,针对训练图片和验证图片数据预处理不一致的情况下做了专门的Fix策略,并使得ResNeXt101_32x48d_wsl拥有了更高的精度,由于其用到了Fix策略,故命名为Fix-ResNeXt101_32x48d_wsl。
-该系列模型的FLOPS、参数量以及FP32预测耗时如下图所示。
+The FLOPS, parameter counts, and inference latency on a T4 GPU for this model family are shown in the figures below.
-![](../../images/models/EfficientNet.png.flops.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.flops.png)
-![](../../images/models/EfficientNet.png.params.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.params.png)
-![](../../images/models/EfficientNet.png.fp32.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs1.EfficientNet.png)
+
+![](../../images/models/T4_benchmark/t4.fp16.bs1.EfficientNet.png)
目前PaddleClas开源的这两类模型的预训练模型一共有14个。从上图中可以看出EfficientNet系列网络优势非常明显,ResNeXt101_wsl系列模型由于用到了更多的数据,最终的精度也更高。EfficientNet_B0_Small是去掉了SE_block的EfficientNet_B0,其具有更快的推理速度。
@@ -36,9 +38,9 @@ ResNeXt是facebook于2016年提出的一种对ResNet的改进版网络。在2019
| EfficientNetB0_<br>small | 0.758 | 0.926 | | | 0.720 | 4.650 |
-## FP32预测速度
+## Inference speed on V100 GPU
-| Models | Crop Size | Resize Short Size | Batch Size=1<br>(ms) |
+| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|-------------------------------|-----------|-------------------|--------------------------|
| ResNeXt101_<br>32x8d_wsl | 224 | 256 | 19.127 |
| ResNeXt101_<br>32x16d_wsl | 224 | 256 | 23.629 |
@@ -54,3 +56,24 @@ ResNeXt是facebook于2016年提出的一种对ResNet的改进版网络。在2019
| EfficientNetB6 | 528 | 560 | 18.381 |
| EfficientNetB7 | 600 | 632 | 27.817 |
| EfficientNetB0_<br>small | 224 | 256 | 1.692 |
+
+
+
+## Inference speed on T4 GPU
+
+| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
+|---------------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
+| ResNeXt101_<br>32x8d_wsl | 224 | 256 | 18.19374 | 21.93529 | 34.67802 | 18.52528 | 34.25319 | 67.2283 |
+| ResNeXt101_<br>32x16d_wsl | 224 | 256 | 18.52609 | 36.8288 | 62.79947 | 25.60395 | 71.88384 | 137.62327 |
+| ResNeXt101_<br>32x32d_wsl | 224 | 256 | 33.51391 | 70.09682 | 125.81884 | 54.87396 | 160.04337 | 316.17718 |
+| ResNeXt101_<br>32x48d_wsl | 224 | 256 | 50.97681 | 137.60926 | 190.82628 | 99.01698256 | 315.91261 | 551.83695 |
+| Fix_ResNeXt101_<br>32x48d_wsl | 320 | 320 | 78.62869 | 191.76039 | 317.15436 | 160.0838242 | 595.99296 | 1151.47384 |
+| EfficientNetB0 | 224 | 256 | 3.40122 | 5.95851 | 9.10801 | 3.442 | 6.11476 | 9.3304 |
+| EfficientNetB1 | 240 | 272 | 5.25172 | 9.10233 | 14.11319 | 5.3322 | 9.41795 | 14.60388 |
+| EfficientNetB2 | 260 | 292 | 5.91052 | 10.5898 | 17.38106 | 6.29351 | 10.95702 | 17.75308 |
+| EfficientNetB3 | 300 | 332 | 7.69582 | 16.02548 | 27.4447 | 7.67749 | 16.53288 | 28.5939 |
+| EfficientNetB4 | 380 | 412 | 11.55585 | 29.44261 | 53.97363 | 12.15894 | 30.94567 | 57.38511 |
+| EfficientNetB5 | 456 | 488 | 19.63083 | 56.52299 | - | 20.48571 | 61.60252 | - |
+| EfficientNetB6 | 528 | 560 | 30.05911 | - | - | 32.62402 | - | - |
+| EfficientNetB7 | 600 | 632 | 47.86087 | - | - | 53.93823 | - | - |
+| EfficientNetB0_small | 224 | 256 | 2.39166 | 4.36748 | 6.96002 | 2.3076 | 4.71886 | 7.21888 |
diff --git a/docs/zh_CN/models/HRNet.md b/docs/zh_CN/models/HRNet.md
index c33fb0fa025fd9b7f1993186d622c2acfdd22acb..f694f7b0c1d6d6c9b195fa61aa0cc9544564859d 100644
--- a/docs/zh_CN/models/HRNet.md
+++ b/docs/zh_CN/models/HRNet.md
@@ -3,13 +3,16 @@
## 概述
HRNet是2019年由微软亚洲研究院提出的一种全新的神经网络,不同于以往的卷积神经网络,该网络在网络深层仍然可以保持高分辨率,因此预测的关键点热图更准确,在空间上也更精确。此外,该网络在对分辨率敏感的其他视觉任务中,如检测、分割等,表现尤为优异。
-该系列模型的FLOPS、参数量以及FP32预测耗时如下图所示。
+The FLOPS, parameter counts, and inference latency on a T4 GPU for this model family are shown in the figures below.
-![](../../images/models/HRNet.png.flops.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs4.HRNet.flops.png)
-![](../../images/models/HRNet.png.params.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs4.HRNet.params.png)
+
+![](../../images/models/T4_benchmark/t4.fp32.bs4.HRNet.png)
+
+![](../../images/models/T4_benchmark/t4.fp16.bs4.HRNet.png)
-![](../../images/models/HRNet.png.fp32.png)
目前PaddleClas开源的这类模型的预训练模型一共有7个,其指标如图所示,其中HRNet_W48_C指标精度异常的原因可能是因为网络训练的正常波动。
@@ -26,9 +29,9 @@ HRNet是2019年由微软亚洲研究院提出的一种全新的神经网络,
| HRNet_W64_C | 0.793 | 0.946 | 0.795 | 0.946 | 57.830 | 128.060 |
-## FP32预测速度
+## Inference speed on V100 GPU
-| Models | Crop Size | Resize Short Size | Batch Size=1<br>(ms) |
+| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|-------------|-----------|-------------------|--------------------------|
| HRNet_W18_C | 224 | 256 | 7.368 |
| HRNet_W30_C | 224 | 256 | 9.402 |
@@ -37,3 +40,18 @@ HRNet是2019年由微软亚洲研究院提出的一种全新的神经网络,
| HRNet_W44_C | 224 | 256 | 11.497 |
| HRNet_W48_C | 224 | 256 | 12.165 |
| HRNet_W64_C | 224 | 256 | 15.003 |
+
+
+
+
+## Inference speed on T4 GPU
+
+| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
+|-------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
+| HRNet_W18_C | 224 | 256 | 6.79093 | 11.50986 | 17.67244 | 7.40636 | 13.29752 | 23.33445 |
+| HRNet_W30_C | 224 | 256 | 8.98077 | 14.08082 | 21.23527 | 9.57594 | 17.35485 | 32.6933 |
+| HRNet_W32_C | 224 | 256 | 8.82415 | 14.21462 | 21.19804 | 9.49807 | 17.72921 | 32.96305 |
+| HRNet_W40_C | 224 | 256 | 11.4229 | 19.1595 | 30.47984 | 12.12202 | 25.68184 | 48.90623 |
+| HRNet_W44_C | 224 | 256 | 12.25778 | 22.75456 | 32.61275 | 13.19858 | 32.25202 | 59.09871 |
+| HRNet_W48_C | 224 | 256 | 12.65015 | 23.12886 | 33.37859 | 13.70761 | 34.43572 | 63.01219 |
+| HRNet_W64_C | 224 | 256 | 15.10428 | 27.68901 | 40.4198 | 17.57527 | 47.9533 | 97.11228 |
diff --git a/docs/zh_CN/models/Inception.md b/docs/zh_CN/models/Inception.md
index 8c9333656b1e78199b1f29feff17ef5d15c593b8..b85c2bf1b5068936daa3091215540b866d4d31b3 100644
--- a/docs/zh_CN/models/Inception.md
+++ b/docs/zh_CN/models/Inception.md
@@ -9,13 +9,15 @@ Xception 是 Google 继 Inception 后提出的对 InceptionV3 的另一种改进
InceptionV4是2016年由Google设计的新的神经网络,当时残差结构风靡一时,但是作者认为仅使用Inception 结构也可以达到很高的性能。InceptionV4使用了更多的Inception module,在ImageNet上的精度再创新高。
-该系列模型的FLOPS、参数量以及FP32预测耗时如下图所示。
+The FLOPS, parameter counts, and inference latency on a T4 GPU for this model family are shown in the figures below.
-![](../../images/models/Inception.png.flops.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs4.Inception.flops.png)
-![](../../images/models/Inception.png.params.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs4.Inception.params.png)
-![](../../images/models/Inception.png.fp32.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs4.Inception.png)
+
+![](../../images/models/T4_benchmark/t4.fp16.bs4.Inception.png)
上图反映了Xception系列和InceptionV4的精度和其他指标的关系。其中Xception_deeplab与论文结构保持一致,Xception是PaddleClas的改进模型,在预测速度基本不变的情况下,精度提升约0.6%。关于该改进模型的详细介绍正在持续更新中,敬请期待。
@@ -35,14 +37,28 @@ InceptionV4是2016年由Google设计的新的神经网络,当时残差结构
-## FP32预测速度
+## Inference speed on V100 GPU
-| Models | Crop Size | Resize Short Size | Batch Size=1<br>(ms) |
+| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|------------------------|-----------|-------------------|--------------------------|
| GoogLeNet | 224 | 256 | 1.807 |
| Xception41 | 299 | 320 | 3.972 |
-| Xception41<br>_deeplab | 299 | 320 | 4.408 |
+| Xception41_<br>deeplab | 299 | 320 | 4.408 |
| Xception65 | 299 | 320 | 6.174 |
-| Xception65<br>_deeplab | 299 | 320 | 6.464 |
+| Xception65_<br>deeplab | 299 | 320 | 6.464 |
| Xception71 | 299 | 320 | 6.782 |
| InceptionV4 | 299 | 320 | 11.141 |
+
+
+
+## Inference speed on T4 GPU
+
+| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
+|--------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
+| GoogLeNet | 224 | 256 | 1.75451 | 3.39931 | 4.71909 | 1.88038 | 4.48882 | 6.94035 |
+| Xception41 | 299 | 320 | 2.91192 | 7.86878 | 15.53685 | 4.96939 | 17.01361 | 32.67831 |
+| Xception41_<br>deeplab | 299 | 320 | 2.85934 | 7.2075 | 14.01406 | 5.33541 | 17.55938 | 33.76232 |
+| Xception65 | 299 | 320 | 4.30126 | 11.58371 | 23.22213 | 7.26158 | 25.88778 | 53.45426 |
+| Xception65_<br>deeplab | 299 | 320 | 4.06803 | 9.72694 | 19.477 | 7.60208 | 26.03699 | 54.74724 |
+| Xception71 | 299 | 320 | 4.80889 | 13.5624 | 27.18822 | 8.72457 | 31.55549 | 69.31018 |
+| InceptionV4 | 299 | 320 | 9.50821 | 13.72104 | 20.27447 | 12.99342 | 25.23416 | 43.56121 |
diff --git a/docs/zh_CN/models/Mobile.md b/docs/zh_CN/models/Mobile.md
index 6cf4ed90241ca959e7c5b66f9a14a8c415e7dd87..3c0ebe37693b5564fbc654d99948d69b04b97a85 100644
--- a/docs/zh_CN/models/Mobile.md
+++ b/docs/zh_CN/models/Mobile.md
@@ -8,10 +8,16 @@ MobileNetV2是Google继MobileNetV1提出的一种轻量级网络。相比MobileN
ShuffleNet系列网络是旷视提出的轻量化网络结构,到目前为止,该系列网络一共有两种典型的结构,即ShuffleNetV1与ShuffleNetV2。ShuffleNet中的Channel Shuffle操作可以将组间的信息进行交换,并且可以实现端到端的训练。在ShuffleNetV2的论文中,作者提出了设计轻量级网络的四大准则,并且根据四大准则与ShuffleNetV1的不足,设计了ShuffleNetV2网络。
MobileNetV3是Google于2019年提出的一种基于NAS的新的轻量级网络,为了进一步提升效果,将relu和sigmoid激活函数分别替换为hard_swish与hard_sigmoid激活函数,同时引入了一些专门减小网络计算量的改进策略。
+
![](../../images/models/mobile_arm_top1.png)
+
![](../../images/models/mobile_arm_storage.png)
-![](../../images/models/mobile_trt.png.flops.png)
-![](../../images/models/mobile_trt.png.params.png)
+
+![](../../images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.flops.png)
+
+![](../../images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.params.png)
+
+
目前PaddleClas开源的的移动端系列的预训练模型一共有32个,其指标如图所示。从图片可以看出,越新的轻量级模型往往有更优的表现,MobileNetV3代表了目前最新的轻量级神经网络结构。在MobileNetV3中,作者为了获得更高的精度,在global-avg-pooling后使用了1x1的卷积。该操作大幅提升了参数量但对计算量影响不大,所以如果从存储角度评价模型的优异程度,MobileNetV3优势不是很大,但由于其更小的计算量,使得其有更快的推理速度。此外,我们模型库中的ssld蒸馏模型表现优异,从各个考量角度下,都刷新了当前轻量级模型的精度。由于MobileNetV3模型结构复杂,分支较多,对GPU并不友好,GPU预测速度不如MobileNetV1。
@@ -53,9 +59,9 @@ MobileNetV3是Google于2019年提出的一种基于NAS的新的轻量级网络
| ShuffleNetV2_swish | 0.700 | 0.892 | | | 0.290 | 2.260 |
-## CPU预测速度和存储大小
+## Inference speed and storage size on SD855
-| Models | batch_size=1(ms) | Storage Size(M) |
+| Models | Batch Size=1(ms) | Storage Size(M) |
|:--:|:--:|:--:|
| MobileNetV1_x0_25 | 3.220 | 1.900 |
| MobileNetV1_x0_5 | 9.580 | 5.200 |
@@ -89,3 +95,40 @@ MobileNetV3是Google于2019年提出的一种基于NAS的新的轻量级网络
| ShuffleNetV2_x1_5 | 19.352 | 14.000 |
| ShuffleNetV2_x2_0 | 34.770 | 28.000 |
| ShuffleNetV2_swish | 16.023 | 9.100 |
+
+
+## Inference speed on T4 GPU
+
+| Models | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
+|-----------------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|
+| MobileNetV1_x0_25 | 0.68422 | 1.13021 | 1.72095 | 0.67274 | 1.226 | 1.84096 |
+| MobileNetV1_x0_5 | 0.69326 | 1.09027 | 1.84746 | 0.69947 | 1.43045 | 2.39353 |
+| MobileNetV1_x0_75 | 0.6793 | 1.29524 | 2.15495 | 0.79844 | 1.86205 | 3.064 |
+| MobileNetV1 | 0.71942 | 1.45018 | 2.47953 | 0.91164 | 2.26871 | 3.90797 |
+| MobileNetV1_ssld | 0.71942 | 1.45018 | 2.47953 | 0.91164 | 2.26871 | 3.90797 |
+| MobileNetV2_x0_25 | 2.85399 | 3.62405 | 4.29952 | 2.81989 | 3.52695 | 4.2432 |
+| MobileNetV2_x0_5 | 2.84258 | 3.1511 | 4.10267 | 2.80264 | 3.65284 | 4.31737 |
+| MobileNetV2_x0_75 | 2.82183 | 3.27622 | 4.98161 | 2.86538 | 3.55198 | 5.10678 |
+| MobileNetV2 | 2.78603 | 3.71982 | 6.27879 | 2.62398 | 3.54429 | 6.41178 |
+| MobileNetV2_x1_5 | 2.81852 | 4.87434 | 8.97934 | 2.79398 | 5.30149 | 9.30899 |
+| MobileNetV2_x2_0 | 3.65197 | 6.32329 | 11.644 | 3.29788 | 7.08644 | 12.45375 |
+| MobileNetV2_ssld | 2.78603 | 3.71982 | 6.27879 | 2.62398 | 3.54429 | 6.41178 |
+| MobileNetV3_large_x1_25 | 2.34387 | 3.16103 | 4.79742 | 2.35117 | 3.44903 | 5.45658 |
+| MobileNetV3_large_x1_0 | 2.20149 | 3.08423 | 4.07779 | 2.04296 | 2.9322 | 4.53184 |
+| MobileNetV3_large_x0_75 | 2.1058 | 2.61426 | 3.61021 | 2.0006 | 2.56987 | 3.78005 |
+| MobileNetV3_large_x0_5 | 2.06934 | 2.77341 | 3.35313 | 2.11199 | 2.88172 | 3.19029 |
+| MobileNetV3_large_x0_35 | 2.14965 | 2.7868 | 3.36145 | 1.9041 | 2.62951 | 3.26036 |
+| MobileNetV3_small_x1_25 | 2.06817 | 2.90193 | 3.5245 | 2.02916 | 2.91866 | 3.34528 |
+| MobileNetV3_small_x1_0 | 1.73933 | 2.59478 | 3.40276 | 1.74527 | 2.63565 | 3.28124 |
+| MobileNetV3_small_x0_75 | 1.80617 | 2.64646 | 3.24513 | 1.93697 | 2.64285 | 3.32797 |
+| MobileNetV3_small_x0_5 | 1.95001 | 2.74014 | 3.39485 | 1.88406 | 2.99601 | 3.3908 |
+| MobileNetV3_small_x0_35 | 2.10683 | 2.94267 | 3.44254 | 1.94427 | 2.94116 | 3.41082 |
+| MobileNetV3_large_x1_0_ssld | 2.20149 | 3.08423 | 4.07779 | 2.04296 | 2.9322 | 4.53184 |
+| MobileNetV3_small_x1_0_ssld | 1.73933 | 2.59478 | 3.40276 | 1.74527 | 2.63565 | 3.28124 |
+| ShuffleNetV2 | 1.95064 | 2.15928 | 2.97169 | 1.89436 | 2.26339 | 3.17615 |
+| ShuffleNetV2_x0_25 | 1.43242 | 2.38172 | 2.96768 | 1.48698 | 2.29085 | 2.90284 |
+| ShuffleNetV2_x0_33 | 1.69008 | 2.65706 | 2.97373 | 1.75526 | 2.85557 | 3.09688 |
+| ShuffleNetV2_x0_5 | 1.48073 | 2.28174 | 2.85436 | 1.59055 | 2.18708 | 3.09141 |
+| ShuffleNetV2_x1_5 | 1.51054 | 2.4565 | 3.41738 | 1.45389 | 2.5203 | 3.99872 |
+| ShuffleNetV2_x2_0 | 1.95616 | 2.44751 | 4.19173 | 2.15654 | 3.18247 | 5.46893 |
+| ShuffleNetV2_swish | 2.50213 | 2.92881 | 3.474 | 2.5129 | 2.97422 | 3.69357 |
diff --git a/docs/zh_CN/models/Others.md b/docs/zh_CN/models/Others.md
index 35cabf68d5e2d05ec336d7e5cdfac3d363477971..c24f76652bc5e322df2533bbf8c59889bb420910 100644
--- a/docs/zh_CN/models/Others.md
+++ b/docs/zh_CN/models/Others.md
@@ -27,10 +27,10 @@ DarkNet53是YOLO作者在论文设计的用于目标检测的backbone,该网
-## FP32预测速度
+## Inference speed on V100 GPU
-| Models | Crop Size | Resize Short Size | Batch Size=1<br>(ms) |
+| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|---------------------------|-----------|-------------------|----------------------|
| AlexNet | 224 | 256 | 1.176 |
| SqueezeNet1_0 | 224 | 256 | 0.860 |
@@ -41,3 +41,20 @@ DarkNet53是YOLO作者在论文设计的用于目标检测的backbone,该网
| VGG19 | 224 | 256 | 3.076 |
| DarkNet53 | 256 | 256 | 3.139 |
| ResNet50_ACNet<br>_deploy | 224 | 256 | 5.626 |
+
+
+
+## Inference speed on T4 GPU
+
+| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
+|-----------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
+| AlexNet | 224 | 256 | 1.06447 | 1.70435 | 2.38402 | 1.44993 | 2.46696 | 3.72085 |
+| SqueezeNet1_0 | 224 | 256 | 0.97162 | 2.06719 | 3.67499 | 0.96736 | 2.53221 | 4.54047 |
+| SqueezeNet1_1 | 224 | 256 | 0.81378 | 1.62919 | 2.68044 | 0.76032 | 1.877 | 3.15298 |
+| VGG11 | 224 | 256 | 2.24408 | 4.67794 | 7.6568 | 3.90412 | 9.51147 | 17.14168 |
+| VGG13 | 224 | 256 | 2.58589 | 5.82708 | 10.03591 | 4.64684 | 12.61558 | 23.70015 |
+| VGG16 | 224 | 256 | 3.13237 | 7.19257 | 12.50913 | 5.61769 | 16.40064 | 32.03939 |
+| VGG19 | 224 | 256 | 3.69987 | 8.59168 | 15.07866 | 6.65221 | 20.4334 | 41.55902 |
+| DarkNet53 | 256 | 256 | 3.18101 | 5.88419 | 10.14964 | 4.10829 | 12.1714 | 22.15266 |
+| ResNet50_ACNet | 256 | 256 | 3.89002 | 4.58195 | 9.01095 | 5.33395 | 10.96843 | 18.70368 |
+| ResNet50_ACNet_deploy | 224 | 256 | 2.6823 | 5.944 | 7.16655 | 3.49161 | 7.78374 | 13.94361 |
diff --git a/docs/zh_CN/models/ResNet_and_vd.md b/docs/zh_CN/models/ResNet_and_vd.md
index bc2946e99bc270c084c146808188da53c475ebae..ea045f12ca545ca0cea2229fcfc1993fd50ec77b 100644
--- a/docs/zh_CN/models/ResNet_and_vd.md
+++ b/docs/zh_CN/models/ResNet_and_vd.md
@@ -10,18 +10,19 @@ ResNet系列模型是在2015年提出的,一举在ILSVRC2015比赛中取得冠
其中,ResNet50_vd_v2与ResNet50_vd_ssld采用了知识蒸馏,保证模型结构不变的情况下,进一步提升了模型的精度,具体地,ResNet50_vd_v2的teacher模型是ResNet152_vd(top1准确率80.59%),数据选用的是ImageNet-1k的训练集,ResNet50_vd_ssld的teacher模型是ResNeXt101_32x16d_wsl(top1准确率84.2%),数据选用结合了ImageNet-1k的训练集和ImageNet-22k挖掘的400万数据。知识蒸馏的具体方法正在持续更新中。
+The FLOPS, parameter counts, and inference latency on a T4 GPU for this model family are shown in the figures below.
-该系列模型的FLOPS、参数量以及FP32预测耗时如下图所示。
+![](../../images/models/T4_benchmark/t4.fp32.bs4.ResNet.flops.png)
-![](../../images/models/ResNet.png.flops.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs4.ResNet.params.png)
-![](../../images/models/ResNet.png.params.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs4.ResNet.png)
+
+![](../../images/models/T4_benchmark/t4.fp16.bs4.ResNet.png)
-![](../../images/models/ResNet.png.fp32.png)
通过上述曲线可以看出,层数越多,准确率越高,但是相应的参数量、计算量和延时都会增加。ResNet50_vd_ssld通过用更强的teacher和更多的数据,将其在ImageNet-1k上的验证集top-1精度进一步提高,达到了82.39%,刷新了ResNet50系列模型的精度。
-**注意**:所有模型在预测时,图像的crop_size设置为224,resize_short_size设置为256。
## 精度、FLOPS和参数量
@@ -46,9 +47,9 @@ ResNet系列模型是在2015年提出的,一举在ILSVRC2015比赛中取得冠
-## FP32预测速度
+## Inference speed on V100 GPU
-| Models | Crop Size | Resize Short Size | Batch Size=1<br>(ms) |
+| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|------------------|-----------|-------------------|--------------------------|
| ResNet18 | 224 | 256 | 1.499 |
| ResNet18_vd | 224 | 256 | 1.603 |
@@ -65,3 +66,24 @@ ResNet系列模型是在2015年提出的,一举在ILSVRC2015比赛中取得冠
| ResNet200_vd | 224 | 256 | 8.885 |
| ResNet50_vd_ssld | 224 | 256 | 3.165 |
| ResNet101_vd_ssld | 224 | 256 | 5.252 |
+
+
+## Inference speed on T4 GPU
+
+| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
+|-------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
+| ResNet18 | 224 | 256 | 1.3568 | 2.5225 | 3.61904 | 1.45606 | 3.56305 | 6.28798 |
+| ResNet18_vd | 224 | 256 | 1.39593 | 2.69063 | 3.88267 | 1.54557 | 3.85363 | 6.88121 |
+| ResNet34 | 224 | 256 | 2.23092 | 4.10205 | 5.54904 | 2.34957 | 5.89821 | 10.73451 |
+| ResNet34_vd | 224 | 256 | 2.23992 | 4.22246 | 5.79534 | 2.43427 | 6.22257 | 11.44906 |
+| ResNet50 | 224 | 256 | 2.63824 | 4.63802 | 7.02444 | 3.47712 | 7.84421 | 13.90633 |
+| ResNet50_vc | 224 | 256 | 2.67064 | 4.72372 | 7.17204 | 3.52346 | 8.10725 | 14.45577 |
+| ResNet50_vd | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
+| ResNet50_vd_v2 | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
+| ResNet101 | 224 | 256 | 5.04037 | 7.73673 | 10.8936 | 6.07125 | 13.40573 | 24.3597 |
+| ResNet101_vd | 224 | 256 | 5.05972 | 7.83685 | 11.34235 | 6.11704 | 13.76222 | 25.11071 |
+| ResNet152 | 224 | 256 | 7.28665 | 10.62001 | 14.90317 | 8.50198 | 19.17073 | 35.78384 |
+| ResNet152_vd | 224 | 256 | 7.29127 | 10.86137 | 15.32444 | 8.54376 | 19.52157 | 36.64445 |
+| ResNet200_vd | 224 | 256 | 9.36026 | 13.5474 | 19.0725 | 10.80619 | 25.01731 | 48.81399 |
+| ResNet50_vd_ssld | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
+| ResNet101_vd_ssld | 224 | 256 | 5.05972 | 7.83685 | 11.34235 | 6.11704 | 13.76222 | 25.11071 |
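
To make the T4 table easier to interpret, the sketch below computes the FP16-over-FP32 speedup and the per-image time at batch size 8 for a few rows. The numbers are copied from the table above; the assumption that each cell is the time for one whole batch (so per-image time is the cell divided by the batch size) is mine, and the helper names are hypothetical.

```python
# (FP16 ms, FP32 ms) at batch size 8, copied from the T4 table above.
T4_BS8_MS = {
    "ResNet18": (3.61904, 6.28798),
    "ResNet50": (7.02444, 13.90633),
    "ResNet101": (10.8936, 24.3597),
}
BATCH_SIZE = 8


def speedup(fp16_ms, fp32_ms):
    """How many times faster FP16 inference is than FP32 for the same batch."""
    return fp32_ms / fp16_ms


def per_image_ms(batch_ms, batch_size):
    """Average time per image, assuming batch_ms covers the whole batch."""
    return batch_ms / batch_size


for name, (fp16_ms, fp32_ms) in T4_BS8_MS.items():
    print(
        f"{name}: FP16 speedup {speedup(fp16_ms, fp32_ms):.2f}x, "
        f"{per_image_ms(fp16_ms, BATCH_SIZE):.3f} ms/image in FP16"
    )
```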
diff --git a/docs/zh_CN/models/SEResNext_and_Res2Net.md b/docs/zh_CN/models/SEResNext_and_Res2Net.md
index 90955354aece52a3770c57c6d387c4bdf8238453..1a8c125ee931a2bc036834d122d529c26264bb66 100644
--- a/docs/zh_CN/models/SEResNext_and_Res2Net.md
+++ b/docs/zh_CN/models/SEResNext_and_Res2Net.md
@@ -7,18 +7,20 @@ SENet是2017年ImageNet分类比赛的冠军方案,其提出了一个全新的
Res2Net是2019年提出的一种全新的对ResNet的改进方案,该方案可以和现有其他优秀模块轻松整合,在不增加计算负载量的情况下,在ImageNet、CIFAR-100等数据集上的测试性能超过了ResNet。Res2Net结构简单,性能优越,进一步探索了CNN在更细粒度级别的多尺度表示能力。Res2Net揭示了一个新的提升模型精度的维度,即scale,其是除了深度、宽度和基数的现有维度之外另外一个必不可少的更有效的因素。该网络在其他视觉任务如目标检测、图像分割等也有相当不错的表现。
-该系列模型的FLOPS、参数量以及FP32预测耗时如下图所示。
+The FLOPS, parameter counts, and inference latency on a T4 GPU for this model family are shown in the figures below.
-![](../../images/models/SeResNeXt.png.flops.png)
-![](../../images/models/SeResNeXt.png.params.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.flops.png)
-![](../../images/models/SeResNeXt.png.fp32.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.params.png)
+
+![](../../images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.png)
+
+![](../../images/models/T4_benchmark/t4.fp16.bs4.SeResNeXt.png)
-目前PaddleClas开源的这三类的预训练模型一共有24个,其指标如图所示,从图中可以看出,在同样Flops和Params下,改进版的模型往往有更高的精度,但是推理速度往往不如ResNet系列。另一方面,Res2Net表现也较为优秀,相比ResNeXt中的group操作、SEResNet中的SE结构操作,Res2Net在相同Flops、Params和推理速度下往往精度更佳。
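+PaddleClas currently provides 24 pretrained models across these three families; their metrics are shown in the figures. With the same FLOPS and parameter count, the improved models usually achieve higher accuracy, but their inference speed is usually lower than that of the ResNet series. Res2Net also performs well: compared with the group operation in ResNeXt and the SE structure in SEResNet, Res2Net usually achieves better accuracy at the same FLOPS, parameter count, and inference speed.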
-**注意**:所有模型在预测时,图像的crop_size设置为224,resize_short_size设置为256。
## 精度、FLOPS和参数量
@@ -52,9 +54,9 @@ Res2Net是2019年提出的一种全新的对ResNet的改进方案,该方案可
-## FP32预测速度
+## Inference speed on V100 GPU
-| Models | Crop Size | Resize Short Size | Batch Size=1<br>(ms) |
+| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
|-----------------------|-----------|-------------------|--------------------------|
| Res2Net50_26w_4s | 224 | 256 | 4.148 |
| Res2Net50_vd_26w_4s | 224 | 256 | 4.172 |
@@ -80,3 +82,33 @@ Res2Net是2019年提出的一种全新的对ResNet的改进方案,该方案可
| SE_ResNeXt50_vd_32x4d | 224 | 256 | 9.011 |
| SE_ResNeXt101_32x4d | 224 | 256 | 19.204 |
| SENet154_vd | 224 | 256 | 50.406 |
+
+
+## Inference speed on T4 GPU
+
+| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
+|-----------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
+| Res2Net50_26w_4s | 224 | 256 | 3.56067 | 6.61827 | 11.41566 | 4.47188 | 9.65722 | 17.54535 |
+| Res2Net50_vd_26w_4s | 224 | 256 | 3.69221 | 6.94419 | 11.92441 | 4.52712 | 9.93247 | 18.16928 |
+| Res2Net50_14w_8s | 224 | 256 | 4.45745 | 7.69847 | 12.30935 | 5.4026 | 10.60273 | 18.01234 |
+| Res2Net101_vd_26w_4s | 224 | 256 | 6.53122 | 10.81895 | 18.94395 | 8.08729 | 17.31208 | 31.95762 |
+| Res2Net200_vd_26w_4s | 224 | 256 | 11.66671 | 18.93953 | 33.19188 | 14.67806 | 32.35032 | 63.65899 |
+| ResNeXt50_32x4d | 224 | 256 | 7.61087 | 8.88918 | 12.99674 | 7.56327 | 10.6134 | 18.46915 |
+| ResNeXt50_vd_32x4d | 224 | 256 | 7.69065 | 8.94014 | 13.4088 | 7.62044 | 11.03385 | 19.15339 |
+| ResNeXt50_64x4d | 224 | 256 | 13.78688 | 15.84655 | 21.79537 | 13.80962 | 18.4712 | 33.49843 |
+| ResNeXt50_vd_64x4d | 224 | 256 | 13.79538 | 15.22201 | 22.27045 | 13.94449 | 18.88759 | 34.28889 |
+| ResNeXt101_32x4d | 224 | 256 | 16.59777 | 17.93153 | 21.36541 | 16.21503 | 19.96568 | 33.76831 |
+| ResNeXt101_vd_32x4d | 224 | 256 | 16.36909 | 17.45681 | 22.10216 | 16.28103 | 20.25611 | 34.37152 |
+| ResNeXt101_64x4d | 224 | 256 | 30.12355 | 32.46823 | 38.41901 | 30.4788 | 36.29801 | 68.85559 |
+| ResNeXt101_vd_64x4d | 224 | 256 | 30.34022 | 32.27869 | 38.72523 | 30.40456 | 36.77324 | 69.66021 |
+| ResNeXt152_32x4d | 224 | 256 | 25.26417 | 26.57001 | 30.67834 | 24.86299 | 29.36764 | 52.09426 |
+| ResNeXt152_vd_32x4d | 224 | 256 | 25.11196 | 26.70515 | 31.72636 | 25.03258 | 30.08987 | 52.64429 |
+| ResNeXt152_64x4d | 224 | 256 | 46.58293 | 48.34563 | 56.97961 | 46.7564 | 56.34108 | 106.11736 |
+| ResNeXt152_vd_64x4d | 224 | 256 | 47.68447 | 48.91406 | 57.29329 | 47.18638 | 57.16257 | 107.26288 |
+| SE_ResNet18_vd | 224 | 256 | 1.61823 | 3.1391 | 4.60282 | 1.7691 | 4.19877 | 7.5331 |
+| SE_ResNet34_vd | 224 | 256 | 2.67518 | 5.04694 | 7.18946 | 2.88559 | 7.03291 | 12.73502 |
+| SE_ResNet50_vd | 224 | 256 | 3.65394 | 7.568 | 12.52793 | 4.28393 | 10.38846 | 18.33154 |
+| SE_ResNeXt50_32x4d | 224 | 256 | 9.06957 | 11.37898 | 18.86282 | 8.74121 | 13.563 | 23.01954 |
+| SE_ResNeXt50_vd_32x4d | 224 | 256 | 9.25016 | 11.85045 | 25.57004 | 9.17134 | 14.76192 | 19.914 |
+| SE_ResNeXt101_32x4d | 224 | 256 | 19.34455 | 20.6104 | 32.20432 | 18.82604 | 25.31814 | 41.97758 |
+| SENet154_vd | 224 | 256 | 49.85733 | 54.37267 | 74.70447 | 53.79794 | 66.31684 | 121.59885 |
diff --git a/docs/zh_CN/models/models_intro.md b/docs/zh_CN/models/models_intro.md
index 96c4474e0ff0eb078d9d5c3fd4ff17622eb320df..ff309b3c599777528f495d4fe3df8c05b12ed9be 100644
--- a/docs/zh_CN/models/models_intro.md
+++ b/docs/zh_CN/models/models_intro.md
@@ -23,7 +23,10 @@ python tools/infer/predict.py \
--batch_size=1
```
-![](../../images/models/main_fps_top1.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs4.main_fps_top1.png)
+
+![](../../images/models/V100_benchmark/v100.fp32.bs1.main_fps_top1_s.jpg)
+
![](../../images/models/mobile_arm_top1.png)
diff --git a/docs/zh_CN/tutorials/getting_started.md b/docs/zh_CN/tutorials/getting_started.md
index 11bea8411362ac0b4122d9e6da8d9f9c7d24b54e..8790faf9c037d9f5b7902eff96eabc922759bba8 100644
--- a/docs/zh_CN/tutorials/getting_started.md
+++ b/docs/zh_CN/tutorials/getting_started.md
@@ -41,7 +41,8 @@ python -m paddle.distributed.launch \
--selected_gpus="0,1,2,3" \
tools/train.py \
-c ./configs/ResNet/ResNet50_vd.yaml \
- -o use_mix=1
+ -o use_mix=1 \
+ --vdl_dir=./scalar/
```
@@ -53,6 +54,13 @@ epoch:0 train step:522 loss:1.6330 lr:0.100000 elapse:0.210
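You can also update the configuration by editing the model's configuration file directly. For the available parameters, see the [configuration documentation](config.md).
+During training, you can watch the loss change in real time with VisualDL. Start it with the following command: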
+
+```bash
+visualdl --logdir ./scalar --host 0.0.0.0 --port 8830
+
+```
+
### 2.2 Model fine-tuning
diff --git a/docs/zh_CN/update_history.md b/docs/zh_CN/update_history.md
index a3ba148b2be262e920e8a8f06c87c6b429b30413..b2ab286109354b9a75defc3ec5b29fbf85596349 100644
--- a/docs/zh_CN/update_history.md
+++ b/docs/zh_CN/update_history.md
@@ -1,3 +1,12 @@
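# Update history
-* 2020.04.10: Initial commit
+* 2020.05.17
+ * Added mixed-precision training.
+
+* 2020.05.09
+ * Added usage documentation for Paddle Serving.
+ * Added usage documentation for Paddle-Lite.
+ * Added FP32/FP16 inference speed benchmarks on T4 GPUs.
+
+* 2020.04.10:
+ * Initial commit.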
diff --git a/ppcls/data/reader.py b/ppcls/data/reader.py
index 5bf83c2170319fa13c6c86cddc42b04ec48da7e7..c1428a36be1010318e03991a962d00a63635a757 100755
--- a/ppcls/data/reader.py
+++ b/ppcls/data/reader.py
@@ -1,27 +1,26 @@
-#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
-#Licensed under the Apache License, Version 2.0 (the "License");
-#you may not use this file except in compliance with the License.
-#You may obtain a copy of the License at
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
-#Unless required by applicable law or agreed to in writing, software
-#distributed under the License is distributed on an "AS IS" BASIS,
-#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-#See the License for the specific language governing permissions and
-#limitations under the License.
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
-import cv2
import numpy as np
+import imghdr
import os
import signal
-import paddle
+from paddle.fluid.io import multiprocess_reader
from . import imaug
from .imaug import transform
-from .imaug import MixupOperator
from ppcls.utils import logger
trainers_num = int(os.environ.get('PADDLE_TRAINERS_NUM', 1))
@@ -35,7 +34,7 @@ class ModeException(Exception):
def __init__(self, message='', mode=''):
message += "\nOnly the following 3 modes are supported: " \
- "train, valid, test. Given mode is {}".format(mode)
+ "train, valid, test. Given mode is {}".format(mode)
super(ModeException, self).__init__(message)
@@ -46,10 +45,10 @@ class SampleNumException(Exception):
def __init__(self, message='', sample_num=0, batch_size=1):
message += "\nError: The number of the whole data ({}) " \
- "is smaller than the batch_size ({}), and drop_last " \
- "is turnning on, so nothing will feed in program, " \
- "Terminated now. Please reset batch_size to a smaller " \
- "number or feed more data!".format(sample_num, batch_size)
+ "is smaller than the batch_size ({}), and drop_last " \
+ "is turned on, so nothing will be fed into the program. " \
+ "Terminated now. Please reset batch_size to a smaller " \
+ "number or feed more data!".format(sample_num, batch_size)
super(SampleNumException, self).__init__(message)
@@ -80,12 +79,12 @@ def check_params(params):
data_dir = params.get('data_dir', '')
assert os.path.isdir(data_dir), \
- "{} doesn't exist, please check datadir path".format(data_dir)
+ "{} doesn't exist, please check datadir path".format(data_dir)
if params['mode'] != 'test':
file_list = params.get('file_list', '')
assert os.path.isfile(file_list), \
- "{} doesn't exist, please check file list path".format(file_list)
+ "{} doesn't exist, please check file list path".format(file_list)
def create_file_list(params):
@@ -176,8 +175,8 @@ def partial_reader(params, full_lines, part_id=0, part_num=1):
part_id(int): part index of the current partial data
part_num(int): part num of the dataset
"""
- assert part_id < part_num, ("part_num: {} should be larger " \
- "than part_id: {}".format(part_num, part_id))
+ assert part_id < part_num, ("part_num: {} should be larger "
+ "than part_id: {}".format(part_num, part_id))
full_lines = full_lines[part_id::part_num]
@@ -187,8 +186,9 @@ def partial_reader(params, full_lines, part_id=0, part_num=1):
def reader():
ops = create_operators(params['transforms'])
+ delimiter = params.get('delimiter', ' ')
for line in full_lines:
- img_path, label = line.split()
+ img_path, label = line.split(delimiter)
img_path = os.path.join(params['data_dir'], img_path)
with open(img_path, 'rb') as f:
img = f.read()
@@ -216,11 +216,11 @@ def mp_reader(params):
for part_id in range(part_num):
readers.append(partial_reader(params, full_lines, part_id, part_num))
- return paddle.reader.multiprocess_reader(readers, use_pipe=False)
+ return multiprocess_reader(readers, use_pipe=False)
def term_mp(sig_num, frame):
- """ kill all child processes
+ """ kill all child processes
"""
pid = os.getpid()
pgid = os.getpgid(os.getpid())
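
Review note on the `delimiter` change in `partial_reader`: the old call `line.split()` splits on any run of whitespace and drops the trailing newline, while `line.split(' ')` splits on every single space and keeps the newline in the last field. The sketch below shows the difference in plain Python. `parse_line` and the sample file-list entry are hypothetical and are not part of the repository.

```python
def parse_line(line, delimiter=None):
    """Split one file-list line into (img_path, label).

    delimiter=None mimics the old `line.split()` call;
    an explicit delimiter mimics `line.split(delimiter)`.
    """
    img_path, label = line.split(delimiter)
    return img_path, label


# Hypothetical file-list entry as read from disk (trailing newline included).
line = "train/n01440764_10026.JPEG 0\n"

print(parse_line(line))       # whitespace split: newline is stripped
print(parse_line(line, " "))  # explicit delimiter: newline stays in the label
```

With an explicit `' '` delimiter, a line containing two consecutive spaces yields an empty middle field, so the two-value unpacking raises `ValueError`. Stripping each line before splitting would be one way to keep the old tolerance.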
diff --git a/ppcls/modeling/architectures/__init__.py b/ppcls/modeling/architectures/__init__.py
index b6b69a370b4bb56752715f6faed51501609609b5..ac57a786aa16dcdda9c418a130fd4423c721e421 100644
--- a/ppcls/modeling/architectures/__init__.py
+++ b/ppcls/modeling/architectures/__init__.py
@@ -1,16 +1,16 @@
-#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
-#Licensed under the Apache License, Version 2.0 (the "License");
-#you may not use this file except in compliance with the License.
-#You may obtain a copy of the License at
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
-#Unless required by applicable law or agreed to in writing, software
-#distributed under the License is distributed on an "AS IS" BASIS,
-#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-#See the License for the specific language governing permissions and
-#limitations under the License.
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
from .alexnet import AlexNet
from .mobilenet_v1 import MobileNetV1_x0_25, MobileNetV1_x0_5, MobileNetV1_x1_0, MobileNetV1_x0_75, MobileNetV1
@@ -45,3 +45,5 @@ from .resnet_acnet import ResNet18_ACNet, ResNet34_ACNet, ResNet50_ACNet, ResNet
# distillation model
from .distillation_models import ResNet50_vd_distill_MobileNetV3_large_x1_0, ResNeXt101_32x16d_wsl_distill_ResNet50_vd
+
+from .csp_resnet import CSPResNet50_leaky
\ No newline at end of file
diff --git a/ppcls/modeling/architectures/csp_resnet.py b/ppcls/modeling/architectures/csp_resnet.py
new file mode 100644
index 0000000000000000000000000000000000000000..b1be8d25ccc97292c9abf18b6a6d3730bc69a429
--- /dev/null
+++ b/ppcls/modeling/architectures/csp_resnet.py
@@ -0,0 +1,258 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import math
+
+import paddle.fluid as fluid
+from paddle.fluid.param_attr import ParamAttr
+
+__all__ = [
+ "CSPResNet50_leaky", "CSPResNet50_mish", "CSPResNet101_leaky",
+ "CSPResNet101_mish"
+]
+
+
+class CSPResNet():
+ def __init__(self, layers=50, act="leaky_relu"):
+ self.layers = layers
+ self.act = act
+
+ def net(self, input, class_dim=1000, data_format="NCHW"):
+ layers = self.layers
+ supported_layers = [50, 101]
+ assert layers in supported_layers, \
+ "supported layers are {} but input layer is {}".format(
+ supported_layers, layers)
+
+ if layers == 50:
+ depth = [3, 3, 5, 2]
+ elif layers == 101:
+ depth = [3, 3, 22, 2]
+
+ num_filters = [64, 128, 256, 512]
+
+ conv = self.conv_bn_layer(
+ input=input,
+ num_filters=64,
+ filter_size=7,
+ stride=2,
+ act=self.act,
+ name="conv1",
+ data_format=data_format)
+ conv = fluid.layers.pool2d(
+ input=conv,
+ pool_size=2,
+ pool_stride=2,
+ pool_padding=0,
+ pool_type='max',
+ data_format=data_format)
+
+ for block in range(len(depth)):
+ conv_name = "res" + str(block + 2) + chr(97)
+ if block != 0:
+ conv = self.conv_bn_layer(
+ input=conv,
+ num_filters=num_filters[block],
+ filter_size=3,
+ stride=2,
+ act=self.act,
+ name=conv_name + "_downsample",
+ data_format=data_format)
+
+ # split
+ left = conv
+ right = conv
+ if block == 0:
+ ch = num_filters[block]
+ else:
+ ch = num_filters[block] * 2
+ right = self.conv_bn_layer(
+ input=right,
+ num_filters=ch,
+ filter_size=1,
+ act=self.act,
+ name=conv_name + "_right_first_route",
+ data_format=data_format)
+
+ for i in range(depth[block]):
+ conv_name = "res" + str(block + 2) + chr(97 + i)
+
+ right = self.bottleneck_block(
+ input=right,
+ num_filters=num_filters[block],
+ stride=1,
+ name=conv_name,
+ data_format=data_format)
+
+ # route
+ left = self.conv_bn_layer(
+ input=left,
+ num_filters=num_filters[block] * 2,
+ filter_size=1,
+ act=self.act,
+ name=conv_name + "_left_route",
+ data_format=data_format)
+ right = self.conv_bn_layer(
+ input=right,
+ num_filters=num_filters[block] * 2,
+ filter_size=1,
+ act=self.act,
+ name=conv_name + "_right_route",
+ data_format=data_format)
+ conv = fluid.layers.concat([left, right], axis=1)
+
+ conv = self.conv_bn_layer(
+ input=conv,
+ num_filters=num_filters[block] * 2,
+ filter_size=1,
+ stride=1,
+ act=self.act,
+ name=conv_name + "_merged_transition",
+ data_format=data_format)
+
+ pool = fluid.layers.pool2d(
+ input=conv,
+ pool_type='avg',
+ global_pooling=True,
+ data_format=data_format)
+ stdv = 1.0 / math.sqrt(pool.shape[1] * 1.0)
+ out = fluid.layers.fc(
+ input=pool,
+ size=class_dim,
+ param_attr=fluid.param_attr.ParamAttr(
+ name="fc_0.w_0",
+ initializer=fluid.initializer.Uniform(-stdv, stdv)),
+ bias_attr=ParamAttr(name="fc_0.b_0"))
+ return out
+
+ def conv_bn_layer(self,
+ input,
+ num_filters,
+ filter_size,
+ stride=1,
+ groups=1,
+ act=None,
+ name=None,
+ data_format='NCHW'):
+ conv = fluid.layers.conv2d(
+ input=input,
+ num_filters=num_filters,
+ filter_size=filter_size,
+ stride=stride,
+ padding=(filter_size - 1) // 2,
+ groups=groups,
+ act=None,
+ param_attr=ParamAttr(name=name + "_weights"),
+ bias_attr=False,
+ name=name + '.conv2d.output.1',
+ data_format=data_format)
+
+ if name == "conv1":
+ bn_name = "bn_" + name
+ else:
+ bn_name = "bn" + name[3:]
+ bn = fluid.layers.batch_norm(
+ input=conv,
+ act=None,
+ name=bn_name + '.output.1',
+ param_attr=ParamAttr(name=bn_name + '_scale'),
+ bias_attr=ParamAttr(bn_name + '_offset'),
+ moving_mean_name=bn_name + '_mean',
+ moving_variance_name=bn_name + '_variance',
+ data_layout=data_format)
+ if act == "relu":
+ bn = fluid.layers.relu(bn)
+ elif act == "leaky_relu":
+ bn = fluid.layers.leaky_relu(bn)
+ elif act == "mish":
+ bn = self._mish(bn)
+ return bn
+
+ def _mish(self, input):
+ return input * fluid.layers.tanh(self._softplus(input))
+
+ def _softplus(self, input):
+ expf = fluid.layers.exp(fluid.layers.clip(input, -200, 50))
+ return fluid.layers.log(1 + expf)
+
+ def shortcut(self, input, ch_out, stride, is_first, name, data_format):
+ if data_format == 'NCHW':
+ ch_in = input.shape[1]
+ else:
+ ch_in = input.shape[-1]
+ if ch_in != ch_out or stride != 1 or is_first is True:
+ return self.conv_bn_layer(
+ input, ch_out, 1, stride, name=name, data_format=data_format)
+ else:
+ return input
+
+ def bottleneck_block(self, input, num_filters, stride, name, data_format):
+ conv0 = self.conv_bn_layer(
+ input=input,
+ num_filters=num_filters,
+ filter_size=1,
+ act="leaky_relu",
+ name=name + "_branch2a",
+ data_format=data_format)
+ conv1 = self.conv_bn_layer(
+ input=conv0,
+ num_filters=num_filters,
+ filter_size=3,
+ stride=stride,
+ act="leaky_relu",
+ name=name + "_branch2b",
+ data_format=data_format)
+ conv2 = self.conv_bn_layer(
+ input=conv1,
+ num_filters=num_filters * 2,
+ filter_size=1,
+ act=None,
+ name=name + "_branch2c",
+ data_format=data_format)
+
+ short = self.shortcut(
+ input,
+ num_filters * 2,
+ stride,
+ is_first=False,
+ name=name + "_branch1",
+ data_format=data_format)
+
+ ret = short + conv2
+ ret = fluid.layers.leaky_relu(ret, alpha=0.1)
+ return ret
+
+
+def CSPResNet50_leaky():
+ model = CSPResNet(layers=50, act="leaky_relu")
+ return model
+
+
+def CSPResNet50_mish():
+ model = CSPResNet(layers=50, act="mish")
+ return model
+
+
+def CSPResNet101_leaky():
+ model = CSPResNet(layers=101, act="leaky_relu")
+ return model
+
+
+def CSPResNet101_mish():
+ model = CSPResNet(layers=101, act="mish")
+ return model
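
Review note on `_mish` and `_softplus`: the new file builds Mish as `x * tanh(softplus(x))`, with the input to `exp` clipped to `[-200, 50]`. The following is a scalar sketch of the same arithmetic using only the standard library, assuming the clip bounds from the hunk. It is not the fluid implementation and the function names are illustrative.

```python
import math


def softplus(x, low=-200.0, high=50.0):
    """log(1 + exp(x)) with x clipped to [low, high], as in `_softplus`."""
    clipped = min(max(x, low), high)
    return math.log(1.0 + math.exp(clipped))


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


for x in (-5.0, 0.0, 1.0, 5.0):
    print(x, mish(x))
```

For large positive inputs the clipped softplus saturates, `tanh` is effectively 1, and the output stays close to `x`. Note that `bottleneck_block` in the hunk passes `act="leaky_relu"` regardless of `self.act`, so the `_mish` variants apply Mish only to the convolutions outside the bottleneck blocks.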
diff --git a/ppcls/modeling/architectures/efficientnet.py b/ppcls/modeling/architectures/efficientnet.py
index 5952d4c9af833c04b10aeeb8f57fde89310e4242..d6bac79bd8674b6bcb315d512fc095577fc6c97a 100644
--- a/ppcls/modeling/architectures/efficientnet.py
+++ b/ppcls/modeling/architectures/efficientnet.py
@@ -192,9 +192,9 @@ class EfficientNet():
if is_test:
return inputs
keep_prob = 1.0 - prob
- random_tensor = keep_prob + \
- fluid.layers.uniform_random_batch_size_like(
- inputs, [-1, 1, 1, 1], min=0., max=1.)
+ inputs_shape = fluid.layers.shape(inputs)
+ random_tensor = keep_prob + fluid.layers.uniform_random(
+ shape=[inputs_shape[0], 1, 1, 1], min=0., max=1.)
binary_tensor = fluid.layers.floor(random_tensor)
output = inputs / keep_prob * binary_tensor
return output
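
Review note on the drop-connect hunk: `floor(keep_prob + U)` with `U` uniform on `[0, 1)` is 1 with probability `keep_prob` and 0 otherwise, so `binary_tensor` is a per-sample keep mask. The sketch below reproduces that logic in plain Python, treating each sample as a single scalar for simplicity. That simplification and the helper names are assumptions, not code from the repository.

```python
import math
import random


def drop_connect_mask(batch_size, prob, rng):
    """Return one 0/1 keep flag per sample, computed as floor(keep_prob + U)."""
    keep_prob = 1.0 - prob
    return [math.floor(keep_prob + rng.random()) for _ in range(batch_size)]


def drop_connect(inputs, prob, rng):
    """Scale kept samples by 1 / keep_prob and zero out the dropped ones."""
    keep_prob = 1.0 - prob
    mask = drop_connect_mask(len(inputs), prob, rng)
    return [x / keep_prob * m for x, m in zip(inputs, mask)]


rng = random.Random(0)
mask = drop_connect_mask(10000, prob=0.2, rng=rng)
print(sum(mask) / len(mask))  # fraction of kept samples, near keep_prob
```

Dividing by `keep_prob` keeps the expected value of each output equal to the input, which is why the function can return `inputs` unchanged when `is_test` is set.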
diff --git a/ppcls/modeling/loss.py b/ppcls/modeling/loss.py
index 117ac7cc782a5b5cace7796ab3ef6165a36cae86..c19926e76b01e4ba8d98e1d9931af48a8a33096a 100644
--- a/ppcls/modeling/loss.py
+++ b/ppcls/modeling/loss.py
@@ -1,16 +1,16 @@
-#copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
-#Licensed under the Apache License, Version 2.0 (the "License");
-#you may not use this file except in compliance with the License.
-#You may obtain a copy of the License at
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
-#Unless required by applicable law or agreed to in writing, software
-#distributed under the License is distributed on an "AS IS" BASIS,
-#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-#See the License for the specific language governing permissions and
-#limitations under the License.
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
import paddle.fluid as fluid
@@ -34,12 +34,13 @@ class Loss(object):
def _labelsmoothing(self, target):
if target.shape[-1] != self._class_dim:
- one_hot_target = fluid.layers.one_hot(
- input=target, depth=self._class_dim)
+ one_hot_target = fluid.one_hot(input=target, depth=self._class_dim)
else:
one_hot_target = target
soft_target = fluid.layers.label_smooth(
label=one_hot_target, epsilon=self._epsilon, dtype="float32")
+ soft_target = fluid.layers.reshape(
+ soft_target, shape=[-1, self._class_dim])
return soft_target
def _crossentropy(self, input, target):
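
Review note on `_labelsmoothing`: the hunk one-hot encodes the target, smooths it, and reshapes the result to `[-1, class_dim]`. The sketch below shows the smoothing arithmetic in plain Python, assuming a uniform prior over the classes (which I believe is the default when no prior distribution is passed to `label_smooth`). The helper names are illustrative.

```python
def one_hot(label, class_dim):
    """Return a one-hot list of length class_dim for an integer label."""
    return [1.0 if i == label else 0.0 for i in range(class_dim)]


def label_smooth(one_hot_target, epsilon):
    """(1 - epsilon) * target + epsilon / K, with K the number of classes."""
    k = len(one_hot_target)
    return [(1.0 - epsilon) * t + epsilon / k for t in one_hot_target]


soft = label_smooth(one_hot(2, class_dim=4), epsilon=0.1)
print(soft)
print(sum(soft))  # the smoothed target still sums to about 1
```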
diff --git a/ppcls/utils/config.py b/ppcls/utils/config.py
index b1c1be4ef616d504fce196e51d3ea5316e5b8c67..93b11569e7490ba1c7d3576e254dbfabf29f9344 100644
--- a/ppcls/utils/config.py
+++ b/ppcls/utils/config.py
@@ -64,14 +64,14 @@ def print_dict(d, delimiter=0):
placeholder = "-" * 60
for k, v in sorted(d.items()):
if isinstance(v, dict):
- logger.info("{}{} : ".format(delimiter * " ", k))
+ logger.info("{}{} : ".format(delimiter * " ", logger.coloring(k, "HEADER")))
print_dict(v, delimiter + 4)
elif isinstance(v, list) and len(v) >= 1 and isinstance(v[0], dict):
- logger.info("{}{} : ".format(delimiter * " ", k))
+ logger.info("{}{} : ".format(delimiter * " ", logger.coloring(str(k),"HEADER")))
for value in v:
print_dict(value, delimiter + 4)
else:
- logger.info("{}{} : {}".format(delimiter * " ", k, v))
+ logger.info("{}{} : {}".format(delimiter * " ", logger.coloring(k,"HEADER"), logger.coloring(v,"OKGREEN")))
if k.isupper():
logger.info(placeholder)
diff --git a/ppcls/utils/logger.py b/ppcls/utils/logger.py
index 5b4ae2ca0b12cad7b68a4f76603ae8bc4fd60d25..12789c7c893a9d48a189b43dfd251c1a88e45f76 100644
--- a/ppcls/utils/logger.py
+++ b/ppcls/utils/logger.py
@@ -14,15 +14,49 @@
import logging
import os
+import datetime
-logging.basicConfig(level=logging.INFO)
+from imp import reload
+reload(logging)
+
+logging.basicConfig(
+ level=logging.INFO,
+ format="%(asctime)s %(levelname)s: %(message)s",
+ datefmt="%Y-%m-%d %H:%M:%S")
+
+
+def time_zone(sec, fmt):
+ real_time = datetime.datetime.now() + datetime.timedelta(hours=8)
+ return real_time.timetuple()
+
+
+logging.Formatter.converter = time_zone
_logger = logging.getLogger(__name__)
+Color = {
+ 'RED': '\033[31m',
+ 'HEADER': '\033[35m', # deep purple
+ 'PURPLE': '\033[95m', # purple
+ 'OKBLUE': '\033[94m',
+ 'OKGREEN': '\033[92m',
+ 'WARNING': '\033[93m',
+ 'FAIL': '\033[91m',
+ 'ENDC': '\033[0m'
+}
+
+
+def coloring(message, color="OKGREEN"):
+    assert color in Color.keys()
+    if os.environ.get('PADDLECLAS_COLORING', False):
+        return Color[color] + str(message) + Color["ENDC"]
+    else:
+        return message
+
def anti_fleet(log):
"""
- Because of the fucking Fleet, logs will print multi-times.
- So we only display one of them and ignore the others.
+ Logs are printed multiple times when the Fleet API is used.
+ Display only one copy of each log and ignore the others.
"""
def wrapper(fmt, *args):
@@ -39,12 +73,23 @@ def info(fmt, *args):
@anti_fleet
def warning(fmt, *args):
- _logger.warning(fmt, *args)
+ _logger.warning(coloring(fmt, "RED"), *args)
@anti_fleet
def error(fmt, *args):
- _logger.error(fmt, *args)
+ _logger.error(coloring(fmt, "FAIL"), *args)
+
+
+def scaler(name, value, step, writer):
+ """
+ Log a scalar value to VisualDL so that it can be plotted as a curve.
+ Usage: install visualdl with: pip3 install visualdl==2.0.0b4
+ and then run:
+ visualdl --logdir ./scalar --host 0.0.0.0 --port 8830
+ to preview the loss curve in real time.
+ """
+ writer.add_scalar(name, value, step)
def advertise():
@@ -66,12 +111,13 @@ def advertise():
website = "https://github.com/PaddlePaddle/PaddleClas"
AD_LEN = 6 + len(max([copyright, ad, website], key=len))
- info("\n{0}\n{1}\n{2}\n{3}\n{4}\n{5}\n{6}\n{7}\n".format(
- "=" * (AD_LEN + 4),
- "=={}==".format(copyright.center(AD_LEN)),
- "=" * (AD_LEN + 4),
- "=={}==".format(' ' * AD_LEN),
- "=={}==".format(ad.center(AD_LEN)),
- "=={}==".format(' ' * AD_LEN),
- "=={}==".format(website.center(AD_LEN)),
- "=" * (AD_LEN + 4), ))
+ info(
+ coloring("\n{0}\n{1}\n{2}\n{3}\n{4}\n{5}\n{6}\n{7}\n".format(
+ "=" * (AD_LEN + 4),
+ "=={}==".format(copyright.center(AD_LEN)),
+ "=" * (AD_LEN + 4),
+ "=={}==".format(' ' * AD_LEN),
+ "=={}==".format(ad.center(AD_LEN)),
+ "=={}==".format(' ' * AD_LEN),
+ "=={}==".format(website.center(AD_LEN)),
+ "=" * (AD_LEN + 4), ), "RED"))
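
Review note on `coloring`: the helper wraps a message in ANSI escape codes only when the `PADDLECLAS_COLORING` environment variable is set. The standalone sketch below copies that logic with a subset of the colour table, to show one consequence of the truthiness check: any non-empty value, including `"0"`, turns colouring on.

```python
import os

# Subset of the ANSI escape codes from the `Color` table in the hunk above.
COLOR = {"OKGREEN": "\033[92m", "RED": "\033[31m", "ENDC": "\033[0m"}


def coloring(message, color="OKGREEN"):
    """Wrap `message` in ANSI colour codes when PADDLECLAS_COLORING is set."""
    assert color in COLOR
    if os.environ.get("PADDLECLAS_COLORING", False):
        return COLOR[color] + str(message) + COLOR["ENDC"]
    return message


os.environ.pop("PADDLECLAS_COLORING", None)
print(repr(coloring("loss: 1.63")))         # variable unset: message unchanged

os.environ["PADDLECLAS_COLORING"] = "1"
print(repr(coloring("loss: 1.63", "RED")))  # wrapped in escape codes

# Any non-empty string is truthy, so "0" also enables colouring.
os.environ["PADDLECLAS_COLORING"] = "0"
print(repr(coloring("loss: 1.63")))
```

Also note that the helper returns `str(message)` when colouring is on but the original object when it is off, so callers that pass non-string values get different return types in the two modes.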
diff --git a/ppcls/utils/model_zoo.py b/ppcls/utils/model_zoo.py
index dd65e921f3e258b75c911599ee234359f3fd7c51..d023f4d1fbc310d50ca56fa389632b41485a8174 100644
--- a/ppcls/utils/model_zoo.py
+++ b/ppcls/utils/model_zoo.py
@@ -58,9 +58,9 @@ class RetryError(Exception):
super(RetryError, self).__init__(message)
-def _get_url(architecture):
+def _get_url(architecture, postfix="tar"):
prefix = "https://paddle-imagenet-models-name.bj.bcebos.com/"
- fname = architecture + "_pretrained.tar"
+ fname = architecture + "_pretrained." + postfix
return prefix + fname
@@ -193,13 +193,13 @@ def list_models():
return
-def get(architecture, path, decompress=True):
+def get(architecture, path, decompress=True, postfix="tar"):
"""
Get the pretrained model.
"""
_check_pretrained_name(architecture)
- url = _get_url(architecture)
+ url = _get_url(architecture, postfix=postfix)
fname = _download(url, path)
- if decompress:
+ if postfix == "tar" and decompress:
_decompress(fname)
logger.info("download {} finished ".format(fname))
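
Review note on the `postfix` parameter: `_get_url` now appends a caller-supplied extension, and `get` only decompresses when that extension is `tar`. The sketch below mirrors those two decisions as pure functions. The helper names and the `pdparams` postfix are illustrative assumptions, and whether a file with that extension exists on the server is not confirmed by this diff.

```python
PREFIX = "https://paddle-imagenet-models-name.bj.bcebos.com/"


def get_url(architecture, postfix="tar"):
    """Build the download URL the same way `_get_url` does in the hunk."""
    return PREFIX + architecture + "_pretrained." + postfix


def should_decompress(postfix, decompress=True):
    """Only tar archives are unpacked, mirroring the check in `get`."""
    return postfix == "tar" and decompress


print(get_url("CSPResNet50_leaky"))
print(get_url("CSPResNet50_leaky", postfix="pdparams"))  # hypothetical postfix
print(should_decompress("pdparams"))
```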
diff --git a/ppcls/utils/pretrained.list b/ppcls/utils/pretrained.list
index 633cafd921d7390f434b2b5f82dad70129349658..91ae4409f9289b0634b4c6fa95ae3e1d75cc42aa 100644
--- a/ppcls/utils/pretrained.list
+++ b/ppcls/utils/pretrained.list
@@ -116,3 +116,4 @@ VGG16
VGG19
DarkNet53_ImageNet1k
ResNet50_ACNet_deploy
+CSPResNet50_leaky
diff --git a/ppcls/utils/save_load.py b/ppcls/utils/save_load.py
index 673e54304b84bd962486e5bdc61b4ddcc1fa511d..e310166ece236cadffa0fc28d3bef6558b4c43e4 100644
--- a/ppcls/utils/save_load.py
+++ b/ppcls/utils/save_load.py
@@ -1,16 +1,16 @@
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
-#Licensed under the Apache License, Version 2.0 (the "License");
-#you may not use this file except in compliance with the License.
-#You may obtain a copy of the License at
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
-#Unless required by applicable law or agreed to in writing, software
-#distributed under the License is distributed on an "AS IS" BASIS,
-#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-#See the License for the specific language governing permissions and
-#limitations under the License.
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
from __future__ import absolute_import
from __future__ import division
@@ -18,10 +18,10 @@ from __future__ import print_function
import errno
import os
+import re
import shutil
import tempfile
-import paddle
import paddle.fluid as fluid
from ppcls.utils import logger
@@ -46,7 +46,6 @@ def _mkdir_if_not_exist(path):
def _load_state(path):
- print("path: ", path)
if os.path.exists(path + '.pdopt'):
# XXX another hack to ignore the optimizer state
tmp = tempfile.mkdtemp()
@@ -55,12 +54,11 @@ def _load_state(path):
state = fluid.io.load_program_state(dst)
shutil.rmtree(tmp)
else:
- print("path: ", path)
state = fluid.io.load_program_state(path)
return state
-def load_params(exe, prog, path, ignore_params=[]):
+def load_params(exe, prog, path, ignore_params=None):
"""
Load model from the given path.
Args:
@@ -69,13 +67,14 @@ def load_params(exe, prog, path, ignore_params=[]):
path (string): URL string or loca model path.
ignore_params (list): ignore variable to load when finetuning.
It can be specified by finetune_exclude_pretrained_params
- and the usage can refer to docs/advanced_tutorials/TRANSFER_LEARNING.md
+ and the usage can refer to the document
+ docs/advanced_tutorials/TRANSFER_LEARNING.md
"""
if not (os.path.isdir(path) or os.path.exists(path + '.pdparams')):
raise ValueError("Model pretrain path {} does not "
"exists.".format(path))
- logger.info('Loading parameters from {}...'.format(path))
+ logger.info(logger.coloring('Loading parameters from {}...'.format(path), 'HEADER'))
ignore_set = set()
state = _load_state(path)
@@ -101,8 +100,9 @@ def load_params(exe, prog, path, ignore_params=[]):
if len(ignore_set) > 0:
for k in ignore_set:
if k in state:
- logger.warning('variable {} not used'.format(k))
+ logger.warning('variable {} is already excluded automatically'.format(k))
del state[k]
+
fluid.io.set_program_state(prog, state)
@@ -113,7 +113,7 @@ def init_model(config, program, exe):
checkpoints = config.get('checkpoints')
if checkpoints:
fluid.load(program, checkpoints, exe)
- logger.info("Finish initing model from {}".format(checkpoints))
+ logger.info(logger.coloring("Finish initing model from {}".format(checkpoints),"HEADER"))
return
pretrained_model = config.get('pretrained_model')
@@ -122,7 +122,7 @@ def init_model(config, program, exe):
pretrained_model = [pretrained_model]
for pretrain in pretrained_model:
load_params(exe, program, pretrain)
- logger.info("Finish initing model from {}".format(pretrained_model))
+ logger.info(logger.coloring("Finish initing model from {}".format(pretrained_model),"HEADER"))
def save_model(program, model_path, epoch_id, prefix='ppcls'):
@@ -133,4 +133,4 @@ def save_model(program, model_path, epoch_id, prefix='ppcls'):
_mkdir_if_not_exist(model_path)
model_prefix = os.path.join(model_path, prefix)
fluid.save(program, model_prefix)
- logger.info("Already save model in {}".format(model_path))
+ logger.info(logger.coloring("Already save model in {}".format(model_path),"HEADER"))
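
Review note on `ignore_params=None`: replacing the `[]` default avoids the usual Python pitfall where one list object is shared by every call that omits the argument. The sketch below demonstrates the pitfall with two hypothetical helpers. How `load_params` handles `None` internally is not shown in this hunk, so the `is None` check here is only the common idiom.

```python
def collect_bad(name, ignore_params=[]):
    """Mutable default: the same list object is reused across calls."""
    ignore_params.append(name)
    return ignore_params


def collect_good(name, ignore_params=None):
    """None default: a fresh list is created on each call."""
    if ignore_params is None:
        ignore_params = []
    ignore_params.append(name)
    return ignore_params


print(collect_bad("fc_0.w_0"), collect_bad("fc_0.b_0"))
print(collect_good("fc_0.w_0"), collect_good("fc_0.b_0"))
```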
diff --git a/tools/download.py b/tools/download.py
index d9fe1a8ee04a14b31cf4917e60f9348ae51b8d20..35cf77a725a9790c3cd2804ffd4e6ce1509b39de 100644
--- a/tools/download.py
+++ b/tools/download.py
@@ -24,6 +24,7 @@ def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument('-a', '--architecture', type=str, default='ResNet50')
parser.add_argument('-p', '--path', type=str, default='./pretrained/')
+ parser.add_argument('--postfix', type=str, default="tar")
parser.add_argument('-d', '--decompress', type=str2bool, default=True)
parser.add_argument('-l', '--list', type=str2bool, default=False)
@@ -36,7 +37,8 @@ def main():
if args.list:
model_zoo.list_models()
else:
- model_zoo.get(args.architecture, args.path, args.decompress)
+ model_zoo.get(args.architecture, args.path, args.decompress,
+ args.postfix)
if __name__ == '__main__':
diff --git a/tools/export_serving_model.py b/tools/export_serving_model.py
new file mode 100644
index 0000000000000000000000000000000000000000..e6e4472cdbf8dfd1738dede98b5aa61121f8191a
--- /dev/null
+++ b/tools/export_serving_model.py
@@ -0,0 +1,76 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import os
+from ppcls.modeling import architectures
+
+import paddle.fluid as fluid
+import paddle_serving_client.io as serving_io
+
+
+def parse_args():
+ parser = argparse.ArgumentParser()
+ parser.add_argument("-m", "--model", type=str)
+ parser.add_argument("-p", "--pretrained_model", type=str)
+ parser.add_argument("-o", "--output_path", type=str, default="")
+ parser.add_argument("--class_dim", type=int, default=1000)
+ parser.add_argument("--img_size", type=int, default=224)
+
+ return parser.parse_args()
+
+
+def create_input(img_size=224):
+ image = fluid.data(
+ name='image', shape=[None, 3, img_size, img_size], dtype='float32')
+ return image
+
+
+def create_model(args, model, input, class_dim=1000):
+ if args.model == "GoogLeNet":
+ out, _, _ = model.net(input=input, class_dim=class_dim)
+ else:
+ out = model.net(input=input, class_dim=class_dim)
+ out = fluid.layers.softmax(out)
+ return out
+
+
+def main():
+ args = parse_args()
+
+ model = architectures.__dict__[args.model]()
+
+ place = fluid.CPUPlace()
+ exe = fluid.Executor(place)
+
+ startup_prog = fluid.Program()
+ infer_prog = fluid.Program()
+
+ with fluid.program_guard(infer_prog, startup_prog):
+ with fluid.unique_name.guard():
+ image = create_input(args.img_size)
+ out = create_model(args, model, image, class_dim=args.class_dim)
+
+ infer_prog = infer_prog.clone(for_test=True)
+ fluid.load(
+ program=infer_prog, model_path=args.pretrained_model, executor=exe)
+
+ model_path = os.path.join(args.output_path, "ppcls_model")
+ conf_path = os.path.join(args.output_path, "ppcls_client_conf")
+ serving_io.save_model(model_path, conf_path, {"image": image},
+ {"prediction": out}, infer_prog)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/tools/lite/benchmark.sh b/tools/lite/benchmark.sh
new file mode 100644
index 0000000000000000000000000000000000000000..591331e42a4ff7dc184a9873b3aaf05889d2aa02
--- /dev/null
+++ b/tools/lite/benchmark.sh
@@ -0,0 +1,73 @@
+#!/bin/bash
+# ref1: https://github.com/PaddlePaddle/Paddle-Lite/blob/58b2d7dd89/lite/api/benchmark.cc
+# ref2: https://paddle-inference-dist.bj.bcebos.com/PaddleLite/benchmark_0/benchmark.sh
+
+set -e
+
+# Check input
+if [ $# -lt 3 ];
+then
+ echo "Input error"
+ echo "Usage:"
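+ echo " sh benchmark.sh BENCHMARK_BIN MODELS_DIR RESULT_FILENAME"
+ echo " sh benchmark.sh BENCHMARK_BIN MODELS_DIR RESULT_FILENAME IS_RUN_MODEL_OPTIMIZE"
+ exit 1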
+fi
+
+# Set benchmark params
+ANDROID_DIR=/data/local/tmp
+BENCHMARK_BIN=$1
+MODELS_DIR=$2
+RESULT_FILENAME=$3
+
+WARMUP=10
+REPEATS=30
+IS_RUN_MODEL_OPTIMIZE=false
+IS_RUN_QUANTIZED_MODEL=false
+NUM_THREADS_LIST=(1 2 4)
+MODELS_LIST=$(ls $MODELS_DIR)
+
+# Check input
+if [ $# -gt 3 ];
+then
+ IS_RUN_MODEL_OPTIMIZE=$4
+fi
+
+# Adb push benchmark_bin, models
+adb push $BENCHMARK_BIN $ANDROID_DIR/benchmark_bin
+adb shell chmod +x $ANDROID_DIR/benchmark_bin
+adb push $MODELS_DIR $ANDROID_DIR
+
+# Run benchmark
+adb shell "echo 'PaddleLite Benchmark' > $ANDROID_DIR/$RESULT_FILENAME"
+for threads in ${NUM_THREADS_LIST[@]}; do
+ adb shell "echo Threads=$threads Warmup=$WARMUP Repeats=$REPEATS >> $ANDROID_DIR/$RESULT_FILENAME"
+ for model_name in ${MODELS_LIST[@]}; do
+ echo "Model=$model_name Threads=$threads"
+ if [ "$IS_RUN_MODEL_OPTIMIZE" = true ];
+ then
+ adb shell "$ANDROID_DIR/benchmark_bin \
+ --model_dir=$ANDROID_DIR/${MODELS_DIR}/$model_name \
+ --model_filename=model \
+ --param_filename=params \
+ --warmup=$WARMUP \
+ --repeats=$REPEATS \
+ --threads=$threads \
+ --result_filename=$ANDROID_DIR/$RESULT_FILENAME"
+ else
+ adb shell "$ANDROID_DIR/benchmark_bin \
+ --optimized_model_path=$ANDROID_DIR/${MODELS_DIR}/$model_name \
+ --warmup=$WARMUP \
+ --repeats=$REPEATS \
+ --threads=$threads \
+ --result_filename=$ANDROID_DIR/$RESULT_FILENAME"
+ fi
+ done
+ adb shell "echo >> $ANDROID_DIR/$RESULT_FILENAME"
+done
+
+# Adb pull benchmark result, show result
+adb pull $ANDROID_DIR/$RESULT_FILENAME .
+printf "\n--------------------------------------\n"
+cat $RESULT_FILENAME
+echo "--------------------------------------"
diff --git a/tools/program.py b/tools/program.py
index c34d6512cb066710e0f18476c007b38dd32adb58..d1cc285e06db9006cd4166b972fb66a274fc4a29 100644
--- a/tools/program.py
+++ b/tools/program.py
@@ -300,6 +300,19 @@ def dist_optimizer(config, optimizer):
return optimizer
+def mixed_precision_optimizer(config, optimizer):
+    use_fp16 = config.get('use_fp16', False)
+    amp_scale_loss = config.get('amp_scale_loss', 1.0)
+    use_dynamic_loss_scaling = config.get('use_dynamic_loss_scaling', False)
+    if use_fp16:
+        optimizer = fluid.contrib.mixed_precision.decorate(
+            optimizer,
+            init_loss_scaling=amp_scale_loss,
+            use_dynamic_loss_scaling=use_dynamic_loss_scaling)
+
+    return optimizer
+
+
def build(config, main_prog, startup_prog, is_train=True):
"""
Build a program using a model and an optimizer
@@ -340,6 +353,8 @@ def build(config, main_prog, startup_prog, is_train=True):
optimizer = create_optimizer(config)
lr = optimizer._global_learning_rate()
fetchs['lr'] = (lr, AverageMeter('lr', 'f', need_avg=False))
+
+ optimizer = mixed_precision_optimizer(config, optimizer)
optimizer = dist_optimizer(config, optimizer)
optimizer.minimize(fetchs['loss'][0])
if config.get('use_ema'):
@@ -378,7 +393,10 @@ def compile(config, program, loss_name=None):
return compiled_program
-def run(dataloader, exe, program, fetchs, epoch=0, mode='train'):
+total_step = 0
+
+
+def run(dataloader, exe, program, fetchs, epoch=0, mode='train', vdl_writer=None):
"""
Feed data to the model and fetch the measures and loss
@@ -405,19 +423,34 @@ def run(dataloader, exe, program, fetchs, epoch=0, mode='train'):
for i, m in enumerate(metrics):
metric_list[i].update(m[0], len(batch[0]))
fetchs_str = ''.join([str(m.value) + ' '
- for m in metric_list] + [batch_time.value])
+ for m in metric_list] + [batch_time.value]) + 's'
+ if vdl_writer:
+ global total_step
+ logger.scaler('loss', metrics[0][0], total_step, vdl_writer)
+ total_step += 1
if mode == 'eval':
logger.info("{:s} step:{:<4d} {:s}s".format(mode, idx, fetchs_str))
else:
- logger.info("epoch:{:<3d} {:s} step:{:<4d} {:s}s".format(
- epoch, mode, idx, fetchs_str))
+ epoch_str = "epoch:{:<3d}".format(epoch)
+ step_str = "{:s} step:{:<4d}".format(mode, idx)
+
+ logger.info("{:s} {:s} {:s}".format(
+ logger.coloring(epoch_str, "HEADER")
+ if idx == 0 else epoch_str,
+ logger.coloring(step_str, "PURPLE"),
+ logger.coloring(fetchs_str, 'OKGREEN')))
end_str = ''.join([str(m.mean) + ' '
- for m in metric_list] + [batch_time.total])
+ for m in metric_list] + [batch_time.total]) + 's'
if mode == 'eval':
logger.info("END {:s} {:s}s".format(mode, end_str))
else:
- logger.info("END epoch:{:<3d} {:s} {:s}s".format(epoch, mode, end_str))
+ end_epoch_str = "END epoch:{:<3d}".format(epoch)
+
+ logger.info("{:s} {:s} {:s}".format(
+ logger.coloring(end_epoch_str, "RED"),
+ logger.coloring(mode, "PURPLE"),
+ logger.coloring(end_str, "OKGREEN")))
# return top1_acc in order to save the best model
if mode == 'valid':
diff --git a/tools/serving/image_http_client.py b/tools/serving/image_http_client.py
new file mode 100644
index 0000000000000000000000000000000000000000..3b92091c659613c83e4423a3f22b0d4d20321f43
--- /dev/null
+++ b/tools/serving/image_http_client.py
@@ -0,0 +1,47 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import requests
+import base64
+import json
+import sys
+import numpy as np
+
+py_version = sys.version_info[0]
+
+
+def predict(image_path, server):
+ if py_version == 2:
+ image = base64.b64encode(open(image_path, "rb").read())
+ else:
+ image = base64.b64encode(open(image_path, "rb").read()).decode("utf-8")
+ req = json.dumps({"feed": [{"image": image}], "fetch": ["prediction"]})
+ r = requests.post(
+ server, data=req, headers={"Content-Type": "application/json"})
+ try:
+ pred = r.json()["result"]["prediction"][0]
+ cls_id = np.argmax(pred)
+ score = pred[cls_id]
+ pred = {"cls_id": cls_id, "score": score}
+ return pred
+ except (ValueError, KeyError, IndexError):
+ print(r.text)
+ return r
+
+
+if __name__ == "__main__":
+ server = "http://127.0.0.1:{}/image/prediction".format(sys.argv[1])
+ image_file = sys.argv[2]
+ res = predict(image_file, server)
+ print("res:", res)
diff --git a/tools/serving/image_service_cpu.py b/tools/serving/image_service_cpu.py
new file mode 100644
index 0000000000000000000000000000000000000000..92f67d3220670ffa880ff663ed887984325a0723
--- /dev/null
+++ b/tools/serving/image_service_cpu.py
@@ -0,0 +1,60 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import sys
+import base64
+from paddle_serving_server.web_service import WebService
+import utils
+
+
+class ImageService(WebService):
+ def __init__(self, name):
+ super(ImageService, self).__init__(name=name)
+ self.operators = self.create_operators()
+
+ def create_operators(self):
+ size = 224
+ img_mean = [0.485, 0.456, 0.406]
+ img_std = [0.229, 0.224, 0.225]
+ img_scale = 1.0 / 255.0
+ decode_op = utils.DecodeImage()
+ resize_op = utils.ResizeImage(resize_short=256)
+ crop_op = utils.CropImage(size=(size, size))
+ normalize_op = utils.NormalizeImage(
+ scale=img_scale, mean=img_mean, std=img_std)
+ totensor_op = utils.ToTensor()
+ return [decode_op, resize_op, crop_op, normalize_op, totensor_op]
+
+ def _process_image(self, data, ops):
+ for op in ops:
+ data = op(data)
+ return data
+
+ def preprocess(self, feed={}, fetch=[]):
+ feed_batch = []
+ for ins in feed:
+ if "image" not in ins:
+ raise ValueError("feed data error!")
+ sample = base64.b64decode(ins["image"])
+ img = self._process_image(sample, self.operators)
+ feed_batch.append({"image": img})
+ return feed_batch, fetch
+
+
+image_service = ImageService(name="image")
+image_service.load_model_config(sys.argv[1])
+image_service.prepare_server(
+ workdir=sys.argv[2], port=int(sys.argv[3]), device="cpu")
+image_service.run_server()
+image_service.run_flask()
diff --git a/tools/serving/image_service_gpu.py b/tools/serving/image_service_gpu.py
new file mode 100644
index 0000000000000000000000000000000000000000..df61cdd60659713ac77176beb8c1ecfad1c8efd8
--- /dev/null
+++ b/tools/serving/image_service_gpu.py
@@ -0,0 +1,62 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import sys
+import base64
+from paddle_serving_server_gpu.web_service import WebService
+
+import utils
+
+
+class ImageService(WebService):
+ def __init__(self, name):
+ super(ImageService, self).__init__(name=name)
+ self.operators = self.create_operators()
+
+ def create_operators(self):
+ size = 224
+ img_mean = [0.485, 0.456, 0.406]
+ img_std = [0.229, 0.224, 0.225]
+ img_scale = 1.0 / 255.0
+ decode_op = utils.DecodeImage()
+ resize_op = utils.ResizeImage(resize_short=256)
+ crop_op = utils.CropImage(size=(size, size))
+ normalize_op = utils.NormalizeImage(
+ scale=img_scale, mean=img_mean, std=img_std)
+ totensor_op = utils.ToTensor()
+ return [decode_op, resize_op, crop_op, normalize_op, totensor_op]
+
+ def _process_image(self, data, ops):
+ for op in ops:
+ data = op(data)
+ return data
+
+ def preprocess(self, feed={}, fetch=[]):
+ feed_batch = []
+ for ins in feed:
+ if "image" not in ins:
+ raise ValueError("feed data error!")
+ sample = base64.b64decode(ins["image"])
+ img = self._process_image(sample, self.operators)
+ feed_batch.append({"image": img})
+ return feed_batch, fetch
+
+
+image_service = ImageService(name="image")
+image_service.load_model_config(sys.argv[1])
+image_service.set_gpus("0")
+image_service.prepare_server(
+ workdir=sys.argv[2], port=int(sys.argv[3]), device="gpu")
+image_service.run_server()
+image_service.run_flask()
diff --git a/tools/serving/utils.py b/tools/serving/utils.py
new file mode 100644
index 0000000000000000000000000000000000000000..6c4a75e1afe2fc1e1710c7e8213f8ac4de8ffcc2
--- /dev/null
+++ b/tools/serving/utils.py
@@ -0,0 +1,84 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import cv2
+import numpy as np
+
+
+class DecodeImage(object):
+ def __init__(self, to_rgb=True):
+ self.to_rgb = to_rgb
+
+ def __call__(self, img):
+ data = np.frombuffer(img, dtype='uint8')
+ img = cv2.imdecode(data, 1)
+ if self.to_rgb:
+ assert img.shape[2] == 3, 'invalid shape of image[%s]' % (
+ img.shape)
+ img = img[:, :, ::-1]
+
+ return img
+
+
+class ResizeImage(object):
+ def __init__(self, resize_short=None):
+ self.resize_short = resize_short
+
+ def __call__(self, img):
+ img_h, img_w = img.shape[:2]
+ percent = float(self.resize_short) / min(img_w, img_h)
+ w = int(round(img_w * percent))
+ h = int(round(img_h * percent))
+ return cv2.resize(img, (w, h))
+
+
+class CropImage(object):
+ def __init__(self, size):
+ if type(size) is int:
+ self.size = (size, size)
+ else:
+ self.size = size
+
+ def __call__(self, img):
+ w, h = self.size
+ img_h, img_w = img.shape[:2]
+ w_start = (img_w - w) // 2
+ h_start = (img_h - h) // 2
+
+ w_end = w_start + w
+ h_end = h_start + h
+ return img[h_start:h_end, w_start:w_end, :]
+
+
+class NormalizeImage(object):
+ def __init__(self, scale=None, mean=None, std=None):
+ self.scale = np.float32(scale if scale is not None else 1.0 / 255.0)
+ mean = mean if mean is not None else [0.485, 0.456, 0.406]
+ std = std if std is not None else [0.229, 0.224, 0.225]
+
+ shape = (1, 1, 3)
+ self.mean = np.array(mean).reshape(shape).astype('float32')
+ self.std = np.array(std).reshape(shape).astype('float32')
+
+ def __call__(self, img):
+ return (img.astype('float32') * self.scale - self.mean) / self.std
+
+
+class ToTensor(object):
+ def __init__(self):
+ pass
+
+ def __call__(self, img):
+ img = img.transpose((2, 0, 1))
+ return img
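Note on the preprocessing geometry in tools/serving/utils.py: ResizeImage and CropImage reduce to integer arithmetic on the image size. The sketch below reproduces that arithmetic without OpenCV or NumPy so the 256/224 settings used by the serving scripts can be checked by hand. The helper names `resize_short_dims`, `center_crop_box`, and `normalize_value` are hypothetical and are not part of this patch; they only mirror the formulas in the classes above.

```python
def resize_short_dims(img_w, img_h, resize_short=256):
    """Return (w, h) after scaling so the shorter side equals resize_short."""
    percent = float(resize_short) / min(img_w, img_h)
    return int(round(img_w * percent)), int(round(img_h * percent))


def center_crop_box(img_w, img_h, crop_w=224, crop_h=224):
    """Return (w_start, h_start, w_end, h_end) of a centered crop window."""
    w_start = (img_w - crop_w) // 2
    h_start = (img_h - crop_h) // 2
    return w_start, h_start, w_start + crop_w, h_start + crop_h


def normalize_value(value, mean, std, scale=1.0 / 255.0):
    """Apply the per-channel formula (value * scale - mean) / std."""
    return (value * scale - mean) / std


if __name__ == "__main__":
    # A 640x480 input is used purely as an illustration.
    w, h = resize_short_dims(640, 480)
    print("resized:", (w, h))
    print("crop box:", center_crop_box(w, h))
    print("normalized red 255:", normalize_value(255, 0.485, 0.229))
```

If the resized image is smaller than the crop size on either axis, the start offsets become negative and the slice in CropImage no longer yields a 224x224 window, so inputs are assumed to be resized first, as the operator order in create_operators does.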
diff --git a/tools/train.py b/tools/train.py
index b188646a3fce909fa7fc9c1cfe96f36fcfd47050..deb953b16581989b7a48246b4c8b5b8aff3c3e14 100644
--- a/tools/train.py
+++ b/tools/train.py
@@ -19,6 +19,7 @@ from __future__ import print_function
import argparse
import os
+from visualdl import LogWriter
import paddle.fluid as fluid
from paddle.fluid.incubate.fleet.base import role_maker
from paddle.fluid.incubate.fleet.collective import fleet
@@ -38,6 +39,11 @@ def parse_args():
type=str,
default='configs/ResNet/ResNet50.yaml',
help='config file path')
+ parser.add_argument(
+ '--vdl_dir',
+ type=str,
+ default=None,
+ help='VisualDL log directory for scalars such as the training loss.')
parser.add_argument(
'-o',
'--override',
@@ -62,7 +68,7 @@ def main(args):
startup_prog = fluid.Program()
train_prog = fluid.Program()
- best_top1_acc_list = (0.0, -1) # (top1_acc, epoch_id)
+ best_top1_acc = 0.0 # best top1 acc record
train_dataloader, train_fetchs = program.build(
config, train_prog, startup_prog, is_train=True)
@@ -91,10 +97,12 @@ def main(args):
compiled_valid_prog = program.compile(config, valid_prog)
compiled_train_prog = fleet.main_program
+ vdl_writer = LogWriter(args.vdl_dir) if args.vdl_dir else None
+
for epoch_id in range(config.epochs):
# 1. train with train dataset
program.run(train_dataloader, exe, compiled_train_prog, train_fetchs,
- epoch_id, 'train')
+ epoch_id, 'train', vdl_writer)
if int(os.getenv("PADDLE_TRAINER_ID", 0)) == 0:
# 2. validate with validate dataset
if config.validate and epoch_id % config.valid_interval == 0:
@@ -109,13 +117,17 @@ def main(args):
top1_acc = program.run(valid_dataloader, exe,
compiled_valid_prog, valid_fetchs,
epoch_id, 'valid')
- if top1_acc > best_top1_acc_list[0]:
- best_top1_acc_list = (top1_acc, epoch_id)
- logger.info("Best top1 acc: {}, in epoch: {}".format(
- *best_top1_acc_list))
- model_path = os.path.join(config.model_save_dir,
- config.ARCHITECTURE["name"])
- save_model(train_prog, model_path, "best_model")
+ if top1_acc > best_top1_acc:
+ best_top1_acc = top1_acc
+ message = "The best top1 acc {:.5f}, in epoch: {:d}".format(
+ best_top1_acc, epoch_id)
+ logger.info("{:s}".format(logger.coloring(message, "RED")))
+ if epoch_id % config.save_interval == 0:
+
+ model_path = os.path.join(config.model_save_dir,
+ config.ARCHITECTURE["name"])
+ save_model(train_prog, model_path,
+ "best_model_in_epoch_" + str(epoch_id))
# 3. save the persistable model
if epoch_id % config.save_interval == 0: