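diff --git a/README.md b/README.md index 3124743469116b35424066e2cb7f7cbf5388c9b9..3ddfcacd7a92f371875d9873c4b840c345e0f9c5 100644 --- a/README.md +++ b/README.md @@ -9,7 +9,7 @@ PaddleClas, the PaddlePaddle image classification suite, is a toolset for image classification tasks that PaddlePaddle provides for industry and academia, helping users train better vision models and deploy them in real applications.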
- +
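## A rich model zoo @@ -17,14 +17,14 @@ Based on the ImageNet1k classification dataset, PaddleClas provides brief introductions to 23 series of classification network architectures, including ResNet, ResNet_vd, Res2Net, HRNet, and MobileNetV3, along with configurations for reproducing the paper metrics and the training tricks used during reproduction. It also provides the corresponding 117 pretrained image classification models, with GPU inference time for server-side models evaluated using TensorRT, and CPU inference time and storage size for mobile models evaluated on the Snapdragon 855 (SD855). For the ***list of supported pretrained models, download links, and more information***, see the [**Model Zoo chapter**](https://paddleclas.readthedocs.io/zh_CN/latest/models/models_intro.html) in the documentation.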
- +
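The figure above compares recent server-side models by the time needed to predict a single image and by accuracy, measured on a V100 with FP32 and TensorRT. The ResNet50_vd_ssld (82.4% accuracy) and ResNet101_vd_ssld (83.7% accuracy) models in the figure were trained with the SSLD knowledge distillation scheme provided by PaddleClas. Points with the same color and symbol represent models of different sizes from the same series. For a brief introduction to each model, its FLOPS, Parameters, and detailed GPU inference times, see the [**Model Zoo chapter**](https://paddleclas.readthedocs.io/zh_CN/latest/models/models_intro.html) in the documentation.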
+src="./docs/images/models/mobile_arm_top1.png" width="700">
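The figure above compares recent mobile models by the time needed to predict a single image on the Snapdragon 855 (SD855) and by accuracy, covering the MobileNetV1, MobileNetV2, MobileNetV3, and ShuffleNetV2 series. MV3_large_x1_0_ssld (79% accuracy; M stands for MobileNet), MV3_small_x1_0_ssld (71.3%), MV2_ssld (76.74%), and MV1_ssld (77.89%) were trained with the SSLD distillation method provided by PaddleClas. MV3_large_x1_0_ssld_int8 is the same model after further INT8 quantization. For a brief introduction to each model, its FLOPS, Parameters, and storage size, see the [**Model Zoo chapter**](https://paddleclas.readthedocs.io/zh_CN/latest/models/models_intro.html) in the documentation. @@ -41,14 +41,14 @@ src="https://github.com/PaddlePaddle/PaddleClas/blob/master/docs/images/models/m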
+src="./docs/images/distillation/distillation_perform_s.jpg" width="700">
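Taking distillation on ImageNet1K as an example, the framework of the SSLD knowledge distillation scheme is shown below. The key points of the scheme are the choice of teacher model, the way the loss is computed, the number of training epochs, the use of unlabeled data, and fine-tuning with ImageNet1k distillation. For a detailed description of each part and the experiments, see the [**Knowledge Distillation chapter**](https://paddleclas.readthedocs.io/zh_CN/latest/advanced_tutorials/distillation/index.html) in the documentation.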
+src="./docs/images/distillation/ppcls_distillation_s.jpg" width="700">
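As a rough illustration of the "loss computation" key point mentioned above, the sketch below computes a Jensen-Shannon divergence between the softmax outputs of a student and a teacher. The choice of JS divergence and the names `softmax`, `kl_divergence`, and `js_divergence_loss` are assumptions made for illustration only; this excerpt does not specify the loss, and this is not PaddleClas code.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def js_divergence_loss(student_logits, teacher_logits):
    """Hypothetical distillation loss: JS divergence between the
    student and teacher output distributions. No label is required."""
    p = softmax(student_logits)
    q = softmax(teacher_logits)
    mid = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mid) + 0.5 * kl_divergence(q, mid)


if __name__ == "__main__":
    # Made-up logits for a 3-class problem, purely for demonstration.
    print(js_divergence_loss([2.0, 0.5, 0.1], [1.5, 0.7, 0.2]))
```

Because a loss of this form depends only on the two models' outputs, it can be evaluated on unlabeled images as well as labeled ones, which fits the "use of unlabeled data" point in the paragraph above.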
### Data augmentation @@ -57,14 +57,14 @@ src="https://github.com/PaddlePaddle/PaddleClas/blob/master/docs/images/distilla
+src="./docs/images/image_aug/image_aug_samples_s.jpg" width="800">
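PaddleClas reproduces the 8 data augmentation algorithms above and evaluates them in a unified experimental environment. The figure below shows how the different augmentation methods perform on ResNet50: compared with the standard transform, data augmentation improves recognition accuracy by up to 1%. For a detailed description of each method and the experimental setup used for the comparison, see the [**Data Augmentation chapter**](https://paddleclas.readthedocs.io/zh_CN/latest/advanced_tutorials/image_augmentation/index.html) in the documentation.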
+src="./docs/images/image_aug/main_image_aug_s.jpg" width="600">
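## Get started with PaddleClas in 30 minutes @@ -80,7 +80,7 @@ PaddleClas installation instructions, model training, prediction, evaluation, and model fine-tuning (f ### Pretrained model for 100,000-class image classification In practice, because training data is scarce, a classification model trained on ImageNet1K is often used as the pretrained model for transfer learning on image classification. However, ImageNet1K has only 1,000 classes, which limits how well the pretrained features transfer. Baidu therefore developed its own tag system on the order of 100,000 tags, semantically organized and with both coarse and fine granularity, and has so far collected more than 55 million training images through manual or semi-supervised labeling; this system is the largest image classification taxonomy and training set in China, and possibly in the world. PaddleClas provides a ResNet50_vd model trained on this dataset. The table below compares, in several real application scenarios, the ImageNet pretrained model with the 100,000-class pretrained model described above; with the 100,000-class pretrained model, recognition accuracy improves by up to 30%. - + | Dataset | Statistics | ImageNet pretrained model | 100,000-class pretrained model | |:--:|:--:|:--:|:--:| | Flowers | class_num:102<br>train/val:5789/2396 | 0.7779 | 0.9892 | @@ -100,7 +100,7 @@ PaddleClas installation instructions, model training, prediction, evaluation, and model fine-tuning (f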
+src="./docs/images/det/pssdet.png" width="500">
- TODO diff --git a/docs/images/models/DPN.png.flops.png b/docs/images/models/DPN.png.flops.png deleted file mode 100644 index 72bb96f49812711035ec09ce0d8d44202d17cfcb..0000000000000000000000000000000000000000 Binary files a/docs/images/models/DPN.png.flops.png and /dev/null differ diff --git a/docs/images/models/DPN.png.fp32.png b/docs/images/models/DPN.png.fp32.png deleted file mode 100644 index bcd524c781d305fdacfd5a07886851fadbeeeddc..0000000000000000000000000000000000000000 Binary files a/docs/images/models/DPN.png.fp32.png and /dev/null differ diff --git a/docs/images/models/DPN.png.params.png b/docs/images/models/DPN.png.params.png deleted file mode 100644 index 818f1961bbaf98cd0aff0c5405113f107fa02e54..0000000000000000000000000000000000000000 Binary files a/docs/images/models/DPN.png.params.png and /dev/null differ diff --git a/docs/images/models/EfficientNet.png b/docs/images/models/EfficientNet.png deleted file mode 100644 index 5556481c960432cab6080c243644ab43783ceabb..0000000000000000000000000000000000000000 Binary files a/docs/images/models/EfficientNet.png and /dev/null differ diff --git a/docs/images/models/EfficientNet.png.flops.png b/docs/images/models/EfficientNet.png.flops.png deleted file mode 100644 index dd3c36ced2973133bdd0bb6b300125b25fdcefe4..0000000000000000000000000000000000000000 Binary files a/docs/images/models/EfficientNet.png.flops.png and /dev/null differ diff --git a/docs/images/models/EfficientNet.png.fp32.png b/docs/images/models/EfficientNet.png.fp32.png deleted file mode 100644 index eca753f7d84699cebee13d15484291d96f0a9b6f..0000000000000000000000000000000000000000 Binary files a/docs/images/models/EfficientNet.png.fp32.png and /dev/null differ diff --git a/docs/images/models/EfficientNet.png.params.png b/docs/images/models/EfficientNet.png.params.png deleted file mode 100644 index 2348c55013998bcb2ed2c3edff282493243ee37a..0000000000000000000000000000000000000000 Binary files a/docs/images/models/EfficientNet.png.params.png and 
/dev/null differ diff --git a/docs/images/models/HRNet.png.flops.png b/docs/images/models/HRNet.png.flops.png deleted file mode 100644 index 5f8ce9cd2c1c8ed9e8fb775bd89a8acd3a8e9402..0000000000000000000000000000000000000000 Binary files a/docs/images/models/HRNet.png.flops.png and /dev/null differ diff --git a/docs/images/models/HRNet.png.fp32.png b/docs/images/models/HRNet.png.fp32.png deleted file mode 100644 index 0e73fb4b57dc374560efa92429a2dd457c73369e..0000000000000000000000000000000000000000 Binary files a/docs/images/models/HRNet.png.fp32.png and /dev/null differ diff --git a/docs/images/models/HRNet.png.params.png b/docs/images/models/HRNet.png.params.png deleted file mode 100644 index e4443a770ac0ca910d1158fe9adaf6dd92e680aa..0000000000000000000000000000000000000000 Binary files a/docs/images/models/HRNet.png.params.png and /dev/null differ diff --git a/docs/images/models/Inception.png.flops.png b/docs/images/models/Inception.png.flops.png deleted file mode 100644 index 589f3931c1feef3e1c0245566cd0c7e0a22782d8..0000000000000000000000000000000000000000 Binary files a/docs/images/models/Inception.png.flops.png and /dev/null differ diff --git a/docs/images/models/Inception.png.fp32.png b/docs/images/models/Inception.png.fp32.png deleted file mode 100644 index b9245800a2d7ca6fad5ed7457e55356d579a81d0..0000000000000000000000000000000000000000 Binary files a/docs/images/models/Inception.png.fp32.png and /dev/null differ diff --git a/docs/images/models/Inception.png.params.png b/docs/images/models/Inception.png.params.png deleted file mode 100644 index 657c4360451b24905b9a1ce170e4da719d35d917..0000000000000000000000000000000000000000 Binary files a/docs/images/models/Inception.png.params.png and /dev/null differ diff --git a/docs/images/models/ResNet.png.flops.png b/docs/images/models/ResNet.png.flops.png deleted file mode 100644 index da1fd2eb359f57fbd545a5436b737fe23df6e891..0000000000000000000000000000000000000000 Binary files 
a/docs/images/models/ResNet.png.flops.png and /dev/null differ diff --git a/docs/images/models/ResNet.png.fp32.png b/docs/images/models/ResNet.png.fp32.png deleted file mode 100644 index 05020997f2b6eb2c926d2a8948ed69b393b6cd3b..0000000000000000000000000000000000000000 Binary files a/docs/images/models/ResNet.png.fp32.png and /dev/null differ diff --git a/docs/images/models/ResNet.png.params.png b/docs/images/models/ResNet.png.params.png deleted file mode 100644 index 6fcbb69cc1e1e9a3402f2849fe016a73312d9a79..0000000000000000000000000000000000000000 Binary files a/docs/images/models/ResNet.png.params.png and /dev/null differ diff --git a/docs/images/models/SeResNeXt.png.flops.png b/docs/images/models/SeResNeXt.png.flops.png deleted file mode 100644 index 51d6d6497e9cd582ad671a79b9e24bbdc1a9bdae..0000000000000000000000000000000000000000 Binary files a/docs/images/models/SeResNeXt.png.flops.png and /dev/null differ diff --git a/docs/images/models/SeResNeXt.png.fp32.png b/docs/images/models/SeResNeXt.png.fp32.png deleted file mode 100644 index 452488955096f896ecb9dafe07885f666c92d8ad..0000000000000000000000000000000000000000 Binary files a/docs/images/models/SeResNeXt.png.fp32.png and /dev/null differ diff --git a/docs/images/models/SeResNeXt.png.params.png b/docs/images/models/SeResNeXt.png.params.png deleted file mode 100644 index 9898f52fb0be6bdfb22d39a1f4a16c98e20ab510..0000000000000000000000000000000000000000 Binary files a/docs/images/models/SeResNeXt.png.params.png and /dev/null differ diff --git a/docs/images/models/T4_benchmark/t4.fp16.bs1.EfficientNet.png b/docs/images/models/T4_benchmark/t4.fp16.bs1.EfficientNet.png new file mode 100644 index 0000000000000000000000000000000000000000..1c87d711ea11f3d662cdf78e34959ddd1f355f76 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp16.bs1.EfficientNet.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp16.bs4.DPN.png b/docs/images/models/T4_benchmark/t4.fp16.bs4.DPN.png new file mode 100644 
index 0000000000000000000000000000000000000000..1eb393903bbf3b4e02b83962b33c0e0a3b4e341a Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp16.bs4.DPN.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp16.bs4.HRNet.png b/docs/images/models/T4_benchmark/t4.fp16.bs4.HRNet.png new file mode 100644 index 0000000000000000000000000000000000000000..f21d63cd1a1e24481947875db23f8447af1a65ca Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp16.bs4.HRNet.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp16.bs4.Inception.png b/docs/images/models/T4_benchmark/t4.fp16.bs4.Inception.png new file mode 100644 index 0000000000000000000000000000000000000000..8095a3c0253170c00d8ae74af4dec25c4f9544eb Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp16.bs4.Inception.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp16.bs4.ResNet.png b/docs/images/models/T4_benchmark/t4.fp16.bs4.ResNet.png new file mode 100644 index 0000000000000000000000000000000000000000..53a603ecad2e36df580167e134ef036df14d5596 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp16.bs4.ResNet.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp16.bs4.SeResNeXt.png b/docs/images/models/T4_benchmark/t4.fp16.bs4.SeResNeXt.png new file mode 100644 index 0000000000000000000000000000000000000000..99b8a039e0fda22053e7d7cb971d5e83b208ec6b Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp16.bs4.SeResNeXt.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs1.EfficientNet.png b/docs/images/models/T4_benchmark/t4.fp32.bs1.EfficientNet.png new file mode 100644 index 0000000000000000000000000000000000000000..395a32f5c7e28ed09ee2b7e12e4a3ea2e9094154 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs1.EfficientNet.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.flops.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.flops.png new file mode 100644 index 
0000000000000000000000000000000000000000..24aabf8c3fb6607e4bb17f4d4dcc72d733476c48 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.flops.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.params.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.params.png new file mode 100644 index 0000000000000000000000000000000000000000..689e73d31d70cf8566c356142080d79a0f64d6e3 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.params.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.png new file mode 100644 index 0000000000000000000000000000000000000000..dc3922d2e2347f11b193057de9bbf730489b9cc1 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.DPN.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.flops.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.flops.png new file mode 100644 index 0000000000000000000000000000000000000000..deacfaca6a279fed852dfcbf0006dc497191a7a8 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.flops.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.params.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.params.png new file mode 100644 index 0000000000000000000000000000000000000000..7177bbc56b374dd71c66231132ead01c8b141732 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.params.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.flops.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.flops.png new file mode 100644 index 0000000000000000000000000000000000000000..062ecd79d2fc3ab788212a238b8ca4627b0dd14f Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.flops.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.params.png 
b/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.params.png new file mode 100644 index 0000000000000000000000000000000000000000..4bb3f76caecbdbd43f04ed62507985476f5cac40 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.params.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.png new file mode 100644 index 0000000000000000000000000000000000000000..3905f0b38b7cffa856d944d99ef0035cf6d4b489 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.HRNet.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.flops.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.flops.png new file mode 100644 index 0000000000000000000000000000000000000000..6fdc94b27130924d0e71a7a6954f239e632a4215 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.flops.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.params.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.params.png new file mode 100644 index 0000000000000000000000000000000000000000..25a5d1648e1cf220a383377a8d9fd5ae3c9eceea Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.params.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.png new file mode 100644 index 0000000000000000000000000000000000000000..7ef4f339ae2b9414b4cf71a8772cc4c9a92a0ad1 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.Inception.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.flops.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.flops.png new file mode 100644 index 0000000000000000000000000000000000000000..755adb5684562c341816ca98856490d270735ac5 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.flops.png differ diff --git 
a/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.params.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.params.png new file mode 100644 index 0000000000000000000000000000000000000000..44e03fdd20df450573889e5c4eca85cf5b686d9b Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.params.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.png new file mode 100644 index 0000000000000000000000000000000000000000..a461b9ec281c2340780da6d69e625a0947ffa42e Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.ResNet.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.flops.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.flops.png new file mode 100644 index 0000000000000000000000000000000000000000..197522d16401ef6430313d36ac235a7b37e1d7f9 Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.flops.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.params.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.params.png new file mode 100644 index 0000000000000000000000000000000000000000..6943fc056c1cd22b4b4d1767acf72fd94a209d0d Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.params.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.png new file mode 100644 index 0000000000000000000000000000000000000000..8476efb33b73acd836993a5fec3967122da319ab Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.main_fps_top1.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.main_fps_top1.png new file mode 100644 index 0000000000000000000000000000000000000000..965efff5c1a3bd32c9bc2da9c1b3034dc2cd55ef Binary files /dev/null and 
b/docs/images/models/T4_benchmark/t4.fp32.bs4.main_fps_top1.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.flops.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.flops.png new file mode 100644 index 0000000000000000000000000000000000000000..8c1be3ae9b32773ca56adac9dc1c15d9532e8f1a Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.flops.png differ diff --git a/docs/images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.params.png b/docs/images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.params.png new file mode 100644 index 0000000000000000000000000000000000000000..a41a325b5f5e9e85b86abfae2e62c9e2132fc7ec Binary files /dev/null and b/docs/images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.params.png differ diff --git a/docs/images/models/V100_benchmark/v100.fp32.bs1.DPN.png b/docs/images/models/V100_benchmark/v100.fp32.bs1.DPN.png new file mode 100644 index 0000000000000000000000000000000000000000..10f542b7fc989da80af12244cd45662bccfe677b Binary files /dev/null and b/docs/images/models/V100_benchmark/v100.fp32.bs1.DPN.png differ diff --git a/docs/images/models/V100_benchmark/v100.fp32.bs1.EfficientNet.png b/docs/images/models/V100_benchmark/v100.fp32.bs1.EfficientNet.png new file mode 100644 index 0000000000000000000000000000000000000000..0491ff3e6f4795fec2e131a4df31b4a3b102900e Binary files /dev/null and b/docs/images/models/V100_benchmark/v100.fp32.bs1.EfficientNet.png differ diff --git a/docs/images/models/V100_benchmark/v100.fp32.bs1.HRNet.png b/docs/images/models/V100_benchmark/v100.fp32.bs1.HRNet.png new file mode 100644 index 0000000000000000000000000000000000000000..284088ec5555e2128eab564cf8331532cfe08370 Binary files /dev/null and b/docs/images/models/V100_benchmark/v100.fp32.bs1.HRNet.png differ diff --git a/docs/images/models/V100_benchmark/v100.fp32.bs1.Inception.png b/docs/images/models/V100_benchmark/v100.fp32.bs1.Inception.png new file mode 100644 index 
0000000000000000000000000000000000000000..0e0e42e7ccc319bd2c9842c3d99c849b547ac603 Binary files /dev/null and b/docs/images/models/V100_benchmark/v100.fp32.bs1.Inception.png differ diff --git a/docs/images/models/V100_benchmark/v100.fp32.bs1.ResNet.png b/docs/images/models/V100_benchmark/v100.fp32.bs1.ResNet.png new file mode 100644 index 0000000000000000000000000000000000000000..2332791866c06b9cdcfc576947280de27ba3667c Binary files /dev/null and b/docs/images/models/V100_benchmark/v100.fp32.bs1.ResNet.png differ diff --git a/docs/images/models/V100_benchmark/v100.fp32.bs1.SeResNeXt.png b/docs/images/models/V100_benchmark/v100.fp32.bs1.SeResNeXt.png new file mode 100644 index 0000000000000000000000000000000000000000..610c846347f008216589ed080d834588d7fedfe0 Binary files /dev/null and b/docs/images/models/V100_benchmark/v100.fp32.bs1.SeResNeXt.png differ diff --git a/docs/images/models/main_fps_top1.png b/docs/images/models/main_fps_top1.png deleted file mode 100755 index a7149d7de16c71dbd3359efdbcee519a486a2278..0000000000000000000000000000000000000000 Binary files a/docs/images/models/main_fps_top1.png and /dev/null differ diff --git a/docs/images/models/mobile_arm_storage.png b/docs/images/models/mobile_arm_storage.png old mode 100755 new mode 100644 index 350fd3ced05e802250a461a747f60c2e5cf1be0c..a0cae065721ac0d193c0709c805a8537cbe7d627 Binary files a/docs/images/models/mobile_arm_storage.png and b/docs/images/models/mobile_arm_storage.png differ diff --git a/docs/images/models/mobile_arm_top1.png b/docs/images/models/mobile_arm_top1.png old mode 100755 new mode 100644 index 37091dd2c91a334c2888d5facec7598ad3219e84..de42eb18065262f12fc0af06408f5602a1bcf194 Binary files a/docs/images/models/mobile_arm_top1.png and b/docs/images/models/mobile_arm_top1.png differ diff --git a/docs/images/models/mobile_arm_top1_s.jpg b/docs/images/models/mobile_arm_top1_s.jpg deleted file mode 100644 index 
e5a3e77e9f0aabf97f9687f2005be5e1baf16fd1..0000000000000000000000000000000000000000 Binary files a/docs/images/models/mobile_arm_top1_s.jpg and /dev/null differ diff --git a/docs/images/models/mobile_trt.png b/docs/images/models/mobile_trt.png deleted file mode 100644 index d722548bae56b9aeca081cffe3e7d34494864033..0000000000000000000000000000000000000000 Binary files a/docs/images/models/mobile_trt.png and /dev/null differ diff --git a/docs/images/models/mobile_trt.png.flops.png b/docs/images/models/mobile_trt.png.flops.png deleted file mode 100644 index 6e7010906c58ef9e1f4f286725442ca85e638c51..0000000000000000000000000000000000000000 Binary files a/docs/images/models/mobile_trt.png.flops.png and /dev/null differ diff --git a/docs/images/models/mobile_trt.png.params.png b/docs/images/models/mobile_trt.png.params.png deleted file mode 100644 index 35b78a65b390f9843def8870bbd63f656070cdcc..0000000000000000000000000000000000000000 Binary files a/docs/images/models/mobile_trt.png.params.png and /dev/null differ diff --git a/docs/zh_CN/models/DPN_DenseNet.md b/docs/zh_CN/models/DPN_DenseNet.md index 53092bd74014c42fae872cd20e0ea24ff858989c..25f61476e43d752f2426c000885166963de7ebc4 100644 --- a/docs/zh_CN/models/DPN_DenseNet.md +++ b/docs/zh_CN/models/DPN_DenseNet.md @@ -4,13 +4,15 @@ DenseNet is a network architecture proposed in the CVPR 2017 best paper. It introduces a new block with cross-layer connections, the dense-block. Compared with the bottleneck in ResNet, the dense-block uses a more aggressive dense connection mechanism: all layers are connected to one another, and each layer takes all preceding layers as additional input. DenseNet stacks the dense-blocks into a densely connected network. The dense connections make it easier for gradients to propagate backward through DenseNet, which makes the network easier to train. DPN stands for Dual Path Networks. It combines DenseNet and ResNeXt, and its authors show that DenseNet can extract new features from earlier layers, whereas ResNeXt essentially reuses features already extracted in earlier layers. Their further analysis finds that ResNeXt has a high feature reuse rate but low redundancy, while DenseNet can create new features but with high redundancy. Combining the strengths of both structures, the authors designed DPN. With the same FLOPS and parameter count, DPN achieves better results than both ResNeXt and DenseNet. -The FLOPS, parameter counts, and FP32 inference times of this series of models are shown in the figures below. +The FLOPS, parameter counts, and inference times on a T4 GPU of this series of models are shown in the figures below. -![](../../images/models/DPN.png.flops.png) +![](../../images/models/T4_benchmark/t4.fp32.bs4.DPN.flops.png) -![](../../images/models/DPN.png.params.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs4.DPN.params.png) -![](../../images/models/DPN.png.fp32.png) +![](../../images/models/T4_benchmark/t4.fp32.bs4.DPN.png) + +![](../../images/models/T4_benchmark/t4.fp16.bs4.DPN.png) PaddleClas currently open-sources 10 pretrained models across these two families; their metrics are shown in the figures above. With the same FLOPS and parameter count, DPN is more accurate than DenseNet. However, because DPN has more branches, its inference is slower than DenseNet's. DenseNet264 is the deepest network and therefore has the most parameters in the DenseNet series, while DenseNet161 is the widest, which makes it the network with the highest computational cost and the highest accuracy in the series. In terms of inference speed, DenseNet161, despite its high computational cost and accuracy, is faster than DenseNet264, so it has a clear advantage over DenseNet264. @@ -34,9 +36,9 @@ DPN stands for Dual Path Networks, i.e. a dual-path network. The network is a combination of DenseNet -## FP32 inference speed +## Inference speed on V100 GPU -| Models | Crop Size | Resize Short Size | Batch Size=1<br>(ms) | +| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) | |-------------|-----------|-------------------|--------------------------| | DenseNet121 | 224 | 256 | 4.371 | | DenseNet161 | 224 | 256 | 8.863 | @@ -48,3 +50,20 @@ DPN stands for Dual Path Networks, i.e. a dual-path network. The network is a combination of DenseNet | DPN98 | 224 | 256 | 21.057 | | DPN107 | 224 | 256 | 28.685 | | DPN131 | 224 | 256 | 28.083 | + + + +## Inference speed on T4 GPU + +| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) | +|-------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------| +| DenseNet121 | 224 | 256 | 4.16436 | 7.2126 | 10.50221 | 4.40447 | 9.32623 | 15.25175 | +| DenseNet161 | 224 | 256 | 9.27249 | 14.25326 | 20.19849 | 10.39152 | 22.15555 | 35.78443 | +| DenseNet169 | 224 | 256 | 6.11395 | 10.28747 | 13.68717 | 6.43598 | 12.98832 | 20.41964 | +| DenseNet201 | 224 | 256 | 7.9617 | 13.4171 | 17.41949 | 8.20652 | 17.45838 | 27.06309 | +| DenseNet264 | 224 | 256 | 11.70074 | 19.69375 | 24.79545 | 12.14722 | 26.27707 | 40.01905 | +| DPN68 | 224 | 256 | 11.7827 | 13.12652 | 16.19213 | 11.64915 | 12.82807 | 18.57113 | +| DPN92 | 224 | 256 | 18.56026 | 20.35983 | 29.89544 | 18.15746 | 23.87545 | 38.68821 | +| DPN98 | 224 | 256 | 21.70508 | 24.7755 | 40.93595 | 21.18196 | 33.23925 | 62.77751 | +| DPN107 | 224 | 256 | 27.84462 | 34.83217 | 60.67903 | 27.62046 | 52.65353 | 100.11721 | +| DPN131 | 224 | 256 | 28.58941 | 33.01078 | 55.65146 | 28.33119 | 46.19439 | 89.24904 | diff --git a/docs/zh_CN/models/EfficientNet_and_ResNeXt101_wsl.md b/docs/zh_CN/models/EfficientNet_and_ResNeXt101_wsl.md index eadd17683d0219b4ee72d7aba71dc04743896ea5..3cbe4009dfd17cbc88ead5f3bb91d4ed9cc7470d 100644 --- a/docs/zh_CN/models/EfficientNet_and_ResNeXt101_wsl.md +++ b/docs/zh_CN/models/EfficientNet_and_ResNeXt101_wsl.md @@ -6,13 +6,15 @@ EfficientNet is a lightweight NAS-based network released by Google in 2019; its
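ResNeXt is an improved version of ResNet proposed by Facebook in 2016. In 2019, Facebook used weakly supervised learning to study the upper bound of this series' accuracy on ImageNet. To distinguish these networks from the earlier ResNeXt networks, they carry the suffix wsl, short for weakly-supervised-learning. To strengthen feature extraction, the researchers widened the networks further: the largest, ResNeXt101_32x48d_wsl, has 800 million parameters. Trained on 940 million weakly labeled images and then fine-tuned on ImageNet-1k, it reaches 85.4% top-1 on ImageNet-1k, which is so far the highest accuracy on ImageNet-1k at 224x224 resolution. In Fix-ResNeXt, the authors use a larger image resolution and apply a dedicated Fix strategy to address the mismatch between the preprocessing of training and validation images, which gives ResNeXt101_32x48d_wsl even higher accuracy. Because it uses the Fix strategy, the model is named Fix-ResNeXt101_32x48d_wsl. -The FLOPS, parameter counts, and FP32 inference times of this series of models are shown in the figures below. +The FLOPS, parameter counts, and inference times on a T4 GPU of this series of models are shown in the figures below. -![](../../images/models/EfficientNet.png.flops.png) +![](../../images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.flops.png) -![](../../images/models/EfficientNet.png.params.png) +![](../../images/models/T4_benchmark/t4.fp32.bs4.EfficientNet.params.png) -![](../../images/models/EfficientNet.png.fp32.png) +![](../../images/models/T4_benchmark/t4.fp32.bs1.EfficientNet.png) + +![](../../images/models/T4_benchmark/t4.fp16.bs1.EfficientNet.png) PaddleClas currently open-sources 14 pretrained models across these two families. The figures above show that the EfficientNet series has a very clear advantage, while the ResNeXt101_wsl series, having used more data, reaches higher final accuracy. EfficientNet_B0_Small is EfficientNet_B0 with the SE_block removed, which gives it faster inference. @@ -36,9 +38,9 @@ ResNeXt is an improved version of ResNet proposed by Facebook in 2016. In 2019 | EfficientNetB0_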
small | 0.758 | 0.926 | | | 0.720 | 4.650 | -## FP32 inference speed +## Inference speed on V100 GPU -| Models | Crop Size | Resize Short Size | Batch Size=1
(ms) | +| Models | Crop Size | Resize Short Size | FP32
Batch Size=1
(ms) | |-------------------------------|-----------|-------------------|--------------------------| | ResNeXt101_
32x8d_wsl | 224 | 256 | 19.127 | | ResNeXt101_
32x16d_wsl | 224 | 256 | 23.629 | @@ -54,3 +56,24 @@ ResNeXt是facebook于2016年提出的一种对ResNet的改进版网络。在2019 | EfficientNetB6 | 528 | 560 | 18.381 | | EfficientNetB7 | 600 | 632 | 27.817 | | EfficientNetB0_
small | 224 | 256 | 1.692 | + + + +## Inference speed on T4 GPU + +| Models | Crop Size | Resize Short Size | FP16
Batch Size=1
(ms) | FP16
Batch Size=4
(ms) | FP16
Batch Size=8
(ms) | FP32
Batch Size=1
(ms) | FP32
Batch Size=4
(ms) | FP32
Batch Size=8
(ms) | +|---------------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------| +| ResNeXt101_
32x8d_wsl | 224 | 256 | 18.19374 | 21.93529 | 34.67802 | 18.52528 | 34.25319 | 67.2283 | +| ResNeXt101_
32x16d_wsl | 224 | 256 | 18.52609 | 36.8288 | 62.79947 | 25.60395 | 71.88384 | 137.62327 | +| ResNeXt101_
32x32d_wsl | 224 | 256 | 33.51391 | 70.09682 | 125.81884 | 54.87396 | 160.04337 | 316.17718 | +| ResNeXt101_
32x48d_wsl | 224 | 256 | 50.97681 | 137.60926 | 190.82628 | 99.01698256 | 315.91261 | 551.83695 | +| Fix_ResNeXt101_
32x48d_wsl | 320 | 320 | 78.62869 | 191.76039 | 317.15436 | 160.0838242 | 595.99296 | 1151.47384 | +| EfficientNetB0 | 224 | 256 | 3.40122 | 5.95851 | 9.10801 | 3.442 | 6.11476 | 9.3304 | +| EfficientNetB1 | 240 | 272 | 5.25172 | 9.10233 | 14.11319 | 5.3322 | 9.41795 | 14.60388 | +| EfficientNetB2 | 260 | 292 | 5.91052 | 10.5898 | 17.38106 | 6.29351 | 10.95702 | 17.75308 | +| EfficientNetB3 | 300 | 332 | 7.69582 | 16.02548 | 27.4447 | 7.67749 | 16.53288 | 28.5939 | +| EfficientNetB4 | 380 | 412 | 11.55585 | 29.44261 | 53.97363 | 12.15894 | 30.94567 | 57.38511 | +| EfficientNetB5 | 456 | 488 | 19.63083 | 56.52299 | - | 20.48571 | 61.60252 | - | +| EfficientNetB6 | 528 | 560 | 30.05911 | - | - | 32.62402 | - | - | +| EfficientNetB7 | 600 | 632 | 47.86087 | - | - | 53.93823 | - | - | +| EfficientNetB0_small | 224 | 256 | 2.39166 | 4.36748 | 6.96002 | 2.3076 | 4.71886 | 7.21888 | diff --git a/docs/zh_CN/models/HRNet.md b/docs/zh_CN/models/HRNet.md index c33fb0fa025fd9b7f1993186d622c2acfdd22acb..f694f7b0c1d6d6c9b195fa61aa0cc9544564859d 100644 --- a/docs/zh_CN/models/HRNet.md +++ b/docs/zh_CN/models/HRNet.md @@ -3,13 +3,16 @@ ## Overview HRNet is a new neural network proposed by Microsoft Research Asia in 2019. Unlike earlier convolutional neural networks, it maintains high resolution even in its deep layers, so the keypoint heatmaps it predicts are more accurate and spatially more precise. The network also performs particularly well on other resolution-sensitive vision tasks such as detection and segmentation. -The FLOPS, parameter counts, and FP32 inference times of this series of models are shown in the figures below. +The FLOPS, parameter counts, and inference times on a T4 GPU of this series of models are shown in the figures below. -![](../../images/models/HRNet.png.flops.png) +![](../../images/models/T4_benchmark/t4.fp32.bs4.HRNet.flops.png) -![](../../images/models/HRNet.png.params.png) +![](../../images/models/T4_benchmark/t4.fp32.bs4.HRNet.params.png) + +![](../../images/models/T4_benchmark/t4.fp32.bs4.HRNet.png) + +![](../../images/models/T4_benchmark/t4.fp16.bs4.HRNet.png) -![](../../images/models/HRNet.png.fp32.png) PaddleClas currently open-sources 7 pretrained models of this type; their metrics are shown in the figures. The anomalous accuracy of HRNet_W48_C is probably due to normal fluctuation in network training. @@ -26,9 +29,9 @@ HRNet is a new neural network proposed by Microsoft Research Asia in 2019, | HRNet_W64_C | 0.793 | 0.946 | 0.795 | 0.946 | 57.830 | 128.060 | -## FP32 inference speed +## Inference speed on V100 GPU -| Models | Crop Size | Resize
Short Size | Batch Size=1
(ms) | +| Models | Crop Size | Resize Short Size | FP32
Batch Size=1
(ms) | |-------------|-----------|-------------------|--------------------------| | HRNet_W18_C | 224 | 256 | 7.368 | | HRNet_W30_C | 224 | 256 | 9.402 | @@ -37,3 +40,18 @@ HRNet is a new neural network proposed by Microsoft Research Asia in 2019, | HRNet_W44_C | 224 | 256 | 11.497 | | HRNet_W48_C | 224 | 256 | 12.165 | | HRNet_W64_C | 224 | 256 | 15.003 | + + + + +## Inference speed on T4 GPU + +| Models | Crop Size | Resize Short Size | FP16
Batch Size=1
(ms) | FP16
Batch Size=4
(ms) | FP16
Batch Size=8
(ms) | FP32
Batch Size=1
(ms) | FP32
Batch Size=4
(ms) | FP32
Batch Size=8
(ms) | +|-------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------| +| HRNet_W18_C | 224 | 256 | 6.79093 | 11.50986 | 17.67244 | 7.40636 | 13.29752 | 23.33445 | +| HRNet_W30_C | 224 | 256 | 8.98077 | 14.08082 | 21.23527 | 9.57594 | 17.35485 | 32.6933 | +| HRNet_W32_C | 224 | 256 | 8.82415 | 14.21462 | 21.19804 | 9.49807 | 17.72921 | 32.96305 | +| HRNet_W40_C | 224 | 256 | 11.4229 | 19.1595 | 30.47984 | 12.12202 | 25.68184 | 48.90623 | +| HRNet_W44_C | 224 | 256 | 12.25778 | 22.75456 | 32.61275 | 13.19858 | 32.25202 | 59.09871 | +| HRNet_W48_C | 224 | 256 | 12.65015 | 23.12886 | 33.37859 | 13.70761 | 34.43572 | 63.01219 | +| HRNet_W64_C | 224 | 256 | 15.10428 | 27.68901 | 40.4198 | 17.57527 | 47.9533 | 97.11228 | diff --git a/docs/zh_CN/models/Inception.md b/docs/zh_CN/models/Inception.md index 8c9333656b1e78199b1f29feff17ef5d15c593b8..b85c2bf1b5068936daa3091215540b866d4d31b3 100644 --- a/docs/zh_CN/models/Inception.md +++ b/docs/zh_CN/models/Inception.md @@ -9,13 +9,15 @@ Xception is another improvement on InceptionV3 proposed by Google after Inception InceptionV4 is a neural network designed by Google in 2016. Residual structures were all the rage at the time, but the authors argued that the Inception structure alone could also achieve very high performance. InceptionV4 uses more Inception modules and set a new accuracy record on ImageNet. -The FLOPS, parameter counts, and FP32 inference times of this series of models are shown in the figures below. +The FLOPS, parameter counts, and inference times on a T4 GPU of this series of models are shown in the figures below. -![](../../images/models/Inception.png.flops.png) +![](../../images/models/T4_benchmark/t4.fp32.bs4.Inception.flops.png) -![](../../images/models/Inception.png.params.png) +![](../../images/models/T4_benchmark/t4.fp32.bs4.Inception.params.png) -![](../../images/models/Inception.png.fp32.png) +![](../../images/models/T4_benchmark/t4.fp32.bs4.Inception.png) + +![](../../images/models/T4_benchmark/t4.fp16.bs4.Inception.png) The figures above show how the accuracy of the Xception series and InceptionV4 relates to the other metrics. Xception_deeplab matches the structure in the paper, while Xception is PaddleClas's improved model, which raises accuracy by about 0.6% with essentially unchanged inference speed. A detailed description of the improved model is still being written; stay tuned. @@ -35,14 +37,28 @@ InceptionV4 is a new neural network designed by Google in 2016, when residual structures -## FP32 inference speed +## Inference speed on V100 GPU -| Models | Crop Size | Resize Short Size | Batch Size=1
(ms) | +| Models | Crop Size | Resize Short Size | FP32
Batch Size=1
(ms) | |------------------------|-----------|-------------------|--------------------------| | GoogLeNet | 224 | 256 | 1.807 | | Xception41 | 299 | 320 | 3.972 | -| Xception41
_deeplab | 299 | 320 | 4.408 | +| Xception41_
deeplab | 299 | 320 | 4.408 | | Xception65 | 299 | 320 | 6.174 | -| Xception65
_deeplab | 299 | 320 | 6.464 | +| Xception65_
deeplab | 299 | 320 | 6.464 | | Xception71 | 299 | 320 | 6.782 | | InceptionV4 | 299 | 320 | 11.141 | + + + +## 基于T4 GPU的预测速度 + +| Models | Crop Size | Resize Short Size | FP16
Batch Size=1
(ms) | FP16
Batch Size=4
(ms) | FP16
Batch Size=8
(ms) | FP32
Batch Size=1
(ms) | FP32
Batch Size=4
(ms) | FP32
Batch Size=8
(ms) | +|--------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------| +| GoogLeNet | 299 | 320 | 1.75451 | 3.39931 | 4.71909 | 1.88038 | 4.48882 | 6.94035 | +| Xception41 | 299 | 320 | 2.91192 | 7.86878 | 15.53685 | 4.96939 | 17.01361 | 32.67831 | +| Xception41_
deeplab | 299 | 320 | 2.85934 | 7.2075 | 14.01406 | 5.33541 | 17.55938 | 33.76232 | +| Xception65 | 299 | 320 | 4.30126 | 11.58371 | 23.22213 | 7.26158 | 25.88778 | 53.45426 | +| Xception65_
deeplab | 299 | 320 | 4.06803 | 9.72694 | 19.477 | 7.60208 | 26.03699 | 54.74724 | +| Xception71 | 299 | 320 | 4.80889 | 13.5624 | 27.18822 | 8.72457 | 31.55549 | 69.31018 | +| InceptionV4 | 299 | 320 | 9.50821 | 13.72104 | 20.27447 | 12.99342 | 25.23416 | 43.56121 | diff --git a/docs/zh_CN/models/Mobile.md b/docs/zh_CN/models/Mobile.md index 6cf4ed90241ca959e7c5b66f9a14a8c415e7dd87..3c0ebe37693b5564fbc654d99948d69b04b97a85 100644 --- a/docs/zh_CN/models/Mobile.md +++ b/docs/zh_CN/models/Mobile.md @@ -8,10 +8,16 @@ MobileNetV2是Google继MobileNetV1提出的一种轻量级网络。相比MobileN ShuffleNet系列网络是旷视提出的轻量化网络结构,到目前为止,该系列网络一共有两种典型的结构,即ShuffleNetV1与ShuffleNetV2。ShuffleNet中的Channel Shuffle操作可以将组间的信息进行交换,并且可以实现端到端的训练。在ShuffleNetV2的论文中,作者提出了设计轻量级网络的四大准则,并且根据四大准则与ShuffleNetV1的不足,设计了ShuffleNetV2网络。 MobileNetV3是Google于2019年提出的一种基于NAS的新的轻量级网络,为了进一步提升效果,将relu和sigmoid激活函数分别替换为hard_swish与hard_sigmoid激活函数,同时引入了一些专门减小网络计算量的改进策略。 + ![](../../images/models/mobile_arm_top1.png) + ![](../../images/models/mobile_arm_storage.png) -![](../../images/models/mobile_trt.png.flops.png) -![](../../images/models/mobile_trt.png.params.png) + +![](../../images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.flops.png) + +![](../../images/models/T4_benchmark/t4.fp32.bs4.mobile_trt.params.png) + + 目前PaddleClas开源的的移动端系列的预训练模型一共有32个,其指标如图所示。从图片可以看出,越新的轻量级模型往往有更优的表现,MobileNetV3代表了目前最新的轻量级神经网络结构。在MobileNetV3中,作者为了获得更高的精度,在global-avg-pooling后使用了1x1的卷积。该操作大幅提升了参数量但对计算量影响不大,所以如果从存储角度评价模型的优异程度,MobileNetV3优势不是很大,但由于其更小的计算量,使得其有更快的推理速度。此外,我们模型库中的ssld蒸馏模型表现优异,从各个考量角度下,都刷新了当前轻量级模型的精度。由于MobileNetV3模型结构复杂,分支较多,对GPU并不友好,GPU预测速度不如MobileNetV1。 @@ -53,9 +59,9 @@ MobileNetV3是Google于2019年提出的一种基于NAS的新的轻量级网络 | ShuffleNetV2_swish | 0.700 | 0.892 | | | 0.290 | 2.260 | -## CPU预测速度和存储大小 +## 基于SD855的预测速度和存储大小 -| Models | batch_size=1(ms) | Storage Size(M) | +| Models | Batch Size=1(ms) | Storage Size(M) | |:--:|:--:|:--:| | MobileNetV1_x0_25 | 3.220 | 1.900 | | MobileNetV1_x0_5 | 9.580 | 5.200 | @@ -89,3 +95,40 @@ 
MobileNetV3是Google于2019年提出的一种基于NAS的新的轻量级网络 | ShuffleNetV2_x1_5 | 19.352 | 14.000 | | ShuffleNetV2_x2_0 | 34.770 | 28.000 | | ShuffleNetV2_swish | 16.023 | 9.100 | + + +## 基于T4 GPU的预测速度 + +| Models | FP16
Batch Size=1
(ms) | FP16
Batch Size=4
(ms) | FP16
Batch Size=8
(ms) | FP32
Batch Size=1
(ms) | FP32
Batch Size=4
(ms) | FP32
Batch Size=8
(ms) | +|-----------------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------|-----------------------| +| MobileNetV1_x0_25 | 0.68422 | 1.13021 | 1.72095 | 0.67274 | 1.226 | 1.84096 | +| MobileNetV1_x0_5 | 0.69326 | 1.09027 | 1.84746 | 0.69947 | 1.43045 | 2.39353 | +| MobileNetV1_x0_75 | 0.6793 | 1.29524 | 2.15495 | 0.79844 | 1.86205 | 3.064 | +| MobileNetV1 | 0.71942 | 1.45018 | 2.47953 | 0.91164 | 2.26871 | 3.90797 | +| MobileNetV1_ssld | 0.71942 | 1.45018 | 2.47953 | 0.91164 | 2.26871 | 3.90797 | +| MobileNetV2_x0_25 | 2.85399 | 3.62405 | 4.29952 | 2.81989 | 3.52695 | 4.2432 | +| MobileNetV2_x0_5 | 2.84258 | 3.1511 | 4.10267 | 2.80264 | 3.65284 | 4.31737 | +| MobileNetV2_x0_75 | 2.82183 | 3.27622 | 4.98161 | 2.86538 | 3.55198 | 5.10678 | +| MobileNetV2 | 2.78603 | 3.71982 | 6.27879 | 2.62398 | 3.54429 | 6.41178 | +| MobileNetV2_x1_5 | 2.81852 | 4.87434 | 8.97934 | 2.79398 | 5.30149 | 9.30899 | +| MobileNetV2_x2_0 | 3.65197 | 6.32329 | 11.644 | 3.29788 | 7.08644 | 12.45375 | +| MobileNetV2_ssld | 2.78603 | 3.71982 | 6.27879 | 2.62398 | 3.54429 | 6.41178 | +| MobileNetV3_large_x1_25 | 2.34387 | 3.16103 | 4.79742 | 2.35117 | 3.44903 | 5.45658 | +| MobileNetV3_large_x1_0 | 2.20149 | 3.08423 | 4.07779 | 2.04296 | 2.9322 | 4.53184 | +| MobileNetV3_large_x0_75 | 2.1058 | 2.61426 | 3.61021 | 2.0006 | 2.56987 | 3.78005 | +| MobileNetV3_large_x0_5 | 2.06934 | 2.77341 | 3.35313 | 2.11199 | 2.88172 | 3.19029 | +| MobileNetV3_large_x0_35 | 2.14965 | 2.7868 | 3.36145 | 1.9041 | 2.62951 | 3.26036 | +| MobileNetV3_small_x1_25 | 2.06817 | 2.90193 | 3.5245 | 2.02916 | 2.91866 | 3.34528 | +| MobileNetV3_small_x1_0 | 1.73933 | 2.59478 | 3.40276 | 1.74527 | 2.63565 | 3.28124 | +| MobileNetV3_small_x0_75 | 1.80617 | 2.64646 | 3.24513 | 1.93697 | 2.64285 | 3.32797 | +| MobileNetV3_small_x0_5 | 1.95001 | 2.74014 | 3.39485 | 1.88406 | 2.99601 | 3.3908 | +| MobileNetV3_small_x0_35 | 2.10683 | 2.94267 | 
3.44254 | 1.94427 | 2.94116 | 3.41082 | +| MobileNetV3_large_x1_0_ssld | 2.20149 | 3.08423 | 4.07779 | 2.04296 | 2.9322 | 4.53184 | +| MobileNetV3_small_x1_0_ssld | 1.73933 | 2.59478 | 3.40276 | 1.74527 | 2.63565 | 3.28124 | +| ShuffleNetV2 | 1.95064 | 2.15928 | 2.97169 | 1.89436 | 2.26339 | 3.17615 | +| ShuffleNetV2_x0_25 | 1.43242 | 2.38172 | 2.96768 | 1.48698 | 2.29085 | 2.90284 | +| ShuffleNetV2_x0_33 | 1.69008 | 2.65706 | 2.97373 | 1.75526 | 2.85557 | 3.09688 | +| ShuffleNetV2_x0_5 | 1.48073 | 2.28174 | 2.85436 | 1.59055 | 2.18708 | 3.09141 | +| ShuffleNetV2_x1_5 | 1.51054 | 2.4565 | 3.41738 | 1.45389 | 2.5203 | 3.99872 | +| ShuffleNetV2_x2_0 | 1.95616 | 2.44751 | 4.19173 | 2.15654 | 3.18247 | 5.46893 | +| ShuffleNetV2_swish | 2.50213 | 2.92881 | 3.474 | 2.5129 | 2.97422 | 3.69357 | diff --git a/docs/zh_CN/models/Others.md b/docs/zh_CN/models/Others.md index 35cabf68d5e2d05ec336d7e5cdfac3d363477971..c24f76652bc5e322df2533bbf8c59889bb420910 100644 --- a/docs/zh_CN/models/Others.md +++ b/docs/zh_CN/models/Others.md @@ -27,10 +27,10 @@ DarkNet53是YOLO作者在论文设计的用于目标检测的backbone,该网 -## FP32预测速度 +## 基于V100 GPU的预测速度 -| Models | Crop Size | Resize Short Size | Batch Size=1
(ms) | +| Models | Crop Size | Resize Short Size | FP32
Batch Size=1
(ms) | |---------------------------|-----------|-------------------|----------------------| | AlexNet | 224 | 256 | 1.176 | | SqueezeNet1_0 | 224 | 256 | 0.860 | @@ -41,3 +41,20 @@ DarkNet53是YOLO作者在论文设计的用于目标检测的backbone,该网 | VGG19 | 224 | 256 | 3.076 | | DarkNet53 | 256 | 256 | 3.139 | | ResNet50_ACNet
_deploy | 224 | 256 | 5.626 | + + + +## 基于T4 GPU的预测速度 + +| Models | Crop Size | Resize Short Size | FP16
Batch Size=1
(ms) | FP16
Batch Size=4
(ms) | FP16
Batch Size=8
(ms) | FP32
Batch Size=1
(ms) | FP32
Batch Size=4
(ms) | FP32
Batch Size=8
(ms) | +|-----------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------| +| AlexNet | 224 | 256 | 1.06447 | 1.70435 | 2.38402 | 1.44993 | 2.46696 | 3.72085 | +| SqueezeNet1_0 | 224 | 256 | 0.97162 | 2.06719 | 3.67499 | 0.96736 | 2.53221 | 4.54047 | +| SqueezeNet1_1 | 224 | 256 | 0.81378 | 1.62919 | 2.68044 | 0.76032 | 1.877 | 3.15298 | +| VGG11 | 224 | 256 | 2.24408 | 4.67794 | 7.6568 | 3.90412 | 9.51147 | 17.14168 | +| VGG13 | 224 | 256 | 2.58589 | 5.82708 | 10.03591 | 4.64684 | 12.61558 | 23.70015 | +| VGG16 | 224 | 256 | 3.13237 | 7.19257 | 12.50913 | 5.61769 | 16.40064 | 32.03939 | +| VGG19 | 224 | 256 | 3.69987 | 8.59168 | 15.07866 | 6.65221 | 20.4334 | 41.55902 | +| DarkNet53 | 256 | 256 | 3.18101 | 5.88419 | 10.14964 | 4.10829 | 12.1714 | 22.15266 | +| ResNet50_ACNet | 256 | 256 | 3.89002 | 4.58195 | 9.01095 | 5.33395 | 10.96843 | 18.70368 | +| ResNet50_ACNet_deploy | 224 | 256 | 2.6823 | 5.944 | 7.16655 | 3.49161 | 7.78374 | 13.94361 | diff --git a/docs/zh_CN/models/ResNet_and_vd.md b/docs/zh_CN/models/ResNet_and_vd.md index bc2946e99bc270c084c146808188da53c475ebae..ea045f12ca545ca0cea2229fcfc1993fd50ec77b 100644 --- a/docs/zh_CN/models/ResNet_and_vd.md +++ b/docs/zh_CN/models/ResNet_and_vd.md @@ -10,18 +10,19 @@ ResNet系列模型是在2015年提出的,一举在ILSVRC2015比赛中取得冠 其中,ResNet50_vd_v2与ResNet50_vd_ssld采用了知识蒸馏,保证模型结构不变的情况下,进一步提升了模型的精度,具体地,ResNet50_vd_v2的teacher模型是ResNet152_vd(top1准确率80.59%),数据选用的是ImageNet-1k的训练集,ResNet50_vd_ssld的teacher模型是ResNeXt101_32x16d_wsl(top1准确率84.2%),数据选用结合了ImageNet-1k的训练集和ImageNet-22k挖掘的400万数据。知识蒸馏的具体方法正在持续更新中。 +该系列模型的FLOPS、参数量以及T4 GPU上的预测耗时如下图所示。 -该系列模型的FLOPS、参数量以及FP32预测耗时如下图所示。 +![](../../images/models/T4_benchmark/t4.fp32.bs4.ResNet.flops.png) -![](../../images/models/ResNet.png.flops.png) +![](../../images/models/T4_benchmark/t4.fp32.bs4.ResNet.params.png) 
-![](../../images/models/ResNet.png.params.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs4.ResNet.png)
+
+![](../../images/models/T4_benchmark/t4.fp16.bs4.ResNet.png)
-![](../../images/models/ResNet.png.fp32.png)
 The curves above show that more layers give higher accuracy, but the parameter count, computation, and latency grow accordingly. By using a stronger teacher and more data, ResNet50_vd_ssld further raises its top-1 accuracy on the ImageNet-1k validation set to 82.39%, a new high for the ResNet50 series.
-**Note**: For all models, crop_size is set to 224 and resize_short_size to 256 at inference time.
 ## Accuracy, FLOPS, and parameter count
@@ -46,9 +47,9 @@ The ResNet series was proposed in 2015 and took first place in the ILSVRC2015 competition
-## FP32 inference speed
+## Inference speed on V100 GPU
-| Models | Crop Size | Resize Short Size | Batch Size=1<br>(ms) |
+| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
 |------------------|-----------|-------------------|--------------------------|
 | ResNet18 | 224 | 256 | 1.499 |
 | ResNet18_vd | 224 | 256 | 1.603 |
@@ -65,3 +66,24 @@ The ResNet series was proposed in 2015 and took first place in the ILSVRC2015 competition
 | ResNet200_vd | 224 | 256 | 8.885 |
 | ResNet50_vd_ssld | 224 | 256 | 3.165 |
 | ResNet101_vd_ssld | 224 | 256 | 5.252 |
+
+
+## Inference speed on T4 GPU
+
+| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
+|-------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
+| ResNet18 | 224 | 256 | 1.3568 | 2.5225 | 3.61904 | 1.45606 | 3.56305 | 6.28798 |
+| ResNet18_vd | 224 | 256 | 1.39593 | 2.69063 | 3.88267 | 1.54557 | 3.85363 | 6.88121 |
+| ResNet34 | 224 | 256 | 2.23092 | 4.10205 | 5.54904 | 2.34957 | 5.89821 | 10.73451 |
+| ResNet34_vd | 224 | 256 | 2.23992 | 4.22246 | 5.79534 | 2.43427 | 6.22257 | 11.44906 |
+| ResNet50 | 224 | 256 | 2.63824 | 4.63802 | 7.02444 | 3.47712 | 7.84421 | 13.90633 |
+| ResNet50_vc | 224 | 256 | 2.67064 | 4.72372 | 7.17204 | 3.52346 | 8.10725 | 14.45577 |
+| ResNet50_vd | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
+| ResNet50_vd_v2 | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
+| ResNet101 | 224 | 256 | 5.04037 | 7.73673 | 10.8936 | 6.07125 | 13.40573 | 24.3597 |
+| ResNet101_vd | 224 | 256 | 5.05972 | 7.83685 | 11.34235 | 6.11704 | 13.76222 | 25.11071 |
+| ResNet152 | 224 | 256 | 7.28665 | 10.62001 | 14.90317 | 8.50198 | 19.17073 | 35.78384 |
+| ResNet152_vd | 224 | 256 | 7.29127 | 10.86137 | 15.32444 | 8.54376 | 19.52157 | 36.64445 |
+| ResNet200_vd | 224 | 256 | 9.36026 | 13.5474 | 19.0725 | 10.80619 | 25.01731 | 48.81399 |
+| ResNet50_vd_ssld | 224 | 256 | 2.65164 | 4.84109 | 7.46225 | 3.53131 | 8.09057 | 14.45965 |
+| ResNet101_vd_ssld | 224 | 256 | 5.05972 | 7.83685 | 11.34235 | 6.11704 | 13.76222 | 25.11071 |
diff --git a/docs/zh_CN/models/SEResNext_and_Res2Net.md b/docs/zh_CN/models/SEResNext_and_Res2Net.md
index 90955354aece52a3770c57c6d387c4bdf8238453..1a8c125ee931a2bc036834d122d529c26264bb66 100644
--- a/docs/zh_CN/models/SEResNext_and_Res2Net.md
+++ b/docs/zh_CN/models/SEResNext_and_Res2Net.md
@@ -7,18 +7,20 @@ SENet was the winning entry in the 2017 ImageNet classification competition; it proposed a brand-new
 Res2Net is a new improvement on ResNet proposed in 2019. It integrates easily with other existing high-performing modules and, without increasing the computational load, outperforms ResNet on datasets such as ImageNet and CIFAR-100. Res2Net has a simple structure and strong performance, and further explores the multi-scale representation ability of CNNs at a finer granularity. Res2Net reveals a new dimension for improving model accuracy, namely scale, which is an essential and more effective factor in addition to the existing dimensions of depth, width, and cardinality. The network also performs well on other vision tasks such as object detection and image segmentation.
-The figures below show the FLOPS, parameter counts, and FP32 inference time of this model series.
+The figures below show the FLOPS, parameter counts, and inference time on a T4 GPU for this model series.
-![](../../images/models/SeResNeXt.png.flops.png)
-![](../../images/models/SeResNeXt.png.params.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.flops.png)
-![](../../images/models/SeResNeXt.png.fp32.png)
+![](../../images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.params.png)
+
+![](../../images/models/T4_benchmark/t4.fp32.bs4.SeResNeXt.png)
+
+![](../../images/models/T4_benchmark/t4.fp16.bs4.SeResNeXt.png)
-PaddleClas currently open-sources 24 pretrained models across these three families; their metrics are shown in the figures. The figures show that, at the same Flops and Params, the improved models tend to be more accurate, but their inference speed tends to fall short of the ResNet series. Res2Net also performs well: compared with the group operation in ResNeXt and the SE structure in SEResNet, Res2Net tends to reach higher accuracy at the same Flops, Params, and inference speed.
+PaddleClas currently open-sources 24 pretrained models across these three families; their metrics are shown in the figures. The figures show that, at the same Flops and Params, the improved models tend to be more accurate, but their inference speed tends to fall short of the ResNet series. Res2Net also performs well: compared with the group operation in ResNeXt and the SE structure in SEResNet, Res2Net tends to reach higher accuracy at the same Flops, Params, and inference speed.
-**Note**: For all models, crop_size is set to 224 and resize_short_size to 256 at inference time.
 ## Accuracy, FLOPS, and parameter count
@@ -52,9 +54,9 @@ Res2Net is a new improvement on ResNet proposed in 2019; this approach can
-## FP32 inference speed
+## Inference speed on V100 GPU
-| Models | Crop Size | Resize Short Size | Batch Size=1<br>(ms) |
+| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) |
 |-----------------------|-----------|-------------------|--------------------------|
 | Res2Net50_26w_4s | 224 | 256 | 4.148 |
 | Res2Net50_vd_26w_4s | 224 | 256 | 4.172 |
@@ -80,3 +82,33 @@ Res2Net is a new improvement on ResNet proposed in 2019; this approach can
 | SE_ResNeXt50_vd_32x4d | 224 | 256 | 9.011 |
 | SE_ResNeXt101_32x4d | 224 | 256 | 19.204 |
 | SENet154_vd | 224 | 256 | 50.406 |
+
+
+## Inference speed on T4 GPU
+
+| Models | Crop Size | Resize Short Size | FP16<br>Batch Size=1<br>(ms) | FP16<br>Batch Size=4<br>(ms) | FP16<br>Batch Size=8<br>(ms) | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) |
+|-----------------------|-----------|-------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|------------------------------|
+| Res2Net50_26w_4s | 224 | 256 | 3.56067 | 6.61827 | 11.41566 | 4.47188 | 9.65722 | 17.54535 |
+| Res2Net50_vd_26w_4s | 224 | 256 | 3.69221 | 6.94419 | 11.92441 | 4.52712 | 9.93247 | 18.16928 |
+| Res2Net50_14w_8s | 224 | 256 | 4.45745 | 7.69847 | 12.30935 | 5.4026 | 10.60273 | 18.01234 |
+| Res2Net101_vd_26w_4s | 224 | 256 | 6.53122 | 10.81895 | 18.94395 | 8.08729 | 17.31208 | 31.95762 |
+| Res2Net200_vd_26w_4s | 224 | 256 | 11.66671 | 18.93953 | 33.19188 | 14.67806 | 32.35032 | 63.65899 |
+| ResNeXt50_32x4d | 224 | 256 | 7.61087 | 8.88918 | 12.99674 | 7.56327 | 10.6134 | 18.46915 |
+| ResNeXt50_vd_32x4d | 224 | 256 | 7.69065 | 8.94014 | 13.4088 | 7.62044 | 11.03385 | 19.15339 |
+| ResNeXt50_64x4d | 224 | 256 | 13.78688 | 15.84655 | 21.79537 | 13.80962 | 18.4712 | 33.49843 |
+| ResNeXt50_vd_64x4d | 224 | 256 | 13.79538 | 15.22201 | 22.27045 | 13.94449 | 18.88759 | 34.28889 |
+| ResNeXt101_32x4d | 224 | 256 | 16.59777 | 17.93153 | 21.36541 | 16.21503 | 19.96568 | 33.76831 |
+| ResNeXt101_vd_32x4d | 224 | 256 | 16.36909 | 17.45681 | 22.10216 | 16.28103 | 20.25611 | 34.37152 |
+| ResNeXt101_64x4d | 224 | 256 | 30.12355 | 32.46823 | 38.41901 | 30.4788 | 36.29801 | 68.85559 |
+| ResNeXt101_vd_64x4d | 224 | 256 | 30.34022 | 32.27869 | 38.72523 | 30.40456 | 36.77324 | 69.66021 |
+| ResNeXt152_32x4d | 224 | 256 | 25.26417 | 26.57001 | 30.67834 | 24.86299 | 29.36764 | 52.09426 |
+| ResNeXt152_vd_32x4d | 224 | 256 | 25.11196 | 26.70515 | 31.72636 | 25.03258 | 30.08987 | 52.64429 |
+| ResNeXt152_64x4d | 224 | 256 | 46.58293 | 48.34563 | 56.97961 | 46.7564 | 56.34108 | 106.11736 |
+| ResNeXt152_vd_64x4d | 224 | 256 | 47.68447 | 48.91406 | 57.29329 | 47.18638 | 57.16257 | 107.26288 |
+| SE_ResNet18_vd | 224 | 256 | 1.61823 | 3.1391 | 4.60282 | 1.7691 | 4.19877 | 7.5331 |
+| SE_ResNet34_vd | 224 | 256 | 2.67518 | 5.04694 | 7.18946 | 2.88559 | 7.03291 | 12.73502 |
+| SE_ResNet50_vd | 224 | 256 | 3.65394 | 7.568 | 12.52793 | 4.28393 | 10.38846 | 18.33154 |
+| SE_ResNeXt50_32x4d | 224 | 256 | 9.06957 | 11.37898 | 18.86282 | 8.74121 | 13.563 | 23.01954 |
+| SE_ResNeXt50_vd_32x4d | 224 | 256 | 9.25016 | 11.85045 | 25.57004 | 9.17134 | 14.76192 | 19.914 |
+| SE_ResNeXt101_32x4d | 224 | 256 | 19.34455 | 20.6104 | 32.20432 | 18.82604 | 25.31814 | 41.97758 |
+| SENet154_vd | 224 | 256 | 49.85733 | 54.37267 | 74.70447 | 53.79794 | 66.31684 | 121.59885 |
diff --git a/docs/zh_CN/update_history.md b/docs/zh_CN/update_history.md
index a3ba148b2be262e920e8a8f06c87c6b429b30413..e59f707209c409ebcf05bb3139407d8c84f169fb 100644
--- a/docs/zh_CN/update_history.md
+++ b/docs/zh_CN/update_history.md
@@ -1,3 +1,9 @@
 # Update history
-* 2020.04.10: First commit
+* 2020.05.09
+    * Add Paddle Serving usage documentation.
+    * Add Paddle-Lite usage documentation.
+    * Add FP32/FP16 inference speed benchmark on T4 GPU.
+
+* 2020.04.10:
+    * First commit.
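
The T4 tables added in this change list latency for FP16 and FP32 at batch sizes 1, 4, and 8. As a rough way to read them, the sketch below computes the FP32-to-FP16 latency ratio and an images-per-second figure for two rows copied from the ResNet T4 table. It assumes each value is the time in milliseconds for one whole batch, which the tables do not state explicitly. The names `T4_LATENCY_MS`, `fp16_speedup`, and `images_per_second` are illustrative only and are not part of PaddleClas.

```python
# Latency values copied from the ResNet T4 table in this diff.
# Assumption (not stated in the tables): each value is milliseconds
# per batch, not per image.
# Layout: model -> precision -> batch size -> latency in ms.
T4_LATENCY_MS = {
    "ResNet50": {
        "fp16": {1: 2.63824, 4: 4.63802, 8: 7.02444},
        "fp32": {1: 3.47712, 4: 7.84421, 8: 13.90633},
    },
    "ResNet101": {
        "fp16": {1: 5.04037, 4: 7.73673, 8: 10.8936},
        "fp32": {1: 6.07125, 4: 13.40573, 8: 24.3597},
    },
}


def fp16_speedup(model: str, batch_size: int) -> float:
    """Return FP32 latency divided by FP16 latency (>1 means FP16 is faster)."""
    row = T4_LATENCY_MS[model]
    return row["fp32"][batch_size] / row["fp16"][batch_size]


def images_per_second(model: str, precision: str, batch_size: int) -> float:
    """Convert a per-batch latency in ms into throughput in images per second."""
    latency_ms = T4_LATENCY_MS[model][precision][batch_size]
    return batch_size * 1000.0 / latency_ms


if __name__ == "__main__":
    for model in T4_LATENCY_MS:
        for bs in (1, 4, 8):
            print(
                f"{model} bs={bs}: "
                f"FP16 speedup {fp16_speedup(model, bs):.2f}x, "
                f"{images_per_second(model, 'fp16', bs):.0f} img/s (FP16)"
            )
```

For these two rows the ratio grows with batch size, so under the per-batch assumption FP16 helps most at batch size 8. Other model families in the tables may behave differently and should be checked row by row.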