模型库

1. 图像分类

数据集:ImageNet1000类

1.1 量化

模型 压缩方法 Top-1/Top-5 Acc 模型体积(MB) TensorRT时延(V100, ms) 下载
MobileNetV1 - 70.99%/89.68% 17 - 下载链接
MobileNetV1 quant_post 70.18%/89.25% (-0.81%/-0.43%) 4.4 - 下载链接
MobileNetV1 quant_aware 70.60%/89.57% (-0.39%/-0.11%) 4.4 - 下载链接
MobileNetV2 - 72.15%/90.65% 15 - 下载链接
MobileNetV2 quant_post 71.15%/90.11% (-1%/-0.54%) 4.0 - 下载链接
MobileNetV2 quant_aware 72.05%/90.63% (-0.1%/-0.02%) 4.0 - 下载链接
ResNet50 - 76.50%/93.00% 99 2.71 下载链接
ResNet50 quant_post 76.33%/93.02% (-0.17%/+0.02%) 25.1 1.19 下载链接
ResNet50 quant_aware 76.48%/93.11% (-0.02%/+0.11%) 25.1 1.17 下载链接

分类模型Lite时延(ms)

设备 模型类型 压缩策略 armv7 Thread 1 armv7 Thread 2 armv7 Thread 4 armv8 Thread 1 armv8 Thread 2 armv8 Thread 4
高通835 MobileNetV1 FP32 baseline 96.1942 53.2058 32.4468 88.4955 47.95 27.5189
高通835 MobileNetV1 quant_aware 60.8186 32.1931 16.4275 56.4311 29.5446 15.1053
高通835 MobileNetV1 quant_post 60.5615 32.4016 16.6596 56.5266 29.7178 15.1459
高通835 MobileNetV2 FP32 baseline 65.715 38.1346 25.155 61.3593 36.2038 22.849
高通835 MobileNetV2 quant_aware 48.3655 30.2021 21.9303 46.1487 27.3146 18.3053
高通835 MobileNetV2 quant_post 48.3495 30.3069 22.1506 45.8715 27.4105 18.2223
高通835 ResNet50 FP32 baseline 526.811 319.6486 205.8345 506.1138 335.1584 214.8936
高通835 ResNet50 quant_aware 475.4538 256.8672 139.699 461.7344 247.9506 145.9847
高通835 ResNet50 quant_post 476.0507 256.5963 139.7266 461.9176 248.3795 149.353
高通855 MobileNetV1 FP32 baseline 33.5086 19.5773 11.7534 31.3474 18.5382 10.0811
高通855 MobileNetV1 quant_aware 36.7067 21.628 11.0372 14.0238 8.199 4.2588
高通855 MobileNetV1 quant_post 37.0498 21.7081 11.0779 14.0947 8.1926 4.2934
高通855 MobileNetV2 FP32 baseline 25.0396 15.2862 9.6609 22.909 14.1797 8.8325
高通855 MobileNetV2 quant_aware 28.1583 18.3317 11.8103 16.9158 11.1606 7.4148
高通855 MobileNetV2 quant_post 28.1631 18.3917 11.8333 16.9399 11.1772 7.4176
高通855 ResNet50 FP32 baseline 185.3705 113.0825 87.0741 177.7367 110.0433 74.4114
高通855 ResNet50 quant_aware 327.6883 202.4536 106.243 243.5621 150.0542 78.4205
高通855 ResNet50 quant_post 328.2683 201.9937 106.744 242.6397 150.0338 79.8659
麒麟970 MobileNetV1 FP32 baseline 101.2455 56.4053 35.6484 94.8985 51.7251 31.9511
麒麟970 MobileNetV1 quant_aware 62.5012 32.1863 16.6018 57.7477 29.2116 15.0703
麒麟970 MobileNetV1 quant_post 62.4412 32.2585 16.6215 57.825 29.2573 15.1206
麒麟970 MobileNetV2 FP32 baseline 70.4176 42.0795 25.1939 68.9597 39.2145 22.6617
麒麟970 MobileNetV2 quant_aware 52.9961 31.5323 22.1447 49.4858 28.0856 18.7287
麒麟970 MobileNetV2 quant_post 53.0961 31.7987 21.8334 49.383 28.2358 18.3642
麒麟970 ResNet50 FP32 baseline 586.8943 344.0858 228.2293 573.3344 351.4332 225.8006
麒麟970 ResNet50 quant_aware 488.361 260.1697 142.416 479.5668 249.8485 138.1742
麒麟970 ResNet50 quant_post 489.6188 258.3279 142.6063 480.0064 249.5339 138.5284

1.2 剪裁

PaddleLite推理耗时说明:

环境:Qualcomm SnapDragon 845 + armv8

速度指标:Thread1/Thread2/Thread4耗时

PaddleLite版本: v2.3

模型 压缩方法 Top-1/Top-5 Acc 模型体积(MB) GFLOPs PaddleLite推理耗时 TensorRT推理速度(FPS) 下载
MobileNetV1 Baseline 70.99%/89.68% 17 1.11 66.052\35.8014\19.5762 - 下载链接
MobileNetV1 uniform -50% 69.4%/88.66% (-1.59%/-1.02%) 9 0.56 33.5636\18.6834\10.5076 - 下载链接
MobileNetV1 sensitive -30% 70.4%/89.3% (-0.59%/-0.38%) 12 0.74 46.5958\25.3098\13.6982 - 下载链接
MobileNetV1 sensitive -50% 69.8% / 88.9% (-1.19%/-0.78%) 9 0.56 37.9892\20.7882\11.3144 - 下载链接
MobileNetV2 - 72.15%/90.65% 15 0.59 41.7874\23.375\13.3998 - 下载链接
MobileNetV2 uniform -50% 65.79%/86.11% (-6.35%/-4.47%) 11 0.296 23.8842\13.8698\8.5572 - 下载链接
ResNet34 - 72.15%/90.65% 84 7.36 217.808\139.943\96.7504 342.32 下载链接
ResNet34 uniform -50% 70.99%/89.95% (-1.36%/-0.87%) 41 3.67 114.787\75.0332\51.8438 452.41 下载链接
ResNet34 auto -55.05% 70.24%/89.63% (-2.04%/-1.06%) 33 3.31 105.924\69.3222\48.0246 457.25 下载链接

1.3 蒸馏

模型 压缩方法 Top-1/Top-5 Acc 模型体积(MB) 下载
MobileNetV1 student 70.99%/89.68% 17 下载链接
ResNet50_vd teacher 79.12%/94.44% 99 下载链接
MobileNetV1 ResNet50_vd1 distill 72.77%/90.68% (+1.78%/+1.00%) 17 下载链接
MobileNetV2 student 72.15%/90.65% 15 下载链接
MobileNetV2 ResNet50_vd distill 74.28%/91.53% (+2.13%/+0.88%) 15 下载链接
ResNet50 student 76.50%/93.00% 99 下载链接
ResNet101 teacher 77.56%/93.64% 173 下载链接
ResNet50 ResNet101 distill 77.29%/93.65% (+0.79%/+0.65%) 99 下载链接

注意:带”_vd”后缀代表该预训练模型使用了Mixup,Mixup相关介绍参考mixup: Beyond Empirical Risk Minimization

1.4 搜索

数据集: ImageNet1000

模型 压缩方法 Top-1/Top-5 Acc 模型体积(MB) GFLOPs 下载
MobileNetV2 - 72.15%/90.65% 15 0.59 下载链接
MobileNetV2 SANAS 71.518%/90.208% (-0.632%/-0.442%) 14 0.295 下载链接

数据集: Cifar10

模型 压缩方法 Acc 模型参数(MB) 下载
Darts - 97.135% 3.767 -
Darts_SA(基于Darts搜索空间) SANAS 97.276%(+0.141%) 3.344(-11.2%) -

Note: MobileNetV2_NAS 的token是:[4, 4, 5, 1, 1, 2, 1, 1, 0, 2, 6, 2, 0, 3, 4, 5, 0, 4, 5, 5, 1, 4, 8, 0, 0]. Darts_SA的token是:[5, 5, 0, 5, 5, 10, 7, 7, 5, 7, 7, 11, 10, 12, 10, 0, 5, 3, 10, 8].

2. 目标检测

2.1 量化

数据集: COCO 2017

模型 压缩方法 数据集 Image/GPU 输入608 Box AP 输入416 Box AP 输入320 Box AP 模型体积(MB) TensorRT时延(V100, ms) 下载
MobileNet-V1-YOLOv3 - COCO 8 29.3 29.3 27.1 95 - 下载链接
MobileNet-V1-YOLOv3 quant_post COCO 8 27.9 (-1.4) 28.0 (-1.3) 26.0 (-1.0) 25 - 下载链接
MobileNet-V1-YOLOv3 quant_aware COCO 8 28.1 (-1.2) 28.2 (-1.1) 25.8 (-1.2) 26.3 - 下载链接
R34-YOLOv3 - COCO 8 36.2 34.3 31.4 162 - 下载链接
R34-YOLOv3 quant_post COCO 8 35.7 (-0.5) - - 42.7 - 下载链接
R34-YOLOv3 quant_aware COCO 8 35.2 (-1.0) 33.3 (-1.0) 30.3 (-1.1) 44 - 下载链接
R50-dcn-YOLOv3 obj365_pretrain - COCO 8 41.4 - - 177 18.56 下载链接
R50-dcn-YOLOv3 obj365_pretrain quant_aware COCO 8 40.6 (-0.8) 37.5 34.1 66 14.64 下载链接

数据集:WIDER-FACE

模型 压缩方法 Image/GPU 输入尺寸 Easy/Medium/Hard 模型体积(MB) 下载
BlazeFace - 8 640 91.5/89.2/79.7 815 下载链接
BlazeFace quant_post 8 640 87.8/85.1/74.9 (-3.7/-4.1/-4.8) 228 下载链接
BlazeFace quant_aware 8 640 90.5/87.9/77.6 (-1.0/-1.3/-2.1) 228 下载链接
BlazeFace-Lite - 8 640 90.9/88.5/78.1 711 下载链接
BlazeFace-Lite quant_post 8 640 89.4/86.7/75.7 (-1.5/-1.8/-2.4) 211 下载链接
BlazeFace-Lite quant_aware 8 640 89.7/87.3/77.0 (-1.2/-1.2/-1.1) 211 下载链接
BlazeFace-NAS - 8 640 83.7/80.7/65.8 244 下载链接
BlazeFace-NAS quant_post 8 640 81.6/78.3/63.6 (-2.1/-2.4/-2.2) 71 下载链接
BlazeFace-NAS quant_aware 8 640 83.1/79.7/64.2 (-0.6/-1.0/-1.6) 71 下载链接

2.2 剪裁

数据集:Pasacl VOC & COCO 2017

PaddleLite推理耗时说明:

环境:Qualcomm SnapDragon 845 + armv8

速度指标:Thread1/Thread2/Thread4耗时

PaddleLite版本: v2.3

模型 压缩方法 数据集 Image/GPU 输入608 Box AP 输入416 Box AP 输入320 Box AP 模型体积(MB) GFLOPs (608*608) PaddleLite推理耗时(ms)(608*608) TensorRT推理速度(FPS)(608*608) 下载
MobileNet-V1-YOLOv3 Baseline Pascal VOC 8 76.2 76.7 75.3 94 40.49 1238\796.943\520.101 60.04 下载链接
MobileNet-V1-YOLOv3 sensitive -52.88% Pascal VOC 8 77.6 (+1.4) 77.7 (1.0) 75.5 (+0.2) 31 19.08 602.497\353.759\222.427 99.36 下载链接
MobileNet-V1-YOLOv3 - COCO 8 29.3 29.3 27.0 95 41.35 - - 下载链接
MobileNet-V1-YOLOv3 sensitive -51.77% COCO 8 26.0 (-3.3) 25.1 (-4.2) 22.6 (-4.4) 32 19.94 - 73.93 下载链接
R50-dcn-YOLOv3 - COCO 8 39.1 - - 177 89.60 - 27.68 下载链接
R50-dcn-YOLOv3 sensitive -9.37% COCO 8 39.3 (+0.2) - - 150 81.20 - 30.08 下载链接
R50-dcn-YOLOv3 sensitive -24.68% COCO 8 37.3 (-1.8) - - 113 67.48 - 34.32 下载链接
R50-dcn-YOLOv3 obj365_pretrain - COCO 8 41.4 - - 177 89.60 - - 下载链接
R50-dcn-YOLOv3 obj365_pretrain sensitive -9.37% COCO 8 40.5 (-0.9) - - 150 81.20 - - 下载链接
R50-dcn-YOLOv3 obj365_pretrain sensitive -24.68% COCO 8 37.8 (-3.3) - - 113 67.48 - - 下载链接

2.3 蒸馏

数据集:Pasacl VOC & COCO 2017

模型 压缩方法 数据集 Image/GPU 输入608 Box AP 输入416 Box AP 输入320 Box AP 模型体积(MB) 下载
MobileNet-V1-YOLOv3 - Pascal VOC 8 76.2 76.7 75.3 94 下载链接
ResNet34-YOLOv3 - Pascal VOC 8 82.6 81.9 80.1 162 下载链接
MobileNet-V1-YOLOv3 ResNet34-YOLOv3 distill Pascal VOC 8 79.0 (+2.8) 78.2 (+1.5) 75.5 (+0.2) 94 下载链接
MobileNet-V1-YOLOv3 - COCO 8 29.3 29.3 27.0 95 下载链接
ResNet34-YOLOv3 - COCO 8 36.2 34.3 31.4 163 下载链接
MobileNet-V1-YOLOv3 ResNet34-YOLOv3 distill COCO 8 31.4 (+2.1) 30.0 (+0.7) 27.1 (+0.1) 95 下载链接

2.4 搜索

数据集:WIDER-FACE

模型 压缩方法 Image/GPU 输入尺寸 Easy/Medium/Hard 模型体积(KB) 硬件延时(ms) 下载
BlazeFace - 8 640 91.5/89.2/79.7 815 71.862 下载链接
BlazeFace-NAS - 8 640 83.7/80.7/65.8 244 21.117 下载链接
BlazeFace-NASV2 SANAS 8 640 87.0/83.7/68.5 389 22.558 下载链接

Note: 硬件延时时间是利用提供的硬件延时表得到的,硬件延时表是在855芯片上基于PaddleLite测试的结果。BlazeFace-NASV2的详细配置在这里.

3. 图像分割

数据集:Cityscapes

3.1 量化

模型 压缩方法 mIoU 模型体积(MB) 下载
DeepLabv3+/MobileNetv1 - 63.26 6.6 下载链接
DeepLabv3+/MobileNetv1 quant_post 58.63 (-4.63) 1.8 下载链接
DeepLabv3+/MobileNetv1 quant_aware 62.03 (-1.23) 1.8 下载链接
DeepLabv3+/MobileNetv2 - 69.81 7.4 下载链接
DeepLabv3+/MobileNetv2 quant_post 67.59 (-2.22) 2.1 下载链接
DeepLabv3+/MobileNetv2 quant_aware 68.33 (-1.48) 2.1 下载链接

图像分割模型Lite时延(ms), 输入尺寸769x769

设备 模型类型 压缩策略 armv7 Thread 1 armv7 Thread 2 armv7 Thread 4 armv8 Thread 1 armv8 Thread 2 armv8 Thread 4
高通835 Deeplabv3- MobileNetV1 FP32 baseline 1227.9894 734.1922 527.9592 1109.96 699.3818 479.0818
高通835 Deeplabv3- MobileNetV1 quant_aware 848.6544 512.785 382.9915 752.3573 455.0901 307.8808
高通835 Deeplabv3- MobileNetV1 quant_post 840.2323 510.103 371.9315 748.9401 452.1745 309.2084
高通835 Deeplabv3-MobileNetV2 FP32 baseline 1282.8126 793.2064 653.6538 1193.9908 737.1827 593.4522
高通835 Deeplabv3-MobileNetV2 quant_aware 976.0495 659.0541 513.4279 892.1468 582.9847 484.7512
高通835 Deeplabv3-MobileNetV2 quant_post 981.44 658.4969 538.6166 885.3273 586.1284 484.0018
高通855 Deeplabv3- MobileNetV1 FP32 baseline 568.8748 339.8578 278.6316 420.6031 281.3197 217.5222
高通855 Deeplabv3- MobileNetV1 quant_aware 608.7578 347.2087 260.653 241.2394 177.3456 143.9178
高通855 Deeplabv3- MobileNetV1 quant_post 609.0142 347.3784 259.9825 239.4103 180.1894 139.9178
高通855 Deeplabv3-MobileNetV2 FP32 baseline 639.4425 390.1851 322.7014 477.7667 339.7411 262.2847
高通855 Deeplabv3-MobileNetV2 quant_aware 703.7275 497.689 417.1296 394.3586 300.2503 239.9204
高通855 Deeplabv3-MobileNetV2 quant_post 705.7589 474.4076 427.2951 394.8352 297.4035 264.6724
麒麟970 Deeplabv3- MobileNetV1 FP32 baseline 1682.1792 1437.9774 1181.0246 1261.6739 1068.6537 690.8225
麒麟970 Deeplabv3- MobileNetV1 quant_aware 1062.3394 1248.1014 878.3157 774.6356 710.6277 528.5376
麒麟970 Deeplabv3- MobileNetV1 quant_post 1109.1917 1339.6218 866.3587 771.5164 716.5255 500.6497
麒麟970 Deeplabv3-MobileNetV2 FP32 baseline 1771.1301 1746.0569 1222.4805 1448.9739 1192.4491 760.606
麒麟970 Deeplabv3-MobileNetV2 quant_aware 1320.2905 921.4522 676.0732 1145.8801 821.5685 590.1713
麒麟970 Deeplabv3-MobileNetV2 quant_post 1320.386 918.5328 672.2481 1020.753 820.094 591.4114

3.2 剪裁

PaddleLite推理耗时说明:

环境:Qualcomm SnapDragon 845 + armv8

速度指标:Thread1/Thread2/Thread4耗时

PaddleLite版本: v2.3

模型 压缩方法 mIoU 模型体积(MB) GFLOPs PaddleLite推理耗时 TensorRT推理速度(FPS) 下载
fast-scnn baseline 69.64 11 14.41 1226.36\682.96\415.664 39.53 下载链接
fast-scnn uniform -17.07% 69.58 (-0.06) 8.5 11.95 1140.37\656.612\415.888 42.01 下载链接
fast-scnn sensitive -47.60% 66.68 (-2.96) 5.7 7.55 866.693\494.467\291.748 51.48 下载链接