# 性能数据 可以参考[benchmark_tools](benchmark_tools),推荐**一键benchmark**。 ## ARM测试环境 * 测试模型 * fp32模型 * mobilenet_v1 * mobilenet_v2 * squeezenet_v1.1 * mnasnet * shufflenet_v2 * int8模型 * mobilenet_v1 * mobilenet_v2 * 测试机器(android ndk ndk-r17c) * 骁龙855 * xiaomi mi9, snapdragon 855 (enable sdot instruction) * 4xA76(1@2.84GHz + 3@2.4GHz) + 4xA55@1.78GHz * 骁龙845 * xiaomi mi8, 845 * 2.8GHz(大四核),1.7GHz(小四核) * 骁龙835 * xiaomi mix2, snapdragon 835 * 2.45GHz(大四核),1.9GHz(小四核) * 麒麟970 * HUAWEI Mate10 * 测试说明 * branch: release/v2.6.0 * warmup=10, repeats=30,统计平均时间,单位是ms * 当线程数为1时,```DeviceInfo::Global().SetRunMode```设置LITE_POWER_HIGH,否者设置LITE_POWER_NO_BIND * 模型的输入图像的维度是{1, 3, 224, 224},输入图像的每一位数值是1 ## ARM测试数据 ### fp32模型测试数据 #### paddlepaddle model 骁龙855|armv7 | armv7 | armv7 |armv8 | armv8 |armv8 ----| ---- | ---- | ---- | ---- |---- |---- threads num|1 |2 |4 |1 |2 |4 mobilenet_v1 |35.11 |20.67 |11.83 |30.56 |18.59 |10.44 | mobilenet_v2 |26.36 |15.83 |9.29 |21.64 |13.25 |7.95 | shufflenet_v2 |4.56 |3.14 |2.35 |4.07 |2.89 |2.28 | squeezenet_v1.1 |21.27 |13.55 |8.49 |18.05 |11.51 |7.83 | mnasnet |21.40 |13.18 |7.63 |18.84 |11.40 |6.80 | 骁龙845|armv7 | armv7 | armv7 |armv8 | armv8 |armv8 ----| ---- | ---- | ---- | ---- |---- |---- threads num|1 |2 |4 |1 |2 |4 mobilenet_v1 |65.56 |37.17 |19.65 |63.23 |32.98 |17.68 | mobilenet_v2 |45.89 |25.20 |14.39 |41.03 |22.94 |12.98 | shufflenet_v2 |7.31 |4.66 |3.27 |7.08 |4.71 |3.41 | squeezenet_v1.1 |36.98 |22.53 |13.45 |34.27 |20.96 |12.60 | mnasnet |39.85 |23.64 |12.25 |37.81 |20.70 |11.81 | 骁龙835|armv7 | armv7 | armv7 |armv8 | armv8 |armv8 ----| ---- | ---- | ---- | ---- |---- |---- threads num|1 |2 |4 |1 |2 |4 mobilenet_v1 |92.77 |51.56 |30.14 |87.46 |48.02 |26.42 | mobilenet_v2 |65.78 |36.52 |22.34 |58.31 |33.04 |19.87 | shufflenet_v2 |10.39 |6.26 |4.46 |9.72 |6.19 |4.41 | squeezenet_v1.1 |53.59 |33.16 |20.13 |51.56 |31.81 |19.10 | mnasnet |57.44 |32.62 |19.47 |54.99 |30.69 |17.98 | #### caffe model 骁龙855|armv7 | armv7 | armv7 |armv8 | armv8 |armv8 ----| ---- | ---- | ---- | ---- |---- |---- threads num|1 |2 |4 |1 |2 |4 | mobilenet_v1 |32.38 |18.65 |10.69 |30.75 |18.11 |9.88 | mobilenet_v2 |29.45 |17.86 |10.81 |26.61 |16.26 |9.67 | shufflenet_v2 |5.04 |3.14 |2.20 |4.09 |2.85 |2.25 | 骁龙845|armv7 | armv7 | armv7 |armv8 | armv8 |armv8 ----| ---- | ---- | ---- | ---- |---- |---- threads num|1 |2 |4 |1 |2 |4 | mobilenet_v1 |65.26 |35.19 |19.11 |61.42 |33.15 |17.48 | mobilenet_v2 |55.59 |31.31 |17.68 |51.54 |29.69 |16.00 | shufflenet_v2 |7.42 |4.73 |3.33 |7.18 |4.75 |3.39 | 骁龙835|armv7 | armv7 | armv7 |armv8 | armv8 |armv8 ----| ---- | ---- | ---- | ---- |---- |---- threads num|1 |2 |4 |1 |2 |4 | mobilenet_v1 |95.38 |52.16 |30.37 |92.10 |46.71 |26.31 | mobilenet_v2 |82.89 |45.49 |28.14 |74.91 |41.88 |25.25 | shufflenet_v2 |10.25 |6.36 |4.42 |9.68 |6.20 |4.42 | #### int8量化模型测试数据 骁龙855|armv7 | armv7 | armv7 |armv8 | armv8 |armv8 ----| ---- | ---- | ---- | ---- |---- |---- threads num|1 |2 |4 |1 |2 |4 | mobilenet_v1 |37.18 |21.71 |11.16 | 14.41 |8.34 |4.37 | mobilenet_v2 |27.95 |16.57 |8.97 | 13.68 |8.16 |4.67 | 骁龙835|armv7 | armv7 | armv7 |armv8 | armv8 |armv8 ----| ---- | ---- | ---- | ---- |---- |---- threads num|1 |2 |4 |1 |2 |4 | mobilenet_v1 |61.63 |32.60 |16.49 |57.36 |29.74 |15.50 | mobilenet_v2 |47.13 |25.62 |13.56 |41.87 |22.42 |11.72 | 麒麟970|armv7 | armv7 | armv7 |armv8 | armv8 |armv8 ----| ---- | ---- | ---- | ---- |---- |---- threads num|1 |2 |4 |1 |2 |4 | mobilenet_v1 |63.13 |32.63 |16.85 |58.92 |29.96 |15.42 | mobilenet_v2 |48.60 |25.43 |13.76 |43.06 |22.10 |12.09 | ## 华为麒麟NPU测试环境 * 测试模型 * fp32模型 * mobilenet_v1 * mobilenet_v2 * squeezenet_v1.1 * mnasnet * 测试机器(android ndk ndk-r17c) * 麒麟810 * HUAWEI Nova5, Kirin 810 * 2xCortex A76 2.27GHz + 6xCortex A55 1.88GHz * 麒麟990 * HUAWEI Mate 30, Kirin 990 * 2 x Cortex-A76 Based 2.86 GHz + 2 x Cortex-A76 Based 2.09 GHz + 4 x Cortex-A55 1.86 GHz * 麒麟990 5G * HUAWEI P40, Kirin 990 5G * 2 x Cortex-A76 Based 2.86GHz + 2 x Cortex-A76 Based 2.36GHz + 4 x Cortex-A55 1.95GHz * HIAI ddk 版本: 310 or 320 * 测试说明 * branch: release/v2.6.1 * warmup=10, repeats=30,统计平均时间,单位是ms * 线程数为1,```DeviceInfo::Global().SetRunMode```设置LITE_POWER_HIGH * 模型的输入图像的维度是{1, 3, 224, 224},输入图像的每一位数值是1 ## 华为麒麟NPU测试数据 #### paddlepaddle model - ddk 310 |Kirin |810||990||990 5G|| |---|---|---|---|---|---|---| | |cpu(ms) | npu(ms) |cpu(ms) | npu(ms) |cpu(ms) | npu(ms) | |mobilenet_v1| 41.20| 12.76| 31.91| 4.07| 33.97| 3.20| |mobilenet_v2| 29.57| 12.12| 22.47| 5.61| 23.17| 3.51| |squeezenet| 23.96| 9.04| 17.79| 3.82| 18.65| 3.01| |mnasnet| 26.47| 13.62| 19.54| 5.17| 20.34| 3.32| - ddk 320 |模型 |990||990-5G|| |---|---|---|---|---| ||cpu(ms) | npu(ms) |cpu(ms) | npu(ms) | |ssd_mobilenetv1| 65.67| 18.21| 71.8| 16.6| *说明:ssd_mobilenetv1的npu性能为npu、cpu混合调度运行的总时间*