# Performance data

Refer to [benchmark_tools](benchmark_tools) for how to run the benchmarks; the **one-click benchmark** is recommended.

## ARM test environment

* Test models
    * fp32 models
        * mobilenet_v1
        * mobilenet_v2
        * squeezenet_v1.1
        * mnasnet
        * shufflenet_v2
    
    * int8 models
        * mobilenet_v1
        * mobilenet_v2

* Test devices (Android NDK r17c)
   * Snapdragon 855
      * xiaomi mi9, snapdragon 855 (sdot instruction enabled)
      * 4xA76(1@2.84GHz + 3@2.4GHz) + 4xA55@1.78GHz

   * Snapdragon 845
      * xiaomi mi8, snapdragon 845
      * 2.8GHz (big quad-core), 1.7GHz (little quad-core)

   * Snapdragon 835
      * xiaomi mix2, snapdragon 835
      * 2.45GHz (big quad-core), 1.9GHz (little quad-core)

   * Kirin 970
      * HUAWEI Mate10
 
* Test notes
    * branch: release/v2.6.0
    * warmup=10, repeats=30; the reported value is the average latency, in ms
    * With 1 thread, `DeviceInfo::Global().SetRunMode` is set to LITE_POWER_HIGH; otherwise it is set to LITE_POWER_NO_BIND
    * The model input has shape {1, 3, 224, 224}, and every element of the input is 1
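
The measurement procedure in these notes (power mode chosen by thread count, an all-ones {1, 3, 224, 224} input, 10 warmup runs, then the mean of 30 timed runs) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not Paddle-Lite's benchmark tool: `run_once` is a hypothetical placeholder for one model forward pass (it ignores the thread count), the power-mode names are plain strings standing in for the Paddle-Lite constants of the same name, and the input is a flat list rather than a real tensor.

```python
import time
from typing import Callable, List, Sequence

# Plain-string stand-ins for the Paddle-Lite power modes of the same name
# (assumption: the real constants come from the Paddle-Lite API).
LITE_POWER_HIGH = "LITE_POWER_HIGH"
LITE_POWER_NO_BIND = "LITE_POWER_NO_BIND"


def choose_power_mode(threads: int) -> str:
    """1 thread -> LITE_POWER_HIGH; any other thread count -> LITE_POWER_NO_BIND."""
    return LITE_POWER_HIGH if threads == 1 else LITE_POWER_NO_BIND


def make_input(shape: Sequence[int] = (1, 3, 224, 224)) -> List[float]:
    """Flat NCHW input buffer with every element set to 1."""
    size = 1
    for dim in shape:
        size *= dim
    return [1.0] * size


def average_latency_ms(run: Callable[[], object], warmup: int = 10, repeats: int = 30) -> float:
    """Run `warmup` untimed iterations, then return the mean of `repeats` timed runs in ms."""
    for _ in range(warmup):
        run()
    total_ms = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        total_ms += (time.perf_counter() - start) * 1000.0
    return total_ms / repeats


if __name__ == "__main__":
    data = make_input()

    # Hypothetical placeholder for one forward pass of the model under test;
    # a real run would configure the predictor with `threads` and the chosen mode.
    def run_once() -> float:
        return sum(data)

    for threads in (1, 2, 4):
        mode = choose_power_mode(threads)
        print(f"threads={threads} mode={mode} avg={average_latency_ms(run_once):.2f} ms")
```

Warmup runs are excluded from the average so that one-time costs such as memory allocation and cache warm-up do not inflate the reported latency.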
    
## ARM test data


### fp32 model test data

#### paddlepaddle model

Snapdragon 855|armv7 |armv7 |armv7 |armv8 |armv8 |armv8
----|----|----|----|----|----|----
threads|1 |2 |4 |1 |2 |4
mobilenet_v1 |35.11 |20.67 |11.83 |30.56 |18.59 |10.44 |
mobilenet_v2 |26.36 |15.83 |9.29 |21.64 |13.25 |7.95 |
shufflenet_v2 |4.56 |3.14 |2.35 |4.07 |2.89 |2.28 |
squeezenet_v1.1 |21.27 |13.55 |8.49 |18.05 |11.51 |7.83 |
mnasnet |21.40 |13.18 |7.63 |18.84 |11.40 |6.80 |

Snapdragon 845|armv7 |armv7 |armv7 |armv8 |armv8 |armv8
----|----|----|----|----|----|----
threads|1 |2 |4 |1 |2 |4
mobilenet_v1 |65.56 |37.17 |19.65 |63.23 |32.98 |17.68 |
mobilenet_v2 |45.89 |25.20 |14.39 |41.03 |22.94 |12.98 |
shufflenet_v2 |7.31 |4.66 |3.27 |7.08 |4.71 |3.41 |
squeezenet_v1.1 |36.98 |22.53 |13.45 |34.27 |20.96 |12.60 |
mnasnet |39.85 |23.64 |12.25 |37.81 |20.70 |11.81 |


Snapdragon 835|armv7 |armv7 |armv7 |armv8 |armv8 |armv8
----|----|----|----|----|----|----
threads|1 |2 |4 |1 |2 |4
mobilenet_v1 |92.77 |51.56 |30.14 |87.46 |48.02 |26.42 |
mobilenet_v2 |65.78 |36.52 |22.34 |58.31 |33.04 |19.87 |
shufflenet_v2 |10.39 |6.26 |4.46 |9.72 |6.19 |4.41 |
squeezenet_v1.1 |53.59 |33.16 |20.13 |51.56 |31.81 |19.10 |
mnasnet |57.44 |32.62 |19.47 |54.99 |30.69 |17.98 |

#### caffe model

Snapdragon 855|armv7 |armv7 |armv7 |armv8 |armv8 |armv8
----|----|----|----|----|----|----
threads|1 |2 |4 |1 |2 |4
mobilenet_v1 |32.38 |18.65 |10.69 |30.75 |18.11 |9.88 |
mobilenet_v2 |29.45 |17.86 |10.81 |26.61 |16.26 |9.67 |
shufflenet_v2 |5.04 |3.14 |2.20 |4.09 |2.85 |2.25 |


Snapdragon 845|armv7 |armv7 |armv7 |armv8 |armv8 |armv8
----|----|----|----|----|----|----
threads|1 |2 |4 |1 |2 |4
mobilenet_v1 |65.26 |35.19 |19.11 |61.42 |33.15 |17.48 |
mobilenet_v2 |55.59 |31.31 |17.68 |51.54 |29.69 |16.00 |
shufflenet_v2 |7.42 |4.73 |3.33 |7.18 |4.75 |3.39 |


Snapdragon 835|armv7 |armv7 |armv7 |armv8 |armv8 |armv8
----|----|----|----|----|----|----
threads|1 |2 |4 |1 |2 |4
mobilenet_v1 |95.38 |52.16 |30.37 |92.10 |46.71 |26.31 |
mobilenet_v2 |82.89 |45.49 |28.14 |74.91 |41.88 |25.25 |
shufflenet_v2 |10.25 |6.36 |4.42 |9.68 |6.20 |4.42 |

### int8 quantized model test data

Snapdragon 855|armv7 |armv7 |armv7 |armv8 |armv8 |armv8
----|----|----|----|----|----|----
threads|1 |2 |4 |1 |2 |4
mobilenet_v1 |37.18 |21.71 |11.16 |14.41 |8.34 |4.37 |
mobilenet_v2 |27.95 |16.57 |8.97 |13.68 |8.16 |4.67 |


Snapdragon 835|armv7 |armv7 |armv7 |armv8 |armv8 |armv8
----|----|----|----|----|----|----
threads|1 |2 |4 |1 |2 |4
mobilenet_v1 |61.63 |32.60 |16.49 |57.36 |29.74 |15.50 |
mobilenet_v2 |47.13 |25.62 |13.56 |41.87 |22.42 |11.72 |


Kirin 970|armv7 |armv7 |armv7 |armv8 |armv8 |armv8
----|----|----|----|----|----|----
threads|1 |2 |4 |1 |2 |4
mobilenet_v1 |63.13 |32.63 |16.85 |58.92 |29.96 |15.42 |
mobilenet_v2 |48.60 |25.43 |13.76 |43.06 |22.10 |12.09 |


## Huawei Kirin NPU test environment

* Test models
    * fp32 models
        * mobilenet_v1
        * mobilenet_v2
        * squeezenet_v1.1
        * mnasnet

* Test devices (Android NDK r17c)
   * Kirin 810
      * HUAWEI Nova5, Kirin 810
      * 2 x Cortex-A76 2.27GHz + 6 x Cortex-A55 1.88GHz

   * Kirin 990
      * HUAWEI Mate 30, Kirin 990
      * 2 x Cortex-A76 Based 2.86 GHz + 2 x Cortex-A76 Based 2.09 GHz + 4 x Cortex-A55 1.86 GHz

   * Kirin 990 5G
      * HUAWEI P40, Kirin 990 5G
      * 2 x Cortex-A76 Based 2.86GHz + 2 x Cortex-A76 Based 2.36GHz + 4 x Cortex-A55 1.95GHz

* HiAI DDK version: 310 or 320
 
* Test notes
    * branch: release/v2.6.1
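    * warmup=10, repeats=30; the reported value is the average latency, in ms
    * The thread count is 1, and `DeviceInfo::Global().SetRunMode` is set to LITE_POWER_HIGH
    * The model input has shape {1, 3, 224, 224}, and every element of the input is 1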
    
## Huawei Kirin NPU test data

#### paddlepaddle model

- ddk 310

|Kirin |810 |810 |990 |990 |990 5G |990 5G |
|---|---|---|---|---|---|---|
|model |cpu(ms) |npu(ms) |cpu(ms) |npu(ms) |cpu(ms) |npu(ms) |
|mobilenet_v1 |41.20 |12.76 |31.91 |4.07 |33.97 |3.20 |
|mobilenet_v2 |29.57 |12.12 |22.47 |5.61 |23.17 |3.51 |
|squeezenet |23.96 |9.04 |17.79 |3.82 |18.65 |3.01 |
|mnasnet |26.47 |13.62 |19.54 |5.17 |20.34 |3.32 |
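
The DDK 310 table reports absolute latencies; comparing chips is often easier when each pair is read as a CPU-to-NPU speedup, `cpu_ms / npu_ms` (the CPU figures are single-thread, per the test notes). The following minimal Python sketch computes that ratio from the numbers in the table; the `DDK310_MS` dictionary and the `speedup` helper are illustrative names, not part of Paddle-Lite.

```python
# CPU and NPU latencies in ms, copied from the DDK 310 table: (cpu_ms, npu_ms).
DDK310_MS = {
    "mobilenet_v1": {"810": (41.20, 12.76), "990": (31.91, 4.07), "990 5G": (33.97, 3.20)},
    "mobilenet_v2": {"810": (29.57, 12.12), "990": (22.47, 5.61), "990 5G": (23.17, 3.51)},
    "squeezenet": {"810": (23.96, 9.04), "990": (17.79, 3.82), "990 5G": (18.65, 3.01)},
    "mnasnet": {"810": (26.47, 13.62), "990": (19.54, 5.17), "990 5G": (20.34, 3.32)},
}


def speedup(cpu_ms: float, npu_ms: float) -> float:
    """How many times faster the NPU run is than the single-thread CPU run."""
    if npu_ms <= 0:
        raise ValueError("npu_ms must be positive")
    return cpu_ms / npu_ms


if __name__ == "__main__":
    for model, per_chip in DDK310_MS.items():
        cells = ", ".join(f"{chip}: {speedup(*ms):.2f}x" for chip, ms in per_chip.items())
        print(f"{model}: {cells}")
```

For example, mobilenet_v1 on Kirin 810 gives 41.20 / 12.76 ≈ 3.23x.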


- ddk 320

|Kirin |990 |990 |990 5G |990 5G |
|---|---|---|---|---|
|model |cpu(ms) |npu(ms) |cpu(ms) |npu(ms) |
|ssd_mobilenetv1 |65.67 |18.21 |71.8 |16.6 |


*Note: for ssd_mobilenetv1, the NPU figure is the total run time with execution scheduled across both the NPU and the CPU.*