# Performance Data

See [benchmark_tools](benchmark_tools) for how to run the benchmarks. The **one-click benchmark** is recommended.

## ARM Test Environment

* Test models
    * fp32 models
        * mobilenet_v1
        * mobilenet_v2
        * squeezenet_v1.1
        * mnasnet
        * shufflenet_v2

    * int8 models
        * mobilenet_v1
        * mobilenet_v2
* Test devices (Android NDK: ndk-r17c)
   *  Snapdragon 855
      * xiaomi mi9, snapdragon 855 (enable sdot instruction)
      * 4xA76(1@2.84GHz + 3@2.4GHz) + 4xA55@1.78GHz

   *  Snapdragon 845
      * xiaomi mi8, snapdragon 845
      * 2.8GHz (four big cores), 1.7GHz (four little cores)

   *  Snapdragon 835
      * xiaomi mix2, snapdragon 835
      * 2.45GHz (four big cores), 1.9GHz (four little cores)

   *  Kirin 970
      * HUAWEI Mate10
 
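* Test notes
    * branch: release/v2.6.0
    * warmup=10, repeats=30; the reported value is the average time, in ms
    * When the thread count is 1, ```DeviceInfo::Global().SetRunMode``` is set to LITE_POWER_HIGH; otherwise it is set to LITE_POWER_NO_BIND
    * The model input has dimensions {1, 3, 224, 224}, and every element of the input is 1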
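
The measurement procedure in the notes above can be sketched as follows. This is a minimal illustration and not the benchmark tool's actual code: `run_once` is a hypothetical stand-in for one model inference, and `power_mode_for`, `make_ones_input`, and `average_latency_ms` are names introduced here for illustration. Only the warmup/repeats counts, the power-mode rule, and the input shape come from the notes.

```python
import time


def power_mode_for(threads):
    # Rule from the test notes: single-thread runs use LITE_POWER_HIGH,
    # all other thread counts use LITE_POWER_NO_BIND.
    # The strings are plain labels here, not calls into any library.
    return "LITE_POWER_HIGH" if threads == 1 else "LITE_POWER_NO_BIND"


def make_ones_input(shape=(1, 3, 224, 224)):
    # Flat list holding the value 1.0 for every element of the input.
    count = 1
    for dim in shape:
        count *= dim
    return [1.0] * count


def average_latency_ms(run_once, warmup=10, repeats=30):
    # Warm-up runs are executed but not timed.
    for _ in range(warmup):
        run_once()
    # Time the measured runs as a whole and report the mean in milliseconds.
    start = time.perf_counter()
    for _ in range(repeats):
        run_once()
    elapsed_s = time.perf_counter() - start
    return elapsed_s * 1000.0 / repeats
```

For example, `average_latency_ms(lambda: sum(make_ones_input()))` times a dummy workload with the same warmup and repeat counts. Whether the real tool times each run separately or the loop as a whole is not stated in this document, so the sketch simply picks one.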
    
## ARM Test Data


### fp32 Model Test Data

#### paddlepaddle model

Snapdragon 855 |armv7 |armv7 |armv7 |armv8 |armv8 |armv8 |
----| ---- | ---- | ---- | ---- | ---- | ---- |
threads num |1 |2 |4 |1 |2 |4 |
mobilenet_v1 |35.11 |20.67 |11.83 |30.56 |18.59 |10.44 |
mobilenet_v2 |26.36 |15.83 |9.29 |21.64 |13.25 |7.95 |
shufflenet_v2 |4.56 |3.14 |2.35 |4.07 |2.89 |2.28 |
squeezenet_v1.1 |21.27 |13.55 |8.49 |18.05 |11.51 |7.83 |
mnasnet |21.40 |13.18 |7.63 |18.84 |11.40 |6.80 |
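
One way to read the table above is to turn the latencies into multi-thread speedup and parallel efficiency. The sketch below does this for a few armv8 rows. The numbers are copied from the table; the helper names `speedup` and `parallel_efficiency` are introduced here for illustration.

```python
# Snapdragon 855, armv8, fp32 paddlepaddle models.
# Latency in ms keyed by thread count, copied from the table above.
LATENCY_MS = {
    "mobilenet_v1": {1: 30.56, 2: 18.59, 4: 10.44},
    "mobilenet_v2": {1: 21.64, 2: 13.25, 4: 7.95},
    "shufflenet_v2": {1: 4.07, 2: 2.89, 4: 2.28},
}


def speedup(latency_by_threads, threads):
    # Ratio of single-thread latency to latency at the given thread count.
    return latency_by_threads[1] / latency_by_threads[threads]


def parallel_efficiency(latency_by_threads, threads):
    # Speedup divided by the thread count; 1.0 would be perfect scaling.
    return speedup(latency_by_threads, threads) / threads


for model, latency in LATENCY_MS.items():
    print(model, round(speedup(latency, 4), 2), round(parallel_efficiency(latency, 4), 2))
```

With these numbers, mobilenet_v1 speeds up by roughly 2.9x at 4 threads, while the much smaller shufflenet_v2 gains only about 1.8x.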

Snapdragon 845 |armv7 |armv7 |armv7 |armv8 |armv8 |armv8 |
----| ---- | ---- | ---- | ---- | ---- | ---- |
threads num |1 |2 |4 |1 |2 |4 |
mobilenet_v1 |65.56 |37.17 |19.65 |63.23 |32.98 |17.68 |
mobilenet_v2 |45.89 |25.20 |14.39 |41.03 |22.94 |12.98 |
shufflenet_v2 |7.31 |4.66 |3.27 |7.08 |4.71 |3.41 |
squeezenet_v1.1 |36.98 |22.53 |13.45 |34.27 |20.96 |12.60 |
mnasnet |39.85 |23.64 |12.25 |37.81 |20.70 |11.81 |


Snapdragon 835 |armv7 |armv7 |armv7 |armv8 |armv8 |armv8 |
----| ---- | ---- | ---- | ---- | ---- | ---- |
threads num |1 |2 |4 |1 |2 |4 |
mobilenet_v1 |92.77 |51.56 |30.14 |87.46 |48.02 |26.42 |
mobilenet_v2 |65.78 |36.52 |22.34 |58.31 |33.04 |19.87 |
shufflenet_v2 |10.39 |6.26 |4.46 |9.72 |6.19 |4.41 |
squeezenet_v1.1 |53.59 |33.16 |20.13 |51.56 |31.81 |19.10 |
mnasnet |57.44 |32.62 |19.47 |54.99 |30.69 |17.98 |

#### caffe model

Snapdragon 855 |armv7 |armv7 |armv7 |armv8 |armv8 |armv8 |
----| ---- | ---- | ---- | ---- | ---- | ---- |
threads num |1 |2 |4 |1 |2 |4 |
mobilenet_v1 |32.38 |18.65 |10.69 |30.75 |18.11 |9.88 |
mobilenet_v2 |29.45 |17.86 |10.81 |26.61 |16.26 |9.67 |
shufflenet_v2 |5.04 |3.14 |2.20 |4.09 |2.85 |2.25 |


Snapdragon 845 |armv7 |armv7 |armv7 |armv8 |armv8 |armv8 |
----| ---- | ---- | ---- | ---- | ---- | ---- |
threads num |1 |2 |4 |1 |2 |4 |
mobilenet_v1 |65.26 |35.19 |19.11 |61.42 |33.15 |17.48 |
mobilenet_v2 |55.59 |31.31 |17.68 |51.54 |29.69 |16.00 |
shufflenet_v2 |7.42 |4.73 |3.33 |7.18 |4.75 |3.39 |


Snapdragon 835 |armv7 |armv7 |armv7 |armv8 |armv8 |armv8 |
----| ---- | ---- | ---- | ---- | ---- | ---- |
threads num |1 |2 |4 |1 |2 |4 |
mobilenet_v1 |95.38 |52.16 |30.37 |92.10 |46.71 |26.31 |
mobilenet_v2 |82.89 |45.49 |28.14 |74.91 |41.88 |25.25 |
shufflenet_v2 |10.25 |6.36 |4.42 |9.68 |6.20 |4.42 |

### int8 Quantized Model Test Data

Snapdragon 855 |armv7 |armv7 |armv7 |armv8 |armv8 |armv8 |
----| ---- | ---- | ---- | ---- | ---- | ---- |
threads num |1 |2 |4 |1 |2 |4 |
mobilenet_v1 |37.18 |21.71 |11.16 |14.41 |8.34 |4.37 |
mobilenet_v2 |27.95 |16.57 |8.97 |13.68 |8.16 |4.67 |
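
The int8 numbers above can be compared with the fp32 numbers for the same device. The sketch below computes the single-thread ratio for Snapdragon 855, assuming the fp32 "paddlepaddle model" table is the right baseline; the document does not say which fp32 model the int8 models were derived from, so treat the pairing as an assumption. `int8_speedup` is a helper name introduced here.

```python
# Snapdragon 855, single thread, latency in ms.
# fp32 values come from the fp32 "paddlepaddle model" table (assumed baseline);
# int8 values come from the table above.
FP32_MS = {
    ("mobilenet_v1", "armv7"): 35.11,
    ("mobilenet_v1", "armv8"): 30.56,
    ("mobilenet_v2", "armv7"): 26.36,
    ("mobilenet_v2", "armv8"): 21.64,
}
INT8_MS = {
    ("mobilenet_v1", "armv7"): 37.18,
    ("mobilenet_v1", "armv8"): 14.41,
    ("mobilenet_v2", "armv7"): 27.95,
    ("mobilenet_v2", "armv8"): 13.68,
}


def int8_speedup(model, arch):
    # Values above 1.0 mean the int8 model is faster than the fp32 one.
    return FP32_MS[(model, arch)] / INT8_MS[(model, arch)]


for key in sorted(FP32_MS):
    print(key, round(int8_speedup(*key), 2))
```

Under that assumption, int8 is roughly 2.1x faster than fp32 for mobilenet_v1 on armv8, while on armv7 the ratio is slightly below 1.0, meaning int8 is marginally slower there.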

Snapdragon 835 |armv7 |armv7 |armv7 |armv8 |armv8 |armv8 |
----| ---- | ---- | ---- | ---- | ---- | ---- |
threads num |1 |2 |4 |1 |2 |4 |
mobilenet_v1 |61.63 |32.60 |16.49 |57.36 |29.74 |15.50 |
mobilenet_v2 |47.13 |25.62 |13.56 |41.87 |22.42 |11.72 |


Kirin 970 |armv7 |armv7 |armv7 |armv8 |armv8 |armv8 |
----| ---- | ---- | ---- | ---- | ---- | ---- |
threads num |1 |2 |4 |1 |2 |4 |
mobilenet_v1 |63.13 |32.63 |16.85 |58.92 |29.96 |15.42 |
mobilenet_v2 |48.60 |25.43 |13.76 |43.06 |22.10 |12.09 |


## Huawei Kirin NPU Test Environment

* Test models
    * fp32 models
        * mobilenet_v1
        * mobilenet_v2
        * squeezenet_v1.1
        * mnasnet

* Test devices (Android NDK: ndk-r17c)
   *  Kirin 810
      * HUAWEI Nova5, Kirin 810
      * 2 x Cortex-A76 2.27GHz + 6 x Cortex-A55 1.88GHz

   *  Kirin 990
      * HUAWEI Mate 30, Kirin 990
      * 2 x Cortex-A76 Based 2.86GHz + 2 x Cortex-A76 Based 2.09GHz + 4 x Cortex-A55 1.86GHz

   *  Kirin 990 5G
      * HUAWEI P40, Kirin 990 5G
      * 2 x Cortex-A76 Based 2.86GHz + 2 x Cortex-A76 Based 2.36GHz + 4 x Cortex-A55 1.95GHz

* HiAI DDK version: 310
 
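* Test notes
    * branch: release/v2.6.1
    * warmup=10, repeats=30; the reported value is the average time, in ms
    * The thread count is 1, and ```DeviceInfo::Global().SetRunMode``` is set to LITE_POWER_HIGH
    * The model input has dimensions {1, 3, 224, 224}, and every element of the input is 1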
    
## Huawei Kirin NPU Test Data

#### paddlepaddle model

|Kirin |810 |810 |990 |990 |990 5G |990 5G |
|---|---|---|---|---|---|---|
| |cpu(ms) |npu(ms) |cpu(ms) |npu(ms) |cpu(ms) |npu(ms) |
|mobilenet_v1 |33.84 |3.10 |31.91 |4.07 |33.97 |3.20 |
|mobilenet_v2 |23.32 |3.51 |22.47 |5.61 |23.17 |3.51 |
|squeezenet |18.47 |4.35 |17.79 |5.05 |18.65 |3.47 |
|mnasnet |20.24 |3.28 |19.54 |5.17 |20.34 |3.32 |
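
The CPU and NPU columns above can be summarized as a per-model speedup. The sketch below is illustrative only: the values are copied from the table, and `npu_speedup` is a helper name introduced here.

```python
# Latency in ms as (cpu, npu) pairs per SoC, copied from the table above.
NPU_TABLE_MS = {
    "mobilenet_v1": {"810": (33.84, 3.10), "990": (31.91, 4.07), "990 5G": (33.97, 3.20)},
    "mobilenet_v2": {"810": (23.32, 3.51), "990": (22.47, 5.61), "990 5G": (23.17, 3.51)},
    "squeezenet": {"810": (18.47, 4.35), "990": (17.79, 5.05), "990 5G": (18.65, 3.47)},
    "mnasnet": {"810": (20.24, 3.28), "990": (19.54, 5.17), "990 5G": (20.34, 3.32)},
}


def npu_speedup(model, soc):
    # CPU latency divided by NPU latency; larger means the NPU helps more.
    cpu_ms, npu_ms = NPU_TABLE_MS[model][soc]
    return cpu_ms / npu_ms


for model, by_soc in NPU_TABLE_MS.items():
    print(model, {soc: round(npu_speedup(model, soc), 1) for soc in by_soc})
```

For mobilenet_v1 this gives a speedup of about 10.9x on Kirin 810 and about 7.8x on Kirin 990.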