Unverified commit a569c528, authored by yuyang18

Add anakin doc

Parent 09ba7c8f
# Anakin ARM Benchmark
## Test environment and parameters:
+ Test models: Mobilenetv1, mobilenetv2, mobilenet-ssd
+ Cross-compiled with the android ndk, gcc 4.9, neon enabled, ABI: armeabi-v7a with neon -mfloat-abi=softfp
+ Test platforms
  - Honor V9 (rooted): processor: Kirin 960, 4 big cores at 2.36GHz, 4 little cores at 1.8GHz
  - nubia z17: processor: Snapdragon 835, 4 big cores at 2.36GHz, 4 little cores at 1.9GHz
  - 360 N5: processor: Snapdragon 653, 4 big cores at 1.8GHz, 4 little cores at 1.4GHz
+ Multithreading: openmp
+ Timing: 10 warm-up runs, then the mean of 10 timed runs
+ ncnn version: commit ID 307a77f04be29875f40d337cfff6df747df09de6 on the github master branch (msg: convert LogisticRegressionOutput)
+ TFlite version: commit ID 65c05bc2ac19f51f7027e66350bc71652662125c on the github master branch (msg: Removed unneeded file copy that was causing failure in Pi builds)

This benchmark compares the performance of **`ncnn`**, **`TFlite`**, and **`Anakin`**.
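
The timing rule listed above (10 warm-up runs, then the mean of 10 timed runs) can be sketched as follows. This is a minimal illustration, not the code used by `benchmark_arm`: the function name `benchmark` and the stand-in workload are assumptions made for this example.

```python
import time


def benchmark(run_once, warmup=10, runs=10):
    """Return the mean latency of run_once in milliseconds (hypothetical helper)."""
    # Warm-up iterations are executed but not timed.
    for _ in range(warmup):
        run_once()
    # Timed iterations: measure the total wall-clock time, then average it.
    start = time.perf_counter()
    for _ in range(runs):
        run_once()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs


if __name__ == "__main__":
    # Stand-in workload; a real measurement would call the inference engine here.
    mean_ms = benchmark(lambda: sum(range(100_000)))
    print(f"mean latency: {mean_ms:.3f} ms")
```

Discarding the warm-up runs keeps one-off costs such as cache misses and lazy initialization out of the reported mean.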
## Benchmark model
> Note: before benchmarking, convert the test models to Anakin models with the [External Converter](#10003).
> These models are tested on ARM with multiple threads and a batch size of 1.
- [Mobilenet v1](#11) *caffe model can be downloaded [here](https://github.com/shicai/MobileNet-Caffe)*
- [Mobilenet v2](#22) *caffe model can be downloaded [here](https://github.com/shicai/MobileNet-Caffe)*
- [mobilenet-ssd](#33) *caffe model can be downloaded [here](https://github.com/chuanqi305/MobileNet-SSD)*
### <span id = '11'> mobilenetv1 </span>
|platform | Anakin (1) | Anakin (2) | Anakin (4) | ncnn (1) | ncnn (2) | ncnn (4) | TFlite (1) | TFlite (2) | TFlite (4)|
|:---: | :---: | :---: | :---:| :---:| :---:| :---:| :---:| :---:| :---:|
|Kirin 960|107.7ms|61.1ms|38.2ms|152.8ms|85.2ms|51.9ms|152.6ms|nan|nan|
|Snapdragon 835|105.7ms|63.1ms|~~46.8ms~~|152.7ms|87.0ms|~~92.7ms~~|146.9ms|nan|nan|
|Snapdragon 653|120.3ms|64.2ms|46.6ms|202.5ms|117.6ms|84.8ms|158.6ms|nan|nan|
### <span id = '22'> mobilenetv2 </span>
|platform | Anakin (1) | Anakin (2) | Anakin (4) | ncnn (1) | ncnn (2) | ncnn (4) | TFlite (1) | TFlite (2) | TFlite (4)|
|:---: | :---: | :---: | :---:| :---:| :---:| :---:| :---:| :---:| :---:|
|Kirin 960|93.1ms|53.9ms|34.8ms|144.4ms|84.3ms|55.3ms|100.6ms|nan|nan|
|Snapdragon 835|93.0ms|55.6ms|41.1ms|139.1ms|88.4ms|58.1ms|95.2ms|nan|nan|
|Snapdragon 653|106.6ms|64.2ms|48.0ms|199.9ms|125.1ms|98.9ms|108.5ms|nan|nan|
### <span id = '33'> mobilenet-ssd </span>
|platform | Anakin (1) | Anakin (2) | Anakin (4) | ncnn (1) | ncnn (2) | ncnn (4) | TFlite (1) | TFlite (2) | TFlite (4)|
|:---: | :---: | :---: | :---:| :---:| :---:| :---:| :---:| :---:| :---:|
|Kirin 960|213.9ms|120.5ms|74.5ms|307.9ms|166.5ms|104.2ms|nan|nan|nan|
|Snapdragon 835|213.0ms|125.7ms|~~98.4ms~~|292.9ms|177.9ms|~~167.8ms~~|nan|nan|nan|
|Snapdragon 653|236.0ms|129.6ms|96.0ms|377.7ms|228.9ms|165.0ms|nan|nan|nan|
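
To read the tables above as ratios rather than raw latencies, the short sketch below computes two derived figures from the mobilenetv1 numbers in the 麒麟960 (Kirin 960) row: how much faster Anakin is than ncnn at the same thread count, and how well Anakin scales with more threads. The helper names `speedup` and `thread_scaling` are made up for this example; only the latencies come from the table.

```python
# Latencies in ms by thread count, copied from the mobilenetv1 table (Kirin 960 row).
anakin = {1: 107.7, 2: 61.1, 4: 38.2}
ncnn = {1: 152.8, 2: 85.2, 4: 51.9}


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


def thread_scaling(latency_by_threads):
    """Speedup of each thread count relative to the single-thread latency."""
    single = latency_by_threads[1]
    return {threads: single / ms for threads, ms in latency_by_threads.items()}


if __name__ == "__main__":
    for threads in sorted(anakin):
        ratio = speedup(ncnn[threads], anakin[threads])
        print(f"{threads} thread(s): Anakin vs ncnn speedup = {ratio:.2f}x")
    for threads, scale in sorted(thread_scaling(anakin).items()):
        print(f"Anakin with {threads} thread(s): {scale:.2f}x over 1 thread")
```

With these numbers, single-thread Anakin comes out at roughly 1.42x the speed of ncnn, and four threads give Anakin about 2.82x over one thread.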
## How to run those Benchmark models?
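1. First, convert the caffe model with the [External Converter](../docs/Manual/Converter_en.md).
2. Then upload the converted Anakin model and the compiled benchmark_arm binary to the test device with the 'adb push' command.
3. Next, in the directory on the test device that contains the Anakin model, run './benchmark_arm ./ anakin_model.anakin.bin 1 10 10 1'.
4. Finally, the model's run time is printed to the terminal.
5. Running './benchmark_arm' shows the number and meaning of the command's arguments.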
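
Steps 2 and 3 above can be scripted from the host machine. The sketch below only assembles the `adb` command lines and prints them. Several details are assumptions and not taken from the original text: the device directory `/data/local/tmp`, the extra `chmod +x` step, and the helper names `build_commands` and `run_all`. The benchmark arguments are copied verbatim from step 3.

```python
import subprocess

# Assumption: a writable directory on the test device; adjust as needed.
DEVICE_DIR = "/data/local/tmp"


def build_commands(model="anakin_model.anakin.bin",
                   binary="benchmark_arm",
                   device_dir=DEVICE_DIR):
    """Return the adb invocations for uploading and running the benchmark."""
    # Argument values "1 10 10 1" are taken as-is from step 3; run the binary
    # on its own to see what each argument means.
    run = (f"cd {device_dir} && chmod +x {binary} && "
           f"./{binary} ./ {model} 1 10 10 1")
    return [
        ["adb", "push", model, device_dir],
        ["adb", "push", binary, device_dir],
        ["adb", "shell", run],
    ]


def run_all(commands):
    """Execute each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them; call run_all() with a
    # device attached to actually execute the benchmark.
    for cmd in build_commands():
        print(" ".join(cmd))
```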
../../../anakin/examples/example_introduction_cn.md
\ No newline at end of file
# Anakin GPU Benchmark
## Machine:
> CPU: `12-core Intel(R) Xeon(R) CPU E5-2620 v2 @2.10GHz`
> GPU: `Tesla P4`
> cuDNN: `v7`
## Counterpart of Anakin:
The counterpart of **`Anakin`** is **`NVIDIA TensorRT 3`**, an acknowledged high-performance inference engine. For models that TensorRT 3 does not support, we use custom plugins.
## Benchmark Model
The following convolutional neural networks are tested with both `Anakin` and `TensorRT3`.
You can use a pretrained caffe model or a model you trained yourself.
> Please note that you should convert the caffe model (or other models) into an anakin model with the help of the [`external converter ->`](../docs/Manual/Converter_en.md)
- [Vgg16](#1) *caffe model can be found [here->](https://gist.github.com/jimmie33/27c1c0a7736ba66c2395)*
- [Yolo](#2) *caffe model can be found [here->](https://github.com/hojel/caffe-yolo-model)*
- [Resnet50](#3) *caffe model can be found [here->](https://github.com/KaimingHe/deep-residual-networks#models)*
- [Resnet101](#4) *caffe model can be found [here->](https://github.com/KaimingHe/deep-residual-networks#models)*
- [Mobilenet v1](#5) *caffe model can be found [here->](https://github.com/shicai/MobileNet-Caffe)*
- [Mobilenet v2](#6) *caffe model can be found [here->](https://github.com/shicai/MobileNet-Caffe)*
- [RNN](#7) *not supported yet*

We tested them on a single GPU with a single thread.
### <span id = '1'>VGG16 </span>
- Latency (`ms`) for different batch sizes
BatchSize | TensorRT | Anakin
:---: | :---: | :---: |
1 | 8.8690 | 8.2815
2 | 15.5344 | 13.9116
4 | 26.6000 | 21.8747
8 | 49.8279 | 40.4076
32 | 188.6270 | 163.7660
- GPU Memory Used (`MB`)
BatchSize | TensorRT | Anakin
:---: | :---: | :---: |
1 | 963 | 997
2 | 965 | 1039
4 | 991 | 1115
8 | 1067 | 1269
32 | 1715 | 2193
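
The latency tables report the time for a whole batch, so per-image cost and throughput have to be derived. The sketch below does that arithmetic for the VGG16 latencies above; the helper names `per_image_ms` and `images_per_second` are illustrative and not part of Anakin or TensorRT.

```python
# VGG16 batch latencies in ms, keyed by batch size, copied from the table above.
vgg16_tensorrt = {1: 8.8690, 2: 15.5344, 4: 26.6000, 8: 49.8279, 32: 188.6270}
vgg16_anakin = {1: 8.2815, 2: 13.9116, 4: 21.8747, 8: 40.4076, 32: 163.7660}


def per_image_ms(batch_size, latency_ms):
    """Average time spent on one image within a batch."""
    return latency_ms / batch_size


def images_per_second(batch_size, latency_ms):
    """Throughput implied by a batch latency given in milliseconds."""
    return batch_size * 1000.0 / latency_ms


if __name__ == "__main__":
    for batch in sorted(vgg16_anakin):
        trt = images_per_second(batch, vgg16_tensorrt[batch])
        ana = images_per_second(batch, vgg16_anakin[batch])
        print(f"batch {batch:>2}: TensorRT {trt:7.1f} img/s, Anakin {ana:7.1f} img/s")
```

For example, Anakin's batch-32 latency of 163.7660 ms corresponds to about 5.12 ms per image, or roughly 195 images per second.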
### <span id = '2'>Yolo </span>
- Latency (`ms`) for different batch sizes
BatchSize | TensorRT | Anakin
:---: | :---: | :---: |
1 | 16.4596| 15.2124
2 | 26.6347| 25.0442
4 | 43.3695| 43.5017
8 | 80.9139 | 80.9880
32 | 293.8080| 310.8810
- GPU Memory Used (`MB`)
BatchSize | TensorRT | Anakin
:---: | :---: | :---: |
1 | 1569 | 1775
2 | 1649 | 1815
4 | 1709 | 1887
8 | 1731 | 2031
32 | 2253 | 2907
### <span id = '3'> Resnet50 </span>
- Latency (`ms`) for different batch sizes
BatchSize | TensorRT | Anakin
:---: | :---: | :---: |
1 | 4.2459 | 4.1061
2 | 6.2627 | 6.5159
4 | 10.1277 | 11.3327
8 | 17.8209 | 20.6680
32 | 65.8582 | 77.8858
- GPU Memory Used (`MB`)
BatchSize | TensorRT | Anakin
:---: | :---: | :---: |
1 | 531 | 503
2 | 543 | 517
4 | 583 | 541
8 | 611 | 589
32 | 809 | 879
### <span id = '4'> Resnet101 </span>
- Latency (`ms`) for different batch sizes
BatchSize | TensorRT | Anakin
:---: | :---: | :---: |
1 | 7.5562 | 7.0837
2 | 11.6023 | 11.4079
4 | 18.3650 | 20.0493
8 | 32.7632 | 36.0648
32 | 123.2550 | 135.4880
- GPU Memory Used (`MB`)
BatchSize | TensorRT | Anakin
:---: | :---: | :---: |
1 | 701 | 683
2 | 713 | 697
4 | 793 | 721
8 | 819 | 769
32 | 1043 | 1059
### <span id = '5'> MobileNet V1 </span>
- Latency (`ms`) for different batch sizes
BatchSize | TensorRT | Anakin
:---: | :---: | :---: |
1 | 45.5156 | 1.3947
2 | 46.5585 | 2.5483
4 | 48.4242 | 4.3404
8 | 52.7957 | 8.1513
32 | 83.2519 | 31.3178
- GPU Memory Used (`MB`)
BatchSize | TensorRT | Anakin
:---: | :---: | :---: |
1 | 329 | 283
2 | 345 | 289
4 | 371 | 299
8 | 393 | 319
32 | 531 | 433
### <span id = '6'> MobileNet V2</span>
- Latency (`ms`) for different batch sizes
BatchSize | TensorRT | Anakin
:---: | :---: | :---: |
1 | 65.6861 | 2.9842
2 | 66.6814 | 4.7472
4 | 69.7114 | 7.4163
8 | 76.1092 | 12.8779
32 | 124.9810 | 47.2142
- GPU Memory Used (`MB`)
BatchSize | TensorRT | Anakin
:---: | :---: | :---: |
1 | 341 | 293
2 | 353 | 301
4 | 385 | 319
8 | 421 | 351
32 | 637 | 551
## How to run those Benchmark models?
> 1. First, convert the caffe model with the [`external converter ->`](../docs/Manual/Converter_en.md).
> 2. Switch to the *source_root/benchmark/CNN* directory. Use 'mkdir ./models' to create ./models and put the anakin models into this directory.
> 3. Run 'sh run.sh'. It creates files under logs that store the model log for each batch size, then prints a latency summary on the screen.
> 4. For per-op timing details, set `ENABLE_OP_TIMER` to `YES` in CMakeLists.txt, then recompile and run. The detailed information appears in the model log file.
../../../anakin/docs/Manual/Tutorial_ch.md
\ No newline at end of file
../../../anakin/docs/Manual/Converter_ch.md
\ No newline at end of file
......@@ -6,5 +6,19 @@
######
Server-side inference with Anakin
=================================
.. toctree::
:maxdepth: 1
install_anakin.md
convert_paddle_to_anakin.md
run_anakin_on_arm.md
anakin_tutorial.md
anakin_example.md
anakin_gpu_benchmark.md
anakin_arm_benchmark.md
Mobile
######
\ No newline at end of file
######
../../../anakin/docs/Manual/INSTALL_ch.md
\ No newline at end of file
../../../anakin/docs/Manual/run_on_arm_ch.md
\ No newline at end of file