diff --git a/.gitmodules b/.gitmodules
index fa45d0eaa75ca251ab9cbb410ec7e7a73936cab4..ea6254755af221ea0d76d82bbf0bef054587c96e 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -4,6 +4,9 @@
 [submodule "book"]
 	path = book
 	url = https://github.com/PaddlePaddle/book.git
-[submodule "source/anakin"]
-	path = source/anakin
-	url = https://github.com/PaddlePaddle/Anakin
+[submodule "anakin"]
+	path = anakin
+	url = https://github.com/PaddlePaddle/Anakin.git
+[submodule "mobile"]
+	path = mobile
+	url = https://github.com/PaddlePaddle/paddle-mobile.git
diff --git a/anakin b/anakin
new file mode 160000
index 0000000000000000000000000000000000000000..b9d95555a73f3e02aa169251cd319053b6d7d642
--- /dev/null
+++ b/anakin
@@ -0,0 +1 @@
+Subproject commit b9d95555a73f3e02aa169251cd319053b6d7d642
diff --git a/mobile b/mobile
new file mode 160000
index 0000000000000000000000000000000000000000..c3aa92ac28662d7a1553cd258ddd3f19412f5018
--- /dev/null
+++ b/mobile
@@ -0,0 +1 @@
+Subproject commit c3aa92ac28662d7a1553cd258ddd3f19412f5018
diff --git a/source/advanced_usage/deploy/anakin_arm_benchmark.md b/source/advanced_usage/deploy/anakin_arm_benchmark.md
new file mode 100644
index 0000000000000000000000000000000000000000..bef1e20eb3479654b6167955bdc2ee1393b305a4
--- /dev/null
+++ b/source/advanced_usage/deploy/anakin_arm_benchmark.md
@@ -0,0 +1,57 @@
+# Anakin ARM Performance Benchmark
+
+## Test environment and parameters:
++ Test models: Mobilenetv1, mobilenetv2, mobilenet-ssd
++ Cross-compiled with the Android NDK, gcc 4.9, NEON enabled, ABI: armeabi-v7a with neon -mfloat-abi=softfp
++ Test platforms
+   - Honor V9 (rooted): processor: Kirin 960, 4 big cores at 2.36GHz, 4 little cores at 1.8GHz
+   - nubia z17: processor: Qualcomm Snapdragon 835, 4 big cores at 2.36GHz, 4 little cores at 1.9GHz
+   - 360 N5: processor: Qualcomm Snapdragon 653, 4 big cores at 1.8GHz, 4 little cores at 1.4GHz
++ Multithreading: OpenMP
++ Timing: 10 warm-up runs, then the mean of 10 timed runs
++ ncnn version: GitHub master branch at commit ID: 307a77f04be29875f40d337cfff6df747df09de6 (msg: convert LogisticRegressionOutput)
++ TFlite version: GitHub master branch at commit
ID: 65c05bc2ac19f51f7027e66350bc71652662125c (msg: Removed unneeded file copy that was causing failure in Pi builds)
+
+This benchmark compares the performance of **`ncnn`**, **`TFlite`**, and **`Anakin`**.
+
+## BenchMark model
+
+> Note: before running the performance tests, convert each test model to an Anakin model with the [External Converter](#10003).
+> All of these models are tested on ARM with multiple threads and a batch size of 1.
+
+- [Mobilenet v1](#11) *caffe model can be downloaded [here](https://github.com/shicai/MobileNet-Caffe)*
+- [Mobilenet v2](#22) *caffe model can be downloaded [here](https://github.com/shicai/MobileNet-Caffe)*
+- [mobilenet-ssd](#33) *caffe model can be downloaded [here](https://github.com/chuanqi305/MobileNet-SSD)*
+
+### mobilenetv1
+
+ |platform | Anakin (1) | Anakin (2) | Anakin (4) | ncnn (1) | ncnn (2) | ncnn (4) | TFlite (1) | TFlite (2) | TFlite (4)|
+ |:---: | :---: | :---: | :---:| :---:| :---:| :---:| :---:| :---:| :---:|
+ |Kirin 960|107.7ms|61.1ms|38.2ms|152.8ms|85.2ms|51.9ms|152.6ms|nan|nan|
+ |Snapdragon 835|105.7ms|63.1ms|~~46.8ms~~|152.7ms|87.0ms|~~92.7ms~~|146.9ms|nan|nan|
+ |Snapdragon 653|120.3ms|64.2ms|46.6ms|202.5ms|117.6ms|84.8ms|158.6ms|nan|nan|
+
+### mobilenetv2
+
+ |platform | Anakin (1) | Anakin (2) | Anakin (4) | ncnn (1) | ncnn (2) | ncnn (4) | TFlite (1) | TFlite (2) | TFlite (4)|
+ |:---: | :---: | :---: | :---:| :---:| :---:| :---:| :---:| :---:| :---:|
+ |Kirin 960|93.1ms|53.9ms|34.8ms|144.4ms|84.3ms|55.3ms|100.6ms|nan|nan|
+ |Snapdragon 835|93.0ms|55.6ms|41.1ms|139.1ms|88.4ms|58.1ms|95.2ms|nan|nan|
+ |Snapdragon 653|106.6ms|64.2ms|48.0ms|199.9ms|125.1ms|98.9ms|108.5ms|nan|nan|
+
+### mobilenet-ssd
+
+ |platform | Anakin (1) | Anakin (2) | Anakin (4) | ncnn (1) | ncnn (2) | ncnn (4) | TFlite (1) | TFlite (2) | TFlite (4)|
+ |:---: | :---: | :---: | :---:| :---:| :---:| :---:| :---:| :---:| :---:|
+ |Kirin 960|213.9ms|120.5ms|74.5ms|307.9ms|166.5ms|104.2ms|nan|nan|nan|
+ |Snapdragon 835|213.0ms|125.7ms|~~98.4ms~~|292.9ms|177.9ms|~~167.8ms~~|nan|nan|nan|
+ |Snapdragon 653|236.0ms|129.6ms|96.0ms|377.7ms|228.9ms|165.0ms|nan|nan|nan|
+
+## How to run those Benchmark models?
+
+1. 
First, convert the caffe model with the [External Converter](../docs/Manual/Converter_en.md).
+2. Then upload the converted Anakin model and the compiled benchmark_arm binary to the test device with the 'adb push' command.
+3. Next, in the directory on the test device that contains the Anakin model, run the command './benchmark_arm ./ anakin_model.anakin.bin 1 10 10 1'.
+4. Finally, the model's run time is printed to the terminal.
+5. The number and meaning of the command's arguments can be seen by running './benchmark_arm'.
+
diff --git a/source/advanced_usage/deploy/anakin_example.md b/source/advanced_usage/deploy/anakin_example.md
new file mode 120000
index 0000000000000000000000000000000000000000..4ee61319f788c3ea2648a0a4831d8434fcfb162f
--- /dev/null
+++ b/source/advanced_usage/deploy/anakin_example.md
@@ -0,0 +1 @@
+../../../anakin/examples/example_introduction_cn.md
\ No newline at end of file
diff --git a/source/advanced_usage/deploy/anakin_gpu_benchmark.md b/source/advanced_usage/deploy/anakin_gpu_benchmark.md
new file mode 100644
index 0000000000000000000000000000000000000000..343e7ae74fae32c2c5e26b6168d0d1342177fb0f
--- /dev/null
+++ b/source/advanced_usage/deploy/anakin_gpu_benchmark.md
@@ -0,0 +1,175 @@
+# Anakin GPU Benchmark
+
+## Machine:
+
+> CPU: `12-core Intel(R) Xeon(R) CPU E5-2620 v2 @2.10GHz`
+> GPU: `Tesla P4`
+> cuDNN: `v7`
+
+
+## Counterpart of Anakin:
+
+The counterpart of **`Anakin`** is the widely recognized high-performance inference engine **`NVIDIA TensorRT 3`**. Models that TensorRT 3 does not support are handled with custom plugins.
+
+## Benchmark Model
+
+The following convolutional neural networks are tested with both `Anakin` and `TensorRT3`.
+ You can use a pretrained caffe model or a model you trained yourself.
+
+> Please note that you should first convert the caffe model (or other models) into an Anakin model with the help of the [`external converter ->`](../docs/Manual/Converter_en.md)
+
+
+- [Vgg16](#1) *caffe model can be found [here->](https://gist.github.com/jimmie33/27c1c0a7736ba66c2395)*
+- [Yolo](#2) *caffe model can be found [here->](https://github.com/hojel/caffe-yolo-model)*
+- [Resnet50](#3) *caffe model can be found [here->](https://github.com/KaimingHe/deep-residual-networks#models)*
+- [Resnet101](#4) *caffe model can be found [here->](https://github.com/KaimingHe/deep-residual-networks#models)*
+- [Mobilenet v1](#5) *caffe model can be found [here->](https://github.com/shicai/MobileNet-Caffe)*
+- [Mobilenet v2](#6) *caffe model can be found [here->](https://github.com/shicai/MobileNet-Caffe)*
+- [RNN](#7) *not supported yet*
+
+We tested them on a single GPU with a single thread.
+
+### VGG16
+
+- Latency (`ms`) for different batch sizes
+
+| BatchSize | TensorRT | Anakin |
+| --- | --- | --- |
+| 1 | 8.8690 | 8.2815 |
+| 2 | 15.5344 | 13.9116 |
+| 4 | 26.6000 | 21.8747 |
+| 8 | 49.8279 | 40.4076 |
+| 32 | 188.6270 | 163.7660 |
+
+- GPU Memory Used (`MB`)
+
+| BatchSize | TensorRT | Anakin |
+| --- | --- | --- |
+| 1 | 963 | 997 |
+| 2 | 965 | 1039 |
+| 4 | 991 | 1115 |
+| 8 | 1067 | 1269 |
+| 32 | 1715 | 2193 |
+
+
+### Yolo
+
+- Latency (`ms`) for different batch sizes
+
+| BatchSize | TensorRT | Anakin |
+| --- | --- | --- |
+| 1 | 16.4596 | 15.2124 |
+| 2 | 26.6347 | 25.0442 |
+| 4 | 43.3695 | 43.5017 |
+| 8 | 80.9139 | 80.9880 |
+| 32 | 293.8080 | 310.8810 |
+
+- GPU Memory Used (`MB`)
+
+| BatchSize | TensorRT | Anakin |
+| --- | --- | --- |
+| 1 | 1569 | 1775 |
+| 2 | 1649 | 1815 |
+| 4 | 1709 | 1887 |
+| 8 | 1731 | 2031 |
+| 32 | 2253 | 2907 |
+
+### Resnet50
+
+- Latency (`ms`) for different batch sizes
+
+| BatchSize | TensorRT | Anakin |
+| --- | --- | --- |
+| 1 | 4.2459 | 4.1061 |
+| 2 | 6.2627 | 6.5159 |
+| 4 | 10.1277 | 11.3327 |
+| 8 | 17.8209 | 20.6680 |
+| 32 | 65.8582 | 77.8858 |
+
+- GPU 
Memory Used (`MB`)
+
+| BatchSize | TensorRT | Anakin |
+| --- | --- | --- |
+| 1 | 531 | 503 |
+| 2 | 543 | 517 |
+| 4 | 583 | 541 |
+| 8 | 611 | 589 |
+| 32 | 809 | 879 |
+
+### Resnet101
+
+- Latency (`ms`) for different batch sizes
+
+| BatchSize | TensorRT | Anakin |
+| --- | --- | --- |
+| 1 | 7.5562 | 7.0837 |
+| 2 | 11.6023 | 11.4079 |
+| 4 | 18.3650 | 20.0493 |
+| 8 | 32.7632 | 36.0648 |
+| 32 | 123.2550 | 135.4880 |
+
+- GPU Memory Used (`MB`)
+
+| BatchSize | TensorRT | Anakin |
+| --- | --- | --- |
+| 1 | 701 | 683 |
+| 2 | 713 | 697 |
+| 4 | 793 | 721 |
+| 8 | 819 | 769 |
+| 32 | 1043 | 1059 |
+
+### MobileNet V1
+
+- Latency (`ms`) for different batch sizes
+
+| BatchSize | TensorRT | Anakin |
+| --- | --- | --- |
+| 1 | 45.5156 | 1.3947 |
+| 2 | 46.5585 | 2.5483 |
+| 4 | 48.4242 | 4.3404 |
+| 8 | 52.7957 | 8.1513 |
+| 32 | 83.2519 | 31.3178 |
+
+- GPU Memory Used (`MB`)
+
+| BatchSize | TensorRT | Anakin |
+| --- | --- | --- |
+| 1 | 329 | 283 |
+| 2 | 345 | 289 |
+| 4 | 371 | 299 |
+| 8 | 393 | 319 |
+| 32 | 531 | 433 |
+
+### MobileNet V2
+
+- Latency (`ms`) for different batch sizes
+
+| BatchSize | TensorRT | Anakin |
+| --- | --- | --- |
+| 1 | 65.6861 | 2.9842 |
+| 2 | 66.6814 | 4.7472 |
+| 4 | 69.7114 | 7.4163 |
+| 8 | 76.1092 | 12.8779 |
+| 32 | 124.9810 | 47.2142 |
+
+- GPU Memory Used (`MB`)
+
+| BatchSize | TensorRT | Anakin |
+| --- | --- | --- |
+| 1 | 341 | 293 |
+| 2 | 353 | 301 |
+| 4 | 385 | 319 |
+| 8 | 421 | 351 |
+| 32 | 637 | 551 |
+
+## How to run those Benchmark models?
+
+> 1. First, convert the caffe model with the [`external converter`](https://github.com/PaddlePaddle/Anakin/blob/b95f31e19993a192e7428b4fcf852b9fe9860e5f/docs/Manual/Converter_en.md).
+> 2. Switch to the *source_root/benchmark/CNN* directory. Use 'mkdir ./models' to create ./models and put the Anakin models into this directory.
+> 3. Run the command 'sh run.sh'; it creates files in logs to save the model log for each batch size.
Finally, the model latency summary is displayed on the screen.
+> 4. If you want more detailed per-op timing information, set `ENABLE_OP_TIMER` to `YES` in CMakeLists.txt, then recompile and run. The detailed information appears in the model log file.
+
+
+
+
+
diff --git a/source/advanced_usage/deploy/anakin_tutorial.md b/source/advanced_usage/deploy/anakin_tutorial.md
new file mode 120000
index 0000000000000000000000000000000000000000..fac95f29159b4f58bba9237349853201ae29f7cb
--- /dev/null
+++ b/source/advanced_usage/deploy/anakin_tutorial.md
@@ -0,0 +1 @@
+../../../anakin/docs/Manual/Tutorial_ch.md
\ No newline at end of file
diff --git a/source/beginners_guide/install/build_and_install_lib_cn.rst b/source/advanced_usage/deploy/build_and_install_lib_cn.rst
similarity index 100%
rename from source/beginners_guide/install/build_and_install_lib_cn.rst
rename to source/advanced_usage/deploy/build_and_install_lib_cn.rst
diff --git a/source/advanced_usage/deploy/convert_paddle_to_anakin.md b/source/advanced_usage/deploy/convert_paddle_to_anakin.md
new file mode 120000
index 0000000000000000000000000000000000000000..e9f36b4da1e20830c7a01e85fa8990d990503682
--- /dev/null
+++ b/source/advanced_usage/deploy/convert_paddle_to_anakin.md
@@ -0,0 +1 @@
+../../../anakin/docs/Manual/Converter_ch.md
\ No newline at end of file
diff --git a/source/advanced_usage/deploy/how_to_add_anakin_op.md b/source/advanced_usage/deploy/how_to_add_anakin_op.md
new file mode 120000
index 0000000000000000000000000000000000000000..ceeedeb9b99435b6b8cb3fd2a888d37562b04773
--- /dev/null
+++ b/source/advanced_usage/deploy/how_to_add_anakin_op.md
@@ -0,0 +1 @@
+../../../anakin/docs/Manual/addCustomOp.md
\ No newline at end of file
diff --git a/source/advanced_usage/deploy/how_to_support_new_device_in_anakin.md b/source/advanced_usage/deploy/how_to_support_new_device_in_anakin.md
new file mode 120000
index 
0000000000000000000000000000000000000000..c3e1e3494b2733b551a62b4aa47be27eaebe7914
--- /dev/null
+++ b/source/advanced_usage/deploy/how_to_support_new_device_in_anakin.md
@@ -0,0 +1 @@
+../../../anakin/docs/Manual/addCustomDevice.md
\ No newline at end of file
diff --git a/source/advanced_usage/deploy/images b/source/advanced_usage/deploy/images
new file mode 120000
index 0000000000000000000000000000000000000000..cd5258c313ab3a884d6dfd1e372f6655aba5a2e3
--- /dev/null
+++ b/source/advanced_usage/deploy/images
@@ -0,0 +1 @@
+../../../mobile/doc/images/
\ No newline at end of file
diff --git a/source/advanced_usage/deploy/index.rst b/source/advanced_usage/deploy/index.rst
index c6fadebf70f9f16668a9d736653b82f37edc1bf5..885c22e3d0c17d40defa04faa30a67afa7c2ae24 100644
--- a/source/advanced_usage/deploy/index.rst
+++ b/source/advanced_usage/deploy/index.rst
@@ -1,15 +1,45 @@
-####################
-Inference Deployment
-####################
-
-Server Side
-###########
+Server-side Deployment - Native Engine
+######################################
 
 .. toctree::
    :maxdepth: 2
 
+   build_and_install_lib_cn.rst
    native_inference_engine.rst
 
+Server-side Deployment - Anakin
+###############################
+
+
+User Documentation
+~~~~~~~~~~~~~~~~~~
+
+.. toctree::
+   :maxdepth: 1
+
+   install_anakin.md
+   convert_paddle_to_anakin.md
+   run_anakin_on_arm.md
+   anakin_tutorial.md
+   anakin_example.md
+   anakin_gpu_benchmark.md
+   anakin_arm_benchmark.md
+
+Developer Documentation
+~~~~~~~~~~~~~~~~~~~~~~~
+
+.. toctree::
+   :maxdepth: 1
+
+   how_to_add_anakin_op.md
+   how_to_support_new_device_in_anakin.md
+
+Mobile Deployment
+#################
+
+.. 
toctree::
+   :maxdepth: 2
 
-Mobile
-######
+   mobile_build.md
+   mobile_design.md
+   mobile_dev.md
diff --git a/source/advanced_usage/deploy/install_anakin.md b/source/advanced_usage/deploy/install_anakin.md
new file mode 120000
index 0000000000000000000000000000000000000000..d14e1f720a8ac9c5eea8f90ea367202c7807f709
--- /dev/null
+++ b/source/advanced_usage/deploy/install_anakin.md
@@ -0,0 +1 @@
+../../../anakin/docs/Manual/INSTALL_ch.md
\ No newline at end of file
diff --git a/source/advanced_usage/deploy/mobile_build.md b/source/advanced_usage/deploy/mobile_build.md
new file mode 120000
index 0000000000000000000000000000000000000000..12b5c6ea2bbf737b764cc92312c7dea12284550d
--- /dev/null
+++ b/source/advanced_usage/deploy/mobile_build.md
@@ -0,0 +1 @@
+../../../mobile/doc/build.md
\ No newline at end of file
diff --git a/source/advanced_usage/deploy/mobile_design.md b/source/advanced_usage/deploy/mobile_design.md
new file mode 120000
index 0000000000000000000000000000000000000000..b6fce99d7b121dce8dcc1796c7b13230ffc0c88d
--- /dev/null
+++ b/source/advanced_usage/deploy/mobile_design.md
@@ -0,0 +1 @@
+../../../mobile/doc/design_doc.md
\ No newline at end of file
diff --git a/source/advanced_usage/deploy/mobile_dev.md b/source/advanced_usage/deploy/mobile_dev.md
new file mode 120000
index 0000000000000000000000000000000000000000..f8f533620a6f3109a9ebbae6ab5d3a7de48900f2
--- /dev/null
+++ b/source/advanced_usage/deploy/mobile_dev.md
@@ -0,0 +1 @@
+../../../mobile/doc/development_doc.md
\ No newline at end of file
diff --git a/source/advanced_usage/deploy/run_anakin_on_arm.md b/source/advanced_usage/deploy/run_anakin_on_arm.md
new file mode 120000
index 0000000000000000000000000000000000000000..d55355f1d4d83c672ce6bd9684d1a42cd144e416
--- /dev/null
+++ b/source/advanced_usage/deploy/run_anakin_on_arm.md
@@ -0,0 +1 @@
+../../../anakin/docs/Manual/run_on_arm_ch.md
\ No newline at end of file
diff --git a/source/anakin b/source/anakin
deleted file mode 160000
index 
4e77324d1e1a7c224fee320b6e8ca1cd33b434ba..0000000000000000000000000000000000000000
--- a/source/anakin
+++ /dev/null
@@ -1 +0,0 @@
-Subproject commit 4e77324d1e1a7c224fee320b6e8ca1cd33b434ba
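
Review note: the ARM benchmark page added in this diff reports raw latencies only, so a reader has to work out the relative speedups by hand. The sketch below is one way to do that, not part of the patch. It uses the single-thread mobilenetv1 latencies from the table in anakin_arm_benchmark.md, with platform names given in English. The names `LATENCY_MS` and `speedup_over` are hypothetical and exist only for this illustration.

```python
# Single-thread mobilenetv1 latencies in ms, copied from the table in
# anakin_arm_benchmark.md (columns "Anakin (1)", "ncnn (1)", "TFlite (1)").
# LATENCY_MS and speedup_over are hypothetical names for this sketch.
LATENCY_MS = {
    "Kirin 960": {"Anakin": 107.7, "ncnn": 152.8, "TFlite": 152.6},
    "Snapdragon 835": {"Anakin": 105.7, "ncnn": 152.7, "TFlite": 146.9},
    "Snapdragon 653": {"Anakin": 120.3, "ncnn": 202.5, "TFlite": 158.6},
}


def speedup_over(baseline, table=LATENCY_MS):
    """Return {platform: baseline latency / Anakin latency}.

    A ratio above 1.0 means Anakin was faster than the baseline engine.
    """
    return {
        platform: row[baseline] / row["Anakin"]
        for platform, row in table.items()
    }


if __name__ == "__main__":
    for baseline in ("ncnn", "TFlite"):
        for platform, ratio in speedup_over(baseline).items():
            print(f"{platform}: Anakin is {ratio:.2f}x faster than {baseline}")
```

The same helper works for the other tables if their rows are copied into a dictionary of the same shape. The `nan` entries in the TFlite columns would need to be skipped first.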