Merge branch 'feature/inference' into develop

72d42b85 · yuyang18 · 75e032b1 · 6db5c60f · 72d42b85 · b9d95555
24 changed files
--- a/.gitmodules
+++ b/.gitmodules
@@ -4,6 +4,9 @@
 [submodule "book"]
 	path = book
 	url = https://github.com/PaddlePaddle/book.git
-[submodule "source/anakin"]
+[submodule "anakin"]
-	path = source/anakin
+	path = anakin
-	url = https://github.com/PaddlePaddle/Anakin
+	url = https://github.com/PaddlePaddle/Anakin.git
+[submodule "mobile"]
+	path = mobile
+	url = https://github.com/PaddlePaddle/paddle-mobile.git
--- a/anakin @ b9d95555
+++ b/anakin @ b9d95555
+Subproject commit b9d95555a73f3e02aa169251cd319053b6d7d642
--- a/mobile @ c3aa92ac
+++ b/mobile @ c3aa92ac
+Subproject commit c3aa92ac28662d7a1553cd258ddd3f19412f5018
--- a/paddle @ 653686c7
+++ b/paddle @ 653686c7
-Subproject commit 494cecd650ab89b10a24784399a98aae904256c4
+Subproject commit 653686c753304f1b1d2a433cae96b96434e6c2d6
--- a/source/advanced_usage/deploy/anakin_arm_benchmark.md
+++ b/source/advanced_usage/deploy/anakin_arm_benchmark.md
+# Anakin ARM Benchmark
+## Test environment and parameters:
+ Test models: Mobilenetv1, mobilenetv2, mobilenet-ssd
+ Cross-compiled with the Android NDK, gcc 4.9, NEON enabled; ABI: armeabi-v7a with neon -mfloat-abi=softfp
+ Test platforms
+   - Honor V9 (root): Kirin 960, 4 big cores at 2.36GHz, 4 little cores at 1.8GHz
+   - nubia Z17: Qualcomm 835, 4 big cores at 2.36GHz, 4 little cores at 1.9GHz
+   - 360 N5: Qualcomm 653, 4 big cores at 1.8GHz, 4 little cores at 1.4GHz
+ Multithreading: OpenMP
+ Timing: 10 warm-up runs, then the mean of 10 timed runs
+ ncnn version: commit 307a77f04be29875f40d337cfff6df747df09de6 on the master branch on GitHub (msg: convert LogisticRegressionOutput)
+ TFlite version: commit 65c05bc2ac19f51f7027e66350bc71652662125c on the master branch on GitHub (msg: Removed unneeded file copy that was causing failure in Pi builds)
+This benchmark compares the performance of **`ncnn`**, **`TFlite`**, and **`Anakin`**.
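The timing protocol listed in the test environment (10 warm-up runs, then the mean of 10 timed runs) can be sketched as follows. This is only an illustration under stated assumptions: `mean_latency_ms` is a hypothetical helper, not part of Anakin or `benchmark_arm`, and the workload passed in stands in for a single forward pass.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper (not part of Anakin): runs `fn` `warmup` times without
// timing, then `iters` timed runs, and returns the mean latency in milliseconds.
double mean_latency_ms(const std::function<void()>& fn, int warmup, int iters) {
  assert(warmup >= 0 && iters > 0);
  for (int i = 0; i < warmup; ++i) fn();  // warm-up runs are not measured

  using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
  double total_ms = 0.0;
  for (int i = 0; i < iters; ++i) {
    const auto start = clock::now();
    fn();
    const auto stop = clock::now();
    total_ms += std::chrono::duration<double, std::milli>(stop - start).count();
  }
  return total_ms / iters;
}
```

Calling `mean_latency_ms(run_once, 10, 10)` with a callable `run_once` (an assumed name) mirrors the protocol above; the warm-up runs keep one-time costs such as cache misses and lazy allocation out of the reported mean.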
+## BenchMark model
+> Note: before running the benchmark, convert the test models to Anakin models with the [External Converter](#10003)
+> For these models, this document reports multi-threaded tests on ARM with a single batch size.
+- [Mobilenet v1](#11)  *the caffe model can be downloaded [here](https://github.com/shicai/MobileNet-Caffe)*
+- [Mobilenet v2](#22)  *the caffe model can be downloaded [here](https://github.com/shicai/MobileNet-Caffe)*
+- [mobilenet-ssd](#33)  *the caffe model can be downloaded [here](https://github.com/chuanqi305/MobileNet-SSD)*
+### <span id = '11'> mobilenetv1 </span>
+   |platform | Anakin (1) | Anakin (2) | Anakin (4) | ncnn (1) | ncnn (2) | ncnn (4) | TFlite (1) | TFlite (2) | TFlite (4)| 
+   |:---: | :---: | :---: | :---:| :---:| :---:| :---:| :---:| :---:| :---:|
+   |Kirin 960|107.7ms|61.1ms|38.2ms|152.8ms|85.2ms|51.9ms|152.6ms|nan|nan|
+   |Qualcomm 835|105.7ms|63.1ms|~~46.8ms~~|152.7ms|87.0ms|~~92.7ms~~|146.9ms|nan|nan|
+   |Qualcomm 653|120.3ms|64.2ms|46.6ms|202.5ms|117.6ms|84.8ms|158.6ms|nan|nan|
+### <span id = '22'> mobilenetv2 </span>
+   |platform | Anakin (1) | Anakin (2) | Anakin (4) | ncnn (1) | ncnn (2) | ncnn (4) | TFlite (1) | TFlite (2) | TFlite (4)| 
+   |:---: | :---: | :---: | :---:| :---:| :---:| :---:| :---:| :---:| :---:|
+   |Kirin 960|93.1ms|53.9ms|34.8ms|144.4ms|84.3ms|55.3ms|100.6ms|nan|nan|
+   |Qualcomm 835|93.0ms|55.6ms|41.1ms|139.1ms|88.4ms|58.1ms|95.2ms|nan|nan|
+   |Qualcomm 653|106.6ms|64.2ms|48.0ms|199.9ms|125.1ms|98.9ms|108.5ms|nan|nan|
+### <span id = '33'> mobilenet-ssd </span>
+   |platform | Anakin (1) | Anakin (2) | Anakin (4) | ncnn (1) | ncnn (2) | ncnn (4) | TFlite (1) | TFlite (2) | TFlite (4)| 
+   |:---: | :---: | :---: | :---:| :---:| :---:| :---:| :---:| :---:| :---:|
+   |Kirin 960|213.9ms|120.5ms|74.5ms|307.9ms|166.5ms|104.2ms|nan|nan|nan|
+   |Qualcomm 835|213.0ms|125.7ms|~~98.4ms~~|292.9ms|177.9ms|~~167.8ms~~|nan|nan|nan|
+   |Qualcomm 653|236.0ms|129.6ms|96.0ms|377.7ms|228.9ms|165.0ms|nan|nan|nan|
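One way to read the tables above is to turn pairs of latencies into ratios. The sketch below assumes two hypothetical helpers, `speedup` and `parallel_efficiency`, which are not part of Anakin, ncnn, or TFlite; they only do arithmetic on numbers copied from the tables.

```cpp
#include <cassert>

// Speedup of a candidate over a baseline: a value above 1 means the
// candidate finishes faster than the baseline.
double speedup(double baseline_ms, double candidate_ms) {
  assert(baseline_ms > 0.0 && candidate_ms > 0.0);
  return baseline_ms / candidate_ms;
}

// Parallel efficiency: the fraction of ideal linear scaling achieved when
// going from one thread to `threads` threads (1.0 would be perfect scaling).
double parallel_efficiency(double single_thread_ms, double multi_thread_ms,
                           int threads) {
  assert(threads > 0);
  return speedup(single_thread_ms, multi_thread_ms) / threads;
}
```

For the first mobilenetv1 row, `speedup(152.8, 107.7)` compares single-threaded ncnn with single-threaded Anakin and comes out at about 1.42, while `parallel_efficiency(107.7, 38.2, 4)` gives about 0.70 for Anakin on four threads.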
+## How to run those Benchmark models?
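+1. First, convert the caffe model with the [External Converter](../docs/Manual/Converter_en.md)
+2. Then upload the converted Anakin model and the compiled benchmark_arm binary to the test device with the 'adb push' command
+3. Next, in the directory on the test device that contains the Anakin model, run './benchmark_arm ./ anakin_model.anakin.bin 1 10 10 1'
+4. Finally, the model's running time is printed to the terminal
+5. The number and meaning of the command's parameters can be seen by running './benchmark_arm'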
--- a/source/advanced_usage/deploy/anakin_example.md
+++ b/source/advanced_usage/deploy/anakin_example.md
+../../../anakin/examples/example_introduction_cn.md
\ No newline at end of file
--- a/source/advanced_usage/deploy/anakin_gpu_benchmark.md
+++ b/source/advanced_usage/deploy/anakin_gpu_benchmark.md
+# Anakin GPU Benchmark
+## Machine:
+>  CPU: `12-core Intel(R) Xeon(R) CPU E5-2620 v2 @2.10GHz`
+>  GPU: `Tesla P4`
+>  cuDNN: `v7`
+## Counterpart of Anakin:
+The counterpart of **`Anakin`** is the acknowledged high-performance inference engine **`NVIDIA TensorRT 3`**. For models that TensorRT 3 doesn't support, we use custom plugins.
+## Benchmark Model
+The following convolutional neural networks are tested with both `Anakin` and `TensorRT3`.
+ You can use a pretrained caffe model or a model you trained yourself.
+> Please note that you should convert caffe or other models into Anakin models with the help of [`external converter ->`](../docs/Manual/Converter_en.md)
+- [Vgg16](#1)   *caffe model can be found [here->](https://gist.github.com/jimmie33/27c1c0a7736ba66c2395)*
+- [Yolo](#2)  *caffe model can be found [here->](https://github.com/hojel/caffe-yolo-model)*
+- [Resnet50](#3)  *caffe model can be found [here->](https://github.com/KaimingHe/deep-residual-networks#models)*
+- [Resnet101](#4)  *caffe model can be found [here->](https://github.com/KaimingHe/deep-residual-networks#models)*
+- [Mobilenet v1](#5)  *caffe model can be found [here->](https://github.com/shicai/MobileNet-Caffe)*
+- [Mobilenet v2](#6)  *caffe model can be found [here->](https://github.com/shicai/MobileNet-Caffe)*
+- [RNN](#7)  *not supported yet*
+We tested them on a single GPU with a single thread.
+### <span id = '1'>VGG16 </span>
+- Latency (`ms`) for different batch sizes
+| BatchSize | TensorRT | Anakin |
+| --- | --- | --- |
+| 1 | 8.8690 | 8.2815 |
+| 2 | 15.5344 | 13.9116 |
+| 4 | 26.6000 | 21.8747 |
+| 8 | 49.8279 | 40.4076 |
+| 32 | 188.6270 | 163.7660 |
+- GPU Memory Used (`MB`)
+| BatchSize | TensorRT | Anakin |
+| --- | --- | --- |
+| 1 | 963 | 997 |
+| 2 | 965 | 1039 |
+| 4 | 991 | 1115 |
+| 8 | 1067 | 1269 |
+| 32 | 1715 | 2193 |
+### <span id = '2'>Yolo </span>
+- Latency (`ms`) for different batch sizes
+| BatchSize | TensorRT | Anakin |
+| --- | --- | --- |
+| 1 | 16.4596| 15.2124 |
+| 2 | 26.6347| 25.0442 |
+| 4 | 43.3695| 43.5017 |
+| 8 | 80.9139 | 80.9880 |
+| 32 | 293.8080| 310.8810 |
+- GPU Memory Used (`MB`)
+| BatchSize | TensorRT | Anakin |
+| --- | --- | --- |
+| 1 | 1569 | 1775 |
+| 2 | 1649 | 1815 |
+| 4 | 1709 | 1887 |
+| 8 | 1731 | 2031 |
+| 32 | 2253 | 2907 |
+### <span id = '3'> Resnet50 </span>
+- Latency (`ms`) for different batch sizes
+| BatchSize | TensorRT | Anakin |
+| --- | --- | --- |
+| 1 | 4.2459   |  4.1061 |
+| 2 |  6.2627  |  6.5159 |
+| 4 | 10.1277  | 11.3327 |
+| 8 | 17.8209  | 20.6680 |
+| 32 | 65.8582 | 77.8858 |
+- GPU Memory Used (`MB`)
+| BatchSize | TensorRT | Anakin |
+| --- | --- | --- |
+| 1 | 531  | 503 |
+| 2 | 543  | 517 |
+| 4 | 583 | 541 |
+| 8 | 611 | 589 |
+| 32 |  809 | 879 |
+### <span id = '4'> Resnet101 </span>
+- Latency (`ms`) for different batch sizes
+| BatchSize | TensorRT | Anakin |
+| --- | --- | --- |
+| 1 | 7.5562 | 7.0837 |
+| 2 | 11.6023 | 11.4079 |
+| 4 | 18.3650 | 20.0493 |
+| 8 | 32.7632 | 36.0648 |
+| 32 | 123.2550 | 135.4880 |
+- GPU Memory Used (`MB`)
+| BatchSize | TensorRT | Anakin |
+| --- | --- | --- |
+| 1 | 701  | 683 |
+| 2 | 713  | 697 |
+| 4 | 793 | 721 |
+| 8 | 819 | 769 |
+| 32 | 1043 | 1059 |
+###  <span id = '5'> MobileNet V1 </span>
+- Latency (`ms`) for different batch sizes
+| BatchSize | TensorRT | Anakin |
+| --- | --- | --- |
+| 1 | 45.5156  |  1.3947 |
+| 2 |  46.5585  |  2.5483 |
+| 4 | 48.4242  | 4.3404 |
+| 8 |  52.7957 |  8.1513 |
+| 32 | 83.2519 | 31.3178 |
+- GPU Memory Used (`MB`)
+| BatchSize | TensorRT | Anakin |
+| --- | --- | --- |
+| 1 | 329  | 283 |
+| 2 | 345  | 289 |
+| 4 | 371 | 299 |
+| 8 | 393 | 319 |
+| 32 |  531 | 433 |
+###  <span id = '6'> MobileNet V2</span>
+- Latency (`ms`) for different batch sizes
+| BatchSize | TensorRT | Anakin |
+| --- | --- | --- |
+| 1 | 65.6861 | 2.9842 |
+| 2 | 66.6814 | 4.7472 |
+| 4 | 69.7114 | 7.4163 |
+| 8 | 76.1092 | 12.8779 |
+| 32 | 124.9810 | 47.2142 |
+- GPU Memory Used (`MB`)
+| BatchSize | TensorRT | Anakin |
+| --- | --- | --- |
+| 1 | 341 | 293 |
+| 2 | 353 | 301 |
+| 4 | 385 | 319 |
+| 8 | 421 | 351 |
+| 32 | 637 | 551 |
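The latency and memory tables above can be converted into per-image figures, which makes rows with different batch sizes easier to compare. The following is a minimal sketch: `throughput_img_per_s` and `memory_per_image_mb` are hypothetical helper names, and the only inputs are values copied from the tables.

```cpp
#include <cassert>

// Images processed per second when one batch of `batch_size` images
// completes every `latency_ms` milliseconds.
double throughput_img_per_s(int batch_size, double latency_ms) {
  assert(batch_size > 0 && latency_ms > 0.0);
  return batch_size * 1000.0 / latency_ms;
}

// Average extra GPU memory (MB) per additional image between two batch sizes.
double memory_per_image_mb(int batch_a, double mem_a_mb,
                           int batch_b, double mem_b_mb) {
  assert(batch_a != batch_b);
  return (mem_b_mb - mem_a_mb) / (batch_b - batch_a);
}
```

With the VGG16 rows, `throughput_img_per_s(32, 163.7660)` gives roughly 195 images per second for Anakin against roughly 170 for TensorRT from `throughput_img_per_s(32, 188.6270)`, and `memory_per_image_mb(1, 997, 32, 2193)` puts Anakin's memory growth at a little under 39 MB per extra image.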
+## How to run those Benchmark models?
+> 1. First, convert the caffe model with the [`external converter`](https://github.com/PaddlePaddle/Anakin/blob/b95f31e19993a192e7428b4fcf852b9fe9860e5f/docs/Manual/Converter_en.md).
+> 2. Switch to the *source_root/benchmark/CNN* directory. Use 'mkdir ./models' to create ./models and put the Anakin models into this directory.
+> 3. Run 'sh run.sh'. It creates files in logs to save the model log for each batch size. Finally, a model latency summary is displayed on the screen.
+> 4. For more detailed per-op timing, set `ENABLE_OP_TIMER` to `YES` in CMakeLists.txt, then recompile and run. The detailed information appears in the model log file.
--- a/source/advanced_usage/deploy/anakin_tutorial.md
+++ b/source/advanced_usage/deploy/anakin_tutorial.md
+../../../anakin/docs/Manual/Tutorial_ch.md
\ No newline at end of file
--- a/source/advanced_usage/deploy/build_and_install_lib_cn.rst
+++ b/source/advanced_usage/deploy/build_and_install_lib_cn.rst
+.. _install_or_build_cpp_inference_lib:
+Install and build the C++ inference library
+===========================================
+Download and install directly
+-----------------------------
+======================   ========================================
+Version                  C++ inference library
+======================   ========================================
+cpu_avx_mkl              `fluid.tgz <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_CpuAvxCp27cp27mu/.lastSuccessful/fluid.tgz>`_ 
+cpu_avx_openblas         `fluid.tgz <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_CpuAvxOpenblas/.lastSuccessful/fluid.tgz>`_
+cpu_noavx_openblas       `fluid.tgz <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_CpuNoavxOpenblas/.lastSuccessful/fluid.tgz>`_
+cuda7.5_cudnn5_avx_mkl   `fluid.tgz <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda75cudnn5cp27cp27mu/.lastSuccessful/fluid.tgz>`_
+cuda8.0_cudnn5_avx_mkl   `fluid.tgz <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda80cudnn5cp27cp27mu/.lastSuccessful/fluid.tgz>`_
+cuda8.0_cudnn7_avx_mkl   `fluid.tgz <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda8cudnn7cp27cp27mu/.lastSuccessful/fluid.tgz>`_
+cuda9.0_cudnn7_avx_mkl   `fluid.tgz <https://guest:@paddleci.ngrok.io/repository/download/Manylinux1_Cuda90cudnn7avxMkl/.lastSuccessful/fluid.tgz>`_
+======================   ========================================
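+Build from source
+-----------------
+Users can also build the C++ inference library from the PaddlePaddle core source code by setting the following build options at compile time:
+=================   =================
+Option              Value
+=================   =================
+CMAKE_BUILD_TYPE    Release
+FLUID_INSTALL_DIR   installation path
+WITH_FLUID_ONLY     ON (recommended)
+WITH_SWIG_PY        OFF (recommended)
+WITH_PYTHON         OFF (recommended)
+WITH_GPU            ON/OFF
+WITH_MKL            ON/OFF
+=================   =================
+Using the recommended values avoids linking unnecessary libraries. Set the other optional build options as needed.
+The following snippet pulls the latest code from GitHub and configures the build options (replace PADDLE_ROOT with the installation path of the PaddlePaddle inference library):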
+  .. code-block:: bash
+     pip install paddlepaddle-gpu
+     PADDLE_ROOT=/path/of/capi
+     git clone https://github.com/PaddlePaddle/Paddle.git
+     cd Paddle
+     mkdir build
+     cd build
+     cmake -DFLUID_INSTALL_DIR=$PADDLE_ROOT \
+           -DCMAKE_BUILD_TYPE=Release \
+           -DWITH_FLUID_ONLY=ON \
+           -DWITH_SWIG_PY=OFF \
+           -DWITH_PYTHON=OFF \
+           -DWITH_MKL=OFF \
+           -DWITH_GPU=OFF  \
+           ..
+     make
+     make inference_lib_dist
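+After a successful build, everything needed to use the C++ inference library ((1) the built PaddlePaddle inference library and headers; (2) third-party libraries and headers; (3) version and build-option information)
+is stored in the PADDLE_ROOT directory. The directory structure is as follows: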
+  .. code-block:: text
+     PaddleRoot/
+     ├── CMakeCache.txt
+     ├── paddle
+     │   └── fluid
+     │       ├── framework
+     │       ├── inference
+     │       ├── memory
+     │       ├── platform
+     │       ├── pybind
+     │       └── string
+     ├── third_party
+     │   ├── boost
+     │   │   └── boost
+     │   ├── eigen3
+     │   │   ├── Eigen
+     │   │   └── unsupported
+     │   └── install
+     │       ├── gflags
+     │       ├── glog
+     │       ├── mklml
+     │       ├── protobuf
+     │       ├── snappy
+     │       ├── snappystream
+     │       └── zlib
+     └── version.txt
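+version.txt records the version information of the inference library, including the Git commit ID, whether OpenBLAS or MKL is used as the math library, and the CUDA/CUDNN version numbers, for example: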
+  .. code-block:: text
+     GIT COMMIT ID: c95cd4742f02bb009e651a00b07b21c979637dc8
+     WITH_MKL: ON
+     WITH_GPU: ON
+     CUDA version: 8.0
+     CUDNN version: v5
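Because the sample version.txt above is a list of `KEY: value` lines, a deployment script can check the build configuration before linking against the library. The sketch below assumes that line format holds for every entry; `trim` and `parse_version_info` are hypothetical helpers, not part of PaddlePaddle.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Strips leading and trailing whitespace from a string.
std::string trim(const std::string& s) {
  const char* ws = " \t\r\n";
  const std::size_t first = s.find_first_not_of(ws);
  if (first == std::string::npos) return "";
  const std::size_t last = s.find_last_not_of(ws);
  return s.substr(first, last - first + 1);
}

// Hypothetical helper: parses "KEY: value" lines (the format of the sample
// version.txt) into a map. Lines without a colon are skipped.
std::map<std::string, std::string> parse_version_info(const std::string& text) {
  std::map<std::string, std::string> info;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    const std::size_t colon = line.find(':');
    if (colon == std::string::npos) continue;
    info[trim(line.substr(0, colon))] = trim(line.substr(colon + 1));
  }
  return info;
}
```

A caller could read the file into a string, parse it, and refuse to continue when, say, the `WITH_GPU` entry does not match the target machine.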
--- a/source/advanced_usage/deploy/convert_paddle_to_anakin.md
+++ b/source/advanced_usage/deploy/convert_paddle_to_anakin.md
+../../../anakin/docs/Manual/Converter_ch.md
\ No newline at end of file
--- a/source/advanced_usage/deploy/how_to_add_anakin_op.md
+++ b/source/advanced_usage/deploy/how_to_add_anakin_op.md
+../../../anakin/docs/Manual/addCustomOp.md
\ No newline at end of file
--- a/source/advanced_usage/deploy/how_to_support_new_device_in_anakin.md
+++ b/source/advanced_usage/deploy/how_to_support_new_device_in_anakin.md
+../../../anakin/docs/Manual/addCustomDevice.md
\ No newline at end of file
--- a/source/advanced_usage/deploy/images
+++ b/source/advanced_usage/deploy/images
+../../../mobile/doc/images/
\ No newline at end of file
--- a/source/advanced_usage/deploy/index.rst
+++ b/source/advanced_usage/deploy/index.rst
-####################
-Inference deployment
-####################
-Server side
-###########
-Mobile
-######
\ No newline at end of file
--- a/source/advanced_usage/deploy/index_anakin.rst
+++ b/source/advanced_usage/deploy/index_anakin.rst
+Server-side deployment - Anakin
+###############################
+User documentation
+~~~~~~~~~~~~~~~~~~
+.. toctree::
+   :maxdepth: 1
+   install_anakin.md
+   convert_paddle_to_anakin.md
+   run_anakin_on_arm.md
+   anakin_tutorial.md
+   anakin_example.md
+   anakin_gpu_benchmark.md
+   anakin_arm_benchmark.md
+Developer documentation
+~~~~~~~~~~~~~~~~~~~~~~~
+.. toctree::
+   :maxdepth: 1
+   how_to_add_anakin_op.md
+   how_to_support_new_device_in_anakin.md
--- a/source/advanced_usage/deploy/index_mobile.rst
+++ b/source/advanced_usage/deploy/index_mobile.rst
+Mobile deployment
+#################
+.. toctree::
+   :maxdepth: 2
+   mobile_build.md
+   mobile_dev.md
--- a/source/advanced_usage/deploy/index_native.rst
+++ b/source/advanced_usage/deploy/index_native.rst
+Server-side deployment - Native engine
+######################################
+..  toctree::
+    :maxdepth: 2
+    build_and_install_lib_cn.rst
+    native_inference_engine.rst
--- a/source/advanced_usage/deploy/install_anakin.md
+++ b/source/advanced_usage/deploy/install_anakin.md
+../../../anakin/docs/Manual/INSTALL_ch.md
\ No newline at end of file
--- a/source/advanced_usage/deploy/mobile_build.md
+++ b/source/advanced_usage/deploy/mobile_build.md
+../../../mobile/doc/build.md
\ No newline at end of file
--- a/source/advanced_usage/deploy/mobile_dev.md
+++ b/source/advanced_usage/deploy/mobile_dev.md
+../../../mobile/doc/development_doc.md
\ No newline at end of file
--- a/source/advanced_usage/deploy/native_inference_engine.rst
+++ b/source/advanced_usage/deploy/native_inference_engine.rst
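+Paddle inference API
+====================
+To make inference deployment simpler and more convenient, Fluid provides a set of high-level APIs
+that hide the different underlying optimization implementations.
+`Inference library code <https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/contrib/inference>`__
+includes
+-  the header ``paddle_inference_api.h``, which defines all the interfaces
+-  the library ``libpaddle_fluid.so`` or ``libpaddle_fluid.a``
+-  the library ``libpaddle_inference_api.so`` or
+   ``libpaddle_inference_api.a``
+For building and dependencies, see :ref:`install_or_build_cpp_inference_lib`.
+Some API concepts are introduced below.
+PaddleTensor
+------------
+PaddleTensor defines the basic input and output data format for inference. Its definition is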
+.. code:: cpp
+    struct PaddleTensor {
+      std::string name;  // variable name.
+      std::vector<int> shape;
+      PaddleBuf data;  // blob of data.
+      PaddleDType dtype;
+    };
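+-  ``name`` specifies the name of the variable in the model that the input data corresponds to
+   (not used for now, but it will be enabled later when arbitrary targets are supported)
+-  ``shape`` is the shape of the Tensor
+-  ``data`` is stored in contiguous memory in a ``PaddleBuf``;
+   a ``PaddleBuf``
+   can either take external data or ``malloc`` its own memory; see the definitions in the header for details.
+-  ``dtype`` is the data type of the Tensor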
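Since ``data`` is a contiguous buffer, its size in bytes has to agree with ``shape`` and ``dtype``. The sketch below shows that relationship with two hypothetical helpers, `num_elements` and `buffer_bytes`; they are not declared in `paddle_inference_api.h` and only illustrate the arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of elements implied by a shape such as {4, 1}.
// An empty shape yields 1, i.e. it is treated as a scalar.
std::size_t num_elements(const std::vector<int>& shape) {
  std::size_t count = 1;
  for (int dim : shape) {
    assert(dim >= 0);
    count *= static_cast<std::size_t>(dim);
  }
  return count;
}

// Bytes a contiguous buffer must hold for `shape` when each element
// occupies `elem_size` bytes (for example sizeof(std::int64_t)).
std::size_t buffer_bytes(const std::vector<int>& shape, std::size_t elem_size) {
  return num_elements(shape) * elem_size;
}
```

For a shape of `{4, 1}` with 64-bit integers, `buffer_bytes` returns 32, which equals `sizeof(data)` for an `int64_t data[4]` array.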
+engine
+------
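+The high-level API is backed by several optimized implementations, which we call engines. There are currently three engines:
+-  the native engine, composed of Paddle's native forward operators,
+   which naturally supports all models trained with Paddle;
+-  the Anakin engine, which wraps
+   `Anakin <https://github.com/PaddlePaddle/Anakin>`__
+   ; it performs well on some models, but it only accepts its own model format and cannot support all Paddle
+   models;
+-  the TensorRT mixed engine, which supports
+   `TensorRT <https://developer.nvidia.com/tensorrt>`__ through subgraphs; it supports all Paddle
+   models and automatically offloads parts of the computation subgraph to TensorRT for acceleration (WIP).
+They are implemented as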
+.. code:: cpp
+    enum class PaddleEngineKind {
+      kNative = 0,       // Use the native Fluid facility.
+      kAnakin,           // Use Anakin for inference.
+      kAutoMixedTensorRT // Automatically mixing TensorRT with the Fluid ops.
+    };
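+Inference deployment process
+----------------------------
+Overall, there are the following steps:
+1. Create a ``PaddlePredictor`` with a suitable configuration
+2. Create the input ``PaddleTensor`` and pass it to the ``PaddlePredictor``
+3. Fetch the output ``PaddleTensor`` and read out the results
+The following demonstrates a simple model end to end, with some details omitted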
+.. code:: cpp
+    #include "paddle_inference_api.h"
+    // Create a config and adjust the relevant settings
+    paddle::NativeConfig config;
+    config.model_dir = "xxx";
+    config.use_gpu = false;
+    // Create a native PaddlePredictor
+    auto predictor =
+          paddle::CreatePaddlePredictor<paddle::NativeConfig, paddle::PaddleEngineKind::kNative>(config);
+    // Create the input tensor
+    int64_t data[4] = {1, 2, 3, 4};
+    paddle::PaddleTensor tensor{.name = "",
+                                .shape = std::vector<int>({4, 1}),
+                                .data = paddle::PaddleBuf(data, sizeof(data)),
+                                .dtype = paddle::PaddleDType::INT64};
+    // Collect the input tensors to pass to the predictor
+    std::vector<paddle::PaddleTensor> slots{tensor};
+    // Create the output tensors; their memory can be reused
+    std::vector<paddle::PaddleTensor> outputs;
+    // Run inference
+    CHECK(predictor->Run(slots, &outputs));
+    // Fetch outputs ...
+When building, link against ``libpaddle_fluid.a/.so`` and
+``libpaddle_inference_api.a/.so``.
+Detailed code references
+------------------------
+-  `inference
+   demos <https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/contrib/inference/demo>`__
+-  `Complex single-threaded/multi-threaded examples <https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/contrib/inference/test_paddle_inference_api_impl.cc>`__
--- a/source/advanced_usage/deploy/run_anakin_on_arm.md
+++ b/source/advanced_usage/deploy/run_anakin_on_arm.md
+../../../anakin/docs/Manual/run_on_arm_ch.md
\ No newline at end of file
--- a/source/advanced_usage/index.rst
+++ b/source/advanced_usage/index.rst
@@ -10,7 +10,9 @@
 ..  toctree::
    :maxdepth: 2
-    deploy/index.rst
+    deploy/index_native.rst
+    deploy/index_anakin.rst
+    deploy/index_mobile.rst
    development/contribute_to_paddle.md
    development/write_docs.rst
    development/new_op.md

--- a/anakin @ 4e77324d
+++ b/anakin @ 4e77324d
-Subproject commit 4e77324d1e1a7c224fee320b6e8ca1cd33b434ba