diff --git a/doc/fluid/advanced_usage/deploy/anakin_arm_benchmark.md b/doc/fluid/advanced_usage/deploy/anakin_arm_benchmark.md
index 08ea379f81d16407ed5f82770b55a34bcf138da8..e8701b2b54d96c104e6df13f28a0c028b1ca8d16 100644
--- a/doc/fluid/advanced_usage/deploy/anakin_arm_benchmark.md
+++ b/doc/fluid/advanced_usage/deploy/anakin_arm_benchmark.md
@@ -25,15 +25,15 @@
### mobilenetv1
- |platform | Anakin (1) | Anakin (2) | Anakin (4) | ncnn (1) | ncnn (2) | ncnn (4) | TFlite (1) | TFlite (2) | TFlite (4)|
+ |platform | Anakin (1) | Anakin (2) | Anakin (4) | ncnn (1) | ncnn (2) | ncnn (4) | TFlite (1) | TFlite (2) | TFlite (4)|
|:---: | :---: | :---: | :---:| :---:| :---:| :---:| :---:| :---:| :---:|
|Kirin 960|107.7ms|61.1ms|38.2ms|152.8ms|85.2ms|51.9ms|152.6ms|nan|nan|
|Snapdragon 835|105.7ms|63.1ms|~~46.8ms~~|152.7ms|87.0ms|~~92.7ms~~|146.9ms|nan|nan|
- |Snapdragon 653|120.3ms|64.2ms|46.6ms|202.5ms|117.6ms|84.8ms|158.6ms|nan|nan|
+ |Snapdragon 653|120.3ms|64.2ms|46.6ms|202.5ms|117.6ms|84.8ms|158.6ms|nan|nan|
### mobilenetv2
- |platform | Anakin (1) | Anakin (2) | Anakin (4) | ncnn (1) | ncnn (2) | ncnn (4) | TFlite (1) | TFlite (2) | TFlite (4)|
+ |platform | Anakin (1) | Anakin (2) | Anakin (4) | ncnn (1) | ncnn (2) | ncnn (4) | TFlite (1) | TFlite (2) | TFlite (4)|
|:---: | :---: | :---: | :---:| :---:| :---:| :---:| :---:| :---:| :---:|
|Kirin 960|93.1ms|53.9ms|34.8ms|144.4ms|84.3ms|55.3ms|100.6ms|nan|nan|
|Snapdragon 835|93.0ms|55.6ms|41.1ms|139.1ms|88.4ms|58.1ms|95.2ms|nan|nan|
@@ -41,7 +41,7 @@
### mobilenet-ssd
- |platform | Anakin (1) | Anakin (2) | Anakin (4) | ncnn (1) | ncnn (2) | ncnn (4) | TFlite (1) | TFlite (2) | TFlite (4)|
+ |platform | Anakin (1) | Anakin (2) | Anakin (4) | ncnn (1) | ncnn (2) | ncnn (4) | TFlite (1) | TFlite (2) | TFlite (4)|
|:---: | :---: | :---: | :---:| :---:| :---:| :---:| :---:| :---:| :---:|
|Kirin 960|213.9ms|120.5ms|74.5ms|307.9ms|166.5ms|104.2ms|nan|nan|nan|
|Snapdragon 835|213.0ms|125.7ms|~~98.4ms~~|292.9ms|177.9ms|~~167.8ms~~|nan|nan|nan|
@@ -49,8 +49,8 @@
## How to run those Benchmark models?
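-1. First, convert the caffe model with the [External Converter](../docs/Manual/Converter_en.md)
-2. Then upload the converted Anakin model and the compiled benchmark_arm binary to the test device with 'adb push'
-3. Next, in the directory on the test device that contains the Anakin model, run './benchmark_arm ./ anakin_model.anakin.bin 1 10 10 1'
-4. Finally, the model's run time is printed to the terminal
-5. Running './benchmark_arm' shows the number and meaning of the command's arguments
+ 1. First, convert the caffe model with the [External Converter](./convert_paddle_to_anakin.html)
+ 2. Then upload the converted Anakin model and the compiled benchmark_arm binary to the test device with 'adb push'
+ 3. Next, in the directory on the test device that contains the Anakin model, run './benchmark_arm ./ anakin_model.anakin.bin 1 10 10 1'
+ 4. Finally, the model's run time is printed to the terminal
+ 5. Running './benchmark_arm' shows the number and meaning of the command's arguments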
diff --git a/doc/fluid/advanced_usage/deploy/anakin_example.md b/doc/fluid/advanced_usage/deploy/anakin_example.md
index e6b9e18fe2d64b3fda6382bb23a6a818a3e17fbe..3cd684982e96077fefa7dd7a3d8a0e79a428f5d1 100644
--- a/doc/fluid/advanced_usage/deploy/anakin_example.md
+++ b/doc/fluid/advanced_usage/deploy/anakin_example.md
@@ -1,10 +1,14 @@
-# Example
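+# Anakin model-running examples
+
Anakin currently supports only the NCHW layout
+
The example files are under test/framework/net
## Running a CNN model on an NVIDIA GPU
+
The example file is example_nv_cnn_net.cpp; the overall flow is as follows:
-- Set the model path to the path of the anakin model and initialize the graph object for the NV platform. An anakin model can be obtained by converting a caffe or fluid model with the converter
+
+- Set the model path to the path of the anakin model and initialize the graph object for the NV platform. An anakin model can be obtained by converting a caffe or Paddle model with the converter
- Set the input size of the network graph according to the model, then optimize the graph
- Initialize the network executor from the optimized graph
- Get the network's input tensor and copy the data into it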
@@ -14,15 +18,21 @@
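This example uses the NV platform to demonstrate how to use the Anakin framework; note that the GPU build switch must be enabled at compile time
## Running an RNN model on X86
+
The example file is example_x86_rnn_net.cpp
+
The overall flow is similar to running a CNN model on an NVIDIA GPU, with the following differences:
+
- Initialize the graph object and the network executor object with the X86 identifier
- The input size of an RNN model is variable. The input dimensions given when initializing the graph are the maximum dimensions, and the input dimension N is the total number of words. You also need to set the input tensor's seq_offset to mark how these words are divided into sentences; for example, {0,5,12} means there are 12 words in total, where words 0 through 4 form the first sentence and words 5 through 11 form the second
This example uses the X86 platform to demonstrate how to use the Anakin framework; note that the X86 build switch must be enabled at compile time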
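## Running a CNN model on an NVIDIA GPU with Anakin's thread pool
+
The example file is example_nv_cnn_net_multi_thread.cpp, which uses the worker's synchronous prediction interface
+
The overall flow is similar to running a CNN model on an NVIDIA GPU, with the following differences:
+
- Initialize the worker object with the model path and the thread pool size
- Push the input tensor into the task queue and obtain the output tensor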
diff --git a/doc/fluid/advanced_usage/deploy/anakin_gpu_benchmark.md b/doc/fluid/advanced_usage/deploy/anakin_gpu_benchmark.md
index 667f9396f1169a0d891b9e6b0e912aa5527ab0b8..72a5d50d99c982aa29ebb1fdbc55cd836aabce53 100644
--- a/doc/fluid/advanced_usage/deploy/anakin_gpu_benchmark.md
+++ b/doc/fluid/advanced_usage/deploy/anakin_gpu_benchmark.md
@@ -1,33 +1,28 @@
-# Anakin GPU Benchmark
+# Anakin GPU performance benchmark
-## Machine:
+## Environment:
> CPU: `12-core Intel(R) Xeon(R) CPU E5-2620 v2 @2.10GHz`
> GPU: `Tesla P4`
> cuDNN: `v7`
-## Counterpart of anakin :
+## Counterpart of anakin:
-The counterpart of **`Anakin`** is the acknowledged high performance inference engine **`NVIDIA TensorRT 3`** , The models which TensorRT 3 doesn't support we use the custom plugins to support.
+**`Anakin`** is compared against the high-performance inference engine **`NVIDIA TensorRT 3`**
## Benchmark Model
-The following convolutional neural networks are tested with both `Anakin` and `TenorRT3`.
- You can use pretrained caffe model or the model trained by youself.
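+> Note: before benchmarking, convert the test models to Anakin models with the `External Converter` tool
+> For these models, this document benchmarks on the GPU with a single thread on a single GPU card.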
-> Please note that you should transform caffe model or others into anakin model with the help of [`external converter ->`](../docs/Manual/Converter_en.md)
-
-
-- [Vgg16](#1) *caffe model can be found [here->](https://gist.github.com/jimmie33/27c1c0a7736ba66c2395)*
-- [Yolo](#2) *caffe model can be found [here->](https://github.com/hojel/caffe-yolo-model)*
-- [Resnet50](#3) *caffe model can be found [here->](https://github.com/KaimingHe/deep-residual-networks#models)*
-- [Resnet101](#4) *caffe model can be found [here->](https://github.com/KaimingHe/deep-residual-networks#models)*
-- [Mobilenet v1](#5) *caffe model can be found [here->](https://github.com/shicai/MobileNet-Caffe)*
-- [Mobilenet v2](#6) *caffe model can be found [here->](https://github.com/shicai/MobileNet-Caffe)*
-- [RNN](#7) *not support yet*
-
-We tested them on single-GPU with single-thread.
+- [Vgg16](#1) *the caffe model can be downloaded [here](https://gist.github.com/jimmie33/27c1c0a7736ba66c2395)*
+- [Yolo](#2) *the caffe model can be downloaded [here](https://github.com/hojel/caffe-yolo-model)*
+- [Resnet50](#3) *the caffe model can be downloaded [here](https://github.com/KaimingHe/deep-residual-networks#models)*
+- [Resnet101](#4) *the caffe model can be downloaded [here](https://github.com/KaimingHe/deep-residual-networks#models)*
+- [Mobilenet v1](#5) *the caffe model can be downloaded [here](https://github.com/shicai/MobileNet-Caffe)*
+- [Mobilenet v2](#6) *the caffe model can be downloaded [here](https://github.com/shicai/MobileNet-Caffe)*
+- [RNN](#7) *not supported yet*
### VGG16
@@ -162,9 +157,9 @@ We tested them on single-GPU with single-thread.
| 8 | 421 | 351 |
| 32 | 637 | 551 |
-## How to run those Benchmark models?
+## How to run those Benchmark models
-> 1. At first, you should parse the caffe model with [`external converter`](https://github.com/PaddlePaddle/Anakin/blob/b95f31e19993a192e7428b4fcf852b9fe9860e5f/docs/Manual/Converter_en.md).
-> 2. Switch to *source_root/benchmark/CNN* directory. Use 'mkdir ./models' to create ./models and put anakin models into this file.
-> 3. Use command 'sh run.sh', we will create files in logs to save model log with different batch size. Finally, model latency summary will be displayed on the screen.
-> 4. If you want to get more detailed information with op time, you can modify CMakeLists.txt with setting `ENABLE_OP_TIMER` to `YES`, then recompile and run. You will find detailed information in model log file.
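+1. First, convert the caffe model with the [External Converter](./convert_paddle_to_anakin.html)
+2. Then switch to the *source_root/benchmark/CNN* directory, create a directory for the models with 'mkdir ./models', and put the converted Anakin models in it
+3. Run the script with `sh run.sh`; when it finishes, the model's run time is displayed in the terminal
+4. To get the run time of each OP layer, just set `ENABLE_OP_TIMER` to `YES` in CMakeLists.txt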
diff --git a/doc/fluid/advanced_usage/deploy/anakin_parser_design.md b/doc/fluid/advanced_usage/deploy/anakin_parser_design.md
new file mode 100644
index 0000000000000000000000000000000000000000..e2ec0c68dea031bf50c3adb37a7795a7f380eca0
--- /dev/null
+++ b/doc/fluid/advanced_usage/deploy/anakin_parser_design.md
@@ -0,0 +1,92 @@
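+# Guide to writing a Parser
+
+ A Parser is a network-framework conversion tool: it converts the network structure of another framework, such as Caffe or TensorFlow, into an Anakin network graph, which Anakin then uses for inference
+
+ This document describes the structure of the Parser framework and how to write a Parser for an existing network framework, so that its models can be parsed into an Anakin graph and run for inference in Anakin
+
+ Below, Anakin is abbreviated as AK and an operator as OP. This document is written with reference to the TensorFlow Parser; the reference code is in tools/external_converter_v2/parser/tensorflow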
+
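+## What the Parser does and how it runs
+
+ The Parser converts models from other deep learning frameworks (such as Caffe, TensorFlow, ONNX) into AK models
+
+ For AK, its role is to hide the differences between frameworks, including differences in model storage, OP definitions, and graph structure
+
+ The Parser therefore runs as follows:
+
+ - Load the source framework's model into the Parser
+ - Parse the source framework's graph into AK OP nodes and the connections between them
+ - Convert the OP definitions and optimize the graph
+ - Write the AK-compliant graph to protobuf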
+
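+## Parser directory structure
+
+ The Parser tool is in the tools/external_converter_v2/parser directory
+
+ The Parser directory has 3 main parts:
+
+ - The Parser's run configuration files: config.py, config.yaml, and converter.py. Users only need to run converter.py, and the Parser parses the model as declared in config.yaml
+ - The Parser's common definitions, in the operations, pbs, and proto directories, and its common utility functions in graph*.py, logger.py, and utils.py
+ - The Parser for each framework, in a directory named after the framework, such as Caffe or TensorFlow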
+
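+## Steps for writing a Parser
+
+### 1. Declare your Parser
+
+ - Fill in the information your Parser needs to run in config.yaml, including ProtoPath and SavePath. Change OPTIONS/Framework to your Parser's type, and fill in the corresponding parameter list under TARGET
+ - Add your Parser directory, e.g. TensorFlow, and export your Parser's symbols. Note that the Parser framework calls the __call__ method of your Parser class by default to run the parsing, and this method must return a fully populated GraphProtoIO object
+ - In the __init__ function of Configuration in config.py, add a call to your Parser that passes it the configuration read from the yaml file; this call invokes your Parser's __init__ method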
+
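+### 2. Add the Parser body
+
+ See parser_tf.py for reference
+
+ - When the Parser body is constructed, you need to obtain the information required for parsing, such as the model path and the input and output names
+ - Return the populated GraphProtoIO object from __call__; this object is a helper for filling in the protobuf
+ - We recommend splitting the parsing into three stages: first load the source framework's model and convert it into an intermediate graph form that is easy to modify; then modify the intermediate graph so that it meets AK's requirements; finally fill the conforming intermediate graph into protobuf using the two helper classes NodeProtoIO and GraphProtoIO. See parser_tf for details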
+
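+### 3. Read the original model and convert it to the intermediate type
+
+ See parse_tf_2_med.py for reference
+
+ - This step is tightly coupled to the original framework; you may need to import its utility functions to prune, freeze, and load the model
+ - Most frameworks connect OPs through tensors, whereas in AK OPs are connected directly; keep this in mind
+ - AK shapes are 4-dimensional by default; some parameters have shapes with fewer than 4 dimensions, which the Parser must pad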
+
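+### 4. Optimize the intermediate graph
+
+ See med_graph.py for reference
+
+ - AK does not support ordinary OPs with multiple outputs, so a Split-type OP node must be added after each multi-output OP
+ - Cases such as Convolution followed by Batchnorm, which can be merged without changing the OP definition, must be handled by the Parser in this step
+ - AK requires every input-type OP to be named input_x, where x is a number starting from 0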
+
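+### 5. Save the intermediate graph through GraphProtoIO
+
+ See parse_med_2_ak.py and parser_tf.py for reference
+
+ - First construct the Node objects. A Node's name is the OP's name (e.g. conv2d_1_a_0), and the name of the Node's OP member is the Node's type (e.g. Convolution)
+ - Use the Node's add_in method to fill in the names of its input Nodes in input order, and the add_out method to fill in the names of its output Nodes in order
+ - Add each Node to the AK graph by calling GraphProtoIO's add_node method with the return value of the constructed Node's __call__ method as the argument
+ - Call GraphProtoIO's add_in_edge and add_out_edge to build the relationships between OPs in the AK graph. If the ins and outs of each Node are filled in correctly, you can instead call GraphProtoIO's format_edge_from_nodes method to do this
+ - An AK model requires the Parser to provide the names of the output Nodes; fill them in with GraphProtoIO's add_out method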
+
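+### 6. Verify that the model was parsed correctly
+
+ - With the default config.yaml, a web server starts after parsing and displays the parsed AK model graph, which you need to verify against the original framework's model graph. The most common error here is a wrong edge relationship, which shows up as a very tangled graph; you need to check the edges one by one. The second most common error is a missing parameter, so check the attributes of each OP
+ - Run the parsed model in AK: given the same input, the original framework and AK should produce the same output. If the outputs differ, enable AK's DEBUG mode and print the output of each layer in net.cpp. If AK falls into an infinite loop during the parsing stage, the edge relationships are most likely wrong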
+
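+## How to add a new OP
+
+ - Add the OP's implementation to the AK code, including the Saber OP for the target device, the Saber unit test, and the OP in Framework
+ - Add the Parser's common OP definition to ops.py according to the Framework OP
+ - Parse the OP's node from the original framework's model and fill the OP node into the AK graph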
+
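+## How AK models differ from other frameworks' models
+
+ + AK models are similar to caffe models, and so differ from other models in many ways that the Parser must handle during parsing
+ + The biggest difference is that OPs in PaddlePaddle or TensorFlow models are fine-grained, whereas OPs in AK models are coarse-grained (to save memory-access overhead). As a result, parsing models from those frameworks involves many merge operations
+ + The next difference is OP behavior. For example, Pooling in TensorFlow is exclusive by default, whereas in AK it is inclusive. For TensorFlow Padding, if the total pad is odd, the extra pad goes on the right and bottom, whereas AK puts the extra pad on the left and top
+ + AK's default layout is NCHW. If OPs in another framework use a different layout, the Parser must convert the layout of the weights and handle the resulting reshape issues
+ + Some weights in AK need their layout converted in advance (e.g. GRU, LSTM). AK also supports different algorithms for the same OP (e.g. GRU, Pooling)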
+
diff --git a/doc/fluid/advanced_usage/deploy/anakin_run_on_arm.md b/doc/fluid/advanced_usage/deploy/anakin_run_on_arm.md
new file mode 100644
index 0000000000000000000000000000000000000000..cdebd4ae090668ea2f4d417da99f7e50e34e323e
--- /dev/null
+++ b/doc/fluid/advanced_usage/deploy/anakin_run_on_arm.md
@@ -0,0 +1,193 @@
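+## Building Anakin from source for ARM ##
+
+Anakin currently supports the ARM Android platform. It is cross-compiled with the Android NDK toolchain, and has been built and tested on macOS and CentOS.
+
+### Installation overview ###
+
+* [System requirements](#0001)
+* [Install third-party dependencies](#0002)
+* [Build Anakin from source](#0003)
+* [Verify the installation](#0004)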
+
+
+### 1. System requirements ###
+
+* Host: linux, mac
+* cmake 3.8.2+
+* Android NDK r14; the Linux version can be [downloaded here](https://dl.google.com/android/repository/android-ndk-r14b-linux-x86_64.zip)
+
+### 2. Install third-party dependencies ###
+
+- 2.1 protobuf3.4.0
+
+ Download the source [here](https://github.com/google/protobuf/releases/tag/v3.4.0)
+
+ - 2.1.1 Build protobuf for the host
+
+ ```bash
+ $ tar -xzf protobuf-3.4.0.tar.gz
+ $ cd protobuf-3.4.0
+ $ ./autogen.sh
+ $ ./configure
+ $ make
+ $ make check
+ $ make install
+ ```
+
+ After `make install` finishes, the headers required by libprotobuf are in `/usr/local/include/google`. Copy the whole google folder to Anakin/third-party/arm-android/protobuf/, then clean up the generated files.
+
+ If you run into problems, see [here](https://github.com/google/protobuf/blob/v3.4.0/src/README.md).
+
+ ```bash
+ $ make distclean
+ ```
+
+ - 2.1.2 Cross-compile protobuf for Android `armeabi-v7a`; be sure to set the ANDROID_NDK path and the values of ARCH_ABI and HOSTOSN
+
+ ```bash
+
+ $ export ANDROID_NDK=your_ndk_path
+ $ ARCH_ABI="arm-linux-androideabi-4.9"
+ $ HOSTOSN="darwin-x86_64"
+ $ export SYSROOT=$ANDROID_NDK/platforms/android-9/arch-arm
+ $ export PREBUILT=$ANDROID_NDK/toolchains/$ARCH_ABI
+ $ export LDFLAGS="--sysroot=$SYSROOT"
+ $ export LD="$ANDROID_NDK/toolchains/$ARCH_ABI/prebuilt/$HOSTOSN/arm-linux-androideabi/bin/ld $LDFLAGS"
+ $ export LIBS="-llog $ANDROID_NDK/sources/cxx-stl/gnu-libstdc++/4.9/libs/armeabi-v7a/libgnustl_static.a"
+ $ export CPPFLAGS=""
+ $ export INCLUDES="-I$ANDROID_NDK/sources/cxx-stl/gnu-libstdc++/4.9/include/ -I$ANDROID_NDK/platforms/android-9/arch-arm/usr/include/ -I$ANDROID_NDK/sources/cxx-stl/gnu-libstdc++/4.9/libs/armeabi-v7a/include/"
+ $ export CXXFLAGS="-march=armv7-a -mfloat-abi=softfp -DGOOGLE_PROTOBUF_NO_RTTI --sysroot=$SYSROOT"
+ $ export CCFLAGS="$CXXFLAGS"
+ $ export CXX="$PREBUILT/prebuilt/$HOSTOSN/bin/arm-linux-androideabi-g++ $CXXFLAGS"
+ $ export CC="$CXX"
+ $ export RANLIB="$ANDROID_NDK/toolchains/$ARCH_ABI/prebuilt/$HOSTOSN/bin/arm-linux-androideabi-ranlib"
+ $ ./autogen.sh
+ $ ./configure --host=arm-linux-androideabi --with-sysroot=$SYSROOT --enable-cross-compile --with-protoc=protoc --disable-shared CXX="$CXX" CC="$CC" LD="$LD"
+ $ make
+ ```
+
+ This builds the *.a static libraries. To build *.so shared libraries instead, replace --disable-shared with --disable-static --enable-shared in the ./configure arguments
+
+ The generated files are in `src/.libs/`; copy them to `Anakin/third-party/arm-android/protobuf/lib`
+
+ Update the `ARM_RPOTO_ROOT` path in [cmake](../../cmake/find_modules.cmake).
+
+ ```cmake
+ set(ARM_RPOTO_ROOT "${CMAKE_SOURCE_DIR}/third-party/arm-android/protobuf")
+ ```
+
+- 2.2 opencv 2.4.3+(optional)
+
+ Anakin uses opencv only in the examples
+
+ Download opencv for Android [here](https://opencv.org/releases.html)
+
+ After unpacking, copy the library files in `3rdparty/libs/armeabi-v7a` to `libs/armeabi-v7a`
+
+ Search for `anakin_find_opencv` in [cmake](../../cmake/find_modules.cmake)
+
+ and set `include_directories` and `LINK_DIRECTORIES` to the paths of your installed libraries
+
+ ```cmake
+ include_directories(${CMAKE_SOURCE_DIR}/third-party/arm-android/opencv/sdk/native/jni/include/)
+ LINK_DIRECTORIES(${CMAKE_SOURCE_DIR}/third-party/arm-android/opencv/sdk/native/libs/armeabi-v7a/)
+ ```
+
+### 3. Build Anakin from source ###
+
+#### Build the Android version
+
+Clone the [source](https://github.com/PaddlePaddle/Anakin/tree/arm)
+
+```bash
+ cd your_dir
+ git clone https://github.com/PaddlePaddle/Anakin.git
+ cd Anakin
+ git fetch origin arm
+ git checkout arm
+```
+
+Edit `android_build.sh`
+
+ - Set the NDK path
+
+ ```bash
+ #modify "your_ndk_path" to your NDK path
+ export ANDROID_NDK=your_ndk_path
+ ```
+
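+ - Set the ARM processor architecture
+
+ For 32-bit ARM processors, set ANDROID_ABI to `armeabi-v7a with NEON`
+ For 64-bit ARM processors, ANDROID_ABI can be set to `armeabi-v7a with NEON` or `arm64-v8a`
+ Currently only `armeabi-v7a with NEON` is supported; `arm64-v8a` is still under development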
+
+ ```bash
+ -DANDROID_ABI="armeabi-v7a with NEON"
+ ```
+
+- Set the Android API
+
+ Set the API level according to the Android version, e.g. API Level 21 -> Android 5.0.1
+
+ ```bash
+ -DANDROID_NATIVE_API_LEVEL=21
+ ```
+
+- Choose a static or shared library build
+
+ Set `BUILD_SHARED=NO` to build static libraries
+ Set `BUILD_SHARED=YES` to build shared libraries
+
+ ```bash
+ -DBUILD_SHARED=NO
+ ```
+
+- OpenMP multithreading support
+
+ Set `USE_OPENMP=YES` to enable OpenMP multithreading
+
+ ```bash
+ -DUSE_OPENMP=YES
+ ```
+
+- Build the unit tests
+
+ Setting `BUILD_WITH_UNIT_TEST=YES` builds the unit tests
+
+ ```bash
+ -DBUILD_WITH_UNIT_TEST=YES
+ ```
+
+- Build the examples
+
+ Setting `BUILD_EXAMPLES=YES` builds the examples
+ ```bash
+ -DBUILD_EXAMPLES=YES
+ ```
+
+- Enable opencv
+
+ To use opencv, set `USE_OPENCV=YES`
+
+ ```bash
+ -DUSE_OPENCV=YES
+ ```
+
+- Start the build
+
+ Run the `android_build.sh` script to build Anakin automatically
+
+ ```bash
+ ./android_build.sh
+ ```
+
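+### 4. Verify the installation ###
+
+The built libraries are placed in `${Anakin_root}/output`;
+
+the built unit tests are placed in `${Anakin_root}/output/unit_test`;
+
+the built examples are placed in `${Anakin_root}/output/examples`.
+
+On Android, enable debugging mode on the device. The directory accessible through ADB is `/data/local/tmp`; use ADB push to send the test files, models, and data to that directory on the device, then run the test files.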
diff --git a/doc/fluid/advanced_usage/deploy/anakin_tutorial.md b/doc/fluid/advanced_usage/deploy/anakin_tutorial.md
index 5efbc89abd469871b318c306e8cb03dd95f0c85b..1658aae6387744743d557788d70ffc4e4c2a8639 100644
--- a/doc/fluid/advanced_usage/deploy/anakin_tutorial.md
+++ b/doc/fluid/advanced_usage/deploy/anakin_tutorial.md
@@ -1,7 +1,7 @@
# Anakin tutorial ##
This tutorial briefly introduces how Anakin works, some basic Anakin APIs, and how to call them.
-
+
## Contents ###
- [How Anakin works](#principle)
@@ -14,31 +14,38 @@
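Forward computation with Anakin consists of three main steps:
-- Parse the external model into an Anakin model with the [Anakin Parser](Converter_ch.md)
- Before using Anakin, users must convert all other models into Anakin models. We provide a conversion script, and users can convert models with the [Anakin Parser](Converter_ch.md).
-- Generate the Anakin computation graph
- Load the Anakin model to generate the original computation graph, which then needs to be optimized. You only need to call the corresponding API to optimize it.
-- Execute the computation graph
- Anakin chooses among hardware platforms to execute the computation graph.
+ - Parse the external model into an Anakin model with the [Anakin Parser](./convert_paddle_to_anakin.html)
+ Before using Anakin, users must convert all other models into Anakin models. We provide a conversion script, and users can convert models with the [Anakin Parser](./convert_paddle_to_anakin.html).
+ - Generate the Anakin computation graph
+ Load the Anakin model to generate the original computation graph, which then needs to be optimized. You only need to call the corresponding API to optimize it.
+ - Execute the computation graph
+ Anakin chooses among hardware platforms to execute the computation graph.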
## Anakin APIs ###
+
### Tensor ####
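-`Tensor` provides basic data operations and management, and gives ops a unified data interface. A `Tensor` has the following attributes:
+`Tensor` provides basic data operations and management, and gives ops a unified data interface. A `Tensor` has the following attributes:
+
+- Buffer
+ Data storage area
+- Shape
+ Dimension information of the data
+- Event
+ Used for synchronization in asynchronous computation
-- Buffer
- Data storage area
-- Shape
- Dimension information of the data
-- Event
- Used for synchronization in asynchronous computation
+The `Tensor` class contains three `Shape` objects: `_shape`, `_valid_shape`, and `_offset`
- The `Tensor` class contains three `Shape` objects: `_shape`, `_valid_shape`, and `offset`. `_shape` is the actual space information of the `tensor`, `_valid_shape` is the space information the `tensor` currently uses, and `_offset` is the position of the current `tensor`'s data pointer relative to the actual data space. Tensors of different dimensions correspond to vectors, matrices, and so on in mathematics, as shown in the table below.
+ - `_shape` is the actual space information of the `tensor`
+ - `_valid_shape` is the space information the `tensor` currently uses
+ - `_offset` is the position of the current `tensor`'s data pointer relative to the actual data space
+Tensors of different dimensions correspond to vectors, matrices, and so on in mathematics, as shown in the table below
Dimensions | Math entity |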
- :----: | :----:
+:----: | :----:
1 | vector
2 | matrix
3 | 3-tensor
@@ -57,195 +64,202 @@ n | n-tensor
};
```
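-TargetType is the platform type, such as X86 or GPU, and has a corresponding identifier inside Anakin; datatype is an ordinary data type, which also has a corresponding identifier inside Anakin; [LayOutType](#layout) is the data layout type, such as batch x channel x height x width [NxCxHxW], identified inside Anakin by a struct. Anakin data types map to basic data types as follows: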
-
-1. TargetType
-
- Anakin TargetType | platform
- :----: | :----:|
- NV | NVIDIA GPU
- ARM | ARM
- AMD | AMD GPU
- X86 | X86
- NVHX86 | NVIDIA GPU with Pinned Memory
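+TargetType is the platform type, such as X86 or GPU, and has a corresponding identifier inside Anakin; datatype is an ordinary data type, which also has a corresponding identifier inside Anakin
-2. DataType
+[LayOutType](#layout) is the data layout type, such as batch x channel x height x width [NxCxHxW], identified inside Anakin by a struct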
-Anakin DataType | C++ | Description
-:---: | :---: | :---: |
-AK_HALF | short | fp16
-AK_FLOAT | float | fp32
-AK_DOUBLE | double | fp64
-AK_INT8 | char | int8
-AK_INT16 | short | int16
-AK_INT32 | int | int32
-AK_INT64 | long | int64
-AK_UINT8 | unsigned char | uint8
-AK_UINT16 | unsigned short | uint8
-AK_UINT32 | unsigned int | uint32
-AK_STRING | std::string | /
-AK_BOOL | bool | /
-AK_SHAPE | / | Anakin Shape
-AK_TENSOR | / | Anakin Tensor
+Anakin data types map to basic data types as follows:
+ 1. TargetType
-3. LayOutType
+ Anakin TargetType | platform
+ :----: | :----:
+ NV | NVIDIA GPU
+ ARM | ARM
+ AMD | AMD GPU
+ X86 | X86
+ NVHX86 | NVIDIA GPU with Pinned Memory
-Anakin LayOutType ( Tensor LayOut ) | Tensor Dimention | Tensor Support | Op Support
-:---: | :---: | :---: | :---: |
-W | 1-D | YES | NO
-HW | 2-D | YES | NO
-WH | 2-D | YES | NO
-NW | 2-D | YES | YES
-NHW | 3-D | YES |YES
-NCHW ( default ) | 4-D | YES | YES
-NHWC | 4-D | YES | NO
-NCHW_C4 | 5-D | YES | YES
+ 2. DataType
+ Anakin DataType | C++ | Description
+ :---: | :---: | :---:
+ AK_HALF | short | fp16
+ AK_FLOAT | float | fp32
+ AK_DOUBLE | double | fp64
+ AK_INT8 | char | int8
+ AK_INT16 | short | int16
+ AK_INT32 | int | int32
+ AK_INT64 | long | int64
+ AK_UINT8 | unsigned char | uint8
+ AK_UINT16 | unsigned short | uint16
+ AK_UINT32 | unsigned int | uint32
+ AK_STRING | std::string | /
+ AK_BOOL | bool | /
+ AK_SHAPE | / | Anakin Shape
+ AK_TENSOR | / | Anakin Tensor
-In theory, Anakin supports declaring tensors of one or more dimensions, but Anakin Ops support only four LayOuts: NW, NHW, NCHW, and NCHW_C4. NCHW is the default LayOutType, and NCHW_C4 is dedicated to the int8 data type.
+ 3. LayOutType
+ Anakin LayOutType ( Tensor LayOut ) | Tensor Dimension | Tensor Support | Op Support
+ :---: | :---: | :---: | :---:
+ W | 1-D | YES | NO
+ HW | 2-D | YES | NO
+ WH | 2-D | YES | NO
+ NW | 2-D | YES | YES
+ NHW | 3-D | YES |YES
+ NCHW ( default ) | 4-D | YES | YES
+ NHWC | 4-D | YES | NO
+ NCHW_C4 | 5-D | YES | YES
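-Example
+ In theory, Anakin supports declaring tensors of one or more dimensions, but Anakin Ops support only four LayOuts: NW, NHW, NCHW, and NCHW_C4. NCHW is the default LayOutType, and NCHW_C4 is dedicated to the int8 data type.
-> The code below shows how to use tensors; we recommend looking through these examples first.
+ Example
-> For more information about tensor, see *soure_path/core/tensor.h*
+ The code below shows how to use tensors; we recommend looking through these examples first.
-> 1. Initialize a tensor from a shape object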
-``` c++
- //create a null tensor. A null tensor holds for nothing.
- //tensor's buffer is resident at CPU and its datatype is AK_FLOAT.
- //tensor's Layout is NCHW(default)
- Tensor mytensor;
+ For more information about tensor, see *source_path/core/tensor.h*
- //1. using shape object to create a tensor.
- Shape shape1(NUM); //1-D shape. NUM is the number of dimention.
- Tensor mytensor1(shape1); //1-D tensor.
+ > 1. Initialize a tensor from a shape object
- // A 4-D shape
- Shape shape2(N, C, H, W); // batch x channel x height x width
-```
+ ```c++
+ //create a null tensor. A null tensor holds nothing.
+ //tensor's buffer resides on the CPU and its datatype is AK_FLOAT.
+ //tensor's Layout is NCHW(default)
+ Tensor mytensor;
->`Note: the dimensions of the Shape must match the tensor's `[LayoutType](#layout)`. For example, with Shape(N,C,H,W) the Tensor's LayoutType must be NCHW, otherwise an error occurs, as the code below shows`
+ //1. using shape object to create a tensor.
+ Shape shape1(NUM); //1-D shape. NUM is the size of the dimension.
+ Tensor mytensor1(shape1); //1-D tensor.
+ // A 4-D shape
+ Shape shape2(N, C, H, W); // batch x channel x height x width
+ ```
-```c++
- // A 4-D tensor.
- Tensor mytensor2(shape2); //right
+ >`Note: the dimensions of the Shape must match the tensor's `[LayoutType](#layout)`. For example, with Shape(N,C,H,W) the Tensor's LayoutType must be NCHW, otherwise an error occurs, as the code below shows`
- //A 4-D tensor which is resident at GPU and its datatype is AK_INT8
- Tensor mytensor3(shape2); //right
-
- Tensor mytensor4(shape2); //wrong!! shape's dimetion must be equal to tensor's Layout.
- Tensor mytensor5(shape2); //wrong!!!!
+ ```c++
+ // A 4-D tensor.
+ Tensor mytensor2(shape2); //right
-```
+ //A 4-D tensor which is resident at GPU and its datatype is AK_INT8
+ Tensor mytensor3(shape2); //right
-> 2. Initialize a tensor from existing data and a shape
+ Tensor mytensor4(shape2); //wrong!! shape's dimension must be equal to tensor's Layout.
+ Tensor mytensor5(shape2); //wrong!!!!
-```c++
+ ```
- /**
- * A construtor of Tensor.
- * data_ptr is a pointer to any data type of data
- * TargetType is type of a platform [Anakin TargetType]
- * id : device id
- * shape: a Anakin shape
- */
- Tensor(Dtype* data_ptr, TargetType_t target, int id, Shape shape);
+ > 2. Initialize a tensor from existing data and a shape
- //using existing data feed to a tensor
- Tensor mytensor(data_ptr, TargetType, device_id, shape); //shape must has dimention (N, C, H, W).
+ ```c++
-```
+ /**
+ * A constructor of Tensor.
+ * data_ptr is a pointer to data of any type
+ * TargetType is type of a platform [Anakin TargetType]
+ * id : device id
+ * shape: an Anakin shape
+ */
+ Tensor(Dtype* data_ptr, TargetType_t target, int id, Shape shape);
-> 3. Initialize a tensor from another tensor
+ //using existing data feed to a tensor
+ Tensor mytensor(data_ptr, TargetType, device_id, shape); //shape must have dimensions (N, C, H, W).
-```c++
- Tensor tensor(exist_tensor);
-```
+ ```
+ > 3. Initialize a tensor from another tensor
-> Tip: you can use ` typedef Tensor Tensor4d_X86 ` to define tensors conveniently
+ ```c++
+ Tensor tensor(exist_tensor);
+ ```
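+ > Tip: you can use ` typedef Tensor Tensor4d_X86 ` to define tensors conveniently
#### Filling a tensor's data area
-
How you fill the data area depends on how the tensor was declared. The following shows how to fill a tensor's data area.
-```c++
First, here are the four ways to declare a tensor: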
-1. Tensor mytensor;
-2. Tensor mytensor1(shape1);
-3. Tensor mytensor(data_ptr, TargetType, device_id, shape);
-4. Tensor tensor(exist_tensor);
-
+```c++
+ 1. Tensor mytensor;
+ 2. Tensor mytensor1(shape1);
+ 3. Tensor mytensor(data_ptr, TargetType, device_id, shape);
+ 4. Tensor tensor(exist_tensor);
+```
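The data-filling method for each declaration style is as follows:
-1: Declaring an empty tensor allocates no memory for it, so we need to allocate memory for it manually.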
-
- //parama shape
- mytensor.re_alloc(Shape shape);
-
- //Get writable pointer to mytensor.
- //parama index (int): where you start to write.
- //Dtype is your data type such int, float or double.
- Dtype *p = mytensor.mutable_data(index/*=0*/);
- //write data to mytensor
- for(int i = 0; i < mytensor.size(); i++){
- p[i] = 1.0f;
- }
- //do something ...
-
-2: This declaration style allocates memory automatically
-
- //Get writable pointer to mytensor.
- //parama index (int): where you start to write.
- //Dtype is your data type such int, float or double.
- Dtype *p = mytensor1.mutable_data(index/*=0*/);
- //write data to mytensor
- for(int i = 0; i < mytensor.size(); i++){
- p[i] = 1.0f;
- }
- //do something ...
-
-
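-3: With this declaration style, we still do not need to allocate memory manually, but whether the constructor allocates memory internally depends on the situation. If data_ptr and the declared
-tensor are on the same target platform, the tensor shares memory with data_ptr. Conversely, if they are on different platforms (e.g. data_ptr is on X86 while the
-tensor is on the GPU), the tensor allocates new memory and copies the data pointed to by data_ptr into the tensor's buffer.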
-
- //Get writable pointer to mytensor.
- //parama index (int): where you start to write.
- //Dtype is your data type such int, float or double.
- Dtype *p = mytensor.mutable_data(index/*=0*/);
- //write data to mytensor
- for(int i = 0; i < mytensor.size(); i++){
- p[i] = 1.0f;
- }
- //do something ...
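+- Declaration 1: declaring an empty tensor allocates no memory for it, so we need to allocate memory for it manually.
-4: This style also needs no manual memory allocation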
+```c++
+
+ //param shape
+ mytensor.re_alloc(Shape shape);
- //Get writable pointer to mytensor.
- //parama index (int): where you start to write.
- //Dtype is your data type such int, float or double.
- Dtype *p = mytensor.mutable_data(index/*=0*/);
- //write data to mytensor
- for(int i = 0; i < mytensor.size(); i++){
+ //Get writable pointer to mytensor.
+ //param index (int): where you start to write.
+ //Dtype is your data type such as int, float or double.
+ Dtype *p = mytensor.mutable_data(index/*=0*/);
+ //write data to mytensor
+ for(int i = 0; i < mytensor.size(); i++){
p[i] = 1.0f;
- }
- //do something ...
+ }
+ //do something ...
+```
+
+- Declaration 2: this declaration style allocates memory automatically
+
+```c++
+ //Get writable pointer to mytensor.
+ //param index (int): where you start to write.
+ //Dtype is your data type such as int, float or double.
+ Dtype *p = mytensor1.mutable_data(index/*=0*/);
+ //write data to mytensor1
+ for(int i = 0; i < mytensor1.size(); i++){
+ p[i] = 1.0f;
+ }
+ //do something ...
+```
+
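+- Declaration 3: with this declaration style, we still do not need to allocate memory manually, but whether the constructor allocates memory internally depends on the situation. If data_ptr and the declared
+ tensor are on the same target platform, the tensor shares memory with data_ptr. Conversely, if they are on different platforms (e.g. data_ptr is on X86 while the
+ tensor is on the GPU), the tensor allocates new memory and copies the data pointed to by data_ptr into the tensor's buffer.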
+
+```c++
+ //Get writable pointer to mytensor.
+ //param index (int): where you start to write.
+ //Dtype is your data type such as int, float or double.
+ Dtype *p = mytensor.mutable_data(index/*=0*/);
+ //write data to mytensor
+ for(int i = 0; i < mytensor.size(); i++){
+ p[i] = 1.0f;
+ }
+ //do something ...
+```
+
+- Declaration 4: this style also needs no manual memory allocation
+
+```c++
+ //Get writable pointer to mytensor.
+ //param index (int): where you start to write.
+ //Dtype is your data type such as int, float or double.
+ Dtype *p = mytensor.mutable_data(index/*=0*/);
+ //write data to mytensor
+ for(int i = 0; i < mytensor.size(); i++){
+ p[i] = 1.0f;
+ }
+ //do something ...
+```
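+- In addition, you can get a read-only pointer to a tensor, as in the following example:
-In addition, you can get a read-only pointer to a tensor, as in the following example: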
+```c++
//Get read-only pointer to mytensor.
//param index (int): where you start to read.
//Dtype is your data type such as int, float or double.
- Dtype *p = mytensor.data(index/*=0*/);
+ Dtype *p = mytensor.data(index/*=0*/);
//do something ...
```
@@ -254,77 +268,75 @@ tensor在GPU上),那么此时tensor就会开辟一个新的内存空间,
#### Getting a tensor's shape
```c++
-//some declarations
-// ...
-Shape shape = mytensor.shape();
+ //some declarations
+ // ...
+ Shape shape = mytensor.shape();
-//Get a first dimetion size of tesor, if it has.
-int d1 = shape[0];
+ //Get the size of the tensor's first dimension, if it has one.
+ int d1 = shape[0];
-//Get a second dimention size of tensor, if it has.
-int d2 = shape[1];
+ //Get the size of the tensor's second dimension, if it has one.
+ int d2 = shape[1];
-...
+ ...
-//Get a n-th dimention size of tensor, if it has.
-int dn = shape[n-1];
+ //Get the size of the tensor's n-th dimension, if it has one.
+ int dn = shape[n-1];
-//Get a tensor's dimention
-int dims = mytensor.dims();
+ //Get the number of dimensions of the tensor
+ int dims = mytensor.dims();
-//Get the size of tensor.
-//size = d1 x d2 x ... x dn.
-int size = mytensor.size();
+ //Get the size of tensor.
+ //size = d1 x d2 x ... x dn.
+ int size = mytensor.size();
-//Get the size of tensor at interval [Di, Dj)
-// form i-th dimention to j-th dimention, but not including the j-th dimention.
-// which means di x (di+1) x ... x (dj -1)
-int size = mytensor.count(start, end);
+ //Get the size of tensor at interval [Di, Dj)
+ // from the i-th dimension to the j-th dimension, not including the j-th dimension.
+ // that is, d_i x d_(i+1) x ... x d_(j-1)
+ int size = mytensor.count(start, end);
```
#### Setting a tensor's shape
We can set a tensor's shape with the tensor's member function set_shape. The definition of set_shape is shown below
-
```c++
-/**
- * \brief set a tensor's shape
- * \param valid_shape [a Shape object]
- * \param shape [a Shape object]
- * \param offset [a Shape object]
- * \return the status of this operation, that means whether it success * or not.
- */
-SaberStatus set_shape(Shape valid_shape, Shape shape = Shape::zero(TensorAPI::layout_dims::value), Shape offset = Shape::minusone(TensorAPI::layout_dims::value));
+ /**
+ * \brief set a tensor's shape
+ * \param valid_shape [a Shape object]
+ * \param shape [a Shape object]
+ * \param offset [a Shape object]
+ * \return the status of this operation, that is, whether it succeeded or not.
+ */
+ SaberStatus set_shape(Shape valid_shape, Shape shape = Shape::zero(TensorAPI::layout_dims::value), Shape offset = Shape::minusone(TensorAPI::layout_dims::value));
```
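This member function sets only the tensor's shape. The [LayOutType](#layout) of these shape objects (valid_shape, shape, offset) must match the LayOutType of the tensor's three corresponding shape objects; if it does not, the call fails and returns SaberInvalidValue. If it matches, the tensor's shape is set successfully.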
```c++
-// some declarations
-// ...
-//valid_shape, shape , offset are Shape object;
-//All these Shape object's LayOutType must be equal to mytensor's.
-mytensor.set_shape(valid_shape, shape, offset);
+ // some declarations
+ // ...
+ //valid_shape, shape, offset are Shape objects;
+ //The LayOutType of all these Shape objects must be equal to mytensor's.
+ mytensor.set_shape(valid_shape, shape, offset);
```
#### Reshaping a tensor
```c++
-//some declarations
-Shape shape, valid_shape, offset;
+ //some declarations
+ Shape shape, valid_shape, offset;
-//do some initializations
-...
-mytensor.reshape(valid_shape, shape, offset);
+ //do some initializations
+ ...
+ mytensor.reshape(valid_shape, shape, offset);
```
Note: the reshape operation still requires the shape's [LayOutType](#layout) to match the tensor's
-
### Graph ###
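The `Graph` class is responsible for loading an Anakin model to generate the computation graph, optimizing the graph, saving the model, and similar operations.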
@@ -335,62 +347,61 @@ mytensor.reshape(valid_shape, shape, offset);
```c++
-template
-class Graph ... /* inherit other class*/{
-
- //some implements
- ...
+ template
+ class Graph ... /* inherit other class*/{
-};
+ //some implements
+ ...
+
+ };
```
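As described earlier, [TargetType](#target) and [DataType](#datatype) are data types defined inside Anakin. [TargetType](#target) is the platform type (such as NV or X86), and [DataType](#datatype) is Anakin's basic data type, corresponding to the basic data types of C/C++. [Precision](#precision) is the precision type supported by the op, which we will introduce later.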
-
```c++
-//Create a empty graph object.
-Graph graph = Graph tmp();
+ //Create an empty graph object.
+ Graph graph = Graph tmp();
-//Create a pointer to a empty graph.
-Graph *graph = new Graph();
+ //Create a pointer to a empty graph.
+ Graph *graph = new Graph();
-//Create a pointer to a empty graph.
-auto graph = new Graph();
+ //Create a pointer to an empty graph.
+ auto graph = new Graph();
```
#### Loading an Anakin model
```c++
-//some declarations
-...
-auto graph = new Graph();
-std::string model_path = "the/path/to/where/your/models/are";
-const char *model_path1 = "the/path/to/where/your/models/are";
-
-//Loading Anakin model to generate a compute graph.
-auto status = graph->load(model_path);
-
-//Or this way.
-auto status = graph->load(model_path1);
-//Check whether the load operation succeeded.
-if(!status){
- std::cout << "error" << std::endl;
- //do something...
-}
+ //some declarations
+ ...
+ auto graph = new Graph();
+ std::string model_path = "the/path/to/where/your/models/are";
+ const char *model_path1 = "the/path/to/where/your/models/are";
+
+ //Loading Anakin model to generate a compute graph.
+ auto status = graph->load(model_path);
+
+ //Or this way.
+ auto status = graph->load(model_path1);
+ //Check whether the load operation succeeded.
+ if(!status){
+ std::cout << "error" << std::endl;
+ //do something...
+ }
```
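The `if(!status)` check above relies on the returned status object being testable as a boolean, and the example code later in this document also calls `status.info()`. As a minimal sketch of how such an object could be modeled, assuming nothing about Anakin's actual implementation, the hypothetical `StatusSketch` class and `load_sketch` function below are invented for this illustration:

```c++
#include <string>
#include <utility>

// Hypothetical stand-in for illustration only; NOT Anakin's real status class.
class StatusSketch {
public:
    static StatusSketch ok() { return StatusSketch(true, ""); }
    static StatusSketch fail(std::string msg) {
        return StatusSketch(false, std::move(msg));
    }

    // Allows `if (!status)` and `if (status)` without implicit int conversions.
    explicit operator bool() const { return _ok; }

    // Human-readable description of the failure (empty on success).
    const std::string& info() const { return _info; }

private:
    StatusSketch(bool ok, std::string info) : _ok(ok), _info(std::move(info)) {}
    bool _ok;
    std::string _info;
};

// Hypothetical loader used only to produce both outcomes: fails on an empty path.
StatusSketch load_sketch(const std::string& model_path) {
    if (model_path.empty()) {
        return StatusSketch::fail("model path is empty");
    }
    return StatusSketch::ok();
}
```

Calling `load_sketch("")` yields a status for which `!status` is true and `status.info()` explains the failure, which is the same calling pattern the load and save snippets in this document use.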
#### Optimizing the computation graph
```c++
-//some declarations
-...
-//Load graph.
-...
-//According to the ops of loaded graph, optimize compute graph.
-graph->Optimize();
+ //some declarations
+ ...
+ //Load graph.
+ ...
+ //According to the ops of loaded graph, optimize compute graph.
+ graph->Optimize();
```
@@ -400,34 +411,33 @@ graph->Optimize();
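You can save the model at any time. In particular, you can save an optimized model so that the optimization step can be skipped the next time the model is loaded.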
-
```c++
-//some declarations
-...
-//Load graph.
-...
-// save a model
-//save_model_path: the path where the model will be saved.
-auto status = graph->save(save_model_path);
-
-//Checking
-if(!status){
- cout << "error" << endl;
- //do something...
-}
+ //some declarations
+ ...
+ //Load graph.
+ ...
+ // save a model
+ //save_model_path: the path where the model will be saved.
+ auto status = graph->save(save_model_path);
+
+ //Checking
+ if(!status){
+ cout << "error" << endl;
+ //do something...
+ }
```
#### Resetting the shape of a tensor in the computation graph
```c++
-//some declarations
-...
-//Load graph.
-...
-vector<int> shape{10, 256, 256, 10};
-//input_name : std::string.
-//Reshape a tensor named input_name.
-graph->Reshape(input_name, shape);//Note: shape is a vector<int>, not a Shape object.
+ //some declarations
+ ...
+ //Load graph.
+ ...
+ vector<int> shape{10, 256, 256, 10};
+ //input_name : std::string.
+ //Reshape a tensor named input_name.
+ graph->Reshape(input_name, shape);//Note: shape is a vector<int>, not a Shape object.
```
#### Setting the batch size
@@ -435,14 +445,14 @@ graph->Reshape(input_name, shape);//Note: shape is a vector, not a Shape object.
`Graph` supports resetting the batch size.
```c++
-//some declarations
-...
-//Load graph.
-...
-//input_name : std::string.
-//Reset a tensor named input_name.
-int new_batch_size = 4;
-graph->ResetBatchSize(input_name, new_batch_size);
+ //some declarations
+ ...
+ //Load graph.
+ ...
+ //input_name : std::string.
+ //Reset a tensor named input_name.
+ int new_batch_size = 4;
+ graph->ResetBatchSize(input_name, new_batch_size);
```
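As a rough illustration of what resetting the batch size amounts to, the sketch below replaces the first dimension of a shape vector and keeps the rest. It assumes the batch dimension is dimension 0, which this document does not state explicitly; `with_batch_size` is a hypothetical helper written for this example and is not part of Anakin's API.

```c++
#include <stdexcept>
#include <vector>

// Hypothetical helper for illustration only; NOT an Anakin function.
// Assumes dimension 0 of `shape` is the batch dimension.
std::vector<int> with_batch_size(std::vector<int> shape, int new_batch_size) {
    if (shape.empty()) {
        throw std::invalid_argument("shape must have at least one dimension");
    }
    if (new_batch_size <= 0) {
        throw std::invalid_argument("batch size must be positive");
    }
    shape[0] = new_batch_size;  // every other dimension is kept as is
    return shape;
}
```

The helper takes the shape by value, so the caller's original vector is not modified; only the returned copy carries the new batch size.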
### Net ###
@@ -451,189 +461,185 @@ graph->ResetBatchSize(input_name, new_batch_size);
`Net` is the executor of the computation graph. You can get the input and output tensors through a Net object.
#### Creating a graph executor
-`Net` takes four template parameters.
+`Net` takes four template parameters.
```c++
-template
-class Net{
- //some implements
- ...
+ template
+ class Net{
+ //some implements
+ ...
-};
+ };
```
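Because some Ops may support multiple precisions, we can specify one through Precision. OpRunType selects synchronous or asynchronous execution, and asynchronous is the default. OpRunType::SYNC means synchronous, with a single stream on the GPU; OpRunType::ASYNC means asynchronous, with multiple streams on the GPU executing asynchronously. Precision and OpRunType are both enum classes; for the detailed design, see *source_root/framework/core/types.h*.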
1. Precision
-Precision | Op support
-:---: | :---:
-Precision::INT4 | NO
-Precision::INT8 | NO
-Precision::FP16 | NO
-Precision::FP32 | YES
-Precision::FP64 | NO
+ Precision | Op support
+ :---: | :---:
+ Precision::INT4 | NO
+ Precision::INT8 | NO
+ Precision::FP16 | NO
+ Precision::FP32 | YES
+ Precision::FP64 | NO
Currently Ops only support FP32 precision; the remaining Precision values will be supported in the future.
+2. OpRunType
+ OpRunType | Sync/Async |Description
+ :---: | :---: | :---:
+ OpRunType::SYNC | Synchronous | single-stream on GPU
+ OpRunType::ASYNC | Asynchronous | multi-stream on GPU
-2. OpRunType
-
-OpRunType | Sync/Async |Description
-:---: | :---: | :---:
-OpRunType::SYNC | Synchronous | single-stream on GPU
-OpRunType::ASYNC | Asynchronous | multi-stream on GPU
+Create an executor from a graph object
-Create an executor from a graph object.
```c++
-//some declarations
-...
-//Create a pointer to a graph.
-auto graph = new Graph();
-//do something...
-...
+ //some declarations
+ ...
+ //Create a pointer to a graph.
+ auto graph = new Graph();
+ //do something...
+ ...
-//create an executor
-Net executor(*graph);
+ //create an executor
+ Net executor(*graph);
```
#### Getting input and output tensors
-
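-Get the input and output tensors, and fill the input tensor's buffer. To get an input tensor you must pass its name, such as "input_0", "input_1", "input_2", ...; only these strings retrieve an input tensor. To find out which input input_i corresponds to, check the dash board; for how to use the dash board, see [Anakin Parser](Converter_ch.md). See the sample code below.
+Get the input and output tensors, and fill the input tensor's buffer. To get an input tensor you must pass its name, such as "input_0", "input_1", "input_2", ...; only these strings retrieve an input tensor. To find out which input input_i corresponds to, check the dash board; for how to use the dash board, see [Anakin Parser](./convert_paddle_to_anakin.html). See the sample code below.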
```c++
-//some declarations
-...
-
-//create an executor
-//TargetType is NV [NVIDIA GPU]
-Net executor(*graph);
-
-//Get the first input tensor.
-//The following tensors (tensor_in0, tensor_in1, ...) reside on the GPU.
-//Note: Member function get_in returns a pointer to a tensor.
-Tensor* tensor_in0 = executor.get_in("input_0");
-
-//If you have multiple input tensors
-//You just type this code below.
-Tensor* tensor_in1 = executor.get_in("input_1");
-...
-auto tensor_inn = executor.get_in("input_n");
+ //some declarations
+ ...
+
+ //create an executor
+ //TargetType is NV [NVIDIA GPU]
+ Net executor(*graph);
+
+ //Get the first input tensor.
+ //The following tensors (tensor_in0, tensor_in1, ...) reside on the GPU.
+ //Note: Member function get_in returns a pointer to a tensor.
+ Tensor* tensor_in0 = executor.get_in("input_0");
+
+ //If you have multiple input tensors
+ //You just type this code below.
+ Tensor* tensor_in1 = executor.get_in("input_1");
+ ...
+ auto tensor_inn = executor.get_in("input_n");
```
Once you have the input tensor, you can fill its data buffer.
```c++
-//This tensor is resident at GPU.
-auto tensor_d_in = executor.get_in("input_0");
-
-//To feed the tensor above, first fill a tensor that resides on the host, then copy the host tensor to the device tensor.
-
-//using Tensor4d = Tensor;
-Tensor4d tensor_h_in; //host tensor;
-//Tensor tensor_h_in;
-
-//Allocate memory for host tensor.
-tensor_h_in.re_alloc(tensor_d_in->valid_shape());
-//Get a writable pointer to tensor.
-float *h_data = tensor_h_in.mutable_data();
-
-//Feed your tensor.
-/** example
-for(int i = 0; i < tensor_h_in.size(); i++){
- h_data[i] = 1.0f;
-}
-*/
-//Copy host tensor's data to device tensor.
-tensor_d_in->copy_from(tensor_h_in);
-
-// And then
+ //This tensor is resident at GPU.
+ auto tensor_d_in = executor.get_in("input_0");
+
+ //To feed the tensor above, first fill a tensor that resides on the host, then copy the host tensor to the device tensor.
+
+ //using Tensor4d = Tensor;
+ Tensor4d tensor_h_in; //host tensor;
+ //Tensor tensor_h_in;
+
+ //Allocate memory for host tensor.
+ tensor_h_in.re_alloc(tensor_d_in->valid_shape());
+ //Get a writable pointer to tensor.
+ float *h_data = tensor_h_in.mutable_data();
+
+ //Feed your tensor.
+ /** example
+ for(int i = 0; i < tensor_h_in.size(); i++){
+ h_data[i] = 1.0f;
+ }
+ */
+ //Copy host tensor's data to device tensor.
+ tensor_d_in->copy_from(tensor_h_in);
+
+ // And then
```
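+Similarly, the member function get_out returns an output tensor. Unlike getting an input tensor, here we must specify the name of the output node, which can be found on the dash board; see [Anakin Parser](./convert_paddle_to_anakin.html) for how to use the dash board. For example, if there is an output node named pred_out, the corresponding output tensor can be obtained with the following code:
-Similarly, the member function get_out returns an output tensor. Unlike getting an input tensor, here we must specify the name of the output node, which can be found on the dash board; see [Anakin Parser](Converter_ch.md) for how to use the dash board. For example, if there is an output node named pred_out, the corresponding output tensor can be obtained with the following code: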
```c++
-//Note: this tensor resides on the GPU.
-Tensor* tensor_out_d = executor.get_out("pred_out");
+ //Note: this tensor resides on the GPU.
+ Tensor* tensor_out_d = executor.get_out("pred_out");
```
-
#### Executing graph
-
When everything is ready, we can run the actual computation!
```c++
-executor.prediction();
+ executor.prediction();
```
-
+
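## Example code ##
The following example shows how to call Anakin.
-Before you start, make sure you already have an Anakin model. If not, use [Anakin Parser](Converter_ch.md) to convert your model.
+Before you start, make sure you already have an Anakin model. If not, use [Anakin Parser](./convert_paddle_to_anakin.html) to convert your model.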
### Single-thread
-The single-thread example is at *source_root/test/framework/net/net_exec_test.cpp`*
+The single-thread example is at *`source_root/test/framework/net/net_exec_test.cpp`*
```c++
-std::string model_path = "your_Anakin_models/xxxxx.anakin.bin";
-// Create an empty graph object.
-auto graph = new Graph();
-// Load Anakin model.
-auto status = graph->load(model_path);
-if(!status ) {
- LOG(FATAL) << " [ERROR] " << status.info();
-}
-// Reshape
-graph->Reshape("input_0", {10, 384, 960, 10});
-// The graph must be optimized the first time it is loaded.
-graph->Optimize();
-// Create an executor.
-Net net_executer(*graph);
-
-//Get your input tensors through some specific string such as "input_0", "input_1", and
-//so on.
-//And then, feed the input tensor.
-//If you don't know Which input do these specific string ("input_0", "input_1") correspond with, you can launch dash board to find out.
-auto d_tensor_in_p = net_executer.get_in("input_0");
-Tensor4d h_tensor_in;
-auto valid_shape_in = d_tensor_in_p->valid_shape();
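-// Allocate host memory, fill it, then copy it to the device tensor.
-h_tensor_in.re_alloc(valid_shape_in);
-float* h_data = h_tensor_in.mutable_data();
-for (int i=0; i<h_tensor_in.size(); i++) {
- h_data[i] = 1.0f;
-}
-d_tensor_in_p->copy_from(h_tensor_in);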
-
-//Do inference.
-net_executer.prediction();
-
-//Get the result tensor through the name of the output node.
-//Check the dash board again to find out how many output nodes there are and what they are named.
-
-//For example, suppose there is an output node named obj_pred_out.
-//Then, you can get an output tensor.
-auto d_tensor_out_0_p = net_executer.get_out("obj_pred_out"); //get_out returns a pointer to output tensor.
-auto d_tensor_out_1_p = net_executer.get_out("lc_pred_out"); //get_out returns a pointer to output tensor.
-//......
-// do something else ...
-//...
-//save model.
-//You do not need to optimize the graph again when you load the saved model.
-std::string save_model_path = model_path + std::string(".saved");
-status = graph->save(save_model_path);
-if (!status ) {
- LOG(FATAL) << " [ERROR] " << status.info();
-}
+ std::string model_path = "your_Anakin_models/xxxxx.anakin.bin";
+ // Create an empty graph object.
+ auto graph = new Graph();
+ // Load Anakin model.
+ auto status = graph->load(model_path);
+ if(!status ) {
+ LOG(FATAL) << " [ERROR] " << status.info();
+ }
+ // Reshape
+ graph->Reshape("input_0", {10, 384, 960, 10});
+ // The graph must be optimized the first time it is loaded.
+ graph->Optimize();
+ // Create an executor.
+ Net net_executer(*graph);
+
+ //Get your input tensors through some specific string such as "input_0", "input_1", and
+ //so on.
+ //And then, feed the input tensor.
+ //If you don't know Which input do these specific string ("input_0", "input_1") correspond with, you can launch dash board to find out.
+ auto d_tensor_in_p = net_executer.get_in("input_0");
+ Tensor4d h_tensor_in;
+ auto valid_shape_in = d_tensor_in_p->valid_shape();
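+ // Allocate host memory, fill it, then copy it to the device tensor.
+ h_tensor_in.re_alloc(valid_shape_in);
+ float* h_data = h_tensor_in.mutable_data();
+ for (int i=0; i<h_tensor_in.size(); i++) {
+ h_data[i] = 1.0f;
+ }
+ d_tensor_in_p->copy_from(h_tensor_in);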
+
+ //Do inference.
+ net_executer.prediction();
+
+ //Get the result tensor through the name of the output node.
+ //Check the dash board again to find out how many output nodes there are and what they are named.
+
+ //For example, suppose there is an output node named obj_pred_out.
+ //Then, you can get an output tensor.
+ auto d_tensor_out_0_p = net_executer.get_out("obj_pred_out"); //get_out returns a pointer to output tensor.
+ auto d_tensor_out_1_p = net_executer.get_out("lc_pred_out"); //get_out returns a pointer to output tensor.
+ //......
+ // do something else ...
+ //...
+ //save model.
+ //You do not need to optimize the graph again when you load the saved model.
+ std::string save_model_path = model_path + std::string(".saved");
+ status = graph->save(save_model_path);
+ if (!status ) {
+ LOG(FATAL) << " [ERROR] " << status.info();
+ }
```
diff --git a/doc/fluid/advanced_usage/deploy/convert_paddle_to_anakin.md b/doc/fluid/advanced_usage/deploy/convert_paddle_to_anakin.md
index 56ca582b2b47f404ede777712830731ea7f4e9b5..8a35875404ce460705de7559fd5eea1247fb69f5 100644
--- a/doc/fluid/advanced_usage/deploy/convert_paddle_to_anakin.md
+++ b/doc/fluid/advanced_usage/deploy/convert_paddle_to_anakin.md
@@ -1,14 +1,14 @@
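# Model Conversion Guide
-Anakin supports inference with models from different frameworks. Because the formats differ, Anakin requires you to convert the model first. This document describes how to convert a model.
+Anakin supports inference with models from different frameworks. Because the formats differ, Anakin requires you to convert the model first, and this document describes how to convert a model.
## Introduction
-The Anakin model converter accepts inference models in two formats, Caffe and Fluid. A model consists of the network structure (model or prototxt) and the weights (param or caffemodel).
+The Anakin model converter accepts inference models in two formats, Caffe and Paddle. A model consists of the network structure (model or prototxt) and the weights (param or caffemodel).
-The output of the conversion is a bin file, which is loaded as the graph parameter of the Anakin framework.
+The output of the conversion is a bin file, which is loaded as the graph parameter of the Anakin framework.
-You can also use the converter's launch board feature to generate an HTML preview of the network structure.
+You can also use the converter's launch board feature to generate an HTML preview of the network structure.
## System requirements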
@@ -22,7 +22,7 @@ Anakin 模型转换器输入支持 Caffe 和 Fluid 两种格式的预测模型
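## Usage
### 1. Environment
-The dependencies required by the converter are listed in the *System requirements* section.
+The dependencies required by the converter are listed in the *System requirements* section.
### 2. Configuration
Edit the *config.yaml* file to describe your requirements. The project provides a sample *config.yaml*, which is explained further below.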
@@ -30,7 +30,7 @@ Anakin 模型转换器输入支持 Caffe 和 Fluid 两种格式的预测模型
#### config.yaml
```bash
OPTIONS:
- Framework: CAFFE # CAFFE or FLUID, depending on the framework
+ Framework: CAFFE # CAFFE or Paddle, depending on the framework
SavePath: ./output # where the converted model is saved
ResultName: googlenet # name of the output model
Config:
@@ -53,13 +53,13 @@ TARGET:
PrototxtPath: /path/to/your/googlenet.prototxt
ModelPath: /path/to/your/googlenet.caffemodel
- FLUID:
- # Required when Framework is FLUID
+ Paddle:
+ # Required when Framework is Paddle
Debug: NULL
ProtoPaths:
- /
- PrototxtPath: /path/to/fluid/inference_model
- ModelPath: /path/to/fluid/inference_model
+ PrototxtPath: /path/to/paddle/inference_model
+ ModelPath: /path/to/paddle/inference_model
# ...
```
@@ -68,6 +68,6 @@ TARGET:
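### 4. Preview
-The last step is to view the exciting conversion result in a browser! The URL is configured in *config.yaml*, for example http://0.0.0.0:8888 .
+The last step is to view the conversion result in a browser! The URL is configured in *config.yaml*, for example http://0.0.0.0:8888 .
> Note: if you use the default IP address 0.0.0.0, replace it with the real server address real_ip:port when previewing.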
diff --git a/doc/fluid/advanced_usage/deploy/how_to_support_new_device_in_anakin.md b/doc/fluid/advanced_usage/deploy/how_to_support_new_device_in_anakin.md
index a1f75f5e95cfb90f26d3782ba30a6d1887a70424..da2c64cf4d842b3136adc21872e66f6101a9fbc7 100644
--- a/doc/fluid/advanced_usage/deploy/how_to_support_new_device_in_anakin.md
+++ b/doc/fluid/advanced_usage/deploy/how_to_support_new_device_in_anakin.md
@@ -52,7 +52,7 @@ endif()
#cmakedefine USE_TNEW_PLACE
```
-* Other dependencies and build options
+* Other dependencies and build options
Modify `compiler_options.cmake` and `find_modules.cmake` in the `cmake` directory
@@ -231,7 +231,7 @@ struct TargetWrapper { //根据TNEW的具体类型修改__xx
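4. Add the device directory and implementation under `impl/`
Add the device directory `tnew` under `saber/core/impl`.
-* Implement the functions declared in the `TargetWrapper` struct.
+* Implement the functions declared in the `TargetWrapper` struct.
If the implementation of `TargetWrapper` is the same as the default template class, there is no need to specialize it.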
```c++
@@ -243,11 +243,11 @@ void TNEW_API::get_device_count(int &count) {
void TNEW_API::set_device(int id){
// add implementation
}
-
+
void TNEW_API::mem_alloc(void** ptr, size_t n){
// add implementation
}
-
+
void TNEW_API::mem_free(void* ptr){
if(ptr != nullptr){
// add implementation
@@ -275,7 +275,7 @@ void Device::get_info() {
### Implementing device-specific ops in `saber/funcs`
-See [How to add a new Operator](addCustomOp.md)
+See [How to add a new Operator](./how_to_add_anakin_op.html)
## Adding device specializations or instantiations in `framework` ##
@@ -329,7 +329,7 @@ public:
typedef Tensor4d::type> type;
PBlock() {
- _inner_tensor = std::make_shared();
+ _inner_tensor = std::make_shared();
}
...
}
@@ -348,7 +348,7 @@ struct target_host {
### `framework/graph`
* Add the instantiation in `graph.cpp`
-
+
```c++
#ifdef USE_TNEW_PLACE
template class Graph;
@@ -360,7 +360,7 @@ struct target_host {
### `framework/model_parser`
* Add the instantiation in `parser.cpp`
-
+
```c++
#ifdef USE_TNEW_PLACE
template
@@ -372,7 +372,7 @@ struct target_host {
template
Status load(graph::Graph* graph,
const char* model_path);
-
+
template
Status save(graph::Graph* graph,
std::string& model_path);
@@ -382,7 +382,7 @@ struct target_host {
template
Status save(graph::Graph* graph,
std::string& model_path);
-
+
template
Status load(graph::Graph* graph,
std::string& model_path);
@@ -392,7 +392,7 @@ struct target_host {
template
Status load(graph::Graph* graph,
std::string& model_path);
-
+
template
Status save(graph::Graph* graph,
const char* model_path);
diff --git a/doc/fluid/advanced_usage/deploy/index_anakin.rst b/doc/fluid/advanced_usage/deploy/index_anakin.rst
index b561a577d557b2b7dac9065c44ac7d3500261aad..32d26156aed1d340482dbe2cb5a273c0679395cd 100644
--- a/doc/fluid/advanced_usage/deploy/index_anakin.rst
+++ b/doc/fluid/advanced_usage/deploy/index_anakin.rst
@@ -10,12 +10,13 @@ Anakin 预测引擎
install_anakin.md
convert_paddle_to_anakin.md
- run_anakin_on_arm.md
anakin_tutorial.md
+ anakin_run_on_arm.md
anakin_example.md
anakin_gpu_benchmark.md
anakin_arm_benchmark.md
+
Developer documentation
~~~~~~~~~~~~~~~~~~~~~~~
@@ -24,3 +25,4 @@ Anakin 预测引擎
how_to_add_anakin_op.md
how_to_support_new_device_in_anakin.md
+ anakin_parser_design.md
diff --git a/doc/fluid/advanced_usage/deploy/install_anakin.md b/doc/fluid/advanced_usage/deploy/install_anakin.md
index bb7c1950308622e3de292268a718e6ec688e6ae6..0b44a6be3baa51598fa8b2f2af863bed6c9c64e9 100644
--- a/doc/fluid/advanced_usage/deploy/install_anakin.md
+++ b/doc/fluid/advanced_usage/deploy/install_anakin.md
@@ -1,4 +1,4 @@
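-## Compiling and installing Anakin from source ##
+## Building and installing Anakin from source ##
We have successfully installed and tested Anakin on CentOS 7.3; support for other operating systems is coming soon.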
@@ -6,7 +6,7 @@
* [Installing Anakin on CentOS]()
* [Installing Anakin on Ubuntu]()
-* [Installing Anakin on ARM](run_on_arm_ch.md)
+* [Installing Anakin on ARM](./anakin_run_on_arm.html)
* [Verifying the installation]()
@@ -17,7 +17,6 @@
* cmake 2.8.12+
* gcc 4.8.2+
* g++ 4.8.2+
-* Other requirements to be added...
#### 2. Building the CPU version of Anakin ####
@@ -26,30 +25,37 @@
#### 3. Building Anakin with NVIDIA GPU support ####
- 3.1. Installing dependencies
- - 3.1.1 protobuf
- >$ git clone https://github.com/google/protobuf
- >$ cd protobuf
- >$ git submodule update --init --recursive
- >$ ./autogen.sh
- >$ ./configure --prefix=/path/to/your/install_dir
- >$ make
- >$ make check
- >$ make install
- >$ sudo ldconfig
+ - 3.1.1 protobuf
- If you run into any problem installing protobuf, see [here](https://github.com/google/protobuf/blob/master/src/README.md)
+ ```
+ > git clone https://github.com/google/protobuf
+ > cd protobuf
+ > git submodule update --init --recursive
+ > ./autogen.sh
+ > ./configure --prefix=/path/to/your/install_dir
+ > make
+ > make check
+ > make install
+ > sudo ldconfig
+ ```
+
+ If you run into any problem installing protobuf, see [here](https://github.com/google/protobuf/blob/master/src/README.md)
- 3.2 CUDA Toolkit
- - [CUDA 8.0](https://developer.nvidia.com/cuda-zone) or higher. For details, see [NVIDIA's documentation](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/).
- - [cuDNN v7](https://developer.nvidia.com/cudnn). For details, see [NVIDIA's documentation](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/).
+
+ - [CUDA 8.0](https://developer.nvidia.com/cuda-zone) or higher; for details, see [NVIDIA's documentation](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/).
+ - [cuDNN v7](https://developer.nvidia.com/cudnn); for details, see [NVIDIA's documentation](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/).
+
- 3.3 Building Anakin
- >$ git clone https:/xxxxx
- >$ cd anakin
- >$ mkdir build
- >$ cd build
- >$ cmake ..
- >$ make
+ ```
+ > git clone https:/xxxxx
+ > cd anakin
+ > mkdir build
+ > cd build
+ > cmake ..
+ > make
+ ```
#### 4. Building Anakin with AMD GPU support ####
@@ -63,7 +69,8 @@
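### Installing Anakin on ARM ###
-Not supported yet
+See the [ARM installation guide](./anakin_run_on_arm.html)
### Verifying the installation ###
-we are coming soon...
+
+If the build finishes without errors, you can verify it by running the unit-test examples under `output/unit_test`.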
diff --git a/doc/fluid/advanced_usage/deploy/run_anakin_on_arm.md b/doc/fluid/advanced_usage/deploy/run_anakin_on_arm.md
index ebeb38f534ebfc8cb5a41d103abe3bb1de7e379a..f61beca7ef21198fc992f0dafd9bfc464b4a60f5 100644
--- a/doc/fluid/advanced_usage/deploy/run_anakin_on_arm.md
+++ b/doc/fluid/advanced_usage/deploy/run_anakin_on_arm.md
@@ -1,4 +1,4 @@
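-## Building Anakin from source ##
+## Building Anakin from source for ARM ##
Anakin currently supports the ARM Android platform and is cross-compiled with the Android NDK toolchain; it has been built and tested on macOS and CentOS.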
@@ -12,37 +12,44 @@
### 1. System requirements ###
-* Host: linux, mac
-* cmake 3.8.2+
+* Host: linux, mac
+* cmake 3.8.2+
* Android NDK r14; the Linux version can be [downloaded here](https://dl.google.com/android/repository/android-ndk-r14b-linux-x86_64.zip)
### 2. Installing third-party dependencies ###
-- 2.1 protobuf3.4.0
- Download the source [here](https://github.com/google/protobuf/releases/tag/v3.4.0)
- - 2.1.1 Building protobuf for the host
- ```bash
- $ tar -xzf protobuf-3.4.0.tar.gz
- $ cd protobuf-3.4.0
- $ ./autogen.sh
- $ ./configure
- $ make
- $ make check
+- 2.1 protobuf3.4.0
+
+ Download the source [here](https://github.com/google/protobuf/releases/tag/v3.4.0)
+
+ - 2.1.1 Building protobuf for the host
+
+```bash
+ $ tar -xzf protobuf-3.4.0.tar.gz
+ $ cd protobuf-3.4.0
+ $ ./autogen.sh
+ $ ./configure
+ $ make
+ $ make check
$ make install
- ```
- After `make install` finishes, the headers required by libprotobuf are in /usr/local/include/google; copy the whole google folder to Anakin/third-party/arm-android/protobuf/.
- If you run into problems, see [here](https://github.com/google/protobuf/blob/v3.4.0/src/README.md).
- Then remove the generated files.
- ```bash
+```
+
+After `make install` finishes, the headers required by libprotobuf are in /usr/local/include/google; copy the whole google folder to Anakin/third-party/arm-android/protobuf/
+
+If you run into problems, see [here](https://github.com/google/protobuf/blob/v3.4.0/src/README.md), and then remove the generated files.
+
+```bash
$ make distclean
- ```
- - 2.1.2 Cross-compile protobuf for Android `armeabi-v7a`; make sure to set the ANDROID_NDK path and the ARCH_ABI and HOSTOSN values,
+```
+
+ - 2.1.2 Cross-compile protobuf for Android `armeabi-v7a`; make sure to set the ANDROID_NDK path and the ARCH_ABI and HOSTOSN values
+
```bash
- $ export ANDROID_NDK=your_ndk_path
+ $ export ANDROID_NDK=your_ndk_path
$ ARCH_ABI="arm-linux-androideabi-4.9"
$ HOSTOSN="darwin-x86_64"
- $ export SYSROOT=$ANDROID_NDK/platforms/android-9/arch-arm
+ $ export SYSROOT=$ANDROID_NDK/platforms/android-9/arch-arm
$ export PREBUILT=$ANDROID_NDK/toolchains/$ARCH_ABI
$ export LDFLAGS="--sysroot=$SYSROOT"
$ export LD="$ANDROID_NDK/toolchains/$ARCH_ABI/prebuilt/$HOSTOSN/arm-linux-androideabi/bin/ld $LDFLAGS"
@@ -53,34 +60,38 @@
$ export CCFLAGS="$CXXFLAGS"
$ export CXX="$PREBUILT/prebuilt/$HOSTOSN/bin/arm-linux-androideabi-g++ $CXXFLAGS"
$ export CC="$CXX"
- $ export RANLIB="$ANDROID_NDK/toolchains/$ARCH_ABI/prebuilt/$HOSTOSN/bin/arm-linux-androideabi-ranlib"
- $ ./autogen.sh
- $ ./configure --host=arm-linux-androideabi --with-sysroot=$SYSROOT --enable-cross-compile --with-protoc=protoc --disable-shared CXX="$CXX" CC="$CC" LD="$LD"
+ $ export RANLIB="$ANDROID_NDK/toolchains/$ARCH_ABI/prebuilt/$HOSTOSN/bin/arm-linux-androideabi-ranlib"
+ $ ./autogen.sh
+ $ ./configure --host=arm-linux-androideabi --with-sysroot=$SYSROOT --enable-cross-compile --with-protoc=protoc --disable-shared CXX="$CXX" CC="$CC" LD="$LD"
$ make
- ```
-
- The build produces the *.a static library. To build the *.so shared library instead, change --disable-shared to --disable-static --enable-shared in the ./configure arguments.
- The generated files are in src/.libs/; copy them to Anakin/third-party/arm-android/protobuf/lib.
- Update the `ARM_RPOTO_ROOT` path in [cmake](../../cmake/find_modules.cmake).
- ```cmake
+```
+
+The build produces the *.a static library. To build the *.so shared library instead, change --disable-shared to --disable-static --enable-shared in the ./configure arguments.
+The generated files are in src/.libs/; copy them to Anakin/third-party/arm-android/protobuf/lib.
+Update the `ARM_RPOTO_ROOT` path in [cmake](../../cmake/find_modules.cmake).
+
+```cmake
set(ARM_RPOTO_ROOT "${CMAKE_SOURCE_DIR}/third-party/arm-android/protobuf")
- ```
-
-- 2.2 opencv 2.4.3+ (optional)
- Anakin uses opencv only in the examples.
- opencv for Android can be [downloaded here](https://opencv.org/releases.html).
- After unpacking, copy the library files in `3rdparty/libs/armeabi-v7a` to `libs/armeabi-v7a`.
- Search for `anakin_find_opencv` in [cmake](../../cmake/find_modules.cmake),
- and set `include_directories` and `LINK_DIRECTORIES` to the paths of your installed libraries.
- ```cmake
+```
+
+- 2.2 opencv 2.4.3+ (optional)
+
+ Anakin uses opencv only in the examples.
+ opencv for Android can be [downloaded here](https://opencv.org/releases.html).
+ After unpacking, copy the library files in `3rdparty/libs/armeabi-v7a` to `libs/armeabi-v7a`.
+ Search for `anakin_find_opencv` in [cmake](../../cmake/find_modules.cmake),
+ and set `include_directories` and `LINK_DIRECTORIES` to the paths of your installed libraries.
+
+ ```cmake
include_directories(${CMAKE_SOURCE_DIR}/third-party/arm-android/opencv/sdk/native/jni/include/)
LINK_DIRECTORIES(${CMAKE_SOURCE_DIR}/third-party/arm-android/opencv/sdk/native/libs/armeabi-v7a/)
- ```
+ ```
### 3. Building Anakin from source ###
#### Building the Android version
- Clone the [source](https://github.com/PaddlePaddle/Anakin/tree/arm)
+ Clone the [source](https://github.com/PaddlePaddle/Anakin/tree/arm)
+
```bash
cd your_dir
git clone https://github.com/PaddlePaddle/Anakin.git
@@ -88,64 +99,87 @@
git fetch origin arm
git checkout arm
```
- Edit `android_build.sh`
-- Set the NDK path
+
+ Edit `android_build.sh`
+
+- Set the NDK path
+
```bash
#modify "your_ndk_path" to your NDK path
export ANDROID_NDK=your_ndk_path
```
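-- Set the ARM processor architecture
- For 32-bit ARM processors, set ANDROID_ABI to `armeabi-v7a with NEON`.
- For 64-bit ARM processors, ANDROID_ABI can be set to `armeabi-v7a with NEON` or `arm64-v8a`.
- Currently only `armeabi-v7a with NEON` is supported; `arm64-v8a` is still under development.
+
+- Set the ARM processor architecture
+
+ For 32-bit ARM processors, set ANDROID_ABI to `armeabi-v7a with NEON`.
+ For 64-bit ARM processors, ANDROID_ABI can be set to `armeabi-v7a with NEON` or `arm64-v8a`.
+ Currently only `armeabi-v7a with NEON` is supported; `arm64-v8a` is still under development.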
+
```bash
-DANDROID_ABI="armeabi-v7a with NEON"
```
-- Set the Android API
- Set the API level according to the Android version, for example API Level 21 -> Android 5.0.1
+
+- Set the Android API
+
+ Set the API level according to the Android version, for example API Level 21 -> Android 5.0.1
```bash
-DANDROID_NATIVE_API_LEVEL=21
```
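-- Choose between building a static or a shared library
- Set `BUILD_SHARED=NO` to build a static library
- Set `BUILD_SHARED=YES` to build a shared library
+- Choose between building a static or a shared library
+
+ Set `BUILD_SHARED=NO` to build a static library
+ Set `BUILD_SHARED=YES` to build a shared library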
+
```bash
-DBUILD_SHARED=NO
```
-- OpenMP multithreading support
- Set `USE_OPENMP=YES` to enable OpenMP multithreading
+- OpenMP multithreading support
+
+ Set `USE_OPENMP=YES` to enable OpenMP multithreading
+
```bash
-DUSE_OPENMP=YES
```
-
-- Build the unit tests
- Set `BUILD_WITH_UNIT_TEST=YES` to build the unit tests
- ```bash
- -DBUILD_WITH_UNIT_TEST=YES
- ```
-
-- Build the examples
- Set `BUILD_EXAMPLES=YES` to build the examples
- ```bash
- -DBUILD_EXAMPLES=YES
- ```
-
-- Enable opencv
- If you use opencv, set `USE_OPENCV=YES`
- ```bash
- -DUSE_OPENCV=YES
- ```
-
-- Start the build
- Run the script `android_build.sh` to build Anakin automatically
+
+- Build the unit tests
+
+ Set `BUILD_WITH_UNIT_TEST=YES` to build the unit tests
+
+ ```bash
+ -DBUILD_WITH_UNIT_TEST=YES
+ ```
+
+- Build the examples
+
+ Set `BUILD_EXAMPLES=YES` to build the examples
+
+ ```bash
+ -DBUILD_EXAMPLES=YES
+ ```
+
+- Enable opencv
+
+ If you use opencv, set `USE_OPENCV=YES`
+
+ ```bash
+ -DUSE_OPENCV=YES
+ ```
+
+- Start the build
+
+ Run the script `android_build.sh` to build Anakin automatically
+
```bash
./android_build.sh
```
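-### 4. Verifying the installation ###
- The built libraries are placed in `${Anakin_root}/output`;
- the built unit tests are placed in `${Anakin_root}/output/unit_test`;
- the built examples are placed in `${Anakin_root}/output/examples`.
-
- On Android, enable debugging mode on the device. The directory accessible through ADB is `data/local/tmp`; use ADB push to send the test binaries, models, and data to that directory on the device, then run the tests.
+### 4. Verifying the installation ###
+
+ The built libraries are placed in `${Anakin_root}/output`
+
+ The built unit tests are placed in `${Anakin_root}/output/unit_test`
+
+ The built examples are placed in `${Anakin_root}/output/examples`
+
+ On Android, enable debugging mode on the device. The directory accessible through ADB is `data/local/tmp`; use ADB push to send the test binaries, models, and data to that directory on the device, then run the tests.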