From 6ea4618f43abb21c3cbfccc37c84b50d7db9d753 Mon Sep 17 00:00:00 2001 From: chenjiaoAngel Date: Wed, 10 Oct 2018 15:58:13 +0800 Subject: [PATCH] change .md to .html to fix Anakin's doc (#127) * add anakin docker * add * add doc * delete docker * Update index_anakin_ch.rst * Update index_anakin_ch.rst * Update index_anakin_ch.rst * Update index_anakin_ch.rst * Update index_anakin.rst * Update anakin_parser_design_ch.md * Update and rename anakin_parser_design_ch.md to anakin_parser_design.md * Delete index_anakin_ch.rst * Update anakin_parser_design.md * update * update * Update anakin_parser_design.md * fix format * fix format * fix format * Delete .DS_Store * Delete .DS_Store * Delete menu.json * Update anakin_parser_design.md * change fluid to paddle * Update anakin_arm_benchmark.md * Update anakin_gpu_benchmark.md * fix .md to .html * fix .md to .html * fix .md to .html --- .../deploy/anakin_arm_benchmark.md | 18 +- .../advanced_usage/deploy/anakin_example.md | 14 +- .../deploy/anakin_gpu_benchmark.md | 41 +- .../deploy/anakin_parser_design.md | 92 ++ .../deploy/anakin_run_on_arm.md | 193 +++++ .../advanced_usage/deploy/anakin_tutorial.md | 794 +++++++++--------- .../deploy/convert_paddle_to_anakin.md | 22 +- .../how_to_support_new_device_in_anakin.md | 22 +- .../advanced_usage/deploy/index_anakin.rst | 4 +- .../advanced_usage/deploy/install_anakin.md | 53 +- .../deploy/run_anakin_on_arm.md | 198 +++-- 11 files changed, 895 insertions(+), 556 deletions(-) create mode 100644 doc/fluid/advanced_usage/deploy/anakin_parser_design.md create mode 100644 doc/fluid/advanced_usage/deploy/anakin_run_on_arm.md diff --git a/doc/fluid/advanced_usage/deploy/anakin_arm_benchmark.md b/doc/fluid/advanced_usage/deploy/anakin_arm_benchmark.md index 08ea379f8..e8701b2b5 100644 --- a/doc/fluid/advanced_usage/deploy/anakin_arm_benchmark.md +++ b/doc/fluid/advanced_usage/deploy/anakin_arm_benchmark.md @@ -25,15 +25,15 @@ ### mobilenetv1 - |platform | Anakin (1) | Anakin (2) | Anakin (4) | ncnn (1) | ncnn (2) | ncnn (4) | TFlite (1) | TFlite (2) | TFlite (4)| + |platform | Anakin (1) | Anakin (2) | Anakin (4) | ncnn (1) | ncnn (2) | ncnn (4) | TFlite (1) | TFlite (2) | TFlite (4)| |:---: | :---: | :---: | :---:| :---:| :---:| :---:| :---:| :---:| :---:| |麒麟960|107.7ms|61.1ms|38.2ms|152.8ms|85.2ms|51.9ms|152.6ms|nan|nan| |高通835|105.7ms|63.1ms|~~46.8ms~~|152.7ms|87.0ms|~~92.7ms~~|146.9ms|nan|nan| - |高通653|120.3ms|64.2ms|46.6ms|202.5ms|117.6ms|84.8ms|158.6ms|nan|nan| + |高通653|120.3ms|64.2ms|46.6ms|202.5ms|117.6ms|84.8ms|158.6ms|nan|nan| ### mobilenetv2 - |platform | Anakin (1) | Anakin (2) | Anakin (4) | ncnn (1) | ncnn (2) | ncnn (4) | TFlite (1) | TFlite (2) | TFlite (4)| + |platform | Anakin (1) | Anakin (2) | Anakin (4) | ncnn (1) | ncnn (2) | ncnn (4) | TFlite (1) | TFlite (2) | TFlite (4)| |:---: | :---: | :---: | :---:| :---:| :---:| :---:| :---:| :---:| :---:| |麒麟960|93.1ms|53.9ms|34.8ms|144.4ms|84.3ms|55.3ms|100.6ms|nan|nan| |高通835|93.0ms|55.6ms|41.1ms|139.1ms|88.4ms|58.1ms|95.2ms|nan|nan| @@ -41,7 +41,7 @@ ### mobilenet-ssd - |platform | Anakin (1) | Anakin (2) | Anakin (4) | ncnn (1) | ncnn (2) | ncnn (4) | TFlite (1) | TFlite (2) | TFlite (4)| + |platform | Anakin (1) | Anakin (2) | Anakin (4) | ncnn (1) | ncnn (2) | ncnn (4) | TFlite (1) | TFlite (2) | TFlite (4)| |:---: | :---: | :---: | :---:| :---:| :---:| :---:| :---:| :---:| :---:| |麒麟960|213.9ms|120.5ms|74.5ms|307.9ms|166.5ms|104.2ms|nan|nan|nan| |高通835|213.0ms|125.7ms|~~98.4ms~~|292.9ms|177.9ms|~~167.8ms~~|nan|nan|nan| @@ -49,8 +49,8 @@ ## 
How to run those Benchmark models? -1. 首先, 使用[External Converter](../docs/Manual/Converter_en.md)对caffe model 进行转换 -2. 然后将转换后的Anakin model和编译好的benchmark_arm 二进制文件通过'adb push'命令上传至测试机 -3. 接着在测试机含有Anakin model的目录中运行'./benchmark_arm ./ anakin_model.anakin.bin 1 10 10 1' 命令 -4. 最后,终端显示器上将会打印该模型的运行时间 -5. 其中运行命令的参数个数和含义可以通过运行'./benchmark_arm'看到 + 1. 首先, 使用[External Converter](./convert_paddle_to_anakin.html)对caffe model 进行转换 + 2. 然后将转换后的Anakin model和编译好的benchmark_arm 二进制文件通过'adb push'命令上传至测试机 + 3. 接着在测试机含有Anakin model的目录中运行'./benchmark_arm ./ anakin_model.anakin.bin 1 10 10 1' 命令 + 4. 最后,终端显示器上将会打印该模型的运行时间 + 5. 其中运行命令的参数个数和含义可以通过运行'./benchmark_arm'看到 diff --git a/doc/fluid/advanced_usage/deploy/anakin_example.md b/doc/fluid/advanced_usage/deploy/anakin_example.md index e6b9e18fe..3cd684982 100644 --- a/doc/fluid/advanced_usage/deploy/anakin_example.md +++ b/doc/fluid/advanced_usage/deploy/anakin_example.md @@ -1,10 +1,14 @@ -# Example +# Anakin 运行模型示例 + Anakin目前只支持NCHW的格式 + 示例文件在test/framework/net下 ## 在NV的GPU上运行CNN模型 + 示例文件为打开example_nv_cnn_net.cpp,整体流程如下: -- 将模型的的path设置为anakin模型的路径,初始化NV平台的图对象。 anakin模型可以通过转换器转化caffe或fluid的模型得到 + +- 将模型的的path设置为anakin模型的路径,初始化NV平台的图对象。 anakin模型可以通过转换器转化caffe或Paddle的模型得到 - 根据模型设置网络图的输入尺寸,进行图优化 - 根据优化后的网络图初始化网络执行器 - 取出网络的输入tensor,将数据拷贝到输入tensor @@ -14,15 +18,21 @@ 以NV平台为例演示Anakin框架的使用方法,注意编译时需要打开GPU编译开关 ## 在X86上运行RNN模型 + 示例文件为example_x86_rnn_net.cpp + 整体流程与在NV的GPU上运行CNN模型相似,不同之处如下: + - 使用X86标识初始化图对象和网络执行器对象 - rnn模型的输入尺寸是可变的,初始化图时的输入维度是维度的最大值,输入维度N代表总的词的个数。还需要设置输入tensor的seq_offset来标示这些词是如何划分为句子的,如{0,5,12}表示共有12个词,其中第0到第4个词是第一句话,第5到第11个词是第二句话 以X86平台为例演示Anakin框架的使用方法,注意编译时需要打开X86编译开关 ## 在NV的GPU上使用Anakin的线程池运行CNN模型 + 示例文件为example_nv_cnn_net_multi_thread.cpp ,示例使用worker的同步预测接口 + 整体流程与在NV的GPU上运行CNN模型相似,不同之处如下: + - 用模型地址和线程池大小初始化worker对象 - 将输入tensor注入任务队列,获得输出tensor diff --git a/doc/fluid/advanced_usage/deploy/anakin_gpu_benchmark.md b/doc/fluid/advanced_usage/deploy/anakin_gpu_benchmark.md index 667f9396f..72a5d50d9 100644 --- a/doc/fluid/advanced_usage/deploy/anakin_gpu_benchmark.md +++ b/doc/fluid/advanced_usage/deploy/anakin_gpu_benchmark.md @@ -1,33 +1,28 @@ -# Anakin GPU Benchmark +# Anakin GPU 性能测试 -## Machine: +## 环境: > CPU: `12-core Intel(R) Xeon(R) CPU E5-2620 v2 @2.10GHz` > GPU: `Tesla P4` > cuDNN: `v7` -## Counterpart of anakin : +## anakin 对比对象: -The counterpart of **`Anakin`** is the acknowledged high performance inference engine **`NVIDIA TensorRT 3`** , The models which TensorRT 3 doesn't support we use the custom plugins to support. +**`Anakin`** 将与高性能的推理引擎 **`NVIDIA TensorRT 3`** 进行比较 ## Benchmark Model -The following convolutional neural networks are tested with both `Anakin` and `TenorRT3`. - You can use pretrained caffe model or the model trained by youself. 
+> 注意在性能测试之前,请先将测试model通过 `External Converter` 工具转换为Anakin model +> 对这些model,本文在GPU上进行单线程单GPU卡的性能测试。 -> Please note that you should transform caffe model or others into anakin model with the help of [`external converter ->`](../docs/Manual/Converter_en.md) - - -- [Vgg16](#1) *caffe model can be found [here->](https://gist.github.com/jimmie33/27c1c0a7736ba66c2395)* -- [Yolo](#2) *caffe model can be found [here->](https://github.com/hojel/caffe-yolo-model)* -- [Resnet50](#3) *caffe model can be found [here->](https://github.com/KaimingHe/deep-residual-networks#models)* -- [Resnet101](#4) *caffe model can be found [here->](https://github.com/KaimingHe/deep-residual-networks#models)* -- [Mobilenet v1](#5) *caffe model can be found [here->](https://github.com/shicai/MobileNet-Caffe)* -- [Mobilenet v2](#6) *caffe model can be found [here->](https://github.com/shicai/MobileNet-Caffe)* -- [RNN](#7) *not support yet* - -We tested them on single-GPU with single-thread. +- [Vgg16](#1) *caffe model 可以在[这儿](https://gist.github.com/jimmie33/27c1c0a7736ba66c2395)下载* +- [Yolo](#2) *caffe model 可以在[这儿](https://github.com/hojel/caffe-yolo-model)下载* +- [Resnet50](#3) *caffe model 可以在[这儿](https://github.com/KaimingHe/deep-residual-networks#models)下载* +- [Resnet101](#4) *caffe model 可以在[这儿](https://github.com/KaimingHe/deep-residual-networks#models)下载* +- [Mobilenet v1](#5) *caffe model 可以在[这儿](https://github.com/shicai/MobileNet-Caffe)下载* +- [Mobilenet v2](#6) *caffe model 可以在[这儿](https://github.com/shicai/MobileNet-Caffe)下载* +- [RNN](#7) *暂不支持* ### VGG16 @@ -162,9 +157,9 @@ We tested them on single-GPU with single-thread. | 8 | 421 | 351 | | 32 | 637 | 551 | -## How to run those Benchmark models? +## How to run those Benchmark models -> 1. At first, you should parse the caffe model with [`external converter`](https://github.com/PaddlePaddle/Anakin/blob/b95f31e19993a192e7428b4fcf852b9fe9860e5f/docs/Manual/Converter_en.md). -> 2. Switch to *source_root/benchmark/CNN* directory. Use 'mkdir ./models' to create ./models and put anakin models into this file. -> 3. Use command 'sh run.sh', we will create files in logs to save model log with different batch size. Finally, model latency summary will be displayed on the screen. -> 4. If you want to get more detailed information with op time, you can modify CMakeLists.txt with setting `ENABLE_OP_TIMER` to `YES`, then recompile and run. You will find detailed information in model log file. +1. 首先, 使用[External Converter](./convert_paddle_to_anakin.html)对caffe model 进行转换 +2. 然后跳转至 *source_root/benchmark/CNN* 目录下,使用 'mkdir ./models'创建存放模型的目录,并将转换好的Anakin模型放在该目录下 +3. 运行脚本 `sh run.sh`,运行结束后,该模型的运行时间将会显示到终端上 +4. 
如果你想获取每层OP的运行时间,你只用将 CMakeLists.txt 中的`ENABLE_OP_TIMER` 设置为 `YES` 即可 diff --git a/doc/fluid/advanced_usage/deploy/anakin_parser_design.md b/doc/fluid/advanced_usage/deploy/anakin_parser_design.md new file mode 100644 index 000000000..e2ec0c68d --- /dev/null +++ b/doc/fluid/advanced_usage/deploy/anakin_parser_design.md @@ -0,0 +1,92 @@ +# Parser的编写指南 + + Parser是一种网络框架转换工具,将其他框架如Caffe、TensorFlow的网络结构转换为Anakin网络结构图,然后对转换后的Anakin图进行预测处理 + + 本文主要介绍Parser功能的框架结构和根据已有的网络框架改写Parser,以解析得到Anakin框架图,进行Anakin预测 + + 下文称Anakin为AK,运算操作为OP,本文参考TensorFlow的Parser编写,参考代码目录为tools/external_converter_v2/parser/tensorflow + +## Parser的功能和执行流程 + + Parser功能是将其他深度学习框架(如Caffe,TensorFlow,ONNX)的模型转换为AK的模型 + + 对AK的作用是屏蔽不同框架间的差异,这种差异包括模型存储、OP的定义、图差异 + + 因此Parser的执行流程是: + + - 将源框架的模型载入Parser + - 将原框架的图解析为AK中的OP节点和OP节点的连接关系 + - 进行OP定义的转换和图优化 + - 将符合AK标准的图写入protobuf + +## Parser的目录结构 + + Parser工具在tools/external_converter_v2/parser目录下 + + Parser的目录主要包含3部分: + + - Parser的运行配置文件包括 config.py, config.yaml, converter.py, 用户只用执行converter.py,Parser就会按照config.yaml中的声明去解析模型 + - Parser的公共定义,包括operations,pbs,proto三个目录。Parser的公共工具函数 graph*.py logger.py utils.py + - 各个框架对应的Parser,其目录的命名方式为框架名,如Caffe, TensorFlow + +## Parser的编写流程 + +### 1、声明你的Parser + + - 在config.yaml中填写你的Parser运行的必要信息,包括ProtoPath和SavePath等。OPTIONS/Framework改为你的Parser的类型,TARGET下填写对应的参数列表 + - 添加你的Parser目录,如TensorFlow,导出你的Parser符号。注意,Parser的框架默认调用你的Parser类中的__call__方法来执行解析,这个方法需要返回填写完毕的GraphProtoIO对象 + - 在config.py中Configuration下__init__函数中增加对你的Parser的调用,将yaml中读取的配置信息传给你的Parser,此处调用你的Parser中的__init__方法 + +### 2、添加你的Parser主体 + + 可以参考parser_tf.py + + - 你需要在Parser主体构造时获取模型路径,input,ouput名字等解析必须的信息 + - 在__call__中返回填写好的GraphProtoIO对象,该对象为填写protobuf的辅助工具 + - 建议Parser的解析过程分成三部分,先将原框架的模型载入并转换为一种便于修改的中间的图形式;对中间图修改使得图满足AK的要求;将满足要求的中间图利用NodeProtoIO和GraphProtoIO这两个辅助类填入protobuf,具体细节可以参考parser_tf + +### 3、读取原始模型,并将模型转换为中间类型 + + 可以参考parse_tf_2_med.py + + - 这一步与原始框架结合紧密,你可能需要import原始框架的工具函数来完成模型的裁剪、固定、加载等操作 + - 大部分的框架都是使用tensor来连接OP的,但AK中是OP直接相连,这点需要注意 + - AK的shape默认是4维的,有的参数的shape不足4维,需要Parser补全 + +### 4、对中间类型的图进行优化 + + 可以参考med_graph.py + + - 由于AK不支持普通OP多输出的情况,需要在多输出的OP后面补上Splite类型的OP节点 + - 对于Convlution后接Batchnorm这种可以合并又不会导致OP定义改变的情况,需要Parser在这一步做掉 + - AK规定所有的输入类型OP的名字必须是input_x这种命名方式,其中x为从0开始的数字 + +### 5、将中间类型的图以GraphProtoIO的方式保存 + + 可以参考parse_med_2_ak.py 和 parser_tf.py + + - 你首先需要构造Node节点,Node节点的名字是OP的名字(如conv2d_1_a_0),Node节点中OP成员变量的名字是Node节点的类型(如Convlution) + - Node节点需要按照输入的顺序用Node的add_in方法填写输入Node的名字,add_out方法按顺序填写输出Node的名字 + - 通过调用GraphProtoIO的add_node方法将构造好的Node的__call__方法的返回值作为参数,将Node节点加入AK的graph中 + - 调用GraphProtoIO的add_in_edge和add_out_edge完成AK图中OP间关系的构建。如果Node中的in和out填写正确,你也可以通过调用GraphProtoIO的format_edge_from_nodes方法完成这个工作 + - AK的模型需要Parser给出输出Node的名字,使用GraphProtoIO的add_out方法填写输出Node的名字 + +### 6、检查模型解析的正确性 + + - 默认的config.yaml配置会在解析结束后启动一个web服务器展示解析后的AK模型图,你需要对比原框架的模型图进行验证。这里最容易出现的错误是边关系的错误,表现为图非常乱,你需要逐条边地检查错误;第二个容易出错的地方是参数漏填,需要你检查OP中的属性 + - 将解析后的模型放入AK中执行,使用相同的输入,原框架与AK有相同的输出。若果输出不一致可以开启AK的DEBUG模式,在net.cpp中将没层的输出打印;如果AK在解析阶段陷入死循环,大概率是边的关系出错 + +## 如何添加新OP + + - 需要在AK代码中加入该OP的实现,包括对应设备Saber的OP,Saber单测和Framework中的OP + - 根据Framework的OP在ops.py中添加Parser公共的OP定义 + - 从原框架的模型中解析出该OP的节点,并在AK的graph中填入该OP节点 + +## AK模型与其他框架模型的不同之处 + + + AK模型与caffe的模型相似,因此与其他模型有很多不同的地方,需要Parser在解析过程中处理掉 + + 最大的不同是与PaddlePaddle或TensorFlow的模型中OP粒度很细,而AK的模型中OP的粒度很粗(目的是为了节省访存开销)。这会导致解析这些框架的模型时存在大量的合并操作 + + 其次是OP的行为不同,如TensorFlow中Pooling默认都是exclusive的,而AK中是inclusive的。TensorFlow的Padding,如果是奇数pad,则在右方和下方多pad,而AK是在左方和上方多Pad + + AK默认的布局是NCHW,如果其他框架的OP是其他形式的,需要在Parser中做weights的布局转换,并处理reshape的问题 + + 
AK中有的weights是需要预先做布局转换的(如GRU,LSTM),AK中也支持同一OP的不同算法,如(GRU,Pooling) + diff --git a/doc/fluid/advanced_usage/deploy/anakin_run_on_arm.md b/doc/fluid/advanced_usage/deploy/anakin_run_on_arm.md new file mode 100644 index 000000000..cdebd4ae0 --- /dev/null +++ b/doc/fluid/advanced_usage/deploy/anakin_run_on_arm.md @@ -0,0 +1,193 @@ +## ARM 源码编译 Anakin ## + +目前Anakin支持ARM Android平台,采用Android NDK交叉编译工具链,已在mac os和centos上编译和测试通过。 + +### 安装概览 ### + +* [系统需求](#0001) +* [安装第三方依赖](#0002) +* [Anakin源码编译](#0003) +* [验证安装](#0004) + + +### 1. 系统需求 ### + +* 宿主机: linux, mac +* cmake 3.8.2+ +* Android NDK r14, Linux 版本[从这里下载](https://dl.google.com/android/repository/android-ndk-r14b-linux-x86_64.zip) + +### 2. 安装第三方依赖 ### + +- 2.1 protobuf3.4.0 + + 源码从这里[下载](https://github.com/google/protobuf/releases/tag/v3.4.0) + + - 2.1.1 为宿主机编译protobuf + + ```bash + $ tar -xzf protobuf-3.4.0.tar.gz + $ cd protobuf-3.4.0 + $ ./autogen.sh + $ ./configure + $ make + $ make check + $ make install + ``` + + 上述 $make install 执行后,可在 `/usr/local/include/google` 找到 libprotobuf 所需的头文件,将整个google文件夹拷贝至Anakin/third-party/arm-android/protobuf/下, 然后将已经生成文件清除。 + + 如有问题,请点[这里](https://github.com/google/protobuf/blob/v3.4.0/src/README.md)。 + + ```bash + $ make distclean + ``` + + - 2.1.1 交叉编译Android`armeabi-v7a`的protobuf,注意设置ANDROID_NDK的路径,以及ARCH_ABI、HOSTOSN的值 + + ```bash + + $ export ANDROID_NDK=your_ndk_path + $ ARCH_ABI="arm-linux-androideabi-4.9" + $ HOSTOSN="darwin-x86_64" + $ export SYSROOT=$ANDROID_NDK/platforms/android-9/arch-arm + $ export PREBUILT=$ANDROID_NDK/toolchains/$ARCH_ABI + $ export LDFLAGS="--sysroot=$SYSROOT" + $ export LD="$ANDROID_NDK/toolchains/$ARCH_ABI/prebuilt/$HOSTOSN/arm-linux-androideabi/bin/ld $LDFLAGS" + $ export LIBS="-llog $ANDROID_NDK/sources/cxx-stl/gnu-libstdc++/4.9/libs/armeabi-v7a/libgnustl_static.a" + $ export CPPFLAGS="" + $ export INCLUDES="-I$ANDROID_NDK/sources/cxx-stl/gnu-libstdc++/4.9/include/ -I$ANDROID_NDK/platforms/android-9/arch-arm/usr/include/ -I$ANDROID_NDK/sources/cxx-stl/gnu-libstdc++/4.9/libs/armeabi-v7a/include/" + $ export CXXFLAGS="-march=armv7-a -mfloat-abi=softfp -DGOOGLE_PROTOBUF_NO_RTTI --sysroot=$SYSROOT" + $ export CCFLAGS="$CXXFLAGS" + $ export CXX="$PREBUILT/prebuilt/$HOSTOSN/bin/arm-linux-androideabi-g++ $CXXFLAGS" + $ export CC="$CXX" + $ export RANLIB="$ANDROID_NDK/toolchains/$ARCH_ABI/prebuilt/$HOSTOSN/bin/arm-linux-androideabi-ranlib" + $ ./autogen.sh + $ ./configure --host=arm-linux-androideabi --with-sysroot=$SYSROOT --enable-cross-compile --with-protoc=protoc --disable-shared CXX="$CXX" CC="$CC" LD="$LD" + $ make + ``` + + 编译生成 *.a 静态库,若希望编译*.so 动态链接库 ,请在./configure参数中改--disable-shared为--disable-static --enable-shared + + 生成文件在`src/.libs/`下,将生成的文件拷贝至`Anakin/third-party/arm-android/protobuf/lib`下 + + 在[cmake](../../cmake/find_modules.cmake)中更新`ARM_RPOTO_ROOT`的路径。 + + ```cmake + set(ARM_RPOTO_ROOT "${CMAKE_SOURCE_DIR}/third-party/arm-android/protobuf") + ``` + +- 2.2 opencv 2.4.3+(optional) + + Anakin只在examples示例中使用opencv + + Android系统的opencv从[这里下载](https://opencv.org/releases.html) + + 解压后将 `3rdparty/libs/armeabi-v7a`中的库文件拷贝到`libs/armeabi-v7a` + + 在[cmake](../../cmake/find_modules.cmake)中搜索`anakin_find_opencv` + + 并设置 `include_directories` 和 `LINK_DIRECTORIES`为自己安装的库的路径 + + ```cmake + include_directories(${CMAKE_SOURCE_DIR}/third-party/arm-android/opencv/sdk/native/jni/include/) + LINK_DIRECTORIES(${CMAKE_SOURCE_DIR}/third-party/arm-android/opencv/sdk/native/libs/armeabi-v7a/) + ``` + +### 3. 
Anakin源码编译 ### + +#### 编译Android版本 + +克隆[源码](https://github.com/PaddlePaddle/Anakin/tree/arm) + +```bash + cd your_dir + git clone https://github.com/PaddlePaddle/Anakin.git + cd Anakin + git fetch origin arm + git checkout arm +``` + +修改`android_build.sh` + + - 修改NDK路径 + + ```bash + #modify "your_ndk_path" to your NDK path + export ANDROID_NDK=your_ndk_path + ``` + + - 修改ARM 处理器架构 + + 对于32位ARM处理器, 将ANDROID_ABI 设置为 `armeabi-v7a with NEON` + 对于64位ARM处理器, 可以将ANDROID_ABI 设置为 `armeabi-v7a with NEON`或者`arm64-v8a` + 目前我们只支持 `armeabi-v7a with NEON`;`arm64-v8a` 还在开发中 + + ```bash + -DANDROID_ABI="armeabi-v7a with NEON" + ``` + +- 设置Android API + + 根据Android系统的版本设置API level, 例如API Level 21 -> Android 5.0.1 + + ```bash + -DANDROID_NATIVE_API_LEVEL=21 + ``` + +- 选择编译静态库或动态库 + + 设置`BUILD_SHARED=NO`编译静态库 + 设置`BUILD_SHARED=YES`编译动态库 + + ```bash + -DBUILD_SHARED=NO + ``` + +- OpenMP多线程支持 + + 设置`USE_OPENMP=YES`开启OpenMP多线程 + + ```bash + -DUSE_OPENMP=YES + ``` + +- 编译单测文件 + + 设置`BUILD_WITH_UNIT_TEST=YES`将会编译单测文件 + + ```bash + -DBUILD_WITH_UNIT_TEST=YES + ``` + +- 编译示例文件 + + 设置`BUILD_EXAMPLES=YES`将会编译示例文件 + ```bash + -DBUILD_EXAMPLES=YES + ``` + +- 开启opencv + + 如果使用opencv,设置`USE_OPENCV=YES` + + ```bash + -DUSE_OPENCV=YES + ``` + +- 开始编译 + + 运行脚本 `android_build.sh` 将自动编译Anakin + + ```bash + ./android_build.sh + ``` + +### 4. 验证安装 ### + +编译好的库会放在目录`${Anakin_root}/output`下; + +编译好的单测文件会放在`${Anakin_root}/output/unit_test`目录下; + +编译好的示例文件会放在`${Anakin_root}/output/examples`目录下。 + +对于Android系统,打开设备的调试模式,通过ADB可以访问的目录是`data/local/tmp`,通过ADB push将测试文件、模型和数据发送到设备目录, 运行测试文件。 diff --git a/doc/fluid/advanced_usage/deploy/anakin_tutorial.md b/doc/fluid/advanced_usage/deploy/anakin_tutorial.md index 5efbc89ab..1658aae63 100644 --- a/doc/fluid/advanced_usage/deploy/anakin_tutorial.md +++ b/doc/fluid/advanced_usage/deploy/anakin_tutorial.md @@ -1,7 +1,7 @@ # Anakin 使用教程 ## 本教程将会简略的介绍Anakin的工作原理,一些基本的Anakin API,以及如何调用这些API。 - + ## 内容 ### - [Anakin的工作原理](#principle) @@ -14,31 +14,38 @@ 用Anakin来进行前向计算主要分为三个步骤: -- 将外部模型通过[Anakin Parser](Converter_ch.md)解析为Anakin模型 - 在使用Anakin之前,用户必须将所有其他模型转换成Anakin模型,我们提供了转换脚本,用户可通过[Anakin Parser](Converter_ch.md)进行模型转换。 -- 生成Anakin计算图 - 加载Anakin模型生成原始计算图,然后需要对原始计算图进行优化。你只需要调用相应的API优化即可。 -- 执行计算图 - Anakin会选择不同硬件平台执行计算图。 + - 将外部模型通过[Anakin Parser](./convert_paddle_to_anakin.html)解析为Anakin模型 + 在使用Anakin之前,用户必须将所有其他模型转换成Anakin模型,我们提供了转换脚本,用户可通过[Anakin Parser](./convert_paddle_to_anakin.html)进行模型转换。 + - 生成Anakin计算图 + 加载Anakin模型生成原始计算图,然后需要对原始计算图进行优化。你只需要调用相应的API优化即可。 + - 执行计算图 + Anakin会选择不同硬件平台执行计算图。 ## Anakin APIs ### + ### Tensor #### -`Tensor`提供基础的数据操作和管理,为ops提供统一的数据接口。`Tensor`包含以下几个属性: +`Tensor`提供基础的数据操作和管理,为ops提供统一的数据接口。`Tensor`包含以下几个属性: + +- Buffer + 数据存储区 +- Shape + 数据的维度信息 +- Event + 用于异步计算的同步 -- Buffer - 数据存储区 -- Shape - 数据的维度信息 -- Event - 用于异步计算的同步 +`Tensor`类包含三个`Shape`对象, 分别是`_shape`, `_valid_shape`和 `offset` - `Tensor` 类包含三个`Shape`对象, 分别是`_shape`, `_valid_shape`和 `offset`。 `_shape`为`tensor`真正空间信息,`_valid_shape`表示当前`tensor`使用的空间信息, `_offset`表示当前`tensor`数据指针相对于真正数据空间的信息。 `Tensor`不同维度与分别与数学中的向量、矩阵等相对应如下表所示。 + - `_shape`为`tensor`真正空间信息 + - `_valid_shape`表示当前`tensor`使用的空间信息 + - `tensor`使用的空间信息 + - `_offset`表示当前`tensor`数据指针相对于真正数据空间的信息 +`Tensor`不同维度与分别与数学中的向量、矩阵等相对应如下表所示 Dimentions | Math entity | - :----: | :----: +:----: | :----: 1 | vector 2 | matrix 3 | 3-tensor @@ -57,195 +64,202 @@ n | n-tensor }; ``` -TargetType是平台类型,如X86,GPU等等,在Anakin内部有相应的标识与之对应;datatype是普通的数据类型,在Anakin内部也有相应的标志与之对应;[LayOutType](#layout)是数据分布类型,如batch x channel x height x width [NxCxHxW], 在Anakin内部用一个struct来标识。 
Anakin中数据类型与基本数据类型的对应如下: - -1. TargetType - - Anakin TargetType | platform - :----: | :----:| - NV | NVIDIA GPU - ARM | ARM - AMD | AMD GPU - X86 | X86 - NVHX86 | NVIDIA GPU with Pinned Memory +TargetType是平台类型,如X86,GPU等等,在Anakin内部有相应的标识与之对应;datatype是普通的数据类型,在Anakin内部也有相应的标志与之对应 -2. DataType +[LayOutType](#layout)是数据分布类型,如batch x channel x height x width [NxCxHxW], 在Anakin内部用一个struct来标识 -Anakin DataType | C++ | Description -:---: | :---: | :---: | -AK_HALF | short | fp16 -AK_FLOAT | float | fp32 -AK_DOUBLE | double | fp64 -AK_INT8 | char | int8 -AK_INT16 | short | int16 -AK_INT32 | int | int32 -AK_INT64 | long | int64 -AK_UINT8 | unsigned char | uint8 -AK_UINT16 | unsigned short | uint8 -AK_UINT32 | unsigned int | uint32 -AK_STRING | std::string | / -AK_BOOL | bool | / -AK_SHAPE | / | Anakin Shape -AK_TENSOR | / | Anakin Tensor +Anakin中数据类型与基本数据类型的对应如下: + 1. TargetType -3. LayOutType + Anakin TargetType | platform + :----: | :----: + NV | NVIDIA GPU + ARM | ARM + AMD | AMD GPU + X86 | X86 + NVHX86 | NVIDIA GPU with Pinned Memory -Anakin LayOutType ( Tensor LayOut ) | Tensor Dimention | Tensor Support | Op Support -:---: | :---: | :---: | :---: | -W | 1-D | YES | NO -HW | 2-D | YES | NO -WH | 2-D | YES | NO -NW | 2-D | YES | YES -NHW | 3-D | YES |YES -NCHW ( default ) | 4-D | YES | YES -NHWC | 4-D | YES | NO -NCHW_C4 | 5-D | YES | YES + 2. DataType + Anakin DataType | C++ | Description + :---: | :---: | :---: + AK_HALF | short | fp16 + AK_FLOAT | float | fp32 + AK_DOUBLE | double | fp64 + AK_INT8 | char | int8 + AK_INT16 | short | int16 + AK_INT32 | int | int32 + AK_INT64 | long | int64 + AK_UINT8 | unsigned char | uint8 + AK_UINT16 | unsigned short | uint8 + AK_UINT32 | unsigned int | uint32 + AK_STRING | std::string | / + AK_BOOL | bool | / + AK_SHAPE | / | Anakin Shape + AK_TENSOR | / | Anakin Tensor -理论上,Anakin支持申明1维以上的tensor,但是对于Anakin中的Op来说,只支持NW、NHW、NCHW、NCHW_C4这四种LayOut,其中NCHW是默认的LayOutType,NCHW_C4是专门针对于int8这种数据类型的。 + 3. LayOutType + Anakin LayOutType ( Tensor LayOut ) | Tensor Dimention | Tensor Support | Op Support + :---: | :---: | :---: | :---: + W | 1-D | YES | NO + HW | 2-D | YES | NO + WH | 2-D | YES | NO + NW | 2-D | YES | YES + NHW | 3-D | YES |YES + NCHW ( default ) | 4-D | YES | YES + NHWC | 4-D | YES | NO + NCHW_C4 | 5-D | YES | YES -例子 + 理论上,Anakin支持申明1维以上的tensor,但是对于Anakin中的Op来说,只支持NW、NHW、NCHW、NCHW_C4这四种LayOut,其中NCHW是默认的LayOuteType,NCHW_C4是专门针对于int8这种数据类型的。 -> 下面的代码将展示如何使用tensor, 我们建议先看看这些示例。 + 例子 -> 要想获得更多关于tensor的信息, 请参考 *soure_path/core/tensor.h* + 下面的代码将展示如何使用tensor, 我们建议先看看这些示例。 -> 1. 使用shape对象初始化tensor -``` c++ - //create a null tensor. A null tensor holds for nothing. - //tensor's buffer is resident at CPU and its datatype is AK_FLOAT. - //tensor's Layout is NCHW(default) - Tensor mytensor; + 要想获得更多关于tensor的信息, 请参考 *soure_path/core/tensor.h* - //1. using shape object to create a tensor. - Shape shape1(NUM); //1-D shape. NUM is the number of dimention. - Tensor mytensor1(shape1); //1-D tensor. + > 1. 使用shape对象初始化tensor - // A 4-D shape - Shape shape2(N, C, H, W); // batch x channel x height x width -``` + ```c++ + //create a null tensor. A null tensor holds for nothing. + //tensor's buffer is resident at CPU and its datatype is AK_FLOAT. + //tensor's Layout is NCHW(default) + Tensor mytensor; ->`注意:Shape的维度必须和tensor的`[LayoutType](#layout)`相同,比如Shape(N,C,H,W), 那么Tensor的 LayoutType必须是NCHW,否则会出错。如下列代码所示` + //1. using shape object to create a tensor. + Shape shape1(NUM); //1-D shape. NUM is the number of dimention. + Tensor mytensor1(shape1); //1-D tensor. 
+ // A 4-D shape + Shape shape2(N, C, H, W); // batch x channel x height x width + ``` -```c++ - // A 4-D tensor. - Tensor mytensor2(shape2); //right + >`注意:Shape的维度必须和tensor的`[LayoutType](#layout)`相同,比如Shape(N,C,H,W), 那么Tensor的 LayoutType必须是NCHW,否则会出错。如下列代码所示` - //A 4-D tensor which is resident at GPU and its datatype is AK_INT8 - Tensor mytensor3(shape2); //right - - Tensor mytensor4(shape2); //wrong!! shape's dimetion must be equal to tensor's Layout. - Tensor mytensor5(shape2); //wrong!!!! + ```c++ + // A 4-D tensor. + Tensor mytensor2(shape2); //right -``` + //A 4-D tensor which is resident at GPU and its datatype is AK_INT8 + Tensor mytensor3(shape2); //right -> 2. 使用现有的数据和shape初始化tensor + Tensor mytensor4(shape2); //wrong!! shape's dimetion must be equal to tensor's Layout. + Tensor mytensor5(shape2); //wrong!!!! -```c++ + ``` - /** - * A construtor of Tensor. - * data_ptr is a pointer to any data type of data - * TargetType is type of a platform [Anakin TargetType] - * id : device id - * shape: a Anakin shape - */ - Tensor(Dtype* data_ptr, TargetType_t target, int id, Shape shape); + > 2. 使用现有的数据和shape初始化tensor - //using existing data feed to a tensor - Tensor mytensor(data_ptr, TargetType, device_id, shape); //shape must has dimention (N, C, H, W). + ```c++ -``` + /** + * A construtor of Tensor. + * data_ptr is a pointer to any data type of data + * TargetType is type of a platform [Anakin TargetType] + * id : device id + * shape: a Anakin shape + */ + Tensor(Dtype* data_ptr, TargetType_t target, int id, Shape shape); -> 3. 使用tensor初始化tensor + //using existing data feed to a tensor + Tensor mytensor(data_ptr, TargetType, device_id, shape); //shape must has dimention (N, C, H, W). -```c++ - Tensor tensor(exist_tensor); -``` + ``` + > 3. 使用tensor初始化tensor -> 提示: 你可以用` typedef Tensor Tensor4d_X86 `方便定义tensor + ```c++ + Tensor tensor(exist_tensor); + ``` + > 提示: 你可以用` typedef Tensor Tensor4d_X86 `方便定义tensor #### 填充tensor数据区 - 填充数据区得看你申明tensor的方式, 下面展示了如何填充tensor的数据区。 -```c++ 首先来看看tensor的四种声明方式: -1. Tensor mytensor; -2. Tensor mytensor1(shape1); -3. Tensor mytensor(data_ptr, TargetType, device_id, shape); -4. Tensor tensor(exist_tensor); - +```c++ + 1. Tensor mytensor; + 2. Tensor mytensor1(shape1); + 3. Tensor mytensor(data_ptr, TargetType, device_id, shape); + 4. Tensor tensor(exist_tensor); +``` 相关的声明方式的数据填充方法如下: -1:声明一个空的tensor,此时没有为其分配内存,所以,我们需要手动的为其分配内存。 - - //parama shape - mytensor.re_alloc(Shape shape); - - //Get writable pointer to mytensor. - //parama index (int): where you start to write. - //Dtype is your data type such int, float or double. - Dtype *p = mytensor.mutable_data(index/*=0*/); - //write data to mytensor - for(int i = 0; i < mytensor.size(); i++){ - p[i] = 1.0f; - } - //do something ... - -2: 这种声明方式会自动分配内存 - - //Get writable pointer to mytensor. - //parama index (int): where you start to write. - //Dtype is your data type such int, float or double. - Dtype *p = mytensor1.mutable_data(index/*=0*/); - //write data to mytensor - for(int i = 0; i < mytensor.size(); i++){ - p[i] = 1.0f; - } - //do something ... - - -3:在该种声明方式中,我们仍不需要手动为其分配内存。但在构造函数内部是否为其分配内存,得依情况而定。如果data_ptr和申明的 -tensor都在都一个目标平台上,那么该tensor就会与data_ptr共享内存空间,相反,如果他们不在同一个平台上(如data_ptr在X86上,而 -tensor在GPU上),那么此时tensor就会开辟一个新的内存空间,并将data_ptr所指向的数据拷贝到tensor的buffer中。 - - //Get writable pointer to mytensor. - //parama index (int): where you start to write. - //Dtype is your data type such int, float or double. 
- Dtype *p = mytensor.mutable_data(index/*=0*/); - //write data to mytensor - for(int i = 0; i < mytensor.size(); i++){ - p[i] = 1.0f; - } - //do something ... +- 声明一个空的tensor,此时没有为其分配内存,所以,我们需要手动的为其分配内存。 -4:该种方式仍不需要手动分配内存 +```c++ + + //parama shape + mytensor.re_alloc(Shape shape); - //Get writable pointer to mytensor. - //parama index (int): where you start to write. - //Dtype is your data type such int, float or double. - Dtype *p = mytensor.mutable_data(index/*=0*/); - //write data to mytensor - for(int i = 0; i < mytensor.size(); i++){ + //Get writable pointer to mytensor. + //parama index (int): where you start to write. + //Dtype is your data type such int, float or double. + Dtype *p = mytensor.mutable_data(index/*=0*/); + //write data to mytensor + for(int i = 0; i < mytensor.size(); i++){ p[i] = 1.0f; - } - //do something ... + } + //do something ... +``` + +- 这种声明方式会自动分配内存 + +```c++ + //Get writable pointer to mytensor. + //parama index (int): where you start to write. + //Dtype is your data type such int, float or double. + Dtype *p = mytensor1.mutable_data(index/*=0*/); + //write data to mytensor + for(int i = 0; i < mytensor.size(); i++){ + p[i] = 1.0f; + } + //do something ... +``` + +- 在该种声明方式中,我们仍不需要手动为其分配内存。但在构造函数内部是否为其分配内存,得依情况而定。如果data_ptr和申明的 + tensor都在都一个目标平台上,那么该tensor就会与data_ptr共享内存空间,相反,如果他们不在同一个平台上(如data_ptr在X86上,而 + tensor在GPU上),那么此时tensor就会开辟一个新的内存空间,并将data_ptr所指向的数据拷贝到tensor的buffer中。 + +```c++ + //Get writable pointer to mytensor. + //parama index (int): where you start to write. + //Dtype is your data type such int, float or double. + Dtype *p = mytensor.mutable_data(index/*=0*/); + //write data to mytensor + for(int i = 0; i < mytensor.size(); i++){ + p[i] = 1.0f; + } + //do something ... +``` + +- 该种方式仍不需要手动分配内存 + +```c++ + //Get writable pointer to mytensor. + //parama index (int): where you start to write. + //Dtype is your data type such int, float or double. + Dtype *p = mytensor.mutable_data(index/*=0*/); + //write data to mytensor + for(int i = 0; i < mytensor.size(); i++){ + p[i] = 1.0f; + } + //do something ... +``` +- 另外,你还可以获取一个tensor的可读指针,示例如下: -另外,你还可以获取一个tensor的可读指针,示例如下: +```c++ //Get read-only pointer to mytensor. //parama index (int): where you start to read. //Dtype is your data type such int, float or double. - Dtype *p = mytensor.data(index/*=0*/); + Dtype *p = mytensor.data(index/*=0*/); //do something ... ``` @@ -254,77 +268,75 @@ tensor在GPU上),那么此时tensor就会开辟一个新的内存空间, #### 获取tensor的shape ```c++ -//some declarations -// ... -Shape shape = mytensor.shape(); + //some declarations + // ... + Shape shape = mytensor.shape(); -//Get a first dimetion size of tesor, if it has. -int d1 = shape[0]; + //Get a first dimetion size of tesor, if it has. + int d1 = shape[0]; -//Get a second dimention size of tensor, if it has. -int d2 = shape[1]; + //Get a second dimention size of tensor, if it has. + int d2 = shape[1]; -... + ... -//Get a n-th dimention size of tensor, if it has. -int dn = shape[n-1]; + //Get a n-th dimention size of tensor, if it has. + int dn = shape[n-1]; -//Get a tensor's dimention -int dims = mytensor.dims(); + //Get a tensor's dimention + int dims = mytensor.dims(); -//Get the size of tensor. -//size = d1 x d2 x ... x dn. -int size = mytensor.size(); + //Get the size of tensor. + //size = d1 x d2 x ... x dn. + int size = mytensor.size(); -//Get the size of tensor at interval [Di, Dj) -// form i-th dimention to j-th dimention, but not including the j-th dimention. -// which means di x (di+1) x ... 
x (dj -1) -int size = mytensor.count(start, end); + //Get the size of tensor at interval [Di, Dj) + // form i-th dimention to j-th dimention, but not including the j-th dimention. + // which means di x (di+1) x ... x (dj -1) + int size = mytensor.count(start, end); ``` #### 设置tensor的shape 我们可以用tensor的成员函数set_shape来设置tensor的shape。 下面是set_shape的定义 - ```c++ -/** - * \brief set a tensor's shape - * \param valid_shape [a Shape object] - * \param shape [a Shape object] - * \param offset [a Shape object] - * \return the status of this operation, that means whether it success * or not. - */ -SaberStatus set_shape(Shape valid_shape, Shape shape = Shape::zero(TensorAPI::layout_dims::value), Shape offset = Shape::minusone(TensorAPI::layout_dims::value)); + /** + * \brief set a tensor's shape + * \param valid_shape [a Shape object] + * \param shape [a Shape object] + * \param offset [a Shape object] + * \return the status of this operation, that means whether it success * or not. + */ + SaberStatus set_shape(Shape valid_shape, Shape shape = Shape::zero(TensorAPI::layout_dims::value), Shape offset = Shape::minusone(TensorAPI::layout_dims::value)); ``` 这个成员函数只设置tensor的shape。这些shape对象(valid_shape, shape, offset)的[LayOutType](#layout)必须和当前的tensor的相应三个shape对象的LayOutType相同,如果不同就会出错,返回SaberInvalidValue。 如果相同,那么将成功设置tensor的shape。 ```c++ -// some declarations -// ... -//valid_shape, shape , offset are Shape object; -//All these Shape object's LayOutType must be equal to mytensor's. -mytensor.set_shape(valid_shape, shape, offset); + // some declarations + // ... + //valid_shape, shape , offset are Shape object; + //All these Shape object's LayOutType must be equal to mytensor's. + mytensor.set_shape(valid_shape, shape, offset); ``` #### 重置 tensor的shape ```c++ -//some declarations -Shape shape, valid_shape, offset; + //some declarations + Shape shape, valid_shape, offset; -//do some initializations -... -mytensor.reshape(valid_shape, shape, offset); + //do some initializations + ... + mytensor.reshape(valid_shape, shape, offset); ``` 注意: Reshape操作仍然需要shape的[LayOutType](#layout) 与tensor的相同 - ### Graph ### `Graph`类负责加载Anakin模型生成计算图、对图进行优化、存储模型等操作。 @@ -335,62 +347,61 @@ mytensor.reshape(valid_shape, shape, offset); ```c++ -template -class Graph ... /* inherit other class*/{ - - //some implements - ... + template + class Graph ... /* inherit other class*/{ -}; + //some implements + ... + + }; ``` 前面已经介绍过[TargetType](#target)和[DataType](#datatype)是Anakin内部自定义数据类型。[TargetType](#target)表示平台类型 (如NV、X86), [DataType](#datatype)是Anakin基本数据类型与C++/C中的基本数据类型相对应。 [Precision](#precision)为op所支持的精度类型, 稍后我们在介绍它。 - ```c++ -//Create a empty graph object. -Graph graph = Graph tmp(); + //Create a empty graph object. + Graph graph = Graph tmp(); -//Create a pointer to a empty graph. -Graph *graph = new Graph(); + //Create a pointer to a empty graph. + Graph *graph = new Graph(); -//Create a pointer to a empty graph. -auto graph = new Graph(); + //Create a pointer to a empty graph. + auto graph = new Graph(); ``` #### 加载 Anakin 模型 ```c++ -//some declarations -... -auto graph = new Graph(); -std::string model_path = "the/path/to/where/your/models/are"; -const char *model_path1 = "the/path/to/where/your/models/are"; - -//Loading Anakin model to generate a compute graph. -auto status = graph->load(model_path); - -//Or this way. -auto status = graph->load(model_path1); -//Check whether load operation success. -if(!status){ - std::cout << "error" << endl; - //do something... -} + //some declarations + ... 
+ auto graph = new Graph(); + std::string model_path = "the/path/to/where/your/models/are"; + const char *model_path1 = "the/path/to/where/your/models/are"; + + //Loading Anakin model to generate a compute graph. + auto status = graph->load(model_path); + + //Or this way. + auto status = graph->load(model_path1); + //Check whether load operation success. + if(!status){ + std::cout << "error" << endl; + //do something... + } ``` #### 优化计算图 ```c++ -//some declarations -... -//Load graph. -... -//According to the ops of loaded graph, optimize compute graph. -graph->Optimize(); + //some declarations + ... + //Load graph. + ... + //According to the ops of loaded graph, optimize compute graph. + graph->Optimize(); ``` @@ -400,34 +411,33 @@ graph->Optimize(); 你可以在任何时候保存模型, 特别的, 你可以保存一个优化的模型,这样,下次再加载模型时,就不必进行优化操作。 - ```c++ -//some declarations -... -//Load graph. -... -// save a model -//save_model_path: the path to where your model is. -auto status = graph->save(save_model_path); - -//Checking -if(!status){ - cout << "error" << endl; - //do somethin... -} + //some declarations + ... + //Load graph. + ... + // save a model + //save_model_path: the path to where your model is. + auto status = graph->save(save_model_path); + + //Checking + if(!status){ + cout << "error" << endl; + //do somethin... + } ``` #### 重新设置计算图里的tensor的shape ```c++ -//some declarations -... -//Load graph. -... -vector shape{10, 256, 256, 10}; -//input_name : std::string. -//Reshape a tensor named input_name. -graph->Reshape(input_name, shape);//Note: shape is a vector, not a Shape object. + //some declarations + ... + //Load graph. + ... + vector shape{10, 256, 256, 10}; + //input_name : std::string. + //Reshape a tensor named input_name. + graph->Reshape(input_name, shape);//Note: shape is a vector, not a Shape object. ``` #### 设置 batch size @@ -435,14 +445,14 @@ graph->Reshape(input_name, shape);//Note: shape is a vector, not a Shape object. `Graph` 支持重新设置batch size的大小。 ```c++ -//some declarations -... -//Load graph. -... -//input_name : std::string. -//Reset a tensor named input_name. -int new_batch_size = 4; -graph->ResetBatchSize(input_name, new_batch_size); + //some declarations + ... + //Load graph. + ... + //input_name : std::string. + //Reset a tensor named input_name. + int new_batch_size = 4; + graph->ResetBatchSize(input_name, new_batch_size); ``` ### Net ### @@ -451,189 +461,185 @@ graph->ResetBatchSize(input_name, new_batch_size); `Net` 是计算图的执行器。你可以通过Net对象获得输入和输出 #### Creating a graph executor -`Net`接受四个模板参数。 +`Net`接受四个模板参数。 ```c++ -template -class Net{ - //some implements - ... + template + class Net{ + //some implements + ... -}; + }; ``` 由于有些Op可能支持多种精度,我们可以通过Precision来指定。OpRunType表示同步或异步类型,异步是默认类型。OpRunType::SYNC表示同步,在GPU上只有单个流;OpRunType::ASYNC表示异步,在GPU上有多个流并以异步方式执行。实际上,Precision和OpRunType都是enum class, 详细设计请参考*source_root/framework/core/types.h*. 1. Precision -Precision | Op support -:---: | :---: -Precision::INT4 | NO -Precision::INT8 | NO -Precision::FP16 | NO -Precision::FP32 | YES -Precision::FP64 | NO + Precision | Op support + :---: | :---: + Precision::INT4 | NO + Precision::INT8 | NO + Precision::FP16 | NO + Precision::FP32 | YES + Precision::FP64 | NO 现在Op的精度只支持FP32, 但在将来我们会支持剩下的Precision. +2. OpRunType + OpRunType | Sync/Aync |Description + :---: | :---: | :---: + OpRunType::SYNC | Synchronization | single-stream on GPU + OpRunType::ASYNC | Asynchronization | multi-stream on GPU -2. 
OpRunType - -OpRunType | Sync/Aync |Description -:---: | :---: | :---: -OpRunType::SYNC | Synchronization | single-stream on GPU -OpRunType::ASYNC | Asynchronization | multi-stream on GPU +用graph对象创建一个执行器 -用graph对象创建一个执行器。 ```c++ -//some declarations -... -//Create a pointer to a graph. -auto graph = new Graph(); -//do something... -... + //some declarations + ... + //Create a pointer to a graph. + auto graph = new Graph(); + //do something... + ... -//create a executor -Net executor(*graph); + //create a executor + Net executor(*graph); ``` #### 获取输入输出tensor - -获取输入输出tensor,并填充输入tensor的buffer。如果想要获取输入和输出tensor,那么必须指定输入的名字,如"input_0", "input_1", "input_2", ..., 必须传入如上字符串才能够获得输入tensor。另外,如果想知道input_i对应哪个输入,你需要去dash board查看,如何使用dash board请看[Anakin Parser](Converter_ch.md)。请看如下示例代码 +获取输入输出tensor,并填充输入tensor的buffer。如果想要获取输入和输出tensor,那么必须指定输入的名字,如"input_0", "input_1", "input_2", ..., 必须传入如上字符串才能够获得输入tensor。另外,如果想知道input_i对应哪个输入,你需要去dash board查看,如何使用dash board请看[Anakin Parser](./convert_paddle_to_anakin.html)。请看如下示例代码 ```c++ -//some declaratinos -... - -//create a executor -//TargetType is NV [NVIDIA GPU] -Net executor(*graph); - -//Get the first input tensor. -//The following tensors(tensor_in0, tensor_in2 ...) are resident at GPU. -//Note: Member function get_in returns an pointer to tensor. -Tensor* tensor_in0 = executor.get_in("input_0"); - -//If you have multiple input tensors -//You just type this code below. -Tensor* tensor_in1 = executor.get_in("input_1"); -... -auto tensor_inn = executor.get_in("input_n"); + //some declaratinos + ... + + //create a executor + //TargetType is NV [NVIDIA GPU] + Net executor(*graph); + + //Get the first input tensor. + //The following tensors(tensor_in0, tensor_in2 ...) are resident at GPU. + //Note: Member function get_in returns an pointer to tensor. + Tensor* tensor_in0 = executor.get_in("input_0"); + + //If you have multiple input tensors + //You just type this code below. + Tensor* tensor_in1 = executor.get_in("input_1"); + ... + auto tensor_inn = executor.get_in("input_n"); ``` 当得到输入tensor之后,就可以填充它的数据区了。 ```c++ -//This tensor is resident at GPU. -auto tensor_d_in = executor.get_in("input_0"); - -//If we want to feed above tensor, we must feed the tensor which is resident at host. And then copy the host tensor to the device's one. - -//using Tensor4d = Tensor; -Tensor4d tensor_h_in; //host tensor; -//Tensor tensor_h_in; - -//Allocate memory for host tensor. -tensor_h_in.re_alloc(tensor_d_in->valid_shape()); -//Get a writable pointer to tensor. -float *h_data = tensor_h_in.mutable_data(); - -//Feed your tensor. -/** example -for(int i = 0; i < tensor_h_in.size(); i++){ - h_data[i] = 1.0f; -} -*/ -//Copy host tensor's data to device tensor. -tensor_d_in->copy_from(tensor_h_in); - -// And then + //This tensor is resident at GPU. + auto tensor_d_in = executor.get_in("input_0"); + + //If we want to feed above tensor, we must feed the tensor which is resident at host. And then copy the host tensor to the device's one. + + //using Tensor4d = Tensor; + Tensor4d tensor_h_in; //host tensor; + //Tensor tensor_h_in; + + //Allocate memory for host tensor. + tensor_h_in.re_alloc(tensor_d_in->valid_shape()); + //Get a writable pointer to tensor. + float *h_data = tensor_h_in.mutable_data(); + + //Feed your tensor. + /** example + for(int i = 0; i < tensor_h_in.size(); i++){ + h_data[i] = 1.0f; + } + */ + //Copy host tensor's data to device tensor. 
+ tensor_d_in->copy_from(tensor_h_in); + + // And then ``` +类似的,我们可以利用成员函数get_out来获得输出tensor。但与获得输入tensor不同的是, 我们需要指定输入tensor结点的名字,这个可以从dash board中看到,请从[Anakin Parser](./convert_paddle_to_anakin.html)中查看dash board的使用方法。假如有个输出结点叫pred_out, 那么我们可以通过如下代码获得相应的输出tensor: -类似的,我们可以利用成员函数get_out来获得输出tensor。但与获得输入tensor不同的是, 我们需要指定输入tensor结点的名字,这个可以从dash board中看到,请从[Anakin Parser](Converter_ch.md)中查看dash board的使用方法。假如有个输出结点叫pred_out, 那么我们可以通过如下代码获得相应的输出tensor: ```c++ -//Note: this tensor are resident at GPU. -Tensor* tensor_out_d = executor.get_out("pred_out"); + //Note: this tensor are resident at GPU. + Tensor* tensor_out_d = executor.get_out("pred_out"); ``` - #### Executing graph - 当一切准备就绪后,我们就可以执行真正的计算了! ```c++ -executor.prediction(); + executor.prediction(); ``` - + ## 示例代码 ## 下面的例子展示了如何调用Anakin。 -在这儿之前, 请确保你已经有了Anakin模型。如果还没有,那么请使用[Anakin Parser](Converter_ch.md)转换你的模型。 +在这儿之前, 请确保你已经有了Anakin模型。如果还没有,那么请使用[Anakin Parser](./convert_paddle_to_anakin.html)转换你的模型。 ### Single-thread -单线程例子在 *source_root/test/framework/net/net_exec_test.cpp`* +单线程例子在 *`source_root/test/framework/net/net_exec_test.cpp`* ```c++ -std::string model_path = "your_Anakin_models/xxxxx.anakin.bin"; -// Create an empty graph object. -auto graph = new Graph(); -// Load Anakin model. -auto status = graph->load(model_path); -if(!status ) { - LOG(FATAL) << " [ERROR] " << status.info(); -} -// Reshape -graph->Reshape("input_0", {10, 384, 960, 10}); -// You must optimize graph for the first time. -graph->Optimize(); -// Create a executer. -Net net_executer(*graph); - -//Get your input tensors through some specific string such as "input_0", "input_1", and -//so on. -//And then, feed the input tensor. -//If you don't know Which input do these specific string ("input_0", "input_1") correspond with, you can launch dash board to find out. -auto d_tensor_in_p = net_executer.get_in("input_0"); -Tensor4d h_tensor_in; -auto valid_shape_in = d_tensor_in_p->valid_shape(); -for (int i=0; icopy_from(h_tensor_in); - -//Do inference. -net_executer.prediction(); - -//Get result tensor through the name of output node. -//And also, you need to see the dash board again to find out how many output nodes are and remember their name. - -//For example, you've got a output node named obj_pre_out -//Then, you can get an output tensor. -auto d_tensor_out_0_p = net_executer.get_out("obj_pred_out"); //get_out returns a pointer to output tensor. -auto d_tensor_out_1_p = net_executer.get_out("lc_pred_out"); //get_out returns a pointer to output tensor. -//...... -// do something else ... -//... -//save model. -//You might not optimize the graph when you load the saved model again. -std::string save_model_path = model_path + std::string(".saved"); -auto status = graph->save(save_model_path); -if (!status ) { - LOG(FATAL) << " [ERROR] " << status.info(); -} + std::string model_path = "your_Anakin_models/xxxxx.anakin.bin"; + // Create an empty graph object. + auto graph = new Graph(); + // Load Anakin model. + auto status = graph->load(model_path); + if(!status ) { + LOG(FATAL) << " [ERROR] " << status.info(); + } + // Reshape + graph->Reshape("input_0", {10, 384, 960, 10}); + // You must optimize graph for the first time. + graph->Optimize(); + // Create a executer. + Net net_executer(*graph); + + //Get your input tensors through some specific string such as "input_0", "input_1", and + //so on. + //And then, feed the input tensor. 
+ //If you don't know Which input do these specific string ("input_0", "input_1") correspond with, you can launch dash board to find out. + auto d_tensor_in_p = net_executer.get_in("input_0"); + Tensor4d h_tensor_in; + auto valid_shape_in = d_tensor_in_p->valid_shape(); + for (int i=0; icopy_from(h_tensor_in); + + //Do inference. + net_executer.prediction(); + + //Get result tensor through the name of output node. + //And also, you need to see the dash board again to find out how many output nodes are and remember their name. + + //For example, you've got a output node named obj_pre_out + //Then, you can get an output tensor. + auto d_tensor_out_0_p = net_executer.get_out("obj_pred_out"); //get_out returns a pointer to output tensor. + auto d_tensor_out_1_p = net_executer.get_out("lc_pred_out"); //get_out returns a pointer to output tensor. + //...... + // do something else ... + //... + //save model. + //You might not optimize the graph when you load the saved model again. + std::string save_model_path = model_path + std::string(".saved"); + auto status = graph->save(save_model_path); + if (!status ) { + LOG(FATAL) << " [ERROR] " << status.info(); + } ``` diff --git a/doc/fluid/advanced_usage/deploy/convert_paddle_to_anakin.md b/doc/fluid/advanced_usage/deploy/convert_paddle_to_anakin.md index 56ca582b2..8a3587540 100644 --- a/doc/fluid/advanced_usage/deploy/convert_paddle_to_anakin.md +++ b/doc/fluid/advanced_usage/deploy/convert_paddle_to_anakin.md @@ -1,14 +1,14 @@ # 模型转换指南 -Anakin 支持不同框架的模型预测。但由于格式的差别,Anakin 需要您预先转换模型。本文档介绍如何转换模型。 +Anakin 支持不同框架的模型预测。但由于格式的差别,Anakin 需要您预先转换模型, 本文档介绍如何转换模型。 ## 简介 -Anakin 模型转换器输入支持 Caffe 和 Fluid 两种格式的预测模型,模型包含网络结构(model 或 prototxt)和权重参数(param 或 caffemodel)。 +Anakin 模型转换器输入支持 Caffe 和 Paddle 两种格式的预测模型,模型包含网络结构(model 或 prototxt)和权重参数(param 或 caffemodel)。 -模型转换的输出是一个 bin 文件,它作为 Anakin 框架的 graph 参数导入。 +模型转换的输出是一个 bin 文件,它作为 Anakin 框架的 graph 参数导入。 -您还可以使用模型转换器的 launch board 功能生成网络结构的 HTML 预览。 +您还可以使用模型转换器的 launch board 功能生成网络结构的 HTML 预览。 ## 系统要求 @@ -22,7 +22,7 @@ Anakin 模型转换器输入支持 Caffe 和 Fluid 两种格式的预测模型 ## 用法 ### 1、环境 -转换器所需的依赖标注于 *系统要求* 一节。 +转换器所需的依赖标注于*系统要求*一节。 ### 2、配置 您需要对 *config.yaml* 文件进行修改以告知您的需求。工程中给出了 *config.yaml* 示例,下面作进一步说明。 @@ -30,7 +30,7 @@ Anakin 模型转换器输入支持 Caffe 和 Fluid 两种格式的预测模型 #### config.yaml ```bash OPTIONS: - Framework: CAFFE # 依框架类型填写 CAFFE 或 FLUID + Framework: CAFFE # 依框架类型填写 CAFFE 或 Paddle SavePath: ./output # 转换结束后模型的保存位置 ResultName: googlenet # 输出模型的名字 Config: @@ -53,13 +53,13 @@ TARGET: PrototxtPath: /path/to/your/googlenet.prototxt ModelPath: /path/to/your/googlenet.caffemodel - FLUID: - # 当 Framework 为 FLUID 时需填写 + Paddle: + # 当 Framework 为 Paddle 时需填写 Debug: NULL ProtoPaths: - / - PrototxtPath: /path/to/fluid/inference_model - ModelPath: /path/to/fluid/inference_model + PrototxtPath: /path/to/paddle/inference_model + ModelPath: /path/to/paddle/inference_model # ... 
``` @@ -68,6 +68,6 @@ TARGET: ### 4、预览 -最后一步,就是在浏览器中查看令人振奋的转换结果!网址是在 *config.yaml* 中配置的,例如 http://0.0.0.0:8888 。 +最后一步,就是在浏览器中查看转换结果!网址是在 *config.yaml* 中配置的,例如 http://0.0.0.0:8888 。 > 注意:若您使用了默认的 IP 地址 0.0.0.0,请在预览时使用真实的服务器地址 real_ip:port 替代它。 diff --git a/doc/fluid/advanced_usage/deploy/how_to_support_new_device_in_anakin.md b/doc/fluid/advanced_usage/deploy/how_to_support_new_device_in_anakin.md index a1f75f5e9..da2c64cf4 100644 --- a/doc/fluid/advanced_usage/deploy/how_to_support_new_device_in_anakin.md +++ b/doc/fluid/advanced_usage/deploy/how_to_support_new_device_in_anakin.md @@ -52,7 +52,7 @@ endif() #cmakedefine USE_TNEW_PLACE ``` -* 其他依赖和编译选项 +* 其他依赖和编译选项 修改`cmake`目录下的`compiler_options.cmake`和`find_modules.cmake` @@ -231,7 +231,7 @@ struct TargetWrapper { //根据TNEW的具体类型修改__xx 4. 在`impl/`目录下添加设备目录和实现 在`saber/core/impl`目录下添加设备目录`tnew`。 -* 实现`TargetWrapper`结构体中各函数的定义。 +* 实现`TargetWrapper`结构体中各函数的定义。 如果`TargetWrapper`的实现与默认的模板类一致,则不用特化出该类。 ```c++ @@ -243,11 +243,11 @@ void TNEW_API::get_device_count(int &count) { void TNEW_API::set_device(int id){ // add implementation } - + void TNEW_API::mem_alloc(void** ptr, size_t n){ // add implementation } - + void TNEW_API::mem_free(void* ptr){ if(ptr != nullptr){ // add implementation @@ -275,7 +275,7 @@ void Device::get_info() { ### 在`saber/funcs`中实现设备相关的op -参考[如何增加新的Operator](addCustomOp.md) +参考[如何增加新的Operator](./how_to_add_anakin_op.html) ## 在`framework`中添加设备的具体化或实例化 ## @@ -329,7 +329,7 @@ public: typedef Tensor4d::type> type; PBlock() { - _inner_tensor = std::make_shared(); + _inner_tensor = std::make_shared(); } ... } @@ -348,7 +348,7 @@ struct target_host { ### `framework/graph` * `graph.cpp`中添加实例化 - + ```c++ #ifdef USE_TNEW_PLACE template class Graph; @@ -360,7 +360,7 @@ struct target_host { ### `framework/model_parser` * `parser.cpp`中添加实例化 - + ```c++ #ifdef USE_TNEW_PLACE template @@ -372,7 +372,7 @@ struct target_host { template Status load(graph::Graph* graph, const char* model_path); - + template Status save(graph::Graph* graph, std::string& model_path); @@ -382,7 +382,7 @@ struct target_host { template Status save(graph::Graph* graph, std::string& model_path); - + template Status load(graph::Graph* graph, std::string& model_path); @@ -392,7 +392,7 @@ struct target_host { template Status load(graph::Graph* graph, std::string& model_path); - + template Status save(graph::Graph* graph, const char* model_path); diff --git a/doc/fluid/advanced_usage/deploy/index_anakin.rst b/doc/fluid/advanced_usage/deploy/index_anakin.rst index b561a577d..32d26156a 100644 --- a/doc/fluid/advanced_usage/deploy/index_anakin.rst +++ b/doc/fluid/advanced_usage/deploy/index_anakin.rst @@ -10,12 +10,13 @@ Anakin 预测引擎 install_anakin.md convert_paddle_to_anakin.md - run_anakin_on_arm.md anakin_tutorial.md + anakin_run_on_arm.md anakin_example.md anakin_gpu_benchmark.md anakin_arm_benchmark.md + 开发文档 ~~~~~~~ @@ -24,3 +25,4 @@ Anakin 预测引擎 how_to_add_anakin_op.md how_to_support_new_device_in_anakin.md + anakin_parser_design.md diff --git a/doc/fluid/advanced_usage/deploy/install_anakin.md b/doc/fluid/advanced_usage/deploy/install_anakin.md index bb7c19503..0b44a6be3 100644 --- a/doc/fluid/advanced_usage/deploy/install_anakin.md +++ b/doc/fluid/advanced_usage/deploy/install_anakin.md @@ -1,4 +1,4 @@ -## 从源码编译安装Anakin ## +## 源码编译安装Anakin ## 我们已经在CentOS 7.3上成功的安装和测试了Anakin,对于其他操作系统,我们将很快支持。 @@ -6,7 +6,7 @@ * [在CentOS上安装 Anakin]() * [在Ubuntu上安装 Anakin]() -* [在ARM上安装 Anakin](run_on_arm_ch.md) +* [在ARM上安装 Anakin](./anakin_run_on_arm.html) * [验证安装]() @@ -17,7 +17,6 @@ * 
cmake 2.8.12+ * gcc 4.8.2+ * g++ 4.8.2+ -* 其他需要补充的。。。 #### 2. 编译CPU版Anakin #### @@ -26,30 +25,37 @@ #### 3. 编译支持NVIDIA GPU的Anakin #### - 3.1. 安装依赖 - - 3.1.1 protobuf - >$ git clone https://github.com/google/protobuf - >$ cd protobuf - >$ git submodule update --init --recursive - >$ ./autogen.sh - >$ ./configure --prefix=/path/to/your/insall_dir - >$ make - >$ make check - >$ make install - >$ sudo ldconfig + - 3.1.1 protobuf - 如安装protobuf遇到任何问题,请访问[这里](https://github.com/google/protobuf/blob/master/src/README.md) + ``` + > git clone https://github.com/google/protobuf + > cd protobuf + > git submodule update --init --recursive + > ./autogen.sh + > ./configure --prefix=/path/to/your/insall_dir + > make + > make check + > make install + > sudo ldconfig + ``` + + 如安装protobuf遇到任何问题,请访问[这里](https://github.com/google/protobuf/blob/master/src/README.md) - 3.2 CUDA Toolkit - - [CUDA 8.0](https://developer.nvidia.com/cuda-zone) or higher. 具体信息参见[NVIDIA's documentation](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/). - - [cuDNN v7](https://developer.nvidia.com/cudnn). 具体信息参见[NVIDIA's documentation](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/). + + - [CUDA 8.0](https://developer.nvidia.com/cuda-zone) or higher, 具体信息参见[NVIDIA's documentation](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/). + - [cuDNN v7](https://developer.nvidia.com/cudnn), 具体信息参见[NVIDIA's documentation](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/). + - 3.3 编译Anakin - >$ git clone https:/xxxxx - >$ cd anakin - >$ mkdir build - >$ camke .. - >$ make + ``` + > git clone https:/xxxxx + > cd anakin + > mkdir build + > camke .. + > make + ``` #### 4. 编译支持AMD GPU的Anakin #### @@ -63,7 +69,8 @@ ### 在ARM上安装 Anakin ### -暂时还不支持 +请参考[ARM安装文档](./anakin_run_on_arm.html) ### 验证安装 ### -we are coming soon... + +安装完成后,如果没有报错信息,你可以通过运行 `output/unit_test`路径下的单测示例验证是否编译成功。 diff --git a/doc/fluid/advanced_usage/deploy/run_anakin_on_arm.md b/doc/fluid/advanced_usage/deploy/run_anakin_on_arm.md index ebeb38f53..f61beca7e 100644 --- a/doc/fluid/advanced_usage/deploy/run_anakin_on_arm.md +++ b/doc/fluid/advanced_usage/deploy/run_anakin_on_arm.md @@ -1,4 +1,4 @@ -## 源码编译 Anakin ## +## ARM 源码编译 Anakin ## 目前Anakin支持ARM Android平台,采用Android NDK交叉编译工具链,已在mac os和centos上编译和测试通过。 @@ -12,37 +12,44 @@ ### 1. 系统需求 ### -* 宿主机: linux, mac -* cmake 3.8.2+ +* 宿主机: linux, mac +* cmake 3.8.2+ * Android NDK r14, Linux 版本[从这里下载](https://dl.google.com/android/repository/android-ndk-r14b-linux-x86_64.zip) ### 2. 
安装第三方依赖 ### -- 2.1 protobuf3.4.0 - 源码从这里[下载](https://github.com/google/protobuf/releases/tag/v3.4.0) - - 2.1.1 为宿主机编译protobuf - ```bash - $ tar -xzf protobuf-3.4.0.tar.gz - $ cd protobuf-3.4.0 - $ ./autogen.sh - $ ./configure - $ make - $ make check +- 2.1 protobuf3.4.0 + + 源码从这里[下载](https://github.com/google/protobuf/releases/tag/v3.4.0) + + - 2.1.1 为宿主机编译protobuf + +```bash + $ tar -xzf protobuf-3.4.0.tar.gz + $ cd protobuf-3.4.0 + $ ./autogen.sh + $ ./configure + $ make + $ make check $ make install - ``` - 上述 $make install 执行后,可在 /usr/local/include/google 找到 libprotobuf 所需的头文件,将整个google文件夹拷贝至Anakin/third-party/arm-android/protobuf/下, - 如有问题,请点[这里](https://github.com/google/protobuf/blob/v3.4.0/src/README.md)。 - 然后将已经生成文件清除。 - ```bash +``` + +上述 $make install 执行后,可在 /usr/local/include/google 找到 libprotobuf 所需的头文件,将整个google文件夹拷贝至Anakin/third-party/arm-android/protobuf/下 + +如有问题,请点[这里](https://github.com/google/protobuf/blob/v3.4.0/src/README.md),然后将已经生成文件清除。 + +```bash $ make distclean - ``` - - 2.1.1 交叉编译Android`armeabi-v7a`的protobuf,注意设置ANDROID_NDK的路径,以及ARCH_ABI、HOSTOSN的值, +``` + + - 2.1.1 交叉编译Android`armeabi-v7a`的protobuf,注意设置ANDROID_NDK的路径,以及ARCH_ABI、HOSTOSN的值 + ```bash - $ export ANDROID_NDK=your_ndk_path + $ export ANDROID_NDK=your_ndk_path $ ARCH_ABI="arm-linux-androideabi-4.9" $ HOSTOSN="darwin-x86_64" - $ export SYSROOT=$ANDROID_NDK/platforms/android-9/arch-arm + $ export SYSROOT=$ANDROID_NDK/platforms/android-9/arch-arm $ export PREBUILT=$ANDROID_NDK/toolchains/$ARCH_ABI $ export LDFLAGS="--sysroot=$SYSROOT" $ export LD="$ANDROID_NDK/toolchains/$ARCH_ABI/prebuilt/$HOSTOSN/arm-linux-androideabi/bin/ld $LDFLAGS" @@ -53,34 +60,38 @@ $ export CCFLAGS="$CXXFLAGS" $ export CXX="$PREBUILT/prebuilt/$HOSTOSN/bin/arm-linux-androideabi-g++ $CXXFLAGS" $ export CC="$CXX" - $ export RANLIB="$ANDROID_NDK/toolchains/$ARCH_ABI/prebuilt/$HOSTOSN/bin/arm-linux-androideabi-ranlib" - $ ./autogen.sh - $ ./configure --host=arm-linux-androideabi --with-sysroot=$SYSROOT --enable-cross-compile --with-protoc=protoc --disable-shared CXX="$CXX" CC="$CC" LD="$LD" + $ export RANLIB="$ANDROID_NDK/toolchains/$ARCH_ABI/prebuilt/$HOSTOSN/bin/arm-linux-androideabi-ranlib" + $ ./autogen.sh + $ ./configure --host=arm-linux-androideabi --with-sysroot=$SYSROOT --enable-cross-compile --with-protoc=protoc --disable-shared CXX="$CXX" CC="$CC" LD="$LD" $ make - ``` - - 编译生成 *.a 静态库,若希望编译*.so 动态链接库 ,请在./configure参数中改--disable-shared为--disable-static --enable-shared。 - 生成文件在src/.libs/下,将生成的文件拷贝至Anakin/third-party/arm-android/protobuf/lib下。 - 在[cmake](../../cmake/find_modules.cmake)中更新`ARM_RPOTO_ROOT`的路径。 - ```cmake +``` + +编译生成 *.a 静态库,若希望编译*.so 动态链接库 ,请在./configure参数中改--disable-shared为--disable-static --enable-shared。 +生成文件在src/.libs/下,将生成的文件拷贝至Anakin/third-party/arm-android/protobuf/lib下。 +在[cmake](../../cmake/find_modules.cmake)中更新`ARM_RPOTO_ROOT`的路径。 + +```cmake set(ARM_RPOTO_ROOT "${CMAKE_SOURCE_DIR}/third-party/arm-android/protobuf") - ``` - -- 2.2 opencv 2.4.3+(optional) - Anakin只在examples示例中使用opencv - Android系统的opencv从[这里下载](https://opencv.org/releases.html) - 解压后将 `3rdparty/libs/armeabi-v7a`中的库文件拷贝到`libs/armeabi-v7a` - 在[cmake](../../cmake/find_modules.cmake)中搜索`anakin_find_opencv`, - 并设置 `include_directories` 和 `LINK_DIRECTORIES`为自己安装的库的路径。 - ```cmake +``` + +- 2.2 opencv 2.4.3+(optional) + + Anakin只在examples示例中使用opencv + Android系统的opencv从[这里下载](https://opencv.org/releases.html) + 解压后将 `3rdparty/libs/armeabi-v7a`中的库文件拷贝到`libs/armeabi-v7a` + 在[cmake](../../cmake/find_modules.cmake)中搜索`anakin_find_opencv`, + 并设置 
`include_directories` 和 `LINK_DIRECTORIES`为自己安装的库的路径。 + + ```cmake include_directories(${CMAKE_SOURCE_DIR}/third-party/arm-android/opencv/sdk/native/jni/include/) LINK_DIRECTORIES(${CMAKE_SOURCE_DIR}/third-party/arm-android/opencv/sdk/native/libs/armeabi-v7a/) - ``` + ``` ### 3. Anakin源码编译 ### #### 编译Android版本 - 克隆[源码](https://github.com/PaddlePaddle/Anakin/tree/arm) + 克隆[源码](https://github.com/PaddlePaddle/Anakin/tree/arm) + ```bash cd your_dir git clone https://github.com/PaddlePaddle/Anakin.git @@ -88,64 +99,87 @@ git fetch origin arm git checkout arm ``` - 修改`android_build.sh` -- 修改NDK路径 + + 修改`android_build.sh` + +- 修改NDK路径 + ```bash #modify "your_ndk_path" to your NDK path export ANDROID_NDK=your_ndk_path ``` -- 修改ARM 处理器架构 - 对于32位ARM处理器, 将ANDROID_ABI 设置为 `armeabi-v7a with NEON`, - 对于64位ARM处理器, 可以将ANDROID_ABI 设置为 `armeabi-v7a with NEON`或者`arm64-v8a`。 - 目前我们只支持 `armeabi-v7a with NEON`;`arm64-v8a` 还在开发中。 + +- 修改ARM 处理器架构 + + 对于32位ARM处理器, 将ANDROID_ABI 设置为 `armeabi-v7a with NEON`, + 对于64位ARM处理器, 可以将ANDROID_ABI 设置为 `armeabi-v7a with NEON`或者`arm64-v8a`。 + 目前我们只支持 `armeabi-v7a with NEON`;`arm64-v8a` 还在开发中。 + ```bash -DANDROID_ABI="armeabi-v7a with NEON" ``` -- 设置Android API - 根据Android系统的版本设置API level, 例如API Level 21 -> Android 5.0.1 + +- 设置Android API + + 根据Android系统的版本设置API level, 例如API Level 21 -> Android 5.0.1 ```bash -DANDROID_NATIVE_API_LEVEL=21 ``` -- 选择编译静态库或动态库 - 设置`BUILD_SHARED=NO`编译静态库 - 设置`BUILD_SHARED=YES`编译动态库 +- 选择编译静态库或动态库 + + 设置`BUILD_SHARED=NO`编译静态库 + 设置`BUILD_SHARED=YES`编译动态库 + ```bash -DBUILD_SHARED=NO ``` -- OpenMP多线程支持 - 设置`USE_OPENMP=YES`开启OpenMP多线程 +- OpenMP多线程支持 + + 设置`USE_OPENMP=YES`开启OpenMP多线程 + ```bash -DUSE_OPENMP=YES ``` - -- 编译单测文件 - 设置`BUILD_WITH_UNIT_TEST=YES`将会编译单测文件 - ```bash - -DBUILD_WITH_UNIT_TEST=YES - ``` - -- 编译示例文件 - 设置`BUILD_EXAMPLES=YES`将会编译示例文件 - ```bash - -DBUILD_EXAMPLES=YES - ``` - -- 开启opencv - 如果使用opencv,设置`USE_OPENCV=YES` - ```bash - -DUSE_OPENCV=YES - ``` - -- 开始编译 - 运行脚本 `android_build.sh` 将自动编译Anakin + +- 编译单测文件 + + 设置`BUILD_WITH_UNIT_TEST=YES`将会编译单测文件 + + ```bash + -DBUILD_WITH_UNIT_TEST=YES + ``` + +- 编译示例文件 + + 设置`BUILD_EXAMPLES=YES`将会编译示例文件 + + ```bash + -DBUILD_EXAMPLES=YES + ``` + +- 开启opencv + + 如果使用opencv,设置`USE_OPENCV=YES` + + ```bash + -DUSE_OPENCV=YES + ``` + +- 开始编译 + + 运行脚本 `android_build.sh` 将自动编译Anakin + ```bash ./android_build.sh ``` -### 4. 验证安装 ### - 编译好的库会放在目录`${Anakin_root}/output`下; - 编译好的单测文件会放在`${Anakin_root}/output/unit_test`目录下; - 编译好的示例文件会放在`${Anakin_root}/output/examples`目录下。 - - 对于Android系统,打开设备的调试模式,通过ADB可以访问的目录是`data/local/tmp`,通过ADB push将测试文件、模型和数据发送到设备目录, 运行测试文件。 +### 4. 验证安装 ### + + 编译好的库会放在目录`${Anakin_root}/output`下 + + 编译好的单测文件会放在`${Anakin_root}/output/unit_test`目录下 + + 编译好的示例文件会放在`${Anakin_root}/output/examples`目录下 + + 对于Android系统,打开设备的调试模式,通过ADB可以访问的目录是`data/local/tmp`,通过ADB push将测试文件、模型和数据发送到设备目录,运行测试文件。 -- GitLab
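补充:下面给出一个通过 ADB 推送并运行单测的示意流程,仅作参考草例;其中的单测文件名 `net_exec_test` 与模型文件名 `mobilenet_v1.anakin.bin` 均为假设,请替换为 `${Anakin_root}/output/unit_test` 目录下实际生成的文件和你自己转换好的 Anakin 模型:

```bash
# 假设已按上文完成交叉编译,且设备已开启 USB 调试并能被 adb 识别
adb devices                                        # 确认设备已连接
adb push output/unit_test/net_exec_test /data/local/tmp/
adb push mobilenet_v1.anakin.bin /data/local/tmp/
adb shell chmod +x /data/local/tmp/net_exec_test
# 在设备的 data/local/tmp 目录下运行单测(具体参数以实际单测为准)
adb shell "cd /data/local/tmp && ./net_exec_test"
```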