Commit 5dd99908 authored by liuqi

Replace how_to_build doc with english.

Parent d6f79f62
@@ -24,3 +24,4 @@ pygments_style = 'sphinx'
 html_theme = "sphinx_rtd_theme"
 html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
 html_static_path = ['_static']
+smartquotes = False
How to build
============

Supported Platforms
-------------------

.. list-table::
    :widths: auto
    :header-rows: 1
    :align: left

    * - Platform
      - Explanation
    * - TensorFlow
      - >= 1.6.0. (first choice, convenient for the Android NN API in the future)
    * - Caffe
      - >= 1.0.
Environment Requirement
-------------------------

``mace`` provides a docker image which contains all the required environments; the ``Dockerfile`` is under the ``./docker`` directory. Start it with the following commands:

.. code:: sh

    sudo docker pull cr.d.xiaomi.net/mace/mace-dev
    sudo docker run -it --rm --privileged -v /dev/bus/usb:/dev/bus/usb --net=host -v /local/path:/container/path cr.d.xiaomi.net/mace/mace-dev /bin/bash

If you want to run on your local computer instead, you have to install the following software.
.. list-table::
    :widths: auto
    :header-rows: 1
    :align: left

    * - Software
      - Version
      - Install command
    * - bazel
      - >= 0.13.0
      - `bazel installation <https://docs.bazel.build/versions/master/install.html>`__
    * - android-ndk
      - r15c/r16b
      - reference the docker file
    * - adb
      - >= 1.0.32
      - apt-get install android-tools-adb
    * - tensorflow
      - >= 1.6.0
      - pip install -I tensorflow==1.6.0 (if you use a tensorflow model)
    * - numpy
      - >= 1.14.0
      - pip install -I numpy==1.14.0
    * - scipy
      - >= 1.0.0
      - pip install -I scipy==1.0.0
    * - jinja2
      - >= 2.10
      - pip install -I jinja2==2.10
    * - PyYaml
      - >= 3.12.0
      - pip install -I pyyaml==3.12
    * - sh
      - >= 1.12.14
      - pip install -I sh==1.12.14
    * - filelock
      - >= 3.0.0
      - pip install -I filelock==3.0.0
    * - docker (for caffe)
      - >= 17.09.0-ce
      - `install doc <https://docs.docker.com/install/linux/docker-ce/ubuntu/#set-up-the-repository>`__
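Before building, it can help to verify that the local Python environment matches the table above. The sketch below is purely illustrative: the helper names and the version map are assumptions for this doc, not part of MACE.

```python
# Sanity-check installed Python package versions against the table above.
# MINIMUM_VERSIONS mirrors the table; the module names are the import names.
from importlib import import_module

MINIMUM_VERSIONS = {
    "numpy": "1.14.0",
    "scipy": "1.0.0",
    "jinja2": "2.10",
    "yaml": "3.12",      # PyYaml
    "sh": "1.12.14",
    "filelock": "3.0.0",
}

def version_at_least(installed, required):
    """Compare dotted version strings numerically, e.g. '2.10' >= '2.9'."""
    def parse(v):
        return [int(p) for p in v.split(".") if p.isdigit()]
    return parse(installed) >= parse(required)

def check_environment(minimums):
    """Return a list of (package, problem) for missing or outdated packages."""
    problems = []
    for module_name, required in minimums.items():
        try:
            module = import_module(module_name)
        except ImportError:
            problems.append((module_name, "not installed"))
            continue
        installed = getattr(module, "__version__", "0")
        if not version_at_least(installed, required):
            problems.append(
                (module_name, "found %s, need >= %s" % (installed, required)))
    return problems
```

Running `check_environment(MINIMUM_VERSIONS)` returns an empty list when the environment satisfies the table.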
Docker Images
----------------

* Log in to the `Xiaomi Docker Registry <http://docs.api.xiaomi.net/docker-registry/>`__

.. code:: sh

    docker login cr.d.xiaomi.net

* Build with the ``Dockerfile``

.. code:: sh

    docker build -t cr.d.xiaomi.net/mace/mace-dev

* Pull the image from the docker registry

.. code:: sh

    docker pull cr.d.xiaomi.net/mace/mace-dev

* Create a container

.. code:: sh

    # Set 'host' network to use ADB
    docker run -it --rm -v /local/path:/container/path --net=host cr.d.xiaomi.net/mace/mace-dev /bin/bash
Usage
--------

============================
1. Pull code with latest tag
============================

.. warning::

    Please use the code under the latest tag; do not use the master branch for deployment.

.. code:: sh

    git clone git@v9.git.n.xiaomi.com:deep-computing/mace.git

    # update
    git fetch --all --tags --prune

    # get latest tag version
    tag_name=`git describe --abbrev=0 --tags`

    # checkout to latest tag branch
    git checkout -b ${tag_name} tags/${tag_name}
============================
2. Model Optimization
============================

- TensorFlow

TensorFlow provides an official
`model optimization tool <https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/README.md>`__
to speed up inference. The tool is included in the docker image; you can also download it from
`transform_graph <http://cnbj1-inner-fds.api.xiaomi.net/mace/tool/transform_graph>`__
or compile it from the TensorFlow source code.
The following commands optimize models for CPU/GPU and DSP respectively.

.. code:: sh

    # CPU/GPU:
    ./transform_graph \
        --in_graph=tf_model.pb \
        --out_graph=tf_model_opt.pb \
        --inputs='input' \
        --outputs='output' \
        --transforms='strip_unused_nodes(type=float, shape="1,64,64,3")
            strip_unused_nodes(type=float, shape="1,64,64,3")
            remove_nodes(op=Identity, op=CheckNumerics)
            fold_constants(ignore_errors=true)
            flatten_atrous_conv
            fold_batch_norms
            fold_old_batch_norms
            strip_unused_nodes
            sort_by_execution_order'

    # DSP:
    ./transform_graph \
        --in_graph=tf_model.pb \
        --out_graph=tf_model_opt.pb \
        --inputs='input' \
        --outputs='output' \
        --transforms='strip_unused_nodes(type=float, shape="1,64,64,3")
            strip_unused_nodes(type=float, shape="1,64,64,3")
            remove_nodes(op=Identity, op=CheckNumerics)
            fold_constants(ignore_errors=true)
            fold_batch_norms
            fold_old_batch_norms
            backport_concatv2
            quantize_weights(minimum_size=2)
            quantize_nodes
            strip_unused_nodes
            sort_by_execution_order'
- Caffe

Only versions greater than or equal to 1.0 are supported; please use the tools Caffe supplies to upgrade older models.

.. code:: bash

    # Upgrade prototxt
    $CAFFE_ROOT/build/tools/upgrade_net_proto_text MODEL.prototxt MODEL.new.prototxt

    # Upgrade caffemodel
    $CAFFE_ROOT/build/tools/upgrade_net_proto_binary MODEL.caffemodel MODEL.new.caffemodel
============================
3. Build static library
============================

-----------------
3.1 Overview
-----------------

MACE only builds static libraries. The following are the two use cases.

* **Build for a specified SOC**

    You must assign ``target_socs`` in the yaml configuration file.
    If you want to use the GPU of that SOC, MACE will automatically tune the parameters for better performance.

    .. warning::

        You should plug in a phone with that SOC.

* **Build for all SOCs**

    When no ``target_socs`` is specified, the library is suitable for all SOCs.

    .. warning::

        The performance will be a little poorer than in the first case.

We supply a python script ``tools/converter.py`` to build the library and run the model from the command line.

.. warning::

    You must run the script from the root directory of the MACE code.

------------------------------------------
3.2 ``tools/converter.py`` explanation
------------------------------------------
**Commands**

* **build**

.. note::

    Build the static library and test tools.

* *--config* (type=str, default="", required): the path of the model yaml configuration file.
* *--tuning* (default=false, optional): whether to tune the GPU parameters for the specified SOC.
* *--enable_openmp* (default=true, optional): whether to use OpenMP.

* **run**

.. note::

    Run the models from the command line.

* *--config* (type=str, default="", required): the path of the model yaml configuration file.
* *--round* (type=int, default=1, optional): the number of times to run the model.
* *--validate* (default=false, optional): whether to verify that the results of MACE are consistent with those of the original framework.
* *--caffe_env* (type=local/docker, default=docker, optional): the Caffe environment used for validation: the local environment or the Caffe docker image.
* *--restart_round* (type=int, default=1, optional): the number of restart rounds between runs.
* *--check_gpu_out_of_memory* (default=false, optional): whether to check for GPU out-of-memory errors.
* *--vlog_level* (type=int[0-5], default=0, optional): verbose log level for debugging.

.. warning::

    ``run`` relies on the ``build`` command; you can only ``run`` after a successful ``build``.

* **benchmark**

* *--config* (type=str, default="", required): the path of the model yaml configuration file.

.. warning::

    ``benchmark`` relies on the ``build`` command; you can only ``benchmark`` after a successful ``build``.
**Common arguments**

.. list-table::
    :widths: auto
    :header-rows: 1
    :align: left

    * - argument(key)
      - argument(value)
      - default
      - required
      - commands
      - explanation
    * - --omp_num_threads
      - int
      - -1
      - N
      - ``run``/``benchmark``
      - number of threads
    * - --cpu_affinity_policy
      - int
      - 1
      - N
      - ``run``/``benchmark``
      - 0:AFFINITY_NONE/1:AFFINITY_BIG_ONLY/2:AFFINITY_LITTLE_ONLY
    * - --gpu_perf_hint
      - int
      - 3
      - N
      - ``run``/``benchmark``
      - 0:DEFAULT/1:LOW/2:NORMAL/3:HIGH
    * - --gpu_priority_hint
      - int
      - 3
      - N
      - ``run``/``benchmark``
      - 0:DEFAULT/1:LOW/2:NORMAL/3:HIGH
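As an illustration of how the common arguments above combine with a command, the snippet below composes a ``run`` invocation as a string. The config path and the chosen values are placeholders for this doc, not recommendations.

```python
# Illustrative only: compose a converter.py invocation using the common
# arguments above (models/config.yaml is a placeholder path).
cmd = [
    "python", "tools/converter.py", "run",
    "--config=models/config.yaml",
    "--round=100",
    "--omp_num_threads=4",      # number of threads
    "--cpu_affinity_policy=1",  # 1: AFFINITY_BIG_ONLY
    "--gpu_perf_hint=3",        # 3: HIGH
    "--gpu_priority_hint=3",    # 3: HIGH
]
print(" ".join(cmd))
# The list form can be handed to subprocess.run(cmd) from the MACE root.
```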
---------------------------------------------
3.3 ``tools/converter.py`` usage examples
---------------------------------------------

.. code:: sh

    # print help message
    python tools/converter.py -h
    python tools/converter.py build -h
    python tools/converter.py run -h
    python tools/converter.py benchmark -h

    # Build the static library
    python tools/converter.py build --config=models/config.yaml

    # Test model run time
    python tools/converter.py run --config=models/config.yaml --round=100

    # Compare the results of MACE and the original platform; the **cosine distance** represents similarity.
    # On OpenCL devices a similarity >= 0.995 passes by default; on DSP the threshold is 0.930.
    python tools/converter.py run --config=models/config.yaml --validate

    # Benchmark the model: check the execution time of each Op.
    python tools/converter.py benchmark --config=models/config.yaml

    # Check the memory usage of the model (**keep only one model in the configuration file**)
    python tools/converter.py run --config=models/config.yaml --round=10000 &
    adb shell dumpsys meminfo | grep mace_run
    sleep 10
    kill %1
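The ``--validate`` comparison above is based on cosine similarity. A minimal sketch of the metric in pure Python (not MACE's actual validation code):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical outputs validate perfectly (up to float error).
print(cosine_similarity([0.1, 0.2, 0.7], [0.1, 0.2, 0.7]))
```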
=============
4. Deployment
=============

The ``build`` command generates a package which contains the static library, model files and header files.
The package is at ``./build/${library_name}/libmace_${library_name}.tar.gz``.
The following lists its contents in detail.

**Header files**

* ``include/mace/public/*.h``

**Static libraries**

* ``library/${target_abi}/*.a``

**Dynamic libraries**

* ``library/libhexagon_controller.so``

.. note::

    Only used for DSP mode.

**Model files**

* ``model/${MODEL_TAG}.pb``
* ``model/${MODEL_TAG}.data``

.. note::

    The ``.pb`` file is generated only when ``build_type`` is ``proto``.
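Unpacking the package on the deployment side might look like the sketch below. ``mobilenet`` is a hypothetical ``library_name``; the package may not exist on the current machine, in which case the script just reports the expected path.

```python
# Sketch: unpack the release package produced by `build`.
import os
import tarfile

library_name = "mobilenet"  # hypothetical
pkg = "build/%s/libmace_%s.tar.gz" % (library_name, library_name)

if os.path.exists(pkg):
    with tarfile.open(pkg, "r:gz") as tar:
        tar.extractall("deploy")  # headers, libraries and model files
    print("unpacked to deploy/")
else:
    print("package not built yet: %s" % pkg)
```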
=============
5. How to use
=============

Please refer to ``mace/examples/example.cc`` for the full usage. The following lists the key steps.
.. code:: cpp

    // Include the header files
    #include "mace/public/mace.h"
    #include "mace/public/mace_runtime.h"
    #include "mace/public/mace_engine_factory.h"

    // 0. Set the internal storage factory (**call once**)
    const std::string file_path = "/path/to/store/internal/files";
    std::shared_ptr<KVStorageFactory> storage_factory(
        new FileStorageFactory(file_path));
    ConfigKVStorageFactory(storage_factory);

    // 1. Declare the device type (must be the same as ``runtime`` in the configuration file)
    DeviceType device_type = DeviceType::GPU;

    // 2. Define the input and output tensor names.
    std::vector<std::string> input_names = {...};
    std::vector<std::string> output_names = {...};

    // 3. Create the MaceEngine object
    std::shared_ptr<mace::MaceEngine> engine;
    MaceStatus create_engine_status;
    // Create Engine from compiled code
    create_engine_status =
        CreateMaceEngineFromCode(model_name.c_str(),
                                 nullptr,
                                 input_names,
                                 output_names,
                                 device_type,
                                 &engine);
    // Create Engine from proto file
    create_engine_status =
        CreateMaceEngineFromProto(model_pb_data,
                                  model_data_file.c_str(),
                                  input_names,
                                  output_names,
                                  device_type,
                                  &engine);
    if (create_engine_status != MaceStatus::MACE_SUCCESS) {
      // do something
    }

    // 4. Create the input and output tensors
    std::map<std::string, mace::MaceTensor> inputs;
    std::map<std::string, mace::MaceTensor> outputs;
    for (size_t i = 0; i < input_count; ++i) {
      // Allocate input and output buffers
      int64_t input_size =
          std::accumulate(input_shapes[i].begin(), input_shapes[i].end(), 1,
                          std::multiplies<int64_t>());
      auto buffer_in = std::shared_ptr<float>(new float[input_size],
                                              std::default_delete<float[]>());
      // Load input
      ...
      inputs[input_names[i]] = mace::MaceTensor(input_shapes[i], buffer_in);
    }

    for (size_t i = 0; i < output_count; ++i) {
      int64_t output_size =
          std::accumulate(output_shapes[i].begin(), output_shapes[i].end(), 1,
                          std::multiplies<int64_t>());
      auto buffer_out = std::shared_ptr<float>(new float[output_size],
                                               std::default_delete<float[]>());
      outputs[output_names[i]] = mace::MaceTensor(output_shapes[i], buffer_out);
    }

    // 5. Run the model
    MaceStatus status = engine.Run(inputs, &outputs);
@@ -844,7 +844,8 @@ def merge_libs(target_soc,
     project_output_dir = "%s/%s" % (build_output_dir, project_name)
     model_header_dir = "%s/include/mace/public" % project_output_dir
     hexagon_lib_file = "third_party/nnlib/libhexagon_controller.so"
-    model_bin_dir = "%s/%s/%s/" % (project_output_dir, library_output_dir, abi)
+    library_dir = "%s/%s" % (project_output_dir, library_output_dir)
+    model_bin_dir = "%s/%s/" % (library_dir, abi)
     if os.path.exists(model_bin_dir):
         sh.rm("-rf", model_bin_dir)
@@ -855,7 +856,7 @@ def merge_libs(target_soc,
     # copy header files
     sh.cp("-f", glob.glob("mace/public/*.h"), model_header_dir)
     if hexagon_mode:
-        sh.cp("-f", hexagon_lib_file, model_bin_dir)
+        sh.cp("-f", hexagon_lib_file, library_dir)
     if model_build_type == BuildType.code:
         sh.cp("-f", glob.glob("mace/codegen/engine/*.h"), model_header_dir)
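The hunk above hoists ``library_dir`` out of ``model_bin_dir`` so that ``libhexagon_controller.so`` is copied next to the per-ABI directories rather than into one of them. A sketch of the resulting path layout, with hypothetical directory values:

```python
# Reproduce the path construction from the hunk above with sample values.
project_output_dir = "build/mobilenet"   # hypothetical
library_output_dir = "library"
abi = "armeabi-v7a"

library_dir = "%s/%s" % (project_output_dir, library_output_dir)
model_bin_dir = "%s/%s/" % (library_dir, abi)

# libhexagon_controller.so now lands in library_dir, shared by all ABIs:
print(library_dir)    # build/mobilenet/library
print(model_bin_dir)  # build/mobilenet/library/armeabi-v7a/
```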