Commit feacc76e authored by: Y yejianwu

Merge branch 'master' of v9.git.n.xiaomi.com:deep-computing/mace into refactor_target_deps

......@@ -4,9 +4,9 @@
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
[![Build Status](https://travis-ci.org/travis-ci/travis-web.svg?branch=master)](https://travis-ci.org/travis-ci/travis-web)
[![pipeline status](https://gitlab.com/llhe/mace/badges/master/pipeline.svg)](https://gitlab.com/llhe/mace/pipelines)
[![doc build status](https://readthedocs.org/projects/mace/badge/?version=latest)](https://readthedocs.org/projects/mace/badge/?version=latest)
[![Build Status](https://travis-ci.org/travis-ci/travis-web.svg?branch=master)](https://travis-ci.org/travis-ci/travis-web)
[Documentation](https://mace.readthedocs.io) |
[FAQ](https://mace.readthedocs.io/en/latest/faq.html) |
......@@ -44,9 +44,10 @@ targets:
architectures with limited performance.
## Getting Started
* [Introduction](https://mace.readthedocs.io/en/latest/getting_started/introduction.html)
* [Create a model deployment file](https://mace.readthedocs.io/en/latest/getting_started/create_a_model_deployment.html)
* [How to build](https://mace.readthedocs.io/en/latest/getting_started/how_to_build.html)
* [Introduction](https://mace.readthedocs.io/en/latest/introduction.html)
* [Installation](https://mace.readthedocs.io/en/latest/installation/env_requirement.html)
* [Basic Usage](https://mace.readthedocs.io/en/latest/user_guide/basic_usage.html)
* [Advanced Usage](https://mace.readthedocs.io/en/latest/user_guide/advanced_usage.html)
## Performance
[MACE Model Zoo](https://github.com/XiaoMi/mace-models) contains
......
......@@ -35,9 +35,10 @@
It also supports running on the CPU of systems with a POSIX interface.
## Getting Started
* [Introduction](https://mace.readthedocs.io/en/latest/getting_started/introduction.html)
* [Create a model deployment file](https://mace.readthedocs.io/en/latest/getting_started/create_a_model_deployment.html)
* [How to build](https://mace.readthedocs.io/en/latest/getting_started/how_to_build.html)
* [Introduction](https://mace.readthedocs.io/en/latest/introduction.html)
* [Installation](https://mace.readthedocs.io/en/latest/installation/env_requirement.html)
* [Basic Usage](https://mace.readthedocs.io/en/latest/user_guide/basic_usage.html)
* [Advanced Usage](https://mace.readthedocs.io/en/latest/user_guide/advanced_usage.html)
## Performance
[MACE Model Zoo](https://github.com/XiaoMi/mace-models)
......
......@@ -6,7 +6,7 @@
import recommonmark.parser
import sphinx_rtd_theme
project = u'Mobile AI Compute Engine (MACE)'
project = u'MACE'
author = u'%s Developers' % project
copyright = u'2018, %s' % author
......
......@@ -96,6 +96,10 @@ Add test and benchmark
It's strongly recommended to add unit tests and micro benchmarks for your
new Op. If you wish to contribute back, it's required.
Add Op in model converter
-------------------------
You need to add the new Op to the model converter.
Document the new Op
---------------------
Finally, add an entry to the operator table in the documentation.
How to run tests
=================
To run tests, you need to first cross compile the code, push the binary
to the device and then execute the binary. To automate this process,
MACE provides the `tools/bazel_adb_run.py` tool.
Make sure your device is connected to your dev PC before running tests.
Run unit tests
---------------
MACE uses [gtest](https://github.com/google/googletest) for unit tests.
* Run all unit tests defined in a Bazel target, for example, run `ops_test`:
```sh
python tools/bazel_adb_run.py --target="//mace/ops:ops_test" \
--run_target=True
```
* Run unit tests with [gtest](https://github.com/google/googletest) filter,
for example, run `Conv2dOpTest` unit tests:
```sh
python tools/bazel_adb_run.py --target="//mace/ops:ops_test" \
--run_target=True \
--args="--gtest_filter=Conv2dOpTest*"
```
Run micro benchmarks
--------------------
MACE provides a micro benchmark framework for performance tuning.
* Run all micro benchmarks defined in a Bazel target, for example, run all
`ops_benchmark` micro benchmarks:
```sh
python tools/bazel_adb_run.py --target="//mace/ops:ops_benchmark" \
--run_target=True
```
* Run micro benchmarks with regex filter, for example, run all `CONV_2D` GPU
micro benchmarks:
```sh
python tools/bazel_adb_run.py --target="//mace/ops:ops_benchmark" \
--run_target=True \
--args="--filter=MACE_BM_CONV_2D_.*_GPU"
```
Memory layout
===========================
==============
CPU runtime memory layout
-------------------------
--------------------------
The CPU tensor buffer is organized in the following order:
.. list-table::
:widths: auto
:header-rows: 1
:align: left
* - Tensor type
- Buffer
......@@ -22,7 +20,7 @@ The CPU tensor buffer is organized in the following order:
- W
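As a concrete illustration of a linear CPU buffer layout, the sketch below computes the flat offset of one element in an NCHW-ordered buffer. The NCHW order is used here only as an example assumption; the table above remains the authoritative reference for the layout of each tensor type.
.. code:: cpp
#include <cstdint>
#include <iostream>
// Flat offset of element (n, c, h, w) in a buffer stored in NCHW order
// (used here purely as an illustration of a linear CPU buffer layout).
int64_t OffsetNCHW(int64_t n, int64_t c, int64_t h, int64_t w,
                   int64_t C, int64_t H, int64_t W) {
  return ((n * C + c) * H + h) * W + w;
}
int main() {
  // Example: a 1x3x224x224 tensor; element (0, 2, 10, 5).
  std::cout << OffsetNCHW(0, 2, 10, 5, 3, 224, 224) << std::endl;  // 102597
  return 0;
}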
GPU runtime memory layout
-----------------------------
--------------------------
The GPU runtime implementation is based on OpenCL, which uses 2D images with the CL_RGBA
channel order as tensor storage. This requires OpenCL 1.2 or above.
......@@ -34,14 +32,12 @@ The following tables describe the mapping from different type of tensors to
2D RGBA Image.
Input/Output Tensor
~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~
The Input/Output Tensor is stored in NHWC format:
.. list-table::
:widths: auto
:header-rows: 1
:align: left
* - Tensor type
- Buffer
......@@ -64,9 +60,7 @@ Each pixel of **Image** contains 4 elements. The table below lists the
coordinate relationship between **Image** and **Buffer**.
.. list-table::
:widths: auto
:header-rows: 1
:align: left
* - Tensor type
- Pixel coordinate relationship
......@@ -82,12 +76,10 @@ coordination relation between **Image** and **Buffer**.
- k=[0, 4)
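To make the packing concrete, the sketch below computes the 2D image size for an NHWC input tensor, assuming the convention of image width ``ceil(C/4) * W`` and image height ``N * H`` with each RGBA pixel holding 4 channel values; this formula is an assumption for illustration only, and the tables in this section remain the authoritative mapping.
.. code:: cpp
#include <cstdint>
#include <iostream>
int main() {
  // NHWC input tensor, e.g. one 224x224 RGB image.
  const int64_t N = 1, H = 224, W = 224, C = 3;
  // Channels are packed 4 per RGBA pixel, so they are grouped into
  // ceil(C / 4) blocks (assumed convention, for illustration only).
  const int64_t channel_blocks = (C + 3) / 4;
  const int64_t image_width = channel_blocks * W;  // assumed: ceil(C/4) * W
  const int64_t image_height = N * H;              // assumed: N * H
  std::cout << "image: " << image_width << " x " << image_height
            << " RGBA pixels" << std::endl;  // 224 x 224
  return 0;
}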
Filter Tensor
~~~~~~~~~~~~~
~~~~~~~~~~~~~~
.. list-table::
:widths: auto
:header-rows: 1
:align: left
* - Tensor
- Buffer
......@@ -106,9 +98,7 @@ Each pixel of **Image** contains 4 elements. The table below lists the
coordinate relationship between **Image** and **Buffer**.
.. list-table::
:widths: auto
:header-rows: 1
:align: left
* - Tensor type
- Pixel coordinate relationship
......@@ -121,12 +111,10 @@ coordination relation between **Image** and **Buffer**.
- only support multiplier == 1, k=[0, 4)
1-D Argument Tensor
~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~
.. list-table::
:widths: auto
:header-rows: 1
:align: left
* - Tensor type
- Buffer
......@@ -141,9 +129,7 @@ Each pixel of **Image** contains 4 elements. The table below lists the
coordinate relationship between **Image** and **Buffer**.
.. list-table::
:widths: auto
:header-rows: 1
:align: left
* - Tensor type
- Pixel coordinate relationship
......
Create a model deployment file
==============================
The first step to deploy your models is to create a YAML model deployment
file.
One deployment file describes one model deployment case;
each file will generate one static library (if more than one ABI is specified,
there will be one static library for each). The deployment file can contain
one or more models; for example, a smart camera application may contain face
recognition, object recognition, and voice recognition models, which can all be
defined in one deployment file.
Example
----------
Here is an example deployment file used by an Android demo application.
TODO: change this example file to the demo deployment file
(reuse the same file) and rename it to a reasonable name.
.. literalinclude:: models/demo_app_models.yaml
:language: yaml
Configurations
--------------------
.. list-table::
:widths: auto
:header-rows: 1
:align: left
* - library_name
- library name.
* - target_abis
- The target ABI to build, can be one or more of 'host', 'armeabi-v7a' or 'arm64-v8a'.
* - target_socs
- [optional] Build for the specified SoCs if you only want to use the model on those SoCs.
* - embed_model_data
- Whether to embed the model weights in the code, defaults to 0.
* - build_type
- Model build type, one of ['proto', 'code']: 'proto' converts the model to a ProtoBuf file and 'code' converts the model to C++ code.
* - linkshared
- [optional] Use dynamic linking for the libmace library when set to 1, or static linking when set to 0, defaults to 0.
* - model_name
- Model name, should be unique if there are multiple models.
**LIMIT: if build_type is code, model_name will be used in C++ code, so model_name must comply with C++ naming rules.**
* - platform
- The source framework, one of [tensorflow, caffe].
* - model_file_path
- The path of the model file, can be local or remote.
* - model_sha256_checksum
- The SHA256 checksum of the model file.
* - weight_file_path
- [optional] The path of the model weights file, used by Caffe model.
* - weight_sha256_checksum
- [optional] The SHA256 checksum of the weight file, used by Caffe model.
* - subgraphs
- subgraphs key. **DO NOT EDIT**
* - input_tensors
- The input tensor names (TensorFlow) or the top names of the inputs' layers (Caffe); one or more strings.
* - output_tensors
- The output tensor names (TensorFlow) or the top names of the outputs' layers (Caffe); one or more strings.
* - input_shapes
- The shapes of the input tensors, in NHWC order.
* - output_shapes
- The shapes of the output tensors, in NHWC order.
* - input_ranges
- The numerical range of the input tensors, default [-1, 1]. It is only used for testing.
* - validation_inputs_data
- [optional] Specify Numpy validation inputs. When not provided, [-1, 1] random values will be used.
* - runtime
- The running device, one of [cpu, gpu, dsp, cpu_gpu]. cpu_gpu contains both CPU and GPU model definitions so you can run the model on both CPU and GPU.
* - data_type
- [optional] The data type used for specified runtime. [fp16_fp32, fp32_fp32] for GPU, default is fp16_fp32. [fp32] for CPU. [uint8] for DSP.
* - limit_opencl_kernel_time
- [optional] Whether to limit each OpenCL kernel to roughly 1 ms (by splitting kernels) to keep the UI responsive, defaults to 0.
* - nnlib_graph_mode
- [optional] Controls DSP precision and performance; the default of 0 usually works for most cases.
* - obfuscate
- [optional] Whether to obfuscate the model operator names, defaults to 0.
* - winograd
- [optional] Whether to enable Winograd convolution, **which will increase memory consumption**.
How to build
============
Supported Platforms
-------------------
.. list-table::
:widths: auto
:header-rows: 1
:align: left
* - Platform
- Explanation
* - TensorFlow
- >= 1.6.0.
* - Caffe
- >= 1.0.
Environment Requirement
-------------------------
MACE requires the following dependencies:
.. list-table::
:widths: auto
:header-rows: 1
:align: left
* - software
- version
- install command
* - bazel
- >= 0.13.0
- `bazel installation guide <https://docs.bazel.build/versions/master/install.html>`__
* - android-ndk
- r15c/r16b
- `NDK installation guide <https://developer.android.com/ndk/guides/setup#install>`__ or refer to the Dockerfile
* - adb
- >= 1.0.32
- apt-get install android-tools-adb
* - tensorflow
- >= 1.6.0
- pip install -I tensorflow==1.6.0 (if you use a TensorFlow model)
* - numpy
- >= 1.14.0
- pip install -I numpy==1.14.0
* - scipy
- >= 1.0.0
- pip install -I scipy==1.0.0
* - jinja2
- >= 2.10
- pip install -I jinja2==2.10
* - PyYaml
- >= 3.12.0
- pip install -I pyyaml==3.12
* - sh
- >= 1.12.14
- pip install -I sh==1.12.14
* - filelock
- >= 3.0.0
- pip install -I filelock==3.0.0
* - docker (for caffe)
- >= 17.09.0-ce
- `docker installation guide <https://docs.docker.com/install/linux/docker-ce/ubuntu/#set-up-the-repository>`__
.. note::
Use ``export ANDROID_NDK_HOME=/path/to/ndk`` to specify ANDROID_NDK_HOME.
MACE provides a Dockerfile with these dependencies installed;
you can build the image from it,
.. code:: sh
docker build -t registry.cn-hangzhou.aliyuncs.com/xiaomimace/mace-dev-lite ./docker/mace-dev-lite
or pull the pre-built image from Docker Hub,
.. code:: sh
docker pull registry.cn-hangzhou.aliyuncs.com/xiaomimace/mace-dev-lite
and then run the container with the following command.
.. code:: sh
# Create container
# Set 'host' network to use ADB
docker run -it --privileged -v /dev/bus/usb:/dev/bus/usb --net=host \
-v /local/path:/container/path \
registry.cn-hangzhou.aliyuncs.com/xiaomimace/mace-dev-lite \
/bin/bash
Usage
--------
=======================================
1. Pull MACE source code
=======================================
.. code:: sh
git clone https://github.com/XiaoMi/mace.git
git fetch --all --tags --prune
# Checkout the latest tag (i.e. release version)
tag_name=`git describe --abbrev=0 --tags`
git checkout tags/${tag_name}
.. note::
It's highly recommended to use a release version instead of the master branch.
============================
2. Model Preprocessing
============================
- TensorFlow
TensorFlow provides
`Graph Transform Tool <https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/README.md>`__
to improve inference efficiency by making various optimizations like op
folding, redundant node removal, etc. It's strongly recommended to make these
optimizations before the graph conversion step.
The following commands show the suggested graph transformations and
optimizations for different runtimes,
.. code:: sh
# CPU/GPU:
./transform_graph \
--in_graph=tf_model.pb \
--out_graph=tf_model_opt.pb \
--inputs='input' \
--outputs='output' \
--transforms='strip_unused_nodes(type=float, shape="1,64,64,3")
strip_unused_nodes(type=float, shape="1,64,64,3")
remove_nodes(op=Identity, op=CheckNumerics)
fold_constants(ignore_errors=true)
flatten_atrous_conv
fold_batch_norms
fold_old_batch_norms
strip_unused_nodes
sort_by_execution_order'
.. code:: sh
# DSP:
./transform_graph \
--in_graph=tf_model.pb \
--out_graph=tf_model_opt.pb \
--inputs='input' \
--outputs='output' \
--transforms='strip_unused_nodes(type=float, shape="1,64,64,3")
strip_unused_nodes(type=float, shape="1,64,64,3")
remove_nodes(op=Identity, op=CheckNumerics)
fold_constants(ignore_errors=true)
fold_batch_norms
fold_old_batch_norms
backport_concatv2
quantize_weights(minimum_size=2)
quantize_nodes
strip_unused_nodes
sort_by_execution_order'
- Caffe
The MACE converter only supports Caffe 1.0+; upgrade
your models with Caffe's built-in tools when necessary,
.. code:: bash
# Upgrade prototxt
$CAFFE_ROOT/build/tools/upgrade_net_proto_text MODEL.prototxt MODEL.new.prototxt
# Upgrade caffemodel
$CAFFE_ROOT/build/tools/upgrade_net_proto_binary MODEL.caffemodel MODEL.new.caffemodel
==============================
3. Build static/shared library
==============================
-----------------
3.1 Overview
-----------------
MACE can build either a static or a shared library (specified by
``linkshared`` in the YAML model deployment file).
The following are two use cases.
* **Build well tuned library for specific SoCs**
When ``target_socs`` is specified in YAML model deployment file, the build
tool will enable automatic tuning for GPU kernels. This usually takes some
time to finish depending on the complexity of your model.
.. note::
You should plug in device(s) with the corresponding SoC(s).
* **Build generic library for all SoCs**
When ``target_socs`` is not specified, the generated library is compatible
with general devices.
.. note::
There will be around a 1~10% performance drop for the GPU
runtime compared to the well-tuned library.
MACE provides a command line tool (``tools/converter.py``) for
model conversion, compilation, test runs, benchmarking and correctness validation.
.. note::
1. ``tools/converter.py`` should be run from the root directory of this project.
2. When ``linkshared`` is set to ``1``, ``build_type`` should be ``proto``.
Currently only Android devices are supported.
------------------------------------------
3.2 \ ``tools/converter.py``\ usage
------------------------------------------
**Commands**
* **build**
Build the library and test tools.
.. code:: sh
# Build library
python tools/converter.py build --config=models/config.yaml
* **run**
Run the model(s).
.. code:: sh
# Test model run time
python tools/converter.py run --config=models/config.yaml --round=100
# Validate the correctness by comparing the results against the
# original model and framework, measured with cosine distance for similarity.
python tools/converter.py run --config=models/config.yaml --validate
# Check the memory usage of the model (**keep only one model in the configuration file**)
python tools/converter.py run --config=models/config.yaml --round=10000 &
sleep 5
adb shell dumpsys meminfo | grep mace_run
kill %1
.. warning::
``run`` relies on the ``build`` command; you should ``run`` after ``build``.
* **benchmark**
Benchmark and profile the model.
.. code:: sh
# Benchmark model, get detailed statistics of each Op.
python tools/converter.py benchmark --config=models/config.yaml
.. warning::
``benchmark`` relies on the ``build`` command; you should ``benchmark`` after ``build``.
**Common arguments**
.. list-table::
:widths: auto
:header-rows: 1
:align: left
* - option
- type
- default
- commands
- explanation
* - --omp_num_threads
- int
- -1
- ``run``/``benchmark``
- number of threads
* - --cpu_affinity_policy
- int
- 1
- ``run``/``benchmark``
- 0:AFFINITY_NONE/1:AFFINITY_BIG_ONLY/2:AFFINITY_LITTLE_ONLY
* - --gpu_perf_hint
- int
- 3
- ``run``/``benchmark``
- 0:DEFAULT/1:LOW/2:NORMAL/3:HIGH
* - --gpu_priority_hint
- int
- 3
- ``run``/``benchmark``
- 0:DEFAULT/1:LOW/2:NORMAL/3:HIGH
Use ``-h`` to get detailed help.
.. code:: sh
python tools/converter.py -h
python tools/converter.py build -h
python tools/converter.py run -h
python tools/converter.py benchmark -h
=============
4. Deployment
=============
The ``build`` command will generate the static/shared library, model files and
header files, and package them as
``build/${library_name}/libmace_${library_name}.tar.gz``.
- The generated ``static`` libraries are organized as follows,
.. code::
build/
└── mobilenet-v2-gpu
├── include
│   └── mace
│   └── public
│   ├── mace.h
│   └── mace_runtime.h
├── libmace_mobilenet-v2-gpu.tar.gz
├── lib
│   ├── arm64-v8a
│   │   └── libmace_mobilenet-v2-gpu.MI6.msm8998.a
│   └── armeabi-v7a
│   └── libmace_mobilenet-v2-gpu.MI6.msm8998.a
├── model
│   ├── mobilenet_v2.data
│   └── mobilenet_v2.pb
└── opencl
├── arm64-v8a
│   └── mobilenet-v2-gpu_compiled_opencl_kernel.MI6.msm8998.bin
└── armeabi-v7a
└── mobilenet-v2-gpu_compiled_opencl_kernel.MI6.msm8998.bin
- The generated ``shared`` libraries are organized as follows,
.. code::
build
└── mobilenet-v2-gpu
├── include
│   └── mace
│   └── public
│   ├── mace.h
│   └── mace_runtime.h
├── lib
│   ├── arm64-v8a
│   │   ├── libgnustl_shared.so
│   │   └── libmace.so
│   └── armeabi-v7a
│   ├── libgnustl_shared.so
│   └── libmace.so
├── model
│   ├── mobilenet_v2.data
│   └── mobilenet_v2.pb
└── opencl
├── arm64-v8a
│   └── mobilenet-v2-gpu_compiled_opencl_kernel.MI6.msm8998.bin
└── armeabi-v7a
└── mobilenet-v2-gpu_compiled_opencl_kernel.MI6.msm8998.bin
.. note::
1. DSP runtime depends on ``libhexagon_controller.so``.
2. ``${MODEL_TAG}.pb`` file will be generated only when ``build_type`` is ``proto``.
3. ``${library_name}_compiled_opencl_kernel.${device_name}.${soc}.bin`` will
be generated only when ``target_socs`` and ``gpu`` runtime are specified.
4. Generated shared library depends on ``libgnustl_shared.so``.
.. warning::
``${library_name}_compiled_opencl_kernel.${device_name}.${soc}.bin`` depends
on the OpenCL version of the device; you should maintain compatibility or
configure the compiled-kernel cache store with ``ConfigKVStorageFactory``.
=========================================
5. How to use the library in your project
=========================================
Please refer to \ ``mace/examples/example.cc``\ for full usage. The following lists the key steps.
.. code:: cpp
// Include the headers
#include "mace/public/mace.h"
#include "mace/public/mace_runtime.h"
// If the build_type is code
#include "mace/public/mace_engine_factory.h"
// 0. Set pre-compiled OpenCL binary program file paths when available
if (device_type == DeviceType::GPU) {
mace::SetOpenCLBinaryPaths(opencl_binary_paths);
}
// 1. Set compiled OpenCL kernel cache, this is used to reduce the
// initialization time since compiling is slow. It's suggested
// to set this even when a pre-compiled OpenCL program file is provided,
// because an OpenCL version upgrade may also lead to kernel
// recompilation.
const std::string file_path = "path/to/opencl_cache_file";
std::shared_ptr<KVStorageFactory> storage_factory(
new FileStorageFactory(file_path));
ConfigKVStorageFactory(storage_factory);
// 2. Declare the device type (must be the same as ``runtime`` in the configuration file)
DeviceType device_type = DeviceType::GPU;
// 3. Define the input and output tensor names.
std::vector<std::string> input_names = {...};
std::vector<std::string> output_names = {...};
// 4. Create MaceEngine instance
std::shared_ptr<mace::MaceEngine> engine;
MaceStatus create_engine_status;
// Create Engine from compiled code
create_engine_status =
CreateMaceEngineFromCode(model_name.c_str(),
nullptr,
input_names,
output_names,
device_type,
&engine);
// Create Engine from model file
create_engine_status =
CreateMaceEngineFromProto(model_pb_data,
model_data_file.c_str(),
input_names,
output_names,
device_type,
&engine);
if (create_engine_status != MaceStatus::MACE_SUCCESS) {
// Report error
}
// 5. Create Input and Output tensor buffers
std::map<std::string, mace::MaceTensor> inputs;
std::map<std::string, mace::MaceTensor> outputs;
for (size_t i = 0; i < input_count; ++i) {
// Allocate input and output
int64_t input_size =
std::accumulate(input_shapes[i].begin(), input_shapes[i].end(), 1,
std::multiplies<int64_t>());
auto buffer_in = std::shared_ptr<float>(new float[input_size],
std::default_delete<float[]>());
// Load input here
// ...
inputs[input_names[i]] = mace::MaceTensor(input_shapes[i], buffer_in);
}
for (size_t i = 0; i < output_count; ++i) {
int64_t output_size =
std::accumulate(output_shapes[i].begin(), output_shapes[i].end(), 1,
std::multiplies<int64_t>());
auto buffer_out = std::shared_ptr<float>(new float[output_size],
std::default_delete<float[]>());
outputs[output_names[i]] = mace::MaceTensor(output_shapes[i], buffer_out);
}
// 6. Run the model
MaceStatus status = engine->Run(inputs, &outputs);
Introduction
============
Mobile AI Compute Engine (MACE) is a deep learning inference framework optimized for
mobile heterogeneous computing platforms. The following figure shows the
overall architecture.
.. image:: mace-arch.png
:scale: 40 %
:align: center
Model format
------------
MACE defines a customized model format which is similar to
Caffe2. The MACE model can be converted from exported models by TensorFlow
and Caffe. A YAML file is used to describe the model deployment details. In the
next chapter, there is a detailed guide showing how to create this YAML file.
Model conversion
----------------
Currently, we provide model converters for TensorFlow and Caffe; more
frameworks will be supported in the future.
Model loading
-------------
The MACE model format contains two parts: the model graph definition and
the model parameter tensors. The graph part utilizes Protocol Buffers
for serialization. All the model parameter tensors are concatenated
together into a contiguous byte array, and we call this array tensor data in
the following paragraphs. In the model graph, the tensor data offsets
and lengths are recorded.
The models can be loaded in 3 ways:
1. Both model graph and tensor data are dynamically loaded externally
(by default, from the file system, but users are free to choose their own
implementations, for example, with compression or encryption). This
approach provides the most flexibility but the weakest model protection.
2. Both model graph and tensor data are converted into C++ code and loaded
by executing the compiled code. This approach provides the strongest
model protection and simplest deployment.
3. The model graph is converted into C++ code and constructed as in the second
approach, and the tensor data is loaded externally as in the first approach.
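As an illustration of the first loading approach, the sketch below reads the serialized graph from the file system and creates the engine with ``CreateMaceEngineFromProto``, the same call shown in the build documentation. The file paths, the ``ReadBinaryFile`` helper and the CPU device choice are placeholders for illustration, and error handling is kept minimal.
.. code:: cpp
#include <fstream>
#include <iterator>
#include <memory>
#include <string>
#include <vector>
#include "mace/public/mace.h"
#include "mace/public/mace_runtime.h"
// Placeholder helper: read a whole file into memory (illustration only).
std::vector<unsigned char> ReadBinaryFile(const std::string &path) {
  std::ifstream ifs(path, std::ios::binary);
  return std::vector<unsigned char>((std::istreambuf_iterator<char>(ifs)),
                                    std::istreambuf_iterator<char>());
}
// First loading approach: both the graph (.pb) and the tensor data (.data)
// live on the file system and are loaded at runtime.
std::shared_ptr<mace::MaceEngine> LoadFromFiles(
    const std::vector<std::string> &input_names,
    const std::vector<std::string> &output_names) {
  std::vector<unsigned char> model_pb_data =
      ReadBinaryFile("/path/to/model.pb");          // graph definition
  std::shared_ptr<mace::MaceEngine> engine;
  mace::MaceStatus status = mace::CreateMaceEngineFromProto(
      model_pb_data, "/path/to/model.data",         // tensor data file
      input_names, output_names, mace::DeviceType::CPU, &engine);
  // Check status against MaceStatus::MACE_SUCCESS before using the engine.
  (void)status;
  return engine;
}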
......@@ -6,21 +6,37 @@ The main documentation is organized into the following sections:
.. toctree::
:maxdepth: 1
:caption: Getting started
:name: sec-start
:caption: Introduction
:name: sec-intro
getting_started/introduction
getting_started/create_a_model_deployment
getting_started/how_to_build
getting_started/op_lists
introduction
.. toctree::
:maxdepth: 1
:caption: Development
:caption: Installation
:name: sec-install
installation/env_requirement
installation/using_docker
installation/manual_setup
.. toctree::
:maxdepth: 1
:caption: User guide
:name: sec-user
user_guide/basic_usage
user_guide/advanced_usage
user_guide/op_lists
.. toctree::
:maxdepth: 1
:caption: Developer guide
:name: sec-devel
development/contributing
development/adding_a_new_op
development/how_to_run_tests
development/memory_layout
.. toctree::
......
Environment requirement
========================
MACE requires the following dependencies:
Required dependencies
---------------------
.. list-table::
:header-rows: 1
* - Software
- Installation command
- Tested version
* - Python
-
- 2.7
* - Bazel
- `bazel installation guide <https://docs.bazel.build/versions/master/install.html>`__
- 0.13.0
* - CMake
- apt-get install cmake
- >= 3.11.3
* - Jinja2
- pip install -I jinja2==2.10
- 2.10
* - PyYaml
- pip install -I pyyaml==3.12
- 3.12.0
* - sh
- pip install -I sh==1.12.14
- 1.12.14
Optional dependencies
---------------------
.. list-table::
:header-rows: 1
* - Software
- Installation command
- Remark
* - Android NDK
- `NDK installation guide <https://developer.android.com/ndk/guides/setup#install>`__
- Required by Android build, r15b, r15c, r16b
* - ADB
- apt-get install android-tools-adb
- Required by Android run, >= 1.0.32
* - TensorFlow
- pip install -I tensorflow==1.6.0
- Required by TensorFlow model
* - Docker
- `docker installation guide <https://docs.docker.com/install/linux/docker-ce/ubuntu/#set-up-the-repository>`__
- Required by docker mode for Caffe model
* - Numpy
- pip install -I numpy==1.14.0
- Required by model validation
* - Scipy
- pip install -I scipy==1.0.0
- Required by model validation
* - FileLock
- pip install -I filelock==3.0.0
- Required by Android run
.. note::
For Android builds, `ANDROID_NDK_HOME` must be configured using ``export ANDROID_NDK_HOME=/path/to/ndk``
Manual setup
=============
The setup steps are based on ``Ubuntu``; you can adapt the commands
accordingly for other systems.
For the detailed installation dependencies, please refer to :doc:`env_requirement`.
Install Bazel
-------------
Bazel ``0.13.0`` or newer is recommended (refer to the `Bazel documentation <https://docs.bazel.build/versions/master/install.html>`__).
.. code:: sh
export BAZEL_VERSION=0.13.1
mkdir /bazel && \
cd /bazel && \
wget https://github.com/bazelbuild/bazel/releases/download/$BAZEL_VERSION/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
chmod +x bazel-*.sh && \
./bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
cd / && \
rm -f /bazel/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh
Install Android NDK
--------------------
The recommended Android NDK versions include r15b, r15c and r16b (refer to the
`NDK installation guide <https://developer.android.com/ndk/guides/setup#install>`__).
.. code:: sh
# Download NDK r15c
cd /opt/ && \
wget -q https://dl.google.com/android/repository/android-ndk-r15c-linux-x86_64.zip && \
unzip -q android-ndk-r15c-linux-x86_64.zip && \
rm -f android-ndk-r15c-linux-x86_64.zip
export ANDROID_NDK_VERSION=r15c
export ANDROID_NDK=/opt/android-ndk-${ANDROID_NDK_VERSION}
export ANDROID_NDK_HOME=${ANDROID_NDK}
# add to PATH
export PATH=${PATH}:${ANDROID_NDK_HOME}
Install extra tools
--------------------
.. code:: sh
apt-get install -y --no-install-recommends \
cmake \
android-tools-adb
pip install -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com setuptools
pip install -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com \
"numpy>=1.14.0" \
scipy \
jinja2 \
pyyaml \
sh==1.12.14 \
pycodestyle==2.4.0 \
filelock
Install TensorFlow (Optional)
------------------------------
.. code:: sh
pip install -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com tensorflow==1.6.0
Install Caffe (Optional)
-------------------------
Please follow the installation instructions of `Caffe <http://caffe.berkeleyvision.org/installation.html>`__.
Using docker
=============
Pull or build docker image
---------------------------
MACE provides Docker images with the dependencies installed, as well as Dockerfiles for building the images;
you can pull the existing images directly or build them from the Dockerfiles.
In most cases, the ``lite edition`` image satisfies a developer's basic needs.
.. note::
It's highly recommended to pull the pre-built images.
- ``lite edition`` docker image.
.. code:: sh
# Pull lite edition docker image
docker pull registry.cn-hangzhou.aliyuncs.com/xiaomimace/mace-dev-lite
# Build lite edition docker image
docker build -t registry.cn-hangzhou.aliyuncs.com/xiaomimace/mace-dev-lite ./docker/mace-dev-lite
- ``full edition`` docker image (which contains multiple NDK versions and other dev tools).
.. code:: sh
# Pull full edition docker image
docker pull registry.cn-hangzhou.aliyuncs.com/xiaomimace/mace-dev
# Build full edition docker image
docker build -t registry.cn-hangzhou.aliyuncs.com/xiaomimace/mace-dev ./docker/mace-dev
.. note::
The following steps use the lite edition image.
Using the image
-----------------
Create a container with the following command:
.. code:: sh
# Create a container named `mace-dev`
docker run -it --privileged -d --name mace-dev \
-v /dev/bus/usb:/dev/bus/usb --net=host \
-v /local/path:/container/path \
registry.cn-hangzhou.aliyuncs.com/xiaomimace/mace-dev-lite
# Execute an interactive bash shell on the container
docker exec -it mace-dev /bin/bash
Introduction
============
MACE (Mobile AI Compute Engine) is a deep learning inference framework optimized for
mobile heterogeneous computing platforms.
MACE provides tools and documentation to help users deploy deep learning models
to mobile phones, tablets, personal computers and IoT devices.
Architecture
-------------
The following figure shows the overall architecture.
.. image:: mace-arch.png
:scale: 40 %
:align: center
MACE Model
~~~~~~~~~~
MACE defines a customized model format which is similar to
Caffe2. The MACE model can be converted from exported models by TensorFlow
and Caffe.
MACE Interpreter
~~~~~~~~~~~~~~~~~
The MACE Interpreter mainly parses the NN graph and manages the tensors in the graph.
Runtime
~~~~~~~
The CPU/GPU/DSP runtimes contain the Op implementations for the corresponding devices.
Workflow
--------
The following figure shows the basic workflow of MACE.
.. image:: mace-work-flow.png
:scale: 60 %
:align: center
1. Configure model deployment file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The model deployment configuration file (.yml) describes the information of the model and the library;
MACE will build the library based on this file.
2. Build libraries
~~~~~~~~~~~~~~~~~~
Build MACE dynamic or static libraries.
3. Convert model
~~~~~~~~~~~~~~~~~~
Convert TensorFlow or Caffe model to MACE model.
4.1. Deploy
~~~~~~~~~~~~~~~~~~
Integrate the MACE library into your application and run with MACE API.
4.2. Run (CLI)
~~~~~~~~~~~~~~~~~~
MACE provides the `mace_run` command line tool, which can be used to run models
and validate model correctness against the original TensorFlow or Caffe results.
4.3. Benchmark
~~~~~~~~~~~~~~~~~~
MACE provides a benchmark tool to obtain Op-level profiling results for the model.
Introduction
------------
Mobile AI Compute Engine (MACE) is a deep learning inference framework optimized for mobile heterogeneous computing devices.
MACE covers common mobile computing devices (CPU, GPU and DSP) and provides a complete toolchain and documentation,
so users can easily deploy deep learning models on mobile devices. MACE is widely used inside Xiaomi and has been thoroughly validated to deliver industry-leading performance and stability.
Architecture
------------
The following figure shows the basic architecture of MACE.
.. image:: mace-arch.png
:scale: 40 %
:align: center
MACE Model
~~~~~~~~~~~~~~~~~~
MACE defines its own model format (similar to Caffe2). Caffe and TensorFlow models can be converted into MACE models
with the tools provided by MACE.
MACE Interpreter
~~~~~~~~~~~~~~~~~~
The MACE Interpreter is mainly responsible for parsing and running the neural network graph (DAG) and managing the tensors in the network.
Runtime
~~~~~~~~~~~~~~~~~~
The CPU/GPU/DSP runtimes provide the operator implementations for each computing device.
Workflow
------------
The following figure shows the basic workflow of using MACE.
.. image:: mace-work-flow-zh.png
:scale: 60 %
:align: center
1. Configure the model deployment file (.yml)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The model deployment file describes in detail the models to be deployed and the library to generate; MACE generates the corresponding library files based on this file.
2. Build the MACE library
~~~~~~~~~~~~~~~~~~~~~~~~~~
Build the MACE static or shared library.
3. Convert the model
~~~~~~~~~~~~~~~~~~~~~
Convert a TensorFlow or Caffe model to a MACE model.
4.1. Deploy
~~~~~~~~~~~~~~~~~~
Integrate the library files generated in the build stage according to your use case, then call the corresponding MACE APIs to run the model.
4.2. Run from the command line
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
MACE provides a command line tool that can run models from the command line and can be used to test a model's run time, memory usage and correctness.
4.3. Benchmark
~~~~~~~~~~~~~~~~~~
MACE provides a command line benchmark tool that reports the run time of every operator in the model at a fine granularity.
Advanced usage
===============
This part contains the full usage of MACE.
Overview
---------
As mentioned in the previous part, a model deployment file defines a case of model deployment.
The building process includes parsing the model deployment file, converting the models,
building the MACE core library and packing the generated model libraries.
Deployment file
---------------
A deployment file normally generates one library, but if more than one ABI is specified,
one library will be generated for each ABI.
A deployment file can also contain multiple models. For example, an AI camera application may
contain face recognition, object recognition, and voice recognition models, all of which can be defined
in one deployment file.
* **Example**
Here is an example deployment file with two models.
.. literalinclude:: models/demo_models.yml
:language: yaml
* **Configurations**
.. list-table::
:header-rows: 1
* - Options
- Usage
* - library_name
- Library name.
* - target_abis
- The target ABI(s) to build, can be 'host', 'armeabi-v7a' or 'arm64-v8a'.
If more than one ABI is needed, separate them with commas.
* - target_socs
- [optional] Build for specific SoCs.
* - model_graph_format
- Model graph format, either 'file' or 'code': 'file' converts the model graph to a ProtoBuf file (.pb) and 'code' converts the model graph to C++ code.
* - model_data_format
- Model data format, either 'file' or 'code': 'file' converts the model weights to a data file (.data) and 'code' converts the model weights to C++ code.
* - model_name
- Model name, should be unique if there is more than one model.
**LIMIT: if model_graph_format is code, model_name will be used in C++ code, so model_name must comply with C++ naming rules.**
* - platform
- The source framework, tensorflow or caffe.
* - model_file_path
- The path of your model file, which can be a local path or a remote URL.
* - model_sha256_checksum
- The SHA256 checksum of the model file.
* - weight_file_path
- [optional] The path of Caffe model weights file.
* - weight_sha256_checksum
- [optional] The SHA256 checksum of Caffe model weights file.
* - subgraphs
- subgraphs key. **DO NOT EDIT**
* - input_tensors
- The input tensor name(s) (TensorFlow) or the top name(s) of the inputs' layer (Caffe).
If there is more than one tensor, use one line per tensor.
* - output_tensors
- The output tensor name(s) (TensorFlow) or the top name(s) of the outputs' layer (Caffe).
If there is more than one tensor, use one line per tensor.
* - input_shapes
- The shapes of the input tensors, in NHWC order.
* - output_shapes
- The shapes of the output tensors, in NHWC order.
* - input_ranges
- The numerical range of the input tensors' data, default [-1, 1]. It is only used for testing.
* - validation_inputs_data
- [optional] Specify Numpy validation inputs. When not provided, [-1, 1] random values will be used.
* - runtime
- The running device, one of [cpu, gpu, dsp, cpu_gpu]. cpu_gpu contains both CPU and GPU model definitions so you can run the model on both CPU and GPU.
* - data_type
- [optional] The data type used for specified runtime. [fp16_fp32, fp32_fp32] for GPU, default is fp16_fp32, [fp32] for CPU and [uint8] for DSP.
* - limit_opencl_kernel_time
- [optional] Whether to limit each OpenCL kernel to roughly 1 ms (by splitting kernels) to keep the UI responsive, default is 0.
* - obfuscate
- [optional] Whether to obfuscate the model operator names, defaults to 0.
* - winograd
- [optional] Which Winograd type to use, one of [0, 2, 4]: 0 disables Winograd; 2 and 4 enable it, and 4 may be faster than 2 but may use more memory.
.. note::
Some useful commands:
.. code:: bash
# Get device's soc info.
adb shell getprop | grep platform
# command for generating sha256_sum
sha256sum /path/to/your/file
Advanced usage
--------------
There are two common advanced use cases:
- converting model to C++ code.
- tuning GPU kernels for a specific SoC.
* **Convert model(s) to C++ code**
.. warning::
For this use case, only the static MACE library can be used.
* **1. Change the model deployment file(.yml)**
If you want to protect your model, you can convert it to C++ code. There are two options:
* Convert the model graph to code and the model weights to a file with the model configuration below.
.. code:: sh
model_graph_format: code
model_data_format: file
* Convert both the model graph and the model weights to code with the model configuration below.
.. code:: sh
model_graph_format: code
model_data_format: code
.. note::
Another model protection method is to enable ``obfuscate``, which obfuscates the names of the model's operators.
* **2. Convert model(s) to code**
.. code:: sh
python tools/converter.py convert --config=/path/to/model_deployment_file.yml
The command will generate **${library_name}.a** in the **builds/${library_name}/model** directory and
the header files in **builds/${library_name}/include**, like the following directory tree.
.. code::
# model_graph_format: code
# model_data_format: file
builds
├── include
│   └── mace
│   └── public
│   ├── mace_engine_factory.h
│   └── mobilenet_v1.h
└── model
   ├── mobilenet-v1.a
   └── mobilenet_v1.data
* **3. Deployment**
* Link `libmace.a` and `${library_name}.a` to your target.
* Refer to \ ``mace/examples/example.cc``\ for full usage. The following lists the key steps.
.. code:: cpp
// Include the headers
#include "mace/public/mace.h"
#include "mace/public/mace_runtime.h"
// If the model_graph_format is code
#include "mace/public/${model_name}.h"
#include "mace/public/mace_engine_factory.h"
// ... Same with the code in basic usage
// 4. Create MaceEngine instance
std::shared_ptr<mace::MaceEngine> engine;
MaceStatus create_engine_status;
// Create Engine from compiled code
create_engine_status =
CreateMaceEngineFromCode(model_name.c_str(),
nullptr,
input_names,
output_names,
device_type,
&engine);
if (create_engine_status != MaceStatus::MACE_SUCCESS) {
// Report error
}
// ... Same with the code in basic usage
* **Tuning for specific SoC's GPU**
If you want to use the GPU of a specific device, you can specify ``target_socs`` in your YAML file and
then tune the MACE library (OpenCL kernels) for it, which may bring a 1~10% performance improvement.
* **1. Change the model deployment file(.yml)**
Specify ``target_socs`` in your model deployment file(.yml):
.. code:: sh
target_socs: [sdm845]
.. note::
Get device's soc info: `adb shell getprop | grep platform`
* **2. Convert model(s)**
.. code:: sh
python tools/converter.py convert --config=/path/to/model_deployment_file.yml
* **3. Tuning**
``tools/converter.py`` will enable automatic tuning for GPU kernels. This usually takes some
time to finish depending on the complexity of your model.
.. note::
You should plug in device(s) with the specific SoC(s).
.. code:: sh
python tools/converter.py run --config=/path/to/model_deployment_file.yml --validate
The command will generate two files in `builds/${library_name}/opencl`, like the following directory tree.
.. code::
builds
└── mobilenet-v2
├── model
│   ├── mobilenet_v2.data
│   └── mobilenet_v2.pb
└── opencl
└── arm64-v8a
   ├── moblinet-v2_compiled_opencl_kernel.MiNote3.sdm660.bin
   └── moblinet-v2_tuned_opencl_parameter.MiNote3.sdm660.bin
* **mobilenet-v2-gpu_compiled_opencl_kernel.MI6.msm8998.bin** stands for the OpenCL binaries
used for your models, which can accelerate the initialization stage.
For details, please refer to the `OpenCL Specification <https://www.khronos.org/registry/OpenCL/sdk/1.0/docs/man/xhtml/clCreateProgramWithBinary.html>`__.
* **mobilenet-v2-tuned_opencl_parameter.MI6.msm8998.bin** stands for the tuned OpenCL parameters
for the SoC.
* **4. Deployment**
* Rename the files generated above to avoid name collisions and push them to **your own device's directory**.
* Use them as in the previous procedure; the key differences are listed below.
.. code:: cpp
// Include the headers
#include "mace/public/mace.h"
#include "mace/public/mace_runtime.h"
// 0. Set pre-compiled OpenCL binary program file paths and OpenCL parameters file path when available
if (device_type == DeviceType::GPU) {
mace::SetOpenCLBinaryPaths(path/to/opencl_binary_paths);
mace::SetOpenCLParameterPath(path/to/opencl_parameter_file);
}
// ... Same with the code in basic usage.
Useful Commands
---------------
* **run the model**
.. code:: sh
# Test model run time
python tools/converter.py run --config=/path/to/model_deployment_file.yml --round=100
# Validate the correctness by comparing the results against the
# original model and framework, measured with cosine distance for similarity.
python tools/converter.py run --config=/path/to/model_deployment_file.yml --validate
# Check the memory usage of the model (**keep only one model in the deployment file**)
python tools/converter.py run --config=/path/to/model_deployment_file.yml --round=10000 &
sleep 5
adb shell dumpsys meminfo | grep mace_run
kill %1
.. warning::
``run`` relies on the ``convert`` command; you should ``convert`` before ``run``.
* **benchmark and profile model**
.. code:: sh
# Benchmark model, get detailed statistics of each Op.
python tools/converter.py benchmark --config=/path/to/model_deployment_file.yml
.. warning::
``benchmark`` relies on the ``convert`` command; you should ``benchmark`` after ``convert``.
**Common arguments**
.. list-table::
:header-rows: 1
* - option
- type
- default
- commands
- explanation
* - --omp_num_threads
- int
- -1
- ``run``/``benchmark``
- number of threads
* - --cpu_affinity_policy
- int
- 1
- ``run``/``benchmark``
- 0:AFFINITY_NONE/1:AFFINITY_BIG_ONLY/2:AFFINITY_LITTLE_ONLY
* - --gpu_perf_hint
- int
- 3
- ``run``/``benchmark``
- 0:DEFAULT/1:LOW/2:NORMAL/3:HIGH
* - --gpu_priority_hint
- int
- 3
- ``run``/``benchmark``
- 0:DEFAULT/1:LOW/2:NORMAL/3:HIGH
Use ``-h`` to get detailed help.
.. code:: sh
python tools/converter.py -h
python tools/converter.py build -h
python tools/converter.py run -h
python tools/converter.py benchmark -h
Basic usage
============
Build and run an example model
-------------------------------
First, make sure the environment has been set up correctly (refer to :doc:`../installation/env_requirement`).
The following instructions show how to quickly build and run a model provided in
`MACE Model Zoo <https://github.com/XiaoMi/mace-models>`__.
Here we use the mobilenet-v2 model as an example.
**Commands**
1. Pull `MACE <https://github.com/XiaoMi/mace>`__ project.
.. code:: sh
git clone https://github.com/XiaoMi/mace.git
git fetch --all --tags --prune
# Checkout the latest tag (i.e. release version)
tag_name=`git describe --abbrev=0 --tags`
git checkout tags/${tag_name}
.. note::
It's highly recommended to use a release version instead of the master branch.
2. Pull `MACE Model Zoo <https://github.com/XiaoMi/mace-models>`__ project.
.. code:: sh
git clone https://github.com/XiaoMi/mace-models.git
3. Build a generic MACE library.
.. code:: sh
cd path/to/mace
# Build library
# output lib path: builds/lib
bash tools/build-standalone-lib.sh
4. Convert the pre-trained mobilenet-v2 model to MACE format model.
.. code:: sh
cd path/to/mace
# Convert model
python tools/converter.py convert --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml
5. Run the model.
.. note::
If you want to run on device/phone, please plug in at least one device/phone.
.. code:: sh
# Run example
python tools/converter.py run --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml --example
# Test model run time
python tools/converter.py run --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml --round=100
# Validate the correctness by comparing the results against the
# original model and framework, measured with cosine distance for similarity.
python tools/converter.py run --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml --validate
Build your own model
---------------------
This part will show you how to use your own pre-trained model in MACE.
======================
1. Prepare your model
======================
MACE now supports models from TensorFlow and Caffe (more frameworks will be supported).
- TensorFlow
Prepare your pre-trained TensorFlow model.pb file.
Use `Graph Transform Tool <https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/README.md>`__
to optimize your model for inference.
This tool will improve the efficiency of inference by making several optimizations like operator
folding, redundant node removal, etc. We strongly recommend MACE users use it before building.
Usage for CPU/GPU,
.. code:: bash
# CPU/GPU:
./transform_graph \
--in_graph=/path/to/your/tf_model.pb \
--out_graph=/path/to/your/output/tf_model_opt.pb \
--inputs='input node name' \
--outputs='output node name' \
--transforms='strip_unused_nodes(type=float, shape="1,64,64,3")
strip_unused_nodes(type=float, shape="1,64,64,3")
remove_nodes(op=Identity, op=CheckNumerics)
fold_constants(ignore_errors=true)
flatten_atrous_conv
fold_batch_norms
fold_old_batch_norms
strip_unused_nodes
sort_by_execution_order'
- Caffe
Caffe 1.0+ models are supported by the MACE converter tool.
If your model is from a lower Caffe version, you need to upgrade it using Caffe's built-in tools before converting.
.. code:: bash
# Upgrade prototxt
$CAFFE_ROOT/build/tools/upgrade_net_proto_text MODEL.prototxt MODEL.new.prototxt
# Upgrade caffemodel
$CAFFE_ROOT/build/tools/upgrade_net_proto_binary MODEL.caffemodel MODEL.new.caffemodel
===========================================
2. Create a deployment file for your model
===========================================
When converting a model or building a library, MACE needs to read a YAML file, which is called the model deployment file here.
A model deployment file contains all the information about your model(s) and the build options. There are several example
deployment files in the *MACE Model Zoo* project.
The following shows two basic deployment files, for TensorFlow and Caffe models.
Modify one of them and use it for your own case.
- TensorFlow
.. literalinclude:: models/demo_models_tf.yml
:language: yaml
- Caffe
.. literalinclude:: models/demo_models_caffe.yml
:language: yaml
More details about model deployment file are in :doc:`advanced_usage`.
======================
3. Convert your model
======================
When the deployment file is ready, you can use MACE converter tool to convert your model(s).
.. code:: bash
python tools/converter.py convert --config=/path/to/your/model_deployment_file.yml
This command will download or load your pre-trained model and convert it to a MACE model proto file and a weights data file.
The generated model files will be stored in the ``build/${library_name}/model`` folder.
.. warning::
Please set ``model_graph_format: file`` and ``model_data_format: file`` in your deployment file before converting.
The usage of ``model_graph_format: code`` will be demonstrated in :doc:`advanced_usage`.
=============================
4. Build MACE into a library
=============================
Use bazel to build MACE source code into a library.
.. code:: sh
cd path/to/mace
# Build library
# output lib path: builds/lib
bash tools/build-standalone-lib.sh
The above command will generate dynamic library ``builds/lib/${ABI}/libmace.so`` and static library ``builds/lib/${ABI}/libmace.a``.
.. warning::
Please verify that the ``target_abis`` param in the above command and in your deployment file are the same.
==================
5. Run your model
==================
With the converted model, the static or shared library and header files, you can use the following commands
to run and validate your model.
.. warning::
If you want to run on device/phone, please plug in at least one device/phone.
* **run**
run the model.
.. code:: sh
# Test model run time
python tools/converter.py run --config=/path/to/your/model_deployment_file.yml --round=100
# Validate the correctness by comparing the results against the
# original model and framework, measured with cosine distance for similarity.
python tools/converter.py run --config=/path/to/your/model_deployment_file.yml --validate
* **benchmark**
benchmark and profile the model.
.. code:: sh
# Benchmark model, get detailed statistics of each Op.
python tools/converter.py benchmark --config=/path/to/your/model_deployment_file.yml
=======================================
6. Deploy your model into applications
=======================================
In the converting and building steps, you've got the static/shared library, model files and
header files.
``${library_name}`` is the name you defined in the first line of your deployment YAML file.
- The generated ``static`` library files are organized as follows,
.. code::
builds
├── include
│   └── mace
│   └── public
│   ├── mace.h
│   └── mace_runtime.h
├── lib
│   ├── arm64-v8a
│   │   ├── libmace.a
│   │   └── libmace.so
│   ├── armeabi-v7a
│   │   ├── libhexagon_controller.so
│   │   ├── libmace.a
│   │   └── libmace.so
│   └── linux-x86-64
│   ├── libmace.a
│   └── libmace.so
└── mobilenet-v1
├── model
│   ├── mobilenet_v1.data
│   └── mobilenet_v1.pb
└── _tmp
└── arm64-v8a
└── mace_run_static
Please refer to \ ``mace/examples/example.cc``\ for full usage. The following lists the key steps.
.. code:: cpp
// Include the headers
#include "mace/public/mace.h"
#include "mace/public/mace_runtime.h"
// 0. Set pre-compiled OpenCL binary program file paths when available
if (device_type == DeviceType::GPU) {
mace::SetOpenCLBinaryPaths(opencl_binary_paths);
}
// 1. Set compiled OpenCL kernel cache, this is used to reduce the
// initialization time since compiling is slow. It's suggested
// to set this even when a pre-compiled OpenCL program file is provided,
// because an OpenCL version upgrade may also lead to kernel
// recompilation.
const std::string file_path = "path/to/opencl_cache_file";
std::shared_ptr<KVStorageFactory> storage_factory(
new FileStorageFactory(file_path));
ConfigKVStorageFactory(storage_factory);
// 2. Declare the device type (must be the same as ``runtime`` in the configuration file)
DeviceType device_type = DeviceType::GPU;
// 3. Define the input and output tensor names.
std::vector<std::string> input_names = {...};
std::vector<std::string> output_names = {...};
// 4. Create MaceEngine instance
std::shared_ptr<mace::MaceEngine> engine;
MaceStatus create_engine_status;
// Create Engine from model file
create_engine_status =
CreateMaceEngineFromProto(model_pb_data,
model_data_file.c_str(),
input_names,
output_names,
device_type,
&engine);
if (create_engine_status != MaceStatus::MACE_SUCCESS) {
// Report error
}
// 5. Create Input and Output tensor buffers
std::map<std::string, mace::MaceTensor> inputs;
std::map<std::string, mace::MaceTensor> outputs;
for (size_t i = 0; i < input_count; ++i) {
// Allocate input and output
int64_t input_size =
std::accumulate(input_shapes[i].begin(), input_shapes[i].end(), 1,
std::multiplies<int64_t>());
auto buffer_in = std::shared_ptr<float>(new float[input_size],
std::default_delete<float[]>());
// Load input here
// ...
inputs[input_names[i]] = mace::MaceTensor(input_shapes[i], buffer_in);
}
for (size_t i = 0; i < output_count; ++i) {
int64_t output_size =
std::accumulate(output_shapes[i].begin(), output_shapes[i].end(), 1,
std::multiplies<int64_t>());
auto buffer_out = std::shared_ptr<float>(new float[output_size],
std::default_delete<float[]>());
outputs[output_names[i]] = mace::MaceTensor(output_shapes[i], buffer_out);
}
// 6. Run the model
MaceStatus status = engine->Run(inputs, &outputs);
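Once ``Run`` succeeds, the output values can be read back through the buffers registered in ``outputs``. A brief sketch, assuming ``MaceTensor`` exposes its underlying float buffer via ``data()`` as used in the allocation code above:
.. code:: cpp
// 7. Read the results (illustrative sketch; reuses the names defined above)
if (status == MaceStatus::MACE_SUCCESS) {
  for (size_t i = 0; i < output_count; ++i) {
    const float *output_data = outputs[output_names[i]].data().get();
    // Post-process output_data here, e.g. pick the arg-max class for a
    // classification model.
    (void)output_data;
  }
}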
More details are in :doc:`advanced_usage`.
# The name of library
library_name: mobile_squeeze
# host, armeabi-v7a or arm64-v8a
target_abis: [arm64-v8a]
# The build mode for model(s).
# 'code' for transferring model(s) into cpp code, 'file' for keeping model(s) in protobuf file(s) (.pb).
model_graph_format: code
# 'code' for transferring model data(s) into cpp code, 'file' for keeping model data(s) in file(s) (.data).
model_data_format: code
# One yaml config file can contain multiple models' deployment info.
models:
mobilenet_v1:
platform: tensorflow
model_file_path: https://cnbj1.fds.api.xiaomi.com/mace/miai-models/mobilenet-v1/mobilenet-v1-1.0.pb
model_sha256_checksum: 71b10f540ece33c49a7b51f5d4095fc9bd78ce46ebf0300487b2ee23d71294e6
subgraphs:
- input_tensors:
- input
input_shapes:
- 1,224,224,3
output_tensors:
- MobilenetV1/Predictions/Reshape_1
output_shapes:
- 1,1001
validation_inputs_data:
- https://cnbj1.fds.api.xiaomi.com/mace/inputs/dog.npy
runtime: cpu+gpu
limit_opencl_kernel_time: 0
obfuscate: 0
winograd: 0
squeezenet_v11:
platform: caffe
model_file_path: http://cnbj1-inner-fds.api.xiaomi.net/mace/mace-models/squeezenet/SqueezeNet_v1.1/model.prototxt
weight_file_path: http://cnbj1-inner-fds.api.xiaomi.net/mace/mace-models/squeezenet/SqueezeNet_v1.1/weight.caffemodel
model_sha256_checksum: 625c952063da1569e22d2f499dc454952244d42cd8feca61f05502566e70ae1c
weight_sha256_checksum: 72b912ace512e8621f8ff168a7d72af55910d3c7c9445af8dfbff4c2ee960142
subgraphs:
- input_tensors:
- data
input_shapes:
- 1,227,227,3
output_tensors:
- prob
output_shapes:
- 1,1,1,1000
runtime: cpu+gpu
limit_opencl_kernel_time: 0
obfuscate: 0
winograd: 0
# The name of library
library_name: squeezenet-v10
target_abis: [arm64-v8a]
model_graph_format: file
model_data_format: file
models:
squeezenet-v10: # model tag, which will be used in model loading and must be specific.
platform: caffe
# support local path, http:// and https://
model_file_path: https://cnbj1.fds.api.xiaomi.com/mace/miai-models/squeezenet/squeezenet-v1.0.prototxt
weight_file_path: https://cnbj1.fds.api.xiaomi.com/mace/miai-models/squeezenet/squeezenet-v1.0.caffemodel
# sha256_checksum of your model's graph and data files.
# get the sha256_checksum: sha256sum path/to/your/file
model_sha256_checksum: db680cf18bb0387ded9c8e9401b1bbcf5dc09bf704ef1e3d3dbd1937e772cae0
weight_sha256_checksum: 9ff8035aada1f9ffa880b35252680d971434b141ec9fbacbe88309f0f9a675ce
# define your model's interface
# if there are multiple inputs or outputs, write them like below:
# subgraphs:
# - input_tensors:
# - input0
# - input1
# input_shapes:
# - 1,224,224,3
# - 1,224,224,3
# output_tensors:
# - output0
# - output1
# output_shapes:
# - 1,1001
# - 1,1001
subgraphs:
- input_tensors:
- data
input_shapes:
- 1,227,227,3
output_tensors:
- prob
output_shapes:
- 1,1,1,1000
runtime: cpu+gpu
winograd: 0
# The name of library
library_name: mobilenet
target_abis: [arm64-v8a]
model_graph_format: file
model_data_format: file
models:
mobilenet_v1: # model tag, which will be used in model loading and must be specific.
platform: tensorflow
# path to your tensorflow model's pb file. Support local path, http:// and https://
model_file_path: https://cnbj1.fds.api.xiaomi.com/mace/miai-models/mobilenet-v1/mobilenet-v1-1.0.pb
# sha256_checksum of your model's pb file.
# use this command to get the sha256_checksum: sha256sum path/to/your/pb/file
model_sha256_checksum: 71b10f540ece33c49a7b51f5d4095fc9bd78ce46ebf0300487b2ee23d71294e6
# define your model's interface
# if there are multiple inputs or outputs, write them like below:
# subgraphs:
# - input_tensors:
# - input0
# - input1
# input_shapes:
# - 1,224,224,3
# - 1,224,224,3
# output_tensors:
# - output0
# - output1
# output_shapes:
# - 1,1001
# - 1,1001
subgraphs:
- input_tensors:
- input
input_shapes:
- 1,224,224,3
output_tensors:
- MobilenetV1/Predictions/Reshape_1
output_shapes:
- 1,1001
# cpu, gpu or cpu+gpu
runtime: cpu+gpu
winograd: 0
\ No newline at end of file
......@@ -3,7 +3,6 @@ Operator lists
.. Please keep in chronological order when editing
.. csv-table::
:widths: auto
:header: "Operator","Supported","Remark"
"AVERAGE_POOL_2D","Y",""
......@@ -27,17 +26,17 @@ Operator lists
"LOCAL_RESPONSE_NORMALIZATION","Y",""
"LOGISTIC","Y",""
"LSTM","",""
"MATMUL","Y",""
"MATMUL","Y","Only CPU is supported"
"MAX_POOL_2D","Y",""
"PAD","Y",""
"PSROI_ALIGN","Y",""
"PRELU","Y","Only caffe model is supported"
"REDUCE_MEAN","Y","Only tensorflow model is supported"
"REDUCE_MEAN","Y","Only tensorflow model is supported. For GPU only H + W axis reduce is supported"
"RELU","Y",""
"RELU1","Y",""
"RELU6","Y",""
"RELUX","Y",""
"RESHAPE","Y","Limited support: only internal use of reshape in composed operations is supported"
"RESHAPE","Y","Limited support: GPU is full supported, for CPU only supports softmax-like usage"
"RESIZE_BILINEAR","Y",""
"RNN","",""
"RPN_PROPOSAL_LAYER","Y",""
......
......@@ -3,10 +3,13 @@
set -e -u -o pipefail
pushd ../../../
python tools/converter.py build --config=docs/getting_started/models/demo_app_models.yaml
cp -r builds/mobilenet/include mace/examples/android/macelibrary/src/main/cpp/
cp -r builds/mobilenet/lib mace/examples/android/macelibrary/src/main/cpp/
python tools/converter.py convert --config=mace/examples/android/mobilenet.yml
cp -rf builds/mobilenet/include mace/examples/android/macelibrary/src/main/cpp/
cp -rf builds/mobilenet/model mace/examples/android/macelibrary/src/main/cpp/
bash tools/build-standalone-lib.sh
cp -rf builds/lib mace/examples/android/macelibrary/src/main/cpp/
popd
......
......@@ -14,18 +14,12 @@ cmake_minimum_required(VERSION 3.4.1)
include_directories(${CMAKE_SOURCE_DIR}/)
include_directories(${CMAKE_SOURCE_DIR}/src/main/cpp/include)
file(GLOB static_file ${CMAKE_SOURCE_DIR}/src/main/cpp/lib/arm64-v8a/*.a)
MESSAGE(STATUS "FILE URL = ${CMAKE_SOURCE_DIR}")
MESSAGE(STATUS "FILE URL = ${static_file}")
foreach(fileStr ${static_file})
set(tmpstr ${fileStr})
MESSAGE(STATUS "FILE URL = ${tmpstr}")
endforeach()
add_library (mace_mobile_lib STATIC IMPORTED)
set_target_properties(mace_mobile_lib PROPERTIES IMPORTED_LOCATION ${tmpstr})
set(mace_file ${CMAKE_SOURCE_DIR}/src/main/cpp/lib/arm64-v8a/libmace.a)
set(mobilenet_file ${CMAKE_SOURCE_DIR}/src/main/cpp/model/mobilenet.a)
add_library (mace_lib STATIC IMPORTED)
set_target_properties(mace_lib PROPERTIES IMPORTED_LOCATION ${mace_file})
add_library (mobilenet_lib STATIC IMPORTED)
set_target_properties(mobilenet_lib PROPERTIES IMPORTED_LOCATION ${mobilenet_file})
add_library( # Sets the name of the library.
mace_mobile_jni
......@@ -55,7 +49,8 @@ find_library( # Sets the name of the path variable.
target_link_libraries( # Specifies the target library.
mace_mobile_jni
mace_mobile_lib
mace_lib
mobilenet_lib
# Links the target library to the log library
# included in the NDK.
${log-lib} )
\ No newline at end of file
# The name of the library
library_name: mobilenet
target_abis: [arm64-v8a]
embed_model_data: 1
# The build mode for model(s).
# 'code' stand for transfer model(s) into cpp code, 'proto' for model(s) in protobuf file(s).
build_type: code
linkshared: 0
# One yaml config file can contain configs for multiple models.
model_graph_format: code
model_data_format: code
models:
mobilenet_v1: # model tag, which will be used in model loading and must be specific.
mobilenet_v1:
platform: tensorflow
# supports a local path as well as http:// and https:// URLs
model_file_path: https://cnbj1.fds.api.xiaomi.com/mace/miai-models/mobilenet-v1/mobilenet-v1-1.0.pb
model_sha256_checksum: 71b10f540ece33c49a7b51f5d4095fc9bd78ce46ebf0300487b2ee23d71294e6
subgraphs:
- input_tensors: input
input_shapes: 1,224,224,3
output_tensors: MobilenetV1/Predictions/Reshape_1
output_shapes: 1,1001
- input_tensors:
- input
input_shapes:
- 1,224,224,3
output_tensors:
- MobilenetV1/Predictions/Reshape_1
output_shapes:
- 1,1001
runtime: cpu+gpu
limit_opencl_kernel_time: 0
nnlib_graph_mode: 0
......@@ -28,10 +26,14 @@ models:
model_file_path: https://cnbj1.fds.api.xiaomi.com/mace/miai-models/mobilenet-v2/mobilenet-v2-1.0.pb
model_sha256_checksum: 369f9a5f38f3c15b4311c1c84c032ce868da9f371b5f78c13d3ea3c537389bb4
subgraphs:
- input_tensors: input
input_shapes: 1,224,224,3
output_tensors: MobilenetV2/Predictions/Reshape_1
output_shapes: 1,1001
- input_tensors:
- input
input_shapes:
- 1,224,224,3
output_tensors:
- MobilenetV2/Predictions/Reshape_1
output_shapes:
- 1,1001
runtime: cpu+gpu
limit_opencl_kernel_time: 0
nnlib_graph_mode: 0
......
......@@ -59,6 +59,10 @@ MaceStatus CreateMaceEngineFromCode(
return MaceStatus::MACE_INVALID_ARGS;
}
std::shared_ptr<NetDef> net_def;
{% if embed_model_data %}
(void)model_data_file;
const unsigned char * model_data;
{% endif %}
MaceStatus status = MaceStatus::MACE_SUCCESS;
switch (model_name_map[model_name]) {
{% for i in range(model_tags |length) %}
......@@ -66,12 +70,12 @@ MaceStatus CreateMaceEngineFromCode(
net_def = mace::{{model_tags[i]}}::CreateNet();
engine->reset(new mace::MaceEngine(device_type));
{% if embed_model_data %}
(void)model_data_file;
const unsigned char * model_data =
mace::{{model_tags[i]}}::LoadModelData();
status = (*engine)->Init(net_def.get(), input_nodes, output_nodes, model_data);
model_data = mace::{{model_tags[i]}}::LoadModelData();
status = (*engine)->Init(net_def.get(), input_nodes, output_nodes,
model_data);
{% else %}
status = (*engine)->Init(net_def.get(), input_nodes, output_nodes, model_data_file);
status = (*engine)->Init(net_def.get(), input_nodes, output_nodes,
model_data_file);
{% endif %}
break;
{% endfor %}
......
......@@ -8,16 +8,6 @@ INCLUDE_DIR=builds/include/mace/public
mkdir -p $LIB_DIR
mkdir -p $INCLUDE_DIR
# generate version code
rm -rf mace/codegen/version
mkdir -p mace/codegen/version
bash mace/tools/git/gen_version_source.sh mace/codegen/version/version.cc
# generate tuning code
rm -rf mace/codegen/tuning
mkdir -p mace/codegen/tuning
python mace/python/tools/binary_codegen.py --output_path=mace/codegen/tuning/tuning_params.cc
# copy include headers
cp mace/public/*.h $INCLUDE_DIR/
......@@ -57,7 +47,7 @@ bazel build --config android --config optimization mace:libmace_static --define
cp bazel-genfiles/mace/libmace.a $LIB_DIR/arm64-v8a/
echo "build static lib for linux-x86-64"
bazel build mace:libmace --config optimization --define openmp=true
bazel build mace:libmace_static --config optimization --define openmp=true
cp bazel-genfiles/mace/libmace.a $LIB_DIR/linux-x86-64/
echo "LIB PATH: $LIB_DIR"
......
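A rough usage sketch for the standalone-library script above, assuming the builds/lib and builds/include output directories that the script itself sets up:

```sh
# Build the static MACE libraries for the supported targets, then check
# where the script leaves the headers and the per-target libmace.a files.
bash tools/build-standalone-lib.sh
ls builds/include/mace/public    # public headers
ls builds/lib/arm64-v8a          # Android arm64-v8a static library
ls builds/lib/linux-x86-64       # linux-x86-64 static library
```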
......@@ -543,23 +543,24 @@ def clear_build_dirs(library_name):
def check_model_converted(library_name, model_name,
model_graph_format, model_data_format):
model_graph_format, model_data_format,
abi):
model_output_dir = \
'%s/%s/%s' % (BUILD_OUTPUT_DIR, library_name, MODEL_OUTPUT_DIR_NAME)
if model_graph_format == ModelFormat.file:
mace_check(os.path.exists("%s/%s.pb" % (model_output_dir, model_name)),
ModuleName.RUN,
"You shuold convert model first.")
"You should convert model first.")
else:
mace_check(os.path.exists("%s/%s.a" %
(model_output_dir, library_name)),
model_lib_path = get_model_lib_output_path(library_name, abi)
mace_check(os.path.exists(model_lib_path),
ModuleName.RUN,
"You shuold convert model first.")
"You should convert model first.")
if model_data_format == ModelFormat.file:
mace_check(os.path.exists("%s/%s.data" %
(model_output_dir, model_name)),
ModuleName.RUN,
"You shuold convert model first.")
"You should convert model first.")
################################
......@@ -716,10 +717,10 @@ def convert_model(configs):
StringFormatter.block("Model %s converted" % model_name))
def get_model_lib_output_path(library_name):
library_out_dir = os.path.join(BUILD_OUTPUT_DIR, library_name,
MODEL_OUTPUT_DIR_NAME)
lib_output_path = "%s/%s.a" % (library_out_dir, library_name)
def get_model_lib_output_path(library_name, abi):
lib_output_path = os.path.join(BUILD_OUTPUT_DIR, library_name,
MODEL_OUTPUT_DIR_NAME, abi,
"%s.a" % library_name)
return lib_output_path
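With the abi parameter added above, the generated model archive now lands in a per-ABI directory. A sketch of the resulting layout, assuming BUILD_OUTPUT_DIR is builds and MODEL_OUTPUT_DIR_NAME is model, as used by the example build script earlier in this change:

```sh
# For library_name=mobilenet and target_abi=arm64-v8a, the model library
# produced by build_model_lib is now expected at:
ls builds/mobilenet/model/arm64-v8a/mobilenet.a
```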
......@@ -728,13 +729,13 @@ def build_model_lib(configs, address_sanitizer):
# create model library dir
library_name = configs[YAMLKeyword.library_name]
model_lib_output_path = get_model_lib_output_path(library_name)
library_out_dir = os.path.dirname(model_lib_output_path)
if not os.path.exists(library_out_dir):
os.makedirs(library_out_dir)
for target_abi in configs[YAMLKeyword.target_abis]:
hexagon_mode = get_hexagon_mode(configs)
model_lib_output_path = get_model_lib_output_path(library_name,
target_abi)
library_out_dir = os.path.dirname(model_lib_output_path)
if not os.path.exists(library_out_dir):
os.makedirs(library_out_dir)
sh_commands.bazel_build(
MODEL_LIB_TARGET,
......@@ -841,7 +842,7 @@ def build_mace_run(configs, target_abi, enable_openmp, address_sanitizer,
if configs[YAMLKeyword.model_graph_format] == ModelFormat.code:
mace_check(os.path.exists(ENGINE_CODEGEN_DIR),
ModuleName.RUN,
"You shuold convert model first.")
"You should convert model first.")
build_arg = "--per_file_copt=mace/tools/validation/mace_run.cc@-DMODEL_GRAPH_FORMAT_CODE" # noqa
sh_commands.bazel_build(
......@@ -887,8 +888,9 @@ def build_example(configs, target_abi, enable_openmp, mace_lib_type):
if configs[YAMLKeyword.model_graph_format] == ModelFormat.code:
mace_check(os.path.exists(ENGINE_CODEGEN_DIR),
ModuleName.RUN,
"You shuold convert model first.")
model_lib_path = get_model_lib_output_path(library_name)
"You should convert model first.")
model_lib_path = get_model_lib_output_path(library_name,
target_abi)
sh.cp("-f", model_lib_path, LIB_CODEGEN_DIR)
build_arg = "--per_file_copt=mace/examples/cli/example.cc@-DMODEL_GRAPH_FORMAT_CODE" # noqa
......@@ -912,12 +914,6 @@ def tuning(library_name, model_name, model_config,
mace_lib_type):
print('* Tuning, it may take some time...')
# clear opencl output dir
opencl_output_dir = os.path.join(
BUILD_OUTPUT_DIR, library_name, OUTPUT_OPENCL_BINARY_DIR_NAME)
if os.path.exists(opencl_output_dir):
sh.rm('-rf', opencl_output_dir)
build_tmp_binary_dir = get_build_binary_dir(library_name, target_abi)
mace_run_name = MACE_RUN_STATIC_NAME
link_dynamic = False
......@@ -994,16 +990,7 @@ def run_specific_target(flags, configs, target_abi,
mace_lib_type = flags.mace_lib_type
embed_model_data = \
configs[YAMLKeyword.model_data_format] == ModelFormat.code
opencl_output_bin_path = ""
opencl_parameter_path = ""
build_tmp_binary_dir = get_build_binary_dir(library_name, target_abi)
if configs[YAMLKeyword.target_socs] and target_abi != ABIType.host:
opencl_output_bin_path = get_opencl_binary_output_path(
library_name, target_abi, target_soc, serial_num
)
opencl_parameter_path = get_opencl_parameter_output_path(
library_name, target_abi, target_soc, serial_num
)
# get target name for run
if flags.example:
......@@ -1023,7 +1010,8 @@ def run_specific_target(flags, configs, target_abi,
for model_name in configs[YAMLKeyword.models]:
check_model_converted(library_name, model_name,
configs[YAMLKeyword.model_graph_format],
configs[YAMLKeyword.model_data_format])
configs[YAMLKeyword.model_data_format],
target_abi)
if target_abi == ABIType.host:
device_name = ABIType.host
else:
......@@ -1049,10 +1037,14 @@ def run_specific_target(flags, configs, target_abi,
get_build_model_dirs(library_name, model_name, target_abi,
target_soc, serial_num,
model_config[YAMLKeyword.model_file_path])
# clear temp model output dir
if os.path.exists(model_output_dir):
sh.rm("-rf", model_output_dir)
os.makedirs(model_output_dir)
is_tuned = False
model_opencl_output_bin_path = ""
model_opencl_parameter_path = ""
# tuning for specified soc
if not flags.address_sanitizer \
and not flags.example \
......@@ -1067,6 +1059,23 @@ def run_specific_target(flags, configs, target_abi,
target_abi, target_soc, serial_num,
mace_lib_type)
model_output_dirs.append(model_output_dir)
model_opencl_output_bin_path =\
"%s/%s/%s" % (model_output_dir,
BUILD_TMP_OPENCL_BIN_DIR,
CL_COMPILED_BINARY_FILE_NAME)
model_opencl_parameter_path = \
"%s/%s/%s" % (model_output_dir,
BUILD_TMP_OPENCL_BIN_DIR,
CL_TUNED_PARAMETER_FILE_NAME)
sh_commands.clear_phone_data_dir(serial_num, PHONE_DATA_DIR)
is_tuned = True
elif target_abi != ABIType.host and target_soc:
model_opencl_output_bin_path = get_opencl_binary_output_path(
library_name, target_abi, target_soc, serial_num
)
model_opencl_parameter_path = get_opencl_parameter_output_path(
library_name, target_abi, target_soc, serial_num
)
# generate input data
sh_commands.gen_random_input(
......@@ -1114,8 +1123,8 @@ def run_specific_target(flags, configs, target_abi,
gpu_priority_hint=flags.gpu_priority_hint,
runtime_failure_ratio=flags.runtime_failure_ratio,
address_sanitizer=flags.address_sanitizer,
opencl_binary_file=opencl_output_bin_path,
opencl_parameter_file=opencl_parameter_path,
opencl_binary_file=model_opencl_output_bin_path,
opencl_parameter_file=model_opencl_parameter_path,
libmace_dynamic_library_path=LIBMACE_DYNAMIC_PATH,
link_dynamic=link_dynamic,
)
......@@ -1142,11 +1151,7 @@ def run_specific_target(flags, configs, target_abi,
phone_data_dir=PHONE_DATA_DIR,
caffe_env=flags.caffe_env)
if flags.report and flags.round > 0:
opencl_parameter_bin_path = get_opencl_parameter_output_path(
library_name, target_abi, target_soc, serial_num
)
tuned = device_type == DeviceType.GPU\
and os.path.exists(opencl_parameter_bin_path)
tuned = is_tuned and device_type == DeviceType.GPU
report_run_statistics(
run_output, target_abi, serial_num,
model_name, device_type, flags.report_dir,
......@@ -1159,6 +1164,12 @@ def run_specific_target(flags, configs, target_abi,
opencl_parameter_bin_path = get_opencl_parameter_output_path(
library_name, target_abi, target_soc, serial_num
)
# clear opencl output dir
if os.path.exists(opencl_output_bin_path):
sh.rm('-rf', opencl_output_bin_path)
if os.path.exists(opencl_parameter_bin_path):
sh.rm('-rf', opencl_parameter_bin_path)
# merge all models' OpenCL binaries together
sh_commands.merge_opencl_binaries(
model_output_dirs, CL_COMPILED_BINARY_FILE_NAME,
......@@ -1228,7 +1239,7 @@ def build_benchmark_model(configs, target_abi, enable_openmp, mace_lib_type):
if configs[YAMLKeyword.model_graph_format] == ModelFormat.code:
mace_check(os.path.exists(ENGINE_CODEGEN_DIR),
ModuleName.BENCHMARK,
"You shuold convert model first.")
"You should convert model first.")
build_arg = "--per_file_copt=mace/benchmark/benchmark_model.cc@-DMODEL_GRAPH_FORMAT_CODE" # noqa
sh_commands.bazel_build(benchmark_target,
......@@ -1271,7 +1282,8 @@ def bm_specific_target(flags, configs, target_abi, target_soc, serial_num):
for model_name in configs[YAMLKeyword.models]:
check_model_converted(library_name, model_name,
configs[YAMLKeyword.model_graph_format],
configs[YAMLKeyword.model_data_format])
configs[YAMLKeyword.model_data_format],
target_abi)
if target_abi == ABIType.host:
device_name = ABIType.host
else:
......
......@@ -780,7 +780,7 @@ def tuning_run(abi,
print("Running finished!\n")
return stdout
return stdout
def validate_model(abi,
......
......@@ -191,7 +191,7 @@ def parse_args():
"""Parses command line arguments."""
parser = argparse.ArgumentParser()
parser.add_argument(
"--platform", type=str, default="", help="Tensorflow or Caffe.")
"--platform", type=str, default="", help="TensorFlow or Caffe.")
parser.add_argument(
"--model_file",
type=str,
......