Commit f6e62ec8 authored by Liangliang He

Merge branch 'refactor_docs' into 'master'

Refactor docs

See merge request !644
......@@ -96,6 +96,10 @@ Add test and benchmark
It's strongly recommended to add unit tests and micro benchmarks for your
new Op. If you wish to contribute back, it's required.
Add Op in model converter
-------------------------
You need to add this new Op in the model converter.
Document the new Op
---------------------
Finally, add an entry in operator table in the document.
How to run tests
=================
To run tests, you need to first cross compile the code, push the binary
into the device and then execute the binary. To automate this process,
MACE provides `tools/bazel_adb_run.py` tool.
You need to make sure your device has been connected to your development PC before running tests.
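For example, you can verify the connection by listing the attached devices with adb:
```sh
adb devices
```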
Run unit tests
---------------
MACE uses [gtest](https://github.com/google/googletest) for unit tests.
* Run all unit tests defined in a Bazel target, for example, run `ops_test`:
```sh
python tools/bazel_adb_run.py --target="//mace/ops:ops_test" \
--run_target=True
```
* Run unit tests with [gtest](https://github.com/google/googletest) filter,
for example, run `Conv2dOpTest` unit tests:
```sh
python tools/bazel_adb_run.py --target="//mace/ops:ops_test" \
--run_target=True \
--args="--gtest_filter=Conv2dOpTest*"
```
Run micro benchmarks
--------------------
MACE provides a micro benchmark framework for performance tuning.
* Run all micro benchmarks defined in a Bazel target, for example, run all
`ops_benchmark` micro benchmarks:
```sh
python tools/bazel_adb_run.py --target="//mace/ops:ops_benchmark" \
--run_target=True
```
* Run micro benchmarks with regex filter, for example, run all `CONV_2D` GPU
micro benchmarks:
```sh
python tools/bazel_adb_run.py --target="//mace/ops:ops_benchmark" \
--run_target=True \
--args="--filter=MACE_BM_CONV_2D_.*_GPU"
```
Memory layout
==============
CPU runtime memory layout
--------------------------
The CPU tensor buffer is organized in the following order:
.. list-table::
:widths: auto
:header-rows: 1
:align: left
* - Tensor type
- Buffer
......@@ -22,7 +20,7 @@ The CPU tensor buffer is organized in the following order:
- W
GPU runtime memory layout
--------------------------
The GPU runtime implementation is based on OpenCL, which uses 2D images with the CL_RGBA
channel order as tensor storage. This requires OpenCL 1.2 or above.
......@@ -34,14 +32,12 @@ The following tables describe the mapping from different type of tensors to
2D RGBA Image.
Input/Output Tensor
~~~~~~~~~~~~~~~~~~~~
The Input/Output Tensor is stored in NHWC format:
.. list-table::
:widths: auto
:header-rows: 1
:align: left
* - Tensor type
- Buffer
......@@ -64,9 +60,7 @@ Each Pixel of **Image** contains 4 elements. The below table list the
coordinate relationship between **Image** and **Buffer**.
.. list-table::
:widths: auto
:header-rows: 1
:align: left
* - Tensor type
- Pixel coordinate relationship
......@@ -82,12 +76,10 @@ coordination relation between **Image** and **Buffer**.
- k=[0, 4)
Filter Tensor
~~~~~~~~~~~~~~
.. list-table::
:widths: auto
:header-rows: 1
:align: left
* - Tensor
- Buffer
......@@ -106,9 +98,7 @@ Each Pixel of **Image** contains 4 elements. The below table list the
coordinate relationship between **Image** and **Buffer**.
.. list-table::
:widths: auto
:header-rows: 1
:align: left
* - Tensor type
- Pixel coordinate relationship
......@@ -121,12 +111,10 @@ coordination relation between **Image** and **Buffer**.
- only support multiplier == 1, k=[0, 4)
1-D Argument Tensor
~~~~~~~~~~~~~~~~~~~~
.. list-table::
:widths: auto
:header-rows: 1
:align: left
* - Tensor type
- Buffer
......@@ -141,9 +129,7 @@ Each Pixel of **Image** contains 4 elements. The below table list the
coordinate relationship between **Image** and **Buffer**.
.. list-table::
:widths: auto
:header-rows: 1
:align: left
* - Tensor type
- Pixel coordinate relationship
......
Create a model deployment file
==============================
The first step to deploy your models is to create a YAML model deployment
file.
One deployment file describes a case of model deployment.
Each file will generate one static library (if more than one ABI is specified,
one static library will be generated for each). The deployment file can contain
one or more models; for example, a smart camera application may contain face
recognition, object recognition, and voice recognition models, which can all be
defined in one deployment file.
Example
----------
Here is an example deployment file used by an Android demo application.
TODO: change this example file to the demo deployment file
(reuse the same file) and rename to a reasonable name.
.. literalinclude:: models/demo_app_models.yaml
:language: yaml
Configurations
--------------------
.. list-table::
:widths: auto
:header-rows: 1
:align: left
* - library_name
- library name.
* - target_abis
- The target ABI to build, can be one or more of 'host', 'armeabi-v7a' or 'arm64-v8a'.
* - target_socs
- [optional] Build for the specified SoCs if you only want to use the model on those SoCs.
* - embed_model_data
- Whether to embed the model weights as code, defaults to 0.
* - build_type
- Model build type, can be one of ['proto', 'code']. 'proto' converts the model to a ProtoBuf file and 'code' converts the model to C++ code.
* - linkshared
- [optional] Use dynamic linking for the libmace library when set to 1, or static linking when set to 0, defaults to 0.
* - model_name
- Model name, should be unique if there are multiple models.
**LIMIT: if build_type is code, model_name will be used in C++ code, so model_name must be a valid C++ identifier.**
* - platform
- The source framework, one of [tensorflow, caffe].
* - model_file_path
- The path of the model file, can be local or remote.
* - model_sha256_checksum
- The SHA256 checksum of the model file.
* - weight_file_path
- [optional] The path of the model weights file, used by Caffe model.
* - weight_sha256_checksum
- [optional] The SHA256 checksum of the weight file, used by Caffe model.
* - subgraphs
- subgraphs key. **DO NOT EDIT**
* - input_tensors
- The input tensor names (TensorFlow) or the top names of the inputs' layers (Caffe). One or more strings.
* - output_tensors
- The output tensor names (TensorFlow) or the top names of the outputs' layers (Caffe). One or more strings.
* - input_shapes
- The shapes of the input tensors, in NHWC order.
* - output_shapes
- The shapes of the output tensors, in NHWC order.
* - input_ranges
- The numerical range of the input tensors, default [-1, 1]. It is only used for testing.
* - validation_inputs_data
- [optional] Specify Numpy validation inputs. When not provided, [-1, 1] random values will be used.
* - runtime
- The running device, one of [cpu, gpu, dsp, cpu_gpu]. cpu_gpu contains CPU and GPU model definition so you can run the model on both CPU and GPU.
* - data_type
- [optional] The data type used for specified runtime. [fp16_fp32, fp32_fp32] for GPU, default is fp16_fp32. [fp32] for CPU. [uint8] for DSP.
* - limit_opencl_kernel_time
- [optional] Whether to split the OpenCL kernel so that each piece runs within 1 ms, to keep the UI responsive; defaults to 0.
* - nnlib_graph_mode
- [optional] Control the DSP precision and performance; the default 0 usually works for most cases.
* - obfuscate
- [optional] Whether to obfuscate the model operator name, default to 0.
* - winograd
- [optional] Whether to enable Winograd convolution, **which will increase memory consumption**.
How to build
============
Supported Platforms
-------------------
.. list-table::
:widths: auto
:header-rows: 1
:align: left
* - Platform
- Explanation
* - TensorFlow
- >= 1.6.0.
* - Caffe
- >= 1.0.
Environment Requirement
-------------------------
MACE requires the following dependencies:
.. list-table::
:widths: auto
:header-rows: 1
:align: left
* - software
- version
- install command
* - bazel
- >= 0.13.0
- `bazel installation guide <https://docs.bazel.build/versions/master/install.html>`__
* - android-ndk
- r15c/r16b
- `NDK installation guide <https://developer.android.com/ndk/guides/setup#install>`__ or refer to the Dockerfile
* - adb
- >= 1.0.32
- apt-get install android-tools-adb
* - tensorflow
- >= 1.6.0
- pip install -I tensorflow==1.6.0 (if you use tensorflow model)
* - numpy
- >= 1.14.0
- pip install -I numpy==1.14.0
* - scipy
- >= 1.0.0
- pip install -I scipy==1.0.0
* - jinja2
- >= 2.10
- pip install -I jinja2==2.10
* - PyYaml
- >= 3.12.0
- pip install -I pyyaml==3.12
* - sh
- >= 1.12.14
- pip install -I sh==1.12.14
* - filelock
- >= 3.0.0
- pip install -I filelock==3.0.0
* - docker (for caffe)
- >= 17.09.0-ce
- `docker installation guide <https://docs.docker.com/install/linux/docker-ce/ubuntu/#set-up-the-repository>`__
.. note::
Set ``export ANDROID_NDK_HOME=/path/to/ndk`` to specify ANDROID_NDK_HOME.
MACE provides a Dockerfile with these dependencies installed;
you can build the image from it,
.. code:: sh
docker build -t registry.cn-hangzhou.aliyuncs.com/xiaomimace/mace-dev-lite ./docker/mace-dev-lite
or pull the pre-built image from Docker Hub,
.. code:: sh
docker pull registry.cn-hangzhou.aliyuncs.com/xiaomimace/mace-dev-lite
and then run the container with the following command.
.. code:: sh
# Create container
# Set 'host' network to use ADB
docker run -it --privileged -v /dev/bus/usb:/dev/bus/usb --net=host \
-v /local/path:/container/path \
registry.cn-hangzhou.aliyuncs.com/xiaomimace/mace-dev-lite \
/bin/bash
Usage
--------
=======================================
1. Pull MACE source code
=======================================
.. code:: sh
git clone https://github.com/XiaoMi/mace.git
git fetch --all --tags --prune
# Checkout the latest tag (i.e. release version)
tag_name=`git describe --abbrev=0 --tags`
git checkout tags/${tag_name}
.. note::
It's highly recommended to use a release version instead of the master branch.
============================
2. Model Preprocessing
============================
- TensorFlow
TensorFlow provides
`Graph Transform Tool <https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/README.md>`__
to improve inference efficiency by applying various optimizations such as operator
folding and redundant node removal. It's strongly recommended to apply these
optimizations before the graph conversion step.
The following commands show the suggested graph transformations and
optimizations for different runtimes,
.. code:: sh
# CPU/GPU:
./transform_graph \
--in_graph=tf_model.pb \
--out_graph=tf_model_opt.pb \
--inputs='input' \
--outputs='output' \
--transforms='strip_unused_nodes(type=float, shape="1,64,64,3")
strip_unused_nodes(type=float, shape="1,64,64,3")
remove_nodes(op=Identity, op=CheckNumerics)
fold_constants(ignore_errors=true)
flatten_atrous_conv
fold_batch_norms
fold_old_batch_norms
strip_unused_nodes
sort_by_execution_order'
.. code:: sh
# DSP:
./transform_graph \
--in_graph=tf_model.pb \
--out_graph=tf_model_opt.pb \
--inputs='input' \
--outputs='output' \
--transforms='strip_unused_nodes(type=float, shape="1,64,64,3")
strip_unused_nodes(type=float, shape="1,64,64,3")
remove_nodes(op=Identity, op=CheckNumerics)
fold_constants(ignore_errors=true)
fold_batch_norms
fold_old_batch_norms
backport_concatv2
quantize_weights(minimum_size=2)
quantize_nodes
strip_unused_nodes
sort_by_execution_order'
- Caffe
The MACE converter only supports Caffe 1.0+; you need to upgrade
your models with the Caffe built-in tools when necessary,
.. code:: bash
# Upgrade prototxt
$CAFFE_ROOT/build/tools/upgrade_net_proto_text MODEL.prototxt MODEL.new.prototxt
# Upgrade caffemodel
$CAFFE_ROOT/build/tools/upgrade_net_proto_binary MODEL.caffemodel MODEL.new.caffemodel
==============================
3. Build static/shared library
==============================
-----------------
3.1 Overview
-----------------
MACE can build either a static or a shared library (which is
specified by ``linkshared`` in the YAML model deployment file).
The following are two common use cases.
* **Build well tuned library for specific SoCs**
When ``target_socs`` is specified in YAML model deployment file, the build
tool will enable automatic tuning for GPU kernels. This usually takes some
time to finish depending on the complexity of your model.
.. note::
You should plug in device(s) with the corresponding SoC(s).
* **Build generic library for all SoCs**
When ``target_socs`` is not specified, the generated library is compatible
with general devices.
.. note::
There will be around a 1~10% performance drop for the GPU
runtime compared to the well-tuned library.
MACE provides a command line tool (``tools/converter.py``) for
model conversion, compilation, test runs, benchmarking and correctness validation.
.. note::
1. ``tools/converter.py`` should be run at the root directory of this project.
2. When ``linkshared`` is set to ``1``, ``build_type`` should be ``proto``.
Currently, only Android devices are supported.
------------------------------------------
3.2 \ ``tools/converter.py``\ usage
------------------------------------------
**Commands**
* **build**
Build the library and test tools.
.. code:: sh
# Build library
python tools/converter.py build --config=models/config.yaml
* **run**
Run the model(s).
.. code:: sh
# Test model run time
python tools/converter.py run --config=models/config.yaml --round=100
# Validate the correctness by comparing the results against the
# original model and framework, measured with cosine distance for similarity.
python tools/converter.py run --config=models/config.yaml --validate
# Check the memory usage of the model (keep only one model in the configuration file)
python tools/converter.py run --config=models/config.yaml --round=10000 &
sleep 5
adb shell dumpsys meminfo | grep mace_run
kill %1
.. warning::
``run`` relies on the ``build`` command; you should ``run`` after ``build``.
* **benchmark**
Benchmark and profile the model.
.. code:: sh
# Benchmark model, get detailed statistics of each Op.
python tools/converter.py benchmark --config=models/config.yaml
.. warning::
``benchmark`` relies on the ``build`` command; you should ``benchmark`` after ``build``.
**Common arguments**
.. list-table::
:widths: auto
:header-rows: 1
:align: left
* - option
- type
- default
- commands
- explanation
* - --omp_num_threads
- int
- -1
- ``run``/``benchmark``
- number of threads
* - --cpu_affinity_policy
- int
- 1
- ``run``/``benchmark``
- 0:AFFINITY_NONE/1:AFFINITY_BIG_ONLY/2:AFFINITY_LITTLE_ONLY
* - --gpu_perf_hint
- int
- 3
- ``run``/``benchmark``
- 0:DEFAULT/1:LOW/2:NORMAL/3:HIGH
* - --gpu_priority_hint
- int
- 3
- ``run``/``benchmark``
- 0:DEFAULT/1:LOW/2:NORMAL/3:HIGH
Use ``-h`` to get detailed help.
.. code:: sh
python tools/converter.py -h
python tools/converter.py build -h
python tools/converter.py run -h
python tools/converter.py benchmark -h
=============
4. Deployment
=============
The ``build`` command will generate the static/shared library, model files and
header files, and package them as
``build/${library_name}/libmace_${library_name}.tar.gz``.
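For example, the packaged archive can be extracted with a standard ``tar`` command (the path follows the naming pattern above):
.. code:: sh
tar xzf build/${library_name}/libmace_${library_name}.tar.gz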
- The generated ``static`` libraries are organized as follows,
.. code::
build/
└── mobilenet-v2-gpu
├── include
│   └── mace
│   └── public
│   ├── mace.h
│   └── mace_runtime.h
├── libmace_mobilenet-v2-gpu.tar.gz
├── lib
│   ├── arm64-v8a
│   │   └── libmace_mobilenet-v2-gpu.MI6.msm8998.a
│   └── armeabi-v7a
│   └── libmace_mobilenet-v2-gpu.MI6.msm8998.a
├── model
│   ├── mobilenet_v2.data
│   └── mobilenet_v2.pb
└── opencl
├── arm64-v8a
│   └── mobilenet-v2-gpu_compiled_opencl_kernel.MI6.msm8998.bin
└── armeabi-v7a
└── mobilenet-v2-gpu_compiled_opencl_kernel.MI6.msm8998.bin
- The generated ``shared`` libraries are organized as follows,
.. code::
build
└── mobilenet-v2-gpu
├── include
│   └── mace
│   └── public
│   ├── mace.h
│   └── mace_runtime.h
├── lib
│   ├── arm64-v8a
│   │   ├── libgnustl_shared.so
│   │   └── libmace.so
│   └── armeabi-v7a
│   ├── libgnustl_shared.so
│   └── libmace.so
├── model
│   ├── mobilenet_v2.data
│   └── mobilenet_v2.pb
└── opencl
├── arm64-v8a
│   └── mobilenet-v2-gpu_compiled_opencl_kernel.MI6.msm8998.bin
└── armeabi-v7a
└── mobilenet-v2-gpu_compiled_opencl_kernel.MI6.msm8998.bin
.. note::
1. DSP runtime depends on ``libhexagon_controller.so``.
2. ``${MODEL_TAG}.pb`` file will be generated only when ``build_type`` is ``proto``.
3. ``${library_name}_compiled_opencl_kernel.${device_name}.${soc}.bin`` will
be generated only when ``target_socs`` and ``gpu`` runtime are specified.
4. Generated shared library depends on ``libgnustl_shared.so``.
.. warning::
``${library_name}_compiled_opencl_kernel.${device_name}.${soc}.bin`` depends
on the OpenCL version of the device; you should maintain compatibility or
configure the compiled kernel cache store with ``ConfigKVStorageFactory``.
=========================================
5. How to use the library in your project
=========================================
Please refer to \ ``mace/examples/example.cc``\ for full usage. The following lists the key steps.
.. code:: cpp
// Include the headers
#include "mace/public/mace.h"
#include "mace/public/mace_runtime.h"
// If the build_type is code
#include "mace/public/mace_engine_factory.h"
// 0. Set pre-compiled OpenCL binary program file paths when available
if (device_type == DeviceType::GPU) {
mace::SetOpenCLBinaryPaths(opencl_binary_paths);
}
// 1. Set the compiled OpenCL kernel cache. This is used to reduce the
// initialization time, since compiling is slow. It's suggested to set
// this even when a pre-compiled OpenCL program file is provided,
// because an OpenCL version upgrade may also lead to kernel
// recompilation.
const std::string file_path = "path/to/opencl_cache_file";
std::shared_ptr<KVStorageFactory> storage_factory(
new FileStorageFactory(file_path));
ConfigKVStorageFactory(storage_factory);
// 2. Declare the device type (must be the same as ``runtime`` in the configuration file)
DeviceType device_type = DeviceType::GPU;
// 3. Define the input and output tensor names.
std::vector<std::string> input_names = {...};
std::vector<std::string> output_names = {...};
// 4. Create MaceEngine instance
std::shared_ptr<mace::MaceEngine> engine;
MaceStatus create_engine_status;
// Create Engine from compiled code
create_engine_status =
CreateMaceEngineFromCode(model_name.c_str(),
nullptr,
input_names,
output_names,
device_type,
&engine);
// Create Engine from model file
create_engine_status =
CreateMaceEngineFromProto(model_pb_data,
model_data_file.c_str(),
input_names,
output_names,
device_type,
&engine);
if (create_engine_status != MaceStatus::MACE_SUCCESS) {
// Report error
}
// 5. Create Input and Output tensor buffers
std::map<std::string, mace::MaceTensor> inputs;
std::map<std::string, mace::MaceTensor> outputs;
for (size_t i = 0; i < input_count; ++i) {
// Allocate input and output
int64_t input_size =
std::accumulate(input_shapes[i].begin(), input_shapes[i].end(), 1,
std::multiplies<int64_t>());
auto buffer_in = std::shared_ptr<float>(new float[input_size],
std::default_delete<float[]>());
// Load input here
// ...
inputs[input_names[i]] = mace::MaceTensor(input_shapes[i], buffer_in);
}
for (size_t i = 0; i < output_count; ++i) {
int64_t output_size =
std::accumulate(output_shapes[i].begin(), output_shapes[i].end(), 1,
std::multiplies<int64_t>());
auto buffer_out = std::shared_ptr<float>(new float[output_size],
std::default_delete<float[]>());
outputs[output_names[i]] = mace::MaceTensor(output_shapes[i], buffer_out);
}
// 6. Run the model
MaceStatus status = engine->Run(inputs, &outputs);
Introduction
============
Mobile AI Compute Engine (MACE) is a deep learning inference framework optimized for
mobile heterogeneous computing platforms. The following figure shows the
overall architecture.
.. image:: mace-arch.png
:scale: 40 %
:align: center
Model format
------------
MACE defines a customized model format which is similar to
Caffe2. The MACE model can be converted from exported models by TensorFlow
and Caffe. A YAML file is used to describe the model deployment details. In the
next chapter, there is a detailed guide showing how to create this YAML file.
Model conversion
----------------
Currently, we provide model converters for TensorFlow and Caffe, and
more frameworks will be supported in the future.
Model loading
-------------
The MACE model format contains two parts: the model graph definition and
the model parameter tensors. The graph part utilizes Protocol Buffers
for serialization. All the model parameter tensors are concatenated
together into a continuous byte array, and we call this array tensor data in
the following paragraphs. In the model graph, the tensor data offsets
and lengths are recorded.
The models can be loaded in 3 ways:
1. Both model graph and tensor data are dynamically loaded externally
(by default, from file system, but the users are free to choose their own
implementations, for example, with compression or encryption). This
approach provides the most flexibility but the weakest model protection.
2. Both model graph and tensor data are converted into C++ code and loaded
by executing the compiled code. This approach provides the strongest
model protection and simplest deployment.
3. The model graph is converted into C++ code and constructed as in the second
approach, and the tensor data is loaded externally as in the first approach.
# The name of library
library_name: mobilenet
target_abis: [arm64-v8a]
embed_model_data: 1
# The build mode for model(s).
# 'code' stands for transferring model(s) into C++ code, 'proto' for keeping model(s) in protobuf file(s).
build_type: code
linkshared: 0
# One YAML config file can contain multiple models' configurations.
models:
mobilenet_v1: # model tag, which will be used in model loading and must be unique.
platform: tensorflow
# support local path, http:// and https://
model_file_path: https://cnbj1.fds.api.xiaomi.com/mace/miai-models/mobilenet-v1/mobilenet-v1-1.0.pb
model_sha256_checksum: 71b10f540ece33c49a7b51f5d4095fc9bd78ce46ebf0300487b2ee23d71294e6
subgraphs:
- input_tensors: input
input_shapes: 1,224,224,3
output_tensors: MobilenetV1/Predictions/Reshape_1
output_shapes: 1,1001
runtime: cpu+gpu
limit_opencl_kernel_time: 0
nnlib_graph_mode: 0
obfuscate: 0
winograd: 0
mobilenet_v2:
platform: tensorflow
model_file_path: https://cnbj1.fds.api.xiaomi.com/mace/miai-models/mobilenet-v2/mobilenet-v2-1.0.pb
model_sha256_checksum: 369f9a5f38f3c15b4311c1c84c032ce868da9f371b5f78c13d3ea3c537389bb4
subgraphs:
- input_tensors: input
input_shapes: 1,224,224,3
output_tensors: MobilenetV2/Predictions/Reshape_1
output_shapes: 1,1001
runtime: cpu+gpu
limit_opencl_kernel_time: 0
nnlib_graph_mode: 0
obfuscate: 0
winograd: 0
......@@ -6,21 +6,37 @@ The main documentation is organized into the following sections:
.. toctree::
:maxdepth: 1
:caption: Getting started
:name: sec-start
:caption: Introduction
:name: sec-intro
getting_started/introduction
getting_started/create_a_model_deployment
getting_started/how_to_build
getting_started/op_lists
introduction
.. toctree::
:maxdepth: 1
:caption: Development
:caption: Installation
:name: sec-install
installation/env_requirement
installation/using_docker
installation/manual_setup
.. toctree::
:maxdepth: 1
:caption: User guide
:name: sec-user
user_guide/basic_usage
user_guide/advanced_usage
user_guide/op_lists
.. toctree::
:maxdepth: 1
:caption: Developer guide
:name: sec-devel
development/contributing
development/adding_a_new_op
development/how_to_run_tests
development/memory_layout
.. toctree::
......
Environment Requirement
========================
MACE requires the following dependencies:
Necessary Dependencies:
------------------------
.. list-table::
:header-rows: 1
* - software
- version
- install command
* - bazel
- >= 0.13.0
- `bazel installation guide <https://docs.bazel.build/versions/master/install.html>`__
* - android-ndk
- r15c/r16b
- `NDK installation guide <https://developer.android.com/ndk/guides/setup#install>`__
* - adb
- >= 1.0.32
- apt-get install android-tools-adb
* - cmake
- >= 3.11.3
- apt-get install cmake
* - numpy
- >= 1.14.0
- pip install -I numpy==1.14.0
* - scipy
- >= 1.0.0
- pip install -I scipy==1.0.0
* - jinja2
- >= 2.10
- pip install -I jinja2==2.10
* - PyYaml
- >= 3.12.0
- pip install -I pyyaml==3.12
* - sh
- >= 1.12.14
- pip install -I sh==1.12.14
* - filelock
- >= 3.0.0
- pip install -I filelock==3.0.0
.. note::
Set ``export ANDROID_NDK_HOME=/path/to/ndk`` to specify ANDROID_NDK_HOME.
Optional Dependencies:
-----------------------
.. list-table::
:header-rows: 1
* - software
- version
- install command
* - tensorflow
- >= 1.6.0
- pip install -I tensorflow==1.6.0 (if you use tensorflow model)
* - docker (for caffe)
- >= 17.09.0-ce
- `docker installation guide <https://docs.docker.com/install/linux/docker-ce/ubuntu/#set-up-the-repository>`__
Manual setup
=============
The setup steps below are based on ``Ubuntu``. For the dependencies to install, refer to :doc:`env_requirement`.
Install Necessary Dependencies
-------------------------------
Install Bazel
~~~~~~~~~~~~~~
Bazel version ``0.13.0`` or higher is recommended (refer to the `Bazel documentation <https://docs.bazel.build/versions/master/install.html>`__).
.. code:: sh
export BAZEL_VERSION=0.13.1
mkdir /bazel && \
cd /bazel && \
wget https://github.com/bazelbuild/bazel/releases/download/$BAZEL_VERSION/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
chmod +x bazel-*.sh && \
./bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
cd / && \
rm -f /bazel/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh
Install NDK
~~~~~~~~~~~~
NDK version r15c or r16b is recommended (refer to the `NDK installation guide <https://developer.android.com/ndk/guides/setup#install>`__).
.. code:: sh
# Download NDK r15c
cd /opt/ && \
wget -q https://dl.google.com/android/repository/android-ndk-r15c-linux-x86_64.zip && \
unzip -q android-ndk-r15c-linux-x86_64.zip && \
rm -f android-ndk-r15c-linux-x86_64.zip
export ANDROID_NDK_VERSION=r15c
export ANDROID_NDK=/opt/android-ndk-${ANDROID_NDK_VERSION}
export ANDROID_NDK_HOME=${ANDROID_NDK}
# add to PATH
export PATH=${PATH}:${ANDROID_NDK_HOME}
Install extra tools
~~~~~~~~~~~~~~~~~~~~
.. code:: sh
apt-get install -y --no-install-recommends \
cmake \
android-tools-adb
pip install -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com setuptools
pip install -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com \
"numpy>=1.14.0" \
scipy \
jinja2 \
pyyaml \
sh==1.12.14 \
pycodestyle==2.4.0 \
filelock
Install Optional Dependencies
------------------------------
.. code:: sh
pip install -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com tensorflow==1.6.0
Using docker
=============
Pull or Build docker image
---------------------------
MACE provides docker images with the necessary dependencies installed, and also Dockerfiles for building the images;
you can pull the existing images directly or build them from the Dockerfiles.
In most cases, the ``lite edition`` image can satisfy developers' basic needs.
.. note::
It's highly recommended to pull the pre-built images.
- ``lite edition`` docker image.
.. code:: sh
# Pull lite edition docker image
docker pull registry.cn-hangzhou.aliyuncs.com/xiaomimace/mace-dev-lite
# Build lite edition docker image
docker build -t registry.cn-hangzhou.aliyuncs.com/xiaomimace/mace-dev-lite ./docker/mace-dev-lite
- ``full edition`` docker image (which contains multiple NDK versions and other dev tools).
.. code:: sh
# Pull full edition docker image
docker pull registry.cn-hangzhou.aliyuncs.com/xiaomimace/mace-dev
# Build full edition docker image
docker build -t registry.cn-hangzhou.aliyuncs.com/xiaomimace/mace-dev ./docker/mace-dev
.. note::
We will show the following steps with the lite edition.
Using the image
-----------------
Create a container with the following command:
.. code:: sh
# Create a container named `mace-dev`
docker run -it --privileged -d --name mace-dev \
-v /dev/bus/usb:/dev/bus/usb --net=host \
-v /local/path:/container/path \
registry.cn-hangzhou.aliyuncs.com/xiaomimace/mace-dev-lite
# Execute an interactive bash shell on the container
docker exec -it mace-dev /bin/bash
Introduction
============
Mobile AI Compute Engine (MACE) is a deep learning inference framework optimized for
mobile heterogeneous computing platforms. MACE covers common mobile computing devices (CPU, GPU and DSP),
and supplies tools and documentation to help users deploy neural network models to mobile devices.
MACE has been widely used in Xiaomi and has proven industry-leading performance and stability.
Framework
---------
The following figure shows the overall architecture.
.. image:: mace-arch.png
:scale: 40 %
:align: center
==========
MACE Model
==========
MACE defines a customized model format which is similar to
Caffe2. The MACE model can be converted from exported models by TensorFlow
and Caffe.
================
MACE Interpreter
================
The MACE Interpreter mainly parses the NN graph and manages the tensors in the graph.
=======
Runtime
=======
The CPU/GPU/DSP runtimes correspond to the Op implementations for the different devices.
Workflow
--------
The following figure shows the basic workflow of MACE.
.. image:: mace-work-flow.png
:scale: 60 %
:align: center
==================================
1. Configure model deployment file
==================================
The model deployment configuration file (.yml) describes the information of the model and library;
MACE will build the library based on this file.
==================
2. Build libraries
==================
Build MACE dynamic or static libraries.
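For example, a generic library can be built with the helper script from the basic usage guide (the output goes to ``builds/lib``):
.. code:: sh
bash tools/build-standalone-lib.sh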
==================
3. Convert model
==================
Convert TensorFlow or Caffe model to MACE model.
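For example, with a deployment file at a placeholder path, the conversion is run with the converter tool:
.. code:: sh
python tools/converter.py convert --config=/path/to/model_deployment_file.yml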
===========
4.1. Deploy
===========
Integrate the MACE library to your application and run with MACE API.
==============
4.2. Run (CLI)
==============
There are command line tools to run models, which can be used to test run time, memory usage and correctness.
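For example (the deployment file path is a placeholder):
.. code:: sh
# Test model run time
python tools/converter.py run --config=/path/to/model_deployment_file.yml --round=100
# Validate the correctness by comparing the results against the original framework
python tools/converter.py run --config=/path/to/model_deployment_file.yml --validate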
==============
4.3. Benchmark
==============
MACE supplies a benchmark tool to report the run time of every operation in the model.
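For example:
.. code:: sh
# Benchmark model, get detailed statistics of each Op
python tools/converter.py benchmark --config=/path/to/model_deployment_file.yml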
Introduction
------------
Mobile AI Compute Engine (MACE) is a deep learning inference framework optimized for mobile heterogeneous computing platforms.
MACE covers common mobile computing devices (CPU, GPU and DSP) and provides a complete toolchain and documentation, so that users
can easily deploy deep learning models on mobile devices. MACE has been widely used inside Xiaomi and has been thoroughly validated to have industry-leading performance and stability.
Framework
---------
The following figure shows the basic architecture of MACE.
.. image:: mace-arch.png
:scale: 60 %
:align: center
==============
MACE Model
==============
MACE defines its own model format (similar to Caffe2). Models exported from Caffe and TensorFlow
can be converted into MACE models with the tools MACE provides.
=================
MACE Interpreter
=================
The MACE Interpreter mainly parses and runs the neural network graph (DAG) and manages the tensors in the graph.
=======
Runtime
=======
The CPU/GPU/DSP runtimes correspond to the operator implementations for each computing device.
Workflow
------------
The following figure shows the basic workflow of using MACE.
.. image:: mace-work-flow-zh.png
:scale: 60 %
:align: center
==================================================
1. Configure the model deployment file (.yml)
==================================================
The model deployment file describes in detail the models to be deployed and the library to be generated; MACE builds the corresponding library files based on this file.
==================================================
2. Build the MACE library
==================================================
Build the MACE static or shared library.
==================================================
3. Convert the model
==================================================
Convert a TensorFlow or Caffe model into a MACE model.
==================================================
4.1. Deploy
==================================================
Integrate the library files generated in the Build stage according to your needs, then call the MACE API to run the model.
==================================================
4.2. Run from the command line
==================================================
MACE provides command line tools that can run models from the command line and test run time, memory usage and correctness.
==================================================
4.3. Benchmark
==================================================
MACE provides a command line benchmark tool that reports the fine-grained run time of every operator in the model.
==============
Advanced usage
==============
This part contains the full usage of MACE.
=========
Overview
=========
As mentioned in the previous part, a model deployment file defines a case of model deployment.
The whole building process consists of loading a deployment file, converting the models, building MACE and packing the generated files.
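A typical pass through this process, using a placeholder deployment file path, looks like the following:
.. code:: sh
# Build the MACE libraries
bash tools/build-standalone-lib.sh
# Convert the model(s) described in the deployment file
python tools/converter.py convert --config=/path/to/model_deployment_file.yml
# Run and validate the converted model(s) on a connected device
python tools/converter.py run --config=/path/to/model_deployment_file.yml --validate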
================
Deployment file
================
One deployment file normally generates one library, but if more than one ABI is specified,
one library will be generated for each ABI.
A deployment file can also contain multiple models. For example, an AI camera application may
contain face recognition, object recognition, and voice recognition models, all of which can be defined
in one deployment file.
* **Example**
Here is an example deployment file used by an Android demo application.
.. literalinclude:: models/demo_app_models.yml
:language: yaml
* **Configurations**
.. list-table::
:header-rows: 1
* - Options
- Usage
* - library_name
- Library name.
* - target_abis
- The target ABI(s) to build, can be 'host', 'armeabi-v7a' or 'arm64-v8a'.
If more than one ABI is used, separate them with commas.
* - target_socs
- [optional] Build for specific SoCs.
* - model_graph_format
- Model graph format, can be 'file' or 'code'. 'file' converts the model graph to a ProtoBuf file (.pb) and 'code' converts the model graph to C++ code.
* - model_data_format
- Model data format, can be 'file' or 'code'. 'file' converts the model weights to a data file (.data) and 'code' converts the model weights to C++ code.
* - model_name
- Model name, should be unique if there is more than one model.
**LIMIT: if build_type is code, model_name will be used in C++ code, so model_name must be a valid C++ identifier.**
* - platform
- The source framework, tensorflow or caffe.
* - model_file_path
- The path of your model file which can be local path or remote URL.
* - model_sha256_checksum
- The SHA256 checksum of the model file.
* - weight_file_path
- [optional] The path of Caffe model weights file.
* - weight_sha256_checksum
- [optional] The SHA256 checksum of Caffe model weights file.
* - subgraphs
- subgraphs key. **DO NOT EDIT**
* - input_tensors
- The input tensor name(s) (TensorFlow) or the top name(s) of the inputs' layers (Caffe).
If there is more than one tensor, use one line per tensor.
* - output_tensors
- The output tensor name(s) (TensorFlow) or the top name(s) of the outputs' layers (Caffe).
If there is more than one tensor, use one line per tensor.
* - input_shapes
- The shapes of the input tensors, in NHWC order.
* - output_shapes
- The shapes of the output tensors, in NHWC order.
* - input_ranges
- The numerical range of the input tensors' data, default [-1, 1]. It is only used for testing.
* - validation_inputs_data
- [optional] Specify Numpy validation inputs. When not provided, [-1, 1] random values will be used.
* - runtime
- The running device, one of [cpu, gpu, dsp, cpu_gpu]. cpu_gpu contains CPU and GPU model definition so you can run the model on both CPU and GPU.
* - data_type
- [optional] The data type used for specified runtime. [fp16_fp32, fp32_fp32] for GPU, default is fp16_fp32, [fp32] for CPU and [uint8] for DSP.
* - limit_opencl_kernel_time
- [optional] Whether to split the OpenCL kernel so that each piece runs within 1 ms, to keep the UI responsive; default is 0.
* - nnlib_graph_mode
- [optional] Control the DSP precision and performance; the default 0 usually works for most cases.
* - obfuscate
- [optional] Whether to obfuscate the model operator name, default to 0.
* - winograd
- [optional] Which Winograd type to use, can be one of [0, 2, 4]. 0 disables Winograd, 2 and 4 enable it; 4 may be faster than 2 but may take more memory.
.. note::
Some useful commands:
.. code:: bash
# Get device's soc info.
adb shell getprop | grep platform
# command for generating sha256_sum
sha256sum /path/to/your/file
==============
Advanced Usage
==============
There are two common advanced use cases: converting a model to C++ code, and tuning for a specific SoC when using the GPU.
* **Convert model(s) to C++ code**
.. warning::
For this case, just use the static MACE library.
* **1. Change the model deployment file(.yml)**
If you want to protect your model, you can convert the model to C++ code. There are two options:
* Convert the model graph to code and the model weights to a file with the model configuration below.
.. code:: sh
model_graph_format: code
model_data_format: file
* Convert both the model graph and the model weights to code with the model configuration below.
.. code:: sh
model_graph_format: code
model_data_format: code
.. note::
Another model protection method is to use ``obfuscate`` to obfuscate the names of the model's operators.
* **2. Convert model(s) to code**
.. code:: sh
python tools/converter.py convert --config=/path/to/model_deployment_file.yml
The command will generate **${library_name}.a** in the **builds/${library_name}/model** directory and
**\*.h** headers in **builds/${library_name}/include**, like the following directory tree.
.. code::
# model_graph_format: code
# model_data_format: file
builds
├── include
│   └── mace
│   └── public
│   ├── mace_engine_factory.h
│   └── mobilenet_v1.h
└── model
   ├── mobilenet-v1.a
   └── mobilenet_v1.data
* **3. Deployment**
* Link `libmace.a` and `${library_name}.a` to your target.
* Refer to \ ``mace/examples/example.cc``\ for full usage. The following lists the key steps.
.. code:: cpp
// Include the headers
#include "mace/public/mace.h"
#include "mace/public/mace_runtime.h"
// If the model_graph_format is code
#include "mace/public/${model_name}.h"
#include "mace/public/mace_engine_factory.h"
// ... Same with the code in basic usage
// 4. Create MaceEngine instance
std::shared_ptr<mace::MaceEngine> engine;
MaceStatus create_engine_status;
// Create Engine from compiled code
create_engine_status =
CreateMaceEngineFromCode(model_name.c_str(),
nullptr,
input_names,
output_names,
device_type,
&engine);
if (create_engine_status != MaceStatus::MACE_SUCCESS) {
// Report error
}
// ... Same with the code in basic usage
* **Tuning for a specific SoC's GPU**
If you want to use the GPU of a specific device, you can specify ``target_socs`` in your YAML file and
then tune the MACE library for it, which may yield a 1~10% performance improvement.
* **1. Change the model deployment file(.yml)**
Specify ``target_socs`` in your model deployment file(.yml):
.. code:: sh
target_socs: [sdm845]
.. note::
Get device's soc info: `adb shell getprop | grep platform`
* **2. Convert model(s)**
.. code:: sh
python tools/converter.py convert --config=/path/to/model_deployment_file.yml
* **3. Tuning**
``tools/converter.py`` will enable automatic tuning for GPU kernels. This usually takes some
time to finish, depending on the complexity of your model.
.. note::
You should plug in device(s) with the specific SoC(s).
.. code:: sh
python tools/converter.py run --config=/path/to/model_deployment_file.yml --validate
The command will generate two files in ``builds/${library_name}/opencl``, like the following directory tree.
.. code::
builds
└── mobilenet-v2
├── model
│   ├── mobilenet_v2.data
│   └── mobilenet_v2.pb
└── opencl
└── arm64-v8a
   ├── mobilenet-v2_compiled_opencl_kernel.MiNote3.sdm660.bin
   └── mobilenet-v2_tuned_opencl_parameter.MiNote3.sdm660.bin
* **mobilenet-v2_compiled_opencl_kernel.MiNote3.sdm660.bin** stands for the OpenCL binaries
used for your models, which can accelerate the initialization stage.
For details, please refer to the `OpenCL Specification <https://www.khronos.org/registry/OpenCL/sdk/1.0/docs/man/xhtml/clCreateProgramWithBinary.html>`__.
* **mobilenet-v2_tuned_opencl_parameter.MiNote3.sdm660.bin** stands for the tuned OpenCL parameters
for the SoC.
* **4. Deployment**
* Change the names of the files generated above to avoid name collisions and push them to **your own device's directory**.
* Use them as in the previous procedure; the key steps that differ are listed below.
.. code:: cpp
// Include the headers
#include "mace/public/mace.h"
#include "mace/public/mace_runtime.h"
// 0. Set pre-compiled OpenCL binary program file paths and OpenCL parameters file path when available
if (device_type == DeviceType::GPU) {
mace::SetOpenCLBinaryPaths(path/to/opencl_binary_paths);
mace::SetOpenCLParameterPath(path/to/opencl_parameter_file);
}
// ... Same with the code in basic usage.
===============
Useful Commands
===============
* **run the model**
.. code:: sh
# Test model run time
python tools/converter.py run --config=/path/to/model_deployment_file.yml --round=100
# Validate the correctness by comparing the results against the
# original model and framework, measured with cosine distance for similarity.
python tools/converter.py run --config=/path/to/model_deployment_file.yml --validate
# Check the memory usage of the model (keep only one model in the deployment file)
python tools/converter.py run --config=/path/to/model_deployment_file.yml --round=10000 &
sleep 5
adb shell dumpsys meminfo | grep mace_run
kill %1
.. warning::
``run`` relies on the ``convert`` command; you should ``convert`` before ``run``.
* **benchmark and profile model**
.. code:: sh
# Benchmark model, get detailed statistics of each Op.
python tools/converter.py benchmark --config=/path/to/model_deployment_file.yml
.. warning::
``benchmark`` relies on the ``convert`` command; you should ``benchmark`` after ``convert``.
**Common arguments**
.. list-table::
:header-rows: 1
* - option
- type
- default
- commands
- explanation
* - --omp_num_threads
- int
- -1
- ``run``/``benchmark``
- number of threads
* - --cpu_affinity_policy
- int
- 1
- ``run``/``benchmark``
- 0:AFFINITY_NONE/1:AFFINITY_BIG_ONLY/2:AFFINITY_LITTLE_ONLY
* - --gpu_perf_hint
- int
- 3
- ``run``/``benchmark``
- 0:DEFAULT/1:LOW/2:NORMAL/3:HIGH
* - --gpu_priority_hint
- int
- 3
- ``run``/``benchmark``
- 0:DEFAULT/1:LOW/2:NORMAL/3:HIGH
Use ``-h`` to get detailed help.
.. code:: sh
python tools/converter.py -h
python tools/converter.py build -h
python tools/converter.py run -h
python tools/converter.py benchmark -h
Basic usage
============
Build and run an example model
-------------------------------
First, make sure the environment has been set up correctly (refer to :doc:`../installation/env_requirement`).
The following are instructions on how to quickly build and run a provided model from the *MACE Model Zoo*.
Here we use the mobilenet-v2 model as an example.
**Commands**
1. Pull *MACE* project.
.. code:: sh
git clone https://github.com/XiaoMi/mace.git
git fetch --all --tags --prune
# Checkout the latest tag (i.e. release version)
tag_name=`git describe --abbrev=0 --tags`
git checkout tags/${tag_name}
.. note::
It's highly recommended to use a release version instead of the master branch.
2. Pull *MACE Model Zoo* project.
.. code:: sh
git clone https://github.com/XiaoMi/mace-models.git
3. Build a general MACE library.
.. code:: sh
cd path/to/mace
# Build library
# output lib path: builds/lib
bash tools/build-standalone-lib.sh
4. Convert the model to MACE format.
.. code:: sh
cd path/to/mace
# Convert model
python tools/converter.py convert --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml
5. Run the model.
.. warning::
If you want to run on device/phone, please plug in at least one device/phone.
.. code:: sh
# Run example
python tools/converter.py run --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml --example
# Test model run time
python tools/converter.py run --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml --round=100
# Validate the correctness by comparing the results against the
# original model and framework, measured with cosine distance for similarity.
python tools/converter.py run --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml --validate
Build your own model
---------------------
This part will show you how to use your pre-trained model in MACE.
======================
1. Prepare your model
======================
MACE now supports models from TensorFlow and Caffe (more frameworks will be supported in the future).
- TensorFlow
Prepare your pre-trained TensorFlow model.pb file.
Use `Graph Transform Tool <https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/README.md>`__
to optimize your model for inference.
This tool improves inference efficiency by applying several optimizations such as operator
folding and redundant node removal. We strongly recommend MACE users run it before building.
Usage for CPU/GPU,
.. code:: bash
# CPU/GPU:
./transform_graph \
--in_graph=/path/to/your/tf_model.pb \
--out_graph=/path/to/your/output/tf_model_opt.pb \
--inputs='input node name' \
--outputs='output node name' \
--transforms='strip_unused_nodes(type=float, shape="1,64,64,3")
strip_unused_nodes(type=float, shape="1,64,64,3")
remove_nodes(op=Identity, op=CheckNumerics)
fold_constants(ignore_errors=true)
flatten_atrous_conv
fold_batch_norms
fold_old_batch_norms
strip_unused_nodes
sort_by_execution_order'
- Caffe
Caffe 1.0+ models are supported in MACE converter tool.
If your model is from a lower version of Caffe, you need to upgrade it using the Caffe built-in tool before converting.
.. code:: bash
# Upgrade prototxt
$CAFFE_ROOT/build/tools/upgrade_net_proto_text MODEL.prototxt MODEL.new.prototxt
# Upgrade caffemodel
$CAFFE_ROOT/build/tools/upgrade_net_proto_binary MODEL.caffemodel MODEL.new.caffemodel
===========================================
2. Create a deployment file for your model
===========================================
When converting a model or building a library, MACE needs to read a YAML file, which is called the model deployment file here.
A model deployment file contains all the information about your model(s) and the build options. There are several example
deployment files in the *MACE Model Zoo* project.
The following shows two basic examples of deployment files for TensorFlow and Caffe models.
Modify one of them and use it for your own case.
- TensorFlow
.. literalinclude:: models/demo_app_models_tf.yml
:language: yaml
- Caffe
.. literalinclude:: models/demo_app_models_caffe.yml
:language: yaml
More details about model deployment file are in :doc:`advanced_usage`.
======================
3. Convert your model
======================
When the deployment file is ready, you can use MACE converter tool to convert your model(s).
.. code:: bash
python tools/converter.py convert --config=/path/to/your/model_deployment_file.yml
This command will download or load your pre-trained model and convert it to a MACE model proto file and a weights data file.
The generated model files will be stored in the ``build/${library_name}/model`` folder.
.. warning::
Please set ``model_graph_format: file`` and ``model_data_format: file`` in your deployment file before converting.
The usage of ``model_graph_format: code`` will be demonstrated in :doc:`advanced_usage`.
=============================
4. Build MACE into a library
=============================
Use bazel to build MACE source code into a library.
.. code:: sh
cd path/to/mace
# Build library
# output lib path: builds/lib
bash tools/build-standalone-lib.sh
The above command will generate dynamic library ``builds/lib/${ABI}/libmace.so`` and static library ``builds/lib/${ABI}/libmace.a``.
.. warning::
Please verify that the target_abis param in the above command and your deployment file are the same.
==================
5. Run your model
==================
With the converted model, the static or shared library and header files, you can use the following commands
to run and validate your model.
.. warning::
If you want to run on device/phone, please plug in at least one device/phone.
* **run**
run the model.
.. code:: sh
# Test model run time
python tools/converter.py run --config=/path/to/your/model_deployment_file.yml --round=100
# Validate the correctness by comparing the results against the
# original model and framework, measured with cosine distance for similarity.
python tools/converter.py run --config=/path/to/your/model_deployment_file.yml --validate
* **benchmark**
benchmark and profile the model.
.. code:: sh
# Benchmark model, get detailed statistics of each Op.
python tools/converter.py benchmark --config=/path/to/your/model_deployment_file.yml
=======================================
6. Deploy your model into applications
=======================================
In the converting and building steps, you've got the static/shared library, model files and
header files.
``${library_name}`` is the name you defined in the first line of your deployment YAML file.
- The generated ``static`` library files are organized as follows,
.. code::
builds
├── include
│   └── mace
│   └── public
│   ├── mace.h
│   └── mace_runtime.h
├── lib
│   ├── arm64-v8a
│   │   ├── libmace.a
│   │   └── libmace.so
│   ├── armeabi-v7a
│   │   ├── libhexagon_controller.so
│   │   ├── libmace.a
│   │   └── libmace.so
│   └── linux-x86-64
│   ├── libmace.a
│   └── libmace.so
└── mobilenet-v1
├── model
│   ├── mobilenet_v1.data
│   └── mobilenet_v1.pb
└── _tmp
└── arm64-v8a
└── mace_run_static
Please refer to \ ``mace/examples/example.cc``\ for full usage. The following lists the key steps.
.. code:: cpp
// Include the headers
#include "mace/public/mace.h"
#include "mace/public/mace_runtime.h"
// 0. Set pre-compiled OpenCL binary program file paths when available
if (device_type == DeviceType::GPU) {
mace::SetOpenCLBinaryPaths(opencl_binary_paths);
}
// 1. Set the compiled OpenCL kernel cache. This is used to reduce the
// initialization time, since compiling is slow. It's suggested to set
// this even when a pre-compiled OpenCL program file is provided,
// because an OpenCL version upgrade may also lead to kernel
// recompilation.
const std::string file_path = "path/to/opencl_cache_file";
std::shared_ptr<KVStorageFactory> storage_factory(
new FileStorageFactory(file_path));
ConfigKVStorageFactory(storage_factory);
// 2. Declare the device type (must be the same as ``runtime`` in the configuration file)
DeviceType device_type = DeviceType::GPU;
// 3. Define the input and output tensor names.
std::vector<std::string> input_names = {...};
std::vector<std::string> output_names = {...};
// 4. Create MaceEngine instance
std::shared_ptr<mace::MaceEngine> engine;
MaceStatus create_engine_status;
// Create Engine from model file
create_engine_status =
CreateMaceEngineFromProto(model_pb_data,
model_data_file.c_str(),
input_names,
output_names,
device_type,
&engine);
if (create_engine_status != MaceStatus::MACE_SUCCESS) {
// Report error
}
// 5. Create Input and Output tensor buffers
std::map<std::string, mace::MaceTensor> inputs;
std::map<std::string, mace::MaceTensor> outputs;
for (size_t i = 0; i < input_count; ++i) {
// Allocate input and output
int64_t input_size =
std::accumulate(input_shapes[i].begin(), input_shapes[i].end(), 1,
std::multiplies<int64_t>());
auto buffer_in = std::shared_ptr<float>(new float[input_size],
std::default_delete<float[]>());
// Load input here
// ...
inputs[input_names[i]] = mace::MaceTensor(input_shapes[i], buffer_in);
}
for (size_t i = 0; i < output_count; ++i) {
int64_t output_size =
std::accumulate(output_shapes[i].begin(), output_shapes[i].end(), 1,
std::multiplies<int64_t>());
auto buffer_out = std::shared_ptr<float>(new float[output_size],
std::default_delete<float[]>());
outputs[output_names[i]] = mace::MaceTensor(output_shapes[i], buffer_out);
}
// 6. Run the model
MaceStatus status = engine->Run(inputs, &outputs);
More details are in :doc:`advanced_usage`.
\ No newline at end of file
# The name of library
library_name: mobile_squeeze
# host, armeabi-v7a or arm64-v8a
target_abis: [arm64-v8a]
# The build mode for model(s).
# 'code' for transferring model(s) into cpp code, 'file' for keeping model(s) in protobuf file(s) (.pb).
model_graph_format: code
# 'code' for transferring model data(s) into cpp code, 'file' for keeping model data(s) in file(s) (.data).
model_data_format: code
# One yaml config file can contain multi models' deployment info.
models:
mobilenet_v1:
platform: tensorflow
model_file_path: https://cnbj1.fds.api.xiaomi.com/mace/miai-models/mobilenet-v1/mobilenet-v1-1.0.pb
model_sha256_checksum: 71b10f540ece33c49a7b51f5d4095fc9bd78ce46ebf0300487b2ee23d71294e6
subgraphs:
- input_tensors:
- input
input_shapes:
- 1,224,224,3
output_tensors:
- MobilenetV1/Predictions/Reshape_1
output_shapes:
- 1,1001
validation_inputs_data:
- https://cnbj1.fds.api.xiaomi.com/mace/inputs/dog.npy
runtime: cpu+gpu
limit_opencl_kernel_time: 0
nnlib_graph_mode: 0
obfuscate: 0
winograd: 0
squeezenet_v11:
platform: caffe
model_file_path: http://cnbj1-inner-fds.api.xiaomi.net/mace/mace-models/squeezenet/SqueezeNet_v1.1/model.prototxt
weight_file_path: http://cnbj1-inner-fds.api.xiaomi.net/mace/mace-models/squeezenet/SqueezeNet_v1.1/weight.caffemodel
model_sha256_checksum: 625c952063da1569e22d2f499dc454952244d42cd8feca61f05502566e70ae1c
weight_sha256_checksum: 72b912ace512e8621f8ff168a7d72af55910d3c7c9445af8dfbff4c2ee960142
subgraphs:
- input_tensors:
- data
input_shapes:
- 1,227,227,3
output_tensors:
- prob
output_shapes:
- 1,1,1,1000
runtime: cpu+gpu
limit_opencl_kernel_time: 0
nnlib_graph_mode: 0
obfuscate: 0
winograd: 0
\ No newline at end of file
# The name of library
library_name: squeezenet-v10
target_abis: [arm64-v8a]
model_graph_format: file
model_data_format: file
models:
squeezenet-v10: # model tag, which will be used in model loading and must be unique.
platform: caffe
# support local path, http:// and https://
model_file_path: https://cnbj1.fds.api.xiaomi.com/mace/miai-models/squeezenet/squeezenet-v1.0.prototxt
weight_file_path: https://cnbj1.fds.api.xiaomi.com/mace/miai-models/squeezenet/squeezenet-v1.0.caffemodel
# sha256_checksum of your model's graph and data files.
# get the sha256_checksum: sha256sum path/to/your/file
model_sha256_checksum: db680cf18bb0387ded9c8e9401b1bbcf5dc09bf704ef1e3d3dbd1937e772cae0
weight_sha256_checksum: 9ff8035aada1f9ffa880b35252680d971434b141ec9fbacbe88309f0f9a675ce
# define your model's interface
# if there are multiple inputs or outputs, write them like below:
# subgraphs:
# - input_tensors:
# - input0
# - input1
# input_shapes:
# - 1,224,224,3
# - 1,224,224,3
# output_tensors:
# - output0
# - output1
# output_shapes:
# - 1,1001
# - 1,1001
subgraphs:
- input_tensors:
- data
input_shapes:
- 1,227,227,3
output_tensors:
- prob
output_shapes:
- 1,1,1,1000
runtime: cpu+gpu
winograd: 0
# The name of library
library_name: mobilenet
target_abis: [arm64-v8a]
model_graph_format: file
model_data_format: file
models:
mobilenet_v1: # model tag, which will be used in model loading and must be unique.
platform: tensorflow
# path to your tensorflow model's pb file. Support local path, http:// and https://
model_file_path: https://cnbj1.fds.api.xiaomi.com/mace/miai-models/mobilenet-v1/mobilenet-v1-1.0.pb
# sha256_checksum of your model's pb file.
# use this command to get the sha256_checksum: sha256sum path/to/your/pb/file
model_sha256_checksum: 71b10f540ece33c49a7b51f5d4095fc9bd78ce46ebf0300487b2ee23d71294e6
# define your model's interface
# if there are multiple inputs or outputs, write them like below:
# subgraphs:
# - input_tensors:
# - input0
# - input1
# input_shapes:
# - 1,224,224,3
# - 1,224,224,3
# output_tensors:
# - output0
# - output1
# output_shapes:
# - 1,1001
# - 1,1001
subgraphs:
- input_tensors:
- input
input_shapes:
- 1,224,224,3
output_tensors:
- MobilenetV1/Predictions/Reshape_1
output_shapes:
- 1,1001
# cpu, gpu or cpu+gpu
runtime: cpu+gpu
winograd: 0
\ No newline at end of file
......@@ -3,7 +3,7 @@
set -e -u -o pipefail
pushd ../../../
python tools/converter.py build --config=docs/getting_started/models/demo_app_models.yaml
python tools/converter.py build --config=docs/user_guide/models/demo_app_models.yml
cp -r builds/mobilenet/include mace/examples/android/macelibrary/src/main/cpp/
cp -r builds/mobilenet/lib mace/examples/android/macelibrary/src/main/cpp/
......
......@@ -191,7 +191,7 @@ def parse_args():
"""Parses command line arguments."""
parser = argparse.ArgumentParser()
parser.add_argument(
"--platform", type=str, default="", help="Tensorflow or Caffe.")
"--platform", type=str, default="", help="TensorFlow or Caffe.")
parser.add_argument(
"--model_file",
type=str,
......