Commit b2441b02 authored by liuqi

temp commit

Parent 9c408a6c
......@@ -3,11 +3,6 @@ Advanced usage
This part contains the full usage of MACE.
How to build
-------------
=========
Overview
=========
......@@ -104,70 +99,187 @@ in one deployment file.
.. code:: bash
# command for fetching android device's soc info.
adb shell getprop | grep "model\|version.sdk\|manufacturer\|hardware\|platform\|brand"
# Get device's soc info.
adb shell getprop | grep platform
# command for generating sha256_sum
sha256sum /path/to/your/file
=========
Building
=========
==============
Advanced Usage
==============
* **Build static or shared library**
There are two common advanced use cases: 1) converting a model to C++ code, and 2) tuning for a specific SoC when using the GPU.
MACE can build either static or shared library (which is
specified by ``linkshared`` in YAML model deployment file).
The following are the two use cases.
* **Convert model(s) to C++ code**
* **Build well tuned library for specific SoCs**
.. warning::
When ``target_socs`` is specified in YAML model deployment file, the build
tool will enable automatic tuning for GPU kernels. This usually takes some
time to finish depending on the complexity of your model.
If you want to use this case, you can simply use the static MACE library.
.. note::
* **1. Change the model configuration file(.yml)**
1. You should plug in device(s) with the specific SoC(s).
If you want to protect your model, you can convert it to C++ code. There are two cases:
* **Build generic library for all SoCs**
* Convert the model graph to code and the model weights to a file with the model configuration below.
When ``target_socs`` is not specified, the generated library is compatible
with general devices.
.. code:: sh
.. note::
model_graph_format: code
model_data_format: file
1. There will be around a 1 ~ 10% performance drop for the GPU
runtime compared to the well-tuned library.
* Convert both the model graph and the model weights to code with the model configuration below.
* **Build models into file or code**
.. code:: sh
When ``build_type`` is set to ``code``, the model's graph and weights will be embedded into the code.
This is used for model protection.
model_graph_format: code
model_data_format: code
.. note::
1. When ``linkshared`` is set to ``1``, ``build_type`` should be ``proto``,
and currently only Android devices are supported.
2. Another model protection method is using ``obfuscate`` to obfuscate the model operator name.
Another model protection method is to use ``obfuscate`` to obfuscate the model operator names.
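For example, a hypothetical per-model entry in the deployment file could switch it on like this (the ``obfuscate`` field and its placement are assumptions for illustration, not taken from this commit):

.. code:: sh

    # Illustrative sketch: obfuscate operator names for one model
    models:
      mobilenet_v1:
        obfuscate: 1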
* **2. Convert model(s) to code**
.. code:: sh
python tools/converter.py convert --config=/path/to/model_deployment_file.yml
The command will generate **${library_name}.a** in the **builds/${library_name}/model** directory and
the corresponding **.h** headers in **builds/${library_name}/include**, like the directory tree below.
.. code::
builds
├── include
│   └── mace
│   └── public
│   ├── mace_engine_factory.h
│   └── mobilenet_v1.h
└── model
   ├── mobilenet-v1.a
   └── mobilenet_v1.data
* **3. Deployment**
* Link `libmace.a` and `${library_name}.a` to your target.
Please refer to ``mace/examples/example.cc`` for full usage. The following lists the key steps.
.. code:: cpp
// Include the headers
#include "mace/public/mace.h"
#include "mace/public/mace_runtime.h"
// If the model_graph_format is code
#include "mace/public/${model_name}.h"
#include "mace/public/mace_engine_factory.h"
// ... Same with the code in basic usage
// 4. Create MaceEngine instance
std::shared_ptr<mace::MaceEngine> engine;
MaceStatus create_engine_status;
// Create Engine from compiled code
create_engine_status =
CreateMaceEngineFromCode(model_name.c_str(),
nullptr,
input_names,
output_names,
device_type,
&engine);
if (create_engine_status != MaceStatus::MACE_SUCCESS) {
// Report error
}
// ... Same with the code in basic usage
* **Tuning for a specific SoC's GPU**
If you want to use the GPU of a specific device, you can specify ``target_socs`` and
tune for that SoC. This may bring a 1~10% performance improvement.
* **1. Change the model configuration file(.yml)**
Specify ``target_socs`` in your model configuration file (.yml):
.. code:: sh
target_socs: [sdm845]
.. note::
**Commands**
Get device's soc info: `adb shell getprop | grep platform`
* **build library and test tools**
* **2. Convert model(s)**
.. code:: sh
# Build library
python tools/converter.py build --config=/path/to/model_deployment_file.yml
python tools/converter.py convert --config=/path/to/model_deployment_file.yml
* **3. Tuning**
tools/converter.py will enable automatic tuning for GPU kernels. This usually takes some
time to finish, depending on the complexity of your model.
.. note::
You should plug in device(s) with the specific SoC(s).
* **run the model**
.. code:: sh
python tools/converter.py run --config=/path/to/model_deployment_file.yml --validate
The command will generate two files in `builds/${library_name}/opencl`, as shown below.
.. code::
builds
└── mobilenet-v2
├── model
│   ├── mobilenet_v2.data
│   └── mobilenet_v2.pb
└── opencl
└── arm64-v8a
   ├── mobilenet-v2_compiled_opencl_kernel.MiNote3.sdm660.bin
   └── mobilenet-v2_tuned_opencl_parameter.MiNote3.sdm660.bin
* **mobilenet-v2_compiled_opencl_kernel.MiNote3.sdm660.bin** contains the compiled OpenCL binaries
used for your models, which can accelerate the initialization stage.
For details, please refer to the `OpenCL Specification <https://www.khronos.org/registry/OpenCL/sdk/1.0/docs/man/xhtml/clCreateProgramWithBinary.html>`__.
* **mobilenet-v2_tuned_opencl_parameter.MiNote3.sdm660.bin** contains the tuned OpenCL parameters
for the SoC.
* **4. Deployment**
* Rename the files generated above to avoid name collisions and push them to **your own directory on the device** (a sketch follows the code below).
* Usage is similar to the previous procedure; the key differing steps are listed below.
.. code:: cpp
// Include the headers
#include "mace/public/mace.h"
#include "mace/public/mace_runtime.h"
// 0. Set pre-compiled OpenCL binary program file paths and OpenCL parameters file path when available
if (device_type == DeviceType::GPU) {
mace::SetOpenCLBinaryPaths(path/to/opencl_binary_paths);
mace::SetOpenCLParameterPath(path/to/opencl_parameter_file);
}
// ... Same with the code in basic usage.
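As referenced in step 4 above, here is a minimal sketch of renaming and pushing the tuned OpenCL files; the new file names and the device directory ``/data/local/tmp/mace`` are only examples:

.. code:: sh

    # Rename the generated files (names are illustrative) to avoid collisions
    cp builds/mobilenet-v2/opencl/arm64-v8a/mobilenet-v2_compiled_opencl_kernel.MiNote3.sdm660.bin \
       mobilenet_v2_compiled_opencl_kernel.bin
    cp builds/mobilenet-v2/opencl/arm64-v8a/mobilenet-v2_tuned_opencl_parameter.MiNote3.sdm660.bin \
       mobilenet_v2_tuned_opencl_parameter.bin
    # Push them to a directory of your choice on the device
    adb shell mkdir -p /data/local/tmp/mace
    adb push mobilenet_v2_compiled_opencl_kernel.bin /data/local/tmp/mace/
    adb push mobilenet_v2_tuned_opencl_parameter.bin /data/local/tmp/mace/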
===============
Useful Commands
===============
* **run the model**
.. code:: sh
# Test model run time
python tools/converter.py run --config=/path/to/model_deployment_file.yml --round=100
......@@ -182,21 +294,21 @@ Building
kill %1
.. warning::
.. warning::
``run`` relies on the ``build`` command; you should ``run`` after ``build``.
``run`` relies on the ``convert`` command; you should ``run`` after ``convert``.
* **benchmark and profiling model**
* **benchmark and profiling model**
.. code:: sh
.. code:: sh
# Benchmark model, get detailed statistics of each Op.
python tools/converter.py benchmark --config=/path/to/model_deployment_file.yml
.. warning::
.. warning::
``benchmark`` relies on the ``build`` command; you should ``benchmark`` after ``build``.
``benchmark`` relies on the ``convert`` command; you should ``benchmark`` after ``convert``.
**Common arguments**
......@@ -242,183 +354,3 @@ Use ``-h`` to get detailed help.
python tools/converter.py build -h
python tools/converter.py run -h
python tools/converter.py benchmark -h
How to deploy
--------------
=========
Overview
=========
``build`` command will generate the static/shared library, model files and
header files and package them as
``build/${library_name}/libmace_${library_name}.tar.gz``.
- The generated ``static`` libraries are organized as follows,
.. code::
build/
└── mobilenet-v2-gpu
├── include
│   └── mace
│   └── public
│   ├── mace.h
│   └── mace_runtime.h
│   └── mace_engine_factory.h (Only exists if ``build_type`` is set to ``code``)
├── libmace_mobilenet-v2-gpu.tar.gz
├── lib
│   ├── arm64-v8a
│   │   └── libmace_mobilenet-v2-gpu.MI6.msm8998.a
│   └── armeabi-v7a
│   └── libmace_mobilenet-v2-gpu.MI6.msm8998.a
├── model
│   ├── mobilenet_v2.data
│   └── mobilenet_v2.pb
└── opencl
├── arm64-v8a
│   └── mobilenet-v2-gpu_compiled_opencl_kernel.MI6.msm8998.bin
└── armeabi-v7a
└── mobilenet-v2-gpu_compiled_opencl_kernel.MI6.msm8998.bin
- The generated ``shared`` libraries are organized as follows,
.. code::
build
└── mobilenet-v2-gpu
├── include
│   └── mace
│   └── public
│   ├── mace.h
│   └── mace_runtime.h
| └── mace_engine_factory.h (Only exists if ``build_type`` set to ``code``)
├── lib
│   ├── arm64-v8a
│   │   ├── libgnustl_shared.so
│   │   └── libmace.so
│   └── armeabi-v7a
│   ├── libgnustl_shared.so
│   └── libmace.so
├── model
│   ├── mobilenet_v2.data
│   └── mobilenet_v2.pb
└── opencl
├── arm64-v8a
│   └── mobilenet-v2-gpu_compiled_opencl_kernel.MI6.msm8998.bin
└── armeabi-v7a
└── mobilenet-v2-gpu_compiled_opencl_kernel.MI6.msm8998.bin
.. note::
1. DSP runtime depends on ``libhexagon_controller.so``.
2. ``${MODEL_TAG}.pb`` file will be generated only when ``build_type`` is ``proto``.
3. ``${library_name}_compiled_opencl_kernel.${device_name}.${soc}.bin`` will
be generated only when ``target_socs`` and ``gpu`` runtime are specified.
4. Generated shared library depends on ``libgnustl_shared.so``.
5. Files in opencl folder will be generated only if
``target_soc`` was set and ``runtime`` contains ``gpu`` in the deployment file.
6. When ``build_type`` has been set to ``code``, ${library_name}.h and mace_engine_factory.h
will be generated in the ``include`` folder. These header files are used to create the MaceEngine for your model.
.. warning::
``${library_name}_compiled_opencl_kernel.${device_name}.${soc}.bin`` depends
on the OpenCL version of the device, so you should maintain compatibility or
configure compiling cache store with ``ConfigKVStorageFactory``.
===========
Deployment
===========
Unpack the generated libmace_${library_name}.tar.gz file and copy all of the uncompressed files into your project.
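A minimal sketch of this step, assuming the archive lives at the path produced by ``build`` and that ``/path/to/your/project`` is your project root:

.. code:: sh

    # Extract the packaged library, headers and model files into your project
    tar -xzvf build/${library_name}/libmace_${library_name}.tar.gz -C /path/to/your/project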
Please refer to ``mace/examples/example.cc`` for full usage. The following lists the key steps.
.. code:: cpp
// Include the headers
#include "mace/public/mace.h"
#include "mace/public/mace_runtime.h"
// If the build_type is code
#include "mace/public/mace_engine_factory.h"
// 0. Set pre-compiled OpenCL binary program file paths when available
if (device_type == DeviceType::GPU) {
mace::SetOpenCLBinaryPaths(opencl_binary_paths);
}
// 1. Set compiled OpenCL kernel cache, this is used to reduce the
// initialization time since the compiling is too slow. It's suggested
// to set this even when pre-compiled OpenCL program file is provided
// because an OpenCL version upgrade may also lead to kernel
// recompilations.
const std::string file_path = "path/to/opencl_cache_file";
std::shared_ptr<KVStorageFactory> storage_factory(
new FileStorageFactory(file_path));
ConfigKVStorageFactory(storage_factory);
// 2. Declare the device type (must be same with ``runtime`` in configuration file)
DeviceType device_type = DeviceType::GPU;
// 3. Define the input and output tensor names.
std::vector<std::string> input_names = {...};
std::vector<std::string> output_names = {...};
// 4. Create MaceEngine instance
std::shared_ptr<mace::MaceEngine> engine;
MaceStatus create_engine_status;
// Create Engine from compiled code
create_engine_status =
CreateMaceEngineFromCode(model_name.c_str(),
nullptr,
input_names,
output_names,
device_type,
&engine);
// Create Engine from model file
create_engine_status =
CreateMaceEngineFromProto(model_pb_data,
model_data_file.c_str(),
input_names,
output_names,
device_type,
&engine);
if (create_engine_status != MaceStatus::MACE_SUCCESS) {
// Report error
}
// 5. Create Input and Output tensor buffers
std::map<std::string, mace::MaceTensor> inputs;
std::map<std::string, mace::MaceTensor> outputs;
for (size_t i = 0; i < input_count; ++i) {
// Allocate input and output
int64_t input_size =
std::accumulate(input_shapes[i].begin(), input_shapes[i].end(), 1,
std::multiplies<int64_t>());
auto buffer_in = std::shared_ptr<float>(new float[input_size],
std::default_delete<float[]>());
// Load input here
// ...
inputs[input_names[i]] = mace::MaceTensor(input_shapes[i], buffer_in);
}
for (size_t i = 0; i < output_count; ++i) {
int64_t output_size =
std::accumulate(output_shapes[i].begin(), output_shapes[i].end(), 1,
std::multiplies<int64_t>());
auto buffer_out = std::shared_ptr<float>(new float[output_size],
std::default_delete<float[]>());
outputs[output_names[i]] = mace::MaceTensor(output_shapes[i], buffer_out);
}
// 6. Run the model
MaceStatus status = engine->Run(inputs, &outputs);
......@@ -42,7 +42,8 @@ Here we use the mobilenet-v2 model as an example.
cd path/to/mace
# Build library
python tools/converter.py build --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml
# output lib path: builds/lib
bash tools/build-standalone-lib.sh
4. Convert the model to MACE format.
......@@ -51,11 +52,15 @@ Here we use the mobilenet-v2 model as an example.
cd path/to/mace
# Build library
python tools/converter.py build --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml
python tools/converter.py convert --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml
5. Run the model.
.. warning::
If you want to run on device/phone, please plug in at least one device/phone.
.. code:: sh
# Test model run time
......@@ -160,8 +165,8 @@ The generated model files will be stored in ``build/${library_name}/model`` fold
.. warning::
Please set ``build_type:proto`` in your deployment file before converting.
The usage of ``build_type:code`` will be demonstrated in :doc:`advanced_usage`.
Please set ``model_graph_format: file`` and ``model_data_format: file`` in your deployment file before converting.
The usage of ``model_graph_format: code`` will be demonstrated in :doc:`advanced_usage`.
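In other words, the deployment file for this basic flow should contain:

.. code:: sh

    model_graph_format: file
    model_data_format: file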
=============================
4. Build MACE into a library
......@@ -173,14 +178,14 @@ Use bazel to build MACE source code into a library.
cd path/to/mace
# Build library
bazel build --config android mace:libmace --define neon=true --define openmp=true --cpu=arm64-v8a
# output lib path: builds/lib
bash tools/build-standalone-lib.sh
The above command will generate a library as ``bazel-bin/mace/libmace.so``.
The above command will generate the dynamic library ``builds/lib/${ABI}/libmace.so`` and the static library ``builds/lib/${ABI}/libmace.a``.
.. warning::
1. Please verify that the ``target_abis`` param in the above command and in your deployment file are the same.
2. If you want to build a library for a specific SoC, please refer to :doc:`advanced_usage`.
==================
......@@ -190,6 +195,10 @@ The above command will generate a library as ``bazel-bin/mace/libmace.so``.
With the converted model, the static or shared library and header files, you can use the following commands
to run and validate your model.
.. warning::
If you want to run on device/phone, please plug in at least one device/phone.
* **run**
run the model.
......@@ -218,8 +227,7 @@ to run and validate your model.
=======================================
In the converting and building steps, you've got the static/shared library, model files and
header files. All of these generated files have been packaged into
``build/${library_name}/libmace_${library_name}.tar.gz`` when building.
header files.
``${library_name}`` is the name you defined in the first line of your deployment YAML file.
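For example, the top of a deployment file typically looks like this (the value is illustrative):

.. code:: sh

    # The name of library
    library_name: mobilenet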
......@@ -227,34 +235,7 @@ header files. All of these generated files have been packaged into
.. code::
build/
└── mobilenet-v2
├── include
│   └── mace
│   └── public
│   ├── mace.h
│   └── mace_runtime.h
├── libmace_mobilenet-v2.tar.gz
├── lib
│   ├── arm64-v8a
│   │   └── libmace_mobilenet-v2.MI6.msm8998.a
│   └── armeabi-v7a
│   └── libmace_mobilenet-v2.MI6.msm8998.a
├── model
│   ├── mobilenet_v2.data
│   └── mobilenet_v2.pb
└── opencl
├── arm64-v8a
│   └── mobilenet-v2_compiled_opencl_kernel.MI6.msm8998.bin
└── armeabi-v7a
└── mobilenet-v2_compiled_opencl_kernel.MI6.msm8998.bin
- The generated ``shared`` library files are organized as follows,
.. code::
build
└── mobilenet-v2
builds
├── include
│   └── mace
│   └── public
......@@ -262,22 +243,23 @@ header files. All of these generated files have been packaged into
│   └── mace_runtime.h
├── lib
│   ├── arm64-v8a
│   │   ├── libgnustl_shared.so
│   │   ├── libmace.a
│   │   └── libmace.so
│   └── armeabi-v7a
│   ├── libgnustl_shared.so
│   ├── armeabi-v7a
│   │   ├── libhexagon_controller.so
│   │   ├── libmace.a
│   │   └── libmace.so
│   └── linux-x86-64
│   ├── libmace.a
│   └── libmace.so
└── mobilenet-v1
├── model
│   ├── mobilenet_v2.data
│   └── mobilenet_v2.pb
└── opencl
├── arm64-v8a
│   └── mobilenet-v2_compiled_opencl_kernel.MI6.msm8998.bin
└── armeabi-v7a
└── mobilenet-v2_compiled_opencl_kernel.MI6.msm8998.bin
│   ├── mobilenet_v1.data
│   └── mobilenet_v1.pb
└── _tmp
└── arm64-v8a
└── mace_run_static
Unpack the generated libmace_${library_name}.tar.gz file and copy all of the uncompressed files into your project.
Please refer to ``mace/examples/example.cc`` for full usage. The following lists the key steps.
......
......@@ -16,9 +16,6 @@ Example
----------
Here is an example deployment file used by an Android demo application.
TODO: change this example file to the demo deployment file
(reuse the same file) and rename to a reasonable name.
.. literalinclude:: models/demo_app_models.yml
:language: yaml
......@@ -34,12 +31,10 @@ Configurations
- The target ABI to build, can be one or more of 'host', 'armeabi-v7a' or 'arm64-v8a'.
* - target_socs
- [optional] Build for the specified SoCs if you just want to use the model on those SoCs.
* - embed_model_data
- Whether to embed the model weights as code, default to 0.
* - build_type
- model build type, can be ['proto', 'code']. 'proto' for converting model to ProtoBuf file and 'code' for converting model to c++ code.
* - linkshared
- [optional] Use dynamic linking for libmace library when setting to 1, or static linking when setting to 0, default to 0.
* - model_graph_format
- MACE model graph format, can be ['file', 'code']. 'file' for converting the model to a ProtoBuf (`.pb`) file and 'code' for converting the model to C++ code.
* - model_data_format
- MACE model data format, can be ['file', 'code']. 'file' for converting the model data to a `.data` file and 'code' for converting the model data to C++ code.
* - model_name
- model name, should be unique if there are multiple models.
**LIMIT: if build_type is code, model_name will be used in C++ code, so model_name must be a valid C++ identifier.**
......
......@@ -2,13 +2,11 @@
library_name: mobile_squeeze
# host, armeabi-v7a or arm64-v8a
target_abis: [arm64-v8a]
# set 1 to embed model weights data into code. default is 0, keep weights in model.data file
embed_model_data: 1
# The build mode for model(s).
# 'code' for transferring model(s) into cpp code, 'proto' for keeping model(s) in protobuf file(s).
build_type: code
# 0 for static library, 1 for shared library.
linkshared: 0
# 'code' for transferring model(s) into cpp code, 'file' for keeping model(s) in protobuf file(s) (.pb).
model_graph_format: code
# 'code' for transferring model data(s) into cpp code, 'file' for keeping model data(s) in file(s) (.data).
model_data_format: code
# One YAML config file can contain multiple models' deployment info.
models:
mobilenet_v1:
......
# The name of library
library_name: squeezenet-v10
target_abis: [arm64-v8a]
embed_model_data: 0
build_type: proto
linkshared: 1
model_graph_format: file
model_data_format: file
models:
squeezenet-v10: # model tag, which will be used in model loading and must be specific.
platform: caffe
......@@ -15,10 +14,28 @@ models:
model_sha256_checksum: db680cf18bb0387ded9c8e9401b1bbcf5dc09bf704ef1e3d3dbd1937e772cae0
weight_sha256_checksum: 9ff8035aada1f9ffa880b35252680d971434b141ec9fbacbe88309f0f9a675ce
# define your model's interface
# if there are multiple inputs or outputs, write them as below:
# subgraphs:
# - input_tensors:
# - input0
# - input1
# input_shapes:
# - 1,224,224,3
# - 1,224,224,3
# output_tensors:
# - output0
# - output1
# output_shapes:
# - 1,1001
# - 1,1001
subgraphs:
- input_tensors: data
input_shapes: 1,227,227,3
output_tensors: prob
output_shapes: 1,1,1,1000
- input_tensors:
- data
input_shapes:
- 1,227,227,3
output_tensors:
- prob
output_shapes:
- 1,1,1,1000
runtime: cpu+gpu
winograd: 0
# The name of library
library_name: mobilenet
target_abis: [arm64-v8a]
embed_model_data: 0
build_type: proto
linkshared: 1
model_graph_format: file
model_data_format: file
models:
mobilenet_v1: # model tag, which will be used in model loading and must be specific.
platform: tensorflow
......@@ -13,11 +12,29 @@ models:
# use this command to get the sha256_checksum: sha256sum path/to/your/pb/file
model_sha256_checksum: 71b10f540ece33c49a7b51f5d4095fc9bd78ce46ebf0300487b2ee23d71294e6
# define your model's interface
# if there are multiple inputs or outputs, write them as below:
# subgraphs:
# - input_tensors:
# - input0
# - input1
# input_shapes:
# - 1,224,224,3
# - 1,224,224,3
# output_tensors:
# - output0
# - output1
# output_shapes:
# - 1,1001
# - 1,1001
subgraphs:
- input_tensors: input
input_shapes: 1,224,224,3
output_tensors: MobilenetV1/Predictions/Reshape_1
output_shapes: 1,1001
- input_tensors:
- input
input_shapes:
- 1,224,224,3
output_tensors:
- MobilenetV1/Predictions/Reshape_1
output_shapes:
- 1,1001
# cpu, gpu or cpu+gpu
runtime: cpu+gpu
winograd: 0
\ No newline at end of file