Basic usage
============

Build and run an example model
-------------------------------

First, make sure the environment has been set up correctly (refer to :doc:`../installation/env_requirement`).

The following instructions show how to quickly build and run a provided model from the
`MACE Model Zoo <https://github.com/XiaoMi/mace-models>`__ project.

Here we use the mobilenet-v2 model as an example.

**Commands**

    1. Pull `MACE <https://github.com/XiaoMi/mace>`__ project.

    .. code:: sh

        git clone https://github.com/XiaoMi/mace.git
        git fetch --all --tags --prune

        # Checkout the latest tag (i.e. release version)
        tag_name=`git describe --abbrev=0 --tags`
        git checkout tags/${tag_name}

    .. note::

        It's highly recommended to use a release version instead of the master branch.


    2. Pull `MACE Model Zoo <https://github.com/XiaoMi/mace-models>`__ project.

    .. code:: sh

        git clone https://github.com/XiaoMi/mace-models.git


    3. Build a generic MACE library.

    .. code:: sh

        cd path/to/mace
        # Build library
        # output lib path: builds/lib
        bash tools/build-standalone-lib.sh

    .. note::

        - Libraries in ``builds/lib/armeabi-v7a/cpu_gpu/`` can run on ``cpu`` or ``gpu`` devices.

        - Libraries in ``builds/lib/armeabi-v7a/cpu_gpu_dsp/`` require HVX support.


    4. Convert the pre-trained mobilenet-v2 model to MACE format.

    .. code:: sh

        cd path/to/mace
        # Convert the model
        python tools/converter.py convert --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml


    5. Run the model.

    .. note::

        If you want to run on a device/phone, please plug in at least one device/phone.

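    If you are running on an Android phone, you can first check that the device is visible to
    ``adb`` (``adb`` is part of the Android SDK platform tools, not a MACE tool; this is only a
    quick sanity check):

    .. code:: sh

        # List attached devices; the phone should appear with the "device" state.
        adb devices
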
    .. code:: sh

        # Run example
        python tools/converter.py run --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml --example

        # Test model run time
        python tools/converter.py run --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml --round=100

        # Validate the correctness by comparing the results against the
        # original model and framework, measured with cosine distance for similarity.
        python tools/converter.py run --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml --validate


Build your own model
---------------------

This part will show you how to use your own pre-trained model in MACE.

======================
1. Prepare your model
======================
MACE now supports models from TensorFlow and Caffe (more frameworks will be supported).

-  TensorFlow

   Prepare your pre-trained TensorFlow model as a ``.pb`` file.

   Use the `Graph Transform Tool <https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/README.md>`__
   to optimize your model for inference. This tool improves inference efficiency through several
   optimizations, such as operator folding and redundant node removal. We strongly recommend
   using it before building.

   Usage for CPU/GPU,

   .. code:: bash

       # CPU/GPU:
       ./transform_graph \
           --in_graph=/path/to/your/tf_model.pb \
           --out_graph=/path/to/your/output/tf_model_opt.pb \
           --inputs='input node name' \
           --outputs='output node name' \
           --transforms='strip_unused_nodes(type=float, shape="1,64,64,3")
               strip_unused_nodes(type=float, shape="1,64,64,3")
               remove_nodes(op=Identity, op=CheckNumerics)
               fold_constants(ignore_errors=true)
               flatten_atrous_conv
               fold_batch_norms
               fold_old_batch_norms
               remove_control_dependencies
               strip_unused_nodes
               sort_by_execution_order'

   Usage for DSP,

   .. code:: bash

       # DSP:
       ./transform_graph \
           --in_graph=/path/to/your/tf_model.pb \
           --out_graph=/path/to/your/output/tf_model_opt.pb \
           --inputs='input node name' \
           --outputs='output node name' \
           --transforms='strip_unused_nodes(type=float, shape="1,64,64,3")
               strip_unused_nodes(type=float, shape="1,64,64,3")
               remove_nodes(op=Identity, op=CheckNumerics)
               fold_constants(ignore_errors=true)
               fold_batch_norms
               fold_old_batch_norms
               backport_concatv2
               quantize_weights(minimum_size=2)
               quantize_nodes
               remove_control_dependencies
               strip_unused_nodes
               sort_by_execution_order'

-  Caffe

   Caffe 1.0+ models are supported by the MACE converter tool.

   If your model comes from a lower Caffe version, you need to upgrade it with the Caffe built-in tools before converting.

   .. code:: bash

       # Upgrade prototxt
       $CAFFE_ROOT/build/tools/upgrade_net_proto_text MODEL.prototxt MODEL.new.prototxt

       # Upgrade caffemodel
       $CAFFE_ROOT/build/tools/upgrade_net_proto_binary MODEL.caffemodel MODEL.new.caffemodel

===========================================
2. Create a deployment file for your model
===========================================
When converting a model or building a library, MACE needs to read a YAML file, referred to here as the model deployment file.

A model deployment file contains all the information about your model(s) and the build options. There are several example
deployment files in the *MACE Model Zoo* project.

The following shows two basic examples of deployment files for TensorFlow and Caffe models.
Modify one of them and use it for your own case.

-  TensorFlow

   .. literalinclude:: models/demo_models_tf.yml
      :language: yaml

-  Caffe

   .. literalinclude:: models/demo_models_caffe.yml
      :language: yaml

More details about the model deployment file are in :doc:`advanced_usage`.

======================
3. Convert your model
======================

When the deployment file is ready, you can use the MACE converter tool to convert your model(s).

.. code:: bash

    python tools/converter.py convert --config=/path/to/your/model_deployment_file.yml

This command will download or load your pre-trained model and convert it to a MACE model proto file and weights data file.
The generated model files will be stored in the ``build/${library_name}/model`` folder.

.. warning::

    Please set ``model_graph_format: file`` and ``model_data_format: file`` in your deployment file before converting.
    The usage of ``model_graph_format: code`` will be demonstrated in :doc:`advanced_usage`.

=============================
4. Build MACE into a library
=============================

You can download the prebuilt MACE library from the `GitHub MACE release page <https://github.com/XiaoMi/mace/releases>`__.

Or use Bazel to build the MACE source code into a library.

    .. code:: sh

        cd path/to/mace
        # Build library
        # output lib path: builds/lib
        bash tools/build-standalone-lib.sh

The above command will generate the dynamic library ``builds/lib/${ABI}/${DEVICES}/libmace.so`` and the static library ``builds/lib/${ABI}/${DEVICES}/libmace.a``.

    .. warning::

        Please verify that the ``target_abis`` param in the above command and in your deployment file are the same.

==================
5. Run your model
==================

With the converted model, the static or shared library, and the header files, you can use the following commands
to run and validate your model.

    .. warning::

        If you want to run on a device/phone, please plug in at least one device/phone.

* **run**

    Run the model.

    .. code:: sh

        # Test model run time
        python tools/converter.py run --config=/path/to/your/model_deployment_file.yml --round=100

        # Validate the correctness by comparing the results against the
        # original model and framework, measured with cosine distance for similarity.
        python tools/converter.py run --config=/path/to/your/model_deployment_file.yml --validate

* **benchmark**

    Benchmark and profile the model.

    .. code:: sh

        # Benchmark model, get detailed statistics of each Op.
        python tools/converter.py benchmark --config=/path/to/your/model_deployment_file.yml

=======================================
6. Deploy your model into applications
=======================================

You can run the model on CPU, GPU, or DSP (based on the ``runtime`` set in your model deployment file).
However, there are some differences between devices.

* **CPU**

    Almost all mobile SoCs use an ARM-based CPU architecture, so in theory your model can run on different SoCs.

* **GPU**

    Although most GPUs use the OpenCL standard, some SoCs do not fully comply with it,
    or their GPU is too limited to use. So you should have a fallback strategy for when the GPU run fails
    (see the sketch after this list).

* **DSP**

    MACE only supports the Qualcomm DSP.

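The GPU caveat above is why a fallback matters in practice. Below is a minimal C++ sketch of such a
fallback, reusing ``CreateMaceEngineFromProto`` as shown in the key-steps listing later in this
section; the retry policy itself is illustrative, not a fixed MACE API:

.. code:: cpp

    // Try the GPU first, then fall back to CPU if engine creation fails.
    DeviceType device_type = DeviceType::GPU;
    MaceStatus create_engine_status =
        CreateMaceEngineFromProto(model_pb_data,
                                  model_data_file.c_str(),
                                  input_names,
                                  output_names,
                                  device_type,
                                  &engine);
    if (create_engine_status != MaceStatus::MACE_SUCCESS) {
      // CPU is available on effectively all devices, so retry there.
      device_type = DeviceType::CPU;
      create_engine_status =
          CreateMaceEngineFromProto(model_pb_data,
                                    model_data_file.c_str(),
                                    input_names,
                                    output_names,
                                    device_type,
                                    &engine);
    }
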

From the converting and building steps, you have obtained the static/shared library, the model files,
and the header files.

``${library_name}`` is the name you defined in the first line of your deployment YAML file.

.. note::

    When linking the generated ``libmace.a`` into a shared library, a
    `version script <ftp://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_node/ld_25.html>`__
    is helpful for reducing a specified set of symbols to local scope (see the sketch below).

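A minimal sketch of that approach, assuming a GNU ld compatible linker; the file name ``mace.lds``,
the ``${CXX}`` placeholder, and the exported symbol pattern are illustrative and should be adapted
to your project:

.. code:: sh

    # Write a version script that keeps mace::* symbols global and hides everything else.
    cat > mace.lds <<'EOF'
    {
      global:
        extern "C++" {
          mace::*;
        };
      local:
        *;
    };
    EOF

    # Link libmace.a into your own shared library with the version script applied.
    ${CXX} -shared -o libyourapp.so your_objects.o \
        builds/lib/arm64-v8a/cpu_gpu/libmace.a \
        -Wl,--version-script=mace.lds
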

-  The generated library files are organized as follows,

.. code::

L
liuqi 已提交
297 298 299 300 301 302 303 304
    builds
    ├── include
    │   └── mace
    │       └── public
    │           ├── mace.h
    │           └── mace_runtime.h
    ├── lib
    │   ├── arm64-v8a
Y
yejianwu 已提交
305 306 307
    │   │   └── cpu_gpu
    │   │       ├── libmace.a
    │   │       └── libmace.so
L
liuqi 已提交
308
    │   ├── armeabi-v7a
Y
yejianwu 已提交
309 310 311 312 313 314 315
    │   │   ├── cpu_gpu
    │   │   │   ├── libmace.a
    │   │   │   └── libmace.so
    │   │   └── cpu_gpu_dsp
    │   │       ├── libhexagon_controller.so
    │   │       ├── libmace.a
    │   │       └── libmace.so
    │   └── linux-x86-64
    │       ├── libmace.a
    │       └── libmace.so
    └── mobilenet-v1
        ├── model
        │   ├── mobilenet_v1.data
        │   └── mobilenet_v1.pb
        └── _tmp
            └── arm64-v8a
                └── mace_run_static

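To build your own application against these artifacts, point the compiler at the generated headers
and libraries. The following is a sketch only; the toolchain command, ABI directory, and file names
are illustrative:

.. code:: sh

    # Compile and link an application against the generated MACE headers and shared library.
    aarch64-linux-android-clang++ your_app.cc \
        -I builds/include \
        -L builds/lib/arm64-v8a/cpu_gpu \
        -lmace \
        -o your_app
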

Please refer to ``mace/examples/example.cc`` for full usage. The following lists the key steps.

.. code:: cpp

    // Include the headers
    #include "mace/public/mace.h"
    #include "mace/public/mace_runtime.h"

    // 0. Set the compiled OpenCL kernel cache. This is used to reduce the
    // initialization time, since compiling kernels is slow. It's suggested
    // to set this even when a pre-compiled OpenCL program file is provided,
    // because an OpenCL version upgrade may also lead to kernel
    // recompilation.
    const std::string file_path = "path/to/opencl_cache_file";
    std::shared_ptr<KVStorageFactory> storage_factory(
        new FileStorageFactory(file_path));
    ConfigKVStorageFactory(storage_factory);

    // 1. Declare the device type (must be the same as ``runtime`` in the configuration file)
    DeviceType device_type = DeviceType::GPU;

    // 2. Define the input and output tensor names.
    std::vector<std::string> input_names = {...};
    std::vector<std::string> output_names = {...};

    // 3. Create MaceEngine instance
    std::shared_ptr<mace::MaceEngine> engine;
    MaceStatus create_engine_status;

    // Create Engine from model file
    create_engine_status =
        CreateMaceEngineFromProto(model_pb_data,
                                  model_data_file.c_str(),
                                  input_names,
                                  output_names,
                                  device_type,
                                  &engine);
    if (create_engine_status != MaceStatus::MACE_SUCCESS) {
      // Fall back to another strategy, e.g. retry engine creation with CPU.
    }

    // 4. Create Input and Output tensor buffers
    std::map<std::string, mace::MaceTensor> inputs;
    std::map<std::string, mace::MaceTensor> outputs;
    for (size_t i = 0; i < input_count; ++i) {
      // Allocate input and output
      int64_t input_size =
          std::accumulate(input_shapes[i].begin(), input_shapes[i].end(), 1,
                          std::multiplies<int64_t>());
      auto buffer_in = std::shared_ptr<float>(new float[input_size],
                                              std::default_delete<float[]>());
      // Load input here
      // ...

      inputs[input_names[i]] = mace::MaceTensor(input_shapes[i], buffer_in);
    }

    for (size_t i = 0; i < output_count; ++i) {
      int64_t output_size =
          std::accumulate(output_shapes[i].begin(), output_shapes[i].end(), 1,
                          std::multiplies<int64_t>());
      auto buffer_out = std::shared_ptr<float>(new float[output_size],
                                               std::default_delete<float[]>());
      outputs[output_names[i]] = mace::MaceTensor(output_shapes[i], buffer_out);
    }

    // 5. Run the model
    MaceStatus status = engine->Run(inputs, &outputs);

More details are in :doc:`advanced_usage`.