Basic usage for Bazel users
============================

Build and run an example model
-------------------------------

First, make sure the environment has been set up correctly
(refer to :doc:`../installation/env_requirement`).

The following instructions show how to quickly build and run a provided model from the
`MACE Model Zoo <https://github.com/XiaoMi/mace-models>`__.
Here we use the mobilenet-v2 model as an example.

**Commands**

1. Pull the `MACE <https://github.com/XiaoMi/mace>`__ project.

   .. code-block:: sh

      git clone https://github.com/XiaoMi/mace.git
      cd mace/
      git fetch --all --tags --prune

      # Checkout the latest tag (i.e. release version)
      tag_name=`git describe --abbrev=0 --tags`
      git checkout tags/${tag_name}

   .. note::

      It's highly recommended to use a release version instead of the master branch.

2. Pull the `MACE Model Zoo <https://github.com/XiaoMi/mace-models>`__ project.

   .. code-block:: sh

      git clone https://github.com/XiaoMi/mace-models.git

3. Build a generic MACE library.

   .. code-block:: sh

      cd path/to/mace
      # Build library
      # output lib path: build/lib
      bash tools/bazel_build_standalone_lib.sh [-abi=abi][-runtimes=rt1,rt2,...][-static]

   .. note::

      - This step can be skipped if you just want to run a model using ``tools/converter.py``, such as the commands in step 5.
      - Use the ``-abi`` parameter to specify the ABI. Supported ABIs are armeabi-v7a, arm64-v8a, arm_linux_gnueabihf, aarch64_linux_gnu and host (for the host machine, linux-x86-64). The default ABI is arm64-v8a.
      - For each ABI, several runtimes can be chosen by specifying the ``-runtimes`` parameter. Supported runtimes are CPU, GPU, DSP and APU. By default, the library is built to run on CPU.
      - Omit the ``-static`` option if a shared library is desired instead of a static one. By default, a shared library is built.
      - See ``bash tools/bazel_build_standalone_lib.sh -help`` for detailed information.
      - DO respect the hyphens ('-') and the underscores ('_') in the ABI names.

4. Convert the pre-trained mobilenet-v2 model to MACE format.

   .. code-block:: sh

      cd path/to/mace
      # Convert model
      python tools/converter.py convert --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml

5. Run the model.

   .. note::

      If you want to run on a phone, please plug in at least one phone.
      If you want to run on an embedded device, please refer to :doc:`advanced_usage`.

   .. code-block:: sh

      # Run
      python tools/converter.py run --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml

      # Test model run time
      python tools/converter.py run --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml --round=100

      # Validate the correctness by comparing the results against the
      # original model and framework, measured with cosine distance for similarity.
      python tools/converter.py run --config=/path/to/mace-models/mobilenet-v2/mobilenet-v2.yml --validate

Build your own model
---------------------

This part shows how to use your own pre-trained model in MACE.

======================
1. Prepare your model
======================

MACE now supports models from TensorFlow, Caffe and ONNX (more frameworks will be supported).

- TensorFlow

  Prepare your pre-trained TensorFlow ``model.pb`` file.

- Caffe

  Caffe 1.0+ models are supported by the MACE converter tool.

  If your model is from a lower version of Caffe, you need to upgrade it
  with the Caffe built-in tools before converting.

  .. code-block:: bash

     # Upgrade prototxt
     $CAFFE_ROOT/build/tools/upgrade_net_proto_text MODEL.prototxt MODEL.new.prototxt

     # Upgrade caffemodel
     $CAFFE_ROOT/build/tools/upgrade_net_proto_binary MODEL.caffemodel MODEL.new.caffemodel

- ONNX

  Prepare your ONNX ``model.onnx`` file.

  Use the ONNX optimizer tool (``tools/onnx_optimizer.py`` in the MACE repository) to optimize your model
  for inference. This tool improves inference efficiency, much like the Graph Transform Tool in TensorFlow.

  .. code-block:: bash

     # Optimize your model
     python MACE_ROOT/tools/onnx_optimizer.py model.onnx model_opt.onnx

===========================================
2. Create a deployment file for your model
===========================================

When converting a model or building a library, MACE needs to read a YAML file, called the model deployment file here.

A model deployment file contains all the information about your model(s) and the build options. There are several example
deployment files in the *MACE Model Zoo* project.

The following shows basic deployment files for TensorFlow, Caffe and ONNX models.
Modify one of them and use it for your own case.

- TensorFlow

  .. literalinclude:: models/demo_models_tf.yml
     :language: yaml

- Caffe

  .. literalinclude:: models/demo_models_caffe.yml
     :language: yaml

- ONNX

  .. literalinclude:: models/demo_models_onnx.yml
     :language: yaml

More details about the model deployment file are in :doc:`advanced_usage`.

======================
3. Convert your model
======================

When the deployment file is ready, you can use the MACE converter tool to convert your model(s).

.. code-block:: bash

   python tools/converter.py convert --config=/path/to/your/model_deployment_file.yml

This command will download or load your pre-trained model and convert it to a MACE model proto file and a weights data file.
The generated model files will be stored in the ``build/${library_name}/model`` folder.

.. warning::

   Please set ``model_graph_format: file`` and ``model_data_format: file`` in your deployment file before converting.
   The usage of ``model_graph_format: code`` will be demonstrated in :doc:`advanced_usage`.

=============================
4. Build MACE into a library
=============================

You can download the prebuilt MACE library from the `GitHub MACE release page <https://github.com/XiaoMi/mace/releases>`__,
or use Bazel to build the MACE source code into a library.

.. code-block:: sh

   cd path/to/mace
   # Build library
   # output lib path: build/lib
   bash tools/bazel_build_standalone_lib.sh [-abi=abi][-runtimes=rt1,rt2,...][-static]

The above command will generate the static library ``build/lib/libmace.a`` or the dynamic library ``build/lib/libmace.so``.

.. warning::

   Please verify that the ``-abi`` param in the above command is the same as the ``target_abi`` param in your deployment file.
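Before wiring the library into a real application, it can be useful to confirm that the headers and the freshly built library actually link together. The following is only a minimal sketch: the file name and the build command in the comment are placeholders (not part of MACE), and it relies solely on ``MaceVersion()`` declared in ``mace/public/mace.h``. When targeting an Android ABI, use the corresponding NDK toolchain instead of the host compiler.

.. code-block:: cpp

   // link_check.cc -- hypothetical file name, not part of MACE.
   // Example build (adjust paths and toolchain to your setup):
   //   ${CXX} -std=c++11 -I path/to/mace/build/include link_check.cc \
   //          -L path/to/mace/build/lib -lmace -o link_check
   #include <iostream>

   #include "mace/public/mace.h"

   int main() {
     // MaceVersion() returns the version string of the library you linked against.
     std::cout << "Linked against MACE " << mace::MaceVersion() << std::endl;
     return 0;
   }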
==================
5. Run your model
==================

With the converted model, the static or shared library and the header files, you can use the following commands
to run and validate your model.

.. warning::

   If you want to run on a device/phone, please plug in at least one device/phone.

* **run**: run the model.

  .. code-block:: sh

     # Test model run time
     python tools/converter.py run --config=/path/to/your/model_deployment_file.yml --round=100

     # Validate the correctness by comparing the results against the
     # original model and framework, measured with cosine distance for similarity.
     python tools/converter.py run --config=/path/to/your/model_deployment_file.yml --validate

     # If you want to run the model on a specified ARM Linux device, put the device config file
     # in the working directory or run with the flag `--device_yml`
     python tools/converter.py run --config=/path/to/your/model_deployment_file.yml --device_yml=/path/to/devices.yml

* **benchmark**: benchmark and profile the model. The details are in :doc:`benchmark`.

  .. code-block:: sh

     # Benchmark model, get detailed statistics of each Op.
     python tools/converter.py run --config=/path/to/your/model_deployment_file.yml --benchmark

=======================================
6. Deploy your model into applications
=======================================

You can run the model on CPU, GPU or DSP (based on the ``runtime`` in your model deployment file).
However, there are some differences between devices.

* **CPU**

  Almost all mobile SoCs use an ARM-based CPU architecture, so your model should run on different SoCs in theory.

* **GPU**

  Although most GPUs support the OpenCL standard, some SoCs do not fully comply with it, or their GPU is too
  low-end to use. So you should have a fallback strategy for when the GPU run fails
  (see the sketch after this list).

* **DSP**

  MACE only supports Qualcomm DSPs, and you need to push the Hexagon NN library to the device.

  .. code-block:: sh

     # For Android device
     adb root; adb remount
     adb push third_party/nnlib/v6x/libhexagon_nn_skel.so /system/vendor/lib/rfsa/adsp/
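One possible fallback strategy is sketched below: try to create a GPU engine first and, if that fails, recreate the engine with a CPU configuration. This is only an illustration built on the ``CreateMaceEngineFromProto`` call shown in the key-steps code later in this section; the helper name is hypothetical, the model buffers (``model_graph_proto`` and friends) are assumed to be loaded already, and error handling is simplified. A production application may also want to fall back when ``Run`` itself returns an error.

.. code-block:: cpp

   #include <cstddef>
   #include <memory>
   #include <string>
   #include <vector>

   #include "mace/public/mace.h"

   // Hypothetical helper (not part of the MACE API): try the GPU runtime first,
   // fall back to the CPU runtime when the GPU engine cannot be created
   // (e.g. the SoC's OpenCL support is missing or broken).
   mace::MaceStatus CreateEngineWithFallback(
       const unsigned char *model_graph_proto, size_t model_graph_proto_size,
       const unsigned char *model_weights_data, size_t model_weights_data_size,
       const std::vector<std::string> &input_names,
       const std::vector<std::string> &output_names,
       std::shared_ptr<mace::MaceEngine> *engine) {
     // First attempt: GPU runtime (a GPUContext with a storage path can be set
     // on the config as in the key steps below).
     mace::MaceEngineConfig gpu_config(mace::DeviceType::GPU);
     mace::MaceStatus status = mace::CreateMaceEngineFromProto(
         model_graph_proto, model_graph_proto_size,
         model_weights_data, model_weights_data_size,
         input_names, output_names, gpu_config, engine);
     if (status == mace::MaceStatus::MACE_SUCCESS) {
       return status;
     }
     // Fallback: plain CPU runtime.
     mace::MaceEngineConfig cpu_config(mace::DeviceType::CPU);
     return mace::CreateMaceEngineFromProto(
         model_graph_proto, model_graph_proto_size,
         model_weights_data, model_weights_data_size,
         input_names, output_names, cpu_config, engine);
   }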
In the converting and building steps, you've got the static/shared library, the model files and the header files.
``${library_name}`` is the name you defined in the first line of your deployment YAML file.

.. note::

   When linking the generated ``libmace.a`` into a shared library, a linker version script is helpful
   for reducing a specified set of symbols to local scope.

- The generated ``static`` library files are organized as follows:

  .. code-block:: none

     build
     ├── include
     │   └── mace
     │       └── public
     │           └── mace.h
     ├── lib
     │   ├── libmace.a (for static library)
     │   ├── libmace.so (for shared library)
     │   └── libhexagon_controller.so (for DSP runtime)
     └── mobilenet-v1
         ├── model
         │   ├── mobilenet_v1.data
         │   └── mobilenet_v1.pb
         └── _tmp
             └── arm64-v8a
                 └── mace_run_static
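The key-steps code below refers to ``model_graph_proto`` and ``model_weights_data`` without showing where they come from. With the layout above, one way to obtain them is simply to read the generated ``.pb`` and ``.data`` files into memory. The following is a plain C++ sketch; the helper name and the paths are illustrative only, and on Android the files would typically be packaged as assets or pushed to the device first.

.. code-block:: cpp

   #include <fstream>
   #include <string>
   #include <vector>

   // Illustrative helper (not part of the MACE API): read a whole binary file
   // into a byte buffer. The example paths follow the directory layout above.
   std::vector<unsigned char> ReadFile(const std::string &path) {
     std::ifstream file(path, std::ios::binary | std::ios::ate);
     std::vector<unsigned char> buffer;
     if (!file) return buffer;  // handle errors as appropriate
     const std::streamsize size = file.tellg();
     file.seekg(0, std::ios::beg);
     buffer.resize(static_cast<size_t>(size));
     file.read(reinterpret_cast<char *>(buffer.data()), size);
     return buffer;
   }

   // Usage sketch:
   //   auto graph   = ReadFile("build/mobilenet-v1/model/mobilenet_v1.pb");
   //   auto weights = ReadFile("build/mobilenet-v1/model/mobilenet_v1.data");
   //   // graph.data()/graph.size() and weights.data()/weights.size() can then be
   //   // passed as model_graph_proto(_size) / model_weights_data(_size) below.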
Please refer to ``mace/tools/mace_run.cc`` for full usage. The following lists the key steps.

.. code-block:: cpp

   // Include the headers
   #include "mace/public/mace.h"

   // 0. Declare the device type (must be the same as the 'runtime' in the configuration file)
   DeviceType device_type = DeviceType::GPU;

   // 1. Configuration
   MaceStatus status;
   MaceEngineConfig config(device_type);
   std::shared_ptr<GPUContext> gpu_context;
   // Set the path to store compiled OpenCL kernel binaries.
   // Please make sure your application has read/write rights to the directory.
   // This is used to reduce the initialization time, since compiling is slow.
   // It's suggested to set this even when a pre-compiled OpenCL program file is provided,
   // because an OpenCL version upgrade may also lead to kernel recompilation.
   const std::string storage_path = "path/to/storage";
   gpu_context = GPUContextBuilder()
       .SetStoragePath(storage_path)
       .Finalize();
   config.SetGPUContext(gpu_context);
   config.SetGPUHints(
       static_cast<GPUPerfHint>(GPUPerfHint::PERF_NORMAL),
       static_cast<GPUPriorityHint>(GPUPriorityHint::PRIORITY_LOW));

   // 2. Define the input and output tensor names.
   std::vector<std::string> input_names = {...};
   std::vector<std::string> output_names = {...};

   // 3. Create MaceEngine instance
   std::shared_ptr<mace::MaceEngine> engine;
   MaceStatus create_engine_status;

   // Create Engine from model file
   create_engine_status =
       CreateMaceEngineFromProto(model_graph_proto,
                                 model_graph_proto_size,
                                 model_weights_data,
                                 model_weights_data_size,
                                 input_names,
                                 output_names,
                                 config,
                                 &engine);
   if (create_engine_status != MaceStatus::MACE_SUCCESS) {
     // Fall back to another strategy.
   }

   // 4. Create input and output tensor buffers
   std::map<std::string, mace::MaceTensor> inputs;
   std::map<std::string, mace::MaceTensor> outputs;
   for (size_t i = 0; i < input_count; ++i) {
     // Allocate input buffers
     int64_t input_size =
         std::accumulate(input_shapes[i].begin(), input_shapes[i].end(), 1,
                         std::multiplies<int64_t>());
     auto buffer_in = std::shared_ptr<float>(new float[input_size],
                                             std::default_delete<float[]>());
     // Load input here
     // ...

     inputs[input_names[i]] = mace::MaceTensor(input_shapes[i], buffer_in);
   }

   for (size_t i = 0; i < output_count; ++i) {
     // Allocate output buffers
     int64_t output_size =
         std::accumulate(output_shapes[i].begin(), output_shapes[i].end(), 1,
                         std::multiplies<int64_t>());
     auto buffer_out = std::shared_ptr<float>(new float[output_size],
                                              std::default_delete<float[]>());
     outputs[output_names[i]] = mace::MaceTensor(output_shapes[i], buffer_out);
   }

   // 5. Run the model
   status = engine->Run(inputs, &outputs);

More details are in :doc:`advanced_usage`.
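Finally, as a small supplement to the key steps above, the sketch below shows one way to consume the output buffers filled in by ``engine->Run`` and to compute the cosine similarity that ``--validate`` reports. The helpers are illustrative only (not part of the MACE API); ``buffer_out`` and ``output_size`` refer to the buffers allocated in step 4, and the reference output has to be exported from the original framework.

.. code-block:: cpp

   #include <algorithm>
   #include <cmath>
   #include <cstdint>
   #include <iostream>

   // Illustrative helpers only -- not part of the MACE API.

   // Index of the largest value, e.g. the top-1 class of a classification output.
   int64_t ArgMax(const float *data, int64_t size) {
     return std::max_element(data, data + size) - data;
   }

   // Cosine similarity between the MACE output and a reference output produced
   // by the original framework -- the same metric used by `--validate`.
   float CosineSimilarity(const float *a, const float *b, int64_t size) {
     double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
     for (int64_t i = 0; i < size; ++i) {
       dot += a[i] * b[i];
       norm_a += a[i] * a[i];
       norm_b += b[i] * b[i];
     }
     return static_cast<float>(dot / (std::sqrt(norm_a) * std::sqrt(norm_b) + 1e-12));
   }

   // Usage sketch, with buffer_out/output_size from step 4 and a reference
   // buffer exported from the original framework:
   //   std::cout << "top-1 class: " << ArgMax(buffer_out.get(), output_size) << "\n";
   //   std::cout << "cosine similarity: "
   //             << CosineSimilarity(buffer_out.get(), reference.data(), output_size) << "\n";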