diff --git a/docs/user_guide/quantization_usage.rst b/docs/user_guide/quantization_usage.rst
index 7d1b78af4410a50f4cb6f887b84a2415cc2b1905..337641f97e844b4b93fc44513112a5f1b9862898 100644
--- a/docs/user_guide/quantization_usage.rst
+++ b/docs/user_guide/quantization_usage.rst
@@ -13,14 +13,19 @@ Refer to `Tensorflow quantization-aware training

   .. code-block:: sh

      # Run with input tensors
      # For CMake users:
      python tools/python/run_model.py --config ../mace-models/inception-v3/inception-v3.yml \
          --quantize_stat --input_dir /path/to/directory/of/input/tensors > range_log

      # For Bazel users:
      python tools/converter.py run --config ../mace-models/inception-v3/inception-v3.yml \
          --quantize_stat --input_dir /path/to/directory/of/input/tensors > range_log

3. Calculate the overall range of each activation layer. You may specify `--percentile`, or `--enhance` together with
   `--enhance_ratio`, to try different ranges and see which works better. Experimentation shows that the default
   `percentile` and `enhance_ratio` work fine for several common models.

   .. code-block:: sh

      python tools/python/quantize/quantize_stat.py --log_file range_log > overall_range

4. Convert the quantized model (in the YAML config, set `target_abis` to the final target ABIs, e.g., `armeabi-v7a`,
   set `quantize` to `1`, and set `quantize_range_file` to the path of the overall_range file).

Supported devices
-----------------

MACE supports running quantized models on ARM CPUs and on other acceleration devices, e.g., the Qualcomm Hexagon DSP and
the MediaTek APU. ARM CPUs are ubiquitous and can speed up most edge devices; however, specialized AI accelerators may
run much faster than an ARM CPU while consuming much less power. Headers and libraries for these devices can be found in
the `third_party` directory.

* **To run models on the Hexagon DSP, users should**

  1. Make sure the SoC of the phone is manufactured by Qualcomm and supports HVX.

  2. Make sure secure boot is disabled on the phone (once enabled, it cannot be reversed, so you can probably only get
     such phones directly from manufacturers). This can be checked by executing the following command:

     .. code-block:: sh

        adb shell getprop ro.boot.secureboot

     The return value should be 0.

  3. Root the phone.

  4. Sign the phone using the testsig provided by Qualcomm. (Download the Qualcomm Hexagon SDK first, plug the phone
     into the PC, and run scripts/testsig.py.)

  5. Push `third_party/nnlib/v6x/libhexagon_nn_skel.so` to `/system/vendor/lib/rfsa/adsp/`. You can check
     `docs/feature_matrix.html` in the Hexagon SDK to determine which version to use.

With that done, you can run MACE on the Hexagon DSP. This is admittedly a lot of setup; the good news is that, starting
with the SM8150 family (some devices with old firmware may still not work), signature-free dynamic module offload is
enabled on the cDSP, so steps 2-4 can be skipped. This is achieved by calling `SetHexagonToUnsignedPD()` before creating
the MACE engine.

* **To run models on the MediaTek APU, users should**

  1. Make sure the SoC of the phone is manufactured by MediaTek and supports the APU.
  2. Push `third_party/apu/mtxxxx/libapu-platform.so` to `/vendor/lib64/`.
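To make step 4 of the quantization procedure above concrete, a deployment YAML might contain settings like the sketch
below. Only `target_abis`, `quantize`, and `quantize_range_file` come from the text above; the model name and the
placement of each field are illustrative assumptions, not a definitive MACE config.

.. code-block:: yaml

   # Hypothetical sketch only -- field placement is assumed, not authoritative.
   target_abis: [armeabi-v7a]      # final target ABIs (step 4)
   models:
     inception_v3:                 # illustrative model name
       quantize: 1                 # enable quantization (step 4)
       quantize_range_file: /path/to/overall_range  # output of step 3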