diff --git a/docs/user_guide/advanced_usage.rst b/docs/user_guide/advanced_usage.rst
index 091fcb989a8b16b05de058bd8d80e1038c177a8a..0f9d76093077835b71623edd274801f277a07ae8 100644
--- a/docs/user_guide/advanced_usage.rst
+++ b/docs/user_guide/advanced_usage.rst
@@ -577,12 +577,16 @@ so MACE provides several ways to reduce the model size with no or little perform
 
 **1. Save model weights in half-precision floating point format**
 
-The default data type of a regular model is float (32bit). To reduce the model weights size,
+The data type of a regular model is float (32bit). To reduce the model weights size,
 half (16bit) can be used to reduce it by half with negligible accuracy degradation.
+Therefore, the default storage type for a regular model in MACE is half. However,
+if the model is very sensitive to accuracy, the storage type can be changed to float.
 
-For CPU, ``data_type`` can be specified as ``fp16_fp32`` in the deployment file to save the weights in half and actual inference in float.
+In the deployment file, ``data_type`` is ``fp16_fp32`` by default and can be changed to ``fp32_fp32``.
 
-For GPU, ``fp16_fp32`` is default. The ops in GPU take half as inputs and outputs while kernel execution in float.
+For CPU, ``fp16_fp32`` means that the weights are saved in half and actual inference is in float.
+
+For GPU, ``fp16_fp32`` means that the ops take half as inputs and outputs while kernel execution is in float.
 
 **2. Save model weights in quantized fixed point format**
 
diff --git a/docs/user_guide/advanced_usage_cmake.rst b/docs/user_guide/advanced_usage_cmake.rst
index 87d17fe4133c89724fd51c563bc9a1cf47f42dfe..7be5e2f227a6950ae83bc7bb9d218cd1fcb1a87d 100644
--- a/docs/user_guide/advanced_usage_cmake.rst
+++ b/docs/user_guide/advanced_usage_cmake.rst
@@ -406,12 +406,16 @@ so MACE provides several ways to reduce the model size with no or little perform
 
 **1. Save model weights in half-precision floating point format**
 
-The default data type of a regular model is float (32bit). To reduce the model weights size,
+The data type of a regular model is float (32bit). To reduce the model weights size,
 half (16bit) can be used to reduce it by half with negligible accuracy degradation.
+Therefore, the default storage type for a regular model in MACE is half. However,
+if the model is very sensitive to accuracy, the storage type can be changed to float.
 
-For CPU, ``data_type`` can be specified as ``fp16_fp32`` in the deployment file to save the weights in half and actual inference in float.
+In the deployment file, ``data_type`` is ``fp16_fp32`` by default and can be changed to ``fp32_fp32``.
 
-For GPU, ``fp16_fp32`` is default. The ops in GPU take half as inputs and outputs while kernel execution in float.
+For CPU, ``fp16_fp32`` means that the weights are saved in half and actual inference is in float.
+
+For GPU, ``fp16_fp32`` means that the ops take half as inputs and outputs while kernel execution is in float.
 
 **2. Save model weights in quantized fixed point format**
 
diff --git a/tools/python/utils/config_parser.py b/tools/python/utils/config_parser.py
index 9e5c9f6dc843e8bb9a54f1e65fc99fd75ab1b455..5a56fd3c4e54ce27920939579522da7f673afdad 100644
--- a/tools/python/utils/config_parser.py
+++ b/tools/python/utils/config_parser.py
@@ -204,7 +204,7 @@ def normalize_model_config(conf):
     conf[ModelKeys.platform] = parse_platform(conf[ModelKeys.platform])
     conf[ModelKeys.runtime] = parse_device_type(conf[ModelKeys.runtime])
 
-    if ModelKeys.quantize in conf:
+    if ModelKeys.quantize in conf and conf[ModelKeys.quantize] == 1:
         conf[ModelKeys.data_type] = mace_pb2.DT_FLOAT
     else:
         if ModelKeys.data_type in conf:
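
For reference, the ``data_type`` field described in the two documentation hunks lives in the per-model section of the deployment file. The fragment below is a minimal sketch of such a model entry, written as the Python dict it parses into rather than YAML; only the fields relevant to this change are shown, and the concrete values are illustrative, not taken from a real deployment file.

# One model entry of a deployment file, as the parsed dict the conversion
# tooling receives.  Fields not touched by this change (model paths,
# checksums, subgraphs, ...) are omitted, and the values are made-up examples.
model_entry = {
    "platform": "tensorflow",
    "runtime": "cpu+gpu",
    # Default: weights stored in half precision; CPU inference and GPU kernel
    # execution still run in float.  Accuracy-sensitive models can switch this
    # to "fp32_fp32" to keep the weights in float as well.
    "data_type": "fp16_fp32",
    # With the config_parser.py change above, "quantize: 0" now behaves the
    # same as omitting the key instead of forcing a float data type.
    "quantize": 0,
}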
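
The config_parser.py hunk is the functional part of the change: previously, the mere presence of a ``quantize`` key in a model entry (even ``quantize: 0``) forced the converted data type to float, overriding the ``fp16_fp32`` default documented above. Below is a minimal, self-contained sketch of the resulting resolution rule; plain strings stand in for the ``mace_pb2`` enum values, and ``resolve_data_type`` is a hypothetical helper written for this illustration, not a function from MACE.

# Sketch of the data-type resolution that the config_parser.py hunk implements.
# Strings replace mace_pb2.DT_HALF / mace_pb2.DT_FLOAT to keep the example
# dependency-free.
def resolve_data_type(model_conf):
    """model_conf: one model entry of the deployment file, parsed into a dict."""
    if model_conf.get("quantize") == 1:
        # Quantized fixed-point models are converted with a float data type;
        # an entry that merely contains "quantize: 0" no longer lands here.
        return "DT_FLOAT"
    # Regular models default to fp16_fp32, i.e. half-precision weight storage.
    data_type = model_conf.get("data_type", "fp16_fp32")
    return "DT_HALF" if data_type == "fp16_fp32" else "DT_FLOAT"

assert resolve_data_type({"quantize": 1}) == "DT_FLOAT"
assert resolve_data_type({"quantize": 0}) == "DT_HALF"    # changed by this patch
assert resolve_data_type({"data_type": "fp32_fp32"}) == "DT_FLOAT"
assert resolve_data_type({}) == "DT_HALF"                 # the documented default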