Created by: wojtuss
PR types
New features
PR changes
OPs
Describe
This patch adds support for quantization with shift (also known as asymmetric quantization) to the quantize and dequantize ops. This support is required for quantizing the fusion_gru op (https://github.com/PaddlePaddle/Paddle/issues/27330).
Quantization with shift is performed according to the formula:
out_u8 = in_f32 * scale + shift
When shift is nonzero, the output of quantization is always of the unsigned int8 data type. The dequantization formula is:
out_f32 = in_u8/scale - shift/scale
Dequantization with shift expects unsigned int8 input.
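For illustration, the two formulas above can be sketched in plain Python. This is only a reference sketch, not the kernel code from this patch: the function names are hypothetical, and the rounding to nearest integer and saturation to the [0, 255] range are assumptions about how the result is fitted into unsigned int8.

```python
def quantize_with_shift(values, scale, shift):
    """Apply out_u8 = in_f32 * scale + shift to each value.

    Assumption: the result is rounded to the nearest integer and
    saturated to the unsigned int8 range [0, 255].
    """
    out = []
    for v in values:
        q = round(v * scale + shift)
        out.append(max(0, min(255, q)))
    return out


def dequantize_with_shift(values, scale, shift):
    """Apply out_f32 = in_u8 / scale - shift / scale to each value."""
    return [q / scale - shift / scale for q in values]


if __name__ == "__main__":
    scale, shift = 100.0, 128  # example values chosen for illustration only
    data = [-1.0, 0.0, 0.5, 1.0]
    quantized = quantize_with_shift(data, scale, shift)
    print(quantized)  # [28, 128, 178, 228]
    print(dequantize_with_shift(quantized, scale, shift))
```

With a shift of 128, negative inputs map into the lower half of the unsigned range, which is why the output can be stored as unsigned int8 even for signed data.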
Support in quantization passes will come in a separate PR.