Created by: cryoco
PR types
Function optimization
PR changes
Others
Describe
The former quant_conv2d_dequant_fuse_pass has problems as following:
-
Using a matrix-like structure to store IR nodes requires a threshold of the max number of quantized nodes preceded by the same quant node, which leads to serious problems when model structure is like:
-
Converting weights from int8 range to fp32 range happens in tensorrt_subgraph_pass, which means if some quantized op is outside trt subgraph, it's weight will not be converted. This might cause wrong results when trt subgraph can't cover all quantized nodes.
We refactored this pass by splitting the fusion into 2 phases, DeleteQuant Fuse and Dequant fuse, so that the threshold of branch nodes is not needed. Moreover, the range of quantized weights in conv/mul/fc is converted to fp32 in this pass instead of tensorrt_subgraph_pass, to produce the right result when trt subgraph can't cover all quantized nodes.