[Paddle TRT Int8] Refactor quant_conv2d_dequant_fuse_pass (!24861) · Merge request · PaddlePaddle / Paddle

[Paddle TRT Int8] Refactor quant_conv2d_dequant_fuse_pass !24861

Created by: cryoco

PR types

Function optimization

PR changes

Others

Describe

The previous quant_conv2d_dequant_fuse_pass has the following problems:

1. It stores IR nodes in a matrix-like structure, which requires a threshold on the maximum number of quantized nodes preceded by the same quant node. This leads to serious problems when the model structure is like: [figure: example model structure, not preserved]
2. Weights are converted from the int8 range to the fp32 range in tensorrt_subgraph_pass, which means that if a quantized op lies outside the TRT subgraph, its weights are never converted. This can produce wrong results when the TRT subgraph cannot cover all quantized nodes.
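To make the second problem concrete, here is a minimal sketch of what converting weights from the int8 range to the fp32 range can look like. It assumes symmetric per-tensor quantization, where weight_scale is the largest absolute fp32 weight and the integer range is 127 for 8 bits. The function name int8_range_to_fp32 and its parameters are hypothetical and are not taken from the Paddle source.

```python
def int8_range_to_fp32(weights, weight_scale, bit_length=8):
    """Rescale weights stored in the integer range back to fp32 values.

    Assumption: symmetric per-tensor quantization, so a stored value q
    corresponds to q * weight_scale / max_range in floating point.
    """
    max_range = (1 << (bit_length - 1)) - 1  # 127 when bit_length is 8
    return [w * weight_scale / max_range for w in weights]


# Weights as they sit in the model file: floats holding int8-range values.
quantized = [-127.0, 0.0, 64.0, 127.0]
restored = int8_range_to_fp32(quantized, weight_scale=0.5)
print(restored)
```

If an op outside the TRT subgraph skips this rescaling, it computes with values up to 127 instead of values up to weight_scale, which is the wrong-result case described above.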

We refactored this pass by splitting the fusion into two phases, DeleteQuant fuse and Dequant fuse, so that a threshold on the number of branch nodes is no longer needed. In addition, the quantized weights of conv/mul/fc are now converted to the fp32 range in this pass instead of in tensorrt_subgraph_pass, so the result stays correct when the TRT subgraph cannot cover all quantized nodes.
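The two-phase idea can be sketched on a toy graph. This is an illustration only, not the pass's actual C++ implementation: ops are plain dicts, the op types fake_quantize and fake_dequantize are simplified placeholders, and the weights_fp32 flag stands in for the real weight rescaling. All function names here are hypothetical.

```python
QUANTIZED_TYPES = {"conv2d", "mul", "fc"}


def build_graph(n_branches):
    """One quant node feeding n_branches conv2d ops, each followed by a dequant."""
    ops = [{"type": "fake_quantize", "inputs": ["x"], "outputs": ["x_q"]}]
    for i in range(n_branches):
        ops.append({"type": "conv2d", "inputs": ["x_q", f"w{i}"], "outputs": [f"y{i}_q"]})
        ops.append({"type": "fake_dequantize", "inputs": [f"y{i}_q"], "outputs": [f"y{i}"]})
    return ops


def delete_quant(ops):
    """Phase 1: remove quant ops and rewire every consumer to the quant input."""
    rename = {op["outputs"][0]: op["inputs"][0]
              for op in ops if op["type"] == "fake_quantize"}
    return [dict(op, inputs=[rename.get(name, name) for name in op["inputs"]])
            for op in ops if op["type"] != "fake_quantize"]


def fuse_dequant(ops):
    """Phase 2: fold each dequant op into the quantized op that produces its input."""
    ops = [dict(op) for op in ops]  # work on copies, leave the input untouched
    producers = {op["outputs"][0]: op for op in ops}
    fused = []
    for op in ops:
        if op["type"] == "fake_dequantize":
            producer = producers.get(op["inputs"][0])
            if producer is not None and producer["type"] in QUANTIZED_TYPES:
                producer["outputs"] = list(op["outputs"])
                producer["weights_fp32"] = True  # placeholder for weight rescaling
                continue  # the dequant op is dropped
        fused.append(op)
    return fused


result = fuse_dequant(delete_quant(build_graph(5)))
for op in result:
    print(op["type"], op["inputs"], op["outputs"])
```

Because each phase walks the ops one at a time, the number of branches hanging off a single quant node does not need to be bounded in advance, and every conv2d/mul/fc op is marked for weight conversion whether or not a later pass places it in a TRT subgraph.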

