Created by: cryoco
PR types
New features
PR changes
Others
Describe
Problem: When using Paddle-TRT int8 to perform inference with models trained by PaddleSlim quant-aware training (QAT), there are some restrictions during the training phase: if the model contains ops such as elementwise_add, pool2d, or leaky_relu, they must be quantized, otherwise TRT will throw an error during inference. This limits the flexibility of int8 quantization.

Optimization: The problem above occurs with TRT 5. In TRT 6 and later versions, this situation no longer results in an error but in a precision fallback. We added support so that, when training with PaddleSlim QAT, users do not have to add extra quant ops in order to run inference with TensorRT, without degrading the performance of the quantized models.
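For context, quant-aware training simulates int8 quantization by inserting fake-quant ops that quantize and immediately dequantize a tensor; ops left without these ops are what previously triggered the TRT 5 error. The sketch below is purely illustrative (not Paddle/TensorRT code) and shows symmetric per-tensor int8 fake quantization with an abs-max scale, which is the kind of rounding a quantized op introduces:

```python
import numpy as np

def fake_quant_int8(x, scale):
    # Symmetric int8 fake quantization: map floats to the integer
    # range [-127, 127] using a per-tensor scale, then dequantize
    # back to float. The round-trip models int8 precision loss.
    q = np.clip(np.round(x / scale * 127.0), -127, 127)
    return q * scale / 127.0

x = np.array([0.5, -1.2, 2.0, 0.01], dtype=np.float32)
scale = float(np.abs(x).max())  # abs-max scale, as a simple choice
y = fake_quant_int8(x, scale)
# Each element of y differs from x by at most scale / 254.
```

An op without such a fake-quant pair simply runs in float; with TRT 6+ the engine falls back to higher precision for it instead of failing.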