Created by: cryoco
PR types
New features
PR changes
Others
Describe
Problem: When using Paddle-TRT int8 to perform inference with models trained by PaddleSlim quant-aware training (QAT), there are some restrictions during the training phase: if the model contains ops such as elementwise_add, pool2d, or leaky_relu, they must be quantized, otherwise TRT will throw an error during inference. This limits the flexibility of int8 quantization.

Optimization: The problem above occurs with TRT 5. In TRT 6 and later versions, this situation no longer results in an error but in a precision fallback. We added support so that, when training with PaddleSlim QAT, users do not have to add extra quant ops in order to run inference with TensorRT, without degrading the performance of the quantized models.
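For context, quant-aware training simulates int8 quantization by inserting fake-quant ops that quantize and immediately dequantize a tensor; ops left without these ops are what previously triggered the TRT 5 error. The sketch below is purely illustrative (not Paddle/TensorRT code) and shows symmetric per-tensor int8 fake quantization with an abs-max scale, which is the kind of rounding a quantized op introduces:

```python
import numpy as np

def fake_quant_int8(x, scale):
    # Symmetric int8 fake quantization: map floats to the integer
    # range [-127, 127] using a per-tensor scale, then dequantize
    # back to float. The round-trip models int8 precision loss.
    q = np.clip(np.round(x / scale * 127.0), -127, 127)
    return q * scale / 127.0

x = np.array([0.5, -1.2, 2.0, 0.01], dtype=np.float32)
scale = float(np.abs(x).max())  # abs-max scale, as a simple choice
y = fake_quant_int8(x, scale)
# Each element of y differs from x by at most scale / 254.
```

An op without such a fake-quant pair simply runs in float; with TRT 6+ the engine falls back to higher precision for it instead of failing.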