Running additional passes after a few warm-up inference iterations
Created by: wojtuss
@luotao1, @Superjomn To optimize a model with INT8 quantization, we would need to optimize the model using the common passes first, then run a few warm-up iterations to gather the data required for quantization, then quantize the model using additional passes, and finally start the main inference iterations. In your opinion, what would be the best way to handle such a scenario (passes -> warm-up -> passes -> inference)?
Our current idea is to add a method like PaddlePredictor::Reconfigure(const ConfigT& config), where the predictor's configuration would be modified and new passes would be run on the same inference program.
Another option would be to handle this inside the AnalysisPredictor::PrepareProgram() method.
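To make the proposed flow concrete, here is a minimal, self-contained sketch of the passes -> warm-up -> passes -> inference sequence built around a Reconfigure-style method. It does not use the real Paddle API: MockConfig, MockPredictor, RunQuantizationFlow, and the pass names "common_pass" and "quantize_pass" are all hypothetical, and the "calibration data" is reduced to a running maximum of absolute input values.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a predictor config: only a list of pass names.
struct MockConfig {
  std::vector<std::string> passes;
};

// Hypothetical stand-in for a predictor; NOT the real PaddlePredictor.
class MockPredictor {
 public:
  explicit MockPredictor(const MockConfig& config) { ApplyPasses(config); }

  // Mirrors the proposed Reconfigure(): runs the passes from the new
  // config on the same (already optimized) program.
  void Reconfigure(const MockConfig& config) { ApplyPasses(config); }

  // One iteration. As a stand-in for gathering quantization data, it
  // tracks the largest absolute input value seen so far.
  void Run(const std::vector<float>& input) {
    for (float v : input) {
      const float a = v < 0.0f ? -v : v;
      if (a > max_abs_) max_abs_ = a;
    }
    ++iterations_;
  }

  const std::vector<std::string>& applied_passes() const { return applied_; }
  float max_abs() const { return max_abs_; }
  std::size_t iterations() const { return iterations_; }

 private:
  // Records each pass name in the order it was "run".
  void ApplyPasses(const MockConfig& config) {
    for (const auto& name : config.passes) applied_.push_back(name);
  }

  std::vector<std::string> applied_;
  float max_abs_ = 0.0f;
  std::size_t iterations_ = 0;
};

// passes -> warm-up -> passes -> inference
MockPredictor RunQuantizationFlow(
    const std::vector<std::vector<float>>& warmup_batches,
    const std::vector<std::vector<float>>& inference_batches) {
  MockConfig common;
  common.passes.push_back("common_pass");  // hypothetical pass name
  MockPredictor predictor(common);         // 1. common passes

  for (const auto& batch : warmup_batches) predictor.Run(batch);  // 2. warm-up
  assert(predictor.iterations() == warmup_batches.size());

  MockConfig quant;
  quant.passes.push_back("quantize_pass");  // hypothetical pass name
  predictor.Reconfigure(quant);             // 3. additional passes

  for (const auto& batch : inference_batches) predictor.Run(batch);  // 4. inference
  return predictor;
}
```

In this sketch the predictor object survives the reconfiguration, so whatever was gathered during warm-up is still available when the additional passes run. Whether the real predictor could keep its state in the same way is an open question for this discussion.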