Int8 design
Created by: Superjomn
There are two ways to support converting a float32 model to int8:
- Convert the values directly with some mathematical method
  - Pros: simple
  - Cons: some precision loss for some models
- Retrain the model in the int8 domain
  - Pros: lower precision loss for all the models
  - Cons: heavier to use
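To make the first way concrete, here is a minimal sketch of one common mathematical method, symmetric abs-max linear quantization. The design does not say which method will be used, so treat this choice and the function names (`quantize_abs_max`, `dequantize`) as assumptions. Each value is mapped to an int8 code in [-127, 127] through a single scale, and the round trip shows where the precision loss comes from.

```python
from typing import List, Tuple

INT8_MAX = 127  # symmetric int8 range: [-127, 127]


def quantize_abs_max(values: List[float]) -> Tuple[List[int], float]:
    """Map float values to int8 codes using one scale derived from max |x|."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # float_value ~= int8_value * scale; avoid a zero scale for all-zero input.
    scale = max_abs / INT8_MAX if max_abs > 0 else 1.0
    # round() goes to the nearest integer (ties to even); clamp to the int8 range.
    q = [max(-INT8_MAX, min(INT8_MAX, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the int8 codes."""
    return [x * scale for x in q]


# Usage: the round-trip error is the precision loss of direct conversion.
weights = [0.5, -1.27, 0.013, 1.0]
q, scale = quantize_abs_max(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Each value is off by at most half a scale step (about 0.005 here), so small weights such as 0.013, which comes back as roughly 0.01, lose the most relative precision.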
The converting way is common and easy to carry out, and it works for both server and mobile deployment. The retraining way is harder to use but gives better results, so we can make it a special feature of Paddle inference.
Converting way
- Input: the float32 model
- Run the infer-predictor N times and collect statistics from the predictions (Python).
- Generate the int8 conversion information for the parameters and persist it in some way (Python).
- The infer-predictor loads the transformed model:
  - Load the parameters.
  - Load the program and run IR passes that change the graph where needed to support int8.
  - Run prediction ...
NOTE: the parameter statistics and parameter conversion can be developed in Python, while the program modification is better developed as IR passes, so that the infer-predictor can unify all the optimization methods, for example, scheduling int8 kernels together with some high-performance float kernels in the same graph.
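The Python side of the converting way (run the predictor N times, collect statistics, persist the conversion information) can be sketched as follows. This assumes abs-max statistics per tensor and a predictor that returns its intermediate tensors as a name-to-values dict; `collect_abs_max`, `build_scale_table`, `save_scale_table`, `toy_predict`, and the tensor names are all hypothetical, and JSON is only one possible way to persist the result.

```python
import json
from typing import Callable, Dict, Iterable, List

INT8_MAX = 127

# Hypothetical predictor signature: one inference pass on a batch returns
# {tensor_name: flat list of float values}.
Predictor = Callable[[List[float]], Dict[str, List[float]]]


def collect_abs_max(predict: Predictor, batches: Iterable[List[float]]) -> Dict[str, float]:
    """Run the predictor once per batch and keep the largest |x| seen per tensor."""
    stats: Dict[str, float] = {}
    for batch in batches:
        for name, values in predict(batch).items():
            batch_max = max((abs(v) for v in values), default=0.0)
            stats[name] = max(stats.get(name, 0.0), batch_max)
    return stats


def build_scale_table(stats: Dict[str, float]) -> Dict[str, float]:
    """Turn per-tensor abs-max statistics into int8 scales (float ~= int8 * scale)."""
    return {name: (m / INT8_MAX if m > 0 else 1.0) for name, m in stats.items()}


def save_scale_table(table: Dict[str, float], path: str) -> None:
    """Persist the scales; JSON is an arbitrary choice for this sketch."""
    with open(path, "w") as f:
        json.dump(table, f, indent=2, sort_keys=True)


# Usage with a toy stand-in for the infer-predictor.
def toy_predict(batch: List[float]) -> Dict[str, List[float]]:
    hidden = [2.0 * x for x in batch]
    return {"fc_0.out": hidden, "relu_0.out": [max(0.0, h) for h in hidden]}


calibration_batches = [[0.1, -1.5], [1.0, 0.3], [-0.2, 0.6]]
stats = collect_abs_max(toy_predict, calibration_batches)
scales = build_scale_table(stats)
```

Keeping only a running maximum makes N arbitrarily large without storing activations; a C++ IR pass would then read the persisted scales when it rewrites the graph.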
Retraining way
We have made some progress on this approach and tested it on some image classification models, where it works well.
It works as follows:
- Input: the float32 model (Python)
- Run the trainer with a program into which some int8 domain operators have been inserted (Python)
- Persist the model (Python)
- The infer-predictor loads the model
- IR passes modify the program (remove the int8 training operators)
- IR passes transform the parameters (transform the float32 to the format for )
- Run inference with the program and parameters produced by the IR passes.
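The design does not specify which int8 domain operators are inserted into the training program. A common choice, assumed in this sketch, is fake quantization: quantize to the int8 grid and immediately dequantize, so training adapts to the rounding error while all tensors stay in float. The backward pass is assumed to use a straight-through estimator. `fake_quant` and `fake_quant_grad` are hypothetical names, not Paddle operators.

```python
from typing import List


def fake_quant(values: List[float], bits: int = 8) -> List[float]:
    """Quantize to a signed `bits`-bit grid and immediately dequantize.

    The output is still float, but it only takes values the int8 model can
    represent, so the training loss sees the int8 rounding error.
    """
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax
    # Clamp is a safety guard; with an abs-max scale the codes stay in range.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def fake_quant_grad(upstream: List[float]) -> List[float]:
    """Straight-through estimator: treat rounding as identity when backpropagating."""
    return list(upstream)


# Usage: the forward pass of a weight through the inserted operator.
w = [0.5, -1.27, 0.013, 1.0]
w_fq = fake_quant(w)
```

When the IR passes later remove these operators, the trained weights already sit close to the int8 grid, which is the intuition behind the lower precision loss of this way.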
How to determine the boundary between Python and C++
We prefer to put more code in C++ so that it can be better reused and controlled, but since Python is better suited to data analysis, we had better:
- Leave the data analysis and some of the parameter transformation in Python;
- Implement the program modification in C++, as IR passes.