Add the calibration tool and document for int8 inference on Paddle !1458

Created by: guomingz

We developed a calibration tool for INT8 inference. The documentation for this tool follows.

Introduction of Accuracy Calibration Tool for 8 Bit Inference

8-bit (INT8) inference, also known as low-precision inference, speeds up inference with little loss of accuracy. It offers higher throughput and lower memory requirements than FP32. Now that PaddlePaddle supports INT8 inference, we are releasing an accuracy calibration tool (calibration.py) alongside it. The tool generates the quantization parameters and, as its final output, the quantized model file.
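To make "quantization parameters" concrete, here is a minimal sketch of symmetric INT8 quantization with a single scale factor. It is only an illustration under stated assumptions: the function names and the sample values are hypothetical, and calibration.py may compute and store its parameters differently.

```python
def quantize_int8(values, max_abs):
    """Map floats to signed 8-bit codes with one symmetric scale (illustrative scheme)."""
    scale = 127.0 / max_abs  # the "quantization parameter" in this sketch
    codes = []
    for v in values:
        q = int(round(v * scale))
        codes.append(max(-127, min(127, q)))  # clip anything beyond the chosen range
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float values from the int8 codes."""
    return [q / scale for q in codes]


# Hypothetical activation values; the largest magnitude sets the range here.
activations = [0.02, -1.5, 0.75, 4.0, -2.25]
max_abs = max(abs(v) for v in activations)

codes, scale = quantize_int8(activations, max_abs)
restored = dequantize_int8(codes, scale)
max_error = max(abs(a - b) for a, b in zip(activations, restored))
print(codes, scale, max_error)
```

With the range taken from the maximum absolute value, nothing is clipped and the rounding error per value stays within half a quantization step. A calibration step exists because a smaller, well-chosen range can give finer resolution for the bulk of the values at the cost of clipping a few outliers.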

Usage

Build PaddlePaddle with MKL-DNN support.
cd /path/to/the/model/fluid/PaddleCV/image_classification
export FLAGS_use_mkldnn=True
Run the calibration tool. It writes the quantized model for the topology named by --model to the output directory "quantized_out". The --algo parameter selects the KL divergence algorithm, which improves accuracy. Command: python calibration.py --model=MobileNet --batch_size=50 --class_dim=1000 --image_shape=3,224,224 --with_mem_opt=True --use_gpu=False --pretrained_model=weights/mobilenet --out=quantized_out --algo=KL
Run INT8 inference on the quantized model. Command: python eval_int8.py --iterations=1000 --batch_size=50 --class_dim=1000 --image_shape=3,224,224 --with_mem_opt=True --use_gpu=False --pretrained_model=quantized_out
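The --algo=KL option refers to a KL divergence algorithm. As a rough illustration of how a KL-based calibration can pick a clipping threshold, the sketch below follows the commonly described histogram approach: for each candidate threshold, compare the clipped activation histogram with its coarsened (quantized) version and keep the threshold with the smallest divergence. This is an assumption about the general technique, not a description of what calibration.py does; all function names, the tiny histogram, and the choice of 8 levels (instead of a realistic 128) are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Return KL(p || q) for two same-length histograms (normalised internally)."""
    p_sum, q_sum = sum(p), sum(q)
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:
            if q_i <= 0:
                return math.inf  # q cannot represent mass that p has
            total += (p_i / p_sum) * math.log((p_i / p_sum) / (q_i / q_sum))
    return total


def quantize_and_expand(bins, num_levels):
    """Merge bins into num_levels groups, then spread each group's mass
    evenly over the bins of that group that were non-zero."""
    n = len(bins)
    q = [0.0] * n
    for level in range(num_levels):
        start = level * n // num_levels
        stop = (level + 1) * n // num_levels
        group = bins[start:stop]
        nonzero = sum(1 for b in group if b > 0)
        if nonzero:
            share = sum(group) / nonzero
            for k in range(start, stop):
                if bins[k] > 0:
                    q[k] = share
    return q


def choose_threshold(hist, bin_width, num_levels):
    """Try every clipping point and keep the one with the lowest KL divergence."""
    best_kl, best_bins = math.inf, len(hist)
    for i in range(num_levels, len(hist) + 1):
        p = list(hist[:i])
        p[-1] += sum(hist[i:])  # fold clipped outliers into the last kept bin
        q = quantize_and_expand(hist[:i], num_levels)
        kl = kl_divergence(p, q)
        if kl < best_kl:
            best_kl, best_bins = kl, i
    return best_bins * bin_width, best_kl


# Hypothetical histogram of absolute activation values: a decaying bulk plus one outlier.
hist = [round(1000 * math.exp(-0.3 * k)) for k in range(32)]
hist[31] += 1
threshold, kl = choose_threshold(hist, bin_width=0.25, num_levels=8)
print(threshold, kl)
```

Candidates whose coarsened histogram has no mass where the clipped reference does are rejected here by returning infinity; production implementations typically smooth the histograms instead.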

Result

Accuracy

Topology	FP32 Accuracy (Top-1/Top-5)	INT8 Accuracy (Top-1/Top-5)
ResNet-50 (FB)	76.63%/93.10%	76.42%/93.07%
MobileNet-V1	70.78%/89.69%	70.10%/89.30%

Performance

Topology	FP32 Throughput (imgs/sec)	INT8 Throughput (imgs/sec)	FP32 Latency (ms, 1x1)	INT8 Latency (ms, 1x1)
ResNet-50 (FB)	260	527	62.4	36.7
MobileNet-V1	1207	1934	9.3	8.4

Note: The performance figures above were measured on SKX8180 1S (HT On, Turbo On). Throughput is in imgs/sec and latency is in ms; 1x1 means batch size 1 with 1 thread.
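For a quick read of the performance table, the relative gains can be derived directly from the reported numbers. The short script below is only a convenience sketch; the dictionary layout and variable names are illustrative, while the figures are copied from the table above.

```python
# (FP32 throughput, INT8 throughput, FP32 latency, INT8 latency) from the table above.
results = {
    "ResNet-50 (FB)": (260, 527, 62.4, 36.7),
    "MobileNet-V1": (1207, 1934, 9.3, 8.4),
}

gains = {}
for name, (fp32_tput, int8_tput, fp32_lat, int8_lat) in results.items():
    throughput_gain = int8_tput / fp32_tput  # higher is better for INT8
    latency_gain = fp32_lat / int8_lat       # how many times lower the INT8 latency is
    gains[name] = (throughput_gain, latency_gain)
    print(f"{name}: throughput x{throughput_gain:.2f}, latency x{latency_gain:.2f}")
```

On these numbers, INT8 gives roughly 2.0x throughput on ResNet-50 (FB) and 1.6x on MobileNet-V1, with batch-1 latency improving by about 1.7x and 1.1x respectively.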
