# INT8 MKL-DNN quantization

This document describes how to use the Paddle inference engine to convert FP32 models to INT8 models. We provide instructions on enabling INT8 MKL-DNN quantization in Paddle inference and show the accuracy and performance results of the quantized models, including seven image classification models (GoogleNet, MobileNet-V1, MobileNet-V2, ResNet-101, ResNet-50, VGG16, VGG19) and one object detection model (Mobilenet-SSD).

## 0. Install PaddlePaddle

Follow the PaddlePaddle [installation instructions](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification#installation) to install PaddlePaddle. If you build PaddlePaddle yourself, please use the following cmake arguments:

```bash
cmake .. -DWITH_TESTING=ON -DWITH_FLUID_ONLY=ON -DWITH_GPU=OFF -DWITH_MKL=ON -DWITH_MKLDNN=ON -DWITH_INFERENCE_API_TEST=ON -DON_INFER=ON
```

Note: MKL-DNN and MKL are required.

## 1. Enable INT8 MKL-DNN quantization

For reference, see the unit tests in [analyzer_int8_image_classification_tester.cc](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/tests/api/analyzer_int8_image_classification_tester.cc) and [analyzer_int8_object_detection_tester.cc](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/tests/api/analyzer_int8_object_detection_tester.cc).

* ### Create Analysis config

INT8 quantization is one of the optimizations available in the analysis config. More information about the analysis config can be found [here](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/advanced_usage/deploy/inference/native_infer_en.md#upgrade-performance-based-on-contribanalysisconfig-prerelease).

* ### Create quantize config by analysis config
We enable the MKL-DNN quantization procedure by calling an appropriate method from the analysis config. Afterwards, all the required quantization parameters (quantized op names, quantization strategies, etc.) can be set through the quantizer config held by the analysis config. It is also necessary to specify a pre-processed warmup dataset and the desired batch size.

```cpp
// Enable MKL-DNN quantization
cfg.EnableMkldnnQuantizer();

// Use the analysis config to access the MKL-DNN quantizer config
cfg.mkldnn_quantizer_config()->SetWarmupData(warmup_data);
cfg.mkldnn_quantizer_config()->SetWarmupBatchSize(100);
```

## 2. Accuracy and Performance benchmark for Image Classification models

We provide accuracy and performance results measured on a single core of Intel(R) Xeon(R) Gold 6271.

>**I. Top-1 Accuracy on Intel(R) Xeon(R) Gold 6271**
| Model        | FP32 Accuracy   | INT8 Accuracy   | Accuracy Diff(FP32-INT8)   |
| :----------: | :-------------: | :------------:  | :--------------:           |
| GoogleNet    |  70.50%         |  70.08%         |   0.42%                    |
| MobileNet-V1 |  70.78%         |  70.41%         |   0.37%                    |
| MobileNet-V2 |  71.90%         |  71.34%         |   0.56%                    |
| ResNet-101   |  77.50%         |  77.43%         |   0.07%                    |
| ResNet-50    |  76.63%         |  76.57%         |   0.06%                    |
| VGG16        |  72.08%         |  72.05%         |   0.03%                    |
| VGG19        |  72.57%         |  72.57%         |   0.00%                    |
>**II. Throughput on Intel(R) Xeon(R) Gold 6271 (batch size 1 on single core)**
| Model        | FP32 Throughput(images/s)  | INT8 Throughput(images/s) | Ratio(INT8/FP32)|
| :-----------:| :------------:             | :------------:            | :------------:  |
| GoogleNet    |    32.76                   |    67.43                  |   2.06          |
| MobileNet-V1 |    73.96                   |   218.82                  |   2.96          |
| MobileNet-V2 |    87.94                   |   193.70                  |   2.20          |
| ResNet-101   |     7.17                   |    26.37                  |   3.42          |
| ResNet-50    |    13.26                   |    48.72                  |   3.67          |
| VGG16        |     3.47                   |    10.10                  |   2.91          |
| VGG19        |     2.82                   |     8.68                  |   3.07          |
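
The INT8/FP32 ratio in the last column can be recomputed directly from the two throughput columns. A minimal Python check, with the throughput values hard-coded from the table above for illustration:

```python
# FP32 and INT8 throughput (images/s) taken from the table above.
throughput = {
    "GoogleNet":    (32.76,  67.43),
    "MobileNet-V1": (73.96, 218.82),
    "MobileNet-V2": (87.94, 193.70),
    "ResNet-101":   ( 7.17,  26.37),
    "ResNet-50":    (13.26,  48.72),
    "VGG16":        ( 3.47,  10.10),
    "VGG19":        ( 2.82,   8.68),
}

for model, (fp32, int8) in throughput.items():
    # Speedup of the INT8 model over the FP32 baseline.
    print(f"{model}: {int8 / fp32:.2f}x")
```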
* ## Prepare dataset
Run the following commands to download and preprocess the ILSVRC2012 Validation dataset.
```bash
cd /PATH/TO/PADDLE/build
python ../paddle/fluid/inference/tests/api/full_ILSVRC2012_val_preprocess.py
```

Then the ILSVRC2012 Validation dataset will be preprocessed and saved by default in `~/.cache/paddle/dataset/int8/download/int8_full_val.bin`.

* ## Commands to reproduce image classification benchmark
You can run `test_analyzer_int8_image_classification` with the following arguments to reproduce the accuracy result on ResNet-50.
```bash
cd /PATH/TO/PADDLE/build
./paddle/fluid/inference/tests/api/test_analyzer_int8_image_classification --infer_model=third_party/inference_demo/int8v2/resnet50/model --infer_data=$HOME/.cache/paddle/dataset/int8/download/int8_full_val.bin --batch_size=1 --paddle_num_threads=1
```

To verify all 7 models, set the `--infer_model` parameter to one of the following values on the command line:

```bash
--infer_model /PATH/TO/PADDLE/build/third_party/inference_demo/int8v2/MODEL_NAME/model
```

```text
MODEL_NAME=googlenet, mobilenetv1, mobilenetv2, resnet101, resnet50, vgg16, vgg19
```
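
The seven benchmark runs can also be scripted. A sketch in Python that generates the command line shown above for each model; the build path is the same placeholder as above, and `print` is used instead of actually executing (substitute, e.g., `subprocess.run(cmd.split())` to run the benchmarks):

```python
import os

# Placeholder -- substitute your actual Paddle build directory.
build_dir = "/PATH/TO/PADDLE/build"
models = ["googlenet", "mobilenetv1", "mobilenetv2", "resnet101",
          "resnet50", "vgg16", "vgg19"]

# Preprocessed ILSVRC2012 validation data produced in the previous step.
data = os.path.expanduser("~/.cache/paddle/dataset/int8/download/int8_full_val.bin")

for name in models:
    # Build the same command line as shown above, once per model.
    cmd = (f"{build_dir}/paddle/fluid/inference/tests/api/"
           f"test_analyzer_int8_image_classification"
           f" --infer_model={build_dir}/third_party/inference_demo/int8v2/{name}/model"
           f" --infer_data={data}"
           f" --batch_size=1 --paddle_num_threads=1")
    print(cmd)
```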

## 3. Accuracy and Performance benchmark for Object Detection models

>**I. mAP on Intel(R) Xeon(R) Gold 6271 (batch size 1 on single core):**

| Model        | FP32 Accuracy   | INT8 Accuracy   | Accuracy Diff(FP32-INT8)   |
| :----------: | :-------------: | :------------:  | :--------------:           |
| Mobilenet-SSD|  73.80%         |  73.17%         |   0.63%                    |

>**II. Throughput on Intel(R) Xeon(R) Gold 6271 (batch size 1 on single core)**

| Model        | FP32 Throughput(images/s)  | INT8 Throughput(images/s) | Ratio(INT8/FP32)|
| :-----------:| :------------:             | :------------:            | :------------:  |
| Mobilenet-SSD|    37.8180                 |   115.0604                |   3.04          |

* ## Prepare dataset

* Run the following commands to download and preprocess the Pascal VOC2007 test set.
  
```bash
cd /PATH/TO/PADDLE/build
python ./paddle/fluid/inference/tests/api/full_pascalvoc_test_preprocess.py --choice=VOC_test_2007
```

Then the Pascal VOC2007 test set will be preprocessed and saved by default in `~/.cache/paddle/dataset/pascalvoc/pascalvoc_full.bin`.

* Run the following commands to prepare your own dataset.

```bash
cd /PATH/TO/PADDLE/build
python ./paddle/fluid/inference/tests/api/full_pascalvoc_test_preprocess.py --choice=local \
                                         --data_dir=./third_party/inference_demo/int8v2/pascalvoc_small \
                                         --img_annotation_list=test_100.txt \
                                         --label_file=label_list \
                                         --output_file=pascalvoc_small.bin \
                                         --resize_h=300 \
                                         --resize_w=300 \
                                         --mean_value='[127.5, 127.5, 127.5]' \
                                         --ap_version=11point
```
Then the user dataset will be preprocessed and saved by default in `/PATH/TO/PADDLE/build/third_party/inference_demo/int8v2/pascalvoc_small/pascalvoc_small.bin`.

* ## Commands to reproduce object detection benchmark

You can run `test_analyzer_int8_object_detection` with the following arguments to reproduce the benchmark results for Mobilenet-SSD.

```bash
cd /PATH/TO/PADDLE/build
./paddle/fluid/inference/tests/api/test_analyzer_int8_object_detection --infer_model=third_party/inference_demo/int8v2/mobilenet-ssd/model --infer_data=$HOME/.cache/paddle/dataset/pascalvoc/pascalvoc_full.bin --warmup_batch_size=10 --batch_size=100 --paddle_num_threads=1
```

## 4. Notes

* Accuracy measurement requires a model that accepts two inputs: data and labels.
* A different warmup (sampling) batch size may cause slight differences in INT8 accuracy.
* The C-API performance is better than the Python API performance because of Python overhead, which is especially noticeable for small computational models.