diff --git a/paddle/fluid/inference/tests/api/analyzer_int8_image_classification_tester.cc b/paddle/fluid/inference/tests/api/analyzer_int8_image_classification_tester.cc
index fbf67d933786e3ee2baab7a20911da2837cdce4d..ae78e07304f1d0e9905f78498f3a0c5ca2a64fe7 100644
--- a/paddle/fluid/inference/tests/api/analyzer_int8_image_classification_tester.cc
+++ b/paddle/fluid/inference/tests/api/analyzer_int8_image_classification_tester.cc
@@ -138,7 +138,7 @@ void SetInput(std::vector> *inputs,
   }
 }
 
-TEST(Analyzer_int8_resnet50, quantization) {
+TEST(Analyzer_int8_image_classification, quantization) {
   AnalysisConfig cfg;
   SetConfig(&cfg);
 
diff --git a/paddle/fluid/inference/tests/api/int8_mkldnn_quantization.md b/paddle/fluid/inference/tests/api/int8_mkldnn_quantization.md
index cbeef5fb9da42388eade6fa90344abf77cb59bd6..e1d3a31f2921b04a2c8f3bb385278421b20212ea 100644
--- a/paddle/fluid/inference/tests/api/int8_mkldnn_quantization.md
+++ b/paddle/fluid/inference/tests/api/int8_mkldnn_quantization.md
@@ -1,22 +1,28 @@
-# INT8 MKL-DNN quantization
+# INT8 MKL-DNN quantization
 
 This document describes how to use Paddle inference Engine to convert the FP32 model to INT8 model on ResNet-50 and MobileNet-V1. We provide the instructions on enabling INT8 MKL-DNN quantization in Paddle inference and show the ResNet-50 and MobileNet-V1 results in accuracy and performance.
 
-## 0. Install PaddlePaddle
-Follow PaddlePaddle [installation instruction](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification#installation) to install PaddlePaddle. If you build PaddlePaddle yourself, please use the following cmake arguments.
+## 0. Install PaddlePaddle
+
+Follow PaddlePaddle [installation instruction](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification#installation) to install PaddlePaddle. If you build PaddlePaddle yourself, please use the following cmake arguments.
+
+```bash
+cmake .. -DWITH_TESTING=ON -DWITH_FLUID_ONLY=ON -DWITH_GPU=OFF -DWITH_MKL=ON -DWITH_MKLDNN=ON -DWITH_INFERENCE_API_TEST=ON -DON_INFER=ON
+
 ```
-cmake .. -DWITH_TESTING=ON -WITH_FLUID_ONLY=ON -DWITH_GPU=OFF -DWITH_MKL=ON -WITH_SWIG_PY=OFF -DWITH_INFERENCE_API_TEST=ON -DON_INFER=ON
-```
 
 Note: MKL-DNN and MKL are required.
 
-## 1. Enable INT8 MKL-DNN quantization
+## 1. Enable INT8 MKL-DNN quantization
+
 For reference, please examine the code of unit test enclosed in [analyzer_int8_image_classification_tester.cc](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/tests/api/analyzer_int8_image_classification_tester.cc).
 
 * ### Create Analysis config
-INT8 quantization is one of the optimizations in analysis config. More information about analysis config can be found [here](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/advanced_usage/deploy/inference/native_infer_en.md#upgrade-performance-based-on-contribanalysisconfig-prerelease)
+
+INT8 quantization is one of the optimizations in analysis config. More information about analysis config can be found [here](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/advanced_usage/deploy/inference/native_infer_en.md#upgrade-performance-based-on-contribanalysisconfig-prerelease)
 
 * ### Create quantize config by analysis config
+
 We enable the MKL-DNN quantization procedure by calling an appropriate method from analysis config. Afterwards, all the required quantization parameters (quantization op names, quantization strategies etc.) can be set through quantizer config which is present in the analysis config. It is also necessary to specify a pre-processed warmup dataset and desired batch size.
 
 ```cpp
@@ -24,7 +30,7 @@ We enable the MKL-DNN quantization procedure by calling an appropriate method fr
 cfg.EnableMkldnnQuantizer();
 
 //use analysis config to call the MKL-DNN quantization config
-cfg.mkldnn_quantizer_config()->SetWarmupData(warmup_data);
+cfg.mkldnn_quantizer_config()->SetWarmupData(warmup_data);
 cfg.mkldnn_quantizer_config()->SetWarmupBatchSize(100);
 ```
 
@@ -32,39 +38,68 @@ cfg.mkldnn_quantizer_config()->SetWarmupBatchSize(100);
 
 We provide the results of accuracy and performance measured on Intel(R) Xeon(R) Gold 6271 on single core.
 
- >**I. Top-1 Accuracy on Intel(R) Xeon(R) Gold 6271**
+>**Dataset: ILSVRC2012 Validation dataset**
+
+>**I. Top-1 Accuracy on Intel(R) Xeon(R) Gold 6271**
 
-| Model | Dataset | FP32 Accuracy | INT8 Accuracy | Accuracy Diff |
-| :------------: | :------------: | :------------: | :------------: | :------------: |
-| ResNet-50 | Full ImageNet Val | 76.63% | 76.48% | 0.15% |
-| MobileNet-V1 | Full ImageNet Val | 70.78% | 70.36% | 0.42% |
+| Model | FP32 Accuracy | INT8 Accuracy | Accuracy Diff(FP32-INT8) |
+| :----------: | :-------------: | :------------: | :--------------: |
+| GoogleNet | 70.50% | 69.81% | 0.69% |
+| MobileNet-V1 | 70.78% | 70.42% | 0.36% |
+| MobileNet-V2 | 71.90% | 71.35% | 0.55% |
+| ResNet-101 | 77.50% | 77.42% | 0.08% |
+| ResNet-50 | 76.63% | 76.52% | 0.11% |
+| VGG16 | 72.08% | 72.03% | 0.05% |
+| VGG19 | 72.57% | 72.55% | 0.02% |
 
- >**II. Throughput on Intel(R) Xeon(R) Gold 6271 (batch size 1 on single core)**
+>**II. Throughput on Intel(R) Xeon(R) Gold 6271 (batch size 1 on single core)**
 
-| Model | Dataset | FP32 Throughput | INT8 Throughput | Ratio(INT8/FP32) |
-| :------------: | :------------: | :------------: | :------------: | :------------: |
-| ResNet-50 | Full ImageNet Val | 13.17 images/s | 49.84 images/s | 3.78 |
-| MobileNet-V1 | Full ImageNet Val | 75.49 images/s | 232.38 images/s | 3.07 |
+| Model | FP32 Throughput(images/s) | INT8 Throughput(images/s) | Ratio(INT8/FP32)|
+| :-----------:| :------------: | :------------: | :------------: |
+| GoogleNet | 34.06 | 72.79 | 2.14 |
+| MobileNet-V1 | 80.02 | 230.65 | 2.88 |
+| MobileNet-V2 | 99.38 | 206.92 | 2.08 |
+| ResNet-101 | 7.38 | 27.31 | 3.70 |
+| ResNet-50 | 13.71 | 50.55 | 3.69 |
+| VGG16 | 3.64 | 10.56 | 2.90 |
+| VGG19 | 2.95 | 9.02 | 3.05 |
 
 Notes:
+
 * Measurement of accuracy requires a model which accepts two inputs: data and labels.
-* Different sampling batch size data may cause slight difference on INT8 top accuracy.
-* CAPI performance data is better than python API performance data because of the python overhead. Especially for the small computational model, python overhead will be more obvious.
+* Different sampling batch size data may cause slight difference on INT8 top accuracy.
+* CAPI performance data is better than python API performance data because of the python overhead. Especially for the small computational model, python overhead will be more obvious.
 
 ## 3. Commands to reproduce the above accuracy and performance benchmark
-* #### Full dataset (Single core)
- * ##### Download full ImageNet Validation Dataset
+
+Reproducing the above accuracy results takes two steps. We use the GoogleNet benchmark as an example:
+
+* ### Prepare dataset
+
+Run the following commands to download and preprocess the ILSVRC2012 Validation dataset.
+
 ```bash
 cd /PATH/TO/PADDLE/build
 python ../paddle/fluid/inference/tests/api/full_ILSVRC2012_val_preprocess.py
 ```
-The converted data binary file is saved by default in ~/.cache/paddle/dataset/int8/download/int8_full_val.bin
- * ##### ResNet50 Full dataset benchmark
+
+The preprocessed ILSVRC2012 Validation dataset is saved by default in `~/.cache/paddle/dataset/int8/download/int8_full_val.bin`.
+
+* ### Commands to reproduce benchmark
+
+You can run `test_analyzer_int8_imagenet_classification` with the following arguments to reproduce the accuracy result on GoogleNet.
+
 ```bash
-./paddle/fluid/inference/tests/api/test_analyzer_int8_resnet50 --infer_model=third_party/inference_demo/int8v2/resnet50/model --infer_data=/path/to/converted/int8_full_val.bin --batch_size=1 --paddle_num_threads=1
+./paddle/fluid/inference/tests/api/test_analyzer_int8_imagenet_classification --infer_model=third_party/inference_demo/int8v2/googlenet/model --infer_data=$HOME/.cache/paddle/dataset/int8/download/int8_full_val.bin --batch_size=1 --paddle_num_threads=1
 ```
- * ##### Mobilenet-v1 Full dataset benchmark
+
+To verify all seven models, set `--infer_model` to one of the following values on the command line:
+
 ```bash
-./paddle/fluid/inference/tests/api/test_analyzer_int8_mobilenet --infer_model=third_party/inference_demo/int8v2/mobilenet/model --infer_data=/path/to/converted/int8_full_val.bin --batch_size=1 --paddle_num_threads=1
+--infer_model /PATH/TO/PADDLE/build/third_party/inference_demo/int8v2/MODEL_NAME/model
+```
+
+```text
+MODEL_NAME=googlenet, mobilenetv1, mobilenetv2, resnet101, resnet50, vgg16, vgg19
 ```
diff --git a/paddle/fluid/inference/tests/api/tester_helper.h b/paddle/fluid/inference/tests/api/tester_helper.h
index eb786196a88482817617f0156327be95e67bd4ad..05936458cefbadecb0ab15afdc3b48d0fd4d64ce 100644
--- a/paddle/fluid/inference/tests/api/tester_helper.h
+++ b/paddle/fluid/inference/tests/api/tester_helper.h
@@ -336,14 +336,20 @@ void PredictionRun(PaddlePredictor *predictor,
 #ifdef WITH_GPERFTOOLS
   ProfilerStart("paddle_inference.prof");
 #endif
+  int predicted_num = 0;
   if (!FLAGS_zero_copy) {
-    run_timer.tic();
     for (int i = 0; i < iterations; i++) {
+      run_timer.tic();
       for (int j = 0; j < num_times; j++) {
         predictor->Run(inputs[i], &(*outputs)[i], FLAGS_batch_size);
       }
+      elapsed_time += run_timer.toc();
+
+      predicted_num += FLAGS_batch_size;
+      if (predicted_num % 100 == 0) {
+        LOG(INFO) << predicted_num << " samples";
+      }
     }
-    elapsed_time = run_timer.toc();
   } else {
     for (int i = 0; i < iterations; i++) {
       ConvertPaddleTensorToZeroCopyTensor(predictor, inputs[i]);
@@ -352,8 +358,14 @@ void PredictionRun(PaddlePredictor *predictor,
         predictor->ZeroCopyRun();
       }
       elapsed_time += run_timer.toc();
+
+      predicted_num += FLAGS_batch_size;
+      if (predicted_num % 100 == 0) {
+        LOG(INFO) << predicted_num << " samples";
+      }
     }
   }
+
 #ifdef WITH_GPERFTOOLS
   ProfilerStop();
 #endif
diff --git a/python/paddle/fluid/contrib/slim/tests/slim_int8_mkldnn_post_training_quantization.md b/python/paddle/fluid/contrib/slim/tests/slim_int8_mkldnn_post_training_quantization.md
new file mode 100644
index 0000000000000000000000000000000000000000..33edb13c48c88b5ceb2c405ce4726209018249b4
--- /dev/null
+++ b/python/paddle/fluid/contrib/slim/tests/slim_int8_mkldnn_post_training_quantization.md
@@ -0,0 +1,130 @@
+# PaddleSlim Post-training quantization (MKL-DNN INT8)
+
+This document describes how to use [PaddleSlim](https://github.com/PaddlePaddle/models/blob/develop/PaddleSlim/docs/usage.md) to convert an FP32 ProgramDesc with FP32 weights to an INT8 ProgramDesc with FP32 weights on GoogleNet, MobileNet-V1, MobileNet-V2, ResNet-101, ResNet-50, VGG16 and VGG19. It explains how to enable MKL-DNN INT8 calibration in PaddleSlim and shows accuracy results for all seven models.
+
+## 0. Prerequisite
+
+Install version 1.5 or later of the PaddlePaddle Python package, for example with `pip install paddlepaddle==1.5`.
+
+## 1. How to generate INT8 ProgramDesc with FP32 weights
+
+Section 1.2 of the [PaddleSlim](https://github.com/PaddlePaddle/models/blob/develop/PaddleSlim/docs/usage.md) usage doc explains how to use the PaddleSlim Compressor. PaddleSlim Post-training quantization with MKL-DNN INT8 differs from that description in two ways.
+
+* Differences in `paddle.fluid.contrib.slim.Compressor` arguments
+
+The only requirement of PaddleSlim Post-training quantization with MKL-DNN INT8 is the reader of the warmup dataset, so set all other parameters of `paddle.fluid.contrib.slim.Compressor` to None, [] or ''.
+
+```python
+com_pass = Compressor(
+    place=None, # not required, set to None
+    scope=None, # not required, set to None
+    train_program=None, # not required, set to None
+    train_reader=None, # not required, set to None
+    train_feed_list=[], # not required, set to []
+    train_fetch_list=[], # not required, set to []
+    eval_program=None, # not required, set to None
+    eval_reader=reader, # required, the reader of warmup dataset
+    eval_feed_list=[], # not required, set to []
+    eval_fetch_list=[], # not required, set to []
+    teacher_programs=[], # not required, set to []
+    checkpoint_path='', # not required, set to ''
+    train_optimizer=None, # not required, set to None
+    distiller_optimizer=None # not required, set to None
+    )
+```
+
+* Differences in yaml config
+
+An example yaml config is listed below. For more details, refer to [config_mkldnn_int8.yaml](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/fluid/contrib/slim/tests/quantization/config_mkldnn_int8.yaml), which is used in the unit test.
+
+```yaml
+version: 1.0
+strategies:
+    mkldnn_post_training_strategy:
+        class: 'MKLDNNPostTrainingQuantStrategy' # required, class name of MKL-DNN INT8 Post-training quantization strategy
+        int8_model_save_path: 'OUTPUT_PATH' # required, int8 ProgramDesc with fp32 weights
+        fp32_model_path: 'MODEL_PATH' # required, fp32 ProgramDesc with fp32 weights
+        cpu_math_library_num_threads: 1 # required, The number of cpu math library threads
+compressor:
+    epoch: 0 # not required, set to 0
+    checkpoint_path: '' # not required, set to ''
+    strategies:
+        - mkldnn_post_training_strategy
+```
+
+## 2. How to run INT8 ProgramDesc with fp32 weights
+
+You can load the INT8 ProgramDesc with fp32 weights through the load_inference_model [API](https://github.com/PaddlePaddle/Paddle/blob/8b50ad80ff6934512d3959947ac1e71ea3fb9ea3/python/paddle/fluid/io.py#L991) and run INT8 inference in the same way as [FP32](https://github.com/PaddlePaddle/models/blob/develop/PaddleCV/object_detection/eval.py "FP32").
+
+```python
+[infer_program, feed_dict, fetch_targets] = fluid.io.load_inference_model(model_path, exe)
+```
+
+## 3. Result
+
+We provide the results of accuracy measured on Intel(R) Xeon(R) Gold 6271.
+
+>**I. Top-1 Accuracy on Intel(R) Xeon(R) Gold 6271**
+
+>**Dataset: ILSVRC2012 Validation dataset**
+
+| Model | FP32 Accuracy | INT8 Accuracy | Accuracy Diff(FP32-INT8) |
+| :----------: | :-------------: | :------------: | :--------------: |
+| GoogleNet | 70.50% | 69.81% | 0.69% |
+| MobileNet-V1 | 70.78% | 70.42% | 0.36% |
+| MobileNet-V2 | 71.90% | 71.35% | 0.55% |
+| ResNet-101 | 77.50% | 77.42% | 0.08% |
+| ResNet-50 | 76.63% | 76.52% | 0.11% |
+| VGG16 | 72.08% | 72.03% | 0.05% |
+| VGG19 | 72.57% | 72.55% | 0.02% |
+
+Notes:
+
+* MKL-DNN and MKL are required.
+
+## 4. How to reproduce the results
+
+Reproducing the above accuracy results takes three steps. We use the GoogleNet benchmark as an example:
+
+* ### Prepare dataset
+
+You can run the following commands to download and preprocess the ILSVRC2012 Validation dataset.
+
+```bash
+cd /PATH/TO/PADDLE
+python ./paddle/fluid/inference/tests/api/full_ILSVRC2012_val_preprocess.py
+```
+
+The preprocessed ILSVRC2012 Validation dataset is saved by default in `~/.cache/paddle/dataset/int8/download/int8_full_val.bin`.
+
+* ### Prepare model
+
+You can run the following commands to download the GoogleNet model.
+
+```bash
+mkdir -p /PATH/TO/DOWNLOAD/MODEL/
+cd /PATH/TO/DOWNLOAD/MODEL/
+export MODEL_NAME=GoogleNet
+wget http://paddle-inference-dist.bj.bcebos.com/int8/${MODEL_NAME}_int8_model.tar.gz
+mkdir -p ${MODEL_NAME}
+tar -xvf ${MODEL_NAME}_int8_model.tar.gz -C ${MODEL_NAME}
+```
+
+To download and verify all seven models, set `MODEL_NAME` to one of the following values on the command line:
+
+```text
+MODEL_NAME=GoogleNet, mobilenetv1, mobilenet_v2, Res101, resnet50, VGG16, VGG19
+```
+
+* ### Commands to reproduce benchmark
+
+You can run `test_mkldnn_int8_quantization_strategy.py` with the following arguments to reproduce the accuracy result on GoogleNet.
+
+```bash
+cd /PATH/TO/PADDLE/python/paddle/fluid/contrib/slim/tests/
+python ./test_mkldnn_int8_quantization_strategy.py --infer_model /PATH/TO/DOWNLOAD/MODEL/${MODEL_NAME}/model --infer_data ~/.cache/paddle/dataset/int8/download/int8_full_val.bin --warmup_batch_size 100 --batch_size 1
+```
+
+Notes:
+
+* The above commands may take several hours in the prediction stage (which includes both int8 and fp32 prediction), since `int8_full_val.bin` contains 50000 images to be predicted.
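
The per-model steps in section 4 of the new PaddleSlim document could be scripted. The following is a minimal sketch, not part of this patch, that only formats the `wget` and test commands for each listed `MODEL_NAME`; it downloads and runs nothing. The helper names `download_url` and `benchmark_command` are hypothetical, and the URL pattern, model names and arguments are copied from the document rather than verified against the server.

```python
# Hypothetical helpers, not part of the patch: they only build command strings
# from the values listed in section 4 and never touch the network.
MODEL_NAMES = ["GoogleNet", "mobilenetv1", "mobilenet_v2", "Res101",
               "resnet50", "VGG16", "VGG19"]
BASE_URL = "http://paddle-inference-dist.bj.bcebos.com/int8"
DATA_PATH = "~/.cache/paddle/dataset/int8/download/int8_full_val.bin"


def download_url(model_name):
    """Return the archive URL used by the wget step for one model."""
    return "{}/{}_int8_model.tar.gz".format(BASE_URL, model_name)


def benchmark_command(model_name, model_dir):
    """Return the test command for one model; model_dir is a placeholder path."""
    return ("python ./test_mkldnn_int8_quantization_strategy.py "
            "--infer_model {}/{}/model --infer_data {} "
            "--warmup_batch_size 100 --batch_size 1").format(
                model_dir, model_name, DATA_PATH)


if __name__ == "__main__":
    # Print one download command and one benchmark command per model.
    for name in MODEL_NAMES:
        print("wget " + download_url(name))
        print(benchmark_command(name, "/PATH/TO/DOWNLOAD/MODEL"))
```

The printed commands keep a space between `--infer_data` and the path so that the shell still expands `~`, as in the command shown in the document.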