Commit fbf9eca0 authored by lidanqing, committed by Tao Luo

QAT Int8 document (#21360)

* update benchmark for int8v2, QAT1, QAT2 accuracy and performance
test=document_fix

* change according to reviews
test=develop test=document_fix

* improve some descriptions and some models
test=develop test=document_fix

* update models benchmark data
test=develop test=document_fix

* update int8v2 and qat2 performance
test=develop test=document_fix
Parent a1a5adc9
# INT8 MKL-DNN post-training quantization

This document describes how to use the Paddle inference engine to convert FP32 models to INT8 models using INT8 MKL-DNN post-training quantization. We provide instructions on enabling INT8 MKL-DNN quantization in Paddle inference and show the accuracy and performance results of the quantized models, including 7 image classification models: GoogleNet, MobileNet-V1, MobileNet-V2, ResNet-101, ResNet-50, VGG16, VGG19, and 1 object detection model: Mobilenet-SSD.
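For intuition, post-training INT8 quantization boils down to choosing a scale for each tensor from calibration (warmup) data and mapping FP32 values into the signed 8-bit range. The following NumPy sketch illustrates the common symmetric max-abs scheme; it is an illustration of the idea only, not Paddle's MKL-DNN implementation:

```python
import numpy as np

def max_abs_scale(calibration_batches):
    # One scale per tensor, derived from the maximum absolute value
    # observed over the calibration data.
    max_abs = max(np.abs(batch).max() for batch in calibration_batches)
    return max_abs / 127.0

def quantize(x, scale):
    # FP32 -> INT8: scale, round, and clip to the signed 8-bit range.
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

def dequantize(q, scale):
    # INT8 -> FP32 approximation of the original tensor.
    return q.astype(np.float32) * scale

calibration = [np.random.randn(8, 3, 224, 224).astype(np.float32)
               for _ in range(4)]
scale = max_abs_scale(calibration)
x = calibration[0]
roundtrip = dequantize(quantize(x, scale), scale)
print("max round-trip error:", np.abs(x - roundtrip).max())
```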
## 0. Install PaddlePaddle
## 2. Accuracy and Performance benchmark for Image Classification models

We provide the accuracy and performance results measured on Intel(R) Xeon(R) Gold 6271 below.
>**I. Top-1 Accuracy on Intel(R) Xeon(R) Gold 6271**
| Model | FP32 Accuracy | INT8 Accuracy | Accuracy Diff(INT8-FP32) |
|:------------:|:-------------:|:-------------:|:------------------------:|
| GoogleNet | 70.50% | 70.08% | -0.42% |
| MobileNet-V1 | 70.78% | 70.41% | -0.37% |
| MobileNet-V2 | 71.90% | 71.34% | -0.56% |
| ResNet-101 | 77.50% | 77.43% | -0.07% |
| ResNet-50 | 76.63% | 76.57% | -0.06% |
| VGG16 | 72.08% | 72.05% | -0.03% |
| VGG19 | 72.57% | 72.57% | 0.00% |
>**II. Throughput on Intel(R) Xeon(R) Gold 6271 (batch size 1 on single core)**
| Model | FP32 Throughput(images/s) | INT8 Throughput(images/s) | Ratio(INT8/FP32) |
|:------------:|:-------------------------:|:-------------------------:|:----------------:|
| GoogleNet | 32.53 | 68.32 | 2.13 |
| MobileNet-V1 | 73.98 | 224.91 | 3.04 |
| MobileNet-V2 | 86.59 | 204.91 | 2.37 |
| ResNet-101 | 7.15 | 26.73 | 3.74 |
| ResNet-50 | 13.15 | 49.48 | 3.76 |
| VGG16 | 3.34 | 10.11 | 3.03 |
| VGG19 | 2.83 | 8.68 | 3.07 |
* ## Prepare dataset
* Run the following commands to download and preprocess the full ILSVRC2012 Validation dataset:
```bash
cd /PATH/TO/PADDLE/build
python ../paddle/fluid/inference/tests/api/full_ILSVRC2012_val_preprocess.py
```
Then the ILSVRC2012 Validation dataset will be preprocessed and saved by default in `$HOME/.cache/paddle/dataset/int8/download/int8_full_val.bin`
* ## Commands to reproduce image classification benchmark
You can set `MODEL_NAME` to one of the following values in the command line:
MODEL_NAME=googlenet, mobilenetv1, mobilenetv2, resnet101, resnet50, vgg16, vgg19
## 3. Accuracy and Performance benchmark for Object Detection models
>**I. mAP on Intel(R) Xeon(R) Gold 6271 (batch size 100 on single core):**
| Model | FP32 Accuracy | INT8 Accuracy | Accuracy Diff(INT8-FP32) |
|:-------------:|:-------------:|:-------------:|:------------------------:|
| Mobilenet-SSD | 73.80%        | 73.17%        | -0.63%                   |
>**II. Throughput on Intel(R) Xeon(R) Gold 6271 (batch size 100 on single core)**
| Model | FP32 Throughput(images/s) | INT8 Throughput(images/s) | Ratio(INT8/FP32) |
|:-------------:|:-------------------------:|:-------------------------:|:----------------:|
| Mobilenet-SSD | 37.94                     | 114.94                    | 3.03             |
* ## Prepare dataset
* Run the following commands to download and preprocess the Pascal VOC2007 test set:
```bash
cd /PATH/TO/PADDLE/build
python ../paddle/fluid/inference/tests/api/full_pascalvoc_test_preprocess.py --choice=VOC_test_2007
```
Then the Pascal VOC2007 test set will be preprocessed and saved by default in `$HOME/.cache/paddle/dataset/pascalvoc/pascalvoc_full.bin`
* Run the following commands to prepare your own dataset.
```bash
cd /PATH/TO/PADDLE/build
python ../paddle/fluid/inference/tests/api/full_pascalvoc_test_preprocess.py --choice=local \\
--data_dir=./third_party/inference_demo/int8v2/pascalvoc_small \\
--img_annotation_list=test_100.txt \\
--label_file=label_list \\
    ...
```
# SLIM Quantization-aware training (QAT) on INT8 MKL-DNN
This document describes how to use [Paddle Slim](https://github.com/PaddlePaddle/FluidDoc/blob/develop/doc/fluid/advanced_usage/paddle_slim/paddle_slim.md) to convert a quantization-aware trained model into an INT8 MKL-DNN quantized model. In **Release 1.5**, we released QAT1.0 MKL-DNN, which enables the INT8 MKL-DNN kernel for QAT-trained models within a 0.05% accuracy difference on GoogleNet, MobileNet-V1, MobileNet-V2, ResNet-101, ResNet-50, VGG16 and VGG19. In **Release 1.6**, QAT2.0 MKL-DNN adds performance optimizations based on fake QAT models (ResNet50, ResNet101, Mobilenet-v1, Mobilenet-v2, VGG16 and VGG19) with a minor accuracy drop. Compared with QAT1.0 MKL-DNN, QAT2.0 MKL-DNN achieves a larger inference performance gain over fake QAT models, at the cost of a slightly bigger accuracy difference. We provide accuracy benchmarks for both QAT1.0 MKL-DNN and QAT2.0 MKL-DNN, and a performance benchmark for QAT2.0 MKL-DNN.
Notes:
* MKL-DNN and MKL are required. The performance gain can only be obtained with AVX512 series CPU servers.
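To check the AVX512 requirement on a Linux server before benchmarking, a simple sketch like the following can be used (it only inspects the CPU flags; `avx512f` is the AVX512 foundation flag):

```python
# Illustrative Linux-only helper, not part of Paddle: report whether the
# CPU advertises the AVX512 foundation instruction set.
def cpu_supports_avx512():
    with open("/proc/cpuinfo") as f:
        return "avx512f" in f.read()

print("AVX512 available:", cpu_supports_avx512())
```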
## 0. Prerequisite
You need to install at least the PaddlePaddle 1.6 Python package: `pip install paddlepaddle==1.6`.
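A quick way to confirm the installed version from Python:

```python
import paddle

# The QAT passes described below require PaddlePaddle 1.6 or newer.
print(paddle.__version__)
```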
You can refer to the unit test in [test_quantization_mkldnn_pass.py](test_quantization_mkldnn_pass.py). The snippet below shows how to apply the passes to an `IrGraph`:
```python
# The import paths below are an assumption based on the Paddle 1.6 slim
# tests; adjust them to match your Paddle version.
import paddle.fluid as fluid
from paddle.fluid import core
from paddle.fluid.framework import IrGraph
from paddle.fluid.contrib.slim.quantization import (
    FakeQAT2MkldnnINT8KernelPass, FakeQAT2MkldnnINT8PerfPass)
graph = IrGraph(core.Graph(fluid.Program().desc), for_test=False)
place = fluid.CPUPlace()
# Convert the IrGraph to MKL-DNN supported INT8 IrGraph by using
# QAT1.0 MKL-DNN
# FakeQAT2MkldnnINT8KernelPass
mkldnn_pass = FakeQAT2MkldnnINT8KernelPass(fluid.global_scope(), place)
# Apply FakeQAT2MkldnnINT8KernelPass to IrGraph
mkldnn_pass.apply(graph)
# QAT2.0 MKL-DNN
# FakeQAT2MkldnnINT8PerfPass
mkldnn_pass = FakeQAT2MkldnnINT8PerfPass(fluid.global_scope(), place, fluid.core, False)
# Apply FakeQAT2MkldnnINT8PerfPass to IrGraph
graph = mkldnn_pass.apply(graph)
```
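Once a pass has been applied, the transformed graph can be converted back into a `Program` for saving or inference. A minimal sketch, assuming the code above has run; `IrGraph.to_program` and `fluid.io.save_inference_model` are standard Paddle 1.x APIs, while the feed/fetch variable names are placeholders:

```python
# Sketch: turn the transformed IrGraph back into a Program and save it
# as an INT8 inference model. "image" and "fc_0.tmp_2" are placeholder
# variable names; substitute your model's real feed/fetch variables.
inference_program = graph.to_program()
exe = fluid.Executor(place)
fluid.io.save_inference_model(
    dirname="./int8_model",
    feeded_var_names=["image"],
    target_vars=[inference_program.global_block().var("fc_0.tmp_2")],
    executor=exe,
    main_program=inference_program)
```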
## 2. Accuracy and Performance benchmark
>**I. QAT1.0 MKL-DNN Accuracy on Intel(R) Xeon(R) Gold 6271**
| Model | Fake QAT Top1 Accuracy | INT8 QAT Top1 Accuracy | Top1 Diff | Fake QAT Top5 Accuracy | INT8 QAT Top5 Accuracy | Top5 Diff |
|:------------:|:----------------------:|:----------------------:|:---------:|:----------------------:|:----------------------:|:---------:|
| GoogleNet | 70.40% | 70.39% | -0.01% | 89.46% | 89.46% | 0.00% |
| MobileNet-V1 | 70.84% | 70.85% | +0.01% | 89.59% | 89.58% | -0.01% |
| MobileNet-V2 | 72.07% | 72.06% | -0.01% | 90.71% | 90.69% | -0.02% |
| ResNet-101 | 77.49% | 77.52% | +0.03% | 93.68% | 93.67% | -0.01% |
| ResNet-50 | 76.61% | 76.62% | +0.01% | 93.08% | 93.10% | +0.02% |
| VGG16 | 72.71% | 72.69% | -0.02% | 91.11% | 91.09% | -0.02% |
| VGG19 | 73.37% | 73.37% | 0.00% | 91.40% | 91.41% | +0.01% |
>**II. QAT2.0 MKL-DNN Accuracy on Intel(R) Xeon(R) Gold 6271**
| Model | Fake QAT Top1 Accuracy | INT8 QAT Top1 Accuracy | Top1 Diff | Fake QAT Top5 Accuracy | INT8 QAT Top5 Accuracy | Top5 Diff |
|:------------:|:----------------------:|:----------------------:|:---------:|:----------------------:|:----------------------:|:---------:|
| MobileNet-V1 | 70.72% | 70.78% | +0.06% | 89.47% | 89.39% | -0.08% |
| MobileNet-V2 | 72.07% | 72.17% | +0.10% | 90.65% | 90.63% | -0.02% |
| ResNet101 | 77.86% | 77.59% | -0.27% | 93.54% | 93.54% | 0.00% |
| ResNet50 | 76.62% | 76.53% | -0.09% | 93.01% | 92.98% | -0.03% |
| VGG16 | 71.74% | 71.75% | +0.01% | 89.96% | 89.73% | -0.23% |
| VGG19 | 72.30% | 72.09% | -0.21% | 90.19% | 90.13% | -0.06% |
>**III. QAT2.0 MKL-DNN C-API Performance on Intel(R) Xeon(R) Gold 6271**
| Model | FP32 Optimized Throughput (images/s) | INT8 QAT Throughput(images/s) | Ratio(INT8/FP32) |
|:------------:|:------------------------------------:|:-----------------------------:|:----------------:|
| MobileNet-V1 | 73.98 | 227.73 | 3.08 |
| MobileNet-V2 | 86.59 | 206.74 | 2.39 |
| ResNet101 | 7.15 | 26.69 | 3.73 |
| ResNet50 | 13.15 | 49.33 | 3.75 |
| VGG16 | 3.34 | 10.15 | 3.04 |
| VGG19 | 2.83 | 8.67 | 3.07 |
Notes:
* FP32 Optimized Throughput (images/s) is from [int8_mkldnn_quantization.md](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/inference/tests/api/int8_mkldnn_quantization.md).
## 3. How to reproduce the results
Follow the three steps below to reproduce the accuracy results above; we take the ResNet50 benchmark as an example:
* ### Prepare dataset
Run the following commands to download and preprocess the full ILSVRC2012 Validation dataset:
```bash
cd /PATH/TO/PADDLE
python paddle/fluid/inference/tests/api/full_ILSVRC2012_val_preprocess.py
```
The converted data binary file is saved by default in `$HOME/.cache/paddle/dataset/int8/download/int8_full_val.bin`
* ### Prepare model
You can run the following commands to download the ResNet50 model. The exemplary code snippet below downloads a ResNet50 QAT model. The reason for having two different versions of the same model is that there are two different QAT training strategies: one for a non-optimized and one for an optimized graph transform, corresponding to QAT1.0 and QAT2.0 respectively.
```bash
mkdir -p /PATH/TO/DOWNLOAD/MODEL/
cd /PATH/TO/DOWNLOAD/MODEL/
# uncomment for QAT1.0 MKL-DNN
# export MODEL_NAME=ResNet50
# export MODEL_FILE_NAME=QAT_models/${MODEL_NAME}_qat_model.tar.gz
# uncomment for QAT2.0 MKL-DNN
# export MODEL_NAME=resnet50
# export MODEL_FILE_NAME=QAT2_models/${MODEL_NAME}_quant.tar.gz
wget http://paddle-inference-dist.bj.bcebos.com/int8/${MODEL_FILE_NAME}
```
Unzip the downloaded model to the folder (see the extraction sketch after the list below). To verify all the 7 models, you need to set `MODEL_NAME` to one of the following values in command line:
```text
QAT1.0 models
MODEL_NAME=ResNet50, ResNet101, GoogleNet, MobileNetV1, MobileNetV2, VGG16, VGG19
QAT2.0 models
MODEL_NAME=resnet50, resnet101, mobilenetv1, mobilenetv2, vgg16, vgg19
```
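As mentioned above, the downloaded archive needs to be unpacked before use. A small sketch using Python's standard `tarfile` module; the archive name shown follows the QAT1.0 `MODEL_FILE_NAME` pattern from the download step:

```python
import tarfile

# Extract the downloaded QAT model archive into the model folder.
with tarfile.open("ResNet50_qat_model.tar.gz", "r:gz") as archive:
    archive.extractall(path="/PATH/TO/DOWNLOAD/MODEL/")
```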
* ### Commands to reproduce benchmark
You can run `qat_int8_comparison.py` with the following arguments to reproduce the accuracy result on ResNet50. The only difference between the QAT1.0 MKL-DNN and QAT2.0 MKL-DNN command lines is the `--qat2` argument, which enables QAT2.0 MKL-DNN. To run the QAT2.0 MKL-DNN performance test, set the environment variable `OMP_NUM_THREADS=1` and the parameter `batch_size=1`.
>*QAT1.0*
- Accuracy benchmark command on QAT1.0 models
```bash
cd /PATH/TO/PADDLE
OMP_NUM_THREADS=28 FLAGS_use_mkldnn=true python python/paddle/fluid/contrib/slim/tests/qat_int8_comparison.py --qat_model=/PATH/TO/DOWNLOAD/MODEL/${MODEL_NAME}/model --infer_data=$HOME/.cache/paddle/dataset/int8/download/int8_full_val.bin --batch_size=50 --batch_num=1000 --acc_diff_threshold=0.001
```
>*QAT2.0*
- Accuracy benchmark command on QAT2.0 models
```bash
cd /PATH/TO/PADDLE
OMP_NUM_THREADS=28 FLAGS_use_mkldnn=true python python/paddle/fluid/contrib/slim/tests/qat_int8_comparison.py --qat_model=/PATH/TO/DOWNLOAD/MODEL/${MODEL_NAME} --infer_data=$HOME/.cache/paddle/dataset/int8/download/int8_full_val.bin --batch_size=50 --batch_num=1000 --acc_diff_threshold=0.01 --qat2
```
- Performance benchmark command on QAT2.0 models
```bash
# 1. Save QAT2.0 INT8 model
cd /PATH/TO/PADDLE/build
python ../python/paddle/fluid/contrib/slim/tests/save_qat_model.py --qat_model_path /PATH/TO/DOWNLOAD/MODEL/${QAT2_MODEL_NAME} --int8_model_save_path /PATH/TO/${QAT2_MODEL_NAME}_qat_int8
# 2. Run the QAT2.0 C-API for performance benchmark
cd /PATH/TO/PADDLE/build
OMP_NUM_THREADS=1 paddle/fluid/inference/tests/api/test_analyzer_qat_image_classification ARGS --enable_fp32=false --with_accuracy_layer=false --int8_model=/PATH/TO/${QAT2_MODEL_NAME}_qat_int8 --infer_data=$HOME/.cache/paddle/dataset/int8/download/int8_full_val.bin --batch_size=1 --paddle_num_threads=1
```
> Notes: Due to the large number of images in the `int8_full_val.bin` dataset (50,000), the accuracy benchmark, which compares the unoptimized and optimized QAT models, may take a long time (even several hours). To accelerate the process, it is recommended to set `OMP_NUM_THREADS` to the maximum number of physical cores available on the server.
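To pick the `OMP_NUM_THREADS` value for the accuracy run, you can count the physical cores. A Linux-only sketch, assuming `lscpu` is available:

```python
import subprocess

# Count physical cores as the number of unique (core, socket) pairs in
# the parseable `lscpu -p` output; use the result for OMP_NUM_THREADS.
cores = int(subprocess.check_output(
    "lscpu -p=Core,Socket | grep -v '^#' | sort -u | wc -l", shell=True))
print("Suggested OMP_NUM_THREADS for the accuracy benchmark:", cores)
```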