From 8ef3c02e9032aa7330ec235c91bd8c88dbd3eb6e Mon Sep 17 00:00:00 2001
From: lidanqing
Date: Thu, 14 May 2020 14:02:40 +0200
Subject: [PATCH] Update DNNL QAT document 2.0-alpha (#24494)

Update DNNL QAT document 2.0-alpha
---
 .../slim/tests/QAT_mkldnn_int8_readme.md | 55 ++++---------------
 1 file changed, 10 insertions(+), 45 deletions(-)

diff --git a/python/paddle/fluid/contrib/slim/tests/QAT_mkldnn_int8_readme.md b/python/paddle/fluid/contrib/slim/tests/QAT_mkldnn_int8_readme.md
index 634c685fd32..e84c5b3b505 100644
--- a/python/paddle/fluid/contrib/slim/tests/QAT_mkldnn_int8_readme.md
+++ b/python/paddle/fluid/contrib/slim/tests/QAT_mkldnn_int8_readme.md
@@ -109,10 +109,9 @@ The code snippet shows how the `Qat2Int8MkldnnPass` can be applied to a model gr
 ## 5. Accuracy and Performance benchmark
 
-This section contain QAT2 MKL-DNN accuracy and performance benchmark results measured on two servers:
+This section contains QAT2 MKL-DNN accuracy and performance benchmark results measured on the following server:
 
 * Intel(R) Xeon(R) Gold 6271 (with AVX512 VNNI support),
-* Intel(R) Xeon(R) Gold 6148.
 
 Performance benchmarks were run with the following environment settings:
@@ -144,17 +143,6 @@ Performance benchmarks were run with the following environment settings:
 | VGG16 | 72.08% | 71.73% | -0.35% | 90.63% | 89.71% | -0.92% |
 | VGG19 | 72.57% | 72.12% | -0.45% | 90.84% | 90.15% | -0.69% |
 
->**Intel(R) Xeon(R) Gold 6148**
-
-| Model | FP32 Top1 Accuracy | INT8 QAT Top1 Accuracy | Top1 Diff | FP32 Top5 Accuracy | INT8 QAT Top5 Accuracy | Top5 Diff |
-| :----------: | :----------------: | :--------------------: | :-------: | :----------------: | :--------------------: | :-------: |
-| MobileNet-V1 | 70.78% | 70.85% | 0.07% | 89.69% | 89.41% | -0.28% |
-| MobileNet-V2 | 71.90% | 72.08% | 0.18% | 90.56% | 90.66% | +0.10% |
-| ResNet101 | 77.50% | 77.51% | 0.01% | 93.58% | 93.50% | -0.08% |
-| ResNet50 | 76.63% | 76.55% | -0.08% | 93.10% | 92.96% | -0.14% |
-| VGG16 | 72.08% | 71.72% | -0.36% | 90.63% | 89.75% | -0.88% |
-| VGG19 | 72.57% | 72.08% | -0.49% | 90.84% | 90.11% | -0.73% |
-
 #### Performance
 
 Image classification models performance was measured using a single thread. The setting is included in the benchmark reproduction commands below.
@@ -164,23 +152,12 @@ Image classification models performance was measured using a single thread. The
 | Model | FP32 (images/s) | INT8 QAT (images/s) | Ratio (INT8/FP32) |
 | :----------: | :-------------: | :-----------------: | :---------------: |
-| MobileNet-V1 | 77.00 | 210.76 | 2.74 |
-| MobileNet-V2 | 88.43 | 182.47 | 2.06 |
-| ResNet101 | 7.20 | 25.88 | 3.60 |
-| ResNet50 | 13.26 | 47.44 | 3.58 |
-| VGG16 | 3.48 | 10.11 | 2.90 |
-| VGG19 | 2.83 | 8.77 | 3.10 |
-
->**Intel(R) Xeon(R) Gold 6148**
-
-| Model | FP32 (images/s) | INT8 QAT (images/s) | Ratio (INT8/FP32) |
-| :----------: | :-------------: | :-----------------: | :---------------: |
-| MobileNet-V1 | 75.23 | 103.63 | 1.38 |
-| MobileNet-V2 | 86.65 | 128.14 | 1.48 |
-| ResNet101 | 6.61 | 10.79 | 1.63 |
-| ResNet50 | 12.42 | 19.65 | 1.58 |
-| VGG16 | 3.31 | 4.74 | 1.43 |
-| VGG19 | 2.68 | 3.91 | 1.46 |
+| MobileNet-V1 | 74.05 | 196.98 | 2.66 |
+| MobileNet-V2 | 88.60 | 187.67 | 2.12 |
+| ResNet101 | 7.20 | 26.43 | 3.67 |
+| ResNet50 | 13.23 | 47.44 | 3.59 |
+| VGG16 | 3.47 | 10.20 | 2.94 |
+| VGG19 | 2.83 | 8.67 | 3.06 |
 
 Notes:
@@ -194,13 +171,8 @@ Notes:
 
 | Model | FP32 Accuracy | QAT INT8 Accuracy | Accuracy Diff |
 |:------------:|:----------------------:|:----------------------:|:---------:|
-| Ernie | 80.20% | 79.88% | -0.32% |
+| Ernie | 80.20% | 79.44% | -0.76% |
 
->**Intel(R) Xeon(R) Gold 6148**
-
-| Model | FP32 Accuracy | QAT INT8 Accuracy | Accuracy Diff |
-| :---: | :-----------: | :---------------: | :-----------: |
-| Ernie | 80.20% | 79.64% | -0.56% |
-
 
 #### Performance
@@ -209,16 +181,9 @@ Notes:
 
 | Model | Threads | FP32 Latency (ms) | QAT INT8 Latency (ms) | Ratio (FP32/INT8) |
 |:------------:|:----------------------:|:-------------------:|:---------:|:---------:|
-| Ernie | 1 thread | 236.72 | 83.70 | 2.82x |
-| Ernie | 20 threads | 27.40 | 15.01 | 1.83x |
-
-
->**Intel(R) Xeon(R) Gold 6148**
+| Ernie | 1 thread | 237.21 | 79.26 | 2.99x |
+| Ernie | 20 threads | 22.08 | 12.57 | 1.76x |
 
-| Model | Threads | FP32 Latency (ms) | QAT INT8 Latency (ms) | Ratio (FP32/INT8) |
-| :---: | :--------: | :---------------: | :-------------------: | :---------------: |
-| Ernie | 1 thread | 248.42 | 169.30 | 1.46 |
-| Ernie | 20 threads | 28.92 | 20.83 | 1.39 |
 
 ## 6. How to reproduce the results
-- 
GitLab
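The Ratio columns updated by this patch are derived values rather than independent measurements: INT8/FP32 throughput for the image classification table and FP32/INT8 latency for the Ernie table. As an editorial sanity check (not part of the patch itself), the new numbers can be verified to be internally consistent:

```python
# Sanity check of the updated benchmark tables: each ratio should equal the
# quotient of the two measured columns, rounded to two decimal places.

# model: (FP32 images/s, INT8 QAT images/s, table ratio INT8/FP32)
throughput = {
    "MobileNet-V1": (74.05, 196.98, 2.66),
    "MobileNet-V2": (88.60, 187.67, 2.12),
    "ResNet101": (7.20, 26.43, 3.67),
    "ResNet50": (13.23, 47.44, 3.59),
    "VGG16": (3.47, 10.20, 2.94),
    "VGG19": (2.83, 8.67, 3.06),
}

# configuration: (FP32 ms, INT8 QAT ms, table ratio FP32/INT8)
latency = {
    "Ernie, 1 thread": (237.21, 79.26, 2.99),
    "Ernie, 20 threads": (22.08, 12.57, 1.76),
}

for name, (fp32, int8, ratio) in throughput.items():
    assert round(int8 / fp32, 2) == ratio, name  # speedup = INT8 / FP32

for name, (fp32, int8, ratio) in latency.items():
    assert round(fp32 / int8, 2) == ratio, name  # speedup = FP32 / INT8

print("all ratios consistent with the measured columns")
```

All eight updated ratios check out, so the tables in the patch are self-consistent.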