Update DNNL QAT document 2.0-alpha (#24494)

8ef3c02e · lidanqing · GitHub · db2b6b65 · 8ef3c02e

Showing 1 changed file with 10 additions and 45 deletions

python/paddle/fluid/contrib/slim/tests/QAT_mkldnn_int8_readme.md +10 -45

--- a/python/paddle/fluid/contrib/slim/tests/QAT_mkldnn_int8_readme.md
+++ b/python/paddle/fluid/contrib/slim/tests/QAT_mkldnn_int8_readme.md
@@ -109,10 +109,9 @@ The code snipped shows how the `Qat2Int8MkldnnPass` can be applied to a model gr

 ## 5. Accuracy and Performance benchmark

-This section contain QAT2 MKL-DNN accuracy and performance benchmark results measured on two servers:
+This section contain QAT2 MKL-DNN accuracy and performance benchmark results measured on the following server:

 * Intel(R) Xeon(R) Gold 6271 (with AVX512 VNNI support),
-* Intel(R) Xeon(R) Gold 6148.

 Performance benchmarks were run with the following environment settings:

@@ -144,17 +143,6 @@ Performance benchmarks were run with the following environment settings:
 |    VGG16     |       72.08%       |         71.73%         |  -0.35%   |       90.63%       |         89.71%         |  -0.92%   |
 |    VGG19     |       72.57%       |         72.12%         |  -0.45%   |       90.84%       |         90.15%         |  -0.69%   |

->**Intel(R) Xeon(R) Gold 6148**
-
-|    Model     | FP32 Top1 Accuracy | INT8 QAT Top1 Accuracy | Top1 Diff | FP32 Top5 Accuracy | INT8 QAT Top5 Accuracy | Top5 Diff |
-| :----------: | :----------------: | :--------------------: | :-------: | :----------------: | :--------------------: | :-------: |
-| MobileNet-V1 |       70.78%       |         70.85%         |   0.07%   |       89.69%       |         89.41%         |  -0.28%   |
-| MobileNet-V2 |       71.90%       |         72.08%         |   0.18%   |       90.56%       |         90.66%         |  +0.10%   |
-|  ResNet101   |       77.50%       |         77.51%         |   0.01%   |       93.58%       |         93.50%         |  -0.08%   |
-|   ResNet50   |       76.63%       |         76.55%         |  -0.08%   |       93.10%       |         92.96%         |  -0.14%   |
-|    VGG16     |       72.08%       |         71.72%         |  -0.36%   |       90.63%       |         89.75%         |  -0.88%   |
-|    VGG19     |       72.57%       |         72.08%         |  -0.49%   |       90.84%       |         90.11%         |  -0.73%   |
-
 #### Performance

 Image classification models performance was measured using a single thread. The setting is included in the benchmark reproduction commands below.
@@ -164,23 +152,12 @@ Image classification models performance was measured using a single thread. The

 |    Model     | FP32 (images/s) | INT8 QAT (images/s) | Ratio (INT8/FP32)  |
 | :----------: | :-------------: | :-----------------: | :---------------:  |
-| MobileNet-V1 |      77.00      |       210.76        |      2.74          |
-| MobileNet-V2 |      88.43      |       182.47        |      2.06          |
-|  ResNet101   |      7.20       |        25.88        |      3.60          |
-|   ResNet50   |      13.26      |        47.44        |      3.58          |
-|    VGG16     |      3.48       |        10.11        |      2.90          |
-|    VGG19     |      2.83       |        8.77         |      3.10          |
-
->**Intel(R) Xeon(R) Gold 6148**
-
-|    Model     | FP32 (images/s) | INT8 QAT (images/s) | Ratio (INT8/FP32) |
-| :----------: | :-------------: | :-----------------: | :---------------: |
-| MobileNet-V1 |      75.23      |       103.63        |      1.38         |
-| MobileNet-V2 |      86.65      |       128.14        |      1.48         |
-|  ResNet101   |      6.61       |       10.79         |      1.63         |
-|   ResNet50   |      12.42      |       19.65         |      1.58         |
-|    VGG16     |      3.31       |        4.74         |      1.43         |
-|    VGG19     |      2.68       |        3.91         |      1.46         |
+| MobileNet-V1 |      74.05      |       196.98        |      2.66          |
+| MobileNet-V2 |      88.60      |       187.67        |      2.12          |
+|  ResNet101   |      7.20       |       26.43         |      3.67          |
+|   ResNet50   |      13.23      |       47.44         |      3.59          |
+|    VGG16     |      3.47       |       10.20         |      2.94          |
+|    VGG19     |      2.83       |       8.67          |      3.06          |

 Notes:

@@ -194,13 +171,8 @@ Notes:

 |     Model    |  FP32 Accuracy | QAT INT8 Accuracy | Accuracy Diff |
 |:------------:|:----------------------:|:----------------------:|:---------:|
-|   Ernie      |      80.20%            |        79.88%        |  -0.32%  |
-
->**Intel(R) Xeon(R) Gold 6148**
+|   Ernie      |      80.20%            |        79.44%        |  -0.76%  |

-| Model | FP32 Accuracy | QAT INT8 Accuracy | Accuracy Diff |
-| :---: | :-----------: | :---------------: | :-----------: |
-| Ernie |    80.20%     |      79.64%       |    -0.56%     |

 #### Performance

@@ -209,17 +181,10 @@ Notes:

 |  Model  |     Threads  | FP32 Latency (ms) | QAT INT8 Latency (ms)    | Ratio (FP32/INT8) |
 |:------------:|:----------------------:|:-------------------:|:---------:|:---------:|
-| Ernie | 1 thread     |       236.72        |     83.70    |   2.82x   |
-| Ernie | 20 threads   |       27.40         |     15.01    |   1.83x   |
+| Ernie | 1 thread     |       237.21        |     79.26    |   2.99x    |
+| Ernie | 20 threads   |       22.08         |     12.57    |   1.76x    |


->**Intel(R) Xeon(R) Gold 6148**
-
-| Model |  Threads   | FP32 Latency (ms) | QAT INT8 Latency (ms) | Ratio (FP32/INT8) |
-| :---: | :--------: | :---------------: | :-------------------: | :---------------: |
-| Ernie |  1 thread  |    248.42         |       169.30           |       1.46       |
-| Ernie | 20 threads |    28.92          |       20.83            |       1.39       |
-
 ## 6. How to reproduce the results

 The steps below show, taking ResNet50 as an example, how to reproduce the above accuracy and performance results for Image Classification models.