Unverified commit 8ef3c02e, authored by lidanqing, committed by GitHub

Update DNNL QAT document 2.0-alpha (#24494)

Parent db2b6b65
@@ -109,10 +109,9 @@ The code snipped shows how the `Qat2Int8MkldnnPass` can be applied to a model gr
 ## 5. Accuracy and Performance benchmark
-This section contain QAT2 MKL-DNN accuracy and performance benchmark results measured on two servers:
+This section contain QAT2 MKL-DNN accuracy and performance benchmark results measured on the following server:
 * Intel(R) Xeon(R) Gold 6271 (with AVX512 VNNI support),
-* Intel(R) Xeon(R) Gold 6148.
 Performance benchmarks were run with the following environment settings:
@@ -144,17 +143,6 @@ Performance benchmarks were run with the following environment settings:
 | VGG16 | 72.08% | 71.73% | -0.35% | 90.63% | 89.71% | -0.92% |
 | VGG19 | 72.57% | 72.12% | -0.45% | 90.84% | 90.15% | -0.69% |
->**Intel(R) Xeon(R) Gold 6148**
-| Model | FP32 Top1 Accuracy | INT8 QAT Top1 Accuracy | Top1 Diff | FP32 Top5 Accuracy | INT8 QAT Top5 Accuracy | Top5 Diff |
-| :----------: | :----------------: | :--------------------: | :-------: | :----------------: | :--------------------: | :-------: |
-| MobileNet-V1 | 70.78% | 70.85% | 0.07% | 89.69% | 89.41% | -0.28% |
-| MobileNet-V2 | 71.90% | 72.08% | 0.18% | 90.56% | 90.66% | +0.10% |
-| ResNet101 | 77.50% | 77.51% | 0.01% | 93.58% | 93.50% | -0.08% |
-| ResNet50 | 76.63% | 76.55% | -0.08% | 93.10% | 92.96% | -0.14% |
-| VGG16 | 72.08% | 71.72% | -0.36% | 90.63% | 89.75% | -0.88% |
-| VGG19 | 72.57% | 72.08% | -0.49% | 90.84% | 90.11% | -0.73% |
 #### Performance
 Image classification models performance was measured using a single thread. The setting is included in the benchmark reproduction commands below.
@@ -164,23 +152,12 @@ Image classification models performance was measured using a single thread. The
 | Model | FP32 (images/s) | INT8 QAT (images/s) | Ratio (INT8/FP32) |
 | :----------: | :-------------: | :-----------------: | :---------------: |
-| MobileNet-V1 | 77.00 | 210.76 | 2.74 |
-| MobileNet-V2 | 88.43 | 182.47 | 2.06 |
-| ResNet101 | 7.20 | 25.88 | 3.60 |
-| ResNet50 | 13.26 | 47.44 | 3.58 |
-| VGG16 | 3.48 | 10.11 | 2.90 |
-| VGG19 | 2.83 | 8.77 | 3.10 |
+| MobileNet-V1 | 74.05 | 196.98 | 2.66 |
+| MobileNet-V2 | 88.60 | 187.67 | 2.12 |
+| ResNet101 | 7.20 | 26.43 | 3.67 |
+| ResNet50 | 13.23 | 47.44 | 3.59 |
+| VGG16 | 3.47 | 10.20 | 2.94 |
+| VGG19 | 2.83 | 8.67 | 3.06 |
->**Intel(R) Xeon(R) Gold 6148**
-| Model | FP32 (images/s) | INT8 QAT (images/s) | Ratio (INT8/FP32) |
-| :----------: | :-------------: | :-----------------: | :---------------: |
-| MobileNet-V1 | 75.23 | 103.63 | 1.38 |
-| MobileNet-V2 | 86.65 | 128.14 | 1.48 |
-| ResNet101 | 6.61 | 10.79 | 1.63 |
-| ResNet50 | 12.42 | 19.65 | 1.58 |
-| VGG16 | 3.31 | 4.74 | 1.43 |
-| VGG19 | 2.68 | 3.91 | 1.46 |
 Notes:
@@ -194,13 +171,8 @@ Notes:
 | Model | FP32 Accuracy | QAT INT8 Accuracy | Accuracy Diff |
 |:------------:|:----------------------:|:----------------------:|:---------:|
-| Ernie | 80.20% | 79.88% | -0.32% |
+| Ernie | 80.20% | 79.44% | -0.76% |
->**Intel(R) Xeon(R) Gold 6148**
-| Model | FP32 Accuracy | QAT INT8 Accuracy | Accuracy Diff |
-| :---: | :-----------: | :---------------: | :-----------: |
-| Ernie | 80.20% | 79.64% | -0.56% |
 #### Performance
@@ -209,17 +181,10 @@ Notes:
 | Model | Threads | FP32 Latency (ms) | QAT INT8 Latency (ms) | Ratio (FP32/INT8) |
 |:------------:|:----------------------:|:-------------------:|:---------:|:---------:|
-| Ernie | 1 thread | 236.72 | 83.70 | 2.82x |
-| Ernie | 20 threads | 27.40 | 15.01 | 1.83x |
+| Ernie | 1 thread | 237.21 | 79.26 | 2.99x |
+| Ernie | 20 threads | 22.08 | 12.57 | 1.76x |
->**Intel(R) Xeon(R) Gold 6148**
-| Model | Threads | FP32 Latency (ms) | QAT INT8 Latency (ms) | Ratio (FP32/INT8) |
-| :---: | :--------: | :---------------: | :-------------------: | :---------------: |
-| Ernie | 1 thread | 248.42 | 169.30 | 1.46 |
-| Ernie | 20 threads | 28.92 | 20.83 | 1.39 |
 ## 6. How to reproduce the results
 The steps below show, taking ResNet50 as an example, how to reproduce the above accuracy and performance results for Image Classification models.
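Review note: the ratio columns in the updated Gold 6271 tables can be cross-checked against the raw measurements. The sketch below is not part of the commit. It assumes the ratios are plain quotients rounded to two decimals (INT8/FP32 for throughput, FP32/INT8 for latency), and the helper names `throughput_ratio` and `latency_ratio` are hypothetical.

```python
# Cross-check of the ratio columns in the updated benchmark tables.
# Assumption: ratios are simple quotients rounded to two decimal places.

def throughput_ratio(fp32_images_per_s, int8_images_per_s):
    """Speedup for throughput numbers: higher images/s is better, so INT8 / FP32."""
    return round(int8_images_per_s / fp32_images_per_s, 2)

def latency_ratio(fp32_ms, int8_ms):
    """Speedup for latency numbers: lower ms is better, so FP32 / INT8."""
    return round(fp32_ms / int8_ms, 2)

# (FP32 images/s, INT8 QAT images/s) copied from the new side of the diff.
image_classification = {
    "MobileNet-V1": (74.05, 196.98),
    "MobileNet-V2": (88.60, 187.67),
    "ResNet101": (7.20, 26.43),
    "ResNet50": (13.23, 47.44),
    "VGG16": (3.47, 10.20),
    "VGG19": (2.83, 8.67),
}

# (FP32 latency ms, QAT INT8 latency ms) for Ernie, new side of the diff.
ernie_latency = {
    "1 thread": (237.21, 79.26),
    "20 threads": (22.08, 12.57),
}

if __name__ == "__main__":
    for model, (fp32, int8) in image_classification.items():
        print(f"{model}: {throughput_ratio(fp32, int8):.2f}")
    for threads, (fp32, int8) in ernie_latency.items():
        print(f"Ernie, {threads}: {latency_ratio(fp32, int8):.2f}x")
```

Under these assumptions the recomputed values agree with the ratios added by this commit.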