diff --git a/docs/en/models/PP-LCNet_en.md b/docs/en/models/PP-LCNet_en.md index 0fae0bf8cf5e33d42447270e8c5a9afc6c4432a9..e5057bd5c9d1e306e9b9de6ac6109cf79f7c6151 100644 --- a/docs/en/models/PP-LCNet_en.md +++ b/docs/en/models/PP-LCNet_en.md @@ -15,8 +15,10 @@ - [4.1 Image Classification](#4.1) - [4.2 Object Detection](#4.2) - [4.3 Semantic Segmentation](#4.3) -- [5. Conclusion](#5) -- [6. Reference](#6) +- [5. Inference speed based on V100 GPU](#5) +- [6. Inference speed based on SD855](#6) +- [7. Conclusion](#7) +- [8. Reference](#8) ## 1. Abstract @@ -91,38 +93,38 @@ Since the introduction of GoogLeNet, GAP (Global-Average-Pooling) is often direc For image classification, ImageNet dataset is adopted. Compared with the current mainstream lightweight network, PP-LCNet can obtain faster inference speed with the same accuracy. When using Baidu’s self-developed SSLD distillation strategy, the accuracy is further improved, with the Top-1 Acc of ImageNet exceeding 80% at an inference speed of about 5ms on the Intel CPU side. -| Model | Params(M) | FLOPs(M) | Top-1 Acc(\%) | Top-5 Acc(\%) | Latency(ms) | +| Model | Params(M) | FLOPs(M) | Top-1 Acc(\%) | Top-5 Acc(\%) | Latency(ms) | |-------|-----------|----------|---------------|---------------|-------------| -| PP-LCNet-0.25x | 1.5 | 18 | 51.86 | 75.65 | 1.74 | -| PP-LCNet-0.35x | 1.6 | 29 | 58.09 | 80.83 | 1.92 | -| PP-LCNet-0.5x | 1.9 | 47 | 63.14 | 84.66 | 2.05 | -| PP-LCNet-0.75x | 2.4 | 99 | 68.18 | 88.30 | 2.29 | -| PP-LCNet-1x | 3.0 | 161 | 71.32 | 90.03 | 2.46 | -| PP-LCNet-1.5x | 4.5 | 342 | 73.71 | 91.53 | 3.19 | -| PP-LCNet-2x | 6.5 | 590 | 75.18 | 92.27 | 4.27 | -| PP-LCNet-2.5x | 9.0 | 906 | 76.60 | 93.00 | 5.39 | -| PP-LCNet-0.5x\* | 1.9 | 47 | 66.10 | 86.46 | 2.05 | -| PP-LCNet-1.0x\* | 3.0 | 161 | 74.39 | 92.09 | 2.46 | -| PP-LCNet-2.5x\* | 9.0 | 906 | 80.82 | 95.33 | 5.39 | - -\* denotes the model after using SSLD distillation. 
+| PPLCNet_x0_25 | 1.5 | 18 | 51.86 | 75.65 | 1.74 | +| PPLCNet_x0_35 | 1.6 | 29 | 58.09 | 80.83 | 1.92 | +| PPLCNet_x0_5 | 1.9 | 47 | 63.14 | 84.66 | 2.05 | +| PPLCNet_x0_75 | 2.4 | 99 | 68.18 | 88.30 | 2.29 | +| PPLCNet_x1_0 | 3.0 | 161 | 71.32 | 90.03 | 2.46 | +| PPLCNet_x1_5 | 4.5 | 342 | 73.71 | 91.53 | 3.19 | +| PPLCNet_x2_0 | 6.5 | 590 | 75.18 | 92.27 | 4.27 | +| PPLCNet_x2_5 | 9.0 | 906 | 76.60 | 93.00 | 5.39 | +| PPLCNet_x0_5_ssld | 1.9 | 47 | 66.10 | 86.46 | 2.05 | +| PPLCNet_x1_0_ssld | 3.0 | 161 | 74.39 | 92.09 | 2.46 | +| PPLCNet_x2_5_ssld | 9.0 | 906 | 80.82 | 95.33 | 5.39 | + +where the `_ssld` suffix denotes a model trained with SSLD distillation. For details about SSLD distillation, see [SSLD distillation](../advanced_tutorials/knowledge_distillation_en.md). Performance comparison with other lightweight networks: | Model | Params(M) | FLOPs(M) | Top-1 Acc(\%) | Top-5 Acc(\%) | Latency(ms) | |-------|-----------|----------|---------------|---------------|-------------| -| MobileNetV2-0.25x | 1.5 | 34 | 53.21 | 76.52 | 2.47 | -| MobileNetV3-small-0.35x | 1.7 | 15 | 53.03 | 76.37 | 3.02 | -| ShuffleNetV2-0.33x | 0.6 | 24 | 53.73 | 77.05 | 4.30 | -| PP-LCNet-0.25x | 1.5 | 18 | 51.86 | 75.65 | 1.74 | -| MobileNetV2-0.5x | 2.0 | 99 | 65.03 | 85.72 | 2.85 | -| MobileNetV3-large-0.35x | 2.1 | 41 | 64.32 | 85.46 | 3.68 | -| ShuffleNetV2-0.5x | 1.4 | 43 | 60.32 | 82.26 | 4.65 | -| PP-LCNet-0.5x | 1.9 | 47 | 63.14 | 84.66 | 2.05 | -| MobileNetV1-1x | 4.3 | 578 | 70.99 | 89.68 | 3.38 | -| MobileNetV2-1x | 3.5 | 327 | 72.15 | 90.65 | 4.26 | -| MobileNetV3-small-1.25x | 3.6 | 100 | 70.67 | 89.51 | 3.95 | -| PP-LCNet-1x | 3.0 | 161 | 71.32 | 90.03 | 2.46 | +| MobileNetV2_x0_25 | 1.5 | 34 | 53.21 | 76.52 | 2.47 | +| MobileNetV3_small_x0_35 | 1.7 | 15 | 53.03 | 76.37 | 3.02 | +| ShuffleNetV2_x0_33 | 0.6 | 24 | 53.73 | 77.05 | 4.30 | +| PPLCNet_x0_25 | 1.5 | 18 | 51.86 | 75.65 | 1.74 | +| MobileNetV2_x0_5 | 2.0 | 99 | 65.03 | 85.72 | 2.85 | +| MobileNetV3_large_x0_35 | 2.1 | 41 | 64.32 | 85.46 | 3.68 | +| ShuffleNetV2_x0_5 | 1.4 | 43 | 60.32 | 82.26 | 4.65 | +| PPLCNet_x0_5 | 1.9 | 47 | 63.14 | 84.66 | 2.05 | +| MobileNetV1_x1_0 | 4.3 | 578 | 70.99 | 89.68 | 3.38 | +| MobileNetV2_x1_0 | 3.5 | 327 | 72.15 | 90.65 | 4.26 | +| MobileNetV3_small_x1_25 | 3.6 | 100 | 70.67 | 89.51 | 3.95 | +| PPLCNet_x1_0 | 3.0 | 161 | 71.32 | 90.03 | 2.46 | ### 4.2 Object Detection @@ -131,10 +133,10 @@ For object detection, we adopt Baidu’s self-developed PicoDet, which focuses o | Backbone | mAP(%) | Latency(ms) | |-------|-----------|----------| -MobileNetV3-large-0.35x | 19.2 | 8.1 | -PP-LCNet-0.5x | 20.3 | 6.0 | -MobileNetV3-large-0.75x | 25.8 | 11.1 | -PP-LCNet-1x | 26.9 | 7.9 | +MobileNetV3_large_x0_35 | 19.2 | 8.1 | +PPLCNet_x0_5 | 20.3 | 6.0 | +MobileNetV3_large_x0_75 | 25.8 | 11.1 | +PPLCNet_x1_0 | 26.9 | 7.9 | ### 4.3 Semantic Segmentation @@ -143,18 +145,47 @@ For semantic segmentation, DeeplabV3+ is adopted. The following table presents t | Backbone | mIoU(%) | Latency(ms) | |-------|-----------|----------| -MobileNetV3-large-0.5x | 55.42 | 135 | -PP-LCNet-0.5x | 58.36 | 82 | -MobileNetV3-large-0.75x | 64.53 | 151 | -PP-LCNet-1x | 66.03 | 96 | +MobileNetV3_large_x0_5 | 55.42 | 135 | +PPLCNet_x0_5 | 58.36 | 82 | +MobileNetV3_large_x0_75 | 64.53 | 151 | +PPLCNet_x1_0 | 66.03 | 96 | -## 5. Conclusion +## 5. Inference speed based on V100 GPU + +| Models | Crop Size | Resize Short Size | FP32<br>Batch Size=1<br>(ms) | FP32<br>Batch Size=4<br>(ms) | FP32<br>Batch Size=8<br>(ms) | +| ------------- | --------- | ----------------- | ---------------------------- | -------------------------------- | ------------------------------ | +| PPLCNet_x0_25 | 224 | 256 | 0.72 | 1.17 | 1.71 | +| PPLCNet_x0_35 | 224 | 256 | 0.69 | 1.21 | 1.82 | +| PPLCNet_x0_5 | 224 | 256 | 0.70 | 1.32 | 1.94 | +| PPLCNet_x0_75 | 224 | 256 | 0.71 | 1.49 | 2.19 | +| PPLCNet_x1_0 | 224 | 256 | 0.73 | 1.64 | 2.53 | +| PPLCNet_x1_5 | 224 | 256 | 0.82 | 2.06 | 3.12 | +| PPLCNet_x2_0 | 224 | 256 | 0.94 | 2.58 | 4.08 | + + + +## 6. Inference speed based on SD855 + +| Models | SD855 time(ms)<br>bs=1, thread=1 | SD855 time(ms)<br>bs=1, thread=2 | SD855 time(ms)<br>bs=1, thread=4 | +| ------------- | -------------------------------- | --------------------------------- | --------------------------------- | +| PPLCNet_x0_25 | 2.30 | 1.62 | 1.32 | +| PPLCNet_x0_35 | 3.15 | 2.11 | 1.64 | +| PPLCNet_x0_5 | 4.27 | 2.73 | 1.92 | +| PPLCNet_x0_75 | 7.38 | 4.51 | 2.91 | +| PPLCNet_x1_0 | 10.78 | 6.49 | 3.98 | +| PPLCNet_x1_5 | 20.55 | 12.26 | 7.54 | +| PPLCNet_x2_0 | 33.79 | 20.17 | 12.10 | +| PPLCNet_x2_5 | 49.89 | 29.60 | 17.82 | + + + +## 7. Conclusion Rather than holding on to perfect FLOPs and Params as academics do, PP-LCNet focuses on analyzing how to add Intel CPU-friendly modules to improve the performance of the model, which can better balance accuracy and inference time. The experimental conclusions therein are available to other researchers in network structure design, while providing NAS search researchers with a smaller search space and general conclusions. The finished PP-LCNet can also be better accepted and applied in industry. - ## 6. Reference + ## 8. Reference Reference to cite when you use PP-LCNet in a paper: ``` diff --git a/docs/en/quick_start/quick_start_classification_professional_en.md b/docs/en/quick_start/quick_start_classification_professional_en.md index fe559fe059ed9b0920a7ae6b36536fd8fdc65559..10c2174063c9914ac2f97c03ce5bb789a116a9d9 100644 --- a/docs/en/quick_start/quick_start_classification_professional_en.md +++ b/docs/en/quick_start/quick_start_classification_professional_en.md @@ -75,6 +75,10 @@ python3 -m paddle.distributed.launch \ The highest accuracy of the validation set is around 0.415. +* **Note** + + * If the number of GPU cards is not 4, the validation accuracy may differ from 0.415. To obtain comparable accuracy, set the learning rate in the configuration file to `current learning rate / 4 * number of GPU cards`. The same applies to the experiments below. 
+ diff --git a/docs/zh_CN/models/PP-LCNet.md b/docs/zh_CN/models/PP-LCNet.md index 156f30e96fa4266680ad352425a193fe350446af..7fea973d4634228fefcccff0e7e856546b3b9652 100644 --- a/docs/zh_CN/models/PP-LCNet.md +++ b/docs/zh_CN/models/PP-LCNet.md @@ -95,38 +95,38 @@ BaseNet 经过以上四个方面的改进,得到了 PP-LCNet。下表进一步 图像分类我们选用了 ImageNet 数据集,相比目前主流的轻量级网络,PP-LCNet 在相同精度下可以获得更快的推理速度。当使用百度自研的 SSLD 蒸馏策略后,精度进一步提升,在 Intel cpu 端约 5ms 的推理速度下 ImageNet 的 Top-1 Acc 超过了 80%。 -| Model | Params(M) | FLOPs(M) | Top-1 Acc(\%) | Top-5 Acc(\%) | Latency(ms) | +| Model | Params(M) | FLOPs(M) | Top-1 Acc(\%) | Top-5 Acc(\%) | Latency(ms) | |-------|-----------|----------|---------------|---------------|-------------| -| PP-LCNet-0.25x | 1.5 | 18 | 51.86 | 75.65 | 1.74 | -| PP-LCNet-0.35x | 1.6 | 29 | 58.09 | 80.83 | 1.92 | -| PP-LCNet-0.5x | 1.9 | 47 | 63.14 | 84.66 | 2.05 | -| PP-LCNet-0.75x | 2.4 | 99 | 68.18 | 88.30 | 2.29 | -| PP-LCNet-1x | 3.0 | 161 | 71.32 | 90.03 | 2.46 | -| PP-LCNet-1.5x | 4.5 | 342 | 73.71 | 91.53 | 3.19 | -| PP-LCNet-2x | 6.5 | 590 | 75.18 | 92.27 | 4.27 | -| PP-LCNet-2.5x | 9.0 | 906 | 76.60 | 93.00 | 5.39 | -| PP-LCNet-0.5x\* | 1.9 | 47 | 66.10 | 86.46 | 2.05 | -| PP-LCNet-1.0x\* | 3.0 | 161 | 74.39 | 92.09 | 2.46 | -| PP-LCNet-2.5x\* | 9.0 | 906 | 80.82 | 95.33 | 5.39 | - -其中\*表示使用 SSLD 蒸馏后的模型。 +| PPLCNet_x0_25 | 1.5 | 18 | 51.86 | 75.65 | 1.74 | +| PPLCNet_x0_35 | 1.6 | 29 | 58.09 | 80.83 | 1.92 | +| PPLCNet_x0_5 | 1.9 | 47 | 63.14 | 84.66 | 2.05 | +| PPLCNet_x0_75 | 2.4 | 99 | 68.18 | 88.30 | 2.29 | +| PPLCNet_x1_0 | 3.0 | 161 | 71.32 | 90.03 | 2.46 | +| PPLCNet_x1_5 | 4.5 | 342 | 73.71 | 91.53 | 3.19 | +| PPLCNet_x2_0 | 6.5 | 590 | 75.18 | 92.27 | 4.27 | +| PPLCNet_x2_5 | 9.0 | 906 | 76.60 | 93.00 | 5.39 | +| PPLCNet_x0_5_ssld | 1.9 | 47 | 66.10 | 86.46 | 2.05 | +| PPLCNet_x1_0_ssld | 3.0 | 161 | 74.39 | 92.09 | 2.46 | +| PPLCNet_x2_5_ssld | 9.0 | 906 | 80.82 | 95.33 | 5.39 | + +其中 `_ssld` 表示使用 `SSLD 蒸馏`后的模型。关于 `SSLD蒸馏` 的内容,详情 [SSLD 
蒸馏](../advanced_tutorials/knowledge_distillation.md)。 与其他轻量级网络的性能对比: | Model | Params(M) | FLOPs(M) | Top-1 Acc(\%) | Top-5 Acc(\%) | Latency(ms) | |-------|-----------|----------|---------------|---------------|-------------| -| MobileNetV2-0.25x | 1.5 | 34 | 53.21 | 76.52 | 2.47 | -| MobileNetV3-small-0.35x | 1.7 | 15 | 53.03 | 76.37 | 3.02 | -| ShuffleNetV2-0.33x | 0.6 | 24 | 53.73 | 77.05 | 4.30 | -| PP-LCNet-0.25x | 1.5 | 18 | 51.86 | 75.65 | 1.74 | -| MobileNetV2-0.5x | 2.0 | 99 | 65.03 | 85.72 | 2.85 | -| MobileNetV3-large-0.35x | 2.1 | 41 | 64.32 | 85.46 | 3.68 | -| ShuffleNetV2-0.5x | 1.4 | 43 | 60.32 | 82.26 | 4.65 | -| PP-LCNet-0.5x | 1.9 | 47 | 63.14 | 84.66 | 2.05 | -| MobileNetV1-1x | 4.3 | 578 | 70.99 | 89.68 | 3.38 | -| MobileNetV2-1x | 3.5 | 327 | 72.15 | 90.65 | 4.26 | -| MobileNetV3-small-1.25x | 3.6 | 100 | 70.67 | 89.51 | 3.95 | -| PP-LCNet-1x | 3.0 | 161 | 71.32 | 90.03 | 2.46 | +| MobileNetV2_x0_25 | 1.5 | 34 | 53.21 | 76.52 | 2.47 | +| MobileNetV3_small_x0_35 | 1.7 | 15 | 53.03 | 76.37 | 3.02 | +| ShuffleNetV2_x0_33 | 0.6 | 24 | 53.73 | 77.05 | 4.30 | +| PPLCNet_x0_25 | 1.5 | 18 | 51.86 | 75.65 | 1.74 | +| MobileNetV2_x0_5 | 2.0 | 99 | 65.03 | 85.72 | 2.85 | +| MobileNetV3_large_x0_35 | 2.1 | 41 | 64.32 | 85.46 | 3.68 | +| ShuffleNetV2_x0_5 | 1.4 | 43 | 60.32 | 82.26 | 4.65 | +| PPLCNet_x0_5 | 1.9 | 47 | 63.14 | 84.66 | 2.05 | +| MobileNetV1_x1_0 | 4.3 | 578 | 70.99 | 89.68 | 3.38 | +| MobileNetV2_x1_0 | 3.5 | 327 | 72.15 | 90.65 | 4.26 | +| MobileNetV3_small_x1_25 | 3.6 | 100 | 70.67 | 89.51 | 3.95 | +| PPLCNet_x1_0 | 3.0 | 161 | 71.32 | 90.03 | 2.46 | ### 4.2 目标检测 @@ -135,10 +135,10 @@ BaseNet 经过以上四个方面的改进,得到了 PP-LCNet。下表进一步 | Backbone | mAP(%) | Latency(ms) | |-------|-----------|----------| -MobileNetV3-large-0.35x | 19.2 | 8.1 | -PP-LCNet-0.5x | 20.3 | 6.0 | -MobileNetV3-large-0.75x | 25.8 | 11.1 | -PP-LCNet-1x | 26.9 | 7.9 | +MobileNetV3_large_x0_35 | 19.2 | 8.1 | +PPLCNet_x0_5 | 20.3 | 6.0 | +MobileNetV3_large_x0_75 | 25.8 | 11.1 | 
+PPLCNet_x1_0 | 26.9 | 7.9 | ### 4.3 语义分割 @@ -147,10 +147,10 @@ MobileNetV3-large-0.75x | 25.8 | 11.1 | | Backbone | mIoU(%) | Latency(ms) | |-------|-----------|----------| -|MobileNetV3-large-0.5x | 55.42 | 135 | -|PP-LCNet-0.5x | 58.36 | 82 | -|MobileNetV3-large-0.75x | 64.53 | 151 | -|PP-LCNet-1x | 66.03 | 96 | +MobileNetV3_large_x0_5 | 55.42 | 135 | +PPLCNet_x0_5 | 58.36 | 82 | +MobileNetV3_large_x0_75 | 64.53 | 151 | +PPLCNet_x1_0 | 66.03 | 96 | diff --git a/docs/zh_CN/quick_start/quick_start_classification_professional.md b/docs/zh_CN/quick_start/quick_start_classification_professional.md index 5a1304185ecfcc8b73de087fc9e27e6087c342f3..4564b059f54034db346c9ae03d4bc2cbb8a8a1e4 100644 --- a/docs/zh_CN/quick_start/quick_start_classification_professional.md +++ b/docs/zh_CN/quick_start/quick_start_classification_professional.md @@ -75,6 +75,10 @@ python3 -m paddle.distributed.launch \ 验证集的最高准确率为 0.415 左右。 +* **注意** + + * 如果 GPU 卡数不是 4,验证集的准确率可能与 0.415 有差异,若需保持相当的准确率,需要将配置文件中的学习率改为`当前学习率 / 4 * 当前卡数`。下同。 + @@ -153,7 +157,7 @@ python3 -m paddle.distributed.launch \ * **注意** * 其他数据增广的配置文件可以参考 `ppcls/configs/ImageNet/DataAugment/` 中的配置文件。 -* 训练 CIFAR100 的迭代轮数较少,因此进行训练时,验证集的精度指标可能会有 1% 左右的波动。 + * 训练 CIFAR100 的迭代轮数较少,因此进行训练时,验证集的精度指标可能会有 1% 左右的波动。
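The learning-rate note added to both quick-start pages is the linear scaling rule: a rate tuned on 4 GPU cards is divided by 4 and multiplied by the number of cards actually used. The snippet below is a minimal sketch of that arithmetic only. The function name `scale_learning_rate` and the base rate of 0.1 are hypothetical and are not taken from PaddleClas. It also assumes, as linear scaling normally does, that the per-card batch size in the config is left unchanged, so the global batch grows with the card count.

```python
def scale_learning_rate(base_lr: float, num_cards: int, reference_cards: int = 4) -> float:
    """Rescale a learning rate tuned on `reference_cards` GPUs to `num_cards` GPUs.

    Computes base_lr / reference_cards * num_cards, which with the default of 4
    is the note's `current learning rate / 4 * number of GPU cards` rule.
    Assumes the per-card batch size in the config is kept fixed.
    """
    if num_cards <= 0 or reference_cards <= 0:
        raise ValueError("card counts must be positive integers")
    return base_lr / reference_cards * num_cards


if __name__ == "__main__":
    # Hypothetical base rate for illustration only; use the value from your own config file.
    base_lr = 0.1
    for cards in (1, 2, 4, 8):
        print(f"{cards} card(s): lr = {scale_learning_rate(base_lr, cards):.4f}")
```

With 4 cards the rate is returned unchanged, which matches the setting the 0.415 validation accuracy was reported for; fewer cards shrink the rate proportionally and more cards enlarge it.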