Unverified commit db5c20f6, authored by gaotingquan

docs: fix

Parent d1f3622c
......@@ -38,18 +38,18 @@
## 1. Introduction
This case provides a way for users to quickly build a lightweight, high-precision and practical classification model of car exists using PaddleClas PULC (Practical Ultra Lightweight Classification). The model can be widely used in monitoring scenarios, massive data filtering scenarios, etc.
This case provides a way for users to quickly build a lightweight, high-precision and practical classification model of car exists using PaddleClas PULC (Practical Ultra Lightweight image Classification). The model can be widely used in monitoring scenarios, massive data filtering scenarios, etc.
The following table lists the relevant indicators of the model. The first two rows use SwinTransformer_tiny and MobileNetV3_small_x0_35 as the backbone. The third to sixth rows replace the backbone with PPLCNet_x1_0 and then add, in turn, the SSLD pretrained model, the EDA strategy, and the SKL-UGI knowledge distillation strategy.
| Backbone | Tpr(%) | Latency(ms) | Size(M)| Training Strategy |
|-------|----------------|----------|---------------|---------------|
| SwinTransformer_tiny | 97.71 | 95.30 | 107 | using ImageNet pretrained model |
| MobileNetV3_small_x0_35 | 81.23 | 2.85 | 1.6 | using ImageNet pretrained model |
| PPLCNet_x1_0 | 94.72 | 2.12 | 6.5 | using ImageNet pretrained model |
| PPLCNet_x1_0 | 95.48 | 2.12 | 6.5 | using SSLD pretrained model |
| PPLCNet_x1_0 | 95.48 | 2.12 | 6.5 | using SSLD pretrained model + EDA strategy |
| <b>PPLCNet_x1_0</b> | <b>95.92</b> | <b>2.12</b> | <b>6.5</b> | using SSLD pretrained model + EDA strategy + SKL-UGI knowledge distillation strategy|
| SwinTransformer_tiny | 97.71 | 95.30 | 111 | using ImageNet pretrained model |
| MobileNetV3_small_x0_35 | 81.23 | 2.85 | 2.7 | using ImageNet pretrained model |
| PPLCNet_x1_0 | 94.72 | 2.12 | 7.1 | using ImageNet pretrained model |
| PPLCNet_x1_0 | 95.48 | 2.12 | 7.1 | using SSLD pretrained model |
| PPLCNet_x1_0 | 95.48 | 2.12 | 7.1 | using SSLD pretrained model + EDA strategy |
| <b>PPLCNet_x1_0</b> | <b>95.92</b> | <b>2.12</b> | <b>7.1</b> | using SSLD pretrained model + EDA strategy + SKL-UGI knowledge distillation strategy|
It can be seen that a high Tpr is obtained when the backbone is SwinTransformer_tiny, but the inference speed is slow. Replacing the backbone with the lightweight model MobileNetV3_small_x0_35 greatly improves the speed, but the Tpr drops sharply. Replacing the backbone with the faster PPLCNet_x1_0 raises the Tpr by more than 13 percentage points over MobileNetV3_small_x0_35, and the speed is also more than 20% faster. Additionally using the SSLD pretrained model improves the Tpr by about 0.7 percentage points without affecting the inference speed. Finally, additionally using SKL-UGI knowledge distillation improves the Tpr by a further 0.44 percentage points. At this point, the Tpr is close to that of SwinTransformer_tiny, but the speed is more than 40 times faster. The training method and deployment instructions of PULC will be introduced in detail below.
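
The comparisons in the paragraph above can be re-derived from the table. The following is a minimal Python sketch that copies the Tpr and latency columns of the car_exists table and computes the percentage-point gains and the latency ratios. The helper names `point_gain` and `speedup` are made up for illustration and are not part of PaddleClas.

```python
# Rows copied from the car_exists table: (description, Tpr in %, latency in ms).
rows = [
    ("SwinTransformer_tiny, ImageNet pretrained", 97.71, 95.30),
    ("MobileNetV3_small_x0_35, ImageNet pretrained", 81.23, 2.85),
    ("PPLCNet_x1_0, ImageNet pretrained", 94.72, 2.12),
    ("PPLCNet_x1_0, SSLD pretrained", 95.48, 2.12),
    ("PPLCNet_x1_0, SSLD + EDA", 95.48, 2.12),
    ("PPLCNet_x1_0, SSLD + EDA + SKL-UGI", 95.92, 2.12),
]


def point_gain(new_tpr, old_tpr):
    """Difference between two percentages, in percentage points."""
    return round(new_tpr - old_tpr, 2)


def speedup(old_latency_ms, new_latency_ms):
    """How many times faster the new model is (ratio of latencies)."""
    return old_latency_ms / new_latency_ms


swin, mobilenet, pplcnet, ssld, eda, distilled = rows

gain_over_mobilenet = point_gain(pplcnet[1], mobilenet[1])
gain_from_ssld = point_gain(ssld[1], pplcnet[1])
gain_from_distillation = point_gain(distilled[1], eda[1])
speedup_over_mobilenet = speedup(mobilenet[2], pplcnet[2])
speedup_over_swin = speedup(swin[2], distilled[2])

print(f"PPLCNet vs MobileNetV3: +{gain_over_mobilenet} points")
print(f"SSLD: +{gain_from_ssld} points, distillation: +{gain_from_distillation} points")
print(f"Speedup over MobileNetV3: {speedup_over_mobilenet:.2f}x")
print(f"Speedup over SwinTransformer_tiny: {speedup_over_swin:.1f}x")
```

The same two helpers apply to the other PULC tables by swapping in their rows.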
......
......@@ -38,18 +38,18 @@
## 1. Introduction
This case provides a way for users to quickly build a lightweight, high-precision and practical classification model of language in the image using PaddleClas PULC (Practical Ultra Lightweight Classification). The model can be widely used in various scenarios involving multilingual OCR processing, such as finance and government affairs.
This case provides a way for users to quickly build a lightweight, high-precision and practical classification model of language in the image using PaddleClas PULC (Practical Ultra Lightweight image Classification). The model can be widely used in various scenarios involving multilingual OCR processing, such as finance and government affairs.
The following table lists the relevant indicators of the model. The first two rows use SwinTransformer_tiny and MobileNetV3_small_x0_35 as the backbone. The third to sixth rows replace the backbone with PPLCNet_x1_0 and then add, in turn, the SSLD pretrained model, the EDA strategy, and the SKL-UGI knowledge distillation strategy. When the backbone is replaced with PPLCNet_x1_0, the input shape of the model is changed to [192, 48], and the stride of the network is changed to [2, [2, 1], [2, 1], [2, 1]].
| Backbone | Top1-Acc(%) | Latency(ms) | Size(M)| Training Strategy |
| ----------------------- | --------- | -------- | ------- | ---------------------------------------------- |
| SwinTransformer_tiny | 98.12 | 89.09 | 107 | using ImageNet pretrained model |
| MobileNetV3_small_x0_35 | 95.92 | 2.98 | 17 | using ImageNet pretrained model |
| PPLCNet_x1_0 | 98.35 | 2.58 | 6.5 | using ImageNet pretrained model |
| PPLCNet_x1_0 | 98.7 | 2.58 | 6.5 | using SSLD pretrained model |
| PPLCNet_x1_0 | 99.12 | 2.58 | 6.5 | using SSLD pretrained model + EDA strategy |
| **PPLCNet_x1_0** | **99.26** | **2.58** | **6.5** | using SSLD pretrained model + EDA strategy + SKL-UGI knowledge distillation strategy|
| SwinTransformer_tiny | 98.12 | 89.09 | 111 | using ImageNet pretrained model |
| MobileNetV3_small_x0_35 | 95.92 | 2.98 | 3.7 | using ImageNet pretrained model |
| PPLCNet_x1_0 | 98.35 | 2.58 | 7.1 | using ImageNet pretrained model |
| PPLCNet_x1_0 | 98.7 | 2.58 | 7.1 | using SSLD pretrained model |
| PPLCNet_x1_0 | 99.12 | 2.58 | 7.1 | using SSLD pretrained model + EDA strategy |
| **PPLCNet_x1_0** | **99.26** | **2.58** | **7.1** | using SSLD pretrained model + EDA strategy + SKL-UGI knowledge distillation strategy|
It can be seen that a high accuracy is obtained when the backbone is SwinTransformer_tiny, but the inference speed is slow. Replacing the backbone with the lightweight model MobileNetV3_small_x0_35 greatly improves the speed, but the accuracy drops. Replacing the backbone with the faster PPLCNet_x1_0 and changing the input shape and stride of the network raises the accuracy by 2.43 percentage points over MobileNetV3_small_x0_35, and the latency is also about 13% lower (2.58 ms versus 2.98 ms). Additionally using the SSLD pretrained model improves the accuracy by about 0.35 percentage points without affecting the inference speed. Further, additionally using the EDA strategy increases the accuracy by 0.42 percentage points. Finally, additionally using SKL-UGI knowledge distillation improves the accuracy by a further 0.14 percentage points. At this point, the accuracy is higher than that of SwinTransformer_tiny, and the speed is much faster. The training method and deployment instructions of PULC will be introduced in detail below.
......
......@@ -7,15 +7,15 @@ The PULC model zoo is provided here, mainly providing indicators, model storage
|Model name| Model Description | Metrics |Storage Size| Latency| Download Address|
| --- | --- | --- | --- | --- | --- |
| person_exists |[Human Exists Classification](PULC_person_exists_en.md)| 95.60 |6.5M|2.58ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/person_exists_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/person_exists_pretrained.pdparams)|
| person_attribute |[Pedestrian Attribute Classification](PULC_person_attribute_en.md)| 78.59 |6.6M|2.01ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/person_attribute_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/person_attribute_pretrained.pdparams)|
| safety_helmet |[Classification of Whether Wearing Safety Helmet](PULC_safety_helmet_en.md)| 99.38 |6.5M|2.03ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/safety_helmet_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/safety_helmet_pretrained.pdparams)|
| person_exists |[Human Exists Classification](PULC_person_exists_en.md)| 96.23 |7.0M|2.58ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/person_exists_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/person_exists_pretrained.pdparams)|
| person_attribute |[Pedestrian Attribute Classification](PULC_person_attribute_en.md)| 78.59 |7.2M|2.01ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/person_attribute_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/person_attribute_pretrained.pdparams)|
| safety_helmet |[Classification of Whether Wearing Safety Helmet](PULC_safety_helmet_en.md)| 99.38 |7.1M|2.03ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/safety_helmet_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/safety_helmet_pretrained.pdparams)|
| traffic_sign |[Traffic Sign Classification](PULC_traffic_sign_en.md)| 98.35 |8.2M|2.10ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/traffic_sign_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/traffic_sign_pretrained.pdparams)|
| vehicle_attribute |[Vehicle Attribute Classification](PULC_vehicle_attribute_en.md)| 90.81 |7.2M|2.36ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/vehicle_attribute_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/vehicle_attribute_pretrained.pdparams)|
| car_exists |[Car Exists Classification](PULC_car_exists_en.md) | 95.92 | 6.6M | 2.38ms |[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/car_exists_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/car_exists_pretrained.pdparams)|
| text_image_orientation |[Text Image Orientation Classification](PULC_text_image_orientation_en.md)| 99.06 | 6.5M | 2.16ms |[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/text_image_orientation_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/text_image_orientation_pretrained.pdparams)|
| textline_orientation |[Text-line Orientation Classification](PULC_textline_orientation_en.md)| 96.01 |6.5M|2.72ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/textline_orientation_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/textline_orientation_pretrained.pdparams)|
| language_classification |[Language Classification](PULC_language_classification_en.md)| 99.26 |6.5M|2.58ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/language_classification_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/language_classification_pretrained.pdparams)|
| car_exists |[Car Exists Classification](PULC_car_exists_en.md) | 95.92 | 7.1M | 2.38ms |[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/car_exists_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/car_exists_pretrained.pdparams)|
| text_image_orientation |[Text Image Orientation Classification](PULC_text_image_orientation_en.md)| 99.06 | 7.1M | 2.16ms |[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/text_image_orientation_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/text_image_orientation_pretrained.pdparams)|
| textline_orientation |[Text-line Orientation Classification](PULC_textline_orientation_en.md)| 96.01 |7.0M|2.72ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/textline_orientation_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/textline_orientation_pretrained.pdparams)|
| language_classification |[Language Classification](PULC_language_classification_en.md)| 99.26 |7.1M|2.58ms|[inference model](https://paddleclas.bj.bcebos.com/models/PULC/inference/language_classification_infer.tar) / [pretrained model](https://paddleclas.bj.bcebos.com/models/PULC/pretrained/language_classification_pretrained.pdparams)|
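
The download links in the table follow a regular pattern, so they can be built from the model name. The sketch below assumes that pattern holds for every row shown above; `pulc_urls` is a hypothetical helper, not a PaddleClas function, and it only builds strings without downloading anything.

```python
# URL layout inferred from the download links in the model zoo table.
BASE_URL = "https://paddleclas.bj.bcebos.com/models/PULC"


def pulc_urls(model_name):
    """Return the inference and pretrained download URLs for a PULC model name."""
    return {
        "inference": f"{BASE_URL}/inference/{model_name}_infer.tar",
        "pretrained": f"{BASE_URL}/pretrained/{model_name}_pretrained.pdparams",
    }


for name in ("person_exists", "car_exists", "language_classification"):
    urls = pulc_urls(name)
    print(name)
    print("  inference: ", urls["inference"])
    print("  pretrained:", urls["pretrained"])
```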
**Note:**
......
......@@ -38,21 +38,21 @@
## 1. Introduction
This case provides a way for users to quickly build a lightweight, high-precision and practical classification model of person attribute using PaddleClas PULC (Practical Ultra Lightweight image Classification). The model can be widely used in
This case provides a way for users to quickly build a lightweight, high-precision and practical classification model of person attribute using PaddleClas PULC (Practical Ultra Lightweight image Classification). The model can be widely used in
pedestrian analysis scenarios, pedestrian tracking scenarios, etc.
The following table lists the relevant indicators of the model. The first three rows use Res2Net200_vd_26w_4s, SwinTransformer_tiny and MobileNetV3_small_x0_35 as the backbone. The fourth to seventh rows replace the backbone with PPLCNet_x1_0 and then add, in turn, the SSLD pretrained model, the EDA strategy, and the SKL-UGI knowledge distillation strategy.
| Backbone | ma(%) | Latency(ms) | Size(M) | Training Strategy |
|-------|-----------|----------|---------------|---------------|
| Res2Net200_vd_26w_4s | 81.25 | 77.51 | 293 | using ImageNet pretrained |
| SwinTransformer_tiny | 80.17 | 89.51 | 107 | using ImageNet pretrained |
| SwinTransformer_tiny | 80.17 | 89.51 | 111 | using ImageNet pretrained |
| MobileNetV3_small_x0_35 | 70.79 | 2.90 | 1.7 | using ImageNet pretrained |
| PPLCNet_x1_0 | 76.31 | 2.01 | 6.6 | using ImageNet pretrained |
| PPLCNet_x1_0 | 77.31 | 2.01 | 6.6 | using SSLD pretrained |
| PPLCNet_x1_0 | 77.71 | 2.01 | 6.6 | using SSLD pretrained + EDA strategy|
| <b>PPLCNet_x1_0</b> | <b>78.59</b> | <b>2.01</b> | <b>6.6</b> | using SSLD pretrained + EDA strategy + SKL-UGI knowledge distillation strategy|
| PPLCNet_x1_0 | 76.31 | 2.01 | 7.1 | using ImageNet pretrained |
| PPLCNet_x1_0 | 77.31 | 2.01 | 7.1 | using SSLD pretrained |
| PPLCNet_x1_0 | 77.71 | 2.01 | 7.1 | using SSLD pretrained + EDA strategy|
| <b>PPLCNet_x1_0</b> | <b>78.59</b> | <b>2.01</b> | <b>7.1</b> | using SSLD pretrained + EDA strategy + SKL-UGI knowledge distillation strategy|
It can be seen that a high ma metric is obtained when the backbone is Res2Net200_vd_26w_4s or SwinTransformer_tiny, but the inference speed is slow. Replacing the backbone with the lightweight model MobileNetV3_small_x0_35 greatly improves the speed, but the ma metric drops sharply. Replacing the backbone with the faster PPLCNet_x1_0 raises the ma metric by 5.5 percentage points over MobileNetV3_small_x0_35, and the speed is also more than 20% faster. Additionally using the SSLD pretrained model improves the ma metric by about 1 percentage point without affecting the inference speed. Further, additionally using the EDA strategy increases the ma metric by 0.4 percentage points. Finally, additionally using SKL-UGI knowledge distillation improves the ma metric by a further 0.88 percentage points. At this point, the ma metric of PPLCNet_x1_0 is only 1.58 percentage points lower than that of SwinTransformer_tiny, but the speed is more than 44 times faster. The training method and deployment instructions of PULC will be introduced in detail below.
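
The text does not define the ma metric. The sketch below assumes the definition commonly used for attribute recognition (mean accuracy: for each attribute, the average of the recall on positive samples and the recall on negative samples, then averaged over attributes). Whether PaddleClas computes it exactly this way is an assumption here, and `mean_accuracy` is an illustrative name.

```python
def mean_accuracy(labels, preds):
    """Mean accuracy (mA) over attributes.

    labels, preds: lists of samples, each a list of 0/1 values, one per attribute.
    Assumed definition: per attribute, (positive recall + negative recall) / 2,
    then averaged across attributes.
    """
    num_attrs = len(labels[0])
    per_attr = []
    for j in range(num_attrs):
        pos = sum(1 for y in labels if y[j] == 1)
        neg = len(labels) - pos
        tp = sum(1 for y, p in zip(labels, preds) if y[j] == 1 and p[j] == 1)
        tn = sum(1 for y, p in zip(labels, preds) if y[j] == 0 and p[j] == 0)
        pos_recall = tp / pos if pos else 0.0
        neg_recall = tn / neg if neg else 0.0
        per_attr.append((pos_recall + neg_recall) / 2)
    return sum(per_attr) / num_attrs


# Toy data: 4 samples, 2 binary attributes each (made up for illustration).
labels = [[1, 0], [1, 1], [0, 0], [0, 1]]
preds = [[1, 0], [0, 1], [0, 0], [0, 0]]
print(f"mA = {mean_accuracy(labels, preds) * 100:.2f}%")
```

Averaging positive and negative recall keeps a rare attribute from being scored well by a model that always predicts its absence.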
......
......@@ -38,20 +38,20 @@
## 1. Introduction
This case provides a way for users to quickly build a lightweight, high-precision and practical classification model of human exists using PaddleClas PULC (Practical Ultra Lightweight Classification). The model can be widely used in monitoring scenarios, personnel access control scenarios, massive data filtering scenarios, etc.
This case provides a way for users to quickly build a lightweight, high-precision and practical classification model of human exists using PaddleClas PULC (Practical Ultra Lightweight image Classification). The model can be widely used in monitoring scenarios, personnel access control scenarios, massive data filtering scenarios, etc.
The following table lists the relevant indicators of the model. The first two rows use SwinTransformer_tiny and MobileNetV3_small_x0_35 as the backbone. The third to sixth rows replace the backbone with PPLCNet_x1_0 and then add, in turn, the SSLD pretrained model, the EDA strategy, and the SKL-UGI knowledge distillation strategy.
| Backbone | Tpr(%) | Latency(ms) | Size(M)| Training Strategy |
|-------|-----------|----------|---------------|---------------|
| SwinTransformer_tiny | 95.69 | 95.30 | 107 | using ImageNet pretrained model |
| MobileNetV3_small_x0_35 | 68.25 | 2.85 | 1.6 | using ImageNet pretrained model |
| PPLCNet_x1_0 | 89.57 | 2.12 | 6.5 | using ImageNet pretrained model |
| PPLCNet_x1_0 | 92.10 | 2.12 | 6.5 | using SSLD pretrained model |
| PPLCNet_x1_0 | 93.43 | 2.12 | 6.5 | using SSLD pretrained model + EDA strategy |
| <b>PPLCNet_x1_0</b> | <b>95.60</b> | <b>2.12</b> | <b>6.5</b> | using SSLD pretrained model + EDA strategy + SKL-UGI knowledge distillation strategy|
It can be seen that a high Tpr is obtained when the backbone is SwinTransformer_tiny, but the inference speed is slow. Replacing the backbone with the lightweight model MobileNetV3_small_x0_35 greatly improves the speed, but the Tpr drops sharply. Replacing the backbone with the faster PPLCNet_x1_0 raises the Tpr by more than 20 percentage points over MobileNetV3_small_x0_35, and the speed is also more than 20% faster. Additionally using the SSLD pretrained model improves the Tpr by about 2.5 percentage points without affecting the inference speed. Further, additionally using the EDA strategy increases the Tpr by 1.3 percentage points. Finally, additionally using SKL-UGI knowledge distillation improves the Tpr by a further 2.2 percentage points. At this point, the Tpr is close to that of SwinTransformer_tiny, but the speed is more than 40 times faster. The training method and deployment instructions of PULC will be introduced in detail below.
| SwinTransformer_tiny | 95.69 | 95.30 | 111 | using ImageNet pretrained model |
| MobileNetV3_small_x0_35 | 68.25 | 2.85 | 2.6 | using ImageNet pretrained model |
| PPLCNet_x1_0 | 89.57 | 2.12 | 7.0 | using ImageNet pretrained model |
| PPLCNet_x1_0 | 92.10 | 2.12 | 7.0 | using SSLD pretrained model |
| PPLCNet_x1_0 | 93.43 | 2.12 | 7.0 | using SSLD pretrained model + EDA strategy |
| <b>PPLCNet_x1_0</b> | <b>96.23</b> | <b>2.12</b> | <b>7.0</b> | using SSLD pretrained model + EDA strategy + SKL-UGI knowledge distillation strategy|
It can be seen that a high Tpr is obtained when the backbone is SwinTransformer_tiny, but the inference speed is slow. Replacing the backbone with the lightweight model MobileNetV3_small_x0_35 greatly improves the speed, but the Tpr drops sharply. Replacing the backbone with the faster PPLCNet_x1_0 raises the Tpr by more than 20 percentage points over MobileNetV3_small_x0_35, and the speed is also more than 20% faster. Additionally using the SSLD pretrained model improves the Tpr by about 2.5 percentage points without affecting the inference speed. Further, additionally using the EDA strategy increases the Tpr by 1.3 percentage points. Finally, additionally using SKL-UGI knowledge distillation improves the Tpr by a further 2.8 percentage points. At this point, the Tpr is close to that of SwinTransformer_tiny, but the speed is more than 40 times faster. The training method and deployment instructions of PULC will be introduced in detail below.
**Note**:
......
......@@ -38,19 +38,19 @@
## 1. Introduction
This case provides a way for users to quickly build a lightweight, high-precision and practical classification model for whether a safety helmet is being worn using PaddleClas PULC (Practical Ultra Lightweight Classification). The model can be widely used in construction scenes, factory workshop scenes, traffic scenes and so on.
This case provides a way for users to quickly build a lightweight, high-precision and practical classification model for whether a safety helmet is being worn using PaddleClas PULC (Practical Ultra Lightweight image Classification). The model can be widely used in construction scenes, factory workshop scenes, traffic scenes and so on.
The following table lists the relevant indicators of the model. The first two rows use SwinTransformer_tiny and MobileNetV3_small_x0_35 as the backbone. The third to seventh rows replace the backbone with PPLCNet_x1_0 and then add, in turn, the SSLD pretrained model, the EDA strategy, and the SKL-UGI knowledge distillation strategy.
The following table lists the relevant indicators of the model. The first three rows use SwinTransformer_tiny, Res2Net200_vd_26w_4s and MobileNetV3_small_x0_35 as the backbone. The fourth to seventh rows replace the backbone with PPLCNet_x1_0 and then add, in turn, the SSLD pretrained model, the EDA strategy, and the SKL-UGI knowledge distillation strategy.
| Backbone | Tpr(%) | Latency(ms) | Size(M)| Training Strategy |
|-------|-----------|----------|---------------|---------------|
| SwinTransformer_tiny | 93.57 | 91.32 | 107 | using ImageNet pretrained model |
| SwinTransformer_tiny | 93.57 | 91.32 | 111 | using ImageNet pretrained model |
| Res2Net200_vd_26w_4s | 98.92 | 80.99 | 284 | using ImageNet pretrained model |
| MobileNetV3_small_x0_35 | 84.83 | 2.85 | 1.6 | using ImageNet pretrained model |
| PPLCNet_x1_0 | 93.27 | 2.03 | 6.5 | using ImageNet pretrained model |
| PPLCNet_x1_0 | 98.16 | 2.03 | 6.5 | using SSLD pretrained model |
| PPLCNet_x1_0 | 99.30 | 2.03 | 6.5 | using SSLD pretrained model + EDA strategy |
| <b>PPLCNet_x1_0</b> | <b>99.38</b> | <b>2.03</b> | <b>6.5</b> | using SSLD pretrained model + EDA strategy + SKL-UGI knowledge distillation strategy|
| MobileNetV3_small_x0_35 | 84.83 | 2.85 | 2.6 | using ImageNet pretrained model |
| PPLCNet_x1_0 | 93.27 | 2.03 | 7.1 | using ImageNet pretrained model |
| PPLCNet_x1_0 | 98.16 | 2.03 | 7.1 | using SSLD pretrained model |
| PPLCNet_x1_0 | 99.30 | 2.03 | 7.1 | using SSLD pretrained model + EDA strategy |
| <b>PPLCNet_x1_0</b> | <b>99.38</b> | <b>2.03</b> | <b>7.1</b> | using SSLD pretrained model + EDA strategy + SKL-UGI knowledge distillation strategy|
It can be seen that a high Tpr is obtained when the backbone is Res2Net200_vd_26w_4s, but the inference speed is slow. Replacing the backbone with the lightweight model MobileNetV3_small_x0_35 greatly improves the speed, but the Tpr drops sharply. Replacing the backbone with the faster PPLCNet_x1_0 raises the Tpr by about 8.5 percentage points over MobileNetV3_small_x0_35, and the speed is also more than 20% faster. Additionally using the SSLD pretrained model improves the Tpr by about 4.9 percentage points without affecting the inference speed. Further, additionally using the EDA strategy increases the Tpr by 1.1 percentage points. Finally, additionally using the UDML knowledge distillation improves the Tpr by a further 0.08 percentage points (99.30 to 99.38 in the table). At this point, the Tpr is higher than that of Res2Net200_vd_26w_4s, and the speed is roughly 40 times faster (80.99 ms versus 2.03 ms). The training method and deployment instructions of PULC will be introduced in detail below.
......
......@@ -36,17 +36,17 @@
## 1. Introduction
In the process of document scanning, license shooting and so on, sometimes in order to shoot more clearly, the camera device will be rotated, resulting in photo in different directions. At this time, the standard OCR process cannot cope with these issues well. Using the text image orientation classification technology, the direction of the text image can be predicted and adjusted, so as to improve the accuracy of OCR processing. This case provides a way for users to use PaddleClas PULC (Practical Ultra Lightweight Classification) to quickly build a lightweight, high-precision, practical classification model of text image orientation. This model can be widely used in OCR processing scenarios of rotating pictures in financial, government and other industries.
In the process of document scanning, license shooting and so on, sometimes in order to shoot more clearly, the camera device will be rotated, resulting in photo in different directions. At this time, the standard OCR process cannot cope with these issues well. Using the text image orientation classification technology, the direction of the text image can be predicted and adjusted, so as to improve the accuracy of OCR processing. This case provides a way for users to use PaddleClas PULC (Practical Ultra Lightweight image Classification) to quickly build a lightweight, high-precision, practical classification model of text image orientation. This model can be widely used in OCR processing scenarios of rotating pictures in financial, government and other industries.
The following table lists the relevant indicators of the model. The first two rows use SwinTransformer_tiny and MobileNetV3_small_x0_35 as the backbone. The third to fifth rows replace the backbone with PPLCNet_x1_0 and then add, in turn, the SSLD pretrained model and the hyperparameter searching strategy.
| Backbone | Top1-Acc(%) | Latency(ms) | Size(M)| Training Strategy |
| ----------------------- | --------- | ---------- | --------- | ------------------------------------- |
| SwinTransformer_tiny | 99.12 | 89.65 | 107 | using ImageNet pretrained model |
| MobileNetV3_small_x0_35 | 83.61 | 2.95 | 17 | using ImageNet pretrained model |
| PPLCNet_x1_0 | 97.85 | 2.16 | 6.5 | using ImageNet pretrained model |
| PPLCNet_x1_0 | 98.02 | 2.16 | 6.5 | using SSLD pretrained model |
| **PPLCNet_x1_0** | **99.06** | **2.16** | **6.5** | using SSLD pretrained model + hyperparameters searching strategy |
| SwinTransformer_tiny | 99.12 | 89.65 | 111 | using ImageNet pretrained model |
| MobileNetV3_small_x0_35 | 83.61 | 2.95 | 2.6 | using ImageNet pretrained model |
| PPLCNet_x1_0 | 97.85 | 2.16 | 7.1 | using ImageNet pretrained model |
| PPLCNet_x1_0 | 98.02 | 2.16 | 7.1 | using SSLD pretrained model |
| **PPLCNet_x1_0** | **99.06** | **2.16** | **7.1** | using SSLD pretrained model + hyperparameters searching strategy |
It can be seen that a high accuracy is obtained when the backbone is SwinTransformer_tiny, but the inference speed is slow. Replacing the backbone with the lightweight model MobileNetV3_small_x0_35 greatly improves the speed, but the accuracy drops sharply. Replacing the backbone with the faster PPLCNet_x1_0 raises the accuracy by more than 14 percentage points over MobileNetV3_small_x0_35, and the speed is also faster. Additionally using the SSLD pretrained model improves the accuracy by about 0.17 percentage points without affecting the inference speed. Finally, additionally using the hyperparameter searching strategy improves the accuracy by a further 1.04 percentage points. At this point, the accuracy is close to that of SwinTransformer_tiny, but the speed is much faster. The training method and deployment instructions of PULC will be introduced in detail below.
......
......@@ -38,19 +38,19 @@
## 1. Introduction
This case provides a way for users to quickly build a lightweight, high-precision and practical classification model of textline orientation using PaddleClas PULC (Practical Ultra Lightweight Classification). The model can be widely used in character correction, character recognition, etc.
This case provides a way for users to quickly build a lightweight, high-precision and practical classification model of textline orientation using PaddleClas PULC (Practical Ultra Lightweight image Classification). The model can be widely used in character correction, character recognition, etc.
The following table lists the relevant indicators of the model. The first two rows use SwinTransformer_tiny and MobileNetV3_small_x0_35 as the backbone. The third to seventh rows replace the backbone with PPLCNet_x1_0 and additionally use the SSLD pretrained model, the EDA strategy, and the SKL-UGI knowledge distillation strategy.
| Backbone | Top-1 Acc(%) | Latency(ms) | Size(M)| Training Strategy |
|-------|-----------|----------|---------------|---------------|
| SwinTransformer_tiny | 93.61 | 89.64 | 107 | using ImageNet pretrained model |
| MobileNetV3_small_x0_35 | 81.40 | 2.96 | 17 | using ImageNet pretrained model |
| PPLCNet_x1_0 | 89.99 | 2.11 | 6.5 | using ImageNet pretrained model |
| PPLCNet_x1_0* | 94.06 | 2.68 | 6.5 | using ImageNet pretrained model |
| PPLCNet_x1_0* | 94.11 | 2.68 | 6.5 | using SSLD pretrained model |
| <b>PPLCNet_x1_0**</b> | <b>96.01</b> | <b>2.72</b> | <b>6.5</b> | using SSLD pretrained model + EDA strategy |
| PPLCNet_x1_0** | 95.86 | 2.72 | 6.5 | using SSLD pretrained model + EDA strategy + SKL-UGI knowledge distillation strategy|
| SwinTransformer_tiny | 93.61 | 89.64 | 111 | using ImageNet pretrained model |
| MobileNetV3_small_x0_35 | 81.40 | 2.96 | 2.6 | using ImageNet pretrained model |
| PPLCNet_x1_0 | 89.99 | 2.11 | 7.0 | using ImageNet pretrained model |
| PPLCNet_x1_0* | 94.06 | 2.68 | 7.0 | using ImageNet pretrained model |
| PPLCNet_x1_0* | 94.11 | 2.68 | 7.0 | using SSLD pretrained model |
| <b>PPLCNet_x1_0**</b> | <b>96.01</b> | <b>2.72</b> | <b>7.0</b> | using SSLD pretrained model + EDA strategy |
| PPLCNet_x1_0** | 95.86 | 2.72 | 7.0 | using SSLD pretrained model + EDA strategy + SKL-UGI knowledge distillation strategy|
It can be seen that a high accuracy is obtained when the backbone is SwinTransformer_tiny, but the inference speed is slow. Replacing the backbone with the lightweight model MobileNetV3_small_x0_35 greatly improves the speed, but the accuracy drops sharply. Replacing the backbone with the faster PPLCNet_x1_0 raises the accuracy by 8.6 percentage points over MobileNetV3_small_x0_35, and the speed is also more than 10% faster. On this basis, changing the resolution and stride (refer to [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)) makes the speed 27% slower but improves the accuracy by about 4.1 percentage points. Additionally using the SSLD pretrained model improves the accuracy by about 0.05 percentage points without affecting the inference speed. Finally, additionally using the EDA strategy increases the accuracy by 1.9 percentage points. The training method and deployment instructions of PULC will be introduced in detail below.
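
In this table the bold row is the SSLD + EDA configuration rather than the one with distillation, since distillation lowers the accuracy here. A small Python sketch of picking the most accurate configuration under a latency budget makes that choice explicit. The rows are copied from the table; `best_under_budget` and the budget values are hypothetical and chosen only for illustration.

```python
# Rows copied from the textline_orientation table:
# (configuration, Top-1 accuracy in %, latency in ms).
configs = [
    ("SwinTransformer_tiny", 93.61, 89.64),
    ("MobileNetV3_small_x0_35", 81.40, 2.96),
    ("PPLCNet_x1_0", 89.99, 2.11),
    ("PPLCNet_x1_0*, ImageNet pretrained", 94.06, 2.68),
    ("PPLCNet_x1_0*, SSLD pretrained", 94.11, 2.68),
    ("PPLCNet_x1_0**, SSLD + EDA", 96.01, 2.72),
    ("PPLCNet_x1_0**, SSLD + EDA + SKL-UGI", 95.86, 2.72),
]


def best_under_budget(rows, max_latency_ms):
    """Return the most accurate row whose latency fits the budget, or None."""
    candidates = [row for row in rows if row[2] <= max_latency_ms]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row[1])


for budget_ms in (3.0, 2.5, 1.0):  # illustrative budgets, not from the source
    print(budget_ms, best_under_budget(configs, budget_ms))
```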
......
......@@ -38,7 +38,7 @@
## 1. Introduction
This case provides a way for users to quickly build a lightweight, high-precision and practical classification model of traffic sign using PaddleClas PULC (Practical Ultra Lightweight Classification). The model can be widely used in automatic driving, road monitoring, etc.
This case provides a way for users to quickly build a lightweight, high-precision and practical classification model of traffic sign using PaddleClas PULC (Practical Ultra Lightweight image Classification). The model can be widely used in automatic driving, road monitoring, etc.
The following table lists the relevant indicators of the model. The first two rows use SwinTransformer_tiny and MobileNetV3_small_x0_35 as the backbone. The third to sixth rows replace the backbone with PPLCNet and additionally use the EDA strategy and the SKL-UGI knowledge distillation strategy.
......
......@@ -38,13 +38,12 @@
## 1. Introduction
This case provides a way for users to quickly build a lightweight, high-precision and practical classification model of vehicle attribute using PaddleClas PULC (Practical Ultra Lightweight image Classification). The model can be widely used in
Vehicle identification, road monitoring and other scenarios.
This case provides a way for users to quickly build a lightweight, high-precision and practical classification model of vehicle attribute using PaddleClas PULC (Practical Ultra Lightweight image Classification). The model can be widely used in Vehicle identification, road monitoring and other scenarios.
The following table lists the relevant indicators of the model. The first three rows use Res2Net200_vd_26w_4s, ResNet50 and MobileNetV3_small_x0_35 as the backbone for training. The fourth to seventh rows replace the backbone with PPLCNet, then additionally use the EDA strategy, and then the EDA strategy together with the SKL-UGI knowledge distillation strategy.
| Backbone | ma(%) | Latency(ms) | Size(M) | Training Strategy |
| Backbone | mA(%) | Latency(ms) | Size(M) | Training Strategy |
|-------|-----------|----------|---------------|---------------|
| Res2Net200_vd_26w_4s | 91.36 | 79.46 | 293 | using ImageNet pretrained |
| ResNet50 | 89.98 | 12.83 | 92 | using ImageNet pretrained |
......@@ -52,11 +51,11 @@ The following table lists the relevant indicators of the model. The first three
| PPLCNet_x1_0 | 89.57 | 2.36 | 7.2 | using ImageNet pretrained |
| PPLCNet_x1_0 | 90.07 | 2.36 | 7.2 | using SSLD pretrained |
| PPLCNet_x1_0 | 90.59 | 2.36 | 7.2 | using SSLD pretrained + EDA strategy|
| <b>PPLCNet_x1_0<b> | <b>90.81<b> | <b>2.36<b> | <b>8.2<b> | using SSLD pretrained + EDA strategy + SKL-UGI knowledge distillation strategy|
| <b>PPLCNet_x1_0<b> | <b>90.81<b> | <b>2.36<b> | <b>7.2<b> | using SSLD pretrained + EDA strategy + SKL-UGI knowledge distillation strategy|
It can be seen from the table that the mA metric is higher when the backbone is Res2Net200_vd_26w_4s, but the inference speed is slower. After replacing the backbone with the lightweight model MobileNetV3_small_x0_35, the speed is greatly improved, but the mA metric drops significantly. When the backbone is replaced by PPLCNet_x1_0, the mA metric increases by 2 percentage points, and the speed also increases by about 23%. On this basis, using the SSLD pretrained model improves the mA metric by about 0.5 percentage points without changing the inference speed. Further, integrating the EDA strategy improves the mA metric by another 0.52 percentage points. Finally, after using SKL-UGI knowledge distillation, the mA metric improves by a further 0.23 percentage points. At this point, the mA metric of PPLCNet_x1_0 is only 0.55 percentage points below that of Res2Net200_vd_26w_4s, but it is 32 times faster. The training method and deployment instructions of PULC are introduced in detail below.
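The comparisons in the paragraph above can be re-derived from the table rows visible in this hunk. The following is a minimal sketch: the dictionary layout and the helper names `ma_gain` and `latency_ratio` are illustrative only and are not part of PaddleClas; the numbers are copied from the table.

```python
# mA (%) and latency (ms), copied from the table rows shown above.
# The row keys are informal labels chosen for this sketch.
rows = {
    "Res2Net200_vd_26w_4s": (91.36, 79.46),
    "PPLCNet_x1_0 (ImageNet)": (89.57, 2.36),
    "PPLCNet_x1_0 (SSLD)": (90.07, 2.36),
    "PPLCNet_x1_0 (SSLD + EDA)": (90.59, 2.36),
    "PPLCNet_x1_0 (SSLD + EDA + SKL-UGI)": (90.81, 2.36),
}


def ma_gain(after, before):
    """Difference in mA between two rows, in percentage points."""
    return round(rows[after][0] - rows[before][0], 2)


def latency_ratio(slow, fast):
    """How many times larger the latency of `slow` is than that of `fast`."""
    return rows[slow][1] / rows[fast][1]


best = "PPLCNet_x1_0 (SSLD + EDA + SKL-UGI)"
print("SSLD gain:", ma_gain("PPLCNet_x1_0 (SSLD)", "PPLCNet_x1_0 (ImageNet)"))
print("EDA gain:", ma_gain("PPLCNet_x1_0 (SSLD + EDA)", "PPLCNet_x1_0 (SSLD)"))
print("Gap to Res2Net200_vd_26w_4s:", ma_gain("Res2Net200_vd_26w_4s", best))
print("Latency ratio:", round(latency_ratio("Res2Net200_vd_26w_4s", best), 1))
```

Working from the raw table values this way makes it easy to recheck the stated gains whenever a row is updated.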
**Note**:
......@@ -163,16 +162,16 @@ Part of the data visualization is shown below.
<div align="center">
<img src="../../images/PULC/docs/vehicle_attribute_data_demo.png" width = "500" />
</div>
First, apply for and download the data from the [VeRi dataset official website](https://www.v7labs.com/open-datasets/veri-dataset) and put it in the `dataset` directory of PaddleClas, using `VeRi` as the dataset directory name. Then use the following command to enter the folder.
```shell
cd PaddleClas/dataset/VeRi/
```
Then use the following code to convert the labels (you can run it in an interactive Python session, or save it to a file and run that file with `python3 convert.py`).
```python
import os
from xml.dom.minidom import parse
......@@ -209,10 +208,10 @@ def convert_annotation(input_fp, output_fp):
convert_annotation('train_label.xml', 'train_list.txt') #imagename vehiclenum colorid typeid
convert_annotation('test_label.xml', 'test_list.txt')
```
After executing the above command, the `VeRi` directory has the following data:
```
VeRi
├── image_train
......@@ -231,7 +230,7 @@ VeRi
├── train_label.xml
├── test_label.xml
```
where `train/` and `test/` are the training set and validation set, respectively. `train_list.txt` and `test_list.txt` are the converted label files for training and validation sets, respectively.
......@@ -427,7 +426,7 @@ python3.7 python/predict_cls.py -c configs/PULC/vehicle_attribute/inference_vehi
The prediction results:
```
0002_c002_00030670_0.jpg: {'attributes': 'Color: (yellow, prob: 0.9893478155136108), Type: (hatchback, prob: 0.9734099507331848)', 'output': [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]}
0002_c002_00030670_0.jpg: {'attributes': 'Color: (yellow, prob: 0.9893478155136108), Type: (hatchback, prob: 0.9734099507331848)', 'output': [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]}
```
......@@ -445,8 +444,8 @@ python3.7 python/predict_cls.py -c configs/PULC/vehicle_attribute/inference_vehi
All prediction results will be printed, as shown below.
```
0002_c002_00030670_0.jpg: {'attributes': 'Color: (yellow, prob: 0.9893476963043213), Type: (hatchback, prob: 0.9734097719192505)', 'output': [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]}
0014_c012_00040750_0.jpg: {'attributes': 'Color: (red, prob: 0.999872088432312), Type: (sedan, prob: 0.999976634979248)', 'output': [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]}
0002_c002_00030670_0.jpg: {'attributes': 'Color: (yellow, prob: 0.9893476963043213), Type: (hatchback, prob: 0.9734097719192505)', 'output': [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]}
0014_c012_00040750_0.jpg: {'attributes': 'Color: (red, prob: 0.999872088432312), Type: (sedan, prob: 0.999976634979248)', 'output': [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]}
```
In the prediction results above, `attributes` gives the predicted color and vehicle type together with their probabilities, and `output` is the corresponding 0/1 vector over all color and type labels.
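The `output` vectors printed above each have 19 entries. As a rough sketch of how such a vector could be inspected, the code below splits it into a color part and a type part. The 10 + 9 split, the constant names and the helper `split_output` are assumptions inferred from the two sample results shown here, not a documented PaddleClas API.

```python
NUM_COLORS = 10  # assumption: the first 10 entries are color labels
NUM_TYPES = 9    # assumption: the remaining 9 entries are vehicle type labels


def split_output(output):
    """Return the active indices in the color part and in the type part."""
    if len(output) != NUM_COLORS + NUM_TYPES:
        raise ValueError("expected %d entries, got %d"
                         % (NUM_COLORS + NUM_TYPES, len(output)))
    colors = [i for i, v in enumerate(output[:NUM_COLORS]) if v == 1]
    types = [i for i, v in enumerate(output[NUM_COLORS:]) if v == 1]
    return colors, types


# The two vectors from the sample results above, written as color part + type part.
yellow_hatchback = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0] + [0, 0, 0, 1, 0, 0, 0, 0, 0]
red_sedan = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0] + [1, 0, 0, 0, 0, 0, 0, 0, 0]

for name, vec in [("yellow/hatchback", yellow_hatchback), ("red/sedan", red_sedan)]:
    color_idx, type_idx = split_output(vec)
    print(name, "-> color indices:", color_idx, "type indices:", type_idx)
```

Mapping the indices back to label names would require the label list used during training, which is not shown in this hunk.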
......