{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. PP-LCNet Introduction\n", "PP-LCNet series model is a lightweight convolutional neural network model proposed by PaddleCV team. This series model is specially optimized for Intel CPU hardware platform and takes into account various platforms at the same time. It achieves a good balance between model inference speed and precision and achieves a lightweight backbone network with faster speed and higher accuracy. It can also improve the effect of object detection, semantic segmentation and other downstream visual tasks. On the ImageNet1k dataset, the PP-LCNet series model is compared to other lightweight networks as shown in the figure below.\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "The PP-LCNet series model was developed and trained based on PaddleClas. For more information on PP-LCNet series models, see [PaddleClas-PPLCNet](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.5/docs/en/models/PP-LCNet_en.md)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Performance\n", "\n", "We evaluated the performance of PP-LCNet series models on the tasks of image classification, object detection and semantic segmentation.\n", "\n", "### 2.1 Image Classification\n", "\n", "The evaluation results of PP-LCNet series models in ImageNet1k dataset are shown as follows:\n", "\n", "| Model | Params(M) | FLOPs(M) | Top-1 Acc(\\%) | Top-5 Acc(\\%) | Latency(ms) |\n", "|:--:|:--:|:--:|:--:|:--:|:--:|\n", "| PPLCNet_x0_25 | 1.5 | 18 | 51.86 | 75.65 | 1.74 |\n", "| PPLCNet_x0_35 | 1.6 | 29 | 58.09 | 80.83 | 1.92 |\n", "| PPLCNet_x0_5 | 1.9 | 47 | 63.14 | 84.66 | 2.05 |\n", "| PPLCNet_x0_75 | 2.4 | 99 | 68.18 | 88.30 | 2.29 |\n", "| PPLCNet_x1_0 | 3.0 | 161 | 71.32 | 90.03 | 2.46 |\n", "| PPLCNet_x1_5 | 4.5 | 342 | 73.71 | 91.53 | 3.19 |\n", "| PPLCNet_x2_0 | 6.5 | 590 | 75.18 | 92.27 | 4.27 |\n", "| PPLCNet_x2_5 | 9.0 | 906 | 76.60 | 93.00 | 5.39 |\n", "| PPLCNet_x0_5_ssld | 1.9 | 47 | 66.10 | 86.46 | 2.05 |\n", "| PPLCNet_x1_0_ssld | 3.0 | 161 | 74.39 | 92.09 | 2.46 |\n", "| PPLCNet_x2_5_ssld | 9.0 | 906 | 80.82 | 95.33 | 5.39 |\n", "\n", "`_ssld` represents the model after using `SSLD distillation`. For details about `SSLD distillation`, see [SSLD distillation](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.5/docs/en/advanced_tutorials/knowledge_distillation_en.md).\n", "\n", "Performance comparison with other lightweight networks:\n", "\n", "| Model | Params(M) | FLOPs(M) | Top-1 Acc(\\%) | Top-5 Acc(\\%) | Latency(ms) |\n", "|:--:|:--:|:--:|:--:|:--:|:--:|\n", "| MobileNetV2_x0_25 | 1.5 | 34 | 53.21 | 76.52 | 2.47 |\n", "| MobileNetV3_small_x0_35 | 1.7 | 15 | 53.03 | 76.37 | 3.02 |\n", "| ShuffleNetV2_x0_33 | 0.6 | 24 | 53.73 | 77.05 | 4.30 |\n", "| PPLCNet_x0_25 | 1.5 | 18 | 51.86 | 75.65 | 1.74 |\n", "| MobileNetV2_x0_5 | 2.0 | 99 | 65.03 | 85.72 | 2.85 |\n", "| MobileNetV3_large_x0_35 | 2.1 | 41 | 64.32 | 85.46 | 3.68 |\n", "| ShuffleNetV2_x0_5 | 1.4 | 43 | 60.32 | 82.26 | 4.65 |\n", "| PPLCNet_x0_5 | 1.9 | 47 | 63.14 | 84.66 | 2.05 |\n", "| MobileNetV1_x1_0 | 4.3 | 578 | 70.99 | 89.68 | 3.38 |\n", "| MobileNetV2_x1_0 | 3.5 | 327 | 72.15 | 90.65 | 4.26 |\n", "| MobileNetV3_small_x1_25 | 3.6 | 100 | 70.67 | 89.51 | 3.95 |\n", "| PPLCNet_x1_0 | 3.0 | 161 | 71.32 | 90.03 | 2.46 |\n", "\n", "### 2.2 Object Detection\n", "\n", "The object detection method uses PicoDet developed by Baidu, which focuses on lightweight object detection scenarios. The following table shows the comparison between PP-LCNet and MobileNetV3 on COCO datasets. In terms of accuracy and speed, The advantages of PP-LCNet are obvious.\n", "\n", "| Backbone | mAP(%) | Latency(ms) |\n", "|:--:|:--:|:--:|\n", "| MobileNetV3_large_x0_35 | 19.2 | 8.1 |\n", "| PPLCNet_x0_5 | 20.3 | 6.0 |\n", "| MobileNetV3_large_x0_75 | 25.8 | 11.1 |\n", "| PPLCNet_x1_0 | 26.9 | 7.9 |\n", "\n", "### 2.3 Semantic Segmentation\n", "\n", "The semantic segmentation method uses DeeplabV3+,The following table shows the comparison between PP-LCNet and MobileNetV3 on Cityscapes datasets. 
"\n", "| Backbone | mIoU(%) | Latency(ms) |\n", "|:--:|:--:|:--:|\n", "| MobileNetV3_large_x0_5 | 55.42 | 135 |\n", "| PPLCNet_x0_5 | 58.36 | 82 |\n", "| MobileNetV3_large_x0_75 | 64.53 | 151 |\n", "| PPLCNet_x1_0 | 66.03 | 96 |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Quick Start\n", "\n", "### 3.1 Inference\n", "* Install the relevant Python packages\n", "\n", "  (Remove the \"!\" when not running in a Jupyter Notebook.)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run the following command to install PaddlePaddle if CUDA 9, CUDA 10, or CUDA 11 is available." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install paddlepaddle-gpu" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run the following command to install PaddlePaddle if no GPU device is available." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install paddlepaddle" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Install PaddleClas:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false }, "scrolled": true, "tags": [] }, "outputs": [], "source": [ "!pip install paddleclas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* Quick Start\n", "\n", "Congratulations! Now that you have successfully installed PaddleClas, you can try image classification as follows." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "tags": [] }, "outputs": [], "source": [ "!wget https://gitee.com/paddlepaddle/PaddleClas/raw/release/2.5/docs/images/inference_deployment/whl_demo.jpg\n", "!paddleclas --model_name=PPLCNet_x1_0 --infer_imgs=\"./whl_demo.jpg\"\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above command produces output like the following:\n", "\n", "class_ids: [8, 7, 86, 81, 85], scores: [0.91347, 0.03779, 0.0036, 0.00117, 0.00112], label_names: ['hen', 'cock', 'partridge', 'ptarmigan', 'quail'], filename: docs/images/inference_deployment/whl_demo.jpg\n", "Predict complete!" ] },
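{ "cell_type": "markdown", "metadata": {}, "source": [ "In addition to the command line, the `paddleclas` wheel provides a Python API. The cell below is a minimal sketch of that usage (field names in the printed result may vary slightly between PaddleClas versions); `PaddleClas.predict` returns a generator, so `next()` fetches the first batch of results." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Minimal Python-API sketch; assumes the paddleclas wheel installed above and the demo image downloaded earlier.\n", "from paddleclas import PaddleClas\n", "\n", "# Build a predictor with the PP-LCNet x1.0 model; pretrained weights are downloaded automatically.\n", "clas = PaddleClas(model_name='PPLCNet_x1_0')\n", "\n", "# predict() accepts an image path (or a list of paths) and yields batches of result dicts.\n", "results = clas.predict('./whl_demo.jpg')\n", "print(next(results))" ] },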
\n", "\n", "
\n", "\n", "### 4.2 Model Details\n", "\n", "Build on extensive experiments, through the experiment of different network structure,we summarized some strategies that can improve the accuracy of the model without increasing the latency and combined these four strategies to form PP-LCNet.Each of these strategies is described below:\n", "\n", "#### 4.2.1 Better Activation Function\n", "\n", "Since the convolutional neural network uses ReLU activation function, the network performance has been greatly improved. In recent years, variants of ReLU activation function have also appeared successively, such as Leaky-ReLU, P-ReLU, ELU, etc. In 2017, Google Brain team obtained the swish activation function by searching. The activation function performs well on lightweight networks, and in 2019, the authors of MobileNetV3 further optimized the activation function to H-Swish, which eliminates exponential operations and is faster with little impact on network accuracy. After many experiments, we find that this activation function has excellent performance on lightweight networks. Therefore, at PP-LCNet, we chose this activation function.\n", "\n", "#### 4.2.2 Add the SE Module at an Appropriate Position\n", "SE module is a channel attention mechanism proposed by SENet, which can effectively improve the accuracy of the model. However, in the Intel CPU side, this module will also bring a large delay, how to balance the accuracy and speed is a problem we need to solve. Although the position of SE module was searched in the network based on NAS search such as MobileNetV3, no general conclusion was drawn. We found through the experiment that the closer the SE module is to the tail of the network, the greater the improvement of model accuracy. The following table also shows some of our experimental results:\n", "\n", "| SE Location | Top-1 Acc(\\%) | Latency(ms) |\n", "|:--:|:--:|:--:|\n", "| 1100000000000 | 61.73 | 2.06 |\n", "| 0000001100000 | 62.17 | 2.03 |\n", "| 0000000000011 | 63.14 | 2.05 |\n", "| 1111111111111 | 64.27 | 3.80 |\n", "\n", "Finally, the position of SE module in PP-LCNet selected the scheme in the third row of the table.\n", "\n", "#### 4.2.3 Add a Larger Convolution Kernel at Appropriate Position \n", "\n", "In the paper on MixNet, the author analyzed the influence of convolution kernel size on model performance and concluded that large convolution kernel could improve model performance within a certain range, but beyond this range would damage model performance. Therefore, the author combined a split-concat paradigm MixConv. Although this combination can improve the performance of the model, it is not conducive to reasoning. Through experiments, we have summarized the functions of some larger convolution kernel in different positions. Similar to the position of SE module, larger convolution nuclei play a more obvious role in the middle and back of the network. The following table shows the influence of the position of 5x5 convolution kernel on the accuracy:\n", "\n", "| large-kernel Location | Top-1 Acc(\\%) | Latency(ms) |\n", "|:--:|:--:|:--:|\n", "| 1111111111111 | 63.22 | 2.08 |\n", "| 1111111000000 | 62.70 | 2.07 |\n", "| 0000001111111 | 63.14 | 2.05 |\n", "\n", "Experiments show that a larger convolution kernel placed in the middle and back of the network can achieve the accuracy of placing in all positions, and at the same time, faster inference speed can be obtained. 
"\n", "\n", "#### 4.2.4 Use a Larger 1x1 Convolution Layer After the GAP\n", "Since GoogLeNet, the classification layer has usually been placed directly after global average pooling (GAP). In lightweight networks, however, this means the features produced by the GAP are not further fused or processed. If a larger 1x1 convolution layer (equivalent to an FC layer) is inserted after the GAP, the pooled features are first fused and the fused features are then classified. This greatly improves accuracy without affecting inference speed. Applying the above four improvements to the BaseNet backbone yields PP-LCNet.\n", "\n", "The following table further illustrates the impact of each strategy on the results:\n", "\n", "| Activation | SE-block | Large-kernel | last-1x1-conv | Top-1 Acc(\\%) | Latency(ms) |\n", "|:--:|:--:|:--:|:--:|:--:|:--:|\n", "| 0 | 1 | 1 | 1 | 61.93 | 1.94 |\n", "| 1 | 0 | 1 | 1 | 62.51 | 1.87 |\n", "| 1 | 1 | 0 | 1 | 62.44 | 2.01 |\n", "| 1 | 1 | 1 | 0 | 59.91 | 1.85 |\n", "| 1 | 1 | 1 | 1 | 63.14 | 2.05 |\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Notes\n", "The inference latency of the PP-LCNet series models was measured on an Intel Xeon Gold 6148 CPU with the PaddlePaddle inference framework, with MKLDNN enabled, a batch size of 1, and 10 threads." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Relevant Papers and Citations\n", "\n", "@misc{cui2021pplcnet,\n", " title={PP-LCNet: A Lightweight CPU Convolutional Neural Network},\n", " author={Cheng Cui and Tingquan Gao and Shengyu Wei and Yuning Du and Ruoyu Guo and Shuilong Dong and Bin Lu and Ying Zhou and Xueying Lv and Qiwen Liu and Xiaoguang Hu and Dianhai Yu and Yanjun Ma},\n", " year={2021},\n", " eprint={2109.15099},\n", " archivePrefix={arXiv},\n", " primaryClass={cs.CV}\n", "}" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 4 }