Unverified commit 3463e8be authored by Qian Zhang, committed by GitHub

Add PP-LiteSeg English Docs (#5584)

* Add PP-HumansegV2 English Docs

* Add PP-LiteSeg English Docs

* Update benchmark_en.md

* Update download_en.md

* Update introduction_en.ipynb
Parent 2bc82f2f
## 1. Inference Benchmark
### 1.1 Environment
* Segmentation accuracy (mIoU): PP-LiteSeg is trained and tested on the Cityscapes dataset.
* Inference latency on an NVIDIA GPU: the hardware is an NVIDIA GTX 1080Ti. For a fair comparison with other methods, we first use [infer_onnx_trt.py](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/deploy/python/infer_onnx_trt.py) to convert the model to ONNX format, and then measure latency with the native TensorRT engine.
### 1.2 Datasets
* The Cityscapes dataset is used for testing.
### 1.3 Benchmark
<div align="center">

| Model | Encoder | Resolution | mIoU (Val) | mIoU (Test) | FPS |
|-|-|-|-|-|-|
| ESPNet | ESPNet | 512x1024 | - | 60.3 | 112.9 |
| ESPNetV2 | ESPNetV2 | 512x1024 | 66.4 | 66.2 | - |
| SwiftNet | ResNet18 | 1024x2048 | 75.4 | 75.5 | 39.9 |
| BiSeNetV1 | Xception39 | 768x1536 | 69.0 | 68.4 | 105.8 |
| BiSeNetV1-L | ResNet18 | 768x1536 | 74.8 | 74.7 | 65.5 |
| BiSeNetV2 | - | 512x1024 | 73.4 | 72.6 | 156 |
| BiSeNetV2-L | - | 512x1024 | 75.8 | 75.3 | 47.3 |
| FasterSeg | - | 1024x2048 | 73.1 | 71.5 | 163.9 |
| SFNet | DF1 | 1024x2048 | - | 74.5 | 121 |
| STDC1-Seg50 | STDC1 | 512x1024 | 72.2 | 71.9 | 250.4 |
| STDC2-Seg50 | STDC2 | 512x1024 | 74.2 | 73.4 | 188.6 |
| STDC1-Seg75 | STDC1 | 768x1536 | 74.5 | 75.3 | 126.7 |
| STDC2-Seg75 | STDC2 | 768x1536 | 77.0 | 76.8 | 97.0 |
| PP-LiteSeg-T1 | STDC1 | 512x1024 | 73.1 | 72.0 | 273.6 |
| PP-LiteSeg-B1 | STDC2 | 512x1024 | 75.3 | 73.9 | 195.3 |
| PP-LiteSeg-T2 | STDC1 | 768x1536 | 76.0 | 74.9 | 143.6 |
| PP-LiteSeg-B2 | STDC2 | 768x1536 | 78.2 | 77.5 | 102.6 |

</div>

<div align="center">
<img src="https://user-images.githubusercontent.com/52520497/162148733-70be896a-eadb-4790-94e5-f48dad356b2d.png" width = "500" height = "430" alt="iou_fps" />
</div>

## 2. Reference

Ref: https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.6/configs/pp_liteseg

# Model Lists
## 1 Semantic segmentation models on Cityscapes
| Model | Backbone | Training Iters | Train Resolution | Test Resolution | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
| --- | --- | --- | ---| --- | --- | --- | --- | --- |
|PP-LiteSeg-T|STDC1|160000|1024x512|1024x512|73.10%|73.89%|-|[config](./pp_liteseg_stdc1_cityscapes_1024x512_scale0.5_160k.yml)\|[Pretrained_model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/pp_liteseg_stdc1_cityscapes_1024x512_scale0.5_160k/model.pdparams)\|[inference_model](https://paddleseg.bj.bcebos.com/inference/pp_liteseg_infer_models/pp_liteseg_stdc1_cityscapes_1024x512_scale0.5_160k_inference_model.zip)|
|PP-LiteSeg-T|STDC1|160000|1024x512|1536x768|76.03%|76.74%|-|[config](./pp_liteseg_stdc1_cityscapes_1024x512_scale0.75_160k.yml)\|[Pretrained_model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/pp_liteseg_stdc1_cityscapes_1024x512_scale0.75_160k/model.pdparams)\|[inference_model](https://paddleseg.bj.bcebos.com/inference/pp_liteseg_infer_models/pp_liteseg_stdc1_cityscapes_1024x512_scale0.75_160k_inference_model.zip)|
|PP-LiteSeg-T|STDC1|160000|1024x512|2048x1024|77.04%|77.73%|77.46%|[config](./pp_liteseg_stdc1_cityscapes_1024x512_scale1.0_160k.yml)\|[Pretrained_model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/pp_liteseg_stdc1_cityscapes_1024x512_scale1.0_160k/model.pdparams)\|[inference_model](https://paddleseg.bj.bcebos.com/inference/pp_liteseg_infer_models/pp_liteseg_stdc1_cityscapes_1024x512_scale1.0_160k_inference_model.zip)|
|PP-LiteSeg-B|STDC2|160000|1024x512|1024x512|75.25%|75.65%|-|[config](./pp_liteseg_stdc2_cityscapes_1024x512_scale0.5_160k.yml)\|[Pretrained_model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/pp_liteseg_stdc2_cityscapes_1024x512_scale0.5_160k/model.pdparams)\|[inference_model](https://paddleseg.bj.bcebos.com/inference/pp_liteseg_infer_models/pp_liteseg_stdc2_cityscapes_1024x512_scale0.5_160k_inference_model.zip)|
|PP-LiteSeg-B|STDC2|160000|1024x512|1536x768|78.75%|79.23%|-|[config](./pp_liteseg_stdc2_cityscapes_1024x512_scale0.75_160k.yml)\|[Pretrained_model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/pp_liteseg_stdc2_cityscapes_1024x512_scale0.75_160k/model.pdparams)\|[inference_model](https://paddleseg.bj.bcebos.com/inference/pp_liteseg_infer_models/pp_liteseg_stdc2_cityscapes_1024x512_scale0.75_160k_inference_model.zip)|
|PP-LiteSeg-B|STDC2|160000|1024x512|2048x1024|79.04%|79.52%|79.85%|[config](./pp_liteseg_stdc2_cityscapes_1024x512_scale1.0_160k.yml)\|[Pretrained_model](https://paddleseg.bj.bcebos.com/dygraph/cityscapes/pp_liteseg_stdc2_cityscapes_1024x512_scale1.0_160k/model.pdparams)\|[inference_model](https://paddleseg.bj.bcebos.com/inference/pp_liteseg_infer_models/pp_liteseg_stdc2_cityscapes_1024x512_scale1.0_160k_inference_model.zip)|
## 2 Semantic segmentation models on CamVid
| Model | Backbone | Training Iters | Train Resolution | Test Resolution | mIoU | mIoU (flip) | mIoU (ms+flip) | Links |
| --- | --- | --- | ---| --- | --- | --- | --- | --- |
|PP-LiteSeg-T|STDC1|10000|960x720|960x720|73.30%|73.89%|73.66%|[config](./pp_liteseg_stdc1_camvid_960x720_10k.yml)\|[Pretrained_model](https://paddleseg.bj.bcebos.com/dygraph/camvid/pp_liteseg_stdc1_camvid_960x720_10k/model.pdparams)\|[inference_model](https://paddleseg.bj.bcebos.com/inference/pp_liteseg_infer_models/pp_liteseg_stdc1_camvid_960x720_10k_inference_model.zip)|
|PP-LiteSeg-B|STDC2|10000|960x720|960x720|75.10%|75.85%|75.48%|[config](./pp_liteseg_stdc2_camvid_960x720_10k.yml)\|[Pretrained_model](https://paddleseg.bj.bcebos.com/dygraph/camvid/pp_liteseg_stdc2_camvid_960x720_10k/model.pdparams)\|[inference_model](https://paddleseg.bj.bcebos.com/inference/pp_liteseg_infer_models/pp_liteseg_stdc2_camvid_960x720_10k_inference_model.zip)|
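
## 3 Usage example

The commands below are a usage sketch for the downloaded inference models, mirroring the quick-start steps elsewhere in the PaddleSeg docs: fetch one of the `inference_model` archives from the tables above, download a demo image, and run `deploy/python/infer.py` from the root of the PaddleSeg repo. The folder containing `deploy.yaml` is assumed to have the same name as the archive.

```
wget https://paddleseg.bj.bcebos.com/inference/pp_liteseg_infer_models/pp_liteseg_stdc2_cityscapes_1024x512_scale1.0_160k_inference_model.zip
unzip pp_liteseg_stdc2_cityscapes_1024x512_scale1.0_160k_inference_model.zip
wget https://paddleseg.bj.bcebos.com/dygraph/demo/cityscapes_demo.png
python deploy/python/infer.py \
    --config ./pp_liteseg_stdc2_cityscapes_1024x512_scale1.0_160k_inference_model/deploy.yaml \
    --image_path ./cityscapes_demo.png \
    --save_dir output/result
```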
{
"cells": [
{
"cell_type": "markdown",
"id": "58d0e877-51fa-431a-9c76-dd0fe1177861",
"metadata": {},
"source": [
"## PP-LiteSeg Introduction\n",
"\n",
"Real-world applications have high demands for semantic segmentation methods. Although semantic segmentation has made remarkable leap-forwards with deeplearning, the precision and performance of the semantic model is not satisfactory.\n",
"\n",
"So, we propose PP-LiteSeg, a novel lightweight model for the real-time semantic segmentation task. Specifically, we present a Flexible and Lightweight Decoder (FLD) to reduce computation overhead of previous decoder. To strengthen feature representations, we propose a Unified Attention Fusion Module (UAFM), which takes advantage of spatial and channel attention to produce a weight and then fuses the input features with the weight. Moreover, a Simple Pyramid Pooling Module (SPPM) is proposed to aggregate global context with low computation cost.\n",
"\n",
"On the Cityscapes test set, PP-LiteSeg achieves 72.0% mIoU/273.6 FPS and 77.5% mIoU/102.6 FPS on NVIDIA GTX 1080Ti. PP-LiteSeg achieves a superior tradeoff between accuracy and speed compared to other methods.\n",
"\n",
"PP-LiteSeg model is officially produced by PaddlePaddle and is a SOTA model proposed by PaddleSeg. More information about PaddleSeg can be found here https://github.com/PaddlePaddle/PaddleSeg."
]
},
{
"cell_type": "markdown",
"id": "55360c7a-3191-40bf-99c5-64c1c3d89967",
"metadata": {},
"source": [
"## 2. Model Effects and Application Scenarios\n",
"\n",
"### 2.1 Real-Time Semantic Segmentation Tasks:\n",
"\n",
"#### 2.1.1 Datasets:\n",
"\n",
"The dataset is mainly Cityscapes, which is divided into training set and test set.\n",
"\n",
"#### 2.1.2 Model Effects:\n",
"\n",
"The segmentation effect of PP-LiteSeg on the image is:\n",
"\n",
"Original image:\n",
"<div align=\"center\">\n",
"<img src=\"https://user-images.githubusercontent.com/48357642/201077761-3ebeda52-b15d-4913-b64c-0798d1f922a5.png\" width = \"60%\" />\n",
"</div>\n",
"\n",
"Segmentation effection image:\n",
"<div align=\"center\">\n",
"<img src=\"https://user-images.githubusercontent.com/48357642/201077985-29954838-9df6-4ab4-9f91-23e9a20be513.png\" width = \"60%\" />\n",
"</div>"
]
},
{
"cell_type": "markdown",
"id": "be5dce27-7842-4af5-8ba7-8a1d71b31680",
"metadata": {},
"source": [
"## 3. How to Use the Model\n",
"\n",
"### 3.1 Model Inference:\n",
"\n",
"* Install PaddlePaddle\n",
"\n",
"To install PaddlePaddle, PaddlePaddle>=2.2.0 is required. Since the image segmentation model is computationally expensive, it is recommended to use the GPU version of PaddlePaddle.\n",
"\n",
"In AI Studio, you can directly choose the environment where PaddlePaddle is installed. If you need to install PaddlePaddle, please refer to the PaddlePaddle official website.\n",
" \n"
]
},
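{
"cell_type": "markdown",
"id": "install-paddlepaddle-note",
"metadata": {},
"source": [
"The next cell is a minimal install sketch, assuming a machine without PaddlePaddle preinstalled; the exact package for your CUDA version should be taken from the PaddlePaddle official website, and this step can be skipped on AI Studio."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "install-paddlepaddle-code",
"metadata": {},
"outputs": [],
"source": [
"# Minimal install sketch (skip on AI Studio, where PaddlePaddle is preinstalled).\n",
"# The CUDA-specific build for your environment is listed on the PaddlePaddle website.\n",
"!pip install paddlepaddle-gpu"
]
},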
{
"cell_type": "markdown",
"id": "99436a4d-9c54-4b5e-b494-68424a09d7a5",
"metadata": {},
"source": [
"\n",
"* Download PaddleSeg \n",
"\n",
"(When not running on Jupyter Notebook, \"!\" Or \"%\" should be removed.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a16f87a6-85ea-4050-a698-ea634db9c235",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"%cd ~\n",
"!git clone https://gitee.com/PaddlePaddle/PaddleSeg.git"
]
},
{
"cell_type": "markdown",
"id": "846586a0-456d-49da-a4d3-6192f26c2e01",
"metadata": {},
"source": [
"* Install PaddleSeg"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a234d0dc-a4bd-48b2-af1b-168538b6d9b6",
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"%cd ~/PaddleSeg\n",
"!git checkout release/2.6\n",
"!pip install -v -e ."
]
},
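{
"cell_type": "markdown",
"id": "verify-install-md",
"metadata": {},
"source": [
"* (Optional) Verify the installation\n",
"\n",
"A quick sanity check, assuming the steps above succeeded: `paddle.utils.run_check()` reports whether PaddlePaddle runs correctly on this machine, and the import confirms that PaddleSeg is available."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "verify-install-code",
"metadata": {},
"outputs": [],
"source": [
"# Optional sanity check: verify PaddlePaddle runs correctly and PaddleSeg is importable.\n",
"import paddle\n",
"import paddleseg  # noqa: F401\n",
"paddle.utils.run_check()"
]
},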
{
"cell_type": "markdown",
"id": "01b72385-c22e-414a-8d3e-5fcaea6e34b4",
"metadata": {},
"source": [
"* Quick experience"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3b63e945-38d8-45dd-a066-3b5c05fc06a2",
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [],
"source": [
"# Download model\n",
"!wget https://paddleseg.bj.bcebos.com/inference/pp_liteseg_infer_models/pp_liteseg_stdc1_cityscapes_1024x512_scale1.0_160k_inference_model.zip\n",
"!unzip pp_liteseg_stdc1_cityscapes_1024x512_scale1.0_160k_inference_model.zip\n",
"# Download test image\n",
"!wget https://paddleseg.bj.bcebos.com/dygraph/demo/cityscapes_demo.png\n",
"# predict\n",
"!python deploy/python/infer.py \\\n",
" --config ./pp_liteseg_stdc1_cityscapes_1024x512_scale1.0_160k_inference_model/deploy.yaml \\\n",
" --image_path ./cityscapes_demo.png \\\n",
" --save_dir output/result"
]
},
{
"cell_type": "markdown",
"id": "5d99268c-5dcf-47bb-97f7-c11b54ebed48",
"metadata": {},
"source": [
"The result as follows is saved in `PaddleSeg/output/result/cityscapes_demo.png`.\n",
"\n",
"<div align=\"center\">\n",
"<img src=\"https://user-images.githubusercontent.com/48357642/201077985-29954838-9df6-4ab4-9f91-23e9a20be513.png\" width = \"60%\" />\n",
"</div>\n"
]
},
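{
"cell_type": "markdown",
"id": "show-result-md",
"metadata": {},
"source": [
"The cell below is an optional helper for viewing the saved prediction directly in the notebook; it only assumes the `output/result/cityscapes_demo.png` path produced by the command above."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "show-result-code",
"metadata": {},
"outputs": [],
"source": [
"# Optional: display the saved segmentation result inside the notebook.\n",
"from IPython.display import Image, display\n",
"display(Image(filename='output/result/cityscapes_demo.png'))"
]
},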
{
"cell_type": "markdown",
"id": "40e633a3-8f48-4883-8de7-fc88b3cb7ea7",
"metadata": {},
"source": [
"### 3.2 Model Training\n",
"\n",
"* Prepare\n",
"\n",
"Install PaddleSeg([doc](https://github.com/PaddlePaddle/PaddleSeg)), Download cityscapes dataset and link it to `PaddleSeg/data`([cityscapes](https://paddleseg.bj.bcebos.com/dataset/cityscapes.tar)).\n",
"```\n",
"PaddleSeg/data\n",
"├── cityscapes\n",
"│   ├── gtFine\n",
"│   ├── infer.list\n",
"│   ├── leftImg8bit\n",
"│   ├── test.list\n",
"│   ├── train.list\n",
"│   ├── trainval.list\n",
"│   └── val.list\n",
"```\n"
]
},
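{
"cell_type": "markdown",
"id": "download-cityscapes-md",
"metadata": {},
"source": [
"The cell below is a minimal sketch for fetching the dataset, assuming the archive from the link above extracts to a top-level `cityscapes` directory; the download is large, and you can instead symlink an existing copy under `PaddleSeg/data`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "download-cityscapes-code",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"%cd ~/PaddleSeg\n",
"# Download the prepared Cityscapes archive (large) and extract it under PaddleSeg/data.\n",
"!mkdir -p data\n",
"!wget https://paddleseg.bj.bcebos.com/dataset/cityscapes.tar\n",
"!tar -xf cityscapes.tar -C data/"
]
},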
{
"cell_type": "markdown",
"id": "4fd01e45-577f-4357-af98-26d815403ea8",
"metadata": {},
"source": [
"* Training:\n",
"\n",
"The config files of PP-LiteSeg are under `PaddleSeg/configs/pp_liteseg/`.\n",
"\n",
"Based on the `train.py` script, we set the config file and start training model."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "034f5d95-61c4-4ee0-9445-e432ba04b366",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"!export CUDA_VISIBLE_DEVICES=0,1,2,3\n",
"!python -m paddle.distributed.launch train.py \\\n",
" --config configs/pp_liteseg/pp_liteseg_stdc1_cityscapes_1024x512_scale0.5_160k.yml \\\n",
" --save_dir output/pp_liteseg_stdc1_cityscapes_1024x512_scale0.5_160k \\\n",
" --save_interval 1000 \\\n",
" --num_workers 3 \\\n",
" --do_eval \\\n",
" --use_vdl"
]
},
{
"cell_type": "markdown",
"id": "f005c439-6677-439a-9e2d-9317f084d303",
"metadata": {},
"source": [
"After the training, the weights are saved in `PaddleSeg/output/xxx/best_model/model.pdparams`\n",
"\n",
"Refer to ([doc](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/docs/train/train.md)) for the detailed usage of training.\n"
]
},
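{
"cell_type": "markdown",
"id": "eval-model-md",
"metadata": {},
"source": [
"* (Optional) Evaluation\n",
"\n",
"A sketch of evaluating the trained weights with PaddleSeg's `val.py`, assuming the `--save_dir` used in the training command above; adjust the config and weight paths to match your own run."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eval-model-code",
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# Evaluate the best weights on the Cityscapes val set (paths follow the training command above).\n",
"!python val.py \\\n",
"    --config configs/pp_liteseg/pp_liteseg_stdc1_cityscapes_1024x512_scale0.5_160k.yml \\\n",
"    --model_path output/pp_liteseg_stdc1_cityscapes_1024x512_scale0.5_160k/best_model/model.pdparams"
]
},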
{
"cell_type": "markdown",
"id": "3abb9723-e839-407f-a3dc-e9ed6bca5f45",
"metadata": {},
"source": [
"## 4. Model Principles\n",
"\n",
"The PP-LiteSeg model structure:\n",
"\n",
"<div align=\"center\">\n",
"<img src=\"https://user-images.githubusercontent.com/52520497/162148786-c8b91fd1-d006-4bad-8599-556daf959a75.png\" width = \"50%\" />\n",
"</div>\n",
"\n",
"* We propose a Flexible and Lightweight Decoder\n",
"\n",
"We propose a Flexible and Lightweight Decoder, which gradually reduces the channels and increases the spatial size for the features. Besides, the volume of proposed decoder can be easily adjusted according to the encoder. The flexible design balances the computation complexity of encoder and decoder, which makes the overall model more efficient.\n",
"\n",
"* We present a Unified Attention Fusion Module(UAFM) \n",
"\n",
"Strengthening feature representations is a crucial way to improve segmentation accuracy . It is usually achieved through fusing the low-level and high-level features in a decoder. However, the fusion modules in existing methods usually suffer from high computation cost. We propose a Unified Attention Fusion Module (UAFM) to strengthen feature representations efficiently. In UAFM, there are two kinds of attention modules, i.e. spatial and channel attention modules, which exploit inter-spatial and inter-channel relationships of the input features.\n",
"\n",
"* We propose a Simple Pyramid Pooling Module(SPPM) \n",
"\n",
"Contextual aggregation is another key to promote segmentation accuracy, but previous aggregation modules are\n",
"time-consuming for real-time networks. We design a Simple Pyramid Pooling Module (SPPM), which reduces the intermediate and output channels, removes the short-cut, and replaces the concatenate operation with an add operation. Experimental results show SPPM contributes to the segmentation accuracy with low computation cost.\n",
"\n",
"We evaluate the proposed PP-LiteSeg through extensive experiments on Cityscapes and CamVid dataset. PP-LiteSeg achieves a superior tradeoff between segmentation accuracy and inference speed. Specifically, PP-LiteSeg achieves 72.0% mIoU/273.6 FPS and 77.5% mIoU/102.6 FPS on the Cityscapes test set.\n"
]
},
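{
"cell_type": "markdown",
"id": "uafm-sketch-md",
"metadata": {},
"source": [
"To make the UAFM idea concrete, the cell below gives a minimal, simplified sketch of the spatial-attention variant: per-pixel mean/max statistics of the upsampled high-level feature and the low-level feature produce a weight map `alpha`, and the two features are fused as a weighted sum. It is an illustrative sketch only, not the official PaddleSeg implementation, and all layer names are made up for this example."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "uafm-sketch-code",
"metadata": {},
"outputs": [],
"source": [
"# Minimal, simplified sketch of UAFM with spatial attention (illustrative only,\n",
"# not the official implementation).\n",
"import paddle\n",
"import paddle.nn as nn\n",
"import paddle.nn.functional as F\n",
"\n",
"class UAFMSpatialSketch(nn.Layer):\n",
"    def __init__(self, high_ch, low_ch, out_ch):\n",
"        super().__init__()\n",
"        # project the high-level feature to the same channel number as the low-level one\n",
"        self.proj = nn.Conv2D(high_ch, low_ch, kernel_size=3, padding=1)\n",
"        # 4 statistic maps (mean/max of both features) -> 1 attention map\n",
"        self.attn = nn.Conv2D(4, 1, kernel_size=3, padding=1)\n",
"        self.out = nn.Conv2D(low_ch, out_ch, kernel_size=3, padding=1)\n",
"\n",
"    def forward(self, f_high, f_low):\n",
"        # upsample the high-level feature to the low-level spatial size\n",
"        f_up = F.interpolate(self.proj(f_high), size=f_low.shape[2:], mode='bilinear')\n",
"        # spatial attention: per-pixel mean/max along the channel axis of both features\n",
"        stats = paddle.concat([\n",
"            paddle.mean(f_up, axis=1, keepdim=True),\n",
"            paddle.max(f_up, axis=1, keepdim=True),\n",
"            paddle.mean(f_low, axis=1, keepdim=True),\n",
"            paddle.max(f_low, axis=1, keepdim=True)], axis=1)\n",
"        alpha = F.sigmoid(self.attn(stats))\n",
"        # fuse the two features with the learned weight map\n",
"        return self.out(f_up * alpha + f_low * (1 - alpha))\n",
"\n",
"# quick shape check\n",
"m = UAFMSpatialSketch(high_ch=128, low_ch=64, out_ch=64)\n",
"y = m(paddle.randn([1, 128, 16, 32]), paddle.randn([1, 64, 32, 64]))\n",
"print(y.shape)  # expected: [1, 64, 32, 64]"
]
},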
{
"cell_type": "markdown",
"id": "b23dc65f-eff4-4cad-98c8-9856c5cb9ee1",
"metadata": {},
"source": [
"## 5. Related papers and citations\n",
"\n",
"If you find our project useful in your research, please consider citing:\n",
"\n",
"```\n",
"@article{peng2022pp-liteseg,\n",
" title={PP-LiteSeg: A Superior Real-Time Semantic Segmentation Model},\n",
" author={Juncai Peng and Yi Liu and Shiyu Tang and Yuying Hao and Lutao Chu and Guowei Chen and Zewu Wu and Zeyu Chen and Zhiliang Yu and Yuning Du and Qingqing Dang and Baohua Lai and Qiwen Liu and Xiaoguang Hu and Dianhai Yu and Yanjun Ma},\n",
" journal={arXiv e-prints},\n",
" pages={arXiv--2204},\n",
" year={2022}\n",
"}\n",
"```\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "py35-paddle1.2.0"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}