diff --git a/modelcenter/PP-HumanSegV2/benchmark_en.md b/modelcenter/PP-HumanSegV2/benchmark_en.md new file mode 100644 index 0000000000000000000000000000000000000000..8b6d4c9c0d7afe4a3bf9fbf6513087d41c284fb2 --- /dev/null +++ b/modelcenter/PP-HumanSegV2/benchmark_en.md @@ -0,0 +1,42 @@ +## 1. Inference Benchmark + +### 1.1 Environment + +* Segmentation accuracy (mIoU): We tested the models on the PP-HumanSeg-14K dataset using the best input shape, without tricks such as multi-scale testing and flipping. +* Inference latency on the Snapdragon 855 ARM CPU: We tested the models on a Xiaomi 9 (Snapdragon 855 CPU) using PaddleLite, with a single thread, big cores and the best input shape. +* Inference latency on the Intel CPU: We tested the models on an Intel(R) Xeon(R) Gold 6271C CPU using Paddle Inference. + +### 1.2 Datasets + +* PP-HumanSeg14K [(paper)](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/contrib/PP-HumanSeg/paper.md), proposed by the PaddleSeg team, is used for testing. + +### 1.3 Benchmark + +Comparison with SOTA portrait segmentation models, with inference speed tested on the Snapdragon 855 ARM CPU: + +Model Name | Input Shape | mIoU (%) | Speed (FPS) +---|---|---|--- +PortraitNet | 224x224 | 95.20 | 31.93 +SINet | 224x224 | 93.76 | 78.12 +PP-HumanSegV2 | 224x224 | 95.21 | 52.27 +PP-HumanSegV2 | 192x192 | 94.87 | 81.96 + +PP-HumanSeg model solutions, with inference speed tested on the ARM CPU: + +Model Name | Input Shape | mIoU (%) | Inference Speed on ARM CPU (FPS) +---|---|---|--- +PP-HumanSegV1 | 398x224 | 93.60 | 33.69 +PP-HumanSegV2 | 398x224 | 97.50 | 35.23 +PP-HumanSegV2 | 256x144 | 96.63 | 63.05 + +PP-HumanSeg model solutions, with inference speed tested on the Intel CPU: + +Model Name | Input Shape | mIoU (%) | Inference Speed on Intel CPU (FPS) +---|---|---|--- +PP-HumanSegV1 | 398x224 | 93.60 | 36.02 +PP-HumanSegV2 | 398x224 | 97.50 | 60.75 +PP-HumanSegV2 | 256x144 | 96.63 | 70.67 + + +## 2. Reference +https://github.com/PaddlePaddle/PaddleSeg/tree/release/2.6/contrib/PP-HumanSeg diff --git a/modelcenter/PP-HumanSegV2/download_en.md b/modelcenter/PP-HumanSegV2/download_en.md new file mode 100644 index 0000000000000000000000000000000000000000..a87acab9034692b37511547029a47d961bb95107 --- /dev/null +++ b/modelcenter/PP-HumanSegV2/download_en.md @@ -0,0 +1,58 @@ +# Model List + +## 1 Portrait Segmentation Models + +Self-developed portrait segmentation models are released for real-time segmentation applications such as mobile phone video calls and web video conferencing. These models can be directly integrated into products at zero cost.
+ +| Model Name | Best Input Shape | mIoU (%) | Inference Time on Phone (ms) | Model Size (MB) | Config File | Links | +| --- | --- | --- | --- | --- | --- | --- | +| PP-HumanSegV1-Lite | 398x224 | 93.60 | 29.68 | 2.3 | [cfg](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/contrib/PP-HumanSeg/configs/portrait_pp_humansegv1_lite.yml) | [Checkpoint](https://paddleseg.bj.bcebos.com/dygraph/pp_humanseg_v2/portrait_pp_humansegv1_lite_398x224_pretrained.zip) \| [Inference Model (Argmax)](https://paddleseg.bj.bcebos.com/dygraph/pp_humanseg_v2/portrait_pp_humansegv1_lite_398x224_inference_model.zip) \| [Inference Model (Softmax)](https://paddleseg.bj.bcebos.com/dygraph/pp_humanseg_v2/portrait_pp_humansegv1_lite_398x224_inference_model_with_softmax.zip) | +| PP-HumanSegV2-Lite | 256x144 | 96.63 | 15.86 | 5.4 | [cfg](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/contrib/PP-HumanSeg/configs/portrait_pp_humansegv2_lite.yml) | [Checkpoint](https://paddleseg.bj.bcebos.com/dygraph/pp_humanseg_v2/portrait_pp_humansegv2_lite_256x144_smaller/portrait_pp_humansegv2_lite_256x144_pretrained.zip) \| [Inference Model (Argmax)](https://paddleseg.bj.bcebos.com/dygraph/pp_humanseg_v2/portrait_pp_humansegv2_lite_256x144_smaller/portrait_pp_humansegv2_lite_256x144_inference_model.zip) \| [Inference Model (Softmax)](https://paddleseg.bj.bcebos.com/dygraph/pp_humanseg_v2/portrait_pp_humansegv2_lite_256x144_smaller/portrait_pp_humansegv2_lite_256x144_inference_model_with_softmax.zip) | + +
Note: + +* Segmentation accuracy (mIoU): We tested the models on the [PP-HumanSeg-14K](https://github.com/PaddlePaddle/models/pull/5583/paper.md) dataset using the best input shape, without tricks such as multi-scale testing and flipping. +* Inference latency on the Snapdragon 855 ARM CPU: We tested the models on a Xiaomi 9 (Snapdragon 855 CPU) using [PaddleLite](https://www.paddlepaddle.org.cn/lite), with a single thread, big cores and the best input shape. +* For the best input shape, the width-to-height ratio is 16:9, which matches the cameras of most mobile phones and laptops. +* The checkpoint contains the pretrained weights, which can be used for fine-tuning together with the config file. +* The inference model is used for deployment. +* Inference Model (Argmax): The last operation of the inference model is argmax, so the output is a single-channel image (int64), where the portrait area is 1 and the background area is 0. +* Inference Model (Softmax): The last operation of the inference model is softmax, so the output is a single-channel image (float32), where each pixel represents the probability of the portrait area (see the deployment sketch below). +
+ +
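Below is a minimal deployment sketch for the softmax inference model using the Paddle Inference Python API. The file names (`model.pdmodel`, `model.pdiparams`), the normalization constants, and the channel order are assumptions based on the typical PaddleSeg export layout; the `deploy.yaml` shipped inside each inference model zip is the authoritative source for the preprocessing settings.

```python
# Minimal sketch: run the exported PP-HumanSegV2 portrait model (softmax output)
# with Paddle Inference. Paths, file names and normalization are assumptions;
# check the deploy.yaml inside the downloaded inference model for exact values.
import cv2
from paddle.inference import Config, create_predictor

MODEL_DIR = "portrait_pp_humansegv2_lite_256x144_inference_model_with_softmax"

config = Config(f"{MODEL_DIR}/model.pdmodel", f"{MODEL_DIR}/model.pdiparams")
config.disable_glog_info()
predictor = create_predictor(config)

# Preprocess: resize to the best input shape (width 256, height 144),
# normalize, and convert HWC BGR -> NCHW RGB (assumed layout).
img = cv2.imread("portrait.jpg")
h, w = img.shape[:2]
x = cv2.resize(img, (256, 144)).astype("float32")
x = (x / 255.0 - 0.5) / 0.5                      # assumed normalization
x = x[:, :, ::-1].transpose(2, 0, 1)[None, ...]  # assumed RGB, NCHW

input_handle = predictor.get_input_handle(predictor.get_input_names()[0])
input_handle.copy_from_cpu(x.copy())
predictor.run()
output_handle = predictor.get_output_handle(predictor.get_output_names()[0])
prob = output_handle.copy_to_cpu()[0]            # float32 portrait probability map

# Softmax model: threshold the probability map to get a binary mask.
mask = (cv2.resize(prob.squeeze(), (w, h)) > 0.5).astype("uint8") * 255
cv2.imwrite("portrait_mask.png", mask)
```

For the Argmax inference model, the output is already an int64 mask with 1 for the portrait area and 0 for the background, so the thresholding step is unnecessary.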
Usage: + +* The portrait segmentation models can be directly integrated into products at zero cost; using the best input shape is recommended. +* On mobile phones, the screen may be horizontal or vertical. Before the image is fed into the model, it should be rotated so that the portrait is always vertical. +
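For the video-call scenarios these models target, the soft probability map from the softmax model can be used directly as an alpha matte for background replacement. The snippet below is a small, hedged sketch, assuming `prob` is the float32 probability map described in the notes above, already resized to the frame size:

```python
# Sketch: blend a virtual background using the portrait probability map.
# `prob` is assumed to be a float32 HxW map in [0, 1] (softmax model output),
# already resized to the size of `frame`.
import cv2

def replace_background(frame, prob, bg_path="bg.jpg"):
    bg = cv2.resize(cv2.imread(bg_path), (frame.shape[1], frame.shape[0]))
    alpha = prob[..., None].astype("float32")   # HxWx1 alpha matte with soft edges
    composed = alpha * frame.astype("float32") + (1.0 - alpha) * bg.astype("float32")
    return composed.astype("uint8")
```

Using the soft probabilities instead of a hard 0/1 mask keeps the portrait edges smoother and more natural.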
+ +## 2 General Human Segmentation Models + +For the general human segmentation task, we first built a large dataset, then trained SOTA models from PaddleSeg on it, and finally released several general human segmentation models. + + +| Model Name | Best Input Shape | mIoU (%) | Inference Time on ARM CPU (ms) | Inference Time on Nvidia GPU (ms) | Config File | Links | +| --- | --- | --- | --- | --- | --- | --- | +| PP-HumanSegV1-Lite | 192x192 | 86.02 | 12.3 | - | [cfg](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/contrib/PP-HumanSeg/configs/human_pp_humansegv1_lite.yml) | [Checkpoint](https://paddleseg.bj.bcebos.com/dygraph/pp_humanseg_v2/human_pp_humansegv1_lite_192x192_pretrained.zip) \| [Inference Model (Argmax)](https://paddleseg.bj.bcebos.com/dygraph/pp_humanseg_v2/human_pp_humansegv1_lite_192x192_inference_model.zip) \| [Inference Model (Softmax)](https://paddleseg.bj.bcebos.com/dygraph/pp_humanseg_v2/human_pp_humansegv1_lite_192x192_inference_model_with_softmax.zip) | +| PP-HumanSegV2-Lite | 192x192 | 92.52 | 15.3 | - | [cfg](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/contrib/PP-HumanSeg/configs/human_pp_humansegv2_lite.yml) | [Checkpoint](https://paddleseg.bj.bcebos.com/dygraph/pp_humanseg_v2/human_pp_humansegv2_lite_192x192_pretrained.zip) \| [Inference Model (Argmax)](https://paddleseg.bj.bcebos.com/dygraph/pp_humanseg_v2/human_pp_humansegv2_lite_192x192_inference_model.zip) \| [Inference Model (Softmax)](https://paddleseg.bj.bcebos.com/dygraph/pp_humanseg_v2/human_pp_humansegv2_lite_192x192_inference_model_with_softmax.zip) | +| PP-HumanSegV1-Mobile | 192x192 | 91.64 | - | 2.83 | [cfg](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/contrib/PP-HumanSeg/configs/human_pp_humansegv1_mobile.yml) | [Checkpoint](https://paddleseg.bj.bcebos.com/dygraph/pp_humanseg_v2/human_pp_humansegv1_mobile_192x192_pretrained.zip) \| [Inference Model (Argmax)](https://paddleseg.bj.bcebos.com/dygraph/pp_humanseg_v2/human_pp_humansegv1_mobile_192x192_inference_model.zip) \| [Inference Model (Softmax)](https://paddleseg.bj.bcebos.com/dygraph/pp_humanseg_v2/human_pp_humansegv1_mobile_192x192_inference_model_with_softmax.zip) | +| PP-HumanSegV2-Mobile | 192x192 | 93.13 | - | 2.67 | [cfg](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/contrib/PP-HumanSeg/configs/human_pp_humansegv2_mobile.yml) | [Checkpoint](https://paddleseg.bj.bcebos.com/dygraph/pp_humanseg_v2/human_pp_humansegv2_mobile_192x192_pretrained.zip) \| [Inference Model (Argmax)](https://paddleseg.bj.bcebos.com/dygraph/pp_humanseg_v2/human_pp_humansegv2_mobile_192x192_inference_model.zip) \| [Inference Model (Softmax)](https://paddleseg.bj.bcebos.com/dygraph/pp_humanseg_v2/human_pp_humansegv2_mobile_192x192_inference_model_with_softmax.zip) | +| PP-HumanSegV1-Server | 512x512 | 96.47 | - | 24.9 | [cfg](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/contrib/PP-HumanSeg/configs/human_pp_humansegv1_server.yml) | [Checkpoint](https://paddleseg.bj.bcebos.com/dygraph/pp_humanseg_v2/human_pp_humansegv1_server_512x512_pretrained.zip) \| [Inference Model (Argmax)](https://paddleseg.bj.bcebos.com/dygraph/pp_humanseg_v2/human_pp_humansegv1_server_512x512_inference_model.zip) \| [Inference Model (Softmax)](https://paddleseg.bj.bcebos.com/dygraph/pp_humanseg_v2/human_pp_humansegv1_server_512x512_inference_model_with_softmax.zip) | + + +
Note: + +* Segmentation accuracy (mIoU): After training the models on a large human segmentation dataset, we test them on the small Supervisely Person dataset ([url](https://paddleseg.bj.bcebos.com/humanseg/data/mini_supervisely.zip)). +* Inference latency on the Snapdragon 855 ARM CPU: We tested the models on a Xiaomi 9 (Snapdragon 855 CPU) using [PaddleLite](https://www.paddlepaddle.org.cn/lite), with a single thread, big cores and the best input shape. +* Inference latency on the server: We tested the models on an Nvidia V100 GPU using Paddle Inference with TensorRT and the best input shape (see the GPU configuration sketch below). +* The checkpoint contains the pretrained weights, which can be used for fine-tuning together with the config file. +* The inference model is used for deployment. +* Inference Model (Argmax): The last operation of the inference model is argmax, so the output is a single-channel image (int64), where the human area is 1 and the background area is 0. +* Inference Model (Softmax): The last operation of the inference model is softmax, so the output is a single-channel image (float32), where each pixel represents the probability of the human area. +
+ +
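The server-side latency figures above were measured with Paddle Inference plus TensorRT on an Nvidia GPU. The following is a minimal, hedged sketch of how such a predictor is typically configured with the Paddle Inference Python API; the memory, workspace, batch size, and precision values are illustrative, not the exact benchmark settings.

```python
# Sketch: build a GPU predictor with TensorRT enabled via Paddle Inference.
# The numeric settings below are illustrative, not the exact benchmark configuration.
from paddle.inference import Config, PrecisionType, create_predictor

MODEL_DIR = "human_pp_humansegv1_server_512x512_inference_model_with_softmax"

config = Config(f"{MODEL_DIR}/model.pdmodel", f"{MODEL_DIR}/model.pdiparams")
config.enable_use_gpu(500, 0)                 # 500 MB initial GPU memory pool, device id 0
config.enable_tensorrt_engine(
    workspace_size=1 << 30,                   # 1 GB TensorRT workspace
    max_batch_size=1,
    min_subgraph_size=3,
    precision_mode=PrecisionType.Float32,
    use_static=False,
    use_calib_mode=False,
)
predictor = create_predictor(config)
```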
Usage: + +* Since general human segmentation covers a wide variety of images, you should evaluate the released models on your actual scenes. +* If the segmentation accuracy is not satisfactory, we suggest collecting more data, annotating it, and fine-tuning the model. +
diff --git a/modelcenter/PP-HumanSegV2/introduction_en.ipynb b/modelcenter/PP-HumanSegV2/introduction_en.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..ce6d53954a843bd236eac4f3898434f3b815a2ea --- /dev/null +++ b/modelcenter/PP-HumanSegV2/introduction_en.ipynb @@ -0,0 +1,416 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "55903e0e-3e6d-430f-91b7-d270a953ffd7", + "metadata": {}, + "source": [ + "## 1. PP-HumanSegV2 Introduction\n", + "\n", + "Human segmentation is a high-frequency application in the field of image segmentation. Generally, human segmentation can be divided into portrait segmentation and general human segmentation.\n", + "\n", + "For portrait segmentation and general human segmentation, PaddleSeg releases the PP-HumanSeg models, which achieve a good balance of accuracy, inference speed and robustness. In addition, PP-HumanSeg models can be deployed to products without training, at zero cost, and fine-tuning is also supported for better performance.\n", + "\n", + "In July 2022, PaddleSeg upgraded PP-HumanSeg to PP-HumanSegV2, a new portrait segmentation solution that set a new SOTA among open-source portrait segmentation solutions, with 96.63% mIoU and 63 FPS inference speed on mobile devices. Compared with the V1 solution, inference speed is improved by 87.15%, segmentation accuracy by 3.03%, and the visual quality is better, making PP-HumanSegV2 comparable to commercial solutions.\n", + "\n", + "PP-HumanSeg is officially produced by PaddlePaddle and proposed by the PaddleSeg team. More information about PaddleSeg can be found at https://github.com/PaddlePaddle/PaddleSeg." + ] + }, + { + "cell_type": "markdown", + "id": "ba317a85-c8a1-49bd-afa3-59bfad7e86c3", + "metadata": {}, + "source": [ + "## 2. Model Effects and Application Scenarios\n", + "### 2.1 Portrait Segmentation and General Human Segmentation Tasks:\n", + "\n", + "#### 2.1.1 Datasets:\n", + "\n", + "The dataset is mainly PP-HumanSeg14K, which is divided into a training set and a test set.\n", + "\n", + "#### 2.1.2 Model Effects:\n", + "\n", + "The segmentation effect of PP-HumanSegV2 on a sample image is shown below:\n", + "\n", + "Original image:\n", +
\n", + "\n", + "
\n", + "\n", + "\n", + "Segmentation result:\n", + "
\n", + "\n", + "
\n" + ] + }, + { + "cell_type": "markdown", + "id": "0d9668da-d14a-4491-ab01-7b039399bff6", + "metadata": {}, + "source": [ + "## 3. How to Use the Model\n", + "\n", + "### 3.1 Model Inference:\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "69e98dbf-a3ba-4d6c-8bde-b1abb469c7b2", + "metadata": {}, + "source": [ + "* Install PaddlePaddle\n", + "\n", + "To install PaddlePaddle, PaddlePaddle>=2.2.0 is required. Since the image segmentation model is computationally expensive, it is recommended to use the GPU version of PaddlePaddle.\n", + "\n", + "In AI Studio, you can directly choose the environment where PaddlePaddle is installed. If you need to install PaddlePaddle, please refer to the PaddlePaddle official website.\n", + "\n", + "This tutorial has been verified with PaddlePaddle 2.3.2.\n" + ] + }, + { + "cell_type": "markdown", + "id": "5e2e25c6-6943-4d6a-b809-5f49158a5619", + "metadata": {}, + "source": [ + "* Download PaddleSeg \n", + "\n", + "(When not running on Jupyter Notebook, \"!\" Or \"%\" should be removed.)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0c587ea0-52d6-48c2-b2cc-beae58b3262d", + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "%cd ~\n", + "\n", + "!git clone https://gitee.com/PaddlePaddle/PaddleSeg.git" + ] + }, + { + "cell_type": "markdown", + "id": "1fc9e04c-070f-48c3-bd6f-1fcacafb4e1f", + "metadata": {}, + "source": [ + "* Install PaddleSeg" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5f7a0af6-4968-4966-b27e-130d70c94886", + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "%cd ~/PaddleSeg\n", + "!pip install -v -e ." + ] + }, + { + "cell_type": "markdown", + "id": "96376cd0-99d1-4f28-8521-94e541e065e2", + "metadata": {}, + "source": [ + "* Download Dataset and Models" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6d7f3c8f-bf9b-4a4a-b7dd-9374c4fd2c9c", + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "%cd ~/PaddleSeg/contrib/PP-HumanSeg\n", + "!python src/download_inference_models.py\n", + "!python src/download_data.py" + ] + }, + { + "cell_type": "markdown", + "id": "9c67ff4a-bf96-4bf2-87c2-992b8d3bacd7", + "metadata": {}, + "source": [ + "* Quick Experience\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9a96627b-2b2c-40eb-b13a-e1eb82a707df", + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "!python src/seg_demo.py \\\n", + " --config inference_models/portrait_pp_humansegv2_lite_256x144_inference_model_with_softmax/deploy.yaml \\\n", + " --img_path data/images/portrait_heng.jpg \\\n", + " --bg_img_path data/images/bg_2.jpg \\\n", + " --save_dir data/images_result/portrait_heng_v2_withbg.jpg\n", + "\n", + "!python src/seg_demo.py \\\n", + " --config inference_models/portrait_pp_humansegv2_lite_256x144_inference_model_with_softmax/deploy.yaml \\\n", + " --img_path data/images/portrait_shu.jpg \\\n", + " --bg_img_path data/images/bg_1.jpg \\\n", + " --save_dir data/images_result/portrait_shu_v2_withbg.jpg \\\n", + " --vertical_screen" + ] + }, + { + "cell_type": "markdown", + "id": "bd5a6d38-a18d-4bb2-ae0d-9397ba7c172e", + "metadata": {}, + "source": [ + "The result as follow is saved in `data/images_result/portrait_heng_v2.jpg`.\n", + "\n", + " \n" + ] + }, + { + "cell_type": "markdown", + "id": "f6aeb78d-63ff-4e01-aa50-da37522e0b08", + "metadata": {}, + "source": [ + "### 3.2 Model Training\n" + ] + }, + { + 
"cell_type": "markdown", + "id": "4da9e355-e99a-4821-9fa2-1571a7b84f97", + "metadata": {}, + "source": [ + "* Preparation\n", + "\n", + "With reference to the above, install PaddleSeg, download the dataset, and then download the pretrained weights." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5945d193-bae0-4c5f-bd43-59e205b0484a", + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "!python src/download_pretrained_models.py" + ] + }, + { + "cell_type": "markdown", + "id": "d8ba1500-8475-40e0-a257-9ddad436c14d", + "metadata": {}, + "source": [ + "* Training\n", + "\n", + "The config files are saved in `./configs` as follows. We have set the path of pretrained weight in all config files.\n", + "```\n", + "configs\n", + "├── human_pp_humansegv1_lite.yml\n", + "├── human_pp_humansegv2_lite.yml\n", + "├── human_pp_humansegv1_mobile.yml\n", + "├── human_pp_humansegv2_mobile.yml\n", + "├── human_pp_humansegv1_server.yml\n", + "├── portrait_pp_humansegv1_lite.yml\n", + "├── portrait_pp_humansegv2_lite.yml\n", + "```\n", + "\n", + "Run the following command to start finetuning. You should change the details, such as learning rate, according to the actual situation. Please refer to [url](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/docs/train/train_cn.md) for more details." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "14461b15-5c88-453a-b877-739ddf0062ce", + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "!export CUDA_VISIBLE_DEVICES=0 # Set GPU on Linux\n", + "# set CUDA_VISIBLE_DEVICES=0 # Set GPU on Windows\n", + "!python ../../train.py \\\n", + " --config configs/human_pp_humansegv2_lite.yml \\\n", + " --save_dir output/human_pp_humansegv2_lite \\\n", + " --save_interval 100 --do_eval --use_vdl" + ] + }, + { + "cell_type": "markdown", + "id": "6c44f772-2104-4379-bd66-d6f9b8d72414", + "metadata": {}, + "source": [ + "* Evaluation\n", + "\n", + "Load trained model and start evaluation. Please refer to [url](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/docs/evaluation/evaluate/evaluate_cn.md) for more details." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9ff7c87f-62b4-4377-a7fd-260b8278e6c4", + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "!python ../../val.py \\\n", + " --config configs/human_pp_humansegv2_lite.yml \\\n", + " --model_path pretrained_models/human_pp_humansegv2_lite_192x192_pretrained/model.pdparams" + ] + }, + { + "cell_type": "markdown", + "id": "e346d381-15df-421f-8473-9e96e39bbd0c", + "metadata": {}, + "source": [ + "* Prediction\n", + "\n", + "Load trained model and start prediction. The result are saved in `./data/images_result/added_prediction` and `./data/images_result/pseudo_color_prediction` " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b5b53557-9dbe-472d-8406-c60c2c58927d", + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "!python ../../predict.py \\\n", + " --config configs/human_pp_humansegv2_lite.yml \\\n", + " --model_path pretrained_models/human_pp_humansegv2_lite_192x192_pretrained/model.pdparams \\\n", + " --image_path data/images/human.jpg \\\n", + " --save_dir ./data/images_result" + ] + }, + { + "cell_type": "markdown", + "id": "d98d8455-6e84-493a-b823-428aab76e932", + "metadata": {}, + "source": [ + "* Export\n", + "\n", + "Load trained model and export it. 
Please refer to [url](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/docs/model_export_cn.md) for more details." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8fe76c54-aafe-43e0-a0cc-b8feb49bb770", + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "!python ../../export.py \\\n", + " --config configs/human_pp_humansegv2_lite.yml \\\n", + " --model_path pretrained_models/human_pp_humansegv2_lite_192x192_pretrained/model.pdparams \\\n", + " --save_dir output/human_pp_humansegv2_lite \\\n", + " --without_argmax \\\n", + " --with_softmax" + ] + }, + { + "cell_type": "markdown", + "id": "629a50ee-3e33-42a5-b505-aacc4ccd220f", + "metadata": {}, + "source": [ + "When --without_argmax --with_softmax is set, the last operation of the inference model is changed from argmax to softmax, so the model outputs a float32 image representing the probability of the foreground. This soft output can be useful for making the edges of the segmentation result smoother and more natural." + ] + }, + { + "cell_type": "markdown", + "id": "6a6fd1d6-a7a5-4388-b225-86b62ab72d67", + "metadata": {}, + "source": [ + "## 4. Model Principles\n", + "\n", + "The model structure is shown below.\n", +
\n", + "\n", + "
\n", + "\n", + "* The computational load of the model is greatly reduced\n", + "\n", + "For the Encoder part of the model, we selected MobileNetV3 as the backbone network to extract multi-layer features. Through analysis, we found that the parameters of MobileNetV3 are mainly concentrated in the last stage. On the premise of not affecting the segmentation accuracy, we only retain the first four stages of MobileNetV3, successfully reducing the number of parameters by 68.6%. For the context part, we use the lightweight SPPM module proposed in the PP-LiteSeg model, and the ordinary convolutions are replaced by separable convolutions to further reduce the computational complexity. For the Decoder part, we designed three fusion modules to fuse deep semantic features and shallow detailed features for many times. The last fusion module again gathers feature maps at different levels and outputs the segmentation results.\n", + "\n", + "Multilevel feature fusion module:\n", + "
\n", + "\n", + "
\n", + "\n", + "* Use two-stage training method to improve segmentation accuracy\n", + "\n", + "The two-stage training is based on the idea of transfer learning. First, it is trained on a large-scale mixed portrait dataset (data volume 100k+), then it uses the pretrained weight to train on the PP-HumanSeg14k dataset (data volume 14k), and finally the trained model is obtained. The two-stage training method can make full use of other datasets and improve the segmentation accuracy and generalization ability of the model.\n", + "\n", + "* Adjust image resolution and improve inference speed\n", + "\n", + "Adjusting the image resolution also directly affects the inference speed of the model. We use a variety of image resolutions for training and testing, and select the best image resolution in the PP-HumanSeg v2 solution to further improve the inference speed of the model.\n", + "\n", + "* Use morphological post-processing to improve visualization effect\n", + "\n", + "First, obtain the original predictive image I, then use threshold processing, image erosion, image expansion and other operations to obtain the mask image M. Finally, multiply the predictive image I and the mask image M, and output the final predictive image O.\n" + ] + }, + { + "cell_type": "markdown", + "id": "dc360fb0-fb54-4266-a4e3-4ae385d43a76", + "metadata": {}, + "source": [ + "## 5. Related papers and citations\n", + "If you find our project useful in your research, please consider citing:\n", + "\n", + "```\n", + "@InProceedings{Chu_2022_WACV,\n", + " author = {Chu, Lutao and Liu, Yi and Wu, Zewu and Tang, Shiyu and Chen, Guowei and Hao, Yuying and Peng, Juncai and Yu, Zhiliang and Chen, Zeyu and Lai, Baohua and Xiong, Haoyi},\n", + " title = {PP-HumanSeg: Connectivity-Aware Portrait Segmentation With a Large-Scale Teleconferencing Video Dataset},\n", + " booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},\n", + " month = {January},\n", + " year = {2022},\n", + " pages = {202-209}\n", + "}\n", + "```" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "py35-paddle1.2.0" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.4" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}