{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "55903e0e-3e6d-430f-91b7-d270a953ffd7",
   "metadata": {},
   "source": [
    "## 1. PP-HumanSegV2 Introduction\n",
    "\n",
    "Human segmentation is a high-frequency application in the field of image segmentation. Generally, human segentation can be classified as portrait segmentation and general human segmentation.\n",
    "\n",
    "For portrait segmentation and general human segmentation, PaddleSeg releases the PP-HumanSeg models, which have good performance in accuracy, inference speed and robustness. Besides, PP-HumanSeg models can be deployed to products without training, at zero cost, and fine-tuning is also supported to achieve better performance.\n",
    "\n",
    "In July 2022, PaddleSeg upgraded PP-HumanSeg to PP-HumanSegV2, providing new portrait segmentation solution which refreshed the SOTA indicator of the open-source portrait segmentation solutions with 96.63% mIoU accuracy and 63FPS mobile inference speed. Compared with the V1 solution, the inference speed is increased by 87.15%, the segmentation accuracy is increased by 3.03%, and the visualization effect is better. The PP-HumanSegV2 is comparable to the commercial solutions!\n",
    "\n",
    "PP-HumanSeg is officially produced by PaddlePaddle and proposed by PaddleSeg team. More information about PaddleSeg can be found here https://github.com/PaddlePaddle/PaddleSeg."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ba317a85-c8a1-49bd-afa3-59bfad7e86c3",
   "metadata": {},
   "source": [
    "## 2. Model Effects and Application Scenarios\n",
    "### 2.1 Portrait Segmentation and General Human Segmentation Tasks:\n",
    "\n",
    "#### 2.1.1 Datasets:\n",
    "\n",
    "The dataset is mainly PP-HumanSeg14k, which is divided into training set and test set.\n",
    "\n",
    "#### 2.1.2 Model Effects:\n",
    "\n",
    "The segmentation effect of PP-HumanSegV2 on the image is:\n",
    "\n",
    "Original image:\n",
    "<div align=\"center\">\n",
    "<img src=\"https://user-images.githubusercontent.com/48357642/200734740-72e98c73-5b41-47c4-b208-7fd9c10d8b8a.jpeg\"  width = \"60%\"  />\n",
    "</div>\n",
    "\n",
    "\n",
    "Segmentation result:\n",
    "<div align=\"center\">\n",
    "<img src=\"https://user-images.githubusercontent.com/48357642/200735017-eb8a2b22-7ef9-4e4f-acc2-1ea538672f75.jpeg\"  width = \"60%\"  />\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0d9668da-d14a-4491-ab01-7b039399bff6",
   "metadata": {},
   "source": [
    "## 3. How to Use the Model\n",
    "\n",
    "### 3.1 Model Inference:\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69e98dbf-a3ba-4d6c-8bde-b1abb469c7b2",
   "metadata": {},
   "source": [
    "* Install PaddlePaddle\n",
    "\n",
    "To install PaddlePaddle, PaddlePaddle>=2.2.0 is required. Since the image segmentation model is computationally expensive, it is recommended to use the GPU version of PaddlePaddle.\n",
    "\n",
    "In AI Studio, you can directly choose the environment where PaddlePaddle is installed. If you need to install PaddlePaddle, please refer to the PaddlePaddle official website.\n",
    "\n",
    "This tutorial has been verified with PaddlePaddle 2.3.2.\n"
   ]
  },
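  {
   "cell_type": "markdown",
   "id": "a1b2c3d4-paddle-check-lead",
   "metadata": {},
   "source": [
    "As an optional sanity check (not part of the original steps), you can verify the installation with PaddlePaddle's built-in `run_check` utility:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1b2c3d4-paddle-check-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Print the installed version and run PaddlePaddle's self-check,\n",
    "# which also reports whether a usable GPU is found.\n",
    "import paddle\n",
    "print(paddle.__version__)\n",
    "paddle.utils.run_check()"
   ]
  },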
  {
   "cell_type": "markdown",
   "id": "5e2e25c6-6943-4d6a-b809-5f49158a5619",
   "metadata": {},
   "source": [
    "* Download PaddleSeg \n",
    "\n",
    "(When not running on Jupyter Notebook, \"!\" Or \"%\" should be removed.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0c587ea0-52d6-48c2-b2cc-beae58b3262d",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "%cd ~\n",
    "\n",
    "!git clone https://gitee.com/PaddlePaddle/PaddleSeg.git"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1fc9e04c-070f-48c3-bd6f-1fcacafb4e1f",
   "metadata": {},
   "source": [
    "* Install PaddleSeg"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5f7a0af6-4968-4966-b27e-130d70c94886",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "%cd ~/PaddleSeg\n",
    "!pip install -v -e ."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "96376cd0-99d1-4f28-8521-94e541e065e2",
   "metadata": {},
   "source": [
    "* Download Dataset and Models"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6d7f3c8f-bf9b-4a4a-b7dd-9374c4fd2c9c",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%cd ~/PaddleSeg/contrib/PP-HumanSeg\n",
    "!python src/download_inference_models.py\n",
    "!python src/download_data.py"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9c67ff4a-bf96-4bf2-87c2-992b8d3bacd7",
   "metadata": {},
   "source": [
    "* Quick Experience\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9a96627b-2b2c-40eb-b13a-e1eb82a707df",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "!python src/seg_demo.py \\\n",
    "  --config inference_models/portrait_pp_humansegv2_lite_256x144_inference_model_with_softmax/deploy.yaml \\\n",
    "  --img_path data/images/portrait_heng.jpg \\\n",
    "  --bg_img_path data/images/bg_2.jpg \\\n",
    "  --save_dir data/images_result/portrait_heng_v2_withbg.jpg\n",
    "\n",
    "!python src/seg_demo.py \\\n",
    "  --config inference_models/portrait_pp_humansegv2_lite_256x144_inference_model_with_softmax/deploy.yaml \\\n",
    "  --img_path data/images/portrait_shu.jpg \\\n",
    "  --bg_img_path data/images/bg_1.jpg \\\n",
    "  --save_dir data/images_result/portrait_shu_v2_withbg.jpg \\\n",
    "  --vertical_screen"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bd5a6d38-a18d-4bb2-ae0d-9397ba7c172e",
   "metadata": {},
   "source": [
    "The result as follow is saved in `data/images_result/portrait_heng_v2.jpg`.\n",
    "\n",
    "<img src=\"https://user-images.githubusercontent.com/52520497/188776878-130f4f6a-6379-4fb0-87e4-9a7ee4707c1d.jpg\" width=\"200\"> \n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6aeb78d-63ff-4e01-aa50-da37522e0b08",
   "metadata": {},
   "source": [
    "### 3.2 Model Training\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4da9e355-e99a-4821-9fa2-1571a7b84f97",
   "metadata": {},
   "source": [
    "* Preparation\n",
    "\n",
    "With reference to the above, install PaddleSeg, download the dataset, and then download the pretrained weights."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5945d193-bae0-4c5f-bd43-59e205b0484a",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "!python src/download_pretrained_models.py"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d8ba1500-8475-40e0-a257-9ddad436c14d",
   "metadata": {},
   "source": [
    "* Training\n",
    "\n",
    "The config files are saved in `./configs` as follows. We have set the path of pretrained weight in all config files.\n",
    "```\n",
    "configs\n",
    "├── human_pp_humansegv1_lite.yml\n",
    "├── human_pp_humansegv2_lite.yml\n",
    "├── human_pp_humansegv1_mobile.yml\n",
    "├── human_pp_humansegv2_mobile.yml\n",
    "├── human_pp_humansegv1_server.yml\n",
    "├── portrait_pp_humansegv1_lite.yml\n",
    "├── portrait_pp_humansegv2_lite.yml\n",
    "```\n",
    "\n",
    "Run the following command to start finetuning. You should change the details, such as learning rate, according to the actual situation. Please refer to [url](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/docs/train/train_cn.md) for more details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "14461b15-5c88-453a-b877-739ddf0062ce",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "!export CUDA_VISIBLE_DEVICES=0 # Set GPU on Linux\n",
    "# set CUDA_VISIBLE_DEVICES=0  # Set GPU on Windows\n",
    "!python ../../train.py \\\n",
    "  --config configs/human_pp_humansegv2_lite.yml \\\n",
    "  --save_dir output/human_pp_humansegv2_lite \\\n",
    "  --save_interval 100 --do_eval --use_vdl"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6c44f772-2104-4379-bd66-d6f9b8d72414",
   "metadata": {},
   "source": [
    "* Evaluation\n",
    "\n",
    "Load trained model and start evaluation. Please refer to [url](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/docs/evaluation/evaluate/evaluate_cn.md) for more details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9ff7c87f-62b4-4377-a7fd-260b8278e6c4",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "!python ../../val.py \\\n",
    "  --config configs/human_pp_humansegv2_lite.yml \\\n",
    "  --model_path pretrained_models/human_pp_humansegv2_lite_192x192_pretrained/model.pdparams"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e346d381-15df-421f-8473-9e96e39bbd0c",
   "metadata": {},
   "source": [
    "* Prediction\n",
    "\n",
    "Load trained model and start prediction. The result are saved in `./data/images_result/added_prediction` and `./data/images_result/pseudo_color_prediction` "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b5b53557-9dbe-472d-8406-c60c2c58927d",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "!python ../../predict.py \\\n",
    "  --config configs/human_pp_humansegv2_lite.yml \\\n",
    "  --model_path pretrained_models/human_pp_humansegv2_lite_192x192_pretrained/model.pdparams \\\n",
    "  --image_path data/images/human.jpg \\\n",
    "  --save_dir ./data/images_result"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d98d8455-6e84-493a-b823-428aab76e932",
   "metadata": {},
   "source": [
    "* Export\n",
    "\n",
    "Load trained model and export it. Please refer to [url](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/docs/model_export_cn.md) for more details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8fe76c54-aafe-43e0-a0cc-b8feb49bb770",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "!python ../../export.py \\\n",
    "  --config configs/human_pp_humansegv2_lite.yml \\\n",
    "  --model_path pretrained_models/human_pp_humansegv2_lite_192x192_pretrained/model.pdparams \\\n",
    "  --save_dir output/human_pp_humansegv2_lite \\\n",
    "  --without_argmax \\\n",
    "  --with_softmax"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "629a50ee-3e33-42a5-b505-aacc4ccd220f",
   "metadata": {},
   "source": [
    "When set --without_argmax --with_softmax, the last operation of inference model will be changed from argmax to softmax, which outputs an image of type float32, representing the probability of foreground, and sometimes it can be useful to make the edges smoother and more natural."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6a6fd1d6-a7a5-4388-b225-86b62ab72d67",
   "metadata": {},
   "source": [
    "## 4. Model Principles\n",
    "\n",
    "The model structure.\n",
    "<div align=\"center\">\n",
    "<img src=\"https://user-images.githubusercontent.com/48357642/200757494-1e63215e-4cd1-4c39-8dd9-a0e37f8719f2.png\"  width = \"60%\"  />\n",
    "</div>\n",
    "\n",
    "* The computational load of the model is greatly reduced\n",
    "\n",
    "For the Encoder part of the model, we selected MobileNetV3 as the backbone network to extract multi-layer features. Through analysis, we found that the parameters of MobileNetV3 are mainly concentrated in the last stage. On the premise of not affecting the segmentation accuracy, we only retain the first four stages of MobileNetV3, successfully reducing the number of parameters by 68.6%. For the context part, we use the lightweight SPPM module proposed in the PP-LiteSeg model, and the ordinary convolutions are replaced by separable convolutions to further reduce the computational complexity. For the Decoder part, we designed three fusion modules to fuse deep semantic features and shallow detailed features for many times. The last fusion module again gathers feature maps at different levels and outputs the segmentation results.\n",
    "\n",
    "Multilevel feature fusion module:\n",
    "<div align=\"center\">\n",
    "<img src=\"https://user-images.githubusercontent.com/48357642/200758284-a8d5e6f9-1a66-414b-804c-57c6b6fcd698.png\"  width = \"30%\"  />\n",
    "</div>\n",
    "\n",
    "* Use two-stage training method to improve segmentation accuracy\n",
    "\n",
    "The two-stage training is based on the idea of transfer learning. First, it is trained on a large-scale mixed portrait dataset (data volume 100k+), then it uses the pretrained weight to train on the PP-HumanSeg14k dataset (data volume 14k), and finally the trained model is obtained. The two-stage training method can make full use of other datasets and improve the segmentation accuracy and generalization ability of the model.\n",
    "\n",
    "* Adjust image resolution and improve inference speed\n",
    "\n",
    "Adjusting the image resolution also directly affects the inference speed of the model. We use a variety of image resolutions for training and testing, and select the best image resolution in the PP-HumanSeg v2 solution to further improve the inference speed of the model.\n",
    "\n",
    "* Use morphological post-processing to improve visualization effect\n",
    "\n",
    "First, obtain the original predictive image I, then use threshold processing, image erosion, image expansion and other operations to obtain the mask image M. Finally, multiply the predictive image I and the mask image M, and output the final predictive image O.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc360fb0-fb54-4266-a4e3-4ae385d43a76",
   "metadata": {},
   "source": [
    "## 5. Related papers and citations\n",
    "If you find our project useful in your research, please consider citing:\n",
    "\n",
    "```\n",
    "@InProceedings{Chu_2022_WACV,\n",
    "    author    = {Chu, Lutao and Liu, Yi and Wu, Zewu and Tang, Shiyu and Chen, Guowei and Hao, Yuying and Peng, Juncai and Yu, Zhiliang and Chen, Zeyu and Lai, Baohua and Xiong, Haoyi},\n",
    "    title     = {PP-HumanSeg: Connectivity-Aware Portrait Segmentation With a Large-Scale Teleconferencing Video Dataset},\n",
    "    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},\n",
    "    month     = {January},\n",
    "    year      = {2022},\n",
    "    pages     = {202-209}\n",
    "}\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "py35-paddle1.2.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}