{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "55903e0e-3e6d-430f-91b7-d270a953ffd7",
   "metadata": {},
   "source": [
    "## 1. PP-HumanSegV2 Introduction\n",
    "\n",
    "Human segmentation is a high-frequency application in the field of image segmentation. Generally, human segentation can be classified as portrait segmentation and general human segmentation.\n",
    "\n",
    "For portrait segmentation and general human segmentation, PaddleSeg releases the PP-HumanSeg models, which have good performance in accuracy, inference speed and robustness. Besides, PP-HumanSeg models can be deployed to products without training, at zero cost, and fine-tuning is also supported to achieve better performance.\n",
    "\n",
    "In July 2022, PaddleSeg upgraded PP-HumanSeg to PP-HumanSegV2, providing new portrait segmentation solution which refreshed the SOTA indicator of the open-source portrait segmentation solutions with 96.63% mIoU accuracy and 63FPS mobile inference speed. Compared with the V1 solution, the inference speed is increased by 87.15%, the segmentation accuracy is increased by 3.03%, and the visualization effect is better. The PP-HumanSegV2 is comparable to the commercial solutions!\n",
    "\n",
    "PP-HumanSeg is officially produced by PaddlePaddle and proposed by PaddleSeg team. More information about PaddleSeg can be found here https://github.com/PaddlePaddle/PaddleSeg."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ba317a85-c8a1-49bd-afa3-59bfad7e86c3",
   "metadata": {},
   "source": [
    "## 2. Model Effects and Application Scenarios\n",
    "### 2.1 Portrait Segmentation and General Human Segmentation Tasks:\n",
    "\n",
    "#### 2.1.1 Datasets:\n",
    "\n",
    "The dataset is mainly PP-HumanSeg14k, which is divided into training set and test set.\n",
    "\n",
    "#### 2.1.2 Model Effects:\n",
    "\n",
    "The segmentation effect of PP-HumanSegV2 on the image is:\n",
    "\n",
    "Original image:\n",
    "<div align=\"center\">\n",
    "<img src=\"https://user-images.githubusercontent.com/48357642/200734740-72e98c73-5b41-47c4-b208-7fd9c10d8b8a.jpeg\"  width = \"60%\"  />\n",
    "</div>\n",
    "\n",
    "\n",
    "Segmentation result:\n",
    "<div align=\"center\">\n",
    "<img src=\"https://user-images.githubusercontent.com/48357642/200735017-eb8a2b22-7ef9-4e4f-acc2-1ea538672f75.jpeg\"  width = \"60%\"  />\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0d9668da-d14a-4491-ab01-7b039399bff6",
   "metadata": {},
   "source": [
    "## 3. How to Use the Model\n",
    "\n",
    "### 3.1 Model Inference:\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69e98dbf-a3ba-4d6c-8bde-b1abb469c7b2",
   "metadata": {},
   "source": [
    "* Install PaddlePaddle\n",
    "\n",
    "To install PaddlePaddle, PaddlePaddle>=2.2.0 is required. Since the image segmentation model is computationally expensive, it is recommended to use the GPU version of PaddlePaddle.\n",
    "\n",
    "In AI Studio, you can directly choose the environment where PaddlePaddle is installed. If you need to install PaddlePaddle, please refer to the PaddlePaddle official website.\n",
    "\n",
    "This tutorial has been verified with PaddlePaddle 2.3.2.\n"
   ]
  },
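  {
   "cell_type": "markdown",
   "id": "a1b2c3d4-paddle-check-lead",
   "metadata": {},
   "source": [
    "As an optional sanity check (not part of the original steps), you can verify the installation with PaddlePaddle's built-in `run_check` utility:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a1b2c3d4-paddle-check-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Print the installed version and run PaddlePaddle's self-check,\n",
    "# which also reports whether a usable GPU is found.\n",
    "import paddle\n",
    "print(paddle.__version__)\n",
    "paddle.utils.run_check()"
   ]
  },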
  {
   "cell_type": "markdown",
   "id": "5e2e25c6-6943-4d6a-b809-5f49158a5619",
   "metadata": {},
   "source": [
    "* Download PaddleSeg \n",
    "\n",
    "(When not running on Jupyter Notebook, \"!\" Or \"%\" should be removed.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0c587ea0-52d6-48c2-b2cc-beae58b3262d",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "%cd ~\n",
    "\n",
    "!git clone https://gitee.com/PaddlePaddle/PaddleSeg.git"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1fc9e04c-070f-48c3-bd6f-1fcacafb4e1f",
   "metadata": {},
   "source": [
    "* Install PaddleSeg"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5f7a0af6-4968-4966-b27e-130d70c94886",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "%cd ~/PaddleSeg\n",
    "!pip install -v -e ."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "96376cd0-99d1-4f28-8521-94e541e065e2",
   "metadata": {},
   "source": [
    "* Download Dataset and Models"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6d7f3c8f-bf9b-4a4a-b7dd-9374c4fd2c9c",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%cd ~/PaddleSeg/contrib/PP-HumanSeg\n",
    "!python src/download_inference_models.py\n",
    "!python src/download_data.py"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9c67ff4a-bf96-4bf2-87c2-992b8d3bacd7",
   "metadata": {},
   "source": [
    "* Quick Experience\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9a96627b-2b2c-40eb-b13a-e1eb82a707df",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "!python src/seg_demo.py \\\n",
    "  --config inference_models/portrait_pp_humansegv2_lite_256x144_inference_model_with_softmax/deploy.yaml \\\n",
    "  --img_path data/images/portrait_heng.jpg \\\n",
    "  --bg_img_path data/images/bg_2.jpg \\\n",
    "  --save_dir data/images_result/portrait_heng_v2_withbg.jpg\n",
    "\n",
    "!python src/seg_demo.py \\\n",
    "  --config inference_models/portrait_pp_humansegv2_lite_256x144_inference_model_with_softmax/deploy.yaml \\\n",
    "  --img_path data/images/portrait_shu.jpg \\\n",
    "  --bg_img_path data/images/bg_1.jpg \\\n",
    "  --save_dir data/images_result/portrait_shu_v2_withbg.jpg \\\n",
    "  --vertical_screen"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bd5a6d38-a18d-4bb2-ae0d-9397ba7c172e",
   "metadata": {},
   "source": [
    "The result as follow is saved in `data/images_result/portrait_heng_v2.jpg`.\n",
    "\n",
    "<img src=\"https://user-images.githubusercontent.com/52520497/188776878-130f4f6a-6379-4fb0-87e4-9a7ee4707c1d.jpg\" width=\"200\"> \n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6aeb78d-63ff-4e01-aa50-da37522e0b08",
   "metadata": {},
   "source": [
    "### 3.2 Model Training\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4da9e355-e99a-4821-9fa2-1571a7b84f97",
   "metadata": {},
   "source": [
    "* Preparation\n",
    "\n",
    "With reference to the above, install PaddleSeg, download the dataset, and then download the pretrained weights."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5945d193-bae0-4c5f-bd43-59e205b0484a",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "!python src/download_pretrained_models.py"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d8ba1500-8475-40e0-a257-9ddad436c14d",
   "metadata": {},
   "source": [
    "* Training\n",
    "\n",
    "The config files are saved in `./configs` as follows. We have set the path of pretrained weight in all config files.\n",
    "```\n",
    "configs\n",
    "├── human_pp_humansegv1_lite.yml\n",
    "├── human_pp_humansegv2_lite.yml\n",
    "├── human_pp_humansegv1_mobile.yml\n",
    "├── human_pp_humansegv2_mobile.yml\n",
    "├── human_pp_humansegv1_server.yml\n",
    "├── portrait_pp_humansegv1_lite.yml\n",
    "├── portrait_pp_humansegv2_lite.yml\n",
    "```\n",
    "\n",
    "Run the following command to start finetuning. You should change the details, such as learning rate, according to the actual situation. Please refer to [url](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/docs/train/train_cn.md) for more details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "14461b15-5c88-453a-b877-739ddf0062ce",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "!export CUDA_VISIBLE_DEVICES=0 # Set GPU on Linux\n",
    "# set CUDA_VISIBLE_DEVICES=0  # Set GPU on Windows\n",
    "!python ../../train.py \\\n",
    "  --config configs/human_pp_humansegv2_lite.yml \\\n",
    "  --save_dir output/human_pp_humansegv2_lite \\\n",
    "  --save_interval 100 --do_eval --use_vdl"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6c44f772-2104-4379-bd66-d6f9b8d72414",
   "metadata": {},
   "source": [
    "* Evaluation\n",
    "\n",
    "Load trained model and start evaluation. Please refer to [url](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/docs/evaluation/evaluate/evaluate_cn.md) for more details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9ff7c87f-62b4-4377-a7fd-260b8278e6c4",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "!python ../../val.py \\\n",
    "  --config configs/human_pp_humansegv2_lite.yml \\\n",
    "  --model_path pretrained_models/human_pp_humansegv2_lite_192x192_pretrained/model.pdparams"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e346d381-15df-421f-8473-9e96e39bbd0c",
   "metadata": {},
   "source": [
    "* Prediction\n",
    "\n",
    "Load trained model and start prediction. The result are saved in `./data/images_result/added_prediction` and `./data/images_result/pseudo_color_prediction` "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b5b53557-9dbe-472d-8406-c60c2c58927d",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "!python ../../predict.py \\\n",
    "  --config configs/human_pp_humansegv2_lite.yml \\\n",
    "  --model_path pretrained_models/human_pp_humansegv2_lite_192x192_pretrained/model.pdparams \\\n",
    "  --image_path data/images/human.jpg \\\n",
    "  --save_dir ./data/images_result"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d98d8455-6e84-493a-b823-428aab76e932",
   "metadata": {},
   "source": [
    "* Export\n",
    "\n",
    "Load trained model and export it. Please refer to [url](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/docs/model_export_cn.md) for more details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8fe76c54-aafe-43e0-a0cc-b8feb49bb770",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "!python ../../export.py \\\n",
    "  --config configs/human_pp_humansegv2_lite.yml \\\n",
    "  --model_path pretrained_models/human_pp_humansegv2_lite_192x192_pretrained/model.pdparams \\\n",
    "  --save_dir output/human_pp_humansegv2_lite \\\n",
    "  --without_argmax \\\n",
    "  --with_softmax"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "629a50ee-3e33-42a5-b505-aacc4ccd220f",
   "metadata": {},
   "source": [
    "When set --without_argmax --with_softmax, the last operation of inference model will be changed from argmax to softmax, which outputs an image of type float32, representing the probability of foreground, and sometimes it can be useful to make the edges smoother and more natural."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6a6fd1d6-a7a5-4388-b225-86b62ab72d67",
   "metadata": {},
   "source": [
    "## 4. Model Principles\n",
    "\n",
    "The model structure.\n",
    "<div align=\"center\">\n",
    "<img src=\"https://user-images.githubusercontent.com/48357642/200757494-1e63215e-4cd1-4c39-8dd9-a0e37f8719f2.png\"  width = \"60%\"  />\n",
    "</div>\n",
    "\n",
    "* The computational load of the model is greatly reduced\n",
    "\n",
    "For the Encoder part of the model, we selected MobileNetV3 as the backbone network to extract multi-layer features. Through analysis, we found that the parameters of MobileNetV3 are mainly concentrated in the last stage. On the premise of not affecting the segmentation accuracy, we only retain the first four stages of MobileNetV3, successfully reducing the number of parameters by 68.6%. For the context part, we use the lightweight SPPM module proposed in the PP-LiteSeg model, and the ordinary convolutions are replaced by separable convolutions to further reduce the computational complexity. For the Decoder part, we designed three fusion modules to fuse deep semantic features and shallow detailed features for many times. The last fusion module again gathers feature maps at different levels and outputs the segmentation results.\n",
    "\n",
    "Multilevel feature fusion module:\n",
    "<div align=\"center\">\n",
    "<img src=\"https://user-images.githubusercontent.com/48357642/200758284-a8d5e6f9-1a66-414b-804c-57c6b6fcd698.png\"  width = \"30%\"  />\n",
    "</div>\n",
    "\n",
    "* Use two-stage training method to improve segmentation accuracy\n",
    "\n",
    "The two-stage training is based on the idea of transfer learning. First, it is trained on a large-scale mixed portrait dataset (data volume 100k+), then it uses the pretrained weight to train on the PP-HumanSeg14k dataset (data volume 14k), and finally the trained model is obtained. The two-stage training method can make full use of other datasets and improve the segmentation accuracy and generalization ability of the model.\n",
    "\n",
    "* Adjust image resolution and improve inference speed\n",
    "\n",
    "Adjusting the image resolution also directly affects the inference speed of the model. We use a variety of image resolutions for training and testing, and select the best image resolution in the PP-HumanSeg v2 solution to further improve the inference speed of the model.\n",
    "\n",
    "* Use morphological post-processing to improve visualization effect\n",
    "\n",
    "First, obtain the original predictive image I, then use threshold processing, image erosion, image expansion and other operations to obtain the mask image M. Finally, multiply the predictive image I and the mask image M, and output the final predictive image O.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc360fb0-fb54-4266-a4e3-4ae385d43a76",
   "metadata": {},
   "source": [
    "## 5. Related papers and citations\n",
    "If you find our project useful in your research, please consider citing:\n",
    "\n",
    "```\n",
    "@InProceedings{Chu_2022_WACV,\n",
    "    author    = {Chu, Lutao and Liu, Yi and Wu, Zewu and Tang, Shiyu and Chen, Guowei and Hao, Yuying and Peng, Juncai and Yu, Zhiliang and Chen, Zeyu and Lai, Baohua and Xiong, Haoyi},\n",
    "    title     = {PP-HumanSeg: Connectivity-Aware Portrait Segmentation With a Large-Scale Teleconferencing Video Dataset},\n",
    "    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},\n",
    "    month     = {January},\n",
    "    year      = {2022},\n",
    "    pages     = {202-209}\n",
    "}\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "py35-paddle1.2.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}