introduction_cn.ipynb 14.5 KB
Notebook
Newer Older
C
cc 已提交
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "55903e0e-3e6d-430f-91b7-d270a953ffd7",
   "metadata": {},
   "source": [
    "## 1. PP-HumanSegV2模型简介\n",
    "\n",
    "将人物和背景在像素级别进行区分,是一个图像分割的经典任务,具有广泛的应用。 一般而言,该任务可以分为两类:针对半身人像的分割,简称肖像分割;针对全身和半身人像的分割,简称通用人像分割。\n",
    "\n",
    "对于肖像分割和通用人像分割,PaddleSeg发布了PP-HumanSeg系列模型,具有分割精度高、推理速度快、通用型强的优点。而且PP-HumanSeg系列模型可以开箱即用,零成本部署到产品中,也支持针对特定场景数据进行微调,实现更佳分割效果。\n",
    "\n",
    "2022年7月,PaddleSeg重磅升级的PP-HumanSegV2人像分割方案,以96.63%的mIoU精度, 63FPS的手机端推理速度,再次刷新开源人像分割算法SOTA指标。相比PP-HumanSegV1方案,推理速度提升87.15%,分割精度提升3.03%,可视化效果更佳。V2方案可与商业收费方案媲美,而且支持零成本、开箱即用!\n",
    "\n",
    "PP-HumanSeg由飞桨官方出品,是PaddleSeg团队推出的模型和方案。 更多关于PaddleSeg可以点击 https://github.com/PaddlePaddle/PaddleSeg 进行了解。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ba317a85-c8a1-49bd-afa3-59bfad7e86c3",
   "metadata": {},
   "source": [
    "## 2. 模型效果及应用场景\n",
    "### 2.1 肖像分割和通用人像分割任务\n",
    "\n",
    "#### 2.1.1 数据集\n",
    "\n",
    "数据集以PP-HumanSeg14k为主,分为训练集和测试集。\n",
    "\n",
    "#### 2.1.2 模型效果速览\n",
    "\n",
    "PP-HumanSegV2在图片上的分割效果如下。\n",
    "\n",
    "原图:\n",
    "<div align=\"center\">\n",
    "<img src=\"https://user-images.githubusercontent.com/48357642/200734740-72e98c73-5b41-47c4-b208-7fd9c10d8b8a.jpeg\"  width = \"60%\"  />\n",
    "</div>\n",
    "\n",
    "\n",
    "分割后的图:\n",
    "<div align=\"center\">\n",
    "<img src=\"https://user-images.githubusercontent.com/48357642/200735017-eb8a2b22-7ef9-4e4f-acc2-1ea538672f75.jpeg\"  width = \"60%\"  />\n",
    "</div>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0d9668da-d14a-4491-ab01-7b039399bff6",
   "metadata": {},
   "source": [
    "## 3. 模型如何使用\n",
    "\n",
    "### 3.1 模型推理\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69e98dbf-a3ba-4d6c-8bde-b1abb469c7b2",
   "metadata": {},
   "source": [
    "* 安装PaddlePaddle\n",
    "\n",
    "安装PaddlePaddle,要求PaddlePaddle >= 2.2.0。由于图像分割模型计算开销大,推荐在GPU版本的PaddlePaddle下使用。\n",
    "\n",
    "在AIStudio中,大家选择可以直接选择安装好PaddlePaddle的环境。\n",
    "如果需要执行安装PaddlePaddle,请参考[PaddlePaddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)。\n",
    "\n",
    "本教程在PaddlePaddle 2.3.2版本下进行了验证。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5e2e25c6-6943-4d6a-b809-5f49158a5619",
   "metadata": {},
   "source": [
    "* 下载PaddleSeg \n",
    "\n",
    "(不在Jupyter Notebook上运行时需要将\"!\"或者\"%\"去掉。)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0c587ea0-52d6-48c2-b2cc-beae58b3262d",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "%cd ~\n",
    "\n",
    "!git clone https://gitee.com/PaddlePaddle/PaddleSeg.git"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1fc9e04c-070f-48c3-bd6f-1fcacafb4e1f",
   "metadata": {},
   "source": [
    "* 安装PaddleSeg"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5f7a0af6-4968-4966-b27e-130d70c94886",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# 安装PaddleSeg\n",
    "%cd ~/PaddleSeg\n",
    "!pip install -v -e ."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "96376cd0-99d1-4f28-8521-94e541e065e2",
   "metadata": {},
   "source": [
    "* 下载数据和模型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6d7f3c8f-bf9b-4a4a-b7dd-9374c4fd2c9c",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%cd ~/PaddleSeg/contrib/PP-HumanSeg\n",
    "!python src/download_inference_models.py\n",
    "!python src/download_data.py"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9c67ff4a-bf96-4bf2-87c2-992b8d3bacd7",
   "metadata": {},
   "source": [
    "* 快速体验\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9a96627b-2b2c-40eb-b13a-e1eb82a707df",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "!python src/seg_demo.py \\\n",
    "  --config inference_models/portrait_pp_humansegv2_lite_256x144_inference_model_with_softmax/deploy.yaml \\\n",
    "  --img_path data/images/portrait_heng.jpg \\\n",
    "  --bg_img_path data/images/bg_2.jpg \\\n",
    "  --save_dir data/images_result/portrait_heng_v2_withbg.jpg\n",
    "\n",
    "!python src/seg_demo.py \\\n",
    "  --config inference_models/portrait_pp_humansegv2_lite_256x144_inference_model_with_softmax/deploy.yaml \\\n",
    "  --img_path data/images/portrait_shu.jpg \\\n",
    "  --bg_img_path data/images/bg_1.jpg \\\n",
    "  --save_dir data/images_result/portrait_shu_v2_withbg.jpg \\\n",
    "  --vertical_screen"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bd5a6d38-a18d-4bb2-ae0d-9397ba7c172e",
   "metadata": {},
   "source": [
    "结果保存在`data/images_result/portrait_heng_v2.jpg`(如下图)。\n",
    "\n",
    "<img src=\"https://user-images.githubusercontent.com/52520497/188776878-130f4f6a-6379-4fb0-87e4-9a7ee4707c1d.jpg\" width=\"200\"> \n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f6aeb78d-63ff-4e01-aa50-da37522e0b08",
   "metadata": {},
   "source": [
    "### 3.2 模型训练\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4da9e355-e99a-4821-9fa2-1571a7b84f97",
   "metadata": {},
   "source": [
    "* 准备\n",
    "\n",
    "参考前文,安装PaddleSeg、下载数据集,然后下载预训练权重。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5945d193-bae0-4c5f-bd43-59e205b0484a",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "!python src/download_pretrained_models.py"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d8ba1500-8475-40e0-a257-9ddad436c14d",
   "metadata": {},
   "source": [
    "* 训练\n",
    "\n",
    "配置文件保存在`./configs`目录下,如下。配置文件中,已经通过`pretrained`设置好预训练权重的路径。\n",
    "```\n",
    "configs\n",
    "├── human_pp_humansegv1_lite.yml\n",
    "├── human_pp_humansegv2_lite.yml\n",
    "├── human_pp_humansegv1_mobile.yml\n",
    "├── human_pp_humansegv2_mobile.yml\n",
    "├── human_pp_humansegv1_server.yml\n",
    "├── portrait_pp_humansegv1_lite.yml\n",
    "├── portrait_pp_humansegv2_lite.yml\n",
    "```\n",
    "\n",
    "执行如下命令,进行模型微调(大家需要根据实际情况修改配置文件中的超参)。模型训练的详细文档,请参考[链接](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/docs/train/train_cn.md)。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "14461b15-5c88-453a-b877-739ddf0062ce",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "!export CUDA_VISIBLE_DEVICES=0 # Linux下设置1张可用的卡\n",
    "# set CUDA_VISIBLE_DEVICES=0  # Windows下设置1张可用的卡\n",
    "!python ../../train.py \\\n",
    "  --config configs/human_pp_humansegv2_lite.yml \\\n",
    "  --save_dir output/human_pp_humansegv2_lite \\\n",
    "  --save_interval 100 --do_eval --use_vdl"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6c44f772-2104-4379-bd66-d6f9b8d72414",
   "metadata": {},
   "source": [
    "* 评估\n",
    "\n",
    "执行如下命令,加载模型和训练好的权重,进行模型评估,输出验证集上的评估精度。模型评估的详细文档,请参考[链接](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/docs/evaluation/evaluate/evaluate_cn.md)。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9ff7c87f-62b4-4377-a7fd-260b8278e6c4",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "!python ../../val.py \\\n",
    "  --config configs/human_pp_humansegv2_lite.yml \\\n",
    "  --model_path pretrained_models/human_pp_humansegv2_lite_192x192_pretrained/model.pdparams"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e346d381-15df-421f-8473-9e96e39bbd0c",
   "metadata": {},
   "source": [
    "* 预测\n",
    "\n",
    "执行如下命令,加载模型和训练好的权重,对单张图像进行预测,预测结果保存在`./data/images_result`目录下的`added_prediction`和`pseudo_color_prediction`文件夹中。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b5b53557-9dbe-472d-8406-c60c2c58927d",
   "metadata": {
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "!python ../../predict.py \\\n",
    "  --config configs/human_pp_humansegv2_lite.yml \\\n",
    "  --model_path pretrained_models/human_pp_humansegv2_lite_192x192_pretrained/model.pdparams \\\n",
    "  --image_path data/images/human.jpg \\\n",
    "  --save_dir ./data/images_result"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d98d8455-6e84-493a-b823-428aab76e932",
   "metadata": {},
   "source": [
    "* 导出\n",
    "\n",
    "执行如下命令,加载模型和训练好的权重,导出预测模型。模型导出的详细文档,请参考[链接](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/docs/model_export_cn.md)。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8fe76c54-aafe-43e0-a0cc-b8feb49bb770",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "!python ../../export.py \\\n",
    "  --config configs/human_pp_humansegv2_lite.yml \\\n",
    "  --model_path pretrained_models/human_pp_humansegv2_lite_192x192_pretrained/model.pdparams \\\n",
    "  --save_dir output/human_pp_humansegv2_lite \\\n",
    "  --without_argmax \\\n",
    "  --with_softmax"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "629a50ee-3e33-42a5-b505-aacc4ccd220f",
   "metadata": {},
   "source": [
    "注意,使用--without_argmax --with_softmax参数,则模型导出的时候,模型最后面不会添加Argmax算子,而是添加Softmax算子。 所以,输出是浮点数类型,表示前景的概率,使得图像融合的边缘更为平滑。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6a6fd1d6-a7a5-4388-b225-86b62ab72d67",
   "metadata": {},
   "source": [
    "## 4. 模型原理\n",
    "\n",
    "模型结构如下图。\n",
    "<div align=\"center\">\n",
    "<img src=\"https://user-images.githubusercontent.com/48357642/200757494-1e63215e-4cd1-4c39-8dd9-a0e37f8719f2.png\"  width = \"60%\"  />\n",
    "</div>\n",
    "\n",
    "* 模型算量大幅减小\n",
    "\n",
    "对于模型Encoder部分,我们选用MobileNetV3作为骨干网络提取多层特征,分析发现MobileNetV3的参数主要集中在最后一个Stage,在不影响分割精度的前提下,我们只保留MobileNetV3的前四个Stage,成功减少了68.6%的参数量。对于上下文部分,我们使用PP-LiteSeg模型中提出的轻量级SPPM模块,而且其中的普通卷积都替换为可分离卷积,进一步减小计算量。对于Decoder部分,我们设计三个Fusion融合模块,多次融合深层语义特征和浅层细节特征,最后一个Fusion融合模块再次汇集不同层次的特征图,输出分割结果。\n",
    "\n",
    "多层次特征融合模块图:\n",
    "<div align=\"center\">\n",
    "<img src=\"https://user-images.githubusercontent.com/48357642/200758284-a8d5e6f9-1a66-414b-804c-57c6b6fcd698.png\"  width = \"30%\"  />\n",
    "</div>\n",
    "\n",
    "* 使用两阶段训练方式,提升分割精度\n",
    "\n",
    "两阶段训练是基于迁移学习的思想,首先在大规模混合人像数据集(数据量100k+)上训练,然后使用该预训练权重,在PP-HumanSeg14k数据集(数据量14k)上训练,最终得到训练好的模型。使用两阶段训练方式,可以充分利用其他数据集,提高模型的分割精度和泛化能力。\n",
    "\n",
    "* 调整图像分辨率,提升推理速度\n",
    "\n",
    "调整图像分辨率也直接影响模型的推理速度,我们使用多种图像分辨率进行训练和测试,在PP-HumanSeg v2方案中选择最佳图像分辨率,进一步提升了模型推理速度。\n",
    "\n",
    "* 使用形态学后处理,提升可视化效果\n",
    "\n",
    "首先获取原始预测图像I,然后使用阈值处理、图像腐蚀、图像膨胀等操作得到掩码图像M,最后预测图像I和掩码图像M相乘,输出最终预测图像O。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc360fb0-fb54-4266-a4e3-4ae385d43a76",
   "metadata": {},
   "source": [
    "## 5. 相关论文以及引用信息\n",
    "如果我们的项目在学术上帮助到你,请考虑以下引用:\n",
    "\n",
    "```\n",
    "@InProceedings{Chu_2022_WACV,\n",
    "    author    = {Chu, Lutao and Liu, Yi and Wu, Zewu and Tang, Shiyu and Chen, Guowei and Hao, Yuying and Peng, Juncai and Yu, Zhiliang and Chen, Zeyu and Lai, Baohua and Xiong, Haoyi},\n",
    "    title     = {PP-HumanSeg: Connectivity-Aware Portrait Segmentation With a Large-Scale Teleconferencing Video Dataset},\n",
    "    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},\n",
    "    month     = {January},\n",
    "    year      = {2022},\n",
    "    pages     = {202-209}\n",
    "}\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "py35-paddle1.2.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}