{ "cells": [ { "cell_type": "markdown", "id": "55903e0e-3e6d-430f-91b7-d270a953ffd7", "metadata": {}, "source": [ "## 1. PP-HumanSegV2模型简介\n", "\n", "将人物和背景在像素级别进行区分,是一个图像分割的经典任务,具有广泛的应用。 一般而言,该任务可以分为两类:针对半身人像的分割,简称肖像分割;针对全身和半身人像的分割,简称通用人像分割。\n", "\n", "对于肖像分割和通用人像分割,PaddleSeg发布了PP-HumanSeg系列模型,具有分割精度高、推理速度快、通用型强的优点。而且PP-HumanSeg系列模型可以开箱即用,零成本部署到产品中,也支持针对特定场景数据进行微调,实现更佳分割效果。\n", "\n", "2022年7月,PaddleSeg重磅升级的PP-HumanSegV2人像分割方案,以96.63%的mIoU精度, 63FPS的手机端推理速度,再次刷新开源人像分割算法SOTA指标。相比PP-HumanSegV1方案,推理速度提升87.15%,分割精度提升3.03%,可视化效果更佳。V2方案可与商业收费方案媲美,而且支持零成本、开箱即用!\n", "\n", "PP-HumanSeg由飞桨官方出品,是PaddleSeg团队推出的模型和方案。 更多关于PaddleSeg可以点击 https://github.com/PaddlePaddle/PaddleSeg 进行了解。" ] }, { "cell_type": "markdown", "id": "ba317a85-c8a1-49bd-afa3-59bfad7e86c3", "metadata": {}, "source": [ "## 2. 模型效果及应用场景\n", "### 2.1 肖像分割和通用人像分割任务\n", "\n", "#### 2.1.1 数据集\n", "\n", "数据集以PP-HumanSeg14k为主,分为训练集和测试集。\n", "\n", "#### 2.1.2 模型效果速览\n", "\n", "PP-HumanSegV2在图片上的分割效果如下。\n", "\n", "原图:\n", "
\n", "\n", "
\n", "\n", "\n", "分割后的图:\n", "
\n", "\n", "
\n" ] }, { "cell_type": "markdown", "id": "0d9668da-d14a-4491-ab01-7b039399bff6", "metadata": {}, "source": [ "## 3. 模型如何使用\n", "\n", "### 3.1 模型推理\n", "\n" ] }, { "cell_type": "markdown", "id": "69e98dbf-a3ba-4d6c-8bde-b1abb469c7b2", "metadata": {}, "source": [ "* 安装PaddlePaddle\n", "\n", "安装PaddlePaddle,要求PaddlePaddle >= 2.2.0。由于图像分割模型计算开销大,推荐在GPU版本的PaddlePaddle下使用。\n", "\n", "在AIStudio中,大家选择可以直接选择安装好PaddlePaddle的环境。\n", "如果需要执行安装PaddlePaddle,请参考[PaddlePaddle官网](https://www.paddlepaddle.org.cn/install/quick?docurl=/documentation/docs/zh/install/pip/linux-pip.html)。\n", "\n", "本教程在PaddlePaddle 2.3.2版本下进行了验证。\n" ] }, { "cell_type": "markdown", "id": "5e2e25c6-6943-4d6a-b809-5f49158a5619", "metadata": {}, "source": [ "* 下载PaddleSeg \n", "\n", "(不在Jupyter Notebook上运行时需要将\"!\"或者\"%\"去掉。)" ] }, { "cell_type": "code", "execution_count": null, "id": "0c587ea0-52d6-48c2-b2cc-beae58b3262d", "metadata": { "scrolled": true, "tags": [] }, "outputs": [], "source": [ "%cd ~\n", "\n", "!git clone https://gitee.com/PaddlePaddle/PaddleSeg.git" ] }, { "cell_type": "markdown", "id": "1fc9e04c-070f-48c3-bd6f-1fcacafb4e1f", "metadata": {}, "source": [ "* 安装PaddleSeg" ] }, { "cell_type": "code", "execution_count": null, "id": "5f7a0af6-4968-4966-b27e-130d70c94886", "metadata": { "scrolled": true, "tags": [] }, "outputs": [], "source": [ "# 安装PaddleSeg\n", "%cd ~/PaddleSeg\n", "!pip install -v -e ." ] }, { "cell_type": "markdown", "id": "96376cd0-99d1-4f28-8521-94e541e065e2", "metadata": {}, "source": [ "* 下载数据和模型" ] }, { "cell_type": "code", "execution_count": null, "id": "6d7f3c8f-bf9b-4a4a-b7dd-9374c4fd2c9c", "metadata": { "scrolled": true }, "outputs": [], "source": [ "%cd ~/PaddleSeg/contrib/PP-HumanSeg\n", "!python src/download_inference_models.py\n", "!python src/download_data.py" ] }, { "cell_type": "markdown", "id": "9c67ff4a-bf96-4bf2-87c2-992b8d3bacd7", "metadata": {}, "source": [ "* 快速体验\n" ] }, { "cell_type": "code", "execution_count": null, "id": "9a96627b-2b2c-40eb-b13a-e1eb82a707df", "metadata": { "scrolled": true, "tags": [] }, "outputs": [], "source": [ "!python src/seg_demo.py \\\n", " --config inference_models/portrait_pp_humansegv2_lite_256x144_inference_model_with_softmax/deploy.yaml \\\n", " --img_path data/images/portrait_heng.jpg \\\n", " --bg_img_path data/images/bg_2.jpg \\\n", " --save_dir data/images_result/portrait_heng_v2_withbg.jpg\n", "\n", "!python src/seg_demo.py \\\n", " --config inference_models/portrait_pp_humansegv2_lite_256x144_inference_model_with_softmax/deploy.yaml \\\n", " --img_path data/images/portrait_shu.jpg \\\n", " --bg_img_path data/images/bg_1.jpg \\\n", " --save_dir data/images_result/portrait_shu_v2_withbg.jpg \\\n", " --vertical_screen" ] }, { "cell_type": "markdown", "id": "bd5a6d38-a18d-4bb2-ae0d-9397ba7c172e", "metadata": {}, "source": [ "结果保存在`data/images_result/portrait_heng_v2.jpg`(如下图)。\n", "\n", " \n" ] }, { "cell_type": "markdown", "id": "f6aeb78d-63ff-4e01-aa50-da37522e0b08", "metadata": {}, "source": [ "### 3.2 模型训练\n" ] }, { "cell_type": "markdown", "id": "4da9e355-e99a-4821-9fa2-1571a7b84f97", "metadata": {}, "source": [ "* 准备\n", "\n", "参考前文,安装PaddleSeg、下载数据集,然后下载预训练权重。" ] }, { "cell_type": "code", "execution_count": null, "id": "5945d193-bae0-4c5f-bd43-59e205b0484a", "metadata": { "scrolled": true, "tags": [] }, "outputs": [], "source": [ "!python src/download_pretrained_models.py" ] }, { "cell_type": "markdown", "id": "d8ba1500-8475-40e0-a257-9ddad436c14d", "metadata": {}, "source": [ "* 训练\n", "\n", "配置文件保存在`./configs`目录下,如下。配置文件中,已经通过`pretrained`设置好预训练权重的路径。\n", "```\n", "configs\n", "├── human_pp_humansegv1_lite.yml\n", "├── human_pp_humansegv2_lite.yml\n", "├── human_pp_humansegv1_mobile.yml\n", "├── human_pp_humansegv2_mobile.yml\n", "├── human_pp_humansegv1_server.yml\n", "├── portrait_pp_humansegv1_lite.yml\n", "├── portrait_pp_humansegv2_lite.yml\n", "```\n", "\n", "执行如下命令,进行模型微调(大家需要根据实际情况修改配置文件中的超参)。模型训练的详细文档,请参考[链接](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/docs/train/train_cn.md)。" ] }, { "cell_type": "code", "execution_count": null, "id": "14461b15-5c88-453a-b877-739ddf0062ce", "metadata": { "scrolled": true, "tags": [] }, "outputs": [], "source": [ "!export CUDA_VISIBLE_DEVICES=0 # Linux下设置1张可用的卡\n", "# set CUDA_VISIBLE_DEVICES=0 # Windows下设置1张可用的卡\n", "!python ../../train.py \\\n", " --config configs/human_pp_humansegv2_lite.yml \\\n", " --save_dir output/human_pp_humansegv2_lite \\\n", " --save_interval 100 --do_eval --use_vdl" ] }, { "cell_type": "markdown", "id": "6c44f772-2104-4379-bd66-d6f9b8d72414", "metadata": {}, "source": [ "* 评估\n", "\n", "执行如下命令,加载模型和训练好的权重,进行模型评估,输出验证集上的评估精度。模型评估的详细文档,请参考[链接](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/docs/evaluation/evaluate/evaluate_cn.md)。" ] }, { "cell_type": "code", "execution_count": null, "id": "9ff7c87f-62b4-4377-a7fd-260b8278e6c4", "metadata": { "scrolled": true }, "outputs": [], "source": [ "!python ../../val.py \\\n", " --config configs/human_pp_humansegv2_lite.yml \\\n", " --model_path pretrained_models/human_pp_humansegv2_lite_192x192_pretrained/model.pdparams" ] }, { "cell_type": "markdown", "id": "e346d381-15df-421f-8473-9e96e39bbd0c", "metadata": {}, "source": [ "* 预测\n", "\n", "执行如下命令,加载模型和训练好的权重,对单张图像进行预测,预测结果保存在`./data/images_result`目录下的`added_prediction`和`pseudo_color_prediction`文件夹中。" ] }, { "cell_type": "code", "execution_count": null, "id": "b5b53557-9dbe-472d-8406-c60c2c58927d", "metadata": { "scrolled": true, "tags": [] }, "outputs": [], "source": [ "!python ../../predict.py \\\n", " --config configs/human_pp_humansegv2_lite.yml \\\n", " --model_path pretrained_models/human_pp_humansegv2_lite_192x192_pretrained/model.pdparams \\\n", " --image_path data/images/human.jpg \\\n", " --save_dir ./data/images_result" ] }, { "cell_type": "markdown", "id": "d98d8455-6e84-493a-b823-428aab76e932", "metadata": {}, "source": [ "* 导出\n", "\n", "执行如下命令,加载模型和训练好的权重,导出预测模型。模型导出的详细文档,请参考[链接](https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.6/docs/model_export_cn.md)。" ] }, { "cell_type": "code", "execution_count": null, "id": "8fe76c54-aafe-43e0-a0cc-b8feb49bb770", "metadata": { "scrolled": true }, "outputs": [], "source": [ "!python ../../export.py \\\n", " --config configs/human_pp_humansegv2_lite.yml \\\n", " --model_path pretrained_models/human_pp_humansegv2_lite_192x192_pretrained/model.pdparams \\\n", " --save_dir output/human_pp_humansegv2_lite \\\n", " --without_argmax \\\n", " --with_softmax" ] }, { "cell_type": "markdown", "id": "629a50ee-3e33-42a5-b505-aacc4ccd220f", "metadata": {}, "source": [ "注意,使用--without_argmax --with_softmax参数,则模型导出的时候,模型最后面不会添加Argmax算子,而是添加Softmax算子。 所以,输出是浮点数类型,表示前景的概率,使得图像融合的边缘更为平滑。" ] }, { "cell_type": "markdown", "id": "6a6fd1d6-a7a5-4388-b225-86b62ab72d67", "metadata": {}, "source": [ "## 4. 模型原理\n", "\n", "模型结构如下图。\n", "
\n", "\n", "
\n", "\n", "* 模型算量大幅减小\n", "\n", "对于模型Encoder部分,我们选用MobileNetV3作为骨干网络提取多层特征,分析发现MobileNetV3的参数主要集中在最后一个Stage,在不影响分割精度的前提下,我们只保留MobileNetV3的前四个Stage,成功减少了68.6%的参数量。对于上下文部分,我们使用PP-LiteSeg模型中提出的轻量级SPPM模块,而且其中的普通卷积都替换为可分离卷积,进一步减小计算量。对于Decoder部分,我们设计三个Fusion融合模块,多次融合深层语义特征和浅层细节特征,最后一个Fusion融合模块再次汇集不同层次的特征图,输出分割结果。\n", "\n", "多层次特征融合模块图:\n", "
\n", "\n", "
\n", "\n", "* 使用两阶段训练方式,提升分割精度\n", "\n", "两阶段训练是基于迁移学习的思想,首先在大规模混合人像数据集(数据量100k+)上训练,然后使用该预训练权重,在PP-HumanSeg14k数据集(数据量14k)上训练,最终得到训练好的模型。使用两阶段训练方式,可以充分利用其他数据集,提高模型的分割精度和泛化能力。\n", "\n", "* 调整图像分辨率,提升推理速度\n", "\n", "调整图像分辨率也直接影响模型的推理速度,我们使用多种图像分辨率进行训练和测试,在PP-HumanSeg v2方案中选择最佳图像分辨率,进一步提升了模型推理速度。\n", "\n", "* 使用形态学后处理,提升可视化效果\n", "\n", "首先获取原始预测图像I,然后使用阈值处理、图像腐蚀、图像膨胀等操作得到掩码图像M,最后预测图像I和掩码图像M相乘,输出最终预测图像O。\n" ] }, { "cell_type": "markdown", "id": "dc360fb0-fb54-4266-a4e3-4ae385d43a76", "metadata": {}, "source": [ "## 5. 相关论文以及引用信息\n", "如果我们的项目在学术上帮助到你,请考虑以下引用:\n", "\n", "```\n", "@InProceedings{Chu_2022_WACV,\n", " author = {Chu, Lutao and Liu, Yi and Wu, Zewu and Tang, Shiyu and Chen, Guowei and Hao, Yuying and Peng, Juncai and Yu, Zhiliang and Chen, Zeyu and Lai, Baohua and Xiong, Haoyi},\n", " title = {PP-HumanSeg: Connectivity-Aware Portrait Segmentation With a Large-Scale Teleconferencing Video Dataset},\n", " booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops},\n", " month = {January},\n", " year = {2022},\n", " pages = {202-209}\n", "}\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "py35-paddle1.2.0" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 5 }