{ "cells": [ { "cell_type": "markdown", "metadata": { "jupyter": { "outputs_hidden": false } }, "source": [ "## 1. PP-ShiTuV2模型简介\n", "PP-ShiTuV2 是基于 PP-ShiTuV1 改进的一个实用轻量级通用图像识别系统,由主体检测、特征提取、向量检索三个模块构成,相比 PP-ShiTuV1 具有更高的识别精度、更强的泛化能力以及相近的推理速度*。主要针对训练数据集、特征提取两个部分进行优化,使用了更优的骨干网络、损失函数与训练策略,使得 PP-ShiTuV2 在多个实际应用场景上的检索性能有显著提升。\n", "\n", "PP-ShiTuV2模型由飞桨官方出品,是PaddleClas优化和改进的识别检索模型。 更多关于PaddleClas可以点击 https://github.com/PaddlePaddle/PaddleClas 进行了解。\n", "\n", "## 2. 模型效果及应用场景\n", "### 2.1 商品识别任务:\n", "\n", "#### 2.1.1 数据集:\n", "\n", "PP-ShiTuV2的训练数据集以 Aliproduct、GLDv2等数据集为主,详细信息可参考 [PP-ShiTuV2 实验部分](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/training/PP-ShiTu/feature_extraction.md#4-%E5%AE%9E%E9%AA%8C%E9%83%A8%E5%88%86)\n", "\n", "#### 2.1.2 模型效果速览:\n", "\n", "PP-ShiTuV2 在图片上的检测效果如下\n", "\n", "![](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/images/recognition/drink_data_demo/output/100.jpeg?raw=true)\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. 模型如何使用\n", "\n", "### 3.1 模型推理:\n", "\n", "- 下载 PaddleClas" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "execution": { "iopub.execute_input": "2022-11-08T08:24:16.514016Z", "iopub.status.busy": "2022-11-08T08:24:16.513368Z", "iopub.status.idle": "2022-11-08T08:25:00.630629Z", "shell.execute_reply": "2022-11-08T08:25:00.629113Z", "shell.execute_reply.started": "2022-11-08T08:24:16.513971Z" }, "jupyter": { "outputs_hidden": true }, "scrolled": true, "tags": [] }, "outputs": [], "source": [ "# 不在Jupyter Notebook上运行时需要将含 \"!\" 和 \"%\" 的语句注释,不需要运行。\n", "%cd ~/work\n", "\n", "# 克隆 PaddleClas(gitee上克隆速度较快)\n", "!git clone https://gitee.com/paddlepaddle/PaddleClas" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- 安装 PaddleClas 及其依赖包" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "execution": { "iopub.execute_input": "2022-11-08T08:26:02.622321Z", "iopub.status.busy": "2022-11-08T08:26:02.621656Z", "iopub.status.idle": "2022-11-08T08:26:05.016413Z", "shell.execute_reply": "2022-11-08T08:26:05.015052Z", "shell.execute_reply.started": "2022-11-08T08:26:02.622277Z" }, "jupyter": { "outputs_hidden": true }, "scrolled": true, "tags": [] }, "outputs": [], "source": [ "# 进入 PaddleClas 目录\n", "%cd ~/work/PaddleClas/\n", "\n", "# 安装所需依赖项\n", "!pip install -r requirements.txt\n", "\n", "# 设置GPU\n", "# %env CUDA_VISIBLE_DEVICES=0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- 快速体验\n", "\n", "恭喜! 您已经成功安装了 PaddleClas,接下来快速体验图像识别效果" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2022-11-08T08:26:11.091828Z", "iopub.status.busy": "2022-11-08T08:26:11.090376Z", "iopub.status.idle": "2022-11-08T08:29:06.202735Z", "shell.execute_reply": "2022-11-08T08:29:06.201197Z", "shell.execute_reply.started": "2022-11-08T08:26:11.091754Z" }, "scrolled": true, "tags": [] }, "outputs": [], "source": [ "# 进入 PaddleClas 目录\n", "%cd ~/work/PaddleClas/\n", "\n", "# 创建存放主体检测、特征提取推理模型的文件夹\n", "%mkdir -p deploy/models\n", "\n", "# 进入该文件夹\n", "%cd deploy/models\n", "\n", "# 下载主体检测inference模型并解压\n", "!wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar && tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar\n", "\n", "# 下载特征提取inference模型并解压\n", "!wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar && tar -xf general_PPLCNetV2_base_pretrained_v1.0_infer.tar\n", "\n", "# 返回至deploy文件夹\n", "%cd ~/work/PaddleClas/deploy/\n", "\n", "# 下载测试数据 drink_dataset_v2.0 并解压\n", "!wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar && tar -xf drink_dataset_v2.0.tar" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": { "iopub.execute_input": "2022-11-08T08:31:30.364422Z", "iopub.status.busy": "2022-11-08T08:31:30.363351Z", "iopub.status.idle": "2022-11-08T08:31:37.682006Z", "shell.execute_reply": "2022-11-08T08:31:37.680563Z", "shell.execute_reply.started": "2022-11-08T08:31:30.364378Z" }, "scrolled": true, "tags": [] }, "outputs": [], "source": [ "# 进入 PaddleClas 目录\n", "%cd ~/work/PaddleClas/\n", "\n", "# 进入deploy文件夹\n", "%cd ./deploy\n", "\n", "# 对 100.jpeg 图片进行识别推理\n", "!python python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs=\"./drink_dataset_v2.0/test_images/100.jpeg\" -o Global.use_gpu=False" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "同时识别的结果(带有检测框、对应类别以及相似度)会保存至 `PaddleClas/deploy/output/100.jpeg`,如文档开头的 [2.1.2 模型效果速览](#212-模型效果速览) 所展示。\n", "\n", "![](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/images/recognition/drink_data_demo/output/100.jpeg?raw=true)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2 模型训练\n", "\n", "- 克隆 PaddleClas 仓库(参考 3.1模型推理 - 下载PaddleClas)\n", "- 主体检测模型的数据集准备、开始训练、模型评估等步骤,请参考 [PP-ShiTu 主体检测 文档](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/training/PP-ShiTu/mainbody_detection.md)\n", "- 特征提取模型的数据集准备、开始训练、模型评估等步骤,请参考 [PP-ShiTu 特征提取 文档](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/training/PP-ShiTu/feature_extraction.md#5-%E8%87%AA%E5%AE%9A%E4%B9%89%E7%89%B9%E5%BE%81%E6%8F%90%E5%8F%96)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. 模型原理\n", "PP-ShiTu 系列识别系统,包括本文档介绍的 PP-ShiTuV2,均由3个模块串联完成整个识别过程,如下图所示\n", "\n", "![PP-ShiTu系统](https://github.com/PaddlePaddle/PaddleClas/raw/develop/docs/images/structure.jpg)\n", "\n", "- 主体检测:上图中的蓝色模块,主要负责检测出用户输入图片中可能的识别目标,进而裁剪出这些目标,过滤不重要的背景,减少背景的干扰。事实上这种保留主体,过滤背景的做法是实践中会采用的一种简单而有效的方法。\n", "- 特征提取:接收 **主体检测** 模块输出的含有目标主体的裁剪后的图片,将其输入到特征提取模型中,得到对应的特征向量,作为该图片的表示特征用于接下来的检索步骤。\n", "- 向量检索:接收 **特征提取** 模块输出的一个或多个特征向量,逐个地在向量库中检索,将检索库中最邻近(一般以相似度表示邻近程度)的向量的类别,作为检索向量的类别,最后返回检索结果。该模块不需要额外训练,安装第三方开源的faiss检索库即可使用\n", "\n", "在检索系统中,最重要的模块之一就是特征提取模型,其特征提取能力好坏直接影响检索库内向量和待检索向量的质量,因此接下来分5个部分,重点介绍 PP-ShiTuV2 所使用的特征提取模型。\n", "\n", "- Backbone\n", "\n", " Backbone 部分采用了 PP-LCNetV2_base,其在 PPLCNet_V1 的基础上,加入了包括Rep 策略、PW 卷积、Shortcut、激活函数改进、SE 模块改进等多个优化点,使得最终分类精度与 PPLCNet_x2_5 相近,且推理延时减少了40%\\*。在实验过程中我们对 PPLCNetV2_base 进行了适当的改进,在保持速度基本不变的情况下,让其在识别任务中得到更高的性能,包括:去掉 PPLCNetV2_base 末尾的 ReLU 和 FC、将最后一个 stage(RepDepthwiseSeparable) 的 stride 改为1。\n", "\n", " **注**: \\*推理环境基于 Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz 硬件平台,OpenVINO 推理平台。\n", "\n", "- Neck\n", "\n", " Neck 部分采用了 BN Neck,对 Backbone 抽取得到的特征的每个维度进行标准化操作,减少了同时优化度量学习损失函数和分类损失函数的难度,加快收敛速度的同时减少 IDLoss 和 TripletLoss 之间由于优化目标所在空间不同带来的影响。\n", "\n", "- Head\n", "\n", " Head 部分选用 FC Layer,使用分类头将 feature 转换成 logits 供后续计算分类损失(一般使用交叉熵损失,称之为CELoss或IDLoss)。\n", "\n", "- Loss\n", "\n", " Loss 部分选用 Cross entropy loss 和 TripletAngularMarginLoss,在训练时以分类损失和基于角度的三元组损失来指导网络进行优化。我们基于原始的 TripletLoss (困难三元组损失)进行了改进,将优化目标从 L2 欧几里得空间更换成余弦空间,并加入了 anchor 与 positive/negtive 之间的硬性距离约束,让训练与测试的目标更加接近,提升模型的泛化能力。详细的配置文件见 [GeneralRecognitionV2_PPLCNetV2_base.yaml](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L63-L77)。\n", "\n", "- Data Augmentation\n", "\n", " 我们考虑到实际相机拍摄时目标主体可能出现一定的旋转而不一定能保持正立状态,因此我们在数据增强中加入了适当的 [随机旋转增强](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L117),以提升模型在真实场景中的检索能力。\n", "\n", "## 5. 注意事项\n", "PP-ShiTuV2 是在寻找在产业实践中最高性价比的图像识别方案,但考虑到不同识别场景的数据集均有各自的分布特点,以及训练时的软硬件限制,无法一次性将所有的数据集全部纳入到训练集中,经过权衡才使用了目前这套训练数据集的组合。因此推荐用户在了解自己实际业务数据集的特点之后,基于 PP-ShiTuV2 的预训练模型以及训练配置,在自己的业务数据集上进行微调甚至二次开发,以获得性能更好,更适配自己数据集的识别模型。\n", "\n", "## 6. 相关论文以及引用信息\n", "```log\n", "@article{cui2021pp,\n", " title={PP-LCNet: A Lightweight CPU Convolutional Neural Network},\n", " author={Cui, Cheng and Gao, Tingquan and Wei, Shengyu and Du, Yuning and Guo, Ruoyu and Dong, Shuilong and Lu, Bin and Zhou, Ying and Lv, Xueying and Liu, Qiwen and others},\n", " journal={arXiv preprint arXiv:2109.15099},\n", " year={2021}\n", "}\n", "\n", "@InProceedings{Luo_2019_CVPR_Workshops,\n", " author = {Luo, Hao and Gu, Youzhi and Liao, Xingyu and Lai, Shenqi and Jiang, Wei},\n", " title = {Bag of Tricks and a Strong Baseline for Deep Person Re-Identification},\n", " booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},\n", " month = {June},\n", " year = {2019}\n", "}\n", "\n", "@ARTICLE{Luo_2019_Strong_TMM,\n", " author={H. {Luo} and W. {Jiang} and Y. {Gu} and F. {Liu} and X. {Liao} and S. {Lai} and J. {Gu}},\n", " journal={IEEE Transactions on Multimedia},\n", " title={A Strong Baseline and Batch Normalization Neck for Deep Person Re-identification},\n", " year={2019},\n", " pages={1-1},\n", " doi={10.1109/TMM.2019.2958756},\n", " ISSN={1941-0077},\n", "}\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "请点击[此处](https://ai.baidu.com/docs#/AIStudio_Project_Notebook/a38e5576)查看本环境基本用法.
\n", "Please click [here ](https://ai.baidu.com/docs#/AIStudio_Project_Notebook/a38e5576) for more detailed instructions. " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "py35-paddle1.2.0" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 4 }