diff --git a/modelcenter/PP-ShiTu/benchmark_cn.md b/modelcenter/PP-ShiTu/benchmark_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..7f59f450204f4adacc3da16a4f2d7ffd4efefdc1 --- /dev/null +++ b/modelcenter/PP-ShiTu/benchmark_cn.md @@ -0,0 +1,53 @@ +## 1. Training Benchmark + +### 1.1 Software and hardware environment + +* The PP-ShiTu feature extraction model is trained on 8 GPUs with a batch size of 256 per GPU. If you train with a different number of GPUs or a different batch size, refer to the FAQ to adjust the learning rate and the number of iterations. + +* The PP-ShiTu mainbody detection model is trained on 8 GPUs with a batch size of 28 per GPU. If you train with a different number of GPUs or a different batch size, refer to the FAQ to adjust the learning rate and the number of iterations. + +### 1.2 Dataset +The feature extraction model extends and refines the original training data, and is finally trained on the combination of the following 7 public datasets: + +| Dataset | Images | Classes | Scenario | Dataset link | +| :----------: | :-----: | :------: | :------: | :--------------------------------------------------------------------------: | +| Aliproduct | 2498771 | 50030 | Products | [link](https://retailvisionworkshop.github.io/recognition_challenge_2020/) | +| GLDv2 | 1580470 | 81313 | Landmarks | [link](https://github.com/cvdfoundation/google-landmark) | +| VeRI-Wild | 277797 | 30671 | Vehicles | [link](https://github.com/PKU-IMRE/VERI-Wild) | +| LogoDet-3K | 155427 | 3000 | Logos | [link](https://github.com/Wangjing1551/LogoDet-3K-Dataset) | +| iCartoonFace | 389678 | 5013 | Anime characters | [link](http://challenge.ai.iqiyi.com/detail?raceId=5def69ace9fcf68aef76a75d) | +| SOP | 59551 | 11318 | Products | [link](https://cvgl.stanford.edu/projects/lifted_struct/) | +| Inshop | 25882 | 3997 | Products | [link](http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html) | +| **Total** | **5M** | **185K** | ---- | ---- | + +For the mainbody detection model's dataset, see [Mainbody detection dataset](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/zh_CN/image_recognition_pipeline/mainbody_detection.md#1-%E6%95%B0%E6%8D%AE%E9%9B%86) + +### 1.3 Metrics + +| Model name | Role | Model size | Input size | ips | +| ------------------------------------ | -------- | ---------- | -------- | --- | +| picodet_lcnet_x2_5_640_mainbody.yml | Mainbody detection | 30MB (quantized) | 640 | 21 | +| GeneralRecognition_PPLCNet_x2_5.yaml | Feature extraction | 34MB (quantized) | 224 | 200 | + + +## 2. 
Inference Benchmark + +### 2.1 Software and hardware environment + +* The inference speed of the PP-ShiTu mainbody detection and feature extraction models is measured on CPU with MKLDNN enabled, 10 threads, and batch size = 1. + + +### 2.2 Dataset + +The PP-ShiTu feature extraction model is evaluated on a self-built product dataset. + +### 2.3 Results + +| Model | Storage (mainbody detection + feature extraction) | product | +| :------- | :---------------------- | :------- | +| | | recall@1 | +| PP-ShiTu | 64(30+34)MB | 66.8% | + + +## 3. Related documentation +See: https://github.com/PaddlePaddle/PaddleClas/tree/release/2.4/docs/zh_CN/image_recognition_pipeline diff --git a/modelcenter/PP-ShiTu/benchmark_en.md b/modelcenter/PP-ShiTu/benchmark_en.md new file mode 100644 index 0000000000000000000000000000000000000000..59357d20a703892e972e910c4abfee3aeabaae21 --- /dev/null +++ b/modelcenter/PP-ShiTu/benchmark_en.md @@ -0,0 +1,54 @@ +## 1. Training Benchmark + +### 1.1 Software and hardware environment + +* The PP-ShiTu feature extraction model is trained on 8 GPUs with a batch size of 256 per GPU. If you train with a different number of GPUs or a different batch size, refer to the FAQ to adjust the learning rate and the number of iterations. + +* The PP-ShiTu mainbody detection model is trained on 8 GPUs with a batch size of 28 per GPU. If you train with a different number of GPUs or a different batch size, refer to the FAQ to adjust the learning rate and the number of iterations.
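The FAQ referenced above is not reproduced here. The sketch below shows the kind of adjustment it describes. It assumes the common *linear scaling rule*: the learning rate scales with the total batch size, and the iteration count scales inversely so the number of epochs stays roughly the same. The rule, the `Schedule`/`scale_schedule` names, and the base learning rate and iteration values are illustrative assumptions, not PaddleClas settings. Only the 8 GPUs × 256 images per GPU reference setup comes from the text above.

```python
"""Sketch: rescale the LR and iteration count when the training setup changes.

Assumption: the linear scaling rule, a common heuristic that is not
necessarily what the PaddleClas FAQ prescribes.
"""
from dataclasses import dataclass


@dataclass
class Schedule:
    learning_rate: float
    iterations: int


def scale_schedule(base: Schedule,
                   base_gpus: int,
                   base_batch_per_gpu: int,
                   gpus: int,
                   batch_per_gpu: int) -> Schedule:
    """Rescale a reference schedule for a new total batch size.

    The LR grows linearly with the total batch size; the iteration count
    shrinks by the same factor so the number of epochs stays about the same.
    """
    base_total = base_gpus * base_batch_per_gpu
    new_total = gpus * batch_per_gpu
    ratio = new_total / base_total
    return Schedule(
        learning_rate=base.learning_rate * ratio,
        iterations=max(1, round(base.iterations / ratio)),
    )


if __name__ == "__main__":
    # Hypothetical reference values; take the real ones from your config file.
    reference = Schedule(learning_rate=0.04, iterations=100_000)
    # Reference setup from the text: 8 GPUs x 256 images per GPU.
    # Here we halve the number of GPUs.
    print(scale_schedule(reference, base_gpus=8, base_batch_per_gpu=256,
                         gpus=4, batch_per_gpu=256))
```

Treat the linear rule as a starting point only. Models trained with metric-learning samplers can be sensitive to batch composition as well as batch size.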
+ +### 1.2 Dataset + +The feature extraction model extends and refines the original training data, and is finally trained on the combination of the following 7 public datasets: + +| Dataset | Images | Classes | Scenario | Dataset link | +| :----------- | :---------: | :---------------: | :------: | :--------------------------------------------------------------------------: | +| Aliproduct | 2498771 | 50030 | products | [link](https://retailvisionworkshop.github.io/recognition_challenge_2020/) | +| GLDv2 | 1580470 | 81313 | landmarks | [link](https://github.com/cvdfoundation/google-landmark) | +| VeRI-Wild | 277797 | 30671 | vehicles | [link](https://github.com/PKU-IMRE/VERI-Wild) | +| LogoDet-3K | 155427 | 3000 | logos | [link](https://github.com/Wangjing1551/LogoDet-3K-Dataset) | +| iCartoonFace | 389678 | 5013 | anime characters | [link](http://challenge.ai.iqiyi.com/detail?raceId=5def69ace9fcf68aef76a75d) | +| SOP | 59551 | 11318 | products | [link](https://cvgl.stanford.edu/projects/lifted_struct/) | +| Inshop | 25882 | 3997 | products | [link](http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html) | +| **Total** | **5M** | **185K** | - | - | + + +For the mainbody detection model's dataset, see [mainbody detection dataset](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.5/docs/zh_CN/training/PP-ShiTu/mainbody_detection.md#1-%E6%95%B0%E6%8D%AE%E9%9B%86) + +### 1.3 Metrics + +| Model name | Role | Model size | Input size | ips | +| ------------------------------------ | ------------------ | ------------ | ---------------- | --- | +| picodet_lcnet_x2_5_640_mainbody.yml | Mainbody detection | 30MB (quantized) | 640 | 21 | +| GeneralRecognition_PPLCNet_x2_5.yaml | Feature extraction | 34MB (quantized) | 224 | 200 | + + +## 2. Inference Benchmark + +### 2.1 Software and hardware environment + +* The inference speed of the PP-ShiTu mainbody detection and feature extraction models is measured on CPU with MKLDNN enabled, 10 threads, and batch size = 1.
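The benchmark does not say which tool produced the speed numbers. The following is a minimal sketch of one way to measure batch-size-1 latency and throughput for any predictor callable, with warm-up runs excluded from timing. `benchmark_latency` and the `predict` callable are hypothetical names. The sketch assumes that MKLDNN and the 10-thread CPU setting are configured when the predictor itself is created, not in this harness.

```python
"""Sketch: measure batch-size-1 latency and throughput of a predictor.

Assumptions: `predict` is any callable that runs one image through the
model; engine settings (MKLDNN, CPU thread count) are configured when that
predictor is built. All names here are illustrative.
"""
import statistics
import time
from typing import Callable, Dict, Sequence


def benchmark_latency(predict: Callable[[object], object],
                      inputs: Sequence[object],
                      warmup: int = 10,
                      clock: Callable[[], float] = time.perf_counter
                      ) -> Dict[str, float]:
    """Return mean/median latency in ms and throughput in images/s."""
    if not inputs:
        raise ValueError("need at least one input")
    # Warm-up runs let the engine allocate memory and select kernels;
    # they are excluded from the measurement.
    for i in range(warmup):
        predict(inputs[i % len(inputs)])

    latencies = []
    for x in inputs:  # batch size = 1: one image per call
        start = clock()
        predict(x)
        latencies.append(clock() - start)

    mean_s = statistics.mean(latencies)
    return {
        "mean_ms": mean_s * 1000.0,
        "median_ms": statistics.median(latencies) * 1000.0,
        # With batch size 1, throughput is simply the inverse of mean latency.
        "images_per_s": 1.0 / mean_s if mean_s > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Dummy predictor standing in for the real detection + extraction models.
    def fake_predict(image):
        return sum(range(10_000))

    print(benchmark_latency(fake_predict, inputs=list(range(50))))
```

The `clock` parameter is injectable so the harness can be tested with a fake timer. The median is reported alongside the mean because it is less sensitive to occasional OS scheduling spikes.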
+ +### 2.2 Dataset + +The PP-ShiTu feature extraction model is evaluated on a self-built product dataset. + +### 2.3 Results + +| Model | Storage (mainbody detection + feature extraction) | product | +| :------- | :----------------------------------------------- | :------- | +| | | recall@1 | +| PP-ShiTu | 64(30+34)MB | 66.8% | + + +## 3. Related documentation +See: https://github.com/PaddlePaddle/PaddleClas/tree/release/2.4/docs/zh_CN/image_recognition_pipeline diff --git a/modelcenter/PP-ShiTu/download_cn.md b/modelcenter/PP-ShiTu/download_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..65ce2afffcb4eaef84293c98b1fbe3917580a50b --- /dev/null +++ b/modelcenter/PP-ShiTu/download_cn.md @@ -0,0 +1,5 @@ +# Supported task scenarios, inference models, and pretrained model files: +| Model name | Role | Model size | Input size | Download | +| ----------------------------------- | -------- | -------- | -------- | --------------------------- | +| picodet_lcnet_x2_5_640_mainbody | Mainbody detection | 30MB | 640 | [inference model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar)/[pretrained model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar) | +| GeneralRecognition_PPLCNet_x2_5 | Feature extraction | 19MB | 224 | [inference model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar)/[pretrained model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/general_PPLCNet_x2_5_pretrained_v1.0.pdparams) | diff --git a/modelcenter/PP-ShiTu/download_en.md b/modelcenter/PP-ShiTu/download_en.md new file mode 100644 index 0000000000000000000000000000000000000000..b5e43e72b327717018c9cde0e6257dbcf3f66dde --- /dev/null +++ b/modelcenter/PP-ShiTu/download_en.md @@ -0,0 +1,5 @@ +# Pretrained and inference models: +| Model name | Role | Storage | Input size (inference) | Download link | +|
----------------------------------- | ------------------ | ------- | --------------------- | ---------------------------- | +| picodet_lcnet_x2_5_640_mainbody | mainbody detection | 30MB | 640 | [inference model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar)/[pretrained model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar) | +| GeneralRecognition_PPLCNet_x2_5 | feature extraction | 19MB | 224 | [inference model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar)/[pretrained model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/general_PPLCNet_x2_5_pretrained_v1.0.pdparams) | diff --git a/modelcenter/PP-ShiTu/info.yaml b/modelcenter/PP-ShiTu/info.yaml new file mode 100644 index 0000000000000000000000000000000000000000..64a816e46b4c8eeec2b6548396b798c040ce9f04 --- /dev/null +++ b/modelcenter/PP-ShiTu/info.yaml @@ -0,0 +1,25 @@ +--- +Model_Info: + name: "PP-ShiTu" + description: "PP-ShiTu识别系统" + description_en: "PP-ShiTu, a lightweight recognition system" + icon: "@To be stored on BOS once the unified UE design is finalized" + from_repo: "PaddleClas" +Task: +- tag_en: "Computer Vision" + tag: "计算机视觉" + sub_tag_en: "Characteristics of Product Image" + sub_tag: "商品图片特征" +Example: +- tag_en: "Intelligent Retail" + tag: "智慧零售" + sub_tag_en: "Commodity identification" + title: "生鲜产品自主结算" + sub_tag: "商品识别" + url: "https://aistudio.baidu.com/aistudio/projectdetail/4003418?channelType=0&channel=0" +Datasets: "Aliproduct, GLDv2, VeRI, LogoDet, iCartoonFace, SOP, Inshop" +Pulisher: "Baidu" +License: "apache.2.0" +Paper: "" +IfTraining: 1 +IfOnlineDemo: 1 diff --git a/modelcenter/PP-ShiTu/introduction_cn.ipynb b/modelcenter/PP-ShiTu/introduction_cn.ipynb new file mode 100644 index
0000000000000000000000000000000000000000..38804b713eef92741ffd215cc9396b69d30b0ac2 --- /dev/null +++ b/modelcenter/PP-ShiTu/introduction_cn.ipynb @@ -0,0 +1,288 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "jupyter": { + "outputs_hidden": false + } + }, + "source": [ + "## 1. Introduction to PP-ShiTu\n", + "PP-ShiTu is a practical, lightweight, general-purpose image recognition system made up of three modules: mainbody detection, feature learning, and vector retrieval. The model of each module is optimized with a range of strategies covering 8 aspects: backbone selection and adjustment, loss function selection, data augmentation, learning rate schedule, regularization parameter selection, use of pretrained models, and model pruning and quantization. The resulting system can recognize an image against a gallery of more than 100,000 images in only 0.2 s on CPU.\n", + "For more details, see the [PP-ShiTu technical report](https://arxiv.org/pdf/2111.00775.pdf).\n", + "\n", + "To learn more about PaddleClas, visit https://github.com/PaddlePaddle/PaddleClas.\n", + "\n", + "## 2. Results and application scenarios\n", + "### 2.1 Product recognition\n", + "\n", + "#### 2.1.1 Dataset\n", + "\n", + "PP-ShiTu's training and test sets are built from 7 datasets, including Aliproduct and GLDv2. For details, see [PP-ShiTu experiments](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/zh_CN/image_recognition_pipeline/feature_extraction.md#4-%E5%AE%9E%E9%AA%8C%E9%83%A8%E5%88%86)\n", + "\n", + "#### 2.1.2 Output preview\n", + "\n", + "The recognition result of PP-ShiTu on a sample image is shown below:\n", + "\n", + "![](https://github.com/PaddlePaddle/PaddleClas/raw/release/2.4/docs/images/recognition/drink_data_demo/output/nongfu_spring.jpeg)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. 
How to use\n", + "\n", + "### 3.1 Model inference\n", + "\n", + "- Download PaddleClas" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true, + "execution": { + "iopub.execute_input": "2022-11-08T08:24:16.514016Z", + "iopub.status.busy": "2022-11-08T08:24:16.513368Z", + "iopub.status.idle": "2022-11-08T08:25:00.630629Z", + "shell.execute_reply": "2022-11-08T08:25:00.629113Z", + "shell.execute_reply.started": "2022-11-08T08:24:16.513971Z" + }, + "jupyter": { + "outputs_hidden": true + }, + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# When running outside Jupyter Notebook, comment out the statements that contain \"!\" or \"%\"; they do not need to be run.\n", + "%cd ~/work\n", + "\n", + "# Clone PaddleClas (cloning from Gitee is usually faster)\n", + "!git clone https://gitee.com/paddlepaddle/PaddleClas" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- Install PaddleClas and its dependencies" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true, + "execution": { + "iopub.execute_input": "2022-11-08T08:26:02.622321Z", + "iopub.status.busy": "2022-11-08T08:26:02.621656Z", + "iopub.status.idle": "2022-11-08T08:26:05.016413Z", + "shell.execute_reply": "2022-11-08T08:26:05.015052Z", + "shell.execute_reply.started": "2022-11-08T08:26:02.622277Z" + }, + "jupyter": { + "outputs_hidden": true + }, + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Enter the PaddleClas directory\n", + "%cd ~/work/PaddleClas/\n", + "\n", + "# Switch to the release/2.4 branch\n", + "!git checkout release/2.4\n", + "\n", + "# Install the required dependencies\n", + "!pip install -r requirements.txt\n", + "\n", + "# Select the GPU\n", + "# %env CUDA_VISIBLE_DEVICES=0" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- Quick start\n", + "\n", + "Congratulations! 
You have successfully installed PaddleClas. Next, try out image recognition." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "execution": { + "iopub.execute_input": "2022-11-08T08:26:11.091828Z", + "iopub.status.busy": "2022-11-08T08:26:11.090376Z", + "iopub.status.idle": "2022-11-08T08:29:06.202735Z", + "shell.execute_reply": "2022-11-08T08:29:06.201197Z", + "shell.execute_reply.started": "2022-11-08T08:26:11.091754Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Enter the PaddleClas directory\n", + "%cd ~/work/PaddleClas/\n", + "\n", + "# Create a folder for the mainbody detection and feature extraction inference models\n", + "%mkdir -p deploy/models\n", + "\n", + "# Enter that folder\n", + "%cd deploy/models\n", + "\n", + "# Download and extract the mainbody detection inference model\n", + "!wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar && tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar\n", + "\n", + "# Download and extract the feature extraction inference model\n", + "!wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar && tar -xf general_PPLCNet_x2_5_lite_v1.0_infer.tar\n", + "\n", + "# Return to the deploy folder\n", + "%cd ~/work/PaddleClas/deploy/\n", + "\n", + "# Download and extract the test data drink_dataset_v1.0\n", + "!wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar && tar -xf drink_dataset_v1.0.tar" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "execution": { + "iopub.execute_input": "2022-11-08T08:31:30.364422Z", + "iopub.status.busy": "2022-11-08T08:31:30.363351Z", + "iopub.status.idle": "2022-11-08T08:31:37.682006Z", + "shell.execute_reply": "2022-11-08T08:31:37.680563Z", + "shell.execute_reply.started": "2022-11-08T08:31:30.364378Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Enter the PaddleClas directory\n", + "%cd ~/work/PaddleClas/\n", + "\n", + "# Enter the deploy folder\n", + "%cd ./deploy\n", + "\n", + "# Use the general_PPLCNet_x2_5_lite_v1.0 
inference model to extract features from the gallery images and build the retrieval index\n", + "!python3.7 python/build_gallery.py -c configs/inference_general.yaml -o Global.rec_inference_model_dir=./models/general_PPLCNet_x2_5_lite_v1.0_infer\n", + "\n", + "# Run recognition on nongfu_spring.jpeg (GPU inference)\n", + "!python3.7 python/predict_system.py -c configs/inference_general.yaml\n", + "# Run recognition on nongfu_spring.jpeg (CPU inference)\n", + "!python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.use_gpu=False" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The recognition result (with detection boxes, predicted class names, and similarity scores) is also saved to `PaddleClas/deploy/output/nongfu_spring.jpeg`\n", + "\n", + "![](https://github.com/PaddlePaddle/PaddleClas/raw/release/2.4/docs/images/recognition/drink_data_demo/output/nongfu_spring.jpeg)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3.2 Model training\n", + "\n", + "- Clone the PaddleClas repository (see 3.1 Model inference - Download PaddleClas) and switch to the release/2.4 branch\n", + "- For dataset preparation, training, and evaluation of the mainbody detection model, see the [PP-ShiTu mainbody detection doc](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/zh_CN/image_recognition_pipeline/mainbody_detection.md)\n", + "- For dataset preparation, training, and evaluation of the feature extraction model, see the [PP-ShiTu feature extraction doc](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/zh_CN/image_recognition_pipeline/feature_extraction.md#51-%E6%95%B0%E6%8D%AE%E5%87%86%E5%A4%87)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. 
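How it works\n", + "The PP-ShiTu family of recognition systems, including the PP-ShiTu described here, performs recognition with 3 modules connected in series, as shown in the figure below\n", + "\n", + "![PP-ShiTu system](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/images/structure.jpg?raw=true)\n", + "\n", + "- Mainbody detection: the blue module in the figure above. It detects the candidate objects in the input image, crops them out, and discards the unimportant background to reduce background interference. Keeping the main object and filtering out the background is a simple and effective technique that is widely used in practice.\n", + "- Feature extraction: takes the cropped images containing the target objects from the **mainbody detection** module and feeds them to the feature extraction model, which outputs a feature vector that represents each image in the subsequent retrieval step.\n", + "- Vector retrieval: takes the one or more feature vectors produced by the **feature extraction** module and searches the vector library for each of them in turn. The class of the nearest vector in the library (closeness is usually measured by similarity) is assigned to the query vector, and the retrieval results are returned. This module needs no training; it only requires the open-source faiss retrieval library.\n", + "\n", + "The feature extraction model is one of the most important modules in a retrieval system: its feature extraction capability directly determines the quality of both the gallery vectors and the query vectors. The following 4 parts therefore describe the feature extraction model used by PP-ShiTu.\n", + "\n", + "- Backbone\n", + " The backbone is PPLCNet_x2_5. PP-LCNet explores several effective structural designs optimized for Intel CPUs, improving accuracy without increasing inference time and clearly outperforming existing lightweight SOTA models.\n", + "\n", + "- Neck\n", + "\n", + " The neck is an FC layer that reduces the dimensionality of the features extracted by the backbone, lowering feature storage cost and computation.\n", + "\n", + "- Head\n", + "\n", + " The head is ArcMargin. During training it adds an angular margin to the target class before classification, which enlarges the angular separation between classes and makes features of the same class more compact, further improving the discriminative power of the extracted features.\n", + "\n", + "- Loss\n", + "\n", + " The loss is cross-entropy loss, so the network is optimized as a classification task during training. For the complete settings, see the general recognition configuration file.\n", + "\n", + "## 5. Notes\n", + "PP-ShiTu aims to be the most cost-effective image recognition solution for industrial use. Datasets from different recognition scenarios each have their own distribution, and training is limited by hardware and software resources, so not every dataset could be included in training at once; the current combination of training datasets is the result of a trade-off. We therefore recommend that you first understand the characteristics of your own business data, then fine-tune or further develop on that data starting from the PP-ShiTu pretrained model and training configuration, to obtain a recognition model that performs better and fits your data.\n", + "\n", + "## 6. 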
References and citation\n", + "```bibtex\n", + "@article{cui2021pp,\n", + " title={PP-LCNet: A Lightweight CPU Convolutional Neural Network},\n", + " author={Cui, Cheng and Gao, Tingquan and Wei, Shengyu and Du, Yuning and Guo, Ruoyu and Dong, Shuilong and Lu, Bin and Zhou, Ying and Lv, Xueying and Liu, Qiwen and others},\n", + " journal={arXiv preprint arXiv:2109.15099},\n", + " year={2021}\n", + "}\n", + "\n", + "@article{wei2021pp,\n", + " title={PP-ShiTu: A Practical Lightweight Image Recognition System},\n", + " author={Wei, Shengyu and Guo, Ruoyu and Cui, Cheng and Lu, Bin and Dong, Shuilong and Gao, Tingquan and Du, Yuning and Zhou, Ying and Lyu, Xueying and Liu, Qiwen and others},\n", + " journal={arXiv preprint arXiv:2111.00775},\n", + " year={2021}\n", + "}\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Click [here](https://ai.baidu.com/docs#/AIStudio_Project_Notebook/a38e5576) for the basic usage of this environment.
\n", + "Please click [here](https://ai.baidu.com/docs#/AIStudio_Project_Notebook/a38e5576) for more detailed instructions." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "py35-paddle1.2.0" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.4" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/modelcenter/PP-ShiTu/introduction_en.ipynb b/modelcenter/PP-ShiTu/introduction_en.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..3f3d3105d9119e5e7e01d8961be93a7cf2400176 --- /dev/null +++ b/modelcenter/PP-ShiTu/introduction_en.ipynb @@ -0,0 +1,293 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "jupyter": { + "outputs_hidden": false + } + }, + "source": [ + "## 1. Introduction to PP-ShiTu\n", + "PP-ShiTu is a practical, lightweight, general-purpose image recognition system that mainly consists of three modules: mainbody detection, feature learning, and vector retrieval. Its models are optimized with a variety of strategies covering 8 aspects: backbone selection and adjustment, loss function selection, data augmentation, learning rate schedule, regularization parameter selection, use of pretrained models, and model pruning and quantization. The resulting system can recognize an image against a gallery of more than 100,000 images in only 0.2 s on CPU.\n", + "For more details, see the [PP-ShiTu technical report](https://arxiv.org/pdf/2111.00775.pdf).\n", + "\n", + "Learn more about PaddleClas at https://github.com/PaddlePaddle/PaddleClas.\n", + "\n", + "## 2. 
Results and application scenarios\n", + "### 2.1 Product recognition\n", + "\n", + "#### 2.1.1 Dataset\n", + "\n", + "PP-ShiTu's training and test sets are built from 7 datasets, including Aliproduct and GLDv2. For details, see [PP-ShiTu experiments](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/zh_CN/image_recognition_pipeline/feature_extraction.md#4-%E5%AE%9E%E9%AA%8C%E9%83%A8%E5%88%86)\n", + "\n", + "#### 2.1.2 Output preview\n", + "\n", + "The recognition result of PP-ShiTu on a sample image is shown below:\n", + "\n", + "![](https://github.com/PaddlePaddle/PaddleClas/raw/release/2.4/docs/images/recognition/drink_data_demo/output/nongfu_spring.jpeg)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. How to use\n", + "\n", + "### 3.1 Model inference\n", + "\n", + "- Download PaddleClas" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true, + "execution": { + "iopub.execute_input": "2022-11-08T08:24:16.514016Z", + "iopub.status.busy": "2022-11-08T08:24:16.513368Z", + "iopub.status.idle": "2022-11-08T08:25:00.630629Z", + "shell.execute_reply": "2022-11-08T08:25:00.629113Z", + "shell.execute_reply.started": "2022-11-08T08:24:16.513971Z" + }, + "jupyter": { + "outputs_hidden": true + }, + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# When running outside Jupyter Notebook, comment out the statements that contain \"!\" or \"%\"; they do not need to be run.\n", + "%cd ~/work\n", + "\n", + "# Clone PaddleClas (cloning from Gitee is usually faster)\n", + "!git clone https://gitee.com/paddlepaddle/PaddleClas" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- Install PaddleClas and its dependencies" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true, + "execution": { + "iopub.execute_input": "2022-11-08T08:26:02.622321Z", + "iopub.status.busy": "2022-11-08T08:26:02.621656Z", + "iopub.status.idle": "2022-11-08T08:26:05.016413Z", + 
"shell.execute_reply": "2022-11-08T08:26:05.015052Z", + "shell.execute_reply.started": "2022-11-08T08:26:02.622277Z" + }, + "jupyter": { + "outputs_hidden": true + }, + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Enter the PaddleClas directory\n", + "%cd ~/work/PaddleClas/\n", + "\n", + "# Switch to the release/2.4 branch\n", + "!git checkout release/2.4\n", + "\n", + "# Install the required dependencies\n", + "!pip install -r requirements.txt\n", + "\n", + "# Select the GPU\n", + "# %env CUDA_VISIBLE_DEVICES=0" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- Quick start\n", + "\n", + "Congratulations! You have successfully installed PaddleClas. Follow the steps below to try out image recognition." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "execution": { + "iopub.execute_input": "2022-11-08T08:26:11.091828Z", + "iopub.status.busy": "2022-11-08T08:26:11.090376Z", + "iopub.status.idle": "2022-11-08T08:29:06.202735Z", + "shell.execute_reply": "2022-11-08T08:29:06.201197Z", + "shell.execute_reply.started": "2022-11-08T08:26:11.091754Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Enter the PaddleClas directory\n", + "%cd ~/work/PaddleClas/\n", + "\n", + "# Create a folder for the mainbody detection and feature extraction inference models\n", + "%mkdir -p deploy/models\n", + "\n", + "# Enter that folder\n", + "%cd deploy/models\n", + "\n", + "# Download and extract the mainbody detection inference model\n", + "!wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar && tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar\n", + "\n", + "# Download and extract the feature extraction inference model\n", + "!wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/general_PPLCNet_x2_5_lite_v1.0_infer.tar && tar -xf general_PPLCNet_x2_5_lite_v1.0_infer.tar\n", + "\n", + "# Return to the deploy folder\n", + "%cd ~/work/PaddleClas/deploy/\n", + "\n", + "# Download and extract the test data drink_dataset_v1.0\n", + "!wget -nc 
https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v1.0.tar && tar -xf drink_dataset_v1.0.tar" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "execution": { + "iopub.execute_input": "2022-11-08T08:31:30.364422Z", + "iopub.status.busy": "2022-11-08T08:31:30.363351Z", + "iopub.status.idle": "2022-11-08T08:31:37.682006Z", + "shell.execute_reply": "2022-11-08T08:31:37.680563Z", + "shell.execute_reply.started": "2022-11-08T08:31:30.364378Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Enter the PaddleClas directory\n", + "%cd ~/work/PaddleClas/\n", + "\n", + "# Enter the deploy folder\n", + "%cd ./deploy\n", + "\n", + "# Use the general_PPLCNet_x2_5_lite_v1.0 inference model to extract features from the gallery images and build the retrieval index\n", + "!python3.7 python/build_gallery.py -c configs/inference_general.yaml -o Global.rec_inference_model_dir=./models/general_PPLCNet_x2_5_lite_v1.0_infer\n", + "\n", + "# Run recognition on nongfu_spring.jpeg (GPU inference)\n", + "!python3.7 python/predict_system.py -c configs/inference_general.yaml\n", + "# Run recognition on nongfu_spring.jpeg (CPU inference)\n", + "!python3.7 python/predict_system.py -c configs/inference_general.yaml -o Global.use_gpu=False" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The recognition result (with detection boxes, predicted class names, and similarity scores) is also saved to `PaddleClas/deploy/output/nongfu_spring.jpeg`, as shown in [2.1.2 output preview](#212-output-preview).\n", + "\n", + "![](https://github.com/PaddlePaddle/PaddleClas/raw/release/2.4/docs/images/recognition/drink_data_demo/output/nongfu_spring.jpeg)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3.2 Model training\n", + "\n", + "- Clone the PaddleClas repository (see [3.1 model inference](#31-model-inference)) and check out the release/2.4 branch\n", + "- For dataset preparation, training, evaluation, and other steps for the mainbody detection model, see 
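[PP-ShiTu mainbody detection doc](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/zh_CN/image_recognition_pipeline/mainbody_detection.md)\n", + "- For dataset preparation, training, evaluation, and other steps for the feature extraction model, see [PP-ShiTu feature extraction doc](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/zh_CN/image_recognition_pipeline/feature_extraction.md#51-%E6%95%B0%E6%8D%AE%E5%87%86%E5%A4%87)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. How it works\n", + "The PP-ShiTu family of recognition systems, including the PP-ShiTu described here, performs recognition with 3 modules connected in series, as shown in the figure below\n", + "\n", + "![PP-ShiTu system](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/images/structure.jpg?raw=true)\n", + "\n", + "- Mainbody detection: the blue module in the figure above. It detects the candidate objects in the input image, crops them out, and discards the unimportant background to reduce background interference. Keeping the main object and filtering out the background is a simple and effective technique that is widely used in practice.\n", + "- Feature extraction: takes the cropped images containing the target objects from the **mainbody detection** module and feeds them to the feature extraction model, which outputs a feature vector that represents each image in the subsequent retrieval step.\n", + "- Vector retrieval: takes the one or more feature vectors produced by the **feature extraction** module and searches the vector library for each of them in turn. The class of the nearest vector in the library (closeness is usually measured by similarity) is assigned to the query vector, and the retrieval results are returned. This module needs no training; it only requires the open-source faiss retrieval library.\n", + "\n", + "The feature extraction model is one of the most important modules in a retrieval system: its feature extraction capability directly determines the quality of both the gallery vectors and the query vectors. The following 4 parts therefore describe the feature extraction model used by PP-ShiTu.\n", + "\n", + "- Backbone\n", + " The backbone is PPLCNet_x2_5. PP-LCNet explores several effective structural designs optimized for Intel CPUs, improving accuracy without increasing inference time and clearly outperforming existing lightweight SOTA models.\n", + "\n", + "- Neck\n", + "\n", + " The neck is an FC layer that reduces the dimensionality of the features extracted by the backbone, lowering feature storage cost and computation.\n", + "\n", + "- Head\n", + "\n", + " The head is ArcMargin. During training it adds an angular margin to the target class before classification, which enlarges the angular separation between classes and makes features of the same class more compact, further improving the discriminative power of the extracted features.\n", + "\n", + "- Loss\n", + "\n", + " The loss is cross-entropy loss, so the network is optimized as a classification task during training. For the complete settings, see the general recognition configuration file.\n", + "\n", + "## 5. Notes\n", + "PP-ShiTu aims to be the most cost-effective image recognition solution for industrial use. Datasets from different recognition scenarios each have their own distribution, and training is limited by hardware and software resources, so not every dataset could be included in training at once; the current combination of training datasets is the result of a trade-off. We therefore recommend that you first understand the characteristics of your own business data, then fine-tune or further develop on that data starting from the PP-ShiTu pretrained model and training configuration, to obtain a recognition model that performs better and fits your data.\n", + "\n", + "## 6. 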
References and citation\n", + "```bibtex\n", + "@article{cui2021pp,\n", + " title={PP-LCNet: A Lightweight CPU Convolutional Neural Network},\n", + " author={Cui, Cheng and Gao, Tingquan and Wei, Shengyu and Du, Yuning and Guo, Ruoyu and Dong, Shuilong and Lu, Bin and Zhou, Ying and Lv, Xueying and Liu, Qiwen and others},\n", + " journal={arXiv preprint arXiv:2109.15099},\n", + " year={2021}\n", + "}\n", + "\n", + "@article{wei2021pp,\n", + " title={PP-ShiTu: A Practical Lightweight Image Recognition System},\n", + " author={Wei, Shengyu and Guo, Ruoyu and Cui, Cheng and Lu, Bin and Dong, Shuilong and Gao, Tingquan and Du, Yuning and Zhou, Ying and Lyu, Xueying and Liu, Qiwen and others},\n", + " journal={arXiv preprint arXiv:2111.00775},\n", + " year={2021}\n", + "}\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Click [here](https://ai.baidu.com/docs#/AIStudio_Project_Notebook/a38e5576) for the basic usage of this environment.
\n", + "Please click [here](https://ai.baidu.com/docs#/AIStudio_Project_Notebook/a38e5576) for more detailed instructions." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "py35-paddle1.2.0" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.4" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/modelcenter/PP-ShiTuV2/APP/app.py b/modelcenter/PP-ShiTuV2/APP/app.py new file mode 100644 index 0000000000000000000000000000000000000000..4b9bb5787624a64f2a497b8720641fead0505077 --- /dev/null +++ b/modelcenter/PP-ShiTuV2/APP/app.py @@ -0,0 +1,157 @@ +import base64 +import os +from io import BytesIO +from typing import Any, Dict, List, Union + +import gradio as gr +import numpy as np +import requests +from paddleclas import PaddleClas +from PIL import Image, ImageDraw, ImageFont +from tqdm import tqdm + + +def download_with_progressbar(url: str, save_path: str): + """Download a file from the given URL and decompress it + + Args: + url (str): URL of the file + save_path (str): path for saving the downloaded file + + Raises: + Exception: if the download is incomplete + """ + print(f"Auto downloading {url} to {save_path}") + if os.path.exists(save_path): + print("File already exists, skipping...") + else: + response = requests.get(url, stream=True) + total_size_in_bytes = int(response.headers.get("content-length", 0)) + block_size = 1024 # 1 Kibibyte + progress_bar = tqdm( + total=total_size_in_bytes, unit="iB", unit_scale=True) + with open(save_path, "wb") as file: + for data in response.iter_content(block_size): + progress_bar.update(len(data)) + file.write(data) + progress_bar.close() + if total_size_in_bytes == 0 or progress_bar.n != total_size_in_bytes or not os.path.isfile( + save_path): + raise Exception( + f"Something went wrong while 
downloading file from {url}") + print("Finished downloading") + print(f"Try decompression at {save_path}") + os.system(f"tar -xf {save_path}") + print(f"Finished decompression at {save_path}") + + +def image_to_base64(image: Image.Image) -> str: + """Encode a Pillow image as a base64 string + + Args: + image (Image.Image): image to be encoded + + Returns: + str: encoded string + """ + byte_data = BytesIO() # in-memory byte buffer + image.save(byte_data, format="JPEG") # write the image into the buffer as JPEG + byte_data = byte_data.getvalue() # read back the raw bytes + base64_str = base64.b64encode(byte_data).decode("ascii") # encode the bytes as a base64 ASCII string + return base64_str + + +# UGC: Define the inference fn() for your models +def model_inference(image) -> tuple: + """Send the given image to the inference model and collect its output + + Args: + image (gr.Image): input image + + Returns: + tuple: (drawn image to display, result in JSON format) + """ + results = clas_engine.predict(image, print_pred=True, predict_type="shitu") + + # bs = 1, fetch the first result + results = list(results)[0] + + image_draw_box = draw_bbox_results(image, results) + + im_show = Image.fromarray(image_draw_box) + + json_out = {"base64": image_to_base64(im_show), "result": str(results)} + return im_show, json_out + + +def draw_bbox_results(image: Union[np.ndarray, Image.Image], + results: List[Dict[str, Any]]) -> np.ndarray: + """Draw bounding box(es) with their labels and scores + + Args: + image (Union[np.ndarray, Image.Image]): image to be drawn on + results (List[Dict[str, Any]]): information for drawing the bounding boxes + + Returns: + np.ndarray: drawn image + """ + if isinstance(image, np.ndarray): + image = Image.fromarray(image) + draw = ImageDraw.Draw(image) + font_size = 18 + font = ImageFont.truetype("./simfang.ttf", font_size, encoding="utf-8") + + color = (0, 102, 255) + + for result in results: + # empty results + if result["rec_docs"] is None: + continue + + xmin, ymin, xmax, ymax = result["bbox"] + text = "{}, {:.2f}".format(result["rec_docs"], result["rec_scores"]) + th = 
font_size + tw = int(font.getlength(text))  # text width in pixels; FreeTypeFont.getsize() was removed in Pillow 10 + start_y = max(0, ymin - th) + + draw.rectangle( + [(xmin + 1, start_y), (xmin + tw + 1, start_y + th)], fill=color) + + draw.text((xmin + 1, start_y), text, fill=(255, 255, 255), font=font) + + draw.rectangle( + [(xmin, ymin), (xmax, ymax)], outline=(255, 0, 0), width=2) + + return np.array(image) + + +def clear_all(): + return None, None, None + + +# download drink_dataset_v2.0.tar +dataset_url = "https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar" +download_with_progressbar(dataset_url, + os.path.join("./", dataset_url.split("/")[-1])) + +clas_engine = PaddleClas(model_name="PP-ShiTuV2", use_gpu=False) + +with gr.Blocks() as demo: + gr.Markdown("PP-ShiTuV2") + + with gr.Column(scale=1, min_width=100): + img_in = gr.Image( + value="https://github.com/PaddlePaddle/PaddleClas/blob/release/2.5/docs/images/recognition/drink_data_demo/test_images/100.jpeg?raw=true", + label="Input") + + with gr.Row(): + btn1 = gr.Button("Clear") + btn2 = gr.Button("Submit") + img_out = gr.Image(label="Output").style(height=400) + json_out = gr.JSON(label="jsonOutput") + + btn2.click(fn=model_inference, inputs=img_in, outputs=[img_out, json_out]) + btn1.click(fn=clear_all, inputs=None, outputs=[img_in, img_out, json_out]) + gr.Button.style(1) + +demo.launch(share=True) diff --git a/modelcenter/PP-ShiTuV2/APP/app.yml b/modelcenter/PP-ShiTuV2/APP/app.yml new file mode 100644 index 0000000000000000000000000000000000000000..f10579b7e5af26910a57d33e8803c388cf85d587 --- /dev/null +++ b/modelcenter/PP-ShiTuV2/APP/app.yml @@ -0,0 +1,11 @@ +【PP-ShiTuV2-App-YAML】 + +APP_Info: + title: PP-ShiTuV2-App + colorFrom: blue + colorTo: yellow + sdk: gradio + sdk_version: 3.4.1 + app_file: app.py + license: apache-2.0 + device: gpu \ No newline at end of file diff --git a/modelcenter/PP-ShiTuV2/APP/requirements.txt 
b/modelcenter/PP-ShiTuV2/APP/requirements.txt new file mode 100644 index 
0000000000000000000000000000000000000000..ece02d9069b18d42ff2b3fa6556975b629fa43ce --- /dev/null +++ b/modelcenter/PP-ShiTuV2/APP/requirements.txt @@ -0,0 +1,3 @@ +gradio +paddlepaddle +paddleclas==2.5.0 diff --git a/modelcenter/PP-ShiTuV2/benchmark_cn.md b/modelcenter/PP-ShiTuV2/benchmark_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..f4280e496fc052d6f1cce633447bd3888d5c18c8 --- /dev/null +++ b/modelcenter/PP-ShiTuV2/benchmark_cn.md @@ -0,0 +1,68 @@ +## 1. 训练Benchmark + +### 1.1 软硬件环境 + +* PP-ShiTuV2 的特征提取模型训练过程中使用8 GPUs,每GPU batch size为256进行训练,采样器使用PKSampler,一个含256个样本mini-batch有64个类别,每个类别内含4张不同的图片,如训练GPU数和batch size不使用上述配置,须参考FAQ调整学习率和迭代次数。 + + **注**:由于本模型使用PKSampler和metric learning相关方法,因此改变batch size可能对性能有比较明显的影响。 + +* PP-ShiTuV2 的检测模型训练过程中使用8 GPUs,每GPU batch size为28进行训练,如训练GPU数和batch size不使用上述配置,须参考FAQ调整学习率和迭代次数。 + +### 1.2 数据集 + +特征提取模型对原有的训练数据进行了合理扩充与优化,最终使用如下 17 个公开数据集的汇总: + +| 数据集 | 数据量 | 类别数 | 场景 | 数据集地址 | +| :--------------------- | :-----: | :------: | :---: | :----------------------------------------------------------------------------------: | +| Aliproduct | 2498771 | 50030 | 商品 | [地址](https://retailvisionworkshop.github.io/recognition_challenge_2020/) | +| GLDv2 | 1580470 | 81313 | 地标 | [地址](https://github.com/cvdfoundation/google-landmark) | +| VeRI-Wild | 277797 | 30671 | 车辆 | [地址](https://github.com/PKU-IMRE/VERI-Wild) | +| LogoDet-3K | 155427 | 3000 | Logo | [地址](https://github.com/Wangjing1551/LogoDet-3K-Dataset) | +| SOP | 59551 | 11318 | 商品 | [地址](https://cvgl.stanford.edu/projects/lifted_struct/) | +| Inshop | 25882 | 3997 | 商品 | [地址](http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html) | +| bird400 | 58388 | 400 | 鸟类 | [地址](https://www.kaggle.com/datasets/gpiosenka/100-bird-species) | +| 104flows | 12753 | 104 | 花类 | [地址](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/) | +| Cars | 58315 | 112 | 车辆 | [地址](https://ai.stanford.edu/~jkrause/cars/car_dataset.html) | +| Fashion Product Images | 44441 | 47 | 商品 | 
[地址](https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-dataset) | +| flowerrecognition | 24123 | 59 | 花类 | [地址](https://www.kaggle.com/datasets/aymenktari/flowerrecognition) | +| food-101 | 101000 | 101 | 食物 | [地址](https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/) | +| fruits-262 | 225639 | 262 | 水果 | [地址](https://www.kaggle.com/datasets/aelchimminut/fruits262) | +| inaturalist | 265213 | 1010 | 自然 | [地址](https://github.com/visipedia/inat_comp/tree/master/2017) | +| indoor-scenes | 15588 | 67 | 室内 | [地址](https://www.kaggle.com/datasets/itsahmad/indoor-scenes-cvpr-2019) | +| Products-10k | 141931 | 9691 | 商品 | [地址](https://products-10k.github.io/) | +| CompCars | 16016 | 431 | 车辆 | [地址](http://ai.stanford.edu/~jkrause/cars/car_dataset.html) | +| **Total** | **6M** | **192K** | - | - | + + +主体检测模型的数据集请参考 [主体检测模型数据集](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.5/docs/zh_CN/training/PP-ShiTu/mainbody_detection.md#1-%E6%95%B0%E6%8D%AE%E9%9B%86) + +### 1.3 指标 + +| 模型名称 | 模型简介 | 模型体积 | 输入尺寸 | ips | +| ---------------------------------------- | -------- | ------------ | -------- | --- | +| picodet_lcnet_x2_5_640_mainbody.yml | 主体检测 | 30MB | 640 | 21 | +| GeneralRecognitionV2_PPLCNetV2_base.yaml | 特征提取 | 19MB(KL量化) | 224 | 163 | + + +## 2. 推理 Benchmark + +### 2.1 软硬件环境 + +* PP-ShiTuV2主体检测和特征提取模型的推理速度测试采用CPU,开启MKLDNN,10线程,batch size=1进行测试。 + + +### 2.2 数据集 + +PP-ShiTuV2特征提取模型使用自建产品数据集作为测试集 + +### 2.3 指标 + +| 模型 | 存储(主体检测+特征提取) | product | +| :--------- | :---------------------- | :------------------ | +| | | recall@1 | +| PP-ShiTuV1 | 64(30+34)MB | 66.8% | +| PP-ShiTuV2 | 49(30+19)MB | 73.8% | + + +## 3. 
相关使用说明 +请参考:https://github.com/PaddlePaddle/PaddleClas/blob/release/2.5/docs/zh_CN/models/PP-ShiTu/README.md diff --git a/modelcenter/PP-ShiTuV2/benchmark_en.md b/modelcenter/PP-ShiTuV2/benchmark_en.md new file mode 100644 index 0000000000000000000000000000000000000000..cdc8c369f29c5e5700d3589f187d0bd7eb364be1 --- /dev/null +++ b/modelcenter/PP-ShiTuV2/benchmark_en.md @@ -0,0 +1,67 @@ +## 1. Training Benchmark + +### 1.1 Software and hardware environment + +* The feature extraction model of PP-ShiTuV2 is trained on 8 GPUs with a batch size of 256 per GPU. Batches are drawn with PKSampler: each 256-sample mini-batch contains 64 classes with 4 different images per class. If you use a different number of GPUs or a different batch size, refer to the FAQ to adjust the learning rate and the number of iterations. + + **Note**: Because this model uses PKSampler and metric learning methods, changing the batch size may significantly affect performance. + +* The detection model of PP-ShiTuV2 is trained on 8 GPUs with a batch size of 28 per GPU. If you use a different number of GPUs or a different batch size, refer to the FAQ to adjust the learning rate and the number of iterations. 
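The PK batch composition described in the first bullet (P = 64 classes × K = 4 images = 256 samples) can be illustrated with the minimal Python sketch below. It assumes integer class labels. The helper `pk_batches` is hypothetical and is not PaddleClas's `PKSampler` implementation; it only shows how such a batch could be assembled.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def pk_batches(labels: Sequence[int], p: int = 64, k: int = 4,
               seed: int = 0) -> List[List[int]]:
    """Hypothetical helper: group sample indices into P-class x K-image batches."""
    rng = random.Random(seed)
    # Index the samples by class label.
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    classes = list(by_class)
    rng.shuffle(classes)

    batches: List[List[int]] = []
    # Take P classes at a time; leftover classes that cannot fill a batch are dropped.
    for start in range(0, len(classes) - p + 1, p):
        batch: List[int] = []
        for c in classes[start:start + p]:
            pool = by_class[c]
            if len(pool) >= k:
                batch.extend(rng.sample(pool, k))  # K distinct images of this class
            else:
                batch.extend(rng.choice(pool) for _ in range(k))  # repeat images if too few
        batches.append(batch)
    return batches


# 128 made-up classes with 5 images each -> 2 batches of 64 x 4 = 256 samples.
labels = [i // 5 for i in range(5 * 128)]
batches = pk_batches(labels)
print(len(batches), [len(b) for b in batches])  # prints: 2 [256, 256]
```

In this sketch, keeping K = 4 while changing the batch size changes P, the number of classes that each batch exposes to the metric learning loss. That is one plausible reason why the note above warns that the batch size can noticeably affect performance.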
+ +### 1.2 Dataset + +The training data of the feature extraction model was expanded and refined from the original set; the final training set combines the following 17 public datasets: + +| Dataset | Images | Classes | Scenario | Link | +| :--------------------- | :---------: | :---------------: | :---------: | :-------------------------------------------------------------------------------------: | +| Aliproduct | 2498771 | 50030 | Products | [Address](https://retailvisionworkshop.github.io/recognition_challenge_2020/) | +| GLDv2 | 1580470 | 81313 | Landmarks | [Address](https://github.com/cvdfoundation/google-landmark) | +| VeRI-Wild | 277797 | 30671 | Vehicles | [Address](https://github.com/PKU-IMRE/VERI-Wild) | +| LogoDet-3K | 155427 | 3000 | Logos | [Address](https://github.com/Wangjing1551/LogoDet-3K-Dataset) | +| SOP | 59551 | 11318 | Products | [Address](https://cvgl.stanford.edu/projects/lifted_struct/) | +| Inshop | 25882 | 3997 | Products | [Address](http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html) | +| bird400 | 58388 | 400 | Birds | [Address](https://www.kaggle.com/datasets/gpiosenka/100-bird-species) | +| 104flows | 12753 | 104 | Flowers | [Address](https://www.robots.ox.ac.uk/~vgg/data/flowers/102/) | +| Cars | 58315 | 112 | Vehicles | [Address](https://ai.stanford.edu/~jkrause/cars/car_dataset.html) | +| Fashion Product Images | 44441 | 47 | Products | [Address](https://www.kaggle.com/datasets/paramaggarwal/fashion-product-images-dataset) | +| flowerrecognition | 24123 | 59 | Flowers | [Address](https://www.kaggle.com/datasets/aymenktari/flowerrecognition) | +| food-101 | 101000 | 101 | Food | [Address](https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/) | +| fruits-262 | 225639 | 262 | Fruits | [Address](https://www.kaggle.com/datasets/aelchimminut/fruits262) | +| inaturalist | 265213 | 1010 | Nature | [Address](https://github.com/visipedia/inat_comp/tree/master/2017) | +| indoor-scenes | 15588 | 67 | Indoor scenes | 
[Address](https://www.kaggle.com/datasets/itsahmad/indoor-scenes-cvpr-2019) | +| Products-10k | 141931 | 9691 | Products | [Address](https://products-10k.github.io/) | +| CompCars | 16016 | 431 | Vehicles | [Address](http://ai.stanford.edu/~jkrause/cars/car_dataset.html) | +| **Total** | **6M** | **192K** | - | - | + + +For the dataset of the mainbody detection model, please refer to [mainbody detection model dataset](https://github.com/PaddlePaddle/PaddleClas/blob/release%2F2.5/docs/zh_CN/training/PP-ShiTu/mainbody_detection.md#1-%E6%95%B0%E6%8D%AE%E9%9B%86) + +### 1.3 Metrics + +| Model Name | Description | Model Size | Input Size | ips | +| ---------------------------------------- | ------------------ | ---------------------- | ---------------- | --- | +| picodet_lcnet_x2_5_640_mainbody.yml | Mainbody detection | 30MB | 640 | 21 | +| GeneralRecognitionV2_PPLCNetV2_base.yaml | Feature extraction | 19MB (KL quantization) | 224 | 163 | + + +## 2. Inference Benchmark + +### 2.1 Software and hardware environment + +* The inference speed of the PP-ShiTuV2 mainbody detection and feature extraction models is measured on CPU with MKLDNN enabled, 10 threads, and batch size = 1. + +### 2.2 Dataset + +A self-built product dataset is used as the test set for the PP-ShiTuV2 feature extraction model. + +### 2.3 Results + +| Model | Storage (mainbody detection + feature extraction) | Product | +| :--------- | :----------------------------------------------- | :------- | +| | | recall@1 | +| PP-ShiTuV1 | 64(30+34)MB | 66.8% | +| PP-ShiTuV2 | 49(30+19)MB | 73.8% | + + +## 3. 
Related Instructions +Please refer to: https://github.com/PaddlePaddle/PaddleClas/blob/release/2.5/docs/zh_CN/models/PP-ShiTu/README.md diff --git a/modelcenter/PP-ShiTuV2/download_cn.md b/modelcenter/PP-ShiTuV2/download_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..65ed07b15c34f5cc47a3eef7e585075f6e639045 --- /dev/null +++ b/modelcenter/PP-ShiTuV2/download_cn.md @@ -0,0 +1,5 @@ +# 提供模型所支持的任务场景、推理和预训练模型文件: +| 模型名称 | 模型简介 | 模型体积 | 输入尺寸 | 下载地址 | +| ----------------------------------- | -------- | -------- | -------- | --------------------------- | +| picodet_lcnet_x2_5_640_mainbody | 主体检测 | 30MB | 640 | [推理模型](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar)/[预训练模型](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar) | +| GeneralRecognitionV2_PPLCNetV2_base | 特征提取 | 19MB | 224 | [推理模型](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar)/[预训练模型](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/PPShiTuV2/general_PPLCNetV2_base_pretrained_v1.0.pdparams) | diff --git a/modelcenter/PP-ShiTuV2/download_en.md b/modelcenter/PP-ShiTuV2/download_en.md new file mode 100644 index 0000000000000000000000000000000000000000..04c971e82d370e2e01262490bc8e7afd77029161 --- /dev/null +++ b/modelcenter/PP-ShiTuV2/download_en.md @@ -0,0 +1,5 @@ +# Related pretrained and inference models: +| Model name | Role | Size | Input size (inference) | Download link | +| ----------------------------------- | ------------------ | ------- | --------------------- | ---------------------------- | +| picodet_lcnet_x2_5_640_mainbody | mainbody detection | 30MB | 640 | [inference 
model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar)/[pretrained model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar) | +| GeneralRecognitionV2_PPLCNetV2_base | feature extraction | 19MB | 224 | [inference model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar)/[pretrained model](https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/pretrain/PPShiTuV2/general_PPLCNetV2_base_pretrained_v1.0.pdparams) | diff --git a/modelcenter/PP-ShiTuV2/info.yaml b/modelcenter/PP-ShiTuV2/info.yaml new file mode 100644 index 0000000000000000000000000000000000000000..f380e5d9856c87634e701e1c072f7fefe461fc21 --- /dev/null +++ b/modelcenter/PP-ShiTuV2/info.yaml @@ -0,0 +1,27 @@ +--- +Model_Info: + name: "PP-ShiTuV2" + description: "PP-ShiTuV2识别系统" + description_en: "PP-ShiTuV2, a lightweight recognition system" + icon: "@后续UE统一设计之后,会存到bos上某个位置" + from_repo: "PaddleClas" +Task: +- tag_en: "Computer Vision" + tag: "计算机视觉" + sub_tag_en: "Characteristics of Product Image" + sub_tag: "商品图片特征" +Example: +- tag_en: "Intelligent Retail" + tag: "智慧零售" + sub_tag_en: "Commodity identification" + title: "生鲜产品自主结算" + sub_tag: "商品识别" + url: "https://aistudio.baidu.com/aistudio/projectdetail/4486158?channelType=0&channel=0" +Datasets: "Aliproduct, GLDv2, VeRI-Wild, LogoDet-3K, SOP, Inshop, bird400, 104flows,\ + \ Cars, Fashion Product Images, flowerrecognition, food-101, fruits-262, inaturalist,\ + \ indoor-scenes, Products-10k, CompCars" +Pulisher: "Baidu" +License: "apache.2.0" +Paper: "" +IfTraining: 1 +IfOnlineDemo: 1 diff --git a/modelcenter/PP-ShiTuV2/introduction_cn.ipynb b/modelcenter/PP-ShiTuV2/introduction_cn.ipynb new file mode 100644 index 
0000000000000000000000000000000000000000..56c86bc7d9e6cd847fb4ce1d8ff57810d7ba024e --- /dev/null +++ b/modelcenter/PP-ShiTuV2/introduction_cn.ipynb @@ -0,0 +1,299 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "jupyter": { + "outputs_hidden": false + } + }, + "source": [ + "## 1. PP-ShiTuV2模型简介\n", + "PP-ShiTuV2 是基于 PP-ShiTuV1 改进的一个实用轻量级通用图像识别系统,由主体检测、特征提取、向量检索三个模块构成,相比 PP-ShiTuV1 具有更高的识别精度、更强的泛化能力以及相近的推理速度*。主要针对训练数据集、特征提取两个部分进行优化,使用了更优的骨干网络、损失函数与训练策略,使得 PP-ShiTuV2 在多个实际应用场景上的检索性能有显著提升。\n", + "\n", + "PP-ShiTuV2模型由飞桨官方出品,是PaddleClas优化和改进的识别检索模型。 更多关于PaddleClas可以点击 https://github.com/PaddlePaddle/PaddleClas 进行了解。\n", + "\n", + "## 2. 模型效果及应用场景\n", + "### 2.1 商品识别任务:\n", + "\n", + "#### 2.1.1 数据集:\n", + "\n", + "PP-ShiTuV2的训练数据集以 Aliproduct、GLDv2等数据集为主,详细信息可参考 [PP-ShiTuV2 实验部分](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/training/PP-ShiTu/feature_extraction.md#4-%E5%AE%9E%E9%AA%8C%E9%83%A8%E5%88%86)\n", + "\n", + "#### 2.1.2 模型效果速览:\n", + "\n", + "PP-ShiTuV2 在图片上的检测效果如下\n", + "\n", + "![](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/images/recognition/drink_data_demo/output/100.jpeg?raw=true)\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. 
模型如何使用\n", + "\n", + "### 3.1 模型推理:\n", + "\n", + "- 下载 PaddleClas" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true, + "execution": { + "iopub.execute_input": "2022-11-08T08:24:16.514016Z", + "iopub.status.busy": "2022-11-08T08:24:16.513368Z", + "iopub.status.idle": "2022-11-08T08:25:00.630629Z", + "shell.execute_reply": "2022-11-08T08:25:00.629113Z", + "shell.execute_reply.started": "2022-11-08T08:24:16.513971Z" + }, + "jupyter": { + "outputs_hidden": true + }, + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# 不在Jupyter Notebook上运行时需要将含 \"!\" 和 \"%\" 的语句注释,不需要运行。\n", + "%cd ~/work\n", + "\n", + "# 克隆 PaddleClas(gitee上克隆速度较快)\n", + "!git clone https://gitee.com/paddlepaddle/PaddleClas" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- 安装 PaddleClas 及其依赖包" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true, + "execution": { + "iopub.execute_input": "2022-11-08T08:26:02.622321Z", + "iopub.status.busy": "2022-11-08T08:26:02.621656Z", + "iopub.status.idle": "2022-11-08T08:26:05.016413Z", + "shell.execute_reply": "2022-11-08T08:26:05.015052Z", + "shell.execute_reply.started": "2022-11-08T08:26:02.622277Z" + }, + "jupyter": { + "outputs_hidden": true + }, + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# 进入 PaddleClas 目录\n", + "%cd ~/work/PaddleClas/\n", + "\n", + "# 安装所需依赖项\n", + "!pip install -r requirements.txt\n", + "\n", + "# 设置GPU\n", + "# %env CUDA_VISIBLE_DEVICES=0" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- 快速体验\n", + "\n", + "恭喜! 
您已经成功安装了 PaddleClas,接下来快速体验图像识别效果" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "execution": { + "iopub.execute_input": "2022-11-08T08:26:11.091828Z", + "iopub.status.busy": "2022-11-08T08:26:11.090376Z", + "iopub.status.idle": "2022-11-08T08:29:06.202735Z", + "shell.execute_reply": "2022-11-08T08:29:06.201197Z", + "shell.execute_reply.started": "2022-11-08T08:26:11.091754Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# 进入 PaddleClas 目录\n", + "%cd ~/work/PaddleClas/\n", + "\n", + "# 创建存放主体检测、特征提取推理模型的文件夹\n", + "%mkdir -p deploy/models\n", + "\n", + "# 进入该文件夹\n", + "%cd deploy/models\n", + "\n", + "# 下载主体检测inference模型并解压\n", + "!wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar && tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar\n", + "\n", + "# 下载特征提取inference模型并解压\n", + "!wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar && tar -xf general_PPLCNetV2_base_pretrained_v1.0_infer.tar\n", + "\n", + "# 返回至deploy文件夹\n", + "%cd ~/work/PaddleClas/deploy/\n", + "\n", + "# 下载测试数据 drink_dataset_v2.0 并解压\n", + "!wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar && tar -xf drink_dataset_v2.0.tar" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "execution": { + "iopub.execute_input": "2022-11-08T08:31:30.364422Z", + "iopub.status.busy": "2022-11-08T08:31:30.363351Z", + "iopub.status.idle": "2022-11-08T08:31:37.682006Z", + "shell.execute_reply": "2022-11-08T08:31:37.680563Z", + "shell.execute_reply.started": "2022-11-08T08:31:30.364378Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# 进入 PaddleClas 目录\n", + "%cd ~/work/PaddleClas/\n", + "\n", + "# 进入deploy文件夹\n", + "%cd ./deploy\n", + "\n", + "# 对 100.jpeg 
图片进行识别推理\n", + "!python python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs=\"./drink_dataset_v2.0/test_images/100.jpeg\" -o Global.use_gpu=False" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "同时识别的结果(带有检测框、对应类别以及相似度)会保存至 `PaddleClas/deploy/output/100.jpeg`,如文档开头的 [2.1.2 模型效果速览](#212-模型效果速览) 所展示。\n", + "\n", + "![](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/images/recognition/drink_data_demo/output/100.jpeg?raw=true)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3.2 模型训练\n", + "\n", + "- 克隆 PaddleClas 仓库(参考 3.1模型推理 - 下载PaddleClas)\n", + "- 主体检测模型的数据集准备、开始训练、模型评估等步骤,请参考 [PP-ShiTu 主体检测 文档](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/training/PP-ShiTu/mainbody_detection.md)\n", + "- 特征提取模型的数据集准备、开始训练、模型评估等步骤,请参考 [PP-ShiTu 特征提取 文档](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/training/PP-ShiTu/feature_extraction.md#5-%E8%87%AA%E5%AE%9A%E4%B9%89%E7%89%B9%E5%BE%81%E6%8F%90%E5%8F%96)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. 
模型原理\n", + "PP-ShiTu 系列识别系统,包括本文档介绍的 PP-ShiTuV2,均由3个模块串联完成整个识别过程,如下图所示\n", + "\n", + "![PP-ShiTu系统](https://github.com/PaddlePaddle/PaddleClas/raw/develop/docs/images/structure.jpg)\n", + "\n", + "- 主体检测:上图中的蓝色模块,主要负责检测出用户输入图片中可能的识别目标,进而裁剪出这些目标,过滤不重要的背景,减少背景的干扰。事实上这种保留主体,过滤背景的做法是实践中会采用的一种简单而有效的方法。\n", + "- 特征提取:接收 **主体检测** 模块输出的含有目标主体的裁剪后的图片,将其输入到特征提取模型中,得到对应的特征向量,作为该图片的表示特征用于接下来的检索步骤。\n", + "- 向量检索:接收 **特征提取** 模块输出的一个或多个特征向量,逐个地在向量库中检索,将检索库中最邻近(一般以相似度表示邻近程度)的向量的类别,作为检索向量的类别,最后返回检索结果。该模块不需要额外训练,安装第三方开源的faiss检索库即可使用\n", + "\n", + "在检索系统中,最重要的模块之一就是特征提取模型,其特征提取能力好坏直接影响检索库内向量和待检索向量的质量,因此接下来分5个部分,重点介绍 PP-ShiTuV2 所使用的特征提取模型。\n", + "\n", + "- Backbone\n", + "\n", + " Backbone 部分采用了 PP-LCNetV2_base,其在 PPLCNet_V1 的基础上,加入了包括Rep 策略、PW 卷积、Shortcut、激活函数改进、SE 模块改进等多个优化点,使得最终分类精度与 PPLCNet_x2_5 相近,且推理延时减少了40%\\*。在实验过程中我们对 PPLCNetV2_base 进行了适当的改进,在保持速度基本不变的情况下,让其在识别任务中得到更高的性能,包括:去掉 PPLCNetV2_base 末尾的 ReLU 和 FC、将最后一个 stage(RepDepthwiseSeparable) 的 stride 改为1。\n", + "\n", + " **注**: \\*推理环境基于 Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz 硬件平台,OpenVINO 推理平台。\n", + "\n", + "- Neck\n", + "\n", + " Neck 部分采用了 BN Neck,对 Backbone 抽取得到的特征的每个维度进行标准化操作,减少了同时优化度量学习损失函数和分类损失函数的难度,加快收敛速度的同时减少 IDLoss 和 TripletLoss 之间由于优化目标所在空间不同带来的影响。\n", + "\n", + "- Head\n", + "\n", + " Head 部分选用 FC Layer,使用分类头将 feature 转换成 logits 供后续计算分类损失(一般使用交叉熵损失,称之为CELoss或IDLoss)。\n", + "\n", + "- Loss\n", + "\n", + " Loss 部分选用 Cross entropy loss 和 TripletAngularMarginLoss,在训练时以分类损失和基于角度的三元组损失来指导网络进行优化。我们基于原始的 TripletLoss (困难三元组损失)进行了改进,将优化目标从 L2 欧几里得空间更换成余弦空间,并加入了 anchor 与 positive/negative 之间的硬性距离约束,让训练与测试的目标更加接近,提升模型的泛化能力。详细的配置文件见 [GeneralRecognitionV2_PPLCNetV2_base.yaml](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L63-L77)。\n", + "\n", + "- Data Augmentation\n", + "\n", + " 我们考虑到实际相机拍摄时目标主体可能出现一定的旋转而不一定能保持正立状态,因此我们在数据增强中加入了适当的 
[随机旋转增强](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L117),以提升模型在真实场景中的检索能力。\n", + "\n", + "## 5. 注意事项\n", + "PP-ShiTuV2 是在寻找在产业实践中最高性价比的图像识别方案,但考虑到不同识别场景的数据集均有各自的分布特点,以及训练时的软硬件限制,无法一次性将所有的数据集全部纳入到训练集中,经过权衡才使用了目前这套训练数据集的组合。因此推荐用户在了解自己实际业务数据集的特点之后,基于 PP-ShiTuV2 的预训练模型以及训练配置,在自己的业务数据集上进行微调甚至二次开发,以获得性能更好,更适配自己数据集的识别模型。\n", + "\n", + "## 6. 相关论文以及引用信息\n", + "```log\n", + "@article{cui2021pp,\n", + " title={PP-LCNet: A Lightweight CPU Convolutional Neural Network},\n", + " author={Cui, Cheng and Gao, Tingquan and Wei, Shengyu and Du, Yuning and Guo, Ruoyu and Dong, Shuilong and Lu, Bin and Zhou, Ying and Lv, Xueying and Liu, Qiwen and others},\n", + " journal={arXiv preprint arXiv:2109.15099},\n", + " year={2021}\n", + "}\n", + "\n", + "@InProceedings{Luo_2019_CVPR_Workshops,\n", + " author = {Luo, Hao and Gu, Youzhi and Liao, Xingyu and Lai, Shenqi and Jiang, Wei},\n", + " title = {Bag of Tricks and a Strong Baseline for Deep Person Re-Identification},\n", + " booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},\n", + " month = {June},\n", + " year = {2019}\n", + "}\n", + "\n", + "@ARTICLE{Luo_2019_Strong_TMM,\n", + " author={H. {Luo} and W. {Jiang} and Y. {Gu} and F. {Liu} and X. {Liao} and S. {Lai} and J. {Gu}},\n", + " journal={IEEE Transactions on Multimedia},\n", + " title={A Strong Baseline and Batch Normalization Neck for Deep Person Re-identification},\n", + " year={2019},\n", + " pages={1-1},\n", + " doi={10.1109/TMM.2019.2958756},\n", + " ISSN={1941-0077},\n", + "}\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "请点击[此处](https://ai.baidu.com/docs#/AIStudio_Project_Notebook/a38e5576)查看本环境基本用法.
\n", + "Please click [here](https://ai.baidu.com/docs#/AIStudio_Project_Notebook/a38e5576) for more detailed instructions. " + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "py35-paddle1.2.0" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.4" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/modelcenter/PP-ShiTuV2/introduction_en.ipynb b/modelcenter/PP-ShiTuV2/introduction_en.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..f8c0b0fbcc6a0d7a251e70305681a080de2a564b --- /dev/null +++ b/modelcenter/PP-ShiTuV2/introduction_en.ipynb @@ -0,0 +1,296 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "jupyter": { + "outputs_hidden": false + } + }, + "source": [ + "## 1. Introduction\n", + "PP-ShiTuV2 is a practical, lightweight, general-purpose image recognition system improved from PP-ShiTuV1. It consists of three modules: mainbody detection, feature extraction, and vector search. Compared with PP-ShiTuV1, PP-ShiTuV2 offers higher recognition accuracy, stronger generalization, and similar inference speed*. The optimization mainly targets the training dataset and the feature extraction model, which uses a better backbone network, loss function, and training strategy; together these significantly improve the retrieval performance of PP-ShiTuV2 in multiple practical application scenarios.\n", + "\n", + "The PP-ShiTuV2 model is officially released by PaddlePaddle as an optimized and improved recognition and retrieval model in PaddleClas. More about PaddleClas can be found at https://github.com/PaddlePaddle/PaddleClas.\n", + "\n", + "## 2. 
Preview and application scenarios\n", + "### 2.1 Product recognition:\n", + "\n", + "#### 2.1.1 Dataset:\n", + "\n", + "The training data is mainly based on datasets such as Aliproduct and GLDv2.
For details, please refer to [PP-ShiTuV2 Experiment Section](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/training/PP-ShiTu/feature_extraction.md#4-%E5%AE%9E%E9%AA%8C%E9%83%A8%E5%88%86)\n", + "\n", + "#### 2.1.2 Output preview:\n", + "\n", + "For example, the output of PP-ShiTuV2 on a sample image is shown below.\n", + "\n", + "![](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/images/recognition/drink_data_demo/output/100.jpeg?raw=true)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. How to use\n", + "\n", + "### 3.1 Model inference:\n", + "\n", + "- Download PaddleClas" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true, + "execution": { + "iopub.execute_input": "2022-11-08T08:24:16.514016Z", + "iopub.status.busy": "2022-11-08T08:24:16.513368Z", + "iopub.status.idle": "2022-11-08T08:25:00.630629Z", + "shell.execute_reply": "2022-11-08T08:25:00.629113Z", + "shell.execute_reply.started": "2022-11-08T08:24:16.513971Z" + }, + "jupyter": { + "outputs_hidden": true + }, + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# When not running in a Jupyter Notebook, comment out the statements containing \"!\" and \"%\"; they do not need to be run.\n", + "%cd ~/work\n", + "\n", + "!git clone https://gitee.com/paddlepaddle/PaddleClas" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- Install PaddleClas and its dependencies" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true, + "execution": { + "iopub.execute_input": "2022-11-08T08:26:02.622321Z", + "iopub.status.busy": "2022-11-08T08:26:02.621656Z", + "iopub.status.idle": "2022-11-08T08:26:05.016413Z", + "shell.execute_reply": "2022-11-08T08:26:05.015052Z", + "shell.execute_reply.started": "2022-11-08T08:26:02.622277Z" + }, + "jupyter": { + "outputs_hidden": true + }, + "scrolled": true, + "tags": [] + }, + 
"outputs": [], + "source": [ + "# Enter the PaddleClas directory\n", + "%cd ~/work/PaddleClas/\n", + "\n", + "# Install the required dependencies\n", + "!pip install -r requirements.txt\n", + "\n", + "# Select the GPU to use\n", + "# %env CUDA_VISIBLE_DEVICES=0" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- Quick start\n", + "\n", + "Congratulations! You have successfully installed PaddleClas. Now you can try image recognition as shown below." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "execution": { + "iopub.execute_input": "2022-11-08T08:26:11.091828Z", + "iopub.status.busy": "2022-11-08T08:26:11.090376Z", + "iopub.status.idle": "2022-11-08T08:29:06.202735Z", + "shell.execute_reply": "2022-11-08T08:29:06.201197Z", + "shell.execute_reply.started": "2022-11-08T08:26:11.091754Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Enter the PaddleClas directory\n", + "%cd ~/work/PaddleClas/\n", + "\n", + "# Create a folder for the mainbody detection and feature extraction inference models\n", + "%mkdir -p deploy/models\n", + "\n", + "# Enter the models folder\n", + "%cd deploy/models\n", + "\n", + "# Download the mainbody detection inference model and extract it\n", + "!wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar && tar -xf picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer.tar\n", + "\n", + "# Download the feature extraction inference model and extract it\n", + "!wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/models/inference/PP-ShiTuV2/general_PPLCNetV2_base_pretrained_v1.0_infer.tar && tar -xf general_PPLCNetV2_base_pretrained_v1.0_infer.tar\n", + "\n", + "# Return to deploy/\n", + "%cd ~/work/PaddleClas/deploy/\n", + "\n", + "# Download the test data drink_dataset_v2.0 and extract it\n", + "!wget -nc 
https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar && tar -xf drink_dataset_v2.0.tar" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "execution": { + "iopub.execute_input": "2022-11-08T08:31:30.364422Z", + "iopub.status.busy": "2022-11-08T08:31:30.363351Z", + "iopub.status.idle": "2022-11-08T08:31:37.682006Z", + "shell.execute_reply": "2022-11-08T08:31:37.680563Z", + "shell.execute_reply.started": "2022-11-08T08:31:30.364378Z" + }, + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "# Enter the PaddleClas directory\n", + "%cd ~/work/PaddleClas/\n", + "\n", + "# Enter the deploy folder\n", + "%cd ./deploy\n", + "\n", + "# Run recognition inference on 100.jpeg\n", + "!python python/predict_system.py -c configs/inference_general.yaml -o Global.infer_imgs=\"./drink_dataset_v2.0/test_images/100.jpeg\" -o Global.use_gpu=False" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The recognition results (with detection bounding boxes, predicted class names, and similarity scores) are also saved to `PaddleClas/deploy/output/100.jpeg`, as displayed in [2.1.2 output preview](#212-output-preview).\n", + "\n", + "![](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/images/recognition/drink_data_demo/output/100.jpeg?raw=true)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3.2 Model training\n", + "\n", + "- Clone the PaddleClas repo (refer to [3.1 model inference](#31-model-inference))\n", + "- For dataset preparation, training, evaluation, and other steps for the mainbody detection model, please refer to [PP-ShiTuV2 mainbody detection doc](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/training/PP-ShiTu/mainbody_detection.md)\n", + "- For dataset preparation, training, evaluation, and other steps for the feature extraction model, please refer to [PP-ShiTuV2 feature extraction 
doc](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/training/PP-ShiTu/feature_extraction.md#5-%E8%87%AA%E5%AE%9A%E4%B9%89%E7%89%B9%E5%BE%81%E6%8F%90%E5%8F%96)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Algorithm\n", + "PP-ShiTu series recognition systems, including PP-ShiTuV2 introduced in this document, consist of three modules chained together to complete the whole recognition process, as shown in the figure below.\n", + "\n", + "![PP-ShiTu System](https://github.com/PaddlePaddle/PaddleClas/raw/develop/docs/images/structure.jpg)\n", + "\n", + "- Mainbody detection: the blue module in the figure above detects potential targets in the input image, crops them out, and filters out unimportant background to reduce background interference. Keeping the mainbody and filtering out the background is a simple, effective, and widely used technique in practice.\n", + "- Feature extraction: takes the cropped images containing the target mainbodies output by the **mainbody detection** module and feeds them into the feature extraction model to obtain the corresponding feature vectors, which serve as the image representations for the subsequent retrieval step.\n", + "- Vector retrieval: takes the one or more feature vectors output by the **feature extraction** module, searches the vector library for each of them in turn, assigns each query vector the class of its nearest vector in the library (proximity is usually measured by similarity), and finally returns the retrieval results. This module requires no additional training and can be used once the third-party open-source faiss retrieval library is installed.\n", + "\n", + "In the recognition system, the feature extraction model is one of the most important modules: its feature extraction ability directly determines the quality of both the vectors in the retrieval library and the query vectors. 
The feature extraction model is therefore introduced below in 5 parts.\n", + "\n", + "- Backbone\n", + "\n", + " The Backbone adopts PP-LCNetV2_base.
Building on PPLCNet_V1, it adds multiple optimizations, including the Rep strategy, PW convolution, shortcuts, an improved activation function and an improved SE module, so that its final classification accuracy is similar to that of PPLCNet_x2_5 while its inference latency is reduced by 40%\\*. In our experiments, we made some adjustments to PPLCNetV2_base to achieve higher performance on recognition tasks while keeping the inference speed basically unchanged, namely removing the ReLU and FC layer at the end of PPLCNetV2_base and changing the stride of the last stage (RepDepthwise Separated) to 1.\n", + "\n", + " **Note**: \\*Measured on an Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz using the OpenVINO inference platform.\n", + "\n", + "- Neck\n", + "\n", + " The Neck adopts BN Neck, which standardizes each dimension of the features extracted by the Backbone. This makes it easier to optimize the metric learning loss and the classification loss at the same time, accelerates convergence, and reduces the inconsistency between IDLoss and TripletLoss caused by their different optimization goals.\n", + "\n", + "- Head\n", + "\n", + " The Head adopts an FC layer as the classification head, converting features into logits for the subsequent calculation of the classification loss (generally cross entropy loss, also called CELoss or IDLoss).\n", + "\n", + "- Loss\n", + "\n", + " The Loss combines cross entropy loss and TripletAngularMarginLoss, i.e. a classification loss and a cosine-similarity-based triplet loss, to optimize the network during training. Building on the original TripletLoss (Hard Triplet Loss), we moved the optimization objective from L2 Euclidean space to cosine space and added hard distance constraints between the anchor and the positive/negative samples, which brings the training and testing objectives closer together and improves the generalization ability of the model. 
For the detailed configuration, please refer to [GeneralRecognitionV2_PPLCNetV2_base.yaml](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L63-L77).\n", + "\n", + "- Data Augmentation\n", + "\n", + " In real scenes, the mainbody in a camera-captured image may be rotated to some extent rather than upright, so we add [RandomRotation](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L117) to the data augmentation to improve the generalization ability of the model in real scenes.\n", + "\n", + "## 5. Note\n", + "PP-ShiTuV2 aims to provide the most cost-effective image recognition solution for industrial practice. However, the datasets of different recognition scenarios have their own distribution characteristics, and training is constrained by software and hardware limitations, so it is difficult to incorporate all datasets at once. Therefore, we recommend that users first understand the characteristics of their actual datasets, and then fine-tune or further develop the PP-ShiTuV2 pre-trained model and training configuration on their own datasets to obtain better performance and generalization.\n", + "\n", + "## 6. 
References\n", + "```bibtex\n", + "@article{cui2021pp,\n", + " title={PP-LCNet: A Lightweight CPU Convolutional Neural Network},\n", + " author={Cui, Cheng and Gao, Tingquan and Wei, Shengyu and Du, Yuning and Guo, Ruoyu and Dong, Shuilong and Lu, Bin and Zhou, Ying and Lv, Xueying and Liu, Qiwen and others},\n", + " journal={arXiv preprint arXiv:2109.15099},\n", + " year={2021}\n", + "}\n", + "\n", + "@InProceedings{Luo_2019_CVPR_Workshops,\n", + " author={Luo, Hao and Gu, Youzhi and Liao, Xingyu and Lai, Shenqi and Jiang, Wei},\n", + " title={Bag of Tricks and a Strong Baseline for Deep Person Re-Identification},\n", + " booktitle={The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},\n", + " month={June},\n", + " year={2019}\n", + "}\n", + "\n", + "@ARTICLE{Luo_2019_Strong_TMM,\n", + " author={H. {Luo} and W. {Jiang} and Y. {Gu} and F. {Liu} and X. {Liao} and S. {Lai} and J. {Gu}},\n", + " journal={IEEE Transactions on Multimedia},\n", + " title={A Strong Baseline and Batch Normalization Neck for Deep Person Re-identification},\n", + " year={2019},\n", + " pages={1-1},\n", + " doi={10.1109/TMM.2019.2958756},\n", + " ISSN={1941-0077},\n", + "}\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Please click [here](https://ai.baidu.com/docs#/AIStudio_Project_Notebook/a38e5576) for the basic usage of this environment and more detailed instructions." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "py35-paddle1.2.0" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.4" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}