introduction_en.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "jupyter": {
     "outputs_hidden": false
    }
   },
   "source": [
    "## 1. Introduction\n",
    "PP-ShiTuv2 is a practical lightweight general image recognition system improved on PP-ShitUV1. It is composed of three modules: mainbody detection, feature extraction and vector search. Compared with PP-ShiTuV1, PP-ShiTuV2 has higher recognition accuracy, stronger generalization and similar inference speed *. This paper mainly optimize in training dataset, feature extraction with better backbone network, loss function and training strategy, which significantly improved the retrieval performance of PP-ShiTuV2 in multiple practical application scenarios.\n",
    "\n",
    "The PP-ShiTuV2 model is officially produced by PaddleClas, which is an optimized and improved recognition retrieval model by PaddleClas. More about PaddleClas can be found at https://github.com/PaddlePaddle/PaddleClas.\n",
    "\n",
    "## 2. Preview and application scenarios\n",
    "### 2.1 product recognition：\n",
    "\n",
    "#### 2.1.1 dataset：\n",
    "\n",
    "Including Aliproduct and GLDv2. For details, please refer to [PP-ShiTuV2 Experiment Section](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.5/docs/zh_CN/training/PP-ShiTu/feature_extraction.md#4-%E5%AE%9E%E9%AA%8C%E9%83%A8%E5%88%86)\n",
    "\n",
    "#### 2.1.2 output preview：\n",
    "\n",
    "for example, the output of PP-ShiTuV2 on the picture is as follows\n",
    "\n",
    "![](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.5/docs/images/recognition/drink_data_demo/output/100.jpeg?raw=true)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. How to use\n",
    "\n",
    "### 3.1 model inference：\n",
    "\n",
    "- Install PaddleClas and its dependencies"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true,
    "execution": {
     "iopub.execute_input": "2022-11-08T08:24:16.514016Z",
     "iopub.status.busy": "2022-11-08T08:24:16.513368Z",
     "iopub.status.idle": "2022-11-08T08:25:00.630629Z",
     "shell.execute_reply": "2022-11-08T08:25:00.629113Z",
     "shell.execute_reply.started": "2022-11-08T08:24:16.513971Z"
    },
    "jupyter": {
     "outputs_hidden": true
    },
    "scrolled": true,
    "tags": []
   },
   "source": [
    "Install paddlepaddle-gpu if using GPU"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install paddlepaddle-gpu -i https://mirror.baidu.com/pypi/simple"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Install paddlepaddle if using CPU"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Install paddleclas whl package"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install paddleclas"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Quick start\n",
    "\n",
    "Congratulations! You have successfully installed PaddleClas, now you can experience the image recognition as guided below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-11-08T08:26:11.091828Z",
     "iopub.status.busy": "2022-11-08T08:26:11.090376Z",
     "iopub.status.idle": "2022-11-08T08:29:06.202735Z",
     "shell.execute_reply": "2022-11-08T08:29:06.201197Z",
     "shell.execute_reply.started": "2022-11-08T08:26:11.091754Z"
    },
    "scrolled": true,
    "tags": []
   },
   "outputs": [],
   "source": [
    "# download and unzip demo data\n",
    "!wget -nc https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/rec/data/drink_dataset_v2.0.tar && tar -xf drink_dataset_v2.0.tar\n",
    "\n",
    "# build gallery\n",
    "!paddleclas --build_gallery=True --model_name=\"PP-ShiTuV2\" \\\n",
    "-o IndexProcess.image_root=./drink_dataset_v2.0/gallery/ \\\n",
    "-o IndexProcess.index_dir=./drink_dataset_v2.0/index \\\n",
    "-o IndexProcess.data_file=./drink_dataset_v2.0/gallery/drink_label.txt\n",
    "\n",
    "# run recognition\n",
    "!paddleclas --model_name=\"PP-ShiTuV2\" --predict_type=shitu \\\n",
    "-o Global.infer_imgs='./drink_dataset_v2.0/test_images/100.jpeg' \\\n",
    "-o IndexProcess.index_dir='./drink_dataset_v2.0/index'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "the result are below\n",
    "```log\n",
    "ppcls INFO: [{'bbox': [437, 71, 660, 728], 'rec_docs': '元气森林', 'rec_scores': 0.7740249}, {'bbox': [221, 72, 449, 701], 'rec_docs': '元气森林', 'rec_scores': 0.6950992}, {'bbox': [794, 104, 979, 652], 'rec_docs': '元气森林', 'rec_scores': 0.6305153}], filename: ./drink_dataset_v2.0/test_images/100.jpeg\n",
    "ppcls INFO: Predict complete!\n",
    "```\n",
    "\n",
    "if you need to save and visualize result as picture，please refer to [PP-ShiTuV2 inference](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.5/docs/en/PPShiTu/PPShiTuV2_introduction.md#4-inference-deployment), inference with full PaddleClas code."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2 Model training\n",
    "\n",
    "- clone PaddleClas repo(refer to [3.1 model inference](#31-model-inference))\n",
    "- For the dataset preparation, training, evaluation and other steps of the mainbody detection model, please refer to [PP-ShiTuV2 mainbody detection doc](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.5/docs/zh_CN/training/PP-ShiTu/mainbody_detection.md)\n",
    "- For the dataset preparation, training, evaluation and other steps of the feature extraction model, please refer to [PP-ShiTuV2 feature extraction doc](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.5/docs/zh_CN/training/PP-ShiTu/feature_extraction.md#5-%E8%87%AA%E5%AE%9A%E4%B9%89%E7%89%B9%E5%BE%81%E6%8F%90%E5%8F%96)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Algorithm\n",
    "PP-ShiTu series recognition systems, including PP-ShiTuV2 introduced in this document, consists of three modules to complete the entire recognition process, as shown in the figure below\n",
    "\n",
    "![PP-ShiTu System](https://github.com/PaddlePaddle/PaddleClas/raw/develop/docs/images/structure.jpg)\n",
    "\n",
    "- Mainbody detection: The blue colored module in the figure above detects potential targets in the input image, and then cropping these targets, filtering unimportant backgrounds, and reducing background interference. In fact, this practice of retaining the mainbody and filtering the background is a simple, effective and widely used method in practice.\n",
    "- Feature extraction: Receive the cropped image containing the target mainbody output by the **mainbody detection** module, and input it into the feature extraction model to obtain the corresponding feature vector, which is used as the representation feature of the image for subsequent retrieval.\n",
    "- Vector retrieval: Receive one or more feature vectors output by the **feature extraction** module, and retrieve them one by one in the vector library, finally return the retrieval result. This module does not require additional training and can be used by installing the third-party open source faiss retrieval library\n",
    "\n",
    "In the recognition system, one of the most important modules is the feature extraction model. The generalization of feature extraction model directly affects the quality of the vectors in the retrieval library and the vectors to be retrieved. Therefore, we will introduce feature extraction model below in 5 parts.\n",
    "\n",
    "- Backbone\n",
    "\n",
    "    The Backbone adopts PP-LCNetV2_base. On the basis of PPLCNet_V1, it adds multiple optimization points including Rep strategy, PW convolution, Shortcut, activation function improvement, SE module improvement, etc., so that the final classification accuracy is similar to PPLCNet_x2_5, but the inference latency has been reduced by 40%\\*. During the experiment, we made appropriate tweaks for PPLCNetV2_base, and make higher performance in recognition tasks while keeping the inference speed basically unchanged. Including: remove ReLU and FC layer at the end of the PPLCNetV2_base, and change the stride of the last stage (RepDepthwise Separated) to 1.\n",
    "\n",
    "    **Note**: \\*The inference environment is based on Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz hardware platform, OpenVINO inference platform.\n",
    "\n",
    "- Neck\n",
    "\n",
    "    The Neck adopts BN Neck to standardize each dimension of the features extracted by Backbone, which reduces the difficulty of optimizing the metric learning loss function and the classification loss function at the same time, accelerates the convergence speed, and reduces the difference between IDLoss and TripletLoss due to optimization goals.\n",
    "\n",
    "- Head\n",
    "\n",
    "    The Head adopts FC Layer, as classification head to convert features into logits for subsequent calculation of classification loss (generally using cross entropy loss, called CELoss or IDLoss).\n",
    "\n",
    "- Loss\n",
    "\n",
    "    The Loss adopts Cross entropy loss and TripletAngularMarginLoss, using classification loss and cos-similarity based triplet loss to optimize the network during training. We improved based on the original TripletLoss (Hard Triplet Loss), with replacing the optimization objective from L2 Euclidean space to cosine space, and added a hard distance constraint between anchor and positive/negtive samples, making training and testing goal more closer, and the generalization ability of the model is improved. For detailed configuration files, please refer to [GeneralRecognitionV2_PPLCNetV2_base.yaml](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.5/ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L63-L77).\n",
    "\n",
    "- Data Augmentation\n",
    "\n",
    "    We consider that the mainbody may rotate to a certain extent and not maintain an upright state when the camera is shot in real scenes, so we add an [RandomRotation](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.5/ppcls/configs/GeneralRecognitionV2/GeneralRecognitionV2_PPLCNetV2_base.yaml#L117-L120) in data augmentation to improve the generalization ability of the model in real scenes.\n",
    "\n",
    "## 5. Note\n",
    "PP-ShiTuV2 is looking for the most cost-effective image recognition solution in industrial practice. However, considering that the datasets of different recognition scenarios have their own distribution characteristics, as well as the limitations of software and hardware during training, it is difficult to integrate all datasets at one time. Therefore, it is recommended that users, after understanding the characteristics of your actual datasets, fine-tune or even make an further development on your own datasets based on the PP-ShiTuV2 pre-training model and training configuration, in order to obtain better performance and generalization.\n",
    "\n",
    "## 6. Reference\n",
    "```log\n",
    "@article{cui2021pp,\n",
    "    title={PP-LCNet: A Lightweight CPU Convolutional Neural Network},\n",
    "    author={Cui, Cheng and Gao, Tingquan and Wei, Shengyu and Du, Yuning and Guo, Ruoyu and Dong, Shuilong and Lu, Bin and Zhou, Ying and Lv, Xueying and Liu, Qiwen and others},\n",
    "    journal={arXiv preprint arXiv:2109.15099},\n",
    "    year={2021}\n",
    "}\n",
    "\n",
    "@InProceedings{Luo_2019_CVPR_Workshops,\n",
    "    author = {Luo, Hao and Gu, Youzhi and Liao, Xingyu and Lai, Shenqi and Jiang, Wei},\n",
    "    title = {Bag of Tricks and a Strong Baseline for Deep Person Re-Identification},\n",
    "    booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},\n",
    "    month = {June},\n",
    "    year = {2019}\n",
    "}\n",
    "\n",
    "@ARTICLE{Luo_2019_Strong_TMM,\n",
    "    author={H. {Luo} and W. {Jiang} and Y. {Gu} and F. {Liu} and X. {Liao} and S. {Lai} and J. {Gu}},\n",
    "    journal={IEEE Transactions on Multimedia},\n",
    "    title={A Strong Baseline and Batch Normalization Neck for Deep Person Re-identification},\n",
    "    year={2019},\n",
    "    pages={1-1},\n",
    "    doi={10.1109/TMM.2019.2958756},\n",
    "    ISSN={1941-0077},\n",
    "}\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "请点击[此处](https://ai.baidu.com/docs#/AIStudio_Project_Notebook/a38e5576)查看本环境基本用法.  <br>\n",
    "Please click [here ](https://ai.baidu.com/docs#/AIStudio_Project_Notebook/a38e5576) for more detailed instructions. "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "py35-paddle1.2.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}