introduction_en.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Introduction\n",
    "PP-YOLOE+ is a upgraded version of PP-YOLOE. Starting with a large-scale obj365 object detection pretraining model, PP-YOLOE+ not only greatly improves the convergence speed, but also improves the speed of the model on the COCO dataset. At the same time, PP-YOLOE+ greatly improves the end-to-end prediction speed including data preprocessing. For more details, please refer to [official documentation](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5/configs/ppyoloe/README.md)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Model Effects\n",
    "PP-YOLOE+_l achieves 53.3 mAP on COCO test-dev2017 dataset with 78.1 FPS on Tesla V100. While using TensorRT FP16, PP-YOLOE+_l can be further accelerated to 149.2 FPS. PP-YOLOE+_s/m/x also have excellent accuracy and speed performance as shown below.\n",
    "<div align=\"center\">\n",
    "  <img src=\"https://raw.githubusercontent.com/PaddlePaddle/PaddleDetection/release/2.5/docs/images/ppyoloe_plus_map_fps.png\" width=500 />\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. How to use the model\n",
    "Clone PaddleDetection firstly and put the COCO-style dataset in `dataset/coco`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": [
    "%cd ~/work\n",
    "!git clone https://gitee.com/paddlepaddle/PaddleDetection\n",
    "%cd PaddleDetection\n",
    "!pip install -r requirements.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.1 Training\n",
    "Training PP-YOLOE+ on 8 GPUs with following command"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": [
    "# training with single GPU\n",
    "!python tools/train.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml --eval --amp\n",
    "\n",
    "# training with mutiple GPUs\n",
    "!python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml --eval --amp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Notes:**\n",
    "- If you need to evaluate while training, please add `--eval`.\n",
    "- PP-YOLOE supports mixed precision training, please add `--amp`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.2 Inference\n",
    "#### 3.2.1 Inference with Paddle-TRT\n",
    "Next, we will introduce how to use Paddle Inference to deploy PP-YOLOE+ models in TensorRT FP16 mode.\n",
    "\n",
    "First, refer to [Paddle Inference Docs](https://www.paddlepaddle.org.cn/inference/master/user_guides/download_lib.html#python), download and install packages corresponding to CUDA, CUDNN and TensorRT version.\n",
    "\n",
    "Then, Exporting PP-YOLOE+ for Paddle Inference **with TensorRT**, use following command."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": [
    "!python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams trt=True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, inference in TensorRT FP16 mode."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": [
    "# inference single image\n",
    "!CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_crn_l_80e_coco --image_file=demo/000000014439_640x640.jpg --device=gpu --run_mode=trt_fp16\n",
    "\n",
    "# inference all images in the directory\n",
    "!CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_crn_l_80e_coco --image_dir=demo/ --device=gpu  --run_mode=trt_fp16"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Notes:**\n",
    "- TensorRT will perform optimization for the current hardware platform according to the definition of the network, generate an inference engine and serialize it into a file. This inference engine is only applicable to the current hardware hardware platform. If your hardware and software platform has not changed, you can set `use_static=True` in [enable_tensorrt_engine](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/python/infer.py#L745). In this way, the serialized file generated will be saved in the `output_inference` folder, and the saved serialized file will be loaded the next time when TensorRT is executed.\n",
    "- PaddleDetection release/2.4 and later versions will support NMS calling TensorRT, which requires PaddlePaddle release/2.3 and later versions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.2.2 Inference with PaddleInference\n",
    "For some AI acceleration hardware that does not support TensorRT, we can directly use PaddleInference to deploy. First run the following command to export the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": [
    "!python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_l_80e_coco.pdparams"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": [
    "Inference with PaddleInference directly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": [
    "# inference single image\n",
    "!CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_crn_l_80e_coco --image_file=demo/000000014439_640x640.jpg --device=gpu --run_mode=paddle\n",
    "\n",
    "# inference all images in the directory\n",
    "!CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/ppyoloe_plus_crn_l_80e_coco --image_dir=demo/ --device=gpu  --run_mode=paddle"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3.2.3 Speed testing with ONNX-TensorRT\n",
    "**Using TensorRT Inference with ONNX** to test speed, run following command"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": [
    "# export inference model with trt=True\n",
    "!python tools/export_model.py -c configs/ppyoloe/ppyoloe_plus_crn_s_80e_coco.yml -o weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco.pdparams exclude_nms=True trt=True\n",
    "\n",
    "# convert to onnx\n",
    "!paddle2onnx --model_dir output_inference/ppyoloe_plus_crn_s_80e_coco --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file ppyoloe_plus_crn_s_80e_coco.onnx\n",
    "\n",
    "# trt inference using fp16 and batch_size=1\n",
    "!trtexec --onnx=./ppyoloe_plus_crn_s_80e_coco.onnx --saveEngine=./ppyoloe_s_bs1.engine --workspace=1024 --avgRuns=1000 --shapes=image:1x3x640x640,scale_factor:1x2 --fp16\n",
    "\n",
    "# trt inference using fp16 and batch_size=32\n",
    "!trtexec --onnx=./ppyoloe_plus_crn_s_80e_coco.onnx --saveEngine=./ppyoloe_s_bs32.engine --workspace=1024 --avgRuns=1000 --shapes=image:32x3x640x640,scale_factor:32x2 --fp16\n",
    "\n",
    "# Using the above script, T4 and tensorrt 7.2 machine, the speed of PPYOLOE-s model is as follows,\n",
    "\n",
    "# batch_size=1, 2.80ms, 357fps\n",
    "# batch_size=32, 67.69ms, 472fps"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Model principle\n",
    "The overall structure of PP-YOLOE+ is roughly the same as that of PP-YOLOE. PP-YOLOE+ improves the preformance with the following methods:\n",
    "- Pre training model using large-scale data set obj365\n",
    "- In the backbone, add the alpha parameter to the block branch\n",
    "- Optimize the end-to-end inference speed and improve the training convergence speed\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Attention\n",
    "**All commands run on AI Studio's `jupyter` by default. If running on a terminal, remove the % or ! at the beginning of the command.**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Related papers and citations\n",
    "```\n",
    "@article{xu2022pp,\n",
    "  title={PP-YOLOE: An evolved version of YOLO},\n",
    "  author={Xu, Shangliang and Wang, Xinxin and Lv, Wenyu and Chang, Qinyao and Cui, Cheng and Deng, Kaipeng and Wang, Guanzhong and Dang, Qingqing and Wei, Shengyu and Du, Yuning and others},\n",
    "  journal={arXiv preprint arXiv:2203.16250},\n",
    "  year={2022}\n",
    "}\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}