{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. PP-Human introduction\n",
    "PaddleDetection has provide out-of-the-box tools in pedestrian and vehicle analysis, and it support multiple input format such as images/videos/multi-videos/online video streams. This make it popular in smart-city\\smart transportation and so on. It can be deployed easily with GPU server and TensorRT, which achieves real-time performace.\n",
    "\n",
    "PP Human is officially produced by the PaddlePaddle and is a pedestrian analysis pipeline based on PaddleDetection.\n",
    "For more information about PaddleDetection, click https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5/deploy/pipeline 。\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Model effects and application scenarios\n",
    "### 2.1 PP-Human Model effects\n",
    "\n",
    "| Task                                   | End-to-End Speed(ms) | Model                                                                                                                                                                                                                                                                                                                           | Size                                                                                                   |\n",
    "|:--------------------------------------:|:--------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------:|\n",
    "| Pedestrian detection (high precision)  | 25.1ms               | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)                                                                                                                                                                                                                      | 182M                                                                                                   |\n",
    "| Pedestrian detection (lightweight)     | 16.2ms               | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip)                                                                                                                                                                                                                      | 27M                                                                                                    |\n",
    "| Pedestrian tracking (high precision)   | 31.8ms               | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)                                                                                                                                                                                                                      | 182M                                                                                                   |\n",
    "| Pedestrian tracking (lightweight)      | 21.0ms               | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip)                                                                                                                                                                                                                      | 27M                                                                                                    |\n",
    "|  MTMCT(REID)  |  Single Person 1.5ms | [REID](https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip) | REID:92M |\n",
    "| Attribute recognition (high precision) | Single person8.5ms   | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br> [Attribute recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.zip)                                                                                                         | Object detection:182M<br>Attribute recognition:86M                                                     |\n",
    "| Attribute recognition (lightweight)    | Single person 7.1ms  | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br> [Attribute recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.zip)                                                                                                         | Object detection:182M<br>Attribute recognition:86M                                                     |\n",
    "| Falling detection                      | Single person 10ms   | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) <br> [Keypoint detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip) <br> [Behavior detection based on key points](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | Multi-object tracking:182M<br>Keypoint detection:101M<br>Behavior detection based on key points: 21.8M |\n",
    "| Intrusion detection                    | 31.8ms               | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)                                                                                                                                                                                                                      | 182M                                                                                                   |\n",
    "| Fighting detection                     | 19.7ms               | [Video classification](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)                                                                                                                                                                                                                       | 90M                                                                                                    |\n",
    "| Smoking detection                      | Single person 15.1ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br>[Object detection based on Human Id](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip)                                                                                        | Object detection:182M<br>Object detection based on Human ID: 27M                                       |\n",
    "| Phoning detection                      | Single person ms     | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br>[Image classification based on Human ID](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip)                                                                                         | Object detection:182M<br>Image classification based on Human ID:45M                                    |\n",
    "\n",
    "\n",
    "### 2.2 application scenarios:\n",
    "| Feature                                          |Advantages                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |Example                                                                                                                                     |\n",
    "| -------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- |\n",
    "| **ReID**                                           | Extraordinary performance: special optimization for technical challenges such as target occlusion, uncompleted and blurry objects to achieve mAP 98.8, 1.5ms/person                                                                                                                                                                                                                                                                                                                                                    | <img src=\"https://user-images.githubusercontent.com/48054808/173037607-0a5deadc-076e-4dcc-bd96-d54eea205f1f.png\" title=\"\" alt=\"\" width=\"191\"> |\n",
    "| **Attribute analysis**                             | Compatible with a variety of data formats: support for images, video input<br/><br/>High performance: Integrated open-sourced datasets with real enterprise data for training, achieved mAP 94.86, 2ms/person<br/><br/>Support 26 attributes: gender, age, glasses, tops, shoes, hats, backpacks and other 26 high-frequency attributes                                                                                                                                                                                | <img src=\"https://user-images.githubusercontent.com/48054808/173036043-68b90df7-e95e-4ada-96ae-20f52bc98d7c.png\" title=\"\" alt=\"\" width=\"207\"> |\n",
    "| **Behaviour detection**                            | Rich function: support five high-frequency anomaly behavior detection of falling, fighting, smoking, telephoning, and intrusion<br/><br/>Robust: unlimited by different environmental backgrounds, light, and camera angles.<br/><br/>High performance: Compared with video recognition technology, it takes significantly smaller computation resources; support localization and service-oriented rapid deployment<br/><br/>Fast training: only takes 15 minutes to produce high precision behavior detection models | <img src=\"https://user-images.githubusercontent.com/48054808/173034825-623e4f78-22a5-4f14-9b83-dc47aa868478.gif\" title=\"\" alt=\"\" width=\"209\"> |\n",
    "| **Visitor traffic statistics**<br>**Trace record** | Simple and easy to use: single parameter to initiate functions of visitor traffic statistics and trace record                                                                                                                                                                                                                                                                                                                                                                                                          | <img src=\"https://user-images.githubusercontent.com/22989727/174736440-87cd5169-c939-48f8-90a1-0495a1fcb2b1.gif\" title=\"\" alt=\"\" width=\"200\"> |\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. How to use the model¶\n",
    "(You need to add \"!\" when running on Jupyter Notebook, Add \"%\" if cd command)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": [
    "# clone PaddleDetection\n",
    "%cd ~/work/\n",
    "!git clone https://github.com/PaddlePaddle/PaddleDetection.git\n",
    "\n",
    "# Other Dependencies\n",
    "%cd PaddleDetection\n",
    "!pip install -r requirements.txt\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "### 3.1 Configuration\n",
    "\n",
    "The PP-Human-related configuration is located in ``deploy/pipeline/config/infer_cfg_pphuman.yml``, and this configuration file contains all the features currently supported by PP-Human. If you want to see the configuration for a specific feature, please refer to the relevant configuration in ``deploy/pipeline/config/examples/``. In addition, the contents of the configuration file can be modified with the `-o`command line parameter. E.g. to modify the model directory of an attribute, developers can run ```-o ATTR.model_dir=\"DIR_PATH\"``.\n",
    "\n",
    "The features and corresponding task types are as follows.\n",
    "\n",
    "| Input               | Feature               | Task                                                       | Config                  |\n",
    "| ------------------- | --------------------- | ---------------------------------------------------------- | ----------------------- |\n",
    "| Image               | Attribute Recognition | Object Detection Attribute Recognition                     | DET ATTR                |\n",
    "| Single-camera video | Attribute Recognition | Multi-Object Tracking Attribute Recognition                | MOT ATTR                |\n",
    "| Single-camera video | Behaviour Recognition | Multi-Object Tracking Keypoint Detection Falling detection | MOT KPT SKELETON_ACTION |\n",
    "\n",
    "Take attribute recognition based on video input as an example: Its task type includes multi-object tracking and attributes recognition. The specific configuration is as follows.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": [
    "```\n",
    "crop_thresh: 0.5\n",
    "attr_thresh: 0.5\n",
    "visual: True\n",
    "\n",
    "MOT:\n",
    "  model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip\n",
    "  tracker_config: deploy/pipeline/config/tracker_config.yml\n",
    "  batch_size: 1\n",
    "  enable: True\n",
    "\n",
    "ATTR:\n",
    "  model_dir:  https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.zip\n",
    "  batch_size: 8\n",
    "  enable: True\n",
    "```"
   ]
  },
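  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `enable` switches can also be flipped programmatically before launching the pipeline. The cell below is a minimal sketch, assuming `pyyaml` is installed and the working directory is the PaddleDetection repository root; it is an illustration, not part of the official PP-Human tooling.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Minimal sketch: toggle feature switches in infer_cfg_pphuman.yml with PyYAML.\n",
    "# Assumes pyyaml is installed and the current directory is the repository root;\n",
    "# illustration only, not part of the official PP-Human tooling.\n",
    "import yaml\n",
    "\n",
    "cfg_path = \"deploy/pipeline/config/infer_cfg_pphuman.yml\"\n",
    "with open(cfg_path) as f:\n",
    "    cfg = yaml.safe_load(f)\n",
    "\n",
    "# Enable tracking and attribute recognition, mirroring the example config above.\n",
    "cfg[\"MOT\"][\"enable\"] = True\n",
    "cfg[\"ATTR\"][\"enable\"] = True\n",
    "\n",
    "with open(cfg_path, \"w\") as f:\n",
    "    yaml.safe_dump(cfg, f, sort_keys=False)\n",
    "\n",
    "# Show which features are now switched on.\n",
    "print({k: v[\"enable\"] for k, v in cfg.items() if isinstance(v, dict) and \"enable\" in v})\n"
   ]
  },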
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note:**\n",
    "\n",
    "- If developer needs to carry out different tasks, set the corresponding enables option to be True in the configuration file.\n",
    "\n",
    "### 3.2 Inference Deployment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": [
    "#Use the default configuration directly or the configuration file in examples, or modify the configuration in `infer_cfg_pphuman.yml`\n",
    "# Example: In pedestrian detection model, specify configuration file path and test image, and image input opens detection model by default\n",
    "!python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml --image_file=test_image.jpg --device=gpu\n",
    "# Example: In pedestrian attribute recognition, directly configure the examples\n",
    "!python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_human_attr.yml --video_file=test_video.mp4 --device=gpu\n",
    "   \n",
    "\n",
    "#Use the command line to enable functions or change the model path.\n",
    "# Example: Pedestrian tracking, specify config file path, model path and test video. The specified model path on the command line has a higher priority than the config file.\n",
    "!python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml -o MOT.enable=True MOT.model_dir=ppyoloe_infer/ --video_file=test_video.mp4 --device=gpu\n",
    "\n",
    "# Example: In behaviour recognition, with fall recognition as an example, enable the SKELETON_ACTION model on the command line\n",
    "!python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml -o SKELETON_ACTION.enbale=True --video_file=test_video.mp4 --device=gpu\n",
    "\n",
    "\n",
    "#rtsp push/pull stream\n",
    "#For rtsp pull stream, use `--rtsp RTSP [RTSP ...]` parameter to specify one or more rtsp streams. Separate the multiple addresses with a space, or replace the video address directly after the video_file with the rtsp stream address), examples as follows\n",
    "\n",
    "# Example: Single video stream for pedestrian attribute recognition\n",
    "!python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_human_attr.yml -o visual=False --rtsp rtsp://[YOUR_RTSP_SITE]  --device=gpu\n",
    "# Example: Multiple-video stream for pedestrian attribute recognition\n",
    "!python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_human_attr.yml -o visual=False --rtsp rtsp://[YOUR_RTSP_SITE1] rtsp://[YOUR_RTSP_SITE2] --device=gpu                                                                      |\n",
    "\n",
    "\n",
    "#rtsp push stream\n",
    "\n",
    "#For rtsp push stream, use `--pushurl rtsp:[IP]` parameter to push stream to a IP set, and you can visualize the output video by [VLC Player](https://vlc.onl/) with the `open network` funciton. the whole url path is `rtsp:[IP]/videoname`, the videoname here is the basename of the video file to infer, and the default of videoname is `output` when the video come from local camera and there is no video name. \n",
    "# Example:Pedestrian attribute recognition,in this example the whole url path is: rtsp://[YOUR_SERVER_IP]:8554/test_video\n",
    "!python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_human_attr.yml --video_file=test_video.mp4  --device=gpu --pushurl rtsp://[YOUR_SERVER_IP]:8554\n",
    "\n",
    "#Note: \n",
    "#1. rtsp push stream is based on [rtsp-simple-server](https://github.com/aler9/rtsp-simple-server), please enable this serving first.\n",
    "#2. the output visualize will be frozen frequently if the model cost too much time, we suggest to use faster model like ppyoloe_s in tracking, this is simply replace mot_ppyoloe_l_36e_pipeline.zip with mot_ppyoloe_s_36e_pipeline.zip in model config yaml file."
   ]
  },
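  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The intrusion (break-in) detection feature listed in section 2 is driven by the region flags described in the parameter table below. The following cell is a minimal example; the polygon coordinates are placeholders to adapt to your own camera view.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": [
    "# Example: area break-in counting on a single-camera video. The polygon is given as\n",
    "# x1 y1 x2 y2 ... in pixel coordinates; the values below are placeholders only.\n",
    "!python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml -o MOT.enable=True --video_file=test_video.mp4 --device=gpu --do_break_in_counting --region_type=custom --region_polygon 200 200 400 200 400 400 200 400\n"
   ]
  },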
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "### 3.3 Jetson Deployment\n",
    "\n",
    "Due to the large gap in computing power of the Jetson platform compared to the server, we suggest:\n",
    "\n",
    "1. choose a lightweight model, especially for tracking model, `ppyoloe_s: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip` is recommended\n",
    "2. For frame skipping of tracking; we recommend 2 or 3: `skip_frame_num: 3`\n",
    "\n",
    "With this recommended configuration, it is possible to achieve higher speeds on the TX2 platform. It has been tested with attribute case, with speeds up to 20fps. The configuration file can be modified directly (recommended) or from the command line (not recommended due to its long fields).\n",
    "\n",
    "### Parameters\n",
    "\n",
    "| Parameters             | Necessity | Implications                                                                                                                                                                                                                                      |\n",
    "| ---------------------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n",
    "| --config               | Yes       | Path to configuration file                                                                                                                                                                                                                        |\n",
    "| -o                     | Option    | Overwrite the corresponding configuration in the configuration file                                                                                                                                                                               |\n",
    "| --image_file           | Option    | Images to be predicted                                                                                                                                                                                                                            |\n",
    "| --image_dir            | Option    | Path to the images folder to be predicted                                                                                                                                                                                                         |\n",
    "| --video_file           | Option    | Video to be predicted, or rtsp stream address (rtsp parameter recommended)                                                                                                                                                                        |\n",
    "| --rtsp                 | Option    | rtsp video stream address, supports one or more simultaneous streams input                                                                                                                                                                        |\n",
    "| --camera_id            | Option    | The camera ID for prediction, default is -1 ( for no camera prediction, can be set to 0 - (number of cameras - 1) ), press `q` in the visualization interface during the prediction process to output the prediction result to: output/output.mp4 |\n",
    "| --device               | Option    | Running device, options include `CPU/GPU/XPU`, and the default is `CPU`.                                                                                                                                                                          |\n",
    "| --pushurl              | Option    | push the output video to rtsp stream, normaly start with `rtsp://`; this has higher priority than local video save, while this is set, pipeline will not save local visualize video, the default is \"\", means this will not work now.\n",
    "                |\n",
    "| --output_dir           | Option    | The root directory for the visualization results, and the default is output/                                                                                                                                                                      |\n",
    "| --run_mode             | Option    | For GPU, the default is paddle, with (paddle/trt_fp32/trt_fp16/trt_int8) as optional                                                                                                                                                              |\n",
    "| --enable_mkldnn        | Option    | Whether to enable MKLDNN acceleration in CPU prediction, the default is False                                                                                                                                                                     |\n",
    "| --cpu_threads          | Option    | Set the number of cpu threads, and the default is 1                                                                                                                                                                                               |\n",
    "| --trt_calib_mode       | Option    | Whether TensorRT uses the calibration function, and the default is False; set to True when using TensorRT's int8 function and False when using the PaddleSlim quantized model                                                                     |\n",
    "| --do_entrance_counting | Option    | Whether to count entrance/exit traffic flows, the default is False                                                                                                                                                                                |\n",
    "| --draw_center_traj     | Option    | Whether to map the trace, the default is False                                                                                                                                                                                                    |\n",
    "| --region_type          | Option    | 'horizontal' (default), 'vertical': traffic count direction; 'custom': set break-in area                                                                                                                                                          |\n",
    "| --region_polygon       | Option    | Set the coordinates of the polygon multipoint in the break-in area. No default.                                                                                                                                                                   |\n",
    "| --do_break_in_counting | Option    | Area break-in checks                                                                                                                                                                                                                              |"
   ]
  },
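  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a concrete illustration of the Jetson recommendations above, the MOT section of `infer_cfg_pphuman.yml` could be edited as follows. This is a minimal sketch, assuming the standard MOT keys shown in section 3.1; it swaps in the lightweight ppyoloe_s tracking model and skips frames, the two settings recommended above.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": [
    "MOT:\n",
    "  model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip  # lightweight model\n",
    "  tracker_config: deploy/pipeline/config/tracker_config.yml\n",
    "  batch_size: 1\n",
    "  skip_frame_num: 3  # skip frames to trade a little accuracy for speed\n",
    "  enable: True\n"
   ]
  },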
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Solutions\n",
    "The overall solution for PP-Human is shown in the graph below:\n",
    "\n",
    "<div width=\"1000\" align=\"center\">\n",
    "  <img src=\"pphumanv2.png\"/>\n",
    "</div>\n",
    "\n",
    "### Pedestrian detection\n",
    "- Take PP-YOLOE L as the object detection model\n",
    "- For detailed documentation, please refer to [PP-YOLOE](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/configs/ppyoloe) and [Multiple-Object-Tracking](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/deploy/pipeline/docs/tutorials/pphuman_mot_en.md)\n",
    "\n",
    "### Pedestrian tracking\n",
    "- Vehicle tracking by SDE solution\n",
    "- Adopt PP-YOLOE L (high precision) and S (lightweight) for detection models\n",
    "- Adopt the OC-SORT solution for racking module\n",
    "- Refer to [OC-SORT](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/configs/mot/ocsort) and [Multi-Object Tracking](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/deploy/pipeline/docs/tutorials/pphuman_mot_en.md)\n",
    "\n",
    "### Multi-camera & multi-pedestrain tracking\n",
    "- Use PP-YOLOE & OC-SORT to acquire single-camera multi-object tracking trajectory\n",
    "- Extract features for each frame using ReID (StrongBaseline network).\n",
    "- Match multi-camera trajectory features to obtain multi-camera tracking results.\n",
    "- Refer to [Multi-camera & multi-pedestrain tracking](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/deploy/pipeline/docs/tutorials/pphuman_mtmct_en.md))\n",
    "\n",
    "### Attribute Recognition\n",
    "- Use PP-YOLOE + OC-SORT to track the human body.\n",
    "- Use PP-HGNet, PP-LCNet (multi-classification model) to complete the attribute recognition. Main attributes include age, gender, hat, eyes, top and bottom dressing style, backpack.\n",
    "- Refer to [attribute recognition](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/deploy/pipeline/docs/tutorials/pphuman_attribute_en.md)\n",
    "\n",
    "### Behaviour Recognition:\n",
    "- Four behaviour recognition solutions are provided:\n",
    "\n",
    "- 1. Behaviour recognition based on skeletal points, e.g. falling recognition\n",
    "\n",
    "- 2. Behaviour recognition based on image classification, e.g. phone call recognition\n",
    "\n",
    "- 3. Behaviour recognition based on detection, e.g. smoking recognition\n",
    "\n",
    "- 4. Behaviour recognition based on Video classification, e.g. fighting recognition\n",
    "\n",
    "- For details, please refer to [Behaviour Recognition]https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/deploy/pipeline/docs/tutorials/pphuman_action_en.md)"
   ]
  }
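  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the MTMCT matching step above concrete, the cell below is a minimal, purely illustrative sketch of cross-camera matching by cosine similarity of ReID features: random vectors stand in for real StrongBaseline embeddings, and a simple greedy pairing replaces PP-Human's actual matching logic (see the MTMCT documentation linked above).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch of the MTMCT matching idea: compare ReID embeddings across\n",
    "# two cameras with cosine similarity and greedily pair trajectories. Random\n",
    "# features stand in for real StrongBaseline embeddings.\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "feats_cam1 = rng.normal(size=(3, 128))  # 3 trajectory features from camera 1\n",
    "feats_cam2 = rng.normal(size=(4, 128))  # 4 trajectory features from camera 2\n",
    "\n",
    "# L2-normalize so the dot product equals cosine similarity.\n",
    "f1 = feats_cam1 / np.linalg.norm(feats_cam1, axis=1, keepdims=True)\n",
    "f2 = feats_cam2 / np.linalg.norm(feats_cam2, axis=1, keepdims=True)\n",
    "sim = f1 @ f2.T  # (3, 4) similarity matrix\n",
    "\n",
    "# Greedily give each camera-1 trajectory its best unused camera-2 match.\n",
    "matches, used = [], set()\n",
    "for i in np.argsort(-sim.max(axis=1)):\n",
    "    scores = [s if j not in used else -np.inf for j, s in enumerate(sim[i])]\n",
    "    j = int(np.argmax(scores))\n",
    "    if sim[i, j] > 0.0:  # placeholder similarity threshold\n",
    "        matches.append((int(i), j))\n",
    "        used.add(j)\n",
    "print(matches)\n"
   ]
  }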
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}