{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. PP-Human introduction\n",
    "PaddleDetection has provide out-of-the-box tools in pedestrian and vehicle analysis, and it support multiple input format such as images/videos/multi-videos/online video streams. This make it popular in smart-city\\smart transportation and so on. It can be deployed easily with GPU server and TensorRT, which achieves real-time performace.\n",
    "\n",
    "PP Human is officially produced by the PaddlePaddle and is a pedestrian analysis pipeline based on PaddleDetection.\n",
    "For more information about PaddleDetection, click https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.5/deploy/pipeline 。\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Model effects and application scenarios\n",
    "### 2.1 PP-Human Model effects\n",
    "\n",
    "| Task                                   | End-to-End Speed(ms) | Model                                                                                                                                                                                                                                                                                                                           | Size                                                                                                   |\n",
    "|:--------------------------------------:|:--------------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:------------------------------------------------------------------------------------------------------:|\n",
    "| Pedestrian detection (high precision)  | 25.1ms               | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)                                                                                                                                                                                                                      | 182M                                                                                                   |\n",
    "| Pedestrian detection (lightweight)     | 16.2ms               | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip)                                                                                                                                                                                                                      | 27M                                                                                                    |\n",
    "| Pedestrian tracking (high precision)   | 31.8ms               | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)                                                                                                                                                                                                                      | 182M                                                                                                   |\n",
    "| Pedestrian tracking (lightweight)      | 21.0ms               | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip)                                                                                                                                                                                                                      | 27M                                                                                                    |\n",
    "|  MTMCT(REID)  |  Single Person 1.5ms | [REID](https://bj.bcebos.com/v1/paddledet/models/pipeline/reid_model.zip) | REID:92M |\n",
    "| Attribute recognition (high precision) | Single person8.5ms   | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br> [Attribute recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.zip)                                                                                                         | Object detection:182M<br>Attribute recognition:86M                                                     |\n",
    "| Attribute recognition (lightweight)    | Single person 7.1ms  | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br> [Attribute recognition](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.zip)                                                                                                         | Object detection:182M<br>Attribute recognition:86M                                                     |\n",
    "| Falling detection                      | Single person 10ms   | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) <br> [Keypoint detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip) <br> [Behavior detection based on key points](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) | Multi-object tracking:182M<br>Keypoint detection:101M<br>Behavior detection based on key points: 21.8M |\n",
    "| Intrusion detection                    | 31.8ms               | [Multi-object tracking](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)                                                                                                                                                                                                                      | 182M                                                                                                   |\n",
    "| Fighting detection                     | 19.7ms               | [Video classification](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)                                                                                                                                                                                                                       | 90M                                                                                                    |\n",
    "| Smoking detection                      | Single person 15.1ms | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br>[Object detection based on Human Id](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.zip)                                                                                        | Object detection:182M<br>Object detection based on Human ID: 27M                                       |\n",
    "| Phoning detection                      | Single person ms     | [Object detection](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip)<br>[Image classification based on Human ID](https://bj.bcebos.com/v1/paddledet/models/pipeline/PPHGNet_tiny_calling_halfbody.zip)                                                                                         | Object detection:182M<br>Image classification based on Human ID:45M                                    |\n",
    "\n",
    "\n",
    "### 2.2 application scenarios:\n",
    "| Feature                                          |Advantages                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |Example                                                                                                                                     |\n",
    "| -------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- |\n",
    "| **ReID**                                           | Extraordinary performance: special optimization for technical challenges such as target occlusion, uncompleted and blurry objects to achieve mAP 98.8, 1.5ms/person                                                                                                                                                                                                                                                                                                                                                    | <img src=\"https://user-images.githubusercontent.com/48054808/173037607-0a5deadc-076e-4dcc-bd96-d54eea205f1f.png\" title=\"\" alt=\"\" width=\"191\"> |\n",
    "| **Attribute analysis**                             | Compatible with a variety of data formats: support for images, video input<br/><br/>High performance: Integrated open-sourced datasets with real enterprise data for training, achieved mAP 94.86, 2ms/person<br/><br/>Support 26 attributes: gender, age, glasses, tops, shoes, hats, backpacks and other 26 high-frequency attributes                                                                                                                                                                                | <img src=\"https://user-images.githubusercontent.com/48054808/173036043-68b90df7-e95e-4ada-96ae-20f52bc98d7c.png\" title=\"\" alt=\"\" width=\"207\"> |\n",
    "| **Behaviour detection**                            | Rich function: support five high-frequency anomaly behavior detection of falling, fighting, smoking, telephoning, and intrusion<br/><br/>Robust: unlimited by different environmental backgrounds, light, and camera angles.<br/><br/>High performance: Compared with video recognition technology, it takes significantly smaller computation resources; support localization and service-oriented rapid deployment<br/><br/>Fast training: only takes 15 minutes to produce high precision behavior detection models | <img src=\"https://user-images.githubusercontent.com/48054808/173034825-623e4f78-22a5-4f14-9b83-dc47aa868478.gif\" title=\"\" alt=\"\" width=\"209\"> |\n",
    "| **Visitor traffic statistics**<br>**Trace record** | Simple and easy to use: single parameter to initiate functions of visitor traffic statistics and trace record                                                                                                                                                                                                                                                                                                                                                                                                          | <img src=\"https://user-images.githubusercontent.com/22989727/174736440-87cd5169-c939-48f8-90a1-0495a1fcb2b1.gif\" title=\"\" alt=\"\" width=\"200\"> |\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. How to use the model¶\n",
    "(You need to add \"!\" when running on Jupyter Notebook, Add \"%\" if cd command)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": [
    "# clone PaddleDetection\n",
    "%cd ~/work/\n",
    "!git clone https://github.com/PaddlePaddle/PaddleDetection.git\n",
    "\n",
    "# Other Dependencies\n",
    "%cd PaddleDetection\n",
    "!pip install -r requirements.txt\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "\n",
    "### 3.1 Configuration\n",
    "\n",
    "The PP-Human-related configuration is located in ``deploy/pipeline/config/infer_cfg_pphuman.yml``, and this configuration file contains all the features currently supported by PP-Human. If you want to see the configuration for a specific feature, please refer to the relevant configuration in ``deploy/pipeline/config/examples/``. In addition, the contents of the configuration file can be modified with the `-o`command line parameter. E.g. to modify the model directory of an attribute, developers can run ```-o ATTR.model_dir=\"DIR_PATH\"``.\n",
    "\n",
    "The features and corresponding task types are as follows.\n",
    "\n",
    "| Input               | Feature               | Task                                                       | Config                  |\n",
    "| ------------------- | --------------------- | ---------------------------------------------------------- | ----------------------- |\n",
    "| Image               | Attribute Recognition | Object Detection Attribute Recognition                     | DET ATTR                |\n",
    "| Single-camera video | Attribute Recognition | Multi-Object Tracking Attribute Recognition                | MOT ATTR                |\n",
    "| Single-camera video | Behaviour Recognition | Multi-Object Tracking Keypoint Detection Falling detection | MOT KPT SKELETON_ACTION |\n",
    "\n",
    "Take attribute recognition based on video input as an example: Its task type includes multi-object tracking and attributes recognition. The specific configuration is as follows.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": [
    "```\n",
    "crop_thresh: 0.5\n",
    "attr_thresh: 0.5\n",
    "visual: True\n",
    "\n",
    "MOT:\n",
    "  model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip\n",
    "  tracker_config: deploy/pipeline/config/tracker_config.yml\n",
    "  batch_size: 1\n",
    "  enable: True\n",
    "\n",
    "ATTR:\n",
    "  model_dir:  https://bj.bcebos.com/v1/paddledet/models/pipeline/PPLCNet_x1_0_person_attribute_945_infer.zip\n",
    "  batch_size: 8\n",
    "  enable: True\n",
    "```"
   ]
  },
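  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `enable` switches can also be flipped programmatically before launching the pipeline. The cell below is a minimal sketch, assuming `pyyaml` is installed and the working directory is the PaddleDetection repository root; it is an illustration, not part of the official PP-Human tooling.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Minimal sketch: toggle feature switches in infer_cfg_pphuman.yml with PyYAML.\n",
    "# Assumes pyyaml is installed and the current directory is the repository root;\n",
    "# illustration only, not part of the official PP-Human tooling.\n",
    "import yaml\n",
    "\n",
    "cfg_path = \"deploy/pipeline/config/infer_cfg_pphuman.yml\"\n",
    "with open(cfg_path) as f:\n",
    "    cfg = yaml.safe_load(f)\n",
    "\n",
    "# Enable tracking and attribute recognition, mirroring the example config above.\n",
    "cfg[\"MOT\"][\"enable\"] = True\n",
    "cfg[\"ATTR\"][\"enable\"] = True\n",
    "\n",
    "with open(cfg_path, \"w\") as f:\n",
    "    yaml.safe_dump(cfg, f, sort_keys=False)\n",
    "\n",
    "# Show which features are now switched on.\n",
    "print({k: v[\"enable\"] for k, v in cfg.items() if isinstance(v, dict) and \"enable\" in v})\n"
   ]
  },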
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note:**\n",
    "\n",
    "- If developer needs to carry out different tasks, set the corresponding enables option to be True in the configuration file.\n",
    "\n",
    "### 3.2 Inference Deployment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": [
    "#Use the default configuration directly or the configuration file in examples, or modify the configuration in `infer_cfg_pphuman.yml`\n",
    "# Example: In pedestrian detection model, specify configuration file path and test image, and image input opens detection model by default\n",
    "!python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml --image_file=test_image.jpg --device=gpu\n",
    "# Example: In pedestrian attribute recognition, directly configure the examples\n",
    "!python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_human_attr.yml --video_file=test_video.mp4 --device=gpu\n",
    "   \n",
    "\n",
    "#Use the command line to enable functions or change the model path.\n",
    "# Example: Pedestrian tracking, specify config file path, model path and test video. The specified model path on the command line has a higher priority than the config file.\n",
    "!python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml -o MOT.enable=True MOT.model_dir=ppyoloe_infer/ --video_file=test_video.mp4 --device=gpu\n",
    "\n",
    "# Example: In behaviour recognition, with fall recognition as an example, enable the SKELETON_ACTION model on the command line\n",
    "!python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml -o SKELETON_ACTION.enbale=True --video_file=test_video.mp4 --device=gpu\n",
    "\n",
    "\n",
    "#rtsp push/pull stream\n",
    "#For rtsp pull stream, use `--rtsp RTSP [RTSP ...]` parameter to specify one or more rtsp streams. Separate the multiple addresses with a space, or replace the video address directly after the video_file with the rtsp stream address), examples as follows\n",
    "\n",
    "# Example: Single video stream for pedestrian attribute recognition\n",
    "!python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_human_attr.yml -o visual=False --rtsp rtsp://[YOUR_RTSP_SITE]  --device=gpu\n",
    "# Example: Multiple-video stream for pedestrian attribute recognition\n",
    "!python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_human_attr.yml -o visual=False --rtsp rtsp://[YOUR_RTSP_SITE1] rtsp://[YOUR_RTSP_SITE2] --device=gpu                                                                      |\n",
    "\n",
    "\n",
    "#rtsp push stream\n",
    "\n",
    "#For rtsp push stream, use `--pushurl rtsp:[IP]` parameter to push stream to a IP set, and you can visualize the output video by [VLC Player](https://vlc.onl/) with the `open network` funciton. the whole url path is `rtsp:[IP]/videoname`, the videoname here is the basename of the video file to infer, and the default of videoname is `output` when the video come from local camera and there is no video name. \n",
    "# Example:Pedestrian attribute recognition,in this example the whole url path is: rtsp://[YOUR_SERVER_IP]:8554/test_video\n",
    "!python deploy/pipeline/pipeline.py --config deploy/pipeline/config/examples/infer_cfg_human_attr.yml --video_file=test_video.mp4  --device=gpu --pushurl rtsp://[YOUR_SERVER_IP]:8554\n",
    "\n",
    "#Note: \n",
    "#1. rtsp push stream is based on [rtsp-simple-server](https://github.com/aler9/rtsp-simple-server), please enable this serving first.\n",
    "#2. the output visualize will be frozen frequently if the model cost too much time, we suggest to use faster model like ppyoloe_s in tracking, this is simply replace mot_ppyoloe_l_36e_pipeline.zip with mot_ppyoloe_s_36e_pipeline.zip in model config yaml file."
   ]
  },
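  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The intrusion (break-in) detection feature listed in section 2 is driven by the region flags described in the parameter table below. The following cell is a minimal example; the polygon coordinates are placeholders to adapt to your own camera view.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": [
    "# Example: area break-in counting on a single-camera video. The polygon is given as\n",
    "# x1 y1 x2 y2 ... in pixel coordinates; the values below are placeholders only.\n",
    "!python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml -o MOT.enable=True --video_file=test_video.mp4 --device=gpu --do_break_in_counting --region_type=custom --region_polygon 200 200 400 200 400 400 200 400\n"
   ]
  },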
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "### 3.3 Jetson Deployment\n",
    "\n",
    "Due to the large gap in computing power of the Jetson platform compared to the server, we suggest:\n",
    "\n",
    "1. choose a lightweight model, especially for tracking model, `ppyoloe_s: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip` is recommended\n",
    "2. For frame skipping of tracking; we recommend 2 or 3: `skip_frame_num: 3`\n",
    "\n",
    "With this recommended configuration, it is possible to achieve higher speeds on the TX2 platform. It has been tested with attribute case, with speeds up to 20fps. The configuration file can be modified directly (recommended) or from the command line (not recommended due to its long fields).\n",
    "\n",
    "### Parameters\n",
    "\n",
    "| Parameters             | Necessity | Implications                                                                                                                                                                                                                                      |\n",
    "| ---------------------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n",
    "| --config               | Yes       | Path to configuration file                                                                                                                                                                                                                        |\n",
    "| -o                     | Option    | Overwrite the corresponding configuration in the configuration file                                                                                                                                                                               |\n",
    "| --image_file           | Option    | Images to be predicted                                                                                                                                                                                                                            |\n",
    "| --image_dir            | Option    | Path to the images folder to be predicted                                                                                                                                                                                                         |\n",
    "| --video_file           | Option    | Video to be predicted, or rtsp stream address (rtsp parameter recommended)                                                                                                                                                                        |\n",
    "| --rtsp                 | Option    | rtsp video stream address, supports one or more simultaneous streams input                                                                                                                                                                        |\n",
    "| --camera_id            | Option    | The camera ID for prediction, default is -1 ( for no camera prediction, can be set to 0 - (number of cameras - 1) ), press `q` in the visualization interface during the prediction process to output the prediction result to: output/output.mp4 |\n",
    "| --device               | Option    | Running device, options include `CPU/GPU/XPU`, and the default is `CPU`.                                                                                                                                                                          |\n",
    "| --pushurl              | Option    | push the output video to rtsp stream, normaly start with `rtsp://`; this has higher priority than local video save, while this is set, pipeline will not save local visualize video, the default is \"\", means this will not work now.\n",
    "                |\n",
    "| --output_dir           | Option    | The root directory for the visualization results, and the default is output/                                                                                                                                                                      |\n",
    "| --run_mode             | Option    | For GPU, the default is paddle, with (paddle/trt_fp32/trt_fp16/trt_int8) as optional                                                                                                                                                              |\n",
    "| --enable_mkldnn        | Option    | Whether to enable MKLDNN acceleration in CPU prediction, the default is False                                                                                                                                                                     |\n",
    "| --cpu_threads          | Option    | Set the number of cpu threads, and the default is 1                                                                                                                                                                                               |\n",
    "| --trt_calib_mode       | Option    | Whether TensorRT uses the calibration function, and the default is False; set to True when using TensorRT's int8 function and False when using the PaddleSlim quantized model                                                                     |\n",
    "| --do_entrance_counting | Option    | Whether to count entrance/exit traffic flows, the default is False                                                                                                                                                                                |\n",
    "| --draw_center_traj     | Option    | Whether to map the trace, the default is False                                                                                                                                                                                                    |\n",
    "| --region_type          | Option    | 'horizontal' (default), 'vertical': traffic count direction; 'custom': set break-in area                                                                                                                                                          |\n",
    "| --region_polygon       | Option    | Set the coordinates of the polygon multipoint in the break-in area. No default.                                                                                                                                                                   |\n",
    "| --do_break_in_counting | Option    | Area break-in checks                                                                                                                                                                                                                              |"
   ]
  },
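  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a concrete illustration of the Jetson recommendations above, the MOT section of `infer_cfg_pphuman.yml` could be edited as follows. This is a minimal sketch, assuming the standard MOT keys shown in section 3.1; it swaps in the lightweight ppyoloe_s tracking model and skips frames, the two settings recommended above.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "vscode": {
     "languageId": "plaintext"
    }
   },
   "outputs": [],
   "source": [
    "MOT:\n",
    "  model_dir: https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_s_36e_pipeline.zip  # lightweight model\n",
    "  tracker_config: deploy/pipeline/config/tracker_config.yml\n",
    "  batch_size: 1\n",
    "  skip_frame_num: 3  # skip frames to trade a little accuracy for speed\n",
    "  enable: True\n"
   ]
  },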
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Solutions\n",
    "The overall solution for PP-Human is shown in the graph below:\n",
    "\n",
    "<div width=\"1000\" align=\"center\">\n",
    "  <img src=\"pphumanv2.png\"/>\n",
    "</div>\n",
    "\n",
    "### Pedestrian detection\n",
    "- Take PP-YOLOE L as the object detection model\n",
    "- For detailed documentation, please refer to [PP-YOLOE](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/configs/ppyoloe) and [Multiple-Object-Tracking](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/deploy/pipeline/docs/tutorials/pphuman_mot_en.md)\n",
    "\n",
    "### Pedestrian tracking\n",
    "- Vehicle tracking by SDE solution\n",
    "- Adopt PP-YOLOE L (high precision) and S (lightweight) for detection models\n",
    "- Adopt the OC-SORT solution for racking module\n",
    "- Refer to [OC-SORT](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/configs/mot/ocsort) and [Multi-Object Tracking](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/deploy/pipeline/docs/tutorials/pphuman_mot_en.md)\n",
    "\n",
    "### Multi-camera & multi-pedestrain tracking\n",
    "- Use PP-YOLOE & OC-SORT to acquire single-camera multi-object tracking trajectory\n",
    "- Extract features for each frame using ReID (StrongBaseline network).\n",
    "- Match multi-camera trajectory features to obtain multi-camera tracking results.\n",
    "- Refer to [Multi-camera & multi-pedestrain tracking](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/deploy/pipeline/docs/tutorials/pphuman_mtmct_en.md))\n",
    "\n",
    "### Attribute Recognition\n",
    "- Use PP-YOLOE + OC-SORT to track the human body.\n",
    "- Use PP-HGNet, PP-LCNet (multi-classification model) to complete the attribute recognition. Main attributes include age, gender, hat, eyes, top and bottom dressing style, backpack.\n",
    "- Refer to [attribute recognition](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/deploy/pipeline/docs/tutorials/pphuman_attribute_en.md)\n",
    "\n",
    "### Behaviour Recognition:\n",
    "- Four behaviour recognition solutions are provided:\n",
    "\n",
    "- 1. Behaviour recognition based on skeletal points, e.g. falling recognition\n",
    "\n",
    "- 2. Behaviour recognition based on image classification, e.g. phone call recognition\n",
    "\n",
    "- 3. Behaviour recognition based on detection, e.g. smoking recognition\n",
    "\n",
    "- 4. Behaviour recognition based on Video classification, e.g. fighting recognition\n",
    "\n",
    "- For details, please refer to [Behaviour Recognition]https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.5/deploy/pipeline/docs/tutorials/pphuman_action_en.md)"
   ]
  }
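  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the MTMCT matching step above concrete, the cell below is a minimal, purely illustrative sketch of cross-camera matching by cosine similarity of ReID features: random vectors stand in for real StrongBaseline embeddings, and a simple greedy pairing replaces PP-Human's actual matching logic (see the MTMCT documentation linked above).\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Illustrative sketch of the MTMCT matching idea: compare ReID embeddings across\n",
    "# two cameras with cosine similarity and greedily pair trajectories. Random\n",
    "# features stand in for real StrongBaseline embeddings.\n",
    "import numpy as np\n",
    "\n",
    "rng = np.random.default_rng(0)\n",
    "feats_cam1 = rng.normal(size=(3, 128))  # 3 trajectory features from camera 1\n",
    "feats_cam2 = rng.normal(size=(4, 128))  # 4 trajectory features from camera 2\n",
    "\n",
    "# L2-normalize so the dot product equals cosine similarity.\n",
    "f1 = feats_cam1 / np.linalg.norm(feats_cam1, axis=1, keepdims=True)\n",
    "f2 = feats_cam2 / np.linalg.norm(feats_cam2, axis=1, keepdims=True)\n",
    "sim = f1 @ f2.T  # (3, 4) similarity matrix\n",
    "\n",
    "# Greedily give each camera-1 trajectory its best unused camera-2 match.\n",
    "matches, used = [], set()\n",
    "for i in np.argsort(-sim.max(axis=1)):\n",
    "    scores = [s if j not in used else -np.inf for j, s in enumerate(sim[i])]\n",
    "    j = int(np.argmax(scores))\n",
    "    if sim[i, j] > 0.0:  # placeholder similarity threshold\n",
    "        matches.append((int(i), j))\n",
    "        used.add(j)\n",
    "print(matches)\n"
   ]
  }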
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}