Add pphuman doc (#5403)

* add pphuman doc
* add pphuman doc & add enable_attr, enable_action
* update doc
* update pphuman-tech.png

674840fb · wangguanzhong · GitHub · 6c619791
5 changed files
--- a/deploy/pphuman/README.md
+++ b/deploy/pphuman/README.md
+[English](README_en.md) | 简体中文
+
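+# PP-Human: Real-Time Pedestrian Analysis
+
+PP-Human is the industry's first open-source real-time pedestrian analysis tool built on the PaddlePaddle deep learning framework. It offers three main strengths: rich functionality, broad applicability, and efficient deployment.
+PP-Human accepts images, single-camera video, and multi-camera video as input, and covers multi-object tracking, attribute recognition, and action analysis. It can be applied in smart transportation, smart communities, industrial inspection, and similar fields. It supports server-side deployment with TensorRT acceleration and reaches real-time speed on a T4 server.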
+
+
+## I. Environment Setup
+
+Requirement: PaddleDetection version >= release/2.4
+
+Install PaddlePaddle and PaddleDetection:
+
+```
+# PaddlePaddle CUDA10.1
+python -m pip install paddlepaddle-gpu==2.2.2.post101 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
+
+# PaddlePaddle CPU
+python -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
+
+# Clone the PaddleDetection repository
+cd <path/to/clone/PaddleDetection>
+git clone https://github.com/PaddlePaddle/PaddleDetection.git
+
+# Install the remaining dependencies
+cd PaddleDetection
+pip install -r requirements.txt
+```
+
+For detailed installation instructions, see the [installation guide](docs/tutorials/INSTALL_cn.md).
+
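+## II. Quick Start
+
+### 1. Model Download
+
+PP-Human provides pretrained models for object detection, attribute recognition, action recognition, and ReID to cover different use cases. You can download and use them directly.
+
+| Task            | Applicable scenario | Accuracy | Inference speed (FPS) | Deployment model |
+| :---------:     |:---------:     |:---------------     | :-------:  | :------:      |
+| Object detection        | Image/video input | -  | -           | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip) |
+| Attribute recognition    | Image/video input; attribute recognition  | - |  -       | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/strongbaseline_r50_30e_pa100k.tar) |
+| Keypoint detection    | Video input; action recognition | - | -        | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip) |
+| Action recognition   |  Video input; action recognition  | - |  -          | [Download](https://bj.bcebos.com/v1/paddledet/models/pipeline/STGCN.zip) |
+| ReID         | Video input; cross-camera tracking   | - | -         | [Download]() |
+
+After downloading the models, extract them into the `./output_inference` directory.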
+
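+**Notes:**
+
+- Model accuracy is measured on a merged dataset that combines open-source and enterprise datasets.
+- Inference speed is measured on a T4 GPU with TensorRT FP16 enabled.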
+
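+### 2. Preparing the Configuration File
+
+PP-Human's configuration lives in `deploy/pphuman/config/infer_cfg.yml`, which stores the model paths. Different features require different task types to be configured.
+
+The mapping between features and task types is as follows:
+
+| Input type | Feature | Task types | Config keys |
+|-------|-------|----------|-----|
+| Image | Attribute recognition | Object detection, attribute recognition | DET ATTR |
+| Single-camera video | Attribute recognition | Multi-object tracking, attribute recognition | MOT ATTR |
+| Single-camera video | Action recognition | Multi-object tracking, keypoint detection, action recognition | MOT KPT ACTION |
+
+For example, attribute recognition on video input involves the multi-object tracking and attribute recognition tasks, configured as follows: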
+
+```
+crop_thresh: 0.5
+attr_thresh: 0.5
+visual: True
+
+MOT:
+  model_dir: output_inference/mot_ppyoloe_l_36e_pipeline/
+  tracker_config: deploy/pphuman/config/tracker_config.yml
+  batch_size: 1
+
+ATTR:
+  model_dir: output_inference/strongbaseline_r50_30e_pa100k/
+  batch_size: 8
+```
+
+
+
+### 3. Inference and Deployment
+
+```
+# Specify the config file and a test image
+python deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml --image_file=test_image.jpg --device=gpu
+
+# Specify the config file and a test video, and run attribute recognition
+python deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml --video_file=test_video.mp4 --device=gpu --enable_attr=True
+
+# Specify the config file and a test video, and run action recognition
+python deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml --video_file=test_video.mp4 --device=gpu --enable_action=True
+
+# Specify the config file, a model path, and a test video, and run multi-object tracking
+# A model path given on the command line takes precedence over the config file
+python deploy/pphuman/pipeline.py --config deploy/pphuman/config/infer_cfg.yml --video_file=test_video.mp4 --device=gpu --model_dir det=ppyoloe/
+```
+
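+#### 3.1 Parameters
+
+| Parameter | Required | Description |
+|-------|-------|----------|
+| --config | Yes | Path to the configuration file |
+| --model_dir | Optional | Model paths for the individual PP-Human tasks; takes precedence over the configuration file |
+| --image_file | Optional | Image to run prediction on |
+| --image_dir  | Optional | Path to a directory of images to run prediction on |
+| --video_file | Optional | Video to run prediction on |
+| --camera_id | Optional | ID of the camera used for prediction. Defaults to -1, meaning no camera is used; valid values are 0 to (number of cameras - 1). During prediction, press `q` in the visualization window to quit; the result is written to output/output.mp4 |
+| --enable_attr| Optional | Whether to run attribute recognition |
+| --enable_action| Optional | Whether to run action recognition |
+| --device | Optional | Device to run on, one of `CPU/GPU/XPU`; defaults to `CPU` |
+| --output_dir | Optional | Root directory for saved visualization results; defaults to output/ |
+| --run_mode | Optional | Run mode when using the GPU; defaults to paddle. Options: paddle/trt_fp32/trt_fp16/trt_int8 |
+| --enable_mkldnn | Optional | Whether to enable MKLDNN acceleration for CPU inference; defaults to False |
+| --cpu_threads | Optional | Number of CPU threads; defaults to 1 |
+| --trt_calib_mode | Optional | Whether TensorRT uses calibration; defaults to False. Set it to True when using TensorRT int8, and to False when using a model quantized with PaddleSlim |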
+
+
+## III. Solution Overview
+
+The overall PP-Human pipeline is shown below.
+
+<div width="1000" align="center">
+  <img src="../../docs/images/pphuman-tech.png"/>
+</div>
+
+
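+### 1. Object Detection
+- Uses PP-YOLOE L as the object detection model
+- For details, see [PP-YOLOE](configs/ppyoloe/)
+
+### 2. Multi-Object Tracking
+- Uses the SDE (Separate Detection and Embedding) approach for multi-object tracking
+- The detection model is PP-YOLOE L
+- The tracking module uses ByteTrack
+- For details, see [Bytetrack](configs/mot/bytetrack)
+
+### 3. Cross-Camera Tracking
+- Uses PP-YOLOE + ByteTrack to obtain single-camera multi-object tracking trajectories
+- Uses ReID (a centroid network) to extract features from the detections in every frame
+- Matches trajectory features across cameras to produce the cross-camera tracking result
+- For details, see [Cross-Camera Tracking](doc/mtmct.md)
+
+### 4. Attribute Recognition
+- Uses PP-YOLOE + ByteTrack to track people
+- Uses StrongBaseline (a multi-class classification model) to recognize attributes; the main attributes include age, gender, hat, glasses, upper- and lower-body clothing style, and backpack
+- For details, see [Attribute Recognition](doc/attribute.md)
+
+### 5. Action Recognition
+- Uses PP-YOLOE + ByteTrack to track people
+- Uses HRNet for keypoint detection to obtain 17 skeleton keypoints per person
+- Combines the changes in one person's keypoints over 100 frames and uses ST-GCN to decide whether the action within those 100 frames is a fall
+- For details, see [Action Recognition](doc/action.md)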
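
The action recognition step in the README classifies a window of 100 frames of one person's keypoints. A minimal sketch of how such per-track windows could be collected is shown below. `KeypointBuffer` and its methods are hypothetical names, not part of PP-Human, and emitting non-overlapping windows is an assumption; the README only states the window length (100 frames) and the keypoint count (17).

```python
from collections import defaultdict, deque

WINDOW = 100        # frames per action decision, as stated in the README
NUM_KEYPOINTS = 17  # skeleton keypoints per person, as stated in the README


class KeypointBuffer:
    """Hypothetical helper: collects per-track keypoints into fixed-size windows."""

    def __init__(self, window=WINDOW):
        self.window = window
        # One bounded queue of frames per tracked person.
        self.tracks = defaultdict(lambda: deque(maxlen=window))

    def push(self, track_id, keypoints):
        """Add one frame of keypoints; return a full window when ready, else None."""
        buf = self.tracks[track_id]
        buf.append(keypoints)
        if len(buf) == self.window:
            sequence = list(buf)
            buf.clear()  # assumption: windows do not overlap
            return sequence
        return None


# Usage: feed dummy (x, y) keypoints for a single track.
buffer = KeypointBuffer()
ready = None
for frame_idx in range(WINDOW):
    frame_kpts = [(float(frame_idx), float(k)) for k in range(NUM_KEYPOINTS)]
    ready = buffer.push(track_id=1, keypoints=frame_kpts)

print(len(ready), len(ready[0]))
```

The returned sequence is what a sequence classifier such as ST-GCN would consume; the classifier itself is outside the scope of this sketch.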
--- a/deploy/pphuman/pipe_utils.py
+++ b/deploy/pphuman/pipe_utils.py
@@ -52,6 +52,16 @@ def argsparser():
        type=int,
        default=-1,
        help="device id of camera to predict.")
+    parser.add_argument(
+        "--enable_attr",
+        type=ast.literal_eval,
+        default=False,
+        help="Whether to use attribute recognition.")
+    parser.add_argument(
+        "--enable_action",
+        type=ast.literal_eval,
+        default=False,
+        help="Whether to use action recognition.")
    parser.add_argument(
        "--output_dir",
        type=str,
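
The two new flags use `type=ast.literal_eval`, so the string given on the command line is parsed as a Python literal. The sketch below reproduces only these two arguments in a standalone parser to show the effect; `build_parser` is a hypothetical name and the real `argsparser()` defines many more options.

```python
import argparse
import ast


def build_parser():
    # Mirrors only the two flags added in this commit.
    parser = argparse.ArgumentParser()
    parser.add_argument("--enable_attr", type=ast.literal_eval, default=False)
    parser.add_argument("--enable_action", type=ast.literal_eval, default=False)
    return parser


parser = build_parser()
flags = parser.parse_args(["--enable_attr=True"])
print(flags.enable_attr, flags.enable_action)
```

Because the value must be a valid Python literal, `--enable_attr=True` yields the boolean `True`, while a lowercase `--enable_attr=true` is rejected by argparse as an invalid value.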

--- a/deploy/pphuman/pipeline.py
+++ b/deploy/pphuman/pipeline.py
@@ -48,6 +48,8 @@ class Pipeline(object):
            then all the images in directory will be predicted, default as None
        video_file (string|None): the path of video file, default as None
        camera_id (int): the device id of camera to predict, default as -1
+        enable_attr (bool): whether to use attribute recognition, default as False
+        enable_action (bool): whether to use action recognition, default as False
        device (string): the device to predict, options are: CPU/GPU/XPU, 
            default as CPU
        run_mode (string): the mode of prediction, options are: 
@@ -68,6 +70,8 @@ class Pipeline(object):
                 image_dir=None,
                 video_file=None,
                 camera_id=-1,
+                 enable_attr=False,
+                 enable_action=False,
                 device='CPU',
                 run_mode='paddle',
                 trt_min_shape=1,
@@ -87,6 +91,8 @@ class Pipeline(object):
                    cfg,
                    is_video=True,
                    multi_camera=True,
+                    enable_attr=enable_attr,
+                    enable_action=enable_action,
                    device=device,
                    run_mode=run_mode,
                    trt_min_shape=trt_min_shape,
@@ -100,6 +106,8 @@ class Pipeline(object):
            self.predictor = PipePredictor(
                cfg,
                self.is_video,
+                enable_attr=enable_attr,
+                enable_action=enable_action,
                device=device,
                run_mode=run_mode,
                trt_min_shape=trt_min_shape,
@@ -172,7 +180,7 @@ class Result(object):
        self.res_dict[name].update(res)

    def get(self, name):
-        if name in self.res_dict:
+        if name in self.res_dict and len(self.res_dict[name]) > 0:
            return self.res_dict[name]
        return None
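
The change to `Result.get` makes an empty result behave like a missing one. A minimal stand-in class illustrates the difference; the constructor keys and the `update` signature here are assumptions made for the sketch, and only the body of `get` is taken from the hunk above.

```python
class Result:
    """Simplified stand-in for the pipeline's Result class."""

    def __init__(self):
        # Assumed layout: one dict per task name.
        self.res_dict = {"det": dict(), "attr": dict()}

    def update(self, res, name):
        self.res_dict[name].update(res)

    def get(self, name):
        # Same condition as the patched method: empty dicts count as "no result".
        if name in self.res_dict and len(self.res_dict[name]) > 0:
            return self.res_dict[name]
        return None


result = Result()
print(result.get("attr"))   # nothing stored yet
result.update({"output": ["Female", "Age18-60"]}, "attr")
print(result.get("attr"))
```

Callers can then test `result.get(name) is not None` without separately checking for an empty dict.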

@@ -198,6 +206,8 @@ class PipePredictor(object):
        multi_camera (bool): whether to use multi camera in pipeline, 
            default as False
        camera_id (int): the device id of camera to predict, default as -1
+        enable_attr (bool): whether to use attribute recognition, default as False
+        enable_action (bool): whether to use action recognition, default as False
        device (string): the device to predict, options are: CPU/GPU/XPU, 
            default as CPU
        run_mode (string): the mode of prediction, options are: 
@@ -216,6 +226,8 @@ class PipePredictor(object):
                 cfg,
                 is_video=True,
                 multi_camera=False,
+                 enable_attr=False,
+                 enable_action=False,
                 device='CPU',
                 run_mode='paddle',
                 trt_min_shape=1,
@@ -226,8 +238,22 @@ class PipePredictor(object):
                 enable_mkldnn=False,
                 output_dir='output'):

-        self.with_attr = cfg.get('ATTR', False)
-        self.with_action = cfg.get('ACTION', False)
+        if enable_attr and not cfg.get('ATTR', False):
+            raise ValueError(
+                'enable_attr is set to True, please set ATTR in config file')
+        if enable_action and (not cfg.get('ACTION', False) or
+                              not cfg.get('KPT', False)):
+            raise ValueError(
+                'enable_action is set to True, please set KPT and ACTION in config file'
+            )
+
+        self.with_attr = cfg.get('ATTR', False) and enable_attr
+        self.with_action = cfg.get('ACTION', False) and enable_action
+        if self.with_attr:
+            print('Attribute Recognition enabled')
+        if self.with_action:
+            print('Action Recognition enabled')
+
        self.is_video = is_video
        self.multi_camera = multi_camera
        self.cfg = cfg
@@ -483,9 +509,10 @@ def main():
    print_arguments(cfg)
    pipeline = Pipeline(
        cfg, FLAGS.image_file, FLAGS.image_dir, FLAGS.video_file,
-        FLAGS.camera_id, FLAGS.device, FLAGS.run_mode, FLAGS.trt_min_shape,
-        FLAGS.trt_max_shape, FLAGS.trt_opt_shape, FLAGS.trt_calib_mode,
-        FLAGS.cpu_threads, FLAGS.enable_mkldnn, FLAGS.output_dir)
+        FLAGS.camera_id, FLAGS.enable_attr, FLAGS.enable_action, FLAGS.device,
+        FLAGS.run_mode, FLAGS.trt_min_shape, FLAGS.trt_max_shape,
+        FLAGS.trt_opt_shape, FLAGS.trt_calib_mode, FLAGS.cpu_threads,
+        FLAGS.enable_mkldnn, FLAGS.output_dir)

    pipeline.run()
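
The flag handling in `PipePredictor.__init__` combines the command-line switches with the sections present in the config file. The standalone helper below is a sketch of that logic under stated assumptions: `resolve_features` is a hypothetical name, `cfg` is treated as a plain dict, and the sketch raises an error when a required section is missing.

```python
def resolve_features(cfg, enable_attr=False, enable_action=False):
    """Return (with_attr, with_action), or raise if a required config section is missing."""
    if enable_attr and not cfg.get("ATTR"):
        raise ValueError(
            "enable_attr is set to True, please set ATTR in config file")
    # Action recognition needs both keypoint detection and the action model.
    if enable_action and not (cfg.get("KPT") and cfg.get("ACTION")):
        raise ValueError(
            "enable_action is set to True, please set KPT and ACTION in config file")
    return bool(enable_attr), bool(enable_action)


# Usage with a config shaped like the README's MOT + ATTR example.
cfg = {
    "MOT": {"model_dir": "output_inference/mot_ppyoloe_l_36e_pipeline/"},
    "ATTR": {"model_dir": "output_inference/strongbaseline_r50_30e_pa100k/"},
}
print(resolve_features(cfg, enable_attr=True))
```

With this config, requesting `enable_action=True` raises a `ValueError`, because neither `KPT` nor `ACTION` is configured.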


--- a/deploy/python/visualize.py
+++ b/deploy/python/visualize.py
@@ -338,8 +338,8 @@ def visualize_attr(im, results, boxes=None):
        im = np.ascontiguousarray(np.copy(im))

    im_h, im_w = im.shape[:2]
-    text_scale = max(1, int(im.shape[0] / 1200.))
-    text_thickness = 3
+    text_scale = max(1, int(im.shape[0] / 1600.))
+    text_thickness = 2

    line_inter = im.shape[0] / 50.
    for i, res in enumerate(results):
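
The hunk above changes the divisor used to derive the text scale from the image height. A small sketch makes the effect concrete; `text_scale` is a hypothetical helper that reproduces only the `max(1, int(height / divisor))` expression from `visualize_attr`.

```python
def text_scale(im_h, divisor=1600.0):
    # Same expression as in visualize_attr, with the divisor made a parameter.
    return max(1, int(im_h / divisor))


for height in (1080, 2400, 3200):
    old = text_scale(height, divisor=1200.0)
    new = text_scale(height)
    print(height, old, new)
```

For a 1080-pixel-high frame both divisors give a scale of 1; the new divisor only delays the jump to a scale of 2 from a height of 2400 pixels to a height of 3200 pixels.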

--- a/docs/images/pphuman-tech.png
+++ b/docs/images/pphuman-tech.png