From a04d0d2247abdb8cf9d0b563e992b54a70c95093 Mon Sep 17 00:00:00 2001 From: JYChen Date: Tue, 12 Jul 2022 21:46:14 +0800 Subject: [PATCH] Optimize action doc (#6412) * optimize pphuman action document * add en doc --- configs/pphuman/README.md | 9 + .../ppyoloe_crn_s_80e_smoking_visdrone.yml | 54 ++++++ deploy/pipeline/README.md | 2 +- deploy/pipeline/docs/tutorials/action.md | 30 ++-- deploy/pipeline/docs/tutorials/action_en.md | 33 ++-- .../action_recognotion/README.md | 2 +- .../action_recognotion/idbased_clas.md | 139 ++++++++++++++-- .../action_recognotion/idbased_det.md | 154 +++++++++++++++++- .../action_recognotion/skeletonbased_rec.md | 114 ++++++++++++- 9 files changed, 484 insertions(+), 53 deletions(-) create mode 100644 configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml diff --git a/configs/pphuman/README.md b/configs/pphuman/README.md index 9e501046e..65484eb1b 100644 --- a/configs/pphuman/README.md +++ b/configs/pphuman/README.md @@ -16,6 +16,15 @@ PaddleDetection团队提供了针对行人的基于PP-YOLOE的检测模型,用 - 具体使用教程请参考[ppyoloe](../ppyoloe#getting-start)。 +# PP-YOLOE 香烟检测模型 +基于PP-YOLOE模型的香烟检测模型,是实现PP-Human中的基于检测的行为识别方案的一环,如何在PP-Human中使用该模型进行吸烟行为识别,可参考[PP-Human行为识别模块](../../deploy/pipeline/docs/tutorials/action.md)。该模型检测类别仅包含香烟一类。由于数据来源限制,目前暂无法直接公开训练数据。该模型使用了小目标数据集Visdrone上的权重作为预训练模型,以提升检测效果。 + +| 模型 | 数据集 | mAPval
0.5:0.95 | mAPval
0.5 | 下载 | 配置文件 | +|:---------|:-------:|:------:|:------:| :----: | :------:| + +| PP-YOLOE-s | 业务数据集 | 39.7 | 79.5 |[下载链接](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.pdparams) | [配置文件](./ppyoloe_crn_s_80e_smoking_visdrone.yml) | + + ## 引用 ``` @article{shao2018crowdhuman, diff --git a/configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml b/configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml new file mode 100644 index 000000000..40a731d4d --- /dev/null +++ b/configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml @@ -0,0 +1,54 @@ +_BASE_: [ + '../runtime.yml', + '../ppyoloe/_base_/optimizer_300e.yml', + '../ppyoloe/_base_/ppyoloe_crn.yml', + '../ppyoloe/_base_/ppyoloe_reader.yml', +] + +log_iter: 100 +snapshot_epoch: 10 +weights: output/ppyoloe_crn_s_80e_smoking_visdrone/model_final + +pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_80e_visdrone.pdparams +depth_mult: 0.33 +width_mult: 0.50 + +TrainReader: + batch_size: 16 + +LearningRate: + base_lr: 0.01 + +epoch: 80 +LearningRate: + base_lr: 0.01 + schedulers: + - !CosineDecay + max_epochs: 80 + - !LinearWarmup + start_factor: 0. + epochs: 1 + +PPYOLOEHead: + static_assigner_epoch: -1 + +metric: COCO +num_classes: 1 + +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: smoking_train_cocoformat.json + dataset_dir: dataset/smoking + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: "" + anno_path: smoking_test_cocoformat.json + dataset_dir: dataset/smoking + +TestDataset: + !ImageFolder + anno_path: smoking_test_cocoformat.json + dataset_dir: dataset/smoking diff --git a/deploy/pipeline/README.md b/deploy/pipeline/README.md index 82c89fc81..895e1dd7e 100644 --- a/deploy/pipeline/README.md +++ b/deploy/pipeline/README.md @@ -57,7 +57,7 @@ PP-Human支持图片/单镜头视频/多镜头视频多种输入方式,功能 * [快速开始](docs/tutorials/action.md) * 摔倒检测 * 打架识别 -* [二次开发教程](../../docs/advanced_tutorials/customization/action.md) +* [二次开发教程](../../docs/advanced_tutorials/customization/action_recognotion/README.md) * 方案选择 * 数据准备 * 模型优化 diff --git a/deploy/pipeline/docs/tutorials/action.md b/deploy/pipeline/docs/tutorials/action.md index 3cabae555..75d8cf676 100644 --- a/deploy/pipeline/docs/tutorials/action.md +++ b/deploy/pipeline/docs/tutorials/action.md @@ -33,6 +33,7 @@
数据来源及版权归属:天覆科技,感谢提供并开源实际场景数据,仅限学术研究使用
+ ### 配置说明 [配置文件](../../config/infer_cfg_pphuman.yml)中与行为识别相关的参数如下: ``` @@ -47,7 +48,7 @@ SKELETON_ACTION: # 基于骨骼点的行为识别模型配置 ``` ### 使用方法 -1. 从上表链接中下载模型并解压到```./output_inference```路径下。默认自动下载模型,如果手动下载,需要修改模型文件夹为模型存放路径。 +1. 从`模型库`中下载`行人检测/跟踪`、`关键点识别`、`摔倒行为识别`三个预测部署模型并解压到```./output_inference```路径下;默认自动下载模型,如果手动下载,需要修改模型文件夹为模型存放路径。 2. 目前行为识别模块仅支持视频输入,根据期望开启的行为识别方案类型,设置infer_cfg_pphuman.yml中`SKELETON_ACTION`的enable: True, 然后启动命令如下: ```python python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ @@ -64,6 +65,8 @@ python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pph --device=gpu \ --model_dir kpt=./dark_hrnet_w32_256x192 action=./STGCN ``` +4. 启动命令中的完整参数说明,请参考[参数说明](./QUICK_STARTED.md)。 + ### 方案说明 1. 使用目标检测与多目标跟踪获取视频输入中的行人检测框及跟踪ID序号,模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md)。 @@ -97,14 +100,15 @@ ID_BASED_CLSACTION: # 基于分类的行为识别模型配置 ``` ### 使用方法 -1. 从上表链接中下载预测部署模型并解压到`./output_inference`路径下; +1. 从`模型库`中下载`行人检测/跟踪`、`打电话行为识别`两个预测部署模型并解压到`./output_inference`路径下;默认自动下载模型,如果手动下载,需要修改模型文件夹为模型存放路径。 2. 修改配置文件`deploy/pipeline/config/infer_cfg_pphuman.yml`中`ID_BASED_CLSACTION`下的`enable`为`True`; -3. 输入视频,启动命令如下: +3. 仅支持输入视频,启动命令如下: ``` python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ --video_file=test_video.mp4 \ --device=gpu ``` +4. 启动命令中的完整参数说明,请参考[参数说明](./QUICK_STARTED.md)。 ### 方案说明 1. 使用目标检测与多目标跟踪获取视频输入中的行人检测框及跟踪ID序号,模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md)。 @@ -137,14 +141,15 @@ ID_BASED_DETACTION: # 基于检测的行为识别模型配置 ``` ### 使用方法 -1. 从上表链接中下载预测部署模型并解压到`./output_inference`路径下; +1. 从`模型库`中下载`行人检测/跟踪`、`抽烟行为识别`两个预测部署模型并解压到`./output_inference`路径下;默认自动下载模型,如果手动下载,需要修改模型文件夹为模型存放路径。 2. 修改配置文件`deploy/pipeline/config/infer_cfg_pphuman.yml`中`ID_BASED_DETACTION`下的`enable`为`True`; -3. 输入视频,启动命令如下: +3. 仅支持输入视频,启动命令如下: ``` python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ --video_file=test_video.mp4 \ --device=gpu ``` +4. 启动命令中的完整参数说明,请参考[参数说明](./QUICK_STARTED.md)。 ### 方案说明 1. 使用目标检测与多目标跟踪获取视频输入中的行人检测框及跟踪ID序号,模型方案为PP-YOLOE,详细文档参考[PP-YOLOE](../../../../configs/ppyoloe/README_cn.md)。 @@ -191,14 +196,15 @@ VIDEO_ACTION: # 基于视频分类的行为识别模型配置 ``` ### 使用方法 -1. 从上表链接中下载预测部署模型并解压到`./output_inference`路径下; +1. 从上表链接中下载`打架识别`任务的预测部署模型并解压到`./output_inference`路径下;默认自动下载模型,如果手动下载,需要修改模型文件夹为模型存放路径。 2. 修改配置文件`deploy/pphuman/config/infer_cfg_pphuman.yml`中`VIDEO_ACTION`下的`enable`为`True`; -3. 输入视频,启动命令如下: +3. 仅支持输入视频,启动命令如下: ``` python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ --video_file=test_video.mp4 \ --device=gpu ``` +4. 
启动命令中的完整参数说明,请参考[参数说明](./QUICK_STARTED.md)。 测试效果如下: @@ -214,14 +220,14 @@ python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pph ## 自定义模型训练 我们已经提供了检测/跟踪、关键点识别以及识别摔倒、吸烟、打电话以及打架的预训练模型,可直接下载使用。如果希望使用自定义场景数据训练,或是对模型进行优化,根据具体模型,分别参考下面的链接: -| 任务 | 算法 | 模型训练及导出文档 | +| 任务 | 算法 | 模型开发文档 | | ---- | ---- | -------- | | 行人检测/跟踪 | PP-YOLOE | [使用教程](../../../../configs/ppyoloe/README_cn.md#使用教程) | | 关键点识别 | HRNet | [使用教程](../../../../configs/keypoint#3训练与测试) | -| 行为识别(摔倒)| ST-GCN | [使用教程](https://github.com/PaddlePaddle/PaddleVideo/tree/develop/applications/PPHuman) | -| 行为识别(吸烟)| PP-YOLOE | [使用教程](../../../../docs/advanced_tutorials/customization/action.md) | -| 行为识别(打电话)| PP-HGNet | [使用教程](../../../../docs/advanced_tutorials/customization/action.md) | -| 行为识别 (打架)| PP-TSM | [使用教程](../../../../docs/advanced_tutorials/customization/action.md) +| 行为识别(摔倒)| ST-GCN | [使用教程](../../../../docs/advanced_tutorials/customization/action_recognotion/skeletonbased_rec.md) | +| 行为识别(吸烟)| PP-YOLOE | [使用教程](../../../../docs/advanced_tutorials/customization/action_recognotion/idbased_det.md) | +| 行为识别(打电话)| PP-HGNet | [使用教程](../../../../docs/advanced_tutorials/customization/action_recognotion/idbased_clas.md) | +| 行为识别 (打架)| PP-TSM | [使用教程](../../../../docs/advanced_tutorials/customization/action_recognotion/videobased_rec.md) ## 参考文献 diff --git a/deploy/pipeline/docs/tutorials/action_en.md b/deploy/pipeline/docs/tutorials/action_en.md index d2d44bfab..e804aad42 100644 --- a/deploy/pipeline/docs/tutorials/action_en.md +++ b/deploy/pipeline/docs/tutorials/action_en.md @@ -60,9 +60,9 @@ SKELETON_ACTION: # Config for skeleton-based action recognition model ## How to Use -- Download models from the links of the above table and unzip them to ```./output_inference```. +1. Download models `Pedestrian Detection/Tracking`, `Keypoint Detection` and `Falling Recognition` from the links in the Model Zoo and unzip them to ```./output_inference```. The models are automatically downloaded by default. If you download them manually, you need to modify the `model_dir` as the model storage path. -- Now the only available input is the video input in the action recognition module. set the "enable: True" of `SKELETON_ACTION` in infer_cfg_pphuman.yml. And then run the command: +2. Now the only available input is the video input in the action recognition module. set the "enable: True" of `SKELETON_ACTION` in infer_cfg_pphuman.yml. And then run the command: ```python python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ @@ -70,7 +70,7 @@ SKELETON_ACTION: # Config for skeleton-based action recognition model --device=gpu ``` -- There are two ways to modify the model path: +3. There are two ways to modify the model path: - In ```./deploy/pipeline/config/infer_cfg_pphuman.yml```, you can configurate different model paths,which is proper only if you match keypoint models and action recognition models with the fields of `KPT` and `SKELETON_ACTION` respectively, and modify the corresponding path of each field into the expected path. - Add `--model_dir` in the command line to revise the model path: @@ -81,6 +81,7 @@ SKELETON_ACTION: # Config for skeleton-based action recognition model --device=gpu \ --model_dir kpt=./dark_hrnet_w32_256x192 action=./STGCN ``` +4. 
For detailed parameter description, please refer to [Parameter Description](./QUICK_STARTED.md) ### Introduction to the Solution @@ -98,7 +99,7 @@ SKELETON_ACTION: # Config for skeleton-based action recognition model ``` - The falling action recognition model uses [ST-GCN](https://arxiv.org/abs/1801.07455), and employ the [PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/stgcn.md) toolkit to complete model training. -## Image-Classification-Based Action Recognition -- Calling Detection +## Image-Classification-Based Action Recognition -- Calling Recognition
Data source and copyright owner:Skyinfor Technology. Thanks for the provision of actual scenario data, which are only @@ -122,9 +123,9 @@ ID_BASED_CLSACTION: # config for classfication-based action recognition model ### How to Use -1. Download models from the links of the above table and unzip them to ```./output_inference```. +1. Download models `Pedestrian Detection/Tracking` and `Calling Recognition` from the links in `Model Zoo` and unzip them to ```./output_inference```. The models are automatically downloaded by default. If you download them manually, you need to modify the `model_dir` as the model storage path. -2. Now the only available input is the video input in the action recognition module. set the "enable: True" of `ID_BASED_CLSACTION` in infer_cfg_pphuman.yml. +2. Now the only available input is the video input in the action recognition module. Set the "enable: True" of `ID_BASED_CLSACTION` in infer_cfg_pphuman.yml. 3. Run this command: ```python @@ -132,6 +133,7 @@ ID_BASED_CLSACTION: # config for classfication-based action recognition model --video_file=test_video.mp4 \ --device=gpu ``` +4. For detailed parameter description, please refer to [Parameter Description](./QUICK_STARTED.md) ### Introduction to the Solution 1. Get the pedestrian detection box and the tracking ID number of the video input through object detection and multi-object tracking. The adopted model is PP-YOLOE, and for details, please refer to [PP-YOLOE](../../../configs/ppyoloe). @@ -168,7 +170,7 @@ ID_BASED_DETACTION: # Config for detection-based action recognition model ### How to Use -1. Download models from the links of the above table and unzip them to ```./output_inference```. +1. Download models `Pedestrian Detection/Tracking` and `Smoking Recognition` from the links in `Model Zoo` and unzip them to ```./output_inference```. The models are automatically downloaded by default. If you download them manually, you need to modify the `model_dir` as the model storage path. 2. Now the only available input is the video input in the action recognition module. set the "enable: True" of `ID_BASED_DETACTION` in infer_cfg_pphuman.yml. @@ -178,6 +180,7 @@ ID_BASED_DETACTION: # Config for detection-based action recognition model --video_file=test_video.mp4 \ --device=gpu ``` +4. For detailed parameter description, please refer to [Parameter Description](./QUICK_STARTED.md) ### Introduction to the Solution 1. Get the pedestrian detection box and the tracking ID number of the video input through object detection and multi-object tracking. The adopted model is PP-YOLOE, and for details, please refer to [PP-YOLOE](../../../../configs/ppyoloe). @@ -223,18 +226,20 @@ VIDEO_ACTION: # Config for detection-based action recognition model ### How to Use -1. Download models from the links of the above table and unzip them to ```./output_inference```. +1. Download model `Fighting Detection` from the links of the above table and unzip it to ```./output_inference```. The models are automatically downloaded by default. If you download them manually, you need to modify the `model_dir` as the model storage path. 2. Modify the file names in the `ppTSM` folder to `model.pdiparams, model.pdiparams.info and model.pdmodel`; 3. Now the only available input is the video input in the action recognition module. set the "enable: True" of `VIDEO_ACTION` in infer_cfg_pphuman.yml. -3. Run this command: +4. 
Run this command: ```python python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \ --video_file=test_video.mp4 \ --device=gpu ``` +5. For detailed parameter description, please refer to [Parameter Description](./QUICK_STARTED.md). + The result is shown as follow: @@ -252,14 +257,14 @@ The current fight recognition model is using [PP-TSM](https://github.com/PaddleP The pretrained models are provided and can be used directly, including pedestrian detection/ tracking, keypoint detection, smoking, calling and fighting recognition. If users need to train custom action or optimize the model performance, please refer the link below. -| Task | Model | Training and Export doc | +| Task | Model | Development Document | | ---- | ---- | -------- | | pedestrian detection/tracking | PP-YOLOE | [doc](../../../../configs/ppyoloe/README.md#getting-start) | | keypoint detection | HRNet | [doc](../../../../configs/keypoint/README_en.md#3training-and-testing) | -| action recognition (fall down) | ST-GCN | [doc](https://github.com/PaddlePaddle/PaddleVideo/tree/develop/applications/PPHuman) | -| action recognition (smoking) | PP-YOLOE | [doc](../../../../docs/advanced_tutorials/customization/action.md) | -| action recognition (calling) | PP-HGNet | [doc](../../../../docs/advanced_tutorials/customization/action.md) | -| action recognition (fighting) | PP-TSM | [doc](../../../../docs/advanced_tutorials/customization/action.md) | +| action recognition (fall down) | ST-GCN | [doc](../../../../docs/advanced_tutorials/customization/action_recognotion/skeletonbased_rec.md) | +| action recognition (smoking) | PP-YOLOE | [doc](../../../../docs/advanced_tutorials/customization/action_recognotion/idbased_det.md) | +| action recognition (calling) | PP-HGNet | [doc](../../../../docs/advanced_tutorials/customization/action_recognotion/idbased_clas.md) | +| action recognition (fighting) | PP-TSM | [doc](../../../../docs/advanced_tutorials/customization/action_recognotion/videobased_rec.md) | ## Reference diff --git a/docs/advanced_tutorials/customization/action_recognotion/README.md b/docs/advanced_tutorials/customization/action_recognotion/README.md index 0966114bd..99a2e5205 100644 --- a/docs/advanced_tutorials/customization/action_recognotion/README.md +++ b/docs/advanced_tutorials/customization/action_recognotion/README.md @@ -15,7 +15,7 @@ | 基于视频分类的行为识别 | 应用视频分类技术对整个视频场景进行分类。 | 1.充分利用背景上下文和时序信息;
2. 可利用语音、字幕等多模态信息;<br>3. 不依赖检测及跟踪模型;<br>4. 可处理多人共同组成的动作; | 1. 无法定位到具体某个人的行为;<br>2. 场景泛化能力较弱;<br>
3.真实数据采集困难; | 无需具体到人的场景的判定,即判断是否存在某种特定行为,多人或对背景依赖较强的动作,如监控画面中打架识别等场景。 | -下面我们以PaddleDetection支持的几个具体动作为例,介绍每个动作方案的选择原因: +下面以PaddleDetection目前已经支持的几个具体动作为例,介绍每个动作方案的选型依据: ### 吸烟 diff --git a/docs/advanced_tutorials/customization/action_recognotion/idbased_clas.md b/docs/advanced_tutorials/customization/action_recognotion/idbased_clas.md index 42c4ae4e4..0895d0820 100644 --- a/docs/advanced_tutorials/customization/action_recognotion/idbased_clas.md +++ b/docs/advanced_tutorials/customization/action_recognotion/idbased_clas.md @@ -1,15 +1,19 @@ -# 基于人体id的分类 +# 基于人体id的分类模型开发 + +## 环境准备 + +基于人体id的分类方案是使用[PaddleClas](https://github.com/PaddlePaddle/PaddleClas)的功能进行模型训练的。请按照[安装说明](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/installation/install_paddleclas.md)完成环境安装,以进行后续的模型训练及使用流程。 ## 数据准备 基于图像分类的行为识别方案直接对视频中的图像帧结果进行识别,因此模型训练流程与通常的图像分类模型一致。 -#### 数据集下载 +### 数据集下载 打电话的行为识别是基于公开数据集[UAV-Human](https://github.com/SUTDCV/UAV-Human)进行训练的。请通过该链接填写相关数据集申请材料后获取下载链接。 在`UAVHuman/ActionRecognition/RGBVideos`路径下包含了该数据集中RGB视频数据集,每个视频的文件名即为其标注信息。 -#### 训练及测试图像处理 +### 训练及测试图像处理 根据视频文件名,其中与行为识别相关的为`A`相关的字段(即action),我们可以找到期望识别的动作类型数据。 - 正样本视频:以打电话为例,我们只需找到包含`A024`的文件。 - 负样本视频:除目标动作以外所有的视频。 @@ -18,7 +22,7 @@ **注意**: 正样本视频中并不完全符合打电话这一动作,在视频开头结尾部分会出现部分冗余动作,需要移除。 -#### 标注文件准备 +### 标注文件准备 基于图像分类的行为识别方案是借助[PaddleClas](https://github.com/PaddlePaddle/PaddleClas)进行模型训练的。使用该方案训练的模型,需要准备期望识别的图像数据及对应标注文件。根据[PaddleClas数据集格式说明](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/data_preparation/classification_dataset.md#1-%E6%95%B0%E6%8D%AE%E9%9B%86%E6%A0%BC%E5%BC%8F%E8%AF%B4%E6%98%8E)准备对应的数据即可。标注文件样例如下,其中`0`,`1`分别是图片对应所属的类别: ``` @@ -29,35 +33,144 @@ ... ``` +此外,标签文件`phone_label_list.txt`,帮助将分类序号映射到具体的类型名称: +``` +0 make_a_phone_call # 类型0 +1 normal # 类型1 +``` + +完成上述内容后,放置于`dataset`目录下,文件结构如下: +``` +data/ +├── images # 放置所有图片 +├── phone_label_list.txt # 标签文件 +├── phone_train_list.txt # 训练列表,包含图片及其对应类型 +└── phone_val_list.txt # 测试列表,包含图片及其对应类型 +``` + ## 模型优化 +### 检测-跟踪模型优化 +基于分类的行为识别模型效果依赖于前序的检测和跟踪效果,如果实际场景中不能准确检测到行人位置,或是难以正确在不同帧之间正确分配人物ID,都会使行为识别部分表现受限。如果在实际使用中遇到了上述问题,请参考[目标检测任务二次开发](../detection.md)以及[多目标跟踪任务二次开发](../mot.md)对检测/跟踪模型进行优化。 + + ### 半身图预测 在打电话这一动作中,实际是通过上半身就能实现动作的区分的,因此在训练和预测过程中,将图像由行人全身图换为半身图 ## 新增行为 -### 模型训练及测试 -- 首先根据[Install PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/en/installation/install_paddleclas_en.md)完成PaddleClas的环境配置。 -- 按照`数据准备`部分,完成训练/验证集图像的裁剪及标注文件准备。 -- 模型训练: 参考[使用预训练模型进行训练](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/quick_start/quick_start_classification_new_user.md#422-%E4%BD%BF%E7%94%A8%E9%A2%84%E8%AE%AD%E7%BB%83%E6%A8%A1%E5%9E%8B%E8%BF%9B%E8%A1%8C%E8%AE%AD%E7%BB%83)完成模型的训练及精度验证 +### 数据准备 +参考前述介绍的内容,完成数据准备的部分,放置于`{root of PaddleClas}/dataset`下: +``` +data/ +├── images # 放置所有图片 +├── label_list.txt # 标签文件 +├── train_list.txt # 训练列表,包含图片及其对应类型 +└── val_list.txt # 测试列表,包含图片及其对应类型 +``` +其中,训练及测试列表如下: +``` + # 每一行采用"空格"分隔图像路径与标注 + train/000001.jpg 0 + train/000002.jpg 0 + train/000003.jpg 1 + train/000004.jpg 2 # 新增的类别直接填写对应类别号即可 + ... +``` +`label_list.txt`中需要同样对应扩展类型的名称: +``` +0 make_a_phone_call # 类型0 +1 Your New Action # 类型1 + ... +n normal # 类型n +``` + +### 配置文件设置 +在PaddleClas中已经集成了[训练配置文件](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml),需要重点关注的设置项如下: + +```yaml +# model architecture +Arch: + name: PPHGNet_tiny + class_num: 2 # 对应新增后的数量 + + ... 
+ +# 正确设置image_root与cls_label_path,保证image_root + cls_label_path中的图片路径能够正确访问图片路径 +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: ./dataset/ + cls_label_path: ./dataset/phone_train_list_halfbody.txt + + ... + +Infer: + infer_imgs: docs/images/inference_deployment/whl_demo.jpg + batch_size: 1 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - ResizeImage: + size: 224 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 2 # 显示topk的数量,不要超过类别总数 + class_id_map_file: dataset/phone_label_list.txt # 修改后的label_list.txt路径 +``` + +### 模型训练及评估 +#### 模型训练 +通过如下命令启动训练: +```bash +export CUDA_VISIBLE_DEVICES=0,1,2,3 +python3 -m paddle.distributed.launch \ + --gpus="0,1,2,3" \ + tools/train.py \ + -c ./ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml \ + -o Arch.pretrained=True +``` +其中 `Arch.pretrained` 为 `True`表示使用预训练权重帮助训练。 + +#### 模型评估 + +训练好模型之后,可以通过以下命令实现对模型指标的评估。 + +```bash +python3 tools/eval.py \ + -c ./ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml \ + -o Global.pretrained_model=output/PPHGNet_tiny/best_model +``` + +其中 `-o Global.pretrained_model="output/PPHGNet_tiny/best_model"` 指定了当前最佳权重所在的路径,如果指定其他权重,只需替换对应的路径即可。 ### 模型导出 模型导出的详细介绍请参考[这里](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/en/inference_deployment/export_model_en.md#2-export-classification-model) 可以参考以下步骤实现: ```python python tools/export_model.py - -c ./PPHGNet_tiny_resize_halfbody.yaml \ + -c ./PPHGNet_tiny_calling_halfbody.yaml \ -o Global.pretrained_model=./output/PPHGNet_tiny/best_model \ - -o Global.save_inference_dir=./output_inference/PPHGNet_tiny_resize_halfbody + -o Global.save_inference_dir=./output_inference/PPHGNet_tiny_calling_halfbody ``` -然后将导出的模型重命名,并加入配置文件,以适配PP-Human的使用 +然后将导出的模型重命名,并加入配置文件,以适配PP-Human的使用。 ```bash -cd ./output_inference/PPHGNet_tiny_resize_halfbody +cd ./output_inference/PPHGNet_tiny_calling_halfbody mv inference.pdiparams model.pdiparams mv inference.pdiparams.info model.pdiparams.info mv inference.pdmodel model.pdmodel -cp infer_cfg.yml . 
+# 下载预测配置文件 +wget https://bj.bcebos.com/v1/paddledet/models/pipeline/infer_configs/PPHGNet_tiny_calling_halfbody/infer_cfg.yml ``` + 至此,即可使用PP-Human进行实际预测了。 diff --git a/docs/advanced_tutorials/customization/action_recognotion/idbased_det.md b/docs/advanced_tutorials/customization/action_recognotion/idbased_det.md index 10082f98d..17ef883c2 100644 --- a/docs/advanced_tutorials/customization/action_recognotion/idbased_det.md +++ b/docs/advanced_tutorials/customization/action_recognotion/idbased_det.md @@ -1,11 +1,21 @@ -# 基于人体id的检测开发 +# 基于人体id的检测模型开发 -## 数据准备 +## 环境准备 + +基于人体id的检测方案是直接使用[PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection)的功能进行模型训练的。请按照[安装说明](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/INSTALL_cn.md)完成环境安装,以进行后续的模型训练及使用流程。 +## 数据准备 基于检测的行为识别方案中,数据准备的流程与一般的检测模型一致,详情可参考[目标检测数据准备](../../tutorials/data/PrepareDetDataSet.md)。将图像和标注数据组织成PaddleDetection中支持的格式之一即可。 +**注意** : 在实际使用的预测过程中,使用的是单人图像进行预测,因此在训练过程中建议将图像裁剪为单人图像,再进行烟头检测框的标注,以提升准确率。 + + ## 模型优化 +### 检测-跟踪模型优化 +基于检测的行为识别模型效果依赖于前序的检测和跟踪效果,如果实际场景中不能准确检测到行人位置,或是难以正确在不同帧之间正确分配人物ID,都会使行为识别部分表现受限。如果在实际使用中遇到了上述问题,请参考[目标检测任务二次开发](../detection.md)以及[多目标跟踪任务二次开发](../mot.md)对检测/跟踪模型进行优化。 + + ### 更大的分辨率 烟头的检测在监控视角下是一个典型的小目标检测问题,使用更大的分辨率有助于提升模型整体的识别率 @@ -13,18 +23,146 @@ 加入小目标场景数据集VisDrone下的预训练模型进行训练,模型mAP由38.1提升到39.7。 ## 新增行为 +### 数据准备 +参考[目标检测数据准备](../../tutorials/data/PrepareDetDataSet.md)完成训练数据准备。 + +准备完成后,数据路径为 +``` +dataset/smoking +├── smoking # 存放所有的图片 +│   ├── 1.jpg +│   ├── 2.jpg +├── smoking_test_cocoformat.json # 测试标注文件 +├── smoking_train_cocoformat.json # 训练标注文件 +``` + +以`COCO`格式为例,完成后的json标注文件内容如下: + +```json +# images字段下包含了图像的路径,id及对应宽高信息 + "images": [ + { + "file_name": "smoking/1.jpg", + "id": 0, # 此处id为图片id序号,不要重复 + "height": 437, + "width": 212 + }, + { + "file_name": "smoking/2.jpg", + "id": 1, + "height": 655, + "width": 365 + }, + + ... + +# categories 字段下包含所有类别信息,如果希望新增更多的检测类别,请在这里增加, 示例如下。 + "categories": [ + { + "supercategory": "cigarette", + "id": 1, + "name": "cigarette" + }, + { + "supercategory": "Class_Defined_by_Yourself", + "id": 2, + "name": "Class_Defined_by_Yourself" + }, + + ... 
+ +# annotations 字段下包含了所有目标实例的信息,包括类别,检测框坐标, id, 所属图像id等信息 + "annotations": [ + { + "category_id": 1, # 对应定义的类别,在这里1代表cigarette + "bbox": [ + 97.0181345931, + 332.7033243081, + 7.5943999555, + 16.4545332369 + ], + "id": 0, # 此处id为实例的id序号,不要重复 + "image_id": 0, # 此处为实例所在图片的id序号,可能重复,此时即一张图片上有多个实例对象 + "iscrowd": 0, + "area": 124.96230648208665 + }, + { + "category_id": 2, # 对应定义的类别,在这里2代表Class_Defined_by_Yourself + "bbox": [ + 114.3895698372, + 221.9131122343, + 25.9530363697, + 50.5401234568 + ], + "id": 1, + "image_id": 1, + "iscrowd": 0, + "area": 1311.6696622034585 +``` -#### 模型训练及测试 -- 按照`数据准备`部分,完成训练/验证集图像的裁剪及标注文件准备。 -- 模型训练: 参考[PP-YOLOE](../../../configs/ppyoloe/README_cn.md),执行下列步骤实现 +### 配置文件设置 +参考[配置文件](../../../../configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml), 其中需要关注重点如下: +```yaml +metric: COCO +num_classes: 1 # 如果新增了更多的类别,请对应修改此处 + +# 正确设置image_dir,anno_path,dataset_dir +# 保证dataset_dir + anno_path 能正确对应标注文件的路径 +# 保证dataset_dir + image_dir + 标注文件中的图片路径可以正确对应到图片路径 +TrainDataset: + !COCODataSet + image_dir: "" + anno_path: smoking_train_cocoformat.json + dataset_dir: dataset/smoking + data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd'] + +EvalDataset: + !COCODataSet + image_dir: "" + anno_path: smoking_test_cocoformat.json + dataset_dir: dataset/smoking + +TestDataset: + !ImageFolder + anno_path: smoking_test_cocoformat.json + dataset_dir: dataset/smoking +``` + +### 模型训练及评估 +- 模型训练 + +参考[PP-YOLOE](../../../configs/ppyoloe/README_cn.md),执行下列步骤实现 ```bash -python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c ppyoloe_smoking/ppyoloe_crn_s_80e_smoking_visdrone.yml --eval +# At Root of PaddleDetection + +python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml --eval ``` -#### 模型导出 +- 模型评估 + +训练好模型之后,可以通过以下命令实现对模型指标的评估 +```bash +# At Root of PaddleDetection + +python tools/eval.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml +``` + +### 模型导出 注意:如果在Tensor-RT环境下预测, 请开启`-o trt=True`以获得更好的性能 ```bash -python tools/export_model.py -c ppyoloe_smoking/ppyoloe_crn_s_80e_smoking_visdrone.yml -o weights=output/ppyoloe_crn_s_80e_smoking_visdrone/best_model trt=True +# At Root of PaddleDetection + +python tools/export_model.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml -o weights=output/ppyoloe_crn_s_80e_smoking_visdrone/best_model trt=True ``` + +导出模型后,可以得到: +``` +ppyoloe_crn_s_80e_smoking_visdrone/ +├── infer_cfg.yml +├── model.pdiparams +├── model.pdiparams.info +└── model.pdmodel +``` + 至此,即可使用PP-Human进行实际预测了。 diff --git a/docs/advanced_tutorials/customization/action_recognotion/skeletonbased_rec.md b/docs/advanced_tutorials/customization/action_recognotion/skeletonbased_rec.md index 71ae9f9b7..a6ea303a3 100644 --- a/docs/advanced_tutorials/customization/action_recognotion/skeletonbased_rec.md +++ b/docs/advanced_tutorials/customization/action_recognotion/skeletonbased_rec.md @@ -1,11 +1,15 @@ # 基于人体骨骼点的行为识别 +## 环境准备 + +基于骨骼点的行为识别方案是借助[PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo)进行模型训练的。请按照[安装说明](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/install.md)完成PaddleVideo的环境安装,以进行后续的模型训练及使用流程。 + ## 数据准备 +使用该方案训练的模型,可以参考[此文档](https://github.com/PaddlePaddle/PaddleVideo/tree/develop/applications/PPHuman#%E5%87%86%E5%A4%87%E8%AE%AD%E7%BB%83%E6%95%B0%E6%8D%AE)准备训练数据,以适配PaddleVideo进行训练,其主要流程包含以下步骤: 
-基于骨骼点的行为识别方案是借助[PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo)进行模型训练的。使用该方案训练的模型,可以参考[此文档](https://github.com/PaddlePaddle/PaddleVideo/tree/develop/applications/PPHuman#%E5%87%86%E5%A4%87%E8%AE%AD%E7%BB%83%E6%95%B0%E6%8D%AE)准备训练数据。其主要流程包含以下步骤: ### 数据格式说明 -STGCN是一个基于骨骼点坐标序列进行预测的模型。在PaddleVideo中,训练数据为采用`.npy`格式存储的`Numpy`数据,标签则可以是`.npy`或`.pkl`格式存储的文件。对于序列数据的维度要求为`(N,C,T,V,M)`。 +STGCN是一个基于骨骼点坐标序列进行预测的模型。在PaddleVideo中,训练数据为采用`.npy`格式存储的`Numpy`数据,标签则可以是`.npy`或`.pkl`格式存储的文件。对于序列数据的维度要求为`(N,C,T,V,M)`,当前方案仅支持单人构成的行为(但视频中可以存在多人,每个人独自进行行为识别判断),即`M=1`。 | 维度 | 大小 | 说明 | | ---- | ---- | ---------- | @@ -20,6 +24,29 @@ STGCN是一个基于骨骼点坐标序列进行预测的模型。在PaddleVideo - 模型预测:可以直接选用[PaddleDetection KeyPoint模型系列](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/configs/keypoint) 模型库中的模型,并根据`3、训练与测试 - 部署预测 - 检测+keypoint top-down模型联合部署`中的步骤获取目标序列的17个关键点坐标。 - 人工标注:若对关键点的数量或是定义有其他需求,也可以直接人工标注各个关键点的坐标位置,注意对于被遮挡或较难标注的点,仍需要标注一个大致坐标,否则后续网络学习过程会受到影响。 + +当使用模型预测获取时,可以参考如下步骤进行,请注意此时在PaddleDetection中进行操作。 + +```bash +# current path is under root of PaddleDetection + +# Step 1: download pretrained inference models. +wget https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip +wget https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip +unzip -d output_inference/ mot_ppyoloe_l_36e_pipeline.zip +unzip -d output_inference/ dark_hrnet_w32_256x192.zip + +# Step 2: Get the keypoint coordinarys + +# if your data is image sequence +python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/mot_ppyoloe_l_36e_pipeline/ --keypoint_model_dir=output_inference/dark_hrnet_w32_256x192 --image_dir={your image directory path} --device=GPU --save_res=True + +# if your data is video +python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/mot_ppyoloe_l_36e_pipeline/ --keypoint_model_dir=output_inference/dark_hrnet_w32_256x192 --video_file={your video file path} --device=GPU --save_res=True +``` +这样我们会得到一个`det_keypoint_unite_image_results.json`的检测结果文件。内容的具体含义请见[这里](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/python/det_keypoint_unite_infer.py#L108)。 + + ### 统一序列的时序长度 由于实际数据中每个动作的长度不一,首先需要根据您的数据和实际场景预定时序长度(在PP-Human中我们采用50帧为一个动作序列),并对数据做以下处理: - 实际长度超过预定长度的数据,随机截取一个50帧的片段 @@ -35,11 +62,30 @@ STGCN是一个基于骨骼点坐标序列进行预测的模型。在PaddleVideo 注意:这里的`class_id`是`int`类型,与其他分类任务类似。例如`0:摔倒, 1:其他`。 + +我们提供了执行该步骤的[脚本文件](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/applications/PPHuman/datasets/prepare_dataset.py),可以直接处理生成的`det_keypoint_unite_image_results.json`文件,该脚本执行的内容包括解析json文件内容、前述步骤中介绍的整理训练数据及保存数据文件。 + +```bash +mkdir {root of PaddleVideo}/applications/PPHuman/datasets/annotations + +mv det_keypoint_unite_image_results.json {root of PaddleVideo}/applications/PPHuman/datasets/annotations/det_keypoint_unite_image_results_{video_id}_{camera_id}.json + +cd {root of PaddleVideo}/applications/PPHuman/datasets/ + +python prepare_dataset.py +``` + 至此,我们得到了可用的训练数据(`.npy`)和对应的标注文件(`.pkl`)。 ## 模型优化 +### 检测-跟踪模型优化 +基于骨骼点的行为识别模型效果依赖于前序的检测和跟踪效果,如果实际场景中不能准确检测到行人位置,或是难以正确在不同帧之间正确分配人物ID,都会使行为识别部分表现受限。如果在实际使用中遇到了上述问题,请参考[目标检测任务二次开发](../detection.md)以及[多目标跟踪任务二次开发](../mot.md)对检测/跟踪模型进行优化。 + +### 关键点模型优化 +骨骼点作为该方案的核心特征,对行人的骨骼点定位效果也决定了行为识别的整体效果。若发现在实际场景中对关键点坐标的识别结果有明显错误,从关键点组成的骨架图像看,已经难以辨别具体动作,可以参考[关键点检测任务二次开发](../keypoint_detection.md)对关键点模型进行优化。 + ### 坐标归一化处理 在完成骨骼点坐标的获取后,建议根据各人物的检测框进行归一化处理,以消除人物位置、尺度的差异给网络带来的收敛难度。 @@ -48,9 +94,58 @@ STGCN是一个基于骨骼点坐标序列进行预测的模型。在PaddleVideo 
基于关键点的行为识别方案中,行为识别模型使用的是[ST-GCN](https://arxiv.org/abs/1801.07455),并在[PaddleVideo训练流程](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/stgcn.md)的基础上修改适配,完成模型训练及导出使用流程。 + +### 数据准备与配置文件修改 +- 按照`数据准备`, 准备训练数据(`.npy`)和对应的标注文件(`.pkl`)。对应放置在`{root of PaddleVideo}/applications/PPHuman/datasets/`下。 + +- 参考[配置文件](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/applications/PPHuman/configs/stgcn_pphuman.yaml), 需要重点关注的内容如下: + +```yaml +MODEL: #MODEL field + framework: + backbone: + name: "STGCN" + in_channels: 2 # 此处对应数据说明中的C维,表示二维坐标。 + dropout: 0.5 + layout: 'coco_keypoint' + data_bn: True + head: + name: "STGCNHead" + num_classes: 2 # 如果数据中有多种行为类型,需要修改此处使其与预测类型数目一致。 + if_top5: False # 行为类型数量不足5时请设置为False,否则会报错 + +... + + +# 请根据数据路径正确设置train/valid/test部分的数据及label路径 +DATASET: #DATASET field + batch_size: 64 + num_workers: 4 + test_batch_size: 1 + test_num_workers: 0 + train: + format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddle + file_path: "./applications/PPHuman/datasets/train_data.npy" #mandatory, train data index file path + label_path: "./applications/PPHuman/datasets/train_label.pkl" + + valid: + format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dateset' + file_path: "./applications/PPHuman/datasets/val_data.npy" #Mandatory, valid data index file path + label_path: "./applications/PPHuman/datasets/val_label.pkl" + + test_mode: True + test: + format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dateset' + file_path: "./applications/PPHuman/datasets/val_data.npy" #Mandatory, valid data index file path + label_path: "./applications/PPHuman/datasets/val_label.pkl" + + test_mode: True +``` + ### 模型训练与测试 -- 按照`数据准备`, 准备训练数据 + - 在PaddleVideo中,使用以下命令即可开始训练: + ```bash # current path is under root of PaddleVideo python main.py -c applications/PPHuman/configs/stgcn_pphuman.yaml @@ -59,7 +154,7 @@ python main.py -c applications/PPHuman/configs/stgcn_pphuman.yaml python main.py --validate -c applications/PPHuman/configs/stgcn_pphuman.yaml ``` -在训练完成后,采用以下命令进行预测: +- 在训练完成后,采用以下命令进行预测: ```bash python main.py --test -c applications/PPHuman/configs/stgcn_pphuman.yaml -w output/STGCN/STGCN_best.pdparams ``` @@ -90,3 +185,14 @@ STGCN ``` 至此,就可以使用PP-Human进行行为识别的推理了。 + +**注意**:如果在训练时调整了视频序列的长度或关键点的数量,在此处需要对应修改配置文件中`INFERENCE`字段内容,以实现正确预测。 +```yaml +# 序列数据的维度为(N,C,T,V,M) +INFERENCE: + name: 'STGCN_Inference_helper' + num_channels: 2 # 对应C维 + window_size: 50 # 对应T维,请对应调整为数据长度 + vertex_nums: 17 # 对应V维,请对应调整为关键点数目 + person_nums: 1 # 对应M维 +``` -- GitLab