Optimize action doc (#6412)

* Optimize PP-Human action document
* Add English doc

a04d0d22 · JYChen · b03a8498
9 changed files
--- a/configs/pphuman/README.md
+++ b/configs/pphuman/README.md
@@ -16,6 +16,15 @@ The PaddleDetection team provides a PP-YOLOE-based detection model for pedestrians, used
 - For a detailed tutorial, please refer to [ppyoloe](../ppyoloe#getting-start).


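+# PP-YOLOE Cigarette Detection Model
+A PP-YOLOE-based cigarette detection model, which is one component of the detection-based action recognition solution in PP-Human. For how to use this model for smoking recognition in PP-Human, see the [PP-Human action recognition module](../../deploy/pipeline/docs/tutorials/action.md). The model detects a single class: cigarette. Because of data source restrictions, the training data cannot be released at present. The model is initialized from weights pretrained on the small-object dataset VisDrone to improve detection performance.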
+
+| Model | Dataset | mAP<sup>val<br>0.5:0.95 | mAP<sup>val<br>0.5 | Download | Config |
+|:---------|:-------:|:------:|:------:| :----: | :------:|
+| PP-YOLOE-s | In-house business dataset | 39.7 | 79.5 | [Download link](https://bj.bcebos.com/v1/paddledet/models/pipeline/ppyoloe_crn_s_80e_smoking_visdrone.pdparams) | [Config](./ppyoloe_crn_s_80e_smoking_visdrone.yml) |
+
+
 ## Citation
 ```
 @article{shao2018crowdhuman,

--- a/configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml
+++ b/configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml
+_BASE_: [
+  '../runtime.yml',
+  '../ppyoloe/_base_/optimizer_300e.yml',
+  '../ppyoloe/_base_/ppyoloe_crn.yml',
+  '../ppyoloe/_base_/ppyoloe_reader.yml',
+]
+
+log_iter: 100
+snapshot_epoch: 10
+weights: output/ppyoloe_crn_s_80e_smoking_visdrone/model_final
+
+pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_s_80e_visdrone.pdparams
+depth_mult: 0.33
+width_mult: 0.50
+
+TrainReader:
+  batch_size: 16
+
+epoch: 80
+LearningRate:
+  base_lr: 0.01
+  schedulers:
+    - !CosineDecay
+      max_epochs: 80
+    - !LinearWarmup
+      start_factor: 0.
+      epochs: 1
+
+PPYOLOEHead:
+  static_assigner_epoch: -1
+
+metric: COCO
+num_classes: 1
+
+TrainDataset:
+  !COCODataSet
+    image_dir: ""
+    anno_path: smoking_train_cocoformat.json
+    dataset_dir: dataset/smoking
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+EvalDataset:
+  !COCODataSet
+    image_dir: ""
+    anno_path: smoking_test_cocoformat.json
+    dataset_dir: dataset/smoking
+
+TestDataset:
+  !ImageFolder
+    anno_path: smoking_test_cocoformat.json
+    dataset_dir: dataset/smoking
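The final `LearningRate` section of this config pairs a 1-epoch `LinearWarmup` (starting from `start_factor: 0.`) with `CosineDecay` over 80 epochs. The sketch below approximates the resulting schedule at epoch granularity, assuming a linear ramp from `start_factor * base_lr` to `base_lr` followed by the standard cosine formula decaying to 0. PaddleDetection's own scheduler steps per iteration and may differ in detail; `lr_at_epoch` is a hypothetical helper.

```python
import math


def lr_at_epoch(epoch: float, base_lr: float = 0.01, max_epochs: int = 80,
                warmup_epochs: float = 1.0, start_factor: float = 0.0) -> float:
    """Approximate learning rate at a (possibly fractional) epoch.

    Sketch only: linear warmup from start_factor * base_lr up to base_lr,
    then cosine decay from base_lr towards 0 at max_epochs.
    """
    if epoch < warmup_epochs:
        alpha = epoch / warmup_epochs
        return base_lr * (start_factor + (1.0 - start_factor) * alpha)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * epoch / max_epochs))
```

For example, halfway through training (`epoch=40`) the approximate rate is half of `base_lr`.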
--- a/deploy/pipeline/README.md
+++ b/deploy/pipeline/README.md
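@@ -57,7 +57,7 @@ PP-Human supports multiple input types, including images, single-camera video, and multi-camera video; feature
 * [Quick Start](docs/tutorials/action.md)
  * Fall detection
  * Fight recognition
-* [Secondary Development Tutorial](../../docs/advanced_tutorials/customization/action.md)
+* [Secondary Development Tutorial](../../docs/advanced_tutorials/customization/action_recognotion/README.md)
  * Solution selection
  * Data preparation
  * Model optimization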

--- a/deploy/pipeline/docs/tutorials/action.md
+++ b/deploy/pipeline/docs/tutorials/action.md
@@ -33,6 +33,7 @@
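  <center>Data source and copyright owner: Skyinfor Technology. Thanks for providing and open-sourcing real-scenario data, which is for academic research use only.</center>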
 </div>

+
 ### Configuration
 The parameters related to action recognition in the [config file](../../config/infer_cfg_pphuman.yml) are as follows:
 ```
@@ -47,7 +48,7 @@ SKELETON_ACTION: # Config for skeleton-based action recognition model
 ```

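 ### How to Use
-1. Download the models from the links in the table above and unzip them to ```./output_inference```. Models are downloaded automatically by default; if you download them manually, change the model directory to the path where the models are stored.
+1. Download the three inference models `Pedestrian Detection/Tracking`, `Keypoint Detection`, and `Falling Recognition` from the `Model Zoo` and unzip them to ```./output_inference```. Models are downloaded automatically by default; if you download them manually, change the model directory to the path where the models are stored.
 2. The action recognition module currently supports video input only. Depending on the action recognition solution you want to enable, set `enable: True` under `SKELETON_ACTION` in infer_cfg_pphuman.yml, and then run the following command:
 ```bash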
 python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \
@@ -64,6 +65,8 @@ python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pph
                                                   --device=gpu \
                                                   --model_dir kpt=./dark_hrnet_w32_256x192 action=./STGCN
 ```
+4. For a full description of the launch command parameters, see [Parameter Description](./QUICK_STARTED.md).
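The steps in this tutorial ask you to set `enable` to `True` under the relevant section (`SKELETON_ACTION`, `ID_BASED_CLSACTION`, `ID_BASED_DETACTION` or `VIDEO_ACTION`) of `infer_cfg_pphuman.yml`. If you prefer to script this, here is a minimal stdlib-only sketch. It assumes that each section name starts at column 0 and that its keys are indented beneath it. It uses plain line matching rather than a YAML parser, so comments elsewhere in the file survive. `set_enable` is a hypothetical helper.

```python
import re


def set_enable(cfg_text: str, section: str, value: bool = True) -> str:
    """Set `enable:` under one top-level section of a YAML config given as text.

    Sketch only: plain line matching, assuming the section name starts at
    column 0 and its keys are indented. An inline comment on the `enable:`
    line is dropped; all other lines, including comments, are kept unchanged.
    """
    lines = cfg_text.splitlines()
    in_section = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments do not change the current section
        if not line[0].isspace():  # a new top-level key begins
            in_section = line.split(":", 1)[0].strip() == section
        elif in_section and re.match(r"\s+enable\s*:", line):
            indent = line[: len(line) - len(line.lstrip())]
            lines[i] = f"{indent}enable: {value}"
    return "\n".join(lines) + ("\n" if cfg_text.endswith("\n") else "")
```

Read the file with `pathlib.Path.read_text()`, call `set_enable(text, "SKELETON_ACTION")`, and write the result back. A YAML library would be more robust, but it usually discards comments when writing.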
+

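 ### Introduction to the Solution
 1. Object detection and multi-object tracking are used to obtain pedestrian bounding boxes and tracking IDs from the video input. The model is PP-YOLOE; for details, see [PP-YOLOE](../../../../configs/ppyoloe/README_cn.md).
@@ -97,14 +100,15 @@ ID_BASED_CLSACTION: # Config for classification-based action recognition model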
 ```

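 ### How to Use
-1. Download the inference models from the links in the table above and unzip them to `./output_inference`;
+1. Download the two inference models `Pedestrian Detection/Tracking` and `Calling Recognition` from the `Model Zoo` and unzip them to `./output_inference`. Models are downloaded automatically by default; if you download them manually, change the model directory to the path where the models are stored.
 2. In the config file `deploy/pipeline/config/infer_cfg_pphuman.yml`, set `enable` under `ID_BASED_CLSACTION` to `True`;
-3. Input a video and run the following command:
+3. Only video input is supported. Run the following command: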
 ```
 python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \
                                                   --video_file=test_video.mp4 \
                                                   --device=gpu
 ```
+4. For a full description of the launch command parameters, see [Parameter Description](./QUICK_STARTED.md).

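 ### Introduction to the Solution
 1. Object detection and multi-object tracking are used to obtain pedestrian bounding boxes and tracking IDs from the video input. The model is PP-YOLOE; for details, see [PP-YOLOE](../../../../configs/ppyoloe/README_cn.md).
@@ -137,14 +141,15 @@ ID_BASED_DETACTION: # Config for detection-based action recognition model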
 ```

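 ### How to Use
-1. Download the inference models from the links in the table above and unzip them to `./output_inference`;
+1. Download the two inference models `Pedestrian Detection/Tracking` and `Smoking Recognition` from the `Model Zoo` and unzip them to `./output_inference`. Models are downloaded automatically by default; if you download them manually, change the model directory to the path where the models are stored.
 2. In the config file `deploy/pipeline/config/infer_cfg_pphuman.yml`, set `enable` under `ID_BASED_DETACTION` to `True`;
-3. Input a video and run the following command:
+3. Only video input is supported. Run the following command: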
 ```
 python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \
                                                   --video_file=test_video.mp4 \
                                                   --device=gpu
 ```
+4. For a full description of the launch command parameters, see [Parameter Description](./QUICK_STARTED.md).

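 ### Introduction to the Solution
 1. Object detection and multi-object tracking are used to obtain pedestrian bounding boxes and tracking IDs from the video input. The model is PP-YOLOE; for details, see [PP-YOLOE](../../../../configs/ppyoloe/README_cn.md).
@@ -191,14 +196,15 @@ VIDEO_ACTION:  # Config for video-classification-based action recognition model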
 ```

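 ### How to Use
-1. Download the inference models from the links in the table above and unzip them to `./output_inference`;
+1. Download the inference model for the `Fight Recognition` task from the links in the table above and unzip it to `./output_inference`. Models are downloaded automatically by default; if you download them manually, change the model directory to the path where the models are stored.
 2. In the config file `deploy/pipeline/config/infer_cfg_pphuman.yml`, set `enable` under `VIDEO_ACTION` to `True`;
-3. Input a video and run the following command:
+3. Only video input is supported. Run the following command: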
 ```
 python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \
                                                   --video_file=test_video.mp4 \
                                                   --device=gpu
 ```
+4. For a full description of the launch command parameters, see [Parameter Description](./QUICK_STARTED.md).

 The results are as follows:

@@ -214,14 +220,14 @@ python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pph

 ## Custom Model Training
 We provide pretrained models for detection/tracking, keypoint detection, and recognition of falling, smoking, calling, and fighting, which can be downloaded and used directly. To train on data from your own scenario or to optimize a model, refer to the link for the corresponding model below:
-| Task | Algorithm | Model Training and Export Doc |
+| Task | Algorithm | Model Development Doc |
 | ---- | ---- | -------- |
 | Pedestrian detection/tracking | PP-YOLOE | [Tutorial](../../../../configs/ppyoloe/README_cn.md#使用教程) |
 | Keypoint detection | HRNet | [Tutorial](../../../../configs/keypoint#3训练与测试) |
-| Action recognition (falling) |  ST-GCN  | [Tutorial](https://github.com/PaddlePaddle/PaddleVideo/tree/develop/applications/PPHuman) |
-| Action recognition (smoking) |  PP-YOLOE  | [Tutorial](../../../../docs/advanced_tutorials/customization/action.md) |
-| Action recognition (calling) |  PP-HGNet  | [Tutorial](../../../../docs/advanced_tutorials/customization/action.md) |
-| Action recognition (fighting) | PP-TSM | [Tutorial](../../../../docs/advanced_tutorials/customization/action.md) |
+| Action recognition (falling) |  ST-GCN  | [Tutorial](../../../../docs/advanced_tutorials/customization/action_recognotion/skeletonbased_rec.md) |
+| Action recognition (smoking) |  PP-YOLOE  | [Tutorial](../../../../docs/advanced_tutorials/customization/action_recognotion/idbased_det.md) |
+| Action recognition (calling) |  PP-HGNet  | [Tutorial](../../../../docs/advanced_tutorials/customization/action_recognotion/idbased_clas.md) |
+| Action recognition (fighting) | PP-TSM | [Tutorial](../../../../docs/advanced_tutorials/customization/action_recognotion/videobased_rec.md) |


 ## References

--- a/deploy/pipeline/docs/tutorials/action_en.md
+++ b/deploy/pipeline/docs/tutorials/action_en.md
@@ -60,9 +60,9 @@ SKELETON_ACTION: # Config for skeleton-based action recognition model

 ## How to Use

- Download models from the links of the above table and unzip them to ```./output_inference```.
+1. Download the models `Pedestrian Detection/Tracking`, `Keypoint Detection`, and `Falling Recognition` from the Model Zoo and unzip them to ```./output_inference```. The models are downloaded automatically by default. If you download them manually, set `model_dir` to the model storage path.

- Now the only available input is the video input in the action recognition module. set the "enable: True" of `SKELETON_ACTION` in infer_cfg_pphuman.yml. And then run the command:
+2. The action recognition module currently supports video input only. Set `enable: True` under `SKELETON_ACTION` in infer_cfg_pphuman.yml, and then run the command:

  ```bash
  python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \
@@ -70,7 +70,7 @@ SKELETON_ACTION: # Config for skeleton-based action recognition model
                                                     --device=gpu
  ```

- There are two ways to modify the model path:
+3. There are two ways to modify the model path:

 - In ```./deploy/pipeline/config/infer_cfg_pphuman.yml```, you can configure different model paths: set the keypoint model path in the `KPT` field and the action recognition model path in the `SKELETON_ACTION` field.
 - Add `--model_dir` to the command line to override the model paths:
@@ -81,6 +81,7 @@ SKELETON_ACTION: # Config for skeleton-based action recognition model
                                                       --device=gpu \
                                                       --model_dir kpt=./dark_hrnet_w32_256x192 action=./STGCN
    ```
+4. For a detailed description of the parameters, please refer to [Parameter Description](./QUICK_STARTED.md).

 ### Introduction to the Solution

@@ -98,7 +99,7 @@ SKELETON_ACTION: # Config for skeleton-based action recognition model
 ```
 - The falling action recognition model uses [ST-GCN](https://arxiv.org/abs/1801.07455) and employs the [PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/stgcn.md) toolkit for model training.

-## Image-Classification-Based Action Recognition -- Calling Detection
+## Image-Classification-Based Action Recognition -- Calling Recognition

 <div align="center">  <img src="../images/calling.gif" width='1000'/> <center>Data source and copyright owner: Skyinfor
 Technology. Thanks for the provision of actual scenario data, which are only
@@ -122,9 +123,9 @@ ID_BASED_CLSACTION: # config for classification-based action recognition model

 ### How to Use

-1. Download models from the links of the above table and unzip them to ```./output_inference```.
+1. Download the models `Pedestrian Detection/Tracking` and `Calling Recognition` from the `Model Zoo` and unzip them to ```./output_inference```. The models are downloaded automatically by default. If you download them manually, set `model_dir` to the model storage path.

-2. Now the only available input is the video input in the action recognition module. set the "enable: True" of `ID_BASED_CLSACTION` in infer_cfg_pphuman.yml.
+2. The action recognition module currently supports video input only. Set `enable: True` under `ID_BASED_CLSACTION` in infer_cfg_pphuman.yml.

 3. Run this command:
  ```bash
@@ -132,6 +133,7 @@ ID_BASED_CLSACTION: # config for classification-based action recognition model
                                                     --video_file=test_video.mp4 \
                                                     --device=gpu
  ```
+4. For a detailed description of the parameters, please refer to [Parameter Description](./QUICK_STARTED.md).

 ### Introduction to the Solution
 1. Get the pedestrian detection box and the tracking ID number of the video input through object detection and multi-object tracking. The adopted model is PP-YOLOE, and for details, please refer to [PP-YOLOE](../../../../configs/ppyoloe).
@@ -168,7 +170,7 @@ ID_BASED_DETACTION: # Config for detection-based action recognition model

 ### How to Use

-1. Download models from the links of the above table and unzip them to ```./output_inference```.
+1. Download the models `Pedestrian Detection/Tracking` and `Smoking Recognition` from the `Model Zoo` and unzip them to ```./output_inference```. The models are downloaded automatically by default. If you download them manually, set `model_dir` to the model storage path.

 2. The action recognition module currently supports video input only. Set `enable: True` under `ID_BASED_DETACTION` in infer_cfg_pphuman.yml.

@@ -178,6 +180,7 @@ ID_BASED_DETACTION: # Config for detection-based action recognition model
                                                     --video_file=test_video.mp4 \
                                                     --device=gpu
  ```
+4. For a detailed description of the parameters, please refer to [Parameter Description](./QUICK_STARTED.md).

 ### Introduction to the Solution
 1. Get the pedestrian detection box and the tracking ID number of the video input through object detection and multi-object tracking. The adopted model is PP-YOLOE, and for details, please refer to [PP-YOLOE](../../../../configs/ppyoloe).
@@ -223,18 +226,20 @@ VIDEO_ACTION:  # Config for video-classification-based action recognition model

 ### How to Use

-1. Download models from the links of the above table and unzip them to ```./output_inference```.
+1. Download the `Fighting Detection` model from the links in the table above and unzip it to ```./output_inference```. The model is downloaded automatically by default. If you download it manually, set `model_dir` to the model storage path.

 2. Rename the files in the `ppTSM` folder to `model.pdiparams`, `model.pdiparams.info`, and `model.pdmodel`;

 3. The action recognition module currently supports video input only. Set `enable: True` under `VIDEO_ACTION` in infer_cfg_pphuman.yml.

-3. Run this command:
+4. Run this command:
  ```bash
  python deploy/pipeline/pipeline.py --config deploy/pipeline/config/infer_cfg_pphuman.yml \
                                                     --video_file=test_video.mp4 \
                                                     --device=gpu
  ```
+5. For a detailed description of the parameters, please refer to [Parameter Description](./QUICK_STARTED.md).
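You can also assemble the launch commands in this section from Python, for example to batch-process several videos. The sketch below only builds the argument list, using the flags that appear in this document: `--config`, `--video_file`, `--device`, and `--model_dir` with `key=path` pairs. `build_pipeline_cmd` is a hypothetical helper, not part of PaddleDetection.

```python
import sys
from typing import Dict, List, Optional


def build_pipeline_cmd(video_file: str,
                       config: str = "deploy/pipeline/config/infer_cfg_pphuman.yml",
                       device: str = "gpu",
                       model_dir: Optional[Dict[str, str]] = None) -> List[str]:
    """Build argv for deploy/pipeline/pipeline.py (sketch, not an official API)."""
    cmd = [sys.executable, "deploy/pipeline/pipeline.py",
           "--config", config,
           f"--video_file={video_file}",
           f"--device={device}"]
    if model_dir:
        # e.g. {"kpt": "./dark_hrnet_w32_256x192", "action": "./STGCN"}
        cmd.append("--model_dir")
        cmd.extend(f"{key}={path}" for key, path in model_dir.items())
    return cmd
```

From the PaddleDetection root, the list can be run with `subprocess.run(build_pipeline_cmd("test_video.mp4"), check=True)`.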
+

 The result is shown as follows:

@@ -252,14 +257,14 @@ The current fight recognition model is using [PP-TSM](https://github.com/PaddleP

 We provide pretrained models that can be used directly, including pedestrian detection/tracking, keypoint detection, and falling, smoking, calling, and fighting recognition. To train a custom action or optimize model performance, please refer to the links below.

-| Task | Model | Training and Export doc |
+| Task | Model | Development Document |
 | ---- | ---- | -------- |
 | pedestrian detection/tracking | PP-YOLOE | [doc](../../../../configs/ppyoloe/README.md#getting-start) |
 | keypoint detection | HRNet | [doc](../../../../configs/keypoint/README_en.md#3training-and-testing) |
-| action recognition (fall down) |  ST-GCN  | [doc](https://github.com/PaddlePaddle/PaddleVideo/tree/develop/applications/PPHuman) |
-| action recognition (smoking) |  PP-YOLOE  | [doc](../../../../docs/advanced_tutorials/customization/action.md) |
-| action recognition (calling) |  PP-HGNet  | [doc](../../../../docs/advanced_tutorials/customization/action.md) |
-| action recognition (fighting) |  PP-TSM  | [doc](../../../../docs/advanced_tutorials/customization/action.md) |
+| action recognition (fall down) |  ST-GCN  | [doc](../../../../docs/advanced_tutorials/customization/action_recognotion/skeletonbased_rec.md) |
+| action recognition (smoking) |  PP-YOLOE  | [doc](../../../../docs/advanced_tutorials/customization/action_recognotion/idbased_det.md) |
+| action recognition (calling) |  PP-HGNet  | [doc](../../../../docs/advanced_tutorials/customization/action_recognotion/idbased_clas.md) |
+| action recognition (fighting) |  PP-TSM  | [doc](../../../../docs/advanced_tutorials/customization/action_recognotion/videobased_rec.md) |


 ## Reference

--- a/docs/advanced_tutorials/customization/action_recognotion/README.md
+++ b/docs/advanced_tutorials/customization/action_recognotion/README.md
@@ -15,7 +15,7 @@
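 | Video-classification-based action recognition | Applies video classification to the entire video scene. | 1. Makes full use of background context and temporal information;<br>2. Can use multimodal information such as audio and subtitles;<br>3. Does not depend on detection or tracking models;<br>4. Can handle actions performed jointly by multiple people; | 1. Cannot attribute an action to a specific person;<br>2. Weak generalization across scenes;<br>3. Real data is hard to collect; | Judgments that need not be tied to a specific person, i.e. determining whether a particular action occurs at all; actions involving multiple people or strongly dependent on the background, such as fight recognition in surveillance footage. |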


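-Below, we take several specific actions supported by PaddleDetection as examples and explain why each solution was chosen:
+Below, we take several specific actions already supported by PaddleDetection as examples and explain the rationale for the solution chosen for each: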

 ### Smoking


--- a/docs/advanced_tutorials/customization/action_recognotion/idbased_clas.md
+++ b/docs/advanced_tutorials/customization/action_recognotion/idbased_clas.md
-# Human-ID-Based Classification
+# Human-ID-Based Classification Model Development
+
+## Environment Setup
+
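+The human-ID-based classification solution trains its model with [PaddleClas](https://github.com/PaddlePaddle/PaddleClas). Please follow the [installation guide](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/installation/install_paddleclas.md) to set up the environment before proceeding with model training and usage.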

 ## Data Preparation

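 The image-classification-based action recognition solution recognizes actions directly from individual video frames, so model training follows the usual image classification workflow.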

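-#### Dataset Download
+### Dataset Download
 The calling recognition model is trained on the public dataset [UAV-Human](https://github.com/SUTDCV/UAV-Human). Please fill in the dataset application form via that link to obtain the download link.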

 The `UAVHuman/ActionRecognition/RGBVideos` directory contains the RGB videos of the dataset; the filename of each video encodes its annotation.

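-#### Training and Test Image Processing
+### Training and Test Image Processing
 In each video filename, the field related to action recognition is the one starting with `A` (i.e. action); use it to find videos of the action you want to recognize.
 - Positive sample videos: taking calling as an example, we only need the files whose names contain `A024`.
 - Negative sample videos: all videos other than those of the target action.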
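The filename-based split above can be scripted. This sketch assumes the action field appears in each filename as `A` followed by a three-digit code (for example `A024` for calling). It also assumes that no other `A` followed by three digits occurs earlier in the filename stem; check this against your copy of UAV-Human. `split_by_action` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Iterable, List, Tuple

ACTION_RE = re.compile(r"A(\d{3})")  # assumed form of the action field


def split_by_action(filenames: Iterable[str],
                    target: str = "024") -> Tuple[List[str], List[str]]:
    """Split video filenames into positives (target action) and negatives."""
    positives, negatives = [], []
    for name in filenames:
        match = ACTION_RE.search(Path(name).stem)  # ignore directories and extension
        if match is None:
            continue  # no action field: skip rather than count as negative
        (positives if match.group(1) == target else negatives).append(name)
    return positives, negatives
```

Files whose names contain no action code are skipped rather than treated as negatives.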
@@ -18,7 +22,7 @@

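 **Note**: Positive sample videos do not show the calling action from start to finish; the beginning and end of each video contain some unrelated motion, which should be removed.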

-#### Annotation File Preparation
+### Annotation File Preparation

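 The image-classification-based action recognition solution trains its model with [PaddleClas](https://github.com/PaddlePaddle/PaddleClas). To train a model with this solution, prepare the images to be recognized and the corresponding annotation files following the [PaddleClas dataset format description](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/data_preparation/classification_dataset.md#1-%E6%95%B0%E6%8D%AE%E9%9B%86%E6%A0%BC%E5%BC%8F%E8%AF%B4%E6%98%8E). A sample annotation file is shown below, where `0` and `1` are the classes the images belong to: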
 ```
@@ -29,35 +33,144 @@
    ...
 ```

+In addition, a label file `phone_label_list.txt` maps class indices to class names:
+```
+0 make_a_phone_call  # class 0
+1 normal             # class 1
+```
+
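+Once these files are ready, place them under the `dataset` directory with the following structure:
+```
+dataset/
+├── images  # all images
+├── phone_label_list.txt # label file
+├── phone_train_list.txt # training list: image paths and their classes
+└── phone_val_list.txt   # validation list: image paths and their classes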
+```
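The list files are plain text with one `image_path class_id` pair per line, separated by a space. Here is a minimal sketch for generating them with a random train/val split. The 80/20 ratio, the fixed seed, the default output directory, and the helper names are assumptions, not part of the PaddleClas workflow.

```python
import random
from pathlib import Path
from typing import Dict, List, Tuple


def build_split_lists(samples: List[Tuple[str, int]], prefix: str = "phone",
                      train_ratio: float = 0.8, seed: int = 0) -> Dict[str, str]:
    """Shuffle (image_path, class_id) pairs and return {file_name: file_text}.

    Each line is "<image_path> <class_id>", matching the list format above.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # deterministic for a given seed
    n_train = int(len(shuffled) * train_ratio)
    splits = {"train": shuffled[:n_train], "val": shuffled[n_train:]}
    return {
        f"{prefix}_{name}_list.txt": "".join(f"{path} {label}\n" for path, label in part)
        for name, part in splits.items()
    }


def write_lists(files: Dict[str, str], out_dir: str = "dataset") -> None:
    """Write the generated list files into out_dir (created if missing)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, text in files.items():
        (out / name).write_text(text, encoding="utf-8")
```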
+
 ## Model Optimization

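+### Detection/Tracking Model Optimization
+The performance of the classification-based action recognition model depends on the upstream detection and tracking. If pedestrians cannot be detected accurately in your scenario, or person IDs are not assigned correctly across frames, action recognition performance will suffer. If you run into these problems, refer to [Secondary development for object detection](../detection.md) and [Secondary development for multi-object tracking](../mot.md) to optimize the detection/tracking models.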
+
+
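 ### Half-Body Prediction
 For the calling action, the upper body alone is enough to distinguish the action, so in both training and prediction the input is changed from a full-body pedestrian image to a half-body image.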
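To get a half-body input, keep only the upper part of each pedestrian box. The text does not state the crop ratio PP-Human uses. This sketch assumes the top half of the box and clips the box to the image. `upper_half_box` is a hypothetical helper.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def upper_half_box(box: Box, img_w: int, img_h: int, ratio: float = 0.5) -> Box:
    """Return the top `ratio` of a pedestrian box, clipped to the image."""
    x1, y1, x2, y2 = box
    x1, x2 = max(0.0, x1), min(float(img_w), x2)
    y1, y2 = max(0.0, y1), min(float(img_h), y2)
    return (x1, y1, x2, y1 + (y2 - y1) * ratio)
```

Whatever ratio you choose, use the same crop during training and prediction so that the classifier sees consistent inputs.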

 ## Adding New Actions

-### Model Training and Testing
- First, set up the PaddleClas environment following [Install PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/en/installation/install_paddleclas_en.md).
- Following the `Data Preparation` section, crop the training/validation images and prepare the annotation files.
- Model training: refer to [Training with a pretrained model](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/zh_CN/quick_start/quick_start_classification_new_user.md#422-%E4%BD%BF%E7%94%A8%E9%A2%84%E8%AE%AD%E7%BB%83%E6%A8%A1%E5%9E%8B%E8%BF%9B%E8%A1%8C%E8%AE%AD%E7%BB%83) to train the model and validate its accuracy.
+### Data Preparation
+Prepare the data as described above and place it under `{root of PaddleClas}/dataset`:
+```
+dataset/
+├── images  # all images
+├── label_list.txt # label file
+├── train_list.txt # training list: image paths and their classes
+└── val_list.txt   # validation list: image paths and their classes
+```
+The training and validation lists look like this:
+```
+    # Each line separates the image path and the label with a space
+    train/000001.jpg 0
+    train/000002.jpg 0
+    train/000003.jpg 1
+    train/000004.jpg 2   # for a newly added class, simply use its class index
+    ...
+```
+`label_list.txt` must be extended with the names of the new classes in the same way:
+```
+0 make_a_phone_call  # class 0
+1 Your New Action    # class 1
+ ...
+n normal             # class n
+```
+
+### Config File Settings
+PaddleClas already includes a [training config file](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml). The key settings are:
+
+```yaml
+# model architecture
+Arch:
+  name: PPHGNet_tiny
+  class_num: 2       # total number of classes, including newly added ones
+
+  ...
+
+# Set image_root and cls_label_path so that image_root + each image path listed in cls_label_path points to an existing image
+DataLoader:
+  Train:
+    dataset:
+      name: ImageNetDataset
+      image_root: ./dataset/
+      cls_label_path: ./dataset/phone_train_list_halfbody.txt
+
+      ...
+
+Infer:
+  infer_imgs: docs/images/inference_deployment/whl_demo.jpg
+  batch_size: 1
+  transforms:
+    - DecodeImage:
+        to_rgb: True
+        channel_first: False
+    - ResizeImage:
+        size: 224
+    - NormalizeImage:
+        scale: 1.0/255.0
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+  PostProcess:
+    name: Topk
+    topk: 2                                           # number of top-k results to show; must not exceed the number of classes
+    class_id_map_file: dataset/phone_label_list.txt   # path to the (modified) label_list.txt
+```
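When you add classes, `Arch.class_num` must equal the number of entries in the label list, and `PostProcess.topk` must not exceed the number of classes, as the comments above note. Here is a small stdlib sketch that checks this. It takes the values as plain arguments instead of parsing the YAML file. The helper names are hypothetical.

```python
from typing import List


def parse_label_list(text: str) -> List[str]:
    """Parse 'id name' lines (as in phone_label_list.txt) into names ordered by id."""
    pairs = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop inline comments and blank lines
        if not line:
            continue
        idx, name = line.split(maxsplit=1)  # names may contain spaces
        pairs.append((int(idx), name))
    pairs.sort()
    if [i for i, _ in pairs] != list(range(len(pairs))):
        raise ValueError("class ids must be 0..n-1 without gaps")
    return [name for _, name in pairs]


def check_class_config(label_text: str, class_num: int, topk: int) -> List[str]:
    """Return a list of problems; an empty list means the settings agree."""
    problems = []
    n = len(parse_label_list(label_text))
    if class_num != n:
        problems.append(f"class_num={class_num} but label list has {n} classes")
    if topk > class_num:
        problems.append(f"topk={topk} exceeds class_num={class_num}")
    return problems
```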
+
+### Model Training and Evaluation
+#### Model Training
+Start training with the following command:
+```bash
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+python3 -m paddle.distributed.launch \
+    --gpus="0,1,2,3" \
+    tools/train.py \
+        -c ./ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml \
+        -o Arch.pretrained=True
+```
+Setting `Arch.pretrained` to `True` means training starts from pretrained weights.
+
+#### Model Evaluation
+
+After training, evaluate the model metrics with the following command:
+
+```bash
+python3 tools/eval.py \
+    -c ./ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml \
+    -o Global.pretrained_model=output/PPHGNet_tiny/best_model
+```
+
+Here `-o Global.pretrained_model="output/PPHGNet_tiny/best_model"` specifies the path of the current best weights; to evaluate other weights, just replace the path.

 ### Model Export
 For a detailed introduction to model export, see [here](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/en/inference_deployment/export_model_en.md#2-export-classification-model).
 It can be done with the following steps:
 ```bash
 python tools/export_model.py \
-    -c ./PPHGNet_tiny_resize_halfbody.yaml \
+    -c ./PPHGNet_tiny_calling_halfbody.yaml \
    -o Global.pretrained_model=./output/PPHGNet_tiny/best_model \
-    -o Global.save_inference_dir=./output_inference/PPHGNet_tiny_resize_halfbody
+    -o Global.save_inference_dir=./output_inference/PPHGNet_tiny_calling_halfbody
 ```
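-Then rename the exported model and add the config file so that it can be used by PP-Human
+Then rename the exported model and add the config file so that it can be used by PP-Human.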
 ```bash
-cd ./output_inference/PPHGNet_tiny_resize_halfbody
+cd ./output_inference/PPHGNet_tiny_calling_halfbody

 mv inference.pdiparams model.pdiparams
 mv inference.pdiparams.info model.pdiparams.info
 mv inference.pdmodel model.pdmodel

-cp infer_cfg.yml .
+# Download the inference config file
+wget https://bj.bcebos.com/v1/paddledet/models/pipeline/infer_configs/PPHGNet_tiny_calling_halfbody/infer_cfg.yml
 ```
+
 PP-Human can now be used for actual prediction.
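The three `mv` commands above can also be done portably from Python, for example on systems without a POSIX shell. This sketch renames `inference.*` to `model.*` inside an export directory and refuses to overwrite existing files. It does not download `infer_cfg.yml`. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable, List

SUFFIXES = (".pdiparams", ".pdiparams.info", ".pdmodel")


def plan_renames(names: Iterable[str]) -> Dict[str, str]:
    """Map each inference.<suffix> present in `names` to model.<suffix>."""
    present = set(names)
    return {f"inference{s}": f"model{s}" for s in SUFFIXES if f"inference{s}" in present}


def rename_for_pphuman(export_dir: str) -> List[str]:
    """Rename exported files in place; raise instead of overwriting model.* files."""
    folder = Path(export_dir)
    plan = plan_renames(p.name for p in folder.iterdir())
    for src, dst in plan.items():
        if (folder / dst).exists():
            raise FileExistsError(folder / dst)
        (folder / src).rename(folder / dst)
    return sorted(plan.values())
```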
--- a/docs/advanced_tutorials/customization/action_recognotion/idbased_det.md
+++ b/docs/advanced_tutorials/customization/action_recognotion/idbased_det.md
-# Human-ID-Based Detection Development
+# Human-ID-Based Detection Model Development

-## Data Preparation
+## Environment Setup
+
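+The human-ID-based detection solution trains its model directly with [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). Please follow the [installation guide](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/docs/tutorials/INSTALL_cn.md) to set up the environment before proceeding with model training and usage.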

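+## Data Preparation
 In the detection-based action recognition solution, data preparation is the same as for a general detection model; for details, see [Object Detection Data Preparation](../../../tutorials/data/PrepareDetDataSet.md). Simply organize the images and annotations into one of the formats supported by PaddleDetection.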

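+**Note**: In actual prediction, single-person images are used as input, so we recommend cropping training images to single-person images before annotating the cigarette boxes, to improve accuracy.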
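Following the note above, cigarette boxes annotated on the full frame must be shifted into the coordinate system of each single-person crop. Here is a minimal sketch. It assumes COCO-style `[x, y, w, h]` boxes and a person crop given as `(x1, y1, x2, y2)`, clips boxes to the crop, and drops boxes that fall entirely outside it. `to_crop_coords` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def to_crop_coords(bbox: List[float],
                   crop: Tuple[float, float, float, float]) -> Optional[List[float]]:
    """Convert a full-image COCO box [x, y, w, h] into crop coordinates, clipped."""
    cx1, cy1, cx2, cy2 = crop
    x1, y1 = bbox[0], bbox[1]
    x2, y2 = x1 + bbox[2], y1 + bbox[3]
    # Intersect the box with the crop region.
    ix1, iy1 = max(x1, cx1), max(y1, cy1)
    ix2, iy2 = min(x2, cx2), min(y2, cy2)
    if ix2 <= ix1 or iy2 <= iy1:
        return None  # box lies outside this person crop
    return [ix1 - cx1, iy1 - cy1, ix2 - ix1, iy2 - iy1]
```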
+
+
 ## Model Optimization

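+### Detection/Tracking Model Optimization
+The performance of the detection-based action recognition model depends on the upstream detection and tracking. If pedestrians cannot be detected accurately in your scenario, or person IDs are not assigned correctly across frames, action recognition performance will suffer. If you run into these problems, refer to [Secondary development for object detection](../detection.md) and [Secondary development for multi-object tracking](../mot.md) to optimize the detection/tracking models.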
+
+
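 ### Higher Resolution
 From a surveillance viewpoint, cigarette detection is a typical small-object detection problem; a higher input resolution helps improve the model's overall recognition rate.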

@@ -13,18 +23,146 @@
 Using weights pretrained on VisDrone, a dataset of small-object scenes, as the starting point for training raised the model mAP from 38.1 to 39.7.

 ## Adding New Actions
+### Data Preparation
+Prepare the training data following [Object Detection Data Preparation](../../../tutorials/data/PrepareDetDataSet.md).
+
+Once prepared, the data directory looks like this:
+```
+dataset/smoking
+├── smoking # all images
+│   ├── 1.jpg
+│   ├── 2.jpg
+├── smoking_test_cocoformat.json # test annotation file
+└── smoking_train_cocoformat.json # training annotation file
+```
+
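+Taking the `COCO` format as an example, the finished JSON annotation file looks like the excerpt below (the `#` comments are explanatory only; JSON itself does not allow comments):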
+
+```json
+# The images field contains each image's path, id, and height/width
+  "images": [
+    {
+      "file_name": "smoking/1.jpg",
+      "id": 0,    # image id; must be unique
+      "height": 437,
+      "width": 212
+    },
+    {
+      "file_name": "smoking/2.jpg",
+      "id": 1,
+      "height": 655,
+      "width": 365
+    },
+
+ ...
+
+# The categories field contains all class definitions; to add more detection classes, add them here, for example:
+  "categories": [
+    {
+      "supercategory": "cigarette",
+      "id": 1,
+      "name": "cigarette"
+    },
+    {
+      "supercategory": "Class_Defined_by_Yourself",
+      "id": 2,
+      "name": "Class_Defined_by_Yourself"
+    },
+
+  ...
+
+# The annotations field contains every object instance, including its class, bbox coordinates, id, and the id of the image it belongs to
+  "annotations": [
+    {
+      "category_id": 1,  # one of the defined categories; here 1 means cigarette
+      "bbox": [
+        97.0181345931,
+        332.7033243081,
+        7.5943999555,
+        16.4545332369
+      ],
+      "id": 0,           # instance id; must be unique
+      "image_id": 0,     # id of the image containing this instance; repeats when one image has several instances
+      "iscrowd": 0,
+      "area": 124.96230648208665
+    },
+    {
+      "category_id": 2, # one of the defined categories; here 2 means Class_Defined_by_Yourself
+      "bbox": [
+        114.3895698372,
+        221.9131122343,
+        25.9530363697,
+        50.5401234568
+      ],
+      "id": 1,
+      "image_id": 1,
+      "iscrowd": 0,
+      "area": 1311.6696622034585
+```
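Here is a minimal sketch of assembling such a file with the standard library, assuming boxes are already in `[x, y, w, h]` pixel format. `area` is computed as `w * h`, which is consistent with the sample values above (7.594 × 16.455 ≈ 124.96). The helper names and the input tuple layout are hypothetical. The written file contains no comments, since JSON does not allow them.

```python
import json
from typing import Dict, List, Tuple


def build_coco(images: List[Tuple[str, int, int]],
               boxes: List[Tuple[int, int, List[float]]],
               categories: List[str]) -> Dict:
    """images: (file_name, height, width); boxes: (image_id, category_id, [x, y, w, h])."""
    coco = {
        "images": [{"file_name": f, "id": i, "height": h, "width": w}
                   for i, (f, h, w) in enumerate(images)],
        # Category ids start at 1, as in the example above.
        "categories": [{"supercategory": name, "id": i + 1, "name": name}
                       for i, name in enumerate(categories)],
        "annotations": [],
    }
    for ann_id, (image_id, category_id, bbox) in enumerate(boxes):
        coco["annotations"].append({
            "category_id": category_id, "bbox": bbox, "id": ann_id,
            "image_id": image_id, "iscrowd": 0, "area": bbox[2] * bbox[3],
        })
    return coco


def save_coco(coco: Dict, path: str) -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(coco, f)
```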

-#### 模型训练及测试
- 按照`数据准备`部分，完成训练/验证集图像的裁剪及标注文件准备。
- 模型训练: 参考[PP-YOLOE](../../../configs/ppyoloe/README_cn.md)，执行下列步骤实现
+### 配置文件设置
+Refer to the [config file](../../../../configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml); the key settings are:

+```yaml
+metric: COCO
+num_classes: 1 # update this if you add more classes

+# Set image_dir, anno_path, and dataset_dir correctly:
+# dataset_dir + anno_path must point to the annotation file
+# dataset_dir + image_dir + the image path in the annotation file must point to the image
+TrainDataset:
+  !COCODataSet
+    image_dir: ""
+    anno_path: smoking_train_cocoformat.json
+    dataset_dir: dataset/smoking
+    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
+
+EvalDataset:
+  !COCODataSet
+    image_dir: ""
+    anno_path: smoking_test_cocoformat.json
+    dataset_dir: dataset/smoking
+
+TestDataset:
+  !ImageFolder
+    anno_path: smoking_test_cocoformat.json
+    dataset_dir: dataset/smoking
+```
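After adding categories, `num_classes` should equal the number of entries in the `categories` list of the annotation file, and every annotation should use a defined category id. Here is a quick stdlib check that operates on the already-loaded JSON. `check_num_classes` is a hypothetical helper.

```python
from typing import Dict, List


def check_num_classes(coco: Dict, num_classes: int) -> List[str]:
    """Return problems found between COCO categories/annotations and `num_classes`."""
    problems = []
    ids = sorted(c["id"] for c in coco.get("categories", []))
    if len(ids) != num_classes:
        problems.append(f"num_classes={num_classes} but {len(ids)} categories defined")
    used = {a["category_id"] for a in coco.get("annotations", [])}
    unknown = used - set(ids)
    if unknown:
        problems.append(f"annotations use undefined category ids: {sorted(unknown)}")
    return problems
```

Typical use: `check_num_classes(json.load(open("dataset/smoking/smoking_train_cocoformat.json")), 1)`.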
+
+### Model Training and Evaluation
+- Model training
+
+Following [PP-YOLOE](../../../../configs/ppyoloe/README_cn.md), run:
 ```bash
-python -m paddle.distributed.launch --gpus 0,1,2,3  tools/train.py -c ppyoloe_smoking/ppyoloe_crn_s_80e_smoking_visdrone.yml --eval
+# At Root of PaddleDetection
+
+python -m paddle.distributed.launch --gpus 0,1,2,3  tools/train.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml --eval
 ```

-#### 模型导出
+- Model evaluation
+
+After training, evaluate the model metrics with the following command:
+```bash
+# At Root of PaddleDetection
+
+python tools/eval.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml
+```
+
+### Model Export
 Note: if you run prediction in a TensorRT environment, enable `-o trt=True` for better performance.
 ```bash
-python tools/export_model.py -c ppyoloe_smoking/ppyoloe_crn_s_80e_smoking_visdrone.yml -o weights=output/ppyoloe_crn_s_80e_smoking_visdrone/best_model trt=True
+# At Root of PaddleDetection
+
+python tools/export_model.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml -o weights=output/ppyoloe_crn_s_80e_smoking_visdrone/best_model trt=True
 ```
+
+After export, you get:
+```
+ppyoloe_crn_s_80e_smoking_visdrone/
+├── infer_cfg.yml
+├── model.pdiparams
+├── model.pdiparams.info
+└── model.pdmodel
+```
+
 PP-Human can now be used for actual prediction.
--- a/docs/advanced_tutorials/customization/action_recognotion/skeletonbased_rec.md
+++ b/docs/advanced_tutorials/customization/action_recognotion/skeletonbased_rec.md
 # Skeleton-Based Action Recognition

+## Environment Setup
+
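+The skeleton-based action recognition solution trains its model with [PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo). Please follow the [installation guide](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/install.md) to set up PaddleVideo before proceeding with model training and usage.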
+
 ## Data Preparation
+To train a model with this solution, prepare the training data for PaddleVideo following [this document](https://github.com/PaddlePaddle/PaddleVideo/tree/develop/applications/PPHuman#%E5%87%86%E5%A4%87%E8%AE%AD%E7%BB%83%E6%95%B0%E6%8D%AE). The main steps are:

-The skeleton-based action recognition solution trains its model with [PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo). To train a model with this solution, prepare the training data following [this document](https://github.com/PaddlePaddle/PaddleVideo/tree/develop/applications/PPHuman#%E5%87%86%E5%A4%87%E8%AE%AD%E7%BB%83%E6%95%B0%E6%8D%AE). The main steps are:

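 ### Data Format
-STGCN is a model that predicts from sequences of skeleton keypoint coordinates. In PaddleVideo, the training data is `Numpy` data stored in `.npy` format, and the labels can be stored in `.npy` or `.pkl` files. The sequence data must have the shape `(N,C,T,V,M)`.
+STGCN is a model that predicts from sequences of skeleton keypoint coordinates. In PaddleVideo, the training data is `Numpy` data stored in `.npy` format, and the labels can be stored in `.npy` or `.pkl` files. The sequence data must have the shape `(N,C,T,V,M)`. The current solution only supports actions performed by a single person (the video may contain several people, but each person's action is recognized independently), i.e. `M=1`.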

 | Dimension | Size | Description |
 | ---- | ---- | ---------- |
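@@ -20,6 +24,29 @@ STGCN is a model that predicts from sequences of skeleton keypoint coordinates. In PaddleVideo
 - Model prediction: you can directly use a model from the [PaddleDetection KeyPoint model series](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.4/configs/keypoint) model zoo, and follow the steps in `3. Training and Testing - Deployment - Joint deployment of detection + keypoint top-down models` to obtain the 17 keypoint coordinates of the target sequence.
 - Manual annotation: if you need a different number or definition of keypoints, you can annotate the keypoint coordinates manually. Occluded or hard-to-annotate points still need an approximate coordinate; otherwise network training will be affected.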

+
+When obtaining keypoints through model prediction, follow the steps below. Note that these steps are run inside PaddleDetection.
+
+```bash
+# current path is under root of PaddleDetection
+
+# Step 1: download pretrained inference models.
+wget https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip
+wget https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip
+unzip -d output_inference/ mot_ppyoloe_l_36e_pipeline.zip
+unzip -d output_inference/ dark_hrnet_w32_256x192.zip
+
+# Step 2: Get the keypoint coordinates
+
+# if your data is image sequence
+python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/mot_ppyoloe_l_36e_pipeline/ --keypoint_model_dir=output_inference/dark_hrnet_w32_256x192 --image_dir={your image directory path} --device=GPU --save_res=True
+
+# if your data is video
+python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/mot_ppyoloe_l_36e_pipeline/ --keypoint_model_dir=output_inference/dark_hrnet_w32_256x192 --video_file={your video file path} --device=GPU --save_res=True
+```
+This produces a detection result file named `det_keypoint_unite_image_results.json`. For the meaning of its contents, see [here](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/python/det_keypoint_unite_infer.py#L108).
+
+
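 ### Unifying the Sequence Length
 Because actions in real data vary in length, first fix a sequence length based on your data and scenario (PP-Human uses 50 frames per action sequence), and then process the data as follows:
 - For data longer than the preset length, randomly crop a 50-frame segment
@@ -35,11 +62,30 @@ STGCN is a model that predicts from sequences of skeleton keypoint coordinates. In PaddleVideo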

 Note: `class_id` here is of type `int`, as in other classification tasks, e.g. `0: falling, 1: other`.
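The length-unification step can be sketched as follows for one person's sequence, stored as a list of frames where each frame is a list of `(x, y)` keypoints. Random cropping of long sequences follows the text. For shorter sequences, this sketch zero-pads at the end, which is an assumption here. The helper name and the `rng` argument are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Frame = List[Tuple[float, float]]  # V keypoints, each (x, y)


def unify_length(seq: List[Frame], target: int = 50, num_kpts: int = 17,
                 rng: Optional[random.Random] = None) -> List[Frame]:
    """Randomly crop sequences longer than `target`; zero-pad shorter ones."""
    rng = rng or random.Random()
    if len(seq) > target:
        start = rng.randint(0, len(seq) - target)  # randint includes both ends
        return seq[start:start + target]
    padding = [[(0.0, 0.0)] * num_kpts for _ in range(target - len(seq))]
    return list(seq) + padding
```

The result is ordered `(T, V, C)` per person; it still has to be transposed to the `(C, T, V, M)` layout described above before it is stacked into the `.npy` file.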

+
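+We provide a [script](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/applications/PPHuman/datasets/prepare_dataset.py) for this step that directly processes the generated `det_keypoint_unite_image_results.json` file. It parses the JSON content, organizes the training data as described in the previous steps, and saves the data files.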
+
+```bash
+mkdir {root of PaddleVideo}/applications/PPHuman/datasets/annotations
+
+mv det_keypoint_unite_image_results.json {root of PaddleVideo}/applications/PPHuman/datasets/annotations/det_keypoint_unite_image_results_{video_id}_{camera_id}.json
+
+cd {root of PaddleVideo}/applications/PPHuman/datasets/
+
+python prepare_dataset.py
+```
+
 At this point, we have usable training data (`.npy`) and the corresponding label file (`.pkl`).


 ## Model Optimization

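+### Detection/Tracking Model Optimization
+The performance of the skeleton-based action recognition model depends on the upstream detection and tracking. If pedestrians cannot be detected accurately in your scenario, or person IDs are not assigned correctly across frames, action recognition performance will suffer. If you run into these problems, refer to [Secondary development for object detection](../detection.md) and [Secondary development for multi-object tracking](../mot.md) to optimize the detection/tracking models.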
+
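+### Keypoint Model Optimization
+Skeleton keypoints are the core feature of this solution, so the quality of pedestrian keypoint localization also determines overall action recognition performance. If keypoint predictions in your scenario are clearly wrong, to the point that the action can no longer be recognized from the resulting skeleton, refer to [Secondary development for keypoint detection](../keypoint_detection.md) to optimize the keypoint model.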
+
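 ### Coordinate Normalization
 After obtaining the skeleton coordinates, we recommend normalizing them using each person's detection box, to remove the differences in person position and scale that make the network harder to converge.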
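One way to normalize is to express each keypoint relative to the person's detection box, scaled by the box width and height. The text does not specify PP-Human's exact formula, so treat this as an assumption-based sketch. `normalize_by_box` is a hypothetical helper.

```python
from typing import List, Tuple


def normalize_by_box(kpts: List[Tuple[float, float]],
                     box: Tuple[float, float, float, float]) -> List[Tuple[float, float]]:
    """Map keypoints inside box (x1, y1, x2, y2) to the unit square [0, 1] x [0, 1]."""
    x1, y1, x2, y2 = box
    w, h = max(x2 - x1, 1e-6), max(y2 - y1, 1e-6)  # guard against degenerate boxes
    return [((x - x1) / w, (y - y1) / h) for x, y in kpts]
```

Whatever normalization you choose, apply exactly the same one at inference time.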

@@ -48,9 +94,58 @@ STGCN is a model that predicts from sequences of skeleton keypoint coordinates. In PaddleVideo

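 In the keypoint-based action recognition solution, the action recognition model is [ST-GCN](https://arxiv.org/abs/1801.07455), adapted from the [PaddleVideo training pipeline](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/zh-CN/model_zoo/recognition/stgcn.md) to complete model training, export, and usage.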

+
+### Data Preparation and Config Changes
+- Following `Data Preparation`, prepare the training data (`.npy`) and the label file (`.pkl`), and place them under `{root of PaddleVideo}/applications/PPHuman/datasets/`.
+
+- Refer to the [config file](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/applications/PPHuman/configs/stgcn_pphuman.yaml); the key settings are:
+
+```yaml
+MODEL: #MODEL field
+    framework: "RecognizerGCN"
+    backbone:
+        name: "STGCN"
+        in_channels: 2  # the C dimension in the data format: 2-D (x, y) coordinates
+        dropout: 0.5
+        layout: 'coco_keypoint'
+        data_bn: True
+    head:
+        name: "STGCNHead"
+        num_classes: 2  # must equal the number of action classes in your data
+    if_top5: False # set to False when there are fewer than 5 action classes; otherwise an error is raised
+
+...
+
+
+# Set the data and label paths of the train/valid/test sections according to where your data is stored
+DATASET: #DATASET field
+    batch_size: 64
+    num_workers: 4
+    test_batch_size: 1
+    test_num_workers: 0
+    train:
+        format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddle
+        file_path: "./applications/PPHuman/datasets/train_data.npy" #mandatory, train data index file path
+        label_path: "./applications/PPHuman/datasets/train_label.pkl"
+
+    valid:
+        format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dataset'
+        file_path: "./applications/PPHuman/datasets/val_data.npy" #Mandatory, valid data index file path
+        label_path: "./applications/PPHuman/datasets/val_label.pkl"
+
+        test_mode: True
+    test:
+        format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dataset'
+        file_path: "./applications/PPHuman/datasets/val_data.npy" #Mandatory, valid data index file path
+        label_path: "./applications/PPHuman/datasets/val_label.pkl"
+
+        test_mode: True
+```
+
 ### Model Training and Testing
- 按照`数据准备`, 准备训练数据
+
 - In PaddleVideo, start training with the following command:
+
 ```bash
 # current path is under root of PaddleVideo
 python main.py -c applications/PPHuman/configs/stgcn_pphuman.yaml
@@ -59,7 +154,7 @@ python main.py -c applications/PPHuman/configs/stgcn_pphuman.yaml
 python main.py --validate -c applications/PPHuman/configs/stgcn_pphuman.yaml
 ```

-After training, run prediction with the following command:
+- After training, run prediction with the following command:
 ```bash
 python main.py --test -c applications/PPHuman/configs/stgcn_pphuman.yaml  -w output/STGCN/STGCN_best.pdparams
 ```
@@ -90,3 +185,14 @@ STGCN
 ```

 PP-Human can now be used for action recognition inference.
+
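+**Note**: If you changed the sequence length or the number of keypoints during training, update the `INFERENCE` field of the config file accordingly so that prediction works correctly.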
+```yaml
+# The sequence data has shape (N,C,T,V,M)
+INFERENCE:
+    name: 'STGCN_Inference_helper'
+    num_channels: 2 # C dimension
+    window_size: 50 # T dimension; set it to your sequence length
+    vertex_nums: 17 # V dimension; set it to your number of keypoints
+    person_nums: 1 # M dimension
+```
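Before running inference with changed settings, it can help to confirm that a sample has the `(C, T, V, M)` layout the `INFERENCE` block expects. This sketch converts one person's `T x V` list of `(x, y)` keypoints into nested lists in `(C, T, V, M)` order with `M=1`. It then compares the sizes with `num_channels`, `window_size`, `vertex_nums` and `person_nums`. The helper names are hypothetical, and in practice you would use NumPy arrays rather than nested lists.

```python
from typing import List, Tuple


def to_ctvm(seq: List[List[Tuple[float, float]]]) -> List:
    """Reorder a T x V list of (x, y) points into nested lists shaped (C=2, T, V, M=1)."""
    return [[[[frame[v][c]] for v in range(len(frame))] for frame in seq]
            for c in range(2)]


def shape_of(nested) -> Tuple[int, ...]:
    """Shape of a regular nested list, following the first element at each level."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(dims)


def matches_inference(sample, num_channels: int = 2, window_size: int = 50,
                      vertex_nums: int = 17, person_nums: int = 1) -> bool:
    return shape_of(sample) == (num_channels, window_size, vertex_nums, person_nums)
```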