Unverified · Commit 49d0000c · Author: JYChen · Committer: GitHub

add en doc for action development (#6484)

Parent 2cb48107
简体中文 | [English](./README_en.md)
# 行为识别任务二次开发
......@@ -5,7 +7,7 @@
## 方案选择
<img width="1091" alt="image" src="https://user-images.githubusercontent.com/22989727/178742352-d0c61784-3e93-4406-b2a2-9067f42cb343.png">
......@@ -43,7 +45,7 @@
1. [基于人体id检测的行为识别](./idbased_det.md)
2. [基于人体id分类的行为识别](./idbased_clas.md)
[简体中文](./README.md) | English
# Secondary Development for Action Recognition Task
In industrial deployments, applying action recognition algorithms inevitably raises the need to support custom action types or to optimize existing action recognition models for specific scenarios. Given the diversity of behaviors, PP-Human supports the recognition of five abnormal behaviors: smoking, making phone calls, falling, fighting, and people intrusion. To match these behaviors, PP-Human integrates five action recognition solutions, based on video classification, detection, image classification, tracking, and skeleton points, which can cover 90%+ of action types and meet a variety of development needs. In this document, we use a case to show how to select an action recognition solution according to the expected behavior, and how to use PaddleDetection for secondary development of the action recognition algorithm, including solution selection, data preparation, model optimization, and the development process for adding new actions.
## Solution Selection
PP-Human in PaddleDetection provides several solutions for behavior recognition, based on video classification, image classification, detection, tracking, and skeleton points, to meet the needs of different scenes and target behaviors.
<img width="1091" alt="image" src="https://user-images.githubusercontent.com/22989727/178742352-d0c61784-3e93-4406-b2a2-9067f42cb343.png">
The following uses several actions that PaddleDetection currently supports as examples to explain the basis for selecting each solution:
### Smoking
Solution selection: action recognition based on detection with human id.
Reason: The smoking action has an obvious characteristic target, the cigarette. We can therefore assume that a person is smoking when a cigarette is detected in that person's image. Compared with video-based or skeleton-based recognition schemes, training a detection model requires data collected at the image level rather than the video level, which significantly reduces the difficulty of data collection and labeling. In addition, detection tasks have abundant pretrained model resources, so the performance of the model is better guaranteed.
### Making Phone Calls
Solution selection: action recognition based on classification with human id.
Reason: Although the calling action has a characteristic target, the mobile phone, a detection model is not well suited here. The action must be distinguished from others such as looking at the phone, and in security scenes the phone is often occluded (for example by the hand or head), which makes it hard for a detection model to find the target. At the same time, calls usually last a long time and the person's pose changes little, so a frame-level image classification strategy can be used. In addition, a phone call can mostly be judged from the upper body, so half-body images can be used to remove redundant information and reduce the difficulty of model training.
### Falling
Solution selection: action recognition based on skeleton.
Reason: Falling is a clearly temporal action that can be recognized from the person alone and is independent of the scene. Since PP-Human targets security monitoring scenes, where background changes are complex and real-time inference must be considered in deployment, action recognition based on skeleton points is adopted for better generalization and running speed.
### People Intrusion
Solution selection: action recognition based on tracking with human id.
Reason: Intrusion can be judged by whether the pedestrian's path or location falls inside a selected area, and it is unrelated to the pedestrian's body action. Therefore, it is only necessary to track the person and use the resulting coordinates to determine whether intrusion has occurred.
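The area check described above can be sketched as a point-in-polygon test on each tracked box. This is only a minimal illustration, not PP-Human's actual implementation: the function names, the `(x1, y1, x2, y2)` box layout, and the choice of the bottom-center point of the box as the pedestrian's position are all assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def point_in_polygon(point: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: count the polygon edges crossed by a ray going right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def intruding_ids(tracks: Dict[int, Box], region: Sequence[Point]) -> List[int]:
    """Return the sorted track IDs whose bottom-center point lies inside the region."""
    hits = []
    for track_id, (x1, y1, x2, y2) in tracks.items():
        foot = ((x1 + x2) / 2.0, y2)  # hypothetical choice of reference point
        if point_in_polygon(foot, region):
            hits.append(track_id)
    return sorted(hits)
```

Using the bottom-center point approximates where the person stands on the ground plane; a different reference point (for example the box center) may suit other camera angles better.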
### Fighting
Solution selection: action recognition based on video classification.
Reason: Unlike the actions above, fighting is a typical multi-person action. Therefore, the detection and tracking models are no longer used to extract pedestrians and their IDs; instead, the entire video clip is processed. In addition, mutual occlusion between targets is severe in fighting scenes, which makes keypoint recognition inaccurate.
The following are detailed descriptions of the five major categories of solutions, including data preparation, model optimization, and adding new actions.
1. [action recognition based on detection with human id](./idbased_det_en.md)
2. [action recognition based on classification with human id](./idbased_clas_en.md)
3. [action recognition based on skeleton](./skeletonbased_rec_en.md)
4. [action recognition based on tracking with human id](../mot_en.md)
5. [action recognition based on video classification](./videobased_rec_en.md)
简体中文 | [English](./idbased_clas_en.md)
# 基于人体id的分类模型开发
## 环境准备
[简体中文](./idbased_clas.md) | English
# Development for Action Recognition Based on Classification with Human ID
## Environmental Preparation
The model of action recognition based on classification with human id is trained with [PaddleClas](https://github.com/PaddlePaddle/PaddleClas). Please refer to [Install PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md) to complete the environment installation for subsequent model training and usage processes.
## Data Preparation
The model of action recognition based on classification with human id directly recognizes the image frames of a video, so its training process is the same as that of an ordinary image classification model.
### Dataset Download
The action recognition of making phone calls is trained on the public dataset [UAV-Human](https://github.com/SUTDCV/UAV-Human). Please fill in the relevant application materials through this link to obtain the download link.
The RGB video in this dataset is included in the `UAVHuman/ActionRecognition/RGBVideos` path, and the file name of each video is its annotation information.
### Image Processing for Training and Validation
The `A` field (i.e. action) in each video file name gives the action type, so we can use it to find the videos of the action that we expect to recognize.
- Positive sample video: Taking phone calls as an example, we just need to find the file containing `A024`.
- Negative sample video: All videos except the target action.
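The positive/negative split above can be sketched as a small filter on file names. This is a hedged example: it assumes the action field appears in the name as the letter `A` followed by three digits (as in `A024`), and the helper name `split_by_action` and the file names used with it are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumed pattern: the action field is "A" followed by three digits, e.g. A024.
ACTION_FIELD = re.compile(r"A(\d{3})")


def split_by_action(file_names: Iterable[str], target: str = "A024") -> Tuple[List[str], List[str]]:
    """Split video file names into positives (target action) and negatives (any other action)."""
    positives, negatives = [], []
    for name in file_names:
        match = ACTION_FIELD.search(name)
        if match is None:
            continue  # skip files without a recognizable action field
        if match.group(0) == target:
            positives.append(name)
        else:
            negatives.append(name)
    return positives, negatives
```

If real file names contain other `A`-plus-digits tokens, the pattern would need to be tightened to the dataset's actual naming rule.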
Because converting video data into images produces a lot of redundancy, we sample positive sample videos at intervals of 8 frames and use the pedestrian detection model to crop each sampled frame into a half-body image (take the upper half of the detection box, that is, `img = img[:H // 2, :, :]`). Images sampled from positive sample videos are regarded as positive samples, and images sampled from negative sample videos are regarded as negative samples.
**Note**: A positive sample video does not consist entirely of the phone call action. There are some redundant actions at the beginning and end of the video, which need to be removed.
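As a rough illustration of the sampling and half-body cropping step, the sketch below works on frames that are already decoded into `H x W x 3` NumPy arrays. The function names and the `(x1, y1, x2, y2)` box layout are assumptions; decoding the video and running the pedestrian detector are left out.

```python
import numpy as np


def sample_indices(num_frames: int, interval: int = 8) -> list:
    """Indices of the frames kept when sampling one frame every `interval` frames."""
    return list(range(0, num_frames, interval))


def crop_upper_half(frame: np.ndarray, box) -> np.ndarray:
    """Crop the person box from the frame and keep only its upper half."""
    x1, y1, x2, y2 = [int(v) for v in box]  # assumed pixel coordinates
    person = frame[y1:y2, x1:x2, :]
    height = person.shape[0]
    # Integer division is required: array indices must be integers.
    return person[: height // 2, :, :]
```

For a 100 x 80 frame and the box `(10, 20, 50, 80)`, the person crop is 60 pixels high and 40 wide, so the half-body image has shape `(30, 40, 3)`.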
### Preparation for Annotation File
The model of action recognition based on classification with human id is trained with [PaddleClas](https://github.com/PaddlePaddle/PaddleClas). Training with this scheme therefore requires the image data and the corresponding annotation files. Please refer to [Image Classification Datasets](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/data_preparation/classification_dataset_en.md) to prepare the data. An example of an annotation file is as follows, where `0` and `1` are the categories of the corresponding images:
# Each line uses "space" to separate the image path and label
train/000001.jpg 0
train/000002.jpg 0
train/000003.jpg 1
Additionally, the label file `phone_label_list.txt` helps map category numbers to specific type names:
0 make_a_phone_call # type 0
1 normal # type 1
After completing the steps above, place the files in the `dataset` directory. The file structure is as follows:
├── images # All images
├── phone_label_list.txt # Label file
├── phone_train_list.txt # Training list, including pictures and their corresponding types
└── phone_val_list.txt # Validation list, including pictures and their corresponding types
## Model Optimization
### Detection-Tracking Model Optimization
The performance of action recognition based on classification with human id depends on the upstream detection and tracking models. If pedestrians cannot be located accurately in the actual scene, or person IDs are hard to assign correctly across frames, the performance of the action recognition part will be limited. If you encounter these problems in actual use, please refer to [Secondary Development of Detection Task](../detection_en.md) and [Secondary Development of Multi-target Tracking Task](../mot_en.md) to optimize the detection and tracking models.
### Half-Body Prediction
In the action of making a phone call, the action classification can be achieved through the upper body image. Therefore, during the training and prediction process, the image is changed from the pedestrian full-body to half-body.
## Add New Action
### Data Preparation
Referring to the previous introduction, complete the data preparation part and place it under `{root of PaddleClas}/dataset`:
├── images # All images
├── label_list.txt # Label file
├── train_list.txt # Training list, including pictures and their corresponding types
└── val_list.txt # Validation list, including pictures and their corresponding types
The training list and validation list files look as follows:
# Each line uses "space" to separate the image path and label
train/000001.jpg 0
train/000002.jpg 0
train/000003.jpg 1
train/000004.jpg 2 # For the newly added categories, simply fill in the corresponding category number.
`label_list.txt` should give the names of the added action types:
0 make_a_phone_call # class 0
1 Your New Action # class 1
n normal # class n
### Configuration File Settings
The [training configuration file](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml) has been integrated in PaddleClas. The settings that need attention are as follows:
# model architecture
name: PPHGNet_tiny
class_num: 2 # Corresponding to the number of action categories
# Please correctly set image_root and cls_label_path to ensure that the image_root + image path in cls_label_path can access the image correctly
name: ImageNetDataset
image_root: ./dataset/
cls_label_path: ./dataset/phone_train_list_halfbody.txt
infer_imgs: docs/images/inference_deployment/whl_demo.jpg
batch_size: 1
- DecodeImage:
to_rgb: True
channel_first: False
- ResizeImage:
size: 224
- NormalizeImage:
scale: 1.0/255.0
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: ''
- ToCHWImage:
name: Topk
topk: 2 # Display the number of topks, do not exceed the total number of categories
class_id_map_file: dataset/phone_label_list.txt # path of label_list.txt
### Model Training And Evaluation
#### Model Training
Start training with the following command:
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml \
-o Arch.pretrained=True
where `Arch.pretrained=True` is to use pretrained weights to help with training.
#### Model Evaluation
After training the model, use the following command to evaluate the model metrics.
python3 tools/eval.py \
-c ./ppcls/configs/practical_models/PPHGNet_tiny_calling_halfbody.yaml \
-o Global.pretrained_model=output/PPHGNet_tiny/best_model
Where `-o Global.pretrained_model="output/PPHGNet_tiny/best_model"` specifies the path where the current best weight is located. If other weights are needed, just replace the corresponding path.
#### Model Export
For the detailed introduction of model export, please refer to [here](https://github.com/PaddlePaddle/PaddleClas/blob/develop/docs/en/inference_deployment/export_model_en.md#2-export-classification-model)
You can refer to the following steps:
python tools/export_model.py \
-c ./PPHGNet_tiny_calling_halfbody.yaml \
-o Global.pretrained_model=./output/PPHGNet_tiny/best_model \
-o Global.save_inference_dir=./output_inference/PPHGNet_tiny_calling_halfbody
Then rename the exported model and add the configuration file to suit the usage of PP-Human.
cd ./output_inference/PPHGNet_tiny_calling_halfbody
mv inference.pdiparams model.pdiparams
mv inference.pdiparams.info model.pdiparams.info
mv inference.pdmodel model.pdmodel
# Download configuration file for inference
wget https://bj.bcebos.com/v1/paddledet/models/pipeline/infer_configs/PPHGNet_tiny_calling_halfbody/infer_cfg.yml
At this point, this model can be used in PP-Human.
简体中文 | [English](./idbased_det_en.md)
# 基于人体id的检测模型开发
## 环境准备
......@@ -5,7 +7,7 @@
## 数据准备
**注意** : 在实际使用的预测过程中,使用的是单人图像进行预测,因此在训练过程中建议将图像裁剪为单人图像,再进行烟头检测框的标注,以提升准确率。
......@@ -24,7 +26,7 @@
## 新增行为
### 数据准备
......@@ -130,16 +132,16 @@ TestDataset:
### 模型训练及评估
- 模型训练
#### 模型训练
# At Root of PaddleDetection
python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml --eval
- 模型评估
#### 模型评估
[简体中文](./idbased_det.md) | English
# Development for Action Recognition Based on Detection with Human ID
## Environmental Preparation
The model of action recognition based on detection with human id is trained with [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection). Please refer to [Installation](../../../tutorials/INSTALL.md) to complete the environment installation for subsequent model training and usage processes.
## Data Preparation
The model of action recognition based on detection with human id directly recognizes the image frames of a video, so its data preparation is the same as that of a general detection model. For details, please refer to [Data Preparation for Detection](../../../tutorials/data/PrepareDetDataSet_en.md). Please convert the images and annotations into one of the formats that PaddleDetection supports.
**Note**: In the actual prediction process, a single person image is used for prediction. So it is recommended to crop the image into a single person image during the training process, and label the cigarette detection bounding box to improve the accuracy.
## Model Optimization
### Detection-Tracking Model Optimization
The performance of action recognition based on detection with human id depends on the upstream detection and tracking models. If pedestrians cannot be located accurately in the actual scene, or person IDs are hard to assign correctly across frames, the performance of the action recognition part will be limited. If you encounter these problems in actual use, please refer to [Secondary Development of Detection Task](../detection_en.md) and [Secondary Development of Multi-target Tracking Task](../mot_en.md) to optimize the detection and tracking models.
### Larger resolution
From the surveillance camera's perspective, cigarette detection is a typical small object detection problem. Using a larger input resolution can help improve the overall performance of the model.
### Pretrained model
Training starts from a model pretrained on VisDrone, a small object dataset, which raises the mAP of the model from 38.1 to 39.7.
## Add New Action
### Data Preparation
Please refer to [Data Preparation for Detection](../../../tutorials/data/PrepareDetDataSet_en.md) to complete the data preparation.
After this step, the directory will look like:
├── smoking # all images
│   ├── 1.jpg
│   ├── 2.jpg
├── smoking_test_cocoformat.json # Validation file
├── smoking_train_cocoformat.json # Training file
Taking the `COCO` format as an example, the content of the completed json annotation file is as follows:
# The "images" field contains the path, id and corresponding width and height information of the images.
"images": [
"file_name": "smoking/1.jpg",
"id": 0, # Here id is the picture id serial number, do not duplicate
"height": 437,
"width": 212
"file_name": "smoking/2.jpg",
"id": 1,
"height": 655,
"width": 365
# The "categories" field contains all category information. If you want to add more detection categories, please add them here. The example is as follows.
"categories": [
"supercategory": "cigarette",
"id": 1,
"name": "cigarette"
"supercategory": "Class_Defined_by_Yourself",
"id": 2,
"name": "Class_Defined_by_Yourself"
# The "annotations" field contains information about all instances, including category, bounding box coordinates, id, image id and other information
"annotations": [
"category_id": 1, # Corresponding to the defined category, where 1 represents cigarette
"bbox": [
"id": 0, # Here id is the id serial number of the instance, do not duplicate
"image_id": 0, # Here is the id serial number of the image where the instance is located, which may be duplicated. In this case, there are multiple instance objects on one image.
"iscrowd": 0,
"area": 124.96230648208665
"category_id": 2, # Corresponding to the defined category, where 2 represents Class_Defined_by_Yourself
"bbox": [
"id": 1,
"image_id": 1,
"iscrowd": 0,
"area": 1311.6696622034585
### Configuration File Settings
Refer to the [Configuration File](../../../../configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml). The key settings that need attention are as follows:
metric: COCO
num_classes: 1 # If more categories are added, please modify here accordingly
# Set image_dir,anno_path,dataset_dir correctly
# Ensure that dataset_dir + anno_path can correctly access to the path of the annotation file
# Ensure that dataset_dir + image_dir + the image path in the annotation file can correctly access to the image path
image_dir: ""
anno_path: smoking_train_cocoformat.json
dataset_dir: dataset/smoking
data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']
image_dir: ""
anno_path: smoking_test_cocoformat.json
dataset_dir: dataset/smoking
anno_path: smoking_test_cocoformat.json
dataset_dir: dataset/smoking
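The path rule in the comments above (`dataset_dir + anno_path` for the annotation file, `dataset_dir + image_dir + file_name` for each image) can be checked before training with a short script. This is a sketch under assumptions: `find_missing_images` is a hypothetical helper that is not part of PaddleDetection, and it only reads the `images` field of a COCO-format file.

```python
import json
import os


def find_missing_images(dataset_dir: str, image_dir: str, anno_path: str) -> list:
    """Return the image paths listed in the annotation file that do not exist on disk."""
    anno_file = os.path.join(dataset_dir, anno_path)
    with open(anno_file, "r", encoding="utf-8") as f:
        coco = json.load(f)
    missing = []
    for image in coco["images"]:
        # An empty image_dir is skipped by os.path.join, matching image_dir: "" in the config.
        image_path = os.path.join(dataset_dir, image_dir, image["file_name"])
        if not os.path.isfile(image_path):
            missing.append(image_path)
    return missing
```

For the configuration shown here, a call such as `find_missing_images("dataset/smoking", "", "smoking_train_cocoformat.json")` should return an empty list when the three settings are consistent.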
### Model Training And Evaluation
#### Model Training
As with [PP-YOLOE](../../../../configs/ppyoloe/README.md), start training with the following command:
# At Root of PaddleDetection
python -m paddle.distributed.launch --gpus 0,1,2,3 tools/train.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml --eval
#### Model Evaluation
After training the model, use the following command to evaluate the model metrics.
# At Root of PaddleDetection
python tools/eval.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml
#### Model Export
Note: If predicting in a TensorRT environment, please enable `-o trt=True` for better performance.
# At Root of PaddleDetection
python tools/export_model.py -c configs/pphuman/ppyoloe_crn_s_80e_smoking_visdrone.yml -o weights=output/ppyoloe_crn_s_80e_smoking_visdrone/best_model trt=True
After exporting the model, you can get:
├── infer_cfg.yml
├── model.pdiparams
├── model.pdiparams.info
└── model.pdmodel
At this point, this model can be used in PP-Human.
简体中文 | [English](./skeletonbased_rec_en.md)
# 基于人体骨骼点的行为识别
## 环境准备
......@@ -143,7 +145,6 @@ DATASET: #DATASET field
### 模型训练与测试
- 在PaddleVideo中,使用以下命令即可开始训练:
[简体中文](./skeletonbased_rec.md) | English
# Skeleton-based action recognition
## Environmental Preparation
The skeleton-based action recognition is trained with [PaddleVideo](https://github.com/PaddlePaddle/PaddleVideo). Please refer to [Installation](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/en/install.md) to complete the environment installation for subsequent model training and usage processes.
## Data Preparation
For the skeleton-based model, you can refer to [this document](https://github.com/PaddlePaddle/PaddleVideo/tree/develop/applications/PPHuman#%E5%87%86%E5%A4%87%E8%AE%AD%E7%BB%83%E6%95%B0%E6%8D%AE) to prepare training data adapted to PaddleVideo. The main process includes the following steps:
### Data Format Description
STGCN is a model based on sequences of skeleton point coordinates. In PaddleVideo, training data is `Numpy` data stored in `.npy` format, and labels can be files stored in `.npy` or `.pkl` format. The dimension requirement for sequence data is `(N,C,T,V,M)`. The current solution only supports behaviors performed by a single person (there can be multiple people in the video, but action recognition is performed for each person separately), that is, `M=1`.
| Dim | Size | Description |
| ---- | ---- | ---------- |
| N | Not Fixed | The number of sequences in the dataset |
| C | 2 | Keypoint coordinate, i.e. (x, y) |
| T | 50 | The temporal dimension of the action sequence (i.e. the number of continuous frames)|
| V | 17 | The number of keypoints of each person, here we use the definition of the `COCO` dataset, see [here](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/docs/tutorials/PrepareKeypointDataSet_en.md#description-for-coco-datasetkeypoint) |
| M | 1 | The number of persons, here we only predict a single person for each action sequence |
### Get The Skeleton Point Coordinates of The Sequence
For a sequence to be labeled (here a sequence refers to an action segment, which can be a video or an ordered collection of pictures), the coordinates of the skeleton points (also known as keypoints) can be obtained through model prediction or manual annotation.
- Model prediction: You can directly select a model from the [PaddleDetection KeyPoint Models](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/configs/keypoint/README_en.md) and follow `3, training and testing - Deployment Prediction - Detect + keypoint top-down model joint deployment` to get the 17 keypoint coordinates of the target sequence.
When using the model to predict the coordinates, you can refer to the following steps. Please note that these operations are performed in PaddleDetection.
# current path is under root of PaddleDetection
# Step 1: download pretrained inference models.
wget https://bj.bcebos.com/v1/paddledet/models/pipeline/mot_ppyoloe_l_36e_pipeline.zip
wget https://bj.bcebos.com/v1/paddledet/models/pipeline/dark_hrnet_w32_256x192.zip
unzip -d output_inference/ mot_ppyoloe_l_36e_pipeline.zip
unzip -d output_inference/ dark_hrnet_w32_256x192.zip
# Step 2: Get the keypoint coordinates
# if your data is image sequence
python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/mot_ppyoloe_l_36e_pipeline/ --keypoint_model_dir=output_inference/dark_hrnet_w32_256x192 --image_dir={your image directory path} --device=GPU --save_res=True
# if your data is video
python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/mot_ppyoloe_l_36e_pipeline/ --keypoint_model_dir=output_inference/dark_hrnet_w32_256x192 --video_file={your video file path} --device=GPU --save_res=True
We can get a detection result file named `det_keypoint_unite_image_results.json`. The detail of content can be seen at [Here](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/deploy/python/det_keypoint_unite_infer.py#L108).
### Uniform Sequence Length
Since the length of each action in the actual data is different, the first step is to choose a fixed sequence length according to your data and the actual scene (in PP-Human, we use 50 frames as an action sequence), and process the data as follows:
- Data whose actual length exceeds the predetermined length: randomly crop a 50-frame segment
- Data whose actual length is less than the predetermined length: pad with 0 until it reaches 50 frames
- Data whose length exactly equals the predetermined length: no processing required
Note: After this step is completed, please strictly confirm that the processed data contains a complete action, and there will be no ambiguity in prediction. It is recommended to confirm by visualizing the data.
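The three cases above can be sketched as one NumPy function. This is a minimal illustration rather than the code used by PP-Human: the name `unify_length` is hypothetical, and each sequence is assumed to be an array of shape `(T, V, C)`.

```python
import numpy as np


def unify_length(seq: np.ndarray, target_len: int = 50, rng=None) -> np.ndarray:
    """Crop or zero-pad a keypoint sequence of shape (T, V, C) to target_len frames."""
    if rng is None:
        rng = np.random.default_rng()
    length = seq.shape[0]
    if length > target_len:
        # Randomly pick the start of a target_len-frame window (upper bound is exclusive).
        start = int(rng.integers(0, length - target_len + 1))
        return seq[start:start + target_len]
    if length < target_len:
        # Append all-zero frames until the sequence reaches target_len frames.
        pad = np.zeros((target_len - length,) + seq.shape[1:], dtype=seq.dtype)
        return np.concatenate([seq, pad], axis=0)
    return seq
```

Because the crop window is random, a cropped clip may cut off part of the action, which is why the visual check recommended in the note above still matters.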
### Save to PaddleVideo usable formats
After the first two steps of processing, we get the annotation of each character action fragment. At this time, we have a list `all_kpts`, which contains multiple keypoint sequence fragments, each one has a shape of (T, V, C) (in our case (50, 17, 2)), which is further converted into a format usable by PaddleVideo.
- Adjust dimension order: `np.transpose` and `np.expand_dims` can be used to convert the dimension of each fragment into (C, T, V, M) format.
- Combine and save all clips as one file
Note: `class_id` is an `int` variable, as in other classification tasks. For example `0: falling, 1: other`.
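The dimension adjustment and merging described above can be sketched as follows. The function name `to_paddlevideo_layout` is hypothetical and the label file is not covered here; the sketch only shows how `(T, V, C)` fragments become one `(N, C, T, V, M)` array.

```python
import numpy as np


def to_paddlevideo_layout(all_kpts) -> np.ndarray:
    """Convert a list of (T, V, C) fragments into one (N, C, T, V, M) array with M = 1."""
    clips = []
    for kpts in all_kpts:
        clip = np.transpose(kpts, (2, 0, 1))  # (T, V, C) -> (C, T, V)
        clip = np.expand_dims(clip, axis=-1)  # (C, T, V) -> (C, T, V, 1)
        clips.append(clip)
    # Stacking along a new first axis gives the N dimension.
    return np.stack(clips, axis=0).astype(np.float32)
```

The resulting array can then be written with `np.save`, for example `np.save("train_data.npy", data)`.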
We provide a [script file](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/applications/PPHuman/datasets/prepare_dataset.py) to do this step, which can directly process the generated `det_keypoint_unite_image_results.json` file. The script parses the content of the json file, unifies the training data sequences, and saves the data files as described in the preceding steps.
mkdir {root of PaddleVideo}/applications/PPHuman/datasets/annotations
mv det_keypoint_unite_image_results.json {root of PaddleVideo}/applications/PPHuman/datasets/annotations/det_keypoint_unite_image_results_{video_id}_{camera_id}.json
cd {root of PaddleVideo}/applications/PPHuman/datasets/
python prepare_dataset.py
Now, we have available training data (`.npy`) and corresponding annotation files (`.pkl`).
## Model Optimization
### Detection-Tracking Model Optimization
The performance of action recognition based on skeleton depends on the upstream detection and tracking models. If pedestrians cannot be located accurately in the actual scene, or person IDs are hard to assign correctly across frames, the performance of the action recognition part will be limited. If you encounter these problems in actual use, please refer to [Secondary Development of Detection Task](../detection_en.md) and [Secondary Development of Multi-target Tracking Task](../mot_en.md) to optimize the detection and tracking models.
### Keypoint Model Optimization
As the core feature of the scheme, skeleton point localization also determines the overall effect of action recognition. If the keypoint coordinates predicted in the actual scene contain obvious errors, it is difficult to distinguish specific actions from the skeleton formed by the keypoints.
You can refer to [Secondary Development of Keypoint Detection Task](../keypoint_detection_en.md) to optimize the keypoint model.
### Coordinate Normalization
After getting coordinates of the skeleton points, it is recommended to perform normalization processing according to the detection bounding box of each person to reduce the convergence difficulty brought by the difference in the position and scale of the person.
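One possible form of this normalization is sketched below: each keypoint is expressed relative to the top-left corner of the person's box and divided by the box width and height. The exact formula used in PP-Human may differ; the function name and the `(x1, y1, x2, y2)` box layout are assumptions.

```python
import numpy as np


def normalize_by_bbox(kpts: np.ndarray, box) -> np.ndarray:
    """Map (V, 2) pixel keypoints to coordinates relative to the person's bounding box."""
    x1, y1, x2, y2 = box
    # Guard against degenerate boxes to avoid division by zero.
    width = max(x2 - x1, 1e-6)
    height = max(y2 - y1, 1e-6)
    origin = np.array([x1, y1], dtype=np.float32)
    scale = np.array([width, height], dtype=np.float32)
    return (kpts.astype(np.float32) - origin) / scale
```

With this mapping, a keypoint at the top-left corner of the box becomes `(0, 0)` and one at the box center becomes `(0.5, 0.5)`, regardless of where the person stands in the frame or how large they appear.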
## Add New Action
In skeleton-based action recognition, the model is [ST-GCN](https://arxiv.org/abs/1801.07455), adapted to PaddleVideo following the [Training Step](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/docs/en/model_zoo/recognition/stgcn.md). Use it to complete the model training and export process.
### Data Preparation And Configuration File Settings
- Prepare the training data (`.npy`) and the corresponding annotation file (`.pkl`) according to `Data preparation`, and place them under `{root of PaddleVideo}/applications/PPHuman/datasets/`.
- Refer to the [Configuration File](https://github.com/PaddlePaddle/PaddleVideo/blob/develop/applications/PPHuman/configs/stgcn_pphuman.yaml). The settings to focus on are as follows:
name: "STGCN"
in_channels: 2 # This corresponds to the C dimension in the data format description, representing two-dimensional coordinates.
dropout: 0.5
layout: 'coco_keypoint'
data_bn: True
name: "STGCNHead"
num_classes: 2 # If there are multiple action types in the data, this needs to be modified to match the number of types.
if_top5: False # When the number of action types is less than 5, please set it to False, otherwise an error will be raised.
# Please set the data and label path of the train/valid/test part correctly according to the data path
batch_size: 64
num_workers: 4
test_batch_size: 1
test_num_workers: 0
format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddle
file_path: "./applications/PPHuman/datasets/train_data.npy" #mandatory, train data index file path
label_path: "./applications/PPHuman/datasets/train_label.pkl"
format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dateset'
file_path: "./applications/PPHuman/datasets/val_data.npy" #Mandatory, valid data index file path
label_path: "./applications/PPHuman/datasets/val_label.pkl"
test_mode: True
format: "SkeletonDataset" #Mandatory, indicate the type of dataset, associate to the 'paddlevideo/loader/dateset'
file_path: "./applications/PPHuman/datasets/val_data.npy" #Mandatory, valid data index file path
label_path: "./applications/PPHuman/datasets/val_label.pkl"
test_mode: True
### Model Training And Evaluation
- In PaddleVideo, start training with the following command:
# current path is under root of PaddleVideo
python main.py -c applications/PPHuman/configs/stgcn_pphuman.yaml
# Since the task may overfit, it is recommended to evaluate model during training to save the best model.
python main.py --validate -c applications/PPHuman/configs/stgcn_pphuman.yaml
- After training the model, use the following command to do inference.
python main.py --test -c applications/PPHuman/configs/stgcn_pphuman.yaml -w output/STGCN/STGCN_best.pdparams
### Model Export
In PaddleVideo, use the following command to export the model and get the structure file `STGCN.pdmodel` and the weight file `STGCN.pdiparams`, then add the configuration file.
# current path is under root of PaddleVideo
python tools/export_model.py -c applications/PPHuman/configs/stgcn_pphuman.yaml \
-p output/STGCN/STGCN_best.pdparams \
-o output_inference/STGCN
cp applications/PPHuman/configs/infer_cfg.yml output_inference/STGCN
# Rename model files to adapt PP-Human
cd output_inference/STGCN
mv STGCN.pdiparams model.pdiparams
mv STGCN.pdiparams.info model.pdiparams.info
mv STGCN.pdmodel model.pdmodel
The directory structure will look like:
├── infer_cfg.yml
├── model.pdiparams
├── model.pdiparams.info
├── model.pdmodel
At this point, this model can be used in PP-Human.
**Note**: If the length of the video sequence or the number of keypoints is changed during training, the content of the `INFERENCE` field in the configuration file needs to be modified accordingly for correct prediction.
# The dimension of the sequence data is (N,C,T,V,M)
name: 'STGCN_Inference_helper'
num_channels: 2 # Corresponding to C dimension
window_size: 50 # Corresponding to T dimension, please set it accordingly to the sequence length.
vertex_nums: 17 # Corresponding to V dimension, please set it accordingly to the number of keypoints
person_nums: 1 # Corresponding to M dimension