Unverified · Commit d1892fff · authored by: W wangguanzhong · committed by: GitHub

add en doc & update yoloseries, test=document_fix (#6614)

* add en doc & update yoloseries, test=document_fix

* update readme, test=document_fix
Parent 54c3e05e
@@ -22,12 +22,18 @@
</div>
## <img src="https://user-images.githubusercontent.com/48054808/157793354-6e7f381a-0aa6-4bb7-845c-9acf2ecc05c3.png" width="20"/> Product Update
- 🔥 **2022.8.09: Release of the [full YOLO family model zoo](https://github.com/nemonameless/PaddleDetection_YOLOSeries)**
  - Comprehensive coverage of classic and latest YOLO models: YOLOv3, the real-time high-precision object detection model PP-YOLOE developed by Baidu PaddlePaddle, and the cutting-edge algorithms YOLOv4, YOLOv5, YOLOX, MT-YOLOv6 and YOLOv7
  - Stronger model performance: innovations and upgrades based on the various cutting-edge YOLO algorithms shorten the training cycle by 5-8x and generally improve accuracy by 1%-5% mAP; model compression strategies deliver a 30%+ speedup with no loss of accuracy
  - Complete end-to-end development support: full pipeline from model training, evaluation and inference to model quantization/compression and deployment on various hardware, with flexible switching between different model algorithms and one-command customized development
- 🔥 **2022.8.01: Release of the [upgraded PP-TinyPose](./configs/keypoint/tiny_pose/). End-to-end AP on business datasets of fitness and dance scenes improves by 9.1**
  - Added real sports-scene data; recognition of complex actions is significantly improved, covering unconventional poses such as sideways, lying down, jumping and high leg raises
  - The detection model uses the [enhanced PP-PicoDet](./configs/picodet/README.md), improving accuracy on the COCO dataset by 3.1%
  - Keypoint stability is enhanced with a new filter-based stabilization method, making video prediction results more stable and smooth
- 2022.7.14: Release of the [pedestrian analysis tool PP-Human v2](./deploy/pipeline)
  - Four major industry-featured functions: high-performance, easily extensible recognition of five complex behaviors, lightning-fast human attribute recognition, people counting and trajectory retention with one line of code, and high-accuracy multi-camera tracking
  - Strong underlying core algorithms: covering pedestrian detection, tracking and attribute recognition, with no restrictions on the number of targets, lighting or background
  - Extremely low barrier to use: complete end-to-end development and model optimization guides, one-command inference, and compatibility with various input data formats
@@ -38,15 +44,6 @@
- Released the real-time pedestrian analysis tool [PP-Human](deploy/pipeline), supporting four capabilities: pedestrian tracking, people counting, human attribute recognition and falling detection; specially optimized on real-scene data, it accurately recognizes various falling postures and adapts to different backgrounds, lighting and camera angles.
- Added the [YOLOX](configs/yolox) object detection model, supporting nano/tiny/s/m/l/x versions; the x version reaches 51.8% accuracy on the COCO val2017 dataset.
- 2021.11.03: PaddleDetection released [release/2.3](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3)
  - Released the lightweight detection model ⚡[PP-PicoDet](configs/picodet): only 0.99M parameters, achieving 30+ mAP at 150 FPS
  - Released the lightweight keypoint model ⚡[PP-TinyPose](configs/keypoint/tiny_pose): 122 FPS and 51.8 AP with FP16 inference in single-person scenes, featuring high accuracy, fast speed, no limit on the number of detected people, and good performance on tiny targets
  - Released the real-time tracking system [PP-Tracking](deploy/pptracking), covering pedestrian, vehicle and multi-category tracking with single and multiple cameras, specially optimized for small and dense targets, providing technical solutions for people and vehicle flow
  - Added the [Swin Transformer](configs/faster_rcnn), [TOOD](configs/tood) and [GFL](configs/gfl) object detection models
  - Released the [Sniper](configs/sniper) small-object detection optimization model and the [PP-YOLO-EB](configs/ppyolo) model optimized for EdgeBoard
  - Added the lightweight keypoint model [Lite HRNet](configs/keypoint) with Paddle Lite deployment support
- [More releases](https://github.com/PaddlePaddle/PaddleDetection/releases)
## <img title="" src="https://user-images.githubusercontent.com/48054808/157795569-9fc77c85-732f-4870-9be0-99a7fe2cff27.png" alt="" width="20"> Brief Introduction
@@ -108,6 +105,9 @@
<li>PSS-Det</li>
<li>RetinaNet</li>
<li>YOLOv3</li>
<li>YOLOv5</li>
<li>MT-YOLOv6</li>
<li>YOLOv7</li>
<li>PP-YOLOv1/v2</li>
<li>PP-YOLO-Tiny</li>
<li>PP-YOLOE</li>
@@ -263,7 +263,7 @@
- `PP-YOLO` reaches 45.9% mAP on COCO with 72.9 FPS on Tesla V100, outperforming [YOLOv4](https://arxiv.org/abs/2004.10934) in both accuracy and speed
- `PP-YOLO v2` further optimizes `PP-YOLO`, reaching 49.5% mAP on COCO with 68.9 FPS on Tesla V100
- `PP-YOLOE` further optimizes `PP-YOLO v2`, reaching 51.6% mAP on COCO with 78.1 FPS on Tesla V100
- [`YOLOX`](configs/yolox) and [`YOLOv5`](https://github.com/nemonameless/PaddleDetection_YOLOSeries/tree/develop/configs/yolov5) are algorithms reproduced with PaddleDetection
- All models in the figure are available in the [Model Zoo](#模型库)
</details>
@@ -312,7 +312,8 @@
| Model | COCO mAP | V100 TensorRT FP16 Speed (FPS) | Config | Download |
|:-----|:-----:|:-----:|:-----:|:-----:|
| [YOLOX-l](configs/yolox) | 50.1 | 107.5 | [Link](configs/yolox/yolox_l_300e_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/yolox_l_300e_coco.pdparams) |
| [YOLOv5-l](https://github.com/nemonameless/PaddleDetection_YOLOSeries/tree/develop/configs/yolov5) | 48.6 | 136.0 | [Link](https://github.com/nemonameless/PaddleDetection_YOLOSeries/blob/develop/configs/yolov5/yolov5_l_300e_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/yolov5_l_300e_coco.pdparams) |
| [YOLOv7-l](https://github.com/nemonameless/PaddleDetection_YOLOSeries/tree/develop/configs/yolov7) | 51.0 | 135.0 | [Link](https://github.com/nemonameless/PaddleDetection_YOLOSeries/blob/develop/configs/yolov7/yolov7_l_300e_coco.yml) | [Download](https://paddledet.bj.bcebos.com/models/yolov7_l_300e_coco.pdparams) |
#### Other general detection models [Document](docs/MODEL_ZOO_cn.md)
......
@@ -23,7 +23,18 @@
## <img src="https://user-images.githubusercontent.com/48054808/157793354-6e7f381a-0aa6-4bb7-845c-9acf2ecc05c3.png" width="20"/> Product Update
- 🔥 **2022.8.09: Release of the [YOLO series model zoo](https://github.com/nemonameless/PaddleDetection_YOLOSeries)**
  - Comprehensive coverage of classic and latest YOLO models: YOLOv3, the Paddle real-time object detection model PP-YOLOE, and the frontier detection algorithms YOLOv4, YOLOv5, YOLOX, MT-YOLOv6 and YOLOv7
  - Better model performance: upgrades based on the various cutting-edge YOLO algorithms shorten training time by 5-8x and generally improve accuracy by 1%-5% mAP; model compression strategies achieve a 30%+ speedup without precision loss
  - Complete end-to-end development support: end-to-end pipeline covering training, evaluation, inference, model compression and deployment on various hardware, with flexible algorithm switching and efficient customized development
- 🔥 **2022.8.01: Release of the [upgraded PP-TinyPose](./configs/keypoint/tiny_pose/). End-to-end AP on datasets of fitness and dance scenes improves by 9.1**
  - Added sports-scene data; recognition of complex actions is significantly improved, covering actions such as sideways, lying down, jumping and raising legs
  - The detection model uses the enhanced PP-PicoDet, improving accuracy on the COCO dataset by 3.1% mAP
  - Keypoint stability is enhanced with a filter-based stabilization method, making video prediction results more stable and smooth
- 2022.7.14: Release of the [pedestrian analysis tool PP-Human v2](./deploy/pipeline)
  - Four major functions: high-performance and flexible recognition of five complex actions, real-time human attribute recognition, visitor flow statistics, and high-accuracy multi-camera tracking.
  - High-performance algorithms: pedestrian detection, tracking and attribute recognition, robust to the number of targets and to variations in background and lighting.
  - Highly flexible: complete end-to-end development and optimization guides, simple commands for deployment, and compatibility with different input formats.
@@ -34,15 +45,6 @@
- Released the real-time pedestrian analysis tool [PP-Human](deploy/pphuman). It has four major functions: pedestrian tracking, visitor flow statistics, human attribute recognition and falling detection. Falling detection is optimized on real-life data, accurately recognizing various falling postures and adapting to different backgrounds, lighting and camera angles.
- Added the [YOLOX](configs/yolox) object detection model with nano/tiny/s/m/l/x versions; the x version reaches 51.8% accuracy on the COCO val2017 dataset.
- 2021.11.03: PaddleDetection released [release/2.3](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3)
  - Released the lightweight detection model ⚡[PP-PicoDet](configs/picodet): with only 0.99M parameters, it reaches 150 FPS with COCO mAP over 30%
  - Released the lightweight keypoint model ⚡[PP-TinyPose](configs/keypoint/tiny_pose): 122 FPS with FP16 inference and 51.8 AP on single-person detection; high accuracy, fast speed, unlimited detection headcount, and effective on small objects
  - Released the real-time tracking system [PP-Tracking](deploy/pptracking), covering pedestrian, vehicle and multi-category tracking with single and multiple cameras, optimized for small and dense objects, providing technical solutions for people and vehicle flow counting
  - Added the object detection models [Swin Transformer](configs/faster_rcnn), [TOOD](configs/tood) and [GFL](configs/gfl)
  - Released the small-object detection optimization model [Sniper](configs/sniper) and the [PP-YOLO-EB](configs/ppyolo) model optimized for EdgeBoard
  - Added the lightweight keypoint model [Lite HRNet](configs/keypoint) with Paddle Lite deployment support
- [More releases](https://github.com/PaddlePaddle/PaddleDetection/releases)
## <img title="" src="https://user-images.githubusercontent.com/48054808/157795569-9fc77c85-732f-4870-9be0-99a7fe2cff27.png" alt="" width="20"> Brief Introduction
......
Simplified Chinese | [English](./detection_en.md)
# Customize Object Detection Task
In the industrial application of object detection algorithms, additional training is often required to meet actual requirements, and project iterations may also require modifying the categories. This document details how to customize object detection algorithms with PaddleDetection; the process includes data preparation, model optimization, and the workflow for modifying categories.
@@ -39,7 +41,7 @@ TestDataset:
```
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml --eval
```
For more detailed commands, refer to [Getting started with PaddleDetection in 30 minutes](../../tutorials/GETTING_STARTED_cn.md)
......
[Simplified Chinese](./detection.md) | English
# Customize Object Detection task
When object detection algorithms are applied in a specific industry, additional training is often required for practical use, and project iterations may also require modifying categories. This document details how to use PaddleDetection for a customized object detection algorithm. The process includes data preparation, the model optimization roadmap, and the workflow for modifying categories.
## Data Preparation
Customization starts with the preparation of the dataset. We need to collect data suited to the scenario features to improve the model's effect and generalization. Then Labelme, LabelImg and other labeling tools are used to label the object detection bounding boxes and convert the labeling results into COCO or VOC data format. For details please refer to [Data Preparation](../../tutorials/data/PrepareDetDataSet_en.md)
## Model Optimization
### 1. Use customized dataset for training
Modify the corresponding paths in the data configuration file based on the prepared data, for example in `configs/datasets/coco_detection.yml`:
```
metric: COCO
num_classes: 80

TrainDataset:
  !COCODataSet
    image_dir: train2017 # Path to the images of the training set relative to dataset_dir
    anno_path: annotations/instances_train2017.json # Path to the annotation file of the training set relative to dataset_dir
    dataset_dir: dataset/coco # Path to the dataset relative to the PaddleDetection path
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']

EvalDataset:
  !COCODataSet
    image_dir: val2017 # Path to the images of the evaluation set relative to dataset_dir
    anno_path: annotations/instances_val2017.json # Path to the annotation file of the evaluation set relative to dataset_dir
    dataset_dir: dataset/coco # Path to the dataset relative to the PaddleDetection path

TestDataset:
  !ImageFolder
    anno_path: annotations/instances_val2017.json # also supports txt (like VOC's label_list.txt); path to the annotation file relative to dataset_dir
    dataset_dir: dataset/coco # if set, anno_path will be 'dataset_dir/anno_path'
```
Once the configuration changes are completed, training with evaluation can be started with the following command:
```
export CUDA_VISIBLE_DEVICES=0
python tools/train.py -c configs/yolov3/yolov3_mobilenet_v1_270e_coco.yml --eval
```
For more details please refer to [Getting Started for PaddleDetection](../../tutorials/GETTING_STARTED_cn.md)
### 2. Load the COCO model as pre-training
The currently provided pre-trained models in PaddleDetection's configurations are weights from the ImageNet dataset, loaded into the backbone network of the detection algorithm. For practical use, it is recommended to load the weights trained on the COCO dataset, which can usually provide a large improvement to the model accuracy. The method is as follows.
#### 1) Set pre-training weight path
The trained model weights for the COCO dataset are saved in the configuration folder of each algorithm; for example, the PP-YOLOE-l COCO weights are provided under `configs/ppyoloe` ([Link](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams)). The configuration file sets `pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams`
#### 2) Modify hyperparameters
After loading the COCO pre-training weights, the learning rate hyperparameters need to be modified, for example
In `configs/ppyoloe/_base_/optimizer_300e.yml`:
```
epoch: 120 # The original configuration is 300 epochs; after loading COCO weights, the number of iterations can be reduced appropriately

LearningRate:
  base_lr: 0.005 # The original configuration is 0.025; after loading COCO weights, the learning rate should be reduced
  schedulers:
    - !CosineDecay
      max_epochs: 144 # Modify based on the number of epochs
    - !LinearWarmup
      start_factor: 0.
      epochs: 5
```
## Modify categories
When the actual application scenario category changes, the data configuration file needs to be modified, for example in `configs/datasets/coco_detection.yml`:
```
metric: COCO
num_classes: 10 # originally 80 classes
```
After the configuration changes are completed, the COCO pre-trained weights can still be loaded: PaddleDetection automatically loads the weights whose shapes match and ignores those that do not, so no other modifications are needed.
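As an illustration of this mechanism, here is a minimal sketch of shape-matched weight loading with generic Paddle APIs (this is not PaddleDetection's actual loader; `load_matched_weights` is a hypothetical helper):
```
import paddle

def load_matched_weights(model, pretrained_path):
    """Keep pre-trained tensors whose shapes match the new model, drop the rest
    (e.g. an 80-class head when num_classes becomes 10)."""
    pretrained = paddle.load(pretrained_path)  # dict: param name -> Tensor
    state = model.state_dict()
    matched = {k: v for k, v in pretrained.items()
               if k in state and list(v.shape) == list(state[k].shape)}
    state.update(matched)
    model.set_state_dict(state)
    print(f"loaded {len(matched)}/{len(state)} matching tensors")
```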
Simplified Chinese | [English](./keypoint_detection_en.md)
# Customized Keypoint Detection
When applying keypoint detection algorithms in real scenarios, customization is often needed: we may be dissatisfied with the current pre-trained model results and want to optimize them, or the current keypoint definitions may not meet the actual demand and we may want to add or replace keypoint definitions and train a new keypoint model. This document introduces how to customize the keypoint detection algorithm in PaddleDetection.
......
[Simplified Chinese](./keypoint_detection.md) | English
# Customized Keypoint Detection
When applying keypoint detection algorithms in practice, customization is often needed: we may be dissatisfied with the current pre-trained model results and want to optimize them, or the current keypoint definitions may not meet the actual demand and we may want to add or replace keypoint definitions and train a new keypoint detection model. This document will introduce how to customize the keypoint detection algorithm in PaddleDetection.
## Data Preparation
### Basic Process Description
PaddleDetection currently supports `COCO` and `MPII` annotation data formats. For detailed descriptions of these two data formats, please refer to the document [Keypoint Data Preparation](./../tutorials/data/PrepareKeypointDataSet.md). In this step, using annotation tools such as Labelme, the corresponding coordinates are annotated according to the keypoint serial numbers and then converted into the corresponding trainable annotation format. We recommend the `COCO` format.
### Merging datasets
To extend the training data, we can merge several different datasets together. But different datasets often have different definitions of key points. Therefore, the first step in merging datasets is to unify the point definitions of different datasets, and determine the benchmark points, i.e., the types of feature points finally learned by the model, and then adjust them according to the relationship between the point definitions of each dataset and the benchmark point definitions.
- Points in the benchmark point location: adjust the point number to make it consistent with the benchmark point location
- Points that are not in the benchmark points: discard
- Points in the benchmark that are missing from the dataset: mark these points as "unannotated".
In [Keypoint Data Preparation](../../tutorials/data/PrepareKeypointDataSet.md), we provide a case illustration of how to merge the `COCO` dataset and the `AI Challenger` dataset and unify them under the `COCO` benchmark point definition, for your reference.
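The remapping rules above can be summarized in a short sketch; the benchmark and source point definitions below are made-up examples, and `v = 0` marks a point as "unannotated" in COCO-style `(x, y, v)` triplets:
```
BENCHMARK = ["nose", "left_shoulder", "right_shoulder"]  # assumed target definition
SOURCE = ["head", "nose", "left_shoulder"]               # assumed source definition

def remap_keypoints(src_kpts):
    """Remap a flat [x, y, v] * len(SOURCE) list onto the benchmark definition."""
    out = []
    for name in BENCHMARK:
        if name in SOURCE:               # point exists in the source: renumber it
            i = SOURCE.index(name)
            out += src_kpts[3 * i: 3 * i + 3]
        else:                            # missing from the source: mark unannotated
            out += [0, 0, 0]
    return out                           # source points not in BENCHMARK are discarded
```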
## Model Optimization
### Detection and tracking model optimization
In PaddleDetection, keypoint detection supports Top-Down and Bottom-Up solutions. Top-Down first detects the main body and then detects the local keypoints; it has higher accuracy, but takes longer as the number of detected objects increases. The Bottom-Up solution first detects the keypoints and then groups them with the corresponding objects; it is fast and its speed is independent of the number of detected objects, but its accuracy is relatively low. For details of the two solutions and the corresponding models, please refer to [Keypoint Detection Series Models](../../../configs/keypoint/README.md)
When using the Top-Down solution, the model's performance depends on the preceding detection or tracking results. If the pedestrian position cannot be accurately detected in practice, the performance of keypoint detection will be limited. If you encounter this problem in actual application, please refer to [Customized Object Detection](./detection_en.md) and [Customized Multi-Object Tracking](./pphuman_mot_en.md) to optimize the detection and tracking models.
### Iterate with scenario-compatible data
The currently released keypoint detection models are mainly trained on open-source datasets such as `COCO` and `AI Challenger`, which may lack data close to the actual task, such as surveillance scenarios (angles, lighting and other factors) or sports scenarios (more unconventional poses). Training with data that more closely matches the actual task scenario helps improve the model's results.
### Iteration via pre-trained models
The data annotation of keypoint models is complex, and training from scratch on a business dataset often fails to meet the demand. When used in practical projects, it is recommended to load pre-trained weights, which usually improves model accuracy significantly. Take `HRNet` as an example:
```
python tools/train.py \
-c configs/keypoint/hrnet/hrnet_w32_256x192.yml \
-o pretrain_weights=https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams
```
After loading the pre-trained model, the initial learning rate and the number of epochs can be reduced appropriately. It is recommended to set the initial learning rate to 1/2 to 1/5 of the default configuration, and you can enable `--eval` to observe the change of AP values during training.
### Data augmentation with occlusion
Keypoint tasks involve a large amount of occluded data, including self-occlusion and occlusion between different objects.
1. Detection model optimization (only for Top-Down solutions)
Refer to [Customized Object Detection](./detection_en.md) to improve the detection model in complex scenarios.
2. Keypoint data augmentation
Augment occluded data in keypoint model training to improve model performance in such scenarios; please refer to [PP-TinyPose](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.4/configs/keypoint/tiny_pose/)
### Smooth video prediction
The keypoint model is trained and predicted on the basis of image, and video input is also predicted by splitting the video into frames. Although the content is mostly similar between frames, small differences may still lead to large changes in the output of the model. As a result of that, although the predicted coordinates are roughly correct, there may be jitters in the visual effect.
By adding a smoothing filter process, the performance of the video output can be effectively improved by combining the predicted results of each frame and the historical results. For this part, please see [Filter Smoothing](https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/python/det_keypoint_unite_infer.py#L206).
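As a simplified illustration of the idea (the linked filter is more sophisticated), frame-to-frame smoothing can be sketched as an exponential moving average over the predicted coordinates, with an assumed smoothing constant `alpha`:
```
def smooth_keypoints(frames, alpha=0.5):
    """Blend each frame's predicted (x, y) keypoints with the running history."""
    smoothed, prev = [], None
    for kpts in frames:                  # kpts: list of (x, y) per joint
        if prev is None:
            cur = list(kpts)
        else:
            cur = [(alpha * x + (1 - alpha) * px, alpha * y + (1 - alpha) * py)
                   for (x, y), (px, py) in zip(kpts, prev)]
        smoothed.append(cur)
        prev = cur
    return smoothed
```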
## Add or modify keypoint definition
### Data Preparation
Complete the data preparation according to the previous instructions and place it under `{root of PaddleDetection}/dataset`.
<details>
<summary><b> Examples of annotation file</b></summary>
```
self_dataset/
├── train_coco_joint.json  # training set annotation file
├── val_coco_joint.json    # validation set annotation file
├── images/                # image files
│   ├── 0.jpg
│   ├── 1.jpg
│   ├── 2.jpg
```
The notable fields are as follows:
```
{
"images": [
{
"file_name": "images/0.jpg",
"id": 0, # image id, id cannotdo not repeat
"height": 1080,
"width": 1920
},
{
"file_name": "images/1.jpg",
"id": 1,
"height": 1080,
"width": 1920
},
{
"file_name": "images/2.jpg",
"id": 2,
"height": 1080,
"width": 1920
},
...
],
"categories": [
{
"supercategory": "person",
"id": 1,
"name": "person",
"keypoints": [ # the name of the point serial number
"point1",
"point2",
"point3",
"point4",
"point5",
],
"skeleton": [ # Skeleton composed of points, not necessary for training
[
1,
2
],
[
1,
3
],
[
2,
4
],
[
3,
5
]
]
...
}
],
"annotations": [
{
"category_id": 1, # The category to which the instance belongs
"num_keypoints": 3, # the number of marked points of the instance
"bbox": [ # location of detection box,format is x, y, w, h
799,
575,
55,
185
],
# N*3 list of x, y, v.
"keypoints": [
807.5899658203125,
597.5455322265625,
2,
0,
0,
0, # unlabeled points noted as 0, 0, 0
805.8563232421875,
592.3446655273438,
2,
816.258056640625,
594.0783081054688,
2,
0,
0,
0
],
"id": 1, # the id of the instance, id cannot repeat
"image_id": 8, # The id of the image where the instance is located, repeatable. This represents the presence of multiple objects on a single image
"iscrowd": 0, # covered or not, when the value is 0, it will participate in training
"area": 10175 # the area occupied by the instance, can be simply taken as w * h. Note that when the value is 0, it will be skipped, and if it is too small, it will be ignored in eval
...
```
</details>
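Before training, it can help to sanity-check the annotation file against the rules above; the following is a minimal sketch (assuming the 5-point definition used in this example):
```
import json

def check_annotations(path, num_joints=5):
    data = json.load(open(path))
    img_ids = [im["id"] for im in data["images"]]
    assert len(img_ids) == len(set(img_ids)), "image ids must not repeat"
    ann_ids = [a["id"] for a in data["annotations"]]
    assert len(ann_ids) == len(set(ann_ids)), "annotation ids must not repeat"
    for a in data["annotations"]:
        kpts = a["keypoints"]
        assert len(kpts) == num_joints * 3, "expect an x, y, v triplet per point"
        labeled = sum(1 for v in kpts[2::3] if v > 0)
        assert labeled == a["num_keypoints"], "num_keypoints must match points with v > 0"
```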
### Settings of configuration file
In the configuration file, refer to [config yaml configuration](../../tutorials/KeyPointConfigGuide_cn.md) for more details. Take the [HRNet model configuration](../../../configs/keypoint/hrnet/hrnet_w32_256x192.yml) as an example; we need to focus on the following contents:
<details>
<summary><b> Example of configuration</b></summary>
```
use_gpu: true
log_iter: 5
save_dir: output
snapshot_epoch: 10
weights: output/hrnet_w32_256x192/model_final
epoch: 210
num_joints: &num_joints 5 # The number of predicted points matches the number of defined points
pixel_std: &pixel_std 200
metric: KeyPointTopDownCOCOEval
num_classes: 1
train_height: &train_height 256
train_width: &train_width 192
trainsize: &trainsize [*train_width, *train_height]
hmsize: &hmsize [48, 64]
flip_perm: &flip_perm [[1, 2], [3, 4]] # Note that only mirror-symmetric point pairs are recorded here
...
# Ensure that dataset_dir + anno_path correctly locates the annotation file
# Ensure that dataset_dir + image_dir + the image path in the annotation file correctly locates the image
TrainDataset:
  !KeypointTopDownCocoDataset
    image_dir: images
    anno_path: train_coco_joint.json
    dataset_dir: dataset/self_dataset
    num_joints: *num_joints
    trainsize: *trainsize
    pixel_std: *pixel_std
    use_gt_box: true

EvalDataset:
  !KeypointTopDownCocoDataset
    image_dir: images
    anno_path: val_coco_joint.json
    dataset_dir: dataset/self_dataset
    bbox_file: bbox.json
    num_joints: *num_joints
    trainsize: *trainsize
    pixel_std: *pixel_std
    use_gt_box: true
    image_thre: 0.0
```
</details>
### Model Training and Evaluation
#### Model Training
Run the following command to start training:
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch tools/train.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml
```
#### Model Evaluation
After training the model, you can evaluate the model metrics by running the following commands:
```
python3 tools/eval.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml
```
### Model Export and Inference
#### Top-Down model deployment
```
# Export the keypoint model
python tools/export_model.py -c configs/keypoint/hrnet/hrnet_w32_256x192.yml -o weights={path_to_your_weights}
# Joint deployment of the detector and the top-down keypoint model (for Top-Down solutions only)
python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/ppyolo_r50vd_dcn_2x_coco/ --keypoint_model_dir=output_inference/hrnet_w32_256x192/ --video_file=../video/xxx.mp4 --device=gpu
```
Simplified Chinese | [English](./pphuman_attribute_en.md)
# Customized Attribute Recognition
## Data Preparation
......
[Simplified Chinese](pphuman_attribute.md) | English
# Customized attribute recognition
## Data Preparation
### Data format
We use the PA100K attribute annotation format, with a total of 26 attributes.
The names, positions and lengths of these 26 attributes are shown in the table below.
| Attribute | index | length |
|:------------------------------------------------------------------------------- |:---------------------- |:------ |
| 'Hat','Glasses' | [0, 1] | 2 |
| 'ShortSleeve','LongSleeve','UpperStride','UpperLogo','UpperPlaid','UpperSplice' | [2, 3, 4, 5, 6, 7] | 6 |
| 'LowerStripe','LowerPattern','LongCoat','Trousers','Shorts','Skirt&Dress' | [8, 9, 10, 11, 12, 13] | 6 |
| 'boots' | [14, ] | 1 |
| 'HandBag','ShoulderBag','Backpack','HoldObjectsInFront' | [15, 16, 17, 18] | 4 |
| 'AgeOver60', 'Age18-60', 'AgeLess18' | [19, 20, 21] | 3 |
| 'Female' | [22, ] | 1 |
| 'Front','Side','Back' | [23, 24, 25] | 3 |
Examples:
[0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
The first group: positions [0, 1] have values [0, 1], which means 'no hat' and 'has glasses'.
The second group: position [22] has value [0], indicating that the gender attribute is 'male' (a value of 1 means 'female').
The third group: positions [23, 24, 25] have values [0, 1, 0], indicating that the orientation attribute is 'side'.
The other groups follow in the same order.
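As a sketch of how such a vector is read, the following decodes one 26-value annotation into attribute names following the grouping table above:
```
ATTRS = ['Hat', 'Glasses', 'ShortSleeve', 'LongSleeve', 'UpperStride', 'UpperLogo',
         'UpperPlaid', 'UpperSplice', 'LowerStripe', 'LowerPattern', 'LongCoat',
         'Trousers', 'Shorts', 'Skirt&Dress', 'boots', 'HandBag', 'ShoulderBag',
         'Backpack', 'HoldObjectsInFront', 'AgeOver60', 'Age18-60', 'AgeLess18',
         'Female', 'Front', 'Side', 'Back']

def decode(label):
    """label: list of 26 ints (0/1); returns the names of the set attributes."""
    names = [name for name, v in zip(ATTRS, label) if v == 1]
    if label[22] == 0:
        names.append('Male')   # position 22 == 0 means 'male'
    return names
```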
### Data Annotation
After knowing the purpose of the above `attribute annotation` format, we can start to annotate data. The essence is that each single-person image creates a set of 26 annotation items, corresponding to the attribute values at 26 positions.
Examples:
For an original image:
1) Using bounding boxes to annotate the position of each person in the picture.
2) Each detection box (corresponding to each person) contains 26 attribute values represented by 0 or 1, corresponding to the 26 attributes above. For example, if the person is 'Female', the value at position 22 is 1; if the person is between 'Age18-60', the values at positions [19, 20, 21] are [0, 1, 0]; if the person matches 'AgeOver60', they are [1, 0, 0].
After the annotation is completed, the detection boxes are used to crop each person into a single-person image, which is paired with its 26 attribute annotation values. It is also possible to crop the single-person images first and then annotate them; the result is the same.
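The cropping step can be sketched as follows, assuming COCO-style `[x, y, w, h]` boxes; each resulting crop is paired with its 26 attribute values:
```
from PIL import Image

def crop_person(image_path, box):
    """box: [x, y, w, h] detection box; returns the single-person crop."""
    x, y, w, h = box
    return Image.open(image_path).crop((x, y, x + w, y + h))
```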
## Model Training
Once the data is annotated, it can be used for model training to complete the optimization of the customized model.
There are two main steps: 1) Organize the data and annotated data into the training format. 2) Modify the configuration file to start training.
### Training data format
The training data includes the images used for training and a training list called train.txt. Its location is specified in the training configuration, with the following example:
```
Attribute/
|-- data                # training images folder
|   |-- 00001.jpg
|   |-- 00002.jpg
|   `-- 0000x.jpg
`-- train.txt           # list of training data
```
The train.txt file contains the names of all training images (file paths relative to the root path) plus their 26 annotation values.
Each line represents one person's image and annotation result, in the following format:
```
00001.jpg 0,0,1,0,....
```
Note: 1) the image name and the annotation values are separated by a Tab [\t]; 2) the annotation values are separated by commas [,]. If the format is wrong, parsing will fail.
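A minimal sketch of writing one such line (`format_line` is a hypothetical helper):
```
def format_line(image_name, labels):
    """image path, a Tab, then the 26 values joined by commas."""
    assert len(labels) == 26
    return image_name + "\t" + ",".join(str(v) for v in labels)

# format_line("00001.jpg", [0, 0, 1] + [0] * 23) -> "00001.jpg\t0,0,1,0,...,0"
```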
### Modify the configuration to start training
First run the following command to download the training code (for environment issues, please refer to [Install_PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md)):
```
git clone https://github.com/PaddlePaddle/PaddleClas
```
You need to modify the following configuration in the configuration file `PaddleClas/blob/develop/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml`
```
DataLoader:
  Train:
    dataset:
      name: MultiLabelDataset
      image_root: "dataset/pa100k/" # Specify the root path of the training images
      cls_label_path: "dataset/pa100k/train_list.txt" # Specify the location of the training list file
      label_ratio: True
      transform_ops:
  Eval:
    dataset:
      name: MultiLabelDataset
      image_root: "dataset/pa100k/" # Specify the root path of the evaluation images
      cls_label_path: "dataset/pa100k/val_list.txt" # Specify the location of the evaluation list file
      label_ratio: True
      transform_ops:
```
Note:
1. The image_root path combined with the relative image path in train.txt forms the full path of the image.
2. If you modify the number of attributes, also modify the number of attribute classes in the model architecture configuration accordingly:
```
# model architecture
Arch:
  name: "PPLCNet_x1_0"
  pretrained: True
  use_ssld: True
  class_num: 26 # Number of attribute classes
```
Then run the following command to start training:
```
#Multi-card training
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml
#Single card training
python3 tools/train.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml
```
You can run the following commands for performance evaluation after the training is completed:
```
#Multi-card evaluation
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/eval.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=./output/PPLCNet_x1_0/best_model
#Single card evaluation
python3 tools/eval.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=./output/PPLCNet_x1_0/best_model
```
### Model Export
Use the following command to export the trained model as an inference deployment model.
```
python3 tools/export_model.py \
-c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml \
-o Global.pretrained_model=output/PPLCNet_x1_0/best_model \
-o Global.save_inference_dir=deploy/models/PPLCNet_x1_0_person_attribute_infer
```
After exporting the model, you need to download the [infer_cfg.yml](https://bj.bcebos.com/v1/paddledet/models/pipeline/infer_cfg.yml) file and put it into the exported model folder `PPLCNet_x1_0_person_attribute_infer`.
When using the model, modify the model path `model_dir` and set `enable: True` in the PP-Human configuration file `./deploy/pipeline/config/infer_cfg_pphuman.yml`:
```
ATTR:
model_dir: [YOUR_DEPLOY_MODEL_DIR]/PPLCNet_x1_0_person_attribute_infer/ #The exported model location
enable: True #Whether to enable the function
```
Now the model is ready to use, and a new attribute recognition task is complete.
## Adding or deleting attributes
The above is the annotation and training process for 26 attributes. If attributes need to be added or deleted, you need to:
1) Add or delete the corresponding attribute category information when annotating the data.
2) Modify the number and order of attributes used in train.txt accordingly.
3) Modify the training configuration, for example the number of attributes in the `PaddleClas/blob/develop/ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml` file; for details, see the `Modify the configuration to start training` section above.
Example of adding attributes:
1. Continue to append the new attribute annotation values after the 26 values when annotating the data.
2. Add the new attribute values to the annotation values in the train.txt file as well.
3. Note that the correlation of attribute types and values in train.txt needs to be fixed, for example, the [19, 20, 21] position indicates age, and all images should use the [19, 20, 21] position to indicate age.
The same applies to the deletion of attributes.
For example, if the age attribute is not needed, the values in positions [19, 20, 21] can be removed. You can simply remove all the values in positions 19-21 from the 26 numbers marked in train.txt, and you no longer need to annotate these 3 attribute values.
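The deletion example above can be sketched as a small script that drops the age columns (positions 19-21) from every line of train.txt (the file names here are assumptions):
```
def drop_age(line):
    name, values = line.rstrip("\n").split("\t")
    v = values.split(",")
    del v[19:22]                  # remove the three age positions
    return name + "\t" + ",".join(v)

with open("train.txt") as src, open("train_no_age.txt", "w") as dst:
    for line in src:
        dst.write(drop_age(line) + "\n")
```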
Simplified Chinese | [English](./pphuman_mot_en.md)
# Customized Multi-Object Tracking
When applying multi-object tracking algorithms in industrial applications, there are inevitable demands for tracking custom categories or for optimizing existing multi-object tracking models to improve their performance in specific scenarios. This document uses cases to introduce how to choose a multi-object tracking solution based on the behaviors to be recognized, and how to customize multi-object tracking algorithms with PaddleDetection, including data preparation, model optimization, and the workflow for modifying tracked categories.
......
[Simplified Chinese](./pphuman_mot.md) | English
# Customized multi-object tracking task
When applying multi-object tracking algorithms in industrial applications, there will be inevitable demands for customized types of multi-object tracking or optimization of existing multi-object tracking models to improve the effectiveness of the models in specific scenarios. In this document, we present examples of how to choose a multi-object tracking solution based on the expected identified behavior, and how to use PaddleDetection for further development of multi-object tracking algorithms, including data preparation, model optimization ideas, and the development process of tracking category modification.
## Data Preparation
The multi-object tracking solution uses [ByteTrack](https://arxiv.org/pdf/2110.06864.pdf), which adopts PP-YOLOE to replace the original YOLOX as the detector and BYTETracker as the tracker; for details, please refer to [ByteTrack](../../../configs/mot/bytetrack). The original ByteTrack only supports the single pedestrian category, while PaddleDetection supports simultaneous tracking of multiple categories. Training ByteTrack is the process of training the detector: only detection annotations need to be prepared, and no ReID annotation information is required, i.e., it can be done as pure detection. The dataset should preferably be extracted from continuous video rather than a collection of unrelated images.
Customization starts with the preparation of the dataset. We need to collect data suited to the scenario features to improve the model's effect and generalization. Then Labelme, LabelImg and other labeling tools are used to label the object detection boxes and convert the labeling results into COCO or VOC data format. For details please refer to [Data Preparation](../../tutorials/data/README.md)
## Model Optimization
### 1. Use customized data set for training
The dataset used by the ByteTrack tracking solution only needs detection annotations. Refer to [MOT dataset preparation](../../../configs/mot) and [MOT dataset tutorial](docs/tutorials/data/PrepareMOTDataSet.md).
```
# Single card training
CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --eval --amp
# Multi-card training
python -m paddle.distributed.launch --log_dir=log_dir --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/ppyoloe/ppyoloe_crn_l_300e_coco.yml --eval --amp
```
More details please refer to [Getting Started for PaddleDetection](../../tutorials/GETTING_STARTED_cn.md) and [ByteTrack](../../../configs/mot/bytetrack/detector)
### 2. Load the COCO model as the pre-trained model
The currently provided pre-trained models in PaddleDetection's configurations are weights from the ImageNet dataset, loaded into the backbone network of the detection algorithm. For practical use, it is recommended to load the weights trained on the COCO dataset, which can usually provide a large improvement to the model accuracy. The method is as follows.
#### 1) Set pre-training weight path
The trained model weights for the COCO dataset are saved in the configuration folder of each algorithm, for example, PP-YOLOE-l COCO dataset weights are provided under `configs/ppyoloe`: [Link](https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams) The configuration file sets`pretrain_weights: https://paddledet.bj.bcebos.com/models/ppyoloe_crn_l_300e_coco.pdparams`
#### 2) Modify hyperparameters
After loading the COCO pre-training weights, the learning rate hyperparameters need to be modified, for example
In `configs/ppyoloe/_base_/optimizer_300e.yml`:
```
epoch: 120 # The original configuration is 300 epochs; after loading COCO weights, the number of iterations can be reduced appropriately

LearningRate:
  base_lr: 0.005 # The original configuration is 0.025; after loading COCO weights, the learning rate should be reduced
  schedulers:
    - !CosineDecay
      max_epochs: 144 # Modify based on the number of epochs, usually 1.2 times the epoch number
    - !LinearWarmup
      start_factor: 0.
      epochs: 5
```
## Modify categories
When the actual application scenario category changes, the data configuration file needs to be modified, for example in `configs/datasets/coco_detection.yml`:
```
metric: COCO
num_classes: 10 # originally 80 classes
```
After the configuration changes are completed, the COCO pre-trained weights can still be loaded: PaddleDetection automatically loads the weights whose shapes match and ignores those that do not, so no other modifications are needed.
Simplified Chinese | [English](./pphuman_mtmct_en.md)
# Customized Multi-Target Multi-Camera Tracking
## Data Preparation
......
[Simplified Chinese](./pphuman_mtmct.md) | English
# Customized Multi-Target Multi-Camera Tracking Module of PP-Human
## Data Preparation
### Data Format
Multi-target multi-camera tracking (MTMCT) is achieved with the pedestrian ReID technique. It is trained as a multi-class classification model, and the features before the classification softmax head are used as the retrieval feature vectors.
Therefore its data format is the same as that of a multi-class classification task: each pedestrian is assigned an exclusive id, which differs between pedestrians, while the same pedestrian has the same id across different images.
For example, images 0001.jpg and 0003.jpg are the same person, while 0002.jpg and 0004.jpg are different pedestrians. The labeled ids are:
```
0001.jpg 00001
0002.jpg 00002
0003.jpg 00001
0004.jpg 00003
...
```
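To make the retrieval idea above concrete, here is a hedged sketch of matching by cosine similarity over the pre-softmax features; the helper names are assumptions:
```
import numpy as np

def cosine_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_id(query_feat, gallery):
    """gallery: list of (person_id, feature); crops of the same pedestrian
    across cameras should score highest."""
    return max(gallery, key=lambda item: cosine_sim(query_feat, item[1]))[0]
```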
### Data Annotation
After understanding the meaning of the annotation format above, we can work on the data annotation. The essence of data annotation is that each single-person image gets an annotation item carrying the id assigned to that pedestrian.
For example:
For an original picture:
1) Use bounding boxes to annotate the position of each person in the picture.
2) Each bounding box (corresponding to each person) contains an int id attribute. For example, the person in 0001.jpg in the above example corresponds to id: 1.
After the annotation is completed, the detection boxes are used to crop each person into a single-person image, which establishes a correspondence with the id attribute annotation. You can also crop the single-person images first and then annotate them; the result is the same.
## Model Training
Once the data is annotated, it can be used for model training to complete the optimization of the customized model.
There are two main steps to implement: 1) organize the data and annotated data into a training format. 2) modify the configuration file to start training.
### Training data format
The training data consists of the images used for training and a training list bounding_box_train.txt, whose location is specified in the training configuration, with the following example layout:
```
REID/
|-- data                     # training image folder
|   |-- 00001.jpg
|   |-- 00002.jpg
|   `-- 0000x.jpg
`-- bounding_box_train.txt   # list of training data
```
The bounding_box_train.txt file contains the names of all training images (file paths relative to the root path) plus one id annotation value.
Each line represents one person's image and id annotation result, in the following format:
```
0001.jpg 00001
0002.jpg 00002
0003.jpg 00001
0004.jpg 00003
```
Note: The images are separated from the annotated values by a Tab[\t] symbol. This format must be correct, otherwise, the parsing will fail.
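A short sketch generating bounding_box_train.txt in this Tab-separated format from an image-to-id mapping (the mapping here is an assumption):
```
id_map = {"0001.jpg": 1, "0002.jpg": 2, "0003.jpg": 1, "0004.jpg": 3}

with open("bounding_box_train.txt", "w") as f:
    for name, pid in sorted(id_map.items()):
        f.write(f"{name}\t{pid:05d}\n")   # e.g. "0001.jpg\t00001"
```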
### Modify the configuration to start training
First, execute the following command to download the training code (for environment issues, please refer to [Install_PaddleClas](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.4/docs/en/installation/install_paddleclas_en.md)):
```
git clone https://github.com/PaddlePaddle/PaddleClas
```
You need to change the following configuration items in the configuration file [softmax_triplet_with_center.yaml](https://github.com/PaddlePaddle/PaddleClas/blob/develop/ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml):
```
Head:
  name: "FC"
  embedding_size: *feat_dim
  class_num: &class_num 751 # Total number of pedestrian ids

DataLoader:
  Train:
    dataset:
      name: "Market1501"
      image_root: "./dataset/" # Training image root path
      cls_label_path: "bounding_box_train" # Training file list

  Eval:
    Query:
      dataset:
        name: "Market1501"
        image_root: "./dataset/" # Evaluation image root path
        cls_label_path: "query" # List of evaluation files
```
Note: the image_root path combined with the relative image path in bounding_box_train.txt forms the full path where the image is stored.
Then run the following command to start the training.
```
#Multi-card training
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/train.py \
-c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml
#Single card training
python3 tools/train.py \
-c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml
```
After the training is completed, you may run the following commands for performance evaluation:
```
#Multi-card evaluation
export CUDA_VISIBLE_DEVICES=0,1,2,3
python3 -m paddle.distributed.launch \
--gpus="0,1,2,3" \
tools/eval.py \
-c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \
-o Global.pretrained_model=./output/strong_baseline/best_model
#Single card evaluation
python3 tools/eval.py \
-c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \
-o Global.pretrained_model=./output/strong_baseline/best_model
```
### Model Export
Use the following command to export the trained model as an inference deployment model.
```
python3 tools/export_model.py \
-c ./ppcls/configs/reid/strong_baseline/softmax_triplet_with_center.yaml \
-o Global.pretrained_model=./output/strong_baseline/best_model \
-o Global.save_inference_dir=deploy/models/strong_baseline_inference
```
After exporting the model, download the [infer_cfg.yml](https://bj.bcebos.com/v1/paddledet/models/pipeline/REID/infer_cfg.yml) file into the newly exported model folder `strong_baseline_inference`.
Change the model path `model_dir` in the PP-Human configuration file `infer_cfg_pphuman.yml` and set `enable: True`:
```
REID:
model_dir: [YOUR_DEPLOY_MODEL_DIR]/strong_baseline_inference/
enable: True
```
Now, the model is ready.