[Cherry-pick] add en-docs for keypoint (#4796)

* add en-docs for keypoint, test=document_fix * fix link to relative path

3167ecb6 · JYChen · GitHub · 576accc1 · 3167ecb6 · 3167ecb6
7 changed files
--- a/README_en.md
+++ b/README_en.md
@@ -5,7 +5,7 @@ English | [简体中文](README_cn.md)
 - 2021.11.03: Release [release/2.3](https://github.com/PaddlePaddle/Paddleetection/tree/release/2.3) version. Release mobile object detection model ⚡[PP-PicoDet](configs/picodet), mobile keypoint detection model ⚡[PP-TinyPose](configs/keypoint/tiny_pose)，Real-time tracking system [PP-Tracking](deploy/pptracking). Release object detection models, including [Swin-Transformer](configs/faster_rcnn), [TOOD](configs/tood), [GFL](configs/gfl), release [Sniper](configs/sniper) tiny object detection models and optimized [PP-YOLO-EB](configs/ppyolo) model for EdgeBoard. Release mobile keypoint detection model [Lite HRNet](configs/keypoint).
 - 2021.08.10: Release [release/2.2](https://github.com/PaddlePaddle/Paddleetection/tree/release/2.2) version. Release Transformer object detection models, including [DETR](configs/detr), [Deformable DETR](configs/deformable_detr), [Sparse RCNN](configs/sparse_rcnn). Release [keypoint detection](configs/keypoint) models, including DarkHRNet and model trained on MPII dataset. Release [head-tracking](configs/mot/headtracking21) and [vehicle-tracking](configs/mot/vehicle) multi-object tracking models.
-- 2021.05.20: Release [release/2.1]((https://github.com/PaddlePaddle/Paddleetection/tree/release/2.1) version. Release [Keypoint Detection](configs/keypoint), including HigherHRNet and HRNet, [Multi-Object Tracking](configs/mot), including DeepSORT，JDE and FairMOT. Release model compression for PPYOLO series models.Update documents such as [EXPORT ONNX MODEL](deploy/EXPORT_ONNX_MODEL.md).
+- 2021.05.20: Release [release/2.1](https://github.com/PaddlePaddle/Paddleetection/tree/release/2.1) version. Release [Keypoint Detection](configs/keypoint), including HigherHRNet and HRNet, [Multi-Object Tracking](configs/mot), including DeepSORT，JDE and FairMOT. Release model compression for PPYOLO series models.Update documents such as [EXPORT ONNX MODEL](deploy/EXPORT_ONNX_MODEL.md).
 # Introduction

--- a/configs/keypoint/README.md
+++ b/configs/keypoint/README.md
+简体中文 | [English](README_en.md)
 # KeyPoint模型系列

--- a/configs/keypoint/README_en.md
+++ b/configs/keypoint/README_en.md
+[简体中文](README.md) | English
+# KeyPoint Detection Models
+## Introduction
+-    The keypoint detection module in PaddleDetection closely follows state-of-the-art algorithms and includes both Top-Down and Bottom-Up methods to meet different user needs.
+<div align="center">
+  <img src="./football_keypoint.gif" width='800'/>
+</div>
+####   Model Zoo
+COCO Dataset
+| Model              | Input Size | AP(coco val) |                           Model Download                           | Config File                                                    |
+| :---------------- | -------- | :----------: | :----------------------------------------------------------: | ----------------------------------------------------------- |
+| HigherHRNet-w32       | 512      |     67.1     | [higherhrnet_hrnet_w32_512.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_512.pdparams) | [config](./higherhrnet/higherhrnet_hrnet_w32_512.yml)       |
+| HigherHRNet-w32       | 640      |     68.3     | [higherhrnet_hrnet_w32_640.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_640.pdparams) | [config](./higherhrnet/higherhrnet_hrnet_w32_640.yml)       |
+| HigherHRNet-w32+SWAHR | 512      |     68.9     | [higherhrnet_hrnet_w32_512_swahr.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/higherhrnet_hrnet_w32_512_swahr.pdparams) | [config](./higherhrnet/higherhrnet_hrnet_w32_512_swahr.yml) |
+| HRNet-w32             | 256x192  |     76.9     | [hrnet_w32_256x192.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x192.pdparams) | [config](./hrnet/hrnet_w32_256x192.yml)                     |
+| HRNet-w32             | 384x288  |     77.8     | [hrnet_w32_384x288.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_384x288.pdparams) | [config](./hrnet/hrnet_w32_384x288.yml)                     |
+| HRNet-w32+DarkPose             | 256x192  |     78.0     | [dark_hrnet_w32_256x192.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/dark_hrnet_w32_256x192.pdparams) | [config](./hrnet/dark_hrnet_w32_256x192.yml)                     |
+| HRNet-w32+DarkPose             | 384x288  |     78.3     | [dark_hrnet_w32_384x288.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/dark_hrnet_w32_384x288.pdparams) | [config](./hrnet/dark_hrnet_w32_384x288.yml)                     |
+| WiderNaiveHRNet-18         | 256x192  |     67.6(+DARK 68.4)     | [wider_naive_hrnet_18_256x192_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/wider_naive_hrnet_18_256x192_coco.pdparams) | [config](./lite_hrnet/wider_naive_hrnet_18_256x192_coco.yml)     |
+| LiteHRNet-18                   | 256x192  |     66.5     | [lite_hrnet_18_256x192_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_18_256x192_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_18_256x192_coco.yml)     |
+| LiteHRNet-18                   | 384x288  |     69.7     | [lite_hrnet_18_384x288_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_18_384x288_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_18_384x288_coco.yml)     |
+| LiteHRNet-30                   | 256x192  |     69.4     | [lite_hrnet_30_256x192_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_30_256x192_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_30_256x192_coco.yml)     |
+| LiteHRNet-30                   | 384x288  |     72.5     | [lite_hrnet_30_384x288_coco.pdparams](https://bj.bcebos.com/v1/paddledet/models/keypoint/lite_hrnet_30_384x288_coco.pdparams) | [config](./lite_hrnet/lite_hrnet_30_384x288_coco.yml)     |
+Note: The AP results of Top-Down models are based on ground-truth bounding boxes.
+MPII Dataset
+| Model  | Input Size | PCKh(Mean) | PCKh(Mean@0.1) |                           Model Download                           | Config File                                     |
+| :---- | -------- | :--------: | :------------: | :----------------------------------------------------------: | -------------------------------------------- |
+| HRNet-w32 | 256x256  |    90.6    |      38.5      | [hrnet_w32_256x256_mpii.pdparams](https://paddledet.bj.bcebos.com/models/keypoint/hrnet_w32_256x256_mpii.pdparams) | [config](./hrnet/hrnet_w32_256x256_mpii.yml) |
+We also release [PP-TinyPose](./tiny_pose/README_en.md), a real-time keypoint detection model optimized for mobile devices. Feel free to try it out.
+## Getting Started
+### 1. Environment Installation
+    Please refer to the [PaddleDetection Installation Guide](../../docs/tutorials/INSTALL.md) to install PaddlePaddle and PaddleDetection correctly.
+### 2. Dataset Preparation
+    Currently, KeyPoint Detection Models support [COCO](https://cocodataset.org/#keypoints-2017) and [MPII](http://human-pose.mpi-inf.mpg.de/#overview). Please refer to [Keypoint Dataset Preparation](../../docs/tutorials/PrepareKeypointDataSet_en.md) to prepare the dataset.
+   For a description of the config files, please refer to the [Keypoint Config Guide](../../docs/tutorials/KeyPointConfigGuide_en.md).
+  - Note that when testing a Top-Down model with detected bounding boxes, you need a `bbox.json` file produced by a detection model. You can download the detection results for COCO val2017 [(Detector having human AP of 56.4 on COCO val2017 dataset)](https://paddledet.bj.bcebos.com/data/bbox.json) directly, put the file in the root path (`PaddleDetection/`), and set `use_gt_bbox: False` in the config file.
+### 3. Training and Testing
+    **Training on a single GPU:**
+```shell
+#COCO DataSet
+CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml
+#MPII DataSet
+CUDA_VISIBLE_DEVICES=0 python3 tools/train.py -c configs/keypoint/hrnet/hrnet_w32_256x256_mpii.yml
+```
+    **Training on multiple GPUs:**
+```shell
+#COCO DataSet
+CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch tools/train.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml
+#MPII DataSet
+CUDA_VISIBLE_DEVICES=0,1,2,3 python3 -m paddle.distributed.launch tools/train.py -c configs/keypoint/hrnet/hrnet_w32_256x256_mpii.yml
+```
+    **Evaluation**
+```shell
+#COCO DataSet
+CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml
+#MPII DataSet
+CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c configs/keypoint/hrnet/hrnet_w32_256x256_mpii.yml
+#If you only need the prediction result, you can set --save_prediction_only. Then the result will be saved at output/keypoints_results.json by default.
+CUDA_VISIBLE_DEVICES=0 python3 tools/eval.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml --save_prediction_only
+```
+    **Inference**
+    Note: Top-Down models only support inference on a cropped image containing a single person. To run inference on an image with several people, see "joint inference by detection and keypoint", or choose a Bottom-Up model.
+```shell
+CUDA_VISIBLE_DEVICES=0 python3 tools/infer.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml -o weights=./output/higherhrnet_hrnet_w32_512/model_final.pdparams --infer_dir=../images/ --draw_threshold=0.5 --save_txt=True
+```
+    **Deploy Inference**
+```shell
+#export models
+python tools/export_model.py -c configs/keypoint/higherhrnet/higherhrnet_hrnet_w32_512.yml -o weights=output/higherhrnet_hrnet_w32_512/model_final.pdparams
+#deploy inference
+#keypoint inference for a single model of top-down/bottom-up method. In this mode, a top-down model only supports inference on a cropped image containing a single person.
+python deploy/python/keypoint_infer.py --model_dir=output_inference/higherhrnet_hrnet_w32_512/ --image_file=./demo/000000014439_640x640.jpg --device=gpu --threshold=0.5
+python deploy/python/keypoint_infer.py --model_dir=output_inference/hrnet_w32_384x288/ --image_file=./demo/hrnet_demo.jpg --device=gpu --threshold=0.5
+#joint inference by detection and keypoint for top-down models.
+python deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/ppyolo_r50vd_dcn_2x_coco/ --keypoint_model_dir=output_inference/hrnet_w32_384x288/ --video_file=../video/xxx.mp4  --device=gpu
+```
+    **Joint inference with Multi-Object Tracking model FairMOT**
+```shell
+#export FairMOT model
+python tools/export_model.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams
+#joint inference with Multi-Object Tracking model FairMOT
+python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inference/fairmot_dla34_30e_1088x608/ --keypoint_model_dir=output_inference/higherhrnet_hrnet_w32_512/ --video_file={your video name}.mp4 --device=GPU
+```
+**Note:**
+ To export MOT model, please refer to [Here](../../configs/mot/README_en.md).
+## Reference
+```
+@inproceedings{cheng2020bottom,
+  title={HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation},
+  author={Bowen Cheng and Bin Xiao and Jingdong Wang and Honghui Shi and Thomas S. Huang and Lei Zhang},
+  booktitle={CVPR},
+  year={2020}
+}
+@inproceedings{SunXLW19,
+  title={Deep High-Resolution Representation Learning for Human Pose Estimation},
+  author={Ke Sun and Bin Xiao and Dong Liu and Jingdong Wang},
+  booktitle={CVPR},
+  year={2019}
+}
+@article{wang2019deep,
+  title={Deep High-Resolution Representation Learning for Visual Recognition},
+  author={Wang, Jingdong and Sun, Ke and Cheng, Tianheng and Jiang, Borui and Deng, Chaorui and Zhao, Yang and Liu, Dong and Mu, Yadong and Tan, Mingkui and Wang, Xinggang and Liu, Wenyu and Xiao, Bin},
+  journal={TPAMI},
+  year={2019}
+}
+@InProceedings{Zhang_2020_CVPR,
+    author = {Zhang, Feng and Zhu, Xiatian and Dai, Hanbin and Ye, Mao and Zhu, Ce},
+    title = {Distribution-Aware Coordinate Representation for Human Pose Estimation},
+    booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
+    month = {June},
+    year = {2020}
+}
+@inproceedings{Yulitehrnet21,
+  title={Lite-HRNet: A Lightweight High-Resolution Network},
+  author={Yu, Changqian and Xiao, Bin and Gao, Changxin and Yuan, Lu and Zhang, Lei and Sang, Nong and Wang, Jingdong},
+  booktitle={CVPR},
+  year={2021}
+}
+```
--- a/configs/keypoint/tiny_pose/README.md
+++ b/configs/keypoint/tiny_pose/README.md
+简体中文 | [English](README_en.md)
 # PP-TinyPose
 <div align="center">

--- a/configs/keypoint/tiny_pose/README_en.md
+++ b/configs/keypoint/tiny_pose/README_en.md
+[简体中文](README.md) | English
+# PP-TinyPose
+<div align="center">
+  <img src="../../../docs/images/tinypose_demo.png"/>
+  <center>Image Source: COCO2017</center>
+</div>
+## Introduction
+PP-TinyPose is a real-time keypoint detection model optimized by PaddleDetection for mobile devices, which can smoothly run multi-person pose estimation tasks on mobile devices. Building on the self-developed lightweight detection model [PicoDet](../../picodet/README.md), we also provide a lightweight pedestrian detection model. PP-TinyPose has the following dependency requirements:
+- [PaddlePaddle](https://github.com/PaddlePaddle/Paddle)>=2.2
+If you want to deploy it on mobile devices, you also need:
+- [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite)>=2.10
+<div align="center">
+  <img src="../../../docs/images/tinypose_pipeline.png" width='800'/>
+</div>
+## Deployment Case
+- [Android Fitness Demo](https://github.com/zhiboniu/pose_demo_android) based on PP-TinyPose, which efficiently implements fitness calibration and counting.
+<div align="center">
+  <img src="../../../docs/images/fitness_demo.gif" width='636'/>
+</div>
+- Scan the QR code to try it out.
+<div align="center">
+  <img src="../../../docs/images/tinypose_app.png" width='220'/>
+</div>
+## Model Zoo
+### Keypoint Detection Model
+| Model  | Input Size | AP (COCO Val) | Inference Time for Single Person (FP32) | Inference Time for Single Person (FP16) | Config | Model Weights | Deployment Model | Paddle-Lite Model (FP32) | Paddle-Lite Model (FP16) |
+| :------------------------ | :-------:  | :------: | :------: |:---: | :---: | :---: | :---: | :---: | :---: |
+| PP-TinyPose | 128*96 | 58.1 | 4.57ms | 3.27ms | [Config](./tinypose_128x96.yml) |[Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.tar) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96.nb) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_128x96_fp16.nb) |
+| PP-TinyPose | 256*192 | 68.8 | 14.07ms | 8.33ms | [Config](./tinypose_256x192.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.tar) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192.nb) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/tinypose_256x192_fp16.nb) |
+### Pedestrian Detection Model
+| Model  | Input Size | mAP (COCO Val) | Average Inference Time (FP32) | Average Inference Time (FP16) | Config | Model Weights | Deployment Model | Paddle-Lite Model (FP32) | Paddle-Lite Model (FP16) |
+| :------------------------ | :-------:  | :------: | :------: | :---: | :---: | :---: | :---: | :---: | :---: |
+| PicoDet-S-Pedestrian | 192*192 | 29.0 | 4.30ms |  2.37ms | [Config](../../picodet/application/pedestrian_detection/picodet_s_192_pedestrian.yml) |[Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian.tar) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian.nb) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_192_pedestrian_fp16.nb) |
+| PicoDet-S-Pedestrian | 320*320 | 38.5 | 10.26ms |  6.30ms | [Config](../../picodet/application/pedestrian_detection/picodet_s_320_pedestrian.yml) | [Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian.pdparams) | [Deployment Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian.tar) | [Lite Model](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian.nb) | [Lite Model(FP16)](https://bj.bcebos.com/v1/paddledet/models/keypoint/picodet_s_320_pedestrian_fp16.nb) |
+**Tips**
+- The keypoint detection model and pedestrian detection model are both trained on `COCO train2017` and `AI Challenger trainset`. The keypoint detection model is evaluated on `COCO person keypoints val2017`, and the pedestrian detection model is evaluated on `COCO instances val2017`.
+- The AP results of keypoint detection models are based on bounding boxes in GroundTruth.
+- Both keypoint detection model and pedestrian detection model are trained in a 4-GPU environment. In practice, if number of GPUs or batch size need to be changed according to the training environment, you should refer to [FAQ](../../../docs/tutorials/FAQ/README.md) to adjust the learning rate.
+- The inference time is tested on a Qualcomm Snapdragon 865 with 4 threads on ARMv8.
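
The tip about adjusting the learning rate when the GPU count or batch size changes can be illustrated with a minimal sketch. It assumes the commonly used linear scaling rule (learning rate proportional to the total batch size); the linked FAQ remains the authority on what PaddleDetection recommends. The function name and all numbers are hypothetical and are not taken from the TinyPose configs.

```python
def scale_learning_rate(base_lr, base_gpus, base_batch_per_gpu, gpus, batch_per_gpu):
    """Linear scaling rule (an assumption here): keep lr / total_batch_size constant."""
    base_total = base_gpus * base_batch_per_gpu
    new_total = gpus * batch_per_gpu
    return base_lr * new_total / base_total


# Hypothetical example: a config tuned for 4 GPUs x 64 images, now run on 1 GPU x 64 images.
new_lr = scale_learning_rate(base_lr=0.002, base_gpus=4, base_batch_per_gpu=64,
                             gpus=1, batch_per_gpu=64)
print(new_lr)
```

The scaled value would then replace the base learning rate in the training config.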
+### Pipeline Performance
+| Model for Single-Pose | AP (COCO Val Single-Person) | Time for Single Person(FP32) |  Time for Single Person(FP16) |
+| :------------------------ | :------: | :---: | :---: |
+| PicoDet-S-Pedestrian-192\*192 + PP-TinyPose-128\*96 | 51.8 | 11.72 ms| 8.18 ms |
+| Other opensource model-192\*192 | 22.3 | 12.0 ms| - |
+| Model for Multi-Pose | AP (COCO Val Multi-Persons) | Time for Six Persons(FP32) | Time for Six Persons(FP16)|
+| :------------------------ | :-------: | :---: | :---: |
+| PicoDet-S-Pedestrian-320\*320 + PP-TinyPose-128\*96 | 50.3 | 44.0 ms| 32.57 ms |
+| Other opensource model-256\*256 | 39.4 | 51.0 ms| - |
+**Tips**
+- The AP results of keypoint detection models are based on bounding boxes detected by the corresponding detection model.
+- In accuracy evaluation, no flip is used, and the bounding box threshold is set to 0.5.
+- For fairness, in the multi-person test, we remove images with more than 6 people.
+- The inference time is tested on a Qualcomm Snapdragon 865 with 4 threads on ARMv8, FP32.
+- Pipeline time includes the time for preprocessing, inference, and postprocessing.
+- For the deployment and testing of the other open-source model, please refer to [Here](https://github.com/zhiboniu/MoveNet-PaddleLite).
+## Model Training
+In addition to `COCO`, the trainset for keypoint detection model and pedestrian detection model also includes [AI Challenger](https://arxiv.org/abs/1711.06475). Keypoints of each dataset are defined as follows:
+```
+COCO keypoint Description:
+    0: "Nose",
+    1: "Left Eye",
+    2: "Right Eye",
+    3: "Left Ear",
+    4: "Right Ear",
+    5: "Left Shoulder",
+    6: "Right Shoulder",
+    7: "Left Elbow",
+    8: "Right Elbow",
+    9: "Left Wrist",
+    10: "Right Wrist",
+    11: "Left Hip",
+    12: "Right Hip",
+    13: "Left Knee",
+    14: "Right Knee",
+    15: "Left Ankle",
+    16: "Right Ankle"
+AI Challenger Description:
+    0: "Right Shoulder",
+    1: "Right Elbow",
+    2: "Right Wrist",
+    3: "Left Shoulder",
+    4: "Left Elbow",
+    5: "Left Wrist",
+    6: "Right Hip",
+    7: "Right Knee",
+    8: "Right Ankle",
+    9: "Left Hip",
+    10: "Left Knee",
+    11: "Left Ankle",
+    12: "Head top",
+    13: "Neck"
+```
+Since the annotation formats of these two datasets are different, we aligned their annotations to the `COCO` format. You can download the [Training List](https://bj.bcebos.com/v1/paddledet/data/keypoint/aic_coco_train_cocoformat.json) and put it in `dataset/`. To align the two datasets, we mainly did the following:
+- Align the indexes of the `AI Challenger` keypoints with `COCO` and unify the flags that indicate whether a keypoint is labeled/visible.
+- Discard the keypoints that exist only in `AI Challenger`. Keypoints that exist in `COCO` but not in `AI Challenger` are set to not labeled.
+- Rearrange `image_id` and `annotation id`.
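
The ID rearrangement in the last step can be sketched as follows. This is an illustrative helper, not the script used to build the released training list: `merge_coco` is a hypothetical name, and it assumes both inputs are already COCO-format dictionaries with `images` and `annotations` lists.

```python
import copy


def merge_coco(base, extra):
    """Append `extra` to `base`, renumbering IDs so they cannot collide."""
    merged = copy.deepcopy(base)
    # Start numbering the extra entries after the largest ID already in use.
    img_start = max((img["id"] for img in base["images"]), default=0) + 1
    ann_start = max((ann["id"] for ann in base["annotations"]), default=0) + 1

    id_map = {}  # old image id in `extra` -> new image id in the merged file
    for n, img in enumerate(extra["images"]):
        new_img = dict(img)
        new_img["id"] = img_start + n
        id_map[img["id"]] = new_img["id"]
        merged["images"].append(new_img)

    for n, ann in enumerate(extra["annotations"]):
        new_ann = dict(ann)
        new_ann["id"] = ann_start + n
        new_ann["image_id"] = id_map[ann["image_id"]]
        merged["annotations"].append(new_ann)
    return merged


# Tiny hypothetical inputs whose IDs collide before merging.
coco = {"images": [{"id": 1, "file_name": "coco/a.jpg"}],
        "annotations": [{"id": 1, "image_id": 1}]}
aic = {"images": [{"id": 1, "file_name": "aic/b.jpg"}],
       "annotations": [{"id": 1, "image_id": 1}]}
merged = merge_coco(coco, aic)
```

After merging, every image and annotation has a unique ID, and each `AI Challenger` annotation still points at its own image.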
+Training with merged annotation file converted to `COCO` format:
+```bash
+# keypoint detection model
+python3 -m paddle.distributed.launch tools/train.py -c configs/keypoint/tiny_pose/tinypose_128x96.yml
+# pedestrian detection model
+python3 -m paddle.distributed.launch tools/train.py -c configs/picodet/application/pedestrian_detection/picodet_s_320_pedestrian.yml
+```
+## Model Deployment
+### Deploy Inference
+1. Export the trained model through the following command:
+```bash
+python3 tools/export_model.py -c configs/picodet/application/pedestrian_detection/picodet_s_192_pedestrian.yml --output_dir=output_inference -o weights=output/picodet_s_192_pedestrian/model_final
+python3 tools/export_model.py -c configs/keypoint/tiny_pose/tinypose_128x96.yml --output_dir=output_inference -o weights=output/tinypose_128x96/model_final
+```
+The exported model looks like this:
+```
+picodet_s_192_pedestrian
+├── infer_cfg.yml
+├── model.pdiparams
+├── model.pdiparams.info
+└── model.pdmodel
+```
+You can also download the `Deployment Model` archives of the pedestrian detection model and keypoint detection model directly from the `Model Zoo`, then unzip them.
+2. Python joint inference by detection and keypoint
+```bash
+# inference for one image
+python3 deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/picodet_s_320_pedestrian --keypoint_model_dir=output_inference/tinypose_128x96 --image_file={your image file} --device=GPU
+# inference for several images
+python3 deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/picodet_s_320_pedestrian --keypoint_model_dir=output_inference/tinypose_128x96 --image_dir={dir of image file} --device=GPU
+# inference for a video
+python3 deploy/python/det_keypoint_unite_infer.py --det_model_dir=output_inference/picodet_s_320_pedestrian --keypoint_model_dir=output_inference/tinypose_128x96 --video_file={your video file} --device=GPU
+```
+3. C++ joint inference by detection and keypoint
+- First, please refer to [C++ Deploy Inference](https://github.com/PaddlePaddle/PaddleDetection/tree/release/2.3/deploy/cpp), prepare the corresponding `paddle_inference` library and related dependencies according to your environment.
+- We provide a [Compile Script](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.3/deploy/cpp/scripts/build.sh). Fill in the relevant environment variables in this script and execute it to compile the code and obtain an executable file. Please ensure `WITH_KEYPOINT=ON` during this process.
+- After compilation, you can do inference like:
+```bash
+# inference for one image
+./build/main --model_dir=output_inference/picodet_s_320_pedestrian --model_dir_keypoint=output_inference/tinypose_128x96 --image_file={your image file} --device=GPU
+# inference for several images
+./build/main --model_dir=output_inference/picodet_s_320_pedestrian --model_dir_keypoint=output_inference/tinypose_128x96 --image_dir={dir of image file} --device=GPU
+# inference for a video
+./build/main --model_dir=output_inference/picodet_s_320_pedestrian --model_dir_keypoint=output_inference/tinypose_128x96 --video_file={your video file} --device=GPU
+```
+### Deployment on Mobile Devices
+#### Deploy directly using models we provide
+1. Download the `Lite Model` files from the `Model Zoo` directly to get the `.nb` format files of the pedestrian detection model and keypoint detection model.
+2. Prepare the environment for Paddle-Lite. You can obtain precompiled libraries from [PaddleLite Precompiled Libraries](https://paddle-lite.readthedocs.io/zh/latest/quick_start/release_lib.html). If FP16 is needed, you should download [Precompiled Libraries for FP16](https://github.com/PaddlePaddle/Paddle-Lite/releases/download/v2.10-rc/inference_lite_lib.android.armv8_clang_c++_static_with_extra_with_cv_with_fp16.tiny_publish_427e46.zip).
+3. Compile the code to run the models. Details can be found in [Paddle-Lite Deployment on Mobile Devices](../../../deploy/lite/README.md).
+#### Deploy self-trained models on Mobile Devices
+If you want to deploy self-trained models, you can refer to the following steps:
+1. Export the trained model
+```bash
+python3 tools/export_model.py -c configs/picodet/application/pedestrian_detection/picodet_s_192_pedestrian.yml --output_dir=output_inference -o weights=output/picodet_s_192_pedestrian/model_final TestReader.fuse_normalize=true
+python3 tools/export_model.py -c configs/keypoint/tiny_pose/tinypose_128x96.yml --output_dir=output_inference -o weights=output/tinypose_128x96/model_final TestReader.fuse_normalize=true
+```
+2. Convert to Lite Model (relies on [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite))
+- Install Paddle-Lite:
+```bash
+pip install paddlelite
+```
+- Run the following commands to obtain `.nb` format models of Paddle-Lite:
+```
+# 1. Convert pedestrian detection model
+# FP32
+paddle_lite_opt --model_dir=inference_model/picodet_s_192_pedestrian --valid_targets=arm --optimize_out=picodet_s_192_pedestrian_fp32
+# FP16
+paddle_lite_opt --model_dir=inference_model/picodet_s_192_pedestrian --valid_targets=arm --optimize_out=picodet_s_192_pedestrian_fp16 --enable_fp16=true
+# 2. keypoint detection model
+# FP32
+paddle_lite_opt --model_dir=inference_model/tinypose_128x96 --valid_targets=arm --optimize_out=tinypose_128x96_fp32
+# FP16
+paddle_lite_opt --model_dir=inference_model/tinypose_128x96 --valid_targets=arm --optimize_out=tinypose_128x96_fp16 --enable_fp16=true
+```
+3. Compile the code to run the models. Details can be found in [Paddle-Lite Deployment on Mobile Devices](../../../deploy/lite/README.md).
+We provide [Example Code](../../../deploy/lite/) including data preprocessing, inference, and postprocessing. You can modify the code according to your actual needs.
+**Note:**
+- Add `TestReader.fuse_normalize=true` during the step of exporting model. The Normalize operation for the image will be executed in the model, which can achieve acceleration.
+- With FP16, we can get a faster inference speed. If you want to deploy the FP16 model, in addition to the model conversion step, you also need to compile the Paddle-Lite prediction library that supports FP16. The detail is in [Paddle Lite Deployment on ARM CPU](https://paddle-lite.readthedocs.io/zh/latest/demo_guides/arm_cpu.html).
+## Optimization Strategies
+TinyPose adopts the following strategies to balance the speed and accuracy of the model:
+- Lightweight backbone network for pose estimation, [wider naive Lite-HRNet](https://arxiv.org/abs/2104.06403).
+- Smaller input size.
+- Distribution-Aware coordinate Representation of Keypoints ([DARK](https://arxiv.org/abs/1910.06278)), which can improve the accuracy of the model under the low-resolution heatmap.
+- Unbiased Data Processing ([UDP](https://arxiv.org/abs/1911.07524)).
+- Augmentation by Information Dropping ([AID](https://arxiv.org/abs/2008.07139v2)).
+- FP16 inference.
--- a/docs/tutorials/PrepareKeypointDataSet_cn.md
+++ b/docs/tutorials/PrepareKeypointDataSet_cn.md
+简体中文 | [English](PrepareKeypointDataSet_en.md)
 # 如何准备关键点数据集
 ## 目录
 - [COCO数据集](#COCO数据集)
@@ -11,7 +13,7 @@
 ### COCO数据集（KeyPoint）说明
 在COCO中，关键点序号与部位的对应关系为：
 ```
-COCO keypoint indexes::
+COCO keypoint indexes:
        0: 'nose',
        1: 'left_eye',
        2: 'right_eye',
@@ -56,7 +58,7 @@ mpii
 ### MPII数据集的说明
 在MPII中，关键点序号与部位的对应关系为：
 ```
-MPII keypoint indexes::
+MPII keypoint indexes:
        0: 'right_ankle',
        1: 'right_knee',
        2: 'right_hip',

--- a/docs/tutorials/PrepareKeypointDataSet_en.md
+++ b/docs/tutorials/PrepareKeypointDataSet_en.md
+[简体中文](PrepareKeypointDataSet_cn.md) | English
+# How to Prepare a Keypoint Dataset
+## Table of Contents
+- [COCO](#COCO)
+- [MPII](#MPII)
+- [Training for other dataset](#Training_for_other_dataset)
+## COCO
+### Preparation for COCO dataset
+We provide a one-click script to automatically complete the download and preparation of the COCO2017 dataset. Please refer to [COCO Download](https://github.com/PaddlePaddle/PaddleDetection/blob/f0a30f3ba6095ebfdc8fffb6d02766406afc438a/docs/tutorials/PrepareDataSet.md#COCO%E6%95%B0%E6%8D%AE).
+### Description for COCO dataset (Keypoint)
+In COCO, the keypoint indexes and their corresponding names are:
+```
+COCO keypoint indexes:
+        0: 'nose',
+        1: 'left_eye',
+        2: 'right_eye',
+        3: 'left_ear',
+        4: 'right_ear',
+        5: 'left_shoulder',
+        6: 'right_shoulder',
+        7: 'left_elbow',
+        8: 'right_elbow',
+        9: 'left_wrist',
+        10: 'right_wrist',
+        11: 'left_hip',
+        12: 'right_hip',
+        13: 'left_knee',
+        14: 'right_knee',
+        15: 'left_ankle',
+        16: 'right_ankle'
+```
+Unlike the detection task, the annotation files for the keypoint task are `person_keypoints_train2017.json` and `person_keypoints_val2017.json`. In these two JSON files, the fields `info`, `licenses` and `images` are the same as in the detection task. However, `annotations` and `categories` are different.
+In `categories`, in addition to the category, there are also the names of the keypoints and the connectivity among them.
+In `annotations`, the ID and image of each instance are annotated, as well as segmentation information and keypoint information. Among them, the fields related to keypoints are:
+- `keypoints`: `[x1,y1,v1 ...]`, a `List` of length 17*3=51. Each `(x, y, v)` triplet gives the coordinates and visibility of one keypoint. `v=0, x=0, y=0` indicates that the keypoint is not labeled. `v=1` indicates that the keypoint is labeled but not visible. `v=2` indicates that the keypoint is labeled and visible.
+- `bbox`: `[x1,y1,w,h]`, the bounding box of this instance.
+- `num_keypoints`: the number of labeled keypoints of this instance.
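
The flat `keypoints` layout described above can be unpacked with a few lines of Python. This is a minimal sketch; the helper names and the sample values are hypothetical and only serve to show how the `(x, y, v)` triplets and `num_keypoints` relate.

```python
from typing import List, Tuple


def split_keypoints(flat: List[float]) -> List[Tuple[float, float, int]]:
    """Group a flat COCO `keypoints` list into (x, y, v) triplets."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoints length must be a multiple of 3")
    return [(flat[i], flat[i + 1], int(flat[i + 2])) for i in range(0, len(flat), 3)]


def count_labeled(flat: List[float]) -> int:
    """Number of keypoints with v > 0, which is what `num_keypoints` stores."""
    return sum(1 for _, _, v in split_keypoints(flat) if v > 0)


# Hypothetical instance: nose visible (v=2), left eye labeled but occluded (v=1),
# the remaining 15 keypoints not labeled (v=0, x=0, y=0).
example = [100.0, 50.0, 2, 95.0, 45.0, 1] + [0, 0, 0] * 15
triplets = split_keypoints(example)
print(len(triplets), count_labeled(example))
```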
+## MPII
+### Preparation for MPII dataset
+Please download the MPII dataset images and corresponding annotation files from [MPII Human Pose Dataset](http://human-pose.mpi-inf.mpg.de/#download), and save them to `dataset/mpii`.  You can use [mpii_annotations](https://download.openmmlab.com/mmpose/datasets/mpii_annotations.tar), which are already converted to `.json`.  The directory structure is as follows:
+```
+mpii
+|── annotations
+|   |── mpii_gt_val.mat
+|   |── mpii_test.json
+|   |── mpii_train.json
+|   |── mpii_trainval.json
+|   `── mpii_val.json
+`── images
+    |── 000001163.jpg
+    |── 000003072.jpg
+```
+### Description for MPII dataset
+In MPII, the keypoint indexes and their corresponding names are:
+```
+MPII keypoint indexes:
+        0: 'right_ankle',
+        1: 'right_knee',
+        2: 'right_hip',
+        3: 'left_hip',
+        4: 'left_knee',
+        5: 'left_ankle',
+        6: 'pelvis',
+        7: 'thorax',
+        8: 'upper_neck',
+        9: 'head_top',
+        10: 'right_wrist',
+        11: 'right_elbow',
+        12: 'right_shoulder',
+        13: 'left_shoulder',
+        14: 'left_elbow',
+        15: 'left_wrist',
+```
+The following parsed annotation illustrates the content of the annotation file. Each annotation represents one person instance:
+```
+{
+    'joints_vis': [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
+    'joints': [
+        [-1.0, -1.0],
+        [-1.0, -1.0],
+        [-1.0, -1.0],
+        [-1.0, -1.0],
+        [-1.0, -1.0],
+        [-1.0, -1.0],
+        [-1.0, -1.0],
+        [1232.0, 288.0],
+        [1236.1271, 311.7755],
+        [1181.8729, -0.77553],
+        [692.0, 464.0],
+        [902.0, 417.0],
+        [1059.0, 247.0],
+        [1405.0, 329.0],
+        [1498.0, 613.0],
+        [1303.0, 562.0]
+    ],
+    'image': '077096718.jpg',
+    'scale': 9.516749,
+    'center': [1257.0, 297.0]
+}
+```
+- `joints_vis`: indicates whether each of the 16 keypoints is labeled. If a value is 0, the corresponding coordinate is `[-1.0, -1.0]`.
+- `joints`: the coordinates of the 16 keypoints.
+- `image`: the image file that this instance belongs to.
+- `center`: the coordinates of the person instance center, used to locate the instance in the image.
+- `scale`: the scale of the instance, expressed relative to 200 px.
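
As a rough illustration of how these fields fit together, the sketch below derives an approximate person box from `center` and `scale` and lists the labeled joints. It assumes that `scale * 200` gives the person size in pixels and that a square box around `center` is an acceptable approximation; both helper names are hypothetical and this is not PaddleDetection's own data loader.

```python
def mpii_person_box(center, scale, pixel_std=200.0):
    """Approximate square box (x, y, w, h), assuming size = scale * pixel_std pixels."""
    side = scale * pixel_std
    return (center[0] - side / 2.0, center[1] - side / 2.0, side, side)


def labeled_joints(ann):
    """Return (index, [x, y]) for every joint whose `joints_vis` flag is 1."""
    return [(i, xy)
            for i, (xy, vis) in enumerate(zip(ann["joints"], ann["joints_vis"]))
            if vis == 1]


# Shortened hypothetical annotation with two joints instead of 16.
ann = {
    "joints_vis": [0, 1],
    "joints": [[-1.0, -1.0], [10.0, 20.0]],
    "center": [100.0, 50.0],
    "scale": 0.5,
}
box = mpii_person_box(ann["center"], ann["scale"])
visible = labeled_joints(ann)
```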
+## Training for other dataset
+Here, we take the `AI Challenger` dataset as an example to show how to align other datasets with `COCO` and add them to the training of keypoint models.
+In `AI Challenger`, the keypoint indexes and their corresponding names are:
+```
+AI Challenger Description:
+        0: 'Right Shoulder',
+        1: 'Right Elbow',
+        2: 'Right Wrist',
+        3: 'Left Shoulder',
+        4: 'Left Elbow',
+        5: 'Left Wrist',
+        6: 'Right Hip',
+        7: 'Right Knee',
+        8: 'Right Ankle',
+        9: 'Left Hip',
+        10: 'Left Knee',
+        11: 'Left Ankle',
+        12: 'Head top',
+        13: 'Neck'
+```
+1. Align the indexes of the `AI Challenger` keypoints with `COCO`. For example, the index of `Right Shoulder` should be adjusted from `0` to `6`, the index of `right_shoulder` in `COCO`.
+2. Unify the flags that indicate whether a keypoint is labeled/visible. For example, `labeled and visible` in `AI Challenger` needs to be adjusted from `1` to `2`.
+3. In this process, we discard the keypoints that exist only in this dataset (like `Neck`). For keypoints that exist in `COCO` but not in this dataset (like `left_eye`), we set `v=0, x=0, y=0` to indicate that these keypoints are not labeled.
+4. To avoid the problem of ID duplication in different datasets, the `image_id` and `annotation id` need to be rearranged.
+5. Rewrite the image path `file_name`, to make sure images can be accessed correctly.
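
Steps 1 to 3 can be sketched as a single conversion function. The index map below is derived by matching keypoint names between the `AI Challenger` and `COCO` listings in this document (so `Right Shoulder`, `AI Challenger` index 0, lands on `COCO` index 6, `right_shoulder`). The visibility map is an assumption: only the `1` to `2` change for labeled and visible is stated in the text, while treating `2` as labeled but not visible and `3` as not labeled is assumed here. The function name is hypothetical.

```python
# AI Challenger index -> COCO index, matched by keypoint name.
# Head top (12) and Neck (13) have no COCO counterpart and are dropped.
AIC_TO_COCO = {0: 6, 1: 8, 2: 10, 3: 5, 4: 7, 5: 9,
               6: 12, 7: 14, 8: 16, 9: 11, 10: 13, 11: 15}

# Assumed AI Challenger flags: 1 = labeled and visible, 2 = labeled but not visible,
# 3 = not labeled. COCO uses 2, 1 and 0 for the same three states.
AIC_VIS_TO_COCO = {1: 2, 2: 1, 3: 0}


def aic_to_coco_keypoints(aic_flat):
    """Convert a flat 14*3 AI Challenger keypoint list to a flat 17*3 COCO list."""
    coco = [0] * (17 * 3)  # keypoints missing in AI Challenger stay at v=0, x=0, y=0
    for aic_idx, coco_idx in AIC_TO_COCO.items():
        x, y, v = aic_flat[3 * aic_idx: 3 * aic_idx + 3]
        v_coco = AIC_VIS_TO_COCO.get(int(v), 0)
        if v_coco == 0:
            continue  # unlabeled keypoints keep the all-zero triplet
        coco[3 * coco_idx: 3 * coco_idx + 3] = [x, y, v_coco]
    return coco


# Hypothetical instance: only Right Shoulder and Neck are labeled and visible.
aic = [0, 0, 3] * 14
aic[0:3] = [120, 80, 1]    # Right Shoulder
aic[39:42] = [60, 10, 1]   # Neck, discarded by the conversion
coco_kpts = aic_to_coco_keypoints(aic)
num_keypoints = sum(1 for v in coco_kpts[2::3] if v > 0)
```

Steps 4 and 5 (renumbering IDs and rewriting `file_name`) operate on the whole annotation file rather than on a single instance, so they are not covered by this function.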
+We also provide an [annotation file](https://bj.bcebos.com/v1/paddledet/data/keypoint/aic_coco_train_cocoformat.json) combining `COCO` trainset and `AI Challenger` trainset.