add keypoint benchmark data (#4814)

* add keypoint benchmark data
* update benchmark-data and change default keypoint bs=8, to be same as C++ infer

14eedfa3 · JYChen · GitHub · 692d7329 · 14eedfa3 · 14eedfa3
6 changed files
--- a/configs/keypoint/KeypointBenchmark.md
+++ b/configs/keypoint/KeypointBenchmark.md
+# Keypoint Inference Benchmark
+## Benchmark on Server
+We benchmarked the models in different runtime environments. See the tables below for details.
+| Model | CPU + MKLDNN (thread=1) | CPU + MKLDNN (thread=4) | GPU | TensorRT (FP32) | TensorRT (FP16) |
+| :------------------------ | :------: | :------: | :-----: | :---: | :---: |
+| LiteHRNet-18-256x192 | 88.8 ms |  40.7 ms | 4.4 ms | 2.0 ms | 1.8 ms |
+| LiteHRNet-18-384x288 | 188.0 ms | 79.3 ms | 4.8 ms | 3.6 ms | 3.2 ms |
+| LiteHRNet-30-256x192 | 148.4 ms | 69.0 ms | 7.1 ms | 3.1 ms | 2.8 ms |
+| LiteHRNet-30-384x288 | 309.8 ms | 133.5 ms | 8.2 ms | 6.0 ms | 5.3 ms |
+| PP-TinyPose-128x96 | 25.2 ms | 14.1 ms | 2.7 ms | 0.9 ms | 0.8 ms |
+| PP-TinyPose-256x192 | 82.4 ms | 36.1 ms | 3.0 ms | 1.5 ms | 1.1 ms |
+**Notes:**
+- The tests above are based on Python deployment.
+- The environment is NVIDIA T4 / PaddlePaddle(commit: 7df301f2fc0602745e40fa3a7c43ccedd41786ca) / CUDA10.1 / CUDNN7 / Python3.7 / TensorRT6.
+- The test is based on deploy/python/det_keypoint_unite_infer.py with the image demo/000000014439.jpg, and the input batch size for the keypoint model is set to 8.
+- The time only includes inference time.
+| Model | CPU + MKLDNN (thread=1) | CPU + MKLDNN (thread=4) | GPU | TensorRT (FP32) | TensorRT (FP16) |
+| :------------------------ | :------: | :------: | :-----: | :---: | :---: |
+| DARK_HRNet_w32-256x192 | 363.93 ms | 97.38 ms | 4.13 ms | 3.74 ms | 1.75 ms |
+| DARK_HRNet_w32-384x288 | 823.71 ms | 218.55 ms | 9.44 ms | 8.91 ms | 2.96 ms |
+| HRNet_w32-256x192 | 363.67 ms | 97.64 ms | 4.11 ms | 3.71 ms | 1.72 ms |
+| HRNet_w32-256x256_mpii | 485.56 ms | 131.48 ms | 4.81 ms | 4.26 ms | 2.00 ms |
+| HRNet_w32-384x288 | 822.73 ms | 215.48 ms | 9.40 ms | 8.81 ms | 2.97 ms |
+| PP-TinyPose-128x96 | 24.06 ms | 13.05 ms | 2.43 ms | 0.75 ms | 0.72 ms |
+| PP-TinyPose-256x192 | 82.73 ms | 36.25 ms | 2.57 ms | 1.38 ms | 1.15 ms |
+**Notes:**
+- The tests above are based on C++ deployment.
+- The environment is NVIDIA T4 / PaddlePaddle(commit: 7df301f2fc0602745e40fa3a7c43ccedd41786ca) / CUDA10.1 / CUDNN7 / Python3.7 / TensorRT6.
+- The test is based on deploy/python/det_keypoint_unite_infer.py with the image demo/000000014439.jpg, and the input batch size for the keypoint model is set to 8.
+- The time only includes inference time.
+## Benchmark on Mobile
+We tested benchmarks on Kirin and Qualcomm Snapdragon devices. See the table below for details.
+| Model | Kirin 980 (1-thread) | Kirin 980 (4-threads)  | Qualcomm Snapdragon 845 (1-thread) | Qualcomm Snapdragon 845 (4-threads) | Qualcomm Snapdragon 660 (1-thread) | Qualcomm Snapdragon 660 (4-threads) |
+| :------------------------ | :---: | :---: | :---: | :---: | :---: | :---: |
+| PicoDet-s-192x192 (det) | 14.85 ms | 5.45 ms | 17.50 ms | 7.56 ms | 80.08 ms | 27.36 ms |
+| PicoDet-s-320x320 (det) | 38.09 ms | 12.00 ms | 45.26 ms | 17.07 ms | 232.81 ms | 58.68 ms |
+| PP-TinyPose-128x96 (pose) | 12.03 ms | 5.09 ms | 13.14 ms | 6.73 ms | 71.87 ms | 20.04 ms |
+**Notes:**
+- The tests above are based on Paddle Lite deployment, version v2.10-rc.
+- The time only includes inference time.
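As a reading aid for the server tables above, the following sketch computes the speedup between two runtime columns. The latency values are copied from the Python-deployment table; the names `LATENCY_MS`, `COLUMNS`, and `speedup` are hypothetical and are not part of PaddleDetection.

```python
# Latencies in milliseconds, copied from the Python-deployment table.
# Column order: CPU+MKLDNN (thread=1), CPU+MKLDNN (thread=4), GPU,
# TensorRT (FP32), TensorRT (FP16).
COLUMNS = ("cpu_t1", "cpu_t4", "gpu", "trt_fp32", "trt_fp16")

LATENCY_MS = {
    "LiteHRNet-18-256x192": (88.8, 40.7, 4.4, 2.0, 1.8),
    "PP-TinyPose-128x96": (25.2, 14.1, 2.7, 0.9, 0.8),
    "PP-TinyPose-256x192": (82.4, 36.1, 3.0, 1.5, 1.1),
}


def speedup(model, baseline, target):
    """Return how many times faster `target` is than `baseline` for `model`."""
    row = dict(zip(COLUMNS, LATENCY_MS[model]))
    return row[baseline] / row[target]


if __name__ == "__main__":
    for name in LATENCY_MS:
        ratio = speedup(name, "cpu_t1", "trt_fp16")
        print(f"{name}: {ratio:.1f}x faster with TensorRT FP16 than 1-thread CPU")
```

For example, `speedup("PP-TinyPose-128x96", "cpu_t1", "trt_fp16")` divides 25.2 ms by 0.8 ms. The ratio only compares the reported inference times and says nothing about preprocessing or postprocessing cost.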
--- a/configs/keypoint/README.md
+++ b/configs/keypoint/README.md
@@ -128,6 +128,8 @@ python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inferenc
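 **Note:**
 For the tracking model export tutorial, see `configs/mot/README.md`.
+## Benchmark
+We provide test results in different runtime environments for your reference when choosing a model. For detailed data, see [Keypoint Inference Benchmark](./KeypointBenchmark.md).
 ## Citation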
 ```

--- a/configs/keypoint/README_en.md
+++ b/configs/keypoint/README_en.md
@@ -126,6 +126,8 @@ python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inferenc
 **Note:**
 To export MOT model, please refer to [Here](../../configs/mot/README_en.md).
+## Benchmark
+We provide benchmarks in different runtime environments for your reference when choosing models. See [Keypoint Inference Benchmark](./KeypointBenchmark.md) for details.
 ## Reference
 ```

--- a/configs/keypoint/tiny_pose/README.md
+++ b/configs/keypoint/tiny_pose/README.md
@@ -71,6 +71,7 @@ PP-TinyPose是PaddleDetecion针对移动端设备优化的实时关键点检测
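 - Speed was measured on a Qualcomm Snapdragon 865, using 4 threads on arm8 with FP32 inference.
 - Pipeline speed includes the model's preprocessing, inference, and postprocessing.
 - For the testing and deployment of other excellent open-source models, see [here](https://github.com/zhiboniu/MoveNet-PaddleLite).
+- For performance results in more environments, see [Keypoint Inference Benchmark](../KeypointBenchmark.md).
 ## Model Training
 In addition to `COCO`, the training sets for the keypoint detection model and the pedestrian detection model are extended with the [AI Challenger](https://arxiv.org/abs/1711.06475) dataset. The keypoints of each dataset are defined as follows: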

--- a/configs/keypoint/tiny_pose/README_en.md
+++ b/configs/keypoint/tiny_pose/README_en.md
@@ -71,6 +71,7 @@ If you want to deploy it on the mobile devives, you also need:
 - The inference time is tested on a Qualcomm Snapdragon 865, with 4 threads at arm8, FP32.
 - Pipeline time includes the time for preprocessing, inference, and postprocessing.
 - For the deployment and testing of other open-source models, please refer to [Here](https://github.com/zhiboniu/MoveNet-PaddleLite).
+- For more performance data in other runtime environments, please refer to [Keypoint Inference Benchmark](../KeypointBenchmark.md).
 ## Model Training
 In addition to `COCO`, the trainset for keypoint detection model and pedestrian detection model also includes [AI Challenger](https://arxiv.org/abs/1711.06475). Keypoints of each dataset are defined as follows:

--- a/deploy/python/det_keypoint_unite_utils.py
+++ b/deploy/python/det_keypoint_unite_utils.py
@@ -42,7 +42,7 @@ def argsparser():
    parser.add_argument(
        "--keypoint_batch_size",
        type=int,
-        default=1,
+        default=8,
        help=("batch_size for keypoint inference. In detection-keypoint unit "
              "inference, the batch size in detection is 1. Then collate det "
              "result in batch for keypoint inference."))
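The help text describes collating detection results into batches for the keypoint model. A minimal sketch of that chunking step follows, assuming the detector output has already been turned into a list of per-person crops. The names `iter_keypoint_batches` and `det_crops` are hypothetical and are not the functions used in deploy/python.

```python
def iter_keypoint_batches(det_crops, keypoint_batch_size=8):
    """Yield consecutive slices of `det_crops` of at most `keypoint_batch_size` items.

    Hypothetical helper: `det_crops` stands in for the per-person crops
    produced from a single detection pass (detection batch size is 1).
    """
    if keypoint_batch_size < 1:
        raise ValueError("keypoint_batch_size must be a positive integer")
    for start in range(0, len(det_crops), keypoint_batch_size):
        # The last slice may be shorter than keypoint_batch_size.
        yield det_crops[start:start + keypoint_batch_size]


if __name__ == "__main__":
    crops = [f"person_{i}" for i in range(19)]  # placeholder crops
    for batch in iter_keypoint_batches(crops, keypoint_batch_size=8):
        print(len(batch), batch[0], "...", batch[-1])
```

With the new default of 8, an image with 19 detected people would be processed in three keypoint batches in this sketch, where the previous default of 1 would need 19 separate calls.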