From 14eedfa3d38fc2a2abd84629e6d905a21c255b5c Mon Sep 17 00:00:00 2001
From: JYChen
Date: Thu, 23 Dec 2021 14:31:38 +0800
Subject: [PATCH] add keypoint benchmark data (#4814)

* add keypoint benchmark data

* update benchmark-data and change default keypoint bs=8, to be same as C++ infer
---
 configs/keypoint/KeypointBenchmark.md     | 50 +++++++++++++++++++++++
 configs/keypoint/README.md                |  2 +
 configs/keypoint/README_en.md             |  2 +
 configs/keypoint/tiny_pose/README.md      |  1 +
 configs/keypoint/tiny_pose/README_en.md   |  1 +
 deploy/python/det_keypoint_unite_utils.py |  2 +-
 6 files changed, 57 insertions(+), 1 deletion(-)
 create mode 100644 configs/keypoint/KeypointBenchmark.md

diff --git a/configs/keypoint/KeypointBenchmark.md b/configs/keypoint/KeypointBenchmark.md
new file mode 100644
index 000000000..c7e5bd6ac
--- /dev/null
+++ b/configs/keypoint/KeypointBenchmark.md
@@ -0,0 +1,50 @@
+# Keypoint Inference Benchmark
+
+## Benchmark on Server
+We benchmarked the models in different runtime environments. See the tables below for details.
+
+| Model | CPU + MKLDNN (thread=1) | CPU + MKLDNN (thread=4) | GPU | TensorRT (FP32) | TensorRT (FP16) |
+| :------------------------ | :------: | :------: | :-----: | :---: | :---: |
+| LiteHRNet-18-256x192 | 88.8 ms | 40.7 ms | 4.4 ms | 2.0 ms | 1.8 ms |
+| LiteHRNet-18-384x288 | 188.0 ms | 79.3 ms | 4.8 ms | 3.6 ms | 3.2 ms |
+| LiteHRNet-30-256x192 | 148.4 ms | 69.0 ms | 7.1 ms | 3.1 ms | 2.8 ms |
+| LiteHRNet-30-384x288 | 309.8 ms | 133.5 ms | 8.2 ms | 6.0 ms | 5.3 ms |
+| PP-TinyPose-128x96 | 25.2 ms | 14.1 ms | 2.7 ms | 0.9 ms | 0.8 ms |
+| PP-TinyPose-256x192 | 82.4 ms | 36.1 ms | 3.0 ms | 1.5 ms | 1.1 ms |
+
+**Notes:**
+- The tests above are based on Python deployment.
+- The environment is NVIDIA T4 / PaddlePaddle (commit: 7df301f2fc0602745e40fa3a7c43ccedd41786ca) / CUDA 10.1 / cuDNN 7 / Python 3.7 / TensorRT 6.
+- The test is based on deploy/python/det_keypoint_unite_infer.py with the image demo/000000014439.jpg.
+- The input batch size for the keypoint model is set to 8.
+- The time only includes inference time.
+
+| Model | CPU + MKLDNN (thread=1) | CPU + MKLDNN (thread=4) | GPU | TensorRT (FP32) | TensorRT (FP16) |
+| :------------------------ | :------: | :------: | :-----: | :---: | :---: |
+| DARK_HRNet_w32-256x192 | 363.93 ms | 97.38 ms | 4.13 ms | 3.74 ms | 1.75 ms |
+| DARK_HRNet_w32-384x288 | 823.71 ms | 218.55 ms | 9.44 ms | 8.91 ms | 2.96 ms |
+| HRNet_w32-256x192 | 363.67 ms | 97.64 ms | 4.11 ms | 3.71 ms | 1.72 ms |
+| HRNet_w32-256x256_mpii | 485.56 ms | 131.48 ms | 4.81 ms | 4.26 ms | 2.00 ms |
+| HRNet_w32-384x288 | 822.73 ms | 215.48 ms | 9.40 ms | 8.81 ms | 2.97 ms |
+| PP-TinyPose-128x96 | 24.06 ms | 13.05 ms | 2.43 ms | 0.75 ms | 0.72 ms |
+| PP-TinyPose-256x192 | 82.73 ms | 36.25 ms | 2.57 ms | 1.38 ms | 1.15 ms |
+
+**Notes:**
+- The tests above are based on C++ deployment.
+- The environment is NVIDIA T4 / PaddlePaddle (commit: 7df301f2fc0602745e40fa3a7c43ccedd41786ca) / CUDA 10.1 / cuDNN 7 / Python 3.7 / TensorRT 6.
+- The test is based on deploy/python/det_keypoint_unite_infer.py with the image demo/000000014439.jpg, and the input batch size for the keypoint model is set to 8.
+- The time only includes inference time.
+
+## Benchmark on Mobile
+We benchmarked the models on Kirin and Qualcomm Snapdragon devices. See the table below for details.
+
+| Model | Kirin 980 (1-thread) | Kirin 980 (4-threads) | Qualcomm Snapdragon 845 (1-thread) | Qualcomm Snapdragon 845 (4-threads) | Qualcomm Snapdragon 660 (1-thread) | Qualcomm Snapdragon 660 (4-threads) |
+| :------------------------ | :---: | :---: | :---: | :---: | :---: | :---: |
+| PicoDet-s-192x192 (det) | 14.85 ms | 5.45 ms | 17.50 ms | 7.56 ms | 80.08 ms | 27.36 ms |
+| PicoDet-s-320x320 (det) | 38.09 ms | 12.00 ms | 45.26 ms | 17.07 ms | 232.81 ms | 58.68 ms |
+| PP-TinyPose-128x96 (pose) | 12.03 ms | 5.09 ms | 13.14 ms | 6.73 ms | 71.87 ms | 20.04 ms |
+
+**Notes:**
+- The tests above are based on Paddle Lite deployment, version v2.10-rc.
+- The time only includes inference time.
diff --git a/configs/keypoint/README.md b/configs/keypoint/README.md
index 0a3e68b59..d7d6c1e55 100644
--- a/configs/keypoint/README.md
+++ b/configs/keypoint/README.md
@@ -128,6 +128,8 @@ python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inferenc
 **注意:** 跟踪模型导出教程请参考`configs/mot/README.md`。
 
+## Benchmark
+我们给出了不同运行环境下的测试结果,供您在选用模型时参考。详细数据请见[Keypoint Inference Benchmark](./KeypointBenchmark.md)。
 
 ## 引用
 ```
diff --git a/configs/keypoint/README_en.md b/configs/keypoint/README_en.md
index 1fd562e9c..9ab6d1b74 100644
--- a/configs/keypoint/README_en.md
+++ b/configs/keypoint/README_en.md
@@ -126,6 +126,8 @@ python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inferenc
 **Note:** To export MOT model, please refer to [Here](../../configs/mot/README_en.md).
 
+## Benchmark
+We provide benchmarks in different runtime environments for your reference when choosing models. See [Keypoint Inference Benchmark](./KeypointBenchmark.md) for details.
 
 ## Reference
 ```
diff --git a/configs/keypoint/tiny_pose/README.md b/configs/keypoint/tiny_pose/README.md
index 3a2cdabd6..276222ce8 100644
--- a/configs/keypoint/tiny_pose/README.md
+++ b/configs/keypoint/tiny_pose/README.md
@@ -71,6 +71,7 @@ PP-TinyPose是PaddleDetection针对移动端设备优化的实时关键点检测
 - 速度测试环境为qualcomm snapdragon 865,采用arm8下4线程、FP32推理得到。
 - Pipeline速度包含模型的预处理、推理及后处理部分。
 - 其他优秀开源模型的测试及部署方案,请参考[这里](https://github.com/zhiboniu/MoveNet-PaddleLite)。
+- 更多环境下的性能测试结果,请参考[Keypoint Inference Benchmark](../KeypointBenchmark.md)。
 
 ## 模型训练
 关键点检测模型与行人检测模型的训练集在`COCO`以外还扩充了[AI Challenger](https://arxiv.org/abs/1711.06475)数据集,各数据集关键点定义如下:
diff --git a/configs/keypoint/tiny_pose/README_en.md b/configs/keypoint/tiny_pose/README_en.md
index 6bd069ee1..d2b33a0fe 100644
--- a/configs/keypoint/tiny_pose/README_en.md
+++ b/configs/keypoint/tiny_pose/README_en.md
@@ -71,6 +71,7 @@ If you want to deploy it on the mobile devices, you also need:
 - The inference time is tested on a Qualcomm Snapdragon 865, with 4 threads at arm8, FP32.
 - Pipeline time includes time for preprocess, inference and postprocess.
 - About the deployment and testing of other open-source models, please refer to [Here](https://github.com/zhiboniu/MoveNet-PaddleLite).
+- For more performance data in other runtime environments, please refer to [Keypoint Inference Benchmark](../KeypointBenchmark.md).
 
 ## Model Training
 In addition to `COCO`, the trainset for keypoint detection model and pedestrian detection model also includes [AI Challenger](https://arxiv.org/abs/1711.06475). Keypoints of each dataset are defined as follows:
diff --git a/deploy/python/det_keypoint_unite_utils.py b/deploy/python/det_keypoint_unite_utils.py
index cbae04333..ccd4a9b45 100644
--- a/deploy/python/det_keypoint_unite_utils.py
+++ b/deploy/python/det_keypoint_unite_utils.py
@@ -42,7 +42,7 @@ def argsparser():
     parser.add_argument(
         "--keypoint_batch_size",
         type=int,
-        default=1,
+        default=8,
         help=("batch_size for keypoint inference. In detection-keypoint unite "
               "inference, the batch size in detection is 1. Then collate det "
               "results in batch for keypoint inference."))
-- 
GitLab
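
The new default mirrors the pipeline the help text describes: detection runs with batch size 1, and the detected persons from each image are then collated into fixed-size batches for the keypoint model. A minimal sketch of that collation step (the function name and list-based shapes are illustrative, not the actual PaddleDetection implementation):

```python
def collate_for_keypoint(det_results, batch_size=8):
    """Group per-image detection results into fixed-size batches for
    keypoint inference; the final batch may be smaller than batch_size."""
    return [det_results[i:i + batch_size]
            for i in range(0, len(det_results), batch_size)]

# e.g. 19 detected persons with --keypoint_batch_size=8 -> batches of 8, 8 and 3
batches = collate_for_keypoint(list(range(19)), batch_size=8)
print([len(b) for b in batches])  # [8, 8, 3]
```

A larger keypoint batch amortizes per-call overhead on GPU, which is why the Python default is raised to 8 to match the C++ inference path.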