Unverified commit 14eedfa3, authored by JYChen, committed via GitHub

add keypoint benchmark data (#4814)

* add keypoint benchmark data

* update benchmark-data and change default keypoint bs=8, to be same as C++ infer
Parent 692d7329
# Keypoint Inference Benchmark
## Benchmark on Server
We benchmarked the models in several runtime environments. See the tables below for details.
| Model | CPU + MKLDNN (thread=1) | CPU + MKLDNN (thread=4) | GPU | TensorRT (FP32) | TensorRT (FP16) |
| :------------------------ | :------: | :------: | :-----: | :---: | :---: |
| LiteHRNet-18-256x192 | 88.8 ms | 40.7 ms | 4.4 ms | 2.0 ms | 1.8 ms |
| LiteHRNet-18-384x288 | 188.0 ms | 79.3 ms | 4.8 ms | 3.6 ms | 3.2 ms |
| LiteHRNet-30-256x192 | 148.4 ms | 69.0 ms | 7.1 ms | 3.1 ms | 2.8 ms |
| LiteHRNet-30-384x288 | 309.8 ms | 133.5 ms | 8.2 ms | 6.0 ms | 5.3 ms |
| PP-TinyPose-128x96 | 25.2 ms | 14.1 ms | 2.7 ms | 0.9 ms | 0.8 ms |
| PP-TinyPose-256x192 | 82.4 ms | 36.1 ms | 3.0 ms | 1.5 ms | 1.1 ms |
**Notes:**
- The tests above are based on Python deployment.
- The environment is NVIDIA T4 / PaddlePaddle(commit: 7df301f2fc0602745e40fa3a7c43ccedd41786ca) / CUDA10.1 / CUDNN7 / Python3.7 / TensorRT6.
- The test is based on deploy/python/det_keypoint_unite_infer.py with the image demo/000000014439.jpg. The input batch size for the keypoint model is set to 8.
- The time only includes inference time.
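As an illustration of what "inference time only" can mean in practice, here is a minimal timing sketch. It is not the timer used by `det_keypoint_unite_infer.py`; the function name `time_inference` and the `warmup` and `repeats` values are assumptions made for this example. The idea is to exclude warm-up runs and to time only the inference call, leaving preprocessing and postprocessing outside the measured region.

```python
import time


def time_inference(run_inference, warmup=10, repeats=100):
    """Return the mean latency of run_inference() in milliseconds.

    run_inference is any zero-argument callable that performs one
    inference pass (hypothetical; supply your own predictor call).
    """
    # Warm-up runs are executed but not timed.
    for _ in range(warmup):
        run_inference()

    # Time only the inference calls themselves.
    start = time.perf_counter()
    for _ in range(repeats):
        run_inference()
    elapsed = time.perf_counter() - start

    return elapsed * 1000.0 / repeats


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without any model.
    def fake_inference():
        sum(i * i for i in range(1000))

    print(f"{time_inference(fake_inference):.3f} ms per call")
```

On a GPU, the timed callable would also need to block until the result is actually available, otherwise the measurement only covers the kernel launch.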
| Model | CPU + MKLDNN (thread=1) | CPU + MKLDNN (thread=4) | GPU | TensorRT (FP32) | TensorRT (FP16) |
| :------------------------ | :------: | :------: | :-----: | :---: | :---: |
| DARK_HRNet_w32-256x192 | 363.93 ms | 97.38 ms | 4.13 ms | 3.74 ms | 1.75 ms |
| DARK_HRNet_w32-384x288 | 823.71 ms | 218.55 ms | 9.44 ms | 8.91 ms | 2.96 ms |
| HRNet_w32-256x192 | 363.67 ms | 97.64 ms | 4.11 ms | 3.71 ms | 1.72 ms |
| HRNet_w32-256x256_mpii | 485.56 ms | 131.48 ms | 4.81 ms | 4.26 ms | 2.00 ms |
| HRNet_w32-384x288 | 822.73 ms | 215.48 ms | 9.40 ms | 8.81 ms | 2.97 ms |
| PP-TinyPose-128x96 | 24.06 ms | 13.05 ms | 2.43 ms | 0.75 ms | 0.72 ms |
| PP-TinyPose-256x192 | 82.73 ms | 36.25 ms | 2.57 ms | 1.38 ms | 1.15 ms |
**Notes:**
- The tests above are based on C++ deployment.
- The environment is NVIDIA T4 / PaddlePaddle(commit: 7df301f2fc0602745e40fa3a7c43ccedd41786ca) / CUDA10.1 / CUDNN7 / Python3.7 / TensorRT6.
- The test is based on deploy/python/det_keypoint_unite_infer.py with the image demo/000000014439.jpg. The input batch size for the keypoint model is set to 8.
- The time only includes inference time.
## Benchmark on Mobile
We tested benchmarks on Kirin and Qualcomm Snapdragon devices. See the table below for details.
| Model | Kirin 980 (1-thread) | Kirin 980 (4-threads) | Qualcomm Snapdragon 845 (1-thread) | Qualcomm Snapdragon 845 (4-threads) | Qualcomm Snapdragon 660 (1-thread) | Qualcomm Snapdragon 660 (4-threads) |
| :------------------------ | :---: | :---: | :---: | :---: | :---: | :---: |
| PicoDet-s-192x192 (det) | 14.85 ms | 5.45 ms | 17.50 ms | 7.56 ms | 80.08 ms | 27.36 ms |
| PicoDet-s-320x320 (det) | 38.09 ms | 12.00 ms | 45.26 ms | 17.07 ms | 232.81 ms | 58.68 ms |
| PP-TinyPose-128x96 (pose) | 12.03 ms | 5.09 ms | 13.14 ms | 6.73 ms | 71.87 ms | 20.04 ms |
**Notes:**
- The tests above are based on Paddle Lite deployment, version v2.10-rc.
- The time only includes inference time.
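To read the mobile table in terms of thread scaling, the 1-thread and 4-thread columns can be divided directly. The sketch below does this for the Kirin 980 columns only; the latencies are copied from the table above, while the dictionary layout and the `speedup` helper are just illustrative choices.

```python
# Kirin 980 latencies in milliseconds, copied from the table above:
# model name -> (1 thread, 4 threads)
KIRIN_980_MS = {
    "PicoDet-s-192x192 (det)": (14.85, 5.45),
    "PicoDet-s-320x320 (det)": (38.09, 12.00),
    "PP-TinyPose-128x96 (pose)": (12.03, 5.09),
}


def speedup(single_thread_ms, multi_thread_ms):
    """Ratio of single-thread latency to multi-thread latency."""
    return single_thread_ms / multi_thread_ms


speedups = {
    model: round(speedup(one, four), 2)
    for model, (one, four) in KIRIN_980_MS.items()
}

for model, ratio in speedups.items():
    print(f"{model}: {ratio}x faster with 4 threads")
```

None of the three models reaches a 4x gain from 4 threads on this device; the ratios fall between roughly 2.4x and 3.2x.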
......@@ -128,6 +128,8 @@ python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inferenc
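**Note:**
To export the tracking model, please refer to `configs/mot/README.md`.
## Benchmark
We provide test results in different runtime environments for your reference when choosing a model. For detailed data, see [Keypoint Inference Benchmark](./KeypointBenchmark.md).
## Citation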
```
......
......@@ -126,6 +126,8 @@ python deploy/python/mot_keypoint_unite_infer.py --mot_model_dir=output_inferenc
**Note:**
To export the MOT model, please refer to [this guide](../../configs/mot/README_en.md).
## Benchmark
We provide benchmarks in different runtime environments for your reference when choosing models. See [Keypoint Inference Benchmark](./KeypointBenchmark.md) for details.
## Reference
```
......
......@@ -71,6 +71,7 @@ PP-TinyPose是PaddleDetecion针对移动端设备优化的实时关键点检测
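- Speed was measured on a Qualcomm Snapdragon 865, using 4 threads on arm8 with FP32 inference.
- Pipeline speed includes the model's preprocessing, inference, and postprocessing.
- For testing and deployment of other excellent open-source models, please refer to [here](https://github.com/zhiboniu/MoveNet-PaddleLite).
- For performance results in more environments, please refer to [Keypoint Inference Benchmark](../KeypointBenchmark.md).
## Model Training
In addition to `COCO`, the training set for the keypoint detection model and the pedestrian detection model is extended with the [AI Challenger](https://arxiv.org/abs/1711.06475) dataset. The keypoints of each dataset are defined as follows: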
......
......@@ -71,6 +71,7 @@ If you want to deploy it on the mobile devives, you also need:
- The inference time is tested on a Qualcomm Snapdragon 865, with 4 threads on arm8, FP32.
- Pipeline time includes the time for preprocessing, inference, and postprocessing.
- For the deployment and testing of other open-source models, please refer to [Here](https://github.com/zhiboniu/MoveNet-PaddleLite).
- For more performance data in other runtime environments, please refer to [Keypoint Inference Benchmark](../KeypointBenchmark.md).
## Model Training
In addition to `COCO`, the training set for the keypoint detection model and the pedestrian detection model also includes [AI Challenger](https://arxiv.org/abs/1711.06475). Keypoints of each dataset are defined as follows:
......
......@@ -42,7 +42,7 @@ def argsparser():
     parser.add_argument(
         "--keypoint_batch_size",
         type=int,
-        default=1,
+        default=8,
         help=("batch_size for keypoint inference. In detection-keypoint unit"
               "inference, the batch size in detection is 1. Then collate det "
               "result in batch for keypoint inference."))
......
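The help text above describes collating detection results into batches for keypoint inference. A minimal sketch of that batching step follows, assuming the detector has already produced a flat list of person crops. The helper `iter_batches` and the placeholder crop names are hypothetical and are not taken from the repository.

```python
def iter_batches(items, batch_size=8):
    """Yield consecutive slices of items with at most batch_size elements.

    batch_size=8 mirrors the new default of --keypoint_batch_size;
    the helper itself is illustrative only.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Placeholder for 19 person crops produced by the detector on one image.
crops = [f"person_{i}" for i in range(19)]

# Each batch would be passed to the keypoint model in a single call.
batch_sizes = [len(batch) for batch in iter_batches(crops, batch_size=8)]
print(batch_sizes)  # [8, 8, 3]
```

The last batch is simply shorter when the number of detections is not a multiple of the batch size.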