Skip to content
体验新版
项目
组织
正在加载...
登录
切换导航
打开侧边栏
s920243400
PaddleDetection
提交
aa9e0d54
P
PaddleDetection
项目概览
s920243400
/
PaddleDetection
与 Fork 源项目一致
Fork自
PaddlePaddle / PaddleDetection
通知
2
Star
0
Fork
0
代码
文件
提交
分支
Tags
贡献者
分支图
Diff
Issue
0
列表
看板
标记
里程碑
合并请求
0
Wiki
0
Wiki
分析
仓库
DevOps
项目成员
Pages
P
PaddleDetection
项目概览
项目概览
详情
发布
仓库
仓库
文件
提交
分支
标签
贡献者
分支图
比较
Issue
0
Issue
0
列表
看板
标记
里程碑
合并请求
0
合并请求
0
Pages
分析
分析
仓库分析
DevOps
Wiki
0
Wiki
成员
成员
收起侧边栏
关闭侧边栏
动态
分支图
创建新Issue
提交
Issue看板
未验证
提交
aa9e0d54
编写于
5月 11, 2021
作者:
G
Guanghua Yu
提交者:
GitHub
5月 11, 2021
浏览文件
操作
浏览文件
下载
电子邮件补丁
差异文件
add inference benchmark (#2508)
* add inference benchmark
上级
99143a3f
变更
17
显示空白变更内容
内联
并排
Showing
17 changed file
with
1016 addition
and
311 deletion
+1016
-311
deploy/README.md
deploy/README.md
+34
-34
deploy/TENSOR_RT.md
deploy/TENSOR_RT.md
+1
-1
deploy/benchmark/benchmark.sh
deploy/benchmark/benchmark.sh
+36
-0
deploy/benchmark/benchmark_quant.sh
deploy/benchmark/benchmark_quant.sh
+23
-0
deploy/benchmark/log_parser_excel.py
deploy/benchmark/log_parser_excel.py
+301
-0
deploy/cpp/docs/Jetson_build.md
deploy/cpp/docs/Jetson_build.md
+6
-3
deploy/cpp/docs/linux_build.md
deploy/cpp/docs/linux_build.md
+6
-3
deploy/cpp/docs/windows_vs2019_build.md
deploy/cpp/docs/windows_vs2019_build.md
+6
-3
deploy/cpp/include/object_detector.h
deploy/cpp/include/object_detector.h
+29
-13
deploy/cpp/src/main.cc
deploy/cpp/src/main.cc
+131
-50
deploy/cpp/src/object_detector.cc
deploy/cpp/src/object_detector.cc
+32
-28
deploy/python/README.md
deploy/python/README.md
+14
-43
deploy/python/infer.py
deploy/python/infer.py
+122
-130
deploy/python/preprocess.py
deploy/python/preprocess.py
+8
-3
deploy/python/utils.py
deploy/python/utils.py
+262
-0
docs/images/input_shape.png
docs/images/input_shape.png
+0
-0
ppdet/engine/export_utils.py
ppdet/engine/export_utils.py
+5
-0
未找到文件。
deploy/README.md
浏览文件 @
aa9e0d54
# PaddleDetection 预测部署
训练得到一个满足要求的模型后,如果想要将该模型部署到已选择的平台上,需要通过
`tools/export_model.py`
将模型导出预测部署的模型和配置文件。
并在同一文件夹下导出预测时使用的配置文件,配置文件名为
`infer_cfg.yml`
。
## 1、`PaddleDetection`目前支持的部署方式按照部署设备可以分为:
-
在本机
`python`
语言部署,支持在有
`python paddle`
(支持
`CPU`
、
`GPU`
)环境下部署,有两种方式:
-
使用
`tools/infer.py`
,此种方式依赖
`PaddleDetection`
代码库。
-
将模型导出,使用
`deploy/python/infer.py`
,此种方式不依赖
`PaddleDetection`
代码库,可以单个
`python`
文件部署。
-
在本机
`C++`
语言使用
`paddle inference`
预测库部署,支持在
`Linux`
和
`Windows`
系统下部署。请参考文档
[
C++部署
](
cpp/README.md
)
。
-
在服务器端以服务形式部署,使用
[
PaddleServing
](
./serving/README.md
)
部署。
-
在手机移动端部署,使用
[
Paddle-Lite
](
https://github.com/PaddlePaddle/Paddle-Lite
)
在手机移动端部署。
常见模型部署Demo请参考
[
Paddle-Lite-Demo
](
https://github.com/PaddlePaddle/Paddle-Lite-Demo
)
。
-
`NV Jetson`
嵌入式设备上部署
-
`TensorRT`
加速请参考文档
[
TensorRT预测部署教程
](
TENSOR_RT.md
)
## 2、模型导出
目前支持的部署方式有:
-
`Paddle Inference预测库`
部署:
-
`Python`
语言部署,支持
`CPU`
、
`GPU`
和
`XPU`
环境,参考文档
[
python部署
](
python/README.md
)
。
-
`C++`
语言部署 ,支持
`CPU`
、
`GPU`
和
`XPU`
环境,支持在
`Linux`
、
`Windows`
系统下部署,支持
`NV Jetson`
嵌入式设备上部署。请参考文档
[
C++部署
](
cpp/README.md
)
。
-
`TensorRT`
加速:请参考文档
[
TensorRT预测部署教程
](
TENSOR_RT.md
)
-
服务器端部署:使用
[
PaddleServing
](
./serving/README.md
)
部署。
-
手机移动端部署:使用
[
Paddle-Lite
](
https://github.com/PaddlePaddle/Paddle-Lite
)
在手机移动端部署。
## 1.模型导出
使用
`tools/export_model.py`
脚本导出模型已经部署时使用的配置文件,配置文件名字为
`infer_cfg.yml`
。模型导出脚本如下:
```
bash
# 导出YOLOv3模型
...
...
@@ -29,47 +26,50 @@ python tools/export_model.py -c configs/yolov3/yolov3_darknet53_270e_coco.yml -o
模型导出具体请参考文档
[
PaddleDetection模型导出教程
](
EXPORT_MODEL.md
)
。
## 3、如何选择部署时依赖库的版本
### (1)CUDA、cuDNN、TensorRT版本选择
由于CUDA、cuDNN、TENSORRT不一定都是向前兼容的,需要使用与编译Paddle预测库使用的环境完全一致的环境进行部署。
### (2)部署时预测库版本、预测引擎版本选择
## 2.部署环境准备
-
Linux、Windows平台下C++部署,需要使用Paddle预测库进行部署。
(1)Paddle官网提供在不同平台、不同环境下编译好的预测库,您可以直接使用,请在这里
[
Paddle预测库
](
https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/build_and_install_lib_cn.html
)
选择。
(2)如果您将要部署的平台环境,Paddle官网上没有提供已编译好的预测库,您可以自行编译,编译过程请参考
[
Paddle源码编译
](
https://www.paddlepaddle.org.cn/documentation/docs/zh/install/compile/linux-compile.html
)
。
-
Python预测:在python环境下安装PaddlePaddle环境即可,如需TensorRT预测,在
[
Paddle Release版本
](
https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#whl-release
)
中下载合适的wheel包即可。
-
C++预测库:请从
[
这里
](
https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/build_and_install_lib_cn.html
)
,如果需要使用TensorRT,请下载带有TensorRT编译的预测库。您也可以自行编译,编译过程请参考
[
Paddle源码编译
](
https://www.paddlepaddle.org.cn/documentation/docs/zh/install/compile/linux-compile.html
)
。
**注意:**
Paddle预测库版本需要>=2.0
-
Python语言部署,需要在对应平台上安装Paddle Python包。如果Paddle官网上没有提供该平台下的Paddle Python包,您可以自行编译,编译过程请参考
[
Paddle源码编译
](
https://www.paddlepaddle.org.cn/documentation/docs/zh/install/compile/linux-compile.html
)
。
-
PaddleServing部署
PaddleServing 0.4.0是基于Paddle 1.8.4开发,PaddleServing 0.5.0是基于Paddle2.0开发
。
请选择PaddleServing>0.5.0以上版本,具体可参考
[
PaddleServing安装文档
](
https://github.com/PaddlePaddle/Serving/blob/develop/README.md#installation
)
。
-
Paddle-Lite部署
Paddle-Lite支持OP列表请参考:
[
Paddle-Lite支持的OP列表
](
https://paddle-lite.readthedocs.io/zh/latest/source_compile/library.html
)
,请跟进所部署模型中使用到的op选择Paddle-Lite版本。
-
NV Jetson部署
Paddle官网提供在NV Jetson平台上已经编译好的预测库,
[
Paddle NV Jetson预测库
](
https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/build_and_install_lib_cn.html
)
。
若列表中没有您需要的预测库,您可以在您的平台上自行编译,编译过程请参考
[
Paddle源码编译
](
https://www.paddlepaddle.org.cn/documentation/docs/zh/install/compile/linux-compile.html
)
。
Paddle官网提供在NV Jetson平台上已经编译好的预测库,
[
Paddle NV Jetson预测库
](
https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/05_inference_deployment/inference/build_and_install_lib_cn.html
)
。若列表中没有您需要的预测库,您可以在您的平台上自行编译,编译过程请参考
[
Paddle源码编译
](
https://www.paddlepaddle.org.cn/documentation/docs/zh/install/compile/linux-compile.html
)
。
## 3.部署预测
-
Python部署:使用
`deploy/python/infer.py`
进行预测,可具体参考
[
python部署文档
](
python/README.md
)
。
```
shell
python deploy/python/infer.py
--model_dir
=
/path/to/models
--image_file
=
/path/to/image
--use_gpu
=(
False/True
)
```
## 4、部署
-
C++部署,先使用跨平台编译工具
`CMake`
根据
`CMakeLists.txt`
生成
`Makefile`
,支持
`Windows、Linux、NV Jetson`
平台,然后进行编译产出可执行文件。可以直接使用
`cpp/scripts/build.sh`
脚本编译:
-
C++部署,先使用跨平台编译工具
`CMake`
根据
`CMakeLists.txt`
生成
`Makefile`
,支持
[
Windows
](
cpp/docs/windows_vs2019_build.md
)
、
[
Linux
](
cpp/docs/linux_build.md
)
、
[
NV Jetson
](
cpp/docs/Jetson_build.md
)
平台部署,然后进行编译产出可执行文件。可以直接使用
`cpp/scripts/build.sh`
脚本编译:
```
buildoutcfg
cd cpp
sh scripts/build.sh
```
-
Python部署,可以使用使用
`tools/infer.py`
(以来PaddleDetection源码)部署,或者使用
`deploy/python/infer.py`
单文件部署
-
PaddleServing部署请参考,
[
PaddleServing部署
](
./serving/README.md
)
部署。
-
手机移动端部署,请参考
[
Paddle-Lite-Demo
](
https://github.com/PaddlePaddle/Paddle-Lite-Demo
)
部署。
## 4.BenchMark测试
-
使用导出的模型,运行BenchMark批量测试脚本:
```
shell
sh deploy/benchmark/benchmark_ppdet.sh
{
model_dir
}
{
model_name
}
```
**注意**
如果是量化模型,请使用
`deploy/benchmark/benchmark_ppdet_quant.sh`
脚本。
-
将测试结果log导出至Excel中:
```
python deploy/benchmark/log_parser_excel.py --log_path=./output_pipeline --output_name=benchmark_excel.xlsx
```
## 5
、
常见问题QA
## 5
.
常见问题QA
-
1、
`Paddle 1.8.4`
训练的模型,可以用
`Paddle2.0`
部署吗?
Paddle 2.0是兼容Paddle 1.8.4的,因此是可以的。但是部分模型(如SOLOv2)使用到了Paddle 2.0中新增OP,这类模型不可以。
...
...
deploy/TENSOR_RT.md
浏览文件 @
aa9e0d54
...
...
@@ -38,7 +38,7 @@ TensorRT版本<=5时,使用TensorRT预测时,只支持固定尺寸输入。
`TestReader.inputs_def.image_shape`
设置的是输入TensorRT引擎的数据尺寸(在像FasterRCNN中,
`TestReader.inputs_def.image_shape`
指定的是在
`Pad`
操作之前的图像数据尺寸)。
可以通过
[
visualdl
](
https://www.paddlepaddle.org.cn/paddle/visualdl/demo/graph
)
打开
`model.pdmodel`
文件,查看输入的第一个Tensor尺寸是否是固定的,如果不指定,尺寸会用
`?`
表示,如下图所示:
![
img
](
img
s/input_shape.png
)
![
img
](
../docs/image
s/input_shape.png
)
同时需要将图像预处理后的尺寸与设置车模型输入尺寸保持一致,需要设置
`infer_cfg.yml`
配置文件中
`Resize OP`
的
`target_size`
参数和
`keep_ratio`
参数。
...
...
deploy/benchmark/benchmark.sh
0 → 100644
浏览文件 @
aa9e0d54
# All rights `PaddleDetection` reserved
#!/bin/bash
model_dir
=
$1
model_name
=
$2
export
img_dir
=
"demo"
export
log_path
=
"output_pipeline"
echo
"model_dir :
${
model_dir
}
"
echo
"img_dir:
${
img_dir
}
"
# TODO: support batch size>1
for
use_mkldnn
in
"True"
"False"
;
do
for
threads
in
"1"
"6"
;
do
echo
"
${
model_name
}
${
model_dir
}
, use_mkldnn:
${
use_mkldnn
}
threads:
${
threads
}
"
python deploy/python/infer.py
\
--model_dir
=
${
model_dir
}
\
--run_benchmark
True
\
--enable_mkldnn
=
${
use_mkldnn
}
\
--use_gpu
=
False
\
--cpu_threads
=
${
threads
}
\
--image_dir
=
${
img_dir
}
2>&1 |
tee
${
log_path
}
/
${
model_name
}
_cpu_usemkldnn_
${
use_mkldnn
}
_cputhreads_
${
threads
}
_bs1_infer.log
done
done
for
run_mode
in
"fluid"
"trt_fp32"
"trt_fp16"
;
do
echo
"
${
model_name
}
${
model_dir
}
, run_mode:
${
run_mode
}
"
python deploy/python/infer.py
\
--model_dir
=
${
model_dir
}
\
--run_benchmark
=
True
\
--use_gpu
=
True
\
--run_mode
=
${
run_mode
}
\
--image_dir
=
${
img_dir
}
2>&1 |
tee
${
log_path
}
/
${
model_name
}
_gpu_runmode_
${
run_mode
}
_bs1_infer.log
done
deploy/benchmark/benchmark_quant.sh
0 → 100644
浏览文件 @
aa9e0d54
# All rights `PaddleDetection` reserved
#!/bin/bash
model_dir
=
$1
model_name
=
$2
export
img_dir
=
"demo"
export
log_path
=
"output_pipeline"
echo
"model_dir :
${
model_dir
}
"
echo
"img_dir:
${
img_dir
}
"
# TODO: support batch size>1
for
run_mode
in
"trt_int8"
;
do
echo
"
${
model_name
}
${
model_dir
}
, run_mode:
${
run_mode
}
"
python deploy/python/infer.py
\
--model_dir
=
${
model_dir
}
\
--run_benchmark
=
True
\
--use_gpu
=
True
\
--run_mode
=
${
run_mode
}
\
--image_dir
=
${
img_dir
}
2>&1 |
tee
${
log_path
}
/
${
model_name
}
_gpu_runmode_
${
run_mode
}
_bs1_infer.log
done
deploy/benchmark/log_parser_excel.py
0 → 100644
浏览文件 @
aa9e0d54
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import
os
import
sys
import
re
import
argparse
import
pandas
as
pd
def
parse_args
():
"""
parse input args
"""
parser
=
argparse
.
ArgumentParser
()
parser
.
add_argument
(
"--log_path"
,
type
=
str
,
default
=
"./output_pipeline"
,
help
=
"benchmark log path"
)
parser
.
add_argument
(
"--output_name"
,
type
=
str
,
default
=
"benchmark_excel.xlsx"
,
help
=
"output excel file name"
)
parser
.
add_argument
(
"--analysis_trt"
,
dest
=
"analysis_trt"
,
action
=
'store_true'
)
parser
.
add_argument
(
"--analysis_mkl"
,
dest
=
"analysis_mkl"
,
action
=
'store_true'
)
return
parser
.
parse_args
()
def
find_all_logs
(
path_walk
):
"""
find all .log files from target dir
"""
for
root
,
ds
,
files
in
os
.
walk
(
path_walk
):
for
file_name
in
files
:
if
re
.
match
(
r
'.*.log'
,
file_name
):
full_path
=
os
.
path
.
join
(
root
,
file_name
)
yield
file_name
,
full_path
def
process_log
(
file_name
):
"""
process log to dict
"""
output_dict
=
{}
with
open
(
file_name
,
'r'
)
as
f
:
for
i
,
data
in
enumerate
(
f
.
readlines
()):
if
i
==
0
:
continue
line_lists
=
data
.
split
(
" "
)
# conf info
if
"runtime_device:"
in
line_lists
:
pos_buf
=
line_lists
.
index
(
"runtime_device:"
)
output_dict
[
"runtime_device"
]
=
line_lists
[
pos_buf
+
1
].
strip
()
if
"ir_optim:"
in
line_lists
:
pos_buf
=
line_lists
.
index
(
"ir_optim:"
)
output_dict
[
"ir_optim"
]
=
line_lists
[
pos_buf
+
1
].
strip
()
if
"enable_memory_optim:"
in
line_lists
:
pos_buf
=
line_lists
.
index
(
"enable_memory_optim:"
)
output_dict
[
"enable_memory_optim"
]
=
line_lists
[
pos_buf
+
1
].
strip
()
if
"enable_tensorrt:"
in
line_lists
:
pos_buf
=
line_lists
.
index
(
"enable_tensorrt:"
)
output_dict
[
"enable_tensorrt"
]
=
line_lists
[
pos_buf
+
1
].
strip
()
if
"precision:"
in
line_lists
:
pos_buf
=
line_lists
.
index
(
"precision:"
)
output_dict
[
"precision"
]
=
line_lists
[
pos_buf
+
1
].
strip
()
if
"enable_mkldnn:"
in
line_lists
:
pos_buf
=
line_lists
.
index
(
"enable_mkldnn:"
)
output_dict
[
"enable_mkldnn"
]
=
line_lists
[
pos_buf
+
1
].
strip
()
if
"cpu_math_library_num_threads:"
in
line_lists
:
pos_buf
=
line_lists
.
index
(
"cpu_math_library_num_threads:"
)
output_dict
[
"cpu_math_library_num_threads"
]
=
line_lists
[
pos_buf
+
1
].
strip
()
# model info
if
"model_name:"
in
line_lists
:
pos_buf
=
line_lists
.
index
(
"model_name:"
)
output_dict
[
"model_name"
]
=
list
(
filter
(
None
,
line_lists
[
pos_buf
+
1
].
strip
().
split
(
'/'
)))[
-
1
]
# data info
if
"batch_size:"
in
line_lists
:
pos_buf
=
line_lists
.
index
(
"batch_size:"
)
output_dict
[
"batch_size"
]
=
line_lists
[
pos_buf
+
1
].
strip
()
if
"input_shape:"
in
line_lists
:
pos_buf
=
line_lists
.
index
(
"input_shape:"
)
output_dict
[
"input_shape"
]
=
line_lists
[
pos_buf
+
1
].
strip
()
# perf info
if
"cpu_rss(MB):"
in
line_lists
:
pos_buf
=
line_lists
.
index
(
"cpu_rss(MB):"
)
output_dict
[
"cpu_rss(MB)"
]
=
line_lists
[
pos_buf
+
1
].
strip
(
).
split
(
','
)[
0
]
if
"gpu_rss(MB):"
in
line_lists
:
pos_buf
=
line_lists
.
index
(
"gpu_rss(MB):"
)
output_dict
[
"gpu_rss(MB)"
]
=
line_lists
[
pos_buf
+
1
].
strip
(
).
split
(
','
)[
0
]
if
"gpu_util:"
in
line_lists
:
pos_buf
=
line_lists
.
index
(
"gpu_util:"
)
output_dict
[
"gpu_util"
]
=
line_lists
[
pos_buf
+
1
].
strip
().
split
(
','
)[
0
]
if
"preproce_time(ms):"
in
line_lists
:
pos_buf
=
line_lists
.
index
(
"preproce_time(ms):"
)
output_dict
[
"preproce_time(ms)"
]
=
line_lists
[
pos_buf
+
1
].
strip
().
split
(
','
)[
0
]
if
"inference_time(ms):"
in
line_lists
:
pos_buf
=
line_lists
.
index
(
"inference_time(ms):"
)
output_dict
[
"inference_time(ms)"
]
=
line_lists
[
pos_buf
+
1
].
strip
().
split
(
','
)[
0
]
if
"postprocess_time(ms):"
in
line_lists
:
pos_buf
=
line_lists
.
index
(
"postprocess_time(ms):"
)
output_dict
[
"postprocess_time(ms)"
]
=
line_lists
[
pos_buf
+
1
].
strip
().
split
(
','
)[
0
]
return
output_dict
def
filter_df_merge
(
cpu_df
,
filter_column
=
None
):
"""
process cpu data frame, merge by 'model_name', 'batch_size'
Args:
cpu_df ([type]): [description]
"""
if
not
filter_column
:
raise
Exception
(
"please assign filter_column for filter_df_merge function"
)
df_lists
=
[]
filter_column_lists
=
[]
for
k
,
v
in
cpu_df
.
groupby
(
filter_column
,
dropna
=
True
):
filter_column_lists
.
append
(
k
)
df_lists
.
append
(
v
)
final_output_df
=
df_lists
[
-
1
]
# merge same model
for
i
in
range
(
len
(
df_lists
)
-
1
):
left_suffix
=
cpu_df
[
filter_column
].
unique
()[
0
]
right_suffix
=
df_lists
[
i
][
filter_column
].
unique
()[
0
]
print
(
left_suffix
,
right_suffix
)
if
not
pd
.
isnull
(
right_suffix
):
final_output_df
=
pd
.
merge
(
final_output_df
,
df_lists
[
i
],
how
=
'left'
,
left_on
=
[
'model_name'
,
'batch_size'
],
right_on
=
[
'model_name'
,
'batch_size'
],
suffixes
=
(
''
,
'_{0}_{1}'
.
format
(
filter_column
,
right_suffix
)))
# rename default df columns
origin_column_names
=
list
(
cpu_df
.
columns
.
values
)
origin_column_names
.
remove
(
filter_column
)
suffix
=
final_output_df
[
filter_column
].
unique
()[
0
]
for
name
in
origin_column_names
:
final_output_df
.
rename
(
columns
=
{
name
:
"{0}_{1}_{2}"
.
format
(
name
,
filter_column
,
suffix
)},
inplace
=
True
)
final_output_df
.
rename
(
columns
=
{
filter_column
:
"{0}_{1}_{2}"
.
format
(
filter_column
,
filter_column
,
suffix
)
},
inplace
=
True
)
final_output_df
.
sort_values
(
by
=
[
"model_name_{0}_{1}"
.
format
(
filter_column
,
suffix
),
"batch_size_{0}_{1}"
.
format
(
filter_column
,
suffix
)
],
inplace
=
True
)
return
final_output_df
def
trt_perf_analysis
(
raw_df
):
"""
sperate raw dataframe to a list of dataframe
compare tensorrt percision performance
"""
# filter df by gpu, compare tensorrt and gpu
# define default dataframe for gpu performance analysis
gpu_df
=
raw_df
.
loc
[
raw_df
[
'runtime_device'
]
==
'gpu'
]
new_df
=
filter_df_merge
(
gpu_df
,
"precision"
)
# calculate qps diff percentail
infer_fp32
=
"inference_time(ms)_precision_fp32"
infer_fp16
=
"inference_time(ms)_precision_fp16"
infer_int8
=
"inference_time(ms)_precision_int8"
new_df
[
"fp32_fp16_diff"
]
=
new_df
[[
infer_fp32
,
infer_fp16
]].
apply
(
lambda
x
:
(
float
(
x
[
infer_fp16
])
-
float
(
x
[
infer_fp32
]))
/
float
(
x
[
infer_fp32
]),
axis
=
1
)
new_df
[
"fp32_gpu_diff"
]
=
new_df
[[
"inference_time(ms)"
,
infer_fp32
]].
apply
(
lambda
x
:
(
float
(
x
[
infer_fp32
])
-
float
(
x
[
infer_fp32
]))
/
float
(
x
[
"inference_time(ms)"
]),
axis
=
1
)
new_df
[
"fp16_int8_diff"
]
=
new_df
[[
infer_fp16
,
infer_int8
]].
apply
(
lambda
x
:
(
float
(
x
[
infer_int8
])
-
float
(
x
[
infer_fp16
]))
/
float
(
x
[
infer_fp16
]),
axis
=
1
)
return
new_df
def
mkl_perf_analysis
(
raw_df
):
"""
sperate raw dataframe to a list of dataframe
compare mkldnn performance with not enable mkldnn
"""
# filter df by cpu, compare mkl and cpu
# define default dataframe for cpu mkldnn analysis
cpu_df
=
raw_df
.
loc
[
raw_df
[
'runtime_device'
]
==
'cpu'
]
mkl_compare_df
=
cpu_df
.
loc
[
cpu_df
[
'cpu_math_library_num_threads'
]
==
'1'
]
thread_compare_df
=
cpu_df
.
loc
[
cpu_df
[
'enable_mkldnn'
]
==
'True'
]
# define dataframe need to be analyzed
output_mkl_df
=
filter_df_merge
(
mkl_compare_df
,
'enable_mkldnn'
)
output_thread_df
=
filter_df_merge
(
thread_compare_df
,
'cpu_math_library_num_threads'
)
# calculate performance diff percentail
# compare mkl performance with cpu
enable_mkldnn
=
"inference_time(ms)_enable_mkldnn_True"
disable_mkldnn
=
"inference_time(ms)_enable_mkldnn_False"
output_mkl_df
[
"mkl_infer_diff"
]
=
output_mkl_df
[[
enable_mkldnn
,
disable_mkldnn
]].
apply
(
lambda
x
:
(
float
(
x
[
enable_mkldnn
])
-
float
(
x
[
disable_mkldnn
]))
/
float
(
x
[
disable_mkldnn
]),
axis
=
1
)
cpu_enable_mkldnn
=
"cpu_rss(MB)_enable_mkldnn_True"
cpu_disable_mkldnn
=
"cpu_rss(MB)_enable_mkldnn_False"
output_mkl_df
[
"mkl_cpu_rss_diff"
]
=
output_mkl_df
[[
cpu_enable_mkldnn
,
cpu_disable_mkldnn
]].
apply
(
lambda
x
:
(
float
(
x
[
cpu_enable_mkldnn
])
-
float
(
x
[
cpu_disable_mkldnn
]))
/
float
(
x
[
cpu_disable_mkldnn
]),
axis
=
1
)
# compare cpu_multi_thread performance with cpu
num_threads_1
=
"inference_time(ms)_cpu_math_library_num_threads_1"
num_threads_6
=
"inference_time(ms)_cpu_math_library_num_threads_6"
output_thread_df
[
"mkl_infer_diff"
]
=
output_thread_df
[[
num_threads_6
,
num_threads_1
]].
apply
(
lambda
x
:
(
float
(
x
[
num_threads_6
])
-
float
(
x
[
num_threads_1
]))
/
float
(
x
[
num_threads_1
]),
axis
=
1
)
cpu_num_threads_1
=
"cpu_rss(MB)_cpu_math_library_num_threads_1"
cpu_num_threads_6
=
"cpu_rss(MB)_cpu_math_library_num_threads_6"
output_thread_df
[
"mkl_cpu_rss_diff"
]
=
output_thread_df
[[
cpu_num_threads_6
,
cpu_num_threads_1
]].
apply
(
lambda
x
:
(
float
(
x
[
cpu_num_threads_6
])
-
float
(
x
[
cpu_num_threads_1
]))
/
float
(
x
[
cpu_num_threads_1
]),
axis
=
1
)
return
output_mkl_df
,
output_thread_df
def
main
():
"""
main
"""
args
=
parse_args
()
# create empty DataFrame
origin_df
=
pd
.
DataFrame
(
columns
=
[
"model_name"
,
"batch_size"
,
"input_shape"
,
"runtime_device"
,
"ir_optim"
,
"enable_memory_optim"
,
"enable_tensorrt"
,
"precision"
,
"enable_mkldnn"
,
"cpu_math_library_num_threads"
,
"preproce_time(ms)"
,
"inference_time(ms)"
,
"postprocess_time(ms)"
,
"cpu_rss(MB)"
,
"gpu_rss(MB)"
,
"gpu_util"
])
for
file_name
,
full_path
in
find_all_logs
(
args
.
log_path
):
dict_log
=
process_log
(
full_path
)
origin_df
=
origin_df
.
append
(
dict_log
,
ignore_index
=
True
)
raw_df
=
origin_df
.
sort_values
(
by
=
'model_name'
)
raw_df
.
sort_values
(
by
=
[
"model_name"
,
"batch_size"
],
inplace
=
True
)
raw_df
.
to_excel
(
args
.
output_name
)
if
args
.
analysis_trt
:
trt_df
=
trt_perf_analysis
(
raw_df
)
trt_df
.
to_excel
(
"trt_analysis_{}"
.
format
(
args
.
output_name
))
if
args
.
analysis_mkl
:
mkl_df
,
thread_df
=
mkl_perf_analysis
(
raw_df
)
mkl_df
.
to_excel
(
"mkl_enable_analysis_{}"
.
format
(
args
.
output_name
))
thread_df
.
to_excel
(
"mkl_threads_analysis_{}"
.
format
(
args
.
output_name
))
if
__name__
==
"__main__"
:
main
()
deploy/cpp/docs/Jetson_build.md
浏览文件 @
aa9e0d54
...
...
@@ -155,16 +155,19 @@ CUDNN_LIB=/usr/lib/aarch64-linux-gnu/
| 参数 | 说明 |
| ---- | ---- |
| --model_dir | 导出的预测模型所在路径 |
| --image_path | 要预测的图片文件路径 |
| --video_path | 要预测的视频文件路径 |
| --image_file | 要预测的图片文件路径 |
| --image_dir | 要预测的图片文件夹路径 |
| --video_file | 要预测的视频文件路径 |
| --camera_id | Option | 用来预测的摄像头ID,默认为-1(表示不使用摄像头预测)|
| --use_gpu | 是否使用 GPU 预测, 支持值为0或1(默认值为0)|
| --gpu_id | 指定进行推理的GPU device id(默认值为0)|
| --run_mode | 使用GPU时,默认为fluid, 可选(fluid/trt_fp32/trt_fp16/trt_int8)|
| --run_benchmark | 是否重复预测来进行benchmark测速 |
| --output_dir | 输出图片所在的文件夹, 默认为output |
| --use_mkldnn | CPU预测中是否开启MKLDNN加速 |
| --cpu_threads | 设置cpu线程数,默认为1 |
**注意**
:
如果同时设置了
`video_path`
和
`image_path`
,程序仅预测
`video_path
`
。
**注意**
:
优先级顺序:
`camera_id`
>
`video_file`
>
`image_dir`
>
`image_file
`
。
`样例一`
:
...
...
deploy/cpp/docs/linux_build.md
浏览文件 @
aa9e0d54
...
...
@@ -97,16 +97,19 @@ make
| 参数 | 说明 |
| ---- | ---- |
| --model_dir | 导出的预测模型所在路径 |
| --image_path | 要预测的图片文件路径 |
| --video_path | 要预测的视频文件路径 |
| --image_file | 要预测的图片文件路径 |
| --image_dir | 要预测的图片文件夹路径 |
| --video_file | 要预测的视频文件路径 |
| --camera_id | Option | 用来预测的摄像头ID,默认为-1(表示不使用摄像头预测)|
| --use_gpu | 是否使用 GPU 预测, 支持值为0或1(默认值为0)|
| --gpu_id | 指定进行推理的GPU device id(默认值为0)|
| --run_mode | 使用GPU时,默认为fluid, 可选(fluid/trt_fp32/trt_fp16/trt_int8)|
| --run_benchmark | 是否重复预测来进行benchmark测速 |
| --output_dir | 输出图片所在的文件夹, 默认为output |
| --use_mkldnn | CPU预测中是否开启MKLDNN加速 |
| --cpu_threads | 设置cpu线程数,默认为1 |
**注意**
:
如果同时设置了
`video_path`
和
`image_path`
,程序仅预测
`video_path
`
。
**注意**
:
优先级顺序:
`camera_id`
>
`video_file`
>
`image_dir`
>
`image_file
`
。
`样例一`
:
...
...
deploy/cpp/docs/windows_vs2019_build.md
浏览文件 @
aa9e0d54
...
...
@@ -92,17 +92,20 @@ cd D:\projects\PaddleDetection\deploy\cpp\out\build\x64-Release
| 参数 | 说明 |
| ---- | ---- |
| --model_dir | 导出的预测模型所在路径 |
| --image_path | 要预测的图片文件路径 |
| --video_path | 要预测的视频文件路径 |
| --image_file | 要预测的图片文件路径 |
| --image_dir | 要预测的图片文件夹路径 |
| --video_file | 要预测的视频文件路径 |
| --camera_id | Option | 用来预测的摄像头ID,默认为-1(表示不使用摄像头预测)|
| --use_gpu | 是否使用 GPU 预测, 支持值为0或1(默认值为0)|
| --gpu_id | 指定进行推理的GPU device id(默认值为0)|
| --run_mode | 使用GPU时,默认为fluid, 可选(fluid/trt_fp32/trt_fp16/trt_int8)|
| --run_benchmark | 是否重复预测来进行benchmark测速 |
| --output_dir | 输出图片所在的文件夹, 默认为output |
| --use_mkldnn | CPU预测中是否开启MKLDNN加速 |
| --cpu_threads | 设置cpu线程数,默认为1 |
**注意**:
(1)
如果同时设置了`
video_path
`和`
image_path
`,程序仅预测`
video_path
`。
(1)
优先级顺序:`
camera_id
` > `
video_file
` > `
image_dir
` > `
image_file
`。
(2)如果提示找不到`
opencv_world346.dll
`,把`
D:
\p
rojects
\p
ackages
\o
pencv3_4_6
\b
uild
\x
64
\v
c14
\b
in
`文件夹下的`
opencv_world346.dll
`拷贝到`
main.exe
`文件夹下即可。
...
...
deploy/cpp/include/object_detector.h
浏览文件 @
aa9e0d54
...
...
@@ -58,40 +58,46 @@ class ObjectDetector {
public:
explicit
ObjectDetector
(
const
std
::
string
&
model_dir
,
bool
use_gpu
=
false
,
bool
use_mkldnn
=
false
,
int
cpu_threads
=
1
,
const
std
::
string
&
run_mode
=
"fluid"
,
const
int
gpu_id
=
0
,
bool
use_dynamic_shape
=
false
,
const
int
trt_min_shape
=
1
,
const
int
trt_max_shape
=
1280
,
const
int
trt_opt_shape
=
640
)
{
const
int
trt_opt_shape
=
640
,
bool
trt_calib_mode
=
false
)
{
this
->
use_gpu_
=
use_gpu
;
this
->
gpu_id_
=
gpu_id
;
this
->
cpu_math_library_num_threads_
=
cpu_threads
;
this
->
use_mkldnn_
=
use_mkldnn
;
this
->
use_dynamic_shape_
=
use_dynamic_shape
;
this
->
trt_min_shape_
=
trt_min_shape
;
this
->
trt_max_shape_
=
trt_max_shape
;
this
->
trt_opt_shape_
=
trt_opt_shape
;
this
->
trt_calib_mode_
=
trt_calib_mode
;
config_
.
load_config
(
model_dir
);
this
->
min_subgraph_size_
=
config_
.
min_subgraph_size_
;
threshold_
=
config_
.
draw_threshold_
;
image_shape_
=
config_
.
image_shape_
;
preprocessor_
.
Init
(
config_
.
preprocess_info_
,
image_shape_
);
LoadModel
(
model_dir
,
use_gpu
,
config_
.
min_subgraph_size_
,
1
,
run_mode
,
gpu_id
,
use_dynamic_shape
,
trt_min_shape
,
trt_max_shape
,
trt_opt_shape
);
LoadModel
(
model_dir
,
1
,
run_mode
);
}
// Load Paddle inference model
void
LoadModel
(
const
std
::
string
&
model_dir
,
bool
use_gpu
,
const
int
min_subgraph_size
,
const
int
batch_size
=
1
,
const
std
::
string
&
run_mode
=
"fluid"
,
const
int
gpu_id
=
0
,
bool
use_dynamic_shape
=
false
,
const
int
trt_min_shape
=
1
,
const
int
trt_max_shape
=
1280
,
const
int
trt_opt_shape
=
640
);
const
std
::
string
&
run_mode
=
"fluid"
);
// Run predictor
void
Predict
(
const
cv
::
Mat
&
im
,
const
double
threshold
=
0.5
,
const
int
warmup
=
0
,
const
int
repeats
=
1
,
const
bool
run_benchmark
=
false
,
std
::
vector
<
ObjectResult
>*
result
=
nullptr
);
std
::
vector
<
ObjectResult
>*
result
=
nullptr
,
std
::
vector
<
double
>*
times
=
nullptr
);
// Get Model Label list
const
std
::
vector
<
std
::
string
>&
GetLabelList
()
const
{
...
...
@@ -99,6 +105,16 @@ class ObjectDetector {
}
private:
bool
use_gpu_
=
false
;
int
gpu_id_
=
0
;
int
cpu_math_library_num_threads_
=
1
;
bool
use_mkldnn_
=
false
;
int
min_subgraph_size_
=
3
;
bool
use_dynamic_shape_
=
false
;
int
trt_min_shape_
=
1
;
int
trt_max_shape_
=
1280
;
int
trt_opt_shape_
=
640
;
bool
trt_calib_mode_
=
false
;
// Preprocess image and copy data to input buffer
void
Preprocess
(
const
cv
::
Mat
&
image_mat
);
// Postprocess result
...
...
deploy/cpp/src/main.cc
浏览文件 @
aa9e0d54
...
...
@@ -14,9 +14,11 @@
#include <glog/logging.h>
#include <dirent.h>
#include <iostream>
#include <string>
#include <vector>
#include <numeric>
#include <sys/types.h>
#include <sys/stat.h>
...
...
@@ -33,20 +35,54 @@
DEFINE_string
(
model_dir
,
""
,
"Path of inference model"
);
DEFINE_string
(
image_path
,
""
,
"Path of input image"
);
DEFINE_string
(
video_path
,
""
,
"Path of input video"
);
DEFINE_string
(
image_file
,
""
,
"Path of input image"
);
DEFINE_string
(
image_dir
,
""
,
"Dir of input image, `image_file` has a higher priority."
);
DEFINE_string
(
video_file
,
""
,
"Path of input video, `video_file` or `camera_id` has a highest priority."
);
DEFINE_int32
(
camera_id
,
-
1
,
"Device id of camera to predict"
);
DEFINE_bool
(
use_gpu
,
false
,
"Infering with GPU or CPU"
);
DEFINE_bool
(
use_camera
,
false
,
"Use camera or not"
);
DEFINE_double
(
threshold
,
0.5
,
"Threshold of score."
);
DEFINE_string
(
output_dir
,
"output"
,
"Directory of output visualization files."
);
DEFINE_string
(
run_mode
,
"fluid"
,
"Mode of running(fluid/trt_fp32/trt_fp16/trt_int8)"
);
DEFINE_int32
(
gpu_id
,
0
,
"Device id of GPU to execute"
);
DEFINE_int32
(
camera_id
,
-
1
,
"Device id of camera to predict"
);
DEFINE_bool
(
run_benchmark
,
false
,
"Whether to predict a image_file repeatedly for benchmark"
);
DEFINE_
double
(
threshold
,
0.5
,
"Threshold of score.
"
);
DEFINE_
string
(
output_dir
,
"output"
,
"Directory of output visualization files.
"
);
DEFINE_
bool
(
use_mkldnn
,
false
,
"Whether use mkldnn with CPU
"
);
DEFINE_
int32
(
cpu_threads
,
1
,
"Num of threads with CPU
"
);
DEFINE_bool
(
use_dynamic_shape
,
false
,
"Trt use dynamic shape or not"
);
DEFINE_int32
(
trt_min_shape
,
1
,
"Min shape of TRT DynamicShapeI"
);
DEFINE_int32
(
trt_max_shape
,
1280
,
"Max shape of TRT DynamicShapeI"
);
DEFINE_int32
(
trt_opt_shape
,
640
,
"Opt shape of TRT DynamicShapeI"
);
DEFINE_bool
(
trt_calib_mode
,
false
,
"If the model is produced by TRT offline quantitative calibration, trt_calib_mode need to set True"
);
void
PrintBenchmarkLog
(
std
::
vector
<
double
>
det_time
,
int
img_num
){
LOG
(
INFO
)
<<
"----------------------- Config info -----------------------"
;
LOG
(
INFO
)
<<
"runtime_device: "
<<
(
FLAGS_use_gpu
?
"gpu"
:
"cpu"
);
LOG
(
INFO
)
<<
"ir_optim: "
<<
"True"
;
LOG
(
INFO
)
<<
"enable_memory_optim: "
<<
"True"
;
int
has_trt
=
FLAGS_run_mode
.
find
(
"trt"
);
if
(
has_trt
>=
0
)
{
LOG
(
INFO
)
<<
"enable_tensorrt: "
<<
"True"
;
std
::
string
precision
=
FLAGS_run_mode
.
substr
(
4
,
8
);
LOG
(
INFO
)
<<
"precision: "
<<
precision
;
}
else
{
LOG
(
INFO
)
<<
"enable_tensorrt: "
<<
"False"
;
LOG
(
INFO
)
<<
"precision: "
<<
"fp32"
;
}
LOG
(
INFO
)
<<
"enable_mkldnn: "
<<
(
FLAGS_use_mkldnn
?
"True"
:
"False"
);
LOG
(
INFO
)
<<
"cpu_math_library_num_threads: "
<<
FLAGS_cpu_threads
;
LOG
(
INFO
)
<<
"----------------------- Data info -----------------------"
;
LOG
(
INFO
)
<<
"batch_size: "
<<
1
;
LOG
(
INFO
)
<<
"input_shape: "
<<
"dynamic shape"
;
LOG
(
INFO
)
<<
"----------------------- Model info -----------------------"
;
FLAGS_model_dir
.
erase
(
FLAGS_model_dir
.
find_last_not_of
(
"/"
)
+
1
);
LOG
(
INFO
)
<<
"model_name: "
<<
FLAGS_model_dir
.
substr
(
FLAGS_model_dir
.
find_last_of
(
'/'
)
+
1
);
LOG
(
INFO
)
<<
"----------------------- Perf info ------------------------"
;
LOG
(
INFO
)
<<
"Total number of predicted data: "
<<
img_num
<<
" and total time spent(s): "
<<
std
::
accumulate
(
det_time
.
begin
(),
det_time
.
end
(),
0
);
LOG
(
INFO
)
<<
"preproce_time(ms): "
<<
det_time
[
0
]
/
img_num
<<
", inference_time(ms): "
<<
det_time
[
1
]
/
img_num
<<
", postprocess_time(ms): "
<<
det_time
[
2
];
}
static
std
::
string
DirName
(
const
std
::
string
&
filepath
)
{
auto
pos
=
filepath
.
rfind
(
OS_PATH_SEP
);
...
...
@@ -89,6 +125,37 @@ static void MkDirs(const std::string& path) {
MkDir
(
path
);
}
void
GetAllFiles
(
const
char
*
dir_name
,
std
::
vector
<
std
::
string
>
&
all_inputs
)
{
if
(
NULL
==
dir_name
)
{
std
::
cout
<<
" dir_name is null ! "
<<
std
::
endl
;
return
;
}
struct
stat
s
;
lstat
(
dir_name
,
&
s
);
if
(
!
S_ISDIR
(
s
.
st_mode
))
{
std
::
cout
<<
"dir_name is not a valid directory !"
<<
std
::
endl
;
all_inputs
.
push_back
(
dir_name
);
return
;
}
else
{
struct
dirent
*
filename
;
// return value for readdir()
DIR
*
dir
;
// return value for opendir()
dir
=
opendir
(
dir_name
);
if
(
NULL
==
dir
)
{
std
::
cout
<<
"Can not open dir "
<<
dir_name
<<
std
::
endl
;
return
;
}
std
::
cout
<<
"Successfully opened the dir !"
<<
std
::
endl
;
while
((
filename
=
readdir
(
dir
))
!=
NULL
)
{
if
(
strcmp
(
filename
->
d_name
,
"."
)
==
0
||
strcmp
(
filename
->
d_name
,
".."
)
==
0
)
continue
;
all_inputs
.
push_back
(
dir_name
+
std
::
string
(
"/"
)
+
std
::
string
(
filename
->
d_name
));
}
}
}
void
PredictVideo
(
const
std
::
string
&
video_path
,
PaddleDetection
::
ObjectDetector
*
det
)
{
// Open video
...
...
@@ -122,6 +189,7 @@ void PredictVideo(const std::string& video_path,
}
std
::
vector
<
PaddleDetection
::
ObjectResult
>
result
;
std
::
vector
<
double
>
det_times
;
auto
labels
=
det
->
GetLabelList
();
auto
colormap
=
PaddleDetection
::
GenerateColorMap
(
labels
.
size
());
// Capture all frames and do inference
...
...
@@ -131,7 +199,7 @@ void PredictVideo(const std::string& video_path,
if
(
frame
.
empty
())
{
break
;
}
det
->
Predict
(
frame
,
0.5
,
0
,
1
,
false
,
&
result
);
det
->
Predict
(
frame
,
0.5
,
0
,
1
,
&
result
,
&
det_times
);
cv
::
Mat
out_im
=
PaddleDetection
::
VisualizeResult
(
frame
,
result
,
labels
,
colormap
);
for
(
const
auto
&
item
:
result
)
{
...
...
@@ -151,21 +219,22 @@ void PredictVideo(const std::string& video_path,
video_out
.
release
();
}
void
PredictImage
(
const
std
::
string
&
image_path
,
void
PredictImage
(
const
std
::
vector
<
std
::
string
>
all_img_list
,
const
double
threshold
,
const
bool
run_benchmark
,
PaddleDetection
::
ObjectDetector
*
det
,
const
std
::
string
&
output_dir
=
"output"
)
{
std
::
vector
<
double
>
det_t
=
{
0
,
0
,
0
};
for
(
auto
image_file
:
all_img_list
)
{
// Open input image as an opencv cv::Mat object
cv
::
Mat
im
=
cv
::
imread
(
image_path
,
1
);
cv
::
Mat
im
=
cv
::
imread
(
image_file
,
1
);
// Store all detected result
std
::
vector
<
PaddleDetection
::
ObjectResult
>
result
;
if
(
run_benchmark
)
{
det
->
Predict
(
im
,
threshold
,
100
,
100
,
run_benchmark
,
&
result
);
}
else
{
det
->
Predict
(
im
,
0.5
,
0
,
1
,
run_benchmark
,
&
result
);
std
::
vector
<
double
>
det_times
;
if
(
run_benchmark
)
{
det
->
Predict
(
im
,
threshold
,
10
,
10
,
&
result
,
&
det_times
);
}
else
{
det
->
Predict
(
im
,
0.5
,
0
,
1
,
&
result
,
&
det_times
);
for
(
const
auto
&
item
:
result
)
{
printf
(
"class=%d confidence=%.4f rect=[%d %d %d %d]
\n
"
,
item
.
class_id
,
...
...
@@ -187,19 +256,25 @@ void PredictImage(const std::string& image_path,
if
(
output_dir
.
rfind
(
OS_PATH_SEP
)
!=
output_dir
.
size
()
-
1
)
{
output_path
+=
OS_PATH_SEP
;
}
output_path
+=
"output.jpg"
;
;
output_path
+=
image_file
.
substr
(
image_file
.
find_last_of
(
'/'
)
+
1
);
cv
::
imwrite
(
output_path
,
vis_img
,
compression_params
);
printf
(
"Visualized output saved as %s
\n
"
,
output_path
.
c_str
());
}
det_t
[
0
]
+=
det_times
[
0
];
det_t
[
1
]
+=
det_times
[
1
];
det_t
[
2
]
+=
det_times
[
2
];
}
PrintBenchmarkLog
(
det_t
,
all_img_list
.
size
());
}
int
main
(
int
argc
,
char
**
argv
)
{
// Parsing command-line
google
::
ParseCommandLineFlags
(
&
argc
,
&
argv
,
true
);
if
(
FLAGS_model_dir
.
empty
()
||
(
FLAGS_image_
path
.
empty
()
&&
FLAGS_video_path
.
empty
()))
{
||
(
FLAGS_image_
file
.
empty
()
&&
FLAGS_image_dir
.
empty
()
&&
FLAGS_video_file
.
empty
()))
{
std
::
cout
<<
"Usage: ./main --model_dir=/PATH/TO/INFERENCE_MODEL/ "
<<
"--image_
path
=/PATH/TO/INPUT/IMAGE/"
<<
std
::
endl
;
<<
"--image_
file
=/PATH/TO/INPUT/IMAGE/"
<<
std
::
endl
;
return
-
1
;
}
if
(
!
(
FLAGS_run_mode
==
"fluid"
||
FLAGS_run_mode
==
"trt_fp32"
...
...
@@ -208,17 +283,23 @@ int main(int argc, char** argv) {
return
-
1
;
}
// Load model and create a object detector
PaddleDetection
::
ObjectDetector
det
(
FLAGS_model_dir
,
FLAGS_use_gpu
,
FLAGS_
run_mode
,
FLAGS_
gpu_id
,
FLAGS_use_dynamic_shape
,
FLAGS_trt_min
_shape
,
FLAGS_trt_m
ax_shape
,
FLAGS_trt_opt_shap
e
);
PaddleDetection
::
ObjectDetector
det
(
FLAGS_model_dir
,
FLAGS_use_gpu
,
FLAGS_
use_mkldnn
,
FLAGS_
cpu_threads
,
FLAGS_run_mode
,
FLAGS_gpu_id
,
FLAGS_use_dynamic
_shape
,
FLAGS_trt_m
in_shape
,
FLAGS_trt_max_shape
,
FLAGS_trt_opt_shape
,
FLAGS_trt_calib_mod
e
);
// Do inference on input video or image
if
(
!
FLAGS_video_
path
.
empty
()
||
FLAGS_use_camera
)
{
PredictVideo
(
FLAGS_video_
path
,
&
det
);
}
else
if
(
!
FLAGS_image_
path
.
empty
())
{
if
(
!
FLAGS_video_
file
.
empty
()
||
FLAGS_camera_id
!=
-
1
)
{
PredictVideo
(
FLAGS_video_
file
,
&
det
);
}
else
if
(
!
FLAGS_image_
file
.
empty
()
||
!
FLAGS_image_dir
.
empty
())
{
if
(
!
PathExists
(
FLAGS_output_dir
))
{
MkDirs
(
FLAGS_output_dir
);
}
PredictImage
(
FLAGS_image_path
,
FLAGS_threshold
,
FLAGS_run_benchmark
,
&
det
,
FLAGS_output_dir
);
std
::
vector
<
std
::
string
>
all_img_list
;
if
(
!
FLAGS_image_file
.
empty
())
{
all_img_list
.
push_back
(
FLAGS_image_file
);
}
else
{
GetAllFiles
((
char
*
)
FLAGS_image_dir
.
c_str
(),
all_img_list
);
}
PredictImage
(
all_img_list
,
FLAGS_threshold
,
FLAGS_run_benchmark
,
&
det
,
FLAGS_output_dir
);
}
return
0
;
}
deploy/cpp/src/object_detector.cc
浏览文件 @
aa9e0d54
...
...
@@ -24,24 +24,16 @@ namespace PaddleDetection {
// Load Model and create model predictor
void
ObjectDetector
::
LoadModel
(
const
std
::
string
&
model_dir
,
bool
use_gpu
,
const
int
min_subgraph_size
,
const
int
batch_size
,
const
std
::
string
&
run_mode
,
const
int
gpu_id
,
bool
use_dynamic_shape
,
const
int
trt_min_shape
,
const
int
trt_max_shape
,
const
int
trt_opt_shape
)
{
const
std
::
string
&
run_mode
)
{
paddle_infer
::
Config
config
;
std
::
string
prog_file
=
model_dir
+
OS_PATH_SEP
+
"model.pdmodel"
;
std
::
string
params_file
=
model_dir
+
OS_PATH_SEP
+
"model.pdiparams"
;
config
.
SetModel
(
prog_file
,
params_file
);
if
(
use_gpu
)
{
config
.
EnableUseGpu
(
200
,
gpu_id
);
if
(
this
->
use_gpu_
)
{
config
.
EnableUseGpu
(
200
,
this
->
gpu_id_
);
config
.
SwitchIrOptim
(
true
);
// use tensorrt
bool
use_calib_mode
=
false
;
if
(
run_mode
!=
"fluid"
)
{
auto
precision
=
paddle_infer
::
Config
::
Precision
::
kFloat32
;
if
(
run_mode
==
"trt_fp32"
)
{
...
...
@@ -52,7 +44,6 @@ void ObjectDetector::LoadModel(const std::string& model_dir,
}
else
if
(
run_mode
==
"trt_int8"
)
{
precision
=
paddle_infer
::
Config
::
Precision
::
kInt8
;
use_calib_mode
=
true
;
}
else
{
printf
(
"run_mode should be 'fluid', 'trt_fp32', 'trt_fp16' or 'trt_int8'"
);
}
...
...
@@ -60,17 +51,17 @@ void ObjectDetector::LoadModel(const std::string& model_dir,
config
.
EnableTensorRtEngine
(
1
<<
30
,
batch_size
,
min_subgraph_size
,
this
->
min_subgraph_size_
,
precision
,
false
,
use_calib_mode
);
this
->
trt_calib_mode_
);
// set use dynamic shape
if
(
use_dynamic_shape
)
{
if
(
this
->
use_dynamic_shape_
)
{
// set DynamicShsape for image tensor
const
std
::
vector
<
int
>
min_input_shape
=
{
1
,
3
,
t
rt_min_shape
,
trt_min_shape
};
const
std
::
vector
<
int
>
max_input_shape
=
{
1
,
3
,
t
rt_max_shape
,
trt_max_shape
};
const
std
::
vector
<
int
>
opt_input_shape
=
{
1
,
3
,
t
rt_opt_shape
,
trt_opt_shape
};
const
std
::
vector
<
int
>
min_input_shape
=
{
1
,
3
,
t
his
->
trt_min_shape_
,
this
->
trt_min_shape_
};
const
std
::
vector
<
int
>
max_input_shape
=
{
1
,
3
,
t
his
->
trt_max_shape_
,
this
->
trt_max_shape_
};
const
std
::
vector
<
int
>
opt_input_shape
=
{
1
,
3
,
t
his
->
trt_opt_shape_
,
this
->
trt_opt_shape_
};
const
std
::
map
<
std
::
string
,
std
::
vector
<
int
>>
map_min_input_shape
=
{{
"image"
,
min_input_shape
}};
const
std
::
map
<
std
::
string
,
std
::
vector
<
int
>>
map_max_input_shape
=
{{
"image"
,
max_input_shape
}};
const
std
::
map
<
std
::
string
,
std
::
vector
<
int
>>
map_opt_input_shape
=
{{
"image"
,
opt_input_shape
}};
...
...
@@ -84,8 +75,15 @@ void ObjectDetector::LoadModel(const std::string& model_dir,
}
else
{
config
.
DisableGpu
();
if
(
this
->
use_mkldnn_
)
{
config
.
EnableMKLDNN
();
// cache 10 different shapes for mkldnn to avoid memory leak
config
.
SetMkldnnCacheCapacity
(
10
);
}
config
.
SetCpuMathLibraryNumThreads
(
this
->
cpu_math_library_num_threads_
);
}
config
.
SwitchUseFeedFetchOps
(
false
);
config
.
SwitchIrOptim
(
true
);
config
.
DisableGlogInfo
();
// Memory optimization
config
.
EnableMemoryOptim
();
...
...
@@ -189,8 +187,9 @@ void ObjectDetector::Predict(const cv::Mat& im,
const
double
threshold
,
const
int
warmup
,
const
int
repeats
,
const
bool
run_benchmark
,
std
::
vector
<
ObjectResult
>*
result
)
{
std
::
vector
<
ObjectResult
>*
result
,
std
::
vector
<
double
>*
times
)
{
auto
preprocess_start
=
std
::
chrono
::
steady_clock
::
now
();
// Preprocess image
Preprocess
(
im
);
// Prepare input tensor
...
...
@@ -210,6 +209,7 @@ void ObjectDetector::Predict(const cv::Mat& im,
in_tensor
->
CopyFromCpu
(
inputs_
.
scale_factor_
.
data
());
}
}
auto
preprocess_end
=
std
::
chrono
::
steady_clock
::
now
();
// Run predictor
for
(
int
i
=
0
;
i
<
warmup
;
i
++
)
{
...
...
@@ -231,7 +231,7 @@ void ObjectDetector::Predict(const cv::Mat& im,
out_tensor
->
CopyToCpu
(
output_data_
.
data
());
}
auto
start
=
std
::
chrono
::
steady_clock
::
now
();
auto
inference_
start
=
std
::
chrono
::
steady_clock
::
now
();
for
(
int
i
=
0
;
i
<
repeats
;
i
++
)
{
predictor_
->
Run
();
...
...
@@ -251,14 +251,18 @@ void ObjectDetector::Predict(const cv::Mat& im,
output_data_
.
resize
(
output_size
);
out_tensor
->
CopyToCpu
(
output_data_
.
data
());
}
auto
end
=
std
::
chrono
::
steady_clock
::
now
();
std
::
chrono
::
duration
<
float
>
diff
=
end
-
start
;
float
ms
=
diff
.
count
()
/
repeats
*
1000
;
printf
(
"Inference: %f ms per batch image
\n
"
,
ms
);
auto
inference_end
=
std
::
chrono
::
steady_clock
::
now
();
auto
postprocess_start
=
std
::
chrono
::
steady_clock
::
now
();
// Postprocessing result
if
(
!
run_benchmark
)
{
Postprocess
(
im
,
result
);
}
auto
postprocess_end
=
std
::
chrono
::
steady_clock
::
now
();
std
::
chrono
::
duration
<
float
>
preprocess_diff
=
preprocess_end
-
preprocess_start
;
times
->
push_back
(
double
(
preprocess_diff
.
count
()
*
1000
));
std
::
chrono
::
duration
<
float
>
inference_diff
=
inference_end
-
inference_start
;
times
->
push_back
(
double
(
inference_diff
.
count
()
/
repeats
*
1000
));
std
::
chrono
::
duration
<
float
>
postprocess_diff
=
postprocess_end
-
postprocess_start
;
times
->
push_back
(
double
(
postprocess_diff
.
count
()
*
1000
));
}
std
::
vector
<
int
>
GenerateColorMap
(
int
num_class
)
{
...
...
deploy/python/README.md
浏览文件 @
aa9e0d54
# Python端预测部署
Python预测可以使用
`tools/infer.py`
,此种方式依赖PaddleDetection源码;也可以使用本篇教程预测方式,先将模型导出,使用一个独立的文件进行预测。
本篇教程使用AnalysisPredictor对
[
导出模型
](
https://github.com/PaddlePaddle/PaddleDetection/blob/develop/deploy/EXPORT_MODEL.md
)
进行高性能预测。
在PaddlePaddle中预测引擎和训练引擎底层有着不同的优化方法, 预测引擎使用了AnalysisPredictor,专门针对推理进行了优化,是基于
[
C++预测库
](
https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/native_infer.html
)
的Python接口,该引擎可以对模型进行多项图优化,减少不必要的内存拷贝。如果用户在部署已训练模型的过程中对性能有较高的要求,我们提供了独立于PaddleDetection的预测脚本,方便用户直接集成部署。
主要包含两个步骤:
-
导出预测模型
-
基于Python
的
预测
-
基于Python
进行
预测
## 1. 导出预测模型
...
...
@@ -22,16 +17,14 @@ PaddleDetection在训练过程包括网络的前向和优化器相关参数,
## 2. 基于python的预测
### 2.1 安装依赖
-
`PaddlePaddle`
的安装:
请点击
[
官方安装文档
](
https://paddlepaddle.org.cn/install/quick
)
选择适合的方式,版本为2.0rc1以上即可
-
切换到
`PaddleDetection`
代码库根目录,执行
`pip install -r requirements.txt`
安装其它依赖
-
`PaddlePaddle`
的安装: 请点击
[
官方安装文档
](
https://paddlepaddle.org.cn/install/quick
)
选择适合的版本进行安装,要求PaddlePaddle>=2.0.1以上。
-
切换到
`PaddleDetection`
代码库根目录,执行
`pip install -r requirements.txt`
安装其它依赖。
### 2.2 执行预测程序
在终端输入以下命令进行预测:
```
bash
python deploy/python/infer.py
--model_dir
=
/path/to/models
--image_file
=
/path/to/image
--use_gpu
=(
False/True
)
python deploy/python/infer.py
--model_dir
=
/path/to/models
--image_file
=
/path/to/image
--use_gpu
=(
False/True
)
```
参数说明如下:
...
...
@@ -40,41 +33,19 @@ python deploy/python/infer.py --model_dir=/path/to/models --image_file=/path/to/
|-------|-------|----------|
| --model_dir | Yes|上述导出的模型路径 |
| --image_file | Option |需要预测的图片 |
| --image_dir | Option | 要预测的图片文件夹路径 |
| --video_file | Option |需要预测的视频 |
| --camera_id | Option | 用来预测的摄像头ID,默认为-1(表示不使用摄像头预测,可设置为:0 - (摄像头数目-1) ),预测过程中在可视化界面按
`q`
退出输出预测结果到:output/output.mp4|
| --use_gpu |No|是否GPU,默认为False|
| --run_mode |No|使用GPU时,默认为fluid, 可选(fluid/trt_fp32/trt_fp16/trt_int8)|
| --threshold |No|预测得分的阈值,默认为0.5|
| --output_dir |No|可视化结果保存的根目录,默认为output/|
| --run_benchmark |No|是否运行benchmark,同时需指定--image_file|
| --use_gpu | No |是否GPU,默认为False|
| --run_mode | No |使用GPU时,默认为fluid, 可选(fluid/trt_fp32/trt_fp16/trt_int8)|
| --threshold | No|预测得分的阈值,默认为0.5|
| --output_dir | No|可视化结果保存的根目录,默认为output/|
| --run_benchmark | No| 是否运行benchmark,同时需指定
`--image_file`
或
`--image_dir`
|
| --use_mkldnn | No | CPU预测中是否开启MKLDNN加速 |
| --cpu_threads | No| 设置cpu线程数,默认为1 |
说明:
-
参数优先级顺序:
`camera_id`
>
`video_file`
>
`image_dir`
>
`image_file`
。
-
run_mode:fluid代表使用AnalysisPredictor,精度float32来推理,其他参数指用AnalysisPredictor,TensorRT不同精度来推理。
-
PaddlePaddle默认的GPU安装包(<=1.7),不支持基于TensorRT进行预测,如果想基于TensorRT加速预测,需要自行编译,详细可参考
[
预测库编译教程
](
https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_usage/deploy/inference/paddle_tensorrt_infer.html
)
。
## 3. 部署性能对比测试
对比AnalysisPredictor相对Executor的推理速度
### 3.1 测试环境:
-
CUDA 9.0
-
CUDNN 7.5
-
PaddlePaddle 1.71
-
GPU: Tesla P40
### 3.2 测试方式:
-
Batch Size=1
-
去掉前100轮warmup时间,测试100轮的平均时间,单位ms/image,只计算模型运行时间,不包括数据的处理和拷贝。
### 3.3 测试结果
|模型 | AnalysisPredictor | Executor | 输入|
|---|----|---|---|
| YOLOv3-MobileNetv1 | 15.20 | 19.54 | 608
*
608
| faster_rcnn_r50_fpn_1x | 50.05 | 69.58 |800
*
1088
| faster_rcnn_r50_1x | 326.11 | 347.22 | 800
*
1067
| mask_rcnn_r50_fpn_1x | 67.49 | 91.02 | 800
*
1088
| mask_rcnn_r50_1x | 326.11 | 350.94 | 800
*
1067
-
如果安装的PaddlePaddle不支持基于TensorRT进行预测,需要自行编译,详细可参考
[
预测库编译教程
](
https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_usage/deploy/inference/paddle_tensorrt_infer.html
)
。
deploy/python/infer.py
浏览文件 @
aa9e0d54
...
...
@@ -13,21 +13,22 @@
# limitations under the License.
import
os
import
argparse
import
time
import
yaml
import
ast
import
glob
from
functools
import
reduce
from
PIL
import
Image
import
cv2
import
numpy
as
np
import
paddle
from
preprocess
import
preprocess
,
Resize
,
NormalizeImage
,
Permute
,
PadStride
from
visualize
import
visualize_box_mask
from
paddle.inference
import
Config
from
paddle.inference
import
create_predictor
from
preprocess
import
preprocess
,
Resize
,
NormalizeImage
,
Permute
,
PadStride
from
visualize
import
visualize_box_mask
from
utils
import
argsparser
,
Timer
,
get_current_memory_mb
,
LoggerHelper
# Global dictionary
SUPPORT_MODELS
=
{
'YOLO'
,
...
...
@@ -63,7 +64,7 @@ class Detector(object):
trt_min_shape
=
1
,
trt_max_shape
=
1280
,
trt_opt_shape
=
640
,
t
hreshold
=
0.5
):
t
rt_calib_mode
=
False
):
self
.
pred_config
=
pred_config
self
.
predictor
=
load_predictor
(
model_dir
,
...
...
@@ -73,7 +74,10 @@ class Detector(object):
use_dynamic_shape
=
use_dynamic_shape
,
trt_min_shape
=
trt_min_shape
,
trt_max_shape
=
trt_max_shape
,
trt_opt_shape
=
trt_opt_shape
)
trt_opt_shape
=
trt_opt_shape
,
trt_calib_mode
=
trt_calib_mode
)
self
.
det_times
=
Timer
()
self
.
cpu_mem
,
self
.
gpu_mem
,
self
.
gpu_util
=
0
,
0
,
0
def
preprocess
(
self
,
im
):
preprocess_ops
=
[]
...
...
@@ -102,12 +106,7 @@ class Detector(object):
results
[
'masks'
]
=
np_masks
return
results
def
predict
(
self
,
image
,
threshold
=
0.5
,
warmup
=
0
,
repeats
=
1
,
run_benchmark
=
False
):
def
predict
(
self
,
image
,
threshold
=
0.5
,
warmup
=
0
,
repeats
=
1
):
'''
Args:
image (str/np.ndarray): path of image/ np.ndarray read by cv2
...
...
@@ -118,13 +117,14 @@ class Detector(object):
MaskRCNN's results include 'masks': np.ndarray:
shape: [N, im_h, im_w]
'''
self
.
det_times
.
preprocess_time
.
start
()
inputs
=
self
.
preprocess
(
image
)
np_boxes
,
np_masks
=
None
,
None
input_names
=
self
.
predictor
.
get_input_names
()
for
i
in
range
(
len
(
input_names
)):
input_tensor
=
self
.
predictor
.
get_input_handle
(
input_names
[
i
])
input_tensor
.
copy_from_cpu
(
inputs
[
input_names
[
i
]])
self
.
det_times
.
preprocess_time
.
end
()
for
i
in
range
(
warmup
):
self
.
predictor
.
run
()
output_names
=
self
.
predictor
.
get_output_names
()
...
...
@@ -134,7 +134,7 @@ class Detector(object):
masks_tensor
=
self
.
predictor
.
get_output_handle
(
output_names
[
2
])
np_masks
=
masks_tensor
.
copy_to_cpu
()
t1
=
time
.
time
()
self
.
det_times
.
inference_time
.
start
()
for
i
in
range
(
repeats
):
self
.
predictor
.
run
()
output_names
=
self
.
predictor
.
get_output_names
()
...
...
@@ -143,20 +143,18 @@ class Detector(object):
if
self
.
pred_config
.
mask
:
masks_tensor
=
self
.
predictor
.
get_output_handle
(
output_names
[
2
])
np_masks
=
masks_tensor
.
copy_to_cpu
()
t2
=
time
.
time
()
ms
=
(
t2
-
t1
)
*
1000.0
/
repeats
print
(
"Inference: {} ms per batch image"
.
format
(
ms
))
self
.
det_times
.
inference_time
.
end
(
repeats
=
repeats
)
# do not perform postprocess in benchmark mode
self
.
det_times
.
postprocess_time
.
start
()
results
=
[]
if
not
run_benchmark
:
if
reduce
(
lambda
x
,
y
:
x
*
y
,
np_boxes
.
shape
)
<
6
:
print
(
'[WARNNING] No object detected.'
)
results
=
{
'boxes'
:
np
.
array
([])}
else
:
results
=
self
.
postprocess
(
np_boxes
,
np_masks
,
inputs
,
threshold
=
threshold
)
self
.
det_times
.
postprocess_time
.
end
()
self
.
det_times
.
img_num
+=
1
return
results
...
...
@@ -183,7 +181,7 @@ class DetectorSOLOv2(Detector):
trt_min_shape
=
1
,
trt_max_shape
=
1280
,
trt_opt_shape
=
640
,
t
hreshold
=
0.5
):
t
rt_calib_mode
=
False
):
self
.
pred_config
=
pred_config
self
.
predictor
=
load_predictor
(
model_dir
,
...
...
@@ -193,14 +191,11 @@ class DetectorSOLOv2(Detector):
use_dynamic_shape
=
use_dynamic_shape
,
trt_min_shape
=
trt_min_shape
,
trt_max_shape
=
trt_max_shape
,
trt_opt_shape
=
trt_opt_shape
)
def
predict
(
self
,
image
,
threshold
=
0.5
,
warmup
=
0
,
repeats
=
1
,
run_benchmark
=
False
):
trt_opt_shape
=
trt_opt_shape
,
trt_calib_mode
=
trt_calib_mode
)
self
.
det_times
=
Timer
()
def
predict
(
self
,
image
,
threshold
=
0.5
,
warmup
=
0
,
repeats
=
1
):
'''
Args:
image (str/np.ndarray): path of image/ np.ndarray read by cv2
...
...
@@ -210,13 +205,14 @@ class DetectorSOLOv2(Detector):
'cate_label': label of segm, shape:[N]
'cate_score': confidence score of segm, shape:[N]
'''
self
.
det_times
.
preprocess_time
.
start
()
inputs
=
self
.
preprocess
(
image
)
np_label
,
np_score
,
np_segms
=
None
,
None
,
None
input_names
=
self
.
predictor
.
get_input_names
()
for
i
in
range
(
len
(
input_names
)):
input_tensor
=
self
.
predictor
.
get_input_handle
(
input_names
[
i
])
input_tensor
.
copy_from_cpu
(
inputs
[
input_names
[
i
]])
self
.
det_times
.
preprocess_time
.
end
()
for
i
in
range
(
warmup
):
self
.
predictor
.
run
()
output_names
=
self
.
predictor
.
get_output_names
()
...
...
@@ -227,7 +223,7 @@ class DetectorSOLOv2(Detector):
np_segms
=
self
.
predictor
.
get_output_handle
(
output_names
[
3
]).
copy_to_cpu
()
t1
=
time
.
time
()
self
.
det_times
.
inference_time
.
start
()
for
i
in
range
(
repeats
):
self
.
predictor
.
run
()
output_names
=
self
.
predictor
.
get_output_names
()
...
...
@@ -237,15 +233,10 @@ class DetectorSOLOv2(Detector):
2
]).
copy_to_cpu
()
np_segms
=
self
.
predictor
.
get_output_handle
(
output_names
[
3
]).
copy_to_cpu
()
t2
=
time
.
time
()
ms
=
(
t2
-
t1
)
*
1000.0
/
repeats
print
(
"Inference: {} ms per batch image"
.
format
(
ms
))
self
.
det_times
.
inference_time
.
end
(
repeats
=
repeats
)
self
.
det_times
.
img_num
+=
1
# do not perform postprocess in benchmark mode
results
=
[]
if
not
run_benchmark
:
return
dict
(
segm
=
np_segms
,
label
=
np_label
,
score
=
np_score
)
return
results
def
create_inputs
(
im
,
im_info
):
...
...
@@ -316,7 +307,8 @@ def load_predictor(model_dir,
use_dynamic_shape
=
False
,
trt_min_shape
=
1
,
trt_max_shape
=
1280
,
trt_opt_shape
=
640
):
trt_opt_shape
=
640
,
trt_calib_mode
=
False
):
"""set AnalysisConfig, generate AnalysisPredictor
Args:
model_dir (str): root path of __model__ and __params__
...
...
@@ -326,6 +318,8 @@ def load_predictor(model_dir,
trt_min_shape (int): min shape for dynamic shape in trt
trt_max_shape (int): max shape for dynamic shape in trt
trt_opt_shape (int): opt shape for dynamic shape in trt
trt_calib_mode (bool): If the model is produced by TRT offline quantitative
calibration, trt_calib_mode need to set True
Returns:
predictor (PaddlePredictor): AnalysisPredictor
Raises:
...
...
@@ -335,7 +329,6 @@ def load_predictor(model_dir,
raise
ValueError
(
"Predict by TensorRT mode: {}, expect use_gpu==True, but use_gpu == {}"
.
format
(
run_mode
,
use_gpu
))
use_calib_mode
=
True
if
run_mode
==
'trt_int8'
else
False
config
=
Config
(
os
.
path
.
join
(
model_dir
,
'model.pdmodel'
),
os
.
path
.
join
(
model_dir
,
'model.pdiparams'
))
...
...
@@ -351,6 +344,17 @@ def load_predictor(model_dir,
config
.
switch_ir_optim
(
True
)
else
:
config
.
disable_gpu
()
config
.
set_cpu_math_library_num_threads
(
FLAGS
.
cpu_threads
)
if
FLAGS
.
enable_mkldnn
:
try
:
# cache 10 different shapes for mkldnn to avoid memory leak
config
.
set_mkldnn_cache_capacity
(
10
)
config
.
enable_mkldnn
()
except
Exception
as
e
:
print
(
"The current environment does not support `mkldnn`, so disable mkldnn."
)
pass
if
run_mode
in
precision_map
.
keys
():
config
.
enable_tensorrt_engine
(
...
...
@@ -359,10 +363,9 @@ def load_predictor(model_dir,
min_subgraph_size
=
min_subgraph_size
,
precision_mode
=
precision_map
[
run_mode
],
use_static
=
False
,
use_calib_mode
=
use
_calib_mode
)
use_calib_mode
=
trt
_calib_mode
)
if
use_dynamic_shape
:
print
(
'use_dynamic_shape'
)
min_input_shape
=
{
'image'
:
[
1
,
3
,
trt_min_shape
,
trt_min_shape
]}
max_input_shape
=
{
'image'
:
[
1
,
3
,
trt_max_shape
,
trt_max_shape
]}
opt_input_shape
=
{
'image'
:
[
1
,
3
,
trt_opt_shape
,
trt_opt_shape
]}
...
...
@@ -380,6 +383,37 @@ def load_predictor(model_dir,
return
predictor
def
get_test_images
(
infer_dir
,
infer_img
):
"""
Get image path list in TEST mode
"""
assert
infer_img
is
not
None
or
infer_dir
is
not
None
,
\
"--infer_img or --infer_dir should be set"
assert
infer_img
is
None
or
os
.
path
.
isfile
(
infer_img
),
\
"{} is not a file"
.
format
(
infer_img
)
assert
infer_dir
is
None
or
os
.
path
.
isdir
(
infer_dir
),
\
"{} is not a directory"
.
format
(
infer_dir
)
# infer_img has a higher priority
if
infer_img
and
os
.
path
.
isfile
(
infer_img
):
return
[
infer_img
]
images
=
set
()
infer_dir
=
os
.
path
.
abspath
(
infer_dir
)
assert
os
.
path
.
isdir
(
infer_dir
),
\
"infer_dir {} is not a directory"
.
format
(
infer_dir
)
exts
=
[
'jpg'
,
'jpeg'
,
'png'
,
'bmp'
]
exts
+=
[
ext
.
upper
()
for
ext
in
exts
]
for
ext
in
exts
:
images
.
update
(
glob
.
glob
(
'{}/*.{}'
.
format
(
infer_dir
,
ext
)))
images
=
list
(
images
)
assert
len
(
images
)
>
0
,
"no image found in {}"
.
format
(
infer_dir
)
print
(
"Found {} inference images in total."
.
format
(
len
(
images
)))
return
images
def
visualize
(
image_file
,
results
,
labels
,
output_dir
=
'output/'
,
threshold
=
0.5
):
# visualize the predict result
im
=
visualize_box_mask
(
image_file
,
results
,
labels
,
threshold
=
threshold
)
...
...
@@ -398,18 +432,19 @@ def print_arguments(args):
print
(
'------------------------------------------'
)
def
predict_image
(
detector
):
def
predict_image
(
detector
,
image_list
):
for
i
,
img_file
in
enumerate
(
image_list
):
if
FLAGS
.
run_benchmark
:
detector
.
predict
(
FLAGS
.
image_file
,
FLAGS
.
threshold
,
warmup
=
100
,
repeats
=
100
,
run_benchmark
=
True
)
detector
.
predict
(
img_file
,
FLAGS
.
threshold
,
warmup
=
10
,
repeats
=
10
)
cm
,
gm
,
gu
=
get_current_memory_mb
()
detector
.
cpu_mem
+=
cm
detector
.
gpu_mem
+=
gm
detector
.
gpu_util
+=
gu
print
(
'Test iter {}, file name:{}'
.
format
(
i
,
img_file
)
)
else
:
results
=
detector
.
predict
(
FLAGS
.
image
_file
,
FLAGS
.
threshold
)
results
=
detector
.
predict
(
img
_file
,
FLAGS
.
threshold
)
visualize
(
FLAGS
.
image
_file
,
img
_file
,
results
,
detector
.
pred_config
.
labels
,
output_dir
=
FLAGS
.
output_dir
,
...
...
@@ -465,7 +500,8 @@ def main():
use_dynamic_shape
=
FLAGS
.
use_dynamic_shape
,
trt_min_shape
=
FLAGS
.
trt_min_shape
,
trt_max_shape
=
FLAGS
.
trt_max_shape
,
trt_opt_shape
=
FLAGS
.
trt_opt_shape
)
trt_opt_shape
=
FLAGS
.
trt_opt_shape
,
trt_calib_mode
=
FLAGS
.
trt_calib_mode
)
if
pred_config
.
arch
==
'SOLOv2'
:
detector
=
DetectorSOLOv2
(
pred_config
,
...
...
@@ -475,77 +511,33 @@ def main():
use_dynamic_shape
=
FLAGS
.
use_dynamic_shape
,
trt_min_shape
=
FLAGS
.
trt_min_shape
,
trt_max_shape
=
FLAGS
.
trt_max_shape
,
trt_opt_shape
=
FLAGS
.
trt_opt_shape
)
# predict from image
if
FLAGS
.
image_file
!=
''
:
predict_image
(
detector
)
trt_opt_shape
=
FLAGS
.
trt_opt_shape
,
trt_calib_mode
=
FLAGS
.
trt_calib_mode
)
# predict from video file or camera video stream
if
FLAGS
.
video_file
!=
''
or
FLAGS
.
camera_id
!=
-
1
:
if
FLAGS
.
video_file
is
not
None
or
FLAGS
.
camera_id
!=
-
1
:
predict_video
(
detector
,
FLAGS
.
camera_id
)
else
:
# predict from image
img_list
=
get_test_images
(
FLAGS
.
image_dir
,
FLAGS
.
image_file
)
predict_image
(
detector
,
img_list
)
if
not
FLAGS
.
run_benchmark
:
detector
.
det_times
.
info
(
average
=
True
)
else
:
mems
=
{
'cpu_rss'
:
detector
.
cpu_mem
/
len
(
img_list
),
'gpu_rss'
:
detector
.
gpu_mem
/
len
(
img_list
),
'gpu_util'
:
detector
.
gpu_util
*
100
/
len
(
img_list
)
}
det_logger
=
LoggerHelper
(
FLAGS
,
detector
.
det_times
.
report
(
average
=
True
),
mems
)
det_logger
.
report
()
if
__name__
==
'__main__'
:
paddle
.
enable_static
()
parser
=
argparse
.
ArgumentParser
(
description
=
__doc__
)
parser
.
add_argument
(
"--model_dir"
,
type
=
str
,
default
=
None
,
help
=
(
"Directory include:'model.pdiparams', 'model.pdmodel', "
"'infer_cfg.yml', created by tools/export_model.py."
),
required
=
True
)
parser
.
add_argument
(
"--image_file"
,
type
=
str
,
default
=
''
,
help
=
"Path of image file."
)
parser
.
add_argument
(
"--video_file"
,
type
=
str
,
default
=
''
,
help
=
"Path of video file."
)
parser
.
add_argument
(
"--camera_id"
,
type
=
int
,
default
=-
1
,
help
=
"device id of camera to predict."
)
parser
.
add_argument
(
"--run_mode"
,
type
=
str
,
default
=
'fluid'
,
help
=
"mode of running(fluid/trt_fp32/trt_fp16/trt_int8)"
)
parser
.
add_argument
(
"--use_gpu"
,
type
=
ast
.
literal_eval
,
default
=
False
,
help
=
"Whether to predict with GPU."
)
parser
.
add_argument
(
"--run_benchmark"
,
type
=
ast
.
literal_eval
,
default
=
False
,
help
=
"Whether to predict a image_file repeatedly for benchmark"
)
parser
.
add_argument
(
"--threshold"
,
type
=
float
,
default
=
0.5
,
help
=
"Threshold of score."
)
parser
.
add_argument
(
"--output_dir"
,
type
=
str
,
default
=
"output"
,
help
=
"Directory of output visualization files."
)
parser
.
add_argument
(
"--use_dynamic_shape"
,
type
=
ast
.
literal_eval
,
default
=
False
,
help
=
"Dynamic_shape for TensorRT."
)
parser
.
add_argument
(
"--trt_min_shape"
,
type
=
int
,
default
=
1
,
help
=
"min_shape for TensorRT."
)
parser
.
add_argument
(
"--trt_max_shape"
,
type
=
int
,
default
=
1280
,
help
=
"max_shape for TensorRT."
)
parser
.
add_argument
(
"--trt_opt_shape"
,
type
=
int
,
default
=
640
,
help
=
"opt_shape for TensorRT."
)
parser
=
argsparser
()
FLAGS
=
parser
.
parse_args
()
print_arguments
(
FLAGS
)
if
FLAGS
.
image_file
!=
''
and
FLAGS
.
video_file
!=
''
:
assert
"Cannot predict image and video at the same time"
main
()
deploy/python/preprocess.py
浏览文件 @
aa9e0d54
...
...
@@ -71,9 +71,6 @@ class Resize(object):
assert
self
.
target_size
[
0
]
>
0
and
self
.
target_size
[
1
]
>
0
im_channel
=
im
.
shape
[
2
]
im_scale_y
,
im_scale_x
=
self
.
generate_scale
(
im
)
# set image_shape
im_info
[
'input_shape'
][
1
]
=
int
(
im_scale_y
*
im
.
shape
[
0
])
im_info
[
'input_shape'
][
2
]
=
int
(
im_scale_x
*
im
.
shape
[
1
])
im
=
cv2
.
resize
(
im
,
None
,
...
...
@@ -84,6 +81,14 @@ class Resize(object):
im_info
[
'im_shape'
]
=
np
.
array
(
im
.
shape
[:
2
]).
astype
(
'float32'
)
im_info
[
'scale_factor'
]
=
np
.
array
(
[
im_scale_y
,
im_scale_x
]).
astype
(
'float32'
)
# padding im when image_shape fixed by infer_cfg.yml
if
self
.
keep_ratio
and
im_info
[
'input_shape'
][
1
]
!=
-
1
:
max_size
=
im_info
[
'input_shape'
][
1
]
padding_im
=
np
.
zeros
(
(
max_size
,
max_size
,
im_channel
),
dtype
=
np
.
float32
)
im_h
,
im_w
=
im
.
shape
[:
2
]
padding_im
[:
im_h
,
:
im_w
,
:]
=
im
im
=
padding_im
return
im
,
im_info
def
generate_scale
(
self
,
im
):
...
...
deploy/python/utils.py
0 → 100644
浏览文件 @
aa9e0d54
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import
time
import
os
import
ast
import
argparse
def
argsparser
():
parser
=
argparse
.
ArgumentParser
(
description
=
__doc__
)
parser
.
add_argument
(
"--model_dir"
,
type
=
str
,
default
=
None
,
help
=
(
"Directory include:'model.pdiparams', 'model.pdmodel', "
"'infer_cfg.yml', created by tools/export_model.py."
),
required
=
True
)
parser
.
add_argument
(
"--image_file"
,
type
=
str
,
default
=
None
,
help
=
"Path of image file."
)
parser
.
add_argument
(
"--image_dir"
,
type
=
str
,
default
=
None
,
help
=
"Dir of image file, `image_file` has a higher priority."
)
parser
.
add_argument
(
"--video_file"
,
type
=
str
,
default
=
None
,
help
=
"Path of video file, `video_file` or `camera_id` has a highest priority."
)
parser
.
add_argument
(
"--camera_id"
,
type
=
int
,
default
=-
1
,
help
=
"device id of camera to predict."
)
parser
.
add_argument
(
"--threshold"
,
type
=
float
,
default
=
0.5
,
help
=
"Threshold of score."
)
parser
.
add_argument
(
"--output_dir"
,
type
=
str
,
default
=
"output"
,
help
=
"Directory of output visualization files."
)
parser
.
add_argument
(
"--run_mode"
,
type
=
str
,
default
=
'fluid'
,
help
=
"mode of running(fluid/trt_fp32/trt_fp16/trt_int8)"
)
parser
.
add_argument
(
"--use_gpu"
,
type
=
ast
.
literal_eval
,
default
=
False
,
help
=
"Whether to predict with GPU."
)
parser
.
add_argument
(
"--run_benchmark"
,
type
=
ast
.
literal_eval
,
default
=
False
,
help
=
"Whether to predict a image_file repeatedly for benchmark"
)
parser
.
add_argument
(
"--enable_mkldnn"
,
type
=
ast
.
literal_eval
,
default
=
False
,
help
=
"Whether use mkldnn with CPU."
)
parser
.
add_argument
(
"--cpu_threads"
,
type
=
int
,
default
=
1
,
help
=
"Num of threads with CPU."
)
parser
.
add_argument
(
"--use_dynamic_shape"
,
type
=
ast
.
literal_eval
,
default
=
False
,
help
=
"Dynamic_shape for TensorRT."
)
parser
.
add_argument
(
"--trt_min_shape"
,
type
=
int
,
default
=
1
,
help
=
"min_shape for TensorRT."
)
parser
.
add_argument
(
"--trt_max_shape"
,
type
=
int
,
default
=
1280
,
help
=
"max_shape for TensorRT."
)
parser
.
add_argument
(
"--trt_opt_shape"
,
type
=
int
,
default
=
640
,
help
=
"opt_shape for TensorRT."
)
parser
.
add_argument
(
"--trt_calib_mode"
,
type
=
bool
,
default
=
False
,
help
=
"If the model is produced by TRT offline quantitative "
"calibration, trt_calib_mode need to set True."
)
return
parser
class
Times
(
object
):
def
__init__
(
self
):
self
.
time
=
0.
# start time
self
.
st
=
0.
# end time
self
.
et
=
0.
def
start
(
self
):
self
.
st
=
time
.
time
()
def
end
(
self
,
repeats
=
1
,
accumulative
=
True
):
self
.
et
=
time
.
time
()
if
accumulative
:
self
.
time
+=
(
self
.
et
-
self
.
st
)
/
repeats
else
:
self
.
time
=
(
self
.
et
-
self
.
st
)
/
repeats
def
reset
(
self
):
self
.
time
=
0.
self
.
st
=
0.
self
.
et
=
0.
def
value
(
self
):
return
round
(
self
.
time
,
4
)
class
Timer
(
Times
):
def
__init__
(
self
):
super
(
Timer
,
self
).
__init__
()
self
.
preprocess_time
=
Times
()
self
.
inference_time
=
Times
()
self
.
postprocess_time
=
Times
()
self
.
img_num
=
0
def
info
(
self
,
average
=
False
):
total_time
=
self
.
preprocess_time
.
value
()
+
self
.
inference_time
.
value
(
)
+
self
.
postprocess_time
.
value
()
total_time
=
round
(
total_time
,
4
)
print
(
"------------------ Inference Time Info ----------------------"
)
print
(
"total_time(ms): {}, img_num: {}"
.
format
(
total_time
*
1000
,
self
.
img_num
))
preprocess_time
=
round
(
self
.
preprocess_time
.
value
()
/
self
.
img_num
,
4
)
if
average
else
self
.
preprocess_time
.
value
()
postprocess_time
=
round
(
self
.
postprocess_time
.
value
()
/
self
.
img_num
,
4
)
if
average
else
self
.
postprocess_time
.
value
()
inference_time
=
round
(
self
.
inference_time
.
value
()
/
self
.
img_num
,
4
)
if
average
else
self
.
inference_time
.
value
()
average_latency
=
total_time
/
self
.
img_num
print
(
"average latency time(ms): {:.2f}, QPS: {:2f}"
.
format
(
average_latency
*
1000
,
1
/
average_latency
))
print
(
"preprocess_time(ms): {:.2f}, inference_time(ms): {:.2f}, postprocess_time(ms): {:.2f}"
.
format
(
preprocess_time
*
1000
,
inference_time
*
1000
,
postprocess_time
*
1000
))
def
report
(
self
,
average
=
False
):
dic
=
{}
dic
[
'preprocess_time'
]
=
round
(
self
.
preprocess_time
.
value
()
/
self
.
img_num
,
4
)
if
average
else
self
.
preprocess_time
.
value
()
dic
[
'postprocess_time'
]
=
round
(
self
.
postprocess_time
.
value
()
/
self
.
img_num
,
4
)
if
average
else
self
.
postprocess_time
.
value
()
dic
[
'inference_time'
]
=
round
(
self
.
inference_time
.
value
()
/
self
.
img_num
,
4
)
if
average
else
self
.
inference_time
.
value
()
dic
[
'img_num'
]
=
self
.
img_num
total_time
=
self
.
preprocess_time
.
value
()
+
self
.
inference_time
.
value
(
)
+
self
.
postprocess_time
.
value
()
dic
[
'total_time'
]
=
round
(
total_time
,
4
)
return
dic
def
get_current_memory_mb
():
"""
It is used to Obtain the memory usage of the CPU and GPU during the running of the program.
And this function Current program is time-consuming.
"""
import
pynvml
import
psutil
import
GPUtil
gpu_id
=
os
.
environ
.
get
(
'CUDA_VISIBLE_DEVICES'
,
0
)
pid
=
os
.
getpid
()
p
=
psutil
.
Process
(
pid
)
info
=
p
.
memory_full_info
()
cpu_mem
=
info
.
uss
/
1024.
/
1024.
gpu_mem
=
0
gpu_percent
=
0
gpus
=
GPUtil
.
getGPUs
()
if
gpu_id
is
not
None
and
len
(
gpus
)
>
0
:
gpu_percent
=
gpus
[
gpu_id
].
load
pynvml
.
nvmlInit
()
handle
=
pynvml
.
nvmlDeviceGetHandleByIndex
(
0
)
meminfo
=
pynvml
.
nvmlDeviceGetMemoryInfo
(
handle
)
gpu_mem
=
meminfo
.
used
/
1024.
/
1024.
return
round
(
cpu_mem
,
4
),
round
(
gpu_mem
,
4
),
round
(
gpu_percent
,
4
)
class
LoggerHelper
(
object
):
def
__init__
(
self
,
args
,
times
,
mem_info
=
None
):
"""
args: utility.parse_args()
times: The Timer class
"""
self
.
args
=
args
self
.
times
=
times
self
.
model_name
=
args
.
model_dir
.
strip
(
'/'
).
split
(
'/'
)[
-
1
]
self
.
batch_size
=
1
self
.
shape
=
"dynamic shape"
if
args
.
run_mode
==
'fluid'
:
self
.
precision
=
"fp32"
self
.
use_tensorrt
=
False
else
:
self
.
precision
=
args
.
run_mode
.
split
(
'_'
)[
-
1
]
self
.
use_tensorrt
=
True
self
.
device
=
"gpu"
if
args
.
use_gpu
else
"cpu"
self
.
preprocess_time
=
round
(
times
[
'preprocess_time'
],
4
)
self
.
inference_time
=
round
(
times
[
'inference_time'
],
4
)
self
.
postprocess_time
=
round
(
times
[
'postprocess_time'
],
4
)
self
.
data_num
=
times
[
'img_num'
]
self
.
total_time
=
round
(
times
[
'total_time'
],
4
)
self
.
mem_info
=
{
"cpu_rss"
:
0
,
"gpu_rss"
:
0
,
"gpu_util"
:
0
}
if
mem_info
is
not
None
:
self
.
mem_info
=
mem_info
def
report
(
self
):
print
(
"
\n
"
)
print
(
"----------------------- Config info -----------------------"
)
print
(
"runtime_device:"
,
self
.
device
)
print
(
"ir_optim:"
,
True
)
print
(
"enable_memory_optim:"
,
True
)
print
(
"enable_tensorrt:"
,
self
.
use_tensorrt
)
print
(
"precision:"
,
self
.
precision
)
print
(
"enable_mkldnn:"
,
self
.
args
.
enable_mkldnn
)
print
(
"cpu_math_library_num_threads:"
,
self
.
args
.
cpu_threads
)
print
(
"----------------------- Model info ----------------------"
)
print
(
"model_name:"
,
self
.
model_name
)
print
(
"------------------------ Data info ----------------------"
)
print
(
"batch_size:"
,
self
.
batch_size
)
print
(
"input_shape:"
,
self
.
shape
)
print
(
"----------------------- Perf info -----------------------"
)
print
(
"[cpu_rss(MB): {} gpu_rss(MB): {}, gpu_util: {}%"
.
format
(
round
(
self
.
mem_info
[
'cpu_rss'
],
4
),
round
(
self
.
mem_info
[
'gpu_rss'
],
4
),
round
(
self
.
mem_info
[
'gpu_util'
],
2
)))
print
(
"total number of predicted data: {} and total time spent(s): {}"
.
format
(
self
.
data_num
,
self
.
total_time
))
print
(
"preproce_time(ms): {}, inference_time(ms): {}, postprocess_time(ms): {}"
.
format
(
self
.
preprocess_time
*
1000
,
self
.
inference_time
*
1000
,
self
.
postprocess_time
*
1000
))
d
eploy/img
s/input_shape.png
→
d
ocs/image
s/input_shape.png
浏览文件 @
aa9e0d54
文件已移动
ppdet/engine/export_utils.py
浏览文件 @
aa9e0d54
...
...
@@ -57,6 +57,11 @@ def _parse_reader(reader_cfg, dataset_cfg, metric, arch, image_shape):
for
st
in
sample_transforms
[
1
:]:
for
key
,
value
in
st
.
items
():
p
=
{
'type'
:
key
}
if
key
==
'Resize'
:
if
value
.
get
(
'keep_ratio'
,
False
)
and
int
(
image_shape
[
1
])
!=
-
1
:
max_size
=
max
(
image_shape
[
1
:])
image_shape
=
[
3
,
max_size
,
max_size
]
value
[
'target_size'
]
=
image_shape
[
1
:]
p
.
update
(
value
)
preprocess_list
.
append
(
p
)
batch_transforms
=
reader_cfg
.
get
(
'batch_transforms'
,
None
)
...
...
编辑
预览
Markdown
is supported
0%
请重试
或
添加新附件
.
添加附件
取消
You are about to add
0
people
to the discussion. Proceed with caution.
先完成此消息的编辑!
取消
想要评论请
注册
或
登录