Unverified commit 65c25109 authored by Daniel Yang, committed by GitHub

test=release/1.8, test=document_fix (#2517)

LGTM
Parent e70df1ff
*~
*.pyc
*.DS_Store
._*
docs/_build/
docs/api/
docs/doxyoutput/
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# Paddle Inference Demos
Paddle Inference is the native inference engine of the PaddlePaddle core framework. It is feature-rich, delivers strong performance, and has been deeply optimized for server-side scenarios, achieving high throughput and low latency so that PaddlePaddle models can be deployed on servers right after training.
To help users quickly deploy applications with Paddle Inference, this repo provides C++ and Python usage samples.
**This repo assumes that you already have a basic understanding of Paddle Inference.**
**If you are new to Paddle Inference, we recommend [visiting this page](https://paddle-inference.readthedocs.io/en/latest/#) for an introduction.**
## Samples
1) The `python` directory lists a series of samples driven by real inputs, covering image classification, segmentation, and detection, as well as NLP models such as Ernie/Bert, plus Paddle-TRT and multi-threading samples.
2) The `c++` directory presents a series of samples organized as single-test programs, covering image classification, segmentation, and detection, as well as NLP models such as Ernie/Bert, plus Paddle-TRT and multi-threading samples (a minimal end-to-end C++ sketch follows this list).
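All of the C++ samples follow the same basic flow: build an `AnalysisConfig`, create a predictor, feed the inputs through ZeroCopy tensors, run, and copy the outputs back to the host. The CPU-only sketch below is a minimal assembly of that flow based on the sample code in this repo; the model paths are placeholders:
```c++
#include <functional>
#include <numeric>
#include <vector>
#include "paddle/include/paddle_inference_api.h"

int main() {
  paddle::AnalysisConfig config;
  config.SetModel("./model_dir/model", "./model_dir/params");  // placeholder paths
  // The samples use the ZeroCopy API, so the feed/fetch ops are disabled.
  config.SwitchUseFeedFetchOps(false);
  auto predictor = paddle::CreatePaddlePredictor(config);

  // Fill the first input with dummy data shaped like one 224x224 RGB image.
  std::vector<int> shape{1, 3, 224, 224};
  std::vector<float> data(1 * 3 * 224 * 224, 1.0f);
  auto input = predictor->GetInputTensor(predictor->GetInputNames()[0]);
  input->Reshape(shape);
  input->copy_from_cpu(data.data());

  predictor->ZeroCopyRun();

  // Copy the first output back to the host.
  auto output = predictor->GetOutputTensor(predictor->GetOutputNames()[0]);
  auto out_shape = output->shape();
  std::vector<float> out(std::accumulate(out_shape.begin(), out_shape.end(), 1,
                                         std::multiplies<int>()));
  output->copy_to_cpu(out.data());
  return 0;
}
```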
cmake_minimum_required(VERSION 3.0)
project(cpp_inference_demo CXX C)
option(WITH_MKL "Compile demo with MKL/OpenBlas support, default use MKL." ON)
option(WITH_GPU "Compile demo with GPU/CPU, default use CPU." OFF)
option(WITH_STATIC_LIB "Compile demo with static/shared library, default use static." ON)
option(USE_TENSORRT "Compile demo with TensorRT." OFF)
macro(safe_set_static_flag)
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
if(${flag_var} MATCHES "/MD")
string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
endif(${flag_var} MATCHES "/MD")
endforeach(flag_var)
endmacro()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
set(CMAKE_STATIC_LIBRARY_PREFIX "")
message("flags" ${CMAKE_CXX_FLAGS})
if(NOT DEFINED PADDLE_LIB)
message(FATAL_ERROR "please set PADDLE_LIB with -DPADDLE_LIB=/path/paddle/lib")
endif()
if(NOT DEFINED DEMO_NAME)
message(FATAL_ERROR "please set DEMO_NAME with -DDEMO_NAME=demo_name")
endif()
include_directories("${PADDLE_LIB}")
include_directories("${PADDLE_LIB}/third_party/install/protobuf/include")
include_directories("${PADDLE_LIB}/third_party/install/glog/include")
include_directories("${PADDLE_LIB}/third_party/install/gflags/include")
include_directories("${PADDLE_LIB}/third_party/install/xxhash/include")
include_directories("${PADDLE_LIB}/third_party/install/zlib/include")
include_directories("${PADDLE_LIB}/third_party/boost")
include_directories("${PADDLE_LIB}/third_party/eigen3")
if (USE_TENSORRT AND WITH_GPU)
include_directories("${TENSORRT_ROOT}/include")
link_directories("${TENSORRT_ROOT}/lib")
endif()
link_directories("${PADDLE_LIB}/third_party/install/zlib/lib")
link_directories("${PADDLE_LIB}/third_party/install/protobuf/lib")
link_directories("${PADDLE_LIB}/third_party/install/glog/lib")
link_directories("${PADDLE_LIB}/third_party/install/gflags/lib")
link_directories("${PADDLE_LIB}/third_party/install/xxhash/lib")
link_directories("${PADDLE_LIB}/paddle/lib")
add_executable(${DEMO_NAME} ${DEMO_NAME}.cc)
if(WITH_MKL)
include_directories("${PADDLE_LIB}/third_party/install/mklml/include")
set(MATH_LIB ${PADDLE_LIB}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX}
${PADDLE_LIB}/third_party/install/mklml/lib/libiomp5${CMAKE_SHARED_LIBRARY_SUFFIX})
set(MKLDNN_PATH "${PADDLE_LIB}/third_party/install/mkldnn")
if(EXISTS ${MKLDNN_PATH})
include_directories("${MKLDNN_PATH}/include")
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0)
endif()
else()
set(MATH_LIB ${PADDLE_LIB}/third_party/install/openblas/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
# Note: libpaddle_inference_api.so/a must be put before libpaddle_fluid.so/a
if(WITH_STATIC_LIB)
set(DEPS
${PADDLE_LIB}/paddle/lib/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
set(DEPS
${PADDLE_LIB}/paddle/lib/libpaddle_fluid${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
set(EXTERNAL_LIB "-lrt -ldl -lpthread")
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags protobuf z xxhash
${EXTERNAL_LIB})
if(WITH_GPU)
if (USE_TENSORRT)
set(DEPS ${DEPS}
${TENSORRT_ROOT}/lib/libnvinfer${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS}
${TENSORRT_ROOT}/lib/libnvinfer_plugin${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/libcublas${CMAKE_SHARED_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDNN_LIB}/libcudnn${CMAKE_SHARED_LIBRARY_SUFFIX} )
endif()
target_link_libraries(${DEMO_NAME} ${DEPS})
## Running the C++ LIC2020 Relation Extraction Demo
This project is a C++ inference demo for the baseline model provided by the [2020 Language and Intelligence Challenge: Relation Extraction Task](https://aistudio.baidu.com/aistudio/competition/detail/31). For information on training the baseline model, see the [LIC2020 Relation Extraction Baseline System](https://aistudio.baidu.com/aistudio/projectdetail/357344).
### Step 1: Get the baseline model
Click this [link](https://paddle-inference-dist.bj.bcebos.com/lic_model.tgz) to download the model. For more **model training information**, visit the [LIC2020 Relation Extraction Baseline System](https://aistudio.baidu.com/aistudio/projectdetail/357344).
### Step 2: **Building the sample**
`demo.cc` is the sample inference program (its inputs are fixed values; if you need to read data another way, modify the program accordingly; see the sketch below).
`CMakeLists.txt` is the build configuration file.
`run_impl.sh` configures the third-party libraries and the precompiled inference library.
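If you want to feed real data instead of the fixed values built by `PrepareInput()` in `demo.cc`, only the way each input tensor is filled needs to change. Below is a minimal sketch, assuming you already have tokenized `int64` ids for the first input; the helper function is ours for illustration and is not part of the demo:
```c++
#include <cstdint>
#include <vector>
#include "paddle/include/paddle_inference_api.h"

// Sketch: copy user-provided int64 token ids into the first model input instead
// of the synthetic ids built by PrepareInput() in demo.cc.
// `ids` must hold exactly batch_size * seq_len * 1 elements.
void FeedWordIds(paddle::PaddlePredictor* predictor,
                 const std::vector<int64_t>& ids,
                 int batch_size, int seq_len) {
  auto input_names = predictor->GetInputNames();
  auto word_input = predictor->GetInputTensor(input_names[0]);
  word_input->Reshape({batch_size, seq_len, 1});
  word_input->copy_from_cpu(ids.data());
}
```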
To build the demo, first edit the settings in `run_impl.sh`.
1) **Edit `run_impl.sh`**
Open `run_impl.sh` and modify the following settings:
```shell
# Turn the following three flags on or off according to version.txt in the precompiled library
WITH_MKL=ON
WITH_GPU=ON
USE_TENSORRT=OFF
# Root directory of the inference library
LIB_DIR=${YOUR_LIB_DIR}/fluid_inference_install_dir
# If WITH_GPU or USE_TENSORRT above is set to ON, set the corresponding CUDA, CUDNN, and TENSORRT paths.
CUDNN_LIB=/usr/local/cudnn/lib64
CUDA_LIB=/usr/local/cuda/lib64
# TENSORRT_ROOT=/usr/local/TensorRT-6.0.1.5
```
Run `sh run_impl.sh`; a `build` directory will be created under the current directory.
2) **Run the sample**
```shell
# Enter the build directory
cd build
# Run the sample
./demo --model_file ${LIC_MODEL_PATH}/model --params_file ${LIC_MODEL_PATH}/params
```
When the run finishes, the program prints the number of elements in each model output to the screen, indicating a successful run.
### More links
- [Paddle Inference Quick Start]()
- [Using the Paddle Inference Python API]()
#include <gflags/gflags.h>
#include <glog/logging.h>
#include <chrono>
#include <iostream>
#include <memory>
#include <numeric>
#include "paddle/include/paddle_inference_api.h"
using paddle::AnalysisConfig;
DEFINE_string(model_file, "", "Path of the inference model file.");
DEFINE_string(params_file, "", "Path of the inference params file.");
DEFINE_string(model_dir, "", "Directory of the inference model.");
DEFINE_int32(batch_size, 1, "Batch size.");
DEFINE_int32(seq_len,
128,
"Sequence length; should be less than or equal to 512.");
DEFINE_bool(use_gpu, true, "enable gpu");
using Time = decltype(std::chrono::high_resolution_clock::now());
Time time() { return std::chrono::high_resolution_clock::now(); };
double time_diff(Time t1, Time t2) {
typedef std::chrono::microseconds ms;
auto diff = t2 - t1;
ms counter = std::chrono::duration_cast<ms>(diff);
return counter.count() / 1000.0;
}
std::unique_ptr<paddle::PaddlePredictor> CreatePredictor() {
AnalysisConfig config;
if (FLAGS_model_dir != "") {
config.SetModel(FLAGS_model_dir);
} else {
config.SetModel(FLAGS_model_file, FLAGS_params_file);
}
if (FLAGS_use_gpu) {
config.EnableUseGpu(100, 0);
}
// We use ZeroCopy, so we set config.SwitchUseFeedFetchOps(false) here.
config.SwitchUseFeedFetchOps(false);
return CreatePaddlePredictor(config);
}
template <typename Dtype>
std::vector<Dtype> PrepareInput(const std::vector<int>& shape,
int word_size = 18000);
template <>
std::vector<float> PrepareInput(const std::vector<int>& shape, int word_size) {
int count =
std::accumulate(shape.begin(), shape.end(), 1, std::multiplies<int>());
std::vector<float> datas(count, 0);
for (int i = 0; i < count; ++i) {
datas[i] = i % 2 ? 0.f : 1.f;
}
return datas;
}
template <>
std::vector<int64_t> PrepareInput(const std::vector<int>& shape,
int word_size) {
int count =
std::accumulate(shape.begin(), shape.end(), 1, std::multiplies<int>());
std::vector<int64_t> datas(count, 0);
for (int i = 0; i < count; ++i) {
datas[i] = (i + 13) % word_size == 0 ? 1 : (i + 13) % word_size;
}
return datas;
}
void Run(paddle::PaddlePredictor* predictor,
std::vector<float>* out_data_0,
std::vector<int64_t>* out_data_1,
std::vector<int64_t>* out_data_2) {
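// Vocabulary sizes for the three int64 embedding inputs; PrepareInput() uses
// them to keep the generated ids inside the valid range.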
const int word_size_0 = 18000;
const int word_size_1 = 2;
const int word_size_2 = 513;
const int batch_size = FLAGS_batch_size;
const int seq_len = FLAGS_seq_len;
auto input_names = predictor->GetInputNames();
std::vector<int> shape_seq{batch_size, seq_len, 1};
std::vector<int> shape_batch{batch_size};
std::vector<int> shape_s{batch_size, seq_len};
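// INPUT_EMB(num): fetch the num-th input tensor, fill it with synthetic int64
// ids drawn from a vocabulary of size word_size_##num, and copy them in.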
#define INPUT_EMB(num) \
auto input_##num = predictor->GetInputTensor(input_names[num]); \
auto data_##num = PrepareInput<int64_t>(shape_seq, word_size_##num); \
input_##num->Reshape(shape_seq); \
input_##num->copy_from_cpu(data_##num.data())
INPUT_EMB(0);
INPUT_EMB(1);
INPUT_EMB(2);
#undef INPUT_EMB
auto input_3 = predictor->GetInputTensor(input_names[3]);
auto data_3 = PrepareInput<float>(shape_seq);
input_3->Reshape(shape_seq);
input_3->copy_from_cpu(data_3.data());
auto input_4 = predictor->GetInputTensor(input_names[4]);
auto data_4 = PrepareInput<int64_t>(shape_batch);
input_4->Reshape(shape_batch);
input_4->copy_from_cpu(data_4.data());
#define INPUT_5_or_6(num) \
auto input_##num = predictor->GetInputTensor(input_names[num]); \
auto data_##num = PrepareInput<int64_t>(shape_s, 1); \
input_##num->Reshape(shape_s); \
input_##num->copy_from_cpu(data_##num.data())
INPUT_5_or_6(5);
INPUT_5_or_6(6);
#undef INPUT_5_or_6
CHECK(predictor->ZeroCopyRun());
auto output_names = predictor->GetOutputNames();
// there are three outputs of the lic2020 baseline model
#define OUTPUT(num) \
auto output_##num = predictor->GetOutputTensor(output_names[num]); \
std::vector<int> output_shape_##num = output_##num->shape(); \
int out_num_##num = std::accumulate(output_shape_##num.begin(), \
output_shape_##num.end(), \
1, \
std::multiplies<int>()); \
out_data_##num->resize(out_num_##num); \
output_##num->copy_to_cpu(out_data_##num->data())
OUTPUT(0);
OUTPUT(1);
OUTPUT(2);
#undef OUTPUT
}
int main(int argc, char* argv[]) {
google::ParseCommandLineFlags(&argc, &argv, true);
auto predictor = CreatePredictor();
std::vector<float> out_data_0;
std::vector<int64_t> out_data_1;
std::vector<int64_t> out_data_2;
Run(predictor.get(), &out_data_0, &out_data_1, &out_data_2);
LOG(INFO) << "output0 num is " << out_data_0.size();
LOG(INFO) << "output1 num is " << out_data_1.size();
LOG(INFO) << "output2 num is " << out_data_2.size();
return 0;
}
<html>
<head>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
extensions: ["tex2jax.js", "TeX/AMSsymbols.js", "TeX/AMSmath.js"],
jax: ["input/TeX", "output/HTML-CSS"],
tex2jax: {
inlineMath: [ ['$','$'] ],
displayMath: [ ['$$','$$'] ],
processEscapes: true
},
"HTML-CSS": { availableFonts: ["TeX"] }
});
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js" async></script>
<script type="text/javascript" src="../.tools/theme/marked.js">
</script>
<link href="http://cdn.bootcss.com/highlight.js/9.9.0/styles/darcula.min.css" rel="stylesheet">
<script src="http://cdn.bootcss.com/highlight.js/9.9.0/highlight.min.js"></script>
<link href="http://cdn.bootcss.com/bootstrap/4.0.0-alpha.6/css/bootstrap.min.css" rel="stylesheet">
<link href="https://cdn.jsdelivr.net/perfect-scrollbar/0.6.14/css/perfect-scrollbar.min.css" rel="stylesheet">
<link href="../.tools/theme/github-markdown.css" rel='stylesheet'>
</head>
<style type="text/css" >
.markdown-body {
box-sizing: border-box;
min-width: 200px;
max-width: 980px;
margin: 0 auto;
padding: 45px;
}
</style>
<body>
<div id="context" class="container-fluid markdown-body">
</div>
<!-- This block will be replaced by each markdown file content. Please do not change lines below.-->
<div id="markdown" style='display:none'>
## Running the C++ LIC2020 Relation Extraction Demo
This project is a C++ inference demo for the baseline model provided by the [2020 Language and Intelligence Challenge: Relation Extraction Task](https://aistudio.baidu.com/aistudio/competition/detail/31). For information on training the baseline model, see the [LIC2020 Relation Extraction Baseline System](https://aistudio.baidu.com/aistudio/projectdetail/357344).
### Step 1: Get the baseline model
Click this [link](https://paddle-inference-dist.bj.bcebos.com/lic_model.tgz) to download the model. For more **model training information**, visit the [LIC2020 Relation Extraction Baseline System](https://aistudio.baidu.com/aistudio/projectdetail/357344).
### Step 2: **Building the sample**
`demo.cc` is the sample inference program (its inputs are fixed values; if you need to read data another way, modify the program accordingly).
`CMakeLists.txt` is the build configuration file.
`run_impl.sh` configures the third-party libraries and the precompiled inference library.
To build the demo, first edit the settings in `run_impl.sh`.
1) **Edit `run_impl.sh`**
Open `run_impl.sh` and modify the following settings:
```shell
# Turn the following three flags on or off according to version.txt in the precompiled library
WITH_MKL=ON
WITH_GPU=ON
USE_TENSORRT=OFF
# Root directory of the inference library
LIB_DIR=${YOUR_LIB_DIR}/fluid_inference_install_dir
# If WITH_GPU or USE_TENSORRT above is set to ON, set the corresponding CUDA, CUDNN, and TENSORRT paths.
CUDNN_LIB=/usr/local/cudnn/lib64
CUDA_LIB=/usr/local/cuda/lib64
# TENSORRT_ROOT=/usr/local/TensorRT-6.0.1.5
```
Run `sh run_impl.sh`; a `build` directory will be created under the current directory.
2) **Run the sample**
```shell
# Enter the build directory
cd build
# Run the sample
./demo --model_file ${LIC_MODEL_PATH}/model --params_file ${LIC_MODEL_PATH}/params
```
When the run finishes, the program prints the number of elements in each model output to the screen, indicating a successful run.
### More links
- [Paddle Inference Quick Start]()
- [Using the Paddle Inference Python API]()
</div>
<!-- You can change the lines below now. -->
<script type="text/javascript">
marked.setOptions({
renderer: new marked.Renderer(),
gfm: true,
breaks: false,
smartypants: true,
highlight: function(code, lang) {
code = code.replace(/&amp;/g, "&")
code = code.replace(/&gt;/g, ">")
code = code.replace(/&lt;/g, "<")
code = code.replace(/&nbsp;/g, " ")
return hljs.highlightAuto(code, [lang]).value;
}
});
document.getElementById("context").innerHTML = marked(
document.getElementById("markdown").innerHTML)
</script>
</body>
work_path=$(dirname $(readlink -f $0))
mkdir -p build
cd build
rm -rf *
# must match the name of the demo source file (demo.cc)
DEMO_NAME=demo
WITH_MKL=ON
WITH_GPU=ON
USE_TENSORRT=OFF
LIB_DIR=${work_path}/fluid_inference_install_dir
CUDNN_LIB=/usr/local/cudnn/lib64
CUDA_LIB=/usr/local/cuda/lib64
# TENSORRT_ROOT=/usr/local/TensorRT-6.0.1.5
cmake .. -DPADDLE_LIB=${LIB_DIR} \
-DWITH_MKL=${WITH_MKL} \
-DDEMO_NAME=${DEMO_NAME} \
-DWITH_GPU=${WITH_GPU} \
-DWITH_STATIC_LIB=OFF \
-DUSE_TENSORRT=${USE_TENSORRT} \
-DCUDNN_LIB=${CUDNN_LIB} \
-DCUDA_LIB=${CUDA_LIB} \
-DTENSORRT_ROOT=${TENSORRT_ROOT}
make -j
# C++ Inference Samples
**This directory assumes that you already have a basic understanding of Paddle Inference.**
**If you are new to Paddle Inference, we recommend [visiting this page](https://paddle-inference.readthedocs.io/en/latest/#) for an introduction.**
This directory contains test samples for image classification and detection models as well as NLP models such as Ernie/Bert, plus Paddle-TRT and multi-threading samples.
To run the samples smoothly, you first need the Paddle Inference C++ precompiled library in your environment.
**Step 1: Get the library:**
- [Download from the official site](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html)
- Build it from source yourself.
**Step 2: Layout of the precompiled library:**
Inside the precompiled library, the directory structure is:
```
├── CMakeCache.txt
├── paddle
├── third_party
└── version.txt
```
The `paddle` directory contains the headers and lib files of the precompiled library.
`third_party` contains the headers and lib files of the third-party dependencies.
`version.txt` describes the library, including:
```
GIT COMMIT ID: 06897f7c4ee41295e6e9a0af2a68800a27804f6c
WITH_MKL: ON        # whether MKL is included
WITH_MKLDNN: OFF    # whether MKLDNN is included
WITH_GPU: ON        # whether GPU is supported
CUDA version: 10.1  # CUDA version
CUDNN version: v7   # CUDNN version
WITH_TENSORRT: ON   # whether TensorRT is included
```
With the precompiled library in place, let's go into each directory and try the samples.
<html>
<head>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
extensions: ["tex2jax.js", "TeX/AMSsymbols.js", "TeX/AMSmath.js"],
jax: ["input/TeX", "output/HTML-CSS"],
tex2jax: {
inlineMath: [ ['$','$'] ],
displayMath: [ ['$$','$$'] ],
processEscapes: true
},
"HTML-CSS": { availableFonts: ["TeX"] }
});
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js" async></script>
<script type="text/javascript" src="../.tools/theme/marked.js">
</script>
<link href="http://cdn.bootcss.com/highlight.js/9.9.0/styles/darcula.min.css" rel="stylesheet">
<script src="http://cdn.bootcss.com/highlight.js/9.9.0/highlight.min.js"></script>
<link href="http://cdn.bootcss.com/bootstrap/4.0.0-alpha.6/css/bootstrap.min.css" rel="stylesheet">
<link href="https://cdn.jsdelivr.net/perfect-scrollbar/0.6.14/css/perfect-scrollbar.min.css" rel="stylesheet">
<link href="../.tools/theme/github-markdown.css" rel='stylesheet'>
</head>
<style type="text/css" >
.markdown-body {
box-sizing: border-box;
min-width: 200px;
max-width: 980px;
margin: 0 auto;
padding: 45px;
}
</style>
<body>
<div id="context" class="container-fluid markdown-body">
</div>
<!-- This block will be replaced by each markdown file content. Please do not change lines below.-->
<div id="markdown" style='display:none'>
# C++ Inference Samples
**This directory assumes that you already have a basic understanding of Paddle Inference.**
**If you are new to Paddle Inference, we recommend [visiting this page](https://paddle-inference.readthedocs.io/en/latest/#) for an introduction.**
This directory contains test samples for image classification and detection models as well as NLP models such as Ernie/Bert, plus Paddle-TRT and multi-threading samples.
To run the samples smoothly, you first need the Paddle Inference C++ precompiled library in your environment.
**Step 1: Get the library:**
- [Download from the official site](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/build_and_install_lib_cn.html).
- Build it from source yourself.
**Step 2: Layout of the precompiled library:**
Inside the precompiled library, the directory structure is:
```
├── CMakeCache.txt
├── paddle
├── third_party
└── version.txt
```
The `paddle` directory contains the headers and lib files of the precompiled library.
`third_party` contains the headers and lib files of the third-party dependencies.
`version.txt` describes the library, including:
```
GIT COMMIT ID: 06897f7c4ee41295e6e9a0af2a68800a27804f6c
WITH_MKL: ON        # whether MKL is included
WITH_MKLDNN: OFF    # whether MKLDNN is included
WITH_GPU: ON        # whether GPU is supported
CUDA version: 10.1  # CUDA version
CUDNN version: v7   # CUDNN version
WITH_TENSORRT: ON   # whether TensorRT is included
```
With the precompiled library in place, let's go into each directory and try the samples.
</div>
<!-- You can change the lines below now. -->
<script type="text/javascript">
marked.setOptions({
renderer: new marked.Renderer(),
gfm: true,
breaks: false,
smartypants: true,
highlight: function(code, lang) {
code = code.replace(/&amp;/g, "&")
code = code.replace(/&gt;/g, ">")
code = code.replace(/&lt;/g, "<")
code = code.replace(/&nbsp;/g, " ")
return hljs.highlightAuto(code, [lang]).value;
}
});
document.getElementById("context").innerHTML = marked(
document.getElementById("markdown").innerHTML)
</script>
</body>
project(cpp_inference_demo CXX C)
option(WITH_MKL "Compile demo with MKL/OpenBlas support, default use MKL." ON)
option(WITH_GPU "Compile demo with GPU/CPU, default use CPU." OFF)
option(WITH_STATIC_LIB "Compile demo with static/shared library, default use static." ON)
option(USE_TENSORRT "Compile demo with TensorRT." OFF)
macro(safe_set_static_flag)
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
if(${flag_var} MATCHES "/MD")
string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
endif(${flag_var} MATCHES "/MD")
endforeach(flag_var)
endmacro()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -g")
set(CMAKE_STATIC_LIBRARY_PREFIX "")
message("flags" ${CMAKE_CXX_FLAGS})
if(NOT DEFINED PADDLE_LIB)
message(FATAL_ERROR "please set PADDLE_LIB with -DPADDLE_LIB=/path/paddle/lib")
endif()
if(NOT DEFINED DEMO_NAME)
message(FATAL_ERROR "please set DEMO_NAME with -DDEMO_NAME=demo_name")
endif()
include_directories("${PADDLE_LIB}")
include_directories("${PADDLE_LIB}/third_party/install/protobuf/include")
include_directories("${PADDLE_LIB}/third_party/install/glog/include")
include_directories("${PADDLE_LIB}/third_party/install/gflags/include")
include_directories("${PADDLE_LIB}/third_party/install/xxhash/include")
include_directories("${PADDLE_LIB}/third_party/install/zlib/include")
include_directories("${PADDLE_LIB}/third_party/boost")
include_directories("${PADDLE_LIB}/third_party/eigen3")
if (USE_TENSORRT AND WITH_GPU)
include_directories("${TENSORRT_ROOT}/include")
link_directories("${TENSORRT_ROOT}/lib")
endif()
link_directories("${PADDLE_LIB}/third_party/install/zlib/lib")
link_directories("${PADDLE_LIB}/third_party/install/protobuf/lib")
link_directories("${PADDLE_LIB}/third_party/install/glog/lib")
link_directories("${PADDLE_LIB}/third_party/install/gflags/lib")
link_directories("${PADDLE_LIB}/third_party/install/xxhash/lib")
link_directories("${PADDLE_LIB}/paddle/lib")
add_executable(${DEMO_NAME} ${DEMO_NAME}.cc)
if(WITH_MKL)
include_directories("${PADDLE_LIB}/third_party/install/mklml/include")
set(MATH_LIB ${PADDLE_LIB}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX}
${PADDLE_LIB}/third_party/install/mklml/lib/libiomp5${CMAKE_SHARED_LIBRARY_SUFFIX})
set(MKLDNN_PATH "${PADDLE_LIB}/third_party/install/mkldnn")
if(EXISTS ${MKLDNN_PATH})
include_directories("${MKLDNN_PATH}/include")
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0)
endif()
else()
set(MATH_LIB ${PADDLE_LIB}/third_party/install/openblas/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
# Note: libpaddle_inference_api.so/a must be put before libpaddle_fluid.so/a
if(WITH_STATIC_LIB)
set(DEPS
${PADDLE_LIB}/paddle/lib/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
set(DEPS
${PADDLE_LIB}/paddle/lib/libpaddle_fluid${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
set(EXTERNAL_LIB "-lrt -ldl -lpthread")
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags protobuf z xxhash
${EXTERNAL_LIB})
if(WITH_GPU)
if (USE_TENSORRT)
set(DEPS ${DEPS}
${TENSORRT_ROOT}/lib/libnvinfer${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS}
${TENSORRT_ROOT}/lib/libnvinfer_plugin${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDA_LIB}/libcublas${CMAKE_SHARED_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDNN_LIB}/libcudnn${CMAKE_SHARED_LIBRARY_SUFFIX} )
endif()
target_link_libraries(${DEMO_NAME} ${DEPS})
## ResNet50 Image Classification with Paddle-TRT
This document is a hands-on demo of Paddle-TRT inference on the ResNet50 classification model. If you are new to Paddle-TRT, we recommend reading [this page](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html) first for an introduction.
In this directory,
- `trt_fp32_test.cc` is the sample source for FP32 inference with Paddle-TRT (its inputs are fixed values; if you want to read data with OpenCV or by other means, modify the program accordingly).
- `trt_gen_calib_table_test.cc` is the sample source that produces the quantization calibration table for offline quantization calibration.
- `trt_int8_test.cc` is the sample source for Int8 inference with Paddle-TRT. Depending on whether the boolean flag `use_calib` is `true` or `false`, it either loads an offline quantization calibration table for Int8 inference or loads an Int8 model produced by PaddleSlim quantization.
- `CMakeLists.txt` is the build configuration file.
- `run_impl.sh` configures the third-party libraries and the precompiled inference library.
### Getting the models
First, download the required models from the following links:
[ResNet50 FP32 model](https://paddle-inference-dist.bj.bcebos.com/inference_demo/python/resnet50/ResNet50.tar.gz)
[ResNet50 PaddleSlim quantized model](https://paddle-inference-dist.bj.bcebos.com/inference_demo/python/resnet50/ResNet50_quant.tar.gz)
The FP32 model is used for FP32 inference and for Int8 inference with offline calibration; the quantized model is produced by the model compression toolkit PaddleSlim. For details on PaddleSlim quantization, see [here](https://paddlepaddle.github.io/PaddleSlim/quick_start/quant_aware_tutorial.html). An introduction to Int8 inference with Paddle-TRT is available [here](https://github.com/PaddlePaddle/Paddle-Inference-Demo/tree/master/docs/optimize/paddle_trt.rst#int8%E9%87%8F%E5%8C%96%E9%A2%84%E6%B5%8B).
### Part 1: FP32 inference with TRT
1) **Edit `run_impl.sh`**
Open `run_impl.sh` and modify the following settings:
```shell
# Select the FP32 inference demo
DEMO_NAME=trt_fp32_test
# TensorRT is used in this section, so USE_TENSORRT must be turned on
WITH_MKL=ON
WITH_GPU=ON
USE_TENSORRT=ON
# Root directory of the inference library
LIB_DIR=/paddle/fluid_inference_install_dir
# If WITH_GPU or USE_TENSORRT above is set to ON, set the corresponding CUDA, CUDNN, and TENSORRT paths. Note that CUDA_LIB and CUDNN_LIB must point to the lib64 level, while TENSORRT_ROOT points to the TensorRT root directory.
CUDNN_LIB=/paddle/nvidia-downloads/cudnn_v7.6_cuda10.1/lib64
CUDA_LIB=/paddle/nvidia-downloads/cuda-10.1/lib64
TENSORRT_ROOT=/paddle/nvidia-downloads/TensorRT-6.0.1.5
```
Run `sh run_impl.sh`; a `build` directory will be created under the current directory.
2) **Run the sample**
```shell
# Enter the build directory
cd build
# Run the sample
./trt_fp32_test --model_file=../ResNet50/model --params_file=../ResNet50/params
```
When the run finishes, the program prints the first 20 values of the model output to the screen, indicating a successful run.
### Part 2: TRT Int8 inference with offline quantization
TRT Int8 offline quantization inference takes two steps: generating the quantization calibration table, and then loading that table to run Int8 inference. Note that TRT Int8 offline quantization still uses the ResNet50 FP32 model; the quantization scales stored in the calibration table are used at runtime to convert FP32 to Int8 and thus speed up inference.
#### Generating the quantization calibration table
1) **Edit `run_impl.sh`**
Open `run_impl.sh` and modify the following settings:
```shell
# Select the demo that generates the quantization calibration table
DEMO_NAME=trt_gen_calib_table_test
# TensorRT is used in this section, so USE_TENSORRT must be turned on
WITH_MKL=ON
WITH_GPU=ON
USE_TENSORRT=ON
# Root directory of the inference library
LIB_DIR=/paddle/fluid_inference_install_dir
# If WITH_GPU or USE_TENSORRT above is set to ON, set the corresponding CUDA, CUDNN, and TENSORRT paths. Note that CUDA_LIB and CUDNN_LIB must point to the lib64 level, while TENSORRT_ROOT points to the TensorRT root directory.
CUDNN_LIB=/paddle/nvidia-downloads/cudnn_v7.6_cuda10.1/lib64
CUDA_LIB=/paddle/nvidia-downloads/cuda-10.1/lib64
TENSORRT_ROOT=/paddle/nvidia-downloads/TensorRT-6.0.1.5
```
Run `sh run_impl.sh`; a `build` directory will be created under the current directory.
2) **Run the sample**
```shell
# Enter the build directory
cd build
# Run the sample
./trt_gen_calib_table_test --model_file=../ResNet50/model --params_file=../ResNet50/params
```
When the run finishes, a new file named `trt_calib_*` appears in the `_opt_cache` folder under the model directory `ResNet50`; this is the calibration table.
#### Loading the calibration table for Int8 inference
1) **Edit `run_impl.sh` again**, switching to the Int8 inference demo:
```shell
# Select the Int8 inference demo
DEMO_NAME=trt_int8_test
# TensorRT is used in this section, so USE_TENSORRT must be turned on
WITH_MKL=ON
WITH_GPU=ON
USE_TENSORRT=ON
# Root directory of the inference library
LIB_DIR=/paddle/fluid_inference_install_dir
# If WITH_GPU or USE_TENSORRT above is set to ON, set the corresponding CUDA, CUDNN, and TENSORRT paths. Note that CUDA_LIB and CUDNN_LIB must point to the lib64 level, while TENSORRT_ROOT points to the TensorRT root directory.
CUDNN_LIB=/paddle/nvidia-downloads/cudnn_v7.6_cuda10.1/lib64
CUDA_LIB=/paddle/nvidia-downloads/cuda-10.1/lib64
TENSORRT_ROOT=/paddle/nvidia-downloads/TensorRT-6.0.1.5
```
Run `sh run_impl.sh`; a `build` directory will be created under the current directory.
2) **Run the sample**
```shell
# Enter the build directory
cd build
# Run the sample; note that use_calib must be set to true here
./trt_int8_test --model_file=../ResNet50/model --params_file=../ResNet50/params --use_calib=true
```
When the run finishes, the program prints the first 20 values of the model output to the screen, indicating a successful run.
**Note**
If you look at the code of `trt_gen_calib_table_test` and `trt_int8_test`, you will see that the TensorRT configuration for generating the calibration table and for loading it to run Int8 inference is the same, namely
```c++
config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, 5, AnalysisConfig::Precision::kInt8, false, true /*use_calib*/);
```
Paddle-TRT decides whether to generate or load the calibration table by checking whether the `_opt_cache` folder under the model directory already contains a `trt_calib_*` calibration file matching the current model. To avoid confusing the generation and loading phases at runtime, you can tell them apart from the run log.
Log when generating the calibration table:
```
I0623 08:40:49.386909 107053 tensorrt_engine_op.h:159] This process is generating calibration table for Paddle TRT int8...
I0623 08:40:49.387279 107057 tensorrt_engine_op.h:352] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0623 08:41:13.784473 107053 analysis_predictor.cc:791] Wait for calib threads done.
I0623 08:41:14.419198 107053 analysis_predictor.cc:793] Generating TRT Calibration table data, this may cost a lot of time...
```
Log when loading the calibration table for inference:
```
I0623 08:40:27.217701 107040 tensorrt_subgraph_pass.cc:258] RUN Paddle TRT int8 calibration mode...
I0623 08:40:27.217834 107040 tensorrt_subgraph_pass.cc:321] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
```
### Part 3: TRT inference with a PaddleSlim Int8 quantized model
Here we use the ResNet50 PaddleSlim quantized model downloaded earlier. The difference from Int8 inference with an offline calibration table is that a PaddleSlim quantized model already stores the scales in its op attributes, so no calibration table is needed; therefore set `use_calib` to false when running the sample.
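For reference, the minimal sketch below (our own helper, based on `trt_int8_test.cc`) shows that the two Int8 modes differ only in the final `use_calib` argument of `EnableTensorRtEngine`; the model paths are the ones used in this README:
```c++
#include <memory>
#include "paddle/include/paddle_inference_api.h"

// Sketch (based on trt_int8_test.cc): the same EnableTensorRtEngine call covers
// both Int8 modes; only the last argument (use_calib) changes.
std::unique_ptr<paddle::PaddlePredictor> CreateInt8Predictor(bool use_calib) {
  paddle::AnalysisConfig config;
  // use_calib = true  -> FP32 model plus the offline calibration table in _opt_cache.
  // use_calib = false -> PaddleSlim quantized model with scales stored in op attributes.
  config.SetModel(use_calib ? "../ResNet50/model" : "../ResNet50_quant/model",
                  use_calib ? "../ResNet50/params" : "../ResNet50_quant/params");
  config.EnableUseGpu(500, 0);
  config.SwitchUseFeedFetchOps(false);
  config.EnableTensorRtEngine(1 << 30, /*max_batch_size=*/1,
                              /*min_subgraph_size=*/5,
                              paddle::AnalysisConfig::Precision::kInt8,
                              /*use_static=*/false, use_calib);
  return paddle::CreatePaddlePredictor(config);
}
```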
1) **Edit `run_impl.sh`**
Open `run_impl.sh` and modify the following settings:
```shell
# Select the Int8 inference demo
DEMO_NAME=trt_int8_test
# TensorRT is used in this section, so USE_TENSORRT must be turned on
WITH_MKL=ON
WITH_GPU=ON
USE_TENSORRT=ON
# Root directory of the inference library
LIB_DIR=/paddle/fluid_inference_install_dir
# If WITH_GPU or USE_TENSORRT above is set to ON, set the corresponding CUDA, CUDNN, and TENSORRT paths. Note that CUDA_LIB and CUDNN_LIB must point to the lib64 level, while TENSORRT_ROOT points to the TensorRT root directory.
CUDNN_LIB=/paddle/nvidia-downloads/cudnn_v7.6_cuda10.1/lib64
CUDA_LIB=/paddle/nvidia-downloads/cuda-10.1/lib64
TENSORRT_ROOT=/paddle/nvidia-downloads/TensorRT-6.0.1.5
```
Run `sh run_impl.sh`; a `build` directory will be created under the current directory.
2) **Run the sample**
```shell
# Enter the build directory
cd build
# Run the sample; note that use_calib must be set to false here
./trt_int8_test --model_file=../ResNet50_quant/model --params_file=../ResNet50_quant/params --use_calib=false
```
When the run finishes, the program prints the first 20 values of the model output to the screen, indicating a successful run.
### More links
- [Paddle Inference Quick Start](https://paddle-inference.readthedocs.io/en/latest/introduction/quick_start.html)
- [Using the Paddle Inference C++ API](https://paddle-inference.readthedocs.io/en/latest/user_guides/cxx_api.html)
- [Using the Paddle Inference Python API](https://paddle-inference.readthedocs.io/en/latest/user_guides/inference_python_api.html)
<html>
<head>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
extensions: ["tex2jax.js", "TeX/AMSsymbols.js", "TeX/AMSmath.js"],
jax: ["input/TeX", "output/HTML-CSS"],
tex2jax: {
inlineMath: [ ['$','$'] ],
displayMath: [ ['$$','$$'] ],
processEscapes: true
},
"HTML-CSS": { availableFonts: ["TeX"] }
});
</script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.0/MathJax.js" async></script>
<script type="text/javascript" src="../.tools/theme/marked.js">
</script>
<link href="http://cdn.bootcss.com/highlight.js/9.9.0/styles/darcula.min.css" rel="stylesheet">
<script src="http://cdn.bootcss.com/highlight.js/9.9.0/highlight.min.js"></script>
<link href="http://cdn.bootcss.com/bootstrap/4.0.0-alpha.6/css/bootstrap.min.css" rel="stylesheet">
<link href="https://cdn.jsdelivr.net/perfect-scrollbar/0.6.14/css/perfect-scrollbar.min.css" rel="stylesheet">
<link href="../.tools/theme/github-markdown.css" rel='stylesheet'>
</head>
<style type="text/css" >
.markdown-body {
box-sizing: border-box;
min-width: 200px;
max-width: 980px;
margin: 0 auto;
padding: 45px;
}
</style>
<body>
<div id="context" class="container-fluid markdown-body">
</div>
<!-- This block will be replaced by each markdown file content. Please do not change lines below.-->
<div id="markdown" style='display:none'>
## ResNet50 Image Classification with Paddle-TRT
This document is a hands-on demo of Paddle-TRT inference on the ResNet50 classification model. If you are new to Paddle-TRT, we recommend reading [this page](https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html) first for an introduction.
In this directory,
- `trt_fp32_test.cc` is the sample source for FP32 inference with Paddle-TRT (its inputs are fixed values; if you want to read data with OpenCV or by other means, modify the program accordingly).
- `trt_gen_calib_table_test.cc` is the sample source that produces the quantization calibration table for offline quantization calibration.
- `trt_int8_test.cc` is the sample source for Int8 inference with Paddle-TRT. Depending on whether the boolean flag `use_calib` is `true` or `false`, it either loads an offline quantization calibration table for Int8 inference or loads an Int8 model produced by PaddleSlim quantization.
- `CMakeLists.txt` is the build configuration file.
- `run_impl.sh` configures the third-party libraries and the precompiled inference library.
### Getting the models
First, download the required models from the following links:
[ResNet50 FP32 model](https://paddle-inference-dist.bj.bcebos.com/inference_demo/python/resnet50/ResNet50.tar.gz)
[ResNet50 PaddleSlim quantized model](https://paddle-inference-dist.bj.bcebos.com/inference_demo/python/resnet50/ResNet50_quant.tar.gz)
The FP32 model is used for FP32 inference and for Int8 inference with offline calibration; the quantized model is produced by the model compression toolkit PaddleSlim. For details on PaddleSlim quantization, see [here](https://paddlepaddle.github.io/PaddleSlim/quick_start/quant_aware_tutorial.html). An introduction to Int8 inference with Paddle-TRT is available [here](https://github.com/PaddlePaddle/Paddle-Inference-Demo/tree/master/docs/optimize/paddle_trt.rst#int8%E9%87%8F%E5%8C%96%E9%A2%84%E6%B5%8B).
### Part 1: FP32 inference with TRT
1) **Edit `run_impl.sh`**
Open `run_impl.sh` and modify the following settings:
```shell
# Select the FP32 inference demo
DEMO_NAME=trt_fp32_test
# TensorRT is used in this section, so USE_TENSORRT must be turned on
WITH_MKL=ON
WITH_GPU=ON
USE_TENSORRT=ON
# Root directory of the inference library
LIB_DIR=/paddle/fluid_inference_install_dir
# If WITH_GPU or USE_TENSORRT above is set to ON, set the corresponding CUDA, CUDNN, and TENSORRT paths. Note that CUDA_LIB and CUDNN_LIB must point to the lib64 level, while TENSORRT_ROOT points to the TensorRT root directory.
CUDNN_LIB=/paddle/nvidia-downloads/cudnn_v7.6_cuda10.1/lib64
CUDA_LIB=/paddle/nvidia-downloads/cuda-10.1/lib64
TENSORRT_ROOT=/paddle/nvidia-downloads/TensorRT-6.0.1.5
```
Run `sh run_impl.sh`; a `build` directory will be created under the current directory.
2) **Run the sample**
```shell
# Enter the build directory
cd build
# Run the sample
./trt_fp32_test --model_file=../ResNet50/model --params_file=../ResNet50/params
```
When the run finishes, the program prints the first 20 values of the model output to the screen, indicating a successful run.
### Part 2: TRT Int8 inference with offline quantization
TRT Int8 offline quantization inference takes two steps: generating the quantization calibration table, and then loading that table to run Int8 inference. Note that TRT Int8 offline quantization still uses the ResNet50 FP32 model; the quantization scales stored in the calibration table are used at runtime to convert FP32 to Int8 and thus speed up inference.
#### Generating the quantization calibration table
1) **Edit `run_impl.sh`**
Open `run_impl.sh` and modify the following settings:
```shell
# Select the demo that generates the quantization calibration table
DEMO_NAME=trt_gen_calib_table_test
# TensorRT is used in this section, so USE_TENSORRT must be turned on
WITH_MKL=ON
WITH_GPU=ON
USE_TENSORRT=ON
# Root directory of the inference library
LIB_DIR=/paddle/fluid_inference_install_dir
# If WITH_GPU or USE_TENSORRT above is set to ON, set the corresponding CUDA, CUDNN, and TENSORRT paths. Note that CUDA_LIB and CUDNN_LIB must point to the lib64 level, while TENSORRT_ROOT points to the TensorRT root directory.
CUDNN_LIB=/paddle/nvidia-downloads/cudnn_v7.6_cuda10.1/lib64
CUDA_LIB=/paddle/nvidia-downloads/cuda-10.1/lib64
TENSORRT_ROOT=/paddle/nvidia-downloads/TensorRT-6.0.1.5
```
Run `sh run_impl.sh`; a `build` directory will be created under the current directory.
2) **Run the sample**
```shell
# Enter the build directory
cd build
# Run the sample
./trt_gen_calib_table_test --model_file=../ResNet50/model --params_file=../ResNet50/params
```
When the run finishes, a new file named `trt_calib_*` appears in the `_opt_cache` folder under the model directory `ResNet50`; this is the calibration table.
#### Loading the calibration table for Int8 inference
1) **Edit `run_impl.sh` again**, switching to the Int8 inference demo:
```shell
# Select the Int8 inference demo
DEMO_NAME=trt_int8_test
# TensorRT is used in this section, so USE_TENSORRT must be turned on
WITH_MKL=ON
WITH_GPU=ON
USE_TENSORRT=ON
# Root directory of the inference library
LIB_DIR=/paddle/fluid_inference_install_dir
# If WITH_GPU or USE_TENSORRT above is set to ON, set the corresponding CUDA, CUDNN, and TENSORRT paths. Note that CUDA_LIB and CUDNN_LIB must point to the lib64 level, while TENSORRT_ROOT points to the TensorRT root directory.
CUDNN_LIB=/paddle/nvidia-downloads/cudnn_v7.6_cuda10.1/lib64
CUDA_LIB=/paddle/nvidia-downloads/cuda-10.1/lib64
TENSORRT_ROOT=/paddle/nvidia-downloads/TensorRT-6.0.1.5
```
Run `sh run_impl.sh`; a `build` directory will be created under the current directory.
2) **Run the sample**
```shell
# Enter the build directory
cd build
# Run the sample; note that use_calib must be set to true here
./trt_int8_test --model_file=../ResNet50/model --params_file=../ResNet50/params --use_calib=true
```
When the run finishes, the program prints the first 20 values of the model output to the screen, indicating a successful run.
**Note**
If you look at the code of `trt_gen_calib_table_test` and `trt_int8_test`, you will see that the TensorRT configuration for generating the calibration table and for loading it to run Int8 inference is the same, namely
```c++
config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, 5, AnalysisConfig::Precision::kInt8, false, true /*use_calib*/);
```
Paddle-TRT decides whether to generate or load the calibration table by checking whether the `_opt_cache` folder under the model directory already contains a `trt_calib_*` calibration file matching the current model. To avoid confusing the generation and loading phases at runtime, you can tell them apart from the run log.
Log when generating the calibration table:
```
I0623 08:40:49.386909 107053 tensorrt_engine_op.h:159] This process is generating calibration table for Paddle TRT int8...
I0623 08:40:49.387279 107057 tensorrt_engine_op.h:352] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0623 08:41:13.784473 107053 analysis_predictor.cc:791] Wait for calib threads done.
I0623 08:41:14.419198 107053 analysis_predictor.cc:793] Generating TRT Calibration table data, this may cost a lot of time...
```
Log when loading the calibration table for inference:
```
I0623 08:40:27.217701 107040 tensorrt_subgraph_pass.cc:258] RUN Paddle TRT int8 calibration mode...
I0623 08:40:27.217834 107040 tensorrt_subgraph_pass.cc:321] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
```
### Part 3: TRT inference with a PaddleSlim Int8 quantized model
Here we use the ResNet50 PaddleSlim quantized model downloaded earlier. The difference from Int8 inference with an offline calibration table is that a PaddleSlim quantized model already stores the scales in its op attributes, so no calibration table is needed; therefore set `use_calib` to false when running the sample.
1) **Edit `run_impl.sh`**
Open `run_impl.sh` and modify the following settings:
```shell
# Select the Int8 inference demo
DEMO_NAME=trt_int8_test
# TensorRT is used in this section, so USE_TENSORRT must be turned on
WITH_MKL=ON
WITH_GPU=ON
USE_TENSORRT=ON
# Root directory of the inference library
LIB_DIR=/paddle/fluid_inference_install_dir
# If WITH_GPU or USE_TENSORRT above is set to ON, set the corresponding CUDA, CUDNN, and TENSORRT paths. Note that CUDA_LIB and CUDNN_LIB must point to the lib64 level, while TENSORRT_ROOT points to the TensorRT root directory.
CUDNN_LIB=/paddle/nvidia-downloads/cudnn_v7.6_cuda10.1/lib64
CUDA_LIB=/paddle/nvidia-downloads/cuda-10.1/lib64
TENSORRT_ROOT=/paddle/nvidia-downloads/TensorRT-6.0.1.5
```
Run `sh run_impl.sh`; a `build` directory will be created under the current directory.
2) **Run the sample**
```shell
# Enter the build directory
cd build
# Run the sample; note that use_calib must be set to false here
./trt_int8_test --model_file=../ResNet50_quant/model --params_file=../ResNet50_quant/params --use_calib=false
```
When the run finishes, the program prints the first 20 values of the model output to the screen, indicating a successful run.
### More links
- [Paddle Inference Quick Start](https://paddle-inference.readthedocs.io/en/latest/introduction/quick_start.html)
- [Using the Paddle Inference C++ API](https://paddle-inference.readthedocs.io/en/latest/user_guides/cxx_api.html)
- [Using the Paddle Inference Python API](https://paddle-inference.readthedocs.io/en/latest/user_guides/inference_python_api.html)
</div>
<!-- You can change the lines below now. -->
<script type="text/javascript">
marked.setOptions({
renderer: new marked.Renderer(),
gfm: true,
breaks: false,
smartypants: true,
highlight: function(code, lang) {
code = code.replace(/&amp;/g, "&")
code = code.replace(/&gt;/g, ">")
code = code.replace(/&lt;/g, "<")
code = code.replace(/&nbsp;/g, " ")
return hljs.highlightAuto(code, [lang]).value;
}
});
document.getElementById("context").innerHTML = marked(
document.getElementById("markdown").innerHTML)
</script>
</body>
mkdir -p build
cd build
rm -rf *
# must match the name of the demo source file
DEMO_NAME=resnet50_trt_test
WITH_MKL=ON
WITH_GPU=ON
USE_TENSORRT=ON
LIB_DIR=/paddle/build/fluid_inference_install_dir
CUDNN_LIB=/paddle/nvidia-downloads/cudnn_v7.6_cuda10.1/lib64
CUDA_LIB=/paddle/nvidia-downloads/cuda-10.1/lib64
TENSORRT_ROOT=/paddle/nvidia-downloads/TensorRT-6.0.1.5
cmake .. -DPADDLE_LIB=${LIB_DIR} \
-DWITH_MKL=${WITH_MKL} \
-DDEMO_NAME=${DEMO_NAME} \
-DWITH_GPU=${WITH_GPU} \
-DWITH_STATIC_LIB=OFF \
-DUSE_TENSORRT=${USE_TENSORRT} \
-DCUDNN_LIB=${CUDNN_LIB} \
-DCUDA_LIB=${CUDA_LIB} \
-DTENSORRT_ROOT=${TENSORRT_ROOT}
make -j
#include <numeric>
#include <iostream>
#include <memory>
#include <chrono>
#include <gflags/gflags.h>
#include <glog/logging.h>
#include "paddle/include/paddle_inference_api.h"
using paddle::AnalysisConfig;
DEFINE_string(model_file, "", "Path of the inference model file.");
DEFINE_string(params_file, "", "Path of the inference params file.");
DEFINE_string(model_dir, "", "Directory of the inference model.");
DEFINE_int32(batch_size, 1, "Batch size.");
using Time = decltype(std::chrono::high_resolution_clock::now());
Time time() { return std::chrono::high_resolution_clock::now(); };
double time_diff(Time t1, Time t2) {
typedef std::chrono::microseconds ms;
auto diff = t2 - t1;
ms counter = std::chrono::duration_cast<ms>(diff);
return counter.count() / 1000.0;
}
std::unique_ptr<paddle::PaddlePredictor> CreatePredictor() {
AnalysisConfig config;
if (FLAGS_model_dir != "") {
config.SetModel(FLAGS_model_dir);
} else {
config.SetModel(FLAGS_model_file,
FLAGS_params_file);
}
config.EnableUseGpu(500, 0);
// We use ZeroCopy, so we set config.SwitchUseFeedFetchOps(false) here.
config.SwitchUseFeedFetchOps(false);
config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, 5, AnalysisConfig::Precision::kFloat32, false, false);
return CreatePaddlePredictor(config);
}
void run(paddle::PaddlePredictor *predictor,
const std::vector<float>& input,
const std::vector<int>& input_shape,
std::vector<float> *out_data) {
int input_num = std::accumulate(input_shape.begin(), input_shape.end(), 1, std::multiplies<int>());
auto input_names = predictor->GetInputNames();
auto input_t = predictor->GetInputTensor(input_names[0]);
input_t->Reshape(input_shape);
input_t->copy_from_cpu(input.data());
CHECK(predictor->ZeroCopyRun());
auto output_names = predictor->GetOutputNames();
// there is only one output of Resnet50
auto output_t = predictor->GetOutputTensor(output_names[0]);
std::vector<int> output_shape = output_t->shape();
int out_num = std::accumulate(output_shape.begin(), output_shape.end(), 1, std::multiplies<int>());
out_data->resize(out_num);
output_t->copy_to_cpu(out_data->data());
}
int main(int argc, char* argv[]) {
google::ParseCommandLineFlags(&argc, &argv, true);
auto predictor = CreatePredictor();
std::vector<int> input_shape = {FLAGS_batch_size, 3, 224, 224};
// Init input as 1.0 here for example. You can also load preprocessed real pictures to vectors as input.
std::vector<float> input_data(FLAGS_batch_size * 3 * 224 * 224, 1.0);
std::vector<float> out_data;
run(predictor.get(), input_data, input_shape, &out_data);
// Print first 20 outputs
for (int i = 0; i < 20; i++) {
LOG(INFO) << out_data[i] << std::endl;
}
return 0;
}
#include <numeric>
#include <iostream>
#include <memory>
#include <chrono>
#include <random>
#include <gflags/gflags.h>
#include <glog/logging.h>
#include "paddle/include/paddle_inference_api.h"
using paddle::AnalysisConfig;
DEFINE_string(model_file, "", "Path of the inference model file.");
DEFINE_string(params_file, "", "Path of the inference params file.");
DEFINE_string(model_dir, "", "Directory of the inference model.");
DEFINE_int32(batch_size, 1, "Batch size.");
float Random(float low, float high) {
static std::random_device rd;
static std::mt19937 mt(rd());
std::uniform_real_distribution<double> dist(low, high);
return dist(mt);
}
using Time = decltype(std::chrono::high_resolution_clock::now());
Time time() { return std::chrono::high_resolution_clock::now(); };
double time_diff(Time t1, Time t2) {
typedef std::chrono::microseconds ms;
auto diff = t2 - t1;
ms counter = std::chrono::duration_cast<ms>(diff);
return counter.count() / 1000.0;
}
std::unique_ptr<paddle::PaddlePredictor> CreatePredictor() {
AnalysisConfig config;
if (FLAGS_model_dir != "") {
config.SetModel(FLAGS_model_dir);
} else {
config.SetModel(FLAGS_model_file,
FLAGS_params_file);
}
config.EnableUseGpu(500, 0);
// We use ZeroCopy, so we set config.SwitchUseFeedFetchOps(false) here.
config.SwitchUseFeedFetchOps(false);
config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, 5, AnalysisConfig::Precision::kInt8, false, true /*use_calib*/);
return CreatePaddlePredictor(config);
}
void run(paddle::PaddlePredictor *predictor,
std::vector<float>& input,
const std::vector<int>& input_shape,
std::vector<float> *out_data) {
int input_num = std::accumulate(input_shape.begin(), input_shape.end(), 1, std::multiplies<int>());
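// Run many batches through the predictor so the TRT calibration pass can observe
// activation ranges and produce the Int8 calibration table (written to the
// _opt_cache folder under the model directory).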
for (size_t i = 0; i < 500; i++) {
// We use random data here for example. Change this to real data in your application.
for (int j = 0; j < input_num; j++) {
input[j] = Random(0, 1.0);
}
auto input_names = predictor->GetInputNames();
auto input_t = predictor->GetInputTensor(input_names[0]);
input_t->Reshape(input_shape);
input_t->copy_from_cpu(input.data());
// Run predictor to generate calibration table. Can be very time-consuming.
CHECK(predictor->ZeroCopyRun());
}
}
int main(int argc, char* argv[]) {
google::ParseCommandLineFlags(&argc, &argv, true);
auto predictor = CreatePredictor();
std::vector<int> input_shape = {FLAGS_batch_size, 3, 224, 224};
// Init input as 1.0 here for example. You can also load preprocessed real pictures to vectors as input.
std::vector<float> input_data(FLAGS_batch_size * 3 * 224 * 224, 1.0);
std::vector<float> out_data;
run(predictor.get(), input_data, input_shape, &out_data);
return 0;
}
#include <numeric>
#include <iostream>
#include <memory>
#include <chrono>
#include <gflags/gflags.h>
#include <glog/logging.h>
#include "paddle/include/paddle_inference_api.h"
using paddle::AnalysisConfig;
DEFINE_string(model_file, "", "Path of the inference model file.");
DEFINE_string(params_file, "", "Path of the inference params file.");
DEFINE_string(model_dir, "", "Directory of the inference model.");
DEFINE_int32(batch_size, 1, "Batch size.");
DEFINE_bool(use_calib, true, "Whether to use calib. Set to true if you are using TRT calibration; Set to false if you are using PaddleSlim quant models.");
using Time = decltype(std::chrono::high_resolution_clock::now());
Time time() { return std::chrono::high_resolution_clock::now(); };
double time_diff(Time t1, Time t2) {
typedef std::chrono::microseconds ms;
auto diff = t2 - t1;
ms counter = std::chrono::duration_cast<ms>(diff);
return counter.count() / 1000.0;
}
std::unique_ptr<paddle::PaddlePredictor> CreatePredictor() {
AnalysisConfig config;
if (FLAGS_model_dir != "") {
config.SetModel(FLAGS_model_dir);
} else {
config.SetModel(FLAGS_model_file,
FLAGS_params_file);
}
config.EnableUseGpu(500, 0);
// We use ZeroCopy, so we set config.SwitchUseFeedFetchOps(false) here.
config.SwitchUseFeedFetchOps(false);
config.EnableTensorRtEngine(1 << 30, FLAGS_batch_size, 5, AnalysisConfig::Precision::kInt8, false, FLAGS_use_calib);
return CreatePaddlePredictor(config);
}
void run(paddle::PaddlePredictor *predictor,
const std::vector<float>& input,
const std::vector<int>& input_shape,
std::vector<float> *out_data) {
int input_num = std::accumulate(input_shape.begin(), input_shape.end(), 1, std::multiplies<int>());
auto input_names = predictor->GetInputNames();
auto input_t = predictor->GetInputTensor(input_names[0]);
input_t->Reshape(input_shape);
input_t->copy_from_cpu(input.data());
CHECK(predictor->ZeroCopyRun());
auto output_names = predictor->GetOutputNames();
// there is only one output of Resnet50
auto output_t = predictor->GetOutputTensor(output_names[0]);
std::vector<int> output_shape = output_t->shape();
int out_num = std::accumulate(output_shape.begin(), output_shape.end(), 1, std::multiplies<int>());
out_data->resize(out_num);
output_t->copy_to_cpu(out_data->data());
}
int main(int argc, char* argv[]) {
google::ParseCommandLineFlags(&argc, &argv, true);
auto predictor = CreatePredictor();
std::vector<int> input_shape = {FLAGS_batch_size, 3, 224, 224};
// Init input as 1.0 here for example. You can also load preprocessed real pictures to vectors as input.
std::vector<float> input_data(FLAGS_batch_size * 3 * 224 * 224, 1.0);
std::vector<float> out_data;
run(predictor.get(), input_data, input_shape, &out_data);
// Print first 20 outputs
for (int i = 0; i < 20; i++) {
LOG(INFO) << out_data[i] << std::endl;
}
return 0;
}
project(cpp_inference_demo CXX C)
option(WITH_MKL "Compile demo with MKL/OpenBlas support, default use MKL." ON)
option(WITH_GPU "Compile demo with GPU/CPU, default use CPU." OFF)
option(WITH_STATIC_LIB "Compile demo with static/shared library, default use static." ON)
option(USE_TENSORRT "Compile demo with TensorRT." OFF)
macro(safe_set_static_flag)
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
if(${flag_var} MATCHES "/MD")
string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
endif(${flag_var} MATCHES "/MD")
endforeach(flag_var)
endmacro()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -g")
set(CMAKE_STATIC_LIBRARY_PREFIX "")
message("flags" ${CMAKE_CXX_FLAGS})
if(NOT DEFINED PADDLE_LIB)
message(FATAL_ERROR "please set PADDLE_LIB with -DPADDLE_LIB=/path/paddle/lib")
endif()
if(NOT DEFINED DEMO_NAME)
message(FATAL_ERROR "please set DEMO_NAME with -DDEMO_NAME=demo_name")
endif()
include_directories("${PADDLE_LIB}")
include_directories("${PADDLE_LIB}/third_party/install/protobuf/include")
include_directories("${PADDLE_LIB}/third_party/install/glog/include")
include_directories("${PADDLE_LIB}/third_party/install/gflags/include")
include_directories("${PADDLE_LIB}/third_party/install/xxhash/include")
include_directories("${PADDLE_LIB}/third_party/install/zlib/include")
include_directories("${PADDLE_LIB}/third_party/boost")
include_directories("${PADDLE_LIB}/third_party/eigen3")
if (USE_TENSORRT AND WITH_GPU)
include_directories("${TENSORRT_ROOT}/include")
link_directories("${TENSORRT_ROOT}/lib")
endif()
link_directories("${PADDLE_LIB}/third_party/install/zlib/lib")
link_directories("${PADDLE_LIB}/third_party/install/protobuf/lib")
link_directories("${PADDLE_LIB}/third_party/install/glog/lib")
link_directories("${PADDLE_LIB}/third_party/install/gflags/lib")
link_directories("${PADDLE_LIB}/third_party/install/xxhash/lib")
link_directories("${PADDLE_LIB}/paddle/lib")
add_executable(${DEMO_NAME} ${DEMO_NAME}.cc)
if(WITH_MKL)
include_directories("${PADDLE_LIB}/third_party/install/mklml/include")
set(MATH_LIB ${PADDLE_LIB}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX}
${PADDLE_LIB}/third_party/install/mklml/lib/libiomp5${CMAKE_SHARED_LIBRARY_SUFFIX})
set(MKLDNN_PATH "${PADDLE_LIB}/third_party/install/mkldnn")
if(EXISTS ${MKLDNN_PATH})
include_directories("${MKLDNN_PATH}/include")
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0)
endif()
else()
set(MATH_LIB ${PADDLE_LIB}/third_party/install/openblas/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
# Note: libpaddle_inference_api.so/a must be put before libpaddle_fluid.so/a
if(WITH_STATIC_LIB)
set(DEPS
${PADDLE_LIB}/paddle/lib/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
set(DEPS
${PADDLE_LIB}/paddle/lib/libpaddle_fluid${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
set(EXTERNAL_LIB "-lrt -ldl -lpthread")
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags protobuf z xxhash
${EXTERNAL_LIB})
if(WITH_GPU)
if (USE_TENSORRT)
set(DEPS ${DEPS}
${TENSORRT_ROOT}/lib/libnvinfer${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS}
${TENSORRT_ROOT}/lib/libnvinfer_plugin${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
  set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${CUDA_LIB}/libcublas${CMAKE_SHARED_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDNN_LIB}/libcudnn${CMAKE_SHARED_LIBRARY_SUFFIX} )
endif()
target_link_libraries(${DEMO_NAME} ${DEPS})
## Running the C++ ResNet50 image classification demo
### Step 1: Get the ResNet50 model
Click this [link](https://paddle-inference-dist.bj.bcebos.com/inference_demo/python/resnet50/ResNet50.tar.gz) to download the model. The model was trained on the ImageNet dataset. For more information on **how the model was trained**, please visit [this page](https://github.com/PaddlePaddle/models/tree/develop/PaddleCV/image_classification).
### Step 2: Build the demo
The file `resnet50_test.cc` is the inference demo program (its input is a fixed value; if you want to read data with OpenCV or in some other way, you need to modify the program accordingly).
The file `CMakeLists.txt` is the build script.
The script `run_impl.sh` holds the configuration of the third-party libraries and the prebuilt inference library.
To build the ResNet50 demo, we first need to adjust the configuration in `run_impl.sh`.
1) **Modify `run_impl.sh`**
Open `run_impl.sh` and update the following settings:
```shell
# Decide whether to turn on the following three flags according to version.txt in the prebuilt library
WITH_MKL=ON
WITH_GPU=ON
USE_TENSORRT=OFF
# Root directory of the inference library
LIB_DIR=${YOUR_LIB_DIR}/fluid_inference_install_dir
# If WITH_GPU or USE_TENSORRT above is set to ON, set the corresponding CUDA, CUDNN and TensorRT paths.
CUDNN_LIB=/paddle/nvidia-downloads/cudnn_v7.5_cuda10.1/lib64
CUDA_LIB=/paddle/nvidia-downloads/cuda-10.1/lib64
# TENSORRT_ROOT=/paddle/nvidia-downloads/TensorRT-6.0.1.5
```
Run `sh run_impl.sh`; this creates a build directory in the current directory.
2) **Run the demo**
```shell
# Enter the build directory
cd build
# Run the demo
./resnet50_test --model_file=${RESNET_MODEL_PATH}/ResNet/model --params_file=${RESNET_MODEL_PATH}/ResNet/params
```
When the run finishes, the program prints the model output to the screen, which indicates that it ran successfully.
### More links
- [Paddle Inference Quick Start]()
- [Paddle Inference Python API]()
#include <numeric>
#include <iostream>
#include <memory>
#include <chrono>
#include <gflags/gflags.h>
#include <glog/logging.h>
#include "paddle/include/paddle_inference_api.h"
using paddle::AnalysisConfig;
DEFINE_string(model_file, "", "Path of the inference model file.");
DEFINE_string(params_file, "", "Path of the inference params file.");
DEFINE_string(model_dir, "", "Directory of the inference model.");
DEFINE_int32(batch_size, 1, "Batch size.");
using Time = decltype(std::chrono::high_resolution_clock::now());
Time time() { return std::chrono::high_resolution_clock::now(); };
double time_diff(Time t1, Time t2) {
typedef std::chrono::microseconds ms;
auto diff = t2 - t1;
ms counter = std::chrono::duration_cast<ms>(diff);
return counter.count() / 1000.0;
}
std::unique_ptr<paddle::PaddlePredictor> CreatePredictor() {
AnalysisConfig config;
  if (FLAGS_model_dir != "") {
    config.SetModel(FLAGS_model_dir);
  } else {
    config.SetModel(FLAGS_model_file,
                    FLAGS_params_file);
  }
config.EnableUseGpu(100, 0);
// We use ZeroCopy, so we set config->SwitchUseFeedFetchOps(false)
config.SwitchUseFeedFetchOps(false);
config.EnableMKLDNN();
// Open the memory optim.
config.EnableMemoryOptim();
return CreatePaddlePredictor(config);
}
void run(paddle::PaddlePredictor *predictor,
const std::vector<float>& input,
const std::vector<int>& input_shape,
std::vector<float> *out_data) {
int input_num = std::accumulate(input_shape.begin(), input_shape.end(), 1, std::multiplies<int>());
auto input_names = predictor->GetInputNames();
auto input_t = predictor->GetInputTensor(input_names[0]);
input_t->Reshape(input_shape);
input_t->copy_from_cpu(input.data());
CHECK(predictor->ZeroCopyRun());
auto output_names = predictor->GetOutputNames();
// there is only one output of Resnet50
auto output_t = predictor->GetOutputTensor(output_names[0]);
std::vector<int> output_shape = output_t->shape();
int out_num = std::accumulate(output_shape.begin(), output_shape.end(), 1, std::multiplies<int>());
out_data->resize(out_num);
output_t->copy_to_cpu(out_data->data());
}
int main(int argc, char* argv[]) {
google::ParseCommandLineFlags(&argc, &argv, true);
auto predictor = CreatePredictor();
std::vector<int> input_shape = {FLAGS_batch_size, 3, 224, 224};
  // Initialize the input with zeros here as an example. You can also feed preprocessed real images.
std::vector<float> input_data(FLAGS_batch_size * 3 * 224 * 224, 0);
std::vector<float> out_data;
run(predictor.get(), input_data, input_shape, &out_data);
for (auto e : out_data) {
LOG(INFO) << e << std::endl;
}
return 0;
}
mkdir -p build
cd build
rm -rf *
# must be the same as the resnet50_test.cc file name
DEMO_NAME=resnet50_test
WITH_MKL=ON
WITH_GPU=ON
USE_TENSORRT=OFF
LIB_DIR=/paddle/trt_refine_int8/build/fluid_inference_install_dir
CUDNN_LIB=/paddle/nvidia-downloads/cudnn_v7.5_cuda10.1/lib64
CUDA_LIB=/paddle/nvidia-downloads/cuda-10.1/lib64
# TENSORRT_ROOT=/paddle/nvidia-downloads/TensorRT-6.0.1.5
cmake .. -DPADDLE_LIB=${LIB_DIR} \
-DWITH_MKL=${WITH_MKL} \
-DDEMO_NAME=${DEMO_NAME} \
-DWITH_GPU=${WITH_GPU} \
-DWITH_STATIC_LIB=OFF \
-DUSE_TENSORRT=${USE_TENSORRT} \
-DCUDNN_LIB=${CUDNN_LIB} \
-DCUDA_LIB=${CUDA_LIB} \
-DTENSORRT_ROOT=${TENSORRT_ROOT}
make -j
cmake_minimum_required(VERSION 3.0)
project(cpp_inference_demo CXX C)
option(WITH_MKL "Compile demo with MKL/OpenBlas support, default use MKL." ON)
option(WITH_GPU "Compile demo with GPU/CPU, default use CPU." OFF)
option(WITH_STATIC_LIB "Compile demo with static/shared library, default use static." ON)
option(USE_TENSORRT "Compile demo with TensorRT." OFF)
macro(safe_set_static_flag)
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
CMAKE_CXX_FLAGS_MINSIZEREL CMAKE_CXX_FLAGS_RELWITHDEBINFO)
if(${flag_var} MATCHES "/MD")
string(REGEX REPLACE "/MD" "/MT" ${flag_var} "${${flag_var}}")
endif(${flag_var} MATCHES "/MD")
endforeach(flag_var)
endmacro()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
set(CMAKE_STATIC_LIBRARY_PREFIX "")
message("flags" ${CMAKE_CXX_FLAGS})
if(NOT DEFINED PADDLE_LIB)
message(FATAL_ERROR "please set PADDLE_LIB with -DPADDLE_LIB=/path/paddle/lib")
endif()
if(NOT DEFINED DEMO_NAME)
message(FATAL_ERROR "please set DEMO_NAME with -DDEMO_NAME=demo_name")
endif()
include_directories("${PADDLE_LIB}")
include_directories("${PADDLE_LIB}/third_party/install/protobuf/include")
include_directories("${PADDLE_LIB}/third_party/install/glog/include")
include_directories("${PADDLE_LIB}/third_party/install/gflags/include")
include_directories("${PADDLE_LIB}/third_party/install/xxhash/include")
include_directories("${PADDLE_LIB}/third_party/install/zlib/include")
include_directories("${PADDLE_LIB}/third_party/boost")
include_directories("${PADDLE_LIB}/third_party/eigen3")
if (USE_TENSORRT AND WITH_GPU)
include_directories("${TENSORRT_ROOT}/include")
link_directories("${TENSORRT_ROOT}/lib")
endif()
link_directories("${PADDLE_LIB}/third_party/install/zlib/lib")
link_directories("${PADDLE_LIB}/third_party/install/protobuf/lib")
link_directories("${PADDLE_LIB}/third_party/install/glog/lib")
link_directories("${PADDLE_LIB}/third_party/install/gflags/lib")
link_directories("${PADDLE_LIB}/third_party/install/xxhash/lib")
link_directories("${PADDLE_LIB}/paddle/lib")
add_executable(${DEMO_NAME} ${DEMO_NAME}.cc)
if(WITH_MKL)
include_directories("${PADDLE_LIB}/third_party/install/mklml/include")
set(MATH_LIB ${PADDLE_LIB}/third_party/install/mklml/lib/libmklml_intel${CMAKE_SHARED_LIBRARY_SUFFIX}
${PADDLE_LIB}/third_party/install/mklml/lib/libiomp5${CMAKE_SHARED_LIBRARY_SUFFIX})
set(MKLDNN_PATH "${PADDLE_LIB}/third_party/install/mkldnn")
if(EXISTS ${MKLDNN_PATH})
include_directories("${MKLDNN_PATH}/include")
set(MKLDNN_LIB ${MKLDNN_PATH}/lib/libmkldnn.so.0)
endif()
else()
set(MATH_LIB ${PADDLE_LIB}/third_party/install/openblas/lib/libopenblas${CMAKE_STATIC_LIBRARY_SUFFIX})
endif()
# Note: libpaddle_inference_api.so/.a must be linked before libpaddle_fluid.so/.a
if(WITH_STATIC_LIB)
set(DEPS
${PADDLE_LIB}/paddle/lib/libpaddle_fluid${CMAKE_STATIC_LIBRARY_SUFFIX})
else()
set(DEPS
${PADDLE_LIB}/paddle/lib/libpaddle_fluid${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
set(EXTERNAL_LIB "-lrt -ldl -lpthread")
set(DEPS ${DEPS}
${MATH_LIB} ${MKLDNN_LIB}
glog gflags protobuf z xxhash
${EXTERNAL_LIB})
if(WITH_GPU)
if (USE_TENSORRT)
set(DEPS ${DEPS}
${TENSORRT_ROOT}/lib/libnvinfer${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS}
${TENSORRT_ROOT}/lib/libnvinfer_plugin${CMAKE_SHARED_LIBRARY_SUFFIX})
endif()
  set(DEPS ${DEPS} ${CUDA_LIB}/libcudart${CMAKE_SHARED_LIBRARY_SUFFIX})
set(DEPS ${DEPS} ${CUDA_LIB}/libcublas${CMAKE_SHARED_LIBRARY_SUFFIX} )
set(DEPS ${DEPS} ${CUDNN_LIB}/libcudnn${CMAKE_SHARED_LIBRARY_SUFFIX} )
endif()
target_link_libraries(${DEMO_NAME} ${DEPS})
## Running the C++ YOLOv3 object detection demo
### Step 1: Get the YOLOv3 model
Click this [link](https://paddle-inference-dist.cdn.bcebos.com/PaddleLite/yolov3_infer.tar.gz) to download the model. The model was trained on the ImageNet dataset. For more information on **how the model was trained**, please visit [this page](https://github.com/PaddlePaddle/PaddleDetection).
### Step 2: Build the demo
The file `yolov3_test.cc` is the inference demo program (its input is a fixed value; if you want to read data with OpenCV or in some other way, you need to modify the program accordingly).
The file `CMakeLists.txt` is the build script.
The script `run_impl.sh` holds the configuration of the third-party libraries and the prebuilt inference library.
To build the YOLOv3 demo, we first need to adjust the configuration in `run_impl.sh`.
1) **Modify `run_impl.sh`**
Open `run_impl.sh` and update the following settings:
```shell
# Decide whether to turn on the following three flags according to version.txt in the prebuilt library
WITH_MKL=ON
WITH_GPU=ON
USE_TENSORRT=OFF
# Root directory of the inference library
LIB_DIR=${YOUR_LIB_DIR}/fluid_inference_install_dir
# If WITH_GPU or USE_TENSORRT above is set to ON, set the corresponding CUDA, CUDNN and TensorRT paths.
CUDNN_LIB=/usr/local/cudnn/lib64
CUDA_LIB=/usr/local/cuda/lib64
# TENSORRT_ROOT=/usr/local/TensorRT-6.0.1.5
```
Run `sh run_impl.sh`; this creates a build directory in the current directory.
2) **Run the demo**
```shell
# Enter the build directory
cd build
# Run the demo
./yolov3_test --model_file ${YOLO_MODEL_PATH}/__model__ --params_file ${YOLO_MODEL_PATH}/__params__
```
When the run finishes, the program prints the number of model outputs to the screen, which indicates that it ran successfully.
### More links
- [Paddle Inference Quick Start]()
- [Paddle Inference Python API]()
work_path=$(dirname $(readlink -f $0))
mkdir -p build
cd build
rm -rf *
# must be the same as the yolov3_test.cc file name
DEMO_NAME=yolov3_test
WITH_MKL=ON
WITH_GPU=ON
USE_TENSORRT=OFF
LIB_DIR=${work_path}/fluid_inference_install_dir
CUDNN_LIB=/usr/local/cudnn/lib64
CUDA_LIB=/usr/local/cuda/lib64
# TENSORRT_ROOT=/usr/local/TensorRT-6.0.1.5
cmake .. -DPADDLE_LIB=${LIB_DIR} \
-DWITH_MKL=${WITH_MKL} \
-DDEMO_NAME=${DEMO_NAME} \
-DWITH_GPU=${WITH_GPU} \
-DWITH_STATIC_LIB=OFF \
-DUSE_TENSORRT=${USE_TENSORRT} \
-DCUDNN_LIB=${CUDNN_LIB} \
-DCUDA_LIB=${CUDA_LIB} \
-DTENSORRT_ROOT=${TENSORRT_ROOT}
make -j
#include "paddle/include/paddle_inference_api.h"
#include <numeric>
#include <iostream>
#include <memory>
#include <chrono>
#include <gflags/gflags.h>
#include <glog/logging.h>
using paddle::AnalysisConfig;
DEFINE_string(model_file, "", "Path of the inference model file.");
DEFINE_string(params_file, "", "Path of the inference params file.");
DEFINE_string(model_dir, "", "Directory of the inference model.");
DEFINE_int32(batch_size, 1, "Batch size.");
DEFINE_bool(use_gpu, false, "Enable GPU.");
DEFINE_bool(use_mkldnn, false, "Enable MKLDNN.");
DEFINE_bool(mem_optim, false, "Enable memory optimization.");
using Time = decltype(std::chrono::high_resolution_clock::now());
Time time() { return std::chrono::high_resolution_clock::now(); };
double time_diff(Time t1, Time t2) {
typedef std::chrono::microseconds ms;
auto diff = t2 - t1;
ms counter = std::chrono::duration_cast<ms>(diff);
return counter.count() / 1000.0;
}
std::unique_ptr<paddle::PaddlePredictor> CreatePredictor() {
AnalysisConfig config;
if (FLAGS_model_dir != "") {
config.SetModel(FLAGS_model_dir);
} else {
config.SetModel(FLAGS_model_file,
FLAGS_params_file);
}
if (FLAGS_use_gpu) {
config.EnableUseGpu(100, 0);
}
if (FLAGS_use_mkldnn) {
config.EnableMKLDNN();
}
// Open the memory optim.
if (FLAGS_mem_optim) {
config.EnableMemoryOptim();
}
// We use ZeroCopy, so we set config->SwitchUseFeedFetchOps(false)
config.SwitchUseFeedFetchOps(false);
return CreatePaddlePredictor(config);
}
void run(paddle::PaddlePredictor *predictor,
const std::vector<float>& input,
const std::vector<int>& input_shape,
const std::vector<int>& input_im,
const std::vector<int>& input_im_shape,
std::vector<float> *out_data) {
auto input_names = predictor->GetInputNames();
auto input_img = predictor->GetInputTensor(input_names[0]);
input_img->Reshape(input_shape);
input_img->copy_from_cpu(input.data());
auto input_size = predictor->GetInputTensor(input_names[1]);
input_size->Reshape(input_im_shape);
input_size->copy_from_cpu(input_im.data());
CHECK(predictor->ZeroCopyRun());
auto output_names = predictor->GetOutputNames();
// there is only one output of yolov3
auto output_t = predictor->GetOutputTensor(output_names[0]);
std::vector<int> output_shape = output_t->shape();
int out_num = std::accumulate(output_shape.begin(), output_shape.end(), 1, std::multiplies<int>());
out_data->resize(out_num);
output_t->copy_to_cpu(out_data->data());
}
int main(int argc, char* argv[]) {
google::ParseCommandLineFlags(&argc, &argv, true);
auto predictor = CreatePredictor();
const int height = 608;
const int width = 608;
const int channels = 3;
std::vector<int> input_shape = {FLAGS_batch_size, channels, height, width};
std::vector<float> input_data(FLAGS_batch_size * channels * height * width, 0);
for (size_t i = 0; i < input_data.size(); ++i) {
input_data[i] = i % 255 * 0.13f;
}
std::vector<int> input_im_shape = {FLAGS_batch_size, 2};
std::vector<int> input_im_data(FLAGS_batch_size * 2, 608);
std::vector<float> out_data;
run(predictor.get(), input_data, input_shape, input_im_data, input_im_shape, &out_data);
LOG(INFO) << "output num is " << out_data.size();
return 0;
}
# Minimal makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
SOURCEDIR = .
BUILDDIR = _build
# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
.PHONY: help Makefile
# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
Benchmark
=========
GPU performance
---------------
**Test conditions**
- Test models

   - Mobilenetv1
   - Resnet50
   - Yolov3
   - Unet
   - Bert/Ernie

- Test machines

   - P4
   - T4

- Test notes

   - Paddle version: release/1.8
   - warmup=10, repeats=1000; the average latency is reported in ms (a reproduction sketch follows this list).
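The loop below is a minimal sketch of how such an average latency can be measured with the Python API introduced in the Quick Start; the model path, input shape and GPU settings are placeholders rather than the exact harness used to produce the numbers.

.. code:: python

    import time
    import numpy as np
    from paddle.fluid.core import AnalysisConfig, create_paddle_predictor

    # Placeholder model path; any exported combined model works the same way.
    config = AnalysisConfig("./ResNet50/model", "./ResNet50/params")
    config.switch_use_feed_fetch_ops(False)
    config.enable_use_gpu(1000, 0)
    predictor = create_paddle_predictor(config)

    img = np.ones((1, 3, 224, 224)).astype(np.float32)
    name = predictor.get_input_names()[0]
    tensor = predictor.get_input_tensor(name)
    tensor.reshape(img.shape)
    tensor.copy_from_cpu(img)

    # warmup=10, repeats=1000, report the average latency in ms
    for _ in range(10):
        predictor.zero_copy_run()
    repeats = 1000
    start = time.time()
    for _ in range(repeats):
        predictor.zero_copy_run()
    print("average latency: %.3f ms" % ((time.time() - start) / repeats * 1000))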
**Performance data**
X86 CPU performance
-------------------
**Test conditions**
**Performance data**
# -*- coding: utf-8 -*-
#
# Configuration file for the Sphinx documentation builder.
#
# This file does only contain a selection of the most common options. For a
# full list see the documentation:
# http://www.sphinx-doc.org/en/master/config
# -- Path setup --------------------------------------------------------------
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
#sys.path.insert(0, os.path.abspath('.'))
import sphinx_rtd_theme
from recommonmark.parser import CommonMarkParser
from recommonmark.transform import AutoStructify
# -- Project information -----------------------------------------------------
project = u'Paddle-Inference'
copyright = u'2020, Paddle-Inference Developer'
author = u'Paddle-Inference Developer'
# The short X.Y version
version = u'latest'
# The full version, including alpha/beta/rc tags
release = u''
# -- General configuration ---------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
#
# needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['recommonmark', 'sphinx_markdown_tables']
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
source_suffix = ['.rst', '.md']
# The master toctree document.
master_doc = 'index'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = [u'_build', 'Thumbs.db', '.DS_Store']
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = None
# -- Options for HTML output -------------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'sphinx_rtd_theme'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#
# html_theme_options = {}
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
#
# The default sidebars (for documents that don't match any pattern) are
# defined by theme itself. Builtin themes are using these templates by
# default: ``['localtoc.html', 'relations.html', 'sourcelink.html',
# 'searchbox.html']``.
#
# html_sidebars = {}
# -- Options for HTMLHelp output ---------------------------------------------
# Output file base name for HTML help builder.
htmlhelp_basename = 'Paddle-Inference doc'
# -- Options for LaTeX output ------------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#
# 'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#
# 'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#
# 'preamble': '',
# Latex figure (float) alignment
#
# 'figure_align': 'htbp',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'Paddle-Inference.tex', u'Paddle-Inference Documentation',
u'Paddle-Inference Developer', 'manual'),
]
# -- Options for manual page output ------------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [(master_doc, 'paddle-inference',
u'Paddle-Inference Documentation', [author], 1)]
# -- Options for Texinfo output ----------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'Paddle-Inference', u'Paddle-Inference Documentation', author,
'Paddle-Inference', 'One line description of project.', 'Miscellaneous'),
]
# -- Options for Epub output -------------------------------------------------
# Bibliographic Dublin Core info.
epub_title = project
# The unique identifier of the text. This can be a ISBN number
# or the project homepage.
#
# epub_identifier = ''
# A unique identification for the text.
#
# epub_uid = ''
# A list of files that should not be packed into the epub file.
epub_exclude_files = ['search.html']
# Extend the extension list defined above instead of overwriting it
# (e.g. 'recommonmark' and 'sphinx_markdown_tables' must stay enabled).
extensions += [
    'breathe',
    'exhale'
]
# Setup the breathe extension
breathe_projects = {"My Project": "./doxyoutput/xml"}
breathe_default_project = "My Project"
# Setup the exhale extension
exhale_args = {
# These arguments are required
"containmentFolder": "./api",
"rootFileName": "library_root.rst",
"rootFileTitle": "Library API",
"doxygenStripFromPath": "..",
# Suggested optional arguments
"createTreeView": True,
# TIP: if using the sphinx-bootstrap-theme, you need
# "treeViewIsBootstrap": True,
"exhaleExecutesDoxygen": True,
"exhaleDoxygenStdin": "INPUT = paddle_include_file"
}
# Tell sphinx what the primary language being documented is.
primary_domain = 'cpp'
# Tell sphinx what the pygments highlight language should be.
highlight_language = 'cpp'
import os
on_rtd = os.environ.get('READTHEDOCS', None) == 'True'
if not on_rtd: # only import and set the theme if we're building docs locally
import sphinx_rtd_theme
html_theme = 'sphinx_rtd_theme'
#html_theme = "alabaster"
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
.. Paddle-Inference documentation master file, created by
sphinx-quickstart on Thu Feb 6 14:11:30 2020.
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
####################
Inference Deployment
####################
Welcome to Paddle-Inference's documentation!
=============================================
.. toctree::
   :maxdepth: 1
   :caption: Getting Started
   :name: sec-introduction

   introduction/summary
   introduction/quick_start
.. toctree::
   :maxdepth: 1
   :caption: User Guides
   :name: sec-user-guides

   user_guides/tutorial
   user_guides/source_compile
   user_guides/inference_python_api
   user_guides/cxx_api
.. toctree::
   :maxdepth: 1
   :caption: Performance Tuning
   :name: sec-optimize

   optimize/paddle_trt
.. toctree::
   :maxdepth: 1
   :caption: Tools
   :name: sec-tools

   tools/visual
   tools/x2paddle
.. toctree::
   :maxdepth: 1
   :caption: Benchmark
   :name: sec-benchmark

   benchmark/benchmark
.. toctree::
   :maxdepth: 2
   :caption: API Reference

   api/library_root
.. toctree::
   :maxdepth: 1
   :caption: FAQ

   introduction/faq
Quick Start
=================
**Prerequisites**
In the following sections we introduce Paddle Inference through a few pieces of Python code.
To run the code successfully, please install Paddle 1.7 or later in your environment (Mac, Windows or Linux).
To install Paddle, please refer to the `PaddlePaddle homepage <https://www.paddlepaddle.org.cn/>`_.
Exporting the inference model files
-----------------------------------
During training we usually build the model structure in Python, for example:

.. code:: python

    import paddle.fluid as fluid
    res = fluid.layers.conv2d(input=data, num_filters=2, filter_size=3, act="relu", param_attr=param_attr)

For deployment, this Python representation of the structure, together with the parameters, has to be serialized to disk in advance. How is that done?
During or after training, we can export standardized model files with the save_inference_model API.
The small example below walks through exporting the model files.
.. code:: python

    import paddle
    import paddle.fluid as fluid
    # Build a simple network whose input has shape [batch, 3, 28, 28]
    image_shape = [3, 28, 28]
    img = fluid.layers.data(name='image', shape=image_shape, dtype='float32', append_batch_size=True)
    # The model contains two conv layers
    conv1 = fluid.layers.conv2d(
        input=img,
        num_filters=8,
        filter_size=3,
        stride=2,
        padding=1,
        groups=1,
        act=None,
        bias_attr=True)
    out = fluid.layers.conv2d(
        input=conv1,
        num_filters=8,
        filter_size=3,
        stride=2,
        padding=1,
        groups=1,
        act=None,
        bias_attr=True)
    place = fluid.CPUPlace()
    exe = fluid.Executor(place)
    # Create and initialize the parameter variables of the network
    exe.run(fluid.default_startup_program())
    # If a pretrained model exists
    # def if_exist(var):
    #     return os.path.exists(os.path.join("./ShuffleNet", var.name))
    # fluid.io.load_vars(exe, "./pretrained_model", predicate=if_exist)
    # Save the model to the sample_model directory; only the part of the network
    # needed for inference, between the input `image` and the output, is kept
    fluid.io.save_inference_model(dirname='./sample_model', feeded_var_names=['image'], target_vars = [out], executor=exe, model_filename='model', params_filename='params')

After this program finishes, a sample_model directory is created in the current directory. It contains two files, model and params: the model file describes the model structure, and the params file combines all parameters into a single file.
PaddlePaddle provides **two standard** model file formats: the Combined format and the No-Combined format.
- The Combined format

.. code:: python

    fluid.io.save_inference_model(dirname='./sample_model', feeded_var_names=['image'], target_vars = [out], executor=exe, model_filename='model', params_filename='params')

model_filename and params_filename specify the names of the generated model structure file and the combined parameters file.
- The No-Combined format

.. code:: python

    fluid.io.save_inference_model(dirname='./sample_model', feeded_var_names=['image'], target_vars = [out], executor=exe)

If model_filename and params_filename are not specified, a __model__ structure file and a series of separate parameter files are generated in the sample_model directory.
For deployment, **we recommend the Combined format**, because it is friendlier in scenarios where the model has to be encrypted before going online.
Loading the model for inference
-------------------------------
1) Using load_inference_model
We can load the trained model (taking the sample_model above as an example) with the load_inference_model API and reuse the forward pass of the training framework to run inference directly.
A sample program looks like this:

.. code:: python

    import paddle.fluid as fluid
    import numpy as np
    data = np.ones((1, 3, 28, 28)).astype(np.float32)
    exe = fluid.Executor(fluid.CPUPlace())
    # Loading a Combined model requires model_filename and params_filename
    # Loading a No-Combined model does not require model_filename and params_filename
    [inference_program, feed_target_names, fetch_targets] = \
        fluid.io.load_inference_model(dirname='sample_model', executor=exe, model_filename='model', params_filename='params')
    with fluid.program_guard(inference_program):
        results = exe.run(inference_program,
                          feed={feed_target_names[0]: data},
                          fetch_list=fetch_targets, return_numpy=False)
        print (np.array(results[0]).shape)
        # (1, 8, 7, 7)

With this approach, after the model is loaded all OPs are topologically sorted and executed one by one in that order at run time. The OPs that run are exactly the forward OPs of training, and no optimization is applied (no OP fusion, no memory optimization, no inference-optimized kernels). Therefore load_inference_model is unlikely to deliver good inference performance; it is suitable for experiments (checking the accuracy and correctness of a model), not for real production deployment. Next we focus on how to use Paddle Inference.
2) Using the Paddle Inference API
Unlike load_inference_model, Paddle Inference applies a series of optimizations after the model is loaded, including kernel optimization, horizontal and vertical OP fusion, memory/GPU-memory optimization, and integration with MKLDNN and TensorRT, which greatly improves performance and throughput. These optimizations are described in detail in later documents.
Let's start with a simple example of using Paddle Inference.

.. code:: python

    from paddle.fluid.core import AnalysisConfig
    from paddle.fluid.core import create_paddle_predictor
    import numpy as np
    # Configure the run
    # config = AnalysisConfig("./sample_model") # load a No-Combined model
    config = AnalysisConfig("./sample_model/model", "./sample_model/params") # load a Combined model
    config.switch_use_feed_fetch_ops(False)
    config.enable_memory_optim()
    config.enable_use_gpu(1000, 0)
    # Create the predictor from the config
    predictor = create_paddle_predictor(config)
    img = np.ones((1, 3, 28, 28)).astype(np.float32)
    # Prepare the input
    input_names = predictor.get_input_names()
    input_tensor = predictor.get_input_tensor(input_names[0])
    input_tensor.reshape(img.shape)
    input_tensor.copy_from_cpu(img.copy())
    # Run
    predictor.zero_copy_run()
    # Fetch the output
    output_names = predictor.get_output_names()
    output_tensor = predictor.get_output_tensor(output_names[0])
    output_data = output_tensor.copy_to_cpu()
    print (output_data)

The example above shows how to run inference with Paddle Inference by loading a simple model and feeding random input. If you are new to Paddle Inference, terms such as AnalysisConfig and Predictor may look unfamiliar; don't worry, they are explained in detail in the following articles.
**Related links**
`Python API guide <../user_guides/inference_python_api.html>`_
`C++ API guide <../user_guides/cxx_api.html>`_
`Python examples <https://github.com/PaddlePaddle/Paddle-Inference-Demo/tree/master/python>`_
`C++ examples <https://github.com/PaddlePaddle/Paddle-Inference-Demo/tree/master/c%2B%2B>`_
Overview
========
Paddle Inference is the native inference engine of the PaddlePaddle core framework. It is feature-rich and highly optimized, with deep adaptations for different platforms and application scenarios, delivering high throughput and low latency so that PaddlePaddle models can be deployed on servers right after training.
Features
--------
- Generality. Inference is supported for every model trained with Paddle.
- Memory/GPU-memory reuse. During inference initialization, the OP output tensors in the model are analyzed for dependencies, and tensors that do not depend on each other reuse the same memory/GPU memory, which increases parallelism and improves service throughput.
- Fine-grained OP fusion. During inference initialization, multiple OPs in the model are fused into one OP according to existing fusion patterns, which reduces the amount of computation as well as the number of kernel launches and thus improves inference performance. Paddle Inference currently supports dozens of fusion patterns.
- High-performance CPU/GPU kernels. Built-in high-performance kernels developed jointly with Intel and NVIDIA guarantee fast model inference.
- Subgraph integration with `TensorRT <https://developer.nvidia.com/tensorrt>`_. Paddle Inference integrates TensorRT in the form of subgraphs. For GPU inference scenarios, TensorRT can optimize some subgraphs, including horizontal and vertical OP fusion, removal of redundant OPs, and automatic selection of the best kernel for each OP, which speeds up inference.
- MKLDNN integration.
- Support for models quantized and compressed by PaddleSlim. `PaddleSlim <https://github.com/PaddlePaddle/PaddleSlim>`_ is PaddlePaddle's deep learning model compression tool. Paddle Inference works together with PaddleSlim and supports loading and deploying quantized, pruned, and distilled models, which reduces model storage, computation memory, and inference time. For quantization in particular, `Paddle Inference has been deeply optimized on X86 CPUs <https://github.com/PaddlePaddle/PaddleSlim/tree/80c9fab3f419880dd19ca6ea30e0f46a2fedf6b3/demo/mkldnn_quant/quant_aware>`_: single-thread performance of common classification models improves by nearly 3x, and single-thread performance of the ERNIE model improves by 2.68x.
Supported systems and hardware
------------------------------
Server-side X86 CPUs and NVIDIA GPUs are supported, on Linux, macOS and Windows.
The NVIDIA Jetson embedded platform is also supported.
Language support
----------------
- Python
- C++
- Go
- R
**Next steps**
- If you are new to Paddle Inference, please visit the `Quick start <./quick_start.html>`_.
@ECHO OFF
pushd %~dp0
REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=.
set BUILDDIR=_build
if "%1" == "" goto help
%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.http://sphinx-doc.org/
exit /b 1
)
%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
goto end
:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS%
:end
popd
Inference with the Paddle-TensorRT library
===========================================
NVIDIA TensorRT is a high-performance deep learning inference library that provides low latency and high throughput for inference applications. PaddlePaddle integrates TensorRT in the form of subgraphs, so this module can be used to improve the inference performance of Paddle models. This document describes how to accelerate inference with Paddle-TRT subgraphs.
Overview
----------------
Once a model is loaded, the neural network can be represented as a computation graph consisting of variables and operation nodes. If the TRT subgraph mode is enabled, Paddle analyzes the graph during the analysis phase, finds the subgraphs that TensorRT can optimize, and replaces them with TensorRT nodes. During inference, whenever a TensorRT node is encountered, Paddle calls the TensorRT library to execute it, while the remaining nodes run with Paddle's native implementation. Besides the usual OP fusion and memory/GPU-memory optimizations, TensorRT also applies OP-specific accelerations that reduce inference latency and increase throughput.
Paddle-TRT currently supports a static shape mode and a dynamic shape mode. The static shape mode supports image classification, segmentation, and detection models, as well as FP16 and Int8 acceleration. The dynamic shape mode supports image models with dynamic shapes (FCN, Faster RCNN) and also NLP models such as Bert/Ernie.
**Current capabilities of Paddle-TRT:**
**1) Static shape:**
Supported models:

============== ============= ============
Classification Detection     Segmentation
============== ============= ============
Mobilenetv1    yolov3        ICNET
Resnet50       SSD           UNet
Vgg16          Mask-rcnn     FCN
Resnext        Faster-rcnn
AlexNet        Cascade-rcnn
Se-ResNext     Retinanet
GoogLeNet      Mobilenet-SSD
DPN
============== ============= ============
.. |check| raw:: html
<input checked="" type="checkbox">
.. |check_| raw:: html
<input checked="" disabled="" type="checkbox">
.. |uncheck| raw:: html
<input type="checkbox">
.. |uncheck_| raw:: html
<input disabled="" type="checkbox">
Fp16: |check|

Calib Int8: |check|

Serialization of optimization info: |check|

Loading PaddleSlim Int8 models: |check|

**2) Dynamic shape:**
Supported models:

=========== =====
Vision      NLP
=========== =====
FCN         Bert
Faster_RCNN Ernie
=========== =====

Fp16: |check|

Calib Int8: |uncheck|

Serialization of optimization info: |uncheck|

Loading PaddleSlim Int8 models: |uncheck|

**Note:**
1. When building from source, the TensorRT inference library can currently only be built with GPU enabled, and the build option TENSORRT_ROOT must be set to the path where TensorRT is installed.
2. Windows support requires TensorRT 5.0 or later.
3. The dynamic shape input feature of Paddle-TRT requires TRT 6.0 or later.
Part 1: Environment setup
-------------------------
To use the Paddle-TRT features we need a Paddle runtime environment built with TRT. We provide the following options:
1) Install with pip on Linux

.. code:: shell

    # This whl package depends on the cuda10.1, cudnn v7.6 and tensorrt6.0 libs; download and install them yourself and add their lib paths to LD_LIBRARY_PATH
    wget https://paddle-inference-dist.bj.bcebos.com/libs/paddlepaddle_gpu-1.8.0-cp27-cp27mu-linux_x86_64.whl
    pip install -U paddlepaddle_gpu-1.8.0-cp27-cp27mu-linux_x86_64.whl

If you want to use it on the NVIDIA Jetson platform, please download the whl package from this `link <https://paddle-inference-dist.cdn.bcebos.com/temp_data/paddlepaddle_gpu-0.0.0-cp36-cp36m-linux_aarch64.whl>`_ and install it with pip. A quick sanity check of the installation follows below.
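As an optional, minimal sanity check (an assumption on our side, not part of the original setup steps), you can verify that the freshly installed wheel imports and runs; note that this only checks the Paddle installation itself, not the TensorRT integration:

.. code:: python

    import paddle.fluid as fluid
    # Runs a tiny built-in program to verify that the installed Paddle works on this machine.
    fluid.install_check.run_check()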
2) Use the docker image

.. code:: shell

    # Pull the image. It ships with a Paddle 1.8 Python environment and the prebuilt C++ library; the libs are stored under the home directory ~/.
    docker pull hub.baidubce.com/paddlepaddle/paddle:1.8.0-gpu-cuda10.0-cudnn7-trt6
    export CUDA_SO="$(\ls /usr/lib64/libcuda* | xargs -I{} echo '-v {}:{}') $(\ls /usr/lib64/libnvidia* | xargs -I{} echo '-v {}:{}')"
    export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')
    export NVIDIA_SMI="-v /usr/bin/nvidia-smi:/usr/bin/nvidia-smi"
    docker run $CUDA_SO $DEVICES $NVIDIA_SMI --name trt_open --privileged --security-opt seccomp=unconfined --net=host -v $PWD:/paddle -it hub.baidubce.com/paddlepaddle/paddle:1.8.0-gpu-cuda10.0-cudnn7-trt6 /bin/bash

3) Build from source
For build instructions, please refer to the `build documentation <../user_guides/source_compile.html>`_.
**Note1:** During cmake, please set the options TENSORRT_ROOT (the path of the TRT libs) and WITH_PYTHON (whether to produce the python whl package; set it to ON).
**Note2:** TensorRT-related errors may appear during the build.
In that case you need to manually add virtual destructors for class IPluginFactory and class IGpuAllocator in NvInfer.h (trt5) or NvInferRuntime.h (trt6):

.. code:: c++

    virtual ~IPluginFactory() {};
    virtual ~IGpuAllocator() {};

You also need to change **protected: ~IOptimizationProfile() noexcept = default;** in `NvInferRuntime.h` (trt6) to

.. code:: c++

    virtual ~IOptimizationProfile() noexcept = default;
Part 2: API usage
-----------------
As described in the `workflow guide <../user_guides/tutorial.html>`_, inference with Paddle Inference consists of the following steps:
- Configure the inference options
- Create a predictor
- Prepare the model input
- Run inference
- Fetch the model output
Using Paddle-TRT follows the same workflow. Let's start with a simple example (we assume you already have some basic knowledge of Paddle Inference; if you are new to it, please visit `this page <../introduction/quick_start>`_ first to get an initial impression):

.. code:: python

    import numpy as np
    from paddle.fluid.core import AnalysisConfig
    from paddle.fluid.core import create_paddle_predictor

    def create_predictor():
        # config = AnalysisConfig("")
        config = AnalysisConfig("./resnet50/model", "./resnet50/params")
        config.switch_use_feed_fetch_ops(False)
        config.enable_memory_optim()
        config.enable_use_gpu(1000, 0)
        # Turn on TensorRT. This interface is described in detail below.
        config.enable_tensorrt_engine(workspace_size = 1<<30,
                                      max_batch_size=1, min_subgraph_size=5,
                                      precision_mode=AnalysisConfig.Precision.Float32,
                                      use_static=False, use_calib_mode=False)
        predictor = create_paddle_predictor(config)
        return predictor

    def run(predictor, img):
        # Prepare the input
        input_names = predictor.get_input_names()
        for i, name in enumerate(input_names):
            input_tensor = predictor.get_input_tensor(name)
            input_tensor.reshape(img[i].shape)
            input_tensor.copy_from_cpu(img[i].copy())
        # Run inference
        predictor.zero_copy_run()
        results = []
        # Fetch the output
        output_names = predictor.get_output_names()
        for i, name in enumerate(output_names):
            output_tensor = predictor.get_output_tensor(name)
            output_data = output_tensor.copy_to_cpu()
            results.append(output_data)
        return results

    if __name__ == '__main__':
        pred = create_predictor()
        img = np.ones((1, 3, 224, 224)).astype(np.float32)
        result = run(pred, [img])
        print ("class index: ", np.argmax(result[0][0]))

As the example shows, TensorRT is turned on through the `enable_tensorrt_engine` interface.

.. code:: python

    config.enable_tensorrt_engine(
        workspace_size = 1<<30,
        max_batch_size=1, min_subgraph_size=5,
        precision_mode=AnalysisConfig.Precision.Float32,
        use_static=False, use_calib_mode=False)

Let's look at what each parameter of this interface does (an FP16 variant is sketched right after this list):
- **workspace_size**, type: int, default 1 << 30 (1G). The workspace size used by TensorRT; within this limit TensorRT selects the best kernels for the inference computation.
- **max_batch_size**, type: int, default 1. The maximum batch size has to be set in advance; the batch size at run time must not exceed this limit.
- **min_subgraph_size**, type: int, default 3. Paddle-TRT runs in subgraph form; to avoid performance loss, a subgraph is executed by Paddle-TRT only when the number of nodes inside it is greater than min_subgraph_size.
- **precision_mode**, type: **AnalysisConfig.Precision**, default **AnalysisConfig.Precision.Float32**. The precision used by TRT; FP32 (Float32), FP16 (Half) and Int8 (Int8) are supported. To use Paddle-TRT Int8 offline quantization calibration, set precision to **AnalysisConfig.Precision.Int8** and set **use_calib_mode** to True.
- **use_static**, type: bool, default False. If set to True, the TRT optimization information is serialized to disk on the first run, and later runs load the serialized information directly instead of regenerating it.
- **use_calib_mode**, type: bool, default False. To run Paddle-TRT Int8 offline quantization calibration, set this option to True.
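For example, a minimal sketch of switching the same configuration to half precision (this assumes your GPU and TensorRT build support FP16; it is not part of the demo above):

.. code:: python

    # Same call as above, but asking TRT to run the supported subgraphs in FP16 (Half).
    config.enable_tensorrt_engine(
        workspace_size=1 << 30,
        max_batch_size=1, min_subgraph_size=5,
        precision_mode=AnalysisConfig.Precision.Half,
        use_static=False, use_calib_mode=False)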
Int8 quantized inference
>>>>>>>>>>>>>>>>>>>>>>>>>>
The parameters of a neural network are redundant to some extent. For many tasks we can convert a Float32 model into an Int8 model while preserving accuracy, which reduces computation, inference time, memory usage, and model size. Int8 quantized inference consists of two steps: 1) producing the quantized model; 2) loading the quantized model for Int8 inference. The complete workflow for Int8 quantized inference with Paddle-TRT is described below.
**1. Producing the quantized model**
Currently, quantized models can be produced in two ways:
a. Use TensorRT's built-in Int8 offline quantization calibration. Calibration means generating a calibration table from the trained FP32 model and a small amount of calibration data (for example 500~1000 images); at inference time, loading the FP32 model together with this calibration table enables Int8 inference. To generate the calibration table:
- When configuring TensorRT, set **precision_mode** to **AnalysisConfig.Precision.Int8** and set **use_calib_mode** to **True**.

.. code:: python

    config.enable_tensorrt_engine(
        workspace_size=1<<30,
        max_batch_size=1, min_subgraph_size=5,
        precision_mode=AnalysisConfig.Precision.Int8,
        use_static=False, use_calib_mode=True)

- Prepare about 500 real inputs and run the model with the configuration above. (Paddle-TRT collects the value range of every tensor in the model and records it in the calibration table; when the run finishes, the calibration table is written to the `_opt_cache` directory under the model directory.)
For the complete code of generating a calibration table with TensorRT's built-in Int8 offline calibration, please refer to the demo `here <https://github.com/PaddlePaddle/Paddle-Inference-Demo/tree/master/c%2B%2B/paddle-trt/README.md#%E7%94%9F%E6%88%90%E9%87%8F%E5%8C%96%E6%A0%A1%E5%87%86%E8%A1%A8>`_.
b. Use the model compression toolkit PaddleSlim to produce the quantized model. PaddleSlim supports both offline (post-training) quantization and online quantization. Offline quantization is similar in principle to TensorRT offline calibration. Online quantization, also known as Quantization Aware Training (QAT), retrains the pretrained model on a larger amount of data (for example >=5000 images), updating the weights during training with simulated quantization to reduce the quantization error. To produce quantized models with PaddleSlim, refer to:
- Post-training quantization `quick start tutorial <https://paddlepaddle.github.io/PaddleSlim/quick_start/quant_post_tutorial.html>`_
- Post-training quantization `API reference <https://paddlepaddle.github.io/PaddleSlim/api_cn/quantization_api.html#quant-post>`_
- Post-training quantization `demo <https://github.com/PaddlePaddle/PaddleSlim/tree/release/1.1.0/demo/quant/quant_post>`_
- Quantization aware training `quick start tutorial <https://paddlepaddle.github.io/PaddleSlim/quick_start/quant_aware_tutorial.html>`_
- Quantization aware training `API reference <https://paddlepaddle.github.io/PaddleSlim/api_cn/quantization_api.html#quant-aware>`_
- Quantization aware training `demo <https://github.com/PaddlePaddle/PaddleSlim/tree/release/1.1.0/demo/quant/quant_aware>`_
The advantage of post-training quantization is that no retraining is needed and it is easy to use, but accuracy may suffer after quantization. Quantization aware training affects accuracy less, but requires retraining the model and is slightly harder to use. In practice we recommend trying TRT offline calibration first; if the accuracy does not meet your requirements, use PaddleSlim to produce the quantized model.
**2. Loading the quantized model for Int8 inference**
To load a quantized model for Int8 inference, set **precision_mode** to **AnalysisConfig.Precision.Int8** in the TensorRT configuration.
If the quantized model was produced by TRT offline calibration, set **use_calib_mode** to **True**:

.. code:: python

    config.enable_tensorrt_engine(
        workspace_size=1<<30,
        max_batch_size=1, min_subgraph_size=5,
        precision_mode=AnalysisConfig.Precision.Int8,
        use_static=False, use_calib_mode=True)

For the complete demo, see `here <https://github.com/PaddlePaddle/Paddle-Inference-Demo/tree/master/c%2B%2B/paddle-trt/README.md#%E5%8A%A0%E8%BD%BD%E6%A0%A1%E5%87%86%E8%A1%A8%E6%89%A7%E8%A1%8Cint8%E9%A2%84%E6%B5%8B>`_.
If the quantized model was produced by PaddleSlim, set **use_calib_mode** to **False**:

.. code:: python

    config.enable_tensorrt_engine(
        workspace_size=1<<30,
        max_batch_size=1, min_subgraph_size=5,
        precision_mode=AnalysisConfig.Precision.Int8,
        use_static=False, use_calib_mode=False)

For the complete demo, see `here <https://github.com/PaddlePaddle/Paddle-Inference-Demo/tree/master/c%2B%2B/paddle-trt/README.md#%E4%B8%89%E4%BD%BF%E7%94%A8trt-%E5%8A%A0%E8%BD%BDpaddleslim-int8%E9%87%8F%E5%8C%96%E6%A8%A1%E5%9E%8B%E9%A2%84%E6%B5%8B>`_.
Running with dynamic shapes
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Starting from version 1.8, Paddle supports dynamic shapes for TRT subgraphs.
The interface is used as follows:

.. code:: python

    config.enable_tensorrt_engine(
        workspace_size = 1<<30,
        max_batch_size=1, min_subgraph_size=5,
        precision_mode=AnalysisConfig.Precision.Float32,
        use_static=False, use_calib_mode=False)
    min_input_shape = {"image":[1, 3, 10, 10]}
    max_input_shape = {"image":[1, 3, 224, 224]}
    opt_input_shape = {"image":[1, 3, 100, 100]}
    config.set_trt_dynamic_shape_info(min_input_shape, max_input_shape, opt_input_shape)

Compared with the earlier usage, one new call, config.set_trt_dynamic_shape_info, is added on top of config.enable_tensorrt_engine.
It sets the minimum, maximum, and optimal input shapes of the model. The optimal shape lies between the minimum and the maximum, and during inference initialization the best kernel is chosen for each op based on the opt shape.
Once **config.set_trt_dynamic_shape_info** has been called, the predictor runs the TRT subgraphs in dynamic-input mode and accepts input data of any shape between the minimum and the maximum at run time, as the sketch below illustrates.
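A minimal sketch of feeding two different spatial sizes to the same predictor (the input name "image", the shapes, and the predictor built from the config above are the placeholders from the snippet, not a complete program):

.. code:: python

    import numpy as np
    # `predictor` is assumed to be created from the config above, e.g. via create_paddle_predictor(config).
    name = predictor.get_input_names()[0]
    tensor = predictor.get_input_tensor(name)
    # Any shape between min_input_shape and max_input_shape is accepted at run time.
    for hw in (64, 224):
        img = np.ones((1, 3, hw, hw)).astype(np.float32)
        tensor.reshape(img.shape)
        tensor.copy_from_cpu(img)
        predictor.zero_copy_run()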
Part 3: More examples
---------------------
More examples of inference with TRT subgraphs are available on GitHub:
- For Python examples, visit this `link <https://github.com/PaddlePaddle/Paddle-Inference-Demo/tree/master/python/paddle_trt>`_.
- For C++ examples, visit this `link <https://github.com/PaddlePaddle/Paddle-Inference-Demo/tree/master/c%2B%2B>`_.
Part 4: How Paddle-TRT subgraphs work
-------------------------------------
PaddlePaddle integrates TensorRT in the form of subgraphs. After a model is loaded, the neural network can be represented as a computation graph consisting of variables and operation nodes. Paddle TensorRT scans the whole graph, finds the subgraphs that TensorRT can optimize, and replaces them with TensorRT nodes. During inference, whenever a TensorRT node is encountered, Paddle calls the TensorRT library to execute it, while the remaining nodes run with Paddle's native implementation. During inference TensorRT can fuse OPs horizontally and vertically, filter out redundant OPs, and select suitable kernels for specific OPs on the target platform, which speeds up model inference.
The following figures illustrate this process with a simple model:
**Original network**

.. image:: https://raw.githubusercontent.com/NHZlX/FluidDoc/add_trt_doc/doc/fluid/user_guides/howto/inference/image/model_graph_original.png

**Converted network**

.. image:: https://raw.githubusercontent.com/NHZlX/FluidDoc/add_trt_doc/doc/fluid/user_guides/howto/inference/image/model_graph_trt.png

In the original network, green nodes are nodes that TensorRT supports, red nodes are variables in the network, and yellow nodes can only be executed by Paddle's native implementation. The green nodes are extracted into a subgraph and replaced by a single TensorRT node, which becomes the **block-25** node in the converted network. Whenever this node is encountered at run time, Paddle calls the TensorRT library to execute it.
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
///
/// \file paddle_analysis_config.h
///
/// \brief Paddle Analysis Config API information
///
/// \author paddle-infer@baidu.com
/// \date 2020-03-20
/// \since 1.7
///
#pragma once
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>
/*! \file */
// Here we include some header files with relative paths, for that in deploy,
// the abstract path of this header file will be changed.
#include "paddle_api.h" // NOLINT
#include "paddle_pass_builder.h" // NOLINT
#ifdef PADDLE_WITH_MKLDNN
#include "paddle_mkldnn_quantizer_config.h" // NOLINT
#endif
namespace paddle {
class AnalysisPredictor;
struct MkldnnQuantizerConfig;
///
/// \brief configuration manager for AnalysisPredictor.
/// \since 1.7.0
///
/// AnalysisConfig manages configurations of AnalysisPredictor.
/// During the inference procedure, there are many parameters (model/params path,
/// place of inference, etc.)
/// to be specified, and various optimizations (subgraph fusion, memory
/// optimization, TensorRT engine, etc.)
/// to be done. Users can manage these settings by creating and modifying an
/// AnalysisConfig,
/// and loading it into AnalysisPredictor.
///
struct AnalysisConfig {
AnalysisConfig() = default;
///
/// \brief Construct a new AnalysisConfig from another
/// AnalysisConfig.
///
/// \param[in] other another AnalysisConfig
///
explicit AnalysisConfig(const AnalysisConfig& other);
///
/// \brief Construct a new AnalysisConfig from a no-combined model.
///
/// \param[in] model_dir model directory of the no-combined model.
///
explicit AnalysisConfig(const std::string& model_dir);
///
/// \brief Construct a new AnalysisConfig from a combined model.
///
/// \param[in] prog_file model file path of the combined model.
/// \param[in] params_file params file path of the combined model.
///
explicit AnalysisConfig(const std::string& prog_file,
const std::string& params_file);
///
/// \brief Precision of inference in TensorRT.
///
enum class Precision {
kFloat32 = 0, ///< fp32
kInt8, ///< int8
kHalf, ///< fp16
};
///
/// \brief Set the no-combined model dir path.
///
/// \param model_dir model dir path.
///
void SetModel(const std::string& model_dir) { model_dir_ = model_dir; }
///
/// \brief Set the combined model with two specific paths for program and
/// parameters.
///
/// \param prog_file_path model file path of the combined model.
/// \param params_file_path params file path of the combined model.
///
void SetModel(const std::string& prog_file_path,
const std::string& params_file_path);
///
/// \brief Set the model file path of a combined model.
///
/// \param x model file path.
///
void SetProgFile(const std::string& x) { prog_file_ = x; }
///
/// \brief Set the params file path of a combined model.
///
/// \param x params file path.
///
void SetParamsFile(const std::string& x) { params_file_ = x; }
///
/// \brief Set the path of optimization cache directory.
///
/// \param opt_cache_dir the path of optimization cache directory.
///
void SetOptimCacheDir(const std::string& opt_cache_dir) {
opt_cache_dir_ = opt_cache_dir;
}
///
/// \brief Get the model directory path.
///
/// \return const std::string& The model directory path.
///
const std::string& model_dir() const { return model_dir_; }
///
/// \brief Get the program file path.
///
/// \return const std::string& The program file path.
///
const std::string& prog_file() const { return prog_file_; }
///
/// \brief Get the combined parameters file.
///
/// \return const std::string& The combined parameters file.
///
const std::string& params_file() const { return params_file_; }
// Padding related.
///
/// \brief Turn off FC Padding.
///
///
void DisableFCPadding();
///
/// \brief A boolean state telling whether fc padding is used.
///
/// \return bool Whether fc padding is used.
///
bool use_fc_padding() const { return use_fc_padding_; }
// GPU related.
///
/// \brief Turn on GPU.
///
/// \param memory_pool_init_size_mb initial size of the GPU memory pool in MB.
/// \param device_id device_id the GPU card to use (default is 0).
///
void EnableUseGpu(uint64_t memory_pool_init_size_mb, int device_id = 0);
///
/// \brief Turn off GPU.
///
///
void DisableGpu();
///
/// \brief A boolean state telling whether the GPU is turned on.
///
/// \return bool Whether the GPU is turned on.
///
bool use_gpu() const { return use_gpu_; }
///
/// \brief Get the GPU device id.
///
/// \return int The GPU device id.
///
int gpu_device_id() const { return device_id_; }
///
/// \brief Get the initial size in MB of the GPU memory pool.
///
/// \return int The initial size in MB of the GPU memory pool.
///
int memory_pool_init_size_mb() const { return memory_pool_init_size_mb_; }
///
/// \brief Get the proportion of the initial memory pool size compared to the
/// device.
///
/// \return float The proportion of the initial memory pool size.
///
float fraction_of_gpu_memory_for_pool() const;
// CUDNN related.
///
/// \brief Turn on CUDNN.
///
///
void EnableCUDNN();
///
/// \brief A boolean state telling whether to use CUDNN.
///
/// \return bool Whether to use CUDNN.
///
bool cudnn_enabled() const { return use_cudnn_; }
///
/// \brief Control whether to perform IR graph optimization.
/// If turned off, the AnalysisConfig will act just like a NativeConfig.
///
/// \param x Whether the ir graph optimization is activated.
///
void SwitchIrOptim(int x = true) { enable_ir_optim_ = x; }
///
/// \brief A boolean state telling whether the ir graph optimization is
/// activated.
///
/// \return bool Whether to use ir graph optimization.
///
bool ir_optim() const { return enable_ir_optim_; }
///
/// \brief INTERNAL Determine whether to use the feed and fetch operators.
/// Just for internal development, not stable yet.
/// When ZeroCopyTensor is used, this should be turned off.
///
/// \param x Whether to use the feed and fetch operators.
///
void SwitchUseFeedFetchOps(int x = true) { use_feed_fetch_ops_ = x; }
///
/// \brief A boolean state telling whether to use the feed and fetch
/// operators.
///
/// \return bool Whether to use the feed and fetch operators.
///
bool use_feed_fetch_ops_enabled() const { return use_feed_fetch_ops_; }
///
/// \brief Control whether to specify the inputs' names.
/// The ZeroCopyTensor type has a name member, assign it with the
/// corresponding
/// variable name. This is used only when the input ZeroCopyTensors passed to
/// the
/// AnalysisPredictor.ZeroCopyRun() cannot follow the order in the training
/// phase.
///
/// \param x Whether to specify the inputs' names.
///
void SwitchSpecifyInputNames(bool x = true) { specify_input_name_ = x; }
///
/// \brief A boolean state telling whether the input ZeroCopyTensor names
/// specified should
/// be used to reorder the inputs in AnalysisPredictor.ZeroCopyRun().
///
/// \return bool Whether to specify the inputs' names.
///
bool specify_input_name() const { return specify_input_name_; }
///
/// \brief Turn on the TensorRT engine.
/// The TensorRT engine will accelerate some subgraphs in the original Fluid
/// computation graph. In some models such as resnet50, GoogleNet and so on,
/// it gains significant performance acceleration.
///
/// \param workspace_size The memory size(in byte) used for TensorRT
/// workspace.
/// \param max_batch_size The maximum batch size of this prediction task;
/// it is better to set it as small as possible to reduce performance loss.
/// \param min_subgraph_size The minimum TensorRT subgraph size needed. If a
/// subgraph is smaller than this, it will not be transferred to the TensorRT
/// engine.
/// \param precision The precision used in TensorRT.
/// \param use_static Serialize optimization information to disk for reusing.
/// \param use_calib_mode Use TRT int8 calibration(post training
/// quantization).
///
///
void EnableTensorRtEngine(int workspace_size = 1 << 20,
int max_batch_size = 1, int min_subgraph_size = 3,
Precision precision = Precision::kFloat32,
bool use_static = false,
bool use_calib_mode = true);
///
/// \brief A boolean state telling whether the TensorRT engine is used.
///
/// \return bool Whether the TensorRT engine is used.
///
bool tensorrt_engine_enabled() const { return use_tensorrt_; }
///
/// \brief Set min, max, opt shape for TensorRT Dynamic shape mode.
/// \param min_input_shape The min input shape of the subgraph input.
/// \param max_input_shape The max input shape of the subgraph input.
/// \param opt_input_shape The opt input shape of the subgraph input.
/// \param disable_trt_plugin_fp16 Setting this parameter to true means that
/// TRT plugin will not run fp16.
///
void SetTRTDynamicShapeInfo(
std::map<std::string, std::vector<int>> min_input_shape,
std::map<std::string, std::vector<int>> max_input_shape,
std::map<std::string, std::vector<int>> optim_input_shape,
bool disable_trt_plugin_fp16 = false);
///
/// \brief Turn on the usage of Lite sub-graph engine.
///
/// \param precision_mode Precision used in Lite sub-graph engine.
/// \param passes_filter Set the passes used in Lite sub-graph engine.
/// \param ops_filter Operators not supported by Lite.
///
void EnableLiteEngine(
AnalysisConfig::Precision precision_mode = Precision::kFloat32,
const std::vector<std::string>& passes_filter = {},
const std::vector<std::string>& ops_filter = {});
///
/// \brief A boolean state indicating whether the Lite sub-graph engine is
/// used.
///
/// \return bool whether the Lite sub-graph engine is used.
///
bool lite_engine_enabled() const { return use_lite_; }
///
/// \brief Control whether to debug IR graph analysis phase.
/// This will generate DOT files for visualizing the computation graph after
/// each analysis pass is applied.
///
/// \param x whether to debug IR graph analysis phase.
///
void SwitchIrDebug(int x = true);
///
/// \brief Turn on MKLDNN.
///
///
void EnableMKLDNN();
///
/// \brief Set the cache capacity of different input shapes for MKLDNN.
/// Default value 0 means not caching any shape.
///
/// \param capacity The cache capacity.
///
void SetMkldnnCacheCapacity(int capacity);
///
/// \brief A boolean state telling whether MKLDNN is used.
///
/// \return bool Whether MKLDNN is used.
///
bool mkldnn_enabled() const { return use_mkldnn_; }
///
/// \brief Set the number of cpu math library threads.
///
/// \param cpu_math_library_num_threads The number of cpu math library
/// threads.
///
void SetCpuMathLibraryNumThreads(int cpu_math_library_num_threads);
///
/// \brief An int state telling how many threads are used in the CPU math
/// library.
///
/// \return int The number of threads used in the CPU math library.
///
int cpu_math_library_num_threads() const {
return cpu_math_library_num_threads_;
}
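///
/// An illustrative CPU-side configuration sketch; the thread count and cache
/// capacity below are example values, not recommendations.
/// \code{cpp}
/// AnalysisConfig config;
/// config.SetCpuMathLibraryNumThreads(4);  // threads for MKL/OpenBLAS kernels
/// config.EnableMKLDNN();                  // turn on MKLDNN kernels
/// config.SetMkldnnCacheCapacity(10);      // cache at most 10 input shapes
/// \endcode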
///
/// \brief Transform the AnalysisConfig to NativeConfig.
///
/// \return NativeConfig The NativeConfig transformed.
///
NativeConfig ToNativeConfig() const;
///
/// \brief Specify the operator type list to use MKLDNN acceleration.
///
/// \param op_list The operator type list.
///
void SetMKLDNNOp(std::unordered_set<std::string> op_list) {
mkldnn_enabled_op_types_ = op_list;
}
///
/// \brief Turn on MKLDNN quantization.
///
///
void EnableMkldnnQuantizer();
///
/// \brief A boolean state telling whether the MKLDNN quantization is enabled.
///
/// \return bool Whether the MKLDNN quantization is enabled.
///
bool mkldnn_quantizer_enabled() const { return use_mkldnn_quantizer_; }
///
/// \brief Get MKLDNN quantizer config.
///
/// \return MkldnnQuantizerConfig* MKLDNN quantizer config.
///
MkldnnQuantizerConfig* mkldnn_quantizer_config() const;
///
/// \brief Specify the memory buffer of program and parameter.
/// Used when model and params are loaded directly from memory.
///
/// \param prog_buffer The memory buffer of program.
/// \param prog_buffer_size The size of the model data.
/// \param params_buffer The memory buffer of the combined parameters file.
/// \param params_buffer_size The size of the combined parameters data.
///
void SetModelBuffer(const char* prog_buffer, size_t prog_buffer_size,
const char* params_buffer, size_t params_buffer_size);
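///
/// A minimal sketch of loading a model from memory. Reading the two files
/// into strings is up to the caller; ReadFileToString and the paths below are
/// hypothetical placeholders.
/// \code{cpp}
/// std::string prog = ReadFileToString("./model/__model__");     // hypothetical helper
/// std::string params = ReadFileToString("./model/__params__");  // hypothetical helper
/// config.SetModelBuffer(prog.data(), prog.size(),
///                       params.data(), params.size());
/// \endcode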
///
/// \brief A boolean state telling whether the model is set from the CPU
/// memory.
///
/// \return bool Whether model and params are loaded directly from memory.
///
bool model_from_memory() const { return model_from_memory_; }
///
/// \brief Turn on memory optimization.
/// NOTE: still in development.
///
void EnableMemoryOptim();
///
/// \brief A boolean state telling whether the memory optimization is
/// activated.
///
/// \return bool Whether the memory optimization is activated.
///
bool enable_memory_optim() const;
///
/// \brief Turn on profiling report.
/// If not turned on, no profiling report will be generated.
///
void EnableProfile();
///
/// \brief A boolean state telling whether the profiler is activated.
///
/// \return bool Whether the profiler is activated.
///
bool profile_enabled() const { return with_profile_; }
///
/// \brief Mute all logs in Paddle inference.
///
void DisableGlogInfo();
///
/// \brief A boolean state telling whether logs in Paddle inference are muted.
///
/// \return bool Whether logs in Paddle inference are muted.
///
bool glog_info_disabled() const { return !with_glog_info_; }
///
/// \brief Set the AnalysisConfig to be invalid.
/// This is to ensure that an AnalysisConfig can only be used in one
/// AnalysisPredictor.
///
void SetInValid() const { is_valid_ = false; }
///
/// \brief A boolean state telling whether the AnalysisConfig is valid.
///
/// \return bool Whether the AnalysisConfig is valid.
///
bool is_valid() const { return is_valid_; }
friend class ::paddle::AnalysisPredictor;
///
/// \brief Get a pass builder for customizing the passes in the IR analysis
/// phase.
/// NOTE: Just for developers, not an official API, and easily broken.
///
PassStrategy* pass_builder() const;
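///
/// A hedged sketch of customizing the IR passes; the pass names below are only
/// examples and may not exist in every build.
/// \code{cpp}
/// auto* builder = config.pass_builder();
/// builder->DeletePass("fc_fuse_pass");    // drop a pass that is not wanted
/// builder->AppendPass("my_custom_pass");  // hypothetical custom pass
/// \endcode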
void PartiallyRelease();
protected:
// Update the config.
void Update();
std::string SerializeInfoCache();
protected:
// Model paths.
std::string model_dir_;
mutable std::string prog_file_;
mutable std::string params_file_;
// GPU related.
bool use_gpu_{false};
int device_id_{0};
uint64_t memory_pool_init_size_mb_{100}; // initial size is 100MB.
bool use_cudnn_{false};
// Padding related
bool use_fc_padding_{true};
// TensorRT related.
bool use_tensorrt_{false};
// For workspace_size, refer it from here:
// https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#troubleshooting
int tensorrt_workspace_size_{1 << 30};
// While TensorRT allows an engine optimized for a given max batch size
// to run at any smaller size, the performance for those smaller
// sizes may not be as well-optimized. Therefore, the max batch size is best
// set equal to the runtime batch size.
int tensorrt_max_batchsize_{1};
// We transform the Ops that can be converted into TRT layer in the model,
// and aggregate these Ops into subgraphs for TRT execution.
// We set this variable to control the minimum number of nodes in the
// subgraph, 3 as default value.
int tensorrt_min_subgraph_size_{3};
Precision tensorrt_precision_mode_{Precision::kFloat32};
bool trt_use_static_engine_{false};
bool trt_use_calib_mode_{true};
std::map<std::string, std::vector<int>> min_input_shape_{};
std::map<std::string, std::vector<int>> max_input_shape_{};
std::map<std::string, std::vector<int>> optim_input_shape_{};
bool disable_trt_plugin_fp16_{false};
// memory reuse related.
bool enable_memory_optim_{false};
bool use_mkldnn_{false};
std::unordered_set<std::string> mkldnn_enabled_op_types_;
bool model_from_memory_{false};
bool enable_ir_optim_{true};
bool use_feed_fetch_ops_{true};
bool ir_debug_{false};
bool specify_input_name_{false};
int cpu_math_library_num_threads_{1};
bool with_profile_{false};
bool with_glog_info_{true};
// A runtime cache, shouldn't be transferred to others.
std::string serialized_info_cache_;
mutable std::unique_ptr<PassStrategy> pass_builder_;
bool use_lite_{false};
std::vector<std::string> lite_passes_filter_;
std::vector<std::string> lite_ops_filter_;
Precision lite_precision_mode_;
// mkldnn related.
int mkldnn_cache_capacity_{0};
bool use_mkldnn_quantizer_{false};
std::shared_ptr<MkldnnQuantizerConfig> mkldnn_quantizer_config_;
// If the config is already used on a predictor, it becomes invalid.
// Any config can only be used with one predictor.
// Variables held by config can take up a lot of memory in some cases.
// So we release the memory when the predictor is set up.
mutable bool is_valid_{true};
std::string opt_cache_dir_;
};
} // namespace paddle
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
/*! \file paddle_api.h
*/
/*! \mainpage Paddle Inference APIs
* \section intro_sec Introduction
* The Paddle inference library aims to offer a high-performance inference SDK
* for Paddle users.
*/
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>
/*! \namespace paddle
*/
namespace paddle {
/// \brief Paddle data type.
enum PaddleDType {
FLOAT32,
INT64,
INT32,
UINT8,
// TODO(Superjomn) support more data types if needed.
};
/// \brief Memory manager for PaddleTensor.
///
/// The PaddleBuf holds a buffer for data input or output. The memory can be
/// allocated by user or by PaddleBuf itself, but in any case, the PaddleBuf
/// should be reused for better performance.
///
/// For user allocated memory, the following API can be used:
/// - PaddleBuf(void* data, size_t length) to set an external memory by
/// specifying the memory address and length.
/// - Reset(void* data, size_t length) to reset the PaddleBuf with an external
/// memory.
/// ATTENTION: for user allocated memory, deallocation should be done by users
/// externally after the program finishes. The PaddleBuf won't do any allocation
/// or deallocation.
///
/// To have the PaddleBuf allocate and manage the memory:
/// - PaddleBuf(size_t length) will allocate a memory of size `length`.
/// - Resize(size_t length) resizes the memory to no less than `length`.
/// ATTENTION:
/// if the already allocated memory is larger than `length`, nothing will be
/// done.
///
/// Usage:
///
/// Let PaddleBuf manage the memory internally.
/// \code{cpp}
/// const int num_elements = 128;
/// PaddleBuf buf(num_elements * sizeof(float));
/// \endcode
///
/// Or
/// \code{cpp}
/// PaddleBuf buf;
/// buf.Resize(num_elements * sizeof(float));
/// \endcode
/// Works exactly the same.
///
/// One can also make the `PaddleBuf` use the external memory.
/// \code{cpp}
/// PaddleBuf buf;
/// void* external_memory = new float[num_elements];
/// buf.Reset(external_memory, num_elements*sizeof(float));
/// ...
/// delete[] external_memory; // manage the memory lifetime outside.
/// \endcode
///
class PaddleBuf {
public:
///
/// \brief PaddleBuf allocate memory internally, and manage it.
///
/// \param[in] length The length of data.
///
explicit PaddleBuf(size_t length)
: data_(new char[length]), length_(length), memory_owned_(true) {}
///
/// \brief Set external memory, the PaddleBuf won't manage it.
///
/// \param[in] data The start address of the external memory.
/// \param[in] length The length of data.
///
PaddleBuf(void* data, size_t length)
: data_(data), length_(length), memory_owned_{false} {}
///
/// \brief Copy only available when memory is managed externally.
///
/// \param[in] other another `PaddleBuf`
///
explicit PaddleBuf(const PaddleBuf& other);
///
/// \brief Resize the memory.
///
/// \param[in] length The length of data.
///
void Resize(size_t length);
///
/// \brief Reset to external memory, with address and length set.
///
/// \param[in] data The start address of the external memory.
/// \param[in] length The length of data.
///
void Reset(void* data, size_t length);
///
/// \brief Tell whether the buffer is empty.
///
bool empty() const { return length_ == 0; }
///
/// \brief Get the data's memory address.
///
void* data() const { return data_; }
///
/// \brief Get the memory length.
///
size_t length() const { return length_; }
~PaddleBuf() { Free(); }
PaddleBuf& operator=(const PaddleBuf&);
PaddleBuf& operator=(PaddleBuf&&);
PaddleBuf() = default;
PaddleBuf(PaddleBuf&& other);
private:
void Free();
void* data_{nullptr}; ///< pointer to the data memory.
size_t length_{0}; ///< number of memory bytes.
bool memory_owned_{true};
};
///
/// \brief Basic input and output data structure for PaddlePredictor.
///
struct PaddleTensor {
PaddleTensor() = default;
std::string name; ///< variable name.
std::vector<int> shape;
PaddleBuf data; ///< blob of data.
PaddleDType dtype;
std::vector<std::vector<size_t>> lod; ///< Tensor+LoD equals LoDTensor
};
enum class PaddlePlace { kUNK = -1, kCPU, kGPU };
/// \brief Represents an n-dimensional array of values.
/// The ZeroCopyTensor is used to store the input or output of the network.
/// Zero copy means that the tensor supports direct copy of host or device data
/// to the device, eliminating an additional CPU copy. ZeroCopyTensor is only
/// used in the AnalysisPredictor.
/// It is obtained through the PaddlePredictor::GetInputTensor()
/// and PaddlePredictor::GetOutputTensor() interfaces.
class ZeroCopyTensor {
public:
/// \brief Reset the shape of the tensor.
/// Generally it's only used for the input tensor.
/// Reshape must be called before calling mutable_data() or copy_from_cpu()
/// \param shape The shape to set.
void Reshape(const std::vector<int>& shape);
/// \brief Get the memory pointer in CPU or GPU with specific data type.
/// Please call Reshape on the tensor before calling this.
/// It's usually used to get input data pointer.
/// \param place The place of the tensor.
template <typename T>
T* mutable_data(PaddlePlace place);
/// \brief Get the memory pointer directly.
/// It's usually used to get the output data pointer.
/// \param[out] place To get the device type of the tensor.
/// \param[out] size To get the data size of the tensor.
/// \return The tensor data buffer pointer.
template <typename T>
T* data(PaddlePlace* place, int* size) const;
/// \brief Copy the host memory to tensor data.
/// It's usually used to set the input tensor data.
/// \param data The pointer of the data, from which the tensor will copy.
template <typename T>
void copy_from_cpu(const T* data);
/// \brief Copy the tensor data to the host memory.
/// It's usually used to get the output tensor data.
/// \param[out] data The tensor will copy the data to the address.
template <typename T>
void copy_to_cpu(T* data);
/// \brief Return the shape of the Tensor.
std::vector<int> shape() const;
/// \brief Set lod info of the tensor.
/// More about LOD can be seen here:
/// https://www.paddlepaddle.org.cn/documentation/docs/zh/beginners_guide/basic_concept/lod_tensor.html#lodtensor
/// \param x the lod info.
void SetLoD(const std::vector<std::vector<size_t>>& x);
/// \brief Return the lod info of the tensor.
std::vector<std::vector<size_t>> lod() const;
/// \brief Return the name of the tensor.
const std::string& name() const { return name_; }
void SetPlace(PaddlePlace place, int device = -1) {
place_ = place;
device_ = device;
}
/// \brief Return the data type of the tensor.
/// It's usually used to get the output tensor data type.
/// \return The data type of the tensor.
PaddleDType type() const;
protected:
explicit ZeroCopyTensor(void* scope) : scope_{scope} {}
void SetName(const std::string& name) { name_ = name; }
void* FindTensor() const;
private:
std::string name_;
bool input_or_output_;
friend class AnalysisPredictor;
void* scope_{nullptr};
// The corresponding tensor pointer inside Paddle workspace is cached for
// performance.
mutable void* tensor_{nullptr};
PaddlePlace place_;
PaddleDType dtype_;
int device_;
};
/// \brief A Predictor for executing inference on a model.
/// Base class for AnalysisPredictor and NativePaddlePredictor.
class PaddlePredictor {
public:
struct Config;
PaddlePredictor() = default;
PaddlePredictor(const PaddlePredictor&) = delete;
PaddlePredictor& operator=(const PaddlePredictor&) = delete;
/// \brief This interface takes input and runs the network.
/// This operation involves redundant data copies on the host, so it is
/// recommended to use the ZeroCopyRun interface instead.
/// \param[in] inputs A list of PaddleTensor as the input to the network.
/// \param[out] output_data Pointer to the tensor list, which holds the output
/// PaddleTensors.
/// \param[in] batch_size This setting has been discarded and can be ignored.
/// \return Whether the run is successful.
virtual bool Run(const std::vector<PaddleTensor>& inputs,
std::vector<PaddleTensor>* output_data,
int batch_size = -1) = 0;
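///
/// A hedged usage sketch; `predictor` is assumed to be created beforehand via
/// CreatePaddlePredictor, and the tensor name and shape are placeholders.
/// \code{cpp}
/// std::vector<float> input(1 * 3 * 224 * 224, 0.f);
/// PaddleTensor t;
/// t.name = "image";  // hypothetical input variable name
/// t.shape = {1, 3, 224, 224};
/// t.dtype = PaddleDType::FLOAT32;
/// t.data = PaddleBuf(input.data(), input.size() * sizeof(float));
/// std::vector<PaddleTensor> outputs;
/// predictor->Run({t}, &outputs);
/// \endcode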
/// \brief Used to get the name of the network input.
/// Overridden by AnalysisPredictor; only used in ZeroCopy scenarios.
/// \return Input tensor names.
virtual std::vector<std::string> GetInputNames() { return {}; }
/// \brief Get the input shape of the model.
/// \return A map containing all the input names and shapes defined in the model.
virtual std::map<std::string, std::vector<int64_t>> GetInputTensorShape() {
return {};
}
/// \brief Used to get the name of the network output.
/// Overridden by AnalysisPredictor; only used in ZeroCopy scenarios.
/// \return Output tensor names.
virtual std::vector<std::string> GetOutputNames() { return {}; }
/// \brief Get the input ZeroCopyTensor by name.
/// Overridden by AnalysisPredictor; only used in ZeroCopy scenarios.
/// The name is obtained from the GetInputNames() interface.
/// \param name The input tensor name.
/// \return Return the corresponding input ZeroCopyTensor.
virtual std::unique_ptr<ZeroCopyTensor> GetInputTensor(
const std::string& name) {
return nullptr;
}
/// \brief Get the output ZeroCopyTensor by name.
/// Overridden by AnalysisPredictor; only used in ZeroCopy scenarios.
/// The name is obtained from the GetOutputNames() interface.
/// \param name The output tensor name.
/// \return Return the corresponding output ZeroCopyTensor.
virtual std::unique_ptr<ZeroCopyTensor> GetOutputTensor(
const std::string& name) {
return nullptr;
}
/// \brief Run the network with zero-copied inputs and outputs.
/// Overridden by AnalysisPredictor and only used in ZeroCopy scenarios.
/// This saves the IO copies for transferring inputs and outputs to the
/// predictor workspace and brings some performance improvement.
/// To use it, one should call AnalysisConfig::SwitchUseFeedFetchOps(false)
/// and then use `GetInputTensor` and `GetOutputTensor`
/// to directly write or read the input/output tensors.
/// \return Whether the run is successful
virtual bool ZeroCopyRun() { return false; }
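///
/// A hedged end-to-end sketch of the zero-copy path; the shapes are
/// placeholders, and the predictor is assumed to be an AnalysisPredictor whose
/// config called SwitchUseFeedFetchOps(false).
/// \code{cpp}
/// auto input_names = predictor->GetInputNames();
/// auto input = predictor->GetInputTensor(input_names[0]);
/// std::vector<float> data(1 * 3 * 224 * 224, 0.f);
/// input->Reshape({1, 3, 224, 224});
/// input->copy_from_cpu(data.data());
/// predictor->ZeroCopyRun();
/// auto output_names = predictor->GetOutputNames();
/// auto output = predictor->GetOutputTensor(output_names[0]);
/// int num = 1;
/// for (int d : output->shape()) num *= d;
/// std::vector<float> result(num);
/// output->copy_to_cpu(result.data());
/// \endcode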
/// \brief Clone an existing predictor
/// When using clone, the same network will be created,
/// and the parameters between them are shared.
/// \return unique_ptr which contains the pointer of the cloned predictor.
virtual std::unique_ptr<PaddlePredictor> Clone() = 0;
/// \brief Destroy the Predictor.
virtual ~PaddlePredictor() = default;
virtual std::string GetSerializedProgram() const {
assert(false); // Force raise error.
return "NotImplemented";
}
/// \brief Base class for NativeConfig and AnalysisConfig.
struct Config {
std::string model_dir; /*!< path to the model directory. */
};
};
///
/// \brief Configuration manager for `NativePredictor`.
///
/// `NativeConfig` manages configurations of `NativePredictor`.
/// During the inference procedure, there are many parameters (model/params
/// path, place of inference, etc.).
///
struct NativeConfig : public PaddlePredictor::Config {
/// GPU related fields.
bool use_gpu{false};
int device{0};
float fraction_of_gpu_memory{
-1.f}; ///< Change to a float in (0,1] if needed.
std::string prog_file;
std::string
param_file; ///< Specify the exact path of program and parameter files.
bool specify_input_name{false}; ///< Specify the variable's name of each
///< input if input tensors don't follow the
///< `feeds` and `fetches` of the phase
///< `save_inference_model`.
/// Set and get the number of cpu math library threads.
void SetCpuMathLibraryNumThreads(int cpu_math_library_num_threads) {
cpu_math_library_num_threads_ = cpu_math_library_num_threads;
}
int cpu_math_library_num_threads() const {
return cpu_math_library_num_threads_;
}
protected:
int cpu_math_library_num_threads_{1}; ///< number of cpu math library (such
///< as MKL, OpenBlas) threads for each
///< instance.
};
///
/// \brief A factory to help create different predictors.
///
/// Usage:
///
/// \code{.cpp}
/// NativeConfig config;
/// ... // change the configs.
/// auto native_predictor = CreatePaddlePredictor(config);
/// \endcode
///
/// FOR EXTENSION DEVELOPER:
/// Different predictors are designated by config type. Similar configs can be
/// merged, but there shouldn't be a huge config containing different fields for
/// more than one kind of predictor.
///
template <typename ConfigT>
std::unique_ptr<PaddlePredictor> CreatePaddlePredictor(const ConfigT& config);
/// NOTE The following APIs are too trivial; we will discard them in future
/// versions.
///
enum class PaddleEngineKind {
kNative = 0, ///< Use the native Fluid facility.
kAutoMixedTensorRT, ///< Automatically mix Fluid with TensorRT.
kAnalysis, ///< More optimization.
};
template <typename ConfigT, PaddleEngineKind engine>
std::unique_ptr<PaddlePredictor> CreatePaddlePredictor(const ConfigT& config);
int PaddleDtypeSize(PaddleDType dtype);
std::string get_version();
} // namespace paddle
/* Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */
/*
* This file contains the definition of a simple Inference API for Paddle.
*
* ATTENTION: It requires some C++11 features, for lower version C++ or C, we
* might release another API.
*/
#pragma once
#include <cassert>
#include <memory>
#include <string>
#include <vector>
#include "paddle_analysis_config.h" // NOLINT
#include "paddle_api.h" // NOLINT
// Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
///
/// \file paddle_mkldnn_quantizer_config.h
///
/// \brief Mkldnn quantizer config.
///
/// \author paddle-infer@baidu.com
/// \date 2020-01-01
/// \since 1.7.0
///
#pragma once
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <unordered_set>
#include <vector>
#include "paddle_api.h" // NOLINT
namespace paddle {
///
/// \brief Algorithms for finding scale of quantized Tensors.
///
enum class ScaleAlgo {
NONE, ///< Do not compute scale
MAX, ///< Find scale based on the max absolute value
MAX_CH, ///< Find scale based on the max absolute value per output channel
MAX_CH_T, ///< Find scale based on the max absolute value per output channel
///< of a transposed tensor
KL, ///< Find scale based on KL Divergence
};
///
/// \class MkldnnQuantizerConfig
///
/// \brief Config for MKLDNN quantization.
///
/// The MkldnnQuantizerConfig is used to configure Mkldnn's quantization
/// parameters, including scale algorithm, warmup data, warmup batch size,
/// quantized op list, etc.
///
/// It is not recommended to use this config directly; please refer to
/// AnalysisConfig::mkldnn_quantizer_config().
///
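/// A hedged sketch of the typical flow through AnalysisConfig; the model path
/// is a placeholder and the warm-up data preparation is application specific,
/// so it is only indicated here.
/// \code{cpp}
/// AnalysisConfig config;
/// config.SetModel("./model_dir");  // hypothetical model directory
/// config.EnableMkldnnQuantizer();
/// auto* q_config = config.mkldnn_quantizer_config();
/// q_config->SetWarmupBatchSize(10);
/// // warmup_batch: a std::shared_ptr<std::vector<PaddleTensor>> filled by the user.
/// // q_config->SetWarmupData(warmup_batch);
/// q_config->SetDefaultScaleAlgo(ScaleAlgo::KL);
/// \endcode
///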
struct MkldnnQuantizerConfig {
///
/// \brief Construct a new Mkldnn Quantizer Config object
///
MkldnnQuantizerConfig();
///
/// \brief Set the scale algo
///
/// Specify a quantization algorithm for a connection (input/output) of the
/// operator type.
/// \param[in] op_type_name the operator's name.
/// \param[in] conn_name name of the connection (input/output) of the
/// operator.
/// \param[in] algo the algorithm for computing scale.
///
void SetScaleAlgo(std::string op_type_name, std::string conn_name,
ScaleAlgo algo) {
rules_[op_type_name][conn_name] = algo;
}
///
/// \brief Get the scale algo
///
/// Get the quantization algorithm for a connection (input/output) of the
/// operator type.
///
/// \param[in] op_type_name the operator's name.
/// \param[in] conn_name name of the connection (input/output) of the
/// operator.
/// \return the scale algo.
///
ScaleAlgo scale_algo(const std::string& op_type_name,
const std::string& conn_name) const;
///
/// \brief Set the warmup data
///
/// Set the batch of data to be used for warm-up iteration.
///
/// \param[in] data batch of data.
///
void SetWarmupData(std::shared_ptr<std::vector<PaddleTensor>> data) {
warmup_data_ = data;
}
///
/// \brief Get the warmup data
///
/// Get the batch of data used for warm-up iteration.
///
/// \return the warm up data
///
std::shared_ptr<std::vector<PaddleTensor>> warmup_data() const {
return warmup_data_;
}
///
/// \brief Set the warmup batch size
///
/// Set the batch size for warm-up iteration.
///
/// \param[in] batch_size warm-up batch size
///
void SetWarmupBatchSize(int batch_size) { warmup_bs_ = batch_size; }
///
/// \brief Get the warmup batch size
///
/// Get the batch size for warm-up iteration.
///
/// \return the warm up batch size
int warmup_batch_size() const { return warmup_bs_; }
///
/// \brief Set quantized op list
///
/// In the quantization process, set the list of ops that support quantization.
///
/// \param[in] op_list List of quantized ops
///
void SetEnabledOpTypes(std::unordered_set<std::string> op_list) {
enabled_op_types_ = op_list;
}
///
/// \brief Get quantized op list
///
/// \return list of quantized ops
///
const std::unordered_set<std::string>& enabled_op_types() const {
return enabled_op_types_;
}
///
/// \brief Set the excluded op ids
///
/// \param[in] op_ids_list excluded op ids
///
void SetExcludedOpIds(std::unordered_set<int> op_ids_list) {
excluded_op_ids_ = op_ids_list;
}
///
/// \brief Get the excluded op ids
///
/// \return excluded op ids
///
const std::unordered_set<int>& excluded_op_ids() const {
return excluded_op_ids_;
}
///
/// \brief Set default scale algorithm
///
/// \param[in] algo Method for calculating scale in quantization process
///
void SetDefaultScaleAlgo(ScaleAlgo algo) { default_scale_algo_ = algo; }
///
/// \brief Get default scale algorithm
///
/// \return Method for calculating scale in quantization
/// process
///
ScaleAlgo default_scale_algo() const { return default_scale_algo_; }
protected:
std::map<std::string, std::map<std::string, ScaleAlgo>> rules_;
std::unordered_set<std::string> enabled_op_types_;
std::unordered_set<int> excluded_op_ids_;
std::shared_ptr<std::vector<PaddleTensor>> warmup_data_;
int warmup_bs_{1};
ScaleAlgo default_scale_algo_{ScaleAlgo::MAX};
};
} // namespace paddle
// Copyright (c) 2018 PaddlePaddle Authors. All Rights Reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#pragma once
#include <sstream>
#include <string>
#include <vector>
///
/// \file paddle_pass_builder.h
///
/// \brief Class PaddlePassBuilder and its subclasses (pass strategies).
/// \section sec_intro Introduction
/// This class aims to build passes for paddle and define passes' strategies.
///
/// \author paddle-infer@baidu.com
/// \date 2020-3-23
/// \since 1.7
/// \namespace paddle
namespace paddle {
/// \class PaddlePassBuilder
/// \brief This class builds passes based on a vector<string> input. It is part
/// of the inference API. Users can build passes, insert new passes, and delete
/// passes using this class and its functions.
///
/// Example Usage:
/// Build a new pass.
/// \code{cpp}
/// const vector<string> passes(1, "conv_relu_mkldnn_fuse_pass");
/// PaddlePassBuilder builder(passes);
/// \endcode
class PaddlePassBuilder {
public:
/// \brief Constructor of the class. It stores the input passes.
/// \param[in] passes passes' types.
explicit PaddlePassBuilder(const std::vector<std::string> &passes)
: passes_(passes) {}
/// \brief Stores the input passes.
/// \param[in] passes passes' types.
void SetPasses(std::initializer_list<std::string> passes) {
passes_ = passes;
}
/// \brief Append a pass to the end of the passes.
/// \param[in] pass_type the type of the new pass.
void AppendPass(const std::string &pass_type);
/// \brief Insert a pass to a specific position.
/// \param[in] idx the position to insert.
/// \param[in] pass_type the type of insert pass.
void InsertPass(size_t idx, const std::string &pass_type);
/// \brief Delete the pass at certain position 'idx'.
/// \param[in] idx the position to delete.
void DeletePass(size_t idx);
/// \brief Delete all passes that have a certain type 'pass_type'.
/// \param[in] pass_type the certain pass type to be deleted.
void DeletePass(const std::string &pass_type);
/// \brief Delete all the passes.
void ClearPasses();
/// \brief Append an analysis pass.
/// \param[in] pass the type of the new analysis pass.
void AppendAnalysisPass(const std::string &pass);
/// \brief Visualize the computation graph after each pass by generating a DOT
/// language file; one can render these files with the Graphviz toolkit.
void TurnOnDebug();
/// \brief Human-readable information of the passes.
std::string DebugString();
/// \brief Get information of passes.
/// \return Return list of the passes.
const std::vector<std::string> &AllPasses() const { return passes_; }
/// \brief Get information of analysis passes.
/// \return Return list of analysis passes.
std::vector<std::string> AnalysisPasses() const {
auto passes = analysis_passes_;
// Make sure ir_graph_to_program is the last pass so that any
// modification of the IR will persist to the program.
passes.push_back("ir_graph_to_program_pass");
return passes;
}
protected:
/// \cond Protected
std::vector<std::string> analysis_passes_{
{"ir_graph_build_pass", "ir_graph_clean_pass", "ir_analysis_pass",
"ir_params_sync_among_devices_pass", "adjust_cudnn_workspace_size_pass",
"inference_op_replace_pass"}};
std::vector<std::string> passes_;
/// \endcond
};
/// \class PassStrategy
/// \brief This class defines the pass strategies like whether to use gpu/cuDNN
/// kernel/MKLDNN.
class PassStrategy : public PaddlePassBuilder {
public:
/// \brief Constructor of PassStrategy class. It works the same as
/// PaddlePassBuilder class. \param[in] passes passes' types.
explicit PassStrategy(const std::vector<std::string> &passes)
: PaddlePassBuilder(passes) {}
/// \brief Enable the use of cuDNN kernel.
virtual void EnableCUDNN() {}
/// \brief Enable the use of MKLDNN.
/// The MKLDNN control exists in both CPU and GPU mode, because there can
/// still be some CPU kernels running in GPU mode.
virtual void EnableMKLDNN() {}
/// \brief Enable MKLDNN quantize optimization.
virtual void EnableMkldnnQuantizer() {}
/// \brief Check if we are using gpu.
/// \return A bool variable implying whether we are in gpu mode.
bool use_gpu() const { return use_gpu_; }
/// \brief Default destructor.
virtual ~PassStrategy() = default;
protected:
/// \cond Protected
bool use_gpu_{false};
bool use_mkldnn_{false};
/// \endcond
};
/// \class CpuPassStrategy
/// \brief The CPU passes controller; it is used in AnalysisPredictor in CPU
/// mode.
class CpuPassStrategy : public PassStrategy {
public:
/// \brief Default constructor of CpuPassStrategy.
CpuPassStrategy();
/// \brief Construct by copying another CpuPassStrategy object.
/// \param[in] other The CpuPassStrategy object we want to copy.
explicit CpuPassStrategy(const CpuPassStrategy &other)
: PassStrategy(other.AllPasses()) {
use_gpu_ = other.use_gpu_;
use_mkldnn_ = other.use_mkldnn_;
use_mkldnn_quantizer_ = other.use_mkldnn_quantizer_;
}
/// \brief Default destructor.
virtual ~CpuPassStrategy() = default;
/// \brief Enable the use of cuDNN kernel.
void EnableCUDNN() override;
/// \brief Enable the use of MKLDNN.
void EnableMKLDNN() override;
/// \brief Enable MKLDNN quantize optimization.
void EnableMkldnnQuantizer() override;
protected:
/// \cond Protected
bool use_mkldnn_quantizer_{false};
/// \endcond
};
/// \class GpuPassStrategy
/// \brief The GPU passes controller; it is used in AnalysisPredictor in GPU
/// mode.
class GpuPassStrategy : public PassStrategy {
public:
/// \brief Default constructor of GpuPassStrategy.
GpuPassStrategy();
/// \brief Construct by copying another GpuPassStrategy object.
/// \param[in] other The GpuPassStrategy object we want to copy.
explicit GpuPassStrategy(const GpuPassStrategy &other)
: PassStrategy(other.AllPasses()) {
use_gpu_ = true;
use_cudnn_ = other.use_cudnn_;
}
/// \brief Enable the use of cuDNN kernel.
void EnableCUDNN() override;
/// \brief Not supported in GPU mode yet.
void EnableMKLDNN() override;
/// \brief Not supported in GPU mode yet.
void EnableMkldnnQuantizer() override;
/// \brief Default destructor.
virtual ~GpuPassStrategy() = default;
protected:
/// \cond Protected
bool use_cudnn_{false};
/// \endcond
};
/// \brief List of TensorRT subgraph passes.
extern const std::vector<std::string> kTRTSubgraphPasses;
/// \brief List of lite subgraph passes.
extern const std::vector<std::string> kLiteSubgraphPasses;
} // namespace paddle
breathe==4.18.1
sphinx==3.0.3
recommonmark
sphinx_markdown_tables==0.0.14
sphinx_rtd_theme==0.4.3
exhale==0.2.3
Model Visualization
===================
From the `Quick Start <../introduction/quick_start.html>`_ section we learned that an inference model consists of two parts: a model structure file, which usually exists as a **model** or **__model__** file, and a parameter file, which usually exists as a params file or a set of scattered files.
The model structure file, as the name suggests, stores the topology of the model, including the execution order of all OPs in the model and their detailed information. We often want to visualize the structure and internal information of a model to make model analysis easier. The following describes two ways to visualize a Paddle inference model.
Option 1: Visualize with VisualDL
---------------------------------
1) Installation
VisualDL is PaddlePaddle's visualization and analysis tool. It presents training-parameter trends, model structures, data samples, high-dimensional data distributions and more as rich charts, helping users understand the training process and the model structure more clearly and intuitively, so that models can be optimized efficiently.
You can download and install it from the `GitHub homepage <https://github.com/PaddlePaddle/VisualDL#%E5%AE%89%E8%A3%85%E6%96%B9%E5%BC%8F>`_ .
2) Visualization
`Click here <https://paddle-inference-dist.bj.bcebos.com/temp_data/sample_model/__model__>`_ to download a test model.
Two ways of launching are supported:
- Upload the model file by drag and drop in the front end:
- No extra arguments are needed; run ``visualdl`` on the command line and upload the file in the launched web UI:
.. image:: https://user-images.githubusercontent.com/48054808/88628504-a8b66980-d0e0-11ea-908b-196d02ed1fa2.png
- Pass the model file through the back end:
- Add the ``--model`` argument on the command line and specify the path to the **model file** (not a folder) to launch:
.. code:: shell

   visualdl --model ./log/model --port 8080

.. image:: https://user-images.githubusercontent.com/48054808/88621327-b664f280-d0d2-11ea-9e76-e3fcfeea4e57.png
For detailed usage of the Graph feature, see the `Graph user guide <https://github.com/PaddlePaddle/VisualDL/blob/develop/docs/components/README.md#Graph--%E7%BD%91%E7%BB%9C%E7%BB%93%E6%9E%84%E7%BB%84%E4%BB%B6>`_ .
Option 2: Generate a dot file from code
---------------------------------------
1) Install Paddle with pip
2) Generate the dot file
`Click here <https://paddle-inference-dist.bj.bcebos.com/temp_data/sample_model/__model__>`_ to download a test model.
.. code:: python

   #!/usr/bin/env python
   import paddle.fluid as fluid
   from paddle.fluid import core
   from paddle.fluid.framework import IrGraph

   def get_graph(program_path):
       # Read the serialized program and rebuild it as an IrGraph.
       with open(program_path, 'rb') as f:
           binary_str = f.read()
       program = fluid.framework.Program.parse_from_string(binary_str)
       return IrGraph(core.Graph(program.desc), for_test=True)

   if __name__ == '__main__':
       program_path = './lecture_model/__model__'
       offline_graph = get_graph(program_path)
       # Write the graph out as test_model.dot in the current directory.
       offline_graph.draw('.', 'test_model', [])

3) Generate an svg
**Note: graphviz must be installed in the environment.**
.. code:: shell

   dot -Tsvg ./test_model.dot -o test_model.svg

Then open test_model.svg in a browser to preview it.
.. image:: https://user-images.githubusercontent.com/5595332/81796500-19b59e80-9540-11ea-8c70-31122e969683.png
Model Conversion Tool X2Paddle
==============================
X2Paddle can convert caffe, tensorflow and onnx models into models supported by Paddle.
`X2Paddle <https://github.com/PaddlePaddle/X2Paddle>`_ supports converting Caffe/TensorFlow models into PaddlePaddle models. For the models currently supported by X2Paddle, refer to `x2paddle_model_zoo <https://github.com/PaddlePaddle/X2Paddle/blob/develop/x2paddle_model_zoo.md>`_ .
Multi-framework support
-----------------------
================ ===== ========== ====
Model            caffe tensorflow onnx
================ ===== ========== ====
mobilenetv1      Y     Y          F
mobilenetv2      Y     Y          Y
resnet18         Y     Y          F
resnet50         Y     Y          Y
mnasnet          Y     Y          F
efficientnet     Y     Y          Y
squeezenetv1.1   Y     Y          Y
shufflenet       Y     Y          F
mobilenet_ssd    Y     Y          F
mobilenet_yolov3 F     Y          F
inceptionv4      F     F          F
mtcnn            Y     Y          F
facedetection    Y     F          F
unet             Y     Y          F
ocr_attention    F     F          F
vgg16            F     F          F
================ ===== ========== ====
Installation
---------------
.. code:: shell

   pip install x2paddle

To install the latest version, use:
.. code:: shell

   pip install git+https://github.com/PaddlePaddle/X2Paddle.git@develop

Usage
------------
Caffe
>>>>>>>>>>>>>>
.. code:: shell

   x2paddle --framework caffe \
            --prototxt model.proto \
            --weight model.caffemodel \
            --save_dir paddle_model

TensorFlow
>>>>>>>>>>
.. code:: shell

   x2paddle --framework tensorflow \
            --model model.pb \
            --save_dir paddle_model

Conversion results
--------------------
Two directories are generated under the specified `save_dir`:
1. inference_model : a model format in which both the model structure and the parameters are serialized and saved
2. model_with_code : the model parameter files together with the Python code of the model
**Feedback**
If you run into problems or bugs when using X2Paddle, please report them to us via `Github Issues <https://github.com/PaddlePaddle/X2Paddle/issues>`_ and we will follow up promptly.