diff --git a/paddle/contrib/inference/high_level_api.md b/paddle/contrib/inference/high_level_api.md new file mode 100644 index 0000000000000000000000000000000000000000..563b696143de9cbf67db38048bbd2f7c11b3a66e --- /dev/null +++ b/paddle/contrib/inference/high_level_api.md @@ -0,0 +1,59 @@ +# Inference High-level APIs +This document describes the high-level inference APIs one can use to easily deploy a Paddle model for an application. + +The APIs are described in `paddle_inference_api.h`, just one header file, and two libaries `libpaddle_fluid.so` and `libpaddle_fluid_api.so` are needed. + +## PaddleTensor +We provide the `PaddleTensor` data structure is to give a general tensor interface. + +The definition is + +```c++ +struct PaddleTensor { + std::string name; // variable name. + std::vector shape; + PaddleBuf data; // blob of data. + PaddleDType dtype; +}; +``` + +The data is stored in a continuous memory `PaddleBuf`, and tensor's data type is specified by a `PaddleDType`. +The `name` field is used to specify the name of input variable, +that is important when there are multiple inputs and need to distiuish which variable to set. + +## engine +The inference APIs has two different underlying implementation, currently there are two valid engines: + +- the native engine, which is consists of the native operators and framework, +- the Anakin engine, which is a Anakin library embeded. + +The native engine takes a native Paddle model as input, and supports any model that trained by Paddle, +but the Anakin engine can only take the Anakin model as input(user need to manully transform the format first) and currently not all Paddle models are supported. + +```c++ +enum class PaddleEngineKind { + kNative = 0, // Use the native Fluid facility. + kAnakin, // Use Anakin for inference. +}; +``` + +## PaddlePredictor and how to create one +The main interface is `PaddlePredictor`, there are following methods + +- `bool Run(const std::vector& inputs, std::vector* output_data)` + - take inputs and output `output_data` +- `Clone` to clone a predictor from an existing one, with model parameter shared. + +There is a factory method to help create a predictor, and the user takes the ownership of this object. + +```c++ +template +std::unique_ptr CreatePaddlePredictor(const ConfigT& config); +``` + +By specifying the engine kind and config, one can get an specific implementation. + +## Reference + +- [paddle_inference_api.h](./paddle_inference_api.h) +- [demos](./demo) diff --git a/paddle/contrib/inference/high_level_api_cn.md b/paddle/contrib/inference/high_level_api_cn.md new file mode 100644 index 0000000000000000000000000000000000000000..a57f015a4e44d43ee4e475cf606faa6f05e095fa --- /dev/null +++ b/paddle/contrib/inference/high_level_api_cn.md @@ -0,0 +1,87 @@ +# Paddle 预测 API + +为了更简单方便的预测部署,Fluid 提供了一套高层 API 用来隐藏底层不同的优化实现。 + +预测库包含: + +- 头文件 `paddle_inference_api.h` 定义了所有的接口 +- 库文件`libpaddle_fluid.so` 或 `libpaddle_fluid.a` +- 库文件 `libpaddle_inference_api.so` 或 `libpaddle_inference_api.a` + +下面是详细的一些 API 概念介绍 + +## PaddleTensor + +PaddleTensor 定义了预测最基本的输入输出的数据格式,其定义是 + +```c++ +struct PaddleTensor { + std::string name; // variable name. + std::vector shape; + PaddleBuf data; // blob of data. + PaddleDType dtype; +}; +``` + +- `name` 用于指定输入数据对应的 模型中variable 的名字 (暂时没有用,但会在后续支持任意 target 时启用) +- `shape` 表示一个 Tensor 的 shape +- `data` 数据以连续内存的方式存储在`PaddleBuf` 中,`PaddleBuf` 可以接收外面的数据或者独立`malloc`内存,详细可以参考头文件中相关定义。 +- `dtype` 表示 Tensor 的数据类型 + +## engine + +高层 API 底层有多种优化实现,我们称之为 engine,目前有三种 engine + +- 原生 engine,由 paddle 原生的 forward operator 组成,可以天然支持所有paddle 训练出的模型, +- Anakin engine,封装了 [Anakin](https://github.com/PaddlePaddle/Anakin) ,在某些模型上性能不错,但只能接受自带模型格式,无法支持所有 paddle 模型, +- TensorRT mixed engine,用子图的方式支持了 [TensorRT](https://developer.nvidia.com/tensorrt) ,支持所有paddle 模型,并自动切割部分计算子图到 TensorRT 上加速(WIP) + +其实现为 + +```c++ +enum class PaddleEngineKind { + kNative = 0, // Use the native Fluid facility. + kAnakin, // Use Anakin for inference. + kAutoMixedTensorRT // Automatically mixing TensorRT with the Fluid ops. +}; +``` + +## 预测部署过程 + +总体上分为以下步骤 + +1. 用合适的配置创建 `PaddlePredictor` +2. 创建输入用的 `PaddleTensor`,传入到 `PaddlePredictor` 中 +3. 获取输出的 `PaddleTensor` ,将结果取出 + +下面完整演示一个简单的模型,部分细节代码隐去 + +```c++ +#include "paddle_inference_api.h" + +// 创建一个 config,并修改相关设置 +paddle::NativeConfig config; +config.model_dir = "xxx"; +config.use_gpu = false; +// 创建一个原生的 PaddlePredictor +auto predictor = + paddle::CreatePaddlePredictor(config); +// 创建输入 tensor +int64_t data[4] = {1, 2, 3, 4}; +paddle::PaddleTensor tensor{.name = "", + .shape = std::vector({4, 1}), + .data = PaddleBuf(data, sizeof(data)), + .dtype = PaddleDType::INT64}; +// 创建输出 tensor,输出 tensor 的内存可以复用 +std::vector outputs; +// 执行预测 +CHECK(predictor->Run(slots, &outputs)); +// 获取 outputs ... +``` + +编译时,联编 `libpaddle_fluid.a/.so` 和 `libpaddle_inference_api.a/.so` 便可。 + +## 详细代码参考 + +- [inference demos](./demo) +- [复杂单线程/多线程例子](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/contrib/inference/test_paddle_inference_api_impl.cc) diff --git a/paddle/contrib/inference/paddle_inference_api.h b/paddle/contrib/inference/paddle_inference_api.h index bd4530fcf9518cb3bf06179d8f60a1dde38ff7dd..38e3cc21413b9ab715b84f278f00b9df23cb7682 100644 --- a/paddle/contrib/inference/paddle_inference_api.h +++ b/paddle/contrib/inference/paddle_inference_api.h @@ -109,8 +109,7 @@ class PaddlePredictor { // The common configs for all the predictors. struct Config { - std::string model_dir; // path to the model directory. - bool enable_engine{false}; // Enable to execute (part of) the model on + std::string model_dir; // path to the model directory. }; };