From 6dfa52fe9d088e44abb3642b9ce1231a95c6b72c Mon Sep 17 00:00:00 2001
From: HongyingG <44694098+HongyingG@users.noreply.github.com>
Date: Fri, 15 Feb 2019 09:20:06 +0800
Subject: [PATCH] native_infer_en (#569)

* native_infer_en
* Update native_infer_en.md
* Review
* review2
---
 .../deploy/inference/native_infer_en.md | 136 ++++++++++++++++++
 1 file changed, 136 insertions(+)
 create mode 100644 doc/fluid/advanced_usage/deploy/inference/native_infer_en.md

diff --git a/doc/fluid/advanced_usage/deploy/inference/native_infer_en.md b/doc/fluid/advanced_usage/deploy/inference/native_infer_en.md
new file mode 100644
index 000000000..4c82c90d2
--- /dev/null
+++ b/doc/fluid/advanced_usage/deploy/inference/native_infer_en.md
@@ -0,0 +1,136 @@

# Introduction to C++ Inference API

To make deploying inference models more convenient, Fluid provides a set of high-level APIs that hide the diverse low-level optimization processes.

The inference library contains:

- the header file `paddle_inference_api.h`, which defines all interfaces
- the library file `libpaddle_fluid.so` or `libpaddle_fluid.a`

The details are as follows:

## PaddleTensor

`PaddleTensor` defines the basic format of the input and output data for inference. Its common fields are:

- `name`: the name of the model variable that the input data corresponds to.
- `shape`: the shape of the tensor.
- `data`: the data, stored contiguously in a `PaddleBuf`. A `PaddleBuf` can either wrap externally owned data or allocate memory itself; see the related definitions in the header file.
- `dtype`: the data type of the tensor.

## Use Config to create different engines

Beneath the high-level API are several optimization paths, called engines. Switching between engines is done by passing a different Config.

- `NativeConfig`: the native engine, composed of Paddle's native forward operators. It naturally supports all models trained by Paddle.

- `AnalysisConfig`: the TensorRT mixed engine. It speeds up GPU inference and supports TensorRT through subgraphs. It supports all Paddle models and automatically offloads parts of the computation subgraphs to TensorRT for acceleration (WIP). For specific usage, please refer to [here](http://paddlepaddle.org/documentation/docs/zh/1.1/user_guides/howto/inference/paddle_tensorrt_infer.html).


## Process of Inference Deployment

In general, the steps are:

1. Use an appropriate configuration to create a `PaddlePredictor`.
2. Create `PaddleTensor`s for the input and pass them to the `PaddlePredictor`.
3. Fetch the output `PaddleTensor`s.

The complete process of running a simple model is shown below, with some details omitted.

```c++
#include "paddle_inference_api.h"

// create a config and modify associated options
paddle::NativeConfig config;
config.model_dir = "xxx";
config.use_gpu = false;
// create a native PaddlePredictor
auto predictor =
    paddle::CreatePaddlePredictor<paddle::NativeConfig>(config);
// create an input tensor
int64_t data[4] = {1, 2, 3, 4};
paddle::PaddleTensor tensor;
tensor.shape = std::vector<int>({4, 1});
tensor.data.Reset(data, sizeof(data));
tensor.dtype = paddle::PaddleDType::INT64;
std::vector<paddle::PaddleTensor> slots = {tensor};
// create output tensors whose memory is reusable
std::vector<paddle::PaddleTensor> outputs;
// run inference
CHECK(predictor->Run(slots, &outputs));
// fetch outputs ...
```

At compile time, link the program against `libpaddle_fluid.a/.so`.
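The `// fetch outputs ...` step above can be filled in along the following lines. This is only a minimal sketch, not library code: `FetchFirstOutput` is a hypothetical helper, and it assumes the model produces a single FLOAT32 output that is read back through the buffer accessors of `PaddleBuf`; in real code, check the output `dtype` before casting.

```c++
#include <cstdio>
#include <vector>

#include "paddle_inference_api.h"

// Hypothetical helper: read the first output of Run() as FLOAT32 data.
void FetchFirstOutput(const std::vector<paddle::PaddleTensor>& outputs) {
  if (outputs.empty()) return;
  const paddle::PaddleTensor& out = outputs[0];
  // PaddleBuf exposes the underlying buffer and its size in bytes.
  const float* out_data = static_cast<const float*>(out.data.data());
  size_t num_elements = out.data.length() / sizeof(float);
  for (size_t i = 0; i < num_elements; ++i) {
    std::printf("output[%zu] = %f\n", i, out_data[i]);
  }
}
```

In the example above, this would be called right after `predictor->Run(slots, &outputs)`, e.g. `FetchFirstOutput(outputs);`.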
## Advanced Usage

### Memory management of input and output

The `data` field of `PaddleTensor` is a `PaddleBuf`, which manages a section of memory into which the data is copied.

`PaddleBuf` supports two memory management modes:

1. Automatic memory allocation and management

   ```c++
   int some_size = 1024;
   PaddleTensor tensor;
   tensor.data.Resize(some_size);
   ```

2. Passing in external memory

   ```c++
   int some_size = 1024;
   // You can allocate memory outside and keep it available for the whole lifetime of the PaddleTensor
   char* memory = new char[some_size];

   tensor.data.Reset(memory, some_size);
   // ...

   // You need to release the memory yourself to avoid a leak
   delete[] memory;
   ```

The first mode is more convenient; the second gives strict control over memory, which makes it easier to integrate with `tcmalloc` and other libraries.

### Improve performance with contrib::AnalysisConfig (prerelease)

`AnalysisConfig` is in a pre-release stage and is protected by `namespace contrib`; it may be adjusted in the future.

Like `NativeConfig`, `AnalysisConfig` creates an inference engine, but with higher performance after a series of optimizations, including analysis and optimization of the computation graph as well as fusion and rewriting of some important ops, which **greatly improves the performance of models that use ops such as While, LSTM, and GRU**.

The usage of `AnalysisConfig` is similar to that of `NativeConfig`, but the former *only supports CPU at present; GPU support is being added*.

```c++
AnalysisConfig config;
config.model_dir = "xxx";
config.use_gpu = false;           // GPU optimization is not supported at present
config.specify_input_name = true; // the name of each input must be set
```

Note that the input `PaddleTensor` needs to have its name set. The previous example needs to be revised as follows:

```c++
auto predictor =
    paddle::CreatePaddlePredictor<paddle::contrib::AnalysisConfig>(config); // note that AnalysisConfig is needed here
// create an input tensor
int64_t data[4] = {1, 2, 3, 4};
paddle::PaddleTensor tensor;
tensor.shape = std::vector<int>({4, 1});
tensor.data.Reset(data, sizeof(data));
tensor.dtype = paddle::PaddleDType::INT64;
tensor.name = "input0"; // the name must be set here
```

### Suggestions for Performance

1. If the CPU supports it, prefer the library versions built with AVX and MKL.
2. Reuse the input and output `PaddleTensor`s to avoid frequent memory allocation, which hurts performance (a minimal reuse sketch is given at the end of this page).
3. Try to replace `NativeConfig` with `AnalysisConfig` to enable the optimizations for CPU inference.

## Code Demo

[inference demos](https://github.com/PaddlePaddle/Paddle/tree/develop/paddle/fluid/inference/api/demo_ci)
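As a complement to performance suggestion 2 above, the sketch below shows one way to reuse the same input and output `PaddleTensor`s across repeated `Run` calls, so their `PaddleBuf` memory is allocated once instead of on every request. It is only an illustration under assumed names and sizes: `RunMany`, the input name `input0`, the shape, and the fake data are made up for this sketch, not part of the library.

```c++
#include <cstdint>
#include <vector>

#include "paddle_inference_api.h"

// Illustrative sketch (not library code): reuse tensors across Run() calls.
void RunMany(paddle::PaddlePredictor* predictor, int num_batches) {
  // Create the input tensor once; its PaddleBuf keeps its allocation.
  paddle::PaddleTensor input;
  input.name = "input0";                   // hypothetical input name
  input.shape = std::vector<int>({4, 1});
  input.dtype = paddle::PaddleDType::INT64;
  input.data.Resize(4 * sizeof(int64_t));  // allocated once, refilled below

  // The output vector is reused as well, so its buffers can be recycled.
  std::vector<paddle::PaddleTensor> outputs;

  for (int i = 0; i < num_batches; ++i) {
    // Refill the existing buffer instead of creating a new tensor.
    int64_t* buf = static_cast<int64_t*>(input.data.data());
    for (int j = 0; j < 4; ++j) {
      buf[j] = j + i;  // fake data for illustration
    }
    CHECK(predictor->Run({input}, &outputs));
    // ... consume outputs here before the next iteration ...
  }
}
```

With the predictor created as in the examples above, this would be called as `RunMany(predictor.get(), 100);`.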