########################################
Basic Concepts of Paddle Framework 2.0
########################################

Let's start by learning the basic concepts of the Paddle framework:

- `Introduction to Tensor <./tensor_introduction_cn.html>`_ : how data is represented in Paddle, an introduction to the Tensor concept.
- `Broadcasting in Paddle <./broadcasting_cn.html>`_ : an introduction to the concept of broadcasting in Paddle.

.. toctree::
    :hidden:

    tensor_introduction_cn.md
    broadcasting_cn.rst
########################
Paddle 2.0 Basic Concept
########################

Let's start by learning the basic concepts of PaddlePaddle:

- `Introduction to Tensor <./tensor_introduction_en.html>`_ : Introduction to Tensor, the representation of data in Paddle.
- `Broadcasting <./broadcasting_en.html>`_ : Introduction to broadcasting.

.. toctree::
    :hidden:

    tensor_introduction_en.md
    broadcasting_en.md
####################################
Introduction to Paddle Framework 2.0
####################################

A brief introduction to Paddle framework 2.0.

You can learn more about Paddle framework 2.0 through the following pages:

- `Basic Concepts of Paddle Framework 2.0 <./basic_concept/index_cn.html>`_ : an introduction to the basic concepts of Paddle framework 2.0.
- `Paddle Framework 2.0beta Upgrade Guide <./upgrade_guide_cn.html>`_ : the main changes in the open-source Paddle framework 2.0beta and how to upgrade.
- `Version Migration Tool <./migration_cn.html>`_ : how to use the paddle1to2 conversion tool.

.. toctree::
    :hidden:

    basic_concept/index_cn.rst
    upgrade_guide_cn.md
    migration_cn.rst
#####################
Paddle 2 Introduction
#####################

A brief introduction to Paddle 2.

For more information, you can view these pages:

- `Paddle 2 basic concepts <./basic_concept/index_en.html>`_ : an introduction to the basic concepts of Paddle 2.
- `Migration tools <./migration_en.html>`_ : how to use the migration tools to upgrade your code.

.. toctree::
    :hidden:

    migration_en.rst
######################################
Paddle Framework 2.0 Model Development
######################################

Content related to model development with Paddle framework 2.0.

..
    TODO
    Content to be added:
    Get started with Paddle in 10 minutes
    Data preprocessing (vision + text)
    Data loading (Dataset + DataLoader, built-in datasets)
    Building models (paddle.nn + paddle.nn.functional, the Model API, built-in models)
    Training and inference (model.fit / evaluate / predict, with a step-by-step breakdown of each)
    Single-machine multi-GPU (training + inference)
    Debugging dynamic-graph code
.. PaddlePaddle Fluid documentation master file, created by
   sphinx-quickstart on Thu Jun 7 17:04:53 2018.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.
##############
VisualDL Tools
##############

.. toctree::
    :maxdepth: 1

    visualdl.md
    visualdl_usage.md
VisualDL Tools
==========================

.. toctree::
    :maxdepth: 1

    visualdl_en.md
    visualdl_usage_en.md
# Introduction to the VisualDL Toolkit
<p align="center">
<img src="http://visualdl.bj.bcebos.com/images/vdl-logo.png" width="70%"/>
</p>
VisualDL is PaddlePaddle's visualization and analysis toolkit. It uses rich charts to present the trends of training metrics, model structures, data samples, histograms, PR curves, and the distribution of high-dimensional data, helping users understand the training process and the model structure more clearly and intuitively, and thus tune models more efficiently.

For how to use each feature, see the **VisualDL User Guide**. The project is iterating rapidly; stay tuned for new components.

Supported browsers: Chrome (81 and 83), Safari 13, FireFox (77 and 78), and Edge (Chromium).

VisualDL natively supports Python: adding a few lines of code to a model's Python configuration is enough to get rich visualizations of the training process.

## Contents

* [Key Highlights](#key-highlights)
* [Installation](#installation)
* [Usage](#usage)
* [Overview of Visualization Features](#overview-of-visualization-features)
* [Contribution](#contribution)
* [More Details](#more-details)
* [Communication](#communication)
## Key Highlights

### Easy to Use

The API is concise and easy to understand, and model structures can be visualized with one click.

### Rich Features

Covers visualization of scalars, data samples, graph structures, histograms, PR curves, and dimensionality-reduced data.

### High Compatibility

Fully supports visualizing mainstream model formats such as Paddle, ONNX, and Caffe, serving a wide range of users for visual analysis.

### Fully Integrated

Deeply integrated with PaddlePaddle's service platforms and toolkits, providing the best experience within the PaddlePaddle ecosystem.

## Installation

### Install with pip
```shell
pip install --upgrade --pre visualdl
```
### Install from source
```
git clone https://github.com/PaddlePaddle/VisualDL.git
cd VisualDL
python setup.py bdist_wheel
pip install --upgrade dist/visualdl-*.whl
```
Note that official support for Python 2 ended on January 1, 2020; to keep the code maintainable, VisualDL now supports Python 3 only.

## Usage

VisualDL stores the data and parameters collected during training in log files; start the panel to view the visualized results.

### 1. Record logs

VisualDL's backend provides a Python SDK. A logger can be created with LogWriter, whose interface is:
```python
class LogWriter(logdir=None,
                comment='',
                max_queue=10,
                flush_secs=120,
                filename_suffix='',
                write_to_disk=True,
                **kwargs)
```
#### Interface parameters

| Parameter | Type | Description |
| --------------- | ------- | ------------------------------------------------------------ |
| logdir | string | Path under which log files are created and written; defaults to `runs/${CURRENT_TIME}` if not set |
| comment | string | Suffix appended to the default log folder name; ignored if logdir is specified |
| max_queue | int | Maximum capacity of the logging message queue; the queue is flushed to the log file once this capacity is reached |
| flush_secs | int | Maximum caching time of the logging message queue; the queue is flushed to the log file once this time is reached |
| filename_suffix | string | Suffix appended to the default log file name |
| write_to_disk | boolean | Whether to write to disk |

#### Example

Create a log file and record scalar data:
```python
from visualdl import LogWriter

# create a log file under `./log/scalar_test/train`
with LogWriter(logdir="./log/scalar_test/train") as writer:
    # use the scalar component to record scalar data
    writer.add_scalar(tag="acc", step=1, value=0.5678)
    writer.add_scalar(tag="acc", step=2, value=0.6878)
    writer.add_scalar(tag="acc", step=3, value=0.9878)
```
### 2. Launch the panel

The example above logged three scalar values; the VisualDL panel can now be launched to view the visualized results. There are two ways to launch it:

#### From the command line

Launch the VisualDL panel from the command line with:

```shell
visualdl --logdir <dir_1, dir_2, ... , dir_n> --host <host> --port <port> --cache-timeout <cache_timeout> --language <language> --public-path <public_path> --api-only
```

Parameters:

| Parameter | Description |
| --------------- | ------------------------------------------------------------ |
| --logdir | Directory (or directories) containing the logs; multiple directories may be given. VisualDL traverses their subdirectories recursively and visualizes all experiment results |
| --model | Path to a model file (not a folder); VisualDL visualizes the given model. PaddlePaddle, ONNX, Keras, Core ML, Caffe, and other formats are supported; see the [supported model formats](https://github.com/PaddlePaddle/VisualDL/blob/develop/docs/components/README.md#Graph--%E7%BD%91%E7%BB%9C%E7%BB%93%E6%9E%84%E7%BB%84%E4%BB%B6) |
| --host | IP address; defaults to `127.0.0.1` |
| --port | Port; defaults to `8040` |
| --cache-timeout | Backend cache duration in seconds; repeated frontend requests for the same URL within this window are served from the cache. Defaults to 20 seconds |
| --language | Panel language, 'EN' or 'ZH'; defaults to the browser language |
| --public-path | URL path of the panel; defaults to '/app', i.e. the panel is served at 'http://&lt;host&gt;:&lt;port&gt;/app' |
| --api-only | Serve only the API. If set, VisualDL serves no pages, only the API, at 'http://&lt;host&gt;:&lt;port&gt;/&lt;public_path&gt;/api'; if public_path is not set, the API defaults to 'http://&lt;host&gt;:&lt;port&gt;/api' |

For the logs generated in the previous step, the launch command is:

```
visualdl --logdir ./log
```
#### From a Python script

The VisualDL panel can also be launched from a Python script. The interface is:

```python
visualdl.server.app.run(logdir,
                        host="127.0.0.1",
                        port=8080,
                        cache_timeout=20,
                        language=None,
                        public_path=None,
                        api_only=False,
                        open_browser=False)
```

Note: all parameters except `logdir` must be passed by keyword.

The parameters are:
| Parameter | Type | Description |
| ------------- | ------------------------------------------------ | ------------------------------------------------------------ |
| logdir | string or list[string_1, string_2, ... , string_n] | Path(s) containing the log files; VisualDL searches them recursively and visualizes the logs. One or more paths may be given |
| model | string | Path to a model file (not a folder); VisualDL visualizes the given model |
| host | string | IP address of the service; defaults to `127.0.0.1` |
| port | int | Port of the service; defaults to `8040` |
| cache_timeout | int | Backend cache duration in seconds; repeated frontend requests for the same URL within this window are served from the cache. Defaults to 20 seconds |
| language | string | Panel language, 'en' or 'zh'; defaults to the browser language |
| public_path | string | URL path of the panel; defaults to '/app', i.e. the panel is served at 'http://<host>:<port>/app' |
| api_only | boolean | Serve only the API. If set, VisualDL serves no pages, only the API, at 'http://<host>:<port>/<public_path>/api'; if public_path is not set, the API defaults to 'http://<host>:<port>/api' |
| open_browser | boolean | Whether to open the browser; if True, the VisualDL panel is opened in the browser automatically after launching. Ignored when api_only is set |

For the logs generated in the previous step, the launch script is:

```python
from visualdl.server import app

app.run(logdir="./log")
```

After launching the VisualDL panel either way, open it in a browser to view the visualized results, as shown below:
<p align="center">
<img src="https://user-images.githubusercontent.com/48054808/82786044-67ae9880-9e96-11ea-8a2b-3a0951a6ec19.png" width="60%"/>
</p>
## Overview of Visualization Features

### Scalar

Displays training metrics such as loss and accuracy as real-time charts, so users can follow the training process by watching one or more metrics evolve, and speed up model tuning. Two key features:

#### Dynamic display

Once VisualDL is launched, the LogReader keeps incrementally reading new data from the logs for the frontend to display, so metric changes can be observed live during training, as shown below:
<p align="center">
<img src="http://visualdl.bj.bcebos.com/images/dynamic_display.gif" width="60%"/>
</p>
#### Multi-experiment comparison

Simply pass the log paths of all experiments when launching VisualDL; metrics with the same tag across experiments are drawn in the same chart for side-by-side comparison, as shown in the figure and the sketch below:
<p align="center">
<img src="http://visualdl.bj.bcebos.com/images/multi_experiments.gif" width="100%"/>
</p>
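When launching from a Python script, multiple experiment directories can be passed as a list (the `logdir` parameter accepts a list of strings, per the table above). A minimal sketch, assuming the two log directories `./log/exp1` and `./log/exp2` already exist:

```python
from visualdl.server import app

# visualize two experiments side by side; scalars with the same
# tag (e.g. "acc") are drawn in one chart for comparison
app.run(logdir=["./log/exp1", "./log/exp2"])
```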
### Image

Displays image data from the training process in real time, so that images from different training stages can be inspected to better understand the training process and its effects.
<p align="center">
<img src="http://visualdl.bj.bcebos.com/images/image-eye.gif" width="60%"/>
</p>
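Image data is logged with the image component of the `LogWriter` SDK. A minimal sketch, assuming an `add_image` method taking a tag, an HWC-shaped numpy array, and a step (the exact signature may differ across VisualDL versions):

```python
import numpy as np
from visualdl import LogWriter

with LogWriter(logdir="./log/image_test/train") as writer:
    # log a random 100x100 RGB image at step 1 (hypothetical data)
    fake_img = (np.random.rand(100, 100, 3) * 255).astype("uint8")
    writer.add_image(tag="input_image", img=fake_img, step=1)
```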
### Audio

Plays audio data from the training process in real time, useful for monitoring training in tasks such as speech recognition and synthesis.
<p align="center">
<img src="https://user-images.githubusercontent.com/48054808/89017647-38605000-d34d-11ea-9d75-7d10b9854c36.gif" width="100%"/>
</p>
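Audio samples are logged in the same way. A minimal sketch, assuming an `add_audio` method taking a tag, a 1-D float numpy array, a step, and a sample rate (again, the exact signature may differ across versions):

```python
import numpy as np
from visualdl import LogWriter

with LogWriter(logdir="./log/audio_test/train") as writer:
    # log one second of hypothetical 8 kHz audio at step 1
    fake_audio = np.random.uniform(-1.0, 1.0, 8000).astype("float32")
    writer.add_audio(tag="sample", audio_array=fake_audio, step=1, sample_rate=8000)
```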
### Graph

Visualizes a model's network structure with one click. Model attributes, node information, and node inputs/outputs can be inspected, and nodes can be searched, helping users quickly analyze the model structure and understand the data flow. (Pass the model file via the `--model` command-line option described above.)
<p align="center">
<img src="https://user-images.githubusercontent.com/48054808/84483052-5acdd980-accb-11ea-8519-1608da7ee698.png" width="100%"/>
</p>
### Histogram

Displays how tensors (weights, biases, gradients, etc.) are distributed over the course of training as histograms, giving insight into the behavior of each layer and helping developers tune the model structure precisely. See the sketch after the two display modes below.

- Offset mode
<p align="center">
<img src="https://user-images.githubusercontent.com/48054808/86551031-86647c80-bf76-11ea-8ec2-8c86826c8137.png" width="100%"/>
</p>
- Overlay mode
<p align="center">
<img src="https://user-images.githubusercontent.com/48054808/86551033-882e4000-bf76-11ea-8e6a-af954c662ced.png" width="100%"/>
</p>
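Histogram data is logged through the `LogWriter` as well. A minimal sketch, assuming an `add_histogram` method taking a tag, a numpy array of values, and a step (the exact signature may differ across VisualDL versions):

```python
import numpy as np
from visualdl import LogWriter

with LogWriter(logdir="./log/histogram_test/train") as writer:
    # log the distribution of a hypothetical weight tensor over training
    for step in range(10):
        values = np.random.normal(loc=0.0, scale=1.0, size=1000)
        writer.add_histogram(tag="weight", values=values, step=step)
```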
### PR Curve

Precision-recall curves help developers balance a model's precision against its recall and pick the best threshold; a logging sketch follows the figure below.
<p align="center">
<img src="https://user-images.githubusercontent.com/48054808/86738774-ee46c000-c067-11ea-90d2-a98aac445cca.png" width="100%"/>
</p>
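A minimal logging sketch, assuming an `add_pr_curve` method taking a tag, ground-truth labels, predicted scores, and a step (the exact signature may differ across VisualDL versions):

```python
import numpy as np
from visualdl import LogWriter

with LogWriter(logdir="./log/pr_curve_test/train") as writer:
    # hypothetical binary labels and predicted probabilities
    labels = np.random.randint(0, 2, size=100)
    predictions = np.random.rand(100)
    writer.add_pr_curve(tag="pr_curve", labels=labels, predictions=predictions, step=1)
```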
### High Dimensional

Projects high-dimensional data into a lower-dimensional view; T-SNE and PCA are currently supported. Useful for analyzing the relationships within high-dimensional data and optimizing algorithms according to the data's characteristics; a logging sketch follows the figure below.
<p align="center">
<img src="http://visualdl.bj.bcebos.com/images/high_dimensional_test.png" width="100%"/>
</p>
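A minimal logging sketch, assuming an `add_embeddings` method taking a tag, a list of labels, and the corresponding list of high-dimensional vectors (the exact signature may differ across VisualDL versions):

```python
from visualdl import LogWriter

with LogWriter(logdir="./log/high_dimensional_test/train") as writer:
    # hypothetical 4-dimensional embeddings for three words
    labels = ["apple", "banana", "cherry"]
    vectors = [[1.0, 2.0, 3.0, 4.0],
               [2.0, 1.0, 4.0, 3.0],
               [4.0, 3.0, 2.0, 1.0]]
    writer.add_embeddings(tag="words", labels=labels, hot_vectors=vectors)
```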
## Contribution

VisualDL is an open-source project launched jointly by [PaddlePaddle](https://www.paddlepaddle.org/) and [ECharts](https://echarts.apache.org/).

The Graph features are powered by [Netron](https://github.com/lutzroeder/netron).

Everyone is welcome to use VisualDL, give feedback, and contribute code.

## More Details

For more details on VisualDL's visualization features, see the **VisualDL User Guide**.

## Communication

Join the official VisualDL QQ group (1045783368) to discuss VisualDL with the PaddlePaddle team and other users.
# Introduction to VisualDL Toolset
<p align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/VisualDL/develop/docs/images/vs-logo.png" width="60%" />
</p>
## Introduction

VisualDL is a deep learning visualization tool that helps in designing deep learning jobs.
It includes features such as scalar, parameter distribution, model structure, and image visualization.
It is currently being developed at a fast pace, and new features will be added continuously.

At present, most DNN frameworks use Python as their primary language. VisualDL supports Python out of the box:
users can get rich visualization results by simply adding a few lines of Python code to their model before training.

Besides the Python SDK, VisualDL is written in C++ at the lower level and also provides a C++ SDK that
can be integrated into other platforms.
## Components

VisualDL provides the following components:
- scalar
- histogram
- image
- audio
- graph
- high dimensional
### Scalar
Scalar can be used to show the trends of error during training.
<p align="center">
<img src="https://raw.githubusercontent.com/daming-lu/large_files/master/loss_scalar.gif" width="60%"/>
</p>
### Histogram
Histogram can be used to visualize parameter distribution and trends for any tensor.
<p align="center">
<img src="https://raw.githubusercontent.com/daming-lu/large_files/master/histogram.gif" width="60%"/>
</p>
### Image
Image can be used to visualize any tensor or intermediate generated image.
<p align="center">
<img src="https://raw.githubusercontent.com/daming-lu/large_files/master/loss_image.gif" width="60%"/>
</p>
### Audio
Audio can be used to play input audio samples or generated audio samples.
### Graph

VisualDL graph supports displaying Paddle models and is also compatible with ONNX ([Open Neural Network Exchange](https://github.com/onnx/onnx)).
Combined with the Python SDK, VisualDL is thereby compatible with most major DNN frameworks, including
PaddlePaddle, PyTorch, and MXNet.

<p align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/VisualDL/develop/docs/images/graph_demo.gif" width="60%" />
</p>

To display a Paddle model, all you have to do is:

1. Call the `fluid.io.save_inference_model()` interface to save a Paddle model.
2. Use `visualdl --model_pb [paddle_model_dir]` on the command line to load it.
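A minimal sketch of step 1, assuming the Paddle 1.x `fluid` API and a toy network (the variable names `image` and `prediction` are hypothetical):

```python
import paddle.fluid as fluid

# a tiny network: one input and one fc layer (hypothetical example)
image = fluid.data(name="image", shape=[None, 784], dtype="float32")
prediction = fluid.layers.fc(input=image, size=10, act="softmax")

exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())

# step 1: save the inference model so `visualdl --model_pb` can load it
fluid.io.save_inference_model(dirname="./paddle_model",
                              feeded_var_names=["image"],
                              target_vars=[prediction],
                              executor=exe)
```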
### High Dimensional
High Dimensional can be used to visualize data embeddings by projecting high-dimensional data into 2D / 3D.
<p align="center">
<img src="https://raw.githubusercontent.com/PaddlePaddle/VisualDL/develop/docs/getting_started/high_dimensional_3d.png" width="60%"/>
</p>
## Quick Start

To give VisualDL a quick test, use the following commands.
```
# Install VisualDL, preferably under a virtual environment or anaconda.
pip install --upgrade visualdl
# run a demo, vdl_create_scratch_log will create logs for testing.
vdl_create_scratch_log
visualdl --logdir=scratch_log --port=8080
# visit http://127.0.0.1:8080
```
If you encounter the error `TypeError: __init__() got an unexpected keyword argument 'file'`, your protobuf version is older than 3.5; running `pip install --upgrade protobuf` will fix the issue.

If you run into any other issues with the steps above, they may be caused by environment problems such as conflicting Python or pip versions.
The following installation methods might fix them.

## Install with Virtualenv

[Virtualenv](https://virtualenv.pypa.io/en/stable/) creates an isolated Python environment that prevents interference
from other Python programs on the same machine and makes sure Python and pip are located properly.
On macOS, install pip and virtualenv by:
```
sudo easy_install pip
pip install --upgrade virtualenv
```
On Linux, install pip and virtualenv by:
```
sudo apt-get install python3-pip python3-dev python-virtualenv
```
Then create a Virtualenv environment with one of the following commands:
```
virtualenv ~/vdl            # for Python 2.7
virtualenv -p python3 ~/vdl # for Python 3.x
```
```~/vdl``` will be your Virtualenv directory; you may choose to install it anywhere.
Activate your Virtualenv environment by:
```
source ~/vdl/bin/activate
```
Now you should be able to install VisualDL and run our demo:
```
pip install --upgrade visualdl
# run a demo, vdl_create_scratch_log will create logs for testing.
vdl_create_scratch_log
visualdl --logdir=scratch_log --port=8080
# visit http://127.0.0.1:8080
```
If you still have issues installing VisualDL from Virtualenv, try the following installation method.
## Install with Anaconda

Anaconda is a Python distribution with installation and package management tools. It is also an environment manager
that can create different Python environments, each with its own settings.

Follow the instructions on the [Anaconda download site](https://www.anaconda.com/download) to download and install Anaconda,
choosing the Python 3.6 command-line installer.

Create a conda environment named ```vdl``` (or anything you want) with:
```
conda create -n vdl pip python=2.7 # or python=3.3, etc.
```
Activate the conda environment by:
```
source activate vdl
```
Now you should be able to install VisualDL and run our demo:
```
pip install --upgrade visualdl
# run a demo, vdl_create_scratch_log will create logs for testing.
vdl_create_scratch_log
visualdl --logdir=scratch_log --port=8080
# visit http://127.0.0.1:8080
```
If you still have issues installing VisualDL, try installing from source as described in the following section.

### Install from source
```
# Preferably under a virtualenv or anaconda.
git clone https://github.com/PaddlePaddle/VisualDL.git
cd VisualDL
python setup.py bdist_wheel
pip install --upgrade dist/visualdl-*.whl
```
If there are still issues regarding ```pip install```, you can still start VisualDL by running the dev server as described
[here](https://github.com/PaddlePaddle/VisualDL/blob/develop/docs/develop/how_to_dev_frontend_en.md).
## SDK
VisualDL provides both Python SDK and C++ SDK in order to fit more use cases.
### Python SDK
VisualDL supports both Python 2 and Python 3.
Below is an example that creates a simple Scalar component and inserts data at different timestamps:
```python
import random
from visualdl import LogWriter

logdir = "./tmp"
logger = LogWriter(logdir, sync_cycle=10000)

# mark the components with 'train' label.
with logger.mode("train"):
    # create a scalar component called 'scalars/scalar0'
    scalar0 = logger.scalar("scalars/scalar0")

# add some records during DL model running.
for step in range(100):
    scalar0.add_record(step, random.random())
```
### C++ SDK
Here is the C++ SDK equivalent of the Python SDK example above:
```c++
#include <cstdlib>
#include <string>
#include "visualdl/logic/sdk.h"

namespace vs = visualdl;
namespace cp = visualdl::components;

int main() {
  const std::string dir = "./tmp";
  vs::LogWriter logger(dir, 10000);

  logger.SetMode("train");

  auto tablet = logger.AddTablet("scalars/scalar0");
  cp::Scalar<float> scalar0(tablet);

  for (int step = 0; step < 1000; step++) {
    float v = (float)std::rand() / RAND_MAX;
    scalar0.AddRecord(step, v);
  }
  return 0;
}
```
## Launch VisualDL

After some logs have been generated during training, users can launch a VisualDL application to see real-time data visualization with:
```
visualdl --logdir <some log dir>
```
VisualDL also supports the following optional parameters:

- `--host` sets the IP address
- `--port` sets the port
- `-m / --model_pb` specifies an ONNX-format model file whose graph should be viewed
### Contribute

VisualDL was initially created by [PaddlePaddle](http://www.paddlepaddle.org/) and
[ECharts](http://echarts.baidu.com/).
We welcome everyone to use, comment on, and contribute to VisualDL :)

## More details

For more details about how to use VisualDL, please take a look at the [documents](https://github.com/PaddlePaddle/VisualDL/tree/develop/demo).
####################
Inference Deployment
####################

- `Server-side Deployment <inference/index_cn.html>`_ : how to deploy and release models on a server.
- `Mobile Deployment <mobile/index_cn.html>`_ : Paddle-Lite, the deep learning framework for embedded platforms under the PaddlePaddle organization.
- `Model Compression <paddleslim/paddle_slim.html>`_ : a brief introduction to the features and usage of the PaddleSlim model compression toolkit.

.. toctree::
    :hidden:

    inference/index_cn.rst
    mobile/index_cn.rst
    paddleslim/paddle_slim.md
#######################
Deploy Inference Model
#######################

- `Server-side Deployment <inference/index_en.html>`_ : how to deploy and release trained models on servers.
- `Model Compression <paddleslim/paddle_slim_en.html>`_ : the features and usage of PaddleSlim, a toolkit for model compression.

.. toctree::
    :hidden:

    inference/index_en.rst
    paddleslim/paddle_slim_en.rst
.. _install_or_build_cpp_inference_lib:

Install and Compile the C++ Inference Library on Linux
=======================================================

Direct Download and Installation
---------------------------------

.. csv-table::
    :header: "version description", "inference library (1.8.4)", "inference library (2.0.0-beta0)", "inference library (develop)"
    :widths: 3, 2, 2, 2

    "ubuntu14.04_cpu_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-cpu-avx-mkl/fluid_inference.tgz>`_", "`paddle_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/2.0.0-beta0-cpu-avx-mkl/paddle_inference.tgz>`_", "`paddle_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-mkl/paddle_inference.tgz>`_"
    "ubuntu14.04_cpu_avx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-cpu-avx-openblas/fluid_inference.tgz>`_", ,"`paddle_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-openblas/paddle_inference.tgz>`_"
    "ubuntu14.04_cpu_noavx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-cpu-noavx-openblas/fluid_inference.tgz>`_", ,"`paddle_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-noavx-openblas/paddle_inference.tgz>`_"
    "ubuntu14.04_cuda9.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`paddle_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/2.0.0-beta0-gpu-cuda9-cudnn7-avx-mkl/paddle_inference.tgz>`_", "`paddle_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda9-cudnn7-avx-mkl/paddle_inference.tgz>`_"
    "ubuntu14.04_cuda10.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`paddle_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/2.0.0-beta0-gpu-cuda10-cudnn7-avx-mkl/paddle_inference.tgz>`_", "`paddle_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda10-cudnn7-avx-mkl/paddle_inference.tgz>`_"
    "ubuntu14.04_cuda10.1_cudnn7.6_avx_mkl_trt6", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-gpu-cuda10.1-cudnn7.6-avx-mkl-trt6%2Ffluid_inference.tgz>`_", "`paddle_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/2.0.0-beta0-gpu-cuda10.1-cudnn7.6-avx-mkl-trt6%2Fpaddle_inference.tgz>`_",
    "nv-jetson-cuda10-cudnn7.5-trt5", "`fluid_inference.tar.gz <https://paddle-inference-lib.bj.bcebos.com/1.7.1-nv-jetson-cuda10-cudnn7.5-trt5/fluid_inference.tar.gz>`_", "`paddle_inference.tar.gz <https://paddle-inference-lib.bj.bcebos.com/2.0.0-beta0-nv-jetson-cuda10-cudnn7.5-trt5/paddle_inference.tgz>`_",
Build from Source
--------------------

Users can also build the C++ inference library from the PaddlePaddle core code by setting the following build options at compile time:

============================ ================= ==================
Option                       Value             Description
============================ ================= ==================
CMAKE_BUILD_TYPE             Release           build type; set to Release when only the inference library is needed
FLUID_INFERENCE_INSTALL_DIR  install path      installation path of the inference library
WITH_PYTHON                  OFF (recommended) whether to build the Python inference library and whl package
ON_INFER                     ON (recommended)  used for inference; must be set to ON
WITH_GPU                     ON/OFF            build the inference library with GPU support
WITH_MKL                     ON/OFF            build the inference library with MKL support
WITH_MKLDNN                  ON/OFF            build the inference library with MKLDNN support
WITH_XBYAK                   ON                build with XBYAK; must be set to OFF when building on Jetson hardware
WITH_NV_JETSON               OFF               set to ON when building on NV Jetson hardware
============================ ================= ==================

It is recommended to use the recommended values above to avoid linking unnecessary libraries. Set the other optional options as needed.

First, pull the latest code from GitHub:

.. code-block:: bash

    git clone https://github.com/paddlepaddle/Paddle
    cd Paddle
    # It is recommended to use git checkout to switch to a stable Paddle release, e.g.:
    git checkout v1.8.4

**note**: On a multi-GPU machine it is recommended to install NCCL; on a single-GPU machine you can skip this step by explicitly setting WITH_NCCL=OFF at build time. Note that if WITH_NCCL=ON and NCCL is not installed, the build will fail.

.. code-block:: bash

    git clone https://github.com/NVIDIA/nccl.git
    cd nccl
    make -j4
    make install
**Build the server-side inference library from source**

The following snippet configures the build options and runs the build (replace PADDLE_ROOT with the installation path for the inference library, and adjust WITH_NCCL to your environment):

.. code-block:: bash

    PADDLE_ROOT=/path/of/paddle
    cd Paddle
    mkdir build
    cd build
    cmake -DFLUID_INFERENCE_INSTALL_DIR=$PADDLE_ROOT \
          -DCMAKE_BUILD_TYPE=Release \
          -DWITH_PYTHON=OFF \
          -DWITH_MKL=OFF \
          -DWITH_GPU=OFF \
          -DON_INFER=ON \
          -DWITH_NCCL=OFF \
          ..
    make
    make inference_lib_dist
**Build the inference library for NVIDIA Jetson embedded hardware from source**

NVIDIA Jetson is NVIDIA's embedded AI platform. Paddle Inference supports building the inference library on NVIDIA Jetson. The steps are as follows:

1. Prepare the environment

Turn on hardware performance mode:

.. code-block:: bash

    sudo nvpmodel -m 0 && sudo jetson_clocks

If the hardware is a Nano, increase the swap space:

.. code-block:: bash

    # Increase the available DDR space. A Xavier has 16G of memory by default, which is enough; run the following if you want to try a Nano.
    sudo fallocate -l 5G /var/swapfile
    sudo chmod 600 /var/swapfile
    sudo mkswap /var/swapfile
    sudo swapon /var/swapfile
    sudo bash -c 'echo "/var/swapfile swap swap defaults 0 0" >> /etc/fstab'
2. Build the Paddle Inference library

.. code-block:: bash

    cd Paddle
    mkdir build
    cd build
    cmake .. \
          -DWITH_CONTRIB=OFF \
          -DWITH_MKL=OFF \
          -DWITH_MKLDNN=OFF \
          -DWITH_TESTING=OFF \
          -DCMAKE_BUILD_TYPE=Release \
          -DON_INFER=ON \
          -DWITH_PYTHON=OFF \
          -DWITH_XBYAK=OFF \
          -DWITH_NV_JETSON=ON
    make -j4
    # generate the inference library
    make inference_lib_dist -j4

3. Run the sample

Please refer to the official sample: https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/performance_improving/inference_improving/paddle_tensorrt_infer.html#id2
**FAQ**

1. Error:

.. code-block:: bash

    ERROR: ../aarch64-linux-gpn/crtn.o: Too many open files.

Increase the maximum number of files the system may open at the same time to 2048:

.. code-block:: bash

    ulimit -n 2048

2. The build hangs

This is usually caused by slow downloads of third-party libraries. Wait patiently, or kill the build process and start it again.

3. TensorRT reports that IPluginFactory or IGpuAllocator lacks a virtual destructor

After downloading and installing TensorRT, add virtual destructors to class IPluginFactory and class IGpuAllocator in NvInfer.h:

.. code-block:: cpp

    virtual ~IPluginFactory() {};
    virtual ~IGpuAllocator() {};
After a successful build, everything needed to use the C++ inference library, including (1) the compiled PaddlePaddle inference library and headers, (2) third-party link libraries and headers, and (3) version and build-option information,
is placed in the PADDLE_ROOT directory. The directory structure is:

.. code-block:: text

    PaddleRoot/
    ├── CMakeCache.txt
    ├── paddle
    │   ├── include
    │   │   ├── paddle_anakin_config.h
    │   │   ├── paddle_analysis_config.h
    │   │   ├── paddle_api.h
    │   │   ├── paddle_inference_api.h
    │   │   ├── paddle_mkldnn_quantizer_config.h
    │   │   └── paddle_pass_builder.h
    │   └── lib
    │       ├── libpaddle_fluid.a
    │       └── libpaddle_fluid.so
    ├── third_party
    │   └── install
    │       ├── gflags
    │       ├── glog
    │       ├── mkldnn
    │       ├── mklml
    │       └── protobuf
    └── version.txt
version.txt records the version information of the inference library, including the Git commit ID, whether OpenBlas or MKL is used as the math library, and the CUDA/CUDNN versions, e.g.:

.. code-block:: text

    GIT COMMIT ID: 0231f58e592ad9f673ac1832d8c495c8ed65d24f
    WITH_MKL: ON
    WITH_MKLDNN: ON
    WITH_GPU: ON
    CUDA version: 10.1
    CUDNN version: v7
.. _install_or_build_cpp_inference_lib_en:

Install and Compile C++ Inference Library on Linux
===================================================

Direct Download and Installation
---------------------------------

.. csv-table:: c++ inference library list
    :header: "version description", "inference library(1.8.4 version)", "inference library(2.0.0-beta0 version)", "inference library(develop version)"
    :widths: 3, 2, 2, 2

    "ubuntu14.04_cpu_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-cpu-avx-mkl/fluid_inference.tgz>`_", "`paddle_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/2.0.0-beta0-cpu-avx-mkl/paddle_inference.tgz>`_", "`paddle_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-mkl/paddle_inference.tgz>`_"
    "ubuntu14.04_cpu_avx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-cpu-avx-openblas/fluid_inference.tgz>`_", ,"`paddle_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-avx-openblas/paddle_inference.tgz>`_"
    "ubuntu14.04_cpu_noavx_openblas", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-cpu-noavx-openblas/fluid_inference.tgz>`_", ,"`paddle_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-cpu-noavx-openblas/paddle_inference.tgz>`_"
    "ubuntu14.04_cuda9.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-gpu-cuda9-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`paddle_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/2.0.0-beta0-gpu-cuda9-cudnn7-avx-mkl/paddle_inference.tgz>`_", "`paddle_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda9-cudnn7-avx-mkl/paddle_inference.tgz>`_"
    "ubuntu14.04_cuda10.0_cudnn7_avx_mkl", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-gpu-cuda10-cudnn7-avx-mkl/fluid_inference.tgz>`_", "`paddle_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/2.0.0-beta0-gpu-cuda10-cudnn7-avx-mkl/paddle_inference.tgz>`_", "`paddle_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/latest-gpu-cuda10-cudnn7-avx-mkl/paddle_inference.tgz>`_"
    "ubuntu14.04_cuda10.1_cudnn7.6_avx_mkl_trt6", "`fluid_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/1.8.4-gpu-cuda10.1-cudnn7.6-avx-mkl-trt6%2Ffluid_inference.tgz>`_", "`paddle_inference.tgz <https://paddle-inference-lib.bj.bcebos.com/2.0.0-beta0-gpu-cuda10.1-cudnn7.6-avx-mkl-trt6%2Fpaddle_inference.tgz>`_",
    "nv-jetson-cuda10-cudnn7.5-trt5", "`fluid_inference.tar.gz <https://paddle-inference-lib.bj.bcebos.com/1.7.1-nv-jetson-cuda10-cudnn7.5-trt5/fluid_inference.tar.gz>`_", "`paddle_inference.tar.gz <https://paddle-inference-lib.bj.bcebos.com/2.0.0-beta0-nv-jetson-cuda10-cudnn7.5-trt5/paddle_inference.tgz>`_",
Build from Source Code
-----------------------

Users can also compile the C++ inference library from the PaddlePaddle core code by specifying the following compile options at compile time:

============================ ================= ==================
Option                       Value             Description
============================ ================= ==================
CMAKE_BUILD_TYPE             Release           cmake build type; set to Release if debug messages are not needed
FLUID_INFERENCE_INSTALL_DIR  path              install path of the inference libs
WITH_PYTHON                  OFF (recommended) build the Python libs and whl package
ON_INFER                     ON (recommended)  build with inference settings
WITH_GPU                     ON/OFF            build the inference libs with GPU support
WITH_MKL                     ON/OFF            build the inference libs supporting MKL
WITH_MKLDNN                  ON/OFF            build the inference libs supporting MKLDNN
WITH_XBYAK                   ON                build with XBYAK; must be OFF when building on NV Jetson platforms
WITH_NV_JETSON               OFF               build the inference libs on NV Jetson platforms
============================ ================= ==================

It is recommended to configure the options according to the recommended values, to avoid linking unnecessary libraries. Other options can be set as necessary.

First, pull the latest code from GitHub:

.. code-block:: bash

    git clone https://github.com/paddlepaddle/Paddle
    cd Paddle
    # Use git checkout to switch to a stable version such as v1.8.4
    git checkout v1.8.4

**note**: If your environment is a multi-card machine, it is recommended to install NCCL; otherwise, you can skip this step by specifying WITH_NCCL=OFF during compilation. Note that if WITH_NCCL=ON and NCCL is not installed, the compiler will report an error.

.. code-block:: bash

    git clone https://github.com/NVIDIA/nccl.git
    cd nccl
    make -j4
    make install
**Build inference libs on a server**

The following snippet sets the configuration and runs the build (PADDLE_ROOT should be set to the actual install path of the inference libs; WITH_NCCL should be modified according to the actual environment):

.. code-block:: bash

    PADDLE_ROOT=/path/of/capi
    git clone https://github.com/PaddlePaddle/Paddle.git
    cd Paddle
    mkdir build
    cd build
    cmake -DFLUID_INFERENCE_INSTALL_DIR=$PADDLE_ROOT \
          -DCMAKE_BUILD_TYPE=Release \
          -DWITH_PYTHON=OFF \
          -DWITH_MKL=OFF \
          -DWITH_GPU=OFF \
          -DON_INFER=ON \
          -DWITH_NCCL=OFF \
          ..
    make
    make inference_lib_dist
**Build inference libs on NVIDIA Jetson platforms**

NVIDIA Jetson is an embedded AI computing platform introduced by NVIDIA. Paddle Inference supports building inference libs on NVIDIA Jetson platforms. The steps are as follows.

1. Prepare the environment

Turn on hardware performance mode:

.. code-block:: bash

    sudo nvpmodel -m 0 && sudo jetson_clocks

If building on Nano hardware, increase the swap memory:

.. code-block:: bash

    # Increase the available DDR space. The default memory is 16G, which is enough for a Xavier; the following steps are for Nano hardware.
    sudo fallocate -l 5G /var/swapfile
    sudo chmod 600 /var/swapfile
    sudo mkswap /var/swapfile
    sudo swapon /var/swapfile
    sudo bash -c 'echo "/var/swapfile swap swap defaults 0 0" >> /etc/fstab'

2. Build the Paddle inference libs

.. code-block:: bash

    cd Paddle
    mkdir build
    cd build
    cmake .. \
          -DWITH_CONTRIB=OFF \
          -DWITH_MKL=OFF \
          -DWITH_MKLDNN=OFF \
          -DWITH_TESTING=OFF \
          -DCMAKE_BUILD_TYPE=Release \
          -DON_INFER=ON \
          -DWITH_PYTHON=OFF \
          -DWITH_XBYAK=OFF \
          -DWITH_NV_JETSON=ON
    make -j4
    # Generate the inference libs
    make inference_lib_dist -j4

3. Test with samples

Please refer to the samples at https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/performance_improving/inference_improving/paddle_tensorrt_infer.html#id2
**FAQ**

1. Error:

.. code-block:: bash

    ERROR: ../aarch64-linux-gpn/crtn.o: Too many open files.

Fix this by increasing the number of files the system can open at the same time to 2048:

.. code-block:: bash

    ulimit -n 2048

2. The building process hangs.

It might be downloading third-party libs. Wait, or kill the building process and start again.

3. Virtual destructors are missing for IPluginFactory or IGpuAllocator when using TensorRT.

After downloading and installing TensorRT, add virtual destructors for IPluginFactory and IGpuAllocator in NvInfer.h:

.. code-block:: cpp

    virtual ~IPluginFactory() {};
    virtual ~IGpuAllocator() {};
After successful compilation, the dependencies required by the C++ inference library (including: (1) the compiled PaddlePaddle inference library and header files; (2) third-party link libraries and header files; (3) version information and compilation option information) will be stored in the PADDLE_ROOT directory.

The directory structure is:

.. code-block:: text

    PaddleRoot/
    ├── CMakeCache.txt
    ├── paddle
    │   ├── include
    │   │   ├── paddle_anakin_config.h
    │   │   ├── paddle_analysis_config.h
    │   │   ├── paddle_api.h
    │   │   ├── paddle_inference_api.h
    │   │   ├── paddle_mkldnn_quantizer_config.h
    │   │   └── paddle_pass_builder.h
    │   └── lib
    │       ├── libpaddle_fluid.a
    │       └── libpaddle_fluid.so
    ├── third_party
    │   ├── boost
    │   │   └── boost
    │   ├── eigen3
    │   │   ├── Eigen
    │   │   └── unsupported
    │   └── install
    │       ├── gflags
    │       ├── glog
    │       ├── mkldnn
    │       ├── mklml
    │       ├── protobuf
    │       ├── snappy
    │       ├── snappystream
    │       ├── xxhash
    │       └── zlib
    └── version.txt
The version information of the inference library is recorded in version.txt, including the Git commit ID, whether OpenBlas or MKL is used as the math library, and the CUDA/CUDNN versions. For example:

.. code-block:: text

    GIT COMMIT ID: cc9028b90ef50a825a722c55e5fda4b7cd26b0d6
    WITH_MKL: ON
    WITH_MKLDNN: ON
    WITH_GPU: ON
    CUDA version: 8.0
    CUDNN version: v7
# Introduction to the C Inference API

Fluid provides a highly optimized [C++ inference library](./native_infer.html). For convenience, we also provide a C API that wraps the C++ library. To use the C API, first `#include paddle_c_api.h`. The header `paddle_c_api.h` can be found in the Paddle repository at `paddle/fluid/inference/capi/paddle_c_api.h`, or, after building Paddle, under `Paddle/build/` at `build/fluid_inference_c_install_dir/paddle/include/`. In addition, your project must link against the corresponding library, `libpaddle_fluid_c.so`. Detailed usage instructions follow.

Note that, unlike the C++ API, the C API sets no default parameters, to make wrapping it from other languages easier; all parameters must be provided explicitly by the caller.

## C inference data structures

Using the C inference API differs somewhat from the C++ API. The C API mainly involves `PD_AnalysisConfig`, `PD_DataType`, `PD_Predictor`, `PD_Buffer`, and `PD_ZeroCopyTensor`. The following sections describe these data structures and how to use them, with examples.
### PD_AnalysisConfig

`PD_AnalysisConfig` is the configuration used to create the inference engine. It provides options for setting the model path, choosing the device the engine runs on, and various optimizations of the inference pipeline. Its main functions are:

* `PD_AnalysisConfig* PD_NewAnalysisConfig()`: creates a new `PD_AnalysisConfig` pointer.
* `void PD_DeleteAnalysisConfig(PD_AnalysisConfig* config)`: deletes a `PD_AnalysisConfig` pointer.
* `void PD_SetModel(PD_AnalysisConfig* config, const char* model_dir, const char* params_path)`: sets the model path. The inputs are the `PD_AnalysisConfig`, `model_dir`, and `params_path`, where `model_dir` is the path where the model is stored (usually without a file name) and `params_path` is optional. <strong>Note</strong>:
    - If `params_path` is not given, i.e. `params_path` is `NULL`, the parameters are assumed to be stored in the same path as `model_dir`, with the model and parameter files using the default file names; in this case there may be several parameter files. Pass as `model_dir` <strong>the directory where the model and parameters are saved</strong>, without a file name, and explicitly set `params_path` to `NULL`.
    - If `params_path` is given, append the model file name to the `model_dir` path, i.e. pass the <strong>path of the model file</strong> as `model_dir` and the <strong>path of the parameter file</strong> as `params_path`, both including the file names.
* `const char* PD_ModelDir(const PD_AnalysisConfig* config)`: returns the model directory if `params_path` was not given to `PD_SetModel()`.
* `const char* PD_ProgFile(const PD_AnalysisConfig* config)`: returns the model file path if `params_path` was given to `PD_SetModel()`.
* `const char* PD_ParamsFile(const PD_AnalysisConfig* config)`: returns the parameter file path if `params_path` was given to `PD_SetModel()`.
* `void PD_SwitchSpecifyInputNames(PD_AnalysisConfig* config, bool x)`: if set to `true`, the model distinguishes inputs by name when reading them; otherwise by order. Recommended to set to `true` when using `PD_ZeroCopyTensor` with multiple inputs.
* `void PD_SwitchUseFeedFetchOps(PD_AnalysisConfig* config, bool x)`: sets whether to use the `feed` and `fetch` ops. Must be set to `false` when using `PD_ZeroCopyTensor`.
* `void PD_EnableUseGpu(PD_AnalysisConfig* config, uint64_t memory_pool_init_size_mb, int device_id)`: enables the GPU and sets the initial GPU memory pool (in MB) and the device ID.
* `void PD_DisableGpu(PD_AnalysisConfig* config)`: disables the GPU.
* `int PD_GpuDeviceId(const PD_AnalysisConfig* config)`: returns the ID of the GPU device in use.
* `void PD_SwitchIrOptim(PD_AnalysisConfig* config, bool x)`: sets whether IR optimization is enabled for inference.
* `void PD_EnableTensorRtEngine(PD_AnalysisConfig* config, int workspace_size, int max_batch_size, int min_subgraph_size, Precision precision, bool use_static, bool use_calib_mode)`: enables TensorRT. For the meaning of the parameters, see [Use the Paddle-TensorRT library for inference](../../performance_improving/inference_improving/paddle_tensorrt_infer.html).
* `void PD_EnableMKLDNN(PD_AnalysisConfig* config)`: enables MKLDNN.
#### Example

First, create a `PD_AnalysisConfig` pointer:
``` C
PD_AnalysisConfig* config = PD_NewAnalysisConfig();
```
As described above, there are two ways to set the model and parameter paths:

* When the model directory contains one model file with the default name and several parameter files, pass the directory path. The default model file name is `__model__`; explicitly set `params_path` to `NULL` and do not specify file names.
``` C
const char* model_dir = "./model/";
PD_SetModel(config, model_dir, NULL);
```
* When the model directory contains exactly one model file and one parameter file, pass both file paths, including the file names.
``` C
const char* model_path = "./model/model";
const char* params_path = "./params/params";
PD_SetModel(config, model_path, params_path);
```
Other inference engine configuration options are shown below:
``` C
PD_EnableUseGpu(config, 100, 0); // initialize 100 MB of GPU memory, using GPU id 0
PD_GpuDeviceId(config);          // return the GPU id in use
PD_DisableGpu(config);           // disable the GPU
PD_SwitchIrOptim(config, true);  // enable IR optimization
PD_EnableMKLDNN(config);         // enable MKLDNN
PD_SwitchSpecifyInputNames(config, true);
PD_SwitchUseFeedFetchOps(config, false);
```
### PD_ZeroCopyTensor

`PD_ZeroCopyTensor` is the data structure used to feed data into inference. Its members are:

* `data - (PD_Buffer)`: the values of the input data.
* `shape - (PD_Buffer)`: the shape of the input data.
* `lod - (PD_Buffer)`: the `lod` of the data; currently only level-1 `lod` is supported.
* `dtype - (PD_DataType)`: the data type of the input, expressed with the `PD_DataType` enum.
* `name - (char*)`: the name of the input.

The functions for working with `PD_ZeroCopyTensor` are:

* `PD_ZeroCopyTensor* PD_NewZeroCopyTensor()`: creates a new `PD_ZeroCopyTensor` pointer.
* `void PD_DeleteZeroCopyTensor(PD_ZeroCopyTensor*)`: deletes a `PD_ZeroCopyTensor` pointer.
* `void PD_InitZeroCopyTensor(PD_ZeroCopyTensor*)`: default-initializes a `PD_ZeroCopyTensor` pointer and allocates its memory.
* `void PD_DestroyZeroCopyTensor(PD_ZeroCopyTensor*)`: destroys the `PD_Buffer` members (`data`, `shape`, `lod`) of a `PD_ZeroCopyTensor` pointer.

### PD_DataType

`PD_DataType` is an enum for setting the data type of the `PD_ZeroCopyTensor` holding the user's data. Its members are:

* `PD_FLOAT32`: 32-bit float
* `PD_INT32`: 32-bit int
* `PD_INT64`: 64-bit int
* `PD_UINT8`: 8-bit unsigned int
#### Example

First create a `PD_ZeroCopyTensor`:
``` C
PD_ZeroCopyTensor input;
PD_InitZeroCopyTensor(&input);
```
Set its data type like this:
``` C
input.dtype = PD_FLOAT32;
```
### PD_Buffer

`PD_Buffer` carries the `data`, `shape`, and `lod` of a `PD_ZeroCopyTensor`. Its members are:

* `data`: the input data, of type `void*`, holding the start address of the data.
* `length`: the actual <strong>length in bytes</strong> of the input data.
* `capacity`: the amount of memory allocated for the data; always greater than or equal to `length`.

#### Example
``` C
PD_ZeroCopyTensor input;
PD_InitZeroCopyTensor(&input);
// set the input name
input.name = "data";
// set the input data size
input.data.capacity = sizeof(float) * 1 * 3 * 300 * 300;
input.data.length = input.data.capacity;
input.data.data = malloc(input.data.capacity);
// set the input shape
int shape[] = {1, 3, 300, 300};
input.shape.data = (int *)shape;
input.shape.capacity = sizeof(shape);
input.shape.length = sizeof(shape);
// set the input data type
input.dtype = PD_FLOAT32;
```
### PD_Predictor

`PD_Predictor` is a high-performance inference engine. By analyzing the computation graph, it performs a series of optimizations on the graph (such as op fusion, memory/GPU-memory optimization, and support for low-level acceleration libraries like MKLDNN and TensorRT). Its main functions are:

* `PD_Predictor* PD_NewPredictor(const PD_AnalysisConfig* config)`: creates a new `PD_Predictor` pointer.
* `void PD_DeletePredictor(PD_Predictor* predictor)`: deletes a `PD_Predictor` pointer.
* `int PD_GetInputNum(const PD_Predictor* predictor)`: returns the number of model inputs.
* `int PD_GetOutputNum(const PD_Predictor* predictor)`: returns the number of model outputs.
* `const char* PD_GetInputName(const PD_Predictor* predictor, int n)`: returns the name of the model's `n`-th input.
* `const char* PD_GetOutputName(const PD_Predictor* predictor, int n)`: returns the name of the model's `n`-th output.
* `void PD_SetZeroCopyInput(PD_Predictor* predictor, const PD_ZeroCopyTensor* tensor)`: sets the values, shape, lod, etc. of a model input using a `PD_ZeroCopyTensor`. Only level-1 lod is currently supported.
* `void PD_GetZeroCopyOutput(PD_Predictor* predictor, PD_ZeroCopyTensor* tensor)`: retrieves the values, shape, lod, etc. of a model output using a `PD_ZeroCopyTensor`. Only level-1 lod is currently supported.
* `void PD_ZeroCopyRun(PD_Predictor* predictor)`: runs the inference engine, computing the outputs from the inputs.
#### Example

As described above, once the `PD_AnalysisConfig` and the input `PD_ZeroCopyTensor` are set up, only a few lines of code are needed to obtain the model output.

First, configure the `PD_AnalysisConfig` with the functions described above:
``` C
PD_AnalysisConfig* config = PD_NewAnalysisConfig();
const char* model_dir = "./model/";
PD_SetModel(config, model_dir, NULL);
PD_DisableGpu(config);
PD_SwitchSpecifyInputNames(config, true); // recommended when using PD_ZeroCopyTensor with multiple inputs
PD_SwitchUseFeedFetchOps(config, false);  // must be false when using PD_ZeroCopyTensor
```
Next, set up the input as described above (note that the input name is queried from the predictor, which is created from the config in the snippet after this one):
``` C
PD_ZeroCopyTensor input;
PD_InitZeroCopyTensor(&input);
// set the input name
input.name = (char *)(PD_GetInputName(predictor, 0));
// set the input data size
input.data.capacity = sizeof(float) * 1 * 3 * 300 * 300;
input.data.length = input.data.capacity;
input.data.data = malloc(input.data.capacity);
// set the input shape
int shape[] = {1, 3, 300, 300};
input.shape.data = (int *)shape;
input.shape.capacity = sizeof(shape);
input.shape.length = sizeof(shape);
// set the input data type
input.dtype = PD_FLOAT32;
```
Finally, run the inference engine and complete the computation:
``` C
PD_Predictor *predictor = PD_NewPredictor(config);
int input_num = PD_GetInputNum(predictor);
printf("Input num: %d\n", input_num);
int output_num = PD_GetOutputNum(predictor);
printf("Output num: %d\n", output_num);
PD_SetZeroCopyInput(predictor, &input); // a single input here; with multiple inputs, pass an array
PD_ZeroCopyRun(predictor); // run the inference engine
PD_ZeroCopyTensor output;
PD_InitZeroCopyTensor(&output);
output.name = (char *)(PD_GetOutputName(predictor, 0));
PD_GetZeroCopyOutput(predictor, &output);
```
The returned values and related information can then be read from the `PD_ZeroCopyTensor` data structure described above.
## Complete example

Below is a complete example of running inference with the Fluid C API, using the resnet50 model.

Download the [resnet50 model](http://paddle-inference-dist.bj.bcebos.com/resnet50_model.tar.gz) and extract it; running the following code will invoke the inference engine.
``` C
#include "paddle_c_api.h"
#include <stdio.h>   // for printf
#include <memory.h>
#include <malloc.h>

/*
 * The main procedure to run a predictor with the C API:
 * 1. Create the config that controls how the inference runs.
 * 2. Prepare the input PD_ZeroCopyTensor for the inference.
 * 3. Create the PD_Predictor.
 * 4. Call PD_ZeroCopyRun() to start the inference.
 * 5. Obtain the output.
 * 6. Read the output data according to the size of its data.
 */
int main() {
  // configure PD_AnalysisConfig
  PD_AnalysisConfig* config = PD_NewAnalysisConfig();
  PD_DisableGpu(config);
  const char* model_path = "./model/model";
  const char* params_path = "./model/params";
  PD_SetModel(config, model_path, params_path);
  PD_SwitchSpecifyInputNames(config, true);
  PD_SwitchUseFeedFetchOps(config, false);

  // create a PD_Predictor pointer
  PD_Predictor *predictor = PD_NewPredictor(config);

  // get the numbers of inputs and outputs
  int input_num = PD_GetInputNum(predictor);
  printf("Input num: %d\n", input_num);
  int output_num = PD_GetOutputNum(predictor);
  printf("Output num: %d\n", output_num);

  // set up the input data structure
  PD_ZeroCopyTensor input;
  PD_InitZeroCopyTensor(&input);
  // set the input name
  input.name = (char *)(PD_GetInputName(predictor, 0));
  // set the input data size
  input.data.capacity = sizeof(float) * 1 * 3 * 318 * 318;
  input.data.length = input.data.capacity;
  input.data.data = malloc(input.data.capacity);
  memset(input.data.data, 0, (sizeof(float) * 3 * 318 * 318));
  // set the input shape
  int shape[] = {1, 3, 318, 318};
  input.shape.data = (int *)shape;
  input.shape.capacity = sizeof(shape);
  input.shape.length = sizeof(shape);
  // set the input data type
  input.dtype = PD_FLOAT32;

  PD_SetZeroCopyInput(predictor, &input);

  // run the inference engine
  PD_ZeroCopyRun(predictor);

  // fetch the inference output
  PD_ZeroCopyTensor output;
  PD_InitZeroCopyTensor(&output);
  output.name = (char *)(PD_GetOutputName(predictor, 0));
  // once the output is fetched, data, shape, etc. can be read from the structure
  PD_GetZeroCopyOutput(predictor, &output);
  float* result = (float *)(output.data.data);
  int result_length = output.data.length / sizeof(float);
  return 0;
}
```
To build this code, copy paddle_c_api.h to a location where the compiler can find it, and add the directory containing libpaddle_fluid_c.so to the library path environment variable.
The program can then be compiled with gcc:
``` shell
gcc ${SOURCE_NAME} \
    -lpaddle_fluid_c
```
######################
Server-side Deployment
######################

PaddlePaddle provides C++, C, and Python APIs to support deploying models in production.

.. toctree::
    :titlesonly:

    build_and_install_lib_cn.rst
    windows_cpp_inference.md
    native_infer.md
    c_infer_cn.md
    python_infer_cn.md
######################
Server-side Deployment
######################

PaddlePaddle provides various methods to support deployment and release of trained models.

.. toctree::
    :titlesonly:

    build_and_install_lib_en.rst
    windows_cpp_inference_en.md
    native_infer_en.md
    paddle_gpu_benchmark_en.md
# Introduction to C++ Inference API
To make the deployment of inference models more convenient, a set of high-level APIs is provided in Fluid to hide the diverse low-level optimization processes.
Details are as follows:

## <a name="Use AnalysisPredictor to perform high-performance inference"> Use AnalysisPredictor to perform high-performance inference</a>

Paddle Fluid uses AnalysisPredictor to perform inference. AnalysisPredictor is a high-performance inference engine. Through analysis of the computation graph, the engine completes a series of optimizations on it (such as the fusion of OPs, the optimization of memory / graphic memory, and the support of MKLDNN, TensorRT, and other underlying acceleration libraries), which can greatly improve inference performance.

To show the complete inference process, the following is a complete example of using AnalysisPredictor. The concepts and configurations involved are detailed in the following sections.
#### AnalysisPredictor sample
``` c++
#include "paddle_inference_api.h"

namespace paddle {

void CreateConfig(AnalysisConfig* config, const std::string& model_dirname) {
  // load model from disk
  config->SetModel(model_dirname + "/model",
                   model_dirname + "/params");
  // config->SetModel(model_dirname);
  // use SetModelBuffer if load model from memory
  // config->SetModelBuffer(prog_buffer, prog_size, params_buffer, params_size);
  config->EnableUseGpu(100 /*init graphic memory by 100MB*/, 0 /*set GPUID to 0*/);
  /* for cpu
  config->DisableGpu();
  config->EnableMKLDNN();   // enable MKLDNN
  config->SetCpuMathLibraryNumThreads(10);
  */

  config->SwitchUseFeedFetchOps(false);
  // set to true if there are multiple inputs
  config->SwitchSpecifyInputNames(true);
  config->SwitchIrDebug(true);  // If the visual debugging option is enabled, a dot file will be generated after each graph optimization process
  // config->SwitchIrOptim(false);  // The default is true. Turn off all optimizations if set to false
  // config->EnableMemoryOptim();   // Enable memory / graphic memory reuse
}

void RunAnalysis(int batch_size, std::string model_dirname) {
  // 1. create AnalysisConfig
  AnalysisConfig config;
  CreateConfig(&config, model_dirname);

  // 2. create predictor based on config, and prepare input data
  auto predictor = CreatePaddlePredictor(config);
  int channels = 3;
  int height = 224;
  int width = 224;
  float input[batch_size * channels * height * width] = {0};

  // 3. build inputs
  // uses ZeroCopy API here to avoid extra copying from CPU, improving performance
  auto input_names = predictor->GetInputNames();
  auto input_t = predictor->GetInputTensor(input_names[0]);
  input_t->Reshape({batch_size, channels, height, width});
  input_t->copy_from_cpu(input);

  // 4. run inference
  CHECK(predictor->ZeroCopyRun());

  // 5. get outputs
  std::vector<float> out_data;
  auto output_names = predictor->GetOutputNames();
  auto output_t = predictor->GetOutputTensor(output_names[0]);
  std::vector<int> output_shape = output_t->shape();
  int out_num = std::accumulate(output_shape.begin(), output_shape.end(), 1, std::multiplies<int>());

  out_data.resize(out_num);
  output_t->copy_to_cpu(out_data.data());
}
}  // namespace paddle

int main() {
  // the model can be downloaded from http://paddle-inference-dist.cdn.bcebos.com/tensorrt_test/mobilenet.tar.gz
  paddle::RunAnalysis(1, "./mobilenet");
  return 0;
}
```
## <a name="Use AnalysisConfig to manage inference configurations"> Use AnalysisConfig to manage inference configurations</a>
AnalysisConfig manages the inference configuration of AnalysisPredictor, providing model path setting, inference device selection, and a variety of options to optimize the inference process. It is configured as follows:
#### General optimizing configuration
``` c++
config->SwitchIrOptim(true); // Enable analysis and optimization of calculation graph,including OP fusion, etc
config->EnableMemoryOptim(); // Enable memory / graphic memory reuse
```
**Note:** Using ZeroCopyTensor requires the following setting:
``` c++
config->SwitchUseFeedFetchOps(false); // disable feed and fetch OP
```
#### Set model and parameter paths

When loading the model from disk, there are two ways to set the model and parameter paths on AnalysisConfig, depending on how the model and parameter files are stored:

* Uncombined form: when there is one model file and multiple parameter files under the model folder `model_dir`, pass the path of the model folder. The default name of the model file is `__model__`.
``` c++
config->SetModel("./model_dir");
```
* Combined form: when there is only one model file `model` and one parameter file `params` under the model folder `model_dir`, pass the model file and parameter file paths.
``` c++
config->SetModel("./model_dir/model", "./model_dir/params");
```
At compile time, link your program against `libpaddle_fluid.a` or `libpaddle_fluid.so`.
#### Configure CPU inference
``` c++
config->DisableGpu(); // disable GPU
config->EnableMKLDNN(); // enable MKLDNN, accelerating CPU inference
config->SetCpuMathLibraryNumThreads(10); // set number of threads of CPU Math libs, accelerating CPU inference if CPU cores are adequate
```
#### Configure GPU inference
``` c++
config->EnableUseGpu(100, 0); // initialize 100M graphic memory, using GPU ID 0
config->GpuDeviceId(); // Returns the GPU ID being used
// Turn on TRT to improve GPU performance; this requires a library built with TensorRT
config->EnableTensorRtEngine(1 << 20 /*workspace_size*/,
                             batch_size /*max_batch_size*/,
                             3 /*min_subgraph_size*/,
                             AnalysisConfig::Precision::kFloat32 /*precision*/,
                             false /*use_static*/,
                             false /*use_calib_mode*/);
```
## <a name="Use ZeroCopyTensor to manage I/O"> Use ZeroCopyTensor to manage I/O</a>
ZeroCopyTensor is the input/output data structure of AnalysisPredictor. Using ZeroCopyTensor avoids redundant data copies when preparing inputs and fetching outputs, improving inference performance.

**Note:** When using ZeroCopyTensor, be sure to set `config->SwitchUseFeedFetchOps(false);`.
``` c++
// get input/output tensor
auto input_names = predictor->GetInputNames();
auto input_t = predictor->GetInputTensor(input_names[0]);
auto output_names = predictor->GetOutputNames();
auto output_t = predictor->GetOutputTensor(output_names[0]);
// reshape tensor
input_t->Reshape({batch_size, channels, height, width});
// Through the copy_from_cpu interface, the CPU data is prepared; through the copy_to_cpu interface, the output data is copied to the CPU
input_t->copy_from_cpu<float>(input_data /*data pointer*/);
output_t->copy_to_cpu(out_data /*data pointer*/);
// set LOD
std::vector<std::vector<size_t>> lod_data = {{0}, {0}};
input_t->SetLoD(lod_data);
// get Tensor data pointer
float *input_d = input_t->mutable_data<float>(PaddlePlace::kGPU); // use PaddlePlace::kCPU when running inference on CPU
int output_size;
float *output_d = output_t->data<float>(PaddlePlace::kGPU, &output_size);
```
## <a name="C++ inference sample"> C++ inference sample</a>
1. Download or compile the C++ inference library; refer to [Install and Compile C++ Inference Library](./build_and_install_lib_en.html).
2. Download the [C++ inference sample](https://paddle-inference-dist.bj.bcebos.com/tensorrt_test/paddle_inference_sample_v1.7.tar.gz) and uncompress it, then enter the `sample/inference` directory.
The `inference` directory structure is as follows:
``` shell
inference
├── CMakeLists.txt
├── mobilenet_test.cc
├── thread_mobilenet_test.cc
├── mobilenetv1
│ ├── model
│ └── params
├── run.sh
└── run_impl.sh
```
- `mobilenet_test.cc` is the source code for single-thread inference.
- `thread_mobilenet_test.cc` is the source code for multi-thread inference.
- `mobilenetv1` is the model directory.
- `run.sh` is the script for running inference.
3. Configure the script:
Before running, configure `run.sh` as follows:
``` shell
# set whether to enable MKL, GPU or TensorRT. Enabling TensorRT requires WITH_GPU being ON
WITH_MKL=ON
WITH_GPU=OFF
USE_TENSORRT=OFF
# set path to CUDA lib dir, CUDNN lib dir, TensorRT root dir and model dir
LIB_DIR=YOUR_LIB_DIR
CUDA_LIB_DIR=YOUR_CUDA_LIB_DIR
CUDNN_LIB_DIR=YOUR_CUDNN_LIB_DIR
TENSORRT_ROOT_DIR=YOUR_TENSORRT_ROOT_DIR
MODEL_DIR=YOUR_MODEL_DIR
```
Please configure `run.sh` according to your environment.
4. Build and run the sample.
``` shell
sh run.sh
```
## <a name="Performance tuning"> Performance tuning</a>
### Tuning on CPU
1. If the CPU model allows, try to use the version with AVX and MKL.
2. You can try to use Intel's MKLDNN acceleration.
3. When the number of CPU cores available is enough, you can increase the num value in the setting `config->SetCpuMathLibraryNumThreads(num);`.
### Tuning on GPU
1. You can try to open the TensorRT subgraph acceleration engine. Through the graph analysis, Paddle can automatically fuse certain subgraphs, and call NVIDIA's TensorRT for acceleration. For details, please refer to [Use Paddle-TensorRT Library for inference](../../performance_improving/inference_improving/paddle_tensorrt_infer_en.html)
### Tuning with multi-thread
Paddle Fluid supports improving inference performance by running multiple AnalysisPredictors on different threads, in both CPU and GPU environments.
A multi-thread sample, `thread_mobilenet_test.cc`, is included in the [sample](https://paddle-inference-dist.bj.bcebos.com/tensorrt_test/paddle_inference_sample_v1.7.tar.gz) download. Change `mobilenet_test` to `thread_mobilenet_test` in `run.sh` to run the multi-threaded inference:
```
sh run.sh
```
# Performance Profiling for TensorRT Library
## Test Environment
- CPU: Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz; GPU: Tesla P4
- TensorRT 4.0, CUDA 8.0, cuDNN v7
- Test models: ResNet50, MobileNet, ResNet101, Inception V3.
## Test Targets
**PaddlePaddle, PyTorch, TensorFlow**
- In the test, PaddlePaddle adopts subgraph optimization to integrate TensorRT ([model](https://github.com/PaddlePaddle/models/tree/develop/fluid/PaddleCV/image_classification/models)).
- The native implementation is used for PyTorch. Models: [address 1](https://github.com/pytorch/vision/tree/master/torchvision/models), [address 2](https://github.com/marvis/pytorch-mobilenet).
- The TensorFlow test covers both native TF and TF-TRT. **The TF-TRT test has not met expectations, and will be completed later**. Model [address](https://github.com/tensorflow/models).
### ResNet50
|batch_size|PaddlePaddle(ms)|Pytorch(ms)|TensorFlow(ms)|
|---|---|---|---|
|1|4.64117 |16.3|10.878|
|5|6.90622| 22.9 |20.62|
|10|7.9758 |40.6|34.36|
### MobileNet
|batch_size|PaddlePaddle(ms)|Pytorch(ms)|TensorFlow(ms)|
|---|---|---|---|
|1| 1.7541 | 7.8 |2.72|
|5| 3.04666 | 7.8 |3.19|
|10|4.19478 | 14.47 |4.25|
### ResNet101
|batch_size|PaddlePaddle(ms)|Pytorch(ms)|TensorFlow(ms)|
|---|---|---|---|
|1|8.95767| 22.48 |18.78|
|5|12.9811 | 33.88 |34.84|
|10|14.1463| 61.97 |57.94|
### Inception v3
|batch_size|PaddlePaddle(ms)|Pytorch(ms)|TensorFlow(ms)|
|---|---|---|---|
|1|15.1613 | 24.2 |19.1|
|5|18.5373 | 34.8 |27.2|
|10|19.2781| 54.8 |36.7|
# Introduction to the Python Inference API

Paddle provides a highly optimized [C++ inference library](./native_infer.html); for convenience, a matching Python interface is also provided. Detailed usage instructions follow.

If you are using a Paddle version older than 2.0, refer to the [legacy API](https://www.paddlepaddle.org.cn/documentation/docs/zh/1.8/advanced_guide/inference_deployment/inference/python_infer_cn.html) documentation.

## Python inference data structures

The Python inference API is similar to the C++ one. It mainly involves `Tensor`, `DataType`, `Config`, and `Predictor`, which correspond to the C++ types of the same names.
### DataType

class paddle.inference.DataType

`DataType` defines the data type of a `Tensor`. It is determined by the dtype of the numpy array passed into the `Tensor`, and has the following members (see the sketch below):

* `INT64`: 64-bit int
* `INT32`: 32-bit int
* `FLOAT32`: 32-bit float
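A minimal sketch of how these members are reached, using only the class documented above:

``` python
from paddle.inference import DataType

# DataType members mirror the numpy dtype of the array fed to a Tensor:
# an array created with .astype("int64") yields a Tensor of type INT64
print(DataType.INT64)
print(DataType.FLOAT32)
```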
### PrecisionType

class paddle.inference.PrecisionType

`PrecisionType` defines the precision mode the `Predictor` runs in, with the following members:

* `Float32`: run in fp32 mode
* `Half`: run in fp16 mode
* `Int8`: run in int8 mode
### Tensor

class paddle.inference.Tensor

`Tensor` is the input/output data structure of the `Predictor`, obtained from the `predictor` through the input/output handles. Its main methods are:

* `copy_from_cpu`: feed the input data needed to run the model
* `copy_to_cpu`: fetch the model's output
* `lod`: get the lod information
* `set_lod`: set the lod information
* `shape`: get the shape
* `reshape`: set the shape
* `type`: get the DataType
``` python
# create the predictor
predictor = create_predictor(config)

# get the names of the inputs
input_names = predictor.get_input_names()
input_tensor = predictor.get_input_handle(input_names[0])

# set the input
fake_input = numpy.random.randn(1, 3, 318, 318).astype("float32")
input_tensor.copy_from_cpu(fake_input)

# run the predictor
predictor.run()

# get the output
output_names = predictor.get_output_names()
output_tensor = predictor.get_output_handle(output_names[0])
output_data = output_tensor.copy_to_cpu()  # numpy.ndarray
```
### Config

class paddle.inference.Config

`Config` is the configuration used to create the inference engine. It provides options for setting the model path, choosing the device to run on, and various optimizations of the inference pipeline. Its main methods are:

* `set_model`: set the model path
* `model_dir`: return the model directory
* `prog_file`: return the model file path
* `params_file`: return the parameter file path
* `enable_use_gpu`: set the initial GPU memory (in MB) and the device ID
* `disable_gpu`: disable the GPU
* `gpu_device_id`: return the GPU ID in use
* `switch_ir_optim`: toggle IR optimization (enabled by default)
* `enable_tensorrt_engine`: enable TensorRT
* `enable_mkldnn`: enable MKLDNN
* `disable_glog_info`: disable glog logging during inference
* `delete_pass`: remove a specific pass during inference
#### Example

There are two ways to set the model and parameter paths:

* When the model folder contains one model file and several parameter files, pass the folder path; the default model file name is `__model__`.
``` python
config = Config("./model")
```
* When the model folder contains exactly one model file and one parameter file, pass the model file and parameter file paths.
``` python
config = Config("./model/model", "./model/params")
```
The `set_model` method sets the model and parameter paths in the same way.

Other inference engine configuration options are shown below:
``` python
config.enable_use_gpu(100, 0) # initialize 100 MB of GPU memory, using GPU id 0
config.gpu_device_id()        # return the GPU id in use
config.disable_gpu()          # disable the GPU
config.switch_ir_optim(True)  # enable IR optimization
config.enable_tensorrt_engine(precision_mode=PrecisionType.Float32,
                              use_calib_mode=True) # enable TensorRT inference in fp32 with int8 offline calibration
config.enable_mkldnn()        # enable MKLDNN
```
### Predictor

class paddle.inference.Predictor

`Predictor` is the engine that runs inference. It is created by `paddle.inference.create_predictor(config)` and mainly provides the following methods:

* `run()`: run the inference engine and produce the inference results
* `get_input_names()`: get the names of the inputs
* `get_input_handle(input_name: str)`: get the `Tensor` for the given input name
* `get_output_names()`: get the names of the outputs
* `get_output_handle(output_name: str)`: get the `Tensor` for the given output name

#### Example
``` python
# create the inference engine after the config is set up
predictor = create_predictor(config)

# get the names of the inputs
input_names = predictor.get_input_names()
input_handle = predictor.get_input_handle(input_names[0])

# set the input
fake_input = numpy.random.randn(1, 3, 318, 318).astype("float32")
input_handle.reshape([1, 3, 318, 318])
input_handle.copy_from_cpu(fake_input)

# run the predictor
predictor.run()

# get the output
output_names = predictor.get_output_names()
output_handle = predictor.get_output_handle(output_names[0])
```
## Complete Example
Below is a complete example of running inference with the Paddle Inference Python API, using a ResNet50 model.
Download the [ResNet50 model](http://paddle-inference-dist.bj.bcebos.com/resnet50_model.tar.gz), extract it, and run the following command to invoke the inference engine:
``` bash
python resnet50_infer.py --model_file ./model/model --params_file ./model/params --batch_size 2
```
The content of `resnet50_infer.py` is:
``` python
import argparse

import numpy as np
from paddle.inference import Config
from paddle.inference import create_predictor


def main():
    args = parse_args()

    # Build the inference Config.
    config = set_config(args)

    # Create the predictor.
    predictor = create_predictor(config)

    # Look up the input names.
    input_names = predictor.get_input_names()
    input_handle = predictor.get_input_handle(input_names[0])

    # Feed a random input of the requested batch size.
    fake_input = np.random.randn(args.batch_size, 3, 318, 318).astype("float32")
    input_handle.reshape([args.batch_size, 3, 318, 318])
    input_handle.copy_from_cpu(fake_input)

    # Run the predictor.
    predictor.run()

    # Fetch the output.
    output_names = predictor.get_output_names()
    output_handle = predictor.get_output_handle(output_names[0])
    output_data = output_handle.copy_to_cpu()  # numpy.ndarray


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_file", type=str, help="model filename")
    parser.add_argument("--params_file", type=str, help="parameter filename")
    parser.add_argument("--batch_size", type=int, default=1, help="batch size")
    return parser.parse_args()


def set_config(args):
    config = Config(args.model_file, args.params_file)
    config.disable_gpu()
    config.switch_use_feed_fetch_ops(False)
    config.switch_specify_input_names(True)
    return config


if __name__ == "__main__":
    main()
```
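To run the same script on GPU, a hedged variant of `set_config` could swap `disable_gpu` for `enable_use_gpu` (this assumes a CUDA build of Paddle; the pool size and device ID mirror the earlier Config example):
``` python
from paddle.inference import Config

def set_config_gpu(args):
    # GPU variant: a 100 MB initial memory pool on device 0.
    config = Config(args.model_file, args.params_file)
    config.enable_use_gpu(100, 0)
    config.switch_use_feed_fetch_ops(False)
    config.switch_specify_input_names(True)
    return config
```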
## Supported Methods
* Tensor
* `copy_from_cpu(input: numpy.ndarray) -> None`
* `copy_to_cpu() -> numpy.ndarray`
* `reshape(input: numpy.ndarray|List[int]) -> None`
* `shape() -> List[int]`
* `set_lod(input: numpy.ndarray|List[List[int]]) -> None`
* `lod() -> List[List[int]]`
* `type() -> PaddleDType`
* Config
* `set_model(model_dir: str) -> None`
* `set_model(prog_file: str, params_file: str) -> None`
* `set_model_buffer(model: str, model_size: int, param: str, param_size: int) -> None`
* `model_dir() -> str`
* `prog_file() -> str`
* `params_file() -> str`
* `model_from_memory() -> bool`
* `set_cpu_math_library_num_threads(num: int) -> None`
* `enable_use_gpu(memory_pool_init_size_mb: int, device_id: int) -> None`
* `use_gpu() -> bool`
* `gpu_device_id() -> int`
* `switch_ir_optim(x: bool = True) -> None`
* `switch_ir_debug(x: int=True) -> None`
* `ir_optim() -> bool`
  * `enable_tensorrt_engine(workspace_size: int = 1 << 20, max_batch_size: int, min_subgraph_size: int, precision_mode: AnalysisConfig.precision, use_static: bool, use_calib_mode: bool) -> None`
* `set_trt_dynamic_shape_info(min_input_shape: Dict[str, List[int]]={}, max_input_shape: Dict[str, List[int]]={}, optim_input_shape: Dict[str, List[int]]={}, disable_trt_plugin_fp16: bool=False) -> None`
* `tensorrt_engine_enabled() -> bool`
* `enable_mkldnn() -> None`
* `enable_mkldnn_bfloat16() -> None`
* `mkldnn_enabled() -> bool`
* `set_mkldnn_cache_capacity(capacity: int=0) -> None`
* `set_mkldnn_op(ops: Set[str]) -> None`
* `set_optim_cache_dir(dir: str) -> None`
* `disable_glog_info() -> None`
* `pass_builder() -> paddle::PassStrategy`
* `delete_pass(pass_name: str) -> None`
* `cpu_math_library_num_threads() -> int`
* `disable_gpu() -> None`
* `enable_lite_engine(precision: PrecisionType, zero_copy: bool, passes_filter: List[str]=[], ops_filter: List[str]=[]) -> None`
* `lite_engine_enabled() -> bool`
* `enable_memory_optim() -> None`
* `enable_profile() -> None`
* `enable_quantizer() -> None`
* `quantizer_config() -> paddle::MkldnnQuantizerConfig`
* `fraction_of_gpu_memory_for_pool() -> float`
* `memory_pool_init_size_mb() -> int`
* `glog_info_disabled() -> bool`
* `gpu_device_id() -> int`
* `specify_input_name() -> bool`
* `switch_specify_input_names(x: bool=True) -> None`
* `switch_use_feed_fetch_ops(x: int=True) -> None`
* `use_feed_fetch_ops_enabled() -> bool`
* `to_native_config() -> paddle.fluid.core_avx.NativeConfig`
* `create_predictor(config: Config) -> Predictor`
* Predictor
* `run() -> None`
* `get_input_names() -> List[str]`
* `get_input_handle(input_name: str) -> Tensor`
* `get_output_names() -> List[str]`
* `get_output_handle(output_name: str) -> Tensor`
* `clear_intermediate_tensor() -> None`
* `clone() -> Predictor`
* PredictorPool
* `retrive(idx: int) -> Predictor`
The parameters and return values of each interface are defined in the corresponding [C++ inference API](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/fluid/pybind/inference_api.cc); a short sketch of a few of the listed methods follows.
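A hedged sketch combining CPU threading, memory optimization, and predictor cloning; every call is taken from the list above, and the model paths are hypothetical:
``` python
from paddle.inference import Config, create_predictor

config = Config("./model/model", "./model/params")  # hypothetical paths
config.set_cpu_math_library_num_threads(4)  # use 4 CPU math threads
config.enable_memory_optim()                # reuse intermediate buffers

predictor = create_predictor(config)
# clone() creates a predictor that shares weights with the original,
# e.g. one clone per worker thread.
worker = predictor.clone()
```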
#################
Mobile Deployment
#################
This section introduces Paddle-Lite, the on-device inference engine of PaddlePaddle:
* `Paddle Lite <mobile_index.html>`_ : a brief introduction to Paddle-Lite's features and usage.
.. toctree::
:hidden:
mobile_index.md
#################
Mobile Deployment
#################
# Paddle-Lite
Paddle-Lite is the upgraded version of Paddle-Mobile. It targets lightweight, efficient inference across a broader range of scenarios, including mobile devices, and supports a wider set of hardware and platforms. It is a high-performance, lightweight deep learning inference engine. Besides integrating seamlessly with PaddlePaddle, it is also compatible with models produced by other training frameworks.
The complete documentation lives at [Paddle-Lite documentation](https://paddle-lite.readthedocs.io/zh/latest/).
## Features
### Lightweight
The execution phase is cleanly decoupled from the analysis/optimization phase, so only the execution phase needs to be deployed on mobile devices, with no third-party dependencies.
The full dynamic library, containing 80 ops and 85 kernels, is only about 800 KB for ARMv7 and 1.3 MB for ARMv8, and can be trimmed further.
At deployment time, a model can be loaded and run directly, with no extra analysis or optimization step.
### High Performance
Heavily optimized for ARM CPUs, with kernels tailored to the characteristics of different microarchitectures to extract maximum compute performance, it shows a leading speed advantage on mainstream models.
Quantized models are supported: combined with the quantization features of the [PaddleSlim model compression tool](https://github.com/PaddlePaddle/models/tree/v1.5/PaddleSlim), it delivers high-accuracy, high-performance inference.
It also performs well on Huawei NPU and FPGA.
The latest performance figures are in the [Benchmark documentation](https://paddle-lite.readthedocs.io/zh/latest/benchmark/benchmark.html).
### Versatility
On the hardware side, the Paddle-Lite architecture is designed for multi-hardware compatibility. Beyond ARM CPU, Mali GPU, and Adreno GPU, it specifically supports the Huawei NPU as well as hardware widely used in edge devices, such as FPGAs. Support for AI chips from Cambricon, Bitmain, and others is coming, and more hardware will be added over time.
On the model side, Paddle-Lite keeps its ops aligned with the PaddlePaddle training framework, giving it broad model coverage. 18 models and 85 ops have been rigorously validated for accuracy and performance, with fairly thorough coverage of vision models spanning classification, detection, and localization, including the distinctive OCR models. Validation of more models will be added continuously.
On the framework side, besides PaddlePaddle, other training frameworks are also supported: models trained with Caffe and TensorFlow can currently be converted with the [X2Paddle](https://github.com/PaddlePaddle/X2Paddle) tool, and compatibility with formats such as ONNX is planned next.
## Architecture
The Paddle-Lite architecture is designed with multi-hardware and multi-platform support in mind. It strengthens the ability to mix multiple hardware backends within a single model, applies performance optimizations at several levels, and keeps the design lightweight for on-device applications.
![](https://github.com/Superjomn/_tmp_images/raw/master/images/paddle-lite-architecture.png)
The Analysis Phase contains the MIR (Machine IR) modules, which apply optimizations such as operator fusion and computation pruning to the model's computation graph for a given hardware list. The Execution Phase involves only kernel execution and can be deployed on its own, enabling extremely lightweight deployment.
## Notes on Upgrading Paddle-Mobile to Paddle-Lite
The original Paddle-Mobile, a PaddlePaddle inference engine dedicated to embedded platforms, already supported a range of hardware: ARM CPU, Mali GPU, Adreno GPU, the GPU Metal implementation for Apple devices, FPGA boards such as the ZU5 and ZU9, and arm-linux boards such as the Raspberry Pi. It has been validated in a wide range of production scenarios inside Baidu. Its design documents are available at [mobile/README](https://github.com/PaddlePaddle/Paddle-Lite/blob/develop/mobile/README.md).
After the overall rewrite and renaming of Paddle-Mobile to Paddle-Lite, most of the original paddle-mobile capabilities have been integrated into the [new architecture](https://github.com/PaddlePaddle/Paddle-Lite/tree/develop/lite). As a transition, the original Paddle-Mobile code is kept for now, mainly under the `mobile/` directory; it will be maintained for some time until the migration completes. New features are developed under the [new architecture](https://github.com/PaddlePaddle/Paddle-Lite/tree/develop/lite) only.
The metal and web modules are relatively independent and continue to be developed and maintained under the `./metal` and `./web` directories. For the GPU Metal implementation on Apple devices or for in-browser inference, go directly to these two directories.
## Acknowledgements
Paddle-Lite drew on the following open-source projects:
- [ARM compute library](https://github.com/ARM-software/ComputeLibrary)
- [Anakin](https://github.com/PaddlePaddle/Anakin). Some of Anakin's low-level optimizations have been integrated into Paddle-Lite. As a forward-looking, high-performance inference project under the PaddlePaddle organization, Anakin made important contributions to Paddle-Lite; it has now been merged into this project and will no longer be updated separately.
## Communication and Feedback
* Feel free to submit questions, reports, and suggestions through GitHub Issues.
* WeChat official account: 飞桨PaddlePaddle
* QQ group: 696965088
<p align="center"><img width="200" height="200" src="https://user-images.githubusercontent.com/45189361/64117959-1969de80-cdc9-11e9-84f7-e1c2849a004c.jpeg"/>&#8194;&#8194;&#8194;&#8194;&#8194;<img width="200" height="200" margin="500" src="https://user-images.githubusercontent.com/45189361/64117844-cb54db00-cdc8-11e9-8c08-24bbe594608e.jpeg"/></p>
<p align="center"> &#8194;&#8194;&#8194;WeChat official account&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;&#8194;Official QQ group</p>
* Forum: you are welcome to share problems and experiences of using PaddlePaddle on the [PaddlePaddle forum](https://ai.baidu.com/forum/topic/list/168) and help keep the community friendly and helpful.
####################
Distributed Training
####################
.. toctree::
:maxdepth: 1
cluster_quick_start.rst
fleet_api_howto_cn.rst
.. _user_guide_distribute_en:
######################
Distributed Training
######################
.. toctree::
:maxdepth: 1
cluster_quick_start_en.rst
cluster_howto_en.rst
###################
Multi-node Training
###################
.. toctree::
:maxdepth: 1
cluster_quick_start.rst
cluster_howto.rst
fleet_api_howto_cn.rst
####################
Multi-node Training
####################
.. toctree::
:maxdepth: 1
cluster_quick_start_en.rst
cluster_howto_en.rst
train_on_baidu_cloud_en.rst