Commit 38ce9952 authored by W wangjiawei04

merge with develop

......@@ -58,7 +58,12 @@ option(APP "Compile Paddle Serving App package" OFF)
option(WITH_ELASTIC_CTR "Compile ELASITC-CTR solution" OFF)
option(PACK "Compile for whl" OFF)
option(WITH_TRT "Compile Paddle Serving with TRT" OFF)
option(PADDLE_ON_INFERENCE "Compile for encryption" ON)
if (PADDLE_ON_INFERENCE)
add_definitions(-DPADDLE_ON_INFERENCE)
message(STATUS "Use PADDLE_ON_INFERENCE")
endif()
set(WITH_MKLML ${WITH_MKL})
if (NOT DEFINED WITH_MKLDNN)
if (WITH_MKL AND AVX2_FOUND)
......
......@@ -20,88 +20,139 @@
<br>
<p>
- [Motivation](./README.md#motivation)
- [AIStudio Tutorial](./README.md#aistudio-tutorial)
- [Installation](./README.md#installation)
- [Quick Start Example](./README.md#quick-start-example)
- [Document](README.md#document)
- [Community](README.md#community)
<h2 align="center">Motivation</h2>
We consider deploying deep learning inference service online to be a user-facing application in the future. **The goal of this project**: once you have trained a deep neural net with [Paddle](https://github.com/PaddlePaddle/Paddle), you can also deploy the model online with ease. A demo of Paddle Serving is as follows:
<h3 align="center">Some Key Features of Paddle Serving</h3>
- Integrate with the Paddle training pipeline seamlessly; most Paddle models can be deployed **with one line command**.
- **Industrial serving features** supported, such as model management, online loading, online A/B testing, etc.
- **Highly concurrent and efficient communication** between clients and servers supported.
- **Multiple programming languages** supported on the client side, such as C++, Python and Java.
***
- Any model trained with [PaddlePaddle](https://github.com/paddlepaddle/paddle) can be used directly, or saved through the [Model Conversion Interface](./doc/SAVE.md), for online deployment with Paddle Serving.
- Support [Multi-model Pipeline Deployment](./doc/PIPELINE_SERVING.md), providing both REST and RPC interfaces; see the [Pipeline examples](./python/examples/pipeline).
- Support the model zoos from the Paddle ecosystem, such as [PaddleDetection](./python/examples/detection), [PaddleOCR](./python/examples/ocr) and [PaddleRec](https://github.com/PaddlePaddle/PaddleRec/tree/master/tools/recserving/movie_recommender).
- Provide a variety of pre-processing and post-processing utilities so that users can reuse related code across training, deployment and other stages, bridging the gap between AI developers and application developers; please refer to
[Serving Examples](./python/examples/).
<p align="center">
<img src="doc/demo.gif" width="700">
</p>
<h2 align="center">AIStudio Tutorial</h2>
Here we provide a tutorial on AIStudio (Chinese version): [AIStudio教程-Paddle Serving服务化部署框架](https://aistudio.baidu.com/aistudio/projectdetail/1550674)
The tutorial covers:
<ul>
<li>Paddle Serving Environment Setup</li>
<ul>
<li>Running in Docker images</li>
<li>pip install Paddle Serving</li>
</ul>
<li>Quick Experience of Paddle Serving</li>
<li>Advanced Tutorial of Model Deployment</li>
<ul>
<li>Save/Convert Models for Paddle Serving</li>
<li>Setup Online Inference Service</li>
</ul>
<li>Paddle Serving Examples</li>
<ul>
<li>Paddle Serving for Detections</li>
<li>Paddle Serving for OCR</li>
</ul>
</ul>
<h2 align="center">Installation</h2>
We **highly recommend** you to **run Paddle Serving in Docker**, please visit [Run in Docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/RUN_IN_DOCKER.md). See the [document](doc/DOCKER_IMAGES.md) for more docker images.
We **highly recommend** you to **run Paddle Serving in Docker**, please visit [Run in Docker](doc/RUN_IN_DOCKER.md). See the [document](doc/DOCKER_IMAGES.md) for more docker images.
**Attention:** Currently, the default GPU environment of paddlepaddle 2.0 is Cuda 10.2, so the sample code of GPU Docker is based on Cuda 10.2. We also provide docker images and whl packages for other GPU environments. If you use another environment, please carefully check and select the appropriate version.
```
# Run CPU Docker
docker pull hub.baidubce.com/paddlepaddle/serving:latest
docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest
docker pull registry.baidubce.com/paddlepaddle/serving:0.5.0-devel
docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:0.5.0-devel
docker exec -it test bash
git clone https://github.com/PaddlePaddle/Serving
```
```
# Run GPU Docker
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
nvidia-docker pull registry.baidubce.com/paddlepaddle/serving:0.5.0-cuda10.2-cudnn8-devel
nvidia-docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:0.5.0-cuda10.2-cudnn8-devel
nvidia-docker exec -it test bash
git clone https://github.com/PaddlePaddle/Serving
```
```shell
pip install paddle-serving-client==0.4.0
pip install paddle-serving-server==0.4.0 # CPU
pip install paddle-serving-app==0.2.0
pip install paddle-serving-server-gpu==0.4.0.post9 # GPU with CUDA9.0
pip install paddle-serving-server-gpu==0.4.0.post10 # GPU with CUDA10.0
pip install paddle-serving-server-gpu==0.4.0.100 # GPU with CUDA10.1+TensorRT
pip install paddle-serving-client==0.5.0
pip install paddle-serving-server==0.5.0 # CPU
pip install paddle-serving-app==0.3.0
pip install paddle-serving-server-gpu==0.5.0.post102 #GPU with CUDA10.2 + TensorRT7
# DO NOT RUN ALL COMMANDS! Check your GPU environment and select the right one.
pip install paddle-serving-server-gpu==0.5.0.post9 # GPU with CUDA9.0
pip install paddle-serving-server-gpu==0.5.0.post10 # GPU with CUDA10.0
pip install paddle-serving-server-gpu==0.5.0.post101 # GPU with CUDA10.1 + TensorRT6
pip install paddle-serving-server-gpu==0.5.0.post11 # GPU with CUDA11 + TensorRT7
```
You may need to use a domestic mirror source (in China, you can use the Tsinghua mirror source by adding `-i https://pypi.tuna.tsinghua.edu.cn/simple` to the pip command) to speed up the download.
If you need install modules compiled with develop branch, please download packages from [latest packages list](./doc/LATEST_PACKAGES.md) and install with `pip install` command.
If you need install modules compiled with develop branch, please download packages from [latest packages list](./doc/LATEST_PACKAGES.md) and install with `pip install` command. If you want to compile by yourself, please refer to [How to compile Paddle Serving?](./doc/COMPILE.md)
Packages of paddle-serving-server and paddle-serving-server-gpu support Centos 6/7, Ubuntu 16/18 and Windows 10.
Packages of paddle-serving-client and paddle-serving-app support Linux and Windows, but paddle-serving-client only supports python2.7/3.5/3.6/3.7.
Packages of paddle-serving-client and paddle-serving-app support Linux and Windows, but paddle-serving-client only supports python2.7/3.5/3.6/3.7/3.8.
It is recommended to install paddle >= 1.8.4.
It is recommended to install paddle >= 2.0.0.
For **Windows Users**, please read the document [Paddle Serving for Windows Users](./doc/WINDOWS_TUTORIAL.md)
```
# CPU users, please run
pip install paddlepaddle==2.0.0

# GPU Cuda10.2 users, please run
pip install paddlepaddle-gpu==2.0.0
```

**Note**: If your Cuda version is not 10.2, please do not execute the above commands directly; refer to the [Paddle official documentation - multi-version whl package list](https://www.paddlepaddle.org.cn/documentation/docs/en/install/Tables_en.html#multi-version-whl-package-list-release).

Select the url of the corresponding GPU environment and install it. For example, a Python2.7 user on Cuda 9.0 should select `cp27-cp27mu` and the url corresponding to `cuda9.0_cudnn7-mkl`, copy it and run
```
pip install https://paddle-wheel.bj.bcebos.com/2.0.0-gpu-cuda9-cudnn7-mkl/paddlepaddle_gpu-2.0.0.post90-cp27-cp27mu-linux_x86_64.whl
```

The default `paddlepaddle-gpu==2.0.0` is Cuda 10.2 with no TensorRT. If you want to install PaddlePaddle with TensorRT, please also check the multi-version whl package list and find the keyword `cuda10.2-cudnn8.0-trt7.1.3`. For more info, please check [Paddle Serving uses TensorRT](./doc/TENSOR_RT.md).

For other environments and Python versions, please find the corresponding link in the table and install it with pip.

<h2 align="center">Pre-built services with Paddle Serving</h2>

<h3 align="center">Chinese Word Segmentation</h3>

``` shell
> python -m paddle_serving_app.package --get_model lac
> tar -xzf lac.tar.gz
> python lac_web_service.py lac_model/ lac_workdir 9393 &
> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9393/lac/prediction
{"result":[{"word_seg":"我|爱|北京|天安门"}]}
```

<h3 align="center">Image Classification</h3>
<p align="center">
    <br>
<img src='https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg' width = "200" height = "200">
    <br>
<p>

``` shell
> python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
> tar -xzf resnet_v2_50_imagenet.tar.gz
> python resnet50_imagenet_classify.py resnet50_serving_model &
> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"image": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction
{"result":{"label":["daisy"],"prob":[0.9341403245925903]}}
```
<h2 align="center">Quick Start Example</h2>
This quick start example is only for users who already have a model to deploy, and we prepare a ready-to-deploy model here. If you want to know how to use Paddle Serving from offline training to online serving, please refer to [Train_To_Service](https://github.com/PaddlePaddle/Serving/blob/develop/doc/TRAIN_TO_SERVICE.md)
This quick start example is mainly for users who already have a model to deploy, and we also provide a ready-to-deploy model. If you want to know how to complete the whole process from offline training to online serving, please refer to the AIStudio tutorial above.
### Boston House Price Prediction model
Get into the Serving git directory and change into the `fit_a_line` example directory:
``` shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
tar -xzf uci_housing.tar.gz
cd Serving/python/examples/fit_a_line
sh get_data.sh
```
Paddle Serving provides HTTP and RPC based services for users to access.
......@@ -123,6 +174,8 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
| `ir_optim` | - | - | Enable analysis and optimization of calculation graph |
| `use_mkl` (Only for cpu version) | - | - | Run inference with MKL |
| `use_trt` (Only for trt version) | - | - | Run inference with TensorRT |
| `use_lite` (Only for ARM) | - | - | Run PaddleLite inference |
| `use_xpu` (Only for ARM+XPU) | - | - | Run PaddleLite XPU inference |
</center>
......@@ -145,26 +198,8 @@ Here, `client.predict` function has two arguments. `feed` is a `python dict` wit
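For reference, a minimal RPC client sketch for the `fit_a_line` model (adapted from the Kunlun client example later in this commit; it assumes the server above listens on port 9292 and that the downloaded `uci_housing_client/` folder holds the client config):

``` python
from paddle_serving_client import Client
import numpy as np

client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])

# One sample with the 13 Boston-housing features.
data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
        -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
fetch_map = client.predict(feed={"x": np.array(data).reshape(1, 13, 1)}, fetch=["price"])
print(fetch_map)
```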
Users can also put the data format processing logic on the server side, so that the service can be accessed directly with curl; refer to the following case under `python/examples/fit_a_line`.
```python
from paddle_serving_server.web_service import WebService
import numpy as np


class UciService(WebService):
    def preprocess(self, feed=[], fetch=[]):
        feed_batch = []
        is_batch = True
        new_data = np.zeros((len(feed), 1, 13)).astype("float32")
        for i, ins in enumerate(feed):
            nums = np.array(ins["x"]).reshape(1, 1, 13)
            new_data[i] = nums
        feed = {"x": new_data}
        return feed, fetch, is_batch


uci_service = UciService(name="uci")
uci_service.load_model_config("uci_housing_model")
uci_service.prepare_server(workdir="workdir", port=9292)
uci_service.run_rpc_service()
uci_service.run_web_service()
```
```
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci
```
For the client side,
```
......@@ -175,34 +210,20 @@ the response is
{"result":{"price":[[18.901151657104492]]}}
```
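Equivalently, a hypothetical Python sketch of the same HTTP call using the `requests` package (the endpoint name `uci` comes from the `--name uci` flag above):

``` python
import requests

# Same payload shape as the curl example above.
payload = {
    "feed": [{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
                    -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}],
    "fetch": ["price"]
}
resp = requests.post("http://127.0.0.1:9292/uci/prediction", json=payload)
print(resp.json())  # e.g. {"result": {"price": [[18.901151657104492]]}}
```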
<h2 align="center">Some Key Features of Paddle Serving</h2>
- Integrate with the Paddle training pipeline seamlessly; most Paddle models can be deployed **with one line command**.
- **Industrial serving features** supported, such as model management, online loading, online A/B testing, etc.
- **Distributed Key-Value indexing** supported, which is especially useful for large-scale sparse features as model inputs.
- **Highly concurrent and efficient communication** between clients and servers supported.
- **Multiple programming languages** supported on the client side, such as Golang, C++ and Python.
<h2 align="center">Document</h2>
### New to Paddle Serving
- [How to save a servable model?](doc/SAVE.md)
- [An End-to-end tutorial from training to inference service deployment](doc/TRAIN_TO_SERVICE.md)
- [Write Bert-as-Service in 10 minutes](doc/BERT_10_MINS.md)
### Tutorial at AIStudio
- [Introduction to PaddleServing](https://aistudio.baidu.com/aistudio/projectdetail/605819)
- [Image Segmentation on Paddle Serving](https://aistudio.baidu.com/aistudio/projectdetail/457715)
- [Sentimental Analysis](https://aistudio.baidu.com/aistudio/projectdetail/509014)
- [Paddle Serving Examples](python/examples)
### Developers
- [How to config Serving native operators on server side?](doc/SERVER_DAG.md)
- [How to develop a new Serving operator?](doc/NEW_OPERATOR.md)
- [How to develop a new Web Service?](doc/NEW_WEB_SERVICE.md)
- [Golang client](doc/IMDB_GO_CLIENT.md)
- [Compile from source code](doc/COMPILE.md)
- [Develop Pipeline Serving](doc/PIPELINE_SERVING.md)
- [Deploy Web Service with uWSGI](doc/UWSGI_DEPLOY.md)
- [Hot loading for model file](doc/HOT_LOADING_IN_SERVING.md)
- [Paddle Serving uses TensorRT](doc/TENSOR_RT.md)
### About Efficiency
- [How to profile Paddle Serving latency?](python/examples/util)
......@@ -211,15 +232,13 @@ the response is
- [CPU Benchmarks(Chinese)](doc/BENCHMARKING.md)
- [GPU Benchmarks(Chinese)](doc/GPU_BENCHMARKING.md)
### Design
- [Design Doc](doc/DESIGN_DOC.md)

### FAQ
- [FAQ(Chinese)](doc/FAQ.md)

<h2 align="center">Community</h2>
### Slack
......
......@@ -20,91 +20,137 @@
<br>
<p>
- [Motivation](./README_CN.md#动机)
- [Tutorial](./README_CN.md#教程)
- [Installation](./README_CN.md#安装)
- [Quick Start Example](./README_CN.md#快速开始示例)
- [Documents](README_CN.md#文档)
- [Community](README_CN.md#社区)
<h2 align="center">Motivation</h2>
Paddle Serving aims to help deep learning developers deploy online inference services easily. **The goal of this project**: once a user has trained a deep neural network with [Paddle](https://github.com/PaddlePaddle/Paddle), they also have an inference service for that model at the same time.
<h3 align="center">Key Features of Paddle Serving</h3>
- Tightly integrated with Paddle training; most Paddle models can be deployed **with one command**.
- **Industrial serving features** supported, such as model management, online loading and online A/B testing.
- **Highly concurrent and efficient communication** between clients and servers.
- **Multiple programming languages** supported on the client side, such as C++, Python and Java.
***
- Any model trained with [PaddlePaddle](https://github.com/paddlepaddle/paddle) can be saved directly, or converted through the [Model Conversion Interface](./doc/SAVE_CN.md), for online deployment with Paddle Serving.
- Support [multi-model pipeline deployment](./doc/PIPELINE_SERVING_CN.md), providing both REST and RPC interfaces; see the [Pipeline examples](./python/examples/pipeline).
- Support major model zoos of the Paddle ecosystem, such as [PaddleDetection](./python/examples/detection), [PaddleOCR](./python/examples/ocr) and [PaddleRec](https://github.com/PaddlePaddle/PaddleRec/tree/master/tools/recserving/movie_recommender).
- Provide a variety of pre- and post-processing utilities so that users can reuse related code across training, deployment and other stages, bridging the gap between AI developers and application developers; see the [model examples](./python/examples/).
<p align="center">
<img src="doc/demo.gif" width="700">
</p>
<h2 align="center">Tutorial</h2>
Paddle Serving developers provide an easy-to-use tutorial on AIStudio: [AIStudio教程-Paddle Serving服务化部署框架](https://aistudio.baidu.com/aistudio/projectdetail/1550674)
The tutorial covers the following:
<ul>
<li>Paddle Serving environment setup</li>
<ul>
<li>Launching from Docker images</li>
<li>Installing Paddle Serving with pip</li>
</ul>
<li>Quick experience of deploying an online inference service</li>
<li>Advanced workflow for deploying an online inference service</li>
<ul>
<li>Getting a model that can be deployed as an online service</li>
<li>Launching the inference service</li>
</ul>
<li>Paddle Serving online deployment examples</li>
<ul>
<li>Deploying an image detection service with Paddle Serving</li>
<li>Deploying an OCR pipeline online service with Paddle Serving</li>
</ul>
</ul>
<h2 align="center">Installation</h2>
We **strongly recommend** building Paddle Serving **inside Docker**; see [How to run PaddleServing in Docker](doc/RUN_IN_DOCKER_CN.md). For more images, see the [Docker image list](doc/DOCKER_IMAGES_CN.md).
**Note**: the default GPU environment of paddlepaddle 2.0 is Cuda 10.2, so the GPU Docker sample code is based on Cuda 10.2. Images and pip packages are also provided for other GPU environments; if you use another environment, please carefully check and select the appropriate version.
```
# Run CPU Docker
docker pull hub.baidubce.com/paddlepaddle/serving:latest
docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest
docker pull registry.baidubce.com/paddlepaddle/serving:0.5.0-devel
docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:0.5.0-devel
docker exec -it test bash
git clone https://github.com/PaddlePaddle/Serving
```
```
# Run GPU Docker
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
nvidia-docker pull registry.baidubce.com/paddlepaddle/serving:0.5.0-cuda10.2-cudnn8-devel
nvidia-docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:0.5.0-cuda10.2-cudnn8-devel
nvidia-docker exec -it test bash
git clone https://github.com/PaddlePaddle/Serving
```
```shell
pip install paddle-serving-client==0.4.0
pip install paddle-serving-server==0.4.0 # CPU
pip install paddle-serving-app==0.2.0
pip install paddle-serving-server-gpu==0.4.0.post9 # GPU with CUDA9.0
pip install paddle-serving-server-gpu==0.4.0.post10 # GPU with CUDA10.0
pip install paddle-serving-server-gpu==0.4.0.100 # GPU with CUDA10.1+TensorRT
pip install paddle-serving-client==0.5.0
pip install paddle-serving-server==0.5.0 # CPU
pip install paddle-serving-app==0.3.0
pip install paddle-serving-server-gpu==0.5.0.post102 #GPU with CUDA10.2 + TensorRT7
# For other GPU environments, confirm your environment before choosing which command to run
pip install paddle-serving-server-gpu==0.5.0.post9 # GPU with CUDA9.0
pip install paddle-serving-server-gpu==0.5.0.post10 # GPU with CUDA10.0
pip install paddle-serving-server-gpu==0.5.0.post101 # GPU with CUDA10.1 + TensorRT6
pip install paddle-serving-server-gpu==0.5.0.post11 # GPU with CUDA11 + TensorRT7
```
You may need a domestic mirror source (in China, e.g. the Tsinghua mirror: add `-i https://pypi.tuna.tsinghua.edu.cn/simple` to the pip command) to speed up the download.

If you need packages compiled from the develop branch, please get the download links from the [latest packages list](./doc/LATEST_PACKAGES.md) and install them with `pip install`.
If you need packages compiled from the develop branch, please get the download links from the [latest packages list](./doc/LATEST_PACKAGES.md) and install them with `pip install`. If you want to compile by yourself, please refer to the [Paddle Serving compilation document](./doc/COMPILE_CN.md).

Packages of paddle-serving-server and paddle-serving-server-gpu support Centos 6/7, Ubuntu 16/18 and Windows 10.
Packages of paddle-serving-client and paddle-serving-app support Linux and Windows, but paddle-serving-client only supports python2.7/3.5/3.6.
Packages of paddle-serving-client and paddle-serving-app support Linux and Windows, but paddle-serving-client only supports python2.7/3.5/3.6/3.7/3.8.
It is recommended to install paddle 1.8.4 or later.
It is recommended to install paddle 2.0.0 or later.
For **Windows 10 users**, please refer to [Paddle Serving for Windows Users](./doc/WINDOWS_TUTORIAL_CN.md).
```
# For CPU environments, run
pip install paddlepaddle==2.0.0

# For GPU Cuda 10.2 environments, run
pip install paddlepaddle-gpu==2.0.0
```

**Note**: if your Cuda version is not 10.2, please do not run the commands above directly; refer to the [Paddle official documentation - multi-version whl package list](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#whl-release).

Select the url for your GPU environment and install it. For example, a Python2.7 user on Cuda 9.0 should pick the url for `cp27-cp27mu` and `cuda9.0_cudnn7-mkl` in the table, copy it and run
```
pip install https://paddle-wheel.bj.bcebos.com/2.0.0-gpu-cuda9-cudnn7-mkl/paddlepaddle_gpu-2.0.0.post90-cp27-cp27mu-linux_x86_64.whl
```

The default `paddlepaddle-gpu==2.0.0` is built for Cuda 10.2 without TensorRT. If you need TensorRT with `paddlepaddle-gpu`, find `cuda10.2-cudnn8.0-trt7.1.3` in the multi-version whl package list above and download the wheel for your Python version. For more information, see [How to use TensorRT?](doc/TENSOR_RT_CN.md).

For other environments and Python versions, find the corresponding link in the table and install it with pip.

<h2 align="center">Pre-built services with Paddle Serving</h2>

<h3 align="center">Chinese Word Segmentation</h3>

``` shell
> python -m paddle_serving_app.package --get_model lac
> tar -xzf lac.tar.gz
> python lac_web_service.py lac_model/ lac_workdir 9393 &
> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9393/lac/prediction
{"result":[{"word_seg":"我|爱|北京|天安门"}]}
```

<h3 align="center">Image Classification</h3>
<p align="center">
    <br>
<img src='https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg' width = "200" height = "200">
    <br>
<p>

``` shell
> python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
> tar -xzf resnet_v2_50_imagenet.tar.gz
> python resnet50_imagenet_classify.py resnet50_serving_model &
> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"image": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction
{"result":{"label":["daisy"],"prob":[0.9341403245925903]}}
```
<h2 align="center">Quick Start Example</h2>
This quick start example is mainly for users who already have a model to deploy, and we also provide a ready-to-deploy model. If you want to know how to complete the whole process from offline training to online serving, please refer to [From training to deployment](https://github.com/PaddlePaddle/Serving/blob/develop/doc/TRAIN_TO_SERVICE_CN.md)
This quick start example is mainly for users who already have a model to deploy, and we also provide a ready-to-deploy model. If you want to know how to complete the whole process from offline training to online serving, please refer to the AIStudio tutorial above.
<h3 align="center">Boston House Price Prediction</h3>
Get into the Serving git directory and change into the `fit_a_line` example directory:
``` shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
tar -xzf uci_housing.tar.gz
cd Serving/python/examples/fit_a_line
sh get_data.sh
```
Paddle Serving provides HTTP and RPC based services for users to access.
......@@ -127,10 +173,10 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
| `mem_optim_off` | - | - | Disable memory optimization |
| `ir_optim` | - | - | Enable analysis and optimization of calculation graph |
| `use_mkl` (Only for cpu version) | - | - | Run inference with MKL |
| `use_trt` (Only for trt version) | - | - | Run inference with TensorRT |
| `use_trt` (Only for Cuda>=10.1 version) | - | - | Run inference with TensorRT |
| `use_lite` (Only for ARM) | - | - | Run PaddleLite inference |
| `use_xpu` (Only for ARM+XPU) | - | - | Run PaddleLite XPU inference |
We use the `curl` command to send an HTTP POST request to the service we just started. You can also use the Python
[requests](https://requests.readthedocs.io/en/master/) library to send HTTP POST requests.
</center>
``` python
......@@ -151,26 +197,8 @@ print(fetch_map)
<h3 align="center">HTTP Service</h3>
Users can also put the data format processing logic on the server side, so that the service can be accessed directly with curl; refer to the following case under `python/examples/fit_a_line`.
```python
from paddle_serving_server.web_service import WebService
import numpy as np


class UciService(WebService):
    def preprocess(self, feed=[], fetch=[]):
        feed_batch = []
        is_batch = True
        new_data = np.zeros((len(feed), 1, 13)).astype("float32")
        for i, ins in enumerate(feed):
            nums = np.array(ins["x"]).reshape(1, 1, 13)
            new_data[i] = nums
        feed = {"x": new_data}
        return feed, fetch, is_batch


uci_service = UciService(name="uci")
uci_service.load_model_config("uci_housing_model")
uci_service.prepare_server(workdir="workdir", port=9292)
uci_service.run_rpc_service()
uci_service.run_web_service()
```
```
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci
```
The client-side input:
```
......@@ -181,34 +209,20 @@ curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1
{"result":{"price":[[18.901151657104492]]}}
```
<h2 align="center">Key Features of Paddle Serving</h2>
- Tightly integrated with Paddle training; most Paddle models can be deployed **with one command**.
- **Industrial serving features** supported, such as model management, online loading and online A/B testing.
- **Distributed Key-Value indexing** supported, which helps with large-scale sparse features as model inputs.
- **Highly concurrent and efficient communication** between clients and servers.
- **Multiple programming languages** supported on the client side, such as Golang, C++ and Python.
<h2 align="center">Documents</h2>

### New to Paddle Serving
- [How to save a model for Paddle Serving?](doc/SAVE_CN.md)
- [An end-to-end tutorial from training to deployment](doc/TRAIN_TO_SERVICE_CN.md)
- [Build Bert-As-Service in 10 minutes](doc/BERT_10_MINS_CN.md)

### AIStudio Tutorials
- [Introduction to PaddleServing](https://aistudio.baidu.com/aistudio/projectdetail/605819)
- [Image segmentation on PaddleServing](https://aistudio.baidu.com/aistudio/projectdetail/457715)
- [Sentiment analysis on PaddleServing](https://aistudio.baidu.com/aistudio/projectdetail/509014)
- [Paddle Serving examples](python/examples)

### Developers
- [How to configure the Server-side computation graph?](doc/SERVER_DAG_CN.md)
- [How to develop a new General Op?](doc/NEW_OPERATOR_CN.md)
- [How to develop a new Web Service?](doc/NEW_WEB_SERVICE_CN.md)
- [How to use the Go Client with Paddle Serving?](doc/IMDB_GO_CLIENT_CN.md)
- [How to compile PaddleServing?](doc/COMPILE_CN.md)
- [How to develop Pipeline Serving?](doc/PIPELINE_SERVING_CN.md)
- [How to deploy a Web Service with uWSGI?](doc/UWSGI_DEPLOY_CN.md)
- [How to hot-load model files?](doc/HOT_LOADING_IN_SERVING_CN.md)
- [How to use TensorRT?](doc/TENSOR_RT_CN.md)

### About Paddle Serving performance
- [How to profile Paddle Serving performance?](python/examples/util/)
......@@ -217,12 +231,12 @@ curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1
- [CPU Benchmarks (Chinese)](doc/BENCHMARKING.md)
- [GPU Benchmarks (Chinese)](doc/GPU_BENCHMARKING.md)
### Design documents
- [Paddle Serving design document](doc/DESIGN_DOC_CN.md)

### FAQ
- [FAQ (Chinese)](doc/FAQ.md)
<h2 align="center">Community</h2>
### Slack
......
......@@ -31,14 +31,28 @@ message( "WITH_GPU = ${WITH_GPU}")
# Paddle Version should be one of:
# latest: latest develop build
# version number like 1.5.2
SET(PADDLE_VERSION "2.0.0-rc1")
SET(PADDLE_VERSION "2.0.0")
if (WITH_GPU)
if (WITH_TRT)
SET(PADDLE_LIB_VERSION "${PADDLE_VERSION}-gpu-cuda10.1-cudnn7-avx-mkl-trt6")
else()
SET(PADDLE_LIB_VERSION "${PADDLE_VERSION}-gpu-cuda10-cudnn7-avx-mkl")
if(CUDA_VERSION EQUAL 11.0)
set(CUDA_SUFFIX "cuda11-cudnn8-avx-mkl")
set(WITH_TRT ON)
elseif(CUDA_VERSION EQUAL 10.2)
set(CUDA_SUFFIX "cuda10.2-cudnn8-avx-mkl")
set(WITH_TRT ON)
elseif(CUDA_VERSION EQUAL 10.1)
set(CUDA_SUFFIX "cuda10.1-cudnn7-avx-mkl")
set(WITH_TRT ON)
elseif(CUDA_VERSION EQUAL 10.0)
set(CUDA_SUFFIX "cuda10-cudnn7-avx-mkl")
elseif(CUDA_VERSION EQUAL 9.0)
set(CUDA_SUFFIX "cuda9-cudnn7-avx-mkl")
endif()
else()
set(WITH_TRT OFF)
endif()
if (WITH_GPU)
SET(PADDLE_LIB_VERSION "${PADDLE_VERSION}-gpu-${CUDA_SUFFIX}")
elseif (WITH_LITE)
if (WITH_XPU)
SET(PADDLE_LIB_VERSION "${PADDLE_VERSION}-arm-xpu")
......@@ -78,7 +92,8 @@ if (WITH_GPU OR WITH_MKLML)
INSTALL_COMMAND
${CMAKE_COMMAND} -E copy_directory ${PADDLE_DOWNLOAD_DIR}/paddle/include ${PADDLE_INSTALL_DIR}/include &&
${CMAKE_COMMAND} -E copy_directory ${PADDLE_DOWNLOAD_DIR}/paddle/lib ${PADDLE_INSTALL_DIR}/lib &&
${CMAKE_COMMAND} -E copy_directory ${PADDLE_DOWNLOAD_DIR}/third_party ${PADDLE_INSTALL_DIR}/third_party
${CMAKE_COMMAND} -E copy_directory ${PADDLE_DOWNLOAD_DIR}/third_party ${PADDLE_INSTALL_DIR}/third_party &&
${CMAKE_COMMAND} -E copy ${PADDLE_INSTALL_DIR}/third_party/install/mkldnn/lib/libdnnl.so.1 ${PADDLE_INSTALL_DIR}/third_party/install/mkldnn/lib/libdnnl.so
)
else()
ExternalProject_Add(
......@@ -124,8 +139,8 @@ LINK_DIRECTORIES(${PADDLE_INSTALL_DIR}/third_party/install/mkldnn/lib)
ADD_LIBRARY(openblas STATIC IMPORTED GLOBAL)
SET_PROPERTY(TARGET openblas PROPERTY IMPORTED_LOCATION ${PADDLE_INSTALL_DIR}/third_party/install/openblas/lib/libopenblas.a)
ADD_LIBRARY(paddle_fluid SHARED IMPORTED GLOBAL)
SET_PROPERTY(TARGET paddle_fluid PROPERTY IMPORTED_LOCATION ${PADDLE_INSTALL_DIR}/lib/libpaddle_fluid.so)
ADD_LIBRARY(paddle_fluid STATIC IMPORTED GLOBAL)
SET_PROPERTY(TARGET paddle_fluid PROPERTY IMPORTED_LOCATION ${PADDLE_INSTALL_DIR}/lib/libpaddle_fluid.a)
if (WITH_TRT)
ADD_LIBRARY(nvinfer SHARED IMPORTED GLOBAL)
......@@ -151,10 +166,13 @@ endif()
ADD_LIBRARY(xxhash STATIC IMPORTED GLOBAL)
SET_PROPERTY(TARGET xxhash PROPERTY IMPORTED_LOCATION ${PADDLE_INSTALL_DIR}/third_party/install/xxhash/lib/libxxhash.a)
ADD_LIBRARY(cryptopp STATIC IMPORTED GLOBAL)
SET_PROPERTY(TARGET cryptopp PROPERTY IMPORTED_LOCATION ${PADDLE_INSTALL_DIR}/third_party/install/cryptopp/lib/libcryptopp.a)
LIST(APPEND external_project_dependencies paddle)
LIST(APPEND paddle_depend_libs
xxhash)
xxhash cryptopp)
if(WITH_LITE)
LIST(APPEND paddle_depend_libs paddle_full_api_shared)
......
......@@ -22,7 +22,7 @@ include_directories(SYSTEM ${CMAKE_CURRENT_BINARY_DIR}/../)
add_executable(cube-builder src/main.cpp include/cube-builder/util.h src/util.cpp src/builder_job.cpp include/cube-builder/builder_job.h include/cube-builder/define.h src/seqfile_reader.cpp include/cube-builder/seqfile_reader.h include/cube-builder/raw_reader.h include/cube-builder/vtext.h src/crovl_builder_increment.cpp include/cube-builder/crovl_builder_increment.h src/curl_simple.cpp include/cube-builder/curl_simple.h)
add_dependencies(cube-builder jsoncpp)
add_dependencies(cube-builder jsoncpp boost)
set(DYNAMIC_LIB
gflags
......
if(CLIENT)
add_subdirectory(pybind11)
pybind11_add_module(serving_client src/general_model.cpp src/pybind_general_model.cpp)
add_dependencies(serving_client sdk_cpp)
target_link_libraries(serving_client PRIVATE -Wl,--whole-archive utils sdk-cpp pybind python -Wl,--no-whole-archive -lpthread -lcrypto -lm -lrt -lssl -ldl -lz -Wl,-rpath,'$ORIGIN'/lib)
endif()
......@@ -29,7 +29,8 @@ target_link_libraries(serving -Wl,--whole-archive fluid_cpu_engine
-Wl,--no-whole-archive)
target_link_libraries(serving paddle_fluid ${paddle_depend_libs})
target_link_libraries(serving brpc)
target_link_libraries(serving protobuf)
target_link_libraries(serving pdserving)
target_link_libraries(serving cube-api)
target_link_libraries(serving utils)
......@@ -40,7 +41,7 @@ endif()
if(WITH_MKL OR WITH_GPU)
if (WITH_TRT)
target_link_libraries(serving -liomp5 -lmklml_intel -lpthread -lcrypto -lm -lrt -lssl -ldl -lz -lbz2)
target_link_libraries(serving -liomp5 -lmklml_intel -lpthread -lcrypto -lm -lrt -lssl -ldl -lz -lbz2 -ldnnl)
else()
target_link_libraries(serving -liomp5 -lmklml_intel -lmkldnn -lpthread -lcrypto -lm -lrt -lssl -ldl -lz -lbz2)
endif()
......@@ -61,6 +62,7 @@ install(FILES
${CMAKE_BINARY_DIR}/third_party/install/Paddle/third_party/install/mklml/lib/libmklml_intel.so
${CMAKE_BINARY_DIR}/third_party/install/Paddle/third_party/install/mklml/lib/libiomp5.so
${CMAKE_BINARY_DIR}/third_party/install/Paddle/third_party/install/mkldnn/lib/libmkldnn.so.0
${CMAKE_BINARY_DIR}/third_party/install/Paddle/third_party/install/mkldnn/lib/libdnnl.so.1
DESTINATION
${PADDLE_SERVING_INSTALL_DIR}/demo/serving/bin)
endif()
......@@ -91,7 +91,7 @@ int GeneralReaderOp::inference() {
capacity.resize(var_num);
for (int i = 0; i < var_num; ++i) {
std::string tensor_name = model_config->_feed_name[i];
VLOG(2) << "(logid=" << log_id << ") get tensor name: " << tensor_name;
VLOG(2) << "(logid=" << log_id << ") get tensor name: " << tensor_name;
auto lod_tensor = InferManager::instance().GetInputHandle(
engine_name.c_str(), tensor_name.c_str());
std::vector<std::vector<size_t>> lod;
......
......@@ -7,7 +7,7 @@ PROTOBUF_GENERATE_CPP(pdcodegen_proto_srcs pdcodegen_proto_hdrs
LIST(APPEND pdcodegen_srcs ${pdcodegen_proto_srcs})
add_executable(pdcodegen ${pdcodegen_srcs})
add_dependencies(pdcodegen boost)
add_dependencies(pdcodegen boost protobuf)
target_link_libraries(pdcodegen protobuf ${PROTOBUF_PROTOC_LIBRARY})
# install
......
......@@ -2,7 +2,7 @@
include(src/CMakeLists.txt)
include(proto/CMakeLists.txt)
add_library(sdk-cpp ${sdk_cpp_srcs})
add_dependencies(sdk-cpp pdcodegen configure)
add_dependencies(sdk-cpp pdcodegen configure protobuf brpc leveldb)
target_link_libraries(sdk-cpp brpc configure protobuf leveldb)
# install
......
......@@ -16,32 +16,30 @@ sh get_data.sh
```
### Processing Data
Data processing requires some libraries; please install them with pip:
``` shell
pip install paddlepaddle
pip install paddle-serving-app
pip install Shapely
```
The following Python code will process the data `test_data/part-0` and write to the `processed.data` file.
You can directly run the following command to process the data.
[//file]:#process.py
``` python
from paddle_serving_app.reader import IMDBDataset
imdb_dataset = IMDBDataset()
imdb_dataset.load_resource('imdb.vocab')

with open('test_data/part-0') as fin:
    with open('processed.data', 'w') as fout:
        for line in fin:
            word_ids, label = imdb_dataset.get_words_and_label(line)
            fout.write("{};{}\n".format(','.join([str(x) for x in word_ids]), label[0]))
```
[python abtest_get_data.py](../python/examples/imdb/abtest_get_data.py)
The Python code in the file will process the data `test_data/part-0` and write to the `processed.data` file.
### Start Server
Here, we [use docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/RUN_IN_DOCKER.md) to start the server-side service.
Here, we [use docker](RUN_IN_DOCKER.md) to start the server-side service.
First, start the BOW server, which enables the `8000` port:
``` shell
docker run -dit -v $PWD/imdb_bow_model:/model -p 8000:8000 --name bow-server hub.baidubce.com/paddlepaddle/serving:latest
docker exec -it bow-server bash
pip install paddle-serving-server
docker run -dit -v $PWD/imdb_bow_model:/model -p 8000:8000 --name bow-server registry.baidubce.com/paddlepaddle/serving:latest /bin/bash
docker exec -it bow-server /bin/bash
pip install paddle-serving-server -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install paddle-serving-client -i https://pypi.tuna.tsinghua.edu.cn/simple
python -m paddle_serving_server.serve --model model --port 8000 >std.log 2>err.log &
exit
```
......@@ -49,18 +47,25 @@ exit
Similarly, start the LSTM server, which enables the `9000` port:
```bash
docker run -dit -v $PWD/imdb_lstm_model:/model -p 9000:9000 --name lstm-server hub.baidubce.com/paddlepaddle/serving:latest
docker exec -it lstm-server bash
pip install paddle-serving-server
docker run -dit -v $PWD/imdb_lstm_model:/model -p 9000:9000 --name lstm-server registry.baidubce.com/paddlepaddle/serving:latest /bin/bash
docker exec -it lstm-server /bin/bash
pip install paddle-serving-server -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install paddle-serving-client -i https://pypi.tuna.tsinghua.edu.cn/simple
python -m paddle_serving_server.serve --model model --port 9000 >std.log 2>err.log &
exit
```
### Start Client
Run the following Python code on the host computer to start client. Make sure that the host computer is installed with the `paddle-serving-client` package.
To simulate the A/B test scenario, you can run the following Python code on the host to start the client; make sure the host has the relevant environment installed. You can also run it inside the docker environment.
[//file]:#ab_client.py
Before running, use `pip install paddle-serving-client` to install the paddle-serving-client package.
You can directly use the following command to run the A/B test prediction.
[python abtest_client.py](../python/examples/imdb/abtest_client.py)
[//file]:#abtest_client.py
``` python
from paddle_serving_client import Client
......@@ -91,7 +96,7 @@ In the code, the function `client.add_variant(tag, clusters, variant_weight)` is
When making prediction on the client side, if the parameter `need_variant_tag=True` is specified, the response will contain the variant tag corresponding to the distribution flow.
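A condensed sketch of this flow, distilled from `abtest_client.py` (the `word_ids` sample here is made up for illustration):

``` python
from paddle_serving_client import Client
import numpy as np

client = Client()
client.load_client_config('imdb_bow_client_conf/serving_client_conf.prototxt')
client.add_variant("bow", ["127.0.0.1:8000"], 10)   # tag, endpoints, weight
client.add_variant("lstm", ["127.0.0.1:9000"], 90)
client.connect()

word_ids = [8, 233, 52, 601]  # a made-up tokenized sample
feed = {"words": np.array(word_ids).reshape(len(word_ids), 1),
        "words.lod": [0, len(word_ids)]}
# need_variant_tag=True makes predict() also return the tag of the
# variant that served this request.
[fetch_map, tag] = client.predict(feed=feed, fetch=["prediction"],
                                  need_variant_tag=True, batch=True)
print(tag, fetch_map)
```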
### Expected Results
Due to different network conditions, the results of each prediction may be slightly different.
``` python
[lstm](total: 1867) acc: 0.490091055169
[bow](total: 217) acc: 0.73732718894
......
......@@ -16,31 +16,29 @@ sh get_data.sh
```
### Processing Data
Data processing requires some libraries; please install them with pip:
``` shell
pip install paddlepaddle
pip install paddle-serving-app
pip install Shapely
```
You can directly run the following command to process the data.
The following Python code processes the data in `test_data/part-0` and writes it to the `processed.data` file.
[python abtest_get_data.py](../python/examples/imdb/abtest_get_data.py)
```python
from paddle_serving_app.reader import IMDBDataset
imdb_dataset = IMDBDataset()
imdb_dataset.load_resource('imdb.vocab')

with open('test_data/part-0') as fin:
    with open('processed.data', 'w') as fout:
        for line in fin:
            word_ids, label = imdb_dataset.get_words_and_label(line)
            fout.write("{};{}\n".format(','.join([str(x) for x in word_ids]), label[0]))
```
The Python code in the file processes the data in `test_data/part-0` and writes the processed data to the `processed.data` file.
### Start the Server
Here we start the server-side service [in Docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/RUN_IN_DOCKER_CN.md).
Here we start the server-side service [in Docker](RUN_IN_DOCKER_CN.md).
First, start the BOW server, which listens on port `8000`:
```bash
docker run -dit -v $PWD/imdb_bow_model:/model -p 8000:8000 --name bow-server hub.baidubce.com/paddlepaddle/serving:latest
docker exec -it bow-server bash
docker run -dit -v $PWD/imdb_bow_model:/model -p 8000:8000 --name bow-server registry.baidubce.com/paddlepaddle/serving:latest /bin/bash
docker exec -it bow-server /bin/bash
pip install paddle-serving-server -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install paddle-serving-client -i https://pypi.tuna.tsinghua.edu.cn/simple
python -m paddle_serving_server.serve --model model --port 8000 >std.log 2>err.log &
exit
```
......@@ -48,19 +46,27 @@ exit
Similarly, start the LSTM server, which listens on port `9000`:
```bash
docker run -dit -v $PWD/imdb_lstm_model:/model -p 9000:9000 --name lstm-server hub.baidubce.com/paddlepaddle/serving:latest
docker exec -it lstm-server bash
docker run -dit -v $PWD/imdb_lstm_model:/model -p 9000:9000 --name lstm-server registry.baidubce.com/paddlepaddle/serving:latest /bin/bash
docker exec -it lstm-server /bin/bash
pip install paddle-serving-server -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install paddle-serving-client -i https://pypi.tuna.tsinghua.edu.cn/simple
python -m paddle_serving_server.serve --model model --port 9000 >std.log 2>err.log &
exit
```
### Start the Client
To simulate the A/B test scenario, you can run the following Python code on the host to start the client; make sure the host has the relevant environment installed. You can also run it inside the docker environment.
Before running, use `pip install paddle-serving-client` to install the paddle-serving-client package.
Run the following Python code on the host to start the client; make sure the host has the `paddle-serving-client` package installed.
You can directly use the following command to run the A/B test prediction.
[python abtest_client.py](../python/examples/imdb/abtest_client.py)
```python
from paddle_serving_client import Client
import numpy as np
client = Client()
client.load_client_config('imdb_bow_client_conf/serving_client_conf.prototxt')
......@@ -68,28 +74,32 @@ client.add_variant("bow", ["127.0.0.1:8000"], 10)
client.add_variant("lstm", ["127.0.0.1:9000"], 90)
client.connect()
print('please wait for about 10s')
with open('processed.data') as f:
    cnt = {"bow": {'acc': 0, 'total': 0}, "lstm": {'acc': 0, 'total': 0}}
    for line in f:
        word_ids, label = line.split(';')
        word_ids = [int(x) for x in word_ids.split(',')]
        word_len = len(word_ids)
        feed = {
            "words": np.array(word_ids).reshape(word_len, 1),
            "words.lod": [0, word_len]
        }
        fetch = ["acc", "cost", "prediction"]
        [fetch_map, tag] = client.predict(feed=feed, fetch=fetch, need_variant_tag=True, batch=True)
        if (float(fetch_map["prediction"][0][1]) - 0.5) * (float(label[0]) - 0.5) > 0:
            cnt[tag]['acc'] += 1
        cnt[tag]['total'] += 1

for tag, data in cnt.items():
    print('[{}](total: {}) acc: {}'.format(tag, data['total'], float(data['acc']) / float(data['total'])))
```
In the code, `client.add_variant(tag, clusters, variant_weight)` adds a variant with label `tag` and traffic weight `variant_weight`. In this example, a BOW variant with label `bow` and traffic weight `10` and an LSTM variant with label `lstm` and traffic weight `90` are added. Client traffic is distributed to the two variants at a `10:90` ratio.
When the client makes a prediction, if the parameter `need_variant_tag=True` is specified, the return value will include the variant tag corresponding to the distributed traffic.
### Expected Results
Due to different network conditions, the results of each prediction may be slightly different.
``` bash
[lstm](total: 1867) acc: 0.490091055169
[bow](total: 217) acc: 0.73732718894
......
# Paddle Serving Using Baidu Kunlun Chips
(English|[简体中文](./BAIDU_KUNLUN_XPU_SERVING_CN.md))
Paddle Serving supports deployment with Baidu Kunlun chips. At present, pilot support covers deployment on ARM servers with Baidu Kunlun chips
(such as Phytium FT-2000+/64). Deployment capability on other heterogeneous hardware servers will be improved in the future.
# Compilation and installation
Refer to the [compile](COMPILE.md) document to set up the compilation environment.
## Compilation
* Compile the Serving Server
```
cd Serving
mkdir -p server-build-arm && cd server-build-arm
cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
-DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
-DPYTHON_EXECUTABLE=/usr/bin/python \
-DWITH_PYTHON=ON \
-DWITH_LITE=ON \
-DWITH_XPU=ON \
-DSERVER=ON ..
make -j10
```
You can run `make install` to place the build output in the `./output` directory; add `-DCMAKE_INSTALL_PREFIX=./output` to the cmake command above to specify the output path.
* Compile the Serving Client
```
mkdir -p client-build-arm && cd client-build-arm
cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
-DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
-DPYTHON_EXECUTABLE=/usr/bin/python \
-DWITH_PYTHON=ON \
-DWITH_LITE=ON \
-DWITH_XPU=ON \
-DCLIENT=ON ..
make -j10
```
* Compile the App
```
cd Serving
mkdir -p app-build-arm && cd app-build-arm
cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
-DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
-DPYTHON_EXECUTABLE=/usr/bin/python \
-DWITH_PYTHON=ON \
-DWITH_LITE=ON \
-DWITH_XPU=ON \
-DAPP=ON ..
make -j10
```
## Install the wheel package
After the compilation stages above, the whl packages are generated in ```python/dist/``` under each build directory.
For example, after the server compilation step, the whl package is produced under the server-build-arm/python/dist directory, and you can run ```pip install -U python/dist/*.whl``` to install it.
# Request parameters description
To deploy the serving service on an ARM server with Baidu Kunlun XPU chips and use the acceleration capability of Paddle-Lite, please specify the following parameters during deployment.
|param|param description|about|
|:--|:--|:--|
|use_lite|use the Paddle-Lite engine|uses the inference capability of Paddle-Lite|
|use_xpu|use Baidu Kunlun for inference|must be used together with the use_lite option|
|ir_optim|enable graph optimization|refer to [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite)|
# Deployment examples
## Download the model
```
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
tar -xzf uci_housing.tar.gz
```
## Start RPC service
There are mainly three deployment methods:
* deploy on an ARM server with Baidu Kunlun XPU, using the acceleration capability of Paddle-Lite and XPU;
* deploy on an ARM server standalone with Paddle-Lite;
* deploy on an ARM server standalone without Paddle-Lite.

The first two deployment methods are recommended.
Start the RPC service, deploying on an ARM server with Baidu Kunlun chips, accelerated by Paddle-Lite and Baidu Kunlun XPU:
```
python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --use_xpu --ir_optim
```
Start the RPC service, deploying on an ARM server, accelerated by Paddle-Lite:
```
python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --ir_optim
```
Start the RPC service, deploying on an ARM server without Paddle-Lite:
```
python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292
```
## Client invocation
```
from paddle_serving_client import Client
import numpy as np
client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])
data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
-0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
fetch_map = client.predict(feed={"x": np.array(data).reshape(1,13,1)}, fetch=["price"])
print(fetch_map)
```
Some examples are provided below; other models can be adapted by referring to them.
|sample name|sample links|
|:-----|:--|
|fit_a_line|[fit_a_line_xpu](../python/examples/xpu/fit_a_line_xpu)|
|resnet|[resnet_v2_50_xpu](../python/examples/xpu/resnet_v2_50_xpu)|
# Paddle Serving deployment with Baidu Kunlun chips
(简体中文|[English](./BAIDU_KUNLUN_XPU_SERVING.md))
Paddle Serving supports inference deployment with Baidu Kunlun chips. At present, it experimentally supports deployment on Baidu Kunlun chips and ARM servers (such as Phytium FT-2000+/64); support for other heterogeneous hardware servers will be improved later.
# Compilation and installation
Refer to [this document](COMPILE_CN.md) for the basic environment configuration.
## Compilation
* Compile the server part
```
cd Serving
mkdir -p server-build-arm && cd server-build-arm
cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
-DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
-DPYTHON_EXECUTABLE=/usr/bin/python \
-DWITH_PYTHON=ON \
-DWITH_LITE=ON \
-DWITH_XPU=ON \
-DSERVER=ON ..
make -j10
```
You can run `make install` to place the build output in the `./output` directory; add the `-DCMAKE_INSTALL_PREFIX=./output` option in the cmake stage to specify the output path.
* Compile the client part
```
mkdir -p client-build-arm && cd client-build-arm
cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
-DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
-DPYTHON_EXECUTABLE=/usr/bin/python \
-DWITH_PYTHON=ON \
-DWITH_LITE=ON \
-DWITH_XPU=ON \
-DCLIENT=ON ..
make -j10
```
* Compile the app part
```
cd Serving
mkdir -p app-build-arm && cd app-build-arm
cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
-DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
-DPYTHON_EXECUTABLE=/usr/bin/python \
-DWITH_PYTHON=ON \
-DWITH_LITE=ON \
-DWITH_XPU=ON \
-DAPP=ON ..
make -j10
```
## Install the wheel packages
After the compilation steps above, whl packages are generated in $build_dir/python/dist under each build directory; install each of them. For example, the server step generates a whl package under server-build-arm/python/dist; install it with ```pip install -U xxx.whl```.
# Request parameters description
To deploy the service on ARM+XPU and use the Paddle-Lite acceleration capability, the following parameters are required in requests.
|Parameter|Description|Notes|
|:--|:--|:--|
|use_lite|use the Paddle-Lite engine|uses the Paddle-Lite cpu inference capability|
|use_xpu|use Baidu Kunlun for inference|must be used together with use_lite|
|ir_optim|enable Paddle-Lite subgraph optimization|see [Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite) for details|
# Deployment examples
## Download the model
```
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
tar -xzf uci_housing.tar.gz
```
## Start the RPC service
There are mainly three deployment configurations:
* deploy on ARM cpu+XPU, using the Paddle-Lite XPU acceleration capability;
* deploy on ARM cpu alone, using the Paddle-Lite acceleration capability;
* deploy on ARM cpu without Paddle-Lite acceleration.

The first two deployment methods are recommended.

Start the RPC service, deploying on ARM cpu+XPU with the Paddle-Lite XPU acceleration capability:
```
python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --use_xpu --ir_optim
```
Start the RPC service, deploying on ARM cpu with the Paddle-Lite acceleration capability:
```
python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --ir_optim
```
Start the RPC service, deploying on ARM cpu without Paddle-Lite acceleration:
```
python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292
```
## Client invocation
```
from paddle_serving_client import Client
import numpy as np
client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])
data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
-0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
fetch_map = client.predict(feed={"x": np.array(data).reshape(1,13,1)}, fetch=["price"])
print(fetch_map)
```
Some examples are provided below; other models can be adapted by referring to them.
|Example|Link|
|:-----|:--|
|fit_a_line|[fit_a_line_xpu](../python/examples/xpu/fit_a_line_xpu)|
|resnet|[resnet_v2_50_xpu](../python/examples/xpu/resnet_v2_50_xpu)|
......@@ -2,35 +2,57 @@
([简体中文](./BERT_10_MINS_CN.md)|English)
The goal of Bert-As-Service is to give a sentence, and the service can represent the sentence as a semantic vector and return it to the user. [Bert model](https://arxiv.org/abs/1810.04805) is a popular model in the current NLP field. It has achieved good results on a variety of public NLP tasks. The semantic vector calculated by the Bert model is used as input to other NLP models, which will also greatly improve the performance of the model. Bert-As-Service allows users to easily obtain the semantic vector representation of text and apply it to their own tasks. In order to achieve this goal, we have shown in four steps that using Paddle Serving can build such a service in ten minutes. All the code and files in the example can be found in [Example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert) of Paddle Serving.
The goal of Bert-As-Service is to give a sentence, and the service can represent the sentence as a semantic vector and return it to the user. [Bert model](https://arxiv.org/abs/1810.04805) is a popular model in the current NLP field. It has achieved good results on a variety of public NLP tasks. The semantic vector calculated by the Bert model is used as input to other NLP models, which will also greatly improve the performance of the model. Bert-As-Service allows users to easily obtain the semantic vector representation of text and apply it to their own tasks. In order to achieve this goal, we have shown in five steps that using Paddle Serving can build such a service in ten minutes. All the code and files in the example can be found in [Example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert) of Paddle Serving.
#### Step1: Save the serviceable model
Paddle Serving supports various models trained with Paddle, and saves a serviceable model by specifying the input and output variables of the model. For convenience, we can load a trained bert Chinese model from paddlehub and save a deployable service with two lines of code. The server and client configurations are placed in the `bert_seq20_model` and `bert_seq20_client` folders, respectively.

[//file]:#bert_10.py
``` python
import paddlehub as hub
model_name = "bert_chinese_L-12_H-768_A-12"
module = hub.Module(model_name)
inputs, outputs, program = module.context(
    trainable=True, max_seq_len=20)
feed_keys = ["input_ids", "position_ids", "segment_ids",
             "input_mask"]
fetch_keys = ["pooled_output", "sequence_output"]
feed_dict = dict(zip(feed_keys, [inputs[x] for x in feed_keys]))
fetch_dict = dict(zip(fetch_keys, [outputs[x] for x in fetch_keys]))

import paddle_serving_client.io as serving_io
serving_io.save_model("bert_seq20_model", "bert_seq20_client",
                      feed_dict, fetch_dict, program)
```

### Step1: Getting Model
If your python version is 3.X, replace the 'pip' field in the following commands with 'pip3', and replace 'python' with 'python3'.

#### method 1:
This example uses the [BERT Chinese Model](https://www.paddlepaddle.org.cn/hubdetail?name=bert_chinese_L-12_H-768_A-12&en_category=SemanticModel) from [PaddleHub](https://github.com/PaddlePaddle/PaddleHub).

Install paddlehub first:
```
pip install paddlehub
```
Then run
```
python prepare_model.py 128
```
**PaddleHub only supports Python 3.5+.**

The 128 in the command above is max_seq_len in the BERT model, i.e. the length of a sample after preprocessing.
The config file and model file for the server side are saved in the folder bert_seq128_model.
The config file generated for the client side is saved in the folder bert_seq128_client.

#### method 2:
You can also download the above model from BOS (max_seq_len=128). After decompression, the config file and model file for the server side are stored in the bert_chinese_L-12_H-768_A-12_model folder, and the config file generated for the client side is stored in the bert_chinese_L-12_H-768_A-12_client folder:
```shell
wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/bert_chinese_L-12_H-768_A-12.tar.gz
tar -xzf bert_chinese_L-12_H-768_A-12.tar.gz
mv bert_chinese_L-12_H-768_A-12_model bert_seq128_model
mv bert_chinese_L-12_H-768_A-12_client bert_seq128_client
```
#### Step2: Launch Service
[//file]:#server.sh
``` shell
python -m paddle_serving_server_gpu.serve --model bert_seq20_model --thread 10 --port 9292 --gpu_ids 0
```

### Step2: Getting Dict and Sample Dataset
```
sh get_data.sh
```
This script will download the Chinese dictionary file vocab.txt and the Chinese sample data data-c.txt.

### Step3: Launch Service
To start the cpu inference service, run
```
python -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292 #cpu inference service
```
Or, to start the gpu inference service, run
```
python -m paddle_serving_server_gpu.serve --model bert_seq128_model/ --port 9292 --gpu_ids 0 #launch gpu inference service at GPU 0
```
| Parameters | Meaning |
| ---------- | ---------------------------------------- |
......@@ -39,52 +61,55 @@ python -m paddle_serving_server_gpu.serve --model bert_seq20_model --thread 10 -
| port | server port number |
| gpu_ids | GPU index number |
#### Step3: data preprocessing logic on Client Side
### Step4: data preprocessing logic on Client Side
Paddle Serving has many built-in data preprocessing logics for classic tasks. For computing the Chinese Bert semantic representation, we use the ChineseBertReader class under paddle_serving_app for data preprocessing, so that developers can easily obtain the model input fields corresponding to a raw Chinese sentence.
Install paddle_serving_app
[//file]:#pip_app.sh
```shell
pip install paddle_serving_app
```
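As a small illustration of what the reader produces — a minimal sketch, assuming paddle_serving_app is installed and the vocab file from Step2 is available:

``` python
from paddle_serving_app.reader import ChineseBertReader

reader = ChineseBertReader({"max_seq_len": 128})
feed_dict = reader.process("你好,欢迎使用Paddle Serving")
# feed_dict maps each model input field (e.g. "input_ids",
# "position_ids", "segment_ids", "input_mask") to its values.
print(feed_dict.keys())
```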
#### Step4: Client Visit Serving
### Step5: Client Visit Serving
#### method 1: RPC Inference
Run
```
head data-c.txt | python bert_client.py --model bert_seq128_client/serving_client_conf.prototxt
```
The client reads data from data-c.txt and sends prediction requests; the predictions are word vectors (due to the massive data in the word vectors, we do not print them).

The script of the client side, bert_client.py, is as follows:
[//file]:#bert_client.py
``` python
import sys
from paddle_serving_client import Client
from paddle_serving_client.utils import benchmark_args
from paddle_serving_app.reader import ChineseBertReader
import numpy as np

args = benchmark_args()

reader = ChineseBertReader({"max_seq_len": 128})
fetch = ["pooled_output"]
endpoint_list = ['127.0.0.1:9292']
client = Client()
client.load_client_config(args.model)
client.connect(endpoint_list)

for line in sys.stdin:
    feed_dict = reader.process(line)
    for key in feed_dict.keys():
        feed_dict[key] = np.array(feed_dict[key]).reshape((128, 1))
    result = client.predict(feed=feed_dict, fetch=fetch, batch=False)
```
#### method 2: HTTP Inference
This method is divided into two steps:
1. Start an HTTP prediction server.

To start the cpu HTTP inference service, run
```
python bert_web_service.py bert_seq128_model/ 9292 #launch cpu inference service
```
Or, to start the gpu HTTP inference service, run
```
export CUDA_VISIBLE_DEVICES=0,1
```
This environment variable specifies which gpus are used; the command above means gpu 0 and gpu 1 are used.
```
python bert_web_service_gpu.py bert_seq128_model/ 9292 #launch gpu inference service
```

2. Prediction via HTTP request
```
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "hello"}], "fetch":["pooled_output"]}' http://127.0.0.1:9292/bert/prediction
```
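Equivalently, a hypothetical Python version of the same HTTP request using the `requests` package:

``` python
import requests

# Same request as the curl command above.
payload = {"feed": [{"words": "hello"}], "fetch": ["pooled_output"]}
resp = requests.post("http://127.0.0.1:9292/bert/prediction", json=payload)
print(resp.json())
```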
### Benchmark
......
......@@ -2,30 +2,56 @@
(简体中文|[English](./BERT_10_MINS.md))
Bert-As-Service aims to take a sentence and have the service return its semantic-vector representation to the user. The [Bert model](https://arxiv.org/abs/1810.04805) is a popular model in the current NLP field; it has achieved good results on a variety of public NLP tasks, and using the semantic vectors computed by Bert as input to other NLP models also greatly improves their performance. Bert-As-Service lets users conveniently obtain the semantic-vector representation of text and apply it to their own tasks. To achieve this goal, we show in four steps how such a service can be built with Paddle Serving in ten minutes. All the code and files of the example can be found in the [examples](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert) of Paddle Serving.
Bert-As-Service aims to take a sentence and have the service return its semantic-vector representation to the user. The [Bert model](https://arxiv.org/abs/1810.04805) is a popular model in the current NLP field; it has achieved good results on a variety of public NLP tasks, and using the semantic vectors computed by Bert as input to other NLP models also greatly improves their performance. Bert-As-Service lets users conveniently obtain the semantic-vector representation of text and apply it to their own tasks. To achieve this goal, we show in the following steps how such a service can be built with Paddle Serving in ten minutes. All the code and files of the example can be found in the [examples](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert) of Paddle Serving.
#### Step1: Save the serviceable model
Paddle Serving supports various models trained with Paddle, and saves a serviceable model by specifying the model's input and output variables. For convenience, we can load a trained bert Chinese model from paddlehub and save a deployable service with two lines of code; the server-side and client-side configurations are placed in the `bert_seq20_model` and `bert_seq20_client` folders respectively.
``` python
import paddlehub as hub
model_name = "bert_chinese_L-12_H-768_A-12"
module = hub.Module(model_name)
inputs, outputs, program = module.context(trainable=True, max_seq_len=20)
feed_keys = ["input_ids", "position_ids", "segment_ids", "input_mask"]
fetch_keys = ["pooled_output", "sequence_output"]
feed_dict = dict(zip(feed_keys, [inputs[x] for x in feed_keys]))
fetch_dict = dict(zip(fetch_keys, [outputs[x] for x in fetch_keys]))

import paddle_serving_client.io as serving_io
serving_io.save_model("bert_seq20_model", "bert_seq20_client", feed_dict, fetch_dict, program)
```

### Step1: Getting the model
If your python version is 3.X, replace 'pip' with 'pip3' and 'python' with 'python3' in the following commands.

#### Method 1:
This example uses the [BERT Chinese Model](https://www.paddlepaddle.org.cn/hubdetail?name=bert_chinese_L-12_H-768_A-12&en_category=SemanticModel) from [PaddleHub](https://github.com/PaddlePaddle/PaddleHub).

Install paddlehub first:
```
pip install paddlehub
```
Then run
```
python prepare_model.py 128
```
The parameter 128 is max_seq_len in the BERT model, i.e. the length of a sample after preprocessing.
The server-side config and model files are generated in the bert_seq128_model folder.
The client-side config file is generated in the bert_seq128_client folder.

#### Method 2:
You can also download the above model (max_seq_len=128) directly from BOS. After decompression, the server-side config and model files are stored in the bert_chinese_L-12_H-768_A-12_model folder, and the client-side config file is stored in the bert_chinese_L-12_H-768_A-12_client folder:
```shell
wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/bert_chinese_L-12_H-768_A-12.tar.gz
tar -xzf bert_chinese_L-12_H-768_A-12.tar.gz
mv bert_chinese_L-12_H-768_A-12_model bert_seq128_model
mv bert_chinese_L-12_H-768_A-12_client bert_seq128_client
```
#### Step2: Launch the service
``` shell
python -m paddle_serving_server_gpu.serve --model bert_seq20_model --port 9292 --gpu_ids 0
```

### Step2: Getting the dictionary and sample data
```
sh get_data.sh
```
The script downloads the Chinese dictionary vocab.txt and the Chinese sample data data-c.txt.

### Step3: Launching the service
To start the cpu inference service, run
```
python -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292 #start the cpu inference service
```
Or, to start the gpu inference service, run
```
python -m paddle_serving_server_gpu.serve --model bert_seq128_model/ --port 9292 --gpu_ids 0 #start the gpu inference service on gpu 0
```
| Parameter | Meaning |
......@@ -35,7 +61,8 @@ python -m paddle_serving_server_gpu.serve --model bert_seq20_model --port 9292 -
| port | server port number |
| gpu_ids | GPU index number |
#### Step3: Client-side data preprocessing logic
### Step4: Client-side data preprocessing logic

Paddle Serving has many built-in data preprocessing logics for classic tasks. For computing the Chinese Bert semantic representation, we use the ChineseBertReader class under paddle_serving_app for data preprocessing, so that developers can easily obtain the model input fields corresponding to a raw Chinese sentence.

......@@ -45,39 +72,40 @@ Paddle Serving has many built-in data preprocessing logics. For
pip install paddle_serving_app
```
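As a minimal sketch of the preprocessing step alone (assuming the vocab.txt downloaded by get_data.sh is in the working directory), ChineseBertReader turns a raw sentence into the feed fields the model expects:

```python
from paddle_serving_app.reader import ChineseBertReader

# Same construction as in bert_client.py below; max_seq_len matches the model
reader = ChineseBertReader({"max_seq_len": 128})
feed_dict = reader.process("你好,欢迎使用Paddle Serving")
# feed_dict holds the BERT input fields, e.g. input_ids / position_ids /
# segment_ids / input_mask, each of length max_seq_len
print(feed_dict.keys())
```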
### Step 5: Client access

#### Method 1: Prediction via RPC
Run:
```
head data-c.txt | python bert_client.py --model bert_seq128_client/serving_client_conf.prototxt
```
This starts a client that reads the data in data-c.txt and runs prediction. The result is the vector representation of each text (since there is a lot of data, the script does not print the output); the server address can be changed in the script.

The content of the client script bert_client.py is as follows:
``` python
import sys
from paddle_serving_client import Client
from paddle_serving_client.utils import benchmark_args
from paddle_serving_app.reader import ChineseBertReader
import numpy as np

args = benchmark_args()
reader = ChineseBertReader({"max_seq_len": 128})
fetch = ["pooled_output"]
endpoint_list = ['127.0.0.1:9292']
client = Client()
client.load_client_config(args.model)
client.connect(endpoint_list)

for line in sys.stdin:
    feed_dict = reader.process(line)
    for key in feed_dict.keys():
        feed_dict[key] = np.array(feed_dict[key]).reshape((128, 1))
    result = client.predict(feed=feed_dict, fetch=fetch, batch=False)
```

#### Method 2: Prediction via HTTP
This method takes two steps:

1. Start an HTTP prediction server.
To start the CPU HTTP prediction service, run:
```
python bert_web_service.py bert_seq128_model/ 9292 # start the CPU HTTP prediction service
```

Or, to start the GPU HTTP prediction service, run:
```
export CUDA_VISIBLE_DEVICES=0,1
```
The environment variable specifies which GPUs the prediction service uses; this example uses the two GPUs with indices 0 and 1.
```
python bert_web_service_gpu.py bert_seq128_model/ 9292 # start the GPU HTTP prediction service
```
2. Prediction via HTTP request:
```
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "hello"}], "fetch":["pooled_output"]}' http://127.0.0.1:9292/bert/prediction
```
### Benchmark
......
......@@ -6,12 +6,11 @@
| module | version |
| :--------------------------: | :-------------------------------: |
| OS | Ubuntu16 and 18/CentOS 7 |
| gcc | 4.8.5(Cuda 9.0 and 10.0) and 8.2(Others) |
| gcc-c++ | 4.8.5(Cuda 9.0 and 10.0) and 8.2(Others) |
| cmake | 3.2.0 and later |
| Python | 2.7.2 and later / 3.5.1 and later |
| Go | 1.9.2 and later |
| git | 2.17.1 and later |
| glibc-static | 2.17 |
......@@ -19,19 +18,13 @@
| bzip2-devel | 1.0.6 and later |
| python-devel / python3-devel | 2.7.5 and later / 3.6.8 and later |
| sqlite-devel | 3.7.17 and later |
| patchelf | 0.9 |
| libXext | 1.3.3 |
| libSM | 1.2.2 |
| libXrender | 0.9.10 |
It is recommended to use Docker for compilation. We have prepared the Paddle Serving compilation environment for you, see [this document](DOCKER_IMAGES.md).
## Get Code
``` shell
......@@ -39,19 +32,47 @@ git clone https://github.com/PaddlePaddle/Serving
cd Serving && git submodule update --init --recursive
```
## PYTHONROOT Setting
```shell
# For example, if the path of python is /usr/bin/python, you can set PYTHONROOT
export PYTHONROOT=/usr
```
If you are using a Docker development image, determine the Python version you want to compile for and set the corresponding environment variables as follows:
```
#Python 2.7
export PYTHONROOT=/usr/local/python2.7.15/
export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/
export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so
export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python2.7
#Python 3.5
export PYTHONROOT=/usr/local/python3.5.1
export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python3.5m
export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython3.5m.so
export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python3.5
#Python3.6
export PYTHONROOT=/usr/local/
export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python3.6m
export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython3.6m.so
export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python3.6
#Python3.7
export PYTHONROOT=/usr/local/
export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python3.7m
export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython3.7m.so
export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python3.7
#Python3.8
export PYTHONROOT=/usr/local/
export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python3.8
export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython3.8.so
export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python3.8
```
## Install Python dependencies
......@@ -59,14 +80,11 @@ In the default centos7 image we provide, the Python path is `/usr/bin/python`. I
pip install -r python/requirements.txt
```
If you use another Python version, use the matching `pip` accordingly.
## GOPATH Setting
The default GOPATH is set to `$HOME/go`; you can also set it to another value. **If you are using the Docker environment provided by Serving, you do not need to set it.**
```shell
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin
......@@ -100,52 +118,42 @@ make -j10
You can execute `make install` to put the targets under the `./output` directory; add `-DCMAKE_INSTALL_PREFIX=./output` to the cmake command shown above to specify the output path.
### Integrated GPU version paddle inference library
Compared with the CPU environment, the GPU environment additionally requires the cmake variables listed in the following table.

**Note that the table is a reference for non-Docker compilation environments; the Docker compilation environment has already been configured with these parameters, so they do not need to be specified during cmake.**
| cmake environment variable | meaning | GPU environment considerations | needed in Docker environment |
|----------------------------|---------|--------------------------------|------------------------------|
| CUDA_TOOLKIT_ROOT_DIR | cuda installation path, usually /usr/local/cuda | required in all environments | no (/usr/local/cuda) |
| CUDNN_LIBRARY | the directory where libcudnn.so.* is located, usually /usr/local/cuda/lib64/ | required in all environments | no (/usr/local/cuda/lib64/) |
| CUDA_CUDART_LIBRARY | the directory where libcudart.so.* is located, usually /usr/local/cuda/lib64/ | required in all environments | no (/usr/local/cuda/lib64/) |
| TENSORRT_ROOT | the parent directory of the directory where libnvinfer.so.* is located, depends on the TensorRT installation | not needed for Cuda 9.0/10.0, needed otherwise | no (/usr) |
If you are not in a Docker environment, you can refer to the following commands; the specific paths depend on your environment, and the code is for reference only.
``` shell
export CUDA_PATH='/usr/local/cuda'
export CUDNN_LIBRARY='/usr/local/cuda/lib64/'
export CUDA_CUDART_LIBRARY="/usr/local/cuda/lib64/"
export TENSORRT_LIBRARY_PATH="/usr/local/TensorRT-6.0.1.5/targets/x86_64-linux-gnu/"
mkdir server-build-gpu && cd server-build-gpu
cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR \
-DPYTHON_LIBRARIES=$PYTHON_LIBRARIES \
-DPYTHON_EXECUTABLE=$PYTHON_EXECUTABLE \
-DCUDA_TOOLKIT_ROOT_DIR=${CUDA_PATH} \
-DCUDNN_LIBRARY=${CUDNN_LIBRARY} \
-DCUDA_CUDART_LIBRARY=${CUDA_CUDART_LIBRARY} \
    -DTENSORRT_ROOT=${TENSORRT_LIBRARY_PATH} \
-DSERVER=ON \
-DWITH_GPU=ON ..
make -j10
```
Execute `make install` to put the target output in the `./output` directory.

**Note:** After the compilation is successful, you need to set the `SERVING_BIN` path; see the following [Notes](COMPILE.md#Notes) for details.
## Compile Client
......@@ -208,7 +216,6 @@ Please use the example under `python/examples` to verify.
| CLIENT | Compile Paddle Serving Client | OFF |
| SERVER | Compile Paddle Serving Server | OFF |
| APP | Compile Paddle Serving App package | OFF |
| PACK | Compile for whl | OFF |
### WITH_GPU Option
......@@ -230,11 +237,13 @@ Note here:
The following shows the CUDA/CuDNN/TensorRT version matching of the Paddle Serving images, for reference:
| | CUDA | CuDNN | TensorRT |
| :----: | :-----: | :----------: | :----: |
| post9 | 9.0 | CuDNN 7.6.4 | |
| post10 | 10.0 | CuDNN 7.6.5 | |
| post101 | 10.1 | CuDNN 7.6.5 | 6.0.1 |
| post102 | 10.2 | CuDNN 8.0.5 | 7.1.3 |
| post11 | 11.0 | CuDNN 8.0.4 | 7.1.3 |
### How to make the compiler detect the CuDNN library
......
......@@ -6,12 +6,11 @@
| Component | Version requirement |
| :--------------------------: | :-------------------------------: |
| OS | Ubuntu16 and 18/CentOS 7 |
| gcc | 4.8.5(Cuda 9.0 and 10.0) and 8.2(Others) |
| gcc-c++ | 4.8.5(Cuda 9.0 and 10.0) and 8.2(Others) |
| cmake | 3.2.0 and later |
| Python | 2.7.2 and later / 3.5.1 and later |
| Go | 1.9.2 and later |
| git | 2.17.1 and later |
| glibc-static | 2.17 |
......@@ -24,13 +23,7 @@
| libSM | 1.2.2 |
| libXrender | 0.9.10 |
Docker is recommended for compilation. We have prepared the Paddle Serving compilation environment for you, with the above compilation dependencies already configured; see [this document](DOCKER_IMAGES_CN.md).
## Get Code
......@@ -39,19 +32,46 @@ git clone https://github.com/PaddlePaddle/Serving
cd Serving && git submodule update --init --recursive
```
## PYTHONROOT Setting
```shell
# For example, if the path of python is /usr/bin/python, you can set PYTHONROOT
export PYTHONROOT=/usr
```
If you are using a Docker development image, determine the Python version you want to compile for and set the corresponding environment variables as follows:
```
#Python 2.7
export PYTHONROOT=/usr/local/python2.7.15/
export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/
export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so
export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python2.7
#Python 3.5
export PYTHONROOT=/usr/local/python3.5.1
export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python3.5m
export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython3.5m.so
export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python3.5
#Python3.6
export PYTHONROOT=/usr/local/
export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python3.6m
export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython3.6m.so
export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python3.6
#Python3.7
export PYTHONROOT=/usr/local/
export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python3.7m
export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython3.7m.so
export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python3.7
#Python3.8
export PYTHONROOT=/usr/local/
export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python3.8
export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython3.8.so
export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python3.8
```
## Install Python dependencies
......@@ -59,11 +79,11 @@ export PYTHONROOT=/usr/
pip install -r python/requirements.txt
```
If you use another Python version, use the matching `pip` accordingly.
## GOPATH Setting
The default GOPATH is set to `$HOME/go`; you can also set it to another value. **If you are using the Docker environment provided by Serving, you do not need to set it.**
```shell
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin
......@@ -87,9 +107,9 @@ go get -u google.golang.org/grpc@v1.33.0
``` shell
mkdir server-build-cpu && cd server-build-cpu
cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR \
-DPYTHON_LIBRARIES=$PYTHON_LIBRARIES \
-DPYTHON_EXECUTABLE=$PYTHON_EXECUTABLE \
-DSERVER=ON ..
make -j10
```
......@@ -97,44 +117,35 @@ make -j10
You can execute `make install` to put the build output in the `./output` directory; add the `-DCMAKE_INSTALL_PREFIX=./output` option at the cmake stage to specify the output path.
### Integrated GPU version Paddle Inference Library
Compared with the CPU environment, the GPU environment additionally requires the cmake variables listed in the following table.

**Note that the table is a reference for non-Docker compilation environments; the Docker compilation environment has already been configured with these parameters, so they do not need to be specified during cmake.**
| cmake environment variable | meaning | GPU environment considerations | needed in Docker environment |
|----------------------------|---------|--------------------------------|------------------------------|
| CUDA_TOOLKIT_ROOT_DIR | cuda installation path, usually /usr/local/cuda | required in all environments | no (/usr/local/cuda) |
| CUDNN_LIBRARY | the directory where libcudnn.so.* is located, usually /usr/local/cuda/lib64/ | required in all environments | no (/usr/local/cuda/lib64/) |
| CUDA_CUDART_LIBRARY | the directory where libcudart.so.* is located, usually /usr/local/cuda/lib64/ | required in all environments | no (/usr/local/cuda/lib64/) |
| TENSORRT_ROOT | the parent directory of the directory where libnvinfer.so.* is located, depends on the TensorRT installation | not needed for Cuda 9.0/10.0, needed otherwise | no (/usr) |
If you are not in a Docker environment, you can refer to the following commands; the specific paths depend on your environment, and the code is for reference only.
``` shell
export CUDA_PATH='/usr/local/cuda'
export CUDNN_LIBRARY='/usr/local/cuda/lib64/'
export CUDA_CUDART_LIBRARY="/usr/local/cuda/lib64/"
export TENSORRT_LIBRARY_PATH="/usr/local/TensorRT-6.0.1.5/targets/x86_64-linux-gnu/"
mkdir server-build-gpu && cd server-build-gpu
cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR \
-DPYTHON_LIBRARIES=$PYTHON_LIBRARIES \
-DPYTHON_EXECUTABLE=$PYTHON_EXECUTABLE \
-DCUDA_TOOLKIT_ROOT_DIR=${CUDA_PATH} \
-DCUDNN_LIBRARY=${CUDNN_LIBRARY} \
-DCUDA_CUDART_LIBRARY=${CUDA_CUDART_LIBRARY} \
    -DTENSORRT_ROOT=${TENSORRT_LIBRARY_PATH} \
-DSERVER=ON \
-DWITH_GPU=ON ..
make -j10
```
......@@ -146,9 +157,9 @@ make -j10
``` shell
mkdir client-build && cd client-build
cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR \
-DPYTHON_LIBRARIES=$PYTHON_LIBRARIES \
-DPYTHON_EXECUTABLE=$PYTHON_EXECUTABLE \
-DCLIENT=ON ..
make -j10
```
......@@ -161,10 +172,9 @@ make -j10
```bash
mkdir app-build && cd app-build
cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR \
-DPYTHON_LIBRARIES=$PYTHON_LIBRARIES \
-DPYTHON_EXECUTABLE=$PYTHON_EXECUTABLE \
-DAPP=ON ..
make
```
......@@ -183,7 +193,7 @@ make
When running the Python-side server, the `SERVING_BIN` environment variable is checked. If you want to use a binary you compiled yourself, set this environment variable to the path of that binary, usually `export SERVING_BIN=${BUILD_DIR}/core/general-server/serving`.
Here BUILD_DIR is the absolute path of server-build-cpu or server-build-gpu.
You can `cd` into server-build-cpu and run `export SERVING_BIN=${PWD}/core/general-server/serving`.
......@@ -207,7 +217,6 @@ make
| CLIENT | Compile Paddle Serving Client | OFF |
| SERVER | Compile Paddle Serving Server | OFF |
| APP | Compile Paddle Serving App package | OFF |
| PACK | Compile for whl | OFF |
### WITH_GPU Option
......@@ -226,13 +235,15 @@ Paddle Serving通过PaddlePaddle预测库支持在GPU上做预测。WITH_GPU选
1. The CUDA/CUDNN and other base library versions installed on the system where Serving is compiled must be compatible with the actual GPU device. For example, a Tesla V100 card requires at least CUDA 9.0. If the CUDA version used at compile time is too low, the generated GPU code is incompatible with the actual hardware, which prevents the Serving process from starting or causes severe problems such as coredumps.
2. The system running Paddle Serving must have a CUDA driver compatible with the actual GPU device, and base libraries compatible with the CUDA/CuDNN versions used at compile time. If the CUDA/CuDNN versions installed on the system running Paddle Serving are lower than those used at compile time, strange failures of cuda function calls may occur.

The following shows the CUDA/CuDNN/TensorRT matching of the Paddle Serving images, for reference:
| | CUDA | CuDNN | TensorRT |
| :----: | :-----: | :----------: | :----: |
| post9 | 9.0 | CuDNN 7.6.4 | |
| post10 | 10.0 | CuDNN 7.6.5 | |
| post101 | 10.1 | CuDNN 7.6.5 | 6.0.1 |
| post102 | 10.2 | CuDNN 8.0.5 | 7.1.3 |
| post11 | 11.0 | CuDNN 8.0.4 | 7.1.3 |
### How to make the Paddle Serving build system detect the CuDNN library
......
......@@ -8,11 +8,10 @@ This document maintains a list of docker images provided by Paddle Serving.
You can get images in two ways:
1. Pull the image directly from `registry.baidubce.com` through its TAG:
```shell
docker pull registry.baidubce.com/paddlepaddle/serving:<TAG> # registry.baidubce.com
```
2. Building image based on dockerfile
......@@ -20,7 +19,7 @@ You can get images in two ways:
Create a new folder and copy Dockerfile to this folder, and run the following command:
```shell
docker build -f ${DOCKERFILE} -t <image-name>:<images-tag> .
```
......@@ -38,15 +37,65 @@ If you want to customize your Serving based on source code, use the version with
| GPU (cuda9.0-cudnn7) development | CentOS7 | latest-cuda9.0-cudnn7-devel | [Dockerfile.cuda9.0-cudnn7.devel](../tools/Dockerfile.cuda9.0-cudnn7.devel) |
| GPU (cuda10.0-cudnn7) runtime | CentOS7 | latest-cuda10.0-cudnn7 | [Dockerfile.cuda10.0-cudnn7](../tools/Dockerfile.cuda10.0-cudnn7) |
| GPU (cuda10.0-cudnn7) development | CentOS7 | latest-cuda10.0-cudnn7-devel | [Dockerfile.cuda10.0-cudnn7.devel](../tools/Dockerfile.cuda10.0-cudnn7.devel) |
| GPU (cuda10.1-cudnn7-tensorRT6) runtime | Ubuntu16 | latest-cuda10.1-cudnn7 | [Dockerfile.cuda10.1-cudnn7](../tools/Dockerfile.cuda10.1-cudnn7) |
| GPU (cuda10.1-cudnn7-tensorRT6) development | Ubuntu16 | latest-cuda10.1-cudnn7-devel | [Dockerfile.cuda10.1-cudnn7.devel](../tools/Dockerfile.cuda10.1-cudnn7.devel) |
| GPU (cuda10.2-cudnn8-tensorRT7) runtime | Ubuntu16| latest-cuda10.2-cudnn8 | [Dockerfile.cuda10.2-cudnn8](../tools/Dockerfile.cuda10.2-cudnn8) |
| GPU (cuda10.2-cudnn8-tensorRT7) development | Ubuntu16 | latest-cuda10.2-cudnn8-devel | [Dockerfile.cuda10.2-cudnn8.devel](../tools/Dockerfile.cuda10.2-cudnn8.devel) |
| GPU (cuda11-cudnn8-tensorRT7) runtime | Ubuntu18| latest-cuda11-cudnn8 | [Dockerfile.cuda11-cudnn8](../tools/Dockerfile.cuda11-cudnn8) |
| GPU (cuda11-cudnn8-tensorRT7) development | Ubuntu18 | latest-cuda11-cudnn8-devel | [Dockerfile.cuda11-cudnn8.devel](../tools/Dockerfile.cuda11-cudnn8.devel) |
**Java Client:**
```
registry.baidubce.com/paddlepaddle/serving:latest-java
```
**XPU:**
```
registry.baidubce.com/paddlepaddle/serving:xpu-beta
```
## Requirements for running CUDA containers
Running a CUDA container requires a machine with at least one CUDA-capable GPU and a driver compatible with the CUDA toolkit version you are using.
The machine running the CUDA container **only requires the NVIDIA driver**; the CUDA toolkit does not have to be installed.
For the relationship between CUDA toolkit version, Driver version and GPU architecture, please refer to [nvidia-docker wiki](https://github.com/NVIDIA/nvidia-docker/wiki/CUDA).
# (Attachment) The List of All the Docker images
Develop Images:
| Env | Version | Docker images tag | OS | Gcc Version |
|----------|---------|------------------------------|-----------|-------------|
| CPU | 0.5.0 | 0.5.0-devel | Ubuntu 16 | 8.2.0 |
| | <=0.4.0 | 0.4.0-devel | CentOS 7 | 4.8.5 |
| Cuda9.0 | 0.5.0 | 0.5.0-cuda9.0-cudnn7-devel | Ubuntu 16 | 4.8.5 |
| | <=0.4.0 | 0.4.0-cuda9.0-cudnn7-devel | CentOS 7 | 4.8.5 |
| Cuda10.0 | 0.5.0 | 0.5.0-cuda10.0-cudnn7-devel | Ubuntu 16 | 4.8.5 |
| | <=0.4.0 | 0.4.0-cuda10.0-cudnn7-devel | CentOS 7 | 4.8.5 |
| Cuda10.1 | 0.5.0 | 0.5.0-cuda10.1-cudnn7-devel | Ubuntu 16 | 8.2.0 |
| | <=0.4.0 | 0.4.0-cuda10.1-cudnn7-devel | CentOS 7 | 4.8.5 |
| Cuda10.2 | 0.5.0 | 0.5.0-cuda10.2-cudnn8-devel | Ubuntu 16 | 8.2.0 |
| | <=0.4.0 | Nan | Nan | Nan |
| Cuda11.0 | 0.5.0 | 0.5.0-cuda11.0-cudnn8-devel | Ubuntu 18 | 8.2.0 |
| | <=0.4.0 | Nan | Nan | Nan |
Running Images:
| Env | Version | Docker images tag | OS | Gcc Version |
|----------|---------|-----------------------|-----------|-------------|
| CPU | 0.5.0 | 0.5.0 | Ubuntu 16 | 8.2.0 |
| | <=0.4.0 | 0.4.0 | CentOS 7 | 4.8.5 |
| Cuda9.0 | 0.5.0 | 0.5.0-cuda9.0-cudnn7 | Ubuntu 16 | 4.8.5 |
| | <=0.4.0 | 0.4.0-cuda9.0-cudnn7 | CentOS 7 | 4.8.5 |
| Cuda10.0 | 0.5.0 | 0.5.0-cuda10.0-cudnn7 | Ubuntu 16 | 4.8.5 |
| | <=0.4.0 | 0.4.0-cuda10.0-cudnn7 | CentOS 7 | 4.8.5 |
| Cuda10.1 | 0.5.0 | 0.5.0-cuda10.1-cudnn7 | Ubuntu 16 | 8.2.0 |
| | <=0.4.0 | 0.4.0-cuda10.1-cudnn7 | CentOS 7 | 4.8.5 |
| Cuda10.2 | 0.5.0 | 0.5.0-cuda10.2-cudnn8 | Ubuntu 16 | 8.2.0 |
| | <=0.4.0 | Nan | Nan | Nan |
| Cuda11.0 | 0.5.0 | 0.5.0-cuda11.0-cudnn8 | Ubuntu 18 | 8.2.0 |
| | <=0.4.0 | Nan | Nan | Nan |
**Tips:** If you want to use the CPU server and the GPU server (version >= 0.5.0) at the same time, check the gcc version: only the Cuda10.1/10.2/11 builds can run together with the CPU server, because they share the same gcc version (8.2).
......@@ -8,11 +8,10 @@
You can get the images in two ways:
1. Pull the image directly from `registry.baidubce.com` through its TAG; for the specific TAG, see the table in the **Image description** section below:
```shell
docker pull registry.baidubce.com/paddlepaddle/serving:<TAG> # registry.baidubce.com
```
2. Building image based on Dockerfile
......@@ -20,7 +19,8 @@
Create a new directory, copy the corresponding Dockerfile into it, and run:
```shell
cd tools
docker build -f ${DOCKERFILE} -t <image-name>:<images-tag> .
```
......@@ -29,18 +29,33 @@
Runtime images cannot be used for development or compilation. If you need to develop and compile based on the source code, use the versions with the -devel suffix.
**In the TAG column, latest can be replaced with a specific version number such as 0.5.0/0.4.1. Note, however, that some development environments were only added in a later version, so not every environment is available for every version number.**
| Image description | OS | TAG | Dockerfile |
| :----------------------------------------------------------: | :-----: | :--------------------------: | :----------------------------------------------------------: |
| CPU runtime image | CentOS7 | latest | [Dockerfile](../tools/Dockerfile) |
| CPU development image | CentOS7 | latest-devel | [Dockerfile.devel](../tools/Dockerfile.devel) |
| GPU (cuda9.0-cudnn7) runtime image | CentOS7 | latest-cuda9.0-cudnn7 | [Dockerfile.cuda9.0-cudnn7](../tools/Dockerfile.cuda9.0-cudnn7) |
| GPU (cuda9.0-cudnn7) development image | CentOS7 | latest-cuda9.0-cudnn7-devel | [Dockerfile.cuda9.0-cudnn7.devel](../tools/Dockerfile.cuda9.0-cudnn7.devel) |
| GPU (cuda10.0-cudnn7) runtime image | CentOS7 | latest-cuda10.0-cudnn7 | [Dockerfile.cuda10.0-cudnn7](../tools/Dockerfile.cuda10.0-cudnn7) |
| GPU (cuda10.0-cudnn7) development image | CentOS7 | latest-cuda10.0-cudnn7-devel | [Dockerfile.cuda10.0-cudnn7.devel](../tools/Dockerfile.cuda10.0-cudnn7.devel) |
| GPU (cuda10.1-cudnn7-tensorRT6) runtime image | Ubuntu16 | latest-cuda10.1-cudnn7 | [Dockerfile.cuda10.1-cudnn7](../tools/Dockerfile.cuda10.1-cudnn7) |
| GPU (cuda10.1-cudnn7-tensorRT6) development image | Ubuntu16 | latest-cuda10.1-cudnn7-devel | [Dockerfile.cuda10.1-cudnn7.devel](../tools/Dockerfile.cuda10.1-cudnn7.devel) |
| GPU (cuda10.2-cudnn8-tensorRT7) runtime image | Ubuntu16 | latest-cuda10.2-cudnn8 | [Dockerfile.cuda10.2-cudnn8](../tools/Dockerfile.cuda10.2-cudnn8) |
| GPU (cuda10.2-cudnn8-tensorRT7) development image | Ubuntu16 | latest-cuda10.2-cudnn8-devel | [Dockerfile.cuda10.2-cudnn8.devel](../tools/Dockerfile.cuda10.2-cudnn8.devel) |
| GPU (cuda11-cudnn8-tensorRT7) runtime image | Ubuntu18 | latest-cuda11-cudnn8 | [Dockerfile.cuda11-cudnn8](../tools/Dockerfile.cuda11-cudnn8) |
| GPU (cuda11-cudnn8-tensorRT7) development image | Ubuntu18 | latest-cuda11-cudnn8-devel | [Dockerfile.cuda11-cudnn8.devel](../tools/Dockerfile.cuda11-cudnn8.devel) |
**Java image:**
```
registry.baidubce.com/paddlepaddle/serving:latest-java
```
**XPU image:**
```
registry.baidubce.com/paddlepaddle/serving:xpu-beta
```
## Requirements for running CUDA containers
......@@ -50,3 +65,41 @@
The machine running a CUDA container **only requires the corresponding NVIDIA driver**; the CUDA toolkit is not necessary.

For the relationship between CUDA toolkit versions, driver versions and GPU architectures, see the [nvidia-docker wiki](https://github.com/NVIDIA/nvidia-docker/wiki/CUDA).
# (Appendix) The List of All the Docker images

Development images:
| Env | Version | Docker images tag | OS | Gcc Version |
|----------|---------|------------------------------|-----------|-------------|
| CPU | 0.5.0 | 0.5.0-devel | Ubuntu 16 | 8.2.0 |
| | <=0.4.0 | 0.4.0-devel | CentOS 7 | 4.8.5 |
| Cuda9.0 | 0.5.0 | 0.5.0-cuda9.0-cudnn7-devel | Ubuntu 16 | 4.8.5 |
| | <=0.4.0 | 0.4.0-cuda9.0-cudnn7-devel | CentOS 7 | 4.8.5 |
| Cuda10.0 | 0.5.0 | 0.5.0-cuda10.0-cudnn7-devel | Ubuntu 16 | 4.8.5 |
| | <=0.4.0 | 0.4.0-cuda10.0-cudnn7-devel | CentOS 7 | 4.8.5 |
| Cuda10.1 | 0.5.0 | 0.5.0-cuda10.1-cudnn7-devel | Ubuntu 16 | 8.2.0 |
| | <=0.4.0 | 0.4.0-cuda10.1-cudnn7-devel | CentOS 7 | 4.8.5 |
| Cuda10.2 | 0.5.0 | 0.5.0-cuda10.2-cudnn8-devel | Ubuntu 16 | 8.2.0 |
| | <=0.4.0 | Nan | Nan | Nan |
| Cuda11.0 | 0.5.0 | 0.5.0-cuda11.0-cudnn8-devel | Ubuntu 18 | 8.2.0 |
| | <=0.4.0 | Nan | Nan | Nan |
Runtime images:
| Env | Version | Docker images tag | OS | Gcc Version |
|----------|---------|-----------------------|-----------|-------------|
| CPU | 0.5.0 | 0.5.0 | Ubuntu 16 | 8.2.0 |
| | <=0.4.0 | 0.4.0 | CentOS 7 | 4.8.5 |
| Cuda9.0 | 0.5.0 | 0.5.0-cuda9.0-cudnn7 | Ubuntu 16 | 4.8.5 |
| | <=0.4.0 | 0.4.0-cuda9.0-cudnn7 | CentOS 7 | 4.8.5 |
| Cuda10.0 | 0.5.0 | 0.5.0-cuda10.0-cudnn7 | Ubuntu 16 | 4.8.5 |
| | <=0.4.0 | 0.4.0-cuda10.0-cudnn7 | CentOS 7 | 4.8.5 |
| Cuda10.1 | 0.5.0 | 0.5.0-cuda10.1-cudnn7 | Ubuntu 16 | 8.2.0 |
| | <=0.4.0 | 0.4.0-cuda10.1-cudnn7 | CentOS 7 | 4.8.5 |
| Cuda10.2 | 0.5.0 | 0.5.0-cuda10.2-cudnn8 | Ubuntu 16 | 8.2.0 |
| | <=0.4.0 | Nan | Nan | Nan |
| Cuda11.0 | 0.5.0 | 0.5.0-cuda11.0-cudnn8 | Ubuntu 18 | 8.2.0 |
| | <=0.4.0 | Nan | Nan | Nan |
**Note:** If you need to run both the CPU server and the GPU server in one container with version 0.5.0 or later, choose the Cuda10.1/10.2/11 images, because they share the same gcc version as the CPU environment.
# MODEL ENCRYPTION INFERENCE
([简体中文](ENCRYPTION_CN.md)|English)
Paddle Serving supports inference with encrypted models; this document shows the details.
## Principle
We use a symmetric encryption algorithm to encrypt the model. A symmetric algorithm uses the same key for encryption and decryption; it is computationally cheap, fast, and the most commonly used kind of encryption.
### Getting an Encrypted Model
A normal model and its parameters can be viewed as byte strings; applying the encryption algorithm to them (with your key as the parameter) turns them into an encrypted model and parameters.
We provide a simple demo to encrypt the model. See the [python/examples/encryption/encrypt.py](../python/examples/encryption/encrypt.py)
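The demo script implements the actual encryption. Purely as an illustration of the symmetric-encryption idea (this is not Paddle Serving's implementation), a model file could be encrypted with the third-party `cryptography` package like this:

```python
from cryptography.fernet import Fernet  # pip install cryptography

key = Fernet.generate_key()  # the symmetric key; keep it secret
cipher = Fernet(key)

# A model file is just bytes; encrypting it yields the "encrypted model"
with open("__model__", "rb") as f:
    encrypted = cipher.encrypt(f.read())
with open("encrypted__model__", "wb") as f:
    f.write(encrypted)

# Decryption uses the same key
with open("__model__", "rb") as f:
    assert cipher.decrypt(encrypted) == f.read()
```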
### Start Encryption Service
Suppose you already have an encrypted model (in the `encrypt_server/` directory); you can start the encrypted model service by adding the additional command line parameter `--use_encryption_model`.
CPU Service
```
python -m paddle_serving_server.serve --model encrypt_server/ --port 9300 --use_encryption_model
```
GPU Service
```
python -m paddle_serving_server_gpu.serve --model encrypt_server/ --port 9300 --use_encryption_model --gpu_ids 0
```
At this point, the server does not actually start; it waits for the key.
### Client Encryption Inference
First of all, you must have the key that was used to encrypt the model.

Then you can configure your client with the key; when you connect to the server, the key is sent to the server, which keeps it.

Once the server gets the key, it uses it to parse the model and starts the model prediction service.
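A sketch of the client side, following the referenced example directory (method names such as `use_key` and the `encryption` flag of `connect` are taken from that example and may differ between versions; the feed/fetch names follow the fit_a_line demo):

```python
import numpy as np
from paddle_serving_client import Client

client = Client()
client.load_client_config("encrypt_client/serving_client_conf.prototxt")
client.use_key("./key")  # the key used when the model was encrypted
client.connect(["127.0.0.1:9300"], encryption=True)

x = np.random.rand(1, 13).astype("float32")  # placeholder input
fetch_map = client.predict(feed={"x": x}, fetch=["price"])
print(fetch_map)
```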
### Example of Model Encryption Inference
For an example of encrypted model inference, see [`/python/examples/encryption/`](../python/examples/encryption/).
# Encrypted Model Inference

(Simplified Chinese | [English](ENCRYPTION.md))

Paddle Serving supports inference with encrypted models; this document shows the details.
## Principle

We use a symmetric encryption algorithm to encrypt the model. A symmetric algorithm uses the same key for encryption and decryption; it is computationally cheap, fast, and the most commonly used kind of encryption.

### Getting an Encrypted Model

A normal model and its parameters can be viewed as byte strings; applying the encryption algorithm to them (with your key as the parameter) turns them into an encrypted model and parameters.

We provide a simple demo to encrypt the model; see [`python/examples/encryption/encrypt.py`](../python/examples/encryption/encrypt.py).
### Start the Encryption Service

Suppose you already have an encrypted model (in the `encrypt_server/` directory); you can start the encrypted model service by adding the additional command line parameter `--use_encryption_model`.
CPU Service
```
python -m paddle_serving_server.serve --model encrypt_server/ --port 9300 --use_encryption_model
```
GPU Service
```
python -m paddle_serving_server_gpu.serve --model encrypt_server/ --port 9300 --use_encryption_model --gpu_ids 0
```
At this point, the server does not actually start; it waits for the key.
### Client Encryption Inference
First of all, you must have the key that was used to encrypt the model.

Then you can configure your client with the key; when you connect to the server, the key is sent to the server, which keeps it.

Once the server gets the key, it uses it to parse the model and starts the model prediction service.

### Example of Encrypted Model Inference

For an example of encrypted model inference, see [`/python/examples/encryption/`](../python/examples/encryption/).
......@@ -71,6 +71,41 @@ Collecting opencv-python
```
**A:** Pin the opencv-python version first, `pip install opencv-python==4.2.0.32`, and then install the whl package.

#### Q: `pip3 install` of the whl package fails with the following error:
```
Complete output from command python setup.py egg_info:
Found cython-generated files...
error in grpcio setup command: 'install_requires' must be a string or list of strings containing valid project/version requirement specifiers; Expected ',' or end-of-list in futures>=2.2.0; python_version<'3.2' at ; python_version<'3.2'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-taoxz02y/grpcio/
```
**A:** Upgrade pip3 and setuptools, then re-run the installation command.
```
pip3 install --upgrade pip
pip3 install --upgrade setuptools
```
#### Q: The following error is reported at runtime:
```
Traceback (most recent call last):
File "../../deploy/serving/test_client.py", line 18, in <module>
from paddle_serving_app.reader import *
File "/usr/local/python2.7.15/lib/python2.7/site-packages/paddle_serving_app/reader/__init__.py", line 15, in <module>
from .image_reader import ImageReader, File2Image, URL2Image, Sequential, Normalize, Base64ToImage
File "/usr/local/python2.7.15/lib/python2.7/site-packages/paddle_serving_app/reader/image_reader.py", line 24, in <module>
from shapely.geometry import Polygon
ImportError: No module named shapely.geometry
```
**A:** There are two options: install shapely via pip/pip3, or install all dependencies via pip/pip3.
```
# Option 1:
pip install shapely==1.7.0
# Option 2:
pip install -r python/requirements.txt
```
## Compilation Issues

#### Q: How do I run inference with a Paddle Serving I compiled myself?
......
......@@ -24,13 +24,12 @@
#### 1.1 Server-side comparison

* Since the gRPC server actually wraps a brpc client whose initialization happens on the gRPC server side, the gRPC server's `load_model_config` function adds a `client_config_path` parameter, which specifies the path of the data-format configuration file used during brpc client initialization. When `client_config_path` is not given it defaults to None, in which case `load_model_config` sets it to `<server_config_path>/serving_server_conf.prototxt`, so the brpc client and brpc server use the same data-format configuration file.
```
def load_model_config(self, server_config_paths, client_config_path=None)
```
In some examples the bRPC server and bRPC client configuration files may differ (e.g. in cube local, the client-side data is first handed to cube, processed there, and then passed to the inference library); in that case the gRPC server must set the gRPC client configuration `client_config_path` manually.

**`client_config_path` defaults to `<server_config_path>/serving_server_conf.prototxt`.**

#### 1.2 Client-side comparison
......@@ -47,13 +46,15 @@
* The gRPC client `predict` function adds the `batch`, `asyn`, `is_python` and `log_id` parameters:
```
def predict(self, feed, fetch, batch=True, need_variant_tag=False, asyn=False, is_python=True, log_id=0)
```
1. `asyn` enables asynchronous calls. With `asyn=True` the call is asynchronous and returns a `MultiLangPredictFuture` object, whose `MultiLangPredictFuture.result()` method blocks until the prediction is available; with `asyn=False` the call is synchronous (see the sketch after this list).
2. `is_python` selects the proto format. With `is_python=True`, data is transferred in numpy bytes format, currently only available for Python; with `is_python=False`, data is transferred in a generic format, which is more general. Transferring in numpy bytes format takes much less time than the generic format (see [#654](https://github.com/PaddlePaddle/Serving/pull/654)).
3. `batch` controls whether the feed data needs an extra dimension. With `batch=True`, the feed data keeps its original dimensions; with `batch=False`, a batch dimension is added: for example, if feed.shape is originally [2,2], with `batch=False` the feed is reshaped to [1,2,2].
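A minimal asynchronous-call sketch based on the fit_a_line gRPC example (names follow that example and are illustrative, not a definitive API reference):

```python
import numpy as np
from paddle_serving_client import MultiLangClient

client = MultiLangClient()
client.connect(["127.0.0.1:9393"])

x = np.array([[0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583,
               -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]], dtype="float32")
future = client.predict(feed={"x": x}, fetch=["price"], asyn=True)
fetch_map = future.result()  # blocks until the prediction arrives
print(fetch_map)
```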
#### 1.3 Others

* Exception handling: when the bRPC client on the gRPC server side fails to predict (returns `None`), the gRPC client also returns None. Other gRPC exceptions are caught inside the client, and a "status_code" field is added to the returned fetch_map to indicate whether the prediction succeeded (see the timeout example and the sketch below).
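Continuing the sketch above, a defensive check on the returned fetch_map might look like this (assuming, as in the timeout example, that a zero "status_code" indicates success):

```python
fetch_map = client.predict(feed={"x": x}, fetch=["price"])
if fetch_map is None:
    print("inference failed on the bRPC side")
elif fetch_map.get("status_code", 0) != 0:
    print("gRPC error, status_code =", fetch_map["status_code"])
else:
    print(fetch_map["price"])
```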
......@@ -74,7 +75,7 @@
## 2. Example: Linear Regression Prediction Service

The following is an example of linear regression prediction implemented with gRPC; see [this link](../python/examples/grpc_impl_example/fit_a_line) for the full code.
#### Get data
```shell
......@@ -114,24 +115,12 @@ python test_asyn_client.py
python test_batch_client.py
```
#### Prediction timeout
``` shell
python test_timeout_client.py
```
## 3. More Examples

See the example files under [`python/examples/grpc_impl_example`](../python/examples/grpc_impl_example).
# How to Convert Paddle Inference Model To Paddle Serving Format
([简体中文](./INFERENCE_TO_SERVING_CN.md)|English)
You can use a built-in Python module called `paddle_serving_client.convert` to convert it.
```shell
python -m paddle_serving_client.convert --dirname ./your_inference_model_dir
```
The arguments are the same as those of the `inference_model_to_serving` API.
| Argument | Type | Default | Description |
|--------------|------|-----------|--------------------------------|
| `dirname` | str | - | Path of saved model files. Program file and parameter files are saved in this directory. |
| `serving_server` | str | `"serving_server"` | The path of model files and configuration files for server. |
| `serving_client` | str | `"serving_client"` | The path of configuration files for client. |
| `model_filename` | str | None | The name of file to load the inference program. If it is None, the default filename `__model__` will be used. |
| `params_filename` | str | None | The name of file to load all parameters. It is only used for the case that all parameters were saved in a single binary file. If parameters were saved in separate files, set it as None. |
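Equivalently, the same conversion can be invoked from Python; a minimal sketch mirroring the command-line defaults:

```python
import paddle_serving_client.io as serving_io

serving_io.inference_model_to_serving(
    dirname="./your_inference_model_dir",
    serving_server="serving_server",
    serving_client="serving_client",
    model_filename=None,
    params_filename=None)
```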
# How to Convert a Saved Paddle Inference Model into the Paddle Serving Format

([English](./INFERENCE_TO_SERVING.md) | Simplified Chinese)

You can use the built-in module `paddle_serving_client.convert` provided by Paddle Serving to convert it.
```shell
python -m paddle_serving_client.convert --dirname ./your_inference_model_dir
```
The module arguments are the same as those of the `inference_model_to_serving` API.

| Argument | Type | Default | Description |
|--------------|------|-----------|--------------------------------|
| `dirname` | str | - | Path of the saved model files. The Program file and parameter files are saved in this directory. |
| `serving_server` | str | `"serving_server"` | Output path of the converted model files and configuration files for the server. |
| `serving_client` | str | `"serving_client"` | Output path of the converted configuration files for the client. |
| `model_filename` | str | None | Name of the file that stores the inference Program to convert. If None, the default filename `__model__` is used. |
| `params_filename` | str | None | Name of the file that stores all the parameters of the model to convert. It only needs to be specified when all parameters are saved in a single binary file; if parameters are stored in separate files, set it to None. |
......@@ -18,8 +18,9 @@ The following table shows compatibilities between Paddle Serving Server and Java
| Paddle Serving Server version | Java SDK version |
| :---------------------------: | :--------------: |
| 0.3.2 | 0.0.1 |
| 0.5.0 | 0.0.1 |
1. Directly use the provided Java SDK as the client for prediction
### Install Java SDK
You can download jar and install it to the local Maven repository:
......@@ -29,14 +30,6 @@ wget https://paddle-serving.bj.bcebos.com/jar/paddle-serving-sdk-java-0.0.1.jar
mvn install:install-file -Dfile=$PWD/paddle-serving-sdk-java-0.0.1.jar -DgroupId=io.paddle.serving.client -DartifactId=paddle-serving-sdk-java -Dversion=0.0.1 -Dpackaging=jar
```
### Maven configure
```text
......@@ -47,63 +40,8 @@ mvn install
</dependency>
```
2. Use it after compiling from the source code. See the [document](../java/README.md).
3. For examples of using the Java client, see the [document](../java/README.md).
......@@ -17,8 +17,9 @@ Paddle Serving 提供了 Java SDK,支持 Client 端用 Java 语言进行预测
| Paddle Serving Server version | Java SDK version |
| :---------------------------: | :--------------: |
| 0.3.2 | 0.0.1 |
| 0.5.0 | 0.0.1 |
1. Directly use the provided Java SDK as the client for prediction

### Install

You can download the jar directly and install it into the local Maven repository:
......@@ -28,14 +29,6 @@ wget https://paddle-serving.bj.bcebos.com/jar/paddle-serving-sdk-java-0.0.1.jar
mvn install:install-file -Dfile=$PWD/paddle-serving-sdk-java-0.0.1.jar -DgroupId=io.paddle.serving.client -DartifactId=paddle-serving-sdk-java -Dversion=0.0.1 -Dpackaging=jar
```
### Maven configuration
```text
......@@ -46,64 +39,7 @@ mvn install
</dependency>
```
2. Compile from the source code and then use it; see the [document](../java/README.md) for detailed steps.

3. For usage examples, see the [document](../java/README.md).
......@@ -3,47 +3,59 @@
## CPU server
### Python 3
```
# Compile by gcc8.2
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server-0.0.0-py3-none-any.whl
```
### Python 2
```
# Compile by gcc8.2
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server-0.0.0-py2-none-any.whl
```
## GPU server
### Python 3
```
#cuda 9.0, Compile by gcc4.8
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.0.0.post9-py3-none-any.whl
#cuda 10.0, Compile by gcc4.8
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.0.0.post10-py3-none-any.whl
#cuda10.1 with TensorRT 6, Compile by gcc8.2
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.0.0.post101-py3-none-any.whl
#cuda10.2 with TensorRT 7, Compile by gcc8.2
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.0.0.post102-py3-none-any.whl
#cuda11.0 with TensorRT 7 (beta), Compile by gcc8.2
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.0.0.post11-py3-none-any.whl
```
### Python 2
```
#cuda 9.0, Compile by gcc4.8
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.0.0.post9-py2-none-any.whl
#cuda 10.0, Compile by gcc4.8
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.0.0.post10-py2-none-any.whl
#cuda10.1 with TensorRT 6, Compile by gcc8.2
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.0.0.post101-py2-none-any.whl
#cuda10.2 with TensorRT 7, Compile by gcc8.2
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.0.0.post102-py2-none-any.whl
#cuda11.0 with TensorRT 7 (beta), Compile by gcc8.2
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.0.0.post11-py2-none-any.whl
```
**Tips:** If you want to use the CPU server and the GPU server at the same time, check the gcc version: only the Cuda10.1/10.2/11 packages can run together with the CPU server, because they share the same gcc version (8.2).
## Client
### Python 3.6
```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_client-0.0.0-cp36-none-any.whl
```
### Python 3.8
```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_client-0.0.0-cp38-none-any.whl
```
### Python 3.7
```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_client-0.0.0-cp37-none-any.whl
```
### Python 3.5
```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_client-0.0.0-cp35-none-any.whl
......@@ -53,6 +65,7 @@ https://paddle-serving.bj.bcebos.com/whl/paddle_serving_client-0.0.0-cp35-none-a
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_client-0.0.0-cp27-none-any.whl
```
## App
### Python 3
```
......@@ -63,3 +76,48 @@ https://paddle-serving.bj.bcebos.com/whl/paddle_serving_app-0.0.0-py3-none-any.w
```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_app-0.0.0-py2-none-any.whl
```
## ARM user
ARM users who use [PaddleLite](https://github.com/PaddlePaddle/PaddleLite) can download the wheel packages below, and should use the xpu-beta docker image; see [DOCKER IMAGES](./DOCKER_IMAGES.md).

**We only support Python 3.6 for ARM users.**
### Wheel Package Links
```
# Server
https://paddle-serving.bj.bcebos.com/whl/xpu/paddle_serving_server_gpu-0.0.0.postarm_xpu-py3-none-any.whl
# Client
https://paddle-serving.bj.bcebos.com/whl/xpu/paddle_serving_client-0.0.0-cp36-none-any.whl
# App
https://paddle-serving.bj.bcebos.com/whl/xpu/paddle_serving_app-0.0.0-py3-none-any.whl
```
### Binary Package
Most users do not need this section. But if you deploy Paddle Serving on a machine without network access, the binary executable tar file cannot be downloaded automatically, so here are the download links for the various environments.
#### Bin links
```
# CPU AVX MKL
https://paddle-serving.bj.bcebos.com/bin/serving-cpu-avx-mkl-0.0.0.tar.gz
# CPU AVX OPENBLAS
https://paddle-serving.bj.bcebos.com/bin/serving-cpu-avx-openblas-0.0.0.tar.gz
# CPU NOAVX OPENBLAS
https://paddle-serving.bj.bcebos.com/bin/serving-cpu-noavx-openblas-0.0.0.tar.gz
# Cuda 9
https://paddle-serving.bj.bcebos.com/bin/serving-gpu-cuda9-0.0.0.tar.gz
# Cuda 10
https://paddle-serving.bj.bcebos.com/bin/serving-gpu-cuda10-0.0.0.tar.gz
# Cuda 10.1
https://paddle-serving.bj.bcebos.com/bin/serving-gpu-101-0.0.0.tar.gz
# Cuda 10.2
https://paddle-serving.bj.bcebos.com/bin/serving-gpu-102-0.0.0.tar.gz
# Cuda 11
https://paddle-serving.bj.bcebos.com/bin/serving-gpu-cuda11-0.0.0.tar.gz
```
#### How to set up SERVING_BIN offline?

- Download the serving server whl package and the bin package, and make sure they are built for the same environment.
- Download the serving client whl and the serving app whl, paying attention to the Python version.
- `pip install` the wheels and `tar xf` the binary package, then `export SERVING_BIN=$PWD/serving-gpu-cuda10-0.0.0/serving` (taking Cuda 10.0 as the example).
......@@ -5,7 +5,7 @@
For example:
```shell
python -m paddle_serving_server_gpu.serve --model bert_seq128_model --port 9292 --gpu_ids 0
python -m paddle_serving_server_gpu.serve --model ResNet50_vd_model --port 9393 --gpu_ids 0
```
......
......@@ -17,14 +17,14 @@ This document takes Python2 as an example to show how to run Paddle Serving in d
Refer to [this document](DOCKER_IMAGES.md) for a docker image:
```shell
docker pull registry.baidubce.com/paddlepaddle/serving:latest-devel
```
### Create container
```bash
docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:latest-devel
docker exec -it test bash
```
......@@ -46,20 +46,20 @@ The GPU version is basically the same as the CPU version, with only some differe
Refer to [this document](DOCKER_IMAGES.md) for a docker image; the following is an example of a `cuda10.2-cudnn8` image:
```shell
docker pull registry.baidubce.com/paddlepaddle/serving:latest-cuda10.2-cudnn8-devel
```
### Create container
```bash
nvidia-docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:latest-cuda10.2-cudnn8-devel
nvidia-docker exec -it test bash
```
or
```bash
docker run --gpus all -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:latest-cuda10.2-cudnn8-devel
docker exec -it test bash
```
......@@ -69,8 +69,9 @@ The `-p` option is to map the `9292` port of the container to the `9292` port of
The image comes with `paddle_serving_server_gpu`, `paddle_serving_client`, and `paddle_serving_app` corresponding to the image tag version. If you do not need to change the version, you can use them directly, which is suitable for environments without internet access.

If you need to change the version, refer to the instructions on the homepage to download the pip package of the corresponding version; see [LATEST_PACKAGES](./LATEST_PACKAGES.md).
## Precautions

- Runtime images cannot be used for compilation. If you want to compile from source, refer to [COMPILE](COMPILE.md).
- If you use the Cuda9 or Cuda10 docker images, you cannot use the `paddle_serving_server` CPU version at the same time, due to gcc version limitations. If you want to use both in one docker image, choose the Cuda10.1, Cuda10.2 or Cuda11 images.
......@@ -16,14 +16,16 @@ Docker(GPU版本需要在GPU机器上安装nvidia-docker)
Refer to [this document](DOCKER_IMAGES_CN.md) to get the image.

Take the CPU development image as an example:
```shell
docker pull registry.baidubce.com/paddlepaddle/serving:latest-devel
```
### Create the container and enter it
```bash
docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:latest-devel
docker exec -it test bash
```
......@@ -37,15 +39,19 @@ docker exec -it test bash
## GPU Version
```shell
docker pull registry.baidubce.com/paddlepaddle/serving:latest-cuda10.2-cudnn8-devel
```
### Create the container and enter it
```bash
nvidia-docker run -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:latest-cuda10.2-cudnn8-devel
nvidia-docker exec -it test bash
```
or
```bash
docker run --gpus all -p 9292:9292 --name test -dit registry.baidubce.com/paddlepaddle/serving:latest-cuda10.2-cudnn8-devel
docker exec -it test bash
```
......@@ -55,8 +61,9 @@ docker exec -it test bash
The image comes with `paddle_serving_server_gpu`, `paddle_serving_client`, and `paddle_serving_app` corresponding to the image tag version. If you do not need to change the version, you can use them directly, which is suitable for environments without internet access.

If you need to change the version, refer to the instructions on the homepage to download the pip package of the corresponding version; see [LATEST_PACKAGES](LATEST_PACKAGES.md).
## Precautions

- Runtime images cannot be used for development or compilation. If you want to compile from source, see [COMPILE](COMPILE.md).
- Because the Cuda10 and Cuda9 environments are limited by the gcc version, they cannot run the CPU version of `paddle_serving_server` at the same time. If you want to use the CPU version of `paddle_serving_server` in a GPU environment, choose the Cuda10.1, Cuda10.2 or Cuda11 images.
......@@ -2,7 +2,74 @@
([简体中文](./SAVE_CN.md)|English)
## Export from saved model files
You can use a built-in Python module called `paddle_serving_client.convert` to convert it.
```shell
python -m paddle_serving_client.convert --dirname ./your_inference_model_dir
```
If you have saved model files using Paddle's `save_inference_model` API, you can use Paddle Serving's `inference_model_to_serving` API to convert them into model files usable by Paddle Serving.
```python
import paddle_serving_client.io as serving_io
serving_io.inference_model_to_serving(dirname, serving_server="serving_server", serving_client="serving_client", model_filename=None, params_filename=None)
```
The arguments are the same as those of the `inference_model_to_serving` API.
| Argument | Type | Default | Description |
|--------------|------|-----------|--------------------------------|
| `dirname` | str | - | Path of saved model files. Program file and parameter files are saved in this directory. |
| `serving_server` | str | `"serving_server"` | The path of model files and configuration files for server. |
| `serving_client` | str | `"serving_client"` | The path of configuration files for client. |
| `model_filename` | str | None | The name of file to load the inference program. If it is None, the default filename `__model__` will be used. |
| `params_filename` | str | None | The name of file to load all parameters. It is only used for the case that all parameters were saved in a single binary file. If parameters were saved in separate files, set it as None. |
**Demo: Convert From Dynamic Graph**
PaddlePaddle 2.0 provides a new dynamic graph mode, so here we use the imagenet ResNet50 dynamic graph model as an example to show how to export a saved model and use it for a real online inference scenario.
```
wget https://paddle-serving.bj.bcebos.com/others/dygraph_res50.tar # model
wget https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg # sample input (daisy)
tar xf dygraph_res50.tar
python -m paddle_serving_client.convert --dirname . --model_filename dygraph_model.pdmodel --params_filename dygraph_model.pdiparams --serving_server serving_server --serving_client serving_client
```
We can see that the `serving_server` and `serving_client` folders hold the server-side and client-side configuration of the model respectively.
Start the server (GPU)
```
python -m paddle_serving_server_gpu.serve --model serving_server --port 9393 --gpu_id 0
```
Client (`test_client.py`)
```
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, File2Image, Resize, CenterCrop
from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize
client = Client()
client.load_client_config(
"serving_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9393"])
seq = Sequential([
File2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
])
image_file = "daisy.jpg"
img = seq(image_file)
fetch_map = client.predict(feed={"inputs": img}, fetch=["save_infer_model/scale_0.tmp_0"])
print(fetch_map["save_infer_model/scale_0.tmp_0"].reshape(-1))
```
Run
```
python test_client.py
```
You can see that the prediction executes successfully. The above shows the dynamic graph ResNet50 model served by Paddle Serving; other dynamic graph models can be used in a similar way.
## Save from training or prediction script (Static Graph Mode)
Currently, Paddle Serving provides a `save_model` interface for users; the interface is similar to Paddle's `save_inference_model`.
``` python
import paddle_serving_client.io as serving_io
......@@ -31,22 +98,3 @@ for line in sys.stdin:
fetch_map = client.predict(feed=feed, fetch=fetch)
print("{} {}".format(fetch_map["prediction"][1], label[0]))
```
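For reference, a complete minimal `save_model` call in static graph mode might look like the following sketch (the toy network and the names `words`/`prediction` are illustrative placeholders, not part of the original example):

```python
import paddle
import paddle.fluid as fluid
import paddle_serving_client.io as serving_io

paddle.enable_static()
# A toy program standing in for your trained network
words = fluid.data(name="words", shape=[None, 13], dtype="float32")
prediction = fluid.layers.fc(input=words, size=1)
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())

# Save server-side model files and the client-side config:
# the two dicts map feed/fetch names to the corresponding program variables.
serving_io.save_model("demo_server_model", "demo_client_conf",
                      {"words": words}, {"prediction": prediction},
                      fluid.default_main_program())
```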
......@@ -2,14 +2,79 @@
(Simplified Chinese|[English](./SAVE.md))
## Export from saved model files
If you have saved an inference model with Paddle's `save_inference_model` API, you can convert it with the built-in module `paddle_serving_client.convert` provided by Paddle Serving.
```
python -m paddle_serving_client.convert --dirname ./your_inference_model_dir
```
Alternatively, you can convert it into model files usable by Paddle Serving through Paddle Serving's `inference_model_to_serving` API.
```python
import paddle_serving_client.io as serving_io
serving_io.inference_model_to_serving(dirname, serving_server="serving_server", serving_client="serving_client", model_filename=None, params_filename=None)
```
The module arguments are the same as those of the `inference_model_to_serving` API.
| Argument | Type | Default | Description |
|--------------|------|-----------|--------------------------------|
| `dirname` | str | - | Path of the model files to be converted. The program file and parameter files are stored in this directory. |
| `serving_server` | str | `"serving_server"` | The storage path of the converted model files and configuration files for the server. |
| `serving_client` | str | `"serving_client"` | The storage path of the converted configuration files for the client. |
| `model_filename` | str | None | The name of the file storing the inference program of the model to be converted. If set to None, the default name `__model__` is used. |
| `params_filename` | str | None | The name of the file storing all parameters of the model to be converted. It needs to be specified if and only if all parameters were saved in a single binary file. If parameters were saved in separate files, set it to None. |
**Demo: Convert From Dynamic Graph**
PaddlePaddle 2.0 provides a new dynamic graph mode, so here we use the imagenet ResNet50 dynamic graph model as an example to show how to export a saved model and use it in real online inference scenarios.
```
wget https://paddle-serving.bj.bcebos.com/others/dygraph_res50.tar # model
wget https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg # sample input
tar xf dygraph_res50.tar
python -m paddle_serving_client.convert --dirname . --model_filename dygraph_model.pdmodel --params_filename dygraph_model.pdiparams --serving_server serving_server --serving_client serving_client
```
We can see that the `serving_server` and `serving_client` folders hold the server-side and client-side configuration of the model respectively.
Start the server (GPU)
```
python -m paddle_serving_server_gpu.serve --model serving_server --port 9393 --gpu_id 0
```
The client code, saved as `test_client.py`
```
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, File2Image, Resize, CenterCrop
from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize
client = Client()
client.load_client_config(
"serving_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9393"])
seq = Sequential([
File2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
])
image_file = "daisy.jpg"
img = seq(image_file)
fetch_map = client.predict(feed={"inputs": img}, fetch=["save_infer_model/scale_0.tmp_0"])
print(fetch_map["save_infer_model/scale_0.tmp_0"].reshape(-1))
```
Run
```
python test_client.py
```
You can see that the prediction runs successfully. That is all it takes to serve the dynamic graph ResNet50 model; other dynamic graph models are used in a similar way.
## Save from training or prediction script (Static Graph Mode)
Currently, Paddle Serving provides a save_model interface for users; it is similar to Paddle's `save_inference_model`.
``` python
import paddle_serving_client.io as serving_io
serving_io.save_model("imdb_model", "imdb_client_conf",
{"words": data}, {"prediction": prediction},
paddle.static.default_main_program())
```
imdb_model is the server-side model with serving configurations, and imdb_client_conf is the client rpc configurations.
......@@ -32,22 +97,3 @@ for line in sys.stdin:
fetch_map = client.predict(feed=feed, fetch=fetch)
print("{} {}".format(fetch_map["prediction"][1], label[0]))
```
## Paddle Serving uses TensorRT
(English|[简体中文](./TENSOR_RT_CN.md))
### Background
Deploying models trained on mainstream frameworks through Nvidia's TensorRT tool can greatly increase inference speed, often making models at least twice as fast as in the original framework while using less device memory. Mastering how to deploy deep learning models with TensorRT is therefore very useful for anyone who needs to deploy models. Paddle Serving provides comprehensive TensorRT ecosystem support.
### Environment
The Cuda 10.1, Cuda 10.2 and Cuda 11 versions of Serving support TensorRT.
#### Install Paddle
In [Development using Docker environment](./RUN_IN_DOCKER.md) and [Docker image list](./DOCKER_IMAGES.md), we provide the TensorRT development images. After starting a container from the image, you need to install a Paddle whl package that supports TensorRT; refer to the documentation on the home page.
```
# For a GPU Cuda10.2 environment, execute
pip install paddlepaddle-gpu==2.0.0
```
**Note**: If your Cuda version is not 10.2, do not execute the above command directly; instead, refer to the [Paddle official documentation - multi-version whl package list](https://www.paddlepaddle.org.cn/documentation/docs/en/install/Tables_en.html#multi-version-whl-package-list-release).
Select the url link of the corresponding GPU environment and install it. For example, Python 2.7 users on Cuda 10.1 should select the url corresponding to `cp27-cp27mu` and `cuda10.1-cudnn7.6-trt6.0.1.5`, copy it and execute
```
pip install https://paddle-wheel.bj.bcebos.com/with-trt/2.0.0-gpu-cuda10.1-cudnn7-mkl/paddlepaddle_gpu-2.0.0.post101-cp27-cp27mu-linux_x86_64.whl
```
Since the default `paddlepaddle-gpu==2.0.0` targets Cuda 10.2 and is not built with TensorRT, if you need TensorRT with `paddlepaddle-gpu` you have to find `cuda10.2-cudnn8.0-trt7.1.3` in the above multi-version whl package list and download the package for your Python version.
#### Install Paddle Serving
```
# Cuda10.2
pip install paddle-serving-server-gpu==${VERSION}.post102
# Cuda 10.1
pip install paddle-serving-server-gpu==${VERSION}.post101
# Cuda 11
pip install paddle-serving-server-gpu==${VERSION}.post11
```
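After installing, it is worth verifying that the wheel you picked is a working GPU build before going further (a quick sanity check, assuming the standard Paddle 2.0 utilities):
```python
import paddle

paddle.utils.run_check()       # verifies that the local GPU setup works
print(paddle.version.cuda())   # e.g. '10.2'
print(paddle.version.cudnn())  # e.g. '8.0'
```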
### Use TensorRT
#### RPC mode
In the [Serving model examples](../python/examples), we provide models that can be accelerated with TensorRT, such as the [Faster_RCNN model](../python/examples/detection/faster_rcnn_r50_fpn_1x_coco) under detection.
We only need to run
```
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/faster_rcnn_r50_fpn_1x_coco.tar
tar xf faster_rcnn_r50_fpn_1x_coco.tar
python -m paddle_serving_server_gpu.serve --model serving_server --port 9494 --gpu_ids 0 --use_trt
```
and the TensorRT version of the faster_rcnn model server is started.
#### Local Predictor mode
In [local_predictor](../python/paddle_serving_app/local_predict.py#L52), users can explicitly specify `use_trt=True` and pass it to `load_model_config`.
Other usage is no different from the usual Local Predictor usage; note the model's compatibility with TensorRT.
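A minimal sketch of that flow (assuming the `LocalPredictor` class exported by the file referenced above and a converted `serving_server` directory; the flag names follow that file):
```python
from paddle_serving_app.local_predict import LocalPredictor

predictor = LocalPredictor()
# use_trt=True asks the underlying Paddle Inference engine to build
# TensorRT sub-graphs for the supported operators
predictor.load_model_config("serving_server", use_gpu=True, use_trt=True)
```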
#### Pipeline Mode
In [Pipeline mode](./PIPELINE_SERVING.md), our [imagenet example](../python/examples/pipeline/imagenet/config.yml#L23) shows how to enable TensorRT.
## Paddle Serving uses TensorRT
([English](./TENSOR_RT.md)|Simplified Chinese)
### Background
Deploying models trained on mainstream frameworks with Nvidia's TensorRT tool can greatly increase inference speed, often making models at least twice as fast as in the original framework while using less device memory. Mastering how to deploy deep learning models with TensorRT is therefore very useful for anyone who needs to deploy models. Paddle Serving provides comprehensive TensorRT ecosystem support.
### Environment
The Cuda 10.1, Cuda 10.2 and Cuda 11 versions of Serving support TensorRT.
#### Install Paddle
In [Development using Docker environment](./RUN_IN_DOCKER_CN.md) and [Docker image list](./DOCKER_IMAGES_CN.md), we provide the TensorRT development images. After starting a container from the image, you need to install a Paddle whl package that supports TensorRT; refer to the documentation on the home page.
```
# For a GPU Cuda10.2 environment, execute
pip install paddlepaddle-gpu==2.0.0
```
**Note**: If your Cuda version is not 10.2, do not execute the above command directly; instead, refer to the [Paddle official documentation - multi-version whl package list](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#whl-release).
Select the url link of the corresponding GPU environment and install it. For example, Python 2.7 users on Cuda 10.1 should select the url corresponding to `cp27-cp27mu` and `cuda10.1-cudnn7.6-trt6.0.1.5` in the table, copy it and execute
```
pip install https://paddle-wheel.bj.bcebos.com/with-trt/2.0.0-gpu-cuda10.1-cudnn7-mkl/paddlepaddle_gpu-2.0.0.post101-cp27-cp27mu-linux_x86_64.whl
```
Since the default `paddlepaddle-gpu==2.0.0` targets Cuda 10.2 and is not built with TensorRT, if you need TensorRT with `paddlepaddle-gpu` you have to find `cuda10.2-cudnn8.0-trt7.1.3` in the above multi-version whl package list and download the package for your Python version.
#### Install Paddle Serving
```
# Cuda10.2
pip install paddle-serving-server-gpu==${VERSION}.post102
# Cuda 10.1
pip install paddle-serving-server-gpu==${VERSION}.post101
# Cuda 11
pip install paddle-serving-server-gpu==${VERSION}.post11
```
### Use TensorRT
#### RPC mode
In the [Serving model examples](../python/examples), we provide models that can be accelerated with TensorRT, such as the [Faster_RCNN model](../python/examples/detection/faster_rcnn_r50_fpn_1x_coco) under detection.
We only need to run
```
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/2.0/faster_rcnn_r50_fpn_1x_coco.tar
tar xf faster_rcnn_r50_fpn_1x_coco.tar
python -m paddle_serving_server_gpu.serve --model serving_server --port 9494 --gpu_ids 0 --use_trt
```
and the TensorRT version of the faster_rcnn model server is started.
#### Local Predictor mode
In [local_predictor](../python/paddle_serving_app/local_predict.py#L52), users can explicitly specify `use_trt=True` and pass it to `load_model_config`.
Other usage is no different from the usual Local Predictor usage; note the model's compatibility with TensorRT.
#### Pipeline mode
In [Pipeline mode](./PIPELINE_SERVING_CN.md), our [imagenet example](../python/examples/pipeline/imagenet/config.yml#L23) shows how to enable TensorRT.
# An End-to-end Tutorial from Training to Inference Service Deployment
([简体中文](./TRAIN_TO_SERVICE_CN.md)|English)
Paddle Serving is Paddle's high-performance online inference service framework, which can flexibly support the deployment of most models. In this article, the IMDB review sentiment analysis task is used as an example to show the entire process from model training to deploying an inference service in 9 steps.
## Step1: Prepare the Running Environment
Paddle Serving can be deployed on Linux environments. Currently the server supports deployment on Centos7; [Docker deployment is recommended](RUN_IN_DOCKER.md). The rpc client supports deployment on Centos7 and Ubuntu 18. On other systems, or in environments where you do not want to install the serving module, you can still access the server-side prediction service through the http service.
You can choose to install the cpu or gpu version of the server module according to your requirements and machine environment, and install the client module on the client machine. When you want to access the server over http, there is no need to install the client module.
```shell
pip install paddle_serving_server #cpu version server side
pip install paddle_serving_server_gpu #gpu version server side
pip install paddle_serving_client #client version
```
After this simple preparation, we will take the IMDB review sentiment analysis task as an example to show the process from model training to deployment of a prediction service. All the code in the example can be found in the [IMDB example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb) of the Paddle Serving code base; the data and dictionary files used in the example can be obtained by executing the get_data.sh script in the IMDB example code.
## Step2: Determine Tasks and Raw Data Format
The IMDB review sentiment analysis task classifies the content of movie reviews to determine whether a review is positive or negative.
First let's take a look at the raw data:
```
saw a trailer for this on another video, and decided to rent when it came out. boy, was i disappointed! the story is extremely boring, the acting (aside from christopher walken) is bad, and i couldn't care less about the characters, aside from really wanting to see nora's husband get thrashed. christopher walken's role is such a throw-away, what a tease! | 0
```
This is an English review sample. The sample uses | as the separator: the content of the review is before the separator and the label is after it, where 0 is negative and 1 is positive.
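In other words, one raw line can be split like this (an illustrative snippet, not part of the example code):
```python
line = "christopher walken's role is such a throw-away, what a tease! | 0"
text, label = line.rsplit("|", 1)
print(repr(text.strip()), int(label))  # review text, and label 0 (negative)
```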
## Step3: Define Reader, Divide Training Set and Test Set
The original text needs to be converted to numeric ids that the neural network can use. The imdb_reader.py script defines this text-to-id conversion; words are mapped to integers through the dictionary file imdb.vocab.
<details>
<summary>imdb_reader.py</summary>
```python
import sys
import os
import paddle
import re
import paddle.fluid.incubate.data_generator as dg
class IMDBDataset(dg.MultiSlotDataGenerator):
def load_resource(self, dictfile):
self._vocab = {}
wid = 0
with open(dictfile) as f:
for line in f:
self._vocab[line.strip()] = wid
wid += 1
self._unk_id = len(self._vocab)
self._pattern = re.compile(r'(;|,|\.|\?|!|\s|\(|\))')
self.return_value = ("words", [1, 2, 3, 4, 5, 6]), ("label", [0])
def get_words_only(self, line):
sent = line.lower().replace("<br />", " ").strip()
words = [x for x in self._pattern.split(sent) if x and x != " "]
feas = [
self._vocab[x] if x in self._vocab else self._unk_id for x in words
]
return feas
def get_words_and_label(self, line):
send = '|'.join(line.split('|')[:-1]).lower().replace("<br />",
" ").strip()
label = [int(line.split('|')[-1])]
words = [x for x in self._pattern.split(send) if x and x != " "]
feas = [
self._vocab[x] if x in self._vocab else self._unk_id for x in words
]
return feas, label
def infer_reader(self, infer_filelist, batch, buf_size):
def local_iter():
for fname in infer_filelist:
with open(fname, "r") as fin:
for line in fin:
feas, label = self.get_words_and_label(line)
yield feas, label
import paddle
batch_iter = paddle.batch(
paddle.reader.shuffle(
local_iter, buf_size=buf_size),
batch_size=batch)
return batch_iter
def generate_sample(self, line):
def memory_iter():
for i in range(1000):
yield self.return_value
def data_iter():
feas, label = self.get_words_and_label(line)
yield ("words", feas), ("label", label)
return data_iter
```
</details>
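To sanity-check the reader interactively, it can also be used directly (a sketch; it assumes imdb.vocab has been fetched via get_data.sh):
```python
from imdb_reader import IMDBDataset

dataset = IMDBDataset()
dataset.load_resource("imdb.vocab")
# convert one raw sample into word ids and its label
ids, label = dataset.get_words_and_label("the story is extremely boring | 0")
print(ids, label)
```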
The sample after mapping is similar to the following format:
```
257 142 52 898 7 0 12899 1083 824 122 89527 134 6 65 47 48 904 89527 13 0 87 170 8 248 9 15 4 25 1365 4360 89527 702 89527 1 89527 240 3 28 89527 19 7 0 216 219 614 89527 0 84 89527 225 3 0 15 67 2356 89527 0 498 117 2 314 282 7 38 1097 89527 1 0 174 181 38 11 71 198 44 1 3110 89527 454 89527 34 37 89527 0 15 5912 80 2 9856 7748 89527 8 421 80 9 15 14 55 2218 12 4 45 6 58 25 89527 154 119 224 41 0 151 89527 871 89527 505 89527 501 89527 29 2 773 211 89527 54 307 90 0 893 89527 9 407 4 25 2 614 15 46 89527 89527 71 8 1356 35 89527 12 0 89527 89527 89 527 577 374 3 39091 22950 1 3771 48900 95 371 156 313 89527 37 154 296 4 25 2 217 169 3 2759 7 0 15 89527 0 714 580 11 2094 559 34 0 84 539 89527 1 0 330 355 3 0 15 15607 935 80 0 5369 3 0 622 89527 2 15 36 9 2291 2 7599 6968 2449 89527 1 454 37 256 2 211 113 0 480 218 1152 700 4 1684 1253 352 10 2449 89527 39 4 1819 129 1 316 462 29 0 12957 3 6 28 89527 13 0 457 8952 7 225 89527 8 2389 0 1514 89527 1
```
In this way, the neural network can train the transformed text information as feature values.
## Step4: Define CNN Network for Training and Saving
Next we use the [CNN Model](https://www.paddlepaddle.org.cn/documentation/docs/zh/user_guides/nlp_case/understand_sentiment/README.cn.html#cnn) for training; the network structure is defined in nets.py.
<details>
<summary>nets.py</summary>
```python
import sys
import time
import numpy as np
import paddle
import paddle.fluid as fluid
def cnn_net(data,
label,
dict_dim,
emb_dim=128,
hid_dim=128,
hid_dim2=96,
class_dim=2,
win_size=3):
""" conv net. """
emb = fluid.layers.embedding(
input=data, size=[dict_dim, emb_dim], is_sparse=True)
conv_3 = fluid.nets.sequence_conv_pool(
input=emb,
num_filters=hid_dim,
filter_size=win_size,
act="tanh",
pool_type="max")
fc_1 = fluid.layers.fc(input=[conv_3], size=hid_dim2)
prediction = fluid.layers.fc(input=[fc_1], size=class_dim, act="softmax")
cost = fluid.layers.cross_entropy(input=prediction, label=label)
avg_cost = fluid.layers.mean(x=cost)
acc = fluid.layers.accuracy(input=prediction, label=label)
return avg_cost, acc, prediction
```
</details>
Use the training dataset for training. The training script is local_train.py. After training, use the paddle_serving_client.io.save_model function to save the model files and configuration files used by the serving deployment.
<details>
<summary>local_train.py</summary>
```python
import os
import sys
import paddle
import logging
import paddle.fluid as fluid
logging.basicConfig(format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger("fluid")
logger.setLevel(logging.INFO)
# load dict file
def load_vocab(filename):
vocab = {}
with open(filename) as f:
wid = 0
for line in f:
vocab[line.strip()] = wid
wid += 1
vocab["<unk>"] = len(vocab)
return vocab
if __name__ == "__main__":
from nets import cnn_net
model_name = "imdb_cnn"
vocab = load_vocab('imdb.vocab')
dict_dim = len(vocab)
#define model input
data = fluid.layers.data(
name="words", shape=[1], dtype="int64", lod_level=1)
label = fluid.layers.data(name="label", shape=[1], dtype="int64")
#define dataset,train_data is the dataset directory
dataset = fluid.DatasetFactory().create_dataset()
filelist = ["train_data/%s" % x for x in os.listdir("train_data")]
dataset.set_use_var([data, label])
pipe_command = "python imdb_reader.py"
dataset.set_pipe_command(pipe_command)
dataset.set_batch_size(4)
dataset.set_filelist(filelist)
dataset.set_thread(10)
#define model
avg_cost, acc, prediction = cnn_net(data, label, dict_dim)
optimizer = fluid.optimizer.SGD(learning_rate=0.001)
optimizer.minimize(avg_cost)
#execute training
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
epochs = 100
import paddle_serving_client.io as serving_io
for i in range(epochs):
exe.train_from_dataset(
program=fluid.default_main_program(), dataset=dataset, debug=False)
logger.info("TRAIN --> pass: {}".format(i))
if i == 64:
#At the end of training, use the model save interface in PaddleServing to save the models and configuration files required by Serving
serving_io.save_model("{}_model".format(model_name),
"{}_client_conf".format(model_name),
{"words": data}, {"prediction": prediction},
fluid.default_main_program())
```
</details>
![Training process](./imdb_loss.png) As can be seen from the above figure, the loss of the model starts to converge after the 65th round. We save the model and configuration file after the 65th round of training is completed. The saved files are divided into imdb_cnn_client_conf and imdb_cnn_model folders. The former contains client-side configuration files, and the latter contains server-side configuration files and saved model files.
The parameter list of the save_model function is as follows:
| Parameter | Meaning |
| -------------------- | ------------------------------------------------------------ |
| server_model_folder | Directory for server-side configuration files and model files |
| client_config_folder | Directory for saving client configuration files |
| feed_var_dict | The input of the inference model, dict type. Keys can be customized; each value is an input variable in the model, and each key corresponds to one variable. When using the prediction service, the input data uses the key as the input name. |
| fetch_var_dict | The output of the inference model, dict type. Keys can be customized; each value is an output variable in the model, and each key corresponds to one variable. When using the prediction service, the key is used to fetch the returned data. |
| main_program | Model's program |
## Step5: Deploy RPC Prediction Service
The Paddle Serving framework supports two kinds of prediction service: one communicates through RPC and the other through HTTP. The deployment and use of the RPC prediction service is introduced first; the HTTP prediction service is introduced in Step 8.
```shell
python -m paddle_serving_server.serve --model imdb_cnn_model/ --port 9292 #cpu prediction service
python -m paddle_serving_server_gpu.serve --model imdb_cnn_model/ --port 9292 --gpu_ids 0 #gpu prediction service
```
The --model parameter in the command specifies the server-side model and configuration file directory saved earlier, and --port specifies the port of the prediction service. When deploying a gpu prediction service with the gpu version, --gpu_ids can specify the gpu to use.
After executing one of the above commands, the RPC prediction service deployment of the IMDB sentiment analysis task is complete.
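Before moving on to the client, you can confirm that the service is listening (an optional check, independent of Paddle Serving):
```python
import socket

# succeeds once the RPC server started above is accepting connections
with socket.create_connection(("127.0.0.1", 9292), timeout=3):
    print("port 9292 is open")
```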
## Step6: Reuse Reader, define remote RPC client
Below we access the RPC prediction service through Python code, the script is test_client.py
<details>
<summary>test_client.py</summary>
```python
from paddle_serving_client import Client
from imdb_reader import IMDBDataset
import sys
client = Client()
client.load_client_config(sys.argv[1])
client.connect(["127.0.0.1:9292"])
#The code of the data preprocessing part is reused here to convert the original text into a numeric id
imdb_dataset = IMDBDataset()
imdb_dataset.load_resource(sys.argv[2])
for line in sys.stdin:
word_ids, label = imdb_dataset.get_words_and_label(line)
feed = {"words": word_ids}
fetch = ["acc", "cost", "prediction"]
fetch_map = client.predict(feed=feed, fetch=fetch)
print("{} {}".format(fetch_map["prediction"][1], label[0]))
```
</details>
The script receives data from standard input and prints the predicted probability of class 1 together with the true label.
## Step7: Call the RPC service to test the model effect
Run the prediction service with the client implemented in the previous step, as follows:
```shell
cat test_data/part-0 | python test_client.py imdb_lstm_client_conf/serving_client_conf.prototxt imdb.vocab
```
Testing with the 2084 samples in the test_data/part-0 file, the model prediction accuracy is 88.19%.
**Note**: The effect of each model training may be slightly different, and the accuracy of predictions using the trained model will be close to the examples but may not be exactly the same.
## Step8: Deploy HTTP Prediction Service
When using the HTTP prediction service, the client does not need to install any modules of Paddle Serving, it only needs to be able to send HTTP requests. Of course, the HTTP method consumes more time in the communication phase than the RPC method.
For the IMDB sentiment analysis task, the original text needs to be preprocessed before prediction. In the RPC prediction service, we put the preprocessing in the client's script, and in the HTTP prediction service, we put the preprocessing on the server. Paddle Serving's HTTP prediction service framework prepares data pre-processing and post-processing interfaces for this situation. We just need to rewrite it according to the needs of the task.
Serving provides sample code, which is obtained by executing the imdb_web_service_demo.sh script in [IMDB Example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb).
Let's take a look at the script text_classify_service.py that starts the HTTP prediction service.
<details>
<summary>text_clssify_service.py</summary>
```python
from paddle_serving_server.web_service import WebService
from imdb_reader import IMDBDataset
import sys
#extend class WebService
class IMDBService(WebService):
def prepare_dict(self, args={}):
if len(args) == 0:
exit(-1)
self.dataset = IMDBDataset()
self.dataset.load_resource(args["dict_file_path"])
#rewrite preprocess() to implement data preprocessing, here we reuse reader script for training
def preprocess(self, feed={}, fetch=[]):
if "words" not in feed:
exit(-1)
res_feed = {}
res_feed["words"] = self.dataset.get_words_only(feed["words"])[0]
return res_feed, fetch
#Here you need to use the name parameter to specify the name of the prediction service.
imdb_service = IMDBService(name="imdb")
imdb_service.load_model_config(sys.argv[1])
imdb_service.prepare_server(
workdir=sys.argv[2], port=int(sys.argv[3]), device="cpu")
imdb_service.prepare_dict({"dict_file_path": sys.argv[4]})
imdb_service.run_server()
```
</details>
Run
```shell
python text_classify_service.py imdb_cnn_model/ workdir/ 9292 imdb.vocab
```
In the above command, the first parameter is the saved server-side model and configuration files. The second parameter is the working directory, which will store some configuration files used by the running prediction service; the directory may not exist yet but needs to be specified, and the prediction service will create it. The third parameter is the port number, and the fourth parameter is the dictionary file.
## Step9: Call the prediction service with plaintext data
After starting the HTTP prediction service, you can make predictions with a single command:
```
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "i am very sad | 0"}], "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction
```
When the inference process is normal, the prediction probability is returned, as shown below.
```
{"result":{"prediction":[[0.4389057457447052,0.561094343662262]]}}
```
**Note**: The result of each model training may differ slightly, and the probability values inferred with your trained model may not match the example.
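The same HTTP service can equally be called from Python; a minimal sketch using the third-party `requests` library (assuming the service above is running locally):
```python
import requests

# the same request as the curl command above, sent from Python
payload = {"feed": [{"words": "i am very sad | 0"}], "fetch": ["prediction"]}
resp = requests.post("http://127.0.0.1:9292/imdb/prediction", json=payload)
print(resp.json())  # e.g. {"result": {"prediction": [[...]]}}
```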
# An End-to-end Tutorial from Training to Inference Service Deployment
(Simplified Chinese|[English](./TRAIN_TO_SERVICE.md))
Paddle Serving is Paddle's high-performance online inference service framework, which can flexibly support the deployment of most models. This article takes the IMDB review sentiment analysis task as an example to show, in 9 steps, the whole process from model training to deploying a prediction service.
## Step1: Prepare the Running Environment
Paddle Serving can be deployed on Linux. Currently the server side supports deployment on Centos7; [Docker deployment](RUN_IN_DOCKER_CN.md) is recommended. The rpc client side can be deployed on Centos7 and Ubuntu 18; on other systems, or in environments where you do not want to install the serving module, you can still access the server-side prediction service through the http service.
You can choose to install the cpu or gpu version of the server module according to your needs and machine environment, and install the client module on the client machine. When accessing the server over http, the client machine does not need the client module.
```shell
pip install paddle_serving_server #cpu version server side
pip install paddle_serving_server_gpu #gpu version server side
pip install paddle_serving_client #client side
```
After this simple preparation, we will take the IMDB review sentiment analysis task as an example to show the process from model training to deployment of a prediction service. All the code in the example can be found in the [IMDB example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb) of the Paddle Serving code base; the data and dictionary files used in the example can be obtained by executing the get_data.sh script in the IMDB example code.
## Step2: Determine the Task and Raw Data Format
The IMDB review sentiment analysis task is a binary classification of movie review content, judging whether a review is positive or negative.
First let's take a look at the raw data:
```
saw a trailer for this on another video, and decided to rent when it came out. boy, was i disappointed! the story is extremely boring, the acting (aside from christopher walken) is bad, and i couldn't care less about the characters, aside from really wanting to see nora's husband get thrashed. christopher walken's role is such a throw-away, what a tease! | 0
```
This is an English review sample. The sample uses | as the separator: the content of the review is before the separator and the label is after it, where 0 denotes a negative sample, i.e. a negative review, and 1 denotes a positive sample, i.e. a positive review.
## Step3: Define the Reader, Split Training and Test Sets
The original text needs to be converted to numeric ids that the neural network can use. The imdb_reader.py script defines this text-to-id conversion; words are mapped to integers through the dictionary file imdb.vocab.
<details>
<summary>imdb_reader.py</summary>
```python
import sys
import os
import paddle
import re
import paddle.fluid.incubate.data_generator as dg
class IMDBDataset(dg.MultiSlotDataGenerator):
def load_resource(self, dictfile):
self._vocab = {}
wid = 0
with open(dictfile) as f:
for line in f:
self._vocab[line.strip()] = wid
wid += 1
self._unk_id = len(self._vocab)
self._pattern = re.compile(r'(;|,|\.|\?|!|\s|\(|\))')
self.return_value = ("words", [1, 2, 3, 4, 5, 6]), ("label", [0])
def get_words_only(self, line):
sent = line.lower().replace("<br />", " ").strip()
words = [x for x in self._pattern.split(sent) if x and x != " "]
feas = [
self._vocab[x] if x in self._vocab else self._unk_id for x in words
]
return feas
def get_words_and_label(self, line):
send = '|'.join(line.split('|')[:-1]).lower().replace("<br />",
" ").strip()
label = [int(line.split('|')[-1])]
words = [x for x in self._pattern.split(send) if x and x != " "]
feas = [
self._vocab[x] if x in self._vocab else self._unk_id for x in words
]
return feas, label
def infer_reader(self, infer_filelist, batch, buf_size):
def local_iter():
for fname in infer_filelist:
with open(fname, "r") as fin:
for line in fin:
feas, label = self.get_words_and_label(line)
yield feas, label
import paddle
batch_iter = paddle.batch(
paddle.reader.shuffle(
local_iter, buf_size=buf_size),
batch_size=batch)
return batch_iter
def generate_sample(self, line):
def memory_iter():
for i in range(1000):
yield self.return_value
def data_iter():
feas, label = self.get_words_and_label(line)
yield ("words", feas), ("label", label)
return data_iter
```
</details>
The mapped sample looks similar to the following format:
```
257 142 52 898 7 0 12899 1083 824 122 89527 134 6 65 47 48 904 89527 13 0 87 170 8 248 9 15 4 25 1365 4360 89527 702 89527 1 89527 240 3 28 89527 19 7 0 216 219 614 89527 0 84 89527 225 3 0 15 67 2356 89527 0 498 117 2 314 282 7 38 1097 89527 1 0 174 181 38 11 71 198 44 1 3110 89527 454 89527 34 37 89527 0 15 5912 80 2 9856 7748 89527 8 421 80 9 15 14 55 2218 12 4 45 6 58 25 89527 154 119 224 41 0 151 89527 871 89527 505 89527 501 89527 29 2 773 211 89527 54 307 90 0 893 89527 9 407 4 25 2 614 15 46 89527 89527 71 8 1356 35 89527 12 0 89527 89527 89 527 577 374 3 39091 22950 1 3771 48900 95 371 156 313 89527 37 154 296 4 25 2 217 169 3 2759 7 0 15 89527 0 714 580 11 2094 559 34 0 84 539 89527 1 0 330 355 3 0 15 15607 935 80 0 5369 3 0 622 89527 2 15 36 9 2291 2 7599 6968 2449 89527 1 454 37 256 2 211 113 0 480 218 1152 700 4 1684 1253 352 10 2449 89527 39 4 1819 129 1 316 462 29 0 12957 3 6 28 89527 13 0 457 8952 7 225 89527 8 2389 0 1514 89527 1
```
In this way, the neural network can train on the converted text information as feature values.
## Step4: Define the CNN Network, Train and Save
Next we use the [CNN model](https://www.paddlepaddle.org.cn/documentation/docs/zh/user_guides/nlp_case/understand_sentiment/README.cn.html#cnn) for training; the network structure is defined in the nets.py script.
<details>
<summary>nets.py</summary>
```python
import sys
import time
import numpy as np
import paddle
import paddle.fluid as fluid
def cnn_net(data,
label,
dict_dim,
emb_dim=128,
hid_dim=128,
hid_dim2=96,
class_dim=2,
win_size=3):
""" conv net. """
emb = fluid.layers.embedding(
input=data, size=[dict_dim, emb_dim], is_sparse=True)
conv_3 = fluid.nets.sequence_conv_pool(
input=emb,
num_filters=hid_dim,
filter_size=win_size,
act="tanh",
pool_type="max")
fc_1 = fluid.layers.fc(input=[conv_3], size=hid_dim2)
prediction = fluid.layers.fc(input=[fc_1], size=class_dim, act="softmax")
cost = fluid.layers.cross_entropy(input=prediction, label=label)
avg_cost = fluid.layers.mean(x=cost)
acc = fluid.layers.accuracy(input=prediction, label=label)
return avg_cost, acc, prediction
```
</details>
Train with the training samples; the training script is local_train.py. After training, use the paddle_serving_client.io.save_model function to save the model files and configuration files used by the serving deployment.
<details>
<summary>local_train.py</summary>
```python
import os
import sys
import paddle
import logging
import paddle.fluid as fluid
logging.basicConfig(format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger("fluid")
logger.setLevel(logging.INFO)
# load the dictionary file
def load_vocab(filename):
vocab = {}
with open(filename) as f:
wid = 0
for line in f:
vocab[line.strip()] = wid
wid += 1
vocab["<unk>"] = len(vocab)
return vocab
if __name__ == "__main__":
from nets import cnn_net
model_name = "imdb_cnn"
vocab = load_vocab('imdb.vocab')
dict_dim = len(vocab)
    #define model input
data = fluid.layers.data(
name="words", shape=[1], dtype="int64", lod_level=1)
label = fluid.layers.data(name="label", shape=[1], dtype="int64")
    #define dataset, train_data is the training data directory
dataset = fluid.DatasetFactory().create_dataset()
filelist = ["train_data/%s" % x for x in os.listdir("train_data")]
dataset.set_use_var([data, label])
pipe_command = "python imdb_reader.py"
dataset.set_pipe_command(pipe_command)
dataset.set_batch_size(4)
dataset.set_filelist(filelist)
dataset.set_thread(10)
    #define model
avg_cost, acc, prediction = cnn_net(data, label, dict_dim)
optimizer = fluid.optimizer.SGD(learning_rate=0.001)
optimizer.minimize(avg_cost)
    #execute training
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
epochs = 100
import paddle_serving_client.io as serving_io
for i in range(epochs):
exe.train_from_dataset(
program=fluid.default_main_program(), dataset=dataset, debug=False)
logger.info("TRAIN --> pass: {}".format(i))
if i == 64:
            #at the end of training, use the model save interface in PaddleServing to save the models and configuration files required by Serving
serving_io.save_model("{}_model".format(model_name),
"{}_client_conf".format(model_name),
{"words": data}, {"prediction": prediction},
fluid.default_main_program())
```
</details>
![Training process](./imdb_loss.png) As can be seen from the figure above, the loss of the model starts to converge after the 65th round. We save the model and configuration files after the 65th round of training is completed. The saved files are divided into the imdb_cnn_client_conf and imdb_cnn_model folders; the former contains the client-side configuration files, and the latter contains the server-side configuration files and the saved model files.
The parameter list of the save_model function is as follows:
| Parameter | Meaning |
| -------------------- | ------------------------------------------------------------ |
| server_model_folder | Directory for saving the server-side configuration files and model files |
| client_config_folder | Directory for saving the client-side configuration files |
| feed_var_dict | The input of the inference model, dict type. Keys can be customized; each value is an input variable in the model, and each key corresponds to one variable. When using the prediction service, the input data uses the key as the input name. |
| fetch_var_dict | The output of the inference model, dict type. Keys can be customized; each value is an output variable in the model, and each key corresponds to one variable. When using the prediction service, the key is used to fetch the returned data. |
| main_program | The program of the model |
## Step5: Deploy the RPC Prediction Service
The Paddle Serving framework supports two kinds of prediction service: one communicates through RPC and the other through HTTP. The deployment and use of the RPC prediction service is introduced first; the HTTP prediction service is introduced from Step 8.
```shell
python -m paddle_serving_server.serve --model imdb_cnn_model/ --port 9292 #cpu prediction service
python -m paddle_serving_server_gpu.serve --model imdb_cnn_model/ --port 9292 --gpu_ids 0 #gpu prediction service
```
In the command, --model specifies the server-side model and configuration file directory saved earlier, and --port specifies the port of the prediction service. When deploying a gpu prediction service with the gpu version, --gpu_ids can specify the gpu to use.
After executing one of the above commands, the RPC prediction service deployment of the IMDB sentiment analysis task is complete.
## Step6: Reuse the Reader and Define a Remote RPC Client
Below we access the RPC prediction service through Python code; the script is test_client.py
<details>
<summary>test_client.py</summary>
```python
from paddle_serving_client import Client
from imdb_reader import IMDBDataset
import sys
client = Client()
client.load_client_config(sys.argv[1])
client.connect(["127.0.0.1:9292"])
#the data preprocessing code is reused here to convert the raw text into numeric ids
imdb_dataset = IMDBDataset()
imdb_dataset.load_resource(sys.argv[2])
for line in sys.stdin:
word_ids, label = imdb_dataset.get_words_and_label(line)
feed = {"words": word_ids}
fetch = ["acc", "cost", "prediction"]
fetch_map = client.predict(feed=feed, fetch=fetch)
print("{} {}".format(fetch_map["prediction"][1], label[0]))
```
</details>
The script receives data from standard input and prints the predicted probability of class 1 together with the true label.
## Step7: Call the RPC Service and Test the Model
Run the prediction service with the client implemented in the previous step, as follows:
```shell
cat test_data/part-0 | python test_client.py imdb_lstm_client_conf/serving_client_conf.prototxt imdb.vocab
```
Testing with the 2084 samples in the test_data/part-0 file, the model prediction accuracy is 88.19%.
**Note**: The result of each model training may differ slightly; the accuracy of predictions with your trained model will be close to, but may not exactly match, the example.
## Step8: Deploy the HTTP Prediction Service
When using the HTTP prediction service, the client does not need to install any Paddle Serving module; it only needs to be able to send HTTP requests. Of course, the HTTP method spends more time in the communication phase than the RPC method.
For the IMDB sentiment analysis task, the raw text needs to be preprocessed before prediction. In the RPC prediction service we put the preprocessing in the client script, while in the HTTP prediction service we put it on the server side. Paddle Serving's HTTP prediction service framework provides data preprocessing and postprocessing interfaces for this situation; we only need to override them according to the needs of the task.
Serving provides sample code, which is obtained by executing the imdb_web_service_demo.sh script in the [IMDB example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb).
Let's take a look at text_classify_service.py, the script that starts the HTTP prediction service.
<details>
<summary>text_clssify_service.py</summary>
```python
from paddle_serving_server.web_service import WebService
from imdb_reader import IMDBDataset
import sys
#extend the WebService class of the framework
class IMDBService(WebService):
def prepare_dict(self, args={}):
if len(args) == 0:
exit(-1)
self.dataset = IMDBDataset()
self.dataset.load_resource(args["dict_file_path"])
    #override preprocess() to implement data preprocessing; the reader script used in training is reused here
def preprocess(self, feed={}, fetch=[]):
if "words" not in feed:
exit(-1)
res_feed = {}
res_feed["words"] = self.dataset.get_words_only(feed["words"])[0]
return res_feed, fetch
#the name parameter is used here to specify the name of the prediction service
imdb_service = IMDBService(name="imdb")
imdb_service.load_model_config(sys.argv[1])
imdb_service.prepare_server(
workdir=sys.argv[2], port=int(sys.argv[3]), device="cpu")
imdb_service.prepare_dict({"dict_file_path": sys.argv[4]})
imdb_service.run_server()
```
</details>
Run
```shell
python text_classify_service.py imdb_cnn_model/ workdir/ 9292 imdb.vocab
```
In the above command, the first parameter is the saved server-side model and configuration files. The second parameter is the working directory, which will store some configuration files used by the running prediction service; the directory may not exist yet but needs to be specified, and the prediction service will create it. The third parameter is the port number, and the fourth parameter is the dictionary file.
## Step9: Call the Prediction Service with Plaintext Data
After the HTTP prediction service is started, predictions can be made with a single command:
```
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "i am very sad | 0"}], "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction
```
When the inference process works normally, the prediction probability is returned, as shown below.
```
{"result":{"prediction":[[0.4389057457447052,0.561094343662262]]}}
```
**Note**: The result of each model training may differ slightly, and the probability values inferred with your trained model may not match the example.
......@@ -117,9 +117,9 @@ Please refer to [Docker Desktop](https://www.docker.com/products/docker-desktop)
After installation, start the docker linux engine and download the relevant image. In the Serving directory
```
docker pull registry.baidubce.com/paddlepaddle/serving:latest-devel
# There is no exposed port here; users can set -p to perform port mapping as needed
docker run --rm -dit --name serving_devel -v $PWD:/Serving registry.baidubce.com/paddlepaddle/serving:latest-devel
docker exec -it serving_devel bash
cd /Serving
```
......
......@@ -117,9 +117,9 @@ python your_client.py
After installation, start the docker linux engine and download the relevant image. In the Serving directory
```
docker pull registry.baidubce.com/paddlepaddle/serving:latest-devel
# There is no exposed port here; users can set -p to perform port mapping as needed
docker run --rm -dit --name serving_devel -v $PWD:/Serving registry.baidubce.com/paddlepaddle/serving:latest-devel
docker exec -it serving_devel bash
cd /Serving
```
......
......@@ -7,8 +7,8 @@
To make java development easier for users, we provide a compiled Serving project inside the java image. To get the image and enter the development environment:
```
docker pull registry.baidubce.com/paddlepaddle/serving:0.5.0-java
docker run --rm -dit --name java_serving registry.baidubce.com/paddlepaddle/serving:0.5.0-java
docker exec -it java_serving bash
cd Serving/java
```
......@@ -101,19 +101,22 @@ cd ../../../java/examples/target
java -cp paddle-serving-sdk-java-examples-0.0.1-jar-with-dependencies.jar PipelineClientExample indarray_predict
```
### Precautions for details
1. In the example, all models (not pipeline) need to use `--use_multilang` to start GRPC multi-programming-language support, and the port number is 9393. If you need another port, you need to modify it in the java file.
2. Currently Serving has launched the Pipeline mode (see [Pipeline Serving](../doc/PIPELINE_SERVING.md) for details), and the Pipeline Serving Client for Java has been released.
3. The parameters `ip` and `port` in PipelineClientExample.java (path: java/examples/src/main/java/[PipelineClientExample.java](./examples/src/main/java/PipelineClientExample.java)) need to match the `ip` and `port` of the corresponding pipeline server defined in its config.yaml (taking the IMDB model ensemble as an example, path: python/examples/pipeline/imdb_model_ensemble/[config.yaml](../python/examples/pipeline/imdb_model_ensemble/config.yml)).
### Customization guidance
Because the docker image of Java does not contain the compilation and development environment required by Serving, and the regular docker image of Serving does not contain the compilation and development environment required by Java, secondary compilation and development of the GPU Serving side and the Java Client side need to be completed under their respective docker images. Taking GPU mode as an example, there are two ways of development and deployment.
The first is that GPU Serving and the Java Client run in the same GPU image: the user needs to copy the files compiled in the java image (path: /Serving/java) to the /Serving/java path of the GPU image.
The second is that GPU Serving and the Java Client are deployed in their respective docker images (or on different hosts that have the compilation and development environment). In this case, only the `ip` and `port` correspondence between the Java Client and GPU Serving matters; see item 3 of the precautions above.
......@@ -7,8 +7,8 @@
To make java development easier for users, we provide a compiled Serving project inside the java image. To get the image and enter the development environment:
```
docker pull registry.baidubce.com/paddlepaddle/serving:0.5.0-java
docker run --rm -dit --name java_serving registry.baidubce.com/paddlepaddle/serving:0.5.0-java
docker exec -it java_serving bash
cd Serving/java
```
......@@ -103,17 +103,20 @@ cd ../../../java/examples/target
java -cp paddle-serving-sdk-java-examples-0.0.1-jar-with-dependencies.jar PipelineClientExample indarray_predict
```
### Precautions
1. In the example, all non-Pipeline models need to use `--use_multilang` to start GRPC multi-programming-language support, and the port number is 9393; if you need a different port, modify it in the java file.
2. Serving has launched the Pipeline mode (see [Pipeline Serving](../doc/PIPELINE_SERVING_CN.md) for the principle); the Pipeline Serving Client for Java has been released.
3. Note that the ip and port in PipelineClientExample.java (located at java/examples/src/main/java/[PipelineClientExample.java](./examples/src/main/java/PipelineClientExample.java)) must correspond to the ip and port configured in the config.yaml of the corresponding Pipeline server (taking the IMDB model ensemble as an example, located at python/examples/pipeline/imdb_model_ensemble/[config.yaml](../python/examples/pipeline/imdb_model_ensemble/config.yml)).
### Development and deployment guidance
Since the Java docker image does not contain the compilation and development environment required by Serving, and Serving's regular docker image does not contain the compilation and development environment required by Java, secondary compilation and development of the GPU Serving side and the Java Client side must be completed in their respective docker images. Taking GPU mode as an example, there are two forms of development and deployment.
The first is that GPU Serving and the Java Client run in the same GPU image: after starting the GPU image, the user needs to copy the files compiled in the java image (under /Serving/java) to the /Serving/java directory of the GPU image.
The second is that GPU Serving and the Java Client are deployed in their own docker images (or on different hosts that have the compilation and development environment); in this case only the ip and port correspondence between the Java Client side and the GPU Serving side needs attention, see item 3 above.
......@@ -8,7 +8,30 @@
<groupId>io.paddle.serving.client</groupId>
<artifactId>paddle-serving-sdk-java-examples</artifactId>
<version>0.0.1</version>
<repositories>
<repository>
<id>aliyun</id>
<url>https://maven.aliyun.com/repository/public</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>aliyun-plugin</id>
<url>https://maven.aliyun.com/repository/public</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</pluginRepository>
</pluginRepositories>
<build>
<plugins>
<plugin>
......
......@@ -16,9 +16,11 @@ public class PaddleServingClientExample {
0.0582f, -0.0727f, -0.1583f, -0.0584f,
0.6283f, 0.4919f, 0.1856f, 0.0795f, -0.0332f};
INDArray npdata = Nd4j.createFromArray(data);
long[] batch_shape = {1,13};
INDArray batch_npdata = npdata.reshape(batch_shape);
HashMap<String, INDArray> feed_data
= new HashMap<String, INDArray>() {{
put("x", npdata);
put("x", batch_npdata);
}};
List<String> fetch = Arrays.asList("price");
......@@ -69,12 +71,16 @@ public class PaddleServingClientExample {
// Div(255.0)
INDArray image = RGBimage.divi(255.0);
long[] batch_shape = {1,image.shape()[0],image.shape()[1],image.shape()[2]};
INDArray batch_image = image.reshape(batch_shape);
INDArray im_size = Nd4j.createFromArray(new int[]{height, width});
long[] batch_size_shape = {1,2};
INDArray batch_im_size = im_size.reshape(batch_size_shape);
HashMap<String, INDArray> feed_data
= new HashMap<String, INDArray>() {{
put("image", image);
put("im_size", im_size);
put("image", batch_image);
put("im_size", batch_im_size);
}};
List<String> fetch = Arrays.asList("save_infer_model/scale_0.tmp_0");
......
......@@ -47,7 +47,30 @@
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
</properties>
<repositories>
<repository>
<id>aliyun</id>
<url>https://maven.aliyun.com/repository/public</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>aliyun-plugin</id>
<url>https://maven.aliyun.com/repository/public</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</pluginRepository>
</pluginRepositories>
<dependencyManagement>
<dependencies>
<dependency>
......
......@@ -4,6 +4,9 @@ import java.util.*;
import java.util.function.Function;
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import java.util.stream.Collectors;
import java.util.List;
import java.util.ArrayList;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
......@@ -238,7 +241,11 @@ public class Client {
} else {
throw new IllegalArgumentException("error tensor value type.");
}
tensor_builder.addAllShape(feedShapes_.get(name));
long[] longArray = variable.shape();
int[] intArray = Arrays.stream(longArray).mapToInt(i -> (int) i).toArray();
List<Integer> indarrayShapeList = Arrays.stream(intArray).boxed().collect(Collectors.toList());
//tensor_builder.addAllShape(feedShapes_.get(name));
tensor_builder.addAllShape(indarrayShapeList);
inst_builder.addTensorArray(tensor_builder.build());
}
req_builder.addInsts(inst_builder.build());
......
......@@ -263,6 +263,61 @@ class Parameter {
float* _params;
};
class FluidCpuAnalysisEncryptCore : public FluidFamilyCore {
public:
void ReadBinaryFile(const std::string& filename, std::string* contents) {
std::ifstream fin(filename, std::ios::in | std::ios::binary);
fin.seekg(0, std::ios::end);
contents->clear();
contents->resize(fin.tellg());
fin.seekg(0, std::ios::beg);
fin.read(&(contents->at(0)), contents->size());
fin.close();
}
int create(const predictor::InferEngineCreationParams& params) {
std::string data_path = params.get_path();
if (access(data_path.c_str(), F_OK) == -1) {
    LOG(ERROR) << "create paddle predictor failed, path does not exist: "
<< data_path;
return -1;
}
std::string model_buffer, params_buffer, key_buffer;
ReadBinaryFile(data_path + "encrypt_model", &model_buffer);
ReadBinaryFile(data_path + "encrypt_params", &params_buffer);
ReadBinaryFile(data_path + "key", &key_buffer);
VLOG(2) << "prepare for encryption model";
auto cipher = paddle::MakeCipher("");
std::string real_model_buffer = cipher->Decrypt(model_buffer, key_buffer);
std::string real_params_buffer = cipher->Decrypt(params_buffer, key_buffer);
Config analysis_config;
// paddle::AnalysisConfig analysis_config;
analysis_config.SetModelBuffer(&real_model_buffer[0],
real_model_buffer.size(),
&real_params_buffer[0],
real_params_buffer.size());
analysis_config.DisableGpu();
analysis_config.SetCpuMathLibraryNumThreads(1);
if (params.enable_memory_optimization()) {
analysis_config.EnableMemoryOptim();
}
analysis_config.SwitchSpecifyInputNames(true);
AutoLock lock(GlobalPaddleCreateMutex::instance());
    VLOG(2) << "decrypt model file success";
_core = CreatePredictor(analysis_config);
if (NULL == _core.get()) {
LOG(ERROR) << "create paddle predictor failed, path: " << data_path;
return -1;
}
    VLOG(2) << "create paddle predictor success, path: " << data_path;
return 0;
}
};
} // namespace fluid_cpu
} // namespace paddle_serving
} // namespace baidu
......@@ -30,6 +30,13 @@ REGIST_FACTORY_OBJECT_IMPL_WITH_NAME(
::baidu::paddle_serving::predictor::InferEngine,
"FLUID_CPU_ANALYSIS_DIR");
#if 1
REGIST_FACTORY_OBJECT_IMPL_WITH_NAME(
::baidu::paddle_serving::predictor::FluidInferEngine<
FluidCpuAnalysisEncryptCore>,
::baidu::paddle_serving::predictor::InferEngine,
"FLUID_CPU_ANALYSIS_ENCRYPT");
#endif
} // namespace fluid_cpu
} // namespace paddle_serving
} // namespace baidu
......@@ -283,6 +283,60 @@ class Parameter {
float* _params;
};
class FluidGpuAnalysisEncryptCore : public FluidFamilyCore {
public:
void ReadBinaryFile(const std::string& filename, std::string* contents) {
std::ifstream fin(filename, std::ios::in | std::ios::binary);
fin.seekg(0, std::ios::end);
contents->clear();
contents->resize(fin.tellg());
fin.seekg(0, std::ios::beg);
fin.read(&(contents->at(0)), contents->size());
fin.close();
}
int create(const predictor::InferEngineCreationParams& params) {
std::string data_path = params.get_path();
if (access(data_path.c_str(), F_OK) == -1) {
    LOG(ERROR) << "create paddle predictor failed, path does not exist: "
<< data_path;
return -1;
}
std::string model_buffer, params_buffer, key_buffer;
ReadBinaryFile(data_path + "encrypt_model", &model_buffer);
ReadBinaryFile(data_path + "encrypt_params", &params_buffer);
ReadBinaryFile(data_path + "key", &key_buffer);
VLOG(2) << "prepare for encryption model";
auto cipher = paddle::MakeCipher("");
std::string real_model_buffer = cipher->Decrypt(model_buffer, key_buffer);
std::string real_params_buffer = cipher->Decrypt(params_buffer, key_buffer);
Config analysis_config;
analysis_config.SetModelBuffer(&real_model_buffer[0],
real_model_buffer.size(),
&real_params_buffer[0],
real_params_buffer.size());
analysis_config.EnableUseGpu(100, FLAGS_gpuid);
analysis_config.SetCpuMathLibraryNumThreads(1);
if (params.enable_memory_optimization()) {
analysis_config.EnableMemoryOptim();
}
analysis_config.SwitchSpecifyInputNames(true);
AutoLock lock(GlobalPaddleCreateMutex::instance());
    VLOG(2) << "decrypt model file success";
_core = CreatePredictor(analysis_config);
if (NULL == _core.get()) {
LOG(ERROR) << "create paddle predictor failed, path: " << data_path;
return -1;
}
    VLOG(2) << "create paddle predictor success, path: " << data_path;
return 0;
}
};
} // namespace fluid_gpu
} // namespace paddle_serving
} // namespace baidu
......@@ -31,6 +31,11 @@ REGIST_FACTORY_OBJECT_IMPL_WITH_NAME(
FluidGpuAnalysisDirCore>,
::baidu::paddle_serving::predictor::InferEngine,
"FLUID_GPU_ANALYSIS_DIR");
REGIST_FACTORY_OBJECT_IMPL_WITH_NAME(
::baidu::paddle_serving::predictor::FluidInferEngine<
FluidGpuAnalysisEncryptCore>,
::baidu::paddle_serving::predictor::InferEngine,
"FLUID_GPU_ANALYSIS_ENCRPT")
} // namespace fluid_gpu
} // namespace paddle_serving
......
......@@ -86,7 +86,7 @@ if (SERVER)
elseif(CUDA_VERSION EQUAL 10.2)
set(SUFFIX 102)
elseif(CUDA_VERSION EQUAL 11.0)
set(SUFFIX 11)
endif()
add_custom_command(
......
## Examples
### Support `--use_trt`
The following models support `--use_trt`, which means you can use TensorRT to accelerate inference on Cuda 10.1 or higher.
- imagenet ResNet50/ResNet101
- detection faster_rcnn/yolov3/pp-yolo/ttf-net
## Serving model examples
### Models that support TensorRT (`--use_trt`)
The following models support TensorRT and can enable `--use_trt` to accelerate online inference; other models cannot enable it.
- imagenet ResNet50/ResNet101
- detection faster_rcnn/yolov3/pp-yolo/ttf-net
......@@ -11,14 +11,16 @@ This example use model [BERT Chinese Model](https://www.paddlepaddle.org.cn/hubd
Install paddlehub first
```
pip3 install paddlehub
```
Run
```
python3 prepare_model.py 128
```
**PaddleHub only supports Python 3.5+**
The 128 in the command above is the max_seq_len of the BERT model, i.e. the sample length after preprocessing.
The configuration files and model files for the server side are saved in the bert_seq128_model folder.
The configuration files generated for the client side are saved in the bert_seq128_client folder.
......@@ -28,6 +30,8 @@ You can also download the above model from BOS(max_seq_len=128). After decompres
```shell
wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/bert_chinese_L-12_H-768_A-12.tar.gz
tar -xzf bert_chinese_L-12_H-768_A-12.tar.gz
mv bert_chinese_L-12_H-768_A-12_model bert_seq128_model
mv bert_chinese_L-12_H-768_A-12_client bert_seq128_client
```
If your model is bert_chinese_L-12_H-768_A-12_model, replace the 'bert_seq128_model' field in the following commands with 'bert_chinese_L-12_H-768_A-12_model', and replace 'bert_seq128_client' with 'bert_chinese_L-12_H-768_A-12_client'.
......@@ -64,7 +68,7 @@ the client reads data from data-c.txt and send prediction request, the predictio
### HTTP Inference Service
To start the cpu HTTP inference service, run
```
python bert_web_service.py bert_seq128_model/ 9292 #launch cpu inference service
```
Or, to start the gpu HTTP inference service, run
......
......@@ -10,11 +10,11 @@
The example uses the [BERT Chinese Model](https://www.paddlepaddle.org.cn/hubdetail?name=bert_chinese_L-12_H-768_A-12&en_category=SemanticModel) from [Paddlehub](https://github.com/PaddlePaddle/PaddleHub).
Install paddlehub first
```
pip3 install paddlehub
```
Run
```
python3 prepare_model.py 128
```
The parameter 128 is the max_seq_len of the BERT model, i.e. the sample length after preprocessing.
The server-side configuration files and model files are generated and stored in the bert_seq128_model folder.
......@@ -25,11 +25,14 @@ python prepare_model.py 128
```shell
wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/bert_chinese_L-12_H-768_A-12.tar.gz
tar -xzf bert_chinese_L-12_H-768_A-12.tar.gz
mv bert_chinese_L-12_H-768_A-12_model bert_seq128_model
mv bert_chinese_L-12_H-768_A-12_client bert_seq128_client
```
If you use the bert_chinese_L-12_H-768_A-12_model model, replace the bert_seq128_model field in the following commands with bert_chinese_L-12_H-768_A-12_model, and replace the bert_seq128_client field with bert_chinese_L-12_H-768_A-12_client.
### Obtain the dictionary and sample data
```
......@@ -67,7 +70,7 @@ head data-c.txt | python bert_client.py --model bert_seq128_client/serving_clien
### Start HTTP Inference Service
Start the CPU HTTP inference service by running:
```
python bert_web_service.py bert_seq128_model/ 9292 #launch GPU inference service
python bert_web_service.py bert_seq128_model/ 9292 #launch CPU inference service
```
......
......@@ -16,9 +16,11 @@ import paddlehub as hub
import paddle.fluid as fluid
import sys
import paddle_serving_client.io as serving_io
import paddle
paddle.enable_static()
model_name = "bert_chinese_L-12_H-768_A-12"
module = hub.Module(model_name)
module = hub.Module(name=model_name)
inputs, outputs, program = module.context(
trainable=True, max_seq_len=int(sys.argv[1]))
place = fluid.core_avx.CPUPlace()
......
......@@ -14,7 +14,7 @@ tar xf criteo_ctr_demo_model.tar.gz
mv models/ctr_client_conf .
mv models/ctr_serving_model .
```
The directories serving_server_model and serving_client_config will appear.
The directories `ctr_serving_model` and `ctr_client_conf` will appear.
### Start RPC Inference Service
......@@ -26,6 +26,6 @@ python -m paddle_serving_server_gpu.serve --model ctr_serving_model/ --port 9292
### RPC Infer
```
python test_client.py ctr_client_conf/serving_client_conf.prototxt raw_data/
python test_client.py ctr_client_conf/serving_client_conf.prototxt raw_data/part-0
```
The latency will be displayed at the end.
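For reference, the core of such a test client is a few lines of the Paddle Serving Client API. The sketch below is an assumption about what test_client.py roughly does: the feed names `dense_input` and `C1`..`C26` are taken from the Criteo data generator shown later in this diff, and `prob` is assumed as the fetch variable.
```python
from paddle_serving_client import Client

client = Client()
client.load_client_config("ctr_client_conf/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])

# one hypothetical sample: 13 normalized dense values, 26 hashed sparse ids
feed = {"dense_input": [0.0] * 13}
for i in range(1, 27):
    feed["C" + str(i)] = [0]

fetch_map = client.predict(feed=feed, fetch=["prob"])
print(fetch_map)
```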
......@@ -14,7 +14,7 @@ tar xf criteo_ctr_demo_model.tar.gz
mv models/ctr_client_conf .
mv models/ctr_serving_model .
```
The folders serving_server_model and serving_client_config will appear in the current directory.
The folders `ctr_serving_model` and `ctr_client_conf` will appear in the current directory.
### Start RPC Inference Service
......@@ -26,6 +26,6 @@ python -m paddle_serving_server_gpu.serve --model ctr_serving_model/ --port 9292
### Run Inference
```
python test_client.py ctr_client_conf/serving_client_conf.prototxt raw_data/
python test_client.py ctr_client_conf/serving_client_conf.prototxt raw_data/part-0
```
After inference completes, the elapsed time of the inference process is printed.
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
import sys

import paddle.fluid.incubate.data_generator as dg


class CriteoDataset(dg.MultiSlotDataGenerator):
    def setup(self, sparse_feature_dim):
        # Min/max/range values used to normalize the 13 continuous features.
        self.cont_min_ = [0, -3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
        self.cont_max_ = [
            20, 600, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50
        ]
        self.cont_diff_ = [
            20, 603, 100, 50, 64000, 500, 100, 50, 500, 10, 10, 10, 50
        ]
        self.hash_dim_ = sparse_feature_dim
        # here, training data are lines with line_index < train_idx_
        self.train_idx_ = 41256555
        # Column 0 is the label; columns 1-13 are continuous features and
        # columns 14-39 are categorical features.
        self.continuous_range_ = range(1, 14)
        self.categorical_range_ = range(14, 40)

    def _process_line(self, line):
        features = line.rstrip('\n').split('\t')
        dense_feature = []
        sparse_feature = []
        for idx in self.continuous_range_:
            if features[idx] == '':
                # Missing continuous values default to 0.0.
                dense_feature.append(0.0)
            else:
                dense_feature.append(
                    (float(features[idx]) - self.cont_min_[idx - 1]) /
                    self.cont_diff_[idx - 1])
        for idx in self.categorical_range_:
            # Hash each categorical feature into a bucket of size hash_dim_.
            sparse_feature.append(
                [hash(str(idx) + features[idx]) % self.hash_dim_])
        return dense_feature, sparse_feature, [int(features[0])]

    def infer_reader(self, filelist, batch, buf_size):
        def local_iter():
            for fname in filelist:
                with open(fname.strip(), "r") as fin:
                    for line in fin:
                        dense_feature, sparse_feature, label = self._process_line(
                            line)
                        yield [dense_feature] + sparse_feature + [label]

        import paddle
        # Shuffle samples within a buffer, then group them into batches.
        batch_iter = paddle.batch(
            paddle.reader.shuffle(
                local_iter, buf_size=buf_size),
            batch_size=batch)
        return batch_iter

    def generate_sample(self, line):
        def data_iter():
            dense_feature, sparse_feature, label = self._process_line(line)
            feature_name = ["dense_input"]
            for idx in self.categorical_range_:
                feature_name.append("C" + str(idx - 13))
            feature_name.append("label")
            yield zip(feature_name, [dense_feature] + sparse_feature + [label])

        return data_iter


if __name__ == "__main__":
    criteo_dataset = CriteoDataset()
    criteo_dataset.setup(int(sys.argv[1]))
    # Read raw Criteo lines from stdin and emit converted samples on stdout.
    criteo_dataset.run_from_stdin()
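As a quick sanity check, a few raw Criteo lines can be piped through the generator above; the script name `criteo_reader.py` and the sparse dimension 1000001 are assumptions for illustration:
```
# converts raw TSV lines from stdin into MultiSlot samples on stdout
head -n 10 raw_data/part-0 | python criteo_reader.py 1000001
```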
# Serve models from Paddle Detection
(English|[简体中文](./README_CN.md))
### Introduction
PaddleDetection, PaddlePaddle's object detection development kit, is designed to help developers complete the whole workflow of detection model design, training, optimization, and deployment faster and better. For details, see [Github](https://github.com/PaddlePaddle/PaddleDetection/tree/master/dygraph).
This document introduces how to deploy PaddleDetection's dynamic graph models on Serving.
PaddleDetection provides a large [Model Zoo](https://github.com/PaddlePaddle/PaddleDetection/blob/master/dygraph/docs/MODEL_ZOO_cn.md); these models can be used in Paddle Serving once exported with the export tool. For the export tutorial, please refer to [Paddle Detection Export Model Tutorial (Simplified Chinese)](https://github.com/PaddlePaddle/PaddleDetection/blob/master/dygraph/deploy/EXPORT_MODEL.md).
### Serving example
This folder provides several examples of PaddleDetection models deployed with Serving.
All examples support TensorRT; a hedged launch sketch follows the list.
- [Faster RCNN](./faster_rcnn_r50_fpn_1x_coco)
- [PPYOLO](./ppyolo_r50vd_dcn_1x_coco)
- [TTFNet](./ttfnet_darknet53_1x_coco)
- [YOLOv3](./yolov3_darknet53_270e_coco)
- [HRNet](./faster_rcnn_hrnetv2p_w18_1x)
- [Fcos](./fcos_dcn_r50_fpn_1x_coco)
- [SSD](./ssd_vgg16_300_240e_voc/)
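For any of the exported models above, a GPU server can then be brought up along these lines; the `serving_server` directory name is an assumption about the output of the export step:
```
# assumed export output directory; --use_trt enables TensorRT acceleration
python -m paddle_serving_server_gpu.serve --model serving_server --port 9494 --gpu_ids 0 --use_trt
```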