未验证 提交 d6020fdb 编写于 作者: J Jiawei Wang 提交者: GitHub

Merge branch 'develop' into fix_encoding_bug

......@@ -20,88 +20,116 @@
<br>
<p>
- [Motivation](./README.md#motivation)
- [AIStudio Tutorial](./README.md#aistudio-tutorial)
- [Installation](./README.md#installation)
- [Quick Start Example](./README.md#quick-start-example)
- [Document](README.md#document)
- [Community](README.md#community)
<h2 align="center">Motivation</h2>
We consider deploying deep learning inference service online to be a user-facing application in the future. **The goal of this project**: When you have trained a deep neural net with [Paddle](https://github.com/PaddlePaddle/Paddle), you are also capable to deploy the model online easily. A demo of Paddle Serving is as follows:
- Any model trained by [PaddlePaddle](https://github.com/paddlepaddle/paddle) can be directly used or [Model Conversion Interface](./doc/SAVE_CN.md) for online deployment of Paddle Serving.
- Support [Multi-model Pipeline Deployment](./doc/PIPELINE_SERVING.md), and provide the requirements of the REST interface and RPC interface itself, [Pipeline example](./python/examples/pipeline).
- Support the major model libraries of the Paddle ecosystem, such as [PaddleDetection](./python/examples/detection), [PaddleOCR](./python/examples/ocr), [PaddleRec](https://github.com/PaddlePaddle/PaddleRec/tree/master/tools/recserving/movie_recommender).
- Provide a variety of pre-processing and post-processing to facilitate users in training, deployment and other stages of related code, bridging the gap between AI developers and application developers, please refer to
[Serving Examples](./python/examples/).
<p align="center">
<img src="doc/demo.gif" width="700">
</p>
<h2 align="center">AIStudio Tutorial</h2>
Here we provide tutorial on AIStudio(Chinese Version) [AIStudio教程-Paddle Serving服务化部署框架](https://aistudio.baidu.com/aistudio/projectdetail/1550674)
The tutorial provides
<ul>
<li>Paddle Serving Environment Setup</li>
<ul>
<li>Running in docker images
<li>pip install Paddle Serving
</ul>
<li>Quick Experience of Paddle Serving</li>
<li>Advanced Tutorial of Model Deployment</li>
<ul>
<li>Save/Convert Models for Paddle Serving</li>
<li>Setup Online Inference Service</li>
</ul>
<li>Paddle Serving Examples</li>
<ul>
<li>Paddle Serving for Detections</li>
<li>Paddle Serving for OCR</li>
</ul>
</ul>
<h2 align="center">Installation</h2>
We **highly recommend** you to **run Paddle Serving in Docker**, please visit [Run in Docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/RUN_IN_DOCKER.md). See the [document](doc/DOCKER_IMAGES.md) for more docker images.
We **highly recommend** you to **run Paddle Serving in Docker**, please visit [Run in Docker](doc/RUN_IN_DOCKER.md). See the [document](doc/DOCKER_IMAGES.md) for more docker images.
**Attention:**: Currently, the default GPU environment of paddlepaddle 2.0 is Cuda 10.2, so the sample code of GPU Docker is based on Cuda 10.2. We also provides docker images and whl packages for other GPU environments. If users use other environments, they need to carefully check and select the appropriate version.
```
# Run CPU Docker
docker pull hub.baidubce.com/paddlepaddle/serving:latest
docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest
docker pull hub.baidubce.com/paddlepaddle/serving:0.5.0-devel
docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.5.0-devel
docker exec -it test bash
git clone https://github.com/PaddlePaddle/Serving
```
```
# Run GPU Docker
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:0.5.0-cuda10.2-cudnn8-devel
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.5.0-cuda10.2-cudnn8-devel
nvidia-docker exec -it test bash
git clone https://github.com/PaddlePaddle/Serving
```
```shell
pip install paddle-serving-client==0.4.0
pip install paddle-serving-server==0.4.0 # CPU
pip install paddle-serving-client==0.5.0
pip install paddle-serving-server==0.5.0 # CPU
pip install paddle-serving-app==0.2.0
pip install paddle-serving-server-gpu==0.4.0.post9 # GPU with CUDA9.0
pip install paddle-serving-server-gpu==0.4.0.post10 # GPU with CUDA10.0
pip install paddle-serving-server-gpu==0.4.0.100 # GPU with CUDA10.1+TensorRT
pip install paddle-serving-server-gpu==0.5.0.post102 #GPU with CUDA10.2 + TensorRT7
# DO NOT RUN ALL COMMANDS! check your GPU env and select the right one
pip install paddle-serving-server-gpu==0.5.0.post9 # GPU with CUDA9.0
pip install paddle-serving-server-gpu==0.5.0.post10 # GPU with CUDA10.0
pip install paddle-serving-server-gpu==0.5.0.post101 # GPU with CUDA10.1 + TensorRT6
pip install paddle-serving-server-gpu==0.5.0.post11 # GPU with CUDA10.1 + TensorRT7
```
You may need to use a domestic mirror source (in China, you can use the Tsinghua mirror source, add `-i https://pypi.tuna.tsinghua.edu.cn/simple` to pip command) to speed up the download.
If you need install modules compiled with develop branch, please download packages from [latest packages list](./doc/LATEST_PACKAGES.md) and install with `pip install` command.
If you need install modules compiled with develop branch, please download packages from [latest packages list](./doc/LATEST_PACKAGES.md) and install with `pip install` command. If you want to compile by yourself, please refer to [How to compile Paddle Serving?](./doc/COMPILE.md)
Packages of paddle-serving-server and paddle-serving-server-gpu support Centos 6/7, Ubuntu 16/18, Windows 10.
Packages of paddle-serving-client and paddle-serving-app support Linux and Windows, but paddle-serving-client only support python2.7/3.5/3.6/3.7.
Packages of paddle-serving-client and paddle-serving-app support Linux and Windows, but paddle-serving-client only support python2.7/3.5/3.6/3.7/3.8.
Recommended to install paddle >= 1.8.4.
Recommended to install paddle >= 2.0.0
For **Windows Users**, please read the document [Paddle Serving for Windows Users](./doc/WINDOWS_TUTORIAL.md)
<h2 align="center"> Pre-built services with Paddle Serving</h2>
<h3 align="center">Chinese Word Segmentation</h4>
``` shell
> python -m paddle_serving_app.package --get_model lac
> tar -xzf lac.tar.gz
> python lac_web_service.py lac_model/ lac_workdir 9393 &
> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9393/lac/prediction
{"result":[{"word_seg":"我|爱|北京|天安门"}]}
```
# CPU users, please run
pip install paddlepaddle==2.0.0
<h3 align="center">Image Classification</h4>
<p align="center">
<br>
<img src='https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg' width = "200" height = "200">
<br>
<p>
``` shell
> python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
> tar -xzf resnet_v2_50_imagenet.tar.gz
> python resnet50_imagenet_classify.py resnet50_serving_model &
> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"image": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction
{"result":{"label":["daisy"],"prob":[0.9341403245925903]}}
# GPU Cuda10.2 please run
pip install paddlepaddle-gpu==2.0.0
```
For **Windows Users**, please read the document [Paddle Serving for Windows Users](./doc/WINDOWS_TUTORIAL.md)
<h2 align="center">Quick Start Example</h2>
This quick start example is only for users who already have a model to deploy and we prepare a ready-to-deploy model here. If you want to know how to use paddle serving from offline training to online serving, please reference to [Train_To_Service](https://github.com/PaddlePaddle/Serving/blob/develop/doc/TRAIN_TO_SERVICE.md)
This quick start example is mainly for those users who already have a model to deploy, and we also provide a model that can be used for deployment. in case if you want to know how to complete the process from offline training to online service, please refer to the AiStudio tutorial above.
### Boston House Price Prediction model
get into the Serving git directory, and change dir to `fit_a_line`
``` shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
tar -xzf uci_housing.tar.gz
cd Serving/python/examples/fit_a_line
sh get_data.sh
```
Paddle Serving provides HTTP and RPC based service for users to access
......@@ -123,6 +151,8 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
| `ir_optim` | - | - | Enable analysis and optimization of calculation graph |
| `use_mkl` (Only for cpu version) | - | - | Run inference with MKL |
| `use_trt` (Only for trt version) | - | - | Run inference with TensorRT |
| `use_lite` (Only for ARM) | - | - | Run PaddleLite inference |
| `use_xpu` (Only for ARM+XPU) | - | - | Run PaddleLite XPU inference |
</center>
......@@ -145,26 +175,8 @@ Here, `client.predict` function has two arguments. `feed` is a `python dict` wit
Users can also put the data format processing logic on the server side, so that they can directly use curl to access the service, refer to the following case whose path is `python/examples/fit_a_line`
```python
from paddle_serving_server.web_service import WebService
import numpy as np
class UciService(WebService):
def preprocess(self, feed=[], fetch=[]):
feed_batch = []
is_batch = True
new_data = np.zeros((len(feed), 1, 13)).astype("float32")
for i, ins in enumerate(feed):
nums = np.array(ins["x"]).reshape(1, 1, 13)
new_data[i] = nums
feed = {"x": new_data}
return feed, fetch, is_batch
uci_service = UciService(name="uci")
uci_service.load_model_config("uci_housing_model")
uci_service.prepare_server(workdir="workdir", port=9292)
uci_service.run_rpc_service()
uci_service.run_web_service()
```
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci
```
for client side,
```
......@@ -187,20 +199,13 @@ the response is
### New to Paddle Serving
- [How to save a servable model?](doc/SAVE.md)
- [An End-to-end tutorial from training to inference service deployment](doc/TRAIN_TO_SERVICE.md)
- [Write Bert-as-Service in 10 minutes](doc/BERT_10_MINS.md)
### Tutorial at AIStudio
- [Introduction to PaddleServing](https://aistudio.baidu.com/aistudio/projectdetail/605819)
- [Image Segmentation on Paddle Serving](https://aistudio.baidu.com/aistudio/projectdetail/457715)
- [Sentimental Analysis](https://aistudio.baidu.com/aistudio/projectdetail/509014)
- [Paddle Serving Examples](python/examples)
### Developers
- [How to config Serving native operators on server side?](doc/SERVER_DAG.md)
- [How to develop a new Serving operator?](doc/NEW_OPERATOR.md)
- [How to develop a new Web Service?](doc/NEW_WEB_SERVICE.md)
- [Golang client](doc/IMDB_GO_CLIENT.md)
- [Compile from source code](doc/COMPILE.md)
- [Develop Pipeline Serving](doc/PIPELINE_SERVING.md)
- [Deploy Web Service with uWSGI](doc/UWSGI_DEPLOY.md)
- [Hot loading for model file](doc/HOT_LOADING_IN_SERVING.md)
......@@ -211,15 +216,13 @@ the response is
- [CPU Benchmarks(Chinese)](doc/BENCHMARKING.md)
- [GPU Benchmarks(Chinese)](doc/GPU_BENCHMARKING.md)
### FAQ
- [FAQ(Chinese)](doc/FAQ.md)
### Design
- [Design Doc](doc/DESIGN_DOC.md)
<h2 align="center">Community</h2>
### FAQ
- [FAQ(Chinese)](doc/FAQ.md)
<h2 align="center">Community</h2>
### Slack
......
......@@ -20,91 +20,126 @@
<br>
<p>
- [动机](./README_CN.md#动机)
- [教程](./README_CN.md#教程)
- [安装](./README_CN.md#安装)
- [快速开始示例](./README_CN.md#快速开始示例)
- [文档](README_CN.md#文档)
- [社区](README_CN.md#社区)
<h2 align="center">动机</h2>
Paddle Serving 旨在帮助深度学习开发者轻易部署在线预测服务。 **本项目目标**: 当用户使用 [Paddle](https://github.com/PaddlePaddle/Paddle) 训练了一个深度神经网络,就同时拥有了该模型的预测服务。
- 任何经过[PaddlePaddle](https://github.com/paddlepaddle/paddle)训练的模型,都可以经过直接保存或是[模型转换接口](./doc/SAVE_CN.md),用于Paddle Serving在线部署。
- 支持[多模型串联服务部署](./doc/PIPELINE_SERVING_CN.md), 同时提供Rest接口和RPC接口以满足您的需求,[Pipeline示例](./python/examples/pipeline)
- 支持Paddle生态的各大模型库, 例如[PaddleDetection](./python/examples/detection)[PaddleOCR](./python/examples/ocr)[PaddleRec](https://github.com/PaddlePaddle/PaddleRec/tree/master/tools/recserving/movie_recommender)
- 提供丰富多彩的前后处理,方便用户在训练、部署等各阶段复用相关代码,弥合AI开发者和应用开发者之间的鸿沟,详情参考[模型示例](./python/examples/)
<p align="center">
<img src="doc/demo.gif" width="700">
</p>
<h2 align="center">教程</h2>
Paddle Serving开发者为您提供了简单易用的[AIStudio教程-Paddle Serving服务化部署框架](https://aistudio.baidu.com/aistudio/projectdetail/1550674)
教程提供了如下内容
<ul>
<li>Paddle Serving环境安装</li>
<ul>
<li>Docker镜像启动方式
<li>pip安装Paddle Serving
</ul>
<li>快速体验部署在线推理服务</li>
<li>部署在线推理服务进阶流程</li>
<ul>
<li>获取可用于部署在线服务的模型</li>
<li>启动推理服务</li>
</ul>
<li>Paddle Serving在线部署实例</li>
<ul>
<li>使用Paddle Serving部署图像检测服务</li>
<li>使用Paddle Serving部署OCR Pipeline在线服务</li>
</ul>
</ul>
<h2 align="center">安装</h2>
**强烈建议**您在**Docker内构建**Paddle Serving,请查看[如何在Docker中运行PaddleServing](doc/RUN_IN_DOCKER_CN.md)。更多镜像请查看[Docker镜像列表](doc/DOCKER_IMAGES_CN.md)
**提示**:目前paddlepaddle 2.0版本的默认GPU环境是Cuda 10.2,因此GPU Docker的示例代码以Cuda 10.2为准。镜像和pip安装包也提供了其余GPU环境,用户如果使用其他环境,需要仔细甄别并选择合适的版本。
```
# 启动 CPU Docker
docker pull hub.baidubce.com/paddlepaddle/serving:latest
docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest
docker pull hub.baidubce.com/paddlepaddle/serving:0.5.0-devel
docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.5.0-devel
docker exec -it test bash
git clone https://github.com/PaddlePaddle/Serving
```
```
# 启动 GPU Docker
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:0.5.0-cuda10.2-cudnn8-devel
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.5.0-cuda10.2-cudnn8-devel
nvidia-docker exec -it test bash
git clone https://github.com/PaddlePaddle/Serving
```
```shell
pip install paddle-serving-client==0.4.0
pip install paddle-serving-server==0.4.0 # CPU
pip install paddle-serving-client==0.5.0
pip install paddle-serving-server==0.5.0 # CPU
pip install paddle-serving-app==0.2.0
pip install paddle-serving-server-gpu==0.4.0.post9 # GPU with CUDA9.0
pip install paddle-serving-server-gpu==0.4.0.post10 # GPU with CUDA10.0
pip install paddle-serving-server-gpu==0.4.0.100 # GPU with CUDA10.1+TensorRT
pip install paddle-serving-server-gpu==0.5.0.post102 #GPU with CUDA10.2 + TensorRT7
# 其他GPU环境需要确认环境再选择执行哪一条
pip install paddle-serving-server-gpu==0.5.0.post9 # GPU with CUDA9.0
pip install paddle-serving-server-gpu==0.5.0.post10 # GPU with CUDA10.0
pip install paddle-serving-server-gpu==0.5.0.post101 # GPU with CUDA10.1 + TensorRT6
pip install paddle-serving-server-gpu==0.5.0.post11 # GPU with CUDA10.1 + TensorRT7
```
您可能需要使用国内镜像源(例如清华源, 在pip命令中添加`-i https://pypi.tuna.tsinghua.edu.cn/simple`)来加速下载。
如果需要使用develop分支编译的安装包,请从[最新安装包列表](./doc/LATEST_PACKAGES.md)中获取下载地址进行下载,使用`pip install`命令进行安装。
如果需要使用develop分支编译的安装包,请从[最新安装包列表](./doc/LATEST_PACKAGES.md)中获取下载地址进行下载,使用`pip install`命令进行安装。如果您想自行编译,请参照[Paddle Serving编译文档](./doc/COMPILE_CN.md)
paddle-serving-server和paddle-serving-server-gpu安装包支持Centos 6/7, Ubuntu 16/18和Windows 10。
paddle-serving-client和paddle-serving-app安装包支持Linux和Windows,其中paddle-serving-client仅支持python2.7/3.5/3.6。
推荐安装1.8.4及以上版本的paddle
对于**Windows 10 用户**,请参考文档[Windows平台使用Paddle Serving指导](./doc/WINDOWS_TUTORIAL_CN.md)
paddle-serving-client和paddle-serving-app安装包支持Linux和Windows,其中paddle-serving-client仅支持python2.7/3.5/3.6/3.7/3.8。
<h2 align="center"> Paddle Serving预装的服务 </h2>
推荐安装2.0.0及以上版本的paddle
<h3 align="center">中文分词</h4>
```
# CPU环境请执行
pip install paddlepaddle==2.0.0
``` shell
> python -m paddle_serving_app.package --get_model lac
> tar -xzf lac.tar.gz
> python lac_web_service.py lac_model/ lac_workdir 9393 &
> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9393/lac/prediction
{"result":[{"word_seg":"我|爱|北京|天安门"}]}
# GPU Cuda10.2环境请执行
pip install paddlepaddle-gpu==2.0.0
```
<h3 align="center">图像分类</h4>
**注意**: 如果您的Cuda版本不是10.2,请勿直接执行上述命令,需要参考[Paddle官方文档-多版本whl包列表](https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#whl-release)
<p align="center">
<br>
<img src='https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg' width = "200" height = "200">
<br>
<p>
``` shell
> python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
> tar -xzf resnet_v2_50_imagenet.tar.gz
> python resnet50_imagenet_classify.py resnet50_serving_model &
> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"image": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction
{"result":{"label":["daisy"],"prob":[0.9341403245925903]}}
选择相应的GPU环境的url链接并进行安装,例如Cuda 9.0的Python2.7用户,请选择表格当中的`cp27-cp27mu``cuda9.0_cudnn7-mkl`对应的url,复制下来并执行
```
pip install https://paddle-wheel.bj.bcebos.com/2.0.0-gpu-cuda9-cudnn7-mkl/paddlepaddle_gpu-2.0.0.post90-cp27-cp27mu-linux_x86_64.whl
```
如果是其他环境和Python版本,请在表格中找到对应的链接并用pip安装。
对于**Windows 10 用户**,请参考文档[Windows平台使用Paddle Serving指导](./doc/WINDOWS_TUTORIAL_CN.md)
<h2 align="center">快速开始示例</h2>
这个快速开始示例主要是为了给那些已经有一个要部署的模型的用户准备的,而且我们也提供了一个可以用来部署的模型。如果您想知道如何从离线训练到在线服务走完全流程,请参考[从训练到部署](https://github.com/PaddlePaddle/Serving/blob/develop/doc/TRAIN_TO_SERVICE_CN.md)
这个快速开始示例主要是为了给那些已经有一个要部署的模型的用户准备的,而且我们也提供了一个可以用来部署的模型。如果您想知道如何从离线训练到在线服务走完全流程,请参考前文的AiStudio教程。
<h3 align="center">波士顿房价预测</h3>
进入到Serving的git目录下,进入到`fit_a_line`例子
``` shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
tar -xzf uci_housing.tar.gz
cd Serving/python/examples/fit_a_line
sh get_data.sh
```
Paddle Serving 为用户提供了基于 HTTP 和 RPC 的服务
......@@ -127,10 +162,10 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
| `mem_optim_off` | - | - | Disable memory optimization |
| `ir_optim` | - | - | Enable analysis and optimization of calculation graph |
| `use_mkl` (Only for cpu version) | - | - | Run inference with MKL |
| `use_trt` (Only for trt version) | - | - | Run inference with TensorRT |
| `use_trt` (Only for Cuda>=10.1 version) | - | - | Run inference with TensorRT |
| `use_lite` (Only for ARM) | - | - | Run PaddleLite inference |
| `use_xpu` (Only for ARM+XPU) | - | - | Run PaddleLite XPU inference |
我们使用 `curl` 命令来发送HTTP POST请求给刚刚启动的服务。用户也可以调用python库来发送HTTP POST请求,请参考英文文
[requests](https://requests.readthedocs.io/en/master/)
</center>
``` python
......@@ -151,26 +186,8 @@ print(fetch_map)
<h3 align="center">HTTP服务</h3>
用户也可以将数据格式处理逻辑放在服务器端进行,这样就可以直接用curl去访问服务,参考如下案例,在目录`python/examples/fit_a_line`
```python
from paddle_serving_server.web_service import WebService
import numpy as np
class UciService(WebService):
def preprocess(self, feed=[], fetch=[]):
feed_batch = []
is_batch = True
new_data = np.zeros((len(feed), 1, 13)).astype("float32")
for i, ins in enumerate(feed):
nums = np.array(ins["x"]).reshape(1, 1, 13)
new_data[i] = nums
feed = {"x": new_data}
return feed, fetch, is_batch
uci_service = UciService(name="uci")
uci_service.load_model_config("uci_housing_model")
uci_service.prepare_server(workdir="workdir", port=9292)
uci_service.run_rpc_service()
uci_service.run_web_service()
```
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci
```
客户端输入
```
......@@ -193,20 +210,13 @@ curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1
### 新手教程
- [怎样保存用于Paddle Serving的模型?](doc/SAVE_CN.md)
- [端到端完成从训练到部署全流程](doc/TRAIN_TO_SERVICE_CN.md)
- [十分钟构建Bert-As-Service](doc/BERT_10_MINS_CN.md)
### AIStudio教程
- [PaddleServing作业](https://aistudio.baidu.com/aistudio/projectdetail/605819)
- [PaddleServing图像分割](https://aistudio.baidu.com/aistudio/projectdetail/457715)
- [PaddleServing情感分析](https://aistudio.baidu.com/aistudio/projectdetail/509014)
- [Paddle Serving示例合辑](python/examples)
### 开发者教程
- [如何配置Server端的计算图?](doc/SERVER_DAG_CN.md)
- [如何开发一个新的General Op?](doc/NEW_OPERATOR_CN.md)
- [如何开发一个新的Web Service?](doc/NEW_WEB_SERVICE_CN.md)
- [如何在Paddle Serving使用Go Client?](doc/IMDB_GO_CLIENT_CN.md)
- [如何编译PaddleServing?](doc/COMPILE_CN.md)
- [如何开发Pipeline?](doc/PIPELINE_SERVING_CN.md)
- [如何使用uWSGI部署Web Service](doc/UWSGI_DEPLOY_CN.md)
- [如何实现模型文件热加载](doc/HOT_LOADING_IN_SERVING_CN.md)
......@@ -217,12 +227,12 @@ curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1
- [CPU版Benchmarks](doc/BENCHMARKING.md)
- [GPU版Benchmarks](doc/GPU_BENCHMARKING.md)
### FAQ
- [常见问答](doc/FAQ.md)
### 设计文档
- [Paddle Serving设计文档](doc/DESIGN_DOC_CN.md)
### FAQ
- [常见问答](doc/FAQ.md)
<h2 align="center">社区</h2>
### Slack
......
# Paddle Serving Using Baidu Kunlun Chips
(English|[简体中文](./BAIDU_KUNLUN_XPU_SERVING_CN.md))
Paddle serving supports deployment using Baidu Kunlun chips. At present, the pilot support is deployed on the ARM server with Baidu Kunlun chips
(such as Phytium FT-2000+/64). We will improve
the deployment capability on various heterogeneous hardware servers in the future.
# Compilation and installation
Refer to [compile](COMPILE.md) document to setup the compilation environment。
## Compilatiton
* Compile the Serving Server
```
cd Serving
mkdir -p server-build-arm && cd server-build-arm
cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
-DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
-DPYTHON_EXECUTABLE=/usr/bin/python \
-DWITH_PYTHON=ON \
-DWITH_LITE=ON \
-DWITH_XPU=ON \
-DSERVER=ON ..
make -j10
```
You can run `make install` to produce the target in `./output` directory. Add `-DCMAKE_INSTALL_PREFIX=./output` to specify the output path to CMake command shown above。
* Compile the Serving Client
```
mkdir -p client-build-arm && cd client-build-arm
cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
-DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
-DPYTHON_EXECUTABLE=/usr/bin/python \
-DWITH_PYTHON=ON \
-DWITH_LITE=ON \
-DWITH_XPU=ON \
-DCLIENT=ON ..
make -j10
```
* Compile the App
```
cd Serving
mkdir -p app-build-arm && cd app-build-arm
cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
-DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
-DPYTHON_EXECUTABLE=/usr/bin/python \
-DWITH_PYTHON=ON \
-DWITH_LITE=ON \
-DWITH_XPU=ON \
-DAPP=ON ..
make -j10
```
## Install the wheel package
After the compilations stages above, the whl package will be generated in ```python/dist/``` under the specific temporary directories.
For example, after the Server Compiation step,the whl package will be produced under the server-build-arm/python/dist directory, and you can run ```pip install -u python/dist/*.whl``` to install the package。
# Request parameters description
In order to deploy serving
service on the arm server with Baidu Kunlun xpu chips and use the acceleration capability of Paddle-Lite,please specify the following parameters during deployment。
|param|param description|about|
|:--|:--|:--|
|use_lite|using Paddle-Lite Engine|use the inference capability of Paddle-Lite|
|use_xpu|using Baidu Kunlun for inference|need to be used with the use_lite option|
|ir_optim|open the graph optimization|refer to[Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite)|
# Deplyment examples
## Download the model
```
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
tar -xzf uci_housing.tar.gz
```
## Start RPC service
There are mainly three deployment methods:
* deploy on the ARM server with Baidu xpu using the acceleration capability of Paddle-Lite and xpu;
* deploy on the ARM server standalone with Paddle-Lite;
* deploy on the ARM server standalone without Paddle-Lite。
The first two deployment methods are recommended。
Start the rpc service, deploying on ARM server with Baidu Kunlun chips,and accelerate with Paddle-Lite and Baidu Kunlun xpu.
```
python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --use_xpu --ir_optim
```
Start the rpc service, deploying on ARM server,and accelerate with Paddle-Lite.
```
python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --ir_optim
```
Start the rpc service, deploying on ARM server.
```
python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292
```
##
```
from paddle_serving_client import Client
import numpy as np
client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])
data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
-0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
fetch_map = client.predict(feed={"x": np.array(data).reshape(1,13,1)}, fetch=["price"])
print(fetch_map)
```
Some examples are provided below, and other models can be modifed with reference to these examples。
|sample name|sample links|
|:-----|:--|
|fit_a_line|[fit_a_line_xpu](../python/examples/xpu/fit_a_line_xpu)|
|resnet|[resnet_v2_50_xpu](../python/examples/xpu/resnet_v2_50_xpu)|
# Paddle Serving使用百度昆仑芯片部署
(简体中文|[English](./BAIDU_KUNLUN_XPU_SERVING.md))
Paddle Serving支持使用百度昆仑芯片进行预测部署。目前试验性支持在百度昆仑芯片和arm服务器(如飞腾 FT-2000+/64)上进行部署,后续完善对其他异构硬件服务器部署能力。
# 编译、安装
基本环境配置可参考[该文档](COMPILE_CN.md)进行配置。
## 编译
* 编译server部分
```
cd Serving
mkdir -p server-build-arm && cd server-build-arm
cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
-DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
-DPYTHON_EXECUTABLE=/usr/bin/python \
-DWITH_PYTHON=ON \
-DWITH_LITE=ON \
-DWITH_XPU=ON \
-DSERVER=ON ..
make -j10
```
可以执行`make install`把目标产出放在`./output`目录下,cmake阶段需添加`-DCMAKE_INSTALL_PREFIX=./output`选项来指定存放路径。
* 编译client部分
```
mkdir -p client-build-arm && cd client-build-arm
cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
-DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
-DPYTHON_EXECUTABLE=/usr/bin/python \
-DWITH_PYTHON=ON \
-DWITH_LITE=ON \
-DWITH_XPU=ON \
-DCLIENT=ON ..
make -j10
```
* 编译app部分
```
cd Serving
mkdir -p app-build-arm && cd app-build-arm
cmake -DPYTHON_INCLUDE_DIR=/usr/include/python3.7m/ \
-DPYTHON_LIBRARIES=/usr/lib64/libpython3.7m.so \
-DPYTHON_EXECUTABLE=/usr/bin/python \
-DWITH_PYTHON=ON \
-DWITH_LITE=ON \
-DWITH_XPU=ON \
-DAPP=ON ..
make -j10
```
## 安装wheel包
以上编译步骤完成后,会在各自编译目录$build_dir/python/dist生成whl包,分别安装即可。例如server步骤,会在server-build-arm/python/dist目录下生成whl包, 使用命令```pip install -u xxx.whl```进行安装。
# 请求参数说明
为了支持arm+xpu服务部署,使用Paddle-Lite加速能力,请求时需使用以下参数。
|参数|参数说明|备注|
|:--|:--|:--|
|use_lite|使用Paddle-Lite Engine|使用Paddle-Lite cpu预测能力|
|use_xpu|使用Baidu Kunlun进行预测|该选项需要与use_lite配合使用|
|ir_optim|开启Paddle-Lite计算子图优化|详细见[Paddle-Lite](https://github.com/PaddlePaddle/Paddle-Lite)|
# 部署使用示例
## 下载模型
```
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
tar -xzf uci_housing.tar.gz
```
## 启动rpc服务
主要有三种启动配置:
* 使用arm cpu+xpu部署,使用Paddle-Lite xpu优化加速能力;
* 单独使用arm cpu部署,使用Paddle-Lite优化加速能力;
* 使用arm cpu部署,不使用Paddle-Lite加速。
推荐使用前两种部署方式。
启动rpc服务,使用arm cpu+xpu部署,使用Paddle-Lite xpu优化加速能力
```
python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --use_xpu --ir_optim
```
启动rpc服务,使用arm cpu部署, 使用Paddle-Lite加速能力
```
python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292 --use_lite --ir_optim
```
启动rpc服务,使用arm cpu部署, 不使用Paddle-Lite加速能力
```
python3 -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 6 --port 9292
```
## client调用
```
from paddle_serving_client import Client
import numpy as np
client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])
data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727,
-0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
fetch_map = client.predict(feed={"x": np.array(data).reshape(1,13,1)}, fetch=["price"])
print(fetch_map)
```
以下提供部分样例,其他模型可参照进行修改。
|示例名称|示例链接|
|:-----|:--|
|fit_a_line|[fit_a_line_xpu](../python/examples/xpu/fit_a_line_xpu)|
|resnet|[resnet_v2_50_xpu](../python/examples/xpu/resnet_v2_50_xpu)|
......@@ -6,12 +6,11 @@
| module | version |
| :--------------------------: | :-------------------------------: |
| OS | CentOS 7 |
| gcc | 4.8.5 and later |
| gcc-c++ | 4.8.5 and later |
| git | 3.82 and later |
| OS | Ubuntu16 and 18/CentOS 7 |
| gcc | 4.8.5(Cuda 9.0 and 10.0) and 8.2(Others) |
| gcc-c++ | 4.8.5(Cuda 9.0 and 10.0) and 8.2(Others) |
| cmake | 3.2.0 and later |
| Python | 2.7.2 and later / 3.6 and later |
| Python | 2.7.2 and later / 3.5.1 and later |
| Go | 1.9.2 and later |
| git | 2.17.1 and later |
| glibc-static | 2.17 |
......@@ -19,19 +18,13 @@
| bzip2-devel | 1.0.6 and later |
| python-devel / python3-devel | 2.7.5 and later / 3.6.8 and later |
| sqlite-devel | 3.7.17 and later |
| patchelf | 0.9 and later |
| patchelf | 0.9 |
| libXext | 1.3.3 |
| libSM | 1.2.2 |
| libXrender | 0.9.10 |
It is recommended to use Docker for compilation. We have prepared the Paddle Serving compilation environment for you, see [this document](DOCKER_IMAGES.md).
This document will take Python2 as an example to show how to compile Paddle Serving. If you want to compile with Python3, just adjust the Python options of cmake:
- Set `DPYTHON_INCLUDE_DIR` to `$PYTHONROOT/include/python3.6m/`
- Set `DPYTHON_LIBRARIES` to `$PYTHONROOT/lib64/libpython3.6.so`
- Set `DPYTHON_EXECUTABLE` to `$PYTHONROOT/bin/python3.6`
## Get Code
``` python
......@@ -39,19 +32,47 @@ git clone https://github.com/PaddlePaddle/Serving
cd Serving && git submodule update --init --recursive
```
## PYTHONROOT Setting
## PYTHONROOT settings
```shell
# for example, the path of python is /usr/bin/python, you can set /usr as PYTHONROOT
export PYTHONROOT=/usr/
# For example, the path of python is /usr/bin/python, you can set PYTHONROOT
export PYTHONROOT=/usr
```
In the default centos7 image we provide, the Python path is `/usr/bin/python`. If you want to use our centos6 image, you need to set it to `export PYTHONROOT=/usr/local/python2.7/`.
If you are using a Docker development image, please follow the following to determine the Python version to be compiled, and set the corresponding environment variables
```
#Python 2.7
export PYTHONROOT=/usr/local/python2.7.15/
export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/
export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so
export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python2.7
#Python 3.5
export PYTHONROOT=/usr/local/python3.5.1
export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python3.5m
export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython3.5m.so
export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python3.5
#Python3.6
export PYTHONROOT=/usr/local/
export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python3.6m
export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython3.6m.so
export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python3.6
#Python3.7
export PYTHONROOT=/usr/local/
export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python3.7m
export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython3.7m.so
export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python3.7
#Python3.8
export PYTHONROOT=/usr/local/
export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python3.8
export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython3.8.so
export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python3.8
```
## Install Python dependencies
......@@ -59,14 +80,11 @@ In the default centos7 image we provide, the Python path is `/usr/bin/python`. I
pip install -r python/requirements.txt
```
If Python3 is used, replace `pip` with `pip3`.
If you use other Python version, please use the right `pip` accordingly.
## GOPATH Setting
The default GOPATH is set to `$HOME/go`, you can also set it to other values. **If it is the Docker environment provided by Serving, you do not need to set up.**
## Compile Arguments
The default GOPATH is `$HOME/go`, which you can set to other values.
```shell
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin
......@@ -100,52 +118,42 @@ make -j10
you can execute `make install` to put targets under directory `./output`, you need to add`-DCMAKE_INSTALL_PREFIX=./output`to specify output path to cmake command shown above.
### Integrated GPU version paddle inference library
### CUDA_PATH is the cuda install path,use the command(whereis cuda) to check,it should be /usr/local/cuda.
### CUDNN_LIBRARY && CUDA_CUDART_LIBRARY is the lib path, it should be /usr/local/cuda/lib64/
``` shell
export CUDA_PATH='/usr/local/cuda'
export CUDNN_LIBRARY='/usr/local/cuda/lib64/'
export CUDA_CUDART_LIBRARY="/usr/local/cuda/lib64/"
mkdir server-build-gpu && cd server-build-gpu
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
-DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
-DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
-DCUDA_TOOLKIT_ROOT_DIR=${CUDA_PATH} \
-DCUDNN_LIBRARY=${CUDNN_LIBRARY} \
-DCUDA_CUDART_LIBRARY=${CUDA_CUDART_LIBRARY} \
-DSERVER=ON \
-DWITH_GPU=ON ..
make -j10
```
Compared with CPU environment, GPU environment needs to refer to the following table,
**It should be noted that the following table is used as a reference for non-Docker compilation environment. The Docker compilation environment has been configured with relevant parameters and does not need to be specified in cmake process. **
### Integrated TRT version paddle inference library
| cmake environment variable | meaning | GPU environment considerations | whether Docker environment is needed |
|-----------------------|------------------------- ------------|-------------------------------|----- ---------------|
| CUDA_TOOLKIT_ROOT_DIR | cuda installation path, usually /usr/local/cuda | Required for all environments | No
(/usr/local/cuda) |
| CUDNN_LIBRARY | The directory where libcudnn.so.* is located, usually /usr/local/cuda/lib64/ | Required for all environments | No (/usr/local/cuda/lib64/) |
| CUDA_CUDART_LIBRARY | The directory where libcudart.so.* is located, usually /usr/local/cuda/lib64/ | Required for all environments | No (/usr/local/cuda/lib64/) |
| TENSORRT_ROOT | The upper level directory of the directory where libnvinfer.so.* is located, depends on the TensorRT installation directory | Cuda 9.0/10.0 does not need, other needs | No (/usr) |
```
If not in Docker environment, users can refer to the following execution methods. The specific path is subject to the current environment, and the code is only for reference.
``` shell
export CUDA_PATH='/usr/local/cuda'
export CUDNN_LIBRARY='/usr/local/cuda/lib64/'
export CUDA_CUDART_LIBRARY="/usr/local/cuda/lib64/"
export TENSORRT_LIBRARY_PATH="/usr/local/TensorRT-6.0.1.5/targets/x86_64-linux-gnu/"
mkdir server-build-trt && cd server-build-trt
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
-DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
-DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
-DTENSORRT_ROOT=${TENSORRT_LIBRARY_PATH} \
mkdir server-build-gpu && cd server-build-gpu
cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR \
-DPYTHON_LIBRARIES=$PYTHON_LIBRARIES \
-DPYTHON_EXECUTABLE=$PYTHON_EXECUTABLE \
-DCUDA_TOOLKIT_ROOT_DIR=${CUDA_PATH} \
-DCUDNN_LIBRARY=${CUDNN_LIBRARY} \
-DCUDA_CUDART_LIBRARY=${CUDA_CUDART_LIBRARY} \
-DTENSORRT_ROOT=${TENSORRT_LIBRARY_PATH}
-DSERVER=ON \
-DWITH_GPU=ON \
-DWITH_TRT=ON ..
-DWITH_GPU=ON ..
make -j10
```
execute `make install` to put targets under directory `./output`
**Attention:** After the compilation is successful, you need to set the path of `SERVING_BIN`. See [Note](https://github.com/PaddlePaddle/Serving/blob/develop/doc/COMPILE.md#Note) for details.
Execute `make install` to put the target output in the `./output` directory.
**Note:** After the compilation is successful, you need to set the `SERVING_BIN` path, see the following [Notes](COMPILE.md#Notes) ).
## Compile Client
......@@ -208,7 +216,6 @@ Please use the example under `python/examples` to verify.
| CLIENT | Compile Paddle Serving Client | OFF |
| SERVER | Compile Paddle Serving Server | OFF |
| APP | Compile Paddle Serving App package | OFF |
| WITH_ELASTIC_CTR | Compile ELASITC-CTR solution | OFF |
| PACK | Compile for whl | OFF |
### WITH_GPU Option
......@@ -230,11 +237,13 @@ Note here:
The following is the base library version matching relationship used by the PaddlePaddle release version for reference:
| | CUDA | CuDNN | TensorRT |
| :----: | :-----: | :----------------------: | :----: |
| post9 | 9.0 | CuDNN 7.3.1 for CUDA 9.0 | |
| post10 | 10.0 | CuDNN 7.5.1 for CUDA 10.0| |
| trt | 10.1 | CuDNN 7.5.1 for CUDA 10.1| 6.0.1.5 |
| | CUDA | CuDNN | TensorRT |
| :----: | :-----: | :----------: | :----: |
| post9 | 9.0 | CuDNN 7.6.4 | |
| post10 | 10.0 | CuDNN 7.6.5 | |
| post101 | 10.1 | CuDNN 7.6.5 | 6.0.1 |
| post102 | 10.2 | CuDNN 8.0.5 | 7.1.3 |
| post11 | 11.0 | CuDNN 8.0.4 | 7.1.3 |
### How to make the compiler detect the CuDNN library
......
......@@ -6,12 +6,11 @@
| 组件 | 版本要求 |
| :--------------------------: | :-------------------------------: |
| OS | CentOS 7 |
| gcc | 4.8.5 and later |
| gcc-c++ | 4.8.5 and later |
| git | 3.82 and later |
| OS | Ubuntu16 and 18/CentOS 7 |
| gcc | 4.8.5(Cuda 9.0 and 10.0) and 8.2(Others) |
| gcc-c++ | 4.8.5(Cuda 9.0 and 10.0) and 8.2(Others) |
| cmake | 3.2.0 and later |
| Python | 2.7.2 and later / 3.6 and later |
| Python | 2.7.2 and later / 3.5.1 and later |
| Go | 1.9.2 and later |
| git | 2.17.1 and later |
| glibc-static | 2.17 |
......@@ -24,13 +23,7 @@
| libSM | 1.2.2 |
| libXrender | 0.9.10 |
推荐使用Docker编译,我们已经为您准备好了Paddle Serving编译环境,详见[该文档](DOCKER_IMAGES_CN.md)
本文档将以Python2为例介绍如何编译Paddle Serving。如果您想用Python3进行编译,只需要调整cmake的Python相关选项即可:
-`DPYTHON_INCLUDE_DIR`设置为`$PYTHONROOT/include/python3.6m/`
-`DPYTHON_LIBRARIES`设置为`$PYTHONROOT/lib64/libpython3.6.so`
-`DPYTHON_EXECUTABLE`设置为`$PYTHONROOT/bin/python3.6`
推荐使用Docker编译,我们已经为您准备好了Paddle Serving编译环境并配置好了上述编译依赖,详见[该文档](DOCKER_IMAGES_CN.md)
## 获取代码
......@@ -39,19 +32,46 @@ git clone https://github.com/PaddlePaddle/Serving
cd Serving && git submodule update --init --recursive
```
## PYTHONROOT设置
```shell
# 例如python的路径为/usr/bin/python,可以设置PYTHONROOT
export PYTHONROOT=/usr/
export PYTHONROOT=/usr
```
我们提供默认Centos7的Python路径为`/usr/bin/python`,如果您要使用我们的Centos6镜像,需要将其设置为`export PYTHONROOT=/usr/local/python2.7/`
如果您使用的是Docker开发镜像,请按照如下,确定好需要编译的Python版本,设置对应的环境变量
```
#Python 2.7
export PYTHONROOT=/usr/local/python2.7.15/
export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/
export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so
export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python2.7
#Python 3.5
export PYTHONROOT=/usr/local/python3.5.1
export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python3.5m
export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython3.5m.so
export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python3.5
#Python3.6
export PYTHONROOT=/usr/local/
export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python3.6m
export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython3.6m.so
export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python3.6
#Python3.7
export PYTHONROOT=/usr/local/
export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python3.7m
export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython3.7m.so
export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python3.7
#Python3.8
export PYTHONROOT=/usr/local/
export PYTHON_INCLUDE_DIR=$PYTHONROOT/include/python3.8
export PYTHON_LIBRARIES=$PYTHONROOT/lib/libpython3.8.so
export PYTHON_EXECUTABLE=$PYTHONROOT/bin/python3.8
```
## 安装Python依赖
......@@ -59,11 +79,11 @@ export PYTHONROOT=/usr/
pip install -r python/requirements.txt
```
如果使用 Python3,请以 `pip3` 替换 `pip`
如果使用其他Python版本,请使用对应版本的`pip`
## GOPATH 设置
默认 GOPATH 设置为 `$HOME/go`,您也可以设置为其他值。
默认 GOPATH 设置为 `$HOME/go`,您也可以设置为其他值。** 如果是Serving提供的Docker环境,可以不需要设置。**
```shell
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin
......@@ -87,9 +107,9 @@ go get -u google.golang.org/grpc@v1.33.0
``` shell
mkdir server-build-cpu && cd server-build-cpu
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
-DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
-DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR/ \
-DPYTHON_LIBRARIES=$PYTHON_LIBRARIES \
-DPYTHON_EXECUTABLE=$PYTHON_EXECUTABLE \
-DSERVER=ON ..
make -j10
```
......@@ -97,44 +117,35 @@ make -j10
可以执行`make install`把目标产出放在`./output`目录下,cmake阶段需添加`-DCMAKE_INSTALL_PREFIX=./output`选项来指定存放路径。
### 集成GPU版本Paddle Inference Library
### CUDA_PATH是cuda的安装路径,可以使用命令行whereis cuda命令确认你的cuda安装路径,通常应该是/usr/local/cuda
### CUDNN_LIBRARY CUDA_CUDART_LIBRARY 是cuda库文件的路径,通常应该是/usr/local/cuda/lib64/
``` shell
export CUDA_PATH='/usr/local/cuda'
export CUDNN_LIBRARY='/usr/local/cuda/lib64/'
export CUDA_CUDART_LIBRARY="/usr/local/cuda/lib64/"
mkdir server-build-gpu && cd server-build-gpu
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
-DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
-DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
-DCUDA_TOOLKIT_ROOT_DIR=${CUDA_PATH} \
-DCUDNN_LIBRARY=${CUDNN_LIBRARY} \
-DCUDA_CUDART_LIBRARY=${CUDA_CUDART_LIBRARY} \
-DSERVER=ON \
-DWITH_GPU=ON ..
make -j10
```
相比CPU环境,GPU环境需要参考以下表格,
**需要说明的是,以下表格对非Docker编译环境作为参考,Docker编译环境已经配置好相关参数,无需在cmake过程指定。**
### 集成TensorRT版本Paddle Inference Library
| cmake环境变量 | 含义 | GPU环境注意事项 | Docker环境是否需要 |
|-----------------------|-------------------------------------|-------------------------------|--------------------|
| CUDA_TOOLKIT_ROOT_DIR | cuda安装路径,通常为/usr/local/cuda | 全部环境都需要 | 否(/usr/local/cuda) |
| CUDNN_LIBRARY | libcudnn.so.*所在目录,通常为/usr/local/cuda/lib64/ | 全部环境都需要 | 否(/usr/local/cuda/lib64/) |
| CUDA_CUDART_LIBRARY | libcudart.so.*所在目录,通常为/usr/local/cuda/lib64/ | 全部环境都需要 | 否(/usr/local/cuda/lib64/) |
| TENSORRT_ROOT | libnvinfer.so.*所在目录的上一级目录,取决于TensorRT安装目录 | Cuda 9.0/10.0不需要,其他需要 | 否(/usr) |
```
非Docker环境下,用户可以参考如下执行方式,具体的路径以当时环境为准,代码仅作为参考。
``` shell
export CUDA_PATH='/usr/local/cuda'
export CUDNN_LIBRARY='/usr/local/cuda/lib64/'
export CUDA_CUDART_LIBRARY="/usr/local/cuda/lib64/"
export TENSORRT_LIBRARY_PATH="/usr/local/TensorRT-6.0.1.5/targets/x86_64-linux-gnu/"
mkdir server-build-trt && cd server-build-trt
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
-DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
-DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
-DTENSORRT_ROOT=${TENSORRT_LIBRARY_PATH} \
mkdir server-build-gpu && cd server-build-gpu
cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR \
-DPYTHON_LIBRARIES=$PYTHON_LIBRARIES \
-DPYTHON_EXECUTABLE=$PYTHON_EXECUTABLE \
-DCUDA_TOOLKIT_ROOT_DIR=${CUDA_PATH} \
-DCUDNN_LIBRARY=${CUDNN_LIBRARY} \
-DCUDA_CUDART_LIBRARY=${CUDA_CUDART_LIBRARY} \
-DTENSORRT_ROOT=${TENSORRT_LIBRARY_PATH}
-DSERVER=ON \
-DWITH_GPU=ON \
-DWITH_TRT=ON ..
-DWITH_GPU=ON ..
make -j10
```
......@@ -146,9 +157,9 @@ make -j10
``` shell
mkdir client-build && cd client-build
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
-DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
-DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR \
-DPYTHON_LIBRARIES=$PYTHON_LIBRARIES \
-DPYTHON_EXECUTABLE=$PYTHON_EXECUTABLE \
-DCLIENT=ON ..
make -j10
```
......@@ -161,10 +172,9 @@ make -j10
```bash
mkdir app-build && cd app-build
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
-DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
-DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
-DCMAKE_INSTALL_PREFIX=./output \
cmake -DPYTHON_INCLUDE_DIR=$PYTHON_INCLUDE_DIR \
-DPYTHON_LIBRARIES=$PYTHON_LIBRARIES \
-DPYTHON_EXECUTABLE=$PYTHON_EXECUTABLE \
-DAPP=ON ..
make
```
......@@ -183,7 +193,7 @@ make
运行python端Server时,会检查`SERVING_BIN`环境变量,如果想使用自己编译的二进制文件,请将设置该环境变量为对应二进制文件的路径,通常是`export SERVING_BIN=${BUILD_DIR}/core/general-server/serving`
其中BUILD_DIR为server-build-cpu或server-build-gpu的绝对路径。
可以cd server-build-cpu路径下,执行export SERVING_BIN=${PWD}/core/general-server/serving
可以cd server-build-cpu路径下,执行`export SERVING_BIN=${PWD}/core/general-server/serving`
......@@ -207,7 +217,6 @@ make
| CLIENT | Compile Paddle Serving Client | OFF |
| SERVER | Compile Paddle Serving Server | OFF |
| APP | Compile Paddle Serving App package | OFF |
| WITH_ELASTIC_CTR | Compile ELASITC-CTR solution | OFF |
| PACK | Compile for whl | OFF |
### WITH_GPU选项
......@@ -226,13 +235,15 @@ Paddle Serving通过PaddlePaddle预测库支持在GPU上做预测。WITH_GPU选
1. 编译Serving所在的系统上所安装的CUDA/CUDNN等基础库版本,需要兼容实际的GPU设备。例如,Tesla V100卡至少要CUDA 9.0。如果编译时所用CUDA等基础库版本过低,由于生成的GPU代码和实际硬件设备不兼容,会导致Serving进程无法启动,或出现coredump等严重问题。
2. 运行Paddle Serving的系统上安装与实际GPU设备兼容的CUDA driver,并安装与编译期所用的CUDA/CuDNN等版本兼容的基础库。如运行Paddle Serving的系统上安装的CUDA/CuDNN的版本低于编译时所用版本,可能会导致奇怪的cuda函数调用失败等问题。
以下是PaddlePaddle发布版本所使用的基础库版本匹配关系,供参考:
以下是PaddleServing 镜像的Cuda与Cudnn,TensorRT的匹配关系,供参考:
| | CUDA | CuDNN | TensorRT |
| :----: | :-----: | :----------------------: | :----: |
| post9 | 9.0 | CuDNN 7.3.1 for CUDA 9.0 | |
| post10 | 10.0 | CuDNN 7.5.1 for CUDA 10.0| |
| trt | 10.1 | CuDNN 7.5.1 for CUDA 10.1| 6.0.1.5 |
| | CUDA | CuDNN | TensorRT |
| :----: | :-----: | :----------: | :----: |
| post9 | 9.0 | CuDNN 7.6.4 | |
| post10 | 10.0 | CuDNN 7.6.5 | |
| post101 | 10.1 | CuDNN 7.6.5 | 6.0.1 |
| post102 | 10.2 | CuDNN 8.0.5 | 7.1.3 |
| post11 | 11.0 | CuDNN 8.0.4 | 7.1.3 |
### 如何让Paddle Serving编译系统探测到CuDNN库
......
......@@ -8,11 +8,10 @@ This document maintains a list of docker images provided by Paddle Serving.
You can get images in two ways:
1. Pull image directly from `hub.baidubce.com ` or `docker.io` through TAG:
1. Pull image directly from `registry.baidubce.com ` through TAG:
```shell
docker pull hub.baidubce.com/paddlepaddle/serving:<TAG> # hub.baidubce.com
docker pull paddlepaddle/serving:<TAG> # hub.docker.com
docker pull registry.baidubce.com/paddlepaddle/serving:<TAG> # registry.baidubce.com
```
2. Building image based on dockerfile
......@@ -20,7 +19,7 @@ You can get images in two ways:
Create a new folder and copy Dockerfile to this folder, and run the following command:
```shell
docker build -t <image-name>:<images-tag> .
docker build -f ${DOCKERFILE} -t <image-name>:<images-tag> .
```
......@@ -59,6 +58,42 @@ hub.baidubce.com/paddlepaddle/serving:xpu-beta
Running a CUDA container requires a machine with at least one CUDA-capable GPU and a driver compatible with the CUDA toolkit version you are using.
The machine running the CUDA container **only requires the NVIDIA driver**, the CUDA toolkit doesn't have to be installed.
The machine running the CUDA container **only requires the NVIDIA driver**, the CUDA toolkit does not have to be installed.
For the relationship between CUDA toolkit version, Driver version and GPU architecture, please refer to [nvidia-docker wiki](https://github.com/NVIDIA/nvidia-docker/wiki/CUDA).
# (Attachment) The List of All the Docker images
Develop Images:
| Env | Version | Docker images tag | OS | Gcc Version |
|----------|---------|------------------------------|-----------|-------------|
| CPU | 0.5.0 | 0.5.0-devel | Ubuntu 16 | 8.2.0 |
| | <=0.4.0 | 0.4.0-devel | CentOS 7 | 4.8.5 |
| Cuda9.0 | 0.5.0 | 0.5.0-cuda9.0-cudnn7-devel | Ubuntu 16 | 4.8.5 |
| | <=0.4.0 | 0.4.0-cuda9.0-cudnn7-devel | CentOS 7 | 4.8.5 |
| Cuda10.0 | 0.5.0 | 0.5.0-cuda10.0-cudnn7-devel | Ubuntu 16 | 4.8.5 |
| | <=0.4.0 | 0.4.0-cuda10.0-cudnn7-devel | CentOS 7 | 4.8.5 |
| Cuda10.1 | 0.5.0 | 0.5.0-cuda10.1-cudnn7-devel | Ubuntu 16 | 8.2.0 |
| | <=0.4.0 | 0.4.0-cuda10.1-cudnn7-devel | CentOS 7 | 4.8.5 |
| Cuda10.2 | 0.5.0 | 0.5.0-cuda10.2-cudnn8-devel | Ubuntu 16 | 8.2.0 |
| | <=0.4.0 | Nan | Nan | Nan |
| Cuda11.0 | 0.5.0 | 0.5.0-cuda11.0-cudnn8-devel | Ubuntu 18 | 8.2.0 |
| | <=0.4.0 | Nan | Nan | Nan |
Running Images:
| Env | Version | Docker images tag | OS | Gcc Version |
|----------|---------|-----------------------|-----------|-------------|
| CPU | 0.5.0 | 0.5.0 | Ubuntu 16 | 8.2.0 |
| | <=0.4.0 | 0.4.0 | CentOS 7 | 4.8.5 |
| Cuda9.0 | 0.5.0 | 0.5.0-cuda9.0-cudnn7 | Ubuntu 16 | 4.8.5 |
| | <=0.4.0 | 0.4.0-cuda9.0-cudnn7 | CentOS 7 | 4.8.5 |
| Cuda10.0 | 0.5.0 | 0.5.0-cuda10.0-cudnn7 | Ubuntu 16 | 4.8.5 |
| | <=0.4.0 | 0.4.0-cuda10.0-cudnn7 | CentOS 7 | 4.8.5 |
| Cuda10.1 | 0.5.0 | 0.5.0-cuda10.1-cudnn7 | Ubuntu 16 | 8.2.0 |
| | <=0.4.0 | 0.4.0-cuda10.1-cudnn7 | CentOS 7 | 4.8.5 |
| Cuda10.2 | 0.5.0 | 0.5.0-cuda10.2-cudnn8 | Ubuntu 16 | 8.2.0 |
| | <=0.4.0 | Nan | Nan | Nan |
| Cuda11.0 | 0.5.0 | 0.5.0-cuda11.0-cudnn8 | Ubuntu 18 | 8.2.0 |
| | <=0.4.0 | Nan | Nan | Nan |
......@@ -8,11 +8,10 @@
您可以通过两种方式获取镜像。
1. 通过 TAG 直接从 `hub.baidubce.com ``docker.io` 拉取镜像:
1. 通过 TAG 直接从 `registry.baidubce.com ` 或 拉取镜像,具体TAG请参见下文的**镜像说明**章节的表格。
```shell
docker pull hub.baidubce.com/paddlepaddle/serving:<TAG> # hub.baidubce.com
docker pull paddlepaddle/serving:<TAG> # hub.docker.com
docker pull registry.baidubce.com/paddlepaddle/serving:<TAG> # registry.baidubce.com
```
2. 基于 Dockerfile 构建镜像
......@@ -20,7 +19,8 @@
建立新目录,复制对应 Dockerfile 内容到该目录下 Dockerfile 文件。执行
```shell
docker build -t <image-name>:<images-tag> .
cd tools
docker build -f ${DOCKERFILE} -t <image-name>:<images-tag> .
```
......@@ -29,6 +29,8 @@
运行时镜像不能用于开发编译。
若需要基于源代码二次开发编译,请使用后缀为-devel的版本。
**在TAG列,latest也可以替换成对应的版本号,例如0.5.0/0.4.1等,但需要注意的是,部分开发环境随着某个版本迭代才增加,因此并非所有环境都有对应的版本号可以使用。**
| 镜像选择 | 操作系统 | TAG | Dockerfile |
| :----------------------------------------------------------: | :-----: | :--------------------------: | :----------------------------------------------------------: |
......@@ -55,6 +57,7 @@ hub.baidubce.com/paddlepaddle/serving:latest-java
hub.baidubce.com/paddlepaddle/serving:xpu-beta
```
## 运行CUDA容器的要求
运行CUDA容器需要至少具有一个支持CUDA的GPU以及与您所使用的CUDA工具包版本兼容的驱动程序。
......@@ -62,3 +65,40 @@ hub.baidubce.com/paddlepaddle/serving:xpu-beta
运行CUDA容器的机器**只需要相应的NVIDIA驱动程序**,而CUDA工具包不是必要的。
相关CUDA工具包版本、驱动版本和GPU架构的关系请参阅 [nvidia-docker wiki](https://github.com/NVIDIA/nvidia-docker/wiki/CUDA)
# (附录)所有镜像列表
编译镜像:
| Env | Version | Docker images tag | OS | Gcc Version |
|----------|---------|------------------------------|-----------|-------------|
| CPU | 0.5.0 | 0.5.0-devel | Ubuntu 16 | 8.2.0 |
| | <=0.4.0 | 0.4.0-devel | CentOS 7 | 4.8.5 |
| Cuda9.0 | 0.5.0 | 0.5.0-cuda9.0-cudnn7-devel | Ubuntu 16 | 4.8.5 |
| | <=0.4.0 | 0.4.0-cuda9.0-cudnn7-devel | CentOS 7 | 4.8.5 |
| Cuda10.0 | 0.5.0 | 0.5.0-cuda10.0-cudnn7-devel | Ubuntu 16 | 4.8.5 |
| | <=0.4.0 | 0.4.0-cuda10.0-cudnn7-devel | CentOS 7 | 4.8.5 |
| Cuda10.1 | 0.5.0 | 0.5.0-cuda10.1-cudnn7-devel | Ubuntu 16 | 8.2.0 |
| | <=0.4.0 | 0.4.0-cuda10.1-cudnn7-devel | CentOS 7 | 4.8.5 |
| Cuda10.2 | 0.5.0 | 0.5.0-cuda10.2-cudnn8-devel | Ubuntu 16 | 8.2.0 |
| | <=0.4.0 | Nan | Nan | Nan |
| Cuda11.0 | 0.5.0 | 0.5.0-cuda11.0-cudnn8-devel | Ubuntu 18 | 8.2.0 |
| | <=0.4.0 | Nan | Nan | Nan |
运行镜像:
| Env | Version | Docker images tag | OS | Gcc Version |
|----------|---------|-----------------------|-----------|-------------|
| CPU | 0.5.0 | 0.5.0 | Ubuntu 16 | 8.2.0 |
| | <=0.4.0 | 0.4.0 | CentOS 7 | 4.8.5 |
| Cuda9.0 | 0.5.0 | 0.5.0-cuda9.0-cudnn7 | Ubuntu 16 | 4.8.5 |
| | <=0.4.0 | 0.4.0-cuda9.0-cudnn7 | CentOS 7 | 4.8.5 |
| Cuda10.0 | 0.5.0 | 0.5.0-cuda10.0-cudnn7 | Ubuntu 16 | 4.8.5 |
| | <=0.4.0 | 0.4.0-cuda10.0-cudnn7 | CentOS 7 | 4.8.5 |
| Cuda10.1 | 0.5.0 | 0.5.0-cuda10.1-cudnn7 | Ubuntu 16 | 8.2.0 |
| | <=0.4.0 | 0.4.0-cuda10.1-cudnn7 | CentOS 7 | 4.8.5 |
| Cuda10.2 | 0.5.0 | 0.5.0-cuda10.2-cudnn8 | Ubuntu 16 | 8.2.0 |
| | <=0.4.0 | Nan | Nan | Nan |
| Cuda11.0 | 0.5.0 | 0.5.0-cuda11.0-cudnn8 | Ubuntu 18 | 8.2.0 |
| | <=0.4.0 | Nan | Nan | Nan |
......@@ -87,3 +87,34 @@ https://paddle-serving.bj.bcebos.com/whl/xpu/paddle_serving_client-0.0.0-cp36-no
# App
https://paddle-serving.bj.bcebos.com/whl/xpu/paddle_serving_app-0.0.0-py3-none-any.whl
```
### Binary Package
for most users, we do not need to read this section. But if you deploy your Paddle Serving on a machine without network, you will encounter a problem that the binary executable tar file cannot be downloaded. Therefore, here we give you all the download links for various environment.
#### Bin links
```
# CPU AVX MKL
https://paddle-serving.bj.bcebos.com/bin/serving-cpu-avx-mkl-0.0.0.tar.gz
# CPU AVX OPENBLAS
https://paddle-serving.bj.bcebos.com/bin/serving-cpu-avx-openblas-0.0.0.tar.gz
# CPU NOAVX OPENBLAS
https://paddle-serving.bj.bcebos.com/bin/serving-cpu-noavx-openblas-0.0.0.tar.gz
# Cuda 9
https://paddle-serving.bj.bcebos.com/bin/serving-gpu-cuda9-0.0.0.tar.gz
# Cuda 10
https://paddle-serving.bj.bcebos.com/bin/serving-gpu-cuda10-0.0.0.tar.gz
# Cuda 10.1
https://paddle-serving.bj.bcebos.com/bin/serving-gpu-101-0.0.0.tar.gz
# Cuda 10.2
https://paddle-serving.bj.bcebos.com/bin/serving-gpu-102-0.0.0.tar.gz
# Cuda 11
https://paddle-serving.bj.bcebos.com/bin/serving-gpu-cuda11-0.0.0.tar.gz
```
#### How to setup SERVING_BIN offline?
- download the serving server whl package and bin package, and make sure they are for the same environment
- download the serving client whl and serving app whl, pay attention to the Python version.
- `pip install ` the serving and `tar xf ` the binary package, then `export SERVING_BIN=$PWD/serving-gpu-cuda10-0.0.0/serving` (take Cuda 10.0 as the example)
......@@ -17,14 +17,14 @@ This document takes Python2 as an example to show how to run Paddle Serving in d
Refer to [this document](DOCKER_IMAGES.md) for a docker image:
```shell
docker pull hub.baidubce.com/paddlepaddle/serving:latest
docker pull hub.baidubce.com/paddlepaddle/serving:latest-devel
```
### Create container
```bash
docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest
docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-devel
docker exec -it test bash
```
......@@ -46,20 +46,20 @@ The GPU version is basically the same as the CPU version, with only some differe
Refer to [this document](DOCKER_IMAGES.md) for a docker image, the following is an example of an `cuda9.0-cudnn7` image:
```shell
docker pull hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
docker pull hub.baidubce.com/paddlepaddle/serving:latest-cuda10.2-cudnn8-devel
```
### Create container
```bash
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda10.2-cudnn8-devel
nvidia-docker exec -it test bash
```
or
```bash
docker run --gpus all -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
docker run --gpus all -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda10.2-cudnn8-devel
docker exec -it test bash
```
......@@ -69,7 +69,7 @@ The `-p` option is to map the `9292` port of the container to the `9292` port of
The mirror comes with `paddle_serving_server_gpu`, `paddle_serving_client`, and `paddle_serving_app` corresponding to the mirror tag version. If users don’t need to change the version, they can use it directly, which is suitable for environments without extranet services.
If you need to change the version, please refer to the instructions on the homepage to download the pip package of the corresponding version.
If you need to change the version, please refer to the instructions on the homepage to download the pip package of the corresponding version. [LATEST_PACKAGES](./LATEST_PACKAGES.md)
## Precautious
......
......@@ -16,14 +16,16 @@ Docker(GPU版本需要在GPU机器上安装nvidia-docker)
参考[该文档](DOCKER_IMAGES_CN.md)获取镜像:
以CPU编译镜像为例
```shell
docker pull hub.baidubce.com/paddlepaddle/serving:latest
docker pull hub.baidubce.com/paddlepaddle/serving:latest-devel
```
### 创建容器并进入
```bash
docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest
docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-devel
docker exec -it test bash
```
......@@ -37,15 +39,19 @@ docker exec -it test bash
## GPU 版本
```shell
docker pull hub.baidubce.com/paddlepaddle/serving:latest-cuda10.2-cudnn8-devel
```
### 创建容器并进入
```bash
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda10.2-cudnn8-devel
nvidia-docker exec -it test bash
```
或者
```bash
docker run --gpus all -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda9.0-cudnn7
docker run --gpus all -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-cuda10.2-cudnn8-devel
docker exec -it test bash
```
......@@ -55,7 +61,7 @@ docker exec -it test bash
镜像里自带对应镜像tag版本的`paddle_serving_server_gpu``paddle_serving_client``paddle_serving_app`,如果用户不需要更改版本,可以直接使用,适用于没有外网服务的环境。
如果需要更换版本,请参照首页的指导,下载对应版本的pip包。
如果需要更换版本,请参照首页的指导,下载对应版本的pip包。[最新安装包合集](LATEST_PACKAGES.md)
## 注意事项
......
......@@ -2,7 +2,74 @@
([简体中文](./SAVE_CN.md)|English)
## Save from training or prediction script
## Export from saved model files
you can use a build-in python module called `paddle_serving_client.convert` to convert it.
```python
python -m paddle_serving_client.convert --dirname ./your_inference_model_dir
```
If you have saved model files using Paddle's `save_inference_model` API, you can use Paddle Serving's` inference_model_to_serving` API to convert it into a model file that can be used for Paddle Serving.
```python
import paddle_serving_client.io as serving_io
serving_io.inference_model_to_serving(dirname, serving_server="serving_server", serving_client="serving_client", model_filename=None, params_filename=None )
```
Arguments are the same as `inference_model_to_serving` API.
| Argument | Type | Default | Description |
|--------------|------|-----------|--------------------------------|
| `dirname` | str | - | Path of saved model files. Program file and parameter files are saved in this directory. |
| `serving_server` | str | `"serving_server"` | The path of model files and configuration files for server. |
| `serving_client` | str | `"serving_client"` | The path of configuration files for client. |
| `model_filename` | str | None | The name of file to load the inference program. If it is None, the default filename `__model__` will be used. |
| `params_filename` | str | None | The name of file to load all parameters. It is only used for the case that all parameters were saved in a single binary file. If parameters were saved in separate files, set it as None. |
**Demo: Convert From Dynamic Graph**
PaddlePaddle 2.0 provides a new dynamic graph mode, so here we use imagenet ResNet50 dynamic graph as an example to teach how to export from a saved model and use it for real online inference scenarios.
```
wget https://paddle-serving.bj.bcebos.com/others/dygraph_res50.tar #模型
wget https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg #示例输入(向日葵)
tar xf dygraph_res50.tar
python -m paddle_serving_client.convert --dirname . --model_filename dygraph_model.pdmodel --params_filename dygraph_model.pdiparams --serving_server serving_server --serving_client serving_client```
We can see that the `serving_server` and `serving_client` folders hold the server and client configuration of the model respectively
Start the server (GPU)
```
python -m paddle_serving_server_gpu.serve --model serving_server --port 9393 --gpu_id 0
```
Client (`test_client.py`)
```
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, File2Image, Resize, CenterCrop
from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize
client = Client()
client.load_client_config(
"serving_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9393"])
seq = Sequential([
File2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
])
image_file = "daisy.jpg"
img = seq(image_file)
fetch_map = client.predict(feed={"inputs": img}, fetch=["save_infer_model/scale_0.tmp_0"])
print(fetch_map["save_infer_model/scale_0.tmp_0"].reshape(-1))
```
Run
```
python test_client.py
```
You can see that the prediction has been successfully executed. The above is the content predicted by the dynamic graph ResNet50 model on Serving. The use of other dynamic graph models is similar.
## Save from training or prediction script (Static Graph Mode)
Currently, paddle serving provides a save_model interface for users to access, the interface is similar with `save_inference_model` of Paddle.
``` python
import paddle_serving_client.io as serving_io
......@@ -31,22 +98,3 @@ for line in sys.stdin:
fetch_map = client.predict(feed=feed, fetch=fetch)
print("{} {}".format(fetch_map["prediction"][1], label[0]))
```
## Export from saved model files
If you have saved model files using Paddle's `save_inference_model` API, you can use Paddle Serving's` inference_model_to_serving` API to convert it into a model file that can be used for Paddle Serving.
```python
import paddle_serving_client.io as serving_io
serving_io.inference_model_to_serving(dirname, serving_server="serving_server", serving_client="serving_client", model_filename=None, params_filename=None )
```
Or you can use a build-in python module called `paddle_serving_client.convert` to convert it.
```python
python -m paddle_serving_client.convert --dirname ./your_inference_model_dir
```
Arguments are the same as `inference_model_to_serving` API.
| Argument | Type | Default | Description |
|--------------|------|-----------|--------------------------------|
| `dirname` | str | - | Path of saved model files. Program file and parameter files are saved in this directory. |
| `serving_server` | str | `"serving_server"` | The path of model files and configuration files for server. |
| `serving_client` | str | `"serving_client"` | The path of configuration files for client. |
| `model_filename` | str | None | The name of file to load the inference program. If it is None, the default filename `__model__` will be used. |
| `params_filename` | str | None | The name of file to load all parameters. It is only used for the case that all parameters were saved in a single binary file. If parameters were saved in separate files, set it as None. |
......@@ -2,14 +2,79 @@
(简体中文|[English](./SAVE.md))
## 从训练或预测脚本中保存
## 从已保存的模型文件中导出
如果已使用Paddle 的`save_inference_model`接口保存出预测要使用的模型,你可以使用Paddle Serving提供的名为`paddle_serving_client.convert`的内置模块进行转换。
```python
python -m paddle_serving_client.convert --dirname ./your_inference_model_dir
```
也可以通过Paddle Serving的`inference_model_to_serving`接口转换成可用于Paddle Serving的模型文件。
```python
import paddle_serving_client.io as serving_io
serving_io.inference_model_to_serving(dirname, serving_server="serving_server", serving_client="serving_client", model_filename=None, params_filename=None)
```
模块参数与`inference_model_to_serving`接口参数相同。
| 参数 | 类型 | 默认值 | 描述 |
|--------------|------|-----------|--------------------------------|
| `dirname` | str | - | 需要转换的模型文件存储路径,Program结构文件和参数文件均保存在此目录。|
| `serving_server` | str | `"serving_server"` | 转换后的模型文件和配置文件的存储路径。默认值为serving_server |
| `serving_client` | str | `"serving_client"` | 转换后的客户端配置文件存储路径。默认值为serving_client |
| `model_filename` | str | None | 存储需要转换的模型Inference Program结构的文件名称。如果设置为None,则使用 `__model__` 作为默认的文件名 |
| `params_filename` | str | None | 存储需要转换的模型所有参数的文件名称。当且仅当所有模型参数被保>存在一个单独的二进制文件中,它才需要被指定。如果模型参数是存储在各自分离的文件中,设置它的值为None |
**示例:从动态图模型中导出**
PaddlePaddle 2.0提供了全新的动态图模式,因此我们这里以imagenet ResNet50动态图为示例教学如何从已保存模型导出,并用于真实的在线预测场景。
```
wget https://paddle-serving.bj.bcebos.com/others/dygraph_res50.tar #模型
wget https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg #示例输入(向日葵)
tar xf dygraph_res50.tar
python -m paddle_serving_client.convert --dirname . --model_filename dygraph_model.pdmodel --params_filename dygraph_model.pdiparams --serving_server serving_server --serving_client serving_client
```
我们可以看到`serving_server``serving_client`文件夹分别保存着模型的服务端和客户端配置
启动服务端(GPU)
```
python -m paddle_serving_server_gpu.serve --model serving_server --port 9393 --gpu_id 0
```
客户端写法,保存为`test_client.py`
```
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, File2Image, Resize, CenterCrop
from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize
client = Client()
client.load_client_config(
"serving_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9393"])
seq = Sequential([
File2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
])
image_file = "daisy.jpg"
img = seq(image_file)
fetch_map = client.predict(feed={"inputs": img}, fetch=["save_infer_model/scale_0.tmp_0"])
print(fetch_map["save_infer_model/scale_0.tmp_0"].reshape(-1))
```
执行
```
python test_client.py
```
即可看到成功的执行了预测,以上就是动态图ResNet50模型在Serving上预测的内容,其他动态图模型使用方式与之类似。
## 从训练或预测脚本中保存(静态图)
目前,Paddle Serving提供了一个save_model接口供用户访问,该接口与Paddle的`save_inference_model`类似。
``` python
import paddle_serving_client.io as serving_io
serving_io.save_model("imdb_model", "imdb_client_conf",
{"words": data}, {"prediction": prediction},
fluid.default_main_program())
paddle.static.default_main_program())
```
imdb_model是具有服务配置的服务器端模型。 imdb_client_conf是客户端rpc配置。
......@@ -32,22 +97,3 @@ for line in sys.stdin:
fetch_map = client.predict(feed=feed, fetch=fetch)
print("{} {}".format(fetch_map["prediction"][1], label[0]))
```
## 从已保存的模型文件中导出
如果已使用Paddle 的`save_inference_model`接口保存出预测要使用的模型,则可以通过Paddle Serving的`inference_model_to_serving`接口转换成可用于Paddle Serving的模型文件。
```python
import paddle_serving_client.io as serving_io
serving_io.inference_model_to_serving(dirname, serving_server="serving_server", serving_client="serving_client", model_filename=None, params_filename=None)
```
或者你可以使用Paddle Serving提供的名为`paddle_serving_client.convert`的内置模块进行转换。
```python
python -m paddle_serving_client.convert --dirname ./your_inference_model_dir
```
模块参数与`inference_model_to_serving`接口参数相同。
| 参数 | 类型 | 默认值 | 描述 |
|--------------|------|-----------|--------------------------------|
| `dirname` | str | - | 需要转换的模型文件存储路径,Program结构文件和参数文件均保存在此目录。|
| `serving_server` | str | `"serving_server"` | 转换后的模型文件和配置文件的存储路径。默认值为serving_server |
| `serving_client` | str | `"serving_client"` | 转换后的客户端配置文件存储路径。默认值为serving_client |
| `model_filename` | str | None | 存储需要转换的模型Inference Program结构的文件名称。如果设置为None,则使用 `__model__` 作为默认的文件名 |
| `params_filename` | str | None | 存储需要转换的模型所有参数的文件名称。当且仅当所有模型参数被保存在一个单独的二进制文件中,它才需要被指定。如果模型参数是存储在各自分离的文件中,设置它的值为None |
# An End-to-end Tutorial from Training to Inference Service Deployment
([简体中文](./TRAIN_TO_SERVICE_CN.md)|English)
Paddle Serving is Paddle's high-performance online inference service framework, which can flexibly support the deployment of most models. In this article, the IMDB review sentiment analysis task is used as an example to show the entire process from model training to deployment of inference service through 9 steps.
## Step1:Prepare for Running Environment
Paddle Serving can be deployed on Linux environments.Currently the server supports deployment on Centos7. [Docker deployment is recommended](RUN_IN_DOCKER.md). The rpc client supports deploymen on Centos7 and Ubuntu 18.On other systems or in environments where you do not want to install the serving module, you can still access the server-side prediction service through the http service.
You can choose to install the cpu or gpu version of the server module according to the requirements and machine environment, and install the client module on the client machine. When you want to access the server with http, there is not need to install client module.
```shell
pip install paddle_serving_server #cpu version server side
pip install paddle_serving_server_gpu #gpu version server side
pip install paddle_serving_client #client version
```
After simple preparation, we will take the IMDB review sentiment analysis task as an example to show the process from model training to deployment of prediction services. All the code in the example can be found in the [IMDB example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb) of the Paddle Serving code base, the data and dictionary used in the example The file can be obtained by executing the get_data.sh script in the IMDB sample code.
## Step2:Determine Tasks and Raw Data Format
IMDB review sentiment analysis task is to classify the content of movie reviews to determine whether the review is a positive review or a negative review.
First let's take a look at the raw data:
```
saw a trailer for this on another video, and decided to rent when it came out. boy, was i disappointed! the story is extremely boring, the acting (aside from christopher walken) is bad, and i couldn't care less about the characters, aside from really wanting to see nora's husband get thrashed. christopher walken's role is such a throw-away, what a tease! | 0
```
This is a sample of English comments. The sample uses | as the separator. The content of the comment is before the separator. The label is the sample after the separator. 0 is the negative while 1 is the positive.
## Step3:Define Reader, divide training set and test set
For the original text we need to convert it to a numeric id that the neural network can use. The imdb_reader.py script defines the method of text idization, and the words are mapped to integers through the dictionary file imdb.vocab.
<details>
<summary>imdb_reader.py</summary>
```python
import sys
import os
import paddle
import re
import paddle.fluid.incubate.data_generator as dg
class IMDBDataset(dg.MultiSlotDataGenerator):
def load_resource(self, dictfile):
self._vocab = {}
wid = 0
with open(dictfile) as f:
for line in f:
self._vocab[line.strip()] = wid
wid += 1
self._unk_id = len(self._vocab)
self._pattern = re.compile(r'(;|,|\.|\?|!|\s|\(|\))')
self.return_value = ("words", [1, 2, 3, 4, 5, 6]), ("label", [0])
def get_words_only(self, line):
sent = line.lower().replace("<br />", " ").strip()
words = [x for x in self._pattern.split(sent) if x and x != " "]
feas = [
self._vocab[x] if x in self._vocab else self._unk_id for x in words
]
return feas
def get_words_and_label(self, line):
send = '|'.join(line.split('|')[:-1]).lower().replace("<br />",
" ").strip()
label = [int(line.split('|')[-1])]
words = [x for x in self._pattern.split(send) if x and x != " "]
feas = [
self._vocab[x] if x in self._vocab else self._unk_id for x in words
]
return feas, label
def infer_reader(self, infer_filelist, batch, buf_size):
def local_iter():
for fname in infer_filelist:
with open(fname, "r") as fin:
for line in fin:
feas, label = self.get_words_and_label(line)
yield feas, label
import paddle
batch_iter = paddle.batch(
paddle.reader.shuffle(
local_iter, buf_size=buf_size),
batch_size=batch)
return batch_iter
def generate_sample(self, line):
def memory_iter():
for i in range(1000):
yield self.return_value
def data_iter():
feas, label = self.get_words_and_label(line)
yield ("words", feas), ("label", label)
return data_iter
```
</details>
The sample after mapping is similar to the following format:
```
257 142 52 898 7 0 12899 1083 824 122 89527 134 6 65 47 48 904 89527 13 0 87 170 8 248 9 15 4 25 1365 4360 89527 702 89527 1 89527 240 3 28 89527 19 7 0 216 219 614 89527 0 84 89527 225 3 0 15 67 2356 89527 0 498 117 2 314 282 7 38 1097 89527 1 0 174 181 38 11 71 198 44 1 3110 89527 454 89527 34 37 89527 0 15 5912 80 2 9856 7748 89527 8 421 80 9 15 14 55 2218 12 4 45 6 58 25 89527 154 119 224 41 0 151 89527 871 89527 505 89527 501 89527 29 2 773 211 89527 54 307 90 0 893 89527 9 407 4 25 2 614 15 46 89527 89527 71 8 1356 35 89527 12 0 89527 89527 89 527 577 374 3 39091 22950 1 3771 48900 95 371 156 313 89527 37 154 296 4 25 2 217 169 3 2759 7 0 15 89527 0 714 580 11 2094 559 34 0 84 539 89527 1 0 330 355 3 0 15 15607 935 80 0 5369 3 0 622 89527 2 15 36 9 2291 2 7599 6968 2449 89527 1 454 37 256 2 211 113 0 480 218 1152 700 4 1684 1253 352 10 2449 89527 39 4 1819 129 1 316 462 29 0 12957 3 6 28 89527 13 0 457 8952 7 225 89527 8 2389 0 1514 89527 1
```
In this way, the neural network can train the transformed text information as feature values.
## Step4:Define CNN network for training and saving
Net we use [CNN Model](https://www.paddlepaddle.org.cn/documentation/docs/zh/user_guides/nlp_case/understand_sentiment/README.cn.html#cnn) for training, in nets.py we define the network structure.
<details>
<summary>nets.py</summary>
```python
import sys
import time
import numpy as np
import paddle
import paddle.fluid as fluid
def cnn_net(data,
label,
dict_dim,
emb_dim=128,
hid_dim=128,
hid_dim2=96,
class_dim=2,
win_size=3):
""" conv net. """
emb = fluid.layers.embedding(
input=data, size=[dict_dim, emb_dim], is_sparse=True)
conv_3 = fluid.nets.sequence_conv_pool(
input=emb,
num_filters=hid_dim,
filter_size=win_size,
act="tanh",
pool_type="max")
fc_1 = fluid.layers.fc(input=[conv_3], size=hid_dim2)
prediction = fluid.layers.fc(input=[fc_1], size=class_dim, act="softmax")
cost = fluid.layers.cross_entropy(input=prediction, label=label)
avg_cost = fluid.layers.mean(x=cost)
acc = fluid.layers.accuracy(input=prediction, label=label)
return avg_cost, acc, prediction
```
</details>
Use training dataset for training. The training script is local_train.py. After training, use the paddle_serving_client.io.save_model function to save the model files and configuration files used by the servingdeployment.
<details>
<summary>local_train.py</summary>
```python
import os
import sys
import paddle
import logging
import paddle.fluid as fluid
logging.basicConfig(format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger("fluid")
logger.setLevel(logging.INFO)
# load dict file
def load_vocab(filename):
vocab = {}
with open(filename) as f:
wid = 0
for line in f:
vocab[line.strip()] = wid
wid += 1
vocab["<unk>"] = len(vocab)
return vocab
if __name__ == "__main__":
from nets import cnn_net
model_name = "imdb_cnn"
vocab = load_vocab('imdb.vocab')
dict_dim = len(vocab)
#define model input
data = fluid.layers.data(
name="words", shape=[1], dtype="int64", lod_level=1)
label = fluid.layers.data(name="label", shape=[1], dtype="int64")
#define dataset,train_data is the dataset directory
dataset = fluid.DatasetFactory().create_dataset()
filelist = ["train_data/%s" % x for x in os.listdir("train_data")]
dataset.set_use_var([data, label])
pipe_command = "python imdb_reader.py"
dataset.set_pipe_command(pipe_command)
dataset.set_batch_size(4)
dataset.set_filelist(filelist)
dataset.set_thread(10)
#define model
avg_cost, acc, prediction = cnn_net(data, label, dict_dim)
optimizer = fluid.optimizer.SGD(learning_rate=0.001)
optimizer.minimize(avg_cost)
#execute training
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
epochs = 100
import paddle_serving_client.io as serving_io
for i in range(epochs):
exe.train_from_dataset(
program=fluid.default_main_program(), dataset=dataset, debug=False)
logger.info("TRAIN --> pass: {}".format(i))
if i == 64:
#At the end of training, use the model save interface in PaddleServing to save the models and configuration files required by Serving
serving_io.save_model("{}_model".format(model_name),
"{}_client_conf".format(model_name),
{"words": data}, {"prediction": prediction},
fluid.default_main_program())
```
</details>
![Training process](./imdb_loss.png) As can be seen from the above figure, the loss of the model starts to converge after the 65th round. We save the model and configuration file after the 65th round of training is completed. The saved files are divided into imdb_cnn_client_conf and imdb_cnn_model folders. The former contains client-side configuration files, and the latter contains server-side configuration files and saved model files.
The parameter list of the save_model function is as follows:
| Parameter | Meaning |
| -------------------- | ------------------------------------------------------------ |
| server_model_folder | Directory for server-side configuration files and model files |
| client_config_folder | Directory for saving client configuration files |
| feed_var_dict | The input of the inference model. The dict type and key can be customized. The value is the input variable in the model. Each key corresponds to a variable. When using the prediction service, the input data uses the key as the input name. |
| fetch_var_dict | The output of the model used for prediction, dict type, key can be customized, value is the input variable in the model, and each key corresponds to a variable. When using the prediction service, use the key to get the returned data |
| main_program | Model's program |
## Step5: Deploy RPC Prediction Service
The Paddle Serving framework supports two types of prediction service methods. One is to communicate through RPC and the other is to communicate through HTTP. The deployment and use of RPC prediction service will be introduced first. The deployment and use of HTTP prediction service will be introduced at Step 8. .
```shell
python -m paddle_serving_server.serve --model imdb_cnn_model / --port 9292 #cpu prediction service
python -m paddle_serving_server_gpu.serve --model imdb_cnn_model / --port 9292 --gpu_ids 0 #gpu prediction service
```
The parameter --model in the command specifies the server-side model and configuration file directory previously saved, --port specifies the port of the prediction service. When deploying the gpu prediction service using the gpu version, you can use --gpu_ids to specify the gpu used.
After executing one of the above commands, the RPC prediction service deployment of the IMDB sentiment analysis task is completed.
## Step6: Reuse Reader, define remote RPC client
Below we access the RPC prediction service through Python code, the script is test_client.py
<details>
<summary>test_client.py</summary>
```python
from paddle_serving_client import Client
from imdb_reader import IMDBDataset
import sys
client = Client()
client.load_client_config(sys.argv[1])
client.connect(["127.0.0.1:9292"])
#The code of the data preprocessing part is reused here to convert the original text into a numeric id
imdb_dataset = IMDBDataset()
imdb_dataset.load_resource(sys.argv[2])
for line in sys.stdin:
word_ids, label = imdb_dataset.get_words_and_label(line)
feed = {"words": word_ids}
fetch = ["acc", "cost", "prediction"]
fetch_map = client.predict(feed=feed, fetch=fetch)
print("{} {}".format(fetch_map["prediction"][1], label[0]))
```
</details>
The script receives data from standard input and prints out the probability that the sample whose infer result is 1 and its real label.
## Step7: Call the RPC service to test the model effect
The client implemented in the previous step runs the prediction service as an example. The usage method is as follows:
```shell
cat test_data/part-0 | python test_client.py imdb_lstm_client_conf/serving_client_conf.prototxt imdb.vocab
```
Using 2084 samples in the test_data/part-0 file for test testing, the model prediction accuracy is 88.19%.
**Note**: The effect of each model training may be slightly different, and the accuracy of predictions using the trained model will be close to the examples but may not be exactly the same.
## Step8: Deploy HTTP Prediction Service
When using the HTTP prediction service, the client does not need to install any modules of Paddle Serving, it only needs to be able to send HTTP requests. Of course, the HTTP method consumes more time in the communication phase than the RPC method.
For the IMDB sentiment analysis task, the original text needs to be preprocessed before prediction. In the RPC prediction service, we put the preprocessing in the client's script, and in the HTTP prediction service, we put the preprocessing on the server. Paddle Serving's HTTP prediction service framework prepares data pre-processing and post-processing interfaces for this situation. We just need to rewrite it according to the needs of the task.
Serving provides sample code, which is obtained by executing the imdb_web_service_demo.sh script in [IMDB Example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb).
Let's take a look at the script text_classify_service.py that starts the HTTP prediction service.
<details>
<summary>text_clssify_service.py</summary>
```python
from paddle_serving_server.web_service import WebService
from imdb_reader import IMDBDataset
import sys
#extend class WebService
class IMDBService(WebService):
def prepare_dict(self, args={}):
if len(args) == 0:
exit(-1)
self.dataset = IMDBDataset()
self.dataset.load_resource(args["dict_file_path"])
#rewrite preprocess() to implement data preprocessing, here we reuse reader script for training
def preprocess(self, feed={}, fetch=[]):
if "words" not in feed:
exit(-1)
res_feed = {}
res_feed["words"] = self.dataset.get_words_only(feed["words"])[0]
return res_feed, fetch
#Here you need to use the name parameter to specify the name of the prediction service.
imdb_service = IMDBService(name="imdb")
imdb_service.load_model_config(sys.argv[1])
imdb_service.prepare_server(
workdir=sys.argv[2], port=int(sys.argv[3]), device="cpu")
imdb_service.prepare_dict({"dict_file_path": sys.argv[4]})
imdb_service.run_server()
```
</details>
run
```shell
python text_classify_service.py imdb_cnn_model/ workdir/ 9292 imdb.vocab
```
In the above command, the first parameter is the saved server-side model and configuration file. The second parameter is the working directory, which will save some configuration files for the prediction service. The directory may not exist but needs to be specified. The prediction service will be created by itself. the third parameter is Port number, the fourth parameter is the dictionary file.
## Step9: Call the prediction service with plaintext data
After starting the HTTP prediction service, you can make prediction with a single command:
```
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "i am very sad | 0"}], "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction
```
When the inference process is normal, the prediction probability is returned, as shown below.
```
{"result":{"prediction":[[0.4389057457447052,0.561094343662262]]}}
```
**Note**: The effect of each model training may be slightly different, and the inferred probability value using the trained model may not be consistent with the example.
# 端到端完成从训练到部署全流程
(简体中文|[English](./TRAIN_TO_SERVICE.md))
Paddle Serving是Paddle的高性能在线预测服务框架,可以灵活支持大多数模型的部署。本文中将以IMDB评论情感分析任务为例通过9步展示从模型的训练到部署预测服务的全流程。
## Step1:准备环境
Paddle Serving可以部署在Linux环境上,目前server端支持在Centos7上部署,推荐使用[Docker部署](RUN_IN_DOCKER_CN.md)。rpc client端可以在Centos7和Ubuntu18上部署,在其他系统上或者不希望安装serving模块的环境中仍然可以通过http服务来访问server端的预测服务。
可以根据需求和机器环境来选择安装cpu或gpu版本的server模块,在client端机器上安装client模块。使用http请求的方式来访问server时,client端机器不需要安装client模块。
```shell
pip install paddle_serving_server #cpu版本server端
pip install paddle_serving_server_gpu #gpu版本server端
pip install paddle_serving_client #client端
```
简单准备后,我们将以IMDB评论情感分析任务为例,展示从模型训练到部署预测服务的流程。示例中的所有代码都可以在Paddle Serving代码库的[IMDB示例](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb)中找到,示例中使用的数据和词典文件可以通过执行IMDB示例代码中的get_data.sh脚本得到。
## Step2:确定任务和原始数据格式
IMDB评论情感分析任务是对电影评论的内容进行二分类,判断该评论是属于正面评论还是负面评论。
首先我们来看一下原始的数据:
```
saw a trailer for this on another video, and decided to rent when it came out. boy, was i disappointed! the story is extremely boring, the acting (aside from christopher walken) is bad, and i couldn't care less about the characters, aside from really wanting to see nora's husband get thrashed. christopher walken's role is such a throw-away, what a tease! | 0
```
这是一条英文评论样本,样本中使用|作为分隔符,分隔符之前为评论的内容,分隔符之后是样本的标签,0代表负样本,即负面评论,1代表正样本,即正面评论。
## Step3:定义Reader,划分训练集、测试集
对于原始文本我们需要将它转化为神经网络可以使用的数字id。imdb_reader.py脚本中定义了文本id化的方法,通过词典文件imdb.vocab将单词映射为整形数。
<details>
<summary>imdb_reader.py</summary>
```python
import sys
import os
import paddle
import re
import paddle.fluid.incubate.data_generator as dg
class IMDBDataset(dg.MultiSlotDataGenerator):
def load_resource(self, dictfile):
self._vocab = {}
wid = 0
with open(dictfile) as f:
for line in f:
self._vocab[line.strip()] = wid
wid += 1
self._unk_id = len(self._vocab)
self._pattern = re.compile(r'(;|,|\.|\?|!|\s|\(|\))')
self.return_value = ("words", [1, 2, 3, 4, 5, 6]), ("label", [0])
def get_words_only(self, line):
sent = line.lower().replace("<br />", " ").strip()
words = [x for x in self._pattern.split(sent) if x and x != " "]
feas = [
self._vocab[x] if x in self._vocab else self._unk_id for x in words
]
return feas
def get_words_and_label(self, line):
send = '|'.join(line.split('|')[:-1]).lower().replace("<br />",
" ").strip()
label = [int(line.split('|')[-1])]
words = [x for x in self._pattern.split(send) if x and x != " "]
feas = [
self._vocab[x] if x in self._vocab else self._unk_id for x in words
]
return feas, label
def infer_reader(self, infer_filelist, batch, buf_size):
def local_iter():
for fname in infer_filelist:
with open(fname, "r") as fin:
for line in fin:
feas, label = self.get_words_and_label(line)
yield feas, label
import paddle
batch_iter = paddle.batch(
paddle.reader.shuffle(
local_iter, buf_size=buf_size),
batch_size=batch)
return batch_iter
def generate_sample(self, line):
def memory_iter():
for i in range(1000):
yield self.return_value
def data_iter():
feas, label = self.get_words_and_label(line)
yield ("words", feas), ("label", label)
return data_iter
```
</details>
映射之后的样本类似于以下的格式:
```
257 142 52 898 7 0 12899 1083 824 122 89527 134 6 65 47 48 904 89527 13 0 87 170 8 248 9 15 4 25 1365 4360 89527 702 89527 1 89527 240 3 28 89527 19 7 0 216 219 614 89527 0 84 89527 225 3 0 15 67 2356 89527 0 498 117 2 314 282 7 38 1097 89527 1 0 174 181 38 11 71 198 44 1 3110 89527 454 89527 34 37 89527 0 15 5912 80 2 9856 7748 89527 8 421 80 9 15 14 55 2218 12 4 45 6 58 25 89527 154 119 224 41 0 151 89527 871 89527 505 89527 501 89527 29 2 773 211 89527 54 307 90 0 893 89527 9 407 4 25 2 614 15 46 89527 89527 71 8 1356 35 89527 12 0 89527 89527 89 527 577 374 3 39091 22950 1 3771 48900 95 371 156 313 89527 37 154 296 4 25 2 217 169 3 2759 7 0 15 89527 0 714 580 11 2094 559 34 0 84 539 89527 1 0 330 355 3 0 15 15607 935 80 0 5369 3 0 622 89527 2 15 36 9 2291 2 7599 6968 2449 89527 1 454 37 256 2 211 113 0 480 218 1152 700 4 1684 1253 352 10 2449 89527 39 4 1819 129 1 316 462 29 0 12957 3 6 28 89527 13 0 457 8952 7 225 89527 8 2389 0 1514 89527 1
```
这样神经网络就可以将转化后的文本信息作为特征值进行训练。
## Step4:定义CNN网络进行训练并保存
接下来我们使用[CNN模型](https://www.paddlepaddle.org.cn/documentation/docs/zh/user_guides/nlp_case/understand_sentiment/README.cn.html#cnn)来进行训练。在nets.py脚本中定义网络结构。
<details>
<summary>nets.py</summary>
```python
import sys
import time
import numpy as np
import paddle
import paddle.fluid as fluid
def cnn_net(data,
label,
dict_dim,
emb_dim=128,
hid_dim=128,
hid_dim2=96,
class_dim=2,
win_size=3):
""" conv net. """
emb = fluid.layers.embedding(
input=data, size=[dict_dim, emb_dim], is_sparse=True)
conv_3 = fluid.nets.sequence_conv_pool(
input=emb,
num_filters=hid_dim,
filter_size=win_size,
act="tanh",
pool_type="max")
fc_1 = fluid.layers.fc(input=[conv_3], size=hid_dim2)
prediction = fluid.layers.fc(input=[fc_1], size=class_dim, act="softmax")
cost = fluid.layers.cross_entropy(input=prediction, label=label)
avg_cost = fluid.layers.mean(x=cost)
acc = fluid.layers.accuracy(input=prediction, label=label)
return avg_cost, acc, prediction
```
</details>
使用训练样本进行训练,训练脚本为local_train.py。在训练结束后使用paddle_serving_client.io.save_model函数来保存部署预测服务使用的模型文件和配置文件。
<details>
<summary>local_train.py</summary>
```python
import os
import sys
import paddle
import logging
import paddle.fluid as fluid
logging.basicConfig(format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger("fluid")
logger.setLevel(logging.INFO)
# 加载词典文件
def load_vocab(filename):
vocab = {}
with open(filename) as f:
wid = 0
for line in f:
vocab[line.strip()] = wid
wid += 1
vocab["<unk>"] = len(vocab)
return vocab
if __name__ == "__main__":
from nets import cnn_net
model_name = "imdb_cnn"
vocab = load_vocab('imdb.vocab')
dict_dim = len(vocab)
#定义模型输入
data = fluid.layers.data(
name="words", shape=[1], dtype="int64", lod_level=1)
label = fluid.layers.data(name="label", shape=[1], dtype="int64")
#定义dataset,train_data为训练数据目录
dataset = fluid.DatasetFactory().create_dataset()
filelist = ["train_data/%s" % x for x in os.listdir("train_data")]
dataset.set_use_var([data, label])
pipe_command = "python imdb_reader.py"
dataset.set_pipe_command(pipe_command)
dataset.set_batch_size(4)
dataset.set_filelist(filelist)
dataset.set_thread(10)
#定义模型
avg_cost, acc, prediction = cnn_net(data, label, dict_dim)
optimizer = fluid.optimizer.SGD(learning_rate=0.001)
optimizer.minimize(avg_cost)
#执行训练
exe = fluid.Executor(fluid.CPUPlace())
exe.run(fluid.default_startup_program())
epochs = 100
import paddle_serving_client.io as serving_io
for i in range(epochs):
exe.train_from_dataset(
program=fluid.default_main_program(), dataset=dataset, debug=False)
logger.info("TRAIN --> pass: {}".format(i))
if i == 64:
#在训练结束时使用PaddleServing中的模型保存接口保存出Serving所需的模型和配置文件
serving_io.save_model("{}_model".format(model_name),
"{}_client_conf".format(model_name),
{"words": data}, {"prediction": prediction},
fluid.default_main_program())
```
</details>
![训练过程](./imdb_loss.png)由上图可以看出模型的损失在第65轮之后开始收敛,我们在第65轮训练完成后保存模型和配置文件。保存的文件分为imdb_cnn_client_conf和imdb_cnn_model文件夹,前者包含client端的配置文件,后者包含server端的配置文件和保存的模型文件。
save_model函数的参数列表如下:
| 参数 | 含义 |
| -------------------- | ------------------------------------------------------------ |
| server_model_folder | 保存server端配置文件和模型文件的目录 |
| client_config_folder | 保存client端配置文件的目录 |
| feed_var_dict | 用于预测的模型的输入,dict类型,key可以自定义,value为模型中的input variable,每个key对应一个variable,使用预测服务时,输入数据使用key作为输入的名称 |
| fetch_var_dict | 用于预测的模型的输出,dict类型,key可以自定义,value为模型中的input variable,每个key对应一个variable,使用预测服务时,通过key来获取返回数据 |
| main_program | 模型的program |
## Step5:部署RPC预测服务
Paddle Serving框架支持两种预测服务方式,一种是通过RPC进行通信,一种是通过HTTP进行通信,下面将先介绍RPC预测服务的部署和使用方法,在Step8开始介绍HTTP预测服务的部署和使用。
```shell
python -m paddle_serving_server.serve --model imdb_cnn_model/ --port 9292 #cpu预测服务
python -m paddle_serving_server_gpu.serve --model imdb_cnn_model/ --port 9292 --gpu_ids 0 #gpu预测服务
```
命令中参数--model 指定在之前保存的server端的模型和配置文件目录,--port指定预测服务的端口,当使用gpu版本部署gpu预测服务时可以使用--gpu_ids指定使用的gpu 。
执行完以上命令之一,就完成了IMDB 情感分析任务的RPC预测服务部署。
## Step6:复用Reader,定义远程RPC客户端
下面我们通过Python代码来访问RPC预测服务,脚本为test_client.py
<details>
<summary>test_client.py</summary>
```python
from paddle_serving_client import Client
from imdb_reader import IMDBDataset
import sys
client = Client()
client.load_client_config(sys.argv[1])
client.connect(["127.0.0.1:9292"])
#在这里复用了数据预处理部分的代码将原始文本转换成数字id
imdb_dataset = IMDBDataset()
imdb_dataset.load_resource(sys.argv[2])
for line in sys.stdin:
word_ids, label = imdb_dataset.get_words_and_label(line)
feed = {"words": word_ids}
fetch = ["acc", "cost", "prediction"]
fetch_map = client.predict(feed=feed, fetch=fetch)
print("{} {}".format(fetch_map["prediction"][1], label[0]))
```
</details>
脚本从标准输入接收数据,并打印出样本预测为1的概率与真实的label。
## Step7:调用RPC服务,测试模型效果
以上一步实现的客户端为例运行预测服务,使用方式如下:
```shell
cat test_data/part-0 | python test_client.py imdb_lstm_client_conf/serving_client_conf.prototxt imdb.vocab
```
使用test_data/part-0文件中的2084个样本进行测试测试,模型预测的准确率为88.19%。
**注意**:每次模型训练的效果可能略有不同,使用训练出的模型预测的准确率会与示例中接近但有可能不完全一致。
## Step8:部署HTTP预测服务
使用HTTP预测服务时,client端不需要安装Paddle Serving的任何模块,仅需要能发送HTTP请求即可。当然HTTP的通信方式会相较于RPC的通信方式在通信阶段消耗更多的时间。
对于IMDB情感分析任务原始文本在预测之前需要进行预处理,在RPC预测服务中我们将预处理放在client的脚本中,而在HTTP预测服务中我们将预处理放在server端。Paddle Serving的HTTP预测服务框架为这种情况准备了数据预处理和后处理的接口,我们只要根据任务需要重写即可。
Serving提供了示例代码,通过执行[IMDB示例](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb)中的imdb_web_service_demo.sh脚本来获取。
下面我们来看一下启动HTTP预测服务的脚本text_classify_service.py。
<details>
<summary>text_clssify_service.py</summary>
```python
from paddle_serving_server.web_service import WebService
from imdb_reader import IMDBDataset
import sys
#继承框架中的WebService类
class IMDBService(WebService):
def prepare_dict(self, args={}):
if len(args) == 0:
exit(-1)
self.dataset = IMDBDataset()
self.dataset.load_resource(args["dict_file_path"])
#重写preprocess方法来实现数据预处理,这里也复用了训练时使用的reader脚本
def preprocess(self, feed={}, fetch=[]):
if "words" not in feed:
exit(-1)
res_feed = {}
res_feed["words"] = self.dataset.get_words_only(feed["words"])[0]
return res_feed, fetch
#这里需要使用name参数指定预测服务的名称,
imdb_service = IMDBService(name="imdb")
imdb_service.load_model_config(sys.argv[1])
imdb_service.prepare_server(
workdir=sys.argv[2], port=int(sys.argv[3]), device="cpu")
imdb_service.prepare_dict({"dict_file_path": sys.argv[4]})
imdb_service.run_server()
```
</details>
启动命令
```shell
python text_classify_service.py imdb_cnn_model/ workdir/ 9292 imdb.vocab
```
以上命令中参数1为保存的server端模型和配置文件,参数2为工作目录会保存一些预测服务工作时的配置文件,该目录可以不存在但需要指定名称,预测服务会自行创建,参数3为端口号,参数4为词典文件。
## Step9:明文数据调用预测服务
启动完HTTP预测服务,即可通过一行命令进行预测:
```
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "i am very sad | 0"}], "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction
```
预测流程正常时,会返回预测概率,示例如下。
```
{"result":{"prediction":[[0.4389057457447052,0.561094343662262]]}}
```
**注意**:每次模型训练的效果可能略有不同,使用训练出的模型预测概率数值可能与示例不一致。
......@@ -21,7 +21,7 @@ import paddle.fluid as fluid
logging.basicConfig(format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger("fluid")
logger.setLevel(logging.INFO)
paddle.enable_static()
def load_vocab(filename):
vocab = {}
......
# Fit a line prediction example
([简体中文](./README_CN.md)|English)
## Get data
```shell
sh get_data.sh
```
## RPC service
### Start server
```shell
python -m paddle_serving_server_gpu.serve --model uci_housing_model --thread 10 --port 9393 --use_lite --use_xpu --ir_optim
```
### Client prediction
The `paddlepaddle` package is used in `test_client.py`, and you may need to download the corresponding package(`pip install paddlepaddle`).
``` shell
python test_client.py uci_housing_client/serving_client_conf.prototxt
```
## HTTP service
### Start server
Start a web service with default web service hosting modules:
``` shell
python test_server.py
```
### Client prediction
``` shell
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
```
# 线性回归预测服务示例
(简体中文|[English](./README.md))
## 获取数据
```shell
sh get_data.sh
```
## RPC服务
### 开启服务端
``` shell
python test_server.py uci_housing_model/
```
也可以通过下面的一行代码开启默认RPC服务:
```shell
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393 --use_lite --use_xpu --ir_optim
```
### 客户端预测
`test_client.py`中使用了`paddlepaddle`包,需要进行下载(`pip install paddlepaddle`)。
``` shell
python test_client.py uci_housing_client/serving_client_conf.prototxt
```
## HTTP服务
### 开启服务端
通过下面的一行代码开启默认web服务:
``` shell
python test_server.py
```
### 客户端预测
``` shell
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
```
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from paddle_serving_client import Client
from paddle_serving_client.utils import MultiThreadRunner
from paddle_serving_client.utils import benchmark_args
import time
import paddle
import sys
import requests
args = benchmark_args()
def single_func(idx, resource):
if args.request == "rpc":
client = Client()
client.load_client_config(args.model)
client.connect([args.endpoint])
train_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.uci_housing.train(), buf_size=500),
batch_size=1)
start = time.time()
for data in train_reader():
fetch_map = client.predict(feed={"x": data[0][0]}, fetch=["price"])
end = time.time()
return [[end - start]]
elif args.request == "http":
train_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.uci_housing.train(), buf_size=500),
batch_size=1)
start = time.time()
for data in train_reader():
r = requests.post(
'http://{}/uci/prediction'.format(args.endpoint),
data={"x": data[0]})
end = time.time()
return [[end - start]]
multi_thread_runner = MultiThreadRunner()
result = multi_thread_runner.run(single_func, args.thread, {})
print(result)
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
tar -xzf uci_housing.tar.gz
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
import sys
import paddle
import paddle.fluid as fluid
paddle.enable_static()
train_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.uci_housing.train(), buf_size=500),
batch_size=16)
test_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.uci_housing.test(), buf_size=500),
batch_size=16)
x = fluid.data(name='x', shape=[None, 13], dtype='float32')
y = fluid.data(name='y', shape=[None, 1], dtype='float32')
y_predict = fluid.layers.fc(input=x, size=1, act=None)
cost = fluid.layers.square_error_cost(input=y_predict, label=y)
avg_loss = fluid.layers.mean(cost)
sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.01)
sgd_optimizer.minimize(avg_loss)
place = fluid.CPUPlace()
feeder = fluid.DataFeeder(place=place, feed_list=[x, y])
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
import paddle_serving_client.io as serving_io
for pass_id in range(30):
for data_train in train_reader():
avg_loss_value, = exe.run(fluid.default_main_program(),
feed=feeder.feed(data_train),
fetch_list=[avg_loss])
serving_io.save_model("uci_housing_model", "uci_housing_client", {"x": x},
{"price": y_predict}, fluid.default_main_program())
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from paddle_serving_client import Client
import sys
import numpy as np
client = Client()
client.load_client_config(sys.argv[1])
client.connect(["127.0.0.1:9393"])
import paddle
test_reader = paddle.batch(
paddle.reader.shuffle(
paddle.dataset.uci_housing.test(), buf_size=500),
batch_size=1)
for data in test_reader():
new_data = np.zeros((1, 1, 13)).astype("float32")
new_data[0] = data[0][0]
fetch_map = client.predict(
feed={"x": new_data}, fetch=["price"], batch=True)
print(fetch_map)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_client import Client
from paddle_serving_client.utils import MultiThreadRunner
import paddle
import numpy as np
def single_func(idx, resource):
client = Client()
client.load_client_config(
"./uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9293", "127.0.0.1:9292"])
x = [
0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584,
0.6283, 0.4919, 0.1856, 0.0795, -0.0332
]
x = np.array(x)
for i in range(1000):
fetch_map = client.predict(feed={"x": x}, fetch=["price"])
if fetch_map is None:
return [[None]]
return [[0]]
multi_thread_runner = MultiThreadRunner()
thread_num = 4
result = multi_thread_runner.run(single_func, thread_num, {})
if None in result[0]:
exit(1)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from paddle_serving_server_gpu.web_service import WebService
import numpy as np
class UciService(WebService):
def preprocess(self, feed=[], fetch=[]):
feed_batch = []
is_batch = True
new_data = np.zeros((len(feed), 1, 13)).astype("float32")
for i, ins in enumerate(feed):
nums = np.array(ins["x"]).reshape(1, 1, 13)
new_data[i] = nums
feed = {"x": new_data}
return feed, fetch, is_batch
uci_service = UciService(name="uci")
uci_service.load_model_config("uci_housing_model")
uci_service.prepare_server(workdir="workdir", port=9393, use_lite=True, use_xpu=True, ir_optim=True)
uci_service.run_rpc_service()
uci_service.run_web_service()
# Image Classification
## Get Model
```
python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
tar -xzvf resnet_v2_50_imagenet.tar.gz
```
## RPC Service
### Start Service
```
python -m paddle_serving_server_gpu.serve --model resnet_v2_50_imagenet_model --port 9393 --use_lite --use_xpu --ir_optim
```
### Client Prediction
```
python resnet50_v2_client.py
```
# 图像分类
## 获取模型
```
python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
tar -xzvf resnet_v2_50_imagenet.tar.gz
```
## RPC 服务
### 启动服务端
```
python -m paddle_serving_server_gpu.serve --model resnet_v2_50_imagenet_model --port 9393 --use_lite --use_xpu --ir_optim
```
### 客户端预测
```
python resnet50_v2_client.py
```
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_app.reader import Sequential, File2Image, Resize, CenterCrop
from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize
from paddle_serving_app.local_predict import LocalPredictor
import sys
predictor = LocalPredictor()
predictor.load_model_config(sys.argv[1], use_lite=True, use_xpu=True, ir_optim=True)
seq = Sequential([
File2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
])
image_file = "daisy.jpg"
img = seq(image_file)
fetch_map = predictor.predict(feed={"image": img}, fetch=["score"])
print(fetch_map["score"].reshape(-1))
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, File2Image, Resize, CenterCrop
from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize
client = Client()
client.load_client_config(
"resnet_v2_50_imagenet_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9393"])
seq = Sequential([
File2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
])
image_file = "daisy.jpg"
img = seq(image_file)
fetch_map = client.predict(feed={"image": img}, fetch=["score"])
print(fetch_map["score"].reshape(-1))
......@@ -33,7 +33,7 @@ if '${PACK}' == 'ON':
REQUIRED_PACKAGES = [
'six >= 1.10.0', 'sentencepiece<=0.1.83', 'opencv-python<=4.2.0.32', 'pillow',
'pyclipper'
'pyclipper', 'shapely'
]
packages=['paddle_serving_app',
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册