提交 819519ec 编写于 作者: M MRXLT

fix ce script

......@@ -85,6 +85,17 @@ include(generic)
include(flags)
endif()
if (APP)
include(external/zlib)
include(external/boost)
include(external/protobuf)
include(external/gflags)
include(external/glog)
include(external/pybind11)
include(external/python)
include(generic)
endif()
if (SERVER)
include(external/cudnn)
include(paddlepaddle)
......
([简体中文](./README_CN.md)|English)
<p align="center">
<br>
<img src='doc/serving_logo.png' width = "600" height = "130">
<br>
<p>
<p align="center">
<br>
<a href="https://travis-ci.com/PaddlePaddle/Serving">
......@@ -23,28 +26,20 @@ We consider deploying deep learning inference service online to be a user-facing
<img src="doc/demo.gif" width="700">
</p>
<h2 align="center">Some Key Features</h2>
- Integrate with Paddle training pipeline seamlessly, most paddle models can be deployed **with one line command**.
- **Industrial serving features** supported, such as models management, online loading, online A/B testing etc.
- **Distributed Key-Value indexing** supported which is especially useful for large scale sparse features as model inputs.
- **Highly concurrent and efficient communication** between clients and servers supported.
- **Multiple programming languages** supported on client side, such as Golang, C++ and python.
- **Extensible framework design** which can support model serving beyond Paddle.
<h2 align="center">Installation</h2>
We **highly recommend** you to **run Paddle Serving in Docker**, please visit [Run in Docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/RUN_IN_DOCKER.md)
```
# Run CPU Docker
docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0
docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0
docker pull hub.baidubce.com/paddlepaddle/serving:latest
docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest
docker exec -it test bash
```
```
# Run GPU Docker
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-gpu
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-gpu
nvidia-docker exec -it test bash
```
......@@ -56,10 +51,44 @@ pip install paddle-serving-server-gpu # GPU
You may need to use a domestic mirror source (in China, you can use the Tsinghua mirror source, add `-i https://pypi.tuna.tsinghua.edu.cn/simple` to pip command) to speed up the download.
If you need install modules compiled with develop branch, please download packages from [latest packages list](./doc/LATEST_PACKAGES.md) and install with `pip install` command.
Client package support Centos 7 and Ubuntu 18, or you can use HTTP service without install client.
<h2 align="center"> Pre-built services with Paddle Serving</h2>
<h3 align="center">Chinese Word Segmentation</h4>
``` shell
> python -m paddle_serving_app.package -get_model lac
> tar -xzf lac.tar.gz
> python lac_web_service.py 9292 &
> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9393/lac/prediction
{"result":[{"word_seg":"我|爱|北京|天安门"}]}
```
<h3 align="center">Image Classification</h4>
<p align="center">
<br>
<img src='https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg' width = "200" height = "200">
<br>
<p>
``` shell
> python -m paddle_serving_app.package -get_model resnet_v2_50_imagenet
> tar -xzf resnet_v2_50_imagenet.tar.gz
> python resnet50_imagenet_classify.py resnet50_serving_model &
> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"image": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction
{"result":{"label":["daisy"],"prob":[0.9341403245925903]}}
```
<h2 align="center">Quick Start Example</h2>
This quick start example is only for users who already have a model to deploy and we prepare a ready-to-deploy model here. If you want to know how to use paddle serving from offline training to online serving, please reference to [Train_To_Service](https://github.com/PaddlePaddle/Serving/blob/develop/doc/TRAIN_TO_SERVICE.md)
### Boston House Price Prediction model
``` shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
......@@ -82,7 +111,9 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
| `port` | int | `9292` | Exposed port of current service to users|
| `name` | str | `""` | Service name, can be used to generate HTTP request url |
| `model` | str | `""` | Path of paddle model directory to be served |
| `mem_optim` | bool | `False` | Enable memory optimization |
| `mem_optim` | bool | `False` | Enable memory / graphic memory optimization |
| `ir_optim` | bool | `False` | Enable analysis and optimization of calculation graph |
| `use_mkl` (Only for cpu version) | bool | `False` | Run inference with MKL |
Here, we use `curl` to send a HTTP POST request to the service we just started. Users can use any python library to send HTTP POST as well, e.g, [requests](https://requests.readthedocs.io/en/master/).
</center>
......@@ -113,138 +144,13 @@ print(fetch_map)
```
Here, `client.predict` function has two arguments. `feed` is a `python dict` with model input variable alias name and values. `fetch` assigns the prediction variables to be returned from servers. In the example, the name of `"x"` and `"price"` are assigned when the servable model is saved during training.
<h2 align="center"> Pre-built services with Paddle Serving</h2>
<h3 align="center">Chinese Word Segmentation</h4>
- **Description**:
``` shell
Chinese word segmentation HTTP service that can be deployed with one line command.
```
- **Download Servable Package**:
``` shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/lac/lac_model_jieba_web.tar.gz
```
- **Host web service**:
``` shell
tar -xzf lac_model_jieba_web.tar.gz
python lac_web_service.py jieba_server_model/ lac_workdir 9292
```
- **Request sample**:
``` shell
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9292/lac/prediction
```
- **Request result**:
``` shell
{"word_seg":"我|爱|北京|天安门"}
```
<h3 align="center">Image Classification</h4>
- **Description**:
``` shell
Image classification trained with Imagenet dataset. A label and corresponding probability will be returned.
Note: This demo needs paddle-serving-server-gpu.
```
- **Download Servable Package**:
``` shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/imagenet-example/imagenet_demo.tar.gz
```
- **Host web service**:
``` shell
tar -xzf imagenet_demo.tar.gz
python image_classification_service_demo.py resnet50_serving_model
```
- **Request sample**:
<p align="center">
<br>
<img src='https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg' width = "200" height = "200">
<br>
<p>
``` shell
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"url": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction
```
- **Request result**:
``` shell
{"label":"daisy","prob":0.9341403245925903}
```
<h3 align="center">More Demos</h3>
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | Bert-Base-Baike |
| URL | [https://paddle-serving.bj.bcebos.com/bert_example/bert_seq128.tar.gz](https://paddle-serving.bj.bcebos.com/bert_example%2Fbert_seq128.tar.gz) |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert |
| Description | Get semantic representation from a Chinese Sentence |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | Resnet50-Imagenet |
| URL | [https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet50_vd.tar.gz](https://paddle-serving.bj.bcebos.com/imagenet-example%2FResNet50_vd.tar.gz) |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet |
| Description | Get image semantic representation from an image |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | Resnet101-Imagenet |
| URL | https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet101_vd.tar.gz |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet |
| Description | Get image semantic representation from an image |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | CNN-IMDB |
| URL | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
| Description | Get category probability from an English Sentence |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | LSTM-IMDB |
| URL | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
| Description | Get category probability from an English Sentence |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | BOW-IMDB |
| URL | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
| Description | Get category probability from an English Sentence |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | Jieba-LAC |
| URL | https://paddle-serving.bj.bcebos.com/lac/lac_model.tar.gz |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/lac |
| Description | Get word segmentation from a Chinese Sentence |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | DNN-CTR |
| URL | https://paddle-serving.bj.bcebos.com/criteo_ctr_example/criteo_ctr_demo_model.tar.gz |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/criteo_ctr |
| Description | Get click probability from a feature vector of item |
<h2 align="center">Some Key Features of Paddle Serving</h2>
- Integrate with Paddle training pipeline seamlessly, most paddle models can be deployed **with one line command**.
- **Industrial serving features** supported, such as models management, online loading, online A/B testing etc.
- **Distributed Key-Value indexing** supported which is especially useful for large scale sparse features as model inputs.
- **Highly concurrent and efficient communication** between clients and servers supported.
- **Multiple programming languages** supported on client side, such as Golang, C++ and python.
<h2 align="center">Document</h2>
......@@ -259,11 +165,13 @@ curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"url": "https://pa
- [How to develop a new Web Service?](doc/NEW_WEB_SERVICE.md)
- [Golang client](doc/IMDB_GO_CLIENT.md)
- [Compile from source code](doc/COMPILE.md)
- [Deploy Web Service with uWSGI](doc/UWSGI_DEPLOY.md)
- [Hot loading for model file](doc/HOT_LOADING_IN_SERVING.md)
### About Efficiency
- [How to profile Paddle Serving latency?](python/examples/util)
- [How to optimize performance?(Chinese)](doc/MULTI_SERVICE_ON_ONE_GPU_CN.md)
- [Deploy multi-services on one GPU(Chinese)](doc/PERFORMANCE_OPTIM_CN.md)
- [How to optimize performance?(Chinese)](doc/PERFORMANCE_OPTIM_CN.md)
- [Deploy multi-services on one GPU(Chinese)](doc/MULTI_SERVICE_ON_ONE_GPU_CN.md)
- [CPU Benchmarks(Chinese)](doc/BENCHMARKING.md)
- [GPU Benchmarks(Chinese)](doc/GPU_BENCHMARKING.md)
......
(简体中文|[English](./README.md))
<p align="center">
<br>
<img src='https://paddle-serving.bj.bcebos.com/imdb-demo%2FLogoMakr-3Bd2NM-300dpi.png' width = "600" height = "130">
<br>
<p>
<p align="center">
<br>
<a href="https://travis-ci.com/PaddlePaddle/Serving">
......@@ -24,14 +27,7 @@ Paddle Serving 旨在帮助深度学习开发者轻易部署在线预测服务
<img src="doc/demo.gif" width="700">
</p>
<h2 align="center">核心功能</h2>
- 与Paddle训练紧密连接,绝大部分Paddle模型可以 **一键部署**.
- 支持 **工业级的服务能力** 例如模型管理,在线加载,在线A/B测试等.
- 支持 **分布式键值对索引** 助力于大规模稀疏特征作为模型输入.
- 支持客户端和服务端之间 **高并发和高效通信**.
- 支持 **多种编程语言** 开发客户端,例如Golang,C++和Python.
- **可伸缩框架设计** 可支持不限于Paddle的模型服务.
<h2 align="center">安装</h2>
......@@ -39,14 +35,14 @@ Paddle Serving 旨在帮助深度学习开发者轻易部署在线预测服务
```
# 启动 CPU Docker
docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0
docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0
docker pull hub.baidubce.com/paddlepaddle/serving:latest
docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest
docker exec -it test bash
```
```
# 启动 GPU Docker
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-gpu
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-gpu
nvidia-docker exec -it test bash
```
```shell
......@@ -57,9 +53,42 @@ pip install paddle-serving-server-gpu # GPU
您可能需要使用国内镜像源(例如清华源, 在pip命令中添加`-i https://pypi.tuna.tsinghua.edu.cn/simple`)来加速下载。
如果需要使用develop分支编译的安装包,请从[最新安装包列表](./doc/LATEST_PACKAGES.md)中获取下载地址进行下载,使用`pip install`命令进行安装。
客户端安装包支持Centos 7和Ubuntu 18,或者您可以使用HTTP服务,这种情况下不需要安装客户端。
<h2 align="center">快速启动示例</h2>
<h2 align="center"> Paddle Serving预装的服务 </h2>
<h3 align="center">中文分词</h4>
``` shell
> python -m paddle_serving_app.package -get_model lac
> tar -xzf lac.tar.gz
> python lac_web_service.py 9292 &
> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9393/lac/prediction
{"result":[{"word_seg":"我|爱|北京|天安门"}]}
```
<h3 align="center">图像分类</h4>
<p align="center">
<br>
<img src='https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg' width = "200" height = "200">
<br>
<p>
``` shell
> python -m paddle_serving_app.package -get_model resnet_v2_50_imagenet
> tar -xzf resnet_v2_50_imagenet.tar.gz
> python resnet50_imagenet_classify.py resnet50_serving_model &
> curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"image": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction
{"result":{"label":["daisy"],"prob":[0.9341403245925903]}}
```
<h2 align="center">快速开始示例</h2>
这个快速开始示例主要是为了给那些已经有一个要部署的模型的用户准备的,而且我们也提供了一个可以用来部署的模型。如果您想知道如何从离线训练到在线服务走完全流程,请参考[从训练到部署](https://github.com/PaddlePaddle/Serving/blob/develop/doc/TRAIN_TO_SERVICE_CN.md)
<h3 align="center">波士顿房价预测</h3>
......@@ -87,6 +116,8 @@ python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --po
| `name` | str | `""` | Service name, can be used to generate HTTP request url |
| `model` | str | `""` | Path of paddle model directory to be served |
| `mem_optim` | bool | `False` | Enable memory optimization |
| `ir_optim` | bool | `False` | Enable analysis and optimization of calculation graph |
| `use_mkl` (Only for cpu version) | bool | `False` | Run inference with MKL |
我们使用 `curl` 命令来发送HTTP POST请求给刚刚启动的服务。用户也可以调用python库来发送HTTP POST请求,请参考英文文档 [requests](https://requests.readthedocs.io/en/master/)。
</center>
......@@ -118,139 +149,13 @@ print(fetch_map)
```
在这里,`client.predict`函数具有两个参数。 `feed`是带有模型输入变量别名和值的`python dict`。 `fetch`被要从服务器返回的预测变量赋值。 在该示例中,在训练过程中保存可服务模型时,被赋值的tensor名为`"x"`和`"price"`
<h2 align="center">Paddle Serving预装的服务</h2>
<h3 align="center">中文分词模型</h4>
- **介绍**:
``` shell
本示例为中文分词HTTP服务一键部署
```
- **下载服务包**:
``` shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/lac/lac_model_jieba_web.tar.gz
```
- **启动web服务**:
``` shell
tar -xzf lac_model_jieba_web.tar.gz
python lac_web_service.py jieba_server_model/ lac_workdir 9292
```
- **客户端请求示例**:
``` shell
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "我爱北京天安门"}], "fetch":["word_seg"]}' http://127.0.0.1:9292/lac/prediction
```
- **返回结果示例**:
``` shell
{"word_seg":"我|爱|北京|天安门"}
```
<h3 align="center">图像分类模型</h4>
- **介绍**:
``` shell
图像分类模型由Imagenet数据集训练而成,该服务会返回一个标签及其概率
注意:本示例需要安装paddle-serving-server-gpu
```
- **下载服务包**:
``` shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/imagenet-example/imagenet_demo.tar.gz
```
- **启动web服务**:
``` shell
tar -xzf imagenet_demo.tar.gz
python image_classification_service_demo.py resnet50_serving_model
```
- **客户端请求示例**:
<p align="center">
<br>
<img src='https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg' width = "200" height = "200">
<br>
<p>
``` shell
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"url": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction
```
- **返回结果示例**:
``` shell
{"label":"daisy","prob":0.9341403245925903}
```
<h3 align="center">更多示例</h3>
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| 模型名 | Bert-Base-Baike |
| 下载链接 | [https://paddle-serving.bj.bcebos.com/bert_example/bert_seq128.tar.gz](https://paddle-serving.bj.bcebos.com/bert_example%2Fbert_seq128.tar.gz) |
| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert |
| 介绍 | 获得一个中文语句的语义表示 |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| 模型名 | Resnet50-Imagenet |
| 下载链接 | [https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet50_vd.tar.gz](https://paddle-serving.bj.bcebos.com/imagenet-example%2FResNet50_vd.tar.gz) |
| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet |
| 介绍 | 获得一张图片的图像语义表示 |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| 模型名 | Resnet101-Imagenet |
| 下载链接 | https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet101_vd.tar.gz |
| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet |
| 介绍 | 获得一张图片的图像语义表示 |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| 模型名 | CNN-IMDB |
| 下载链接 | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
| 介绍 | 从一个中文语句获得类别及其概率 |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| 模型名 | LSTM-IMDB |
| 下载链接 | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
| 介绍 | 从一个英文语句获得类别及其概率 |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| 模型名 | BOW-IMDB |
| 下载链接 | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
| 介绍 | 从一个英文语句获得类别及其概率 |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| 模型名 | Jieba-LAC |
| 下载链接 | https://paddle-serving.bj.bcebos.com/lac/lac_model.tar.gz |
| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/lac |
| 介绍 | 获取中文语句的分词 |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| 模型名 | DNN-CTR |
| 下载链接 | https://paddle-serving.bj.bcebos.com/criteo_ctr_example/criteo_ctr_demo_model.tar.gz |
| 客户端/服务端代码 | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/criteo_ctr |
| 介绍 | 从项目的特征向量中获得点击概率 |
<h2 align="center">Paddle Serving的核心功能</h2>
- 与Paddle训练紧密连接,绝大部分Paddle模型可以 **一键部署**.
- 支持 **工业级的服务能力** 例如模型管理,在线加载,在线A/B测试等.
- 支持 **分布式键值对索引** 助力于大规模稀疏特征作为模型输入.
- 支持客户端和服务端之间 **高并发和高效通信**.
- 支持 **多种编程语言** 开发客户端,例如Golang,C++和Python.
<h2 align="center">文档</h2>
......@@ -265,11 +170,13 @@ curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"url": "https://pa
- [如何开发一个新的Web Service?](doc/NEW_WEB_SERVICE_CN.md)
- [如何在Paddle Serving使用Go Client?](doc/IMDB_GO_CLIENT_CN.md)
- [如何编译PaddleServing?](doc/COMPILE_CN.md)
- [如何使用uWSGI部署Web Service](doc/UWSGI_DEPLOY_CN.md)
- [如何实现模型文件热加载](doc/HOT_LOADING_IN_SERVING_CN.md)
### 关于Paddle Serving性能
- [如何测试Paddle Serving性能?](python/examples/util/)
- [如何优化性能?](doc/MULTI_SERVICE_ON_ONE_GPU_CN.md)
- [在一张GPU上启动多个预测服务](doc/PERFORMANCE_OPTIM_CN.md)
- [如何优化性能?](doc/PERFORMANCE_OPTIM_CN.md)
- [在一张GPU上启动多个预测服务](doc/MULTI_SERVICE_ON_ONE_GPU_CN.md)
- [CPU版Benchmarks](doc/BENCHMARKING.md)
- [GPU版Benchmarks](doc/GPU_BENCHMARKING.md)
......
......@@ -31,7 +31,7 @@ message( "WITH_GPU = ${WITH_GPU}")
# Paddle Version should be one of:
# latest: latest develop build
# version number like 1.5.2
SET(PADDLE_VERSION "1.7.1")
SET(PADDLE_VERSION "1.7.2")
if (WITH_GPU)
SET(PADDLE_LIB_VERSION "${PADDLE_VERSION}-gpu-cuda${CUDA_VERSION_MAJOR}-cudnn7-avx-mkl")
......
......@@ -23,6 +23,11 @@ add_subdirectory(pdcodegen)
add_subdirectory(sdk-cpp)
endif()
if (APP)
add_subdirectory(configure)
endif()
if(CLIENT)
add_subdirectory(general-client)
endif()
......
if (SERVER OR CLIENT)
LIST(APPEND protofiles
${CMAKE_CURRENT_LIST_DIR}/proto/server_configure.proto
${CMAKE_CURRENT_LIST_DIR}/proto/sdk_configure.proto
......@@ -28,6 +29,7 @@ FILE(GLOB inc ${CMAKE_CURRENT_BINARY_DIR}/*.pb.h)
install(FILES ${inc}
DESTINATION ${PADDLE_SERVING_INSTALL_DIR}/include/configure)
endif()
py_proto_compile(general_model_config_py_proto SRCS proto/general_model_config.proto)
add_custom_target(general_model_config_py_proto_init ALL COMMAND ${CMAKE_COMMAND} -E touch __init__.py)
......@@ -51,6 +53,14 @@ add_custom_command(TARGET general_model_config_py_proto POST_BUILD
endif()
if (APP)
add_custom_command(TARGET general_model_config_py_proto POST_BUILD
COMMAND ${CMAKE_COMMAND} -E make_directory ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_app/proto
COMMAND cp *.py ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_app/proto
COMMENT "Copy generated general_model_config proto file into directory paddle_serving_app/proto."
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR})
endif()
if (SERVER)
py_proto_compile(server_config_py_proto SRCS proto/server_configure.proto)
add_custom_target(server_config_py_proto_init ALL COMMAND ${CMAKE_COMMAND} -E touch __init__.py)
......
......@@ -43,6 +43,7 @@ message EngineDesc {
optional bool enable_memory_optimization = 13;
optional bool static_optimization = 14;
optional bool force_update_static_cache = 15;
optional bool enable_ir_optimization = 16;
};
// model_toolkit conf
......
......@@ -83,9 +83,6 @@ func JsonReq(method, requrl string, timeout int, kv *map[string]string,
}
func GetHdfsMeta(src string) (master, ugi, path string, err error) {
//src = "hdfs://root:rootpasst@st1-inf-platform0.st01.baidu.com:54310/user/mis_user/news_dnn_ctr_cube_1/1501836820/news_dnn_ctr_cube_1_part54.tar"
//src = "hdfs://st1-inf-platform0.st01.baidu.com:54310/user/mis_user/news_dnn_ctr_cube_1/1501836820/news_dnn_ctr_cube_1_part54.tar"
ugiBegin := strings.Index(src, "//")
ugiPos := strings.LastIndex(src, "@")
if ugiPos != -1 && ugiBegin != -1 {
......
if(CLIENT)
add_subdirectory(pybind11)
pybind11_add_module(serving_client src/general_model.cpp src/pybind_general_model.cpp)
target_link_libraries(serving_client PRIVATE -Wl,--whole-archive utils sdk-cpp pybind python -Wl,--no-whole-archive -lpthread -lcrypto -lm -lrt -lssl -ldl -lz)
target_link_libraries(serving_client PRIVATE -Wl,--whole-archive utils sdk-cpp pybind python -Wl,--no-whole-archive -lpthread -lcrypto -lm -lrt -lssl -ldl -lz -Wl,-rpath,'$ORIGIN'/lib)
endif()
......@@ -69,15 +69,27 @@ class ModelRes {
const std::vector<int64_t>& get_int64_by_name(const std::string& name) {
return _int64_value_map[name];
}
std::vector<int64_t>&& get_int64_by_name_with_rv(const std::string& name) {
return std::move(_int64_value_map[name]);
}
const std::vector<float>& get_float_by_name(const std::string& name) {
return _float_value_map[name];
}
const std::vector<int>& get_shape(const std::string& name) {
std::vector<float>&& get_float_by_name_with_rv(const std::string& name) {
return std::move(_float_value_map[name]);
}
const std::vector<int>& get_shape_by_name(const std::string& name) {
return _shape_map[name];
}
const std::vector<int>& get_lod(const std::string& name) {
std::vector<int>&& get_shape_by_name_with_rv(const std::string& name) {
return std::move(_shape_map[name]);
}
const std::vector<int>& get_lod_by_name(const std::string& name) {
return _lod_map[name];
}
std::vector<int>&& get_lod_by_name_with_rv(const std::string& name) {
return std::move(_lod_map[name]);
}
void set_engine_name(const std::string& engine_name) {
_engine_name = engine_name;
}
......@@ -121,17 +133,33 @@ class PredictorRes {
const std::string& name) {
return _models[model_idx].get_int64_by_name(name);
}
std::vector<int64_t>&& get_int64_by_name_with_rv(const int model_idx,
const std::string& name) {
return std::move(_models[model_idx].get_int64_by_name_with_rv(name));
}
const std::vector<float>& get_float_by_name(const int model_idx,
const std::string& name) {
return _models[model_idx].get_float_by_name(name);
}
const std::vector<int>& get_shape(const int model_idx,
const std::string& name) {
return _models[model_idx].get_shape(name);
std::vector<float>&& get_float_by_name_with_rv(const int model_idx,
const std::string& name) {
return std::move(_models[model_idx].get_float_by_name_with_rv(name));
}
const std::vector<int>& get_shape_by_name(const int model_idx,
const std::string& name) {
return _models[model_idx].get_shape_by_name(name);
}
const std::vector<int>&& get_shape_by_name_with_rv(const int model_idx,
const std::string& name) {
return std::move(_models[model_idx].get_shape_by_name_with_rv(name));
}
const std::vector<int>& get_lod_by_name(const int model_idx,
const std::string& name) {
return _models[model_idx].get_lod_by_name(name);
}
const std::vector<int>& get_lod(const int model_idx,
const std::string& name) {
return _models[model_idx].get_lod(name);
const std::vector<int>&& get_lod_by_name_with_rv(const int model_idx,
const std::string& name) {
return std::move(_models[model_idx].get_lod_by_name_with_rv(name));
}
void add_model_res(ModelRes&& res) {
_engine_names.push_back(res.engine_name());
......
......@@ -258,9 +258,10 @@ int PredictorClient::batch_predict(
ModelRes model;
model.set_engine_name(output.engine_name());
int idx = 0;
for (auto &name : fetch_name) {
// int idx = _fetch_name_to_idx[name];
int idx = 0;
int shape_size = output.insts(0).tensor_array(idx).shape_size();
VLOG(2) << "fetch var " << name << " index " << idx << " shape size "
<< shape_size;
......@@ -279,9 +280,9 @@ int PredictorClient::batch_predict(
idx += 1;
}
idx = 0;
for (auto &name : fetch_name) {
// int idx = _fetch_name_to_idx[name];
int idx = 0;
if (_fetch_name_to_type[name] == 0) {
VLOG(2) << "ferch var " << name << "type int";
model._int64_value_map[name].resize(
......@@ -345,7 +346,7 @@ int PredictorClient::numpy_predict(
PredictorRes &predict_res_batch,
const int &pid) {
int batch_size = std::max(float_feed_batch.size(), int_feed_batch.size());
VLOG(2) << "batch size: " << batch_size;
predict_res_batch.clear();
Timer timeline;
int64_t preprocess_start = timeline.TimeStampUS();
......@@ -462,7 +463,7 @@ int PredictorClient::numpy_predict(
for (ssize_t j = 0; j < int_array.shape(1); j++) {
for (ssize_t k = 0; k < int_array.shape(2); k++) {
for (ssize_t l = 0; k < int_array.shape(3); l++) {
tensor->add_float_data(int_array(i, j, k, l));
tensor->add_int64_data(int_array(i, j, k, l));
}
}
}
......@@ -474,7 +475,7 @@ int PredictorClient::numpy_predict(
for (ssize_t i = 0; i < int_array.shape(0); i++) {
for (ssize_t j = 0; j < int_array.shape(1); j++) {
for (ssize_t k = 0; k < int_array.shape(2); k++) {
tensor->add_float_data(int_array(i, j, k));
tensor->add_int64_data(int_array(i, j, k));
}
}
}
......@@ -484,7 +485,7 @@ int PredictorClient::numpy_predict(
auto int_array = int_feed[vec_idx].unchecked<2>();
for (ssize_t i = 0; i < int_array.shape(0); i++) {
for (ssize_t j = 0; j < int_array.shape(1); j++) {
tensor->add_float_data(int_array(i, j));
tensor->add_int64_data(int_array(i, j));
}
}
break;
......@@ -492,7 +493,7 @@ int PredictorClient::numpy_predict(
case 1: {
auto int_array = int_feed[vec_idx].unchecked<1>();
for (ssize_t i = 0; i < int_array.shape(0); i++) {
tensor->add_float_data(int_array(i));
tensor->add_int64_data(int_array(i));
}
break;
}
......@@ -536,9 +537,9 @@ int PredictorClient::numpy_predict(
ModelRes model;
model.set_engine_name(output.engine_name());
int idx = 0;
for (auto &name : fetch_name) {
// int idx = _fetch_name_to_idx[name];
int idx = 0;
int shape_size = output.insts(0).tensor_array(idx).shape_size();
VLOG(2) << "fetch var " << name << " index " << idx << " shape size "
<< shape_size;
......@@ -557,9 +558,10 @@ int PredictorClient::numpy_predict(
idx += 1;
}
idx = 0;
for (auto &name : fetch_name) {
// int idx = _fetch_name_to_idx[name];
int idx = 0;
if (_fetch_name_to_type[name] == 0) {
VLOG(2) << "ferch var " << name << "type int";
model._int64_value_map[name].resize(
......
......@@ -32,24 +32,41 @@ PYBIND11_MODULE(serving_client, m) {
.def(py::init())
.def("get_int64_by_name",
[](PredictorRes &self, int model_idx, std::string &name) {
return self.get_int64_by_name(model_idx, name);
},
py::return_value_policy::reference)
// see more: https://github.com/pybind/pybind11/issues/1042
std::vector<int64_t> *ptr = new std::vector<int64_t>(
std::move(self.get_int64_by_name_with_rv(model_idx, name)));
auto capsule = py::capsule(ptr, [](void *p) {
delete reinterpret_cast<std::vector<int64_t> *>(p);
});
return py::array(ptr->size(), ptr->data(), capsule);
})
.def("get_float_by_name",
[](PredictorRes &self, int model_idx, std::string &name) {
return self.get_float_by_name(model_idx, name);
},
py::return_value_policy::reference)
std::vector<float> *ptr = new std::vector<float>(
std::move(self.get_float_by_name_with_rv(model_idx, name)));
auto capsule = py::capsule(ptr, [](void *p) {
delete reinterpret_cast<std::vector<float> *>(p);
});
return py::array(ptr->size(), ptr->data(), capsule);
})
.def("get_shape",
[](PredictorRes &self, int model_idx, std::string &name) {
return self.get_shape(model_idx, name);
},
py::return_value_policy::reference)
std::vector<int> *ptr = new std::vector<int>(
std::move(self.get_shape_by_name_with_rv(model_idx, name)));
auto capsule = py::capsule(ptr, [](void *p) {
delete reinterpret_cast<std::vector<int> *>(p);
});
return py::array(ptr->size(), ptr->data(), capsule);
})
.def("get_lod",
[](PredictorRes &self, int model_idx, std::string &name) {
return self.get_lod(model_idx, name);
},
py::return_value_policy::reference)
std::vector<int> *ptr = new std::vector<int>(
std::move(self.get_lod_by_name_with_rv(model_idx, name)));
auto capsule = py::capsule(ptr, [](void *p) {
delete reinterpret_cast<std::vector<int> *>(p);
});
return py::array(ptr->size(), ptr->data(), capsule);
})
.def("variant_tag", [](PredictorRes &self) { return self.variant_tag(); })
.def("get_engine_names",
[](PredictorRes &self) { return self.get_engine_names(); });
......@@ -100,7 +117,8 @@ PYBIND11_MODULE(serving_client, m) {
fetch_name,
predict_res_batch,
pid);
})
},
py::call_guard<py::gil_scoped_release>())
.def("numpy_predict",
[](PredictorClient &self,
const std::vector<std::vector<py::array_t<float>>>
......
......@@ -131,7 +131,7 @@ int GeneralReaderOp::inference() {
lod_tensor.dtype = paddle::PaddleDType::FLOAT32;
}
if (req->insts(0).tensor_array(i).shape(0) == -1) {
if (model_config->_is_lod_feed[i]) {
lod_tensor.lod.resize(1);
lod_tensor.lod[0].push_back(0);
VLOG(2) << "var[" << i << "] is lod_tensor";
......@@ -153,6 +153,7 @@ int GeneralReaderOp::inference() {
// specify the memory needed for output tensor_vector
for (int i = 0; i < var_num; ++i) {
if (out->at(i).lod.size() == 1) {
int tensor_size = 0;
for (int j = 0; j < batch_size; ++j) {
const Tensor &tensor = req->insts(j).tensor_array(i);
int data_len = 0;
......@@ -162,15 +163,28 @@ int GeneralReaderOp::inference() {
data_len = tensor.float_data_size();
}
VLOG(2) << "tensor size for var[" << i << "]: " << data_len;
tensor_size += data_len;
int cur_len = out->at(i).lod[0].back();
VLOG(2) << "current len: " << cur_len;
out->at(i).lod[0].push_back(cur_len + data_len);
VLOG(2) << "new len: " << cur_len + data_len;
int sample_len = 0;
if (tensor.shape_size() == 1) {
sample_len = data_len;
} else {
sample_len = tensor.shape(0);
}
out->at(i).lod[0].push_back(cur_len + sample_len);
VLOG(2) << "new len: " << cur_len + sample_len;
}
out->at(i).data.Resize(tensor_size * elem_size[i]);
out->at(i).shape = {out->at(i).lod[0].back()};
for (int j = 1; j < req->insts(0).tensor_array(i).shape_size(); ++j) {
out->at(i).shape.push_back(req->insts(0).tensor_array(i).shape(j));
}
if (out->at(i).shape.size() == 1) {
out->at(i).shape.push_back(1);
}
out->at(i).data.Resize(out->at(i).lod[0].back() * elem_size[i]);
out->at(i).shape = {out->at(i).lod[0].back(), 1};
VLOG(2) << "var[" << i
<< "] is lod_tensor and len=" << out->at(i).lod[0].back();
} else {
......
......@@ -15,8 +15,10 @@
#include "core/general-server/op/general_response_op.h"
#include <algorithm>
#include <iostream>
#include <map>
#include <memory>
#include <sstream>
#include <utility>
#include "core/general-server/op/general_infer_helper.h"
#include "core/predictor/framework/infer.h"
#include "core/predictor/framework/memory.h"
......@@ -86,17 +88,20 @@ int GeneralResponseOp::inference() {
// To get the order of model return values
output->set_engine_name(pre_name);
FetchInst *fetch_inst = output->add_insts();
for (auto &idx : fetch_index) {
Tensor *tensor = fetch_inst->add_tensor_array();
tensor->set_elem_type(1);
if (model_config->_is_lod_fetch[idx]) {
VLOG(2) << "out[" << idx << "] is lod_tensor";
VLOG(2) << "out[" << idx << "] " << model_config->_fetch_name[idx]
<< " is lod_tensor";
for (int k = 0; k < in->at(idx).shape.size(); ++k) {
VLOG(2) << "shape[" << k << "]: " << in->at(idx).shape[k];
tensor->add_shape(in->at(idx).shape[k]);
}
} else {
VLOG(2) << "out[" << idx << "] is tensor";
VLOG(2) << "out[" << idx << "] " << model_config->_fetch_name[idx]
<< " is tensor";
for (int k = 0; k < in->at(idx).shape.size(); ++k) {
VLOG(2) << "shape[" << k << "]: " << in->at(idx).shape[k];
tensor->add_shape(in->at(idx).shape[k]);
......@@ -111,6 +116,8 @@ int GeneralResponseOp::inference() {
cap *= in->at(idx).shape[j];
}
if (in->at(idx).dtype == paddle::PaddleDType::INT64) {
VLOG(2) << "Prepare float var [" << model_config->_fetch_name[idx]
<< "].";
int64_t *data_ptr = static_cast<int64_t *>(in->at(idx).data.data());
if (model_config->_is_lod_fetch[idx]) {
FetchInst *fetch_p = output->mutable_insts(0);
......@@ -127,8 +134,11 @@ int GeneralResponseOp::inference() {
fetch_p->mutable_tensor_array(var_idx)->add_int64_data(data_ptr[j]);
}
}
VLOG(2) << "fetch var [" << model_config->_fetch_name[idx] << "] ready";
var_idx++;
} else if (in->at(idx).dtype == paddle::PaddleDType::FLOAT32) {
VLOG(2) << "Prepare float var [" << model_config->_fetch_name[idx]
<< "].";
float *data_ptr = static_cast<float *>(in->at(idx).data.data());
if (model_config->_is_lod_fetch[idx]) {
FetchInst *fetch_p = output->mutable_insts(0);
......@@ -145,6 +155,7 @@ int GeneralResponseOp::inference() {
fetch_p->mutable_tensor_array(var_idx)->add_float_data(data_ptr[j]);
}
}
VLOG(2) << "fetch var [" << model_config->_fetch_name[idx] << "] ready";
var_idx++;
}
}
......
......@@ -35,6 +35,7 @@ class InferEngineCreationParams {
InferEngineCreationParams() {
_path = "";
_enable_memory_optimization = false;
_enable_ir_optimization = false;
_static_optimization = false;
_force_update_static_cache = false;
}
......@@ -45,10 +46,16 @@ class InferEngineCreationParams {
_enable_memory_optimization = enable_memory_optimization;
}
void set_enable_ir_optimization(bool enable_ir_optimization) {
_enable_ir_optimization = enable_ir_optimization;
}
bool enable_memory_optimization() const {
return _enable_memory_optimization;
}
bool enable_ir_optimization() const { return _enable_ir_optimization; }
void set_static_optimization(bool static_optimization = false) {
_static_optimization = static_optimization;
}
......@@ -68,6 +75,7 @@ class InferEngineCreationParams {
<< "model_path = " << _path << ", "
<< "enable_memory_optimization = " << _enable_memory_optimization
<< ", "
<< "enable_ir_optimization = " << _enable_ir_optimization << ", "
<< "static_optimization = " << _static_optimization << ", "
<< "force_update_static_cache = " << _force_update_static_cache;
}
......@@ -75,6 +83,7 @@ class InferEngineCreationParams {
private:
std::string _path;
bool _enable_memory_optimization;
bool _enable_ir_optimization;
bool _static_optimization;
bool _force_update_static_cache;
};
......@@ -150,6 +159,11 @@ class ReloadableInferEngine : public InferEngine {
force_update_static_cache = conf.force_update_static_cache();
}
if (conf.has_enable_ir_optimization()) {
_infer_engine_params.set_enable_ir_optimization(
conf.enable_ir_optimization());
}
_infer_engine_params.set_path(_model_data_path);
if (enable_memory_optimization) {
_infer_engine_params.set_enable_memory_optimization(true);
......
......@@ -22,23 +22,23 @@ namespace baidu {
namespace paddle_serving {
namespace sdk_cpp {
#define PARSE_CONF_ITEM(conf, item, name, fail) \
do { \
if (conf.has_##name()) { \
item.set(conf.name()); \
} else { \
LOG(ERROR) << "Not found key in configue: " << #name; \
} \
#define PARSE_CONF_ITEM(conf, item, name, fail) \
do { \
if (conf.has_##name()) { \
item.set(conf.name()); \
} else { \
VLOG(2) << "Not found key in configue: " << #name; \
} \
} while (0)
#define ASSIGN_CONF_ITEM(dest, src, fail) \
do { \
if (!src.init) { \
LOG(ERROR) << "Cannot assign an unintialized item: " << #src \
<< " to dest: " << #dest; \
return fail; \
} \
dest = src.value; \
#define ASSIGN_CONF_ITEM(dest, src, fail) \
do { \
if (!src.init) { \
VLOG(2) << "Cannot assign an unintialized item: " << #src \
<< " to dest: " << #dest; \
return fail; \
} \
dest = src.value; \
} while (0)
template <typename T>
......
......@@ -21,7 +21,7 @@ The following Python code will process the data `test_data/part-0` and write to
[//file]:#process.py
``` python
from imdb_reader import IMDBDataset
from paddle_serving_app.reader import IMDBDataset
imdb_dataset = IMDBDataset()
imdb_dataset.load_resource('imdb.vocab')
......@@ -39,7 +39,7 @@ Here, we [use docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/R
First, start the BOW server, which enables the `8000` port:
``` shell
docker run -dit -v $PWD/imdb_bow_model:/model -p 8000:8000 --name bow-server hub.baidubce.com/paddlepaddle/serving:0.2.0
docker run -dit -v $PWD/imdb_bow_model:/model -p 8000:8000 --name bow-server hub.baidubce.com/paddlepaddle/serving:latest
docker exec -it bow-server bash
pip install paddle-serving-server
python -m paddle_serving_server.serve --model model --port 8000 >std.log 2>err.log &
......@@ -49,7 +49,7 @@ exit
Similarly, start the LSTM server, which enables the `9000` port:
```bash
docker run -dit -v $PWD/imdb_lstm_model:/model -p 9000:9000 --name lstm-server hub.baidubce.com/paddlepaddle/serving:0.2.0
docker run -dit -v $PWD/imdb_lstm_model:/model -p 9000:9000 --name lstm-server hub.baidubce.com/paddlepaddle/serving:latest
docker exec -it lstm-server bash
pip install paddle-serving-server
python -m paddle_serving_server.serve --model model --port 9000 >std.log 2>err.log &
......@@ -78,7 +78,7 @@ with open('processed.data') as f:
feed = {"words": word_ids}
fetch = ["acc", "cost", "prediction"]
[fetch_map, tag] = client.predict(feed=feed, fetch=fetch, need_variant_tag=True)
if (float(fetch_map["prediction"][1]) - 0.5) * (float(label[0]) - 0.5) > 0:
if (float(fetch_map["prediction"][0][1]) - 0.5) * (float(label[0]) - 0.5) > 0:
cnt[tag]['acc'] += 1
cnt[tag]['total'] += 1
......@@ -88,7 +88,7 @@ with open('processed.data') as f:
In the code, the function `client.add_variant(tag, clusters, variant_weight)` is to add a variant with label `tag` and flow weight `variant_weight`. In this example, a BOW variant with label of `bow` and flow weight of `10`, and an LSTM variant with label of `lstm` and a flow weight of `90` are added. The flow on the client side will be distributed to two variants according to the ratio of `10:90`.
When making prediction on the client side, if the parameter `need_variant_tag=True` is specified, the response will contains the variant tag corresponding to the distribution flow.
When making prediction on the client side, if the parameter `need_variant_tag=True` is specified, the response will contain the variant tag corresponding to the distribution flow.
### Expected Results
......
......@@ -20,7 +20,7 @@ sh get_data.sh
下面Python代码将处理`test_data/part-0`的数据,写入`processed.data`文件中。
```python
from imdb_reader import IMDBDataset
from paddle_serving_app.reader import IMDBDataset
imdb_dataset = IMDBDataset()
imdb_dataset.load_resource('imdb.vocab')
......@@ -38,7 +38,7 @@ with open('test_data/part-0') as fin:
首先启动BOW Server,该服务启用`8000`端口:
```bash
docker run -dit -v $PWD/imdb_bow_model:/model -p 8000:8000 --name bow-server hub.baidubce.com/paddlepaddle/serving:0.2.0
docker run -dit -v $PWD/imdb_bow_model:/model -p 8000:8000 --name bow-server hub.baidubce.com/paddlepaddle/serving:latest
docker exec -it bow-server bash
pip install paddle-serving-server -i https://pypi.tuna.tsinghua.edu.cn/simple
python -m paddle_serving_server.serve --model model --port 8000 >std.log 2>err.log &
......@@ -48,7 +48,7 @@ exit
同理启动LSTM Server,该服务启用`9000`端口:
```bash
docker run -dit -v $PWD/imdb_lstm_model:/model -p 9000:9000 --name lstm-server hub.baidubce.com/paddlepaddle/serving:0.2.0
docker run -dit -v $PWD/imdb_lstm_model:/model -p 9000:9000 --name lstm-server hub.baidubce.com/paddlepaddle/serving:latest
docker exec -it lstm-server bash
pip install paddle-serving-server -i https://pypi.tuna.tsinghua.edu.cn/simple
python -m paddle_serving_server.serve --model model --port 9000 >std.log 2>err.log &
......@@ -76,7 +76,7 @@ with open('processed.data') as f:
feed = {"words": word_ids}
fetch = ["acc", "cost", "prediction"]
[fetch_map, tag] = client.predict(feed=feed, fetch=fetch, need_variant_tag=True)
if (float(fetch_map["prediction"][1]) - 0.5) * (float(label[0]) - 0.5) > 0:
if (float(fetch_map["prediction"][0][1]) - 0.5) * (float(label[0]) - 0.5) > 0:
cnt[tag]['acc'] += 1
cnt[tag]['total'] += 1
......
......@@ -13,10 +13,10 @@ import paddlehub as hub
model_name = "bert_chinese_L-12_H-768_A-12"
module = hub.Module(model_name)
inputs, outputs, program = module.context(trainable=True, max_seq_len=20)
feed_keys = ["input_ids", "position_ids", "segment_ids", "input_mask", "pooled_output", "sequence_output"]
feed_keys = ["input_ids", "position_ids", "segment_ids", "input_mask"]
fetch_keys = ["pooled_output", "sequence_output"]
feed_dict = dict(zip(feed_keys, [inputs[x] for x in feed_keys]))
fetch_dict = dict(zip(fetch_keys, [outputs[x]] for x in fetch_keys))
fetch_dict = dict(zip(fetch_keys, [outputs[x] for x in fetch_keys]))
import paddle_serving_client.io as serving_io
serving_io.save_model("bert_seq20_model", "bert_seq20_client", feed_dict, fetch_dict, program)
......
......@@ -9,14 +9,18 @@
- Golang: 1.9.2 and later
- Git:2.17.1 and later
- CMake:3.2.2 and later
- Python:2.7.2 and later
- Python:2.7.2 and later / 3.6 and later
It is recommended to use Docker for compilation. We have prepared the Paddle Serving compilation environment for you:
- CPU: `hub.baidubce.com/paddlepaddle/serving:0.2.0-devel`,dockerfile: [Dockerfile.devel](../tools/Dockerfile.devel)
- GPU: `hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu-devel`,dockerfile: [Dockerfile.gpu.devel](../tools/Dockerfile.gpu.devel)
- CPU: `hub.baidubce.com/paddlepaddle/serving:latest-devel`,dockerfile: [Dockerfile.devel](../tools/Dockerfile.devel)
- GPU: `hub.baidubce.com/paddlepaddle/serving:latest-gpu-devel`,dockerfile: [Dockerfile.gpu.devel](../tools/Dockerfile.gpu.devel)
This document will take Python2 as an example to show how to compile Paddle Serving. If you want to compile with Python 3, just adjust the Python options of cmake.
This document will take Python2 as an example to show how to compile Paddle Serving. If you want to compile with Python3, just adjust the Python options of cmake:
- Set `DPYTHON_INCLUDE_DIR` to `$PYTHONROOT/include/python3.6m/`
- Set `DPYTHON_LIBRARIES` to `$PYTHONROOT/lib64/libpython3.6.so`
- Set `DPYTHON_EXECUTABLE` to `$PYTHONROOT/bin/python3.6`
## Get Code
......@@ -32,6 +36,8 @@ cd Serving && git submodule update --init --recursive
export PYTHONROOT=/usr/
```
In the default centos7 image we provide, the Python path is `/usr/bin/python`. If you want to use our centos6 image, you need to set it to `export PYTHONROOT=/usr/local/python2.7/`.
## Compile Server
### Integrated CPU version paddle inference library
......@@ -54,6 +60,8 @@ make -j10
execute `make install` to put targets under directory `./output`
**Attention:** After the compilation is successful, you need to set the path of `SERVING_BIN`. See [Note](https://github.com/PaddlePaddle/Serving/blob/develop/doc/COMPILE.md#Note) for details.
## Compile Client
``` shell
......
......@@ -9,14 +9,18 @@
- Golang: 1.9.2及以上
- Git:2.17.1及以上
- CMake:3.2.2及以上
- Python:2.7.2及以上
- Python:2.7.2及以上 / 3.6及以上
推荐使用Docker编译,我们已经为您准备好了Paddle Serving编译环境:
- CPU: `hub.baidubce.com/paddlepaddle/serving:0.2.0-devel`,dockerfile: [Dockerfile.devel](../tools/Dockerfile.devel)
- GPU: `hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu-devel`,dockerfile: [Dockerfile.gpu.devel](../tools/Dockerfile.gpu.devel)
- CPU: `hub.baidubce.com/paddlepaddle/serving:latest-devel`,dockerfile: [Dockerfile.devel](../tools/Dockerfile.devel)
- GPU: `hub.baidubce.com/paddlepaddle/serving:latest-gpu-devel`,dockerfile: [Dockerfile.gpu.devel](../tools/Dockerfile.gpu.devel)
本文档将以Python2为例介绍如何编译Paddle Serving。如果您想用Python3进行编译,只需要调整cmake的Python相关选项即可。
本文档将以Python2为例介绍如何编译Paddle Serving。如果您想用Python3进行编译,只需要调整cmake的Python相关选项即可:
-`DPYTHON_INCLUDE_DIR`设置为`$PYTHONROOT/include/python3.6m/`
-`DPYTHON_LIBRARIES`设置为`$PYTHONROOT/lib64/libpython3.6.so`
-`DPYTHON_EXECUTABLE`设置为`$PYTHONROOT/bin/python3.6`
## 获取代码
......@@ -32,6 +36,8 @@ cd Serving && git submodule update --init --recursive
export PYTHONROOT=/usr/
```
我们提供默认Centos7的Python路径为`/usr/bin/python`,如果您要使用我们的Centos6镜像,需要将其设置为`export PYTHONROOT=/usr/local/python2.7/`
## 编译Server部分
### 集成CPU版本Paddle Inference Library
......@@ -54,6 +60,8 @@ make -j10
执行`make install`可以把目标产出放在`./output`目录下。
**注意:** 编译成功后,需要设置`SERVING_BIN`路径,详见后面的[注意事项](https://github.com/PaddlePaddle/Serving/blob/develop/doc/COMPILE_CN.md#注意事项)
## 编译Client部分
``` shell
......
......@@ -26,7 +26,7 @@ serving_io.save_model("serving_model", "client_conf",
{"words": data}, {"prediction": prediction},
fluid.default_main_program())
```
代码示例中,`{"words": data}``{"prediction": prediction}`分别指定了模型的输入和输出,`"words"``"prediction"`是输和输出变量的别名,设计别名的目的是为了使开发者能够记忆自己训练模型的输入输出对应的字段。`data``prediction`则是Paddle训练过程中的`[Variable](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/fluid_cn/Variable_cn.html#variable)`,通常代表张量([Tensor](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/fluid_cn/Tensor_cn.html#tensor))或变长张量([LodTensor](https://www.paddlepaddle.org.cn/documentation/docs/zh/beginners_guide/basic_concept/lod_tensor.html#lodtensor))。调用保存命令后,会按照用户指定的`"serving_model"``"client_conf"`生成两个目录,内容如下:
代码示例中,`{"words": data}``{"prediction": prediction}`分别指定了模型的输入和输出,`"words"``"prediction"`是输和输出变量的别名,设计别名的目的是为了使开发者能够记忆自己训练模型的输入输出对应的字段。`data``prediction`则是Paddle训练过程中的`[Variable](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/fluid_cn/Variable_cn.html#variable)`,通常代表张量([Tensor](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/fluid_cn/Tensor_cn.html#tensor))或变长张量([LodTensor](https://www.paddlepaddle.org.cn/documentation/docs/zh/beginners_guide/basic_concept/lod_tensor.html#lodtensor))。调用保存命令后,会按照用户指定的`"serving_model"``"client_conf"`生成两个目录,内容如下:
``` shell
.
├── client_conf
......
......@@ -46,7 +46,7 @@ In this example, the production model is uploaded to HDFS in `product_path` fold
### Product model
Run the following Python code products model in `product_path` folder. Every 60 seconds, the package file of Boston house price prediction model `uci_housing.tar.gz` will be generated and uploaded to the path of HDFS `/`. After uploading, the timestamp file `donefile` will be updated and uploaded to the path of HDFS `/`.
Run the following Python code products model in `product_path` folder(You need to modify Hadoop related parameters before running). Every 60 seconds, the package file of Boston house price prediction model `uci_housing.tar.gz` will be generated and uploaded to the path of HDFS `/`. After uploading, the timestamp file `donefile` will be updated and uploaded to the path of HDFS `/`.
```python
import os
......@@ -82,9 +82,14 @@ exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
def push_to_hdfs(local_file_path, remote_path):
hadoop_bin = '/hadoop-3.1.2/bin/hadoop'
os.system('{} fs -put -f {} {}'.format(
hadoop_bin, local_file_path, remote_path))
afs = 'afs://***.***.***.***:***' # User needs to change
uci = '***,***' # User needs to change
hadoop_bin = '/path/to/haddop/bin' # User needs to change
prefix = '{} fs -Dfs.default.name={} -Dhadoop.job.ugi={}'.format(hadoop_bin, afs, uci)
os.system('{} -rmr {}/{}'.format(
prefix, remote_path, local_file_path))
os.system('{} -put {} {}'.format(
prefix, local_file_path, remote_path))
name = "uci_housing"
for pass_id in range(30):
......
......@@ -46,7 +46,7 @@ Paddle Serving提供了一个自动监控脚本,远端地址更新模型后会
### 生产模型
`product_path`下运行下面的Python代码生产模型,每隔 60 秒会产出 Boston 房价预测模型的打包文件`uci_housing.tar.gz`并上传至hdfs的`/`路径下,上传完毕后更新时间戳文件`donefile`并上传至hdfs的`/`路径下。
`product_path`下运行下面的Python代码生产模型(运行前需要修改hadoop相关的参数),每隔 60 秒会产出 Boston 房价预测模型的打包文件`uci_housing.tar.gz`并上传至hdfs的`/`路径下,上传完毕后更新时间戳文件`donefile`并上传至hdfs的`/`路径下。
```python
import os
......@@ -82,9 +82,14 @@ exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
def push_to_hdfs(local_file_path, remote_path):
hadoop_bin = '/hadoop-3.1.2/bin/hadoop'
os.system('{} fs -put -f {} {}'.format(
hadoop_bin, local_file_path, remote_path))
afs = 'afs://***.***.***.***:***' # User needs to change
uci = '***,***' # User needs to change
hadoop_bin = '/path/to/haddop/bin' # User needs to change
prefix = '{} fs -Dfs.default.name={} -Dhadoop.job.ugi={}'.format(hadoop_bin, afs, uci)
os.system('{} -rmr {}/{}'.format(
prefix, remote_path, local_file_path))
os.system('{} -put {} {}'.format(
prefix, local_file_path, remote_path))
name = "uci_housing"
for pass_id in range(30):
......
# Latest Wheel Packages
## CPU server
### Python 3
```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server-0.3.0-py3-none-any.whl
```
### Python 2
```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server-0.3.0-py2-none-any.whl
```
## GPU server
### Python 3
```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.3.0-py3-none-any.whl
```
### Python 2
```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_server_gpu-0.3.0-py2-none-any.whl
```
## Client
### Python 3.7
```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_client-0.3.0-cp37-none-manylinux1_x86_64.whl
```
### Python 3.6
```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_client-0.3.0-cp36-none-manylinux1_x86_64.whl
```
### Python 2.7
```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_client-0.3.0-cp27-none-manylinux1_x86_64.whl
```
## App
### Python 3
```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_app-0.1.0-py3-none-any.whl
```
### Python 2
```
https://paddle-serving.bj.bcebos.com/whl/paddle_serving_app-0.1.0-py2-none-any.whl
```
# Performance Optimization
([简体中文](./PERFORMANCE_OPTIM_CN.md)|English)
Due to different model structures, different prediction services consume different computing resources when performing predictions. For online prediction services, models that require less computing resources will have a higher proportion of communication time cost, which is called communication-intensive service. Models that require more computing resources have a higher time cost for inference calculations, which is called computation-intensive services.
For a prediction service, the easiest way to determine the type of service is to look at the time ratio. Paddle Serving provides [Timeline tool](../python/examples/util/README_CN.md), which can intuitively display the time spent in each stage of the prediction service.
For communication-intensive prediction services, requests can be aggregated, and within a limit that can tolerate delay, multiple prediction requests can be combined into a batch for prediction.
For computation-intensive prediction services, you can use GPU prediction services instead of CPU prediction services, or increase the number of graphics cards for GPU prediction services.
Under the same conditions, the communication time of the HTTP prediction service provided by Paddle Serving is longer than that of the RPC prediction service, so for communication-intensive services, please give priority to using RPC communication.
Parameters for performance optimization:
| Parameters | Type | Default | Description |
| ---------- | ---- | ------- | ------------------------------------------------------------ |
| mem_optim | bool | False | Enable memory / graphic memory optimization |
| ir_optim | bool | Fasle | Enable analysis and optimization of calculation graph,including OP fusion, etc |
# 性能优化
由于模型结构的不同,在执行预测时不同的预测对计算资源的消耗也不相同,对于在线的预测服务来说,对计算资源要求较少的模型,通信的时间成本占比就会较高,称为通信密集型服务,对计算资源要求较多的模型,推理计算的时间成本较高,称为计算密集型服务。对于这两种服务类型,可以根据实际需求采取不同的方式进行优化
(简体中文|[English](./PERFORMANCE_OPTIM.md))
由于模型结构的不同,在执行预测时不同的预测服务对计算资源的消耗也不相同。对于在线的预测服务来说,对计算资源要求较少的模型,通信的时间成本占比就会较高,称为通信密集型服务,对计算资源要求较多的模型,推理计算的时间成本较高,称为计算密集型服务。对于这两种服务类型,可以根据实际需求采取不同的方式进行优化
对于一个预测服务来说,想要判断属于哪种类型,最简单的方法就是看时间占比,Paddle Serving提供了[Timeline工具](../python/examples/util/README_CN.md),可以直观的展现预测服务中各阶段的耗时。
......@@ -10,4 +12,9 @@
在相同条件下,Paddle Serving提供的HTTP预测服务的通信时间是大于RPC预测服务的,因此对于通信密集型的服务请优先考虑使用RPC的通信方式。
对于模型较大,预测服务内存或显存占用较多的情况,可以通过将--mem_optim选项设置为True来开启内存/显存优化。
性能优化相关参数:
| 参数 | 类型 | 默认值 | 含义 |
| --------- | ---- | ------ | -------------------------------- |
| mem_optim | bool | False | 开启内存/显存优化 |
| ir_optim | bool | Fasle | 开启计算图分析优化,包括OP融合等 |
......@@ -17,7 +17,7 @@ You can get images in two ways:
1. Pull image directly
```bash
docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0
docker pull hub.baidubce.com/paddlepaddle/serving:latest
```
2. Building image based on dockerfile
......@@ -25,13 +25,13 @@ You can get images in two ways:
Create a new folder and copy [Dockerfile](../tools/Dockerfile) to this folder, and run the following command:
```bash
docker build -t hub.baidubce.com/paddlepaddle/serving:0.2.0 .
docker build -t hub.baidubce.com/paddlepaddle/serving:latest .
```
### Create container
```bash
docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0
docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest
docker exec -it test bash
```
......@@ -53,12 +53,6 @@ pip install paddle-serving-server -i https://pypi.tuna.tsinghua.edu.cn/simple
### Test example
Before running the GPU version of the Server side code, you need to set the `CUDA_VISIBLE_DEVICES` environment variable to specify which GPUs the prediction service uses. The following example specifies two GPUs with indexes 0 and 1:
```bash
export CUDA_VISIBLE_DEVICES=0,1
```
Get the trained Boston house price prediction model by the following command:
```bash
......@@ -71,13 +65,13 @@ tar -xzf uci_housing.tar.gz
Running on the Server side (inside the container):
```bash
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci &>std.log 2>err.log &
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci >std.log 2>err.log &
```
Running on the Client side (inside or outside the container):
```bash
curl -H "Content-Type:application/json" -X POST -d '{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
curl -H "Content-Type:application/json" -X POST -d '{"feed":{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}, "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
```
- Test RPC service
......@@ -85,7 +79,7 @@ tar -xzf uci_housing.tar.gz
Running on the Server side (inside the container):
```bash
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 &>std.log 2>err.log &
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 >std.log 2>err.log &
```
Running following Python code on the Client side (inside or outside the container, The `paddle-serving-client` package needs to be installed):
......@@ -115,7 +109,7 @@ You can also get images in two ways:
1. Pull image directly
```bash
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-gpu
```
2. Building image based on dockerfile
......@@ -123,13 +117,13 @@ You can also get images in two ways:
Create a new folder and copy [Dockerfile.gpu](../tools/Dockerfile.gpu) to this folder, and run the following command:
```bash
nvidia-docker build -t hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu .
nvidia-docker build -t hub.baidubce.com/paddlepaddle/serving:latest-gpu .
```
### Create container
```bash
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-gpu
nvidia-docker exec -it test bash
```
......@@ -176,7 +170,7 @@ tar -xzf uci_housing.tar.gz
Running on the Client side (inside or outside the container):
```bash
curl -H "Content-Type:application/json" -X POST -d '{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
curl -H "Content-Type:application/json" -X POST -d '{"feed":{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}, "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
```
- Test RPC service
......
......@@ -17,7 +17,7 @@ Docker(GPU版本需要在GPU机器上安装nvidia-docker)
1. 直接拉取镜像
```bash
docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0
docker pull hub.baidubce.com/paddlepaddle/serving:latest
```
2. 基于Dockerfile构建镜像
......@@ -25,13 +25,13 @@ Docker(GPU版本需要在GPU机器上安装nvidia-docker)
建立新目录,复制[Dockerfile](../tools/Dockerfile)内容到该目录下Dockerfile文件。执行
```bash
docker build -t hub.baidubce.com/paddlepaddle/serving:0.2.0 .
docker build -t hub.baidubce.com/paddlepaddle/serving:latest .
```
### 创建容器并进入
```bash
docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0
docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest
docker exec -it test bash
```
......@@ -65,13 +65,13 @@ tar -xzf uci_housing.tar.gz
在Server端(容器内)运行:
```bash
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci &>std.log 2>err.log &
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci >std.log 2>err.log &
```
在Client端(容器内或容器外)运行:
```bash
curl -H "Content-Type:application/json" -X POST -d '{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
curl -H "Content-Type:application/json" -X POST -d '{"feed":{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}, "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
```
- 测试RPC服务
......@@ -79,7 +79,7 @@ tar -xzf uci_housing.tar.gz
在Server端(容器内)运行:
```bash
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 &>std.log 2>err.log &
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 >std.log 2>err.log &
```
在Client端(容器内或容器外,需要安装`paddle-serving-client`包)运行下面Python代码:
......@@ -107,7 +107,7 @@ GPU版本与CPU版本基本一致,只有部分接口命名的差别(GPU版
1. 直接拉取镜像
```bash
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
nvidia-docker pull hub.baidubce.com/paddlepaddle/serving:latest-gpu
```
2. 基于Dockerfile构建镜像
......@@ -115,13 +115,13 @@ GPU版本与CPU版本基本一致,只有部分接口命名的差别(GPU版
建立新目录,复制[Dockerfile.gpu](../tools/Dockerfile.gpu)内容到该目录下Dockerfile文件。执行
```bash
nvidia-docker build -t hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu .
nvidia-docker build -t hub.baidubce.com/paddlepaddle/serving:latest-gpu .
```
### 创建容器并进入
```bash
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:0.2.0-gpu
nvidia-docker run -p 9292:9292 --name test -dit hub.baidubce.com/paddlepaddle/serving:latest-gpu
nvidia-docker exec -it test bash
```
......@@ -168,7 +168,7 @@ tar -xzf uci_housing.tar.gz
在Client端(容器内或容器外)运行:
```bash
curl -H "Content-Type:application/json" -X POST -d '{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
curl -H "Content-Type:application/json" -X POST -d '{"feed":{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}, "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
```
- 测试RPC服务
......
## How to save a servable model of Paddle Serving?
# How to save a servable model of Paddle Serving?
([简体中文](./SAVE_CN.md)|English)
- Currently, paddle serving provides a save_model interface for users to access, the interface is similar with `save_inference_model` of Paddle.
## Save from training or prediction script
Currently, paddle serving provides a save_model interface for users to access, the interface is similar with `save_inference_model` of Paddle.
``` python
import paddle_serving_client.io as serving_io
serving_io.save_model("imdb_model", "imdb_client_conf",
{"words": data}, {"prediction": prediction},
fluid.default_main_program())
```
`imdb_model` is the server side model with serving configurations. `imdb_client_conf` is the client rpc configurations. Serving has a
dictionary for `Feed` and `Fetch` variables for client to assign. In the example, `{"words": data}` is the feed dict that specify the input of saved inference model. `{"prediction": prediction}` is the fetch dic that specify the output of saved inference model. An alias name can be defined for feed and fetch variables. An example of how to use alias name
`imdb_model` is the server side model with serving configurations. `imdb_client_conf` is the client rpc configurations.
Serving has a dictionary for `Feed` and `Fetch` variables for client to assign. In the example, `{"words": data}` is the feed dict that specify the input of saved inference model. `{"prediction": prediction}` is the fetch dic that specify the output of saved inference model. An alias name can be defined for feed and fetch variables. An example of how to use alias name
is as follows:
``` python
from paddle_serving_client import Client
......@@ -29,3 +31,19 @@ for line in sys.stdin:
fetch_map = client.predict(feed=feed, fetch=fetch)
print("{} {}".format(fetch_map["prediction"][1], label[0]))
```
## Export from saved model files
If you have saved model files using Paddle's `save_inference_model` API, you can use Paddle Serving's` inference_model_to_serving` API to convert it into a model file that can be used for Paddle Serving.
```python
import paddle_serving_client.io as serving_io
serving_io.inference_model_to_serving(dirname, serving_server="serving_server", serving_client="serving_client", model_filename=None, params_filename=None )
```
dirname (str) - Path of saved model files. Program file and parameter files are saved in this directory.
serving_server (str, optional) - The path of model files and configuration files for server. Default: "serving_server".
serving_client (str, optional) - The path of configuration files for client. Default: "serving_client".
model_filename (str, optional) - The name of file to load the inference program. If it is None, the default filename `__model__` will be used. Default: None.
paras_filename (str, optional) - The name of file to load all parameters. It is only used for the case that all parameters were saved in a single binary file. If parameters were saved in separate files, set it as None. Default: None.
## 怎样保存用于Paddle Serving的模型?
# 怎样保存用于Paddle Serving的模型?
(简体中文|[English](./SAVE.md))
- 目前,Paddle Serving提供了一个save_model接口供用户访问,该接口与Paddle的`save_inference_model`类似。
## 从训练或预测脚本中保存
目前,Paddle Serving提供了一个save_model接口供用户访问,该接口与Paddle的`save_inference_model`类似。
``` python
import paddle_serving_client.io as serving_io
......@@ -10,7 +11,9 @@ serving_io.save_model("imdb_model", "imdb_client_conf",
{"words": data}, {"prediction": prediction},
fluid.default_main_program())
```
imdb_model是具有服务配置的服务器端模型。 imdb_client_conf是客户端rpc配置。 Serving有一个 提供给用户存放Feed和Fetch变量信息的字典。 在示例中,`{words”:data}` 是用于指定已保存推理模型输入的提要字典。`{"prediction":projection}`是指定保存的推理模型输出的字典。可以为feed和fetch变量定义一个别名。 如何使用别名的例子 示例如下:
imdb_model是具有服务配置的服务器端模型。 imdb_client_conf是客户端rpc配置。
Serving有一个提供给用户存放Feed和Fetch变量信息的字典。 在示例中,`{"words":data}` 是用于指定已保存推理模型输入的提要字典。`{"prediction":projection}`是指定保存的推理模型输出的字典。可以为feed和fetch变量定义一个别名。 如何使用别名的例子 示例如下:
``` python
from paddle_serving_client import Client
......@@ -29,3 +32,19 @@ for line in sys.stdin:
fetch_map = client.predict(feed=feed, fetch=fetch)
print("{} {}".format(fetch_map["prediction"][1], label[0]))
```
## 从已保存的模型文件中导出
如果已使用Paddle 的`save_inference_model`接口保存出预测要使用的模型,则可以通过Paddle Serving的`inference_model_to_serving`接口转换成可用于Paddle Serving的模型文件。
```python
import paddle_serving_client.io as serving_io
serving_io.inference_model_to_serving(dirname, serving_server="serving_server", serving_client="serving_client", model_filename=None, params_filename=None)
```
dirname (str) – 需要转换的模型文件存储路径,Program结构文件和参数文件均保存在此目录。
serving_server (str, 可选) - 转换后的模型文件和配置文件的存储路径。默认值为serving_server。
serving_client (str, 可选) - 转换后的客户端配置文件存储路径。默认值为serving_client。
model_filename (str,可选) – 存储需要转换的模型Inference Program结构的文件名称。如果设置为None,则使用 `__model__` 作为默认的文件名。默认值为None。
params_filename (str,可选) – 存储需要转换的模型所有参数的文件名称。当且仅当所有模型参数被保存在一个单独的二进制文件中,它才需要被指定。如果模型参数是存储在各自分离的文件中,设置它的值为None。默认值为None。
......@@ -350,12 +350,12 @@ In the above command, the first parameter is the saved server-side model and con
After starting the HTTP prediction service, you can make prediction with a single command:
```
curl -H "Content-Type: application/json" -X POST -d '{"words": "i am very sad | 0", "fetch": ["prediction"]}' http://127.0.0.1:9292/imdb/prediction
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "i am very sad | 0"}], "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction
```
When the inference process is normal, the prediction probability is returned, as shown below.
```
{"prediction": [0.5592559576034546,0.44074398279190063]}
{"result":{"prediction":[[0.4389057457447052,0.561094343662262]]}}
```
**Note**: The effect of each model training may be slightly different, and the inferred probability value using the trained model may not be consistent with the example.
......@@ -353,12 +353,12 @@ python text_classify_service.py imdb_cnn_model/ workdir/ 9292 imdb.vocab
启动完HTTP预测服务,即可通过一行命令进行预测:
```
curl -H "Content-Type:application/json" -X POST -d '{"words": "i am very sad | 0", "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "i am very sad | 0"}], "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction
```
预测流程正常时,会返回预测概率,示例如下。
```
{"prediction":[0.5592559576034546,0.44074398279190063]}
{"result":{"prediction":[[0.4389057457447052,0.561094343662262]]}}
```
**注意**:每次模型训练的效果可能略有不同,使用训练出的模型预测概率数值可能与示例不一致。
# 使用uwsgi启动HTTP预测服务
# Deploy HTTP service with uWSGI
在提供的fit_a_line示例中,启动HTTP预测服务后会看到有以下信息:
([简体中文](./UWSGI_DEPLOY_CN.md)|English)
In fit_a_line example, after starting the HTTP prediction service, you will see the following information:
```shell
web service address:
......@@ -13,46 +15,31 @@ http://10.127.3.150:9393/uci/prediction
* Running on http://0.0.0.0:9393/ (Press CTRL+C to quit)
```
这里会提示启动的HTTP服务是开发模式,并不能用于生产环境的部署。Flask启动的服务环境不够稳定也无法承受大量请求的并发,实际部署过程中配合需要WSGI(Web Server Gateway Interface)使用。
Here you will be prompted that the HTTP service started is in development mode and cannot be used for production deployment.
The prediction service started by Flask is not stable enough to withstand the concurrency of a large number of requests. In the actual deployment process, WSGI (Web Server Gateway Interface) is used.
下面我们展示一下如何使用[uWSGI](https://github.com/unbit/uwsgi)模块来部署HTTP预测服务用于生产环境。
Next, we will show how to use the [uWSGI](https://github.com/unbit/uwsgi) module to deploy HTTP prediction services for production environments.
编写HTTP服务脚本
```python
#uwsgi_service.py
from paddle_serving_server.web_service import WebService
from flask import Flask, request
#配置预测服务
#Define prediction service
uci_service = WebService(name = "uci")
uci_service.load_model_config("./uci_housing_model")
uci_service.prepare_server(workdir="./workdir", port=int(9500), device="cpu")
uci_service.run_server()
#配置flask服务
app_instance = Flask(__name__)
@app_instance.before_first_request
def init():
global uci_service
uci_service._launch_web_service()
service_name = "/" + uci_service.name + "/prediction"
@app_instance.route(service_name, methods=["POST"])
def run():
return uci_service.get_prediction(request)
#run方法用于直接调试中直接启动服务
if __name__ == "__main__":
app_instance.run()
uci_service.run_rpc_service()
#Get flask application
app_instance = uci_service.get_app_instance()
```
使用uwsgi启动HTTP服务
Start service with uWSGI
```bash
uwsgi --http :9000 --wsgi-file uwsgi_service.py --callable app_instance --processes 4
uwsgi --http :9393 --module uwsgi_service:app_instance
```
使用--processes参数可以指定服务的进程数,请注意目前Serving HTTP 服务暂时不支持多线程的方式使用。
Use the --processes parameter to specify the number of service processes.
更多uWSGI的信息请参考[uWSGI使用文档](https://uwsgi-docs.readthedocs.io/en/latest/)
For more information about uWSGI, please refer to [uWSGI documentation](https://uwsgi-docs.readthedocs.io/en/latest/)
# 使用uwsgi启动HTTP预测服务
(简体中文|[English](./UWSGI_DEPLOY.md))
在提供的fit_a_line示例中,启动HTTP预测服务后会看到有以下信息:
```shell
web service address:
http://10.127.3.150:9393/uci/prediction
* Serving Flask app "serve" (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
* Running on http://0.0.0.0:9393/ (Press CTRL+C to quit)
```
这里会提示启动的HTTP服务是开发模式,并不能用于生产环境的部署。Flask启动的服务环境不够稳定也无法承受大量请求的并发,实际部署过程中配合需要WSGI(Web Server Gateway Interface)使用。
下面我们展示一下如何使用[uWSGI](https://github.com/unbit/uwsgi)模块来部署HTTP预测服务用于生产环境。
编写HTTP服务脚本
```python
#uwsgi_service.py
from paddle_serving_server.web_service import WebService
#配置预测服务
uci_service = WebService(name = "uci")
uci_service.load_model_config("./uci_housing_model")
uci_service.prepare_server(workdir="./workdir", port=int(9500), device="cpu")
uci_service.run_rpc_service()
#获取flask服务
app_instance = uci_service.get_app_instance()
```
使用uwsgi启动HTTP服务
```bash
uwsgi --http :9393 --module uwsgi_service:app_instance
```
使用--processes参数可以指定服务的进程数。
更多uWSGI的信息请参考[uWSGI使用文档](https://uwsgi-docs.readthedocs.io/en/latest/)
......@@ -194,6 +194,12 @@ class FluidCpuAnalysisDirCore : public FluidFamilyCore {
analysis_config.EnableMemoryOptim();
}
if (params.enable_ir_optimization()) {
analysis_config.SwitchIrOptim(true);
} else {
analysis_config.SwitchIrOptim(false);
}
AutoLock lock(GlobalPaddleCreateMutex::instance());
_core =
paddle::CreatePaddlePredictor<paddle::AnalysisConfig>(analysis_config);
......
......@@ -198,6 +198,12 @@ class FluidGpuAnalysisDirCore : public FluidFamilyCore {
analysis_config.EnableMemoryOptim();
}
if (params.enable_ir_optimization()) {
analysis_config.SwitchIrOptim(true);
} else {
analysis_config.SwitchIrOptim(false);
}
AutoLock lock(GlobalPaddleCreateMutex::instance());
_core =
paddle::CreatePaddlePredictor<paddle::AnalysisConfig>(analysis_config);
......
......@@ -19,6 +19,8 @@ endif()
if (CLIENT)
configure_file(${CMAKE_CURRENT_SOURCE_DIR}/setup.py.client.in
${CMAKE_CURRENT_BINARY_DIR}/setup.py)
configure_file(${CMAKE_CURRENT_SOURCE_DIR}/../tools/python_tag.py
${CMAKE_CURRENT_BINARY_DIR}/python_tag.py)
endif()
if (APP)
......@@ -43,7 +45,8 @@ if (APP)
add_custom_command(
OUTPUT ${PADDLE_SERVING_BINARY_DIR}/.timestamp
COMMAND cp -r ${CMAKE_CURRENT_SOURCE_DIR}/paddle_serving_app/ ${PADDLE_SERVING_BINARY_DIR}/python/
COMMAND env ${py_env} ${PYTHON_EXECUTABLE} setup.py bdist_wheel)
COMMAND env ${py_env} ${PYTHON_EXECUTABLE} setup.py bdist_wheel
DEPENDS ${SERVING_APP_CORE} general_model_config_py_proto ${PY_FILES})
add_custom_target(paddle_python ALL DEPENDS ${PADDLE_SERVING_BINARY_DIR}/.timestamp)
endif()
......@@ -52,6 +55,7 @@ add_custom_command(
OUTPUT ${PADDLE_SERVING_BINARY_DIR}/.timestamp
COMMAND cp -r ${CMAKE_CURRENT_SOURCE_DIR}/paddle_serving_client/ ${PADDLE_SERVING_BINARY_DIR}/python/
COMMAND ${CMAKE_COMMAND} -E copy ${SERVING_CLIENT_CORE} ${PADDLE_SERVING_BINARY_DIR}/python/paddle_serving_client/serving_client.so
COMMAND env ${py_env} ${PYTHON_EXECUTABLE} python_tag.py
COMMAND env ${py_env} ${PYTHON_EXECUTABLE} setup.py bdist_wheel
DEPENDS ${SERVING_CLIENT_CORE} sdk_configure_py_proto ${PY_FILES})
add_custom_target(paddle_python ALL DEPENDS serving_client ${PADDLE_SERVING_BINARY_DIR}/.timestamp)
......
......@@ -71,28 +71,3 @@ set environmental variable to specify which gpus are used, the command above mea
```
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "hello"}], "fetch":["pooled_output"]}' http://127.0.0.1:9292/bert/prediction
```
### Benchmark
Model:bert_chinese_L-12_H-768_A-12
GPU:GPU V100 * 1
CUDA/cudnn Version:CUDA 9.2,cudnn 7.1.4
In the test, 10 thousand samples in the sample data are copied into 100 thousand samples. Each client thread sends a sample of the number of threads. The batch size is 1, the max_seq_len is 20(not 128 as described above), and the time unit is seconds.
When the number of client threads is 4, the prediction speed can reach 432 samples per second.
Because a single GPU can only perform serial calculations internally, increasing the number of client threads can only reduce the idle time of the GPU. Therefore, after the number of threads reaches 4, the increase in the number of threads does not improve the prediction speed.
| client thread num | prepro | client infer | op0 | op1 | op2 | postpro | total |
| ------------------ | ------ | ------------ | ----- | ------ | ---- | ------- | ------ |
| 1 | 3.05 | 290.54 | 0.37 | 239.15 | 6.43 | 0.71 | 365.63 |
| 4 | 0.85 | 213.66 | 0.091 | 200.39 | 1.62 | 0.2 | 231.45 |
| 8 | 0.42 | 223.12 | 0.043 | 110.99 | 0.8 | 0.098 | 232.05 |
| 12 | 0.32 | 225.26 | 0.029 | 73.87 | 0.53 | 0.078 | 231.45 |
| 16 | 0.23 | 227.26 | 0.022 | 55.61 | 0.4 | 0.056 | 231.9 |
the following is the client thread num - latency bar chart:
![bert benchmark](../../../doc/bert-benchmark-batch-size-1.png)
......@@ -67,27 +67,3 @@ head data-c.txt | python bert_client.py --model bert_seq128_client/serving_clien
```
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "hello"}], "fetch":["pooled_output"]}' http://127.0.0.1:9292/bert/prediction
```
### Benchmark
模型:bert_chinese_L-12_H-768_A-12
设备:GPU V100 * 1
环境:CUDA 9.2,cudnn 7.1.4
测试中将样例数据中的1W个样本复制为10W个样本,每个client线程发送线程数分之一个样本,batch size为1,max_seq_len为20(而不是上面的128),时间单位为秒.
在client线程数为4时,预测速度可以达到432样本每秒。
由于单张GPU内部只能串行计算,client线程增多只能减少GPU的空闲时间,因此在线程数达到4之后,线程数增多对预测速度没有提升。
| client thread num | prepro | client infer | op0 | op1 | op2 | postpro | total |
| ------------------ | ------ | ------------ | ----- | ------ | ---- | ------- | ------ |
| 1 | 3.05 | 290.54 | 0.37 | 239.15 | 6.43 | 0.71 | 365.63 |
| 4 | 0.85 | 213.66 | 0.091 | 200.39 | 1.62 | 0.2 | 231.45 |
| 8 | 0.42 | 223.12 | 0.043 | 110.99 | 0.8 | 0.098 | 232.05 |
| 12 | 0.32 | 225.26 | 0.029 | 73.87 | 0.53 | 0.078 | 231.45 |
| 16 | 0.23 | 227.26 | 0.022 | 55.61 | 0.4 | 0.056 | 231.9 |
总耗时变化规律如下:
![bert benchmark](../../../doc/bert-benchmark-batch-size-1.png)
......@@ -22,11 +22,8 @@ import time
from paddle_serving_client import Client
from paddle_serving_client.utils import MultiThreadRunner
from paddle_serving_client.utils import benchmark_args, show_latency
from batching import pad_batch_data
import tokenization
import requests
import json
from bert_reader import BertReader
from paddle_serving_app.reader import ChineseBertReader
args = benchmark_args()
......@@ -45,8 +42,7 @@ def single_func(idx, resource):
latency_list = []
if args.request == "rpc":
reader = BertReader(vocab_file="vocab.txt", max_seq_len=20)
reader = ChineseBertReader({"max_seq_len": 128})
fetch = ["pooled_output"]
client = Client()
client.load_client_config(args.model)
......@@ -78,7 +74,10 @@ def single_func(idx, resource):
elif args.request == "http":
raise ("not implemented")
end = time.time()
return [[end - start], latency_list]
if latency_flags:
return [[end - start], latency_list]
else:
return [[end - start]]
if __name__ == '__main__':
......@@ -86,7 +85,7 @@ if __name__ == '__main__':
endpoint_list = [
"127.0.0.1:9292", "127.0.0.1:9293", "127.0.0.1:9294", "127.0.0.1:9295"
]
turns = 1000
turns = 10
start = time.time()
result = multi_thread_runner.run(
single_func, args.thread, {"endpoint": endpoint_list,
......
......@@ -3,25 +3,25 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3
export FLAGS_profile_server=1
export FLAGS_profile_client=1
export FLAGS_serving_latency=1
python -m paddle_serving_server_gpu.serve --model $1 --port 9292 --thread 4 --gpu_ids 0,1,2,3 2> elog > stdlog &
python3 -m paddle_serving_server_gpu.serve --model $1 --port 9292 --thread 4 --gpu_ids 0,1,2,3 --mem_optim False --ir_optim True 2> elog > stdlog &
sleep 5
#warm up
$PYTHONROOT/bin/python benchmark.py --thread 8 --batch_size 1 --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1
python3 benchmark.py --thread 8 --batch_size 1 --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1
for thread_num in 4 8 16
do
for batch_size in 1 4 16 64 256
do
$PYTHONROOT/bin/python benchmark.py --thread $thread_num --batch_size $batch_size --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1
python3 benchmark.py --thread $thread_num --batch_size $batch_size --model $2/serving_client_conf.prototxt --request rpc > profile 2>&1
echo "model name :" $1
echo "thread num :" $thread_num
echo "batch size :" $batch_size
echo "=================Done===================="
echo "model name :$1" >> profile_log_$1
echo "batch size :$batch_size" >> profile_log_$1
$PYTHONROOT/bin/python ../util/show_profile.py profile $thread_num >> profile_log_$1
python3 ../util/show_profile.py profile $thread_num >> profile_log_$1
tail -n 8 profile >> profile_log_$1
echo "" >> profile_log_$1
done
......
......@@ -25,7 +25,7 @@ from paddlehub.common.logger import logger
import socket
from paddle_serving_client import Client
from paddle_serving_client.utils import benchmark_args
from paddle_serving_app import ChineseBertReader
from paddle_serving_app.reader import ChineseBertReader
args = benchmark_args()
......
......@@ -14,19 +14,22 @@
# limitations under the License.
# pylint: disable=doc-string-missing
from paddle_serving_server_gpu.web_service import WebService
from bert_reader import BertReader
from paddle_serving_app.reader import ChineseBertReader
import sys
import os
class BertService(WebService):
def load(self):
self.reader = BertReader(vocab_file="vocab.txt", max_seq_len=128)
self.reader = ChineseBertReader({
"vocab_file": "vocab.txt",
"max_seq_len": 128
})
def preprocess(self, feed={}, fetch=[]):
feed_res = [{
"words": self.reader.process(ins["words"].encode("utf-8"))
} for ins in feed]
def preprocess(self, feed=[], fetch=[]):
feed_res = [
self.reader.process(ins["words"].encode("utf-8")) for ins in feed
]
return feed_res, fetch
......@@ -37,5 +40,5 @@ gpu_ids = os.environ["CUDA_VISIBLE_DEVICES"]
bert_service.set_gpus(gpu_ids)
bert_service.prepare_server(
workdir="workdir", port=int(sys.argv[2]), device="gpu")
bert_service.run_server()
bert_service.run_flask()
bert_service.run_rpc_service()
bert_service.run_web_service()
......@@ -12,37 +12,32 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import requests
import base64
import json
import time
import os
from paddle_serving_client import Client
from paddle_serving_app.reader import ChineseBertReader
import sys
py_version = sys.version_info[0]
client = Client()
client.load_client_config("./bert_seq32_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])
reader = ChineseBertReader({"max_seq_len": 32})
fetch = ["sequence_10", "sequence_12", "pooled_output"]
expected_shape = {
"sequence_10": (4, 32, 768),
"sequence_12": (4, 32, 768),
"pooled_output": (4, 768)
}
batch_size = 4
feed_batch = []
def predict(image_path, server):
if py_version == 2:
image = base64.b64encode(open(image_path).read())
for line in sys.stdin:
feed = reader.process(line)
if len(feed_batch) < batch_size:
feed_batch.append(feed)
else:
image = base64.b64encode(open(image_path, "rb").read()).decode("utf-8")
req = json.dumps({"feed": [{"image": image}], "fetch": ["score"]})
r = requests.post(
server, data=req, headers={"Content-Type": "application/json"})
try:
print(r.json()["result"]["score"])
except ValueError:
print(r.text)
return r
if __name__ == "__main__":
server = "http://127.0.0.1:9393/image/prediction"
image_list = os.listdir("./image_data/n01440764/")
start = time.time()
for img in image_list:
image_file = "./image_data/n01440764/" + img
res = predict(image_file, server)
end = time.time()
print(end - start)
fetch_map = client.predict(feed=feed_batch, fetch=fetch)
feed_batch = []
for var_name in fetch:
if fetch_map[var_name].shape != expected_shape[var_name]:
print("fetch var {} shape error.".format(var_name))
sys.exit(1)
# Cascade RCNN model on Paddle Serving
([简体中文](./README_CN.md)|English)
### Get The Cascade RCNN Model
```
sh get_data.sh
```
If you want to have more detection models, please refer to [Paddle Detection Model Zoo](https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.2/docs/MODEL_ZOO_cn.md)
### Start the service
```
python -m paddle_serving_server_gpu.serve --model serving_server --port 9292 --gpu_id 0
```
### Perform prediction
```
python test_client.py
```
Image with bounding boxes and json result would be saved in `output` folder.
# 使用Paddle Serving部署Cascade RCNN模型
(简体中文|[English](./README.md))
## 获得Cascade RCNN模型
```
sh get_data.sh
```
如果你想要更多的检测模型,请参考[Paddle检测模型库](https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.2/docs/MODEL_ZOO_cn.md)
### 启动服务
```
python -m paddle_serving_server_gpu.serve --model serving_server --port 9292 --gpu_id 0
```
### 执行预测
```
python test_client.py
```
客户端已经为图片做好了后处理,在`output`文件夹下存放各个框的json格式信息还有后处理结果图片。
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/lac/lac_model_jieba_web.tar.gz
tar -zxvf lac_model_jieba_web.tar.gz
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/cascade_rcnn_r50_fpx_1x_serving.tar.gz
tar xf cascade_rcnn_r50_fpx_1x_serving.tar.gz
......@@ -2,16 +2,6 @@
([简体中文](./README_CN.md)|English)
### Compile Source Code
in the root directory of this git project
```
mkdir build_server
cd build_server
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib64/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DSERVER=ON ..
make -j10
make install -j10
```
### Get Sample Dataset
go to directory `python/examples/criteo_ctr_with_cube`
......@@ -31,7 +21,9 @@ the model will be in ./ctr_server_model_kv and ./ctr_client_config.
### Start Sparse Parameter Indexing Service
```
cp ../../../build_server/output/bin/cube* ./cube/
wget https://paddle-serving.bj.bcebos.com/others/cube_app.tar.gz
tar xf cube_app.tar.gz
mv cube_app/cube* ./cube/
sh cube_prepare.sh &
```
......
## 带稀疏参数索引服务的CTR预测服务
(简体中文|[English](./README.md))
### 编译源代码
在本项目的根目录下,执行
```
mkdir build_server
cd build_server
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ -DPYTHON_LIBRARIES=$PYTHONROOT/lib64/libpython2.7.so -DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python -DSERVER=ON ..
make -j10
make install -j10
```
### 获取样例数据
进入目录 `python/examples/criteo_ctr_with_cube`
```
......@@ -29,7 +19,9 @@ mv models/data ./cube/
### 启动稀疏参数索引服务
```
cp ../../../build_server/output/bin/cube* ./cube/
wget https://paddle-serving.bj.bcebos.com/others/cube_app.tar.gz
tar xf cube_app.tar.gz
mv cube_app/cube* ./cube/
sh cube_prepare.sh &
```
......
# Image Segmentation
## Get Model
```
python -m paddle_serving_app.package --get_model deeplabv3
tar -xzvf deeplabv3.tar.gz
```
## RPC Service
### Start Service
```
python -m paddle_serving_server_gpu.serve --model deeplabv3_server --gpu_ids 0 --port 9494
```
### Client Prediction
```
python deeplabv3_client.py
```
# 图像分割
## 获取模型
```
python -m paddle_serving_app.package --get_model deeplabv3
tar -xzvf deeplabv3.tar.gz
```
## RPC 服务
### 启动服务端
```
python -m paddle_serving_server_gpu.serve --model deeplabv3_server --gpu_ids 0 --port 9494
```
### 客户端预测
```
python deeplabv3_client.py
......@@ -12,30 +12,23 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_server.web_service import WebService
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, File2Image, Resize, Transpose, BGR2RGB, SegPostprocess
import sys
import cv2
import base64
import numpy as np
from paddle_serving_app import ImageReader
client = Client()
client.load_client_config("deeplabv3_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9494"])
class ImageService(WebService):
def preprocess(self, feed={}, fetch=[]):
reader = ImageReader()
feed_batch = []
for ins in feed:
if "image" not in ins:
raise ("feed data error!")
sample = base64.b64decode(ins["image"])
img = reader.process_image(sample)
feed_batch.append({"image": img})
return feed_batch, fetch
preprocess = Sequential(
[File2Image(), Resize(
(512, 512), interpolation=cv2.INTER_LINEAR)])
postprocess = SegPostprocess(2)
image_service = ImageService(name="image")
image_service.load_model_config(sys.argv[1])
image_service.prepare_server(
workdir=sys.argv[2], port=int(sys.argv[3]), device="cpu")
image_service.run_server()
image_service.run_flask()
filename = "N0060.jpg"
im = preprocess(filename)
fetch_map = client.predict(feed={"image": im}, fetch=["output"])
fetch_map["filename"] = filename
postprocess(fetch_map)
......@@ -12,8 +12,8 @@ If you want to have more detection models, please refer to [Paddle Detection Mod
### Start the service
```
tar xf faster_rcnn_model.tar.gz
mv faster_rcnn_model/pddet *.
GLOG_v=2 python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_id 0
mv faster_rcnn_model/pddet* .
GLOG_v=2 python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_ids 0
```
### Perform prediction
......
......@@ -13,7 +13,7 @@ wget https://paddle-serving.bj.bcebos.com/pddet_demo/infer_cfg.yml
```
tar xf faster_rcnn_model.tar.gz
mv faster_rcnn_model/pddet* ./
GLOG_v=2 python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_id 0
GLOG_v=2 python -m paddle_serving_server_gpu.serve --model pddet_serving_model --port 9494 --gpu_ids 0
```
### 执行预测
......
background
person
bicycle
car
motorcycle
airplane
bus
train
truck
boat
traffic light
fire hydrant
stop sign
parking meter
bench
bird
cat
dog
horse
sheep
cow
elephant
bear
zebra
giraffe
backpack
umbrella
handbag
tie
suitcase
frisbee
skis
snowboard
sports ball
kite
baseball bat
baseball glove
skateboard
surfboard
tennis racket
bottle
wine glass
cup
fork
knife
spoon
bowl
banana
apple
sandwich
orange
broccoli
carrot
hot dog
pizza
donut
cake
chair
couch
potted plant
bed
dining table
toilet
tv
laptop
mouse
remote
keyboard
cell phone
microwave
oven
toaster
sink
refrigerator
book
clock
vase
scissors
teddy bear
hair drier
toothbrush
......@@ -13,21 +13,29 @@
# limitations under the License.
from paddle_serving_client import Client
from paddle_serving_app.reader import *
import sys
import os
import time
from paddle_serving_app.reader.pddet import Detection
import numpy as np
py_version = sys.version_info[0]
preprocess = Sequential([
File2Image(), BGR2RGB(), Div(255.0),
Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], False),
Resize(640, 640), Transpose((2, 0, 1))
])
feed_var_names = ['image', 'im_shape', 'im_info']
fetch_var_names = ['multiclass_nms']
pddet = Detection(config_path=sys.argv[2], output_dir="./output")
feed_dict = pddet.preprocess(feed_var_names, sys.argv[3])
postprocess = RCNNPostprocess("label_list.txt", "output")
client = Client()
client.load_client_config(sys.argv[1])
client.connect(['127.0.0.1:9494'])
fetch_map = client.predict(feed=feed_dict, fetch=fetch_var_names)
outs = fetch_map.values()
pddet.postprocess(fetch_map, fetch_var_names)
im = preprocess(sys.argv[3])
fetch_map = client.predict(
feed={
"image": im,
"im_info": np.array(list(im.shape[1:]) + [1.0]),
"im_shape": np.array(list(im.shape[1:]) + [1.0])
},
fetch=["multiclass_nms"])
fetch_map["image"] = sys.argv[3]
postprocess(fetch_map)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_client import Client
from paddle_serving_client.utils import MultiThreadRunner
import paddle
def single_func(idx, resource):
client = Client()
client.load_client_config(
"./uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9293", "127.0.0.1:9292"])
x = [
0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584,
0.6283, 0.4919, 0.1856, 0.0795, -0.0332
]
for i in range(1000):
fetch_map = client.predict(feed={"x": x}, fetch=["price"])
if fetch_map is None:
return [[None]]
return [[0]]
multi_thread_runner = MultiThreadRunner()
thread_num = 4
result = multi_thread_runner.run(single_func, thread_num, {})
if None in result[0]:
exit(1)
......@@ -8,34 +8,42 @@ The example uses the ResNet50_vd model to perform the imagenet 1000 classificati
```
sh get_model.sh
```
### HTTP Infer
### Install preprocess module
```
pip install paddle_serving_app
```
### HTTP Service
launch server side
```
python image_classification_service.py ResNet50_vd_model workdir 9393 #cpu inference service
python resnet50_web_service.py ResNet50_vd_model cpu 9696 #cpu inference service
```
```
python image_classification_service_gpu.py ResNet50_vd_model workdir 9393 #gpu inference service
python resnet50_web_service.py ResNet50_vd_model gpu 9696 #gpu inference service
```
client send inference request
```
python image_http_client.py
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"image": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9696/image/prediction
```
### RPC Infer
### RPC Service
launch server side
```
python -m paddle_serving_server.serve --model ResNet50_vd_model --port 9393 #cpu inference service
python -m paddle_serving_server.serve --model ResNet50_vd_model --port 9696 #cpu inference service
```
```
python -m paddle_serving_server_gpu.serve --model ResNet50_vd_model --port 9393 --gpu_ids 0 #gpu inference service
python -m paddle_serving_server_gpu.serve --model ResNet50_vd_model --port 9696 --gpu_ids 0 #gpu inference service
```
client send inference request
```
python image_rpc_client.py ResNet50_vd_client_config/serving_client_conf.prototxt
python resnet50_rpc_client.py ResNet50_vd_client_config/serving_client_conf.prototxt
```
*the port of server side in this example is 9393, the sample data used by client side is in the folder ./data. These parameter can be modified in practice*
*the port of server side in this example is 9696
......@@ -8,34 +8,42 @@
```
sh get_model.sh
```
### 执行HTTP预测服务
### 安装数据预处理模块
```
pip install paddle_serving_app
```
### HTTP服务
启动server端
```
python image_classification_service.py ResNet50_vd_model workdir 9393 #cpu预测服务
python resnet50_web_service.py ResNet50_vd_model cpu 9696 #cpu预测服务
```
```
python image_classification_service_gpu.py ResNet50_vd_model workdir 9393 #gpu预测服务
python resnet50_web_service.py ResNet50_vd_model gpu 9696 #gpu预测服务
```
client端进行预测
发送HTTP POST请求
```
python image_http_client.py
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"image": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"}], "fetch": ["score"]}' http://127.0.0.1:9696/image/prediction
```
### 执行RPC预测服务
### RPC服务
启动server端
```
python -m paddle_serving_server.serve --model ResNet50_vd_model --port 9393 #cpu预测服务
python -m paddle_serving_server.serve --model ResNet50_vd_model --port 9696 #cpu预测服务
```
```
python -m paddle_serving_server_gpu.serve --model ResNet50_vd_model --port 9393 --gpu_ids 0 #gpu预测服务
python -m paddle_serving_server_gpu.serve --model ResNet50_vd_model --port 9696 --gpu_ids 0 #gpu预测服务
```
client端进行预测
```
python image_rpc_client.py ResNet50_vd_client_config/serving_client_conf.prototxt
python resnet50_rpc_client.py ResNet50_vd_client_config/serving_client_conf.prototxt
```
*server端示例中服务端口为9393端口,client端示例中数据来自./data文件夹,server端地址为本地9393端口,可根据实际情况更改脚本。*
*server端示例中服务端口为9696端口
......@@ -19,15 +19,22 @@ from __future__ import unicode_literals, absolute_import
import os
import sys
import time
import requests
import json
import base64
from paddle_serving_client import Client
from paddle_serving_client.utils import MultiThreadRunner
from paddle_serving_client.utils import benchmark_args
import requests
import json
from image_reader import ImageReader
from paddle_serving_app.reader import Sequential, URL2Image, Resize
from paddle_serving_app.reader import CenterCrop, RGB2BGR, Transpose, Div, Normalize
args = benchmark_args()
seq_preprocess = Sequential([
URL2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
])
def single_func(idx, resource):
file_list = []
......@@ -36,6 +43,10 @@ def single_func(idx, resource):
img_list = []
for i in range(1000):
img_list.append(open("./image_data/n01440764/" + file_list[i]).read())
profile_flags = False
if "FLAGS_profile_client" in os.environ and os.environ[
"FLAGS_profile_client"]:
profile_flags = True
if args.request == "rpc":
reader = ImageReader()
fetch = ["score"]
......@@ -46,16 +57,36 @@ def single_func(idx, resource):
for i in range(1000):
if args.batch_size >= 1:
feed_batch = []
i_start = time.time()
for bi in range(args.batch_size):
img = reader.process_image(img_list[i])
img = img.reshape(-1)
img = seq_preprocess(img_list[i])
feed_batch.append({"image": img})
i_end = time.time()
if profile_flags:
print("PROFILE\tpid:{}\timage_pre_0:{} image_pre_1:{}".
format(os.getpid(),
int(round(i_start * 1000000)),
int(round(i_end * 1000000))))
result = client.predict(feed=feed_batch, fetch=fetch)
else:
print("unsupport batch size {}".format(args.batch_size))
elif args.request == "http":
raise ("no batch predict for http")
py_version = 2
server = "http://" + resource["endpoint"][idx % len(resource[
"endpoint"])] + "/image/prediction"
start = time.time()
for i in range(1000):
if py_version == 2:
image = base64.b64encode(
open("./image_data/n01440764/" + file_list[i]).read())
else:
image = base64.b64encode(open(image_path, "rb").read()).decode(
"utf-8")
req = json.dumps({"feed": [{"image": image}], "fetch": ["score"]})
r = requests.post(
server, data=req, headers={"Content-Type": "application/json"})
end = time.time()
return [[end - start]]
......
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import cv2
import numpy as np
class ImageReader():
def __init__(self):
self.image_mean = [0.485, 0.456, 0.406]
self.image_std = [0.229, 0.224, 0.225]
self.image_shape = [3, 224, 224]
self.resize_short_size = 256
self.interpolation = None
def resize_short(self, img, target_size, interpolation=None):
"""resize image
Args:
img: image data
target_size: resize short target size
interpolation: interpolation mode
Returns:
resized image data
"""
percent = float(target_size) / min(img.shape[0], img.shape[1])
resized_width = int(round(img.shape[1] * percent))
resized_height = int(round(img.shape[0] * percent))
if interpolation:
resized = cv2.resize(
img, (resized_width, resized_height),
interpolation=interpolation)
else:
resized = cv2.resize(img, (resized_width, resized_height))
return resized
def crop_image(self, img, target_size, center):
"""crop image
Args:
img: images data
target_size: crop target size
center: crop mode
Returns:
img: cropped image data
"""
height, width = img.shape[:2]
size = target_size
if center == True:
w_start = (width - size) // 2
h_start = (height - size) // 2
else:
w_start = np.random.randint(0, width - size + 1)
h_start = np.random.randint(0, height - size + 1)
w_end = w_start + size
h_end = h_start + size
img = img[h_start:h_end, w_start:w_end, :]
return img
def process_image(self, sample):
""" process_image """
mean = self.image_mean
std = self.image_std
crop_size = self.image_shape[1]
data = np.fromstring(sample, np.uint8)
img = cv2.imdecode(data, cv2.IMREAD_COLOR)
if img is None:
print("img is None, pass it.")
return None
if crop_size > 0:
target_size = self.resize_short_size
img = self.resize_short(
img, target_size, interpolation=self.interpolation)
img = self.crop_image(img, target_size=crop_size, center=True)
img = img[:, :, ::-1]
img = img.astype('float32').transpose((2, 0, 1)) / 255
img_mean = np.array(mean).reshape((3, 1, 1))
img_std = np.array(std).reshape((3, 1, 1))
img -= img_mean
img /= img_std
return img
此差异已折叠。
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, URL2Image, Resize
from paddle_serving_app.reader import CenterCrop, RGB2BGR, Transpose, Div, Normalize
import time
client = Client()
client.load_client_config(sys.argv[1])
client.connect(["127.0.0.1:9696"])
label_dict = {}
label_idx = 0
with open("imagenet.label") as fin:
for line in fin:
label_dict[label_idx] = line.strip()
label_idx += 1
seq = Sequential([
URL2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
])
start = time.time()
image_file = "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg"
for i in range(10):
img = seq(image_file)
fetch_map = client.predict(feed={"image": img}, fetch=["score"])
prob = max(fetch_map["score"][0])
label = label_dict[fetch_map["score"][0].tolist().index(prob)].strip(
).replace(",", "")
print("prediction: {}, probability: {}".format(label, prob))
end = time.time()
print(end - start)
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, URL2Image, Resize, CenterCrop, RGB2BGR, Transpose, Div, Normalize
if len(sys.argv) != 4:
print("python resnet50_web_service.py model device port")
sys.exit(-1)
device = sys.argv[2]
if device == "cpu":
from paddle_serving_server.web_service import WebService
else:
from paddle_serving_server_gpu.web_service import WebService
class ImageService(WebService):
def init_imagenet_setting(self):
self.seq = Sequential([
URL2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose(
(2, 0, 1)), Div(255), Normalize([0.485, 0.456, 0.406],
[0.229, 0.224, 0.225], True)
])
self.label_dict = {}
label_idx = 0
with open("imagenet.label") as fin:
for line in fin:
self.label_dict[label_idx] = line.strip()
label_idx += 1
def preprocess(self, feed=[], fetch=[]):
feed_batch = []
for ins in feed:
if "image" not in ins:
raise ("feed data error!")
img = self.seq(ins["image"])
feed_batch.append({"image": img})
return feed_batch, fetch
def postprocess(self, feed=[], fetch=[], fetch_map={}):
score_list = fetch_map["score"]
result = {"label": [], "prob": []}
for score in score_list:
max_score = max(score)
result["label"].append(self.label_dict[score.index(max_score)]
.strip().replace(",", ""))
result["prob"].append(max_score)
return result
image_service = ImageService(name="image")
image_service.load_model_config(sys.argv[1])
image_service.init_imagenet_setting()
if device == "gpu":
image_service.set_gpus("0,1")
image_service.prepare_server(
workdir="workdir", port=int(sys.argv[3]), device=device)
image_service.run_rpc_service()
image_service.run_web_service()
......@@ -30,27 +30,3 @@ python text_classify_service.py imdb_cnn_model/ workdir/ 9292 imdb.vocab
```
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "i am very sad | 0"}], "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction
```
### Benchmark
CPU :Intel(R) Xeon(R) Gold 6271 CPU @ 2.60GHz * 48
Model :[CNN](https://github.com/PaddlePaddle/Serving/blob/develop/python/examples/imdb/nets.py)
server thread num : 16
In this test, client sends 25000 test samples totally, the bar chart given later is the latency of single thread, the unit is second, from which we know the predict efficiency is improved greatly by multi-thread compared to single-thread. 8.7 times improvement is made by 16 threads prediction.
| client thread num | prepro | client infer | op0 | op1 | op2 | postpro | total |
| ------------------ | ------ | ------------ | ------ | ----- | ------ | ------- | ----- |
| 1 | 1.09 | 28.79 | 0.094 | 20.59 | 0.047 | 0.034 | 31.41 |
| 4 | 0.22 | 7.41 | 0.023 | 5.01 | 0.011 | 0.0098 | 8.01 |
| 8 | 0.11 | 4.7 | 0.012 | 2.61 | 0.0062 | 0.0049 | 5.01 |
| 12 | 0.081 | 4.69 | 0.0078 | 1.72 | 0.0042 | 0.0035 | 4.91 |
| 16 | 0.058 | 3.46 | 0.0061 | 1.32 | 0.0033 | 0.003 | 3.63 |
| 20 | 0.049 | 3.77 | 0.0047 | 1.03 | 0.0025 | 0.0022 | 3.91 |
| 24 | 0.041 | 3.86 | 0.0039 | 0.85 | 0.002 | 0.0017 | 3.98 |
The thread-latency bar chart is as follow:
![total cost](../../../doc/imdb-benchmark-server-16.png)
......@@ -29,27 +29,3 @@ python text_classify_service.py imdb_cnn_model/ workdir/ 9292 imdb.vocab
```
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "i am very sad | 0"}], "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction
```
### Benchmark
设备 :Intel(R) Xeon(R) Gold 6271 CPU @ 2.60GHz * 48
模型 :[CNN](https://github.com/PaddlePaddle/Serving/blob/develop/python/examples/imdb/nets.py)
server thread num : 16
测试中,client共发送25000条测试样本,图中数据为单个线程的耗时,时间单位为秒。可以看出,client端多线程的预测速度相比单线程有明显提升,在16线程时预测速度是单线程的8.7倍。
| client thread num | prepro | client infer | op0 | op1 | op2 | postpro | total |
| ------------------ | ------ | ------------ | ------ | ----- | ------ | ------- | ----- |
| 1 | 1.09 | 28.79 | 0.094 | 20.59 | 0.047 | 0.034 | 31.41 |
| 4 | 0.22 | 7.41 | 0.023 | 5.01 | 0.011 | 0.0098 | 8.01 |
| 8 | 0.11 | 4.7 | 0.012 | 2.61 | 0.0062 | 0.0049 | 5.01 |
| 12 | 0.081 | 4.69 | 0.0078 | 1.72 | 0.0042 | 0.0035 | 4.91 |
| 16 | 0.058 | 3.46 | 0.0061 | 1.32 | 0.0033 | 0.003 | 3.63 |
| 20 | 0.049 | 3.77 | 0.0047 | 1.03 | 0.0025 | 0.0022 | 3.91 |
| 24 | 0.041 | 3.86 | 0.0039 | 0.85 | 0.002 | 0.0017 | 3.98 |
预测总耗时变化规律如下:
![total cost](../../../doc/imdb-benchmark-server-16.png)
......@@ -16,7 +16,7 @@
import sys
import time
import requests
from imdb_reader import IMDBDataset
from paddle_serving_app.reader import IMDBDataset
from paddle_serving_client import Client
from paddle_serving_client.utils import MultiThreadRunner
from paddle_serving_client.utils import benchmark_args
......@@ -37,26 +37,39 @@ def single_func(idx, resource):
client.load_client_config(args.model)
client.connect([args.endpoint])
for i in range(1000):
if args.batch_size == 1:
word_ids, label = imdb_dataset.get_words_and_label(line)
fetch_map = client.predict(
feed={"words": word_ids}, fetch=["prediction"])
if args.batch_size >= 1:
feed_batch = []
for bi in range(args.batch_size):
word_ids, label = imdb_dataset.get_words_and_label(dataset[
bi])
feed_batch.append({"words": word_ids})
result = client.predict(feed=feed_batch, fetch=["prediction"])
if result is None:
raise ("predict failed.")
else:
print("unsupport batch size {}".format(args.batch_size))
elif args.request == "http":
for fn in filelist:
fin = open(fn)
for line in fin:
word_ids, label = imdb_dataset.get_words_and_label(line)
r = requests.post(
"http://{}/imdb/prediction".format(args.endpoint),
data={"words": word_ids,
"fetch": ["prediction"]})
if args.batch_size >= 1:
feed_batch = []
for bi in range(args.batch_size):
feed_batch.append({"words": dataset[bi]})
r = requests.post(
"http://{}/imdb/prediction".format(args.endpoint),
json={"feed": feed_batch,
"fetch": ["prediction"]})
if r.status_code != 200:
print('HTTP status code -ne 200')
raise ("predict failed.")
else:
print("unsupport batch size {}".format(args.batch_size))
end = time.time()
return [[end - start]]
multi_thread_runner = MultiThreadRunner()
result = multi_thread_runner.run(single_func, args.thread, {})
print(result)
avg_cost = 0
for cost in result[0]:
avg_cost += cost
print("total cost {} s of each thread".format(avg_cost / args.thread))
rm profile_log
for thread_num in 1 2 4 8 16
do
$PYTHONROOT/bin/python benchmark.py --thread $thread_num --model imdbo_bow_client_conf/serving_client_conf.prototxt --request rpc > profile 2>&1
for batch_size in 1 2 4 8 16 32 64 128 256 512
do
$PYTHONROOT/bin/python benchmark.py --thread $thread_num --batch_size $batch_size --model imdb_bow_client_conf/serving_client_conf.prototxt --request rpc > profile 2>&1
echo "========================================"
echo "batch size : $batch_size" >> profile_log
$PYTHONROOT/bin/python ../util/show_profile.py profile $thread_num >> profile_log
tail -n 1 profile >> profile_log
done
done
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
import sys
import time
import requests
from imdb_reader import IMDBDataset
from paddle_serving_client import Client
from paddle_serving_client.utils import MultiThreadRunner
from paddle_serving_client.utils import benchmark_args
args = benchmark_args()
def single_func(idx, resource):
imdb_dataset = IMDBDataset()
imdb_dataset.load_resource("./imdb.vocab")
dataset = []
with open("./test_data/part-0") as fin:
for line in fin:
dataset.append(line.strip())
start = time.time()
if args.request == "rpc":
client = Client()
client.load_client_config(args.model)
client.connect([args.endpoint])
for i in range(1000):
if args.batch_size >= 1:
feed_batch = []
for bi in range(args.batch_size):
word_ids, label = imdb_dataset.get_words_and_label(dataset[
bi])
feed_batch.append({"words": word_ids})
result = client.predict(feed=feed_batch, fetch=["prediction"])
if result is None:
raise ("predict failed.")
else:
print("unsupport batch size {}".format(args.batch_size))
elif args.request == "http":
if args.batch_size >= 1:
feed_batch = []
for bi in range(args.batch_size):
feed_batch.append({"words": dataset[bi]})
r = requests.post(
"http://{}/imdb/prediction".format(args.endpoint),
json={"feed": feed_batch,
"fetch": ["prediction"]})
if r.status_code != 200:
print('HTTP status code -ne 200')
raise ("predict failed.")
else:
print("unsupport batch size {}".format(args.batch_size))
end = time.time()
return [[end - start]]
multi_thread_runner = MultiThreadRunner()
result = multi_thread_runner.run(single_func, args.thread, {})
avg_cost = 0
for cost in result[0]:
avg_cost += cost
print("total cost {} s of each thread".format(avg_cost / args.thread))
rm profile_log
for thread_num in 1 2 4 8 16
do
for batch_size in 1 2 4 8 16 32 64 128 256 512
do
$PYTHONROOT/bin/python benchmark_batch.py --thread $thread_num --batch_size $batch_size --model imdb_bow_client_conf/serving_client_conf.prototxt --request rpc > profile 2>&1
echo "========================================"
echo "batch size : $batch_size" >> profile_log
$PYTHONROOT/bin/python ../util/show_profile.py profile $thread_num >> profile_log
tail -n 1 profile >> profile_log
done
done
......@@ -13,7 +13,7 @@
# limitations under the License.
# pylint: disable=doc-string-missing
from paddle_serving_client import Client
from imdb_reader import IMDBDataset
from paddle_serving_app.reader import IMDBDataset
import sys
client = Client()
......
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# pylint: disable=doc-string-missing
from paddle_serving_client import Client
import sys
import subprocess
from multiprocessing import Pool
import time
def batch_predict(batch_size=4):
client = Client()
client.load_client_config(conf_file)
client.connect(["127.0.0.1:9292"])
fetch = ["acc", "cost", "prediction"]
feed_batch = []
for line in sys.stdin:
group = line.strip().split()
words = [int(x) for x in group[1:int(group[0])]]
label = [int(group[-1])]
feed = {"words": words, "label": label}
feed_batch.append(feed)
if len(feed_batch) == batch_size:
fetch_batch = client.batch_predict(
feed_batch=feed_batch, fetch=fetch)
for i in range(batch_size):
print("{} {}".format(fetch_batch[i]["prediction"][1],
feed_batch[i]["label"][0]))
feed_batch = []
if len(feed_batch) > 0:
fetch_batch = client.batch_predict(feed_batch=feed_batch, fetch=fetch)
for i in range(len(feed_batch)):
print("{} {}".format(fetch_batch[i]["prediction"][1], feed_batch[i][
"label"][0]))
if __name__ == '__main__':
conf_file = sys.argv[1]
batch_size = int(sys.argv[2])
batch_predict(batch_size)
......@@ -14,7 +14,7 @@
# pylint: disable=doc-string-missing
from paddle_serving_server.web_service import WebService
from imdb_reader import IMDBDataset
from paddle_serving_app.reader import IMDBDataset
import sys
......@@ -37,5 +37,5 @@ imdb_service.load_model_config(sys.argv[1])
imdb_service.prepare_server(
workdir=sys.argv[2], port=int(sys.argv[3]), device="cpu")
imdb_service.prepare_dict({"dict_file_path": sys.argv[4]})
imdb_service.run_server()
imdb_service.run_flask()
imdb_service.run_rpc_service()
imdb_service.run_web_service()
......@@ -2,28 +2,27 @@
([简体中文](./README_CN.md)|English)
### Get model files and sample data
### Get Model
```
sh get_data.sh
python -m paddle_serving_app.package --get_model lac
tar -xzvf lac.tar.gz
```
the package downloaded contains lac model config along with lac dictionary.
#### Start RPC inference service
```
python -m paddle_serving_server.serve --model jieba_server_model/ --port 9292
python -m paddle_serving_server.serve --model lac_model/ --port 9292
```
### RPC Infer
```
echo "我爱北京天安门" | python lac_client.py jieba_client_conf/serving_client_conf.prototxt lac_dict/
echo "我爱北京天安门" | python lac_client.py lac_client/serving_client_conf.prototxt
```
it will get the segmentation result
It will get the segmentation result.
### Start HTTP inference service
```
python lac_web_service.py jieba_server_model/ lac_workdir 9292
python lac_web_service.py lac_model/ lac_workdir 9292
```
### HTTP Infer
......
......@@ -2,28 +2,27 @@
(简体中文|[English](./README.md))
### 获取模型和字典文件
### 获取模型
```
sh get_data.sh
python -m paddle_serving_app.package --get_model lac
tar -xzvf lac.tar.gz
```
下载包里包含了lac模型和lac模型预测需要的字典文件
#### 开启RPC预测服务
```
python -m paddle_serving_server.serve --model jieba_server_model/ --port 9292
python -m paddle_serving_server.serve --model lac_model/ --port 9292
```
### 执行RPC预测
```
echo "我爱北京天安门" | python lac_client.py jieba_client_conf/serving_client_conf.prototxt lac_dict/
echo "我爱北京天安门" | python lac_client.py lac_client/serving_client_conf.prototxt
```
我们就能得到分词结果
### 开启HTTP预测服务
```
python lac_web_service.py jieba_server_model/ lac_workdir 9292
python lac_web_service.py lac_model/ lac_workdir 9292
```
### 执行HTTP预测
......
......@@ -16,7 +16,7 @@
import sys
import time
import requests
from lac_reader import LACReader
from paddle_serving_app.reader import LACReader
from paddle_serving_client import Client
from paddle_serving_client.utils import MultiThreadRunner
from paddle_serving_client.utils import benchmark_args
......@@ -25,7 +25,7 @@ args = benchmark_args()
def single_func(idx, resource):
reader = LACReader("lac_dict")
reader = LACReader()
start = time.time()
if args.request == "rpc":
client = Client()
......
......@@ -15,7 +15,7 @@
# pylint: disable=doc-string-missing
from paddle_serving_client import Client
from lac_reader import LACReader
from paddle_serving_app.reader import LACReader
import sys
import os
import io
......@@ -24,7 +24,7 @@ client = Client()
client.load_client_config(sys.argv[1])
client.connect(["127.0.0.1:9292"])
reader = LACReader(sys.argv[2])
reader = LACReader()
for line in sys.stdin:
if len(line) <= 0:
continue
......@@ -32,4 +32,7 @@ for line in sys.stdin:
if len(feed_data) <= 0:
continue
fetch_map = client.predict(feed={"words": feed_data}, fetch=["crf_decode"])
print(fetch_map)
begin = fetch_map['crf_decode.lod'][0]
end = fetch_map['crf_decode.lod'][1]
segs = reader.parse_result(line, fetch_map["crf_decode"][begin:end])
print("word_seg: " + "|".join(str(words) for words in segs))
......@@ -14,8 +14,10 @@
from paddle_serving_client import Client
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
py_version = sys.version_info[0]
if py_version == 2:
reload(sys)
sys.setdefaultencoding('utf-8')
import os
import io
......
......@@ -14,12 +14,12 @@
from paddle_serving_server.web_service import WebService
import sys
from lac_reader import LACReader
from paddle_serving_app.reader import LACReader
class LACService(WebService):
def load_reader(self):
self.reader = LACReader("lac_dict")
self.reader = LACReader()
def preprocess(self, feed={}, fetch=[]):
feed_batch = []
......@@ -47,5 +47,5 @@ lac_service.load_model_config(sys.argv[1])
lac_service.load_reader()
lac_service.prepare_server(
workdir=sys.argv[2], port=int(sys.argv[3]), device="cpu")
lac_service.run_server()
lac_service.run_flask()
lac_service.run_rpc_service()
lac_service.run_web_service()
# Image Classification
## Get Model
```
python -m paddle_serving_app.package --get_model mobilenet_v2_imagenet
tar -xzvf mobilenet_v2_imagenet.tar.gz
```
## RPC Service
### Start Service
```
python -m paddle_serving_server_gpu.serve --model mobilenet_v2_imagenet_model --gpu_ids 0 --port 9393
```
### Client Prediction
```
python mobilenet_tutorial.py
```
# 图像分类
## 获取模型
```
python -m paddle_serving_app.package --get_model mobilenet_v2_imagenet
tar -xzvf mobilenet_v2_imagenet.tar.gz
```
## RPC 服务
### 启动服务端
```
python -m paddle_serving_server_gpu.serve --model mobilenet_v2_imagenet_model --gpu_ids 0 --port 9393
```
### 客户端预测
```
python mobilenet_tutorial.py
```
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, File2Image, Resize
from paddle_serving_app.reader import CenterCrop, RGB2BGR, Transpose, Div, Normalize
client = Client()
client.load_client_config(
"mobilenet_v2_imagenet_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9393"])
seq = Sequential([
File2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
])
image_file = "daisy.jpg"
img = seq(image_file)
fetch_map = client.predict(feed={"image": img}, fetch=["feature_map"])
print(fetch_map["feature_map"].reshape(-1))
# OCR
## Get Model
```
python -m paddle_serving_app.package --get_model ocr_rec
tar -xzvf ocr_rec.tar.gz
```
## RPC Service
### Start Service
```
python -m paddle_serving_server.serve --model ocr_rec_model --port 9292
```
### Client Prediction
```
python test_ocr_rec_client.py
```
......@@ -12,23 +12,20 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import sys
from image_reader import ImageReader
from paddle_serving_client import Client
import time
from paddle_serving_app.reader import OCRReader
import cv2
client = Client()
client.load_client_config(sys.argv[1])
client.connect(["127.0.0.1:9393"])
reader = ImageReader()
client.load_client_config("ocr_rec_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])
start = time.time()
for i in range(1000):
with open("./data/n01440764_10026.JPEG", "rb") as f:
img = f.read()
img = reader.process_image(img)
fetch_map = client.predict(feed={"image": img}, fetch=["score"])
end = time.time()
print(end - start)
#print(fetch_map["score"])
image_file_list = ["./test_rec.jpg"]
img = cv2.imread(image_file_list[0])
ocr_reader = OCRReader()
feed = {"image": ocr_reader.preprocess([img])}
fetch = ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"]
fetch_map = client.predict(feed=feed, fetch=fetch)
rec_res = ocr_reader.postprocess(fetch_map)
print(image_file_list[0])
print(rec_res[0][0])
# Image Classification
## Get Model
```
python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
tar -xzvf resnet_v2_50_imagenet.tar.gz
```
## RPC Service
### Start Service
```
python -m paddle_serving_server_gpu.serve --model resnet_v2_50_imagenet_model --gpu_ids 0 --port 9393
```
### Client Prediction
```
python resnet50_v2_tutorial.py
```
# 图像分类
## 获取模型
```
python -m paddle_serving_app.package --get_model resnet_v2_50_imagenet
tar -xzvf resnet_v2_50_imagenet.tar.gz
```
## RPC 服务
### 启动服务端
```
python -m paddle_serving_server_gpu.serve --model resnet_v2_50_imagenet_model --gpu_ids 0 --port 9393
```
### 客户端预测
```
python resnet50_v2_tutorial.py
```
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_app.reader import Sequential, File2Image, Resize, CenterCrop
from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize
from paddle_serving_app.local_predict import Debugger
import sys
debugger = Debugger()
debugger.load_model_config(sys.argv[1], gpu=True)
seq = Sequential([
File2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
])
image_file = "daisy.jpg"
img = seq(image_file)
fetch_map = debugger.predict(feed={"image": img}, fetch=["feature_map"])
print(fetch_map["feature_map"].reshape(-1))
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential, File2Image, Resize, CenterCrop
from paddle_serving_app.reader import RGB2BGR, Transpose, Div, Normalize
client = Client()
client.load_client_config(
"resnet_v2_50_imagenet_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9393"])
seq = Sequential([
File2Image(), Resize(256), CenterCrop(224), RGB2BGR(), Transpose((2, 0, 1)),
Div(255), Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], True)
])
image_file = "daisy.jpg"
img = seq(image_file)
fetch_map = client.predict(feed={"image": img}, fetch=["score"])
print(fetch_map["score"].reshape(-1))
# Chinese sentence sentiment classification
# Chinese Sentence Sentiment Classification
([简体中文](./README_CN.md)|English)
## Get model files and sample data
## Get Model
```
sh get_data.sh
python -m paddle_serving_app.package --get_model senta_bilstm
python -m paddle_serving_app.package --get_model lac
tar -xzvf senta_bilstm.tar.gz
tar -xzvf lac.tar.gz
```
## Start http service
## Start HTTP Service
```
python senta_web_service.py senta_bilstm_model/ workdir 9292
python -m paddle_serving_server.serve --model lac_model --port 9300
python senta_web_service.py
```
In the Chinese sentiment classification task, the Chinese word segmentation needs to be done through [LAC task] (../lac). Set model path by ```lac_model_path``` and dictionary path by ```lac_dict_path```.
In this demo, the LAC task is placed in the preprocessing part of the HTTP prediction service of the sentiment classification task. The LAC prediction service is deployed on the CPU, and the sentiment classification task is deployed on the GPU, which can be changed according to the actual situation.
In the Chinese sentiment classification task, the Chinese word segmentation needs to be done through [LAC task] (../lac).
In this demo, the LAC task is placed in the preprocessing part of the HTTP prediction service of the sentiment classification task.
## Client prediction
```
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "天气不错"}], "fetch":["class_probs"]}' http://127.0.0.1:9292/senta/prediction
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "天气不错"}], "fetch":["class_probs"]}' http://127.0.0.1:9393/senta/prediction
```
# 中文语句情感分类
(简体中文|[English](./README.md))
## 获取模型文件和样例数据
## 获取模型文件
```
sh get_data.sh
python -m paddle_serving_app.package --get_model senta_bilstm
python -m paddle_serving_app.package --get_model lac
tar -xzvf lac.tar.gz
tar -xzvf senta_bilstm.tar.gz
```
## 启动HTTP服务
```
python senta_web_service.py senta_bilstm_model/ workdir 9292
python -m paddle_serving_server.serve --model lac_model --port 9300
python senta_web_service.py
```
中文情感分类任务中需要先通过[LAC任务](../lac)进行中文分词,在脚本中通过```lac_model_path```参数配置LAC任务的模型文件路径,```lac_dict_path```参数配置LAC任务词典路径
示例中将LAC任务放在情感分类任务的HTTP预测服务的预处理部分,LAC预测服务部署在CPU上,情感分类任务部署在GPU上,可以根据实际情况进行更改
中文情感分类任务中需要先通过[LAC任务](../lac)进行中文分词。
示例中将LAC任务放在情感分类任务的HTTP预测服务的预处理部分。
## 客户端预测
```
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "天气不错"}], "fetch":["class_probs"]}' http://127.0.0.1:9292/senta/prediction
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "天气不错"}], "fetch":["class_probs"]}' http://127.0.0.1:9393/senta/prediction
```
wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SentimentAnalysis/senta_bilstm.tar.gz --no-check-certificate
tar -xzvf senta_bilstm.tar.gz
wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/LexicalAnalysis/lac_model.tar.gz --no-check-certificate
tar -xzvf lac_model.tar.gz
wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/LexicalAnalysis/lac.tar.gz --no-check-certificate
tar -xzvf lac.tar.gz
wget https://paddle-serving.bj.bcebos.com/reader/lac/lac_dict.tar.gz --no-check-certificate
tar -xzvf lac_dict.tar.gz
wget https://paddle-serving.bj.bcebos.com/reader/senta/vocab.txt --no-check-certificate
#encoding=utf-8
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
......@@ -12,97 +13,49 @@
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_server_gpu.web_service import WebService
from paddle_serving_server.web_service import WebService
from paddle_serving_client import Client
from paddle_serving_app import LACReader, SentaReader
import numpy as np
from paddle_serving_app.reader import LACReader, SentaReader
import os
import io
import sys
import subprocess
from multiprocessing import Process, Queue
#senta_web_service.py
from paddle_serving_server.web_service import WebService
from paddle_serving_client import Client
from paddle_serving_app.reader import LACReader, SentaReader
class SentaService(WebService):
def set_config(
self,
lac_model_path,
lac_dict_path,
senta_dict_path, ):
self.lac_model_path = lac_model_path
self.lac_client_config_path = lac_model_path + "/serving_server_conf.prototxt"
self.lac_dict_path = lac_dict_path
self.senta_dict_path = senta_dict_path
self.show = False
def show_detail(self, show=False):
self.show = show
def start_lac_service(self):
if not os.path.exists('./lac_serving'):
os.mkdir("./lac_serving")
os.chdir('./lac_serving')
self.lac_port = self.port + 100
r = os.popen(
"python -m paddle_serving_server.serve --model {} --port {} &".
format("../" + self.lac_model_path, self.lac_port))
os.chdir('..')
def init_lac_service(self):
ps = Process(target=self.start_lac_service())
ps.start()
#self.init_lac_client()
def lac_predict(self, feed_data):
self.init_lac_client()
lac_result = self.lac_client.predict(
feed={"words": feed_data}, fetch=["crf_decode"])
self.lac_client.release()
return lac_result
def init_lac_client(self):
class SentaService(WebService):
#初始化lac模型预测服务
def init_lac_client(self, lac_port, lac_client_config):
self.lac_reader = LACReader()
self.senta_reader = SentaReader()
self.lac_client = Client()
self.lac_client.load_client_config(self.lac_client_config_path)
self.lac_client.connect(["127.0.0.1:{}".format(self.lac_port)])
def init_lac_reader(self):
self.lac_reader = LACReader(self.lac_dict_path)
def init_senta_reader(self):
self.senta_reader = SentaReader(vocab_path=self.senta_dict_path)
self.lac_client.load_client_config(lac_client_config)
self.lac_client.connect(["127.0.0.1:{}".format(lac_port)])
#定义senta模型预测服务的预处理,调用顺序:lac reader->lac模型预测->预测结果后处理->senta reader
def preprocess(self, feed=[], fetch=[]):
feed_data = self.lac_reader.process(feed[0]["words"])
if self.show:
print("---- lac reader ----")
print(feed_data)
lac_result = self.lac_predict(feed_data)
if self.show:
print("---- lac out ----")
print(lac_result)
segs = self.lac_reader.parse_result(feed[0]["words"],
lac_result["crf_decode"])
if self.show:
print("---- lac parse ----")
print(segs)
feed_data = self.senta_reader.process(segs)
if self.show:
print("---- senta reader ----")
print("feed_data", feed_data)
return [{"words": feed_data}], fetch
feed_data = [{
"words": self.lac_reader.process(x["words"])
} for x in feed]
lac_result = self.lac_client.predict(
feed=feed_data, fetch=["crf_decode"])
feed_batch = []
result_lod = lac_result["crf_decode.lod"]
for i in range(len(feed)):
segs = self.lac_reader.parse_result(
feed[i]["words"],
lac_result["crf_decode"][result_lod[i]:result_lod[i + 1]])
feed_data = self.senta_reader.process(segs)
feed_batch.append({"words": feed_data})
return feed_batch, fetch
senta_service = SentaService(name="senta")
#senta_service.show_detail(True)
senta_service.set_config(
lac_model_path="./lac_model",
lac_dict_path="./lac_dict",
senta_dict_path="./vocab.txt")
senta_service.load_model_config(sys.argv[1])
senta_service.prepare_server(
workdir=sys.argv[2], port=int(sys.argv[3]), device="cpu")
senta_service.init_lac_reader()
senta_service.init_senta_reader()
senta_service.init_lac_service()
senta_service.run_server()
senta_service.run_flask()
senta_service.load_model_config("senta_bilstm_model")
senta_service.prepare_server(workdir="workdir")
senta_service.init_lac_client(
lac_port=9300, lac_client_config="lac_model/serving_server_conf.prototxt")
senta_service.run_rpc_service()
senta_service.run_web_service()
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册