Commit 0d39257f authored by: M MRXLT

fix conflict

......@@ -18,19 +18,19 @@
<h2 align="center">Motivation</h2>
We consider deploying deep learning inference service online to be a user-facing application in the future. **The goal of this project**: When you have trained a deep neural net with [Paddle](https://github.com/PaddlePaddle/Paddle), you can also easily deploy the model online. A demo of Paddle Serving is as follows:
We consider deploying deep learning inference service online to be a user-facing application in the future. **The goal of this project**: When you have trained a deep neural net with [Paddle](https://github.com/PaddlePaddle/Paddle), you can put the model online without much effort. A demo of serving is as follows:
<p align="center">
<img src="doc/demo.gif" width="700">
</p>
<h2 align="center">Some Key Features</h2>
- Integrate with Paddle training pipeline seamlessly, most paddle models can be deployed **with one line command**.
- Integrate with Paddle training pipeline seamlessly, most Paddle models can be deployed **with one line command**.
- **Industrial serving features** supported, such as model management, online loading, online A/B testing, etc.
- **Distributed Key-Value indexing** supported which is especially useful for large scale sparse features as model inputs.
- **Highly concurrent and efficient communication** between clients and servers supported.
- **Multiple programming languages** supported on client side, such as Golang, C++ and Python.
- **Extensible framework design** which can support model serving beyond Paddle.
- **Distributed Key-Value indexing** supported that is especially useful for large scale sparse features as model inputs.
- **Highly concurrent and efficient communication** between clients and servers.
- **Multiple programming languages** supported on client side, such as Golang, C++ and Python.
- **Extensible framework design** that can support model serving beyond Paddle.
<h2 align="center">Installation</h2>
......@@ -53,7 +53,7 @@ Paddle Serving provides HTTP and RPC based service for users to access
### HTTP service
Paddle Serving provides a built-in Python module called `paddle_serving_server.serve` that can start an RPC service or an HTTP service with a one-line command. If we specify the argument `--name uci`, it means that we will have an HTTP service with a URL of `$IP:$PORT/uci/prediction`.
Paddle Serving provides a built-in Python module called `paddle_serving_server.serve` that can start an RPC service or an HTTP service with a one-line command. If we specify the argument `--name uci`, it means that we will have an HTTP service with a URL of `$IP:$PORT/uci/prediction`.
``` shell
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci
```
......@@ -75,7 +75,7 @@ curl -H "Content-Type:application/json" -X POST -d '{"x": [0.0137, -0.1136, 0.25
### RPC service
A user can also start an RPC service with `paddle_serving_server.serve`. The RPC service is usually faster than the HTTP service, although a user needs to do some coding based on Paddle Serving's Python client API. Note that we do not specify `--name` here.
A user can also start an RPC service with `paddle_serving_server.serve`. The RPC service is usually faster than the HTTP service, although a user needs to do some coding based on Paddle Serving's Python client API. Note that we do not specify `--name` here.
``` shell
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292
```
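A hedged sketch of such a Python client is shown below. The `Client()`, `connect()` and `predict()` calls mirror code that appears elsewhere in this commit; the `load_client_config()` call and the `x`/`price` feed and fetch names are assumptions based on the uci_housing example and may differ between versions.

``` python
# Hedged sketch of an RPC client for the service started above.
from paddle_serving_client import Client

client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")  # assumed path
client.connect(["127.0.0.1:9292"])

x = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583,
     -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
fetch_map = client.predict(feed={"x": x}, fetch=["price"])
print(fetch_map)
```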
......@@ -154,7 +154,7 @@ curl -H "Content-Type:application/json" -X POST -d '{"url": "https://paddle-serv
{"label":"daisy","prob":0.9341403245925903}
```
<h3 align="center">More Demos</h4>
<h3 align="center">More Demos</h3>
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
......@@ -239,17 +239,17 @@ curl -H "Content-Type:application/json" -X POST -d '{"url": "https://paddle-serv
### New to Paddle Serving
- [How to save a servable model?](doc/SAVE.md)
- [An End-to-end tutorial from training to inference service deployment](doc/TRAIN_TO_SERVICE.md)
- [Write Bert-as-Service in 10 minutes](doc/BERT_10_MINS.md)
- [An end-to-end tutorial from training to serving(Chinese)](doc/TRAIN_TO_SERVICE.md)
- [Write Bert-as-Service in 10 minutes(Chinese)](doc/BERT_10_MINS.md)
### Developers
- [How to configure Serving native operators on the server side?](doc/SERVER_DAG.md)
- [How to develop a new Serving operator?](doc/NEW_OPERATOR.md)
- [How to develop a new Serving operator](doc/NEW_OPERATOR.md)
- [Golang client](doc/IMDB_GO_CLIENT.md)
- [Compile from source code](doc/COMPILE.md)
- [Compile from source code(Chinese)](doc/COMPILE.md)
### About Efficiency
- [How to profile Paddle Serving latency?(Chinese)](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/util)
- [How to profile Paddle Serving latency?](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/util)
- [CPU Benchmarks(Chinese)](doc/BENCHMARKING.md)
- [GPU Benchmarks(Chinese)](doc/GPU_BENCHMARKING.md)
......@@ -258,7 +258,8 @@ curl -H "Content-Type:application/json" -X POST -d '{"url": "https://paddle-serv
### Design
- [Design Doc](doc/DESIGN_DOC.md)
- [Design Doc(Chinese)](doc/DESIGN_DOC.md)
- [Design Doc(English)](doc/DESIGN_DOC_EN.md)
<h2 align="center">Community</h2>
......
<p align="center">
<br>
<img src='https://paddle-serving.bj.bcebos.com/imdb-demo%2FLogoMakr-3Bd2NM-300dpi.png' width = "600" height = "130">
<br>
</p>
<p align="center">
<br>
<a href="https://travis-ci.com/PaddlePaddle/Serving">
<img alt="Build Status" src="https://img.shields.io/travis/com/PaddlePaddle/Serving/develop">
</a>
<img alt="Release" src="https://img.shields.io/badge/Release-0.0.3-yellowgreen">
<img alt="Issues" src="https://img.shields.io/github/issues/PaddlePaddle/Serving">
<img alt="License" src="https://img.shields.io/github/license/PaddlePaddle/Serving">
<img alt="Slack" src="https://img.shields.io/badge/Join-Slack-green">
<br>
</p>
<h2 align="center">动机</h2>
<img src='https://paddle-serving.bj.bcebos.com/imdb-demo%2FLogoMakr-3Bd2NM-300dpi.png' width = "600" height = "127">
Paddle Serving aims to help deep learning developers deploy online inference services easily. **The goal of this project**: once a user has trained a deep neural network with [Paddle](https://github.com/PaddlePaddle/Paddle), they also have an inference service for that model at the same time.
[![Build Status](https://img.shields.io/travis/com/PaddlePaddle/Serving/develop)](https://travis-ci.com/PaddlePaddle/Serving)
[![Release](https://img.shields.io/badge/Release-0.0.3-yellowgreen)](Release)
[![Issues](https://img.shields.io/github/issues/PaddlePaddle/Serving)](Issues)
[![License](https://img.shields.io/github/license/PaddlePaddle/Serving)](LICENSE)
[![Slack](https://img.shields.io/badge/Join-Slack-green)](https://paddleserving.slack.com/archives/CU0PB4K35)
## Motivation
Paddle Serving helps deep learning developers deploy online inference services easily. **The goal of this project**: as soon as you have trained a deep neural network with [Paddle](https://github.com/PaddlePaddle/Paddle), you also have an inference service for that model.
<p align="center">
<img src="doc/demo.gif" width="700">
</p>
<h2 align="center">核心功能</h2>
## 核心功能
- 与Paddle训练紧密连接,绝大部分Paddle模型可以 **一键部署**.
- 支持 **工业级的服务能力** 例如模型管理,在线加载,在线A/B测试等.
- 支持 **分布式键值对索引** 助力于大规模稀疏特征作为模型输入.
......@@ -33,7 +20,7 @@ Paddle Serving aims to help deep learning developers deploy online inference services easily
- **Multiple programming languages** supported for client development, such as Golang, C++ and Python.
- **Extensible framework design** that can support model serving beyond Paddle.
<h2 align="center">Installation</h2>
## Installation
It is strongly recommended that you build Paddle Serving inside Docker; please see [How to run PaddleServing in Docker](doc/RUN_IN_DOCKER_CN.md).
......@@ -42,51 +29,17 @@ pip install paddle-serving-client
pip install paddle-serving-server
```
<h2 align="center">快速启动示例</h2>
<h3 align="center">波士顿房价预测</h3>
## 快速启动示例
``` shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/uci_housing.tar.gz
tar -xzf uci_housing.tar.gz
```
Paddle Serving provides users with services based on HTTP and RPC.
<h3 align="center">HTTP Service</h3>
Paddle Serving provides a built-in Python module called `paddle_serving_server.serve` that can start an RPC service or an HTTP service with a one-line command. If we specify the argument `--name uci`, it means that we will have an HTTP service whose URL is `$IP:$PORT/uci/prediction`.
``` shell
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci
```
<center>
| Argument | Type | Default | Description |
|--------------|------|-----------|--------------------------------|
| `thread` | int | `4` | Concurrency of current service |
| `port` | int | `9292` | Exposed port of current service to users|
| `name` | str | `""` | Service name, can be used to generate HTTP request url |
| `model` | str | `""` | Path of paddle model directory to be served |
We use the `curl` command to send an HTTP POST request to the service we just started. A user can also send HTTP POST requests from a Python library; please refer to the [requests](https://requests.readthedocs.io/en/master/) documentation.
</center>
``` shell
curl -H "Content-Type:application/json" -X POST -d '{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332], "fetch":["price"]}' http://127.0.0.1:9292/uci/prediction
```
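As mentioned above, the same request can also be sent from Python; a minimal sketch using the `requests` library (assumed to be installed separately) is:

``` python
# Minimal sketch: the same POST request as the curl command above,
# sent with the `requests` library (pip install requests).
import requests

data = {
    "x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583,
          -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332],
    "fetch": ["price"],
}
resp = requests.post("http://127.0.0.1:9292/uci/prediction", json=data)
print(resp.json())  # should contain the fetched "price" value
```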
<h3 align="center">RPC服务</h3>
用户还可以使用`paddle_serving_server.serve`启动RPC服务。 尽管用户需要基于Paddle Serving的python客户端API进行一些开发,但是RPC服务通常比HTTP服务更快。需要指出的是这里我们没有指定`--name`。
``` shell
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292
```
Python client request:
``` python
# A user can visit rpc service through paddle_serving_client API
from paddle_serving_client import Client
client = Client()
......@@ -159,6 +112,88 @@ curl -H "Content-Type:application/json" -X POST -d '{"url": "https://paddle-serv
{"label":"daisy","prob":0.9341403245925903}
```
<h3 align="center">更多示例</h3>
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | Bert-Base-Baike |
| Download Link | [https://paddle-serving.bj.bcebos.com/bert_example/bert_seq128.tar.gz](https://paddle-serving.bj.bcebos.com/bert_example%2Fbert_seq128.tar.gz) |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert |
| Description | Get the semantic representation of a Chinese sentence |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | Resnet50-Imagenet |
| Download Link | [https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet50_vd.tar.gz](https://paddle-serving.bj.bcebos.com/imagenet-example%2FResNet50_vd.tar.gz) |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet |
| Description | Get the semantic representation of an image |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | Resnet101-Imagenet |
| Download Link | https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet101_vd.tar.gz |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet |
| Description | Get the semantic representation of an image |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | CNN-IMDB |
| Download Link | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
| Description | Get the category and its probability from a Chinese sentence |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | LSTM-IMDB |
| Download Link | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
| Description | Get the category and its probability from an English sentence |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | BOW-IMDB |
| Download Link | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
| Description | Get the category and its probability from an English sentence |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | Jieba-LAC |
| Download Link | https://paddle-serving.bj.bcebos.com/lac/lac_model.tar.gz |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/lac |
| Description | Get the word segmentation of a Chinese sentence |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | DNN-CTR |
| Download Link | None(Get model by [local_train.py](./python/examples/criteo_ctr/local_train.py)) |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/criteo_ctr |
| Description | Get the click probability from an item's feature vector |
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | DNN-CTR(with cube) |
| Download Link | None(Get model by [local_train.py](python/examples/criteo_ctr_with_cube/local_train.py)) |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/criteo_ctr_with_cube |
| Description | Get the click probability from an item's feature vector |
<h2 align="center">文档</h2>
### 新手教程
......@@ -173,30 +208,30 @@ curl -H "Content-Type:application/json" -X POST -d '{"url": "https://paddle-serv
- [How to compile PaddleServing?](doc/COMPILE_CN.md)
### About Paddle Serving Performance
- [How to profile Paddle Serving performance?](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/util)
- [How to profile Paddle Serving performance?](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/util/)
- [CPU Benchmarks](doc/BENCHMARKING.md)
- [GPU Benchmarks](doc/GPU_BENCHMARKING.md)
### FAQ
- [FAQ](doc/deprecated/FAQ.md)
### Design Documents
- [Paddle Serving Design Doc](doc/DESIGN_DOC_CN.md)
## Documentation
<h2 align="center">Community</h2>
[Development Documentation](doc/DESIGN.md)
### Slack
[How to configure local Ops on the server side?](doc/SERVER_DAG.md)
Want to communicate with developers and other users? Welcome to join our [Slack channel](https://paddleserving.slack.com/archives/CUBPKHKMJ)
[How to develop a new Op?](doc/NEW_OPERATOR.md)
### Contribution
[Golang Client](doc/IMDB_GO_CLIENT.md)
If you want to contribute code to Paddle Serving, please refer to the [Contribution Guidelines](doc/CONTRIBUTE.md)
[Compile from source code](doc/COMPILE.md)
### Feedback
[FAQ](doc/FAQ.md)
For any feedback or bugs, please file an issue on [GitHub Issue](https://github.com/PaddlePaddle/Serving/issues)
## Join the Community
If you want to get in touch with other users and developers, welcome to join our [Slack channel](https://paddleserving.slack.com/archives/CUBPKHKMJ)
### License
## How to Contribute Code
[Apache 2.0 License](https://github.com/PaddlePaddle/Serving/blob/develop/LICENSE)
If you want to contribute code to Paddle Serving, please refer to the [Contribution Guidelines](doc/CONTRIBUTE.md)
## Semantic Understanding Inference Service
## Bert as service
In this example, a BERT model is used for semantic understanding prediction, representing the text as a vector that can be used for further analysis and prediction.
([简体中文](./README_CN.md)|English)
### Getting the Model
In the example, a BERT model is used for semantic understanding prediction, and the text is represented as a vector, which can be used for further analysis and prediction.
This example uses the [BERT Chinese Model](https://www.paddlepaddle.org.cn/hubdetail?name=bert_chinese_L-12_H-768_A-12&en_category=SemanticModel) from [Paddlehub](https://github.com/PaddlePaddle/PaddleHub).
Install paddlehub first
### Getting Model
This example uses the [BERT Chinese Model](https://www.paddlepaddle.org.cn/hubdetail?name=bert_chinese_L-12_H-768_A-12&en_category=SemanticModel) from [Paddlehub](https://github.com/PaddlePaddle/PaddleHub).
Install paddlehub first
```
pip install paddlehub
```
Run
Run
```
python prepare_model.py 20
```
The argument 20 means the max_seq_len of the BERT model, i.e. the length of a sample after preprocessing.
The server-side config and model files are generated and saved in the folder bert_seq20_model.
The client-side config file is generated and saved in the folder bert_seq20_client.
### Getting the Dict and Sample Data
The 20 in the command above means the max_seq_len of the BERT model, i.e. the length of a sample after preprocessing.
The config and model files for the server side are saved in the folder bert_seq20_model.
The config file generated for the client side is saved in the folder bert_seq20_client.
### Getting Dict and Sample Dataset
```
sh get_data.sh
```
The script downloads the Chinese dictionary file vocab.txt and the Chinese sample data data-c.txt.
This script will download the Chinese dictionary file vocab.txt and the Chinese sample data data-c.txt.
### Starting the RPC Inference Service
Run
### RPC Inference Service
Run
```
python -m paddle_serving_server.serve --model bert_seq20_model/ --port 9292 #start cpu inference service
python -m paddle_serving_server.serve --model bert_seq20_model/ --port 9292 #cpu inference service
```
Or
Or
```
python -m paddle_serving_server_gpu.serve --model bert_seq20_model/ --port 9292 --gpu_ids 0 #launch gpu inference service on GPU 0
python -m paddle_serving_server_gpu.serve --model bert_seq20_model/ --port 9292 --gpu_ids 0 #launch gpu inference service at GPU 0
```
### Run Inference
### RPC Inference
Before running inference, paddle_serving_app needs to be installed; the module provides data preprocessing methods for the BERT model.
Before prediction we should install paddle_serving_app, which provides data preprocessing for the BERT model.
```
pip install paddle_serving_app
```
Run
Run
```
head data-c.txt | python bert_client.py --model bert_seq20_client/serving_client_conf.prototxt
```
The client reads the data in data-c.txt and sends prediction requests. The prediction result is the vector representation of the text (since there is a lot of data, the script does not print the output); the server address can be changed in the script.
### Starting the HTTP Inference Service
The client reads data from data-c.txt and sends prediction requests; the prediction is given as a text vector (due to the large amount of data in the vector, it is not printed).
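For orientation, a hedged sketch of what such a client roughly looks like is given below. The `Client` calls mirror other examples in this commit, while the `ChineseBertReader` helper name, its constructor arguments and its `process()` output format are assumptions about `paddle_serving_app` and may differ between versions.

``` python
# Hedged sketch only: the paddle_serving_app helper name and signature are assumptions.
import sys
from paddle_serving_client import Client
from paddle_serving_app import ChineseBertReader  # assumed preprocessing helper

reader = ChineseBertReader({"vocab_file": "vocab.txt", "max_seq_len": 20})  # assumed arguments
client = Client()
client.load_client_config("bert_seq20_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])

for line in sys.stdin:                   # e.g. piped from data-c.txt
    feed = reader.process(line.strip())  # tokenize and build the feed dict
    fetch_map = client.predict(feed=feed, fetch=["pooled_output"])
    # fetch_map["pooled_output"] is the sentence vector; it is not printed here
```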
### HTTP Inference Service
```
export CUDA_VISIBLE_DEVICES=0,1
```
The environment variable specifies which GPUs the GPU inference service uses; the example specifies the two GPUs with indices 0 and 1.
Set the environment variable to specify which GPUs are used; the command above means GPU 0 and GPU 1 are used.
```
python bert_web_service.py bert_seq20_model/ 9292 #launch gpu inference service
python bert_web_service.py bert_seq20_model/ 9292 #launch gpu inference service
```
### Run Inference
### HTTP Inference
```
curl -H "Content-Type:application/json" -X POST -d '{"words": "hello", "fetch":["pooled_output"]}' http://127.0.0.1:9292/bert/prediction
......@@ -62,16 +68,17 @@ curl -H "Content-Type:application/json" -X POST -d '{"words": "hello", "fetch":[
### Benchmark
Model: bert_chinese_L-12_H-768_A-12
Model: bert_chinese_L-12_H-768_A-12
GPU: GPU V100 * 1
Device: GPU V100 * 1
CUDA/cuDNN version: CUDA 9.2, cuDNN 7.1.4
Environment: CUDA 9.2, cuDNN 7.1.4
In the test, the 10,000 samples in the sample data are duplicated into 100,000 samples; each client thread sends 1/(number of threads) of the samples, the batch size is 1, max_seq_len is 20, and the time unit is seconds.
In the test, the 10 thousand samples in the sample data are copied into 100 thousand samples. Each client thread sends 1/(number of threads) of the samples. The batch size is 1, the max_seq_len is 20, and the time unit is seconds.
With 4 client threads, the prediction speed can reach 432 samples per second.
Because a single GPU can only compute serially, adding more client threads only reduces the GPU's idle time; therefore, once the number of threads reaches 4, adding more threads does not improve the prediction speed.
When the number of client threads is 4, the prediction speed can reach 432 samples per second.
Because a single GPU can only perform serial calculations internally, increasing the number of client threads can only reduce the idle time of the GPU. Therefore, after the number of threads reaches 4, the increase in the number of threads does not improve the prediction speed.
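As a consistency check (assuming the `total` column below is the wall-clock time for processing all 100,000 samples), the 432 samples/s figure can be recovered from the 4-thread row: 100000 / 231.45 ≈ 432 samples per second.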
| client thread num | prepro | client infer | op0 | op1 | op2 | postpro | total |
| ------------------ | ------ | ------------ | ----- | ------ | ---- | ------- | ------ |
......@@ -81,5 +88,5 @@ curl -H "Content-Type:application/json" -X POST -d '{"words": "hello", "fetch":[
| 12 | 0.32 | 225.26 | 0.029 | 73.87 | 0.53 | 0.078 | 231.45 |
| 16 | 0.23 | 227.26 | 0.022 | 55.61 | 0.4 | 0.056 | 231.9 |
The total time consumption varies as follows:
The following is the client-thread-num vs. latency bar chart:
![bert benchmark](../../../doc/bert-benchmark-batch-size-1.png)
## Semantic Understanding Inference Service
(简体中文|[English](./README.md))
In this example, a BERT model is used for semantic understanding prediction, representing the text as a vector that can be used for further analysis and prediction.
### Getting the Model
This example uses the [BERT Chinese Model](https://www.paddlepaddle.org.cn/hubdetail?name=bert_chinese_L-12_H-768_A-12&en_category=SemanticModel) from [Paddlehub](https://github.com/PaddlePaddle/PaddleHub).
Install paddlehub first
```
pip install paddlehub
```
Run
```
python prepare_model.py 20
```
The argument 20 means the max_seq_len of the BERT model, i.e. the length of a sample after preprocessing.
The server-side config and model files are generated and saved in the folder bert_seq20_model.
The client-side config file is generated and saved in the folder bert_seq20_client.
### Getting the Dict and Sample Data
```
sh get_data.sh
```
The script downloads the Chinese dictionary file vocab.txt and the Chinese sample data data-c.txt.
### Starting the RPC Inference Service
Run
```
python -m paddle_serving_server.serve --model bert_seq20_model/ --port 9292 #start cpu inference service
```
Or
```
python -m paddle_serving_server_gpu.serve --model bert_seq20_model/ --port 9292 --gpu_ids 0 #launch gpu inference service on GPU 0
```
### Run Inference
Before running inference, paddle_serving_app needs to be installed; the module provides data preprocessing methods for the BERT model.
```
pip install paddle_serving_app
```
Run
```
head data-c.txt | python bert_client.py --model bert_seq20_client/serving_client_conf.prototxt
```
The client reads the data in data-c.txt and sends prediction requests. The prediction result is the vector representation of the text (since there is a lot of data, the script does not print the output); the server address can be changed in the script.
### Starting the HTTP Inference Service
```
export CUDA_VISIBLE_DEVICES=0,1
```
The environment variable specifies which GPUs the GPU inference service uses; the example specifies the two GPUs with indices 0 and 1.
```
python bert_web_service.py bert_seq20_model/ 9292 #launch gpu inference service
```
### Run Inference
```
curl -H "Content-Type:application/json" -X POST -d '{"words": "hello", "fetch":["pooled_output"]}' http://127.0.0.1:9292/bert/prediction
```
### Benchmark
Model: bert_chinese_L-12_H-768_A-12
Device: GPU V100 * 1
Environment: CUDA 9.2, cuDNN 7.1.4
In the test, the 10,000 samples in the sample data are duplicated into 100,000 samples; each client thread sends 1/(number of threads) of the samples, the batch size is 1, max_seq_len is 20, and the time unit is seconds.
With 4 client threads, the prediction speed can reach 432 samples per second.
Because a single GPU can only compute serially, adding more client threads only reduces the GPU's idle time; therefore, once the number of threads reaches 4, adding more threads does not improve the prediction speed.
| client thread num | prepro | client infer | op0 | op1 | op2 | postpro | total |
| ------------------ | ------ | ------------ | ----- | ------ | ---- | ------- | ------ |
| 1 | 3.05 | 290.54 | 0.37 | 239.15 | 6.43 | 0.71 | 365.63 |
| 4 | 0.85 | 213.66 | 0.091 | 200.39 | 1.62 | 0.2 | 231.45 |
| 8 | 0.42 | 223.12 | 0.043 | 110.99 | 0.8 | 0.098 | 232.05 |
| 12 | 0.32 | 225.26 | 0.029 | 73.87 | 0.53 | 0.078 | 231.45 |
| 16 | 0.23 | 227.26 | 0.022 | 55.61 | 0.4 | 0.056 | 231.9 |
The total time consumption varies as follows:
![bert benchmark](../../../doc/bert-benchmark-batch-size-1.png)
# Fit a line example, prediction through RPC service
Start RPC service
([简体中文](./README_CN.md)|English)
## Start RPC service
``` shell
sh get_data.sh
python test_server.py uci_housing_model/
```
Prediction
## Prediction
``` shell
python test_client.py uci_housing_client/serving_client_conf.prototxt
```
# Prediction through HTTP service
## Prediction through HTTP service
Start a web service with default web service hosting modules
``` shell
python -m paddle_serving_server.web_serve --model uci_housing_model/ --thread 10 --name uci --port 9393
```
Prediction through HTTP POST
## Prediction through HTTP POST
``` shell
curl -H "Content-Type:application/json" -X POST -d '{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332], "fetch":["price"]}' http://127.0.0.1:9393/uci/prediction
```
# Linear regression: an RPC inference service example
(简体中文|[English](./README.md))
## Start the RPC server
``` shell
sh get_data.sh
python test_server.py uci_housing_model/
```
## RPC prediction
``` shell
python test_client.py uci_housing_client/serving_client_conf.prototxt
```
## Start the HTTP server
Start a web service with default web service hosting modules
``` shell
python -m paddle_serving_server.web_serve --model uci_housing_model/ --thread 10 --name uci --port 9393
```
## HTTP prediction
``` shell
curl -H "Content-Type:application/json" -X POST -d '{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332], "fetch":["price"]}' http://127.0.0.1:9393/uci/prediction
```
## Image Classification Example
## Image Classification
The example uses the ResNet50_vd model to perform the ImageNet 1000-class classification task.
([简体中文](./README_CN.md)|English)
### Get the model config and sample data
The example uses the ResNet50_vd model to perform the imagenet 1000 classification task.
### Get model config and sample dataset
```
sh get_model.sh
```
### Run the HTTP inference service
### HTTP Infer
Launch the server side
Launch the server side
```
python image_classification_service.py ResNet50_vd_model workdir 9393 #cpu inference service
python image_classification_service.py ResNet50_vd_model workdir 9393 #cpu inference service
```
```
python image_classification_service_gpu.py ResNet50_vd_model workdir 9393 #gpu inference service
python image_classification_service_gpu.py ResNet50_vd_model workdir 9393 #gpu inference service
```
The client sends an inference request
The client sends an inference request
```
python image_http_client.py
```
### Run the RPC inference service
### RPC Infer
Launch the server side
Launch the server side
```
python -m paddle_serving_server.serve --model ResNet50_vd_model --port 9393 #cpu inference service
python -m paddle_serving_server.serve --model ResNet50_vd_model --port 9393 #cpu inference service
```
```
python -m paddle_serving_server_gpu.serve --model ResNet50_vd_model --port 9393 --gpu_ids 0 #gpu inference service
python -m paddle_serving_server_gpu.serve --model ResNet50_vd_model --port 9393 --gpu_ids 0 #gpu inference service
```
The client sends an inference request
The client sends an inference request
```
python image_rpc_client.py ResNet50_vd_client_config/serving_client_conf.prototxt
```
*In the server-side example the service port is 9393; in the client-side example the data comes from the ./data folder and the server address is local port 9393. The scripts can be modified according to the actual situation.*
*The server-side port in this example is 9393, and the sample data used by the client side is in the folder ./data. These parameters can be modified in practice.*
## Image Classification Example
(简体中文|[English](./README.md))
The example uses the ResNet50_vd model to perform the ImageNet 1000-class classification task.
### Get the model config and sample data
```
sh get_model.sh
```
### Run the HTTP inference service
Launch the server side
```
python image_classification_service.py ResNet50_vd_model workdir 9393 #cpu inference service
```
```
python image_classification_service_gpu.py ResNet50_vd_model workdir 9393 #gpu inference service
```
The client sends an inference request
```
python image_http_client.py
```
### Run the RPC inference service
Launch the server side
```
python -m paddle_serving_server.serve --model ResNet50_vd_model --port 9393 #cpu inference service
```
```
python -m paddle_serving_server_gpu.serve --model ResNet50_vd_model --port 9393 --gpu_ids 0 #gpu inference service
```
The client sends an inference request
```
python image_rpc_client.py ResNet50_vd_client_config/serving_client_conf.prototxt
```
*In the server-side example the service port is 9393; in the client-side example the data comes from the ./data folder and the server address is local port 9393. The scripts can be modified according to the actual situation.*
## IMDB Comment Sentiment Inference Service
## IMDB comment sentiment inference service
([简体中文](./README_CN.md)|English)
### Get model files and sample data
### Get model files and sample data
```
sh get_data.sh
```
The script downloads and unpacks the config files of the cnn, lstm and bow models, together with test_data and train_data.
The downloaded package contains the cnn, lstm and bow model configs along with their test_data and train_data.
### Start the RPC inference service
### Start RPC inference service
```
python -m paddle_serving_server.serve --model imdb_cnn_model/ --port 9292
```
### Run inference
### RPC Infer
```
head test_data/part-0 | python test_client.py imdb_cnn_client_conf/serving_client_conf.prototxt imdb.vocab
```
Predict the first ten samples of test_data/part-0.
### Start the HTTP inference service
It will get the prediction results of the first 10 test cases.
### Start HTTP inference service
```
python text_classify_service.py imdb_cnn_model/ workdir/ 9292 imdb.vocab
```
### Run inference
### HTTP Infer
```
curl -H "Content-Type:application/json" -X POST -d '{"words": "i am very sad | 0", "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction
......@@ -31,13 +33,13 @@ curl -H "Content-Type:application/json" -X POST -d '{"words": "i am very sad | 0
### Benchmark
Device: Intel(R) Xeon(R) Gold 6271 CPU @ 2.60GHz * 48
CPU: Intel(R) Xeon(R) Gold 6271 CPU @ 2.60GHz * 48
Model: [CNN](https://github.com/PaddlePaddle/Serving/blob/develop/python/examples/imdb/nets.py)
Model: [CNN](https://github.com/PaddlePaddle/Serving/blob/develop/python/examples/imdb/nets.py)
server thread num : 16
In the test, the client sends 25,000 test samples in total. The data in the chart are the time costs of a single thread, in seconds. It can be seen that multi-threaded prediction on the client side is clearly faster than single-threaded prediction; with 16 threads the prediction speed is 8.7 times that of a single thread.
In this test, the client sends 25,000 test samples in total. The bar chart given later shows the latency of a single thread, in seconds; it shows that prediction efficiency improves greatly with multiple threads compared to a single thread, with an 8.7x improvement at 16 threads.
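As a quick check, taking the single-thread and 16-thread totals from the full benchmark table (31.41 s and 3.63 s), 31.41 / 3.63 ≈ 8.7, which matches the stated speed-up.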
| client thread num | prepro | client infer | op0 | op1 | op2 | postpro | total |
| ------------------ | ------ | ------------ | ------ | ----- | ------ | ------- | ----- |
......@@ -49,6 +51,6 @@ server thread num : 16
| 20 | 0.049 | 3.77 | 0.0047 | 1.03 | 0.0025 | 0.0022 | 3.91 |
| 24 | 0.041 | 3.86 | 0.0039 | 0.85 | 0.002 | 0.0017 | 3.98 |
The total prediction time varies as follows:
The thread-latency bar chart is as follows:
![total cost](../../../doc/imdb-benchmark-server-16.png)
## IMDB Comment Sentiment Inference Service
(简体中文|[English](./README.md))
### Get model files and sample data
```
sh get_data.sh
```
The script downloads and unpacks the config files of the cnn, lstm and bow models, together with test_data and train_data.
### Start the RPC inference service
```
python -m paddle_serving_server.serve --model imdb_cnn_model/ --port 9292
```
### Run inference
```
head test_data/part-0 | python test_client.py imdb_cnn_client_conf/serving_client_conf.prototxt imdb.vocab
```
Predict the first ten samples of test_data/part-0.
### Start the HTTP inference service
```
python text_classify_service.py imdb_cnn_model/ workdir/ 9292 imdb.vocab
```
### Run inference
```
curl -H "Content-Type:application/json" -X POST -d '{"words": "i am very sad | 0", "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction
```
### Benchmark
Device: Intel(R) Xeon(R) Gold 6271 CPU @ 2.60GHz * 48
Model: [CNN](https://github.com/PaddlePaddle/Serving/blob/develop/python/examples/imdb/nets.py)
server thread num : 16
In the test, the client sends 25,000 test samples in total. The data in the chart are the time costs of a single thread, in seconds. It can be seen that multi-threaded prediction on the client side is clearly faster than single-threaded prediction; with 16 threads the prediction speed is 8.7 times that of a single thread.
| client thread num | prepro | client infer | op0 | op1 | op2 | postpro | total |
| ------------------ | ------ | ------------ | ------ | ----- | ------ | ------- | ----- |
| 1 | 1.09 | 28.79 | 0.094 | 20.59 | 0.047 | 0.034 | 31.41 |
| 4 | 0.22 | 7.41 | 0.023 | 5.01 | 0.011 | 0.0098 | 8.01 |
| 8 | 0.11 | 4.7 | 0.012 | 2.61 | 0.0062 | 0.0049 | 5.01 |
| 12 | 0.081 | 4.69 | 0.0078 | 1.72 | 0.0042 | 0.0035 | 4.91 |
| 16 | 0.058 | 3.46 | 0.0061 | 1.32 | 0.0033 | 0.003 | 3.63 |
| 20 | 0.049 | 3.77 | 0.0047 | 1.03 | 0.0025 | 0.0022 | 3.91 |
| 24 | 0.041 | 3.86 | 0.0039 | 0.85 | 0.002 | 0.0017 | 3.98 |
The total prediction time varies as follows:
![total cost](../../../doc/imdb-benchmark-server-16.png)
## Using the Timeline Tool
## Timeline Tool Tutorial
The serving framework has a built-in function that records timestamps for each stage of the inference service. Whether it is enabled is controlled on the client side through environment variables; once enabled, the timing information is printed to the screen.
([简体中文](./README_CN.md)|English)
The serving framework has a built-in function that records the timing of each stage of the inference service. The client controls whether it is enabled through environment variables; once enabled, the timing information is printed to the screen.
```
export FLAGS_profile_client=1 #enable timing at each stage on the client side
export FLAGS_profile_server=1 #enable timing at each stage on the server side
export FLAGS_profile_client=1 #turn on the client timing tool for each stage
export FLAGS_profile_server=1 #turn on the server timing tool for each stage
```
After this function is enabled, the client prints the corresponding log information to standard output during prediction.
After enabling this function, the client will print the corresponding log information to standard output during the prediction process.
To show the time consumption of each stage more intuitively, scripts are provided to further analyze and process the log file.
To show the time consumption of each stage more intuitively, a script is provided to further analyze and process the log file.
To use it, first save the client's output to a file, taking `profile` as an example.
To use it, first save the output of the client to a file, taking `profile` as an example.
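As a concrete illustration (the fit_a_line client is an assumed example here; any Serving client works the same way), the profiling output can be captured like this:

```
# enable client-side profiling and save the timing log to a file named `profile`
export FLAGS_profile_client=1
python test_client.py uci_housing_client/serving_client_conf.prototxt > profile 2>&1
```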
```
python show_profile.py profile ${thread_num}
```
Here the thread_num parameter is the number of processes when the client is running; the script uses this parameter to compute the average time of each stage.
Here the `thread_num` parameter is the number of processes when the client is running, and the script will calculate the average time spent in each phase according to this parameter.
The script computes the time of each stage, divides it by the number of threads to get the average, and prints it to standard output.
The script calculates the time spent in each stage, divides by the number of threads to average, and prints to standard output.
```
python timeline_trace.py profile trace
```
The script converts the timing information in the log into JSON format and saves it to the trace file. The trace file can be visualized with the tracing feature of the Chrome browser.
The script converts the time-dot information in the log into a json format and saves it to a trace file. The trace file can be visualized through the tracing function of the Chrome browser.
Specifically: open the Chrome browser, enter chrome://tracing/ in the address bar to go to the tracing page, click the load button and open the saved trace file; the timing information of each stage of the inference service is then visualized.
Specific operation: Open the chrome browser, enter `chrome://tracing/` in the address bar, jump to the tracing page, click the `load` button, and open the saved trace file to visualize the time information of each stage of the prediction service.
The result is shown in the figure below, which shows the per-stage timeline of the GPU inference service for the [bert example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert), with the server predicting on 4 GPUs and the client running 4 processes at batch size 1. Here bert_pre is the client-side data preprocessing stage, client_infer is the stage in which the client sends prediction requests and receives results, process is the client process id, and the second row of each process shows the timeline of each server op.
The data visualization output is shown as follow, it uses [bert as service example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert) GPU inference service. The server starts 4 GPU prediction, the client starts 4 `processes`, and the timeline of each stage when the batch size is 1. Among them, `bert_pre` represents the data preprocessing stage of the client, and `client_infer` represents the stage where the client completes sending and receiving prediction requests. `process` represents the process number of the client, and the second line of each process shows the timeline of each op of the server.
![timeline](../../../doc/timeline-example.png)
## Using the Timeline Tool
(简体中文|[English](./README.md))
The serving framework has a built-in function that records timestamps for each stage of the inference service. Whether it is enabled is controlled on the client side through environment variables; once enabled, the timing information is printed to the screen.
```
export FLAGS_profile_client=1 #enable timing at each stage on the client side
export FLAGS_profile_server=1 #enable timing at each stage on the server side
```
After this function is enabled, the client prints the corresponding log information to standard output during prediction.
To show the time consumption of each stage more intuitively, scripts are provided to further analyze and process the log file.
To use it, first save the client's output to a file, taking `profile` as an example.
```
python show_profile.py profile ${thread_num}
```
Here the thread_num parameter is the number of processes when the client is running; the script uses this parameter to compute the average time of each stage.
The script computes the time of each stage, divides it by the number of threads to get the average, and prints it to standard output.
```
python timeline_trace.py profile trace
```
The script converts the timing information in the log into JSON format and saves it to the trace file. The trace file can be visualized with the tracing feature of the Chrome browser.
Specifically: open the Chrome browser, enter chrome://tracing/ in the address bar to go to the tracing page, click the load button and open the saved trace file; the timing information of each stage of the inference service is then visualized.
The result is shown in the figure below, which shows the per-stage timeline of the GPU inference service for the [bert example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert), with the server predicting on 4 GPUs and the client running 4 processes at batch size 1. Here bert_pre is the client-side data preprocessing stage, client_infer is the stage in which the client sends prediction requests and receives results, process is the client process id, and the second row of each process shows the timeline of each server op.
![timeline](../../../doc/timeline-example.png)
......@@ -63,19 +63,25 @@ class WebService(object):
abort(400)
if "fetch" not in request.json:
abort(400)
feed, fetch = self.preprocess(request.json, request.json["fetch"])
if isinstance(feed, list):
fetch_map_batch = client_service.batch_predict(
feed_batch=feed, fetch=fetch)
fetch_map_batch = self.postprocess(
feed=request.json, fetch=fetch, fetch_map=fetch_map_batch)
result = {"result": fetch_map_batch}
elif isinstance(feed, dict):
if "fetch" in feed:
del feed["fetch"]
fetch_map = client_service.predict(feed=feed, fetch=fetch)
result = self.postprocess(
feed=request.json, fetch=fetch, fetch_map=fetch_map)
try:
feed, fetch = self.preprocess(request.json,
request.json["fetch"])
if isinstance(feed, list):
fetch_map_batch = client_service.predict(
feed_batch=feed, fetch=fetch)
fetch_map_batch = self.postprocess(
feed=request.json,
fetch=fetch,
fetch_map=fetch_map_batch)
result = {"result": fetch_map_batch}
elif isinstance(feed, dict):
if "fetch" in feed:
del feed["fetch"]
fetch_map = client_service.predict(feed=feed, fetch=fetch)
result = self.postprocess(
feed=request.json, fetch=fetch, fetch_map=fetch_map)
except ValueError:
result = {"result": "Request Value Error"}
return result
app_instance.run(host="0.0.0.0",
......
......@@ -94,21 +94,26 @@ class WebService(object):
client.connect([endpoint])
while True:
request_json = inputqueue.get()
feed, fetch = self.preprocess(request_json, request_json["fetch"])
if isinstance(feed, list):
fetch_map_batch = client.batch_predict(
feed_batch=feed, fetch=fetch)
fetch_map_batch = self.postprocess(
feed=request_json, fetch=fetch, fetch_map=fetch_map_batch)
result = {"result": fetch_map_batch}
elif isinstance(feed, dict):
if "fetch" in feed:
del feed["fetch"]
fetch_map = client.predict(feed=feed, fetch=fetch)
result = self.postprocess(
feed=request_json, fetch=fetch, fetch_map=fetch_map)
self.output_queue.put(result)
try:
feed, fetch = self.preprocess(request_json,
request_json["fetch"])
if isinstance(feed, list):
fetch_map_batch = client.predict(
feed_batch=feed, fetch=fetch)
fetch_map_batch = self.postprocess(
feed=request_json,
fetch=fetch,
fetch_map=fetch_map_batch)
result = {"result": fetch_map_batch}
elif isinstance(feed, dict):
if "fetch" in feed:
del feed["fetch"]
fetch_map = client.predict(feed=feed, fetch=fetch)
result = self.postprocess(
feed=request_json, fetch=fetch, fetch_map=fetch_map)
self.output_queue.put(result)
except ValueError:
self.output_queue.put(-1)
def _launch_web_service(self, gpu_num):
app_instance = Flask(__name__)
......@@ -152,6 +157,8 @@ class WebService(object):
if self.idx >= len(self.gpus):
self.idx = 0
result = self.output_queue.get()
if not isinstance(result, dict) and result == -1:
result = {"result": "Request Value Error"}
return result
'''
feed, fetch = self.preprocess(request.json, request.json["fetch"])
......
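To make the hooks above concrete, here is a hedged sketch of how a user-defined service plugs into `preprocess`/`postprocess`. The hook signatures follow the handler code in this diff, while the constructor and the `load_model_config`/`prepare_server`/`run_server` calls are assumptions based on the web-service examples and may differ slightly between versions.

``` python
# Hedged sketch: hook signatures follow the handler code above; the
# constructor and server-setup calls are assumptions and may differ.
from paddle_serving_server.web_service import WebService

class UciService(WebService):
    def preprocess(self, request_json, fetch):
        # return a single feed dict, or a list of feed dicts for batch prediction
        return {"x": request_json["x"]}, fetch

    def postprocess(self, feed, fetch, fetch_map):
        # reshape or rename the raw outputs for the HTTP response
        return fetch_map

uci_service = UciService(name="uci")
uci_service.load_model_config("uci_housing_model")
uci_service.prepare_server(workdir="workdir", port=9393, device="cpu")
uci_service.run_server()
```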
......@@ -48,30 +48,6 @@ function rerun() {
exit 1
}
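The body of `rerun` is elided in this hunk. Purely for orientation, a sketch of a retry helper with the same calling convention (`rerun "cmd" N`, as used by `build_app` below) might look like the following; this is not the repository's actual implementation.

``` shell
# Illustrative only: run a command and retry up to N times before failing.
function rerun() {
    local command="$1"
    local times=$2
    for ((i = 1; i <= times; i++)); do
        if eval "$command"; then
            return 0
        fi
        echo "command failed, attempt $i/$times: $command"
    done
    exit 1
}
```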
function build_app() {
local TYPE=$1
local DIRNAME=build-app-$TYPE
mkdir $DIRNAME # pwd: /Serving
cd $DIRNAME # pwd: /Serving/build-app-$TYPE
pip install numpy sentencepiece
case $TYPE in
CPU|GPU)
cmake -DPYTHON_INCLUDE_DIR=$PYTHONROOT/include/python2.7/ \
-DPYTHON_LIBRARIES=$PYTHONROOT/lib/libpython2.7.so \
-DPYTHON_EXECUTABLE=$PYTHONROOT/bin/python \
-DAPP=ON ..
rerun "make -j2 >/dev/null" 3 # due to some network reasons, compilation may fail
pip install -U python/dist/paddle_serving_app* >/dev/null
;;
*)
echo "error type"
exit 1
;;
esac
echo "build app $TYPE part finished as expected."
cd .. # pwd: /Serving
}
function build_client() {
local TYPE=$1
local DIRNAME=build-client-$TYPE
......@@ -211,16 +187,17 @@ function python_run_criteo_ctr_with_cube() {
cp ../../../build-server-$TYPE/output/bin/cube* ./cube/
mkdir -p $PYTHONROOT/lib/python2.7/site-packages/paddle_serving_server/serving-cpu-avx-openblas-0.1.3/
yes | cp ../../../build-server-$TYPE/output/demo/serving/bin/serving $PYTHONROOT/lib/python2.7/site-packages/paddle_serving_server/serving-cpu-avx-openblas-0.1.3/
sh cube_prepare.sh &
check_cmd "mkdir work_dir1 && cp cube/conf/cube.conf ./work_dir1/"
python test_server.py ctr_serving_model_kv &
check_cmd "python test_client.py ctr_client_conf/serving_client_conf.prototxt ./ut_data >score"
tail -n 2 score | awk 'NR==1'
tail -n 2 score
AUC=$(tail -n 2 score | awk 'NR==1')
VAR2="0.67" #TODO: temporarily relax the threshold to 0.67
RES=$( echo "$AUC>$VAR2" | bc )
if [[ $RES -eq 0 ]]; then
echo "error with criteo_ctr_with_cube inference auc test, auc should > 0.70"
echo "error with criteo_ctr_with_cube inference auc test, auc should > 0.67"
exit 1
fi
echo "criteo_ctr_with_cube inference auc test success"
......@@ -246,7 +223,7 @@ function python_run_criteo_ctr_with_cube() {
VAR2="0.67" #TODO: temporarily relax the threshold to 0.67
RES=$( echo "$AUC>$VAR2" | bc )
if [[ $RES -eq 0 ]]; then
echo "error with criteo_ctr_with_cube inference auc test, auc should > 0.70"
echo "error with criteo_ctr_with_cube inference auc test, auc should > 0.67"
exit 1
fi
echo "criteo_ctr_with_cube inference auc test success"
......@@ -277,7 +254,6 @@ function main() {
init # pwd: /Serving
build_client $TYPE # pwd: /Serving
build_server $TYPE # pwd: /Serving
build_app $TYPE # pwd: /Serving
python_run_test $TYPE # pwd: /Serving
echo "serving $TYPE part finished as expected."
}
......