Unverified commit ba1595b0 authored by MRXLT and committed by GitHub

Merge pull request #375 from wangjiawei04/v0.2.0-doc

Merge pull request #374 from MRXLT/0.2.0-fix-gpu
...@@ -154,7 +154,7 @@ curl -H "Content-Type:application/json" -X POST -d '{"url": "https://paddle-serv
{"label":"daisy","prob":0.9341403245925903}
```
<h3 align="center">More Demos</h3>
| Key | Value |
| :----------------- | :----------------------------------------------------------- |
...@@ -249,8 +249,9 @@ curl -H "Content-Type:application/json" -X POST -d '{"url": "https://paddle-serv
- [Compile from source code(Chinese)](doc/COMPILE.md)
### About Efficiency
- [How to profile Paddle Serving latency?](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/util)
- [CPU Benchmarks(Chinese)](doc/BENCHMARKING.md)
- [GPU Benchmarks(Chinese)](doc/GPU_BENCHMARKING.md)
### FAQ
- [FAQ(Chinese)](doc/FAQ.md)
......
...@@ -51,6 +51,169 @@ fetch_map = client.predict(feed={"x": data}, fetch=["price"])
print(fetch_map)
```
Here, the `client.predict` function takes two arguments. `feed` is a Python dict that maps the model's input variable aliases to their values, and `fetch` lists the output variables to be returned by the server. In this example, the tensor aliases assigned when the servable model was saved during training are `"x"` and `"price"`.
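For orientation, the full client-side flow around this call looks roughly like the sketch below. It assumes the uci_housing quick-start model and a server already listening on 127.0.0.1:9292 (adjust the port to whatever the server was started with); the config path comes from the client-side files generated when the model was saved:
```python
from paddle_serving_client import Client

client = Client()
# client-side config generated when the servable model was saved
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])

# "x" is the alias of the input tensor, "price" is the variable to fetch
data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583,
        -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]
fetch_map = client.predict(feed={"x": data}, fetch=["price"])
print(fetch_map)
```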
<h2 align="center">Pre-built Services with Paddle Serving</h2>
<h3 align="center">Chinese Word Segmentation</h3>
- **Description**:
``` shell
An HTTP service for Chinese word segmentation that can be deployed with a single command.
```
- **Download servable package**:
``` shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/lac/lac_model_jieba_web.tar.gz
```
- **Start the web service**:
``` shell
tar -xzf lac_model_jieba_web.tar.gz
python lac_web_service.py jieba_server_model/ lac_workdir 9292
```
- **Client request example**:
``` shell
curl -H "Content-Type:application/json" -X POST -d '{"words": "我爱北京天安门", "fetch":["word_seg"]}' http://127.0.0.1:9292/lac/prediction
```
- **Returned result example**:
``` shell
{"word_seg":"我|爱|北京|天安门"}
```
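The same request can also be issued from Python with the `requests` package; a minimal sketch, assuming the service above is running locally on port 9292:
```python
import json
import requests

# same payload as the curl example above
payload = {"words": "我爱北京天安门", "fetch": ["word_seg"]}
resp = requests.post("http://127.0.0.1:9292/lac/prediction",
                     headers={"Content-Type": "application/json"},
                     data=json.dumps(payload))
print(resp.json())  # expected: {"word_seg": "我|爱|北京|天安门"}
```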
<h3 align="center">Image Classification</h3>
- **Description**:
``` shell
The image classification model is trained on the ImageNet dataset. The service returns one label and its probability.
```
- **Download servable package**:
``` shell
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/imagenet-example/imagenet_demo.tar.gz
```
- **Start the web service**:
``` shell
tar -xzf imagenet_demo.tar.gz
python image_classification_service_demo.py resnet50_serving_model
```
- **Client request example**:
<p align="center">
    <br>
<img src='https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg' width = "200" height = "200">
    <br>
</p>
``` shell
curl -H "Content-Type:application/json" -X POST -d '{"url": "https://paddle-serving.bj.bcebos.com/imagenet-example/daisy.jpg", "fetch": ["score"]}' http://127.0.0.1:9292/image/prediction
```
- **Returned result example**:
``` shell
{"label":"daisy","prob":0.9341403245925903}
```
<h3 align="center">More Demos</h3>

| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | Bert-Base-Baike |
| Download Link | [https://paddle-serving.bj.bcebos.com/bert_example/bert_seq128.tar.gz](https://paddle-serving.bj.bcebos.com/bert_example%2Fbert_seq128.tar.gz) |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert |
| Description | Obtain the semantic representation of a Chinese sentence |

| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | Resnet50-Imagenet |
| Download Link | [https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet50_vd.tar.gz](https://paddle-serving.bj.bcebos.com/imagenet-example%2FResNet50_vd.tar.gz) |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet |
| Description | Obtain the image semantic representation of a picture |

| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | Resnet101-Imagenet |
| Download Link | https://paddle-serving.bj.bcebos.com/imagenet-example/ResNet101_vd.tar.gz |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imagenet |
| Description | Obtain the image semantic representation of a picture |

| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | CNN-IMDB |
| Download Link | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
| Description | Obtain the category and its probability from an English sentence |

| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | LSTM-IMDB |
| Download Link | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
| Description | Obtain the category and its probability from an English sentence |

| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | BOW-IMDB |
| Download Link | https://paddle-serving.bj.bcebos.com/imdb-demo/imdb_model.tar.gz |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb |
| Description | Obtain the category and its probability from an English sentence |

| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | Jieba-LAC |
| Download Link | https://paddle-serving.bj.bcebos.com/lac/lac_model.tar.gz |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/lac |
| Description | Segment a Chinese sentence into words |

| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | DNN-CTR |
| Download Link | None (get the model with [local_train.py](./python/examples/criteo_ctr/local_train.py)) |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/criteo_ctr |
| Description | Obtain the click-through probability from the feature vector of an item |

| Key | Value |
| :----------------- | :----------------------------------------------------------- |
| Model Name | DNN-CTR (with cube) |
| Download Link | None (get the model with [local_train.py](python/examples/criteo_ctr_with_cube/local_train.py)) |
| Client/Server Code | https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/criteo_ctr_with_cube |
| Description | Obtain the click-through probability from the feature vector of an item |
<h2 align="center">Documents</h2>
### New to Paddle Serving
- [How to save a servable model for Paddle Serving?](doc/SAVE_CN.md)
- [An end-to-end walkthrough from training to deployment](doc/TRAIN_TO_SERVICE_CN.md)
- [Build Bert-As-Service in 10 minutes](doc/BERT_10_MINS_CN.md)
### Developer tutorials
- [How to configure the computation graph on the server side?](doc/SERVER_DAG_CN.md)
- [How to develop a new general operator?](doc/NEW_OPERATOR_CN.md)
- [How to use the Go client with Paddle Serving?](doc/IMDB_GO_CLIENT_CN.md)
- [How to compile Paddle Serving?](doc/COMPILE_CN.md)
### About Paddle Serving performance
- [How to profile Paddle Serving performance?](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/util/)
- [CPU benchmarks](doc/BENCHMARKING.md)
- [GPU benchmarks](doc/GPU_BENCHMARKING.md)
### FAQ
- [FAQ](doc/deprecated/FAQ.md)
## Documents
......
## Bert as Service
([简体中文](./README_CN.md)|English)
In this example, a BERT model is used for semantic understanding prediction: the text is represented as a vector, which can be used for further analysis and prediction.
### Getting Model
This example uses the [BERT Chinese Model](https://www.paddlepaddle.org.cn/hubdetail?name=bert_chinese_L-12_H-768_A-12&en_category=SemanticModel) from [Paddlehub](https://github.com/PaddlePaddle/PaddleHub).
Install paddlehub first:
```
pip install paddlehub
```
Run:
```
python prepare_model.py 20
```
The argument 20 is the max_seq_len of the BERT model, i.e. the sample length after preprocessing.
The server-side config and model files are saved in the folder bert_seq20_model.
The client-side config file is saved in the folder bert_seq20_client.
### Getting Dict and Sample Dataset
```
sh get_data.sh
```
This script downloads the Chinese dictionary file vocab.txt and the Chinese sample data data-c.txt.
### RPC Inference Service
Run:
```
python -m paddle_serving_server.serve --model bert_seq20_model/ --port 9292 #cpu inference service
```
Or:
```
python -m paddle_serving_server_gpu.serve --model bert_seq20_model/ --port 9292 --gpu_ids 0 #launch gpu inference service at GPU 0
```
### RPC Inference
Before prediction, install paddle_serving_app, which provides data preprocessing for the BERT model:
```
pip install paddle_serving_app
```
Run:
```
head data-c.txt | python bert_client.py --model bert_seq20_client/serving_client_conf.prototxt
```
The client reads data from data-c.txt and sends prediction requests; the result is the vector representation of the text (since the output is large, the script does not print it). The server address can be changed in the script.
### HTTP Inference Service
```
export CUDA_VISIBLE_DEVICES=0,1
```
The environment variable specifies which GPUs the service uses; the command above selects GPU 0 and GPU 1.
```
python bert_web_service.py bert_seq20_model/ 9292 #launch gpu inference service
```
### HTTP Inference
```
curl -H "Content-Type:application/json" -X POST -d '{"words": "hello", "fetch":["pooled_output"]}' http://127.0.0.1:9292/bert/prediction
...@@ -62,16 +68,17 @@ curl -H "Content-Type:application/json" -X POST -d '{"words": "hello", "fetch":[
### Benchmark
Model: bert_chinese_L-12_H-768_A-12
GPU: V100 * 1
CUDA/cuDNN version: CUDA 9.2, cuDNN 7.1.4
In the test, the 10,000 samples in the sample data are replicated into 100,000 samples. Each client thread sends 1/N of the samples, where N is the number of threads; the batch size is 1, max_seq_len is 20, and times are reported in seconds.
With 4 client threads, the prediction speed reaches 432 samples per second.
Because a single GPU can only compute serially, additional client threads merely reduce GPU idle time, so beyond 4 threads more threads do not improve prediction speed.
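The multi-threaded methodology above can be sketched as follows. This is not the benchmark script shipped with the repository, only an illustration of how the samples are split across N client threads and timed; `predict_one` is a placeholder for a real paddle_serving_client call:
```python
import time
from concurrent.futures import ThreadPoolExecutor


def predict_one(sample):
    """Placeholder: in the real benchmark this issues a client.predict(...) call."""
    pass


def run_worker(samples):
    start = time.time()
    for sample in samples:  # batch size 1: one request per sample
        predict_one(sample)
    return time.time() - start


def benchmark(samples, thread_num):
    # each worker gets 1/thread_num of the samples
    chunks = [samples[i::thread_num] for i in range(thread_num)]
    with ThreadPoolExecutor(max_workers=thread_num) as pool:
        worker_times = list(pool.map(run_worker, chunks))
    return max(worker_times)  # wall time is bounded by the slowest worker


if __name__ == "__main__":
    fake_samples = list(range(100000))  # stands in for the replicated data-c.txt lines
    print(benchmark(fake_samples, thread_num=4))
```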
| client thread num | prepro | client infer | op0 | op1 | op2 | postpro | total |
| ------------------ | ------ | ------------ | ----- | ------ | ---- | ------- | ------ |
...@@ -81,5 +88,5 @@ curl -H "Content-Type:application/json" -X POST -d '{"words": "hello", "fetch":[
| 12 | 0.32 | 225.26 | 0.029 | 73.87 | 0.53 | 0.078 | 231.45 |
| 16 | 0.23 | 227.26 | 0.022 | 55.61 | 0.4 | 0.056 | 231.9 |
The following bar chart shows total latency versus the number of client threads:
![bert benchmark](../../../doc/bert-benchmark-batch-size-1.png)
## Semantic Understanding Prediction Service
(简体中文|[English](./README.md))
This example uses a BERT model for semantic understanding prediction: the text is represented as a vector, which can be used for further analysis and prediction.
### Getting the model
The example uses the [BERT Chinese Model](https://www.paddlepaddle.org.cn/hubdetail?name=bert_chinese_L-12_H-768_A-12&en_category=SemanticModel) from [Paddlehub](https://github.com/PaddlePaddle/PaddleHub).
Install paddlehub first:
```
pip install paddlehub
```
Run:
```
python prepare_model.py 20
```
The argument 20 is the max_seq_len of the BERT model, i.e. the sample length after preprocessing.
The server-side config and model files are generated in the folder bert_seq20_model.
The client-side config file is generated in the folder bert_seq20_client.
### Getting the dictionary and sample data
```
sh get_data.sh
```
The script downloads the Chinese dictionary vocab.txt and the Chinese sample data data-c.txt.
### Starting the RPC inference service
Run:
```
python -m paddle_serving_server.serve --model bert_seq20_model/ --port 9292 #start the cpu inference service
```
Or:
```
python -m paddle_serving_server_gpu.serve --model bert_seq20_model/ --port 9292 --gpu_ids 0 #start the gpu inference service on GPU 0
```
### Running prediction over RPC
Before prediction, install paddle_serving_app, which provides data preprocessing for the BERT model:
```
pip install paddle_serving_app
```
Run:
```
head data-c.txt | python bert_client.py --model bert_seq20_client/serving_client_conf.prototxt
```
The client reads data from data-c.txt and sends prediction requests; the result is the vector representation of the text (since the output is large, the script does not print it). The server address can be changed in the script.
### Starting the HTTP inference service
```
export CUDA_VISIBLE_DEVICES=0,1
```
The environment variable specifies which GPUs the GPU inference service uses; the example selects the two GPUs with indices 0 and 1.
```
python bert_web_service.py bert_seq20_model/ 9292 #start the gpu inference service
```
### Running prediction over HTTP
```
curl -H "Content-Type:application/json" -X POST -d '{"words": "hello", "fetch":["pooled_output"]}' http://127.0.0.1:9292/bert/prediction
```
### Benchmark
Model: bert_chinese_L-12_H-768_A-12
Device: GPU V100 * 1
Environment: CUDA 9.2, cuDNN 7.1.4
In the test, the 10,000 samples in the sample data are replicated into 100,000 samples. Each client thread sends 1/N of the samples, where N is the number of threads; the batch size is 1, max_seq_len is 20, and times are reported in seconds.
With 4 client threads, the prediction speed reaches 432 samples per second.
Because a single GPU can only compute serially, additional client threads merely reduce GPU idle time, so beyond 4 threads more threads do not improve prediction speed.
| client thread num | prepro | client infer | op0 | op1 | op2 | postpro | total |
| ------------------ | ------ | ------------ | ----- | ------ | ---- | ------- | ------ |
| 1 | 3.05 | 290.54 | 0.37 | 239.15 | 6.43 | 0.71 | 365.63 |
| 4 | 0.85 | 213.66 | 0.091 | 200.39 | 1.62 | 0.2 | 231.45 |
| 8 | 0.42 | 223.12 | 0.043 | 110.99 | 0.8 | 0.098 | 232.05 |
| 12 | 0.32 | 225.26 | 0.029 | 73.87 | 0.53 | 0.078 | 231.45 |
| 16 | 0.23 | 227.26 | 0.022 | 55.61 | 0.4 | 0.056 | 231.9 |
The total latency varies with the number of client threads as follows:
![bert benchmark](../../../doc/bert-benchmark-batch-size-1.png)
# Fit a line example, prediction through rpc service
([简体中文](./README_CN.md)|English)
## Start RPC service
``` shell
sh get_data.sh
python test_server.py uci_housing_model/
```
## Prediction
``` shell
python test_client.py uci_housing_client/serving_client_conf.prototxt
```
## Prediction through HTTP service
Start a web service with the default web service hosting module:
``` shell
python -m paddle_serving_server.web_serve --model uci_housing_model/ --thread 10 --name uci --port 9393
```
## Prediction through HTTP POST
``` shell
curl -H "Content-Type:application/json" -X POST -d '{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332], "fetch":["price"]}' http://127.0.0.1:9393/uci/prediction
```
# Linear regression: an RPC inference service example
(简体中文|[English](./README.md))
## Start the RPC server
``` shell
sh get_data.sh
python test_server.py uci_housing_model/
```
## RPC prediction
``` shell
python test_client.py uci_housing_client/serving_client_conf.prototxt
```
## Start the HTTP server
Start a web service with the default web service hosting module:
``` shell
python -m paddle_serving_server.web_serve --model uci_housing_model/ --thread 10 --name uci --port 9393
```
## HTTP prediction
``` shell
curl -H "Content-Type:application/json" -X POST -d '{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583, -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332], "fetch":["price"]}' http://127.0.0.1:9393/uci/prediction
```
## Image Classification
([简体中文](./README_CN.md)|English)
The example uses the ResNet50_vd model to perform the ImageNet 1000-class classification task.
### Get model config and sample dataset
```
sh get_model.sh
```
### HTTP Inference
Launch the server side:
```
python image_classification_service.py ResNet50_vd_model workdir 9393 #cpu inference service
```
```
python image_classification_service_gpu.py ResNet50_vd_model workdir 9393 #gpu inference service
```
Send an inference request from the client:
```
python image_http_client.py
```
### RPC Inference
Launch the server side:
```
python -m paddle_serving_server.serve --model ResNet50_vd_model --port 9393 #cpu inference service
```
```
python -m paddle_serving_server_gpu.serve --model ResNet50_vd_model --port 9393 --gpu_ids 0 #gpu inference service
```
Send an inference request from the client:
```
python image_rpc_client.py ResNet50_vd_client_config/serving_client_conf.prototxt
```
*The server in this example listens on port 9393 and the client reads its sample data from the ./data folder; both can be changed in the scripts as needed.*
## Image classification example
(简体中文|[English](./README.md))
The example uses the ResNet50_vd model to perform the ImageNet 1000-class classification task.
### Get the model config files and sample data
```
sh get_model.sh
```
### Run the HTTP inference service
Launch the server side:
```
python image_classification_service.py ResNet50_vd_model workdir 9393 #cpu inference service
```
```
python image_classification_service_gpu.py ResNet50_vd_model workdir 9393 #gpu inference service
```
Send an inference request from the client:
```
python image_http_client.py
```
### Run the RPC inference service
Launch the server side:
```
python -m paddle_serving_server.serve --model ResNet50_vd_model --port 9393 #cpu inference service
```
```
python -m paddle_serving_server_gpu.serve --model ResNet50_vd_model --port 9393 --gpu_ids 0 #gpu inference service
```
Send an inference request from the client:
```
python image_rpc_client.py ResNet50_vd_client_config/serving_client_conf.prototxt
```
*The server in this example listens on port 9393 and the client reads its sample data from the ./data folder; both can be changed in the scripts as needed.*
## IMDB comment sentiment inference service
([简体中文](./README_CN.md)|English)
### Get model files and sample data
```
sh get_data.sh
```
The downloaded package contains the config files of the cnn, lstm and bow models, together with test_data and train_data.
### Start the RPC inference service
```
python -m paddle_serving_server.serve --model imdb_cnn_model/ --port 9292
```
### RPC Inference
```
head test_data/part-0 | python test_client.py imdb_cnn_client_conf/serving_client_conf.prototxt imdb.vocab
```
This predicts the first 10 samples in test_data/part-0.
### Start the HTTP inference service
```
python text_classify_service.py imdb_cnn_model/ workdir/ 9292 imdb.vocab
```
### HTTP Inference
```
curl -H "Content-Type:application/json" -X POST -d '{"words": "i am very sad | 0", "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction
...@@ -31,13 +33,13 @@ curl -H "Content-Type:application/json" -X POST -d '{"words": "i am very sad | 0
### Benchmark
CPU: Intel(R) Xeon(R) Gold 6271 CPU @ 2.60GHz * 48
Model: [CNN](https://github.com/PaddlePaddle/Serving/blob/develop/python/examples/imdb/nets.py)
Server thread num: 16
In this test the client sends 25,000 test samples in total. The chart below shows per-thread latency in seconds. Multi-threaded clients predict noticeably faster than a single thread: with 16 threads the prediction speed is 8.7 times that of a single thread.
| client thread num | prepro | client infer | op0 | op1 | op2 | postpro | total |
| ------------------ | ------ | ------------ | ------ | ----- | ------ | ------- | ----- |
...@@ -49,6 +51,6 @@ server thread num : 16
| 20 | 0.049 | 3.77 | 0.0047 | 1.03 | 0.0025 | 0.0022 | 3.91 |
| 24 | 0.041 | 3.86 | 0.0039 | 0.85 | 0.002 | 0.0017 | 3.98 |
The thread-count versus latency bar chart is as follows:
![total cost](../../../doc/imdb-benchmark-server-16.png)
## IMDB comment sentiment inference service
(简体中文|[English](./README.md))
### Get model files and sample data
```
sh get_data.sh
```
The script downloads and extracts the config files of the cnn, lstm and bow models, together with test_data and train_data.
### Start the RPC inference service
```
python -m paddle_serving_server.serve --model imdb_cnn_model/ --port 9292
```
### Run prediction over RPC
```
head test_data/part-0 | python test_client.py imdb_cnn_client_conf/serving_client_conf.prototxt imdb.vocab
```
This predicts the first 10 samples in test_data/part-0.
### Start the HTTP inference service
```
python text_classify_service.py imdb_cnn_model/ workdir/ 9292 imdb.vocab
```
### Run prediction over HTTP
```
curl -H "Content-Type:application/json" -X POST -d '{"words": "i am very sad | 0", "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction
```
### Benchmark
Device: Intel(R) Xeon(R) Gold 6271 CPU @ 2.60GHz * 48
Model: [CNN](https://github.com/PaddlePaddle/Serving/blob/develop/python/examples/imdb/nets.py)
Server thread num: 16
In this test the client sends 25,000 test samples in total. The chart below shows per-thread latency in seconds. Multi-threaded clients predict noticeably faster than a single thread: with 16 threads the prediction speed is 8.7 times that of a single thread.
| client thread num | prepro | client infer | op0 | op1 | op2 | postpro | total |
| ------------------ | ------ | ------------ | ------ | ----- | ------ | ------- | ----- |
| 1 | 1.09 | 28.79 | 0.094 | 20.59 | 0.047 | 0.034 | 31.41 |
| 4 | 0.22 | 7.41 | 0.023 | 5.01 | 0.011 | 0.0098 | 8.01 |
| 8 | 0.11 | 4.7 | 0.012 | 2.61 | 0.0062 | 0.0049 | 5.01 |
| 12 | 0.081 | 4.69 | 0.0078 | 1.72 | 0.0042 | 0.0035 | 4.91 |
| 16 | 0.058 | 3.46 | 0.0061 | 1.32 | 0.0033 | 0.003 | 3.63 |
| 20 | 0.049 | 3.77 | 0.0047 | 1.03 | 0.0025 | 0.0022 | 3.91 |
| 24 | 0.041 | 3.86 | 0.0039 | 0.85 | 0.002 | 0.0017 | 3.98 |
The total latency varies with the number of client threads as follows:
![total cost](../../../doc/imdb-benchmark-server-16.png)
## Timeline Tool Tutorial
([简体中文](./README_CN.md)|English)
The serving framework has a built-in profiler that timestamps each stage of an inference request. Whether it is enabled is controlled on the client side through environment variables; once enabled, the timing information is printed to the screen.
```
export FLAGS_profile_client=1 #turn on per-stage timing on the client side
export FLAGS_profile_server=1 #turn on per-stage timing on the server side
```
After enabling this function, the client prints the corresponding log information to standard output during prediction.
To show the time spent in each stage more intuitively, scripts are provided to further analyze the log file.
To use them, first save the client output to a file, named `profile` in this example.
```
python show_profile.py profile ${thread_num}
```
Here the `thread_num` argument is the number of client processes; the script uses it to compute the average time spent in each stage.
The script sums the time spent in each stage, divides it by the number of threads, and prints the averages to standard output.
```
python timeline_trace.py profile trace
```
The script converts the timing information in the log into JSON and saves it to a trace file, which can be visualized with the tracing feature of the Chrome browser.
To do so, open Chrome, enter `chrome://tracing/` in the address bar, click the `load` button, and open the saved trace file to visualize the timing of each stage of the inference service.
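For reference, the tracing page consumes Chrome's standard trace-event JSON format. The snippet below writes a tiny hand-made trace file with two spans (the values are purely illustrative, not output of timeline_trace.py), which already loads in `chrome://tracing/`:
```python
import json

# Two illustrative duration spans, marked with begin ("B") and end ("E") events.
# Timestamps ("ts") are in microseconds; "pid"/"tid" identify the process/thread row.
events = [
    {"name": "bert_pre", "cat": "client", "ph": "B", "pid": 0, "tid": 0, "ts": 0},
    {"name": "bert_pre", "cat": "client", "ph": "E", "pid": 0, "tid": 0, "ts": 1500},
    {"name": "client_infer", "cat": "client", "ph": "B", "pid": 0, "tid": 0, "ts": 1500},
    {"name": "client_infer", "cat": "client", "ph": "E", "pid": 0, "tid": 0, "ts": 9000},
]

with open("trace", "w") as f:
    json.dump(events, f)
```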
The visualization below uses the GPU inference service of the [bert as service example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert), with the server predicting on 4 GPUs, 4 client processes, and batch size 1. `bert_pre` is the client-side data preprocessing stage, `client_infer` is the stage in which the client sends the request and receives the result, `process` is the client process id, and the second row of each process shows the timeline of the server-side ops.
![timeline](../../../doc/timeline-example.png)
## Timeline tool tutorial
(简体中文|[English](./README.md))
The serving framework has a built-in profiler that timestamps each stage of an inference request. Whether it is enabled is controlled on the client side through environment variables; once enabled, the timing information is printed to the screen.
```
export FLAGS_profile_client=1 #turn on per-stage timing on the client side
export FLAGS_profile_server=1 #turn on per-stage timing on the server side
```
After enabling this function, the client prints the corresponding log information to standard output during prediction.
To show the time spent in each stage more intuitively, scripts are provided to further analyze the log file.
To use them, first save the client output to a file, named `profile` in this example.
```
python show_profile.py profile ${thread_num}
```
Here the `thread_num` argument is the number of client processes; the script uses it to compute the average time spent in each stage.
The script sums the time spent in each stage, divides it by the number of threads, and prints the averages to standard output.
```
python timeline_trace.py profile trace
```
The script converts the timing information in the log into JSON and saves it to a trace file, which can be visualized with the tracing feature of the Chrome browser.
To do so, open Chrome, enter chrome://tracing/ in the address bar, click the load button, and open the saved trace file to visualize the timing of each stage of the inference service.
The visualization below uses the GPU inference service of the [bert example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert), with the server predicting on 4 GPUs, 4 client processes, and batch size 1. bert_pre is the client-side data preprocessing stage, client_infer is the stage in which the client sends the request and receives the result, process is the client process id, and the second row of each process shows the timeline of the server-side ops.
![timeline](../../../doc/timeline-example.png)
...@@ -63,12 +63,16 @@ class WebService(object):
            abort(400)
        if "fetch" not in request.json:
            abort(400)
        try:
            feed, fetch = self.preprocess(request.json,
                                          request.json["fetch"])
            if isinstance(feed, list):
                fetch_map_batch = client_service.predict(
                    feed_batch=feed, fetch=fetch)
                fetch_map_batch = self.postprocess(
                    feed=request.json,
                    fetch=fetch,
                    fetch_map=fetch_map_batch)
                result = {"result": fetch_map_batch}
            elif isinstance(feed, dict):
                if "fetch" in feed:
...@@ -76,6 +80,8 @@ class WebService(object):
                fetch_map = client_service.predict(feed=feed, fetch=fetch)
                result = self.postprocess(
                    feed=request.json, fetch=fetch, fetch_map=fetch_map)
        except ValueError:
            result = {"result": "Request Value Error"}
        return result
    app_instance.run(host="0.0.0.0",
......
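For orientation, the `preprocess`/`postprocess` hooks that the new try/except guards are meant to be overridden by user code. A minimal subclass sketch is shown below; the method names follow the web-service demos in this repository (e.g. text_classify_service.py), and the exact signatures should be treated as an assumption rather than a definitive API reference:
```python
from paddle_serving_server.web_service import WebService


class UciService(WebService):
    def preprocess(self, feed={}, fetch=[]):
        # turn the incoming request JSON into a feed dict (or a list of dicts
        # for batch prediction) and the list of variables to fetch
        return feed, fetch

    def postprocess(self, feed={}, fetch=[], fetch_map=None):
        # shape the raw fetch_map before it is wrapped into {"result": ...}
        return fetch_map


uci_service = UciService(name="uci")
uci_service.load_model_config("uci_housing_model")
uci_service.prepare_server(workdir="workdir", port=9393, device="cpu")
uci_service.run_server()
```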
...@@ -94,12 +94,16 @@ class WebService(object):
        client.connect([endpoint])
        while True:
            request_json = inputqueue.get()
            try:
                feed, fetch = self.preprocess(request_json,
                                              request_json["fetch"])
                if isinstance(feed, list):
                    fetch_map_batch = client.predict(
                        feed_batch=feed, fetch=fetch)
                    fetch_map_batch = self.postprocess(
                        feed=request_json,
                        fetch=fetch,
                        fetch_map=fetch_map_batch)
                    result = {"result": fetch_map_batch}
                elif isinstance(feed, dict):
                    if "fetch" in feed:
...@@ -107,8 +111,9 @@ class WebService(object):
                    fetch_map = client.predict(feed=feed, fetch=fetch)
                    result = self.postprocess(
                        feed=request_json, fetch=fetch, fetch_map=fetch_map)
                self.output_queue.put(result)
            except ValueError:
                self.output_queue.put(-1)
    def _launch_web_service(self, gpu_num):
        app_instance = Flask(__name__)
...@@ -152,6 +157,8 @@ class WebService(object):
            if self.idx >= len(self.gpus):
                self.idx = 0
            result = self.output_queue.get()
            if not isinstance(result, dict) and result == -1:
                result = {"result": "Request Value Error"}
            return result
            '''
            feed, fetch = self.preprocess(request.json, request.json["fetch"])
......
...@@ -197,7 +197,7 @@ function python_run_criteo_ctr_with_cube() {
VAR2="0.67" #TODO: temporarily relax the threshold to 0.67
RES=$( echo "$AUC>$VAR2" | bc )
if [[ $RES -eq 0 ]]; then
echo "error with criteo_ctr_with_cube inference auc test, auc should > 0.67"
exit 1
fi
echo "criteo_ctr_with_cube inference auc test success"
...@@ -205,6 +205,30 @@ function python_run_criteo_ctr_with_cube() {
ps -ef | grep "cube" | grep -v grep | awk '{print $2}' | xargs kill
;;
GPU)
check_cmd "wget https://paddle-serving.bj.bcebos.com/unittest/ctr_cube_unittest.tar.gz"
check_cmd "tar xf ctr_cube_unittest.tar.gz"
check_cmd "mv models/ctr_client_conf ./"
check_cmd "mv models/ctr_serving_model_kv ./"
check_cmd "mv models/data ./cube/"
check_cmd "mv models/ut_data ./"
cp ../../../build-server-$TYPE/output/bin/cube* ./cube/
mkdir -p $PYTHONROOT/lib/python2.7/site-packages/paddle_serving_server_gpu/serving-gpu-0.1.3/
yes | cp ../../../build-server-$TYPE/output/demo/serving/bin/serving $PYTHONROOT/lib/python2.7/site-packages/paddle_serving_server_gpu/serving-gpu-0.1.3/
sh cube_prepare.sh &
check_cmd "mkdir work_dir1 && cp cube/conf/cube.conf ./work_dir1/"
python test_server_gpu.py ctr_serving_model_kv &
check_cmd "python test_client.py ctr_client_conf/serving_client_conf.prototxt ./ut_data >score"
tail -n 2 score | awk 'NR==1'
AUC=$(tail -n 2 score | awk 'NR==1')
VAR2="0.67" #TODO: temporarily relax the threshold to 0.67
RES=$( echo "$AUC>$VAR2" | bc )
if [[ $RES -eq 0 ]]; then
echo "error with criteo_ctr_with_cube inference auc test, auc should > 0.67"
exit 1
fi
echo "criteo_ctr_with_cube inference auc test success"
ps -ef | grep "paddle_serving_server" | grep -v grep | awk '{print $2}' | xargs kill
ps -ef | grep "cube" | grep -v grep | awk '{print $2}' | xargs kill
;;
*)
echo "error type"
......