<imgsrc="doc/demo.gif"width="700">
<imgsrc="doc/demo.gif"width="700">
</p>
</p>
<h2align="center">Some Key Features</h2>
- Integrates with the Paddle training pipeline seamlessly; most Paddle models can be deployed **with a one-line command**.
- **Industrial serving features** supported, such as model management, online loading, online A/B testing, etc.
- **Distributed Key-Value indexing** supported, which is especially useful for large-scale sparse features as model inputs.
- **Highly concurrent and efficient communication** between clients and servers.
- **Multiple programming languages** supported on the client side, such as Golang, C++ and Python.
- **Extensible framework design** that can support model serving beyond Paddle.
<h2align="center">Installation</h2>
<h2align="center">Installation</h2>
You may need to use a domestic mirror source (in China, you can use the Tsinghua mirror source by adding `-i https://pypi.tuna.tsinghua.edu.cn/simple` to the pip command) to speed up the download.
If you need to install modules compiled with the develop branch, please download packages from the [latest packages list](./doc/LATEST_PACKAGES.md) and install them with the `pip install` command.
Packages of Paddle Serving support Centos 6/7 and Ubuntu 16/18, or you can use the HTTP service without installing the client.
<h2align="center"> Pre-built services with Paddle Serving</h2>
This quick start example is only for users who already have a model to deploy; we provide a ready-to-deploy model here. If you want to know how to use Paddle Serving from offline training to online serving, please refer to [Train_To_Service](https://github.com/PaddlePaddle/Serving/blob/develop/doc/TRAIN_TO_SERVICE.md).
| `ir_optim` | bool | `False` | Enable analysis and optimization of calculation graph |
| `use_mkl` (Only for cpu version) | bool | `False` | Run inference with MKL |

</center>

Here, we use `curl` to send an HTTP POST request to the service we just started. Users can use any python library to send HTTP POST as well, e.g., [requests](https://requests.readthedocs.io/en/master/).
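As a sketch of the `requests` alternative (the port `9292`, the `/uci/prediction` path, and the input values follow the uci_housing quick-start example and are illustrative):

```python
import requests

# One Boston-housing sample; the field names must match the feed/fetch
# variable aliases of the deployed model (here the quick-start uci_housing model).
payload = {
    "feed": [{"x": [0.0380, 0.0696, 0.0485, 0.0598, -0.0113, 0.0641, 0.0867,
                    -0.0436, -0.0084, -0.0298, 0.1130, -0.0211, -0.0166]}],
    "fetch": ["price"],
}
resp = requests.post("http://127.0.0.1:9292/uci/prediction", json=payload)
print(resp.json())
```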
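A minimal RPC client sketch (the client config path, the endpoint `127.0.0.1:9292`, and the input values follow the uci_housing quick-start example and are illustrative):

```python
from paddle_serving_client import Client

client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])

# 13 normalized features of one Boston-housing sample (illustrative values)
data = [0.0380, 0.0696, 0.0485, 0.0598, -0.0113, 0.0641, 0.0867,
        -0.0436, -0.0084, -0.0298, 0.1130, -0.0211, -0.0166]
fetch_map = client.predict(feed={"x": data}, fetch=["price"])
print(fetch_map)
```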
Here, the `client.predict` function takes two arguments. `feed` is a Python dict mapping model input variable alias names to values. `fetch` lists the prediction variables to be returned by the server. In the example, the names `"x"` and `"price"` were assigned when the servable model was saved during training.
<h2align="center"> Pre-built services with Paddle Serving</h2>
<h2align="center">Some Key Features of Paddle Serving</h2>
<h3align="center">Chinese Word Segmentation</h4>
- **Description**:
``` shell
Chinese word segmentation HTTP service that can be deployed with one line command.
```
In the code, the function `client.add_variant(tag, clusters, variant_weight)` adds a variant with label `tag` and flow weight `variant_weight`. In this example, a BOW variant with label `bow` and flow weight `10`, and an LSTM variant with label `lstm` and flow weight `90` are added. The traffic on the client side will be distributed to the two variants according to the ratio of `10:90`.
When making a prediction on the client side, if the parameter `need_variant_tag=True` is specified, the response will contain the variant tag corresponding to the distributed flow.
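A minimal client-side sketch of this setup (the server addresses, the client config path, and the `words`/`prediction` variable names are illustrative assumptions, not taken from the example above):

```python
from paddle_serving_client import Client

client = Client()
client.load_client_config("serving_client_conf/serving_client_conf.prototxt")
# 10% of the traffic goes to the BOW servers, 90% to the LSTM servers
client.add_variant("bow", ["127.0.0.1:8000"], 10)
client.add_variant("lstm", ["127.0.0.1:9000"], 90)
client.connect()

feed_dict = {"words": [1, 2, 3]}  # illustrative word-id input
fetch_map = client.predict(feed=feed_dict, fetch=["prediction"],
                           need_variant_tag=True)
print(fetch_map)  # also reports the tag of the variant that served this request
```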
In the default CentOS 7 image we provide, the Python path is `/usr/bin/python`. If you want to use our CentOS 6 image, you need to set it with `export PYTHONROOT=/usr/local/python2.7/`.
## Compile Server

### Integrated CPU version paddle inference library
In this example, the production model is uploaded to HDFS in the `product_path` folder.
### Producing the model
Run the following Python code in the `product_path` folder to produce the model (you need to modify the Hadoop-related parameters before running). Every 60 seconds, the package file of the Boston house price prediction model, `uci_housing.tar.gz`, will be generated and uploaded to the HDFS path `/`. After uploading, the timestamp file `donefile` will be updated and uploaded to the HDFS path `/` as well.
```python
import os
import paddle.fluid as fluid

# ... (uci_housing network definition, data loader and Executor `place` elided) ...

exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())

def push_to_hdfs(local_file_path, remote_path):
    afs = 'afs://***.***.***.***:***'   # User needs to change
    uci = '***,***'                     # User needs to change
    hadoop_bin = '/path/to/haddop/bin'  # User needs to change
    # Pass the AFS address and user credentials to the Hadoop client and
    # upload the local package to HDFS, overwriting any existing file.
    os.system('{} fs -Dfs.default.name={} -Dhadoop.job.ugi={} -put -f {} {}'.format(
        hadoop_bin, afs, uci, local_file_path, remote_path))

# ... (training loop: every 60 seconds, save and tar the model as
#      uci_housing.tar.gz, push it to HDFS with push_to_hdfs, then update
#      and push the donefile timestamp) ...
```
Due to different model structures, different prediction services consume different computing resources when performing predictions. For online prediction services, models that require fewer computing resources have a higher proportion of communication time cost; these are called communication-intensive services. Models that require more computing resources spend more time on inference computation; these are called computation-intensive services.
For a prediction service, the easiest way to determine its type is to look at the time ratio. Paddle Serving provides a [Timeline tool](../python/examples/util/README_CN.md), which can intuitively display the time spent in each stage of the prediction service.
For communication-intensive prediction services, requests can be aggregated: as long as the added latency is tolerable, multiple prediction requests can be combined into one batch for prediction.
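As a sketch of the idea, reusing the `client` object from the uci_housing sketch earlier in this document, and assuming your client version accepts a list of feed dicts as one batch:

```python
# Several individually collected requests sent as a single batched predict call
# (input values are illustrative).
pending_requests = [{"x": [0.01 * i] * 13} for i in range(4)]
fetch_map = client.predict(feed=pending_requests, fetch=["price"])
```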
If you have saved model files using Paddle's `save_inference_model` API, you can use Paddle Serving's `inference_model_to_serving` API to convert them into model files that can be used for Paddle Serving.
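A sketch of the conversion call (the directory names are illustrative, and the keyword arguments follow the commonly documented `inference_model_to_serving` signature; check the version you have installed):

```python
import paddle_serving_client.io as serving_io

# Convert an inference model saved by save_inference_model into a servable
# model (server side) plus a client-side configuration.
serving_io.inference_model_to_serving(
    dirname="./inference_model",       # directory written by save_inference_model
    serving_server="serving_server",   # output: server-side model and config
    serving_client="serving_client",   # output: client-side config
    model_filename=None,               # set if the model was saved as a single file
    params_filename=None)
```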
Here you will be prompted that the HTTP service started is in development mode and cannot be used for production deployment.
The prediction service started by Flask is not stable enough to withstand the concurrency of a large number of requests. In the actual deployment process, WSGI (Web Server Gateway Interface) is used.
Next, we will show how to use the [uWSGI](https://github.com/unbit/uwsgi) module to deploy HTTP prediction services for production environments.
```python
# ... (uWSGI-based deployment script elided) ...
```
In the Chinese sentiment classification task, Chinese word segmentation needs to be done through the [LAC task](../lac). Set the model path with `lac_model_path` and the dictionary path with `lac_dict_path`.
In this demo, the LAC task is placed in the preprocessing part of the HTTP prediction service of the sentiment classification task. The LAC prediction service is deployed on CPU, and the sentiment classification task is deployed on GPU, which can be changed according to the actual situation.
## Client prediction
```
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "天气不错"}], "fetch":["class_probs"]}' http://127.0.0.1:9393/senta/prediction
```
paddle_serving_app provides a variety of data preprocessing methods for prediction.
- class ChineseBertReader
Preprocessing for Chinese semantic representation task.
- `__init__(vocab_file, max_seq_len=20)`
- class LACReader
Preprocessing for Chinese word segmentation task.
- `__init__(dict_floder)`
- words(str): Original text input.
- crf_decode(np.array): CRF code predicted by model.
[example](../examples/lac/lac_web_service.py)
- class SentaReader
[example](../examples/senta/senta_web_service.py)
- The image preprocessing method is more flexible than the methods above and can be composed from several of the following classes. [example](../examples/imagenet/resnet50_rpc_client.py)