@@ -53,7 +53,7 @@ You may need to use a domestic mirror source (in China, you can use the Tsinghua
If you need to install modules compiled with the develop branch, please download packages from the [latest packages list](./doc/LATEST_PACKAGES.md) and install them with the `pip install` command.
Packages of Paddle Serving support CentOS 6/7 and Ubuntu 16/18, or you can use the HTTP service without installing the client.
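If you cannot install the client, you can send requests to the HTTP service directly. Below is a minimal sketch using the `requests` library; the URL, the feed name `x`, and the fetch name `price` follow the Boston house price example and are assumptions that depend on how the web service was started.

```python
import requests

# Hypothetical endpoint: a web service named `uci` listening on port 9292.
url = "http://127.0.0.1:9292/uci/prediction"
payload = {
    # 13 normalized features of one Boston house price sample (example values)
    "feed": [{"x": [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0727, -0.1583,
                    -0.0584, 0.6283, 0.4919, 0.1856, 0.0795, -0.0332]}],
    "fetch": ["price"],
}
resp = requests.post(url, json=payload)
print(resp.json())
```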
<h2 align="center">Pre-built services with Paddle Serving</h2>
@@ -88,7 +88,7 @@ with open('processed.data') as f:
In the code, the function `client.add_variant(tag, clusters, variant_weight)` adds a variant with label `tag` and flow weight `variant_weight`. In this example, a BOW variant with the label `bow` and a flow weight of `10`, and an LSTM variant with the label `lstm` and a flow weight of `90` are added. The traffic on the client side will be distributed to the two variants at a ratio of `10:90`.
When making predictions on the client side, if the parameter `need_variant_tag=True` is specified, the response will contain the variant tag corresponding to the distributed flow.
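Putting these pieces together, a minimal client-side sketch could look like the following. The client config path, server endpoints, and feed/fetch names are assumptions borrowed from the IMDB sentiment example, and the exact way the variant tag is exposed in the result may differ between Paddle Serving versions.

```python
from paddle_serving_client import Client

client = Client()
client.load_client_config("imdb_bow_client_conf/serving_client_conf.prototxt")  # assumed path
# 10% of the traffic goes to the BOW variant, 90% to the LSTM variant.
client.add_variant("bow", ["127.0.0.1:8000"], 10)    # assumed endpoint
client.add_variant("lstm", ["127.0.0.1:9000"], 90)   # assumed endpoint
client.connect()

word_ids = [8, 233, 52, 601]  # hypothetical tokenized input
# Ask the server to report which variant served this request.
result = client.predict(feed={"words": word_ids}, fetch=["prediction"],
                        need_variant_tag=True)
print(result)  # assumption: the variant tag is returned alongside the fetch results
```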
@@ -46,7 +46,7 @@ In this example, the production model is uploaded to HDFS in `product_path` fold
### Produce models
Run the following Python code in the `product_path` folder to produce models (you need to modify the Hadoop-related parameters before running). Every 60 seconds, the package file of the Boston house price prediction model, `uci_housing.tar.gz`, will be generated and uploaded to the HDFS path `/`. After uploading, the timestamp file `donefile` will be updated and uploaded to the HDFS path `/` as well.
```python
import os
import paddle.fluid as fluid

# ... (code omitted) ...

exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())

def push_to_hdfs(local_file_path, remote_path):
    afs = 'afs://***.***.***.***:***'  # User needs to change
    uci = '***,***'  # User needs to change
    hadoop_bin = '/path/to/hadoop/bin'  # User needs to change
    # Assumption: the AFS address and credentials above are passed to the
    # Hadoop client via -D options when uploading the file.
    os.system('{} fs -Dfs.default.name={} -Dhadoop.job.ugi={} -put -f {} {}'.format(
        hadoop_bin, afs, uci, local_file_path, remote_path))
```
Due to different model structures, different prediction services consume different amounts of computing resources. For online prediction services, models that require fewer computing resources spend a higher proportion of their time on communication; these are called communication-intensive services. Models that require more computing resources spend more of their time on inference computation; these are called computation-intensive services.
For a prediction service, the easiest way to determine which type it is is to look at the time ratio. Paddle Serving provides a [Timeline tool](../python/examples/util/README_CN.md), which can intuitively display the time spent in each stage of the prediction service.
For communication-intensive prediction services, requests can be aggregated: within a latency budget that the application can tolerate, multiple prediction requests can be combined into one batch for prediction.
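A rough sketch of this kind of request aggregation is shown below. The `client` object is assumed to be a connected Paddle Serving client as in the earlier sketch, and whether a list of feed dicts is accepted as a single batch depends on the Paddle Serving version.

```python
import time

def predict_in_batches(client, samples, fetch_names, max_batch=8, max_wait_ms=5):
    """Group incoming feed dicts into batches before calling the prediction service."""
    results, batch = [], []
    deadline = time.time() + max_wait_ms / 1000.0
    for feed_dict in samples:
        batch.append(feed_dict)
        if len(batch) >= max_batch or time.time() >= deadline:
            # Assumption: passing a list of feed dicts issues one batched request.
            results.append(client.predict(feed=batch, fetch=fetch_names))
            batch = []
            deadline = time.time() + max_wait_ms / 1000.0
    if batch:
        results.append(client.predict(feed=batch, fetch=fetch_names))
    return results
```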
If you have saved model files using Paddle's `save_inference_model` API, you can use Paddle Serving's `inference_model_to_serving` API to convert them into model files that can be used by Paddle Serving.
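For example, assuming the inference model was saved under `./inference_model` by `save_inference_model`, the conversion might look like this (the argument names may vary slightly between Paddle Serving versions):

```python
import paddle_serving_client.io as serving_io

# Convert a Paddle inference model into a Serving server model and client config.
serving_io.inference_model_to_serving(
    dirname="./inference_model",      # directory produced by save_inference_model
    serving_server="serving_server",  # output directory for the server-side model
    serving_client="serving_client",  # output directory for the client-side config
    model_filename=None,              # set these if the model and params were
    params_filename=None)             # saved as single files
```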
Here you will be prompted that the HTTP service was started in development mode and cannot be used for production deployment.
The prediction service started by Flask is not robust enough to withstand a large number of concurrent requests, so in an actual deployment a WSGI (Web Server Gateway Interface) server is used instead.
Next, we will show how to use the [uWSGI](https://github.com/unbit/uwsgi) module to deploy an HTTP prediction service for production environments.
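As a minimal, generic sketch of the idea (this is not the exact Paddle Serving wiring; the module name, route, and port are hypothetical), the Flask application object is exposed as a WSGI callable and served by uWSGI instead of Flask's built-in development server:

```python
# uwsgi_service.py (hypothetical module name)
from flask import Flask, jsonify, request

app = Flask(__name__)  # the WSGI callable that uWSGI will import

@app.route("/prediction", methods=["POST"])
def prediction():
    # Run model inference here; echoing the input back is only a placeholder.
    return jsonify({"input": request.get_json()})

# Instead of app.run(), start the service with uWSGI, e.g.:
#   uwsgi --http :9292 --module uwsgi_service:app --processes 4 --threads 2
```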