We consider deploying deep learning inference service online to be a user-facing application in the future. **The goal of this project**: When you have trained a deep neural net with [Paddle](https://github.com/PaddlePaddle/Paddle), you can also deploy the model online with little effort. A demo of Paddle Serving is as follows:
<p align="center">
    <img src="doc/demo.gif" width="700">
</p>
<h2 align="center">Some Key Features</h2>
- Integrate with the Paddle training pipeline seamlessly; most Paddle models can be deployed **with one line command**.
- **Industrial serving features** supported, such as model management, online loading, online A/B testing, etc.
- **Distributed Key-Value indexing** supported, which is especially useful for large-scale sparse features as model inputs.
- **Highly concurrent and efficient communication** between clients and servers.
- **Multiple programming languages** supported on the client side, such as Golang, C++ and Python.
- **Extensible framework design** that can support model serving beyond Paddle.
<h2 align="center">Installation</h2>
...
...
Paddle Serving provides HTTP and RPC based services for users to access.
### HTTP service
Paddle Serving provides a built-in Python module called `paddle_serving_server.serve` that can start an RPC service or an HTTP service with a one-line command. If we specify the argument `--name uci`, it means that we will have an HTTP service with a URL of `$IP:$PORT/uci/prediction`.
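As a sketch, such a one-line startup might look like this (the model directory `uci_housing_model` is assumed from the UCI housing demo in the Paddle Serving examples):

```
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --name uci
```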
A user can also start an RPC service with `paddle_serving_server.serve`. An RPC service is usually faster than an HTTP service, although the user needs to do some coding based on Paddle Serving's Python client API. Note that we do not specify `--name` here.
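A sketch of this workflow, again assuming the UCI housing demo: the server is started without `--name`, and the client calls it through the Python API. The feed alias `"x"` and fetch alias `"price"` are assumptions taken from that demo, not fixed by this document:

```
python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292
```

```python
# Minimal client sketch; config path and alias names are assumptions
# based on the UCI housing demo saved with Paddle Serving's save_model.
from paddle_serving_client import Client

client = Client()
client.load_client_config("uci_housing_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])
data = [0.0137, -0.1136, 0.2553, -0.0692, 0.0582, -0.0501,
        -0.0015, -0.0669, -0.0380, -0.1753, -0.0345, 0.0519, -0.0251]
fetch_map = client.predict(feed={"x": data}, fetch=["price"])
print(fetch_map)
```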
This document uses an example of a text classification task based on the IMDB dataset to show how to build an A/B test framework using Paddle Serving. The structural relationship between the client and the servers in the example is shown in the figure below.
PaddlePaddle is Baidu's open-source machine learning framework, which supports a wide range of customized development of deep learning models; Paddle Serving is Paddle's online prediction framework, which connects seamlessly with Paddle model training and provides cloud services for machine learning prediction. This article describes the Paddle Serving design from the bottom up, across the model, service, and access levels, which together form a complete serving solution.
1. The model is the core of Paddle Serving prediction, covering the management of model data and inference computation;
2. The prediction framework encapsulates the model for inference computation and provides an external RPC interface to connect different upstreams;
3. The prediction service SDK provides a set of access frameworks.
## 2. Terms explanation
## 3. Python Interface Design
- **baidu-rpc**: Baidu's official open-source RPC framework. It supports multiple common communication protocols and provides custom interface definitions based on protobuf.
- **Variant**: Paddle Serving's architectural abstraction of a minimal prediction cluster. All instances (replicas) inside a Variant are completely homogeneous and logically correspond to a fixed version of a model.
- **Endpoint**: Multiple Variants form an Endpoint. Logically, an Endpoint represents one model, and the Variants within the Endpoint represent different versions of it.
- **OP**: Whereas PaddlePaddle uses operators to encapsulate numerical computations, Paddle Serving uses OPs to represent basic business operations; the core interface is inference. An OP is configured with the upstream OPs it depends on, so that multiple OPs can be connected into a workflow.
- **Channel**: An abstraction of all request-level intermediate data of an OP; OPs exchange data with each other through Channels.
- **Bus**: Manages all the Channels in a thread and schedules the access relationships between OPs and Channels according to the dependency graph of the DAG.
- **Stage**: A set of OPs in the topology described by the Workflow's DAG that belong to the same level and can be executed in parallel.
- **Node**: An OP instance created from an OP class together with its parameter configuration; also the execution unit in a Workflow.
- **Workflow**: Executes the inference interface of each OP in order, following the topology described by the DAG.
- **DAG/Workflow**: Consists of several interdependent Nodes. Each Node can obtain the Request object through a specific interface, and a Node's OP obtains the output of its upstream OPs through the dependency relationship. The output of the last Node is the Response object by default.
- **Service**: Encapsulates one request (PV). A Service can be configured with several Workflows, which share the current request's Request object and execute in parallel or serially, each writing its result to the corresponding output slot of the Response. One Paddle Serving process can be configured with multiple Service interfaces; the upstream decides which Service to access by ServiceName.
A set of Paddle Serving dynamic libraries that support remote inference services for common models saved with Paddle, and expose the various underlying capabilities of Paddle Serving through the Python interface.
### 3.3 Overall design:
- The user starts the Client and the Server through the Python API, which includes a function to check whether the connection works and whether the models to be accessed match.
- The Python API calls the pybind bindings of the client and server functions implemented by Paddle Serving; the information exchanged between client and server is transmitted through RPC.
- The Client Python API currently has two simple functions, load_inference_conf and predict, which are used to load the model to be predicted and to perform prediction, respectively.
- The Server Python API is mainly responsible for loading the inference model and generating the various configurations required by Paddle Serving, including engines, workflow, resources, etc.
### 3.4 Server Interface
...
...
Paddle Serving provides an interface for saving the model during the training process. It is basically the same as the Paddle save-inference-model interface, except that `feed_var_dict` and `fetch_var_dict` let you alias the input and output variables. The configuration that the service reads at startup is saved in the client and server storage directories.
**Model Management Framework**: Connects model files of multiple machine learning platforms and provides a unified inference interface.
**Business Scheduling Framework**: Abstracts the computation logic of various inference models, provides a general DAG scheduling framework, and connects different operators through a DAG diagram to jointly complete a prediction service. This abstraction lets users conveniently implement their own computation logic while making operator sharing easy. (When users build their own prediction services, a large part of the work is building DAGs and providing operators.)
**Predict Service**: Encapsulation of the externally provided prediction service interface. Define communication fields with the client through protobuf.
The model management framework is responsible for managing the models trained by machine learning frameworks. It can be abstracted into three levels: model loading, model data, and model inference.
#### Model Loading
Loads models from disk into memory, with support for multiple versions, hot loading, incremental updates, etc.
#### Model Data
The model's in-memory data structure, integrating the Fluid inference library.
#### Inferencer
Provides a unified inference interface for the upper prediction-service layers.
Borrowing the abstract model-computation idea of the TensorFlow framework, the business logic is abstracted into a DAG diagram. It is driven by configuration, generates a workflow, and avoids C++ code compilation. Each concrete step of a service corresponds to a specific OP, and every OP can configure the upstream OPs it depends on. Unified message passing between OPs is achieved through thread-level Bus and Channel mechanisms. For example, the flow of a simple prediction service can be abstracted into 3 steps, reading request data -> calling the prediction interface -> writing back the prediction result, implemented as 3 OPs: ReaderOp -> ClassifyOp -> WriteOp.
For the dependencies between OPs and how to build workflows from OPs, you can refer to [从零开始写一个预测服务](./deprecated/CREATING.md) (simplified Chinese version).
Paddle Serving instances can load multiple models at the same time, and each model uses a Service (and its configured workflow) to provide the service. You can refer to the [service configuration file in the Demo example](../tools/cpp_examples/demo-serving/conf/service.prototxt) to learn how to configure multiple services for a serving instance.
#### 4.2.3 Hierarchical relationship of business scheduling
One Service corresponds to one inference model, and there is one Endpoint under the model. Different versions of the model are implemented through multiple Variants under the Endpoint:
The same model prediction service can be configured with multiple Variants, each with its own downstream IP list. The client code can configure relative weights for each Variant to adjust the traffic ratio (refer to the description of variant_weight_list in [Client Configuration](./deprecated/CLIENT_CONFIGURE.md) section 3.2).
Provided certain interface specifications are met, the service framework places no restrictions on user data fields, so it can serve the different business interfaces of various prediction services. Baidu-rpc inherits the Protobuf service interface, and users describe the Request and Response business interfaces according to the Protobuf syntax specification. Paddle Serving is built on the Baidu-rpc framework and supports this feature by default.
No matter how the communication protocol changes, the framework only needs to ensure that the communication protocol and the business data format stay synchronized between client and server to guarantee normal communication. This information can be broken down as follows:
- Protocol: Header information agreed in advance between the Server and the Client to ensure that both sides recognize the data format. Paddle Serving uses Protobuf as the basic communication format.
- Data: Describes the Request and Response interfaces, for example the sample data to be predicted and the scores returned by prediction. This includes:
  - Data fields: field definitions contained in the Request and Response data structures.
  - Description interface: similar to the protocol interface; Protobuf is supported by default.
Baidu-rpc has built-in data compression methods such as snappy, gzip and zlib, which can be configured in the configuration file (refer to [Client Configuration](./deprecated/CLIENT_CONFIGURE.md) section 3.1 for an introduction to compress_type).
- Long-Term Vision: Online deployment of deep learning models will be a user-facing application in the future. Any AI developer will face the problem of deploying an online service for his or her trained model.
Paddle Serving is the official open-source online deployment framework. The long-term goal of Paddle Serving is to provide professional, reliable and easy-to-use online services for the last mile of AI applications.
- Easy-To-Use: To let algorithm developers quickly deploy their models online, Paddle Serving designs APIs that work seamlessly with Paddle's training process, and most Paddle models can be deployed as a service with a one-line command.
- Industrial Oriented: To meet industrial deployment requirements, Paddle Serving supports many large-scale deployment features: 1) distributed sparse embedding indexing; 2) highly concurrent underlying communication; 3) model management, online A/B testing, and model online loading.
## 2. Module Design and Implementation
- Extensibility: Paddle Serving supports C++, Python and Golang clients, and more client languages will be supported. It is very easy to extend Paddle Serving to support other machine learning inference libraries, although currently the Paddle inference library is the only officially supported inference backend.
The inference phase of a Paddle model focuses on 1) the input variables of the model, 2) the output variables of the model, and 3) the model structure and parameters. Paddle Serving's Python API provides a `save_model` interface for trained models that saves the information Paddle Serving needs during the deployment phase. An example is as follows:
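A minimal sketch of such a call, run inside the training script where `fluid` (from `import paddle.fluid as fluid`), `data` and `prediction` are already defined; the directory and alias names follow the text below:

```python
import paddle_serving_client.io as serving_io

# Save the servable model together with client/server configurations:
# "serving_model" holds the server side, "client_conf" the client side.
serving_io.save_model("serving_model", "client_conf",
                      {"words": data}, {"prediction": prediction},
                      fluid.default_main_program())
```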
In the example, `{"words": data}` and `{"prediction": prediction}` assign the inputs and outputs of the model. `"words"` and `"prediction"` are the alias names of the inputs and outputs; the design of alias names is to help developers memorize model inputs and outputs. `data` and `prediction` are Paddle [Variable](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/fluid_cn/Variable_cn.html#variable)s in the training phase, typically [Tensor](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_cn/fluid_cn/Tensor_cn.html#tensor) or [LodTensor](https://www.paddlepaddle.org.cn/documentation/docs/zh/beginners_guide/basic_concept/lod_tensor.html#lodtensor). When the `save_model` API is called, two directories called `"serving_model"` and `"client_conf"` are generated. The content of the saved model is as follows:
`"serving_client_conf.prototxt"` and `"serving_server_conf.prototxt"` are the client side and the server side configurations of Paddle Serving, and `"serving_client_conf.stream.prototxt"` and `"serving_server_conf.stream.prototxt"` are the corresponding parts. Other contents saved in the directory are the same as Paddle saved inference model. We are considering to support `save_model` interface in Paddle training framework so that a user is not aware of the servable configurations.
Paddle Serving supports inference engines on multiple devices; currently CPU and GPU engines are supported. Official Docker images for CPU and GPU are provided. A user can start an inference service on either CPU or GPU with a one-line command.
For example, `python -m paddle_serving_server.serve --model your_servable_model --thread 10 --port 9292` is equivalent to the following user-defined code:
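A sketch of that user-defined code, based on Paddle Serving's server-side Python API; the built-in operator names `general_reader`/`general_infer`/`general_response` are taken from the server module, and details should be treated as indicative:

```python
from paddle_serving_server import OpMaker, OpSeqMaker, Server

op_maker = OpMaker()
read_op = op_maker.create('general_reader')        # parses request data
infer_op = op_maker.create('general_infer')        # runs Paddle inference
response_op = op_maker.create('general_response')  # writes back results

# Chain the three OPs into a sequential workflow (a simple DAG).
op_seq_maker = OpSeqMaker()
op_seq_maker.add_op(read_op)
op_seq_maker.add_op(infer_op)
op_seq_maker.add_op(response_op)

server = Server()
server.set_op_sequence(op_seq_maker.get_op_sequence())
server.set_num_threads(10)
server.load_model_config("your_servable_model")
server.prepare_server(workdir="work_dir1", port=9292, device="cpu")
server.run_server()
```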
Paddle Serving supports remote service access through RPC (remote procedure call) and HTTP. RPC access to a remote service goes through Paddle Serving's Client API, and a user can define a data preprocessing function before calling it. The example below explains how to define the input data of the Paddle Serving Client. The servable model has two inputs with the alias names `sparse` and `dense`. `sparse` corresponds to sparse sequence ids such as `[1, 1001, 100001]`, and `dense` corresponds to a dense vector such as `[0.2, 0.5, 0.1, 0.4, 0.11, 0.22]`. For sparse sequence data, the current design supports `lod_level=0` and `lod_level=1` of Paddle, corresponding to `Tensor` and `LodTensor`. For dense vectors, the current design supports any N-D Tensor. Users do not need to specify the shape of the inference model input; the Paddle Serving Client API checks the input data's shape against the servable configurations.
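A sketch of building such a feed on the client side; `client` is assumed to be a connected `Client` instance, and the fetch name `"prob"` is an assumption for illustration:

```python
# "sparse" carries variable-length sequence ids; "dense" a fixed-size vector.
feed_dict = {}
feed_dict["sparse"] = [1, 1001, 100001]
feed_dict["dense"] = [0.2, 0.5, 0.1, 0.4, 0.11, 0.22]
fetch_map = client.predict(feed=feed_dict, fetch=["prob"])
```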
The following code sample shows how the Paddle Serving Client API connects to the Server with the servers' endpoints. To use data parallelism during prediction, the Paddle Serving Client allows users to define multiple server endpoints.
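A sketch of such a connection; the config path and the endpoint addresses are placeholders:

```python
from paddle_serving_client import Client

client = Client()
client.load_client_config('servable_client_configs')
# Listing several endpoints lets the client spread requests across servers.
client.connect(["127.0.0.1:9292", "127.0.0.1:9293"])
```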
Paddle Serving adopts [baidu-rpc](https://github.com/apache/incubator-brpc) as the underlying communication layer. baidu-rpc is an open-source RPC communication library with high-concurrency and low-latency advantages compared with other open-source RPC libraries. Millions of instances and thousands of services use baidu-rpc within Baidu.
The core execution engine of Paddle Serving is a directed acyclic graph (DAG). In the DAG, each node represents a phase of the inference service, such as Paddle inference prediction, data preprocessing and data postprocessing. The DAG can fully parallelize the computation and fully utilize computing resources. For example, when a user has input data that needs to be fed into two models whose scores are then combined, the model-scoring computation is parallelized through the DAG.
The underlying communication of Paddle Serving is implemented in C++, as is the core framework, so it is hard for users who are not familiar with C++ to implement new Paddle Serving Server operators. Another approach is to use the lightweight Web Service in Paddle Serving Server, which can be viewed as a plugin. A user can implement complex data preprocessing and postprocessing logic to build a complex AI service. If the AI service receives a large volume of traffic, it is worth implementing the service with high-performance Paddle Serving Server operators. The relationship between the Web Service and the RPC Service is described under `User Type`.
Distributed sparse parameter indexing is commonly seen in advertising and recommendation scenarios, and is often coupled with distributed training. The figure below shows a common architecture for online recommendation. When the recommendation service receives a request from a user, the system automatically collects training logs for offline distributed training. Meanwhile, the request is sent to the Paddle Serving Server. For sparse features, the distributed sparse parameter indexing service is called to look up the sparse parameters. The dense input features, together with the looked-up sparse model parameters, are fed into the Paddle Inference node of the DAG in the Paddle Serving Server. The score is then returned through RPC to the product service for item ranking.
Why does Paddle Serving need to support distributed sparse parameter indexing? 1) In some recommendation scenarios, the number of features can reach hundreds of billions, so a single node cannot hold the parameters in random access memory. 2) Paddle Serving's distributed sparse parameter indexing can be coupled with Paddle inference, so users do not need to do extra work to get a low-latency inference engine with hundreds of billions of parameters.
### 3.2 Model Management, Online A/B Test, Model Online Reloading
Paddle Serving's C++ engine supports model management, online A/B testing and model online reloading. Currently, the Python API for these features is not released yet; please wait for the next release.
Paddle Serving provides RPC and HTTP protocols for users. The HTTP service is recommended for services with small or medium traffic where latency is not a strict requirement. The RPC protocol is recommended for high-traffic services and services that require low latency. Users of the built-in distributed sparse parameter indexing service do not need to care about the underlying communication details. The following figure shows several scenarios in which users may want to use Paddle Serving.
For servable models saved through the Paddle Serving IO API, users do not need extra coding work to start a service, but may need some coding work on the client side. To develop a Web Service plugin, a user implements the Web Service's preprocessing and postprocessing work, if needed, to get an HTTP service.
There are many open-source web service frameworks. Currently Paddle Serving uses Flask as the built-in service framework, although users are not exposed to it directly. A more efficient web framework may be integrated in the future if needed.
`WebService` is a base class that provides inheritable interfaces such as `preprocess` and `postprocess` for users to implement. In a class inherited from `WebService`, users can define any functions they want, and the startup interface is the same as for the RPC service.
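A minimal sketch of such a subclass; the method signatures follow the pattern used in the IMDB example later in this document and should be treated as indicative:

```python
from paddle_serving_server.web_service import WebService

class MyService(WebService):
    def preprocess(self, feed={}, fetch=[]):
        # turn raw HTTP request fields into model feed data here
        return feed, fetch

    def postprocess(self, feed={}, fetch=[], fetch_map=None):
        # reshape or relabel fetch_map before it is returned as JSON
        return fetch_map
```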
In recommendation and advertising systems, it is common to use vector-based or tree-based indexing services for candidate retrieval. These retrieval tasks will become built-in services of Paddle Serving.
Paddle Serving is PaddlePaddle's online inference service framework, which helps developers easily implement remote prediction services that call deep learning models from mobile and server ends. At present, Paddle Serving is mainly based on models trained with PaddlePaddle and can be used together with the Paddle training framework to quickly deploy inference services. Paddle Serving is designed around common industrial-grade deep learning model deployment scenarios. Common features include multi-model management, model hot loading, [Baidu-rpc](https://github.com/apache/incubator-brpc)-based high-concurrency and low-latency responses, and online model A/B testing. The API that cooperates with the Paddle training framework lets users transition seamlessly between training and remote deployment, improving the end-to-end efficiency of deploying deep learning models.
------------
## Quick Start
Paddle Serving's current develop version supports a lightweight Python API for fast predictions and integrates with Paddle training. We take the classic Boston house price prediction task as an example to walk through model training on a single machine and model deployment using Paddle Serving.
#### Install
It is highly recommended that you build Paddle Serving inside Docker; please read [How to run PaddleServing in Docker](RUN_IN_DOCKER.md).
Paddle Serving is Paddle's high-performance online inference service framework, which can flexibly support the deployment of most models. In this article, the IMDB review sentiment analysis task is used as an example to show the entire process from model training to deployment of an inference service in 9 steps.
Paddle Serving can be deployed on Linux environments such as CentOS and Ubuntu. On other systems, or in environments where you do not want to install the serving module, you can still access the server-side prediction service through the HTTP service.
You can choose to install the CPU or GPU version of the server module according to your requirements and machine environment, and install the client module on the client machine. When you only want to access the server over HTTP, the client module is not required.
After this simple preparation, we will take the IMDB review sentiment analysis task as an example to show the process from model training to deployment of the prediction service. All the code in the example can be found in the [IMDB example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb) of the Paddle Serving code base; the data and dictionary files used in the example can be obtained by executing the get_data.sh script in the IMDB sample code.
## Step2: Determine Tasks and Raw Data Format
The IMDB review sentiment analysis task is to classify the content of movie reviews into two classes, judging whether a review is positive or negative.
First let's take a look at the raw data:
```
saw a trailer for this on another video, and decided to rent when it came out. boy, was i disappointed! the story is extremely boring, the acting (aside from christopher walken) is bad, and i couldn't care less about the characters, aside from really wanting to see nora's husband get thrashed. christopher walken's role is such a throw-away, what a tease! | 0
```
This is a sample English review. The sample uses `|` as the separator: the content of the review comes before the separator, and the label comes after it. 0 means negative and 1 means positive.
## Step3: Define Reader, divide training set and test set
The original text needs to be converted to numeric ids that the neural network can use. The imdb_reader.py script defines how the text is converted to ids: words are mapped to integers through the dictionary file imdb.vocab.
<details>
<summary>imdb_reader.py</summary>
...
...
```
</details>
The mapped samples look like the following format:
Next we use a [CNN Model](https://www.paddlepaddle.org.cn/documentation/docs/zh/user_guides/nlp_case/understand_sentiment/README.cn.html#cnn) for training; the network structure is defined in nets.py.
Train with the training dataset; the training script is local_train.py. After training, use the paddle_serving_client.io.save_model function to save the model files and configuration files used by the Serving deployment.
 As can be seen from the above figure, the loss of the model starts to converge after the 65th round. We save the model and configuration file after the 65th round of training is completed. The saved files are divided into imdb_cnn_client_conf and imdb_cnn_model folders. The former contains client-side configuration files, and the latter contains server-side configuration files and saved model files.
The parameter list of the save_model function is as follows:
| Parameter | Description |
| --- | --- |
| server_model_folder | Directory for saving server-side configuration files and model files |
| client_config_folder | Directory for saving client-side configuration files |
| feed_var_dict | Inputs of the inference model. Dict type; keys can be customized, and each value is an input variable in the model, one variable per key. When using the prediction service, the input data uses the keys as input names. |
| fetch_var_dict | Outputs of the inference model. Dict type; keys can be customized, and each value is an output variable in the model, one variable per key. When using the prediction service, the keys are used to retrieve the returned data. |
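A sketch of the corresponding call in local_train.py; the variables `data` and `prediction` are assumed from the network definition, and the folder names follow the text above:

```python
import paddle_serving_client.io as serving_io

serving_io.save_model("imdb_cnn_model", "imdb_cnn_client_conf",
                      {"words": data}, {"prediction": prediction},
                      fluid.default_main_program())
```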
The Paddle Serving framework supports two types of prediction services: one communicates through RPC and the other through HTTP. The deployment and use of the RPC prediction service is introduced first; the deployment and use of the HTTP prediction service is introduced in Step 8.
In the command, the --model parameter specifies the directory of the server-side configuration and model files saved earlier, and --port specifies the port of the prediction service. When deploying a GPU prediction service with the GPU version, you can use --gpu_ids to specify the GPUs to use.
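For reference, a sketch of the CPU startup command, assuming the model directory saved above:

```
python -m paddle_serving_server.serve --model imdb_cnn_model --thread 10 --port 9292
```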
After executing one of the above commands, the RPC prediction service deployment of the IMDB sentiment analysis task is completed.
## Step6: Reuse Reader, define remote RPC client
Below we access the RPC prediction service through Python code; the script is test_client.py
<details>
<summary>test_client.py</summary>
...
...
client = Client()
client.load_client_config(sys.argv[1])
client.connect(["127.0.0.1:9292"])
#The data preprocessing code is reused here to convert the original text into numeric ids
imdb_dataset = IMDBDataset()
imdb_dataset.load_resource(sys.argv[2])
...
...
</details>
The script receives data from standard input and prints the predicted probability that each sample is positive (label 1), together with the true label.
## Step7: Call the RPC service to test the model effect
Using the 2084 samples in the test_data/part-0 file for testing, the model's prediction accuracy is 88.19%.
## Step8: Deploy HTTP prediction service
**Note**: The result of each model training may differ slightly, and the accuracy of predictions using the trained model will be close to, but may not exactly match, the example.
When using the HTTP prediction service, the client does not need to install any modules of Paddle Serving; it only needs to be able to send HTTP requests. Of course, the HTTP method consumes more time in the communication phase than the RPC method.
For the IMDB sentiment analysis task, the original text needs to be preprocessed before prediction. In the RPC prediction service, we put the preprocessing in the client's script; in the HTTP prediction service, we put the preprocessing on the server. Paddle Serving's HTTP service framework provides data preprocessing and postprocessing interfaces for this situation; we only need to override them according to the needs of the task.
Serving provides sample code, which can be obtained by executing the imdb_web_service_demo.sh script in the [IMDB example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/imdb).
Let's take a look at the script text_classify_service.py that starts the HTTP prediction service.
<details>
<summary>text_classify_service.py</summary>
...
...
from paddle_serving_server.web_service import WebService
from imdb_reader import IMDBDataset
import sys
#inherit from the framework's WebService class
class IMDBService(WebService):
def prepare_dict(self, args={}):
if len(args) == 0:
...
...
In the above command, the first parameter is the directory of the saved server-side model and configuration files; the second parameter is the working directory, which stores some configuration files for the prediction service (the directory may not exist yet, in which case the prediction service creates it); the third parameter is the port number; and the fourth parameter is the dictionary file.
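A sketch of that startup command with the names used in this tutorial; the working directory name is an assumption:

```
python text_classify_service.py imdb_cnn_model workdir 9292 imdb.vocab
```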
## Step9: Call the prediction service with plaintext data
After starting the HTTP prediction service, you can make a prediction with a single command:
```
curl -H "Content-Type:application/json" -X POST -d '{"words": "i am very sad | 0", "fetch":["prediction"]}' http://127.0.0.1:9292/imdb/prediction
```
When the inference process works correctly, the prediction probability is returned.
**Note**: The result of each model training may differ slightly, and the probability values inferred with the trained model may not exactly match the example.