This document takes the Uci service as an example to introduce how to develop a new Web Service. You can check out the complete code [here](../python/examples/pipeline/simple_web_service/web_service.py).
## Op base class
In some services, a single model may not meet business needs, and multiple models have to be combined in series or in parallel to complete the entire service. We call a single model operation an Op, and we provide a simple set of interfaces for implementing the complex logic of combining Ops in series or in parallel.
Data is passed between Ops as dictionaries. An Op can be started as threads or as processes, and each Op can be configured with its number of concurrent instances, among other options.
Typically, you need to inherit the Op base class and override its `init_op`, `preprocess` and `postprocess` methods, which are implemented by default as follows:
```python
def init_op(self):
    pass

def preprocess(self, input_dicts):
    # multiple previous Op
    if len(input_dicts) != 1:
        _LOGGER.critical(
            self._log("Failed to run preprocess: this Op has multiple previous "
                      "inputs. Please override this func."))
        os._exit(-1)
    (_, input_dict), = input_dicts.items()
    return input_dict

def postprocess(self, input_dicts, fetch_dict):
    return fetch_dict
```
### init_op
This method is used to load user-defined resources such as dictionaries. A separator is loaded in the [UciOp](../python/examples/pipeline/simple_web_service/web_service.py).
**Note**: If the Op is launched in thread mode, the threads of a multi-concurrency Op execute `init_op` only once and share the resources it loads.
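As an illustration, an Op subclass that loads a separator in `init_op` might look like the following sketch. The class is a standalone stand-in for self-containment; in Paddle Serving it would inherit from the Op base class, and the separator value is an assumption modeled on the Uci example.

```python
class UciOp:
    # Sketch of an Op subclass; in Paddle Serving this would inherit from
    # the Op base class. Shown standalone to keep the example self-contained.
    def init_op(self):
        # Runs once per Op even with multiple concurrent threads (thread mode);
        # resources loaded here are shared by all threads of this Op.
        self.separator = ","
```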
### preprocess
This method preprocesses the data before model prediction. It has one parameter, `input_dicts`: a dictionary whose keys are the `name`s of the previous Ops and whose values are the data passed from the corresponding previous Op (the data is also in dictionary format).
The `preprocess` method needs to process the data into an ndarray dictionary (keys are the feed variable names, values are the corresponding ndarray values). The Op takes the return value as the input of the model prediction and passes the prediction output to the `postprocess` method.
**Note**: if Op does not have a model configuration file, the return value of `preprocess` will be directly passed to `postprocess`.
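For example, a `preprocess` that turns a comma-separated string into the ndarray dictionary the model expects could look like the sketch below. It is self-contained (the real class would inherit from the Op base class), and the variable name `x` is an assumption.

```python
import numpy as np

class UciOp:
    # Sketch of an Op subclass; shown standalone for self-containment.
    def preprocess(self, input_dicts):
        # Exactly one previous Op: unpack its output dictionary.
        (_, input_dict), = input_dicts.items()
        # "x" arrives as a comma-separated string; convert it into an
        # ndarray dictionary keyed by the feed variable name.
        x = np.array(input_dict["x"].split(","), dtype="float32")
        return {"x": x.reshape(1, -1)}
```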
### postprocess
This method post-processes the data after model prediction. It has two parameters, `input_dicts` and `fetch_dict`.

The `input_dicts` parameter is the same as in the `preprocess` method, and `fetch_dict` is the output of the model prediction (keys are the names of the fetch variables, values are the corresponding ndarray values). The Op passes the return value of `postprocess` to the `preprocess` of subsequent Ops.
**Note**: if Op does not have a model configuration file, `fetch_dict` will be the return value of `preprocess`.
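For example, a `postprocess` that converts the predicted value into a JSON-friendly string could look like this self-contained sketch; `price` is an assumed fetch variable name.

```python
class UciOp:
    # Sketch of an Op subclass; shown standalone for self-containment.
    def postprocess(self, input_dicts, fetch_dict):
        # fetch_dict maps fetch variable names to prediction values;
        # stringify them so the final response is easy to serialize.
        fetch_dict["price"] = str(fetch_dict["price"])
        return fetch_dict
```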
## WebService base class

Paddle Serving implements the [WebService](https://github.com/PaddlePaddle/Serving/blob/develop/python/paddle_serving_server/web_service.py#L23) base class. You need to override its `get_pipeline_response` method to define the topological relationship between Ops. The default implementation is as follows:
```python
class WebService(object):
    def get_pipeline_response(self, read_op):
        return None
```
Where `read_op` serves as the entry point of the topology map of the whole service (that is, the first Op defined by the user follows `read_op`).

For a single-Op service (single model), take the Uci service as an example (there is only one Uci prediction model in the whole service):

```python
class UciService(WebService):
    def get_pipeline_response(self, read_op):
        uci_op = UciOp(name="uci", input_ops=[read_op])
        return uci_op
```
For a multi-Op service (multiple models), take the OCR service as an example (the whole service is completed by the Det model and the Rec model in series):
```python
class OcrService(WebService):
    def get_pipeline_response(self, read_op):
        det_op = DetOp(name="det", input_ops=[read_op])
        rec_op = RecOp(name="rec", input_ops=[det_op])
        return rec_op
```
WebService objects need to load a yaml configuration file through the `prepare_pipeline_config` method to configure each Op and the entire service. The simplest configuration file is as follows (Uci example):
```yaml
http_port: 18080
op:
    uci:
        local_service_conf:
            model_config: uci_housing_model # path
```
All field names of the yaml file are as follows:
```yaml
rpc_port: 18080 # gRPC port
build_dag_each_worker: false # Whether to use the process server. The default is false
worker_num: 1 # gRPC thread pool size (the number of processes in the process version servicer). The default is 1
http_port: 0 # HTTP service port. The HTTP service is not started when the value is less than or equal to 0. The default is 0
dag:
    is_thread_op: true # Whether to use the thread version of Op. The default is true
    client_type: brpc # Use the brpc or grpc client. The default is brpc
    retry: 1 # The number of times the DAG executor retries after failure. The default value is 1, that is, no retrying
    use_profile: false # Whether to print the log on the server side. The default is false
    tracer:
        interval_s: -1 # Monitoring time interval of Tracer (in seconds). Monitoring is not started when the value is less than 1. The default value is -1
op:
    <op_name>: # Op name, corresponding to the one defined in the program
        concurrency: 1 # Op concurrency number. The default is 1
        timeout: -1 # Predict timeout in milliseconds. The default value is -1, that is, no timeout
        retry: 1 # Timeout retransmissions. The default value is 1, that is, no retrying
        batch_size: 1 # If this field is set, Op will merge multiple requests into a single batch
        auto_batching_timeout: -1 # Auto-batching timeout in milliseconds. The default value is -1, that is, no timeout
        local_service_conf:
            model_config: # The path of the corresponding model files. No default value (None). If not configured, the model files are not loaded
            workdir: "" # Working directory of the corresponding model
            thread_num: 2 # The corresponding model is started with thread_num threads
            devices: "" # The device on which the model runs. You can specify GPU card numbers (such as "0,1,2"); CPU by default
            mem_optim: true # Memory optimization option. The default is true
            ir_optim: false # IR optimization option. The default is false
```
All fields of an Op can also be set when the Op is created in the program; values set there override the corresponding yaml fields.
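For instance, the sketch below only illustrates the idea that yaml fields can be passed as keyword arguments when the Op is created; the stub class stands in for the real Op base class, whose exact constructor signature may differ.

```python
class Op:
    # Stub standing in for paddle_serving_server's Op base class, so the
    # example is self-contained; only the fields relevant here are shown.
    def __init__(self, name, input_ops=None, concurrency=1, timeout=-1, retry=1):
        self.name = name
        self.input_ops = input_ops or []
        self.concurrency = concurrency  # overrides op.<op_name>.concurrency in yaml
        self.timeout = timeout          # predict timeout in milliseconds
        self.retry = retry

read_op = Op(name="read")
# Fields given here take precedence over the corresponding yaml values.
uci_op = Op(name="uci", input_ops=[read_op], concurrency=2, timeout=3000)
```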
Arguments are the same as the `inference_model_to_serving` API:

| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| `dirname` | str | - | Path of saved model files. Program file and parameter files are saved in this directory. |
| `serving_server` | str | `"serving_server"` | The path of model files and configuration files for server. |
| `serving_client` | str | `"serving_client"` | The path of configuration files for client. |
| `model_filename` | str | None | The name of file to load the inference program. If it is None, the default filename `__model__` will be used. |
| `params_filename` | str | None | The name of file to load all parameters. It is only used for the case that all parameters were saved in a single binary file. If parameters were saved in separate files, set it as None. |
This document will take the image classification service based on the Imagenet data set as an example to introduce how to develop a new web service. The complete code can be visited at [here](../python/examples/imagenet/resnet50_web_service.py).
## WebService base class
Paddle Serving implements the [WebService](https://github.com/PaddlePaddle/Serving/blob/develop/python/paddle_serving_server/web_service.py#L23) base class. You need to override its `preprocess` and `postprocess` methods; by default, `preprocess` passes the feed data through unchanged and `postprocess` returns the prediction result directly.
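A sketch of that default pass-through behavior, reconstructed from the method descriptions in this section (not the exact source):

```python
class WebService(object):
    def preprocess(self, feed=[], fetch=[]):
        # Default: hand the request's feed and fetch to prediction unchanged.
        return feed, fetch

    def postprocess(self, feed=[], fetch=[], fetch_map=None):
        # Default: return the model output as-is.
        return fetch_map
```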
### preprocess

The preprocess method has two input parameters, `feed` and `fetch`. For an HTTP request `request`:
- The value of `feed` is the feed part `request.json["feed"]` in the request data
- The value of `fetch` is the fetch part `request.json["fetch"]` in the request data
The return values are the feed and fetch values used in the prediction.
### postprocess
The postprocess method has three input parameters, `feed`, `fetch` and `fetch_map`:
- The value of `feed` is the feed part `request.json["feed"]` in the request data
- The value of `fetch` is the fetch part `request.json["fetch"]` in the request data
- The value of `fetch_map` is the model output value.
The return value will be processed as `{"result": fetch_map}` as the response of the HTTP request.
## Develop ImageService class
```python
class ImageService(WebService):
    def preprocess(self, feed={}, fetch=[]):
        reader = ImageReader()
        feed_batch = []
        for ins in feed:
            if "image" not in ins:
                raise ValueError("feed data error!")
            sample = base64.b64decode(ins["image"])
            img = reader.process_image(sample)
            feed_batch.append({"image": img})
        return feed_batch, fetch
```
For the above `ImageService`, only the `preprocess` method is rewritten to process the image data in Base64 format into the data format required by prediction.
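On the client side, the request body for this service carries the Base64-encoded image under the feed field. A sketch of building such a payload (the fetch name `score`, the placeholder bytes, and any endpoint details are assumptions):

```python
import base64
import json

image_bytes = b"fake-image-bytes"  # placeholder; a real client would read an image file

payload = {
    "feed": [{"image": base64.b64encode(image_bytes).decode("utf-8")}],
    "fetch": ["score"],  # assumed fetch variable name
}
body = json.dumps(payload)
# A real client would POST `body` to the service's prediction endpoint.
```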
The command-line conversion tool exposes the same arguments:

```python
import argparse

parser = argparse.ArgumentParser("convert")
parser.add_argument(
    "--dirname",
    type=str,
    help='Path of saved model files. Program file and parameter files are saved in this directory.'
)
parser.add_argument(
    "--serving_server",
    type=str,
    default="serving_server",
    help='The path of model files and configuration files for server. Default: "serving_server".'
)
parser.add_argument(
    "--serving_client",
    type=str,
    default="serving_client",
    help='The path of configuration files for client. Default: "serving_client".'
)
parser.add_argument(
    "--model_filename",
    type=str,
    default=None,
    help='The name of file to load the inference program. If it is None, the default filename __model__ will be used. Default: None.'
)
parser.add_argument(
    "--params_filename",
    type=str,
    default=None,
    help='The name of file to load all parameters. It is only used for the case that all parameters were saved in a single binary file. If parameters were saved in separate files, set it as None. Default: None.'
)
```