`imdb_model` is the server side model with serving configurations. `imdb_client_conf` is the client rpc configurations. Serving has a
dictionary for `Feed` and `Fetch` variables for client to assign. In the example, `{"words": data}` is the feed dict that specify the input of saved inference model. `{"prediction": prediction}` is the fetch dic that specify the output of saved inference model. An alias name can be defined for feed and fetch variables. An example of how to use alias name
`imdb_model` is the server side model with serving configurations. `imdb_client_conf` is the client rpc configurations.
Serving has a dictionary for `Feed` and `Fetch` variables for client to assign. In the example, `{"words": data}` is the feed dict that specify the input of saved inference model. `{"prediction": prediction}` is the fetch dic that specify the output of saved inference model. An alias name can be defined for feed and fetch variables. An example of how to use alias name
is as follows:
``` python
frompaddle_serving_clientimportClient
...
...
@@ -35,10 +36,14 @@ for line in sys.stdin:
If you have saved model files using Paddle's `save_inference_model` API, you can use Paddle Serving's` inference_model_to_serving` API to convert it into a model file that can be used for Paddle Serving.
dirname (str) - Path of saved model files. Program file and parameter files are saved in this directory.
model_filename (str, optional) - The name of file to load the inference program. If it is None, the default filename __model__ will be used. Default: None.
paras_filename (str, optional) - The name of file to load all parameters. It is only used for the case that all parameters were saved in a single binary file. If parameters were saved in separate files, set it as None. Default: None.
serving_server (str, optional) - The path of model files and configuration files for server. Default: "serving_server".
serving_client (str, optional) - The path of configuration files for client. Default: "serving_client".
model_filename (str, optional) - The name of file to load the inference program. If it is None, the default filename `__model__` will be used. Default: None.
paras_filename (str, optional) - The name of file to load all parameters. It is only used for the case that all parameters were saved in a single binary file. If parameters were saved in separate files, set it as None. Default: None.
* Running on http://0.0.0.0:9393/ (Press CTRL+C to quit)
```
这里会提示启动的HTTP服务是开发模式,并不能用于生产环境的部署。Flask启动的服务环境不够稳定也无法承受大量请求的并发,实际部署过程中配合需要WSGI(Web Server Gateway Interface)使用。
Here you will be prompted that the HTTP service started is in development mode and cannot be used for production deployment.
The prediction service started by Flask is not stable enough to withstand the concurrency of a large number of requests. In the actual deployment process, WSGI (Web Server Gateway Interface) is used.
If you want to have more detection models, please refer to [Paddle Detection Model Zoo](https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.2/docs/MODEL_ZOO_cn.md)
paddle_serving_app is a tool component of the Paddle Serving framework, and includes functions such as pre-training model download and data pre-processing methods.
It is convenient for users to quickly test and deploy model examples, analyze the performance of prediction services, and debug model prediction services.
paddle_serving_app provides a variety of data preprocessing methods for prediction tasks in the field of CV and NLP.
- class ChineseBertReader
Preprocessing for Chinese semantic representation task.
-`__init__(vocab_file, max_seq_len=20)`
- vocab_file(st ):Path of dictionary file.
- max_seq_len(in ,optional):The length of sample after processing. The excess part will be truncated, and the insufficient part will be padding 0. Default 20.
-`process(line)`
- line(st ):Text input.
[example](../examples/bert/bert_client.py)
- class LACReader
Preprocessing for Chinese word segmentation task.
-`__init__(dict_floder)`
- dict_floder(st )Path of dictionary file.
-`process(sent)`
- sent(st ):Text input.
-`parse_result`
- words(st ):Original text input.
- crf_decode(np.array):CRF code predicted by model.
[example](../examples/bert/lac_web_service.py)
- class SentaReader
-`__init__(vocab_path)`
- vocab_path(st ):Path of dictionary file.
-`process(cols)`
- cols(st ):Word segmentation result.
[example](../examples/senta/senta_web_service.py)
- The image preprocessing method is more flexible than the above method, and can be combined by the following multiple classes,[example](../examples/imagenet/image_rpc_client.py)
- class Sequentia
-`__init__(transforms)`
- transforms(list):List of image preprocessing classes
-`__call__(img)`
- img:The input of image preprocessing. The data type is is related to the first preprocessing method in transforms.
- size(list/int):The expected image size, when the input is a list type, it needs to contain the expected length and width. When the input is int type, the short side will be set to the length of size, and the long side will be scaled proportionally.
-`__call__(img)`
- img(numpy array):Image data.
## Timeline tools
The Timeline tool can be used to visualize the start and end time of various stages such as the preparation data of the prediction service, client wait and server op.
This tool is convenient to analyze the proportion of time occupancy in the prediction service. On this basis, prediction services can be optimized in a targeted manner.
### How to use
1. Before making predictions on the client side, turn on the timeline function of each stage in the Paddle Serving framework by environment variables. It will print timeline information in log.
```shell
export FLAGS_profile_client=1 # Turn on timeline function of client
export FLAGS_profile_server=1 # Turn on timeline function of server
```
2. Perform predictions and redirect client-side logs to files, for example, named as profile.
3. Export the information in the log file into a trace file.
4. Open the `chrome: // tracing /` URL using Chrome browser.
Load the trace file generated in the previous step through the load button, you can
Visualize the time information of each stage of the forecast service.
As shown in next figure, the figure shows the timeline of GPU prediction service using [bert example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert).
The server side starts service with 4 GPU cards, the client side starts 4 processes to request, and the batch size is 1.
In the figure, bert_pre represents the data pre-processing stage of the client, and client_infer represents the stage where the client completes the sending of the prediction request to the receiving result.
The process in the figure represents the process number of the client, and the second line of each process shows the timeline of each op of the server.
![timeline](../../doc/timeline-example.png)
## Debug tools
The inference op of Paddle Serving is implemented based on Paddle inference lib.
Before deploying the prediction service, you may need to check the input and output of the prediction service or check the resource consumption.
Therefore, a local prediction tool is built into the paddle_serving_app, which is used in the same way as sending a request to the server through the client.
Taking [fit_a_line prediction service](../examples/fit_a_line) as an example, the following code can be used to run local prediction.