Here, we build a simple imdb model enable example to show how to use Pipeline Serving. The relevant code can be found in the `python/examples/pipeline/imdb_model_ensemble` folder. The Server-side structure in the example is shown in the following figure:
Here, we build a simple imdb model enable example to show how to use Pipeline Serving. The relevant code can be found in the `Serving/examples/Pipeline/imdb_model_ensemble` folder. The Server-side structure in the example is shown in the following figure:
@@ -26,6 +26,6 @@ The script converts the time-dot information in the log into a json format and s
Specific operation: Open the chrome browser, enter `chrome://tracing/` in the address bar, jump to the tracing page, click the `load` button, and open the saved trace file to visualize the time information of each stage of the prediction service.
The data visualization output is shown as follow, it uses [bert as service example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert) GPU inference service. The server starts 4 GPU prediction, the client starts 4 `processes`, and the timeline of each stage when the batch size is 1. Among them, `bert_pre` represents the data preprocessing stage of the client, and `client_infer` represents the stage where the client completes sending and receiving prediction requests. `process` represents the process number of the client, and the second line of each process shows the timeline of each op of the server.
The data visualization output is shown as follow, it uses [bert as service example](../C++/PaddleNLP/bert) GPU inference service. The server starts 4 GPU prediction, the client starts 4 `processes`, and the timeline of each stage when the batch size is 1. Among them, `bert_pre` represents the data preprocessing stage of the client, and `client_infer` represents the stage where the client completes sending and receiving prediction requests. `process` represents the process number of the client, and the second line of each process shows the timeline of each op of the server.
3.注意PipelineClientExample.java中的ip和port(位于java/examples/src/main/java/[PipelineClientExample.java](./examples/src/main/java/PipelineClientExample.java)),需要与对应Pipeline server的config.yaml文件中配置的ip和port相对应。(以IMDB model ensemble模型为例,位于python/examples/pipeline/imdb_model_ensemble/[config.yaml](../python/examples/pipeline/imdb_model_ensemble/config.yml))
3.注意PipelineClientExample.java中的ip和port(位于java/examples/src/main/java/[PipelineClientExample.java](./examples/src/main/java/PipelineClientExample.java)),需要与对应Pipeline server的config.yaml文件中配置的ip和port相对应。(以IMDB model ensemble模型为例,位于python/examples/pipeline/imdb_model_ensemble/[config.yaml](../examples/Pipeline/imdb_model_ensemble/config.yml))
- The image preprocessing method is more flexible than the above method, and can be combined by the following multiple classes,[example](../examples/imagenet/resnet50_rpc_client.py)
- The image preprocessing method is more flexible than the above method, and can be combined by the following multiple classes,[example](../../examples/C++/PaddleClas/imagenet/resnet50_rpc_client.py)
- class Sequentia
...
...
@@ -144,7 +144,7 @@ This tool is convenient to analyze the proportion of time occupancy in the predi
Load the trace file generated in the previous step through the load button, you can
Visualize the time information of each stage of the forecast service.
As shown in next figure, the figure shows the timeline of GPU prediction service using [bert example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert).
As shown in next figure, the figure shows the timeline of GPU prediction service using [bert example](../../examples/C++/PaddleNLP/bert).
The server side starts service with 4 GPU cards, the client side starts 4 processes to request, and the batch size is 1.
In the figure, bert_pre represents the data pre-processing stage of the client, and client_infer represents the stage where the client completes the sending of the prediction request to the receiving result.
The process in the figure represents the process number of the client, and the second line of each process shows the timeline of each op of the server.
...
...
@@ -157,7 +157,7 @@ The inference op of Paddle Serving is implemented based on Paddle inference lib.
Before deploying the prediction service, you may need to check the input and output of the prediction service or check the resource consumption.
Therefore, a local prediction tool is built into the paddle_serving_app, which is used in the same way as sending a request to the server through the client.
Taking [fit_a_line prediction service](../examples/fit_a_line) as an example, the following code can be used to run local prediction.
Taking [fit_a_line prediction service](../../examples/C++/fit_a_line) as an example, the following code can be used to run local prediction.