## Timeline Tool Tutorial

([简体中文](./README_CN.md)|English)

The serving framework has a built-in function for timing each stage of the prediction service. The client controls whether it is enabled through environment variables; once enabled, the timing information is printed to the screen.
```
export FLAGS_profile_client=1 # time each stage on the client side
export FLAGS_profile_server=1 # time each stage on the server side
```
After enabling this function, the client will print the corresponding log information to standard output during the prediction process.

To show the time consumed in each stage more intuitively, a script is provided to further analyze and process the log file.

To use it, first save the client's output to a file; the examples below use a file named `profile`.
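For example, assuming a hypothetical client script named `bert_client.py`, enabling the flags and capturing the log might look like this:
```
export FLAGS_profile_client=1
export FLAGS_profile_server=1
# hypothetical client script; the timing log is written to standard output
python3 bert_client.py > profile
```
Then run the analysis script on the saved file: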
```
python3 show_profile.py profile ${thread_num}
```
Here the `thread_num` parameter is the number of processes the client was run with. The script sums the time spent in each stage, divides by `thread_num` to obtain an average, and prints the result to standard output.
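For instance, if the client was started with 4 concurrent processes, the call would be:
```
python3 show_profile.py profile 4
```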

```
python3 timeline_trace.py profile trace
```
The script converts the timestamp information in the log into JSON and saves it to a trace file. The trace file can be visualized through the tracing feature of the Chrome browser.
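The trace file follows Chrome's trace event format. As a rough sketch (the field values here are illustrative, and the exact entries produced by `timeline_trace.py` may differ), each stage appears as a pair of begin/end events:
```
[
    {"name": "bert_pre", "ph": "B", "ts": 1584257579000000, "pid": 0, "tid": 0},
    {"name": "bert_pre", "ph": "E", "ts": 1584257579043000, "pid": 0, "tid": 0}
]
```
`ph` marks the begin (`B`) or end (`E`) of a stage, `ts` is the timestamp in microseconds, and `pid`/`tid` identify the process and thread a stage belongs to.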

Specific operation: open the Chrome browser, enter `chrome://tracing/` in the address bar to reach the tracing page, click the `load` button, and open the saved trace file to visualize the time consumed in each stage of the prediction service.

The visualization below uses the [bert as service example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert) GPU inference service. The server runs prediction on 4 GPUs, the client runs 4 processes, and the batch size is 1. In the timeline, `bert_pre` represents the client's data preprocessing stage, `client_infer` represents the stage in which the client sends the prediction request and receives the result, and `process` is the client process number; the second line of each process shows the timeline of each op on the server.

![timeline](../../../doc/images/timeline-example.png)