python3 -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292 #cpu inference service
```
Or, to start the GPU inference service, run
```
python3 -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292 --gpu_ids 0 #launch gpu inference service at GPU 0
```
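If several cards are available, the service can likely be bound to more than one GPU by passing a comma-separated list to `--gpu_ids` (an assumption based on common Paddle Serving usage; verify against your installed version):
```
python3 -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292 --gpu_ids 0,1 # hypothetical: serve on GPUs 0 and 1
```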
### BRPC-Client Inference
Before prediction, install paddle_serving_app, which provides data preprocessing for the BERT model.
```
pip3 install paddle_serving_app
```
Run
```
head data-c.txt | python3 bert_client.py --model bert_seq128_client/serving_client_conf.prototxt
```
The client reads data from data-c.txt and sends prediction requests; the prediction result is a word vector. (Because the word vector contains a large amount of data, it is not printed.)
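For reference, below is a minimal sketch of what a BRPC client script along the lines of `bert_client.py` might look like. It is an illustration only: it assumes the `ChineseBertReader` preprocessor from `paddle_serving_app.reader`, a fetch variable named `pooled_output`, and the seq128 input shape; the actual script in this repo may differ.
```python
# Illustrative sketch only -- may differ from the repo's bert_client.py.
import sys
import numpy as np
from paddle_serving_client import Client
from paddle_serving_app.reader import ChineseBertReader

client = Client()
client.load_client_config("bert_seq128_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])  # BRPC endpoint started above

# ChineseBertReader tokenizes raw text into the feed fields the model expects
reader = ChineseBertReader({"max_seq_len": 128})
fetch = ["pooled_output"]  # assumed fetch name; check serving_client_conf.prototxt

for line in sys.stdin:
    feed_dict = reader.process(line.strip())
    # reshape each field to (seq_len, 1) to match the seq128 model input
    for key in feed_dict:
        feed_dict[key] = np.array(feed_dict[key]).reshape((128, 1))
    result = client.predict(feed=feed_dict, fetch=fetch, batch=False)
    # the word vector itself is large, so only print its shape
    print({k: np.array(v).shape for k, v in result.items()})
```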
#### GRPC-Client/HTTP-Client
Run
```
head data-c.txt | python3 bert_httpclient.py --model bert_seq128_client/serving_client_conf.prototxt
```
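The HTTP/GRPC client follows the same pattern as the BRPC sketch above. Assuming recent Paddle Serving releases expose an `HttpClient` class in `paddle_serving_client.httpclient` with the same `load_client_config`/`connect`/`predict` interface (an assumption; check your installed version), the main change is the client class:
```python
# Hypothetical: same flow as the BRPC sketch, but over HTTP/GRPC.
from paddle_serving_client.httpclient import HttpClient  # assumed module path

client = HttpClient()
client.load_client_config("bert_seq128_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])
# preprocessing and predict() calls are the same as in the BRPC sketch
```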
In order to show the time consuming of each stage more intuitively, a script is provided to further analyze and process the log file.
When using it, first save the output of the client to a file, taking `profile` as an example.
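One way to produce that file is sketched below. It assumes that Paddle Serving's profiling log is enabled through the `FLAGS_profile_server` and `FLAGS_profile_client` environment variables and that the timing records appear in the client output (assumptions; adjust for your version):
```
# assumed flags: enable profiling before starting the server and the client
export FLAGS_profile_server=1
export FLAGS_profile_client=1
# redirect the client output, which carries the timing records, to a file named `profile`
head data-c.txt | python3 bert_client.py --model bert_seq128_client/serving_client_conf.prototxt > profile 2>&1
```
Then run the analysis script: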
```
python3 show_profile.py profile ${thread_num}
```
Here the `thread_num` parameter is the number of processes used when the client was run; the script calculates the time spent in each stage, divides by the number of threads to obtain the average, and prints it to standard output.
To convert the timing log into a timeline that can be visualized, run
```
python3 timeline_trace.py profile trace
```
The script converts the time-dot information in the log into JSON format and saves it to a trace file. The trace file can be visualized with the tracing feature of the Chrome browser (chrome://tracing).
After the prediction is completed, a JSON file containing the prediction result and a picture with the detection result boxes will be generated in the `./output` folder.