## Bert as service

([简体中文](./README_CN.md)|English)

In this example, a BERT model is used for semantic understanding: the input text is encoded as a vector, which can be used for further analysis and prediction.

If your Python version is 3.x, replace `pip` with `pip3` and `python` with `python3` in the commands below.

### Getting the Model

Method 1:
This example uses the [BERT Chinese Model](https://www.paddlepaddle.org.cn/hubdetail?name=bert_chinese_L-12_H-768_A-12&en_category=SemanticModel) from [PaddleHub](https://github.com/PaddlePaddle/PaddleHub).

Install PaddleHub first:
```
pip install paddlehub
```

Then run:
```
python prepare_model.py 128
```

The argument 128 sets max_seq_len for the BERT model, which is the length of each sample after preprocessing.
The server-side config and model files are saved in the bert_seq128_model folder.
The client-side config is saved in the bert_seq128_client folder.
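
For reference, the following is a minimal sketch of what a script like prepare_model.py could look like. It assumes the PaddleHub 1.x `module.context` API and `paddle_serving_client.io.save_model`; the exact variable names are illustrative, and the script shipped with the example is authoritative.

```python
# Hypothetical sketch of prepare_model.py: export the PaddleHub BERT module
# as a Paddle Serving model (API details assumed, PaddleHub 1.x style).
import sys

import paddlehub as hub
import paddle_serving_client.io as serving_io

max_seq_len = int(sys.argv[1])  # e.g. 128

# Load the pretrained Chinese BERT module and obtain its inference program.
module = hub.Module(name="bert_chinese_L-12_H-768_A-12")
inputs, outputs, program = module.context(
    trainable=False, max_seq_len=max_seq_len)

# Map feed/fetch variables, then save the server-side model/config and the
# client-side config in one call.
feed_dict = {
    name: inputs[name]
    for name in ["input_ids", "position_ids", "segment_ids", "input_mask"]
}
fetch_dict = {
    "pooled_output": outputs["pooled_output"],
    "sequence_output": outputs["sequence_output"],
}
serving_io.save_model("bert_seq128_model", "bert_seq128_client",
                      feed_dict, fetch_dict, program)
```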

Method 2:
You can also download the above model (max_seq_len=128) from BOS. After decompression, the server-side config and model files are stored in the bert_chinese_L-12_H-768_A-12_model folder, and the client-side config is stored in the bert_chinese_L-12_H-768_A-12_client folder:
```shell
wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/bert_chinese_L-12_H-768_A-12.tar.gz
tar -xzf bert_chinese_L-12_H-768_A-12.tar.gz
```
If you use this model, replace `bert_seq128_model` with `bert_chinese_L-12_H-768_A-12_model` and `bert_seq128_client` with `bert_chinese_L-12_H-768_A-12_client` in the commands below.

### Getting the Dictionary and Sample Dataset

```
sh get_data.sh
```
This script downloads the Chinese dictionary file vocab.txt and the Chinese sample data file data-c.txt.

### RPC Inference Service
To start the CPU inference service, run:
```
python -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292  # CPU inference service
```
Or, to start the GPU inference service, run:
```
python -m paddle_serving_server_gpu.serve --model bert_seq128_model/ --port 9292 --gpu_ids 0  # GPU inference service on GPU 0
```

### RPC Inference

Before prediction, install paddle_serving_app, which provides data preprocessing for the BERT model:
```
pip install paddle_serving_app
```
Run:
```
head data-c.txt | python bert_client.py --model bert_seq128_client/serving_client_conf.prototxt
```

The client reads data from data-c.txt and sends prediction requests; each prediction is returned as a word vector. (Because the word vectors contain a large amount of data, they are not printed.)
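
For orientation, here is a minimal sketch of an RPC client in the spirit of bert_client.py. The `ChineseBertReader` options and fetch names are assumptions based on paddle_serving_app, so treat this as illustrative rather than the shipped script.

```python
# Hypothetical sketch of an RPC client for the BERT service
# (reader options and fetch names assumed, not authoritative).
import sys

from paddle_serving_app.reader import ChineseBertReader
from paddle_serving_client import Client

# Preprocess raw Chinese text into the BERT feed fields (max_seq_len = 128).
reader = ChineseBertReader({"max_seq_len": 128})

client = Client()
client.load_client_config("bert_seq128_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])

# Read one sentence per line from stdin, as in `head data-c.txt | python ...`.
for line in sys.stdin:
    feed_dict = reader.process(line)
    # Fetch the sentence-level embedding for this line of text.
    result = client.predict(feed=feed_dict, fetch=["pooled_output"])
```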

### HTTP Inference Service
To start the CPU HTTP inference service, run:
```
python bert_web_service.py bert_seq128_model/ 9292  # launch CPU HTTP inference service
```

Or, to start the GPU HTTP inference service, run:
```
export CUDA_VISIBLE_DEVICES=0,1
```
Set this environment variable to specify which GPUs are used; the command above selects GPU 0 and GPU 1.
```
python bert_web_service_gpu.py bert_seq128_model/ 9292  # launch GPU HTTP inference service
```
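
As background, a heavily simplified sketch of what a web service script such as bert_web_service.py might contain is shown below. It assumes the `WebService` base class from paddle_serving_server; method names and signatures vary across versions, so the script in this example folder is the reference.

```python
# Hypothetical sketch of a BERT HTTP web service (WebService API details
# assumed; the bundled bert_web_service.py is authoritative).
import sys

from paddle_serving_app.reader import ChineseBertReader
from paddle_serving_server.web_service import WebService


class BertService(WebService):
    def load(self):
        # Tokenizer that turns raw text into BERT feed fields.
        self.reader = ChineseBertReader({"max_seq_len": 128})

    def preprocess(self, feed=[], fetch=[]):
        # Convert each {"words": ...} record into model inputs.
        feed_res = [self.reader.process(ins["words"]) for ins in feed]
        return feed_res, fetch


# The service name "bert" determines the URL path /bert/prediction.
bert_service = BertService(name="bert")
bert_service.load()
bert_service.load_model_config(sys.argv[1])
bert_service.prepare_server(
    workdir="workdir", port=int(sys.argv[2]), device="cpu")
bert_service.run_server()
```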
### HTTP Inference 

```
curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "hello"}], "fetch":["pooled_output"]}' http://127.0.0.1:9292/bert/prediction
```
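
The same request can also be sent from Python with the requests library; the endpoint and payload match the curl command above.

```python
# Send the same prediction request from Python using requests.
import requests

resp = requests.post(
    "http://127.0.0.1:9292/bert/prediction",
    json={"feed": [{"words": "hello"}], "fetch": ["pooled_output"]},
)
print(resp.json())  # contains the pooled_output vector for the input text
```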