Commit 814a6602, authored by barrierye

download bert model from bos in CI script

Parent 2383d403
@@ -15,12 +15,17 @@ pip install paddlehub
 run
 ```
-python prepare_model.py 20
+python prepare_model.py 128
 ```
-the 20 in the command above means max_seq_len in BERT model, which is the length of sample after preprocessing.
-the config file and model file for server side are saved in the folder bert_seq20_model.
-the config file generated for client side is saved in the folder bert_seq20_client.
+the 128 in the command above means max_seq_len in BERT model, which is the length of sample after preprocessing.
+the config file and model file for server side are saved in the folder bert_seq128_model.
+the config file generated for client side is saved in the folder bert_seq128_client.
+You can also download the above model from BOS(max_seq_len=128). After decompression, the config file and model file for server side are stored in the bert_chinese_L-12_H-768_A-12_model folder, and the config file generated for client side is stored in the bert_chinese_L-12_H-768_A-12_client folder:
+```shell
+wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/bert_chinese_L-12_H-768_A-12.tar.gz
+tar -xzf bert_chinese_L-12_H-768_A-12.tar.gz
+```
 ### Getting Dict and Sample Dataset
@@ -32,11 +37,11 @@ this script will download Chinese Dictionary File vocab.txt and Chinese Sample D
 ### RPC Inference Service
 Run
 ```
-python -m paddle_serving_server.serve --model bert_seq20_model/ --port 9292 #cpu inference service
+python -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292 #cpu inference service
 ```
 Or
 ```
-python -m paddle_serving_server_gpu.serve --model bert_seq20_model/ --port 9292 --gpu_ids 0 #launch gpu inference service at GPU 0
+python -m paddle_serving_server_gpu.serve --model bert_seq128_model/ --port 9292 --gpu_ids 0 #launch gpu inference service at GPU 0
 ```
 ### RPC Inference
@@ -47,7 +52,7 @@ pip install paddle_serving_app
 ```
 Run
 ```
-head data-c.txt | python bert_client.py --model bert_seq20_client/serving_client_conf.prototxt
+head data-c.txt | python bert_client.py --model bert_seq128_client/serving_client_conf.prototxt
 ```
 the client reads data from data-c.txt and send prediction request, the prediction is given by word vector. (Due to massive data in the word vector, we do not print it).
@@ -58,7 +63,7 @@ the client reads data from data-c.txt and send prediction request, the predictio
 ```
 set environmental variable to specify which gpus are used, the command above means gpu 0 and gpu 1 is used.
 ```
-python bert_web_service.py bert_seq20_model/ 9292 #launch gpu inference service
+python bert_web_service.py bert_seq128_model/ 9292 #launch gpu inference service
 ```
 ### HTTP Inference
@@ -75,7 +80,7 @@ GPU:GPU V100 * 1
 CUDA/cudnn Version:CUDA 9.2,cudnn 7.1.4
-In the test, 10 thousand samples in the sample data are copied into 100 thousand samples. Each client thread sends a sample of the number of threads. The batch size is 1, the max_seq_len is 20, and the time unit is seconds.
+In the test, 10 thousand samples in the sample data are copied into 100 thousand samples. Each client thread sends a sample of the number of threads. The batch size is 1, the max_seq_len is 20(not 128 as described above), and the time unit is seconds.
 When the number of client threads is 4, the prediction speed can reach 432 samples per second.
 Because a single GPU can only perform serial calculations internally, increasing the number of client threads can only reduce the idle time of the GPU. Therefore, after the number of threads reaches 4, the increase in the number of threads does not improve the prediction speed.
......
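The benchmark paragraph in this README hunk splits 100,000 samples across the client threads (each thread sends 1/N of them, where N is the thread count) and reports 432 samples per second with 4 threads. A small worked sketch of that arithmetic, assuming an even split with any remainder handed out one sample at a time, and assuming the reported rate holds for the whole run. Neither assumption comes from the benchmark script, and the function names are hypothetical.

```python
from typing import List

# Figures quoted from the benchmark paragraph in the README.
TOTAL_SAMPLES = 100_000  # 10,000 sample lines copied up to 100,000
REPORTED_RATE = 432      # samples per second with 4 client threads

def samples_per_thread(total: int, threads: int) -> List[int]:
    """Give each thread total // threads samples; the first `remainder` threads get one extra."""
    base, remainder = divmod(total, threads)
    return [base + 1 if i < remainder else base for i in range(threads)]

def estimated_seconds(total: int, rate: float) -> float:
    """Wall-clock time implied by sustaining `rate` samples per second over `total` samples."""
    return total / rate
```

For example, `samples_per_thread(TOTAL_SAMPLES, 4)` yields four shares of 25,000, and `estimated_seconds(TOTAL_SAMPLES, REPORTED_RATE)` comes to roughly 231.5 seconds. That figure is only what the reported rate implies, not a measured result.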
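@@ -13,11 +13,16 @@ pip install paddlehub
 ```
 Run
 ```
-python prepare_model.py 20
+python prepare_model.py 128
+```
+The argument 128 is max_seq_len in the BERT model, i.e. the length of a sample after preprocessing.
+The server-side config file and model file are generated and saved in the bert_seq128_model folder.
+The client-side config file is generated and saved in the bert_seq128_client folder.
+You can also download the above model (max_seq_len=128) directly from BOS. After decompression, the server-side config file and model file are stored in the bert_chinese_L-12_H-768_A-12_model folder, and the client-side config file is stored in the bert_chinese_L-12_H-768_A-12_client folder:
+```shell
+wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/bert_chinese_L-12_H-768_A-12.tar.gz
+tar -xzf bert_chinese_L-12_H-768_A-12.tar.gz
 ```
-The argument 20 is max_seq_len in the BERT model, i.e. the length of a sample after preprocessing.
-The server-side config file and model file are generated and saved in the bert_seq20_model folder
-The client-side config file is generated and saved in the bert_seq20_client folder
 ### Getting the Dictionary and Sample Data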
@@ -29,11 +34,11 @@ sh get_data.sh
 ### Starting the RPC Inference Service
 Run
 ```
-python -m paddle_serving_server.serve --model bert_seq20_model/ --port 9292 #start the cpu inference service
+python -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292 #start the cpu inference service
 ```
 Or
 ```
-python -m paddle_serving_server_gpu.serve --model bert_seq20_model/ --port 9292 --gpu_ids 0 #start the gpu inference service on gpu 0
+python -m paddle_serving_server_gpu.serve --model bert_seq128_model/ --port 9292 --gpu_ids 0 #start the gpu inference service on gpu 0
 ```
 ### Running Inference
@@ -44,7 +49,7 @@ pip install paddle_serving_app
 ```
 Run
 ```
-head data-c.txt | python bert_client.py --model bert_seq20_client/serving_client_conf.prototxt
+head data-c.txt | python bert_client.py --model bert_seq128_client/serving_client_conf.prototxt
 ```
 Start the client, which reads the data in data-c.txt and runs prediction. The prediction result is a vector representation of the text (the script does not print it because the output is large). The server address is set in the script.
@@ -54,7 +59,7 @@ head data-c.txt | python bert_client.py --model bert_seq20_client/serving_client
 ```
 Specify the GPUs used by the gpu inference service through an environment variable; the example uses the two GPUs with indices 0 and 1
 ```
-python bert_web_service.py bert_seq20_model/ 9292 #start the gpu inference service
+python bert_web_service.py bert_seq128_model/ 9292 #start the gpu inference service
 ```
 ### Running Inference
@@ -70,7 +75,7 @@ curl -H "Content-Type:application/json" -X POST -d '{"words": "hello", "fetch":[
 Environment: CUDA 9.2, cudnn 7.1.4
-In the test, the 10,000 samples in the sample data are copied into 100,000 samples, and each client thread sends 1/(number of threads) of the samples. The batch size is 1, max_seq_len is 20, and the time unit is seconds.
+In the test, the 10,000 samples in the sample data are copied into 100,000 samples, and each client thread sends 1/(number of threads) of the samples. The batch size is 1, max_seq_len is 20 (rather than the 128 above), and the time unit is seconds.
 With 4 client threads, the prediction speed reaches 432 samples per second.
 Because a single GPU can only compute serially internally, more client threads can only reduce the GPU's idle time; therefore, once the number of threads reaches 4, adding more threads does not improve the prediction speed.
......
@@ -29,7 +29,7 @@ from paddle_serving_app import ChineseBertReader
 args = benchmark_args()
-reader = ChineseBertReader({"max_seq_len": 20})
+reader = ChineseBertReader({"max_seq_len": 128})
 fetch = ["pooled_output"]
 endpoint_list = ["127.0.0.1:9292"]
 client = Client()
......
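The client hunk above moves `ChineseBertReader` from `max_seq_len` 20 to 128, in step with the BOS model (max_seq_len=128), and the README describes max_seq_len as the length of a sample after preprocessing. The sketch below illustrates, generically, what a fixed sequence length means for BERT-style input: token ids are wrapped in [CLS] ... [SEP], truncated if too long, and padded to exactly `max_seq_len`, with a mask marking real tokens. It is a hypothetical helper, not the implementation of `ChineseBertReader` or `BertReader`; the special-token ids and the pad id are assumptions.

```python
from typing import List, Tuple

# Hypothetical special-token ids; a real reader looks these up in vocab.txt.
CLS_ID, SEP_ID, PAD_ID = 1, 2, 0

def to_fixed_length(token_ids: List[int], max_seq_len: int) -> Tuple[List[int], List[int]]:
    """Wrap ids in [CLS] ... [SEP], truncate to fit max_seq_len, then pad with PAD_ID.

    Returns (input_ids, input_mask), both exactly max_seq_len long.
    """
    if max_seq_len < 2:
        raise ValueError("max_seq_len must leave room for [CLS] and [SEP]")
    body = token_ids[: max_seq_len - 2]   # drop tokens that do not fit
    ids = [CLS_ID] + body + [SEP_ID]
    mask = [1] * len(ids)                 # 1 marks a real token
    padding = max_seq_len - len(ids)
    return ids + [PAD_ID] * padding, mask + [0] * padding
```

For instance, `to_fixed_length([10, 11, 12], 8)` gives ids `[1, 10, 11, 12, 2, 0, 0, 0]` and mask `[1, 1, 1, 1, 1, 0, 0, 0]`. Under this scheme, a 50-token sentence keeps only 18 of its tokens at max_seq_len 20 but all 50 at 128, which is one reason the length matters beyond the model and folder names.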
@@ -21,7 +21,7 @@ import os
 class BertService(WebService):
     def load(self):
-        self.reader = BertReader(vocab_file="vocab.txt", max_seq_len=20)
+        self.reader = BertReader(vocab_file="vocab.txt", max_seq_len=128)
     def preprocess(self, feed={}, fetch=[]):
         feed_res = self.reader.process(feed["words"].encode("utf-8"))
......
@@ -276,27 +276,47 @@ function python_test_bert() {
 case $TYPE in
 CPU)
 pip install paddlehub
-python prepare_model.py 20
+# Because download from paddlehub may timeout,
+# download the model from bos(max_seq_len=128).
+wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/bert_chinese_L-12_H-768_A-12.tar.gz
+tar -xzf bert_chinese_L-12_H-768_A-12.tar.gz
 sh get_data.sh
-check_cmd "python -m paddle_serving_server.serve --model bert_seq20_model/ --port 9292 &"
+check_cmd "python -m paddle_serving_server.serve --model bert_chinese_L-12_H-768_A-12_model --port 9292 &"
 sleep 5
 pip install paddle_serving_app
-check_cmd "head -n 10 data-c.txt | python bert_client.py --model bert_seq20_client/serving_client_conf.prototxt"
+check_cmd "head -n 10 data-c.txt | python bert_client.py --model bert_chinese_L-12_H-768_A-12_client/serving_client_conf.prototxt"
 kill_server_process
-ps -ef | grep "paddle_serving_server" | grep -v grep | awk '{print $2}' | xargs kill
-ps -ef | grep "serving" | grep -v grep | awk '{print $2}' | xargs kill
+# python prepare_model.py 20
+# sh get_data.sh
+# check_cmd "python -m paddle_serving_server.serve --model bert_seq20_model/ --port 9292 &"
+# sleep 5
+# pip install paddle_serving_app
+# check_cmd "head -n 10 data-c.txt | python bert_client.py --model bert_seq20_client/serving_client_conf.prototxt"
+# kill_server_process
+# ps -ef | grep "paddle_serving_server" | grep -v grep | awk '{print $2}' | xargs kill
+# ps -ef | grep "serving" | grep -v grep | awk '{print $2}' | xargs kill
 echo "bert RPC inference pass"
 ;;
 GPU)
 pip install paddlehub
-python prepare_model.py 20
+# Because download from paddlehub may timeout,
+# download the model from bos(max_seq_len=128).
+wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/bert_chinese_L-12_H-768_A-12.tar.gz
+tar -xzf bert_chinese_L-12_H-768_A-12.tar.gz
 sh get_data.sh
-check_cmd "python -m paddle_serving_server_gpu.serve --model bert_seq20_model/ --port 9292 --gpu_ids 0 &"
+check_cmd "python -m paddle_serving_server_gpu.serve --model bert_chinese_L-12_H-768_A-12_model --port 9292 --gpu_ids 0 &"
 sleep 5
 pip install paddle_serving_app
-check_cmd "head -n 10 data-c.txt | python bert_client.py --model bert_seq20_client/serving_client_conf.prototxt"
+check_cmd "head -n 10 data-c.txt | python bert_client.py --model bert_chinese_L-12_H-768_A-12_client/serving_client_conf.prototxt"
 kill_server_process
-ps -ef | grep "paddle_serving_server" | grep -v grep | awk '{print $2}' | xargs kill
+# python prepare_model.py 20
+# sh get_data.sh
+# check_cmd "python -m paddle_serving_server_gpu.serve --model bert_seq20_model/ --port 9292 --gpu_ids 0 &"
+# sleep 5
+# pip install paddle_serving_app
+# check_cmd "head -n 10 data-c.txt | python bert_client.py --model bert_seq20_client/serving_client_conf.prototxt"
+# kill_server_process
+# ps -ef | grep "paddle_serving_server" | grep -v grep | awk '{print $2}' | xargs kill
 echo "bert RPC inference pass"
 ;;
 *)
......
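The CI change replaces `python prepare_model.py 20`, which fetches the model through paddlehub and may time out, with a direct `wget` and `tar -xzf` of the BOS archive. For anyone scripting the same step in Python, here is a minimal sketch. It assumes the archive URL used in the CI script and the two folder names the README gives for the unpacked archive; the helper names and the 60-second socket timeout are hypothetical choices, not part of the CI script.

```python
import os
import shutil
import tarfile
import urllib.request

# URL taken from the CI script in this commit.
MODEL_URL = ("https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/"
             "SemanticModel/bert_chinese_L-12_H-768_A-12.tar.gz")
# Folder names the README gives for the unpacked BOS archive.
EXPECTED_DIRS = ("bert_chinese_L-12_H-768_A-12_model",
                 "bert_chinese_L-12_H-768_A-12_client")

def download(url: str, dest: str, timeout: float = 60.0) -> str:
    """Stream `url` into `dest`; a socket stalled longer than `timeout` seconds raises (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest

def extract_and_check(archive: str, workdir: str = ".") -> None:
    """Rough equivalent of `tar -xzf archive`, then verify the expected folders exist (hypothetical helper)."""
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(workdir)
    missing = [d for d in EXPECTED_DIRS
               if not os.path.isdir(os.path.join(workdir, d))]
    if missing:
        raise FileNotFoundError(f"archive did not contain: {missing}")
```

Usage: `extract_and_check(download(MODEL_URL, "bert_chinese_L-12_H-768_A-12.tar.gz"))` fetches and unpacks into the current directory, mirroring the `wget` and `tar -xzf` pair. The explicit folder check makes a wrong archive fail at this step, instead of later when the server cannot find its model folder.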