diff --git a/python/examples/bert/README.md b/python/examples/bert/README.md
index 4cfa5590ffb4501c78e9e6ff886f5f82c94dd2db..a8fa35ddaec86ea2f05b025a3bde4b999d57f1dc 100644
--- a/python/examples/bert/README.md
+++ b/python/examples/bert/README.md
@@ -3,9 +3,10 @@
 ([简体中文](./README_CN.md)|English)
 
 In the example, a BERT model is used for semantic understanding prediction, and the text is represented as a vector, which can be used for further analysis and prediction.
+If your Python version is 3.X, replace 'pip' with 'pip3' and 'python' with 'python3' in the commands below.
 
 ### Getting Model
-
+Method 1:
 This example use model [BERT Chinese Model](https://www.paddlepaddle.org.cn/hubdetail?name=bert_chinese_L-12_H-768_A-12&en_category=SemanticModel) from [Paddlehub](https://github.com/PaddlePaddle/PaddleHub).
 
 Install paddlehub first
@@ -22,11 +23,13 @@
 the 128 in the command above means max_seq_len in BERT model, which is the length of sample after preprocessing.
 the config file and model file for server side are saved in the folder bert_seq128_model.
 the config file generated for client side is saved in the folder bert_seq128_client.
+Method 2:
+You can also download the above model from BOS (max_seq_len=128). After decompression, the config file and model file for server side are stored in the bert_chinese_L-12_H-768_A-12_model folder, and the config file generated for client side is stored in the bert_chinese_L-12_H-768_A-12_client folder:
 ```shell
 wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/bert_chinese_L-12_H-768_A-12.tar.gz
 tar -xzf bert_chinese_L-12_H-768_A-12.tar.gz
 ```
+If you use bert_chinese_L-12_H-768_A-12_model, replace 'bert_seq128_model' with 'bert_chinese_L-12_H-768_A-12_model' and 'bert_seq128_client' with 'bert_chinese_L-12_H-768_A-12_client' in the commands below.
 
 ### Getting Dict and Sample Dataset
@@ -36,11 +39,11 @@
 sh get_data.sh
 ```
 this script will download Chinese Dictionary File vocab.txt and Chinese Sample Data data-c.txt
 
 ### RPC Inference Service
-Run
+To start the CPU inference service, run
 ```
 python -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292 #cpu inference service
 ```
-Or
+Or, to start the GPU inference service, run
 ```
 python -m paddle_serving_server_gpu.serve --model bert_seq128_model/ --port 9292 --gpu_ids 0 #launch gpu inference service at GPU 0
 ```
@@ -59,12 +62,18 @@
 head data-c.txt | python bert_client.py --model bert_seq128_client/serving_client_conf.prototxt
 ```
 the client reads data from data-c.txt and send prediction request, the prediction is given by word vector. (Due to massive data in the word vector, we do not print it).
 
 ### HTTP Inference Service
+To start the CPU HTTP inference service, run
+```
+ python bert_web_service.py bert_seq128_model/ 9292 #launch cpu inference service
+```
+
+Or, to start the GPU HTTP inference service, run
 ```
 export CUDA_VISIBLE_DEVICES=0,1
 ```
 set environmental variable to specify which gpus are used, the command above means gpu 0 and gpu 1 is used.
 ```
- python bert_web_service.py bert_seq128_model/ 9292 #launch gpu inference service
+ python bert_web_service_gpu.py bert_seq128_model/ 9292 #launch gpu inference service
 ```
 
 ### HTTP Inference
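The RPC sections above pair with `bert_client.py` from the same example directory. For reference, a minimal RPC client along those lines might look like the sketch below; the `pooled_output` fetch name, the server address, and the exact `predict` signature are assumptions for illustration, not taken from this diff.

```python
# Minimal sketch of an RPC client for the service above (assumptions:
# server at 127.0.0.1:9292, fetch variable named "pooled_output").
from paddle_serving_client import Client
from paddle_serving_app.reader import ChineseBertReader

reader = ChineseBertReader({"vocab_file": "vocab.txt", "max_seq_len": 128})
client = Client()
client.load_client_config("bert_seq128_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])

# Tokenize one sentence and request its vector representation.
feed_dict = reader.process("送您一个问候")
result = client.predict(feed=feed_dict, fetch=["pooled_output"])
```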
diff --git a/python/examples/bert/README_CN.md b/python/examples/bert/README_CN.md
index 93ec8f2adbd9ae31489011900472a0077cb33783..e06e17c8f345b65884feabee08d40e5f345fa322 100644
--- a/python/examples/bert/README_CN.md
+++ b/python/examples/bert/README_CN.md
@@ -4,8 +4,9 @@
 示例中采用BERT模型进行语义理解预测，将文本表示为向量的形式，可以用来做进一步的分析和预测。
+若使用的Python版本为3.X，请将以下命令中的pip替换为pip3，python替换为python3。
 
 ### 获取模型
-
+方法1：
 示例中采用[Paddlehub](https://github.com/PaddlePaddle/PaddleHub)中的[BERT中文模型](https://www.paddlepaddle.org.cn/hubdetail?name=bert_chinese_L-12_H-768_A-12&en_category=SemanticModel)。
 请先安装paddlehub
 ```
@@ -19,11 +20,15 @@
 python prepare_model.py 128
 生成server端配置文件与模型文件，存放在bert_seq128_model文件夹。
 生成client端配置文件，存放在bert_seq128_client文件夹。
+方法2：
 您也可以从bos上直接下载上述模型（max_seq_len=128），解压后server端配置文件与模型文件存放在bert_chinese_L-12_H-768_A-12_model文件夹，client端配置文件存放在bert_chinese_L-12_H-768_A-12_client文件夹：
 ```shell
 wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/bert_chinese_L-12_H-768_A-12.tar.gz
 tar -xzf bert_chinese_L-12_H-768_A-12.tar.gz
 ```
+若使用bert_chinese_L-12_H-768_A-12_model模型，请将下面命令中的bert_seq128_model字段替换为bert_chinese_L-12_H-768_A-12_model，bert_seq128_client字段替换为bert_chinese_L-12_H-768_A-12_client。
+
+
 ### 获取词典和样例数据
@@ -33,13 +38,15 @@
 sh get_data.sh
 ```
 脚本将下载中文词典vocab.txt和中文样例数据data-c.txt
 
 ### 启动RPC预测服务
-执行
+启动cpu预测服务，执行
 ```
 python -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292 #启动cpu预测服务
+
 ```
-或者
+或者，启动gpu预测服务，执行
 ```
 python -m paddle_serving_server_gpu.serve --model bert_seq128_model/ --port 9292 --gpu_ids 0 #在gpu 0上启动gpu预测服务
+
 ```
 
 ### 执行预测
@@ -51,17 +58,28 @@
 pip install paddle_serving_app
 ```
 执行
 ```
 head data-c.txt | python bert_client.py --model bert_seq128_client/serving_client_conf.prototxt
+
 ```
 启动client读取data-c.txt中的数据进行预测，预测结果为文本的向量表示（由于数据较多，脚本中没有将输出进行打印），server端的地址在脚本中修改。
+
+
 ### 启动HTTP预测服务
+启动cpu HTTP预测服务，执行
+```
+python bert_web_service.py bert_seq128_model/ 9292 #启动cpu预测服务
+
+```
+
+或者，启动gpu HTTP预测服务，执行
 ```
 export CUDA_VISIBLE_DEVICES=0,1
 ```
 通过环境变量指定gpu预测服务使用的gpu，示例中指定索引为0和1的两块gpu
 ```
- python bert_web_service.py bert_seq128_model/ 9292 #启动gpu预测服务
+python bert_web_service_gpu.py bert_seq128_model/ 9292 #启动gpu预测服务
 ```
+
 ### 执行预测
 ```
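Both READMEs start the HTTP service with name "bert" on port 9292. A sketch of a client request follows; the `/bert/prediction` route and the JSON layout are assumptions based on Paddle Serving's usual web-service conventions rather than anything spelled out in this diff.

```python
# Sketch of an HTTP request to the web service launched above (assumed
# route /bert/prediction; "pooled_output" fetch name is also an assumption).
import json
import requests

payload = {"feed": [{"words": "hello"}], "fetch": ["pooled_output"]}
resp = requests.post(
    "http://127.0.0.1:9292/bert/prediction",
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"})
print(resp.json())  # the word vector; large, which is why the READMEs don't print it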
diff --git a/python/examples/bert/bert_web_service_gpu.py b/python/examples/bert/bert_web_service_gpu.py
new file mode 100644
index 0000000000000000000000000000000000000000..cbdd321c0932bf68c1e37f02f0c08e08a6c0e43e
--- /dev/null
+++ b/python/examples/bert/bert_web_service_gpu.py
@@ -0,0 +1,52 @@
+# coding=utf-8
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# pylint: disable=doc-string-missing
+from paddle_serving_server_gpu.web_service import WebService
+from paddle_serving_app.reader import ChineseBertReader
+import sys
+import numpy as np
+
+
+class BertService(WebService):
+    def load(self):
+        # Build the reader that tokenizes Chinese text; max_seq_len must
+        # match the exported model (128 here).
+        self.reader = ChineseBertReader({
+            "vocab_file": "vocab.txt",
+            "max_seq_len": 128
+        })
+
+    def preprocess(self, feed=[], fetch=[]):
+        feed_res = []
+        is_batch = False
+        for ins in feed:
+            # Tokenize each request and reshape every feed variable to a
+            # (seq_len, 1) column, as the serving model expects.
+            feed_dict = self.reader.process(ins["words"].encode("utf-8"))
+            for key in feed_dict.keys():
+                feed_dict[key] = np.array(feed_dict[key]).reshape(
+                    (len(feed_dict[key]), 1))
+            feed_res.append(feed_dict)
+        return feed_res, fetch, is_batch
+
+
+# Usage: python bert_web_service_gpu.py bert_seq128_model/ 9292
+bert_service = BertService(name="bert")
+bert_service.load()
+bert_service.load_model_config(sys.argv[1])
+bert_service.prepare_server(
+    workdir="workdir", port=int(sys.argv[2]), device="gpu")
+bert_service.run_rpc_service()
+bert_service.run_web_service()
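The `preprocess` hook in the new service reshapes every feed variable produced by `ChineseBertReader` into a column vector. Below is a self-contained numpy illustration of that reshape; the `input_ids` key is a toy stand-in, since the real keys come from the reader.

```python
import numpy as np

# Toy illustration of the reshape in BertService.preprocess: a flat list of
# max_seq_len token ids becomes a (max_seq_len, 1) column vector.
feed_dict = {"input_ids": [101, 2769, 872, 102] + [0] * 124}  # length 128
for key in feed_dict.keys():
    feed_dict[key] = np.array(feed_dict[key]).reshape((len(feed_dict[key]), 1))
print(feed_dict["input_ids"].shape)  # (128, 1)
```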