In this example, a BERT model is used for semantic understanding: the input text is encoded as a vector, which can then be used for further analysis and prediction.
If your Python version is 3.x, replace 'pip' with 'pip3' and 'python' with 'python3' in the commands below.
### Getting Model
Method 1:
This example uses the [BERT Chinese Model](https://www.paddlepaddle.org.cn/hubdetail?name=bert_chinese_L-12_H-768_A-12&en_category=SemanticModel) from [PaddleHub](https://github.com/PaddlePaddle/PaddleHub).
Install PaddleHub first.
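PaddleHub is distributed on PyPI, so a plain pip install is enough (use pip3 with Python 3):
```
pip install paddlehub
```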
...
The 128 in the folder name bert_seq128_model below is the max_seq_len of the BERT model, i.e. the maximum length of an input sample after preprocessing.
The config file and model file for the server side are saved in the folder bert_seq128_model.
The config file generated for the client side is saved in the folder bert_seq128_client.
Method 2:
You can also download the above model from BOS (max_seq_len=128). After decompression, the config file and model file for the server side are stored in the bert_chinese_L-12_H-768_A-12_model folder, and the config file generated for the client side is stored in the bert_chinese_L-12_H-768_A-12_client folder.
If your model is bert_chinese_L-12_H-768_A-12_model, replace 'bert_seq128_model' with 'bert_chinese_L-12_H-768_A-12_model' and 'bert_seq128_client' with 'bert_chinese_L-12_H-768_A-12_client' in the commands below.
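For example, applying that substitution to the CPU RPC serving command from the RPC section gives:
```
python -m paddle_serving_server.serve --model bert_chinese_L-12_H-768_A-12_model/ --port 9292 #cpu inference service with the BOS model
```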
### Getting Dict and Sample Dataset
```
sh get_data.sh
```
This script will download the Chinese dictionary file vocab.txt and the Chinese sample data file data-c.txt.
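To sanity-check what was downloaded, a quick look at both files is enough; these are plain shell commands and assume the file names above.
```
head -n 5 vocab.txt   # first few entries of the Chinese dictionary
head -n 2 data-c.txt  # a couple of the Chinese sample sentences
```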
### RPC Inference Service
To start the CPU inference service, run
```
python -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292 #cpu inference service
```
Or, to start the GPU inference service, run
```
python -m paddle_serving_server_gpu.serve --model bert_seq128_model/ --port 9292 --gpu_ids 0 #launch gpu inference service at GPU 0
```
The client reads data from data-c.txt and sends prediction requests; the prediction result is a word vector. (Because the word vectors contain a large amount of data, they are not printed.)
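For reference, a minimal RPC client could look like the sketch below. This is only an illustration: the ChineseBertReader preprocessing helper from paddle_serving_app and the fetch name "pooled_output" are assumptions, so check bert_seq128_client/serving_client_conf.prototxt for the exact feed and fetch variable names used by your exported model.
```
# Sketch of an RPC client; feed/fetch names are assumptions and must match
# bert_seq128_client/serving_client_conf.prototxt.
from paddle_serving_client import Client
from paddle_serving_app.reader import ChineseBertReader  # assumed preprocessing helper

reader = ChineseBertReader({"max_seq_len": 128})  # must match the max_seq_len used at export time

client = Client()
client.load_client_config("bert_seq128_client/serving_client_conf.prototxt")
client.connect(["127.0.0.1:9292"])

with open("data-c.txt") as f:
    for line in f:
        feed = reader.process(line.strip())  # tokenize and pad the sentence
        result = client.predict(feed=feed, fetch=["pooled_output"])  # fetch name assumed
        # result["pooled_output"] holds the sentence vector; it is large, so it is not printed here
```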
### HTTP Inference Service
To start the CPU HTTP inference service, run
```
python bert_web_service.py bert_seq128_model/ 9292 #launch cpu inference service
```
Or, to start the GPU HTTP inference service, run
```
export CUDA_VISIBLE_DEVICES=0,1
```
Set this environment variable to specify which GPUs are used; the command above makes GPU 0 and GPU 1 available.
```
python bert_web_service_gpu.py bert_seq128_model/ 9292 #launch gpu inference service
```
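After the web service starts, predictions can be requested over HTTP. The request below is a sketch: the endpoint name "bert", the input key "words", and the fetch name "pooled_output" are assumptions that depend on how bert_web_service.py registers the service.
```
# Hypothetical HTTP request; service name, input key, and fetch name are assumptions.
curl -H "Content-Type:application/json" -X POST \
     -d '{"feed": [{"words": "hello"}], "fetch": ["pooled_output"]}' \
     http://127.0.0.1:9292/bert/prediction
```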