diff --git a/doc/BERT_10_MINS.md b/doc/BERT_10_MINS.md
index 7b981d7eda467197d1b1b741c35b00c97b74c532..7f2aef671cfca910c4fb07de288fb6ba28bcd451 100644
--- a/doc/BERT_10_MINS.md
+++ b/doc/BERT_10_MINS.md
@@ -2,35 +2,57 @@
 
 ([简体中文](./BERT_10_MINS_CN.md)|English)
 
-The goal of Bert-As-Service is to give a sentence, and the service can represent the sentence as a semantic vector and return it to the user. [Bert model](https://arxiv.org/abs/1810.04805) is a popular model in the current NLP field. It has achieved good results on a variety of public NLP tasks. The semantic vector calculated by the Bert model is used as input to other NLP models, which will also greatly improve the performance of the model. Bert-As-Service allows users to easily obtain the semantic vector representation of text and apply it to their own tasks. In order to achieve this goal, we have shown in four steps that using Paddle Serving can build such a service in ten minutes. All the code and files in the example can be found in [Example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert) of Paddle Serving.
+The goal of Bert-As-Service is to take a sentence, represent it as a semantic vector, and return that vector to the user. [Bert model](https://arxiv.org/abs/1810.04805) is a popular model in the NLP field that has achieved good results on a variety of public NLP tasks, and using the semantic vectors it computes as input to other NLP models can also greatly improve their performance. Bert-As-Service lets users easily obtain the semantic vector representation of a text and apply it to their own tasks. To achieve this goal, we show in five steps how to build such a service with Paddle Serving in ten minutes. All the code and files in this example can be found in the [Example](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert) directory of Paddle Serving.
 
-#### Step1: Save the serviceable model
+If your Python version is 3.X, replace `pip` with `pip3` and `python` with `python3` in the following commands.
 
-Paddle Serving supports various models trained based on Paddle, and saves the serviceable model by specifying the input and output variables of the model. For convenience, we can load a trained bert Chinese model from paddlehub and save a deployable service with two lines of code. The server and client configurations are placed in the `bert_seq20_model` and` bert_seq20_client` folders, respectively.
+### Step1: Get the Model
 
-[//file]:#bert_10.py
-``` python
-import paddlehub as hub
-model_name = "bert_chinese_L-12_H-768_A-12"
-module = hub.Module(model_name)
-inputs, outputs, program = module.context(
-    trainable=True, max_seq_len=20)
-feed_keys = ["input_ids", "position_ids", "segment_ids",
-             "input_mask"]
-fetch_keys = ["pooled_output", "sequence_output"]
-feed_dict = dict(zip(feed_keys, [inputs[x] for x in feed_keys]))
-fetch_dict = dict(zip(fetch_keys, [outputs[x] for x in fetch_keys]))
+#### Method 1: Generate the Model with PaddleHub
+This example uses the [BERT Chinese Model](https://www.paddlepaddle.org.cn/hubdetail?name=bert_chinese_L-12_H-768_A-12&en_category=SemanticModel) from [PaddleHub](https://github.com/PaddlePaddle/PaddleHub).
 
-import paddle_serving_client.io as serving_io
-serving_io.save_model("bert_seq20_model", "bert_seq20_client",
-                      feed_dict, fetch_dict, program)
+Install PaddleHub first:
+```
+pip install paddlehub
+```
+
+Then run:
+```
+python prepare_model.py 128
+```
+
+**PaddleHub supports only Python 3.5+.**
+
+The argument 128 in the command above is the `max_seq_len` of the BERT model, i.e., the length of each sample after preprocessing.
+The server-side config file and model files are saved in the `bert_seq128_model` folder.
+The client-side config file is saved in the `bert_seq128_client` folder.
+
+#### Method 2: Download the Model
+You can also download the above model (max_seq_len=128) from BOS. After decompression, the server-side config file and model files are in the `bert_chinese_L-12_H-768_A-12_model` folder, and the client-side config file is in the `bert_chinese_L-12_H-768_A-12_client` folder. The `mv` commands below rename them to the folder names used in Method 1:
+```shell
+wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/bert_chinese_L-12_H-768_A-12.tar.gz
+tar -xzf bert_chinese_L-12_H-768_A-12.tar.gz
+mv bert_chinese_L-12_H-768_A-12_model bert_seq128_model
+mv bert_chinese_L-12_H-768_A-12_client bert_seq128_client
 ```
 
-#### Step2: Launch Service
+### Step2: Get the Dictionary and Sample Dataset
 
-[//file]:#server.sh
-``` shell
-python -m paddle_serving_server_gpu.serve --model bert_seq20_model --thread 10 --port 9292 --gpu_ids 0
+```
+sh get_data.sh
+```
+This script downloads the Chinese dictionary file `vocab.txt` and the Chinese sample data `data-c.txt`.
+
+
+### Step3: Launch the Service
+
+To start the CPU inference service, run:
+```
+python -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292 #cpu inference service
+```
+Or, to start the GPU inference service, run:
+```
+python -m paddle_serving_server_gpu.serve --model bert_seq128_model/ --port 9292 --gpu_ids 0 #launch gpu inference service at GPU 0
 ```
 | Parameters | Meaning |
 | ---------- | ---------------------------------------- |
@@ -39,52 +61,55 @@ python -m paddle_serving_server_gpu.serve --model bert_seq20_model --thread 10 -
 | port | server port number |
 | gpu_ids | GPU index number |
 
-#### Step3: data preprocessing logic on Client Side
+### Step4: Data Preprocessing Logic on the Client Side
 
 Paddle Serving has many built-in corresponding data preprocessing logics. For the calculation of Chinese Bert semantic representation, we use the ChineseBertReader class under paddle_serving_app for data preprocessing. Model input fields of multiple models corresponding to a raw Chinese sentence can be easily fetched by developers
 
 Install paddle_serving_app
 
-[//file]:#pip_app.sh
 ```shell
 pip install paddle_serving_app
 ```
 
-#### Step4: Client Visit Serving
+### Step5: Access the Service from the Client
 
-the script of client side bert_client.py is as follow:
-[//file]:#bert_client.py
-``` python
-import sys
-from paddle_serving_client import Client
-from paddle_serving_client.utils import benchmark_args
-from paddle_serving_app.reader import ChineseBertReader
-import numpy as np
-args = benchmark_args()
+#### Method 1: RPC Inference
+
+Run:
+```
+head data-c.txt | python bert_client.py --model bert_seq128_client/serving_client_conf.prototxt
+```
+
+The client reads data from `data-c.txt` and sends prediction requests; each prediction is returned as a vector representation of the text. (Because the vectors are large, the script does not print them.) To change the server address, edit it in the script.
 
-reader = ChineseBertReader({"max_seq_len": 128})
-fetch = ["pooled_output"]
-endpoint_list = ['127.0.0.1:9292']
-client = Client()
-client.load_client_config(args.model)
-client.connect(endpoint_list)
-for line in sys.stdin:
-    feed_dict = reader.process(line)
-    for key in feed_dict.keys():
-        feed_dict[key] = np.array(feed_dict[key]).reshape((128, 1))
-    result = client.predict(feed=feed_dict, fetch=fetch, batch=False)
+#### Method 2: HTTP Inference
+
+This method takes two steps:
+
+1. Start an HTTP prediction server.
+
+To start the CPU HTTP inference service, run:
+```
+python bert_web_service.py bert_seq128_model/ 9292 #launch cpu inference service
 ```
-run
+Or, to start the GPU HTTP inference service, first run:
+```
+export CUDA_VISIBLE_DEVICES=0,1
+```
+This environment variable specifies which GPUs the service uses; the command above selects GPU 0 and GPU 1. Then run:
+```
+python bert_web_service_gpu.py bert_seq128_model/ 9292 #launch gpu inference service
+```
 
-[//file]:#bert_10_cli.sh
-```shell
-cat data.txt | python bert_client.py
+2. Send a prediction request over HTTP:
 ```
+curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "hello"}], "fetch":["pooled_output"]}' http://127.0.0.1:9292/bert/prediction
+```
+
 
-read samples from data.txt, print results at the standard output.
 
 ### Benchmark
diff --git a/doc/BERT_10_MINS_CN.md b/doc/BERT_10_MINS_CN.md
index b0578e8e84de7e36694e879e5a64737d275c505c..df4e8eb32614df0c8b0c2edeeb47fd1516a70710 100644
--- a/doc/BERT_10_MINS_CN.md
+++ b/doc/BERT_10_MINS_CN.md
@@ -2,30 +2,56 @@
 
 (简体中文|[English](./BERT_10_MINS.md))
 
-Bert-As-Service的目标是给定一个句子,服务可以将句子表示成一个语义向量返回给用户。[Bert模型](https://arxiv.org/abs/1810.04805)是目前NLP领域的热门模型,在多种公开的NLP任务上都取得了很好的效果,使用Bert模型计算出的语义向量来做其他NLP模型的输入对提升模型的表现也有很大的帮助。Bert-As-Service可以让用户很方便地获取文本的语义向量表示并应用到自己的任务中。为了实现这个目标,我们通过四个步骤说明使用Paddle Serving在十分钟内就可以搭建一个这样的服务。示例中所有的代码和文件均可以在Paddle Serving的[示例](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert)中找到。
+Bert-As-Service的目标是给定一个句子,服务可以将句子表示成一个语义向量返回给用户。[Bert模型](https://arxiv.org/abs/1810.04805)是目前NLP领域的热门模型,在多种公开的NLP任务上都取得了很好的效果,使用Bert模型计算出的语义向量来做其他NLP模型的输入对提升模型的表现也有很大的帮助。Bert-As-Service可以让用户很方便地获取文本的语义向量表示并应用到自己的任务中。为了实现这个目标,我们通过以下几个步骤说明使用Paddle Serving在十分钟内就可以搭建一个这样的服务。示例中所有的代码和文件均可以在Paddle Serving的[示例](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/bert)中找到。
 
-#### Step1:保存可服务模型
-Paddle Serving支持基于Paddle进行训练的各种模型,并通过指定模型的输入和输出变量来保存可服务模型。为了方便,我们可以从paddlehub加载一个已经训练好的bert中文模型,并利用两行代码保存一个可部署的服务,服务端和客户端的配置分别放在`bert_seq20_model`和`bert_seq20_client`文件夹。
+若使用python的版本为3.X, 将以下命令中的pip 替换为pip3, python替换为python3.
 
-``` python
-import paddlehub as hub
-model_name = "bert_chinese_L-12_H-768_A-12"
-module = hub.Module(model_name)
-inputs, outputs, program = module.context(trainable=True, max_seq_len=20)
-feed_keys = ["input_ids", "position_ids", "segment_ids", "input_mask"]
-fetch_keys = ["pooled_output", "sequence_output"]
-feed_dict = dict(zip(feed_keys, [inputs[x] for x in feed_keys]))
-fetch_dict = dict(zip(fetch_keys, [outputs[x] for x in fetch_keys]))
-import paddle_serving_client.io as serving_io
-serving_io.save_model("bert_seq20_model", "bert_seq20_client", feed_dict, fetch_dict, program)
+### Step1:获取模型
+#### 方法1:
+示例中采用[Paddlehub](https://github.com/PaddlePaddle/PaddleHub)中的[BERT中文模型](https://www.paddlepaddle.org.cn/hubdetail?name=bert_chinese_L-12_H-768_A-12&en_category=SemanticModel)。
+请先安装paddlehub
 ```
+pip install paddlehub
+```
+执行
+```
+python prepare_model.py 128
+```
+参数128表示BERT模型中的max_seq_len,即预处理后的样本长度。
+生成server端配置文件与模型文件,存放在bert_seq128_model文件夹。
+生成client端配置文件,存放在bert_seq128_client文件夹。
+
+#### 方法2:
+您也可以从bos上直接下载上述模型(max_seq_len=128),解压后server端配置文件与模型文件存放在bert_chinese_L-12_H-768_A-12_model文件夹,client端配置文件存放在bert_chinese_L-12_H-768_A-12_client文件夹:
+```shell
+wget https://paddle-serving.bj.bcebos.com/paddle_hub_models/text/SemanticModel/bert_chinese_L-12_H-768_A-12.tar.gz
+tar -xzf bert_chinese_L-12_H-768_A-12.tar.gz
+mv bert_chinese_L-12_H-768_A-12_model bert_seq128_model
+mv bert_chinese_L-12_H-768_A-12_client bert_seq128_client
+```
+
 
-#### Step2:启动服务
+### Step2:获取词典和样例数据
+
+```
+sh get_data.sh
+```
+脚本将下载中文词典vocab.txt和中文样例数据data-c.txt
+
+
+### Step3:启动服务
+
+启动cpu预测服务,执行
+```
+python -m paddle_serving_server.serve --model bert_seq128_model/ --port 9292 #启动cpu预测服务
+
+```
+或者,启动gpu预测服务,执行
+```
+python -m paddle_serving_server_gpu.serve --model bert_seq128_model/ --port 9292 --gpu_ids 0 #在gpu 0上启动gpu预测服务
 
-``` shell
-python -m paddle_serving_server_gpu.serve --model bert_seq20_model --port 9292 --gpu_ids 0
 ```
 
 | 参数 | 含义 |
@@ -35,7 +61,8 @@ python -m paddle_serving_server_gpu.serve --model bert_seq20_model --port 9292 -
 | port | server端端口号 |
 | gpu_ids | GPU索引号 |
 
-#### Step3:客户端数据预处理逻辑
+
+### Step4:客户端数据预处理逻辑
 
 Paddle Serving内建了很多经典典型对应的数据预处理逻辑,对于中文Bert语义表示的计算,我们采用paddle_serving_app下的ChineseBertReader类进行数据预处理,开发者可以很容易获得一个原始的中文句子对应的多个模型输入字段。
 
@@ -45,39 +72,40 @@ Paddle Serving内建了很多经典典型对应的数据预处理逻辑,对于
 pip install paddle_serving_app
 ```
 
-#### Step4:客户端访问
+### Step5:客户端访问
+
+#### 方法1:通过RPC方式执行预测
+执行
+```
+head data-c.txt | python bert_client.py --model bert_seq128_client/serving_client_conf.prototxt
 
-客户端脚本 bert_client.py内容如下
+```
+启动client读取data-c.txt中的数据进行预测,预测结果为文本的向量表示(由于数据较多,脚本中没有将输出进行打印),server端的地址在脚本中修改。
 
-``` python
-import sys
-from paddle_serving_client import Client
-from paddle_serving_client.utils import benchmark_args
-from paddle_serving_app.reader import ChineseBertReader
-import numpy as np
-args = benchmark_args()
-reader = ChineseBertReader({"max_seq_len": 128})
-fetch = ["pooled_output"]
-endpoint_list = ['127.0.0.1:9292']
-client = Client()
-client.load_client_config(args.model)
-client.connect(endpoint_list)
+#### 方法2:通过HTTP方式执行预测
+该方式分为两步
+1、启动一个HTTP预测服务端。
 
-for line in sys.stdin:
-    feed_dict = reader.process(line)
-    for key in feed_dict.keys():
-        feed_dict[key] = np.array(feed_dict[key]).reshape((128, 1))
-    result = client.predict(feed=feed_dict, fetch=fetch, batch=False)
+启动cpu HTTP预测服务,执行
 ```
+python bert_web_service.py bert_seq128_model/ 9292 #启动CPU预测服务
 
-执行
+```
 
-```shell
-cat data.txt | python bert_client.py
+或者,启动gpu HTTP预测服务,执行
+```
+ export CUDA_VISIBLE_DEVICES=0,1
+```
+通过环境变量指定gpu预测服务使用的gpu,示例中指定索引为0和1的两块gpu
+```
+python bert_web_service_gpu.py bert_seq128_model/ 9292 #启动gpu预测服务
 ```
 
-从data.txt文件中读取样例,并将结果打印到标准输出。
+2、通过HTTP请求执行预测。
+```
+curl -H "Content-Type:application/json" -X POST -d '{"feed":[{"words": "hello"}], "fetch":["pooled_output"]}' http://127.0.0.1:9292/bert/prediction
+```
 
 ### 性能测试
 
diff --git a/doc/ENCRYPTION.md b/doc/ENCRYPTION.md
index 1e6a53aa386bf672d5f87647cb1682531ea3d62c..b3639bbc6572623f4f0b7af28f44effd665d9f4e 100644
--- a/doc/ENCRYPTION.md
+++ b/doc/ENCRYPTION.md
@@ -42,11 +42,3 @@ Once the server gets the key, it uses the key to parse the model and starts the
 
 ### Example of Model Encryption Inference
 Example of model encryption inference, See the [`/python/examples/encryption/`](../python/examples/encryption/)。
-
-### Other Details
-Interface of encryption method in paddlepaddle official website:
-
-[Python encryption method](https://github.com/HexToString/Serving/blob/develop/python/paddle_serving_app/local_predict.py)
-
-[C++ encryption method](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/python_infer_cn.html#analysispre)
-
diff --git a/doc/ENCRYPTION_CN.md b/doc/ENCRYPTION_CN.md
index 5ca304d00d198ba2c6df1c7cfbff7315ba46fe15..87452ea365f2cf3b05a0b356a3e709f882568b88 100644
--- a/doc/ENCRYPTION_CN.md
+++ b/doc/ENCRYPTION_CN.md
@@ -42,11 +42,3 @@ python -m paddle_serving_server_gpu.serve --model encrypt_server/ --port 9300 --
 
 ### 模型加密推理示例
 模型加密推理示例, 请参见[`/python/examples/encryption/`](../python/examples/encryption/)。
-
-### 其他详细信息
-飞桨官方网站加密方法接口
-
-[Python加密方法接口](https://github.com/HexToString/Serving/blob/develop/python/paddle_serving_app/local_predict.py)
-
-[C++加密方法接口](https://www.paddlepaddle.org.cn/documentation/docs/zh/advanced_guide/inference_deployment/inference/python_infer_cn.html#analysispre)
-
diff --git a/python/examples/ocr/rec_debugger_server.py b/python/examples/ocr/rec_debugger_server.py
index 9dc7b19b7bf17096cf3a0c18d0a337f990ecd1a4..c7de28862e2d5a03df974b439b4f027e3b8ee423 100644
--- a/python/examples/ocr/rec_debugger_server.py
+++ b/python/examples/ocr/rec_debugger_server.py
@@ -22,7 +22,10 @@ from paddle_serving_client import Client
 from paddle_serving_app.reader import Sequential, URL2Image, ResizeByFactor
 from paddle_serving_app.reader import Div, Normalize, Transpose
 from paddle_serving_app.reader import DBPostProcess, FilterBoxes, GetRotateCropImage, SortedBoxes
-from paddle_serving_server_gpu.web_service import WebService
+if sys.argv[1] == 'gpu':
+    from paddle_serving_server_gpu.web_service import WebService
+elif sys.argv[1] == 'cpu':
+    from paddle_serving_server.web_service import WebService
 import time
 import re
 import base64
diff --git a/python/examples/pipeline/imagenet/pipeline_rpc_client.py b/python/examples/pipeline/imagenet/pipeline_rpc_client.py
index 3220e6c20b27c92a59cd0c28050719a8790d648d..77157359efbdb0b416009af7cfb95f41ce65a1fb 100644
--- a/python/examples/pipeline/imagenet/pipeline_rpc_client.py
+++ b/python/examples/pipeline/imagenet/pipeline_rpc_client.py
@@ -11,7 +11,10 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-from paddle_serving_server_gpu.pipeline import PipelineClient
+try:
+    from paddle_serving_server_gpu.pipeline import PipelineClient
+except ImportError:
+    from paddle_serving_server.pipeline import PipelineClient
 import numpy as np
 import requests
 import json
diff --git a/python/examples/pipeline/ocr/pipeline_rpc_client.py b/python/examples/pipeline/ocr/pipeline_rpc_client.py
index e7cd9e5ce4bc911c8e0ff944058236aced0727b7..ec721ec359063233c616bc4d49cf6d15ec775115 100644
--- a/python/examples/pipeline/ocr/pipeline_rpc_client.py
+++ b/python/examples/pipeline/ocr/pipeline_rpc_client.py
@@ -11,7 +11,10 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-from paddle_serving_server_gpu.pipeline import PipelineClient
+try:
+    from paddle_serving_server_gpu.pipeline import PipelineClient
+except ImportError:
+    from paddle_serving_server.pipeline import PipelineClient
 import numpy as np
 import requests
 import json
diff --git a/python/paddle_serving_server_gpu/serve.py b/python/paddle_serving_server_gpu/serve.py
index 88a2909955be1b4342f3d6037655995aed0d7a4c..13f081e7283d4470396896d4ab31a76292498129 100644
--- a/python/paddle_serving_server_gpu/serve.py
+++ b/python/paddle_serving_server_gpu/serve.py
@@ -155,8 +155,8 @@ class MainService(BaseHTTPRequestHandler):
         if "key" not in post_data:
             return False
         else:
-            key = base64.b64decode(post_data["key"])
-            with open(args.model + "/key", "w") as f:
+            key = base64.b64decode(post_data["key"].encode())
+            with open(args.model + "/key", "wb") as f:
                 f.write(key)
             return True
 
@@ -164,8 +164,8 @@ class MainService(BaseHTTPRequestHandler):
         if "key" not in post_data:
             return False
         else:
-            key = base64.b64decode(post_data["key"])
-            with open(args.model + "/key", "r") as f:
+            key = base64.b64decode(post_data["key"].encode())
+            with open(args.model + "/key", "rb") as f:
                 cur_key = f.read()
             return (key == cur_key)
 
@@ -206,7 +206,7 @@ class MainService(BaseHTTPRequestHandler):
         self.send_response(200)
         self.send_header('Content-type', 'application/json')
         self.end_headers()
-        self.wfile.write(json.dumps(response))
+        self.wfile.write(json.dumps(response).encode())
 
 
 if __name__ == "__main__":
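
The `serve.py` hunks move the key handling and the HTTP response to `bytes`, which Python 3 requires. `base64.b64decode` returns `bytes`. A file opened in text mode (`"w"`) rejects `bytes`. A key read back in text mode (`"r"`) is a `str` and never compares equal to `bytes`. `BaseHTTPRequestHandler.wfile` also expects `bytes`. The `.encode()` before `b64decode` is harmless but not strictly needed, because `b64decode` also accepts ASCII strings. What follows is a minimal standalone sketch of that encoding logic, not the server itself. `save_key`, `check_key`, and `encode_response` are hypothetical helper names that do not exist in `paddle_serving_server_gpu`, and `model_dir` stands in for `args.model`:

```python
import base64
import json
import os
import tempfile


def save_key(model_dir, post_data):
    """Hypothetical helper mirroring the patched key-saving branch in serve.py."""
    if "key" not in post_data:
        return False
    # b64decode returns bytes, so the key file must be opened in binary mode.
    key = base64.b64decode(post_data["key"].encode())
    with open(os.path.join(model_dir, "key"), "wb") as f:
        f.write(key)
    return True


def check_key(model_dir, post_data):
    """Hypothetical helper: True if the posted key matches the stored key byte for byte."""
    if "key" not in post_data:
        return False
    key = base64.b64decode(post_data["key"].encode())
    # "rb" yields bytes; "r" would yield str, and bytes == str is always False.
    with open(os.path.join(model_dir, "key"), "rb") as f:
        cur_key = f.read()
    return key == cur_key


def encode_response(response):
    """Serialize a response dict to bytes, the type BaseHTTPRequestHandler.wfile expects."""
    return json.dumps(response).encode()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as model_dir:
        payload = {"key": base64.b64encode(b"\x00secret\xff").decode()}
        print(save_key(model_dir, payload))   # True
        print(check_key(model_dir, payload))  # True

        # The pre-patch text-mode write fails on bytes under Python 3.
        try:
            with open(os.path.join(model_dir, "old_key"), "w") as f:
                f.write(base64.b64decode(payload["key"]))
        except TypeError as exc:
            print("text-mode write failed:", exc)

    print(encode_response({"result": "ok"}))  # b'{"result": "ok"}'
```

The sample key includes the non-UTF-8 bytes `\x00` and `\xff` on purpose. This shows why binary mode is needed even when a text-mode round trip might seem to work for plain ASCII keys.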