diff --git a/doc/ABTEST_IN_PADDLE_SERVING.md b/doc/ABTEST_IN_PADDLE_SERVING.md
index 3ae23504bff2621c9a814a3ac15e5157626f8999..d901fe8d3141b25731321bc2e3a3df000f9fd6a1 100644
--- a/doc/ABTEST_IN_PADDLE_SERVING.md
+++ b/doc/ABTEST_IN_PADDLE_SERVING.md
@@ -16,32 +16,30 @@ sh get_data.sh
 ```
 
 ### Processing Data
+Data processing depends on several libraries; install them with pip:
+``` shell
+pip install paddlepaddle
+pip install paddle-serving-app
+pip install Shapely
+```
 
-The following Python code will process the data `test_data/part-0` and write to the `processed.data` file.
+You can run the following command directly to process the data:
 
-[//file]:#process.py
-``` python
-from paddle_serving_app.reader import IMDBDataset
-imdb_dataset = IMDBDataset()
-imdb_dataset.load_resource('imdb.vocab')
-
-with open('test_data/part-0') as fin:
-    with open('processed.data', 'w') as fout:
-        for line in fin:
-            word_ids, label = imdb_dataset.get_words_and_label(line)
-            fout.write("{};{}\n".format(','.join([str(x) for x in word_ids]), label[0]))
-```
+[python abtest_get_data.py](../python/examples/imdb/abtest_get_data.py)
+
+The Python code in that file processes the data in `test_data/part-0` and writes the result to the `processed.data` file.
 
 ### Start Server
 
-Here, we [use docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/RUN_IN_DOCKER.md) to start the server-side service.
+Here, we [use docker](RUN_IN_DOCKER.md) to start the server-side service.
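As context for the change above: the client code later in this doc parses `processed.data` by splitting on `;` and `,`, so each line is expected to look like `id1,id2,...;label`. A minimal, hedged sketch of that round trip (the ids here are made up for illustration; real ids come from `imdb.vocab`):

```python
# Sketch of the processed.data line format assumed by the A/B test client:
# comma-separated word ids, a semicolon, then the label (0 or 1).

def format_line(word_ids, label):
    # Mirrors the write in abtest_get_data.py.
    return "{};{}\n".format(','.join(str(x) for x in word_ids), label)

def parse_line(line):
    # Mirrors the read in abtest_client.py.
    word_ids, label = line.split(';')
    return [int(x) for x in word_ids.split(',')], int(label)

line = format_line([12, 7, 903], 1)  # hypothetical ids
ids, label = parse_line(line)
print(ids, label)
```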
First, start the BOW server, which enables the `8000` port:
 
 ``` shell
-docker run -dit -v $PWD/imdb_bow_model:/model -p 8000:8000 --name bow-server hub.baidubce.com/paddlepaddle/serving:latest
-docker exec -it bow-server bash
-pip install paddle-serving-server
+docker run -dit -v $PWD/imdb_bow_model:/model -p 8000:8000 --name bow-server hub.baidubce.com/paddlepaddle/serving:latest /bin/bash
+docker exec -it bow-server /bin/bash
+pip install paddle-serving-server -i https://pypi.tuna.tsinghua.edu.cn/simple
+pip install paddle-serving-client -i https://pypi.tuna.tsinghua.edu.cn/simple
 python -m paddle_serving_server.serve --model model --port 8000 >std.log 2>err.log &
 exit
 ```
@@ -49,18 +47,25 @@ exit
 Similarly, start the LSTM server, which enables the `9000` port:
 
 ```bash
-docker run -dit -v $PWD/imdb_lstm_model:/model -p 9000:9000 --name lstm-server hub.baidubce.com/paddlepaddle/serving:latest
-docker exec -it lstm-server bash
-pip install paddle-serving-server
+docker run -dit -v $PWD/imdb_lstm_model:/model -p 9000:9000 --name lstm-server hub.baidubce.com/paddlepaddle/serving:latest /bin/bash
+docker exec -it lstm-server /bin/bash
+pip install paddle-serving-server -i https://pypi.tuna.tsinghua.edu.cn/simple
+pip install paddle-serving-client -i https://pypi.tuna.tsinghua.edu.cn/simple
 python -m paddle_serving_server.serve --model model --port 9000 >std.log 2>err.log &
 exit
 ```
 
 ### Start Client
 
-Run the following Python code on the host computer to start client. Make sure that the host computer is installed with the `paddle-serving-client` package.
+To simulate A/B test conditions, you can run the following Python code on the host to start the client. Make sure the host has the required environment; you can also run it inside the docker environment.
 
-[//file]:#ab_client.py
+Before running, install the paddle-serving-client package with `pip install paddle-serving-client`.
+
+You can run the following command directly to make the A/B test prediction:
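The `10:90` weighting the client configures via `client.add_variant` can be illustrated with a small standalone sketch. This is only an illustration of weighted selection, not the serving client's actual routing code:

```python
import random

# Hypothetical illustration of how 10:90 variant weights split traffic.
# Tags and weights mirror the example; the real routing happens inside
# the serving client, not in user code.
variants = [("bow", 10), ("lstm", 90)]
tags = [t for t, _ in variants]
weights = [w for _, w in variants]

random.seed(0)  # fixed seed so the split is reproducible
counts = {"bow": 0, "lstm": 0}
for _ in range(10000):
    tag = random.choices(tags, weights=weights)[0]
    counts[tag] += 1

# With 10:90 weights, roughly 10% of requests go to "bow".
print(counts)
```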
+
+[python abtest_client.py](../python/examples/imdb/abtest_client.py)
+
+[//file]:#abtest_client.py
 ``` python
 from paddle_serving_client import Client
@@ -91,7 +96,7 @@ In the code, the function `client.add_variant(tag, clusters, variant_weight)` is
 When making prediction on the client side, if the parameter `need_variant_tag=True` is specified, the response will contain the variant tag corresponding to the distribution flow.
 
 ### Expected Results
-
+Due to varying network conditions, the results of each prediction may differ slightly.
 ``` python
 [lstm](total: 1867) acc: 0.490091055169
 [bow](total: 217) acc: 0.73732718894
diff --git a/doc/ABTEST_IN_PADDLE_SERVING_CN.md b/doc/ABTEST_IN_PADDLE_SERVING_CN.md
index 43bb702bd8b0317d7449313c0e1362953ed87744..074960381858c64a2517a86553e295052b55e6a5 100644
--- a/doc/ABTEST_IN_PADDLE_SERVING_CN.md
+++ b/doc/ABTEST_IN_PADDLE_SERVING_CN.md
@@ -16,31 +16,29 @@ sh get_data.sh
 ```
 
 ### Processing Data
+Data processing depends on several libraries; install them with pip:
+``` shell
+pip install paddlepaddle
+pip install paddle-serving-app
+pip install Shapely
+```
+You can run the following command directly to process the data:
 
-The following Python code will process the data in `test_data/part-0` and write it to the `processed.data` file.
+[python abtest_get_data.py](../python/examples/imdb/abtest_get_data.py)
 
-```python
-from paddle_serving_app.reader import IMDBDataset
-imdb_dataset = IMDBDataset()
-imdb_dataset.load_resource('imdb.vocab')
-
-with open('test_data/part-0') as fin:
-    with open('processed.data', 'w') as fout:
-        for line in fin:
-            word_ids, label = imdb_dataset.get_words_and_label(line)
-            fout.write("{};{}\n".format(','.join([str(x) for x in word_ids]), label[0]))
-```
+The Python code in that file processes the data in `test_data/part-0` and writes the processed result to the `processed.data` file.
 
 ### Start Server
 
-Here, we [use docker](https://github.com/PaddlePaddle/Serving/blob/develop/doc/RUN_IN_DOCKER_CN.md) to start the server-side service.
+Here, we [use docker](RUN_IN_DOCKER_CN.md) to start the server-side service.
 
 First, start the BOW server, which enables the `8000` port:
 
 ```bash
-docker run -dit -v $PWD/imdb_bow_model:/model -p 8000:8000 --name bow-server hub.baidubce.com/paddlepaddle/serving:latest
-docker exec -it bow-server bash
+docker run -dit -v $PWD/imdb_bow_model:/model -p 8000:8000 --name bow-server hub.baidubce.com/paddlepaddle/serving:latest /bin/bash
+docker exec -it bow-server /bin/bash
 pip install paddle-serving-server -i https://pypi.tuna.tsinghua.edu.cn/simple
+pip install paddle-serving-client -i https://pypi.tuna.tsinghua.edu.cn/simple
 python -m paddle_serving_server.serve --model model --port 8000 >std.log 2>err.log &
 exit
 ```
@@ -48,19 +46,27 @@ exit
 Similarly, start the LSTM server, which enables the `9000` port:
 
 ```bash
-docker run -dit -v $PWD/imdb_lstm_model:/model -p 9000:9000 --name lstm-server hub.baidubce.com/paddlepaddle/serving:latest
-docker exec -it lstm-server bash
+docker run -dit -v $PWD/imdb_lstm_model:/model -p 9000:9000 --name lstm-server hub.baidubce.com/paddlepaddle/serving:latest /bin/bash
+docker exec -it lstm-server /bin/bash
 pip install paddle-serving-server -i https://pypi.tuna.tsinghua.edu.cn/simple
+pip install paddle-serving-client -i https://pypi.tuna.tsinghua.edu.cn/simple
 python -m paddle_serving_server.serve --model model --port 9000 >std.log 2>err.log &
 exit
 ```
 
 ### Start Client
+To simulate A/B test conditions, you can run the following Python code on the host to start the client. Make sure the host has the required environment; you can also run it inside the docker environment.
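Note on the feed layout used by the client example in this section: as I read it, variable-length sequences are fed as an `(n, 1)` integer array under `"words"`, with `"words.lod"` holding the boundary offsets `[0, n]` for a single sequence. A minimal sketch of constructing such a feed with NumPy (the ids are fabricated for illustration):

```python
import numpy as np

# Sketch of the feed dict shape expected by the A/B test client example:
# ids become an (n, 1) column vector, and "words.lod" marks the sequence
# boundaries [0, n] for one variable-length input.
word_ids = [12, 7, 903, 44]  # made-up ids for illustration
word_len = len(word_ids)

feed = {
    "words": np.array(word_ids).reshape(word_len, 1),
    "words.lod": [0, word_len],
}

print(feed["words"].shape)  # (4, 1)
print(feed["words.lod"])    # [0, 4]
```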
+
+Before running, install the paddle-serving-client package with `pip install paddle-serving-client`.
 
-Run the following Python code on the host to start the client; make sure the host has the `paddle-serving-client` package installed.
+
+You can run the following command directly to make the A/B test prediction:
+
+[python abtest_client.py](../python/examples/imdb/abtest_client.py)
 
 ```python
 from paddle_serving_client import Client
+import numpy as np
 
 client = Client()
 client.load_client_config('imdb_bow_client_conf/serving_client_conf.prototxt')
@@ -68,28 +74,32 @@ client.add_variant("bow", ["127.0.0.1:8000"], 10)
 client.add_variant("lstm", ["127.0.0.1:9000"], 90)
 client.connect()
 
+print('please wait for about 10s')
 with open('processed.data') as f:
     cnt = {"bow": {'acc': 0, 'total': 0}, "lstm": {'acc': 0, 'total': 0}}
     for line in f:
         word_ids, label = line.split(';')
         word_ids = [int(x) for x in word_ids.split(',')]
-        feed = {"words": word_ids}
+        word_len = len(word_ids)
+        feed = {
+            "words": np.array(word_ids).reshape(word_len, 1),
+            "words.lod": [0, word_len]
+        }
         fetch = ["acc", "cost", "prediction"]
-        [fetch_map, tag] = client.predict(feed=feed, fetch=fetch, need_variant_tag=True)
+        [fetch_map, tag] = client.predict(feed=feed, fetch=fetch, need_variant_tag=True, batch=True)
         if (float(fetch_map["prediction"][0][1]) - 0.5) * (float(label[0]) - 0.5) > 0:
             cnt[tag]['acc'] += 1
         cnt[tag]['total'] += 1
 
     for tag, data in cnt.items():
         print('[{}](total: {}) acc: {}'.format(tag, data['total'], float(data['acc']) / float(data['total'])))
 ```
 
 In the code, `client.add_variant(tag, clusters, variant_weight)` adds a variant with tag `tag` and traffic weight `variant_weight`. In this example, a BOW variant with tag `bow` and traffic weight `10` and an LSTM variant with tag `lstm` and traffic weight `90` are added. Client traffic is distributed to the two variants in a `10:90` ratio.
 
 When making a prediction on the client side, if the parameter `need_variant_tag=True` is specified, the return value will contain the variant tag corresponding to the distributed traffic.
 
 ### Expected Results
-
+Due to varying network conditions, the results of each prediction may differ slightly.
 ``` bash
 [lstm](total: 1867) acc: 0.490091055169
 [bow](total: 217) acc: 0.73732718894
diff --git a/python/examples/bert/prepare_model.py b/python/examples/bert/prepare_model.py
index 70902adf9268d1071c79eb27216dcc2ea9a11a49..521aea9ae2e892c4e071ef2e9917e51f8e626743 100644
--- a/python/examples/bert/prepare_model.py
+++ b/python/examples/bert/prepare_model.py
@@ -16,7 +16,9 @@ import paddlehub as hub
 import paddle.fluid as fluid
 import sys
 import paddle_serving_client.io as serving_io
+import paddle
+paddle.enable_static()
 
 model_name = "bert_chinese_L-12_H-768_A-12"
 module = hub.Module(model_name)
 inputs, outputs, program = module.context(
diff --git a/python/examples/imdb/abtest_client.py b/python/examples/imdb/abtest_client.py
new file mode 100644
index 0000000000000000000000000000000000000000..f5f721b67966f1da72e19b66e014e5b72d802323
--- /dev/null
+++ b/python/examples/imdb/abtest_client.py
@@ -0,0 +1,43 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from paddle_serving_client import Client
+import numpy as np
+
+client = Client()
+client.load_client_config('imdb_bow_client_conf/serving_client_conf.prototxt')
+client.add_variant("bow", ["127.0.0.1:8000"], 10)
+client.add_variant("lstm", ["127.0.0.1:9000"], 90)
+client.connect()
+
+print('please wait for about 10s')
+with open('processed.data') as f:
+    cnt = {"bow": {'acc': 0, 'total': 0}, "lstm": {'acc': 0, 'total': 0}}
+    for line in f:
+        word_ids, label = line.split(';')
+        word_ids = [int(x) for x in word_ids.split(',')]
+        word_len = len(word_ids)
+        feed = {
+            "words": np.array(word_ids).reshape(word_len, 1),
+            "words.lod": [0, word_len]
+        }
+        fetch = ["acc", "cost", "prediction"]
+        [fetch_map, tag] = client.predict(feed=feed, fetch=fetch, need_variant_tag=True, batch=True)
+        if (float(fetch_map["prediction"][0][1]) - 0.5) * (float(label[0]) - 0.5) > 0:
+            cnt[tag]['acc'] += 1
+        cnt[tag]['total'] += 1
+
+    for tag, data in cnt.items():
+        print('[{}](total: {}) acc: {}'.format(tag, data['total'], float(data['acc']) / float(data['total'])))
diff --git a/python/examples/imdb/abtest_get_data.py b/python/examples/imdb/abtest_get_data.py
new file mode 100644
index 0000000000000000000000000000000000000000..c6bd7ea57b86d0df0dd2ae842bee8bd98daa910e
--- /dev/null
+++ b/python/examples/imdb/abtest_get_data.py
@@ -0,0 +1,23 @@
+# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from paddle_serving_app.reader.imdb_reader import IMDBDataset
+imdb_dataset = IMDBDataset()
+imdb_dataset.load_resource('imdb.vocab')
+
+with open('test_data/part-0') as fin:
+    with open('processed.data', 'w') as fout:
+        for line in fin:
+            word_ids, label = imdb_dataset.get_words_and_label(line)
+            fout.write("{};{}\n".format(','.join([str(x) for x in word_ids]), label[0]))
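The per-variant accuracy bookkeeping added in `abtest_client.py` can be exercised offline with mocked responses. The `(fetch_map, tag, label)` triples below are fabricated stand-ins for real server output; the tally logic itself mirrors the example:

```python
# Offline sketch of the accuracy tally from abtest_client.py, using
# fabricated (fetch_map, tag, label) triples instead of live predictions.
mock_results = [
    ({"prediction": [[0.2, 0.8]]}, "lstm", "1"),  # positive score, label 1 -> correct
    ({"prediction": [[0.7, 0.3]]}, "lstm", "1"),  # negative score, label 1 -> wrong
    ({"prediction": [[0.1, 0.9]]}, "bow", "1"),   # positive score, label 1 -> correct
]

cnt = {"bow": {'acc': 0, 'total': 0}, "lstm": {'acc': 0, 'total': 0}}
for fetch_map, tag, label in mock_results:
    # Same correctness test as the example: prediction[0][1] and the
    # label must fall on the same side of 0.5.
    if (float(fetch_map["prediction"][0][1]) - 0.5) * (float(label[0]) - 0.5) > 0:
        cnt[tag]['acc'] += 1
    cnt[tag]['total'] += 1

for tag, data in cnt.items():
    print('[{}](total: {}) acc: {}'.format(
        tag, data['total'], float(data['acc']) / float(data['total'])))
```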