diff --git a/doc/FAQ.md b/doc/FAQ.md
index 00630bd67baef14cfcda18e47a4d5cf8596b6cd0..0dc4ed35a55e5904adbd1b924441aa21bc5436ab 100644
--- a/doc/FAQ.md
+++ b/doc/FAQ.md
@@ -41,6 +41,10 @@
 **A:** 通过pip命令安装自己编译出的whl包,并设置SERVING_BIN环境变量为编译出的serving二进制文件路径。
 
+#### Q: 使用Java客户端,mvn compile过程出现"No compiler is provided in this environment. Perhaps you are running on a JRE rather than a JDK?"错误
+
+**A:** 没有安装JDK,或者JAVA_HOME路径配置错误(正确配置是JDK路径,常见错误配置成JRE路径,例如正确路径参考JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.262.b10-0.el7_8.x86_64/")。Java JDK安装参考 https://segmentfault.com/a/1190000015389941
+
 ## 部署问题
diff --git a/doc/GRPC_IMPL_CN.md b/doc/GRPC_IMPL_CN.md
index 7b10907caec98ae5754126a7ec54096cc4cd48af..9e7ecd268fe0900c1085479c1f96fa083629758c 100644
--- a/doc/GRPC_IMPL_CN.md
+++ b/doc/GRPC_IMPL_CN.md
@@ -1,52 +1,137 @@
-# gRPC接口
+# gRPC接口使用介绍
+
+ - [1.与bRPC接口对比](#1与brpc接口对比)
+   - [1.1 服务端对比](#11-服务端对比)
+   - [1.2 客户端对比](#12-客户端对比)
+   - [1.3 其他](#13-其他)
+ - [2.示例:线性回归预测服务](#2示例线性回归预测服务)
+   - [获取数据](#获取数据)
+   - [开启 gRPC 服务端](#开启-grpc-服务端)
+   - [客户端预测](#客户端预测)
+     - [同步预测](#同步预测)
+     - [异步预测](#异步预测)
+     - [Batch 预测](#batch-预测)
+     - [通用 pb 预测](#通用-pb-预测)
+     - [预测超时](#预测超时)
+     - [List 输入](#list-输入)
+ - [3.更多示例](#3更多示例)
+
+使用 gRPC 接口,Client 端可以在 Win/Linux/MacOS 平台上使用多种语言发起调用。gRPC 接口实现结构如下:
+
+![](https://github.com/PaddlePaddle/Serving/blob/develop/doc/grpc_impl.png)
+
+## 1.与bRPC接口对比
+
+#### 1.1 服务端对比
+
+* gRPC Server 端 `load_model_config` 函数添加 `client_config_path` 参数:
-gRPC 接口实现形式类似 Web Service:
-
-![](grpc_impl.png)
-
-## 与bRPC接口对比
-
-1. gRPC Server 端 `load_model_config` 函数添加 `client_config_path` 参数:
-
-   ```python
+   ```
    def load_model_config(self, server_config_paths, client_config_path=None)
    ```
+   在一些例子中 bRPC Server 端与 bRPC Client 端的配置文件可能不同(如在 cube local 例子中,Client 端的数据先交给 cube,经过 cube 处理后再交给预测库),此时 gRPC Server 端需要手动设置 gRPC Client 端的配置 `client_config_path`。
+   **`client_config_path` 默认为 `/serving_server_conf.prototxt`。**
-   在一些例子中 bRPC Server 端与 bRPC Client 端的配置文件可能是不同的(如 cube local 例子中,Client 端的数据先交给 cube,经过 cube 处理后再交给预测库),所以 gRPC Server 端需要获取 gRPC Client 端的配置;同时为了取消 gRPC Client 端手动加载配置文件的过程,所以设计 gRPC Server 端同时加载两个配置文件。`client_config_path` 默认为 `/serving_server_conf.prototxt`。
+#### 1.2 客户端对比
-2. gRPC Client 端取消 `load_client_config` 步骤:
+* gRPC Client 端取消 `load_client_config` 步骤:
 
   在 `connect` 步骤通过 RPC 获取相应的 prototxt(从任意一个 endpoint 获取即可)。
 
-3. gRPC Client 需要通过 RPC 方式设置 timeout 时间(调用形式与 bRPC Client保持一致)
+* gRPC Client 需要通过 RPC 方式设置 timeout 时间(调用形式与 bRPC Client保持一致)
 
   因为 bRPC Client 在 `connect` 后无法更改 timeout 时间,所以当 gRPC Server 收到变更 timeout 的调用请求时会重新创建 bRPC Client 实例以变更 bRPC Client timeout时间,同时 gRPC Client 会设置 gRPC 的 deadline 时间。
 
  **注意,设置 timeout 接口和 Inference 接口不能同时调用(非线程安全),出于性能考虑暂时不加锁。**
-4. gRPC Client 端 `predict` 函数添加 `asyn` 和 `is_python` 参数:
+* gRPC Client 端 `predict` 函数添加 `asyn` 和 `is_python` 参数:
-   ```python
+   ```
    def predict(self, feed, fetch, need_variant_tag=False, asyn=False, is_python=True)
    ```
-   其中,`asyn` 为异步调用选项。当 `asyn=True` 时为异步调用,返回 `MultiLangPredictFuture` 对象,通过 `MultiLangPredictFuture.result()` 阻塞获取预测值;当 `asyn=Fasle` 为同步调用。
+1. `asyn` 为异步调用选项。当 `asyn=True` 时为异步调用,返回 `MultiLangPredictFuture` 对象,通过 `MultiLangPredictFuture.result()` 阻塞获取预测值;当 `asyn=False` 时为同步调用。
+
+2. `is_python` 为 proto 格式选项。当 `is_python=True` 时,基于 numpy bytes 格式进行数据传输,目前只适用于 Python;当 `is_python=False` 时,以普通数据格式传输,更加通用。使用 numpy bytes 格式传输耗时比普通数据格式小很多(详见 [#654](https://github.com/PaddlePaddle/Serving/pull/654))。
+
+#### 1.3 其他
+
+* 异常处理:当 gRPC Server 端的 bRPC Client 预测失败(返回 `None`)时,gRPC Client 端同样返回 `None`。其他 gRPC 异常会在 Client 内部捕获,并在返回的 fetch_map 中添加一个 "status_code" 字段来区分是否预测正常(参考 timeout 样例)。
+
+* 由于 gRPC 只支持 pick_first 和 round_robin 负载均衡策略,ABTEST 特性尚未完全支持。
+
+* 系统兼容性:
+  * [x] CentOS
+  * [x] macOS
+  * [x] Windows
+
+* 已经支持的客户端语言:
+
+  - Python
+  - Java
+  - Go
+
+
+## 2.示例:线性回归预测服务
+
+以下是采用 gRPC 实现的线性回归预测服务示例,具体代码详见此[链接](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/grpc_impl_example/fit_a_line)
+#### 获取数据
+
+```shell
+sh get_data.sh
+```
+
+#### 开启 gRPC 服务端
+
+``` shell
+python test_server.py uci_housing_model/
+```
+
+也可以通过下面的一行代码开启默认 gRPC 服务:
+
+```shell
+python -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9393 --use_multilang
+```
+注:--use_multilang 参数用来启用多语言客户端。
+
+### 客户端预测
+
+#### 同步预测
+
+``` shell
+python test_sync_client.py
+```
+
+#### 异步预测
+
+``` shell
+python test_asyn_client.py
+```
+
+#### Batch 预测
+
+``` shell
+python test_batch_client.py
+```
-    `is_python` 为 proto 格式选项。当 `is_python=True` 时,基于 numpy bytes 格式进行数据传输,目前只适用于 Python;当 `is_python=False` 时,以普通数据格式传输,更加通用。使用 numpy bytes 格式传输耗时比普通数据格式小很多(详见 [#654](https://github.com/PaddlePaddle/Serving/pull/654))。
+
+#### 通用 pb 预测
-5. 异常处理:当 gRPC Server 端的 bRPC Client 预测失败(返回 `None`)时,gRPC Client 端同样返回None。其他 gRPC 异常会在 Client 内部捕获,并在返回的 fetch_map 中添加一个 "status_code" 字段来区分是否预测正常(参考 timeout 样例)。
+``` shell
+python test_general_pb_client.py
+```
-6. 由于 gRPC 只支持 pick_first 和 round_robin 负载均衡策略,ABTEST 特性还未打齐。
+
+#### 预测超时
-7. 经测试,gRPC 版本可以在 Windows、macOS 平台使用。
+``` shell
+python test_timeout_client.py
+```
-8. 计划支持的客户端语言:
+
+#### List 输入
-   - [x] Python
-   - [ ] Java
-   - [ ] Go
-   - [ ] JavaScript
+``` shell
+python test_list_input_client.py
+```
-## Python 端的一些例子
+
+## 3.更多示例
-详见 `python/examples/grpc_impl_example` 下的示例文件。
+详见[`python/examples/grpc_impl_example`](https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/grpc_impl_example)下的示例文件。
diff --git a/python/examples/bert/bert_client.py b/python/examples/bert/bert_client.py
index d0f8b0aad19b78e6235a3dd0403f20324b4681b4..b378f9f791bce4abfe79b068c1875d9b66f1791c 100644
--- a/python/examples/bert/bert_client.py
+++ b/python/examples/bert/bert_client.py
@@ -23,7 +23,7 @@ args = benchmark_args()
 
 reader = ChineseBertReader({"max_seq_len": 128})
 fetch = ["pooled_output"]
-endpoint_list = ['127.0.0.1:8861']
+endpoint_list = ['127.0.0.1:9292']
 client = Client()
 client.load_client_config(args.model)
 client.connect(endpoint_list)
@@ -33,5 +33,5 @@ for line in sys.stdin:
     for key in feed_dict.keys():
         feed_dict[key] = np.array(feed_dict[key]).reshape((128, 1))
     #print(feed_dict)
-    result = client.predict(feed=feed_dict, fetch=fetch)
+    result = client.predict(feed=feed_dict, fetch=fetch, batch=True)
     print(result)
diff --git a/python/examples/bert/bert_web_service.py b/python/examples/bert/bert_web_service.py
index e3985c9da6c90bb349cc76cba038abd3fe9359c5..e1260dd1c2942fc806f6fd6b2199feb9467a8c2b 100644
--- a/python/examples/bert/bert_web_service.py
+++ b/python/examples/bert/bert_web_service.py
@@ -29,13 +29,14 @@
 class BertService(WebService):
     def preprocess(self, feed=[], fetch=[]):
         feed_res = []
+        is_batch = True
         for ins in feed:
             feed_dict = self.reader.process(ins["words"].encode("utf-8"))
             for key in feed_dict.keys():
                 feed_dict[key] = np.array(feed_dict[key]).reshape(
-                    (1, len(feed_dict[key]), 1))
+                    (len(feed_dict[key]), 1))
             feed_res.append(feed_dict)
-        return feed_res, fetch
+        return feed_res, fetch, is_batch
 
 
 bert_service = BertService(name="bert")
diff --git a/python/examples/imdb/benchmark.py b/python/examples/imdb/benchmark.py
index d804731162b9fe1bf376867322941fdf31ea50b0..18584f88ea51373ffe2ca2e75946342c94464d76 100644
--- a/python/examples/imdb/benchmark.py
+++ b/python/examples/imdb/benchmark.py
@@ -18,7 +18,7 @@ import sys
 import time
 import requests
 import numpy as np
-from paddle_serving_app.reader import IMDBDataset
+from paddle_serving_app.reader.imdb_reader import IMDBDataset
 from paddle_serving_client import Client
 from paddle_serving_client.utils import MultiThreadRunner
 from paddle_serving_client.utils import MultiThreadRunner, benchmark_args, show_latency
diff --git a/python/examples/imdb/test_client.py b/python/examples/imdb/test_client.py
index c057fdb631340174cc6d3fe9d1873767ba0ece78..2aeee01a83cde4a66e4bd03ad49c7791c67a287e 100644
--- a/python/examples/imdb/test_client.py
+++ b/python/examples/imdb/test_client.py
@@ -13,7 +13,7 @@
 # limitations under the License.
 # pylint: disable=doc-string-missing
 from paddle_serving_client import Client
-from paddle_serving_app.reader import IMDBDataset
+from paddle_serving_app.reader.imdb_reader import IMDBDataset
 import sys
 import numpy as np
diff --git a/python/examples/imdb/text_classify_service.py b/python/examples/imdb/text_classify_service.py
index 7b1f200e152da37c57cc8b2f7cd233531e5dd445..1d292194f963466d3e53859dc9e4c6da1789ea20 100755
--- a/python/examples/imdb/text_classify_service.py
+++ b/python/examples/imdb/text_classify_service.py
@@ -14,7 +14,7 @@
 # pylint: disable=doc-string-missing
 
 from paddle_serving_server.web_service import WebService
-from paddle_serving_app.reader import IMDBDataset
+from paddle_serving_app.reader.imdb_reader import IMDBDataset
 import sys
 import numpy as np
diff --git a/python/examples/pipeline/imdb_model_ensemble/config.yml b/python/examples/pipeline/imdb_model_ensemble/config.yml
index 0853033fdccc643c459e19e2e0a573c3091ba9a9..2f25fa861f3ec50d15d5d5795e5e25dbf801e861 100644
--- a/python/examples/pipeline/imdb_model_ensemble/config.yml
+++ b/python/examples/pipeline/imdb_model_ensemble/config.yml
@@ -1,22 +1,100 @@
-rpc_port: 18080
+#rpc端口, rpc_port和http_port不允许同时为空。当rpc_port为空且http_port不为空时,会自动将rpc_port设置为http_port+1
+rpc_port: 18070
+
+#http端口, rpc_port和http_port不允许同时为空。当rpc_port可用且http_port为空时,不自动生成http_port
+http_port: 18071
+
+#worker_num, 最大并发数。当build_dag_each_worker=True时, 框架会创建worker_num个进程,每个进程内构建grpcServer和DAG
+#当build_dag_each_worker=False时,框架会设置主线程grpc线程池的max_workers=worker_num
 worker_num: 4
-build_dag_each_worker: false
+
+#build_dag_each_worker, False,框架在进程内创建一条DAG;True,框架会为每个进程分别创建一条独立的DAG
+build_dag_each_worker: False
+
 dag:
-    is_thread_op: true
+    #op资源类型, True, 为线程模型;False,为进程模型
+    is_thread_op: True
+
+    #重试次数
     retry: 1
-    use_profile: false
+
+    #使用性能分析, True,生成Timeline性能数据,对性能有一定影响;False为不使用
+    use_profile: False
+
+    #channel的最大长度,默认为0
+    channel_size: 0
+
+    #tracer, 跟踪框架吞吐,每个OP和channel的工作情况。无tracer时不生成数据
+    tracer:
+        #每次trace的时间间隔,单位秒/s
+        interval_s: 10
 op:
     bow:
-        concurrency: 2
-        remote_service_conf:
-            client_type: brpc
-            model_config: imdb_bow_model
-            devices: ""
-            rpc_port : 9393
+        #并发数,is_thread_op=True时,为线程并发;否则为进程并发
+        concurrency: 1
+
+        #client连接类型,brpc
+        client_type: brpc
+
+        #Serving交互重试次数,默认不重试
+        retry: 1
+
+        #Serving交互超时时间, 单位ms
+        timeout: 3000
+
+        #Serving IPs
+        server_endpoints: ["127.0.0.1:9393"]
+
+        #bow模型client端配置
+        client_config: "imdb_bow_client_conf/serving_client_conf.prototxt"
+
+        #Fetch结果列表,以client_config中fetch_var的alias_name为准
+        fetch_list: ["prediction"]
+
+        #批量查询Serving的数量, 默认1。batch_size>1要设置auto_batching_timeout,否则不足batch_size时会阻塞
+        batch_size: 1
+
+        #批量查询超时,与batch_size配合使用
+        auto_batching_timeout: 2000
     cnn:
-        concurrency: 2
-        remote_service_conf:
-            client_type: brpc
-            model_config: imdb_cnn_model
-            devices: ""
-            rpc_port : 9292
+        #并发数,is_thread_op=True时,为线程并发;否则为进程并发
+        concurrency: 1
+
+        #client连接类型,brpc
+        client_type: brpc
+
+        #Serving交互重试次数,默认不重试
+        retry: 1
+
+        #超时时间, 单位ms
+        timeout: 3000
+
+        #Serving IPs
+        server_endpoints: ["127.0.0.1:9292"]
+
+        #cnn模型client端配置
+        client_config: "imdb_cnn_client_conf/serving_client_conf.prototxt"
+
+        #Fetch结果列表,以client_config中fetch_var的alias_name为准
+        fetch_list: ["prediction"]
+
+        #批量查询Serving的数量, 默认1。batch_size>1要设置auto_batching_timeout,否则不足batch_size时会阻塞
+        batch_size: 1
+
+        #批量查询超时,与batch_size配合使用
+        auto_batching_timeout: 2000
+    combine:
+        #并发数,is_thread_op=True时,为线程并发;否则为进程并发
+        concurrency: 1
+
+        #Serving交互重试次数,默认不重试
+        retry: 1
+
+        #超时时间, 单位ms
+        timeout: 3000
+
+        #批量查询Serving的数量, 默认1。batch_size>1要设置auto_batching_timeout,否则不足batch_size时会阻塞
+        batch_size: 1
+
+        #批量查询超时,与batch_size配合使用
+        auto_batching_timeout: 2000
diff --git a/python/examples/pipeline/imdb_model_ensemble/test_pipeline_client.py b/python/examples/pipeline/imdb_model_ensemble/test_pipeline_client.py
index 765ab7fd5a02a4af59b0773135bc59c802464b42..1737f8f782a25025547f68be6619c237975f5172 100644
--- a/python/examples/pipeline/imdb_model_ensemble/test_pipeline_client.py
+++ b/python/examples/pipeline/imdb_model_ensemble/test_pipeline_client.py
@@ -15,21 +15,22 @@ from paddle_serving_server.pipeline import PipelineClient
 import numpy as np
 
 client = PipelineClient()
-client.connect(['127.0.0.1:18080'])
+client.connect(['127.0.0.1:18070'])
 
 words = 'i am very sad | 0'
 
 futures = []
-for i in range(4):
+for i in range(100):
     futures.append(
         client.predict(
-            feed_dict={"words": words},
+            feed_dict={"words": words,
+                       "logid": 10000 + i},
             fetch=["prediction"],
             asyn=True,
             profile=False))
 
 for f in futures:
     res = f.result()
-    if res["ecode"] != 0:
+    if res.err_no != 0:
         print("predict failed: {}".format(res))
     print(res)
diff --git a/python/examples/pipeline/imdb_model_ensemble/test_pipeline_server.py b/python/examples/pipeline/imdb_model_ensemble/test_pipeline_server.py
index 89ce67eaef260b23150733c03cefc5dc844a8d42..35171a3910baf0af3ac6c83e521744906f49c948 100644
--- a/python/examples/pipeline/imdb_model_ensemble/test_pipeline_server.py
+++ b/python/examples/pipeline/imdb_model_ensemble/test_pipeline_server.py
@@ -15,10 +15,14 @@
 from paddle_serving_server.pipeline import Op, RequestOp, ResponseOp
 from paddle_serving_server.pipeline import PipelineServer
 from paddle_serving_server.pipeline.proto import pipeline_service_pb2
-from paddle_serving_server.pipeline.channel import ChannelDataEcode
+from paddle_serving_server.pipeline.channel import ChannelDataErrcode
 import numpy as np
-from paddle_serving_app.reader import IMDBDataset
+from paddle_serving_app.reader.imdb_reader import IMDBDataset
 import logging
+try:
+    from paddle_serving_server.web_service import WebService
+except ImportError:
+    from paddle_serving_server_gpu.web_service import WebService
 
 _LOGGER = logging.getLogger()
 user_handler = logging.StreamHandler()
@@ -43,76 +47,66 @@ class ImdbRequestOp(RequestOp):
             word_ids, _ = self.imdb_dataset.get_words_and_label(words)
             word_len = len(word_ids)
             dictdata[key] = np.array(word_ids).reshape(word_len, 1)
-            dictdata["{}.lod".format(key)] = [0, word_len]
-        return dictdata
+            dictdata["{}.lod".format(key)] = np.array([0, word_len])
+
+        log_id = None
+        if request.logid is not None:
+            log_id = request.logid
+        return dictdata, log_id, None, ""
 
 
 class CombineOp(Op):
-    def preprocess(self, input_data):
+    def preprocess(self, input_data, data_id, log_id):
+        #_LOGGER.info("Enter CombineOp::preprocess")
         combined_prediction = 0
         for op_name, data in input_data.items():
             _LOGGER.info("{}: {}".format(op_name, data["prediction"]))
             combined_prediction += data["prediction"]
         data = {"prediction": combined_prediction / 2}
-        return data
+        return data, False, None, ""
 
 
 class ImdbResponseOp(ResponseOp):
     # Here ImdbResponseOp is consistent with the default ResponseOp implementation
     def pack_response_package(self, channeldata):
         resp = pipeline_service_pb2.Response()
-        resp.ecode = channeldata.ecode
+        resp.err_no = channeldata.error_code
+        if resp.err_no == ChannelDataErrcode.OK.value:
             feed = channeldata.parse()
             # ndarray to string
             for name, var in feed.items():
                 resp.value.append(var.__repr__())
                 resp.key.append(name)
         else:
-            resp.error_info = channeldata.error_info
+            resp.err_msg = channeldata.error_info
         return resp
 
 
 read_op = ImdbRequestOp()
-bow_op = Op(name="bow",
-            input_ops=[read_op],
-            server_endpoints=["127.0.0.1:9393"],
-            fetch_list=["prediction"],
-            client_config="imdb_bow_client_conf/serving_client_conf.prototxt",
-            client_type='brpc',
-            concurrency=1,
-            timeout=-1,
-            retry=1,
-            batch_size=1,
-            auto_batching_timeout=None)
-cnn_op = Op(name="cnn",
-            input_ops=[read_op],
-            server_endpoints=["127.0.0.1:9292"],
-            fetch_list=["prediction"],
-            client_config="imdb_cnn_client_conf/serving_client_conf.prototxt",
-            client_type='brpc',
-            concurrency=1,
-            timeout=-1,
-            retry=1,
-            batch_size=1,
-            auto_batching_timeout=None)
-combine_op = CombineOp(
-    name="combine",
-    input_ops=[bow_op, cnn_op],
-    concurrency=1,
-    timeout=-1,
-    retry=1,
-    batch_size=2,
-    auto_batching_timeout=None)
+
+
+class BowOp(Op):
+    def init_op(self):
+        pass
+
+
+class CnnOp(Op):
+    def init_op(self):
+        pass
+
+
+bow_op = BowOp("bow", input_ops=[read_op])
+cnn_op = CnnOp("cnn", input_ops=[read_op])
+combine_op = CombineOp("combine", input_ops=[bow_op, cnn_op])
 
 # fetch output of bow_op
-# response_op = ImdbResponseOp(input_ops=[bow_op])
+#response_op = ImdbResponseOp(input_ops=[bow_op])
 # fetch output of combine_op
 response_op = ImdbResponseOp(input_ops=[combine_op])
 # use default ResponseOp implementation
-# response_op = ResponseOp(input_ops=[combine_op])
+#response_op = ResponseOp(input_ops=[combine_op])
 
 server = PipelineServer()
 server.set_response_op(response_op)
diff --git a/python/examples/pipeline/ocr/README.md b/python/examples/pipeline/ocr/README.md
index f51789fc5e419d715141ba59dc49011d4f306e56..de7bcaa2ece7f9fa7ba56de533e8e4dd023ad1f3 100644
--- a/python/examples/pipeline/ocr/README.md
+++ b/python/examples/pipeline/ocr/README.md
@@ -28,31 +28,9 @@
 python web_service.py &>log.txt &
 python pipeline_http_client.py
 ```
-
-