local predictor and RPC batch interface unified (!821) · Merge Request · PaddlePaddle / Serving


Created by: wangjiawei04

PR types

Function Optimization

PR changes

APIs / Docs

Describe

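This PR makes the following changes:

1. Unified client.predict(feed, fetch, batch) interface for RPC and local predictor modes. Passing lists as input is no longer supported: feed does not accept list values, and every input must be a numpy array. The batch argument accepts only True or False. batch=True means the input is already batched, so no leading 1 needs to be added; batch=False means the input is a single, non-batched sample, and a 1 is prepended to its dims.

Example: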

    feed = {
        "words": np.array(word_ids).reshape(word_len, 1),  # shape is [word_len, 1]
        "words.lod": [0, word_len]
    }
    fetch = ["prediction"]
    fetch_map = client.predict(feed=feed, fetch=fetch, batch=True)   # input treated as shape [word_len, 1]
    fetch_map = client.predict(feed=feed, fetch=fetch, batch=False)  # input treated as shape [1, word_len, 1]

In this example, setting batch=False tells the client that the feed is not batched, so a 1 is prepended as the batch dimension.
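To make the shape rule concrete, here is a minimal sketch of the batch flag semantics described above. It is not the Serving client's code: expand_for_batch is a hypothetical helper that only mimics the documented behavior, rejecting non-numpy values and prepending a dimension of 1 when batch=False, while passing ${VAR_NAME}.lod entries through unchanged.

```python
import numpy as np


def expand_for_batch(feed, batch):
    """Hypothetical helper mimicking the documented batch flag behavior.

    batch=True:  arrays are assumed to already carry a leading batch dim.
    batch=False: a leading dimension of 1 is prepended to every array.
    Keys ending in ".lod" hold LoD offsets, not tensors, and are passed through.
    """
    out = {}
    for key, value in feed.items():
        if key.endswith(".lod"):
            out[key] = value
            continue
        # Lists are no longer accepted as tensor inputs; require numpy arrays.
        if not isinstance(value, np.ndarray):
            raise TypeError(f"feed[{key!r}] must be a numpy array, got {type(value).__name__}")
        out[key] = value if batch else value[np.newaxis, ...]
    return out


word_ids = [3, 14, 15, 92]  # made-up token ids for illustration
word_len = len(word_ids)
feed = {
    "words": np.array(word_ids).reshape(word_len, 1),
    "words.lod": [0, word_len],
}
print(expand_for_batch(feed, batch=True)["words"].shape)   # (4, 1)
print(expand_for_batch(feed, batch=False)["words"].shape)  # (1, 4, 1)
```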

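2. Improved LoD support. With the old interface, RPC mode supported variable-length sequences only by assembling a list of per-sample feed dicts, and local predictor mode offered no way to pass LoD information at all.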

To unify the two modes, this PR introduces the following interface:

    feed = {
        "words": np.array(word_ids).reshape(word_len, 1), #shape is [word_len, 1]
        "words.lod": [0, word_len]
    }
    fetch_map = client.predict(feed=feed, fetch=fetch, batch=True)

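LoD information is specified explicitly in the feed dict under a key of the form ${VAR_NAME}.lod, whose value is that variable's LoD. The same feed is then passed to client.predict, and this works in both RPC mode and local predictor mode, so switching between brpc and local predictor only takes changing a few lines of client-side setup.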
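With several variable-length sequences in one batch, the same ${VAR_NAME}.lod key holds the cumulative offsets of each sequence in the concatenated tensor; the single-sequence case above is just [0, word_len]. The helper below is a hypothetical sketch, not part of paddle_serving_client, showing how the old list-of-feed-dicts style could be packed into one dict. The int64 dtype and the [total_len, 1] layout are assumptions copied from the imdb example.

```python
import numpy as np


def to_lod_feed(name, sequences):
    """Hypothetical helper: pack variable-length sequences into one feed dict.

    The sequences are concatenated along axis 0 into a [total_len, 1] array,
    and "<name>.lod" receives the cumulative offsets [0, len0, len0+len1, ...].
    """
    offsets = [0]
    for seq in sequences:
        offsets.append(offsets[-1] + len(seq))
    # Assumption: int64 token ids, as np.array(word_ids) yields on typical Linux builds.
    data = np.concatenate([np.asarray(s, dtype="int64") for s in sequences])
    return {name: data.reshape(offsets[-1], 1), name + ".lod": offsets}


feed = to_lod_feed("words", [[8, 2, 5], [7, 1]])
print(feed["words"].shape)  # (5, 1)
print(feed["words.lod"])    # [0, 3, 5]
```

The resulting dict would then be passed as client.predict(feed=feed, fetch=fetch, batch=True), exactly like the single-sequence feed.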

Below is the complete code for the imdb example:

    from paddle_serving_client import Client
    from paddle_serving_app.reader import IMDBDataset
    from paddle_serving_app.local_predict import Debugger
    import sys
    import numpy as np

    # Usage: python test_client.py <client_conf_or_model_dir> <vocab_file> brpc|local
    if len(sys.argv) != 4 or sys.argv[3] not in ("brpc", "local"):
        sys.exit("usage: python test_client.py <client_conf_or_model_dir> <vocab_file> brpc|local")

    if sys.argv[3] == "brpc":
        # RPC mode: argv[1] is the client config prototxt
        client = Client()
        client.load_client_config(sys.argv[1])
        client.connect(["127.0.0.1:9292"])  # must match the server's --port
    else:
        # Local predictor mode: argv[1] is the model directory
        client = Debugger()
        client.load_model_config(sys.argv[1], profile=False)

    # you can define any english sentence or dataset here
    # This example reuses imdb reader in training, you
    # can define your own data preprocessing easily.
    imdb_dataset = IMDBDataset()
    imdb_dataset.load_resource(sys.argv[2])

    for line in sys.stdin:
        word_ids, label = imdb_dataset.get_words_and_label(line)
        word_len = len(word_ids)
        feed = {
            "words": np.array(word_ids).reshape(word_len, 1),
            "words.lod": [0, word_len]
        }
        fetch = ["prediction"]
        fetch_map = client.predict(feed=feed, fetch=fetch, batch=True)
        print("{} {}".format(fetch_map["prediction"][0], label[0]))

For brpc mode, run:

    python -m paddle_serving_server.serve --model imdb_cnn_model/ --port 9292
    head test_data/part-0 | python test_client.py imdb_cnn_client_conf/serving_client_conf.prototxt imdb.vocab brpc

For local predictor mode, run:

    head test_data/part-0 | python test_client.py imdb_cnn_model imdb.vocab local

The examples do not yet launch brpc and local predictor through a unified entry point. That is a larger piece of work and will be added in the next PR.

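3. Pipeline server support. An OCR example under python/examples/pipeline demonstrates the pipeline. In @barrierye's previous pipeline example, client_type was specified on the Dag; it is now specified per Op, so you can choose freely which Ops use the local predictor and which use brpc. Taking OCR as an example: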

    rpc_port: 18080
    worker_num: 4
    build_dag_each_worker: false
    http_port: 9999
    dag:
        is_thread_op: false
        retry: 1
        use_profile: false
    op:
        det:
            concurrency: 2
            local_service_conf:
                client_type: brpc  # can be replaced with local_predictor
                model_config: ocr_det_model
                devices: ""
        rec:
            concurrency: 1
            timeout: -1
            retry: 1
            local_service_conf:
                client_type: brpc  # can be replaced with local_predictor
                model_config: ocr_rec_model
                devices: ""

Once client_type is set for an Op, that Op starts in the corresponding serving mode.

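Also, because the client.predict interface has changed, the feed and fetch returned by the preprocess function of class Op must follow the new convention: a single dict specifies all tensors, and LoD is given explicitly under ${VAR_NAME}.lod.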

Attention: GPU with the local predictor is currently not supported when is_thread_op: false, because child processes cannot load the CUDA environment.
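The per-Op client_type choice and the GPU restriction above can be checked before starting a pipeline. The sketch below is hypothetical and not part of paddle_serving_server. It assumes the YAML config has already been loaded into a dict (for example with yaml.safe_load), treats a non-empty devices string as GPU, and assumes is_thread_op is true when the key is absent.

```python
SUPPORTED_CLIENT_TYPES = {"brpc", "local_predictor"}


def check_pipeline_conf(conf):
    """Hypothetical pre-flight check for a pipeline config dict.

    Returns {op_name: client_type}. Raises ValueError for an unknown
    client_type, or for local_predictor on GPU with is_thread_op false.
    """
    # Assumption: is_thread_op defaults to True when not set.
    is_thread_op = conf.get("dag", {}).get("is_thread_op", True)
    result = {}
    for op_name, op_conf in conf.get("op", {}).items():
        service = op_conf.get("local_service_conf", {})
        client_type = service.get("client_type")
        if client_type not in SUPPORTED_CLIENT_TYPES:
            raise ValueError(f"op {op_name!r}: unsupported client_type {client_type!r}")
        # Assumption: a non-empty devices string (e.g. "0") means GPU.
        uses_gpu = bool(str(service.get("devices", "")).strip())
        if client_type == "local_predictor" and uses_gpu and not is_thread_op:
            raise ValueError(f"op {op_name!r}: local_predictor on GPU needs is_thread_op: true")
        result[op_name] = client_type
    return result


# Mirrors the OCR config, with rec switched to local_predictor on CPU.
ocr_conf = {
    "dag": {"is_thread_op": False},
    "op": {
        "det": {"local_service_conf": {"client_type": "brpc", "model_config": "ocr_det_model", "devices": ""}},
        "rec": {"local_service_conf": {"client_type": "local_predictor", "model_config": "ocr_rec_model", "devices": ""}},
    },
}
print(check_pipeline_conf(ocr_conf))  # {'det': 'brpc', 'rec': 'local_predictor'}
```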

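4. Updated the models under example. All NLP examples that required list input have been modified; the outputs of all CV models have been verified to match expectations; and a local predictor mode has been added to the OCR pipeline and verified.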

TODO:

Verify the Java SDK.
Work with @liyang to update the Py2 and Py3 CI scripts so that they pass.
