[WIP] Pipeline Serving (!667) · 合并请求 · PaddlePaddle / Serving

[WIP] Pipeline Serving !667

Created by: barrierye

当前版本性能测试

结论（简单模型为BOW/CNN，复杂模型为LSTM）

预测部分使用grpc接口测得性能并不理想：
- （线程版异步接口）简单单模型是c++ core耗时的230%
- （线程版异步接口）简单双模型是c++ core耗时的336%
- （线程版异步接口）复杂双模型是c++ core耗时的134%
- 进程版（同步接口，因future不可序列化）复杂双模型耗时与线程版差不多（反而有所增加）
- 线程版同步接口与异步接口在复杂双模型上耗时基本一致
预测部分使用brpc与预期一致：
- （线程版）复杂双模型是c++ core耗时的63.3%

所以应该是grpc接口受GIL作用没有并行。

实验

使用两个LSTM模型（预测耗时较大）进行测试，PyServing比原C++ core增加了34%，猜测是因为数据处理由线程进行，当这部分耗时占比较大时PyServing性能很差，故将多线程改写为多进程
将线程改成进程，在双LSTM模型测试中，耗时与线程版差不多（反而增加），一部分原因是往多进程空队列里put数据，会有一小段时间延迟（https://hg.python.org/cpython/rev/860fc6a2bd21 ），而目前channel的实现不能阻塞get操作，所以在get数据时加了1e-3s的timeout（https://github.com/PaddlePaddle/Serving/pull/667/files#diff-f6ad39aa57713e2da4fb9e81e6193357R377 ），队列会在timeout前get数据，但经测试timeout越大，等待时间越大；另一部分原因是进程间通信增加。

重新分析耗时：用线程版查看profile，发现预处理时间耗时占比非常小，在获取future时消耗较多时间:

线程版双lstm（同一列为顺序执行）:
#G prepack: 0.155 ms
#G push: 0.058 ms
<long empty: 大概 0.3 ms>
l1 prep: 0.158 ms
l2 prep: 0.175 ms
                          l2 midp: 3.898 ms
l1 midp: 4.618 ms         l2 postp: 0.022 ms
                          l2 push: 0.050 ms
l1 postp: 0.022 ms
l1 push: 0.054 ms
combine prep: 7.806 ms
combine postp: 0.026 ms
combine push: 0.045 ms
<long empty: 大概 0.3 ms>
#G postpack: 0.120 ms
<long empty: 大概 6 ms>
<next iter>

取消future换用同步模式，耗时集中在op的mid-process（预测部分），似乎这部分没有并发，总耗时和异步模式差不多：

预测部分换用用brpc接口，总耗时降到C++ core的63.3%，profile如下：

PyServing BOW单模型（thread版 grpc-aync-impl）

Timer unit: 1e-06 s
Total time: 7.91556 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    21                                           def func():
    22         1          8.0      8.0      0.0      client = PyClient()
    23         1       5452.0   5452.0      0.1      client.connect('localhost:8080')
    24
    25                                               #lp = LineProfiler()
    26                                               #lp_wrapper = lp(client.predict)
    27
    28         1          1.0      1.0      0.0      words = 'i am very sad | 0'
    29         1          5.0      5.0      0.0      imdb_dataset = IMDBDataset()
    30         1     106775.0 106775.0      1.3      imdb_dataset.load_resource('imdb.vocab')
    31
    32      2085       7741.0      3.7      0.1      for line in sys.stdin:
    33      2084     861203.0    413.2     10.9          word_ids, label = imdb_dataset.get_words_and_label(line)
    34                                                   #fetch_map = lp_wrapper(
    35      2084       1813.0      0.9      0.0          fetch_map = client.predict(
    36      2084    6932563.0   3326.6     87.6              feed={"words": word_ids}, fetch=["prediction"])

PyServing BOW和CNN双模型（thread版 grpc-aync-impl）

Total time: 12.3527 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    20                                           def func():
    21         1          9.0      9.0      0.0      client = PyClient()
    22         1       5916.0   5916.0      0.0      client.connect('localhost:8080')
    23
    24         1          1.0      1.0      0.0      words = 'i am very sad | 0'
    25         1          6.0      6.0      0.0      imdb_dataset = IMDBDataset()
    26         1     104786.0 104786.0      0.8      imdb_dataset.load_resource('imdb.vocab')
    27
    28      2085       8467.0      4.1      0.1      for line in sys.stdin:
    29      2084     877019.0    420.8      7.1          word_ids, label = imdb_dataset.get_words_and_label(line)
    30      2084   11356504.0   5449.4     91.9          fetch_map = client.predict(feed={"words": word_ids}, fetch=["prediction"])

PyServing LSTM和LSTM双模型（thread版 grpc-aync-impl）

Total time: 21.9295 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    20                                           def func():
    21         1         29.0     29.0      0.0      client = PyClient()
    22         1       6088.0   6088.0      0.0      client.connect('localhost:8080')
    23
    24         1          1.0      1.0      0.0      words = 'i am very sad | 0'
    25         1          7.0      7.0      0.0      imdb_dataset = IMDBDataset()
    26         1     228226.0 228226.0      1.0      imdb_dataset.load_resource('imdb.vocab')
    27
    28      2085      11646.0      5.6      0.1      for line in sys.stdin:
    29      2084     965434.0    463.3      4.4          word_ids, label = imdb_dataset.get_words_and_label(line)
    30      2084   20718107.0   9941.5     94.5          fetch_map = client.predict(feed={"words": word_ids}, fetch=["prediction"])

PyServing LSTM和LSTM双模型（process版 grpc-aync-impl）

Total time: 24.3444 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    20                                           def func():
    21         1          7.0      7.0      0.0      client = PyClient()
    22         1       5404.0   5404.0      0.0      client.connect('localhost:8080')
    23
    24         1          1.0      1.0      0.0      words = 'i am very sad | 0'
    25         1          4.0      4.0      0.0      imdb_dataset = IMDBDataset()
    26         1     103255.0 103255.0      0.4      imdb_dataset.load_resource('imdb.vocab')
    27
    28      2085       9329.0      4.5      0.0      for line in sys.stdin:
    29      2084     895440.0    429.7      3.7          word_ids, label = imdb_dataset.get_words_and_label(line)
    30      2084   23329327.0  11194.5     95.8          fetch_map = client.predict(feed={"words": word_ids}, fetch=["prediction"])
    31         1       1675.0   1675.0      0.0      print(fetch_map)

PyServing LSTM和LSTM双模型（thread版 c++ core）

Total time: 10.3327 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    20                                           def func():
    21         1          6.0      6.0      0.0      client = PyClient()
    22         1       5441.0   5441.0      0.1      client.connect('localhost:8080')
    23
    24         1          1.0      1.0      0.0      words = 'i am very sad | 0'
    25         1          4.0      4.0      0.0      imdb_dataset = IMDBDataset()
    26         1     104695.0 104695.0      1.0      imdb_dataset.load_resource('imdb.vocab')
    27
    28      2085       7459.0      3.6      0.1      for line in sys.stdin:
    29      2084     872195.0    418.5      8.4          word_ids, label = imdb_dataset.get_words_and_label(line)
    30      2084    9339443.0   4481.5     90.4          fetch_map = client.predict(feed={"words": word_ids}, fetch=["prediction"])
    31      2084       2712.0      1.3      0.0          if fetch_map['ecode'] != 0:
    32                                                       print(fetch_map['error_info'])
    33         1        772.0    772.0      0.0      print(fetch_map)

C++ core BOW单模型

Total time: 3.3367 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    20                                           def func():
    21         1          3.0      3.0      0.0      words = 'i am very sad | 0'
    22         1          5.0      5.0      0.0      imdb_dataset = IMDBDataset()
    23         1     101483.0 101483.0      3.0      imdb_dataset.load_resource('imdb.vocab')
    24
    25         1      21312.0  21312.0      0.6      client = Client()
    26         1       2796.0   2796.0      0.1      client.load_client_config('imdb_bow_client_conf/serving_client_conf.prototxt')
    27         1       4403.0   4403.0      0.1      client.connect(["127.0.0.1:8888"])
    28
    29                                               #lp = LineProfiler()
    30                                               #lp_wrapper = lp(client.predict)
    31
    32      2085       7266.0      3.5      0.2      for line in sys.stdin:
    33      2084     840609.0    403.4     25.2          word_ids, label = imdb_dataset.get_words_and_label(line)
    34                                                   #fetch_map = lp_wrapper(
    35      2084       1757.0      0.8      0.1          fetch_map = client.predict(
    36      2084    2357063.0   1131.0     70.6              feed={"words": word_ids}, fetch=["prediction"])

C++ core BOW和CNN双模型

Total time: 3.7254 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    21                                           def func():
    22         1      18822.0  18822.0      0.5      client = Client()
    23                                               # If you have more than one model, make sure that the input
    24                                               # and output of more than one model are the same.
    25         1       2541.0   2541.0      0.1      client.load_client_config('imdb_bow_client_conf/serving_client_conf.prototxt')
    26         1       4270.0   4270.0      0.1      client.connect(["127.0.0.1:8889"])
    27
    28                                               # you can define any english sentence or dataset here
    29                                               # This example reuses imdb reader in training, you
    30                                               # can define your own data preprocessing easily.
    31         1         11.0     11.0      0.0      imdb_dataset = IMDBDataset()
    32         1     107277.0 107277.0      2.9      imdb_dataset.load_resource('imdb.vocab')
    33
    34      2085       7840.0      3.8      0.2      for line in sys.stdin:
    35      2084     858674.0    412.0     23.0          word_ids, label = imdb_dataset.get_words_and_label(line)
    36      2084       3835.0      1.8      0.1          feed = {"words": word_ids}
    37      2084       1556.0      0.7      0.0          fetch = ["prediction"]
    38      2084    2717955.0   1304.2     73.0          fetch_maps = client.predict(feed=feed, fetch=fetch)

C++ core LSTM和LSTM双模型

Total time: 16.3102 s

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    21                                           def func():
    22         1     453148.0 453148.0      2.8      client = Client()
    23                                               # If you have more than one model, make sure that the input
    24                                               # and output of more than one model are the same.
    25         1       5346.0   5346.0      0.0      client.load_client_config('imdb_bow_client_conf/serving_client_conf.prototxt')
    26         1       4825.0   4825.0      0.0      client.connect(["127.0.0.1:8889"])
    27
    28                                               # you can define any english sentence or dataset here
    29                                               # This example reuses imdb reader in training, you
    30                                               # can define your own data preprocessing easily.
    31         1         13.0     13.0      0.0      imdb_dataset = IMDBDataset()
    32         1     162498.0 162498.0      1.0      imdb_dataset.load_resource('imdb.vocab')
    33
    34      2085      12260.0      5.9      0.1      for line in sys.stdin:
    35      2084     942797.0    452.4      5.8          word_ids, label = imdb_dataset.get_words_and_label(line)
    36      2084       6451.0      3.1      0.0          feed = {"words": word_ids}
    37      2084       1705.0      0.8      0.0          fetch = ["prediction"]
    38      2084   14718960.0   7062.8     90.2          fetch_maps = client.predict(feed=feed, fetch=fetch)
    39         1       2219.0   2219.0      0.0      print(fetch_maps)

PaddlePaddle / Serving 接近 2 年 前同步成功

[WIP] Pipeline Serving !667

当前版本性能测试

结论（简单模型为BOW/CNN，复杂模型为LSTM）

实验

PyServing BOW单模型（thread版 grpc-aync-impl）

PyServing BOW和CNN双模型（thread版 grpc-aync-impl）

PyServing LSTM和LSTM双模型（thread版 grpc-aync-impl）

PyServing LSTM和LSTM双模型（process版 grpc-aync-impl）

PyServing LSTM和LSTM双模型（thread版 c++ core）

C++ core BOW单模型

C++ core BOW和CNN双模型

C++ core LSTM和LSTM双模型

PaddlePaddle / Serving
接近 2 年前同步成功