新版serving client端性能需优化点 (#488) · Issue · PaddlePaddle / Serving

新版serving client端性能需优化点

Created by: wangxicoding

新版client采用numpy，从下面的distill_reader的profile结果可以看出，目前reader所占耗时比例很低。主要耗时集中在real_predict也就是client端的耗时。

外层阶段	内层阶段	耗时	耗时占比
get_ready（获取就绪memory)		0.117ms	0.026%
predict（预测阶段，分三阶段）耗时439.54ms	preprocess(准备数据，memory->list)	0.056ms	0.013%
	real_predict	`438.717ms`	`99.77%`
	postprocess(复制predict到memory)	0.694ms	0.157%
put_complete（放回完成memory)		0.062ms	0.014%

画出paddle-serving-client的profile图，两个step的数据和client端时间吻合。中间的gap有一大部分时间是client没有profile到的，见问题https://github.com/PaddlePaddle/Serving/issues/486

分析目前可以优化的点

batch_predict增加numpy的接口，减少client python端preprocess和postprocess的numpy与list间的数据拷贝转化。现batch_predict c++端的接口可以不变，针对Python端再封装一层numpy的c++接口，这样就可以减少python层面的数据转换损耗。

preprocess的numpy到list数据转换(TODO时间占比profile) https://github.com/PaddlePaddle/Serving/blob/f54717396f7cc161046e38b117458ccd5729957b/python/paddle_serving_client/__init__.py#L241 batch_predict的input数据list到c++ vector的数据转换，返回数据从vector到list的转换 https://github.com/PaddlePaddle/Serving/blob/f54717396f7cc161046e38b117458ccd5729957b/python/paddle_serving_client/__init__.py#L260-L262 返回数据从list到numpy的转换(TODO时间占比profile) https://github.com/PaddlePaddle/Serving/blob/f54717396f7cc161046e38b117458ccd5729957b/python/paddle_serving_client/__init__.py#L284-L285

从client到serving发送数据和接收数据可以优化成流式数据，尽量减少gap。

PaddlePaddle / Serving 大约 2 年 前同步成功

新版serving client端性能需优化点

PaddlePaddle / Serving
大约 2 年前同步成功