Consider enhancing the parameter server with DPDK
Created by: yingfeng
I notice that Paddle v2 Fluid has changed significantly from the legacy version. Encapsulating the parameter server in an operator is a great idea, because programs running on a local machine and in a distributed environment can share most of their code. However, the parameter server in Fluid is built on either gRPC or bRPC, and I think this leaves considerable room for performance improvement. Because the current encapsulation is simple, it might be a good idea to switch to another network stack so that communication overhead among PS nodes is reduced. For example, this framework is a Seastar-based RPC; it seems a DPDK-enhanced network stack could be integrated into Paddle fairly easily to replace either gRPC or bRPC.
Another option is eRPC, which provides a RoCE-based RPC solution and plans DPDK support in the future; it might also be a good candidate for improving communication.
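To make the suggestion concrete, here is a rough sketch of what a swappable transport layer behind the parameter server operators could look like. It is not Paddle's actual code: `Transport`, `InProcessTransport`, `make_transport`, and the `send`/`recv` signatures are hypothetical names I made up for illustration, and the in-process backend only stands in for a real gRPC, bRPC, Seastar, or eRPC implementation.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple, Type


class Transport(ABC):
    """Hypothetical interface the send/recv operators would depend on."""

    @abstractmethod
    def send(self, endpoint: str, name: str, values: List[float]) -> None:
        """Push a named variable (e.g. a gradient) to a PS endpoint."""

    @abstractmethod
    def recv(self, endpoint: str, name: str) -> List[float]:
        """Pull a named variable (e.g. a parameter) from a PS endpoint."""


class InProcessTransport(Transport):
    """Stand-in backend that keeps data in a dict instead of using a network.

    A DPDK- or RoCE-backed implementation would replace this class only;
    the operators calling send/recv would not need to change.
    """

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], List[float]] = {}

    def send(self, endpoint: str, name: str, values: List[float]) -> None:
        self._store[(endpoint, name)] = list(values)

    def recv(self, endpoint: str, name: str) -> List[float]:
        return list(self._store[(endpoint, name)])


# Registry of available backends; new network stacks would be added here.
_BACKENDS: Dict[str, Type[Transport]] = {"inprocess": InProcessTransport}


def make_transport(kind: str) -> Transport:
    """Create a transport by name; raises KeyError for unknown backends."""
    return _BACKENDS[kind]()


if __name__ == "__main__":
    t = make_transport("inprocess")
    t.send("ps0:6174", "fc_0.w_0@GRAD", [0.1, 0.2])
    print(t.recv("ps0:6174", "fc_0.w_0@GRAD"))
```

With an interface along these lines, choosing a different network stack would be a configuration change rather than a change to the operators themselves.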