Created by: typhoonzero
Resolves https://github.com/PaddlePaddle/Paddle/issues/9103. Related: https://github.com/PaddlePaddle/Paddle/issues/8638
This PR improves send_op performance by 50%, as measured on the vgg16 benchmark test with the pserver using 6 cores; giving the pserver more cores may yield better results.