Created by: Yancey1989
We can fuse split_byref/send into one op and recv/concat into another, so that the code in ParallelExecutor is cleaner.
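
To make the proposal concrete, here is a rough, framework-free sketch of what the two fused ops would do. Everything in it is an assumption for illustration: `split_send`, `recv_concat`, the `send`/`recv` callbacks, and the in-memory `store` are hypothetical names, not existing operators or APIs. The sketch also copies each slice for simplicity, whereas the name split_byref suggests the real op avoids the copy.

```python
from typing import Callable, Dict, List, Sequence


def split_send(
    values: Sequence[float],
    sections: Sequence[int],
    endpoints: Sequence[str],
    send: Callable[[str, List[float]], None],
) -> None:
    """Hypothetical fused op: split `values` into sections and send each one."""
    if len(sections) != len(endpoints):
        raise ValueError("need exactly one endpoint per section")
    if sum(sections) != len(values):
        raise ValueError("sections must cover the whole input")
    offset = 0
    for size, endpoint in zip(sections, endpoints):
        # Slice and send in one step, so no intermediate split outputs
        # have to be wired between two separate ops.
        send(endpoint, list(values[offset:offset + size]))
        offset += size


def recv_concat(
    endpoints: Sequence[str],
    recv: Callable[[str], List[float]],
) -> List[float]:
    """Hypothetical fused op: receive one block per endpoint and concatenate."""
    merged: List[float] = []
    for endpoint in endpoints:
        merged.extend(recv(endpoint))
    return merged


# In-memory stand-in for the remote endpoints (illustration only).
store: Dict[str, List[float]] = {}
endpoints = ["ps0", "ps1"]

split_send([1.0, 2.0, 3.0, 4.0, 5.0], [2, 3], endpoints, store.__setitem__)
result = recv_concat(endpoints, store.__getitem__)
```

With this shape, the caller inserts one op per direction instead of a split_byref/send pair and a recv/concat pair, which is the simplification the proposal is after.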