Created by: qingqing01
Strictly speaking, single-GPU training and multi-GPU training should produce exactly the same gradients. We need a unit test to verify this.
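As a rough sketch of what such a test could check, the snippet below simulates data parallelism in plain Python rather than on real GPUs. It assumes a linear model with a mean-squared-error loss, equal-sized shards, and gradient averaging across devices. The names `mse_grad`, `data_parallel_grad`, and `grads_match` are hypothetical and not part of any framework API. Because floating-point summation order differs between the two paths, the comparison uses a small tolerance instead of bitwise equality.

```python
import math
import random


def mse_grad(w, xs, ys):
    """Gradient of the batch-averaged squared error for y_hat = w . x."""
    n = len(xs)
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * err * xj / n
    return grad


def data_parallel_grad(w, xs, ys, num_devices):
    """Simulated multi-device step: equal shards, per-shard gradient, then average."""
    assert len(xs) % num_devices == 0, "batch must split evenly across devices"
    shard = len(xs) // num_devices
    grads = [
        mse_grad(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(num_devices)
    ]
    return [sum(g[j] for g in grads) / num_devices for j in range(len(w))]


def grads_match(num_devices=4, batch=32, dim=5, seed=0, tol=1e-9):
    """Return True if single-device and simulated multi-device gradients agree."""
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1) for _ in range(dim)]
    xs = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(batch)]
    ys = [rng.uniform(-1, 1) for _ in range(batch)]
    single = mse_grad(w, xs, ys)
    multi = data_parallel_grad(w, xs, ys, num_devices)
    return all(
        math.isclose(a, b, rel_tol=tol, abs_tol=tol) for a, b in zip(single, multi)
    )
```

A real unit test would replace the simulated shards with the framework's actual single-GPU and multi-GPU execution paths and compare the resulting parameter gradients in the same way.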