Created by: zuowang
Implement a new SGD variant, DC-ASGD (delay-compensated asynchronous SGD), proposed in the paper "Asynchronous Stochastic Gradient Descent with Delay Compensation for Distributed Deep Learning".
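
For reviewers unfamiliar with the paper, here is a minimal sketch of the delay-compensated update as I understand it from the paper. It is not the code in this PR. The function name `dc_asgd_update`, its parameters, and the default values are hypothetical and chosen only for illustration. The idea is that a worker computes its gradient `g` on a stale copy of the weights (`backup`), and the server corrects for the staleness with the term `lam * g * g * (w - w_backup)` before applying the step.

```python
def dc_asgd_update(weights, backup, grad, lr=0.1, lam=0.04):
    """Apply one delay-compensated SGD step (illustrative sketch only).

    weights: current server-side weights, i.e. w_{t+tau}
    backup:  weights the worker used to compute the gradient, i.e. w_t
    grad:    gradient computed by the worker at `backup`
    lr:      learning rate (assumed value)
    lam:     strength of the delay compensation (assumed value)
    """
    return [
        # g + lam * g * g * (w - w_backup) approximates the gradient at the
        # current weights using an element-wise (diagonal) correction.
        w - lr * (g + lam * g * g * (w - w_bak))
        for w, w_bak, g in zip(weights, backup, grad)
    ]


# With no delay (weights == backup) the correction vanishes and the step
# reduces to plain SGD.
no_delay = dc_asgd_update([1.0], [1.0], [2.0], lr=0.1, lam=0.5)

# With a stale gradient the correction term changes the step size.
delayed = dc_asgd_update([1.5], [1.0], [2.0], lr=0.1, lam=0.5)
```

Setting `lam` to 0 recovers ordinary asynchronous SGD, which may make it easier to compare the two behaviors in a test.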