Created by: lcy-seso
We need to clip gradients by their local norm (and also by the global norm) in the ConvS2S model. @guoshengCS will work on this.
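
To make the two variants concrete, here is a minimal pure-Python sketch of the difference. It is only an illustration, not the planned implementation: the function names `clip_by_norm` and `clip_by_global_norm`, the `max_norm` parameter, and the list-of-floats gradient representation are all assumptions made for this example. Local-norm clipping rescales each gradient using its own L2 norm, while global-norm clipping rescales all gradients by one shared factor computed from their joint L2 norm.

```python
import math


def clip_by_norm(grad, max_norm):
    """Clip a single gradient (hypothetical helper) by its own L2 norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm or norm == 0.0:
        return list(grad)  # already within the limit, leave unchanged
    scale = max_norm / norm
    return [g * scale for g in grad]


def clip_by_global_norm(grads, max_norm):
    """Clip all gradients (hypothetical helper) by their joint L2 norm."""
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if global_norm <= max_norm or global_norm == 0.0:
        return [list(grad) for grad in grads]
    scale = max_norm / global_norm  # one shared factor for every gradient
    return [[g * scale for g in grad] for grad in grads]


# Two toy gradients with L2 norms 5 and 12; their joint norm is 13.
grads = [[3.0, 4.0], [0.0, 12.0]]

# Local: each gradient is rescaled independently to norm 1.
local_clipped = [clip_by_norm(g, 1.0) for g in grads]

# Global: both gradients share the factor 1/13, so their ratio is preserved.
global_clipped = clip_by_global_norm(grads, 1.0)
```

Note that local clipping changes the relative magnitudes of the gradients (both end up with norm 1 here), whereas global clipping preserves them and only bounds the overall update size.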