Forked from PaddlePaddle / Paddle
* optimize softmax forward
* vec softmax fw
* vec softmax bw
* add a message argument for compiler compatibility
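
For readers unfamiliar with what the forward (fw) and backward (bw) softmax passes compute, the following is a minimal scalar reference sketch in C++. It is not the code from these commits and not Paddle's API: the names `SoftmaxForward` and `SoftmaxBackward` are hypothetical, and the vectorization the commits refer to is deliberately left out so the arithmetic stays readable.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference helper (not Paddle's API).
// Forward pass over one row: y_i = exp(x_i - max(x)) / sum_j exp(x_j - max(x)).
std::vector<float> SoftmaxForward(const std::vector<float>& x) {
  std::vector<float> y(x.size());
  if (x.empty()) return y;

  // Subtracting the row maximum keeps exp() from overflowing for large inputs.
  const float max_val = *std::max_element(x.begin(), x.end());

  float sum = 0.0f;
  for (std::size_t i = 0; i < x.size(); ++i) {
    y[i] = std::exp(x[i] - max_val);
    sum += y[i];
  }
  for (float& v : y) v /= sum;  // normalize so the row sums to 1
  return y;
}

// Hypothetical reference helper (not Paddle's API).
// Backward pass over one row: dx_i = y_i * (dy_i - sum_j dy_j * y_j),
// where y is the forward output and dy is the upstream gradient.
std::vector<float> SoftmaxBackward(const std::vector<float>& y,
                                   const std::vector<float>& dy) {
  assert(y.size() == dy.size());

  float dot = 0.0f;
  for (std::size_t i = 0; i < y.size(); ++i) dot += y[i] * dy[i];

  std::vector<float> dx(y.size());
  for (std::size_t i = 0; i < y.size(); ++i) dx[i] = y[i] * (dy[i] - dot);
  return dx;
}
```

A vectorized kernel would compute the same per-row quantities (maximum, sum of exponentials, and the dot product of `y` and `dy`) while loading and storing several elements at a time, so a scalar version like this can serve as a correctness baseline when testing an optimized one.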