Created by: qjing666
When both keep_dim and reduce_all are True in the reduce ops, the gradient computation fails because the dimensions do not match.
This PR fixes the bug and adds a unit test.
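
For reviewers, the shape relationship involved can be sketched in plain NumPy. This is only an illustration under assumptions, not the operator code changed in this PR: `reduce_sum` and `reduce_sum_grad` are hypothetical helpers that mimic the keep_dim / reduce_all semantics, and sum is used as a stand-in for the reduce ops in general.

```python
import numpy as np


def reduce_sum(x, dim=None, keep_dim=False, reduce_all=False):
    """Hypothetical NumPy stand-in for a reduce op (not the real kernel)."""
    if reduce_all or dim is None:
        axes = tuple(range(x.ndim))            # reduce over every axis
    else:
        axes = tuple(d % x.ndim for d in dim)  # normalize negative axes
    return x.sum(axis=axes, keepdims=keep_dim), axes


def reduce_sum_grad(x_shape, dout, axes):
    """Gradient of a sum reduction: broadcast dout back to the input shape."""
    # Shape of the output if every reduced axis were kept with size 1.
    kept_shape = [1 if i in axes else s for i, s in enumerate(x_shape)]
    # Reshaping first makes the result independent of whether keep_dim was set.
    return np.broadcast_to(np.reshape(dout, kept_shape), x_shape).copy()


x = np.arange(24, dtype=np.float64).reshape(2, 3, 4)

# keep_dim=True together with reduce_all=True: output keeps the input rank.
out, axes = reduce_sum(x, keep_dim=True, reduce_all=True)
dx = reduce_sum_grad(x.shape, np.ones_like(out), axes)
print(out.shape, dx.shape)
```

With keep_dim and reduce_all both set, the forward output keeps the rank of the input, with every axis of size 1, so the gradient routine has to map that shape back to the full input shape. The added unit test should cover exactly this combination.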