Fork自 PaddlePaddle / Paddle
* enable npu alignment * support flatten_params/grads * support clip by global norm * remove memset in coalesce_tensor_op * fix npu kernel of sum op when input is one tensor * add ut for flatten_param_grads+regularizer * fix ut * fix typo