[NPU] optimize mul op, use BatchMatMul to realize (#33616)
* use BatchMatMul
* replace TensorCopy with ShareDataWith
* remove check fp16 grad
* fix format
* add grad_check
* fix grad check
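
For readers unfamiliar with the change, here is a minimal sketch of why a batched matmul can stand in for mul. It assumes mul flattens a 3-D input of shape [B, M, K] to [B*M, K] before multiplying by a 2-D weight of shape [K, N]. The helper names are hypothetical and use plain Python lists; this is not the NPU kernel code from this commit.

```python
def matmul(a, b):
    """Plain 2-D matrix product on nested lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def mul_via_flatten(x, y):
    """Assumed mul behavior: flatten x to [B*M, K], multiply, reshape to [B, M, N]."""
    batch, rows = len(x), len(x[0])
    flat = [row for mat in x for row in mat]
    out = matmul(flat, y)
    return [out[b * rows:(b + 1) * rows] for b in range(batch)]


def mul_via_batch_matmul(x, y):
    """Batched form: multiply every batch slice of x by the same y, no flattening."""
    return [matmul(mat, y) for mat in x]


x = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # shape [2, 2, 2]
y = [[1, 0, 2], [0, 1, 3]]                # shape [2, 3]

# Both routes give the same result; the batched one skips the reshape.
assert mul_via_flatten(x, y) == mul_via_batch_matmul(x, y)
```

Under that assumption, the batched route avoids reshaping the input, which is in the same spirit as replacing TensorCopy with ShareDataWith to avoid extra copies.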