[cherry-pick]Elementwise add grad GPU kernel optimization (#30276)
* elementwise_add_grad Op optimization (#29575) * optimize for long width for elementwise (#29602) * refine (#29622) * delete the code for fp16 optimization because it is not faster than common template code (#29715) * fix the shape choose of vectorize for cuda * optimization for fp16 elementwise add (#29744) * Fix the compiler error for half type (#29799) * refine the compiler error for half2 operation (#29816) * fix the compiler error when gcc4 cuda9.0 (#29997)
Showing
想要评论请 注册 或 登录