[cherry pick] Some optimizations of elementwise_add, gelu and dropout for AMP (#30152)
* Improve performance of elementwise_add grad op (#29187) * pass stop_gradient for cast op * improve performance of elementwise_add grad * use tensor copy async * dygraph branch * fix dygraph branch * add ut * make gelu fp16 computing more robust (#29484) * Add fast path for dropout when p == 0 (#29553) * add fast path for p == 0 in dropout * add ut
Showing
想要评论请 注册 或 登录