[Cherry-pick] Layer norm fp16 and Nvidia optimize (#29169 #29434 #29522 #29576) (#30110)
* Layer norm fp16 (#29169) * add fp16 for layer_norm op * revert layernorm api * fix forward * fix forward * fix backward for layernorm with fp16 * fix unit test for layernorm with fp16 * fix with_mkldnn compile error for layernorm with fp16 * 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U> * fix with_mkldnn compile error for layernorm with fp16 * fix with_mkldnn compile error for layernorm with fp16 Co-authored-by: Nzhiqiu <chenqiuliang@baidu.com> * fix layer_norm accuracy (#29434) * Layernorm opt (#29522) * layernorm fw opt * layernorm bw opt * fix typo, test=develop * remove const dim3 for windows CI compatibility * merge develop Co-authored-by: Nzlsh80826 <zlsh80826@gmail.com> * Fix compile problem when cuda_arch < 6000 (#29576) * fix compile problem when cuda_arch < 6000 * refine code * refine code Co-authored-by: Nzhiqiu <chenqiuliang@baidu.com> Co-authored-by: Nzlsh80826 <zlsh80826@gmail.com>
Showing
想要评论请 注册 或 登录