    [Cherry-pick] Layer norm fp16 and Nvidia optimize (#29169 #29434 #29522 #29576) (#30110) · 44b81e63
    Committed by furnace
    * Layer norm fp16 (#29169)
    
    * add fp16 for layer_norm op
    
    * revert layernorm api
    
    * fix forward
    
    * fix forward
    
    * fix backward for layernorm with fp16
    
    * fix unit test for layernorm with fp16
    
    * fix with_mkldnn compile error for layernorm with fp16
    
    * 1. revert to PADDLE_ENFORCE_NOT_NULL, 2. change static_cast<float> to static_cast<U>
    
    * fix with_mkldnn compile error for layernorm with fp16
    
    * fix with_mkldnn compile error for layernorm with fp16
    Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
    
    * fix layer_norm accuracy (#29434)
    
    * Layernorm opt (#29522)
    
    * layernorm fw opt
    
    * layernorm bw opt
    
    * fix typo, test=develop
    
    * remove const dim3 for windows CI compatibility
    
    * merge develop
    Co-authored-by: zlsh80826 <zlsh80826@gmail.com>
    
    * Fix compile problem when cuda_arch < 6000 (#29576)
    
    * fix compile problem when cuda_arch < 6000
    
    * refine code
    
    * refine code
    Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
    Co-authored-by: zlsh80826 <zlsh80826@gmail.com>
fp16_utils.py 17.9 KB