    [cherry-pick] Elementwise add grad GPU kernel optimization (#30276) · e59524f8
    Committed by wangchaochaohu
    * elementwise_add_grad Op optimization (#29575)
    
    * optimize for long width for elementwise (#29602)
    
    * refine (#29622)
    
    * delete the fp16-specific optimization code because it is not faster than the common template code (#29715)
    
    * fix the shape selection for vectorization on CUDA
    
    * optimization for fp16 elementwise add (#29744)
    
    * Fix the compiler error for half type (#29799)
    
    * refine the compiler error for half2 operation (#29816)
    
    * fix the compiler error with gcc4 and CUDA 9.0 (#29997)
elementwise_add_op.h 16.7 KB