    [cherry pick] Some optimizations of elementwise_add, gelu and dropout for AMP (#30152) · 07f68fad
    Leo Chen committed
    * Improve performance of elementwise_add grad op (#29187)
    
    * pass stop_gradient for cast op
    
    * improve performance of elementwise_add grad
    
    * use tensor copy async
    
    * dygraph branch
    
    * fix dygraph branch
    
    * add ut
    
    * make gelu fp16 computing more robust (#29484)
    
    * Add fast path for dropout when p == 0 (#29553)
    
    * add fast path for p == 0 in dropout
    
    * add ut
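
The dropout fast path named in the commit message above can be illustrated with a small CPU reference. This is only a sketch under stated assumptions, not the kernel from #29553: `DropoutResult` and `DropoutForward` are hypothetical names, and the general path assumes the inverted ("upscale in train") convention, which the commit message does not specify. The idea is that when p == 0 nothing is dropped, so the output is a plain copy of the input and the mask is all ones, and no random numbers need to be generated.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical container for the two outputs of a dropout forward pass.
struct DropoutResult {
  std::vector<float> out;          // values after dropout
  std::vector<std::uint8_t> mask;  // 1 = kept, 0 = dropped
};

// Hypothetical CPU reference for dropout in training mode.
// Assumption: kept values are scaled by 1 / (1 - p) (inverted dropout).
DropoutResult DropoutForward(const std::vector<float>& x, float p,
                             std::uint64_t seed) {
  DropoutResult r;

  if (p == 0.0f) {
    // Fast path: nothing is dropped, so copy the input and
    // fill the mask with ones. No RNG work is done.
    r.out = x;
    r.mask.assign(x.size(), 1);
    return r;
  }

  // resize() value-initializes, so both vectors start as all zeros.
  r.out.resize(x.size());
  r.mask.resize(x.size());

  if (p >= 1.0f) {
    // Everything is dropped; the zero-filled buffers are already correct.
    return r;
  }

  // General path: draw one uniform sample per element.
  std::mt19937_64 gen(seed);
  std::uniform_real_distribution<float> dist(0.0f, 1.0f);
  const float scale = 1.0f / (1.0f - p);
  for (std::size_t i = 0; i < x.size(); ++i) {
    const bool keep = dist(gen) >= p;
    r.mask[i] = keep ? 1 : 0;
    r.out[i] = keep ? x[i] * scale : 0.0f;
  }
  return r;
}
```

With p == 0 the result is the same under either scaling convention, which is why the shortcut is safe regardless of how the general path scales kept values.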
elementwise_add_op.h 7.6 KB