-
由 Leo Chen 提交于
* Improve performance of elementwise_add grad op (#29187) * pass stop_gradient for cast op * improve performance of elementwise_add grad * use tensor copy async * dygraph branch * fix dygraph branch * add ut * make gelu fp16 computing more robust (#29484) * Add fast path for dropout when p == 0 (#29553) * add fast path for p == 0 in dropout * add ut
07f68fad