Support optional residual add in fused_attention and fused_feedforward. (#43474)
* Support optional residual add in fused_attention and fused_feedforward.
* Add a checkpoint, and add a check of add_residual when pre_layer_norm is false.
* Add a TODO and add the add_residual argument to the Python API.
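
As a rough illustration of what an optional residual add means for a feed-forward block, here is a plain-Python sketch. It is not the fused kernel or the actual Python API. The helper names (`layer_norm`, `matvec`, `feedforward_reference`), the ReLU activation, and the omission of dropout, biases, and layer-norm scale/shift are all assumptions made for brevity, and the check tied to pre_layer_norm being false is not modeled here.

```python
import math


def layer_norm(v, eps=1e-5):
    # Normalize a vector to zero mean and unit variance (no scale/shift).
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    return [(a - mean) / math.sqrt(var + eps) for a in v]


def matvec(w, v):
    # w is a list of rows; returns the matrix-vector product w @ v.
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def feedforward_reference(x, w1, w2, pre_layer_norm=False, add_residual=True):
    # Hypothetical unfused reference, not the real fused op.
    residual = x
    h = layer_norm(x) if pre_layer_norm else x
    h = [max(0.0, a) for a in matvec(w1, h)]  # ReLU activation (assumed)
    out = matvec(w2, h)
    if add_residual:
        # The optional step: add the block input back onto the output.
        out = [r + o for r, o in zip(residual, out)]
    if not pre_layer_norm:
        out = layer_norm(out)  # post-layer-norm variant
    return out


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    x = [1.0, -1.0]
    print(feedforward_reference(x, identity, identity, pre_layer_norm=True, add_residual=True))
    print(feedforward_reference(x, identity, identity, pre_layer_norm=True, add_residual=False))
```

In the pre-layer-norm case of this sketch, the two outputs differ by exactly the input `x`, which is the only effect the flag has there.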