Add fused self attention (#5966)
* add fused_self_attention functor
* add fused_self_attention gradients
* add test case
* fix backward impl bug
* fix bug
* fix bug
* fix backward bug
* fix comments
* code format
* fix comment
* fix ci error
* fix ci error
* add init FusedSelfAttentionInterpState struct

Co-authored-by: Yao Chi <later@usopp.net>
Co-authored-by: oneflow-ci-bot <69100618+oneflow-ci-bot@users.noreply.github.com>