Refactor the organization of layer_norm cuda impl. (#34883)
Refactor the organization of the layer_norm CUDA implementation so that it can be reused in the fused attention op. Extract the layer_norm CUDA implementation from layer_norm_op.cu to layer_norm_kernel.cu.h. Define fused/attention_layer_norm.h, which can be used by the fused attention op in the next PR.
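To illustrate the intent of the extraction, here is a minimal CPU-side sketch in plain C++ of how a normalization routine that lives in a shared header can be called from both a standalone op and a fused op. This is not the code from this commit: `LayerNormRow`, `LayerNormOp`, and `ResidualLayerNorm` are hypothetical names, and the residual-add caller is only an assumed example of a fused consumer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a routine defined once in a shared header
// (the role layer_norm_kernel.cu.h plays in this commit). Plain C++ on
// the CPU, not the actual CUDA kernel.
template <typename T>
void LayerNormRow(const T* x, const T* scale, const T* bias, T* y,
                  std::size_t cols, T epsilon) {
  assert(cols > 0);
  T mean = 0;
  for (std::size_t i = 0; i < cols; ++i) mean += x[i];
  mean /= static_cast<T>(cols);

  T var = 0;
  for (std::size_t i = 0; i < cols; ++i) var += (x[i] - mean) * (x[i] - mean);
  var /= static_cast<T>(cols);

  const T inv_std = static_cast<T>(1) / std::sqrt(var + epsilon);
  for (std::size_t i = 0; i < cols; ++i) {
    y[i] = (x[i] - mean) * inv_std * scale[i] + bias[i];
  }
}

// Caller 1 (hypothetical): a standalone layer_norm op over a rows x cols input.
template <typename T>
std::vector<T> LayerNormOp(const std::vector<T>& x, const std::vector<T>& scale,
                           const std::vector<T>& bias, std::size_t rows,
                           std::size_t cols, T epsilon) {
  std::vector<T> y(rows * cols);
  for (std::size_t r = 0; r < rows; ++r) {
    LayerNormRow(&x[r * cols], scale.data(), bias.data(), &y[r * cols], cols,
                 epsilon);
  }
  return y;
}

// Caller 2 (hypothetical): a fused step that adds a residual and then
// normalizes, reusing the same shared routine instead of duplicating it.
template <typename T>
std::vector<T> ResidualLayerNorm(const std::vector<T>& x,
                                 const std::vector<T>& residual,
                                 const std::vector<T>& scale,
                                 const std::vector<T>& bias, std::size_t rows,
                                 std::size_t cols, T epsilon) {
  std::vector<T> sum(rows * cols);
  for (std::size_t i = 0; i < rows * cols; ++i) sum[i] = x[i] + residual[i];
  return LayerNormOp(sum, scale, bias, rows, cols, epsilon);
}
```

Both callers depend only on the header-level routine, which is the kind of reuse the refactor is meant to enable for the fused attention op.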