Turns out LayerNorm also has a weight and a bias, which need to be pre-multiplied and trained for hypernets
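
As a rough illustration of the point, the sketch below shows that `torch.nn.LayerNorm` carries its own learnable `weight` and `bias` (with the default `elementwise_affine=True`), so a routine that gathers trainable tensors has to include it alongside `torch.nn.Linear`. The helper `trainable_params` and the small `net` are hypothetical names made up for this example, not code from the commit, and the pre-multiplication step is left out because the message does not say how it is done.

```python
import torch
from torch import nn


def trainable_params(module: nn.Module) -> list:
    """Collect weight and bias from every Linear and LayerNorm layer.

    Hypothetical helper for illustration; not the commit's actual code.
    """
    params = []
    for layer in module.modules():
        # LayerNorm has an affine weight and bias, just like Linear does.
        if isinstance(layer, (nn.Linear, nn.LayerNorm)):
            params += [layer.weight, layer.bias]
    return params


# Hypothetical hypernet-style stack: Linear -> LayerNorm -> Linear.
net = nn.Sequential(nn.Linear(8, 16), nn.LayerNorm(16), nn.Linear(16, 8))
params = trainable_params(net)

# Names of all registered parameters, including those of the LayerNorm at index 1.
names = [name for name, _ in net.named_parameters()]
print(names)
```

If only the `Linear` layers were collected, the two LayerNorm tensors (`1.weight` and `1.bias` here) would never be handed to the optimizer and would stay at their initial values.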