fix(clip): use double to accumulate grad^2
Global-norm clipping needs to compute the L2 norm of the grads, which means calculating sum{grad^2}. Accumulating that sum in float32 can easily overflow, so accumulate it in double instead. test=release/1.0.0
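For illustration only, here is a minimal sketch of the failure mode in plain Python. It is not the code touched by this commit: the helper names (`to_float32`, `sum_of_squares_f32`, `sum_of_squares_f64`) and the sample gradient values are hypothetical, and float32 arithmetic is emulated by rounding each intermediate result through `struct`.

```python
import math
import struct


def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value.

    struct raises OverflowError when a finite value exceeds the float32
    range; that case is mapped to +/-inf, as IEEE 754 arithmetic would do.
    """
    try:
        return struct.unpack("f", struct.pack("f", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def sum_of_squares_f32(grads):
    """Accumulate sum(g^2) with every intermediate rounded to float32."""
    total = 0.0
    for g in grads:
        total = to_float32(total + to_float32(g * g))
    return total


def sum_of_squares_f64(grads):
    """Accumulate sum(g^2) in float64 (Python's native float)."""
    total = 0.0
    for g in grads:
        total += g * g
    return total


# Hypothetical gradients, stored as float32 values. Each square (~1e38)
# still fits in float32 (max ~3.4e38), but the sum of four does not.
grads = [to_float32(1e19)] * 4

sum_f32 = sum_of_squares_f32(grads)  # overflows to inf
sum_f64 = sum_of_squares_f64(grads)  # stays finite (~4e38)
global_norm = math.sqrt(sum_f64)     # ~2e19, itself representable in float32

print(sum_f32, sum_f64, global_norm)
```

Note that the final norm fits comfortably in float32; only the intermediate sum of squares goes out of range, which is why widening just the accumulator is enough.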