[Cherry-pick] Optimize update_loss_scaling_op (#32554) (#32606)
* Optimize update_loss_scaling_op by fusing the per-tensor for loop into a single kernel, test=develop
* Remove the unnecessary while loop and improve variable names, test=develop
* Rename the variable out_addrs_tensor to out_addrs_mem, test=develop
* Improve readability by changing the variable-name prefix from t_ to local_, test=develop
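The core idea of the first change is to replace one loop per output tensor (one kernel launch per tensor on a GPU) with a single pass over one flat index space that covers every output. The sketch below simulates this on the CPU. It assumes the fused work is a conditional fill that overwrites every output with `value` when an overflow (inf/nan) was detected. The function names `per_tensor_fill_if` and `fused_fill_if` and the `starts` prefix-sum array are hypothetical and illustrative only. The list `outs` loosely plays the role of the device-side array of output addresses (`out_addrs_mem` in the commit message). This is a sketch under those assumptions, not the actual Paddle kernel.

```python
from bisect import bisect_right
from itertools import accumulate


def per_tensor_fill_if(outs, value, found_inf):
    """Baseline (hypothetical): one loop per output tensor,
    i.e. one kernel launch per tensor on a GPU."""
    if found_inf:
        for out in outs:
            for i in range(len(out)):
                out[i] = value
    return outs


def fused_fill_if(outs, value, found_inf):
    """Fused sketch (hypothetical): a single flat index space spans all
    output tensors, so a GPU would need only one kernel launch."""
    if not found_inf:
        return outs
    sizes = [len(o) for o in outs]
    # starts[k] is the first flat index owned by tensor k (exclusive prefix sum);
    # starts[-1] is the total element count across all tensors.
    starts = [0] + list(accumulate(sizes))
    # Each iteration stands in for one GPU thread in the fused kernel.
    for tid in range(starts[-1]):
        # bisect_right skips empty tensors, whose start equals the next one's.
        k = bisect_right(starts, tid) - 1
        outs[k][tid - starts[k]] = value
    return outs
```

Both functions produce the same result. On a GPU, the benefit of the fused form is that N small launches collapse into one, at the cost of a per-thread lookup to find which tensor owns each flat index.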