    [Cherry-pick] Optimize update_loss_scaling_op(#32554) (#32606) · 33703da8
    Committed by jiangcheng
    * optimize update_loss_scaling_op by fusing the for loop into one kernel, test=develop
    
    * remove the useless while loop and improve variable names, test=develop
    
    * rename variable out_addrs_tensor to out_addrs_mem, test=develop
    
    * improve variable-name readability by changing the prefix from t_ to local_
update_loss_scaling_op.cu 6.2 KB