    Optimize update_loss_scaling_op (#32554) · 0dc02dc7
    Committed by jiangcheng
    * optimize update_loss_scaling_op by fusing the for loop into one kernel, test=develop
    
    * remove the unnecessary while loop and improve variable names, test=develop
    
    * rename variable out_addrs_tensor to out_addrs_mem, test=develop
    
    * improve variable-name readability by changing the prefix from t_ to local_
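
The first bullet describes replacing a per-tensor loop of kernel launches with a single fused launch. What follows is a minimal sketch of that general technique, not the code from this commit. The names `FusedZeroIf` and `ZeroAllIf`, the `float` element type, and the prefix-offset array `starts` are all assumptions made for illustration. Error checking on the CUDA runtime calls is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <cstdint>
#include <vector>

// Hypothetical fused kernel: one launch covers every tensor.
// `starts` holds num_tensors + 1 prefix offsets, so tensor t owns the
// flattened indices [starts[t], starts[t + 1]).
__global__ void FusedZeroIf(float** outs, const int64_t* starts,
                            int num_tensors, const bool* found_inf) {
  if (!*found_inf) return;  // nothing to do when no inf/nan was found
  const int64_t total = starts[num_tensors];
  const int64_t stride = static_cast<int64_t>(blockDim.x) * gridDim.x;
  int64_t id = static_cast<int64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
  for (; id < total; id += stride) {
    // Linear scan to find which tensor owns this flattened index.
    int t = 0;
    while (id >= starts[t + 1]) ++t;
    outs[t][id - starts[t]] = 0.0f;
  }
}

// Hypothetical host helper: copies the tensors to the device, runs the
// fused kernel once, and returns the resulting tensor contents.
std::vector<std::vector<float>> ZeroAllIf(
    const std::vector<std::vector<float>>& host_tensors, bool found_inf) {
  const int n = static_cast<int>(host_tensors.size());
  std::vector<float*> dev_ptrs(n, nullptr);
  std::vector<int64_t> starts(n + 1, 0);
  for (int i = 0; i < n; ++i) {
    const size_t bytes = host_tensors[i].size() * sizeof(float);
    cudaMalloc(&dev_ptrs[i], bytes);
    cudaMemcpy(dev_ptrs[i], host_tensors[i].data(), bytes,
               cudaMemcpyHostToDevice);
    starts[i + 1] = starts[i] + static_cast<int64_t>(host_tensors[i].size());
  }

  // Device-side copies of the pointer table, the offsets, and the flag.
  float** d_outs = nullptr;
  int64_t* d_starts = nullptr;
  bool* d_found_inf = nullptr;
  cudaMalloc(&d_outs, n * sizeof(float*));
  cudaMalloc(&d_starts, (n + 1) * sizeof(int64_t));
  cudaMalloc(&d_found_inf, sizeof(bool));
  cudaMemcpy(d_outs, dev_ptrs.data(), n * sizeof(float*),
             cudaMemcpyHostToDevice);
  cudaMemcpy(d_starts, starts.data(), (n + 1) * sizeof(int64_t),
             cudaMemcpyHostToDevice);
  cudaMemcpy(d_found_inf, &found_inf, sizeof(bool), cudaMemcpyHostToDevice);

  // A single launch replaces the per-tensor loop of launches.
  const int threads = 256;
  const int64_t needed = (starts[n] + threads - 1) / threads;
  const int blocks = needed > 0 ? static_cast<int>(needed) : 1;
  FusedZeroIf<<<blocks, threads>>>(d_outs, d_starts, n, d_found_inf);
  cudaDeviceSynchronize();

  std::vector<std::vector<float>> result(n);
  for (int i = 0; i < n; ++i) {
    result[i].resize(host_tensors[i].size());
    cudaMemcpy(result[i].data(), dev_ptrs[i],
               result[i].size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev_ptrs[i]);
  }
  cudaFree(d_outs);
  cudaFree(d_starts);
  cudaFree(d_found_inf);
  return result;
}
```

Calling `ZeroAllIf({{1.f, 2.f}, {3.f, 4.f, 5.f}}, true)` should return both tensors filled with zeros after one kernel launch, while passing `false` should leave the values unchanged. The linear scan over `starts` keeps the sketch short. With many tensors, a binary search over the offsets would likely be a better fit.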
update_loss_scaling_op.cu 6.2 KB