Forked from PaddlePaddle / Paddle
* add update_loss_scaling_npu NPU kernel
* change TensorFromVec to Memset