[NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)
* enable async copy and add wait before sync operation
* remove unnecessary wait
* add FillNpuTensorWithConstant
* refine
* fix fill_constant
* change TensorFromVector to FillNpuTensorWithConstant
* fix ignored api
* delete extra unittest
* fix minor error
* fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu
* change TensorCopySync to TensorCopy
* delete unneeded Wait calls and add StreamWait
* fix npu_stream error
* fix check_finite_and_unscale_op_npu TensorCopy
* keep only the stream wait
* fix NPUDeviceContext in all C++ unittests
* delete wait
Co-authored-by: Nzhiqiu <chenqiuliang@baidu.com>