    [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994) · 5648bd80
    pangyoki committed
    * enable async copy and add wait before sync operation
    
    * remove unnecessary wait
    
    * add FillNpuTensorWithConstant
    
    * refine
    
    * fix fill_constant
    
    * change TensorFromVector to FillNpuTensorWithConstant
    
    * fix ignored api
    
    * delete extra unittest
    
    * fix little error
    
    * fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu
    
    * change TensorCopySync to TensorCopy
    
    * delete useless Wait and add StreamWait
    
    * fix npu_stream error
    
    * fix check_finite_and_unscale_op_npu TensorCopy
    
    * only save stream wait
    
    * fix NPUDeviceContext in all c++ unittest
    
    * delete wait
    Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
elementwise_add_op_npu.cc 5.7 KB