    [NPU] cherry-pick gc/dataloader/save&load/optimization from ascendrc to develop (#32294) · cbe5c9f8
    Committed by Leo Chen
    * [NPU] support GarbageCollector for npu (#31874)
    
    * support GarbageCollector for npu
    
    * fix typo
    
    * fix gather_grad
    
    * disable NPUDefaultStreamGarbageCollector on NPU
    
    * [NPU] support npu for memcpy op (#31808)
    
    * support npu for memcpy op
    
    * add ut
    
    * fix ut
    
    * fix typo
    
    * [NPU] fix bug of using temp vector (#31963)
    
    * fix bug when beta1_pow on cpu (#31995)
    
    * [NPU] support npu profiler (#31684)
    
    * support npu profiler
    
    * add python api
    
    * fix bugs
    
    * add wrapper for incomplete type
    
    * update profile proto
    
    * record npu wait
    
    * add xpu placeholder
    
    * fix adam (#32016)
    
    * [NPU] enable async copy and add wait before sync operation (#31956)
    
    * enable async copy and add wait before sync operation
    
    * remove unnecessary wait
    
    * add FillNpuTensorWithConstant
    
    * refine
    
    * fix fill_constant
    
    * make TensorFromVector/TensorToVector sync
    
    * [NPU] Support dataloader on npu place. (#31867)
    
    * [NPU] Wait on NPUPlace (#32086)
    
    * [NPU] fix cast op (#32121)
    
    * fix npu kernel of cast op to handle casting to same dtype
    
    * add comments
    
    * [NPU] support cann 20.3 (#32044)
    
    * fix compile problem on cann 20.3
    
    * fix ut
    
    * fix test_mul
    
    * fix check_finite_and_scale
    
    * fix lookup_table_v2_grad
    
    * fix cmake
    
    * support print op
    
    * [NPU] Support npu save load (#31893)
    
    * support save load for NPU
    
    * add save load npu unittest
    
    * support np.array transform in NPU
    
    * fix errors
    
    * delete dygraph in unittest
    
    * add Wait
    
    * fix unittest
    
    * fix review comment
    
    * fix unittest problem
    
    * fix little problem
    
    * change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)
    
    * change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance
    
    * refine code
    
    * fix NPUDeviceContext in all c++ unittest (#32198)
    
    * fix NPUDeviceContext in all c++ unittest
    
    * refine log
    Co-authored-by: pangyoki <pangyoki@126.com>
    
    * [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)
    
    * enable async copy and add wait before sync operation
    
    * remove unnecessary wait
    
    * add FillNpuTensorWithConstant
    
    * refine
    
    * fix fill_constant
    
    * change TensorFromVector to FillNpuTensorWithConstant
    
    * fix ignored api
    
    * delete extra unittest
    
    * fix little error
    
    * fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu
    
    * change TensorCopySync to TensorCopy
    
    * delete useless Wait and add StreamWait
    
    * fix npu_stream error
    
    * fix check_finite_and_unscale_op_npu TensorCopy
    
    * only save stream wait
    
    * fix NPUDeviceContext in all c++ unittest
    
    * delete wait
    Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
    
    * delete useless unittest file (#32206)
    
    * Fix op test (#32231)
    
    * fix conditional block (#32243)
    
    * fix adam bug again (#32246)
    
    * fix compile
    
    * fix ut
    
    * fix ut
    Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
    Co-authored-by: pangyoki <pangyoki@126.com>
pybind.cc 123.9 KB