1. 26 6月, 2022 1 次提交
  2. 19 4月, 2021 1 次提交
    • L
      [NPU] cherry-pick gc/dataloader/save&load/optimization from ascendrc to develop (#32294) · cbe5c9f8
      Leo Chen 提交于
      * [NPU] support GarbageCollector for npu (#31874)
      
      * support GarbageCollector for npu
      
      * fix typo
      
      * fix gather_grad
      
      * disable NPUDefaultStreamGarbageCollector on NPU
      
      * [NPU] support npu for memcpy op (#31808)
      
      * support npu for memcpy op
      
      * add ut
      
      * fix ut
      
      * fix typo
      
      * 【NPU】fix bug of using temp vector (#31963)
      
      * fix bug when beta1_pow on cpu (#31995)
      
      * [NPU] support npu profiler (#31684)
      
      * support npu profiler
      
      * add python api
      
      * fix bugs
      
      * add wrapper for incomplete type
      
      * update profile proto
      
      * record npu wait
      
      * add xpu placeholder
      
      * fix adam (#32016)
      
      * [NPU] enable async copy and  add wait before sync operation (#31956)
      
      * enable async copy and  add wait before sync operation
      
      * remove unneccessary wait
      
      * add FillNpuTensorWithConstant
      
      * refine
      
      * fix fill_constant
      
      * make TensorFromVector/TensorToVector sync
      
      * [NPU] Support dataloader on npu place. (#31867)
      
      * [NPU] Wait on NPUPlace (#32086)
      
      * [NPU] fix cast op (#32121)
      
      * fix npu kernel of cast op to handle casting to same dtype
      
      * add comments
      
      * [NPU] support cann 20.3 (#32044)
      
      * fix compile problem on cann 20.3
      
      * fix ut
      
      * fix test_mul
      
      * fix check_finite_and_scale
      
      * fix lookup_table_v2_grad
      
      * fix cmake
      
      * support print op
      
      * [NPU] Support npu save load (#31893)
      
      * support save load for NPU
      
      * add save load npu unittest
      
      * support np.array transform in NPU
      
      * fix errors
      
      * delete dygraph in unittest
      
      * add Wait
      
      * fix unittest
      
      * fix review comment
      
      * fix unittest problem
      
      * fix little problem
      
      * change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)
      
      * change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performace
      
      * refine code
      
      * fix NPUDeviceContext in all c++ unittest (#32198)
      
      * fix NPUDeviceContext in all c++ unittest
      
      * refine log
      Co-authored-by: Npangyoki <pangyoki@126.com>
      
      * [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)
      
      * enable async copy and  add wait before sync operation
      
      * remove unneccessary wait
      
      * add FillNpuTensorWithConstant
      
      * refine
      
      * fix fill_constant
      
      * change TensorFromVector to FillNpuTensorWithConstant
      
      * fix ignored api
      
      * delete extra unittest
      
      * fix little error
      
      * fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu
      
      * change TensorCopySync to TensorCopy
      
      * delete useless Wait and add StreamWait
      
      * fix npu_stream error
      
      * fix check_finite_and_unscale_op_npu TensorCopy
      
      * only save stream wait
      
      * fix NPUDeviceContext in all c++ unittest
      
      * delete wait
      Co-authored-by: Nzhiqiu <chenqiuliang@baidu.com>
      
      * delete useless unittest file (#32206)
      
      * Fix op test (#32231)
      
      * fix conditional block (#32243)
      
      * fix adam bug again (#32246)
      
      * fix compile
      
      * fix ut
      
      * fix ut
      Co-authored-by: Nliym27 <33742067+liym27@users.noreply.github.com>
      Co-authored-by: Npangyoki <pangyoki@126.com>
      cbe5c9f8
  3. 21 3月, 2019 1 次提交
  4. 20 3月, 2019 1 次提交
  5. 11 12月, 2018 1 次提交
  6. 16 11月, 2018 1 次提交
    • W
      Refine operator cmake (#14413) · a2d9b344
      Wu Yi 提交于
      * wip simplify operator framework
      
      * wip
      
      * wip
      
      * done test=develop
      
      * clean test=develop
      
      * fix test=develop
      
      * fix deps test=develop
      
      * fix cpu build test=develop
      
      * fix tensorrt build test=develop
      
      * fix tests test=develop
      
      * fix test=develop
      
      * fix cpu build test=develop
      a2d9b344
  7. 12 2月, 2018 1 次提交
  8. 10 2月, 2018 2 次提交
  9. 26 12月, 2017 1 次提交
  10. 12 12月, 2017 1 次提交
    • Q
      Refine device context (#6433) · 61ec0b95
      QI JUN 提交于
      There are mainly following fixes:
      
      - take `DeviceContext` as the template parameter of math functors and OpKernel instead of `Place`
      - remove `eigen_device` interface in base class  `DeviceContext`
      - remove `GetEigenDevice` interface in `ExecutionContext` and base class `DeviceContext`
      - remove unused `platform::EigenDeviceConverter`
      - rename `REGISTER_OP_GPU_KERNEL` to `REGISTER_OP_CUDA_KERNEL`
      - rename `USE_GPU_ONLY_OP` to `USE_CUDA_ONLY_OP`
      61ec0b95
  11. 21 11月, 2017 1 次提交
  12. 13 10月, 2017 1 次提交
    • A
      Adding the Adam Optimizer operator (#4733) · 11680037
      Abhinav Arora 提交于
      * add adam op
      
      moment1_out = beta1 * moment1 + (1 − beta1) * grad
      moment2_out = beta2 * moment2 + (1 − beta2) * grad * grad
      moment1_hat =  moment1_out / (1 - beta1^t)
      moment2_hat =  moment2_out / (1 - beta2^t)
      param_out = param - learning_rate * moment1_hat / (sqrt(moment2_hat) +
      epsilon)
      
      * fix moment 2
      
      * Adding the Adam optimization operator
      
      * Adding more tests for Adam op
      11680037
  13. 07 8月, 2017 1 次提交
  14. 04 8月, 2017 1 次提交
  15. 31 7月, 2017 1 次提交
  16. 25 7月, 2017 1 次提交
  17. 19 7月, 2017 1 次提交