1. 24 9月, 2021 1 次提交
  2. 26 5月, 2021 1 次提交
  3. 19 4月, 2021 1 次提交
    • L
      [NPU] cherry-pick gc/dataloader/save&load/optimization from ascendrc to develop (#32294) · cbe5c9f8
      Leo Chen 提交于
      * [NPU] support GarbageCollector for npu (#31874)
      
      * support GarbageCollector for npu
      
      * fix typo
      
      * fix gather_grad
      
      * disable NPUDefaultStreamGarbageCollector on NPU
      
      * [NPU] support npu for memcpy op (#31808)
      
      * support npu for memcpy op
      
      * add ut
      
      * fix ut
      
      * fix typo
      
      * 【NPU】fix bug of using temp vector (#31963)
      
      * fix bug when beta1_pow on cpu (#31995)
      
      * [NPU] support npu profiler (#31684)
      
      * support npu profiler
      
      * add python api
      
      * fix bugs
      
      * add wrapper for incomplete type
      
      * update profile proto
      
      * record npu wait
      
      * add xpu placeholder
      
      * fix adam (#32016)
      
      * [NPU] enable async copy and  add wait before sync operation (#31956)
      
      * enable async copy and  add wait before sync operation
      
      * remove unneccessary wait
      
      * add FillNpuTensorWithConstant
      
      * refine
      
      * fix fill_constant
      
      * make TensorFromVector/TensorToVector sync
      
      * [NPU] Support dataloader on npu place. (#31867)
      
      * [NPU] Wait on NPUPlace (#32086)
      
      * [NPU] fix cast op (#32121)
      
      * fix npu kernel of cast op to handle casting to same dtype
      
      * add comments
      
      * [NPU] support cann 20.3 (#32044)
      
      * fix compile problem on cann 20.3
      
      * fix ut
      
      * fix test_mul
      
      * fix check_finite_and_scale
      
      * fix lookup_table_v2_grad
      
      * fix cmake
      
      * support print op
      
      * [NPU] Support npu save load (#31893)
      
      * support save load for NPU
      
      * add save load npu unittest
      
      * support np.array transform in NPU
      
      * fix errors
      
      * delete dygraph in unittest
      
      * add Wait
      
      * fix unittest
      
      * fix review comment
      
      * fix unittest problem
      
      * fix little problem
      
      * change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)
      
      * change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performace
      
      * refine code
      
      * fix NPUDeviceContext in all c++ unittest (#32198)
      
      * fix NPUDeviceContext in all c++ unittest
      
      * refine log
      Co-authored-by: Npangyoki <pangyoki@126.com>
      
      * [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)
      
      * enable async copy and  add wait before sync operation
      
      * remove unneccessary wait
      
      * add FillNpuTensorWithConstant
      
      * refine
      
      * fix fill_constant
      
      * change TensorFromVector to FillNpuTensorWithConstant
      
      * fix ignored api
      
      * delete extra unittest
      
      * fix little error
      
      * fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu
      
      * change TensorCopySync to TensorCopy
      
      * delete useless Wait and add StreamWait
      
      * fix npu_stream error
      
      * fix check_finite_and_unscale_op_npu TensorCopy
      
      * only save stream wait
      
      * fix NPUDeviceContext in all c++ unittest
      
      * delete wait
      Co-authored-by: Nzhiqiu <chenqiuliang@baidu.com>
      
      * delete useless unittest file (#32206)
      
      * Fix op test (#32231)
      
      * fix conditional block (#32243)
      
      * fix adam bug again (#32246)
      
      * fix compile
      
      * fix ut
      
      * fix ut
      Co-authored-by: Nliym27 <33742067+liym27@users.noreply.github.com>
      Co-authored-by: Npangyoki <pangyoki@126.com>
      cbe5c9f8
  4. 04 2月, 2021 1 次提交
  5. 03 11月, 2020 1 次提交
  6. 07 7月, 2020 1 次提交
  7. 03 7月, 2020 1 次提交
  8. 09 6月, 2020 1 次提交
  9. 26 5月, 2020 1 次提交
  10. 25 5月, 2020 1 次提交
  11. 11 5月, 2020 1 次提交
    • C
      Add macro BOOST_GET to enrich the error information of boost :: get (#24175) · aa0f254f
      Chen Weihang 提交于
      * add new macro BOOST_GET_SAFELY & unittests, test=develop
      
      * add different macro type, test=develop
      
      * fix get macro type in executor, test=develop
      
      * four macro part change backup
      
      * using one macro for all case, test=develop
      
      * revert attribute change, test=develop
      
      * change to three func to solve gcc4.8 bug, test=develop
      
      * polish some details, test=develop
      aa0f254f
  12. 24 2月, 2020 1 次提交
  13. 23 2月, 2020 1 次提交
  14. 09 1月, 2020 1 次提交
  15. 28 11月, 2019 1 次提交
  16. 13 3月, 2019 1 次提交
  17. 11 3月, 2019 1 次提交
  18. 04 3月, 2019 3 次提交
  19. 01 3月, 2019 1 次提交
  20. 24 2月, 2019 1 次提交
  21. 22 2月, 2019 1 次提交
  22. 21 2月, 2019 1 次提交
    • D
      Profiler refine and add CUDA runtime api tracer (#15301) · a83e4704
      Dun 提交于
      * refine profiler && add runtime tracer
      
      * test=develop
      
      * test=develop
      
      * test=develop
      
      * test=develop
      
      * test=develop
      
      * test=develop
      
      * test=develop
      
      * test=develop
      
      * fix bug && test=develop
      
      * add thread id map && test=develop
      
      * test=develop
      
      * testing
      
      * bug fix
      
      * remove cuda event && refine code && test=develop
      
      * test=develop
      
      * test=develop
      
      * test=develop
      
      * fix windows temp file && test=develop
      
      * test=develop
      
      * fix windows bug && test=develop
      
      * fix start up issue && test=develop
      
      * code polish &&  test=develop
      
      * remove unused code && test=develop
      
      * add some cupti cbid && test=develop
      
      * add FLAGS_multiple_of_cupti_buffer_size && test=develop
      
      * fix compile error && test=develop
      
      * add keyword && test=develop
      
      * fix && test=develop
      
      * code polish && test=develop
      a83e4704
  23. 04 12月, 2018 1 次提交
    • Z
      test=develop · deb04809
      ZongwuYang 提交于
      Fix the bug that profiler cannot trace the nccl allreduce operator
      deb04809
  24. 26 11月, 2018 1 次提交
  25. 08 11月, 2018 1 次提交
  26. 13 8月, 2018 1 次提交
  27. 10 8月, 2018 1 次提交
  28. 31 7月, 2018 1 次提交
  29. 30 7月, 2018 3 次提交
  30. 23 7月, 2018 1 次提交
  31. 14 6月, 2018 1 次提交
    • X
      Remove cuptiFinalize. · d2afd210
      Xin Pan 提交于
      In cupti samples, only cuptiFlush is used.
      I can't find any places calling cuptiFinalize and
      this API can error out as not_implemented in some
      cuda installation.
      d2afd210
  32. 08 6月, 2018 2 次提交
  33. 22 5月, 2018 1 次提交
    • X
      multi-thread handlerequest · b4dd4c04
      Xin Pan 提交于
          Experiment on vgg flower, 2 trainers, 1ps.
          more trainer could have more speedup.
      
          After:
          Pass = 0, Iters = 327, Speed = (7.52) img/s
          Before:
          Pass = 0, Iters = 385, Speed = (6.77) img/s
      b4dd4c04
  34. 10 4月, 2018 1 次提交
  35. 14 3月, 2018 1 次提交