1. 09 Apr, 2021 1 commit
    • L
      [NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d
      Leo Chen committed
      * [feature] support npu allocator (#30840)
      
      [feature] support npu allocator
      
      * [feature] support npu operator (#30951)
      
      [feature] support npu operator
      
      * [feature] support npu allocator, part 2 (#30972)
      
      * support npu allocator
      
      * add npu device context
      
      * fix some compile problem
      
      * fix some compile problem
      
      * add npu info
      
      * compile ok
      
      * fix include dir
      
      * support naive_best_fit_allocator
      
      * run ut ok, but failed to exit
      
      * call aclrtResetDevice before exit
      
      * fix aclFinalize
      
      * add system allocator test
      
      * add selected_gpus in gtest
      
      * add tensor_test for npu
      
      * support npu op, initial commit
      
      * add npu stream
      
      * add elementwise_add_op
      
      * compile ok
      
      * fix typo
      
      * fix elementwise_add_op_npu_test
      
      * support op run
      
      * test can run but failed
      
      * change aclopExecuteV2 to aclopCompileAndExecute
      
      * support parsing ascend rank table file (#31000)
      
      support parsing ascend rank table file
      
      * Fix reshape on GE graph. (#31084)
      
      Fix reshape on GE graph
      
      * add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)
      
      * add npu sub op
      
      * fix typo
      
      * rename test
      
      * fix bug
      
      * fix bug
      
      * add fp16 kernel
      
      * fix typo
      
      * support sub grad op
      
      * support elementwise_sub_grad op
      Co-authored-by: frankwhzhang <frankwhzhang@126.com>
      
      * Fix compilation problem (#31100)
      
      Fix compilation problem (#31100)
      
      * fix compile
      
      * fix code style
      
      * remove const_cast
      
      * support adding correct npu op in pybind.h (#31143)
      
      * support adding correct npu op in pybind.h
      
      * refine code
      
      * [NPU] Support executor with NPU (#31057)
      
      * [NPU] Support executor with NPU
      
      * Fix code according to reviews
      
      * Fix code
      
      * Add unittest for sub op npu
      
      * refactor npu device manager (#31154)
      
      refactor npu device manager (#31154)
      
      * fix selected npus
      
      * fix compile
      
      * fix reading flags from env
      
      * format
      Co-authored-by: xiayanming <41795079@qq.com>
      Co-authored-by: gongweibao <weibao.gong@gmail.com>
      Co-authored-by: frankwhzhang <frankwhzhang@126.com>
      Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
  2. 26 Mar, 2021 1 commit
  3. 26 Feb, 2021 1 commit
  4. 13 Oct, 2020 1 commit
    • L
      Refine the format of printing tensor (#27673) · 049696bf
      Leo Chen committed
      * add summary feature
      
      * refine printing tensor
      
      * add sci_mode
      
      * add sample code
      
      * fix indent error
      
      * fix _format_item
      
      * polish code
      
      * support item indent
      
      * add ut
      
      * set place for ut
      
      * fix py2 issue
      
      * fix ut
  5. 24 Sep, 2020 1 commit
    • W
      use iwyu clean include (#27267) · df43905f
      wanghuancoder committed
      * use iwyu clean include, test=develop, test=win
      
      * compilation error, test=develop
      
      * fix compilation error2, test=develop
      
      * fix compilation error3, test=develop
      
      * fix compilation error4, test=develop
      
      * fix compilation error5, test=develop
      
      * fix compilation error6, test=develop
      
      * fix compilation error7, test=develop
      
      * fix compilation error8, test=develop
      
      * fix compilation error8, test=develop
      
      * fix compilation error10, test=develop
      
      * fix compilation error11, test=develop
  6. 16 Sep, 2020 1 commit
  7. 24 Aug, 2020 1 commit
  8. 11 May, 2020 1 commit
    • C
      Add macro BOOST_GET to enrich the error information of boost::get (#24175) · aa0f254f
      Chen Weihang committed
      * add new macro BOOST_GET_SAFELY & unittests, test=develop
      
      * add different macro type, test=develop
      
      * fix get macro type in executor, test=develop
      
      * four macro part change backup
      
      * using one macro for all case, test=develop
      
      * revert attribute change, test=develop
      
      * change to three func to solve gcc4.8 bug, test=develop
      
      * polish some details, test=develop
  9. 13 Apr, 2020 1 commit
  10. 12 Dec, 2019 1 commit
    • T
      memory leak for cpu (#21174) · 9ad940fd
      tangwei12 committed
      * add fake init for the trainer, fix large memory hold in the trainer
      * do not merge recv vars from a remote endpoint, test=develop
      * add recv and save op, merge slice var in one op, save memory
      * remove hsigmoid with pull sparse, test=develop
  11. 14 Oct, 2019 1 commit
    • 6
      Dlpack support (#20039) · 12e4be03
      633WHU committed
      * support dlpack to tensor and implement python interface test=develop
      
      * add unittest for _to_dlpack and from_dlpack test=develop
  12. 11 Sep, 2019 1 commit
    • H
      Replace TemporaryAllocator by CUDADeviceContextAllocator (#18989) · 12542320
      Huihuang Zheng committed
      TemporaryAllocator is a singleton used to allocate memory for cuDNN. Because it is a singleton, we can remove it for better memory performance.
      
      We replace TemporaryAllocator with CUDADeviceContextAllocator and CUDADeviceContextAllocation, which use a stream callback to free the memory allocated for the stream, avoiding the singleton.
      
      Also added data_feed_proto to operator to fix CI in CPU compilation
  13. 03 Sep, 2019 1 commit
  14. 24 May, 2019 1 commit
  15. 02 Jan, 2019 1 commit
  16. 25 Dec, 2018 1 commit
  17. 09 Oct, 2018 1 commit
  18. 02 Sep, 2018 1 commit
  19. 30 Aug, 2018 1 commit
  20. 29 Jun, 2018 1 commit
  21. 27 Apr, 2018 1 commit
  22. 23 Apr, 2018 1 commit
  23. 19 Apr, 2018 1 commit
  24. 15 Feb, 2018 1 commit
    • Y
      Update tensor_util.h (#8422) · cfffb1a3
      Yi Wang committed
      * Update tensor_util.h
      
      * Update with moved TensorDesc
      
      * Fix tensor_util.cu
      
      * Update
      
      * Update
      
      * Update
      
      * Update
      
      * Make tensor_util.cu a symbolic link
  25. 13 Feb, 2018 1 commit
    • A
      Separate VarType from VarDesc in framework.proto and fix all related compiler errors (#8414) · fcadb452
      Abhinav Arora committed
      * Refine Type system
      
      * Fixing type inference
      
      * Fixed create_reader_op.cc
      
      * Fix var_desc.h
      
      * Fixed executor.cc
      
      * Fix shape_inference.h
      
      * Fixed create_reader_op.cc
      
      * Fix tensor_util.h
      
      * Fixed var_type_inference_test.cc
      
      * Fix shape_inference.cc
      
      * Fixed sum_op.c
      
      * Fixed read_op.cc
      
      * Fix var_type.h
      
      * Fixed beam_search_decode_op.cc
      
      * sendrecvop_utils.cc
      
      * Fix operator.cc
      
      * Fixed lookup_table_op.cc
      
      * Fixed op_desc.cc
      
      * Fixed get_places_op.cc
      
      * Fixed lod_rank_table_op.cc
      
      * Fixed beam_search_op.cc
      
      * Fix var_desc.cc
      
      * Fixed lod_tensor_to_array_op.cc
      
      * Fixed while_op.cc
      
      * Fix program_desc_test.cc
      
      * tensor_array_read_write_op.cc
      
      * Fix assign_op.cc
      
      * Fix executor.cc
      
      * Fix protobuf.cc
      
      * Fix protobuf.cc
  26. 12 Feb, 2018 1 commit
  27. 10 Feb, 2018 2 commits
  28. 15 Jan, 2018 1 commit
  29. 12 Jan, 2018 1 commit
  30. 10 Jan, 2018 1 commit
  31. 09 Jan, 2018 1 commit
  32. 05 Jan, 2018 1 commit
    • Y
      send_recv variables (#7161) · e5fe8935
      Yancey committed
      * send_recv variable
      
      * delete unused logs
      
      * fix ci failed
      
      * update
      
      * resize tensor before tensor copy
      
      * add selectedrows unit test
      
      * check rows
  33. 29 Dec, 2017 1 commit
  34. 28 Dec, 2017 3 commits
  35. 27 Dec, 2017 2 commits
  36. 26 Dec, 2017 1 commit