1. 11 Oct, 2021 1 commit
  2. 29 Jul, 2021 1 commit
  3. 13 Jul, 2021 1 commit
  4. 24 Jun, 2021 1 commit
    • [NPU] support dygraph execution on npu place (#33579) · 6aea6be2
      houj04 committed
      * in NPU environment, use CPUPlace for missing operators.
      
      * in NPU environment, use CPUPlace for missing operators.
      
      * fix TensorCopy bug and add unit test.
      
      * fix code style.
      
      * add more unit tests.
  5. 21 Jun, 2021 1 commit
  6. 01 Jun, 2021 1 commit
  7. 12 May, 2021 1 commit
  8. 19 Apr, 2021 1 commit
    • [NPU] cherry-pick gc/dataloader/save&load/optimization from ascendrc to develop (#32294) · cbe5c9f8
      Leo Chen committed
      * [NPU] support GarbageCollector for npu (#31874)
      
      * support GarbageCollector for npu
      
      * fix typo
      
      * fix gather_grad
      
      * disable NPUDefaultStreamGarbageCollector on NPU
      
      * [NPU] support npu for memcpy op (#31808)
      
      * support npu for memcpy op
      
      * add ut
      
      * fix ut
      
      * fix typo
      
      * 【NPU】fix bug of using temp vector (#31963)
      
      * fix bug when beta1_pow on cpu (#31995)
      
      * [NPU] support npu profiler (#31684)
      
      * support npu profiler
      
      * add python api
      
      * fix bugs
      
      * add wrapper for incomplete type
      
      * update profile proto
      
      * record npu wait
      
      * add xpu placeholder
      
      * fix adam (#32016)
      
      * [NPU] enable async copy and add wait before sync operation (#31956)
      
      * enable async copy and add wait before sync operation
      
      * remove unnecessary wait
      
      * add FillNpuTensorWithConstant
      
      * refine
      
      * fix fill_constant
      
      * make TensorFromVector/TensorToVector sync
      
      * [NPU] Support dataloader on npu place. (#31867)
      
      * [NPU] Wait on NPUPlace (#32086)
      
      * [NPU] fix cast op (#32121)
      
      * fix npu kernel of cast op to handle casting to same dtype
      
      * add comments
      
      * [NPU] support cann 20.3 (#32044)
      
      * fix compile problem on cann 20.3
      
      * fix ut
      
      * fix test_mul
      
      * fix check_finite_and_scale
      
      * fix lookup_table_v2_grad
      
      * fix cmake
      
      * support print op
      
      * [NPU] Support npu save load (#31893)
      
      * support save load for NPU
      
      * add save load npu unittest
      
      * support np.array transform in NPU
      
      * fix errors
      
      * delete dygraph in unittest
      
      * add Wait
      
      * fix unittest
      
      * fix review comment
      
      * fix unittest problem
      
      * fix little problem
      
      * change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)
      
      * change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance
      
      * refine code
      
      * fix NPUDeviceContext in all c++ unittest (#32198)
      
      * fix NPUDeviceContext in all c++ unittest
      
      * refine log
      Co-authored-by: pangyoki <pangyoki@126.com>
      
      * [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)
      
      * enable async copy and add wait before sync operation
      
      * remove unnecessary wait
      
      * add FillNpuTensorWithConstant
      
      * refine
      
      * fix fill_constant
      
      * change TensorFromVector to FillNpuTensorWithConstant
      
      * fix ignored api
      
      * delete extra unittest
      
      * fix little error
      
      * fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu
      
      * change TensorCopySync to TensorCopy
      
      * delete useless Wait and add StreamWait
      
      * fix npu_stream error
      
      * fix check_finite_and_unscale_op_npu TensorCopy
      
      * only save stream wait
      
      * fix NPUDeviceContext in all c++ unittest
      
      * delete wait
      Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
      
      * delete useless unittest file (#32206)
      
      * Fix op test (#32231)
      
      * fix conditional block (#32243)
      
      * fix adam bug again (#32246)
      
      * fix compile
      
      * fix ut
      
      * fix ut
      Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
      Co-authored-by: pangyoki <pangyoki@126.com>
  9. 09 Apr, 2021 1 commit
    • [NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d
      Leo Chen committed
      * [feature] support npu allocator (#30840)
      
      [feature] support npu allocator
      
      * [feature] support npu operator (#30951)
      
      [feature] support npu operator
      
      * [feature] support npu allocator, part 2 (#30972)
      
      * support npu allocator
      
      * add npu device context
      
      * fix some compile problem
      
      * fix some compile problem
      
      * add npu info
      
      * compile ok
      
      * fix include dir
      
      * support naive_best_fit_allocator
      
      * run ut ok, but failed to exit
      
      * call aclrtResetDevice before exit
      
      * fix aclFinilize
      
      * add system allocator test
      
      * add selected_gpus in gtest
      
      * add tensor_test for npu
      
      * support npu op, initial commit
      
      * add npu stream
      
      * add elementwise_add_op
      
      * compile ok
      
      * fix typo
      
      * fix elementwise_add_op_npu_test
      
      * support op run
      
      * test can run but failed
      
      * change aclopExecuteV2 to aclopCompileAndExecute
      
      * support parsing ascend rank table file (#31000)
      
      support parsing ascend rank table file
      
      * Fix reshape on GE graph. (#31084)
      
      Fix reshape on GE graph
      
      * add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)
      
      * add npu sub op
      
      * fix typo
      
      * rename test
      
      * fix bug
      
      * fix bug
      
      * add fp16 kernel
      
      * fix typo
      
      * support sub grad op
      
      * support elementwise_sub_grad op
      Co-authored-by: frankwhzhang <frankwhzhang@126.com>
      
      * Fix compilation problem (#31100)
      
      Fix compilation problem (#31100)
      
      * fix compile
      
      * fix code style
      
      * remove const_cast
      
      * support adding correct npu op in pybind.h (#31143)
      
      * support adding correct npu op in pybind.h
      
      * refine code
      
      * [NPU] Support executor with NPU (#31057)
      
      * [NPU] Support executor with NPU
      
      * Fix code according to reviews
      
      * Fix code
      
      * Add unittest for sub op npu
      
      * refactor npu device manager (#31154)
      
      refactor npu device manager (#31154)
      
      * fix selected npus
      
      * fix compile
      
      * fix reading flags from env
      
      * format
      Co-authored-by: xiayanming <41795079@qq.com>
      Co-authored-by: gongweibao <weibao.gong@gmail.com>
      Co-authored-by: frankwhzhang <frankwhzhang@126.com>
      Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
  10. 26 Feb, 2021 1 commit
  11. 22 Dec, 2020 1 commit
  12. 04 Dec, 2020 1 commit
  13. 01 Dec, 2020 1 commit
  14. 15 Oct, 2020 1 commit
  15. 13 Oct, 2020 1 commit
    • Refine the format of printing tensor (#27673) · 049696bf
      Leo Chen committed
      * add summary feature
      
      * refine printing tensor
      
      * add sci_mode
      
      * add sample code
      
      * fix indent error
      
      * fix _format_item
      
      * polish code
      
      * support item indent
      
      * add ut
      
      * set place for ut
      
      * fix py2 issue
      
      * fix ut
  16. 09 Oct, 2020 1 commit
  17. 24 Sep, 2020 1 commit
    • use iwyu clean include (#27267) · df43905f
      wanghuancoder committed
      * use iwyu clean include, test=develop, test=win
      
      * compilation error, test=develop
      
      * fix compilation error2, test=develop
      
      * fix compilation error3, test=develop
      
      * fix compilation error4, test=develop
      
      * fix compilation error5, test=develop
      
      * fix compilation error6, test=develop
      
      * fix compilation error7, test=develop
      
      * fix compilation error8, test=develop
      
      * fix compilation error8, test=develop
      
      * fix compilation error10, test=develop
      
      * fix compilation error11, test=develop
  18. 16 Sep, 2020 1 commit
  19. 27 Aug, 2020 1 commit
  20. 24 Aug, 2020 1 commit
  21. 21 Aug, 2020 1 commit
    • support Baidu Kunlun AI Accelerator (#25959) · 138ecf24
      QingshuChen committed
      * support Baidu AI Accelerator
        * test=kunlun
      
      * minor
       * test=kunlun
      
      * support xpu op in separate file
       * test=kunlun
      
      * update XPU error message and remove duplicated code
      
       * test=kunlun
      
      * minor
       * test=kunlun
      
      * minor
       * test=kunlun
  22. 15 Aug, 2020 1 commit
    • expose and unify the Tensor concepts to the user (#25978) · 6de463d3
      Zhou Wei committed
      * expose and unify the Tensor concepts to the user
      
      * expose tensor to user
      
      * add copy place for Tensor
      
      * add copy place for Tensor
      
      * add note
      
      * add macro PADDLE_WITH_CUDA
      
      * remove RUN_TYPE=DIST
      
      * fix some error
  23. 29 Jul, 2020 1 commit
    • Simplify BufferedReader to improve DataLoader performance (#25648) · 1b3081b1
      Chen Weihang committed
      * simplify buffered reader to improve DataLoader performance
      
      * fix 22 failed unittests
      
      * fix cuda pinned context condition
      
      * fix test_reader_reset failed
      
      * fix two failed unittests
      
      * change unittest place
      
      * polish error message
      
      * polish cast op GetExpecctedKernelType
      
      * remove debug info in unittest
  24. 11 May, 2020 1 commit
    • Add macro BOOST_GET to enrich the error information of boost :: get (#24175) · aa0f254f
      Chen Weihang committed
      * add new macro BOOST_GET_SAFELY & unittests, test=develop
      
      * add different macro type, test=develop
      
      * fix get macro type in executor, test=develop
      
      * four macro part change backup
      
      * using one macro for all case, test=develop
      
      * revert attribute change, test=develop
      
      * change to three func to solve gcc4.8 bug, test=develop
      
      * polish some details, test=develop
  25. 27 Apr, 2020 2 commits
    • [dy2static] Add print transformer and unify print format (#24068) · 9b851ba2
      Chen Weihang committed
      * add print transformer & unify print format, test=develop
      
      * remove using of dygraph_to_static_func, test=develop
      
      * remove python stdout capture, test=develop
      
      * fix compatibility problems for PY2, test=develop
      
      * fix detail error, test=develop
      
      * fix type analysis bug, test=develop
      
      * fix print tuple compatible error in PY2, test=develop
      
      * replace get_func to declarative, test=develop
      
      * fix detail bug, test=develop
      
      * fix some detail problems, test=develop
      
      * change visit_call in print transformer, test=develop
    • Add the implementation of inverse (#23310) · ecfddebb
      Yiqun Liu committed
  26. 17 Mar, 2020 1 commit
  27. 11 Mar, 2020 1 commit
  28. 12 Dec, 2019 1 commit
    • memory leak for cpu (#21174) · 9ad940fd
      tangwei12 committed
      * add fake init for the trainer, fix large memory hold in the trainer
      * do not merge recv vars from a remote endpoint, test=develop
      * add recv and save op, merge slice var in one op, save memory
      * remove hsigmoid with pull sparse, test=develop
  29. 02 Dec, 2019 1 commit
  30. 28 Nov, 2019 2 commits
  31. 14 Oct, 2019 1 commit
    • Dlpack support (#20039) · 12e4be03
      633WHU committed
      * support dlpack to tensor and implement python interface test=develop
      
      * add unittest for _to_dlpack and from_dlpack test=develop
  32. 03 Sep, 2019 1 commit
  33. 14 Aug, 2019 1 commit
  34. 24 May, 2019 1 commit
  35. 28 Mar, 2019 1 commit
    • [MKL-DNN] Tensor modifications revert (#16462) · 26323274
      Jacek Czaja committed
      * Revert "[MKL-DNN] Fix to crash of Transformer when mkldnn is to be used (#16233)"
      
      This reverts commit 13816dd4.
      Apart from enabling transformer for MKL-DNN
      
      * Revert "- MKL-DNN pooling updated to set_prim_desc"
      
      This reverts commit c63f6b20.
      
      Conflicts:
      	paddle/fluid/operators/mkldnn/concat_mkldnn_op.cc
      
      * Revert "[MKL-DNN] MKL-DNN specific Tensor modification (#15429)"
      
      test=develop
      
      This reverts commit dec9cf53.
      
      * - concat compilation fix
      
      - lint
      
      test=develop
      
      - Lint fixes
      
      test=develop
      
      - Lint fixes
      
      test=develop
      
      - Fix Transpose MKLDNN op
      
      test=develop
  36. 19 Mar, 2019 2 commits
  37. 11 Mar, 2019 1 commit