  1. 27 Mar 2022 · 1 commit
  2. 23 Mar 2022 · 1 commit
  3. 07 Mar 2022 · 1 commit
    • cuBlasLt Epilogue To Fuse Linear + ReLU|GeLU (#39437) · 2a3d9eca
      Ming-Xu Huang committed
      * Added cuBlasLtHandle_t to device context.
      
      * Added fused_gemm_epilogue op.
      
      1. Added fused_gemm_epilogue op to leverage cuBlasLt Epilogue.
      2. Support fusion Act(X*Y + bias), where X's rank must be >= 2 and Y's rank must be 2.
      3. Act currently supports only ReLU (GeLU will be added in the future).
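The semantics listed above, Act(X*Y + bias) with X of rank >= 2 and a 2-D Y, can be pinned down with a small pure-Python reference. This is a sketch for illustration only, not Paddle's kernel or the cuBlasLt API; the function names are hypothetical, and it assumes X's leading dimensions are flattened into rows and bias is broadcast along the last (N) dimension.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def flatten_leading_dims(x: Sequence, k: int) -> Matrix:
    """Collapse all leading dims of a rank >= 2 nested list into rows of length k."""
    rows: Matrix = []

    def walk(node: Sequence) -> None:
        if node and isinstance(node[0], list):
            for child in node:
                walk(child)
        else:
            if len(node) != k:
                raise ValueError("last dim of x must equal the number of rows of y")
            rows.append(list(node))

    walk(x)
    return rows


def fused_gemm_epilogue_ref(x: Sequence, y: Matrix, bias: List[float],
                            activation: str = "relu") -> Matrix:
    """out = act(x @ y + bias); returns the flattened [M, N] result."""
    k, n = len(y), len(y[0])
    if len(bias) != n:
        raise ValueError("bias is broadcast along the last (N) dimension")
    out: Matrix = []
    for row in flatten_leading_dims(x, k):
        # One GEMM output row, with the bias added in the same pass (the "epilogue").
        acc = [sum(row[p] * y[p][j] for p in range(k)) + bias[j] for j in range(n)]
        if activation == "relu":
            acc = [max(v, 0.0) for v in acc]
        elif activation != "none":
            raise ValueError(f"unsupported activation: {activation}")
        out.append(acc)
    return out
```

A real implementation would reshape the [M, N] result back to X's leading dimensions plus N. The benefit of an epilogue is that the bias add and activation are applied while the GEMM writes its output, instead of in separate kernels.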
      
      * Added UT to fused_gemm_epilogue op.
      
      * Added LinearAct Pattern
      
      1. Added LinearAct to graph_pattern_detector.* to define the pattern
      described in item 2.
      2. LinearAct is used to detect act(element_add(matmul_v2(x, w), bias)).
      3. act currently supports only ReLU (GeLU will be supported in the future).
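The LinearAct pattern can be illustrated with a toy matcher over a flat op list. The dict-based graph format and `match_linear_act` are hypothetical (Paddle's detector works on `ir::Graph` nodes through graph_pattern_detector), and the single-consumer rule is a simplification: training graphs also route these intermediates to grad ops, which a real pass has to account for.

```python
from typing import Dict, List, Sequence

# Hypothetical toy graph: each op is {"type": str, "inputs": [...], "outputs": [...]}.
Op = Dict[str, object]


def consumers(ops: List[Op], var: str) -> List[Op]:
    """Return every op that reads `var`."""
    return [op for op in ops if var in op["inputs"]]


def match_linear_act(ops: List[Op], acts: Sequence[str] = ("relu",)) -> List[Dict[str, Op]]:
    """Find act(elementwise_add(matmul_v2(x, w), bias)) chains."""
    matches = []
    for mm in ops:
        if mm["type"] != "matmul_v2":
            continue
        users = consumers(ops, mm["outputs"][0])
        # The GEMM result must feed only the add; otherwise fusing would hide a value another op reads.
        if len(users) != 1 or users[0]["type"] != "elementwise_add":
            continue
        add = users[0]
        add_users = consumers(ops, add["outputs"][0])
        # Same rule for the add result: exactly one consumer, and it must be a supported activation.
        if len(add_users) != 1 or add_users[0]["type"] not in acts:
            continue
        matches.append({"matmul": mm, "add": add, "act": add_users[0]})
    return matches
```

Passing `acts=("relu", "gelu")` would cover the later GeLU support. Adding a second reader of the matmul output (for example a `scale` op) makes the match fail, which is the case the single-consumer check guards against.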
      
      * Added FuseGemmEpiloguePass
      
      1. Added FuseGemmEpiloguePass to handle nn.Linear + Act{ReLU}
      fusion (GeLU will be supported in the future).
      2. Only matmul_v2 from nn.Linear is supported.
      
      * Added pybind for BuildStrategy.fuse_gemm_epilogue_.
      
      * Added UT for fuse_gemm_epilogue_pass.
      
      * GeLU support and EpilogueSingleton
      
      1. Added GeLU support to fused_gemm_epilogue op.
      2. Added EpilogueSingleton to cache auxiliary pointer.
      3. Added related UTs.
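GeLU is x * Phi(x), where Phi is the standard normal CDF. The sketch below gives the exact erf form, the widely used tanh approximation, and the exact derivative that a backward epilogue needs. Which of the two forms cuBlasLt's GELU epilogue evaluates is not stated in this commit, so treat the pairing as an assumption; the helper names are hypothetical.

```python
import math


def gelu_exact(x: float) -> float:
    # x * Phi(x), with the standard normal CDF written via erf.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    # Common tanh approximation of the same function.
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))


def gelu_exact_grad(x: float) -> float:
    # d/dx [x * Phi(x)] = Phi(x) + x * phi(x), phi being the standard normal PDF.
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf
```

The two forward forms agree to roughly 1e-4 near x = 1, which is why the approximation is popular in fused kernels.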
      
      * Rename cublaslt_epilogue_op.* to gemm_epilogue_op.*.
      
      * Added both train and infer pattern to LinearAct.
      
      1. Added support for forward graphs whose grad_ops link to LinearAct.
      2. Made the corresponding changes to fuse_gemm_epilogue_pass.
      
      * Changed CUDA requirement from 11.4 to 11.6 for fuse_gemm_epilogue_pass.
      
      * Added identity activation support to gemm_epilogue_op.
      
      * Added Linear Fusion (matmul_v2 + ele_add)
      
      1. Added matmul_v2 + ele_add pattern to LinearActPattern.
      2. Added matmul_v2 + ele_add support to fuse_gemm_epilogue_pass.
      
      * Rename gemm_epilogue_op.* to fused_gemm_epilogue_op.*
      
      * Add fused_gemm_epilogue_grad op.
      
      1. Added fused_gemm_epilogue_grad to support backward epilogue fusion.
      
      * Add UTs to fused_gemm_epilogue_grad_op.
      
      * Change attribute name in fused_gemm_epilogue_grad_op for clarity.
      
      * Allow DX and DBias to be dispensable in fused_gemm_epilogue_grad op.
      
      * Added ElementwiseAdd+Matmul+Act graph pattern detection.
      
      * Fuse backward of Linear(Act(x))
      
      1. Added backward fusion pass for Linear(Act(x)).
      2. Added backward fusion pass for Linear(x).
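For Out = X * Y + bias, the gradients are dX = dOut * Y^T, dY = X^T * dOut, and dBias = the column sums of dOut. In the Linear(Act(x)) case X is itself an activation output, so the activation gradient can be applied to dX right after its GEMM. The pure-Python sketch below assumes a ReLU as that preceding activation; `linear_grad_ref`, `relu_mask`, and the `need_dx`/`need_dbias` flags are hypothetical stand-ins for the op's dispensable DX/DBias outputs, not its actual attributes.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def linear_grad_ref(
    d_out: Matrix,
    x: Matrix,
    y: Matrix,
    relu_mask: Optional[List[List[bool]]] = None,
    need_dx: bool = True,
    need_dbias: bool = True,
) -> Tuple[Optional[Matrix], Matrix, Optional[List[float]]]:
    """Backward of out = x @ y + bias, optionally through a preceding ReLU.

    relu_mask marks where the ReLU that produced x was active; when given,
    dX is masked so it becomes the gradient w.r.t. that ReLU's input.
    """
    d_y = matmul(transpose(x), d_out)  # dY = X^T @ dOut
    d_x = None
    if need_dx:
        d_x = matmul(d_out, transpose(y))  # dX = dOut @ Y^T
        if relu_mask is not None:
            d_x = [[g if m else 0.0 for g, m in zip(g_row, m_row)]
                   for g_row, m_row in zip(d_x, relu_mask)]
    # dBias sums dOut over the M (row) dimension.
    d_bias = [sum(col) for col in zip(*d_out)] if need_dbias else None
    return d_x, d_y, d_bias
```

Skipping dX is what the first GEMM in a network wants, since its input needs no gradient.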
      
      * Added UTs to backward fusion of Linear(Act(x)).
      
      * Complete documentation of arguments to fused_gemm_epilogue_op.
      
      * Made arguments of some functions pass by reference.
      
      * Modify code according to review comments.
      
      1. Made arguments of some functions pass by reference.
      2. Removed redundant code.
      3. Changed code to follow Google code style.
      
      * Made 'const' code style consistent
      
      * Fixed random seed of python UTs.
      
      * Set compiling constraints for cuBlasLt
      
      1. Require CUDA 11.6+
      2. Remove fuse_gemm_epilogue related tests when CUDA < 11.6.
      
      * Code review from Paddle
      
      1. Renamed argument is_first_gemm to without_x_gradient for
      clarity.
      2. Applied PADDLE_THROW in fused_gemm_epilogue_op.
      
      * Remove EpilogueSingleton
      
      1. Used ReserveSpace in place of EpilogueSingleton to pass auxiliary
      pointers between FWD and BWD.
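ReserveSpace lets the forward op hand the backward op what the activation gradient needs, instead of keeping it in a process-wide singleton. For ReLU that can be as little as one bit per element. The sketch below packs such a mask into a `bytearray`; it is a simplified model under that assumption, the helper names are hypothetical, and the actual layout cuBlasLt uses for its auxiliary output (alignment, leading dimension) is not modeled.

```python
from typing import List, Tuple


def relu_with_mask(values: List[float]) -> Tuple[List[float], bytearray]:
    """Apply ReLU and record a 1-bit-per-element mask (bit set where value > 0)."""
    mask = bytearray((len(values) + 7) // 8)
    out = []
    for i, v in enumerate(values):
        if v > 0.0:
            mask[i // 8] |= 1 << (i % 8)
            out.append(v)
        else:
            out.append(0.0)
    return out, mask


def relu_grad_from_mask(d_out: List[float], mask: bytearray) -> List[float]:
    """ReLU backward that reads only the saved mask, not the forward tensor."""
    return [g if (mask[i // 8] >> (i % 8)) & 1 else 0.0 for i, g in enumerate(d_out)]
```

The mask is 1/32 the size of an FP32 activation, which is the usual argument for saving it rather than the pre-activation values.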
      
      * Fix a logical error and enhance UTs.
      
      1. Added act op count checking in UTs.
      2. Fixed an issue fusing the backward of ReLU(Linear(X)).
      3. TODO: solve GELU fusion issues.
      
      * Fix Linear and GeLU fusion issues.
      
      1. Modified graph_pattern_detector to match Linear followed by either
      GeLU or ReLU.
      2. Modified the data range in UTs to allow negative values.
      
      * Removed fused_gemm_epilogue_op.h.
      
      * Rename namespace pten to phi.
      
      * Rename name of arguments in fused_gemm_epilogue_op
      
      1. bias -> Bias.
      2. out -> Out.
      3. reserve_space -> ReserveSpace.
      
      * Change EpiloguePassActivationCache to a local variable.
      
      1. Removed singleton in EpiloguePassActivationCache.
      2. Made EpiloguePassActivationCache an argument of each pass
      function.
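Switching from a singleton to a local cache matters because a singleton keeps entries from one pass run to the next. The Python model below is hypothetical (the method names and the `@ReserveSpace` suffix are made up for illustration); it only shows the shape of the design: the cache is created inside the pass entry point and passed explicitly to the forward and backward fusion steps.

```python
from typing import Dict, List


class EpiloguePassActivationCache:
    """Hypothetical model: activation output var -> ReserveSpace var."""

    def __init__(self) -> None:
        self._cache: Dict[str, str] = {}

    def insert(self, act_out: str, reserve_space: str) -> None:
        self._cache[act_out] = reserve_space

    def has(self, act_out: str) -> bool:
        return act_out in self._cache

    def get(self, act_out: str) -> str:
        return self._cache[act_out]


def fuse_forward(fused_act_outputs: List[str], cache: EpiloguePassActivationCache) -> None:
    # Forward fusion records which activation outputs now come with a reserve space.
    for name in fused_act_outputs:
        cache.insert(name, name + "@ReserveSpace")


def fuse_backward(grad_inputs: List[str], cache: EpiloguePassActivationCache) -> List[str]:
    # Backward fusion applies only where the forward fusion left a reserve space.
    return [cache.get(name) for name in grad_inputs if cache.has(name)]


def apply_fuse_gemm_epilogue(fused_act_outputs: List[str], grad_inputs: List[str]) -> List[str]:
    # A fresh cache per run: nothing leaks into the next graph, unlike a singleton.
    cache = EpiloguePassActivationCache()
    fuse_forward(fused_act_outputs, cache)
    return fuse_backward(grad_inputs, cache)
```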
  4. 03 Mar 2022 · 1 commit
  5. 23 Feb 2022 · 2 commits
    • [phi] move randperm to phi (#39816) · 30992ea0
      Leo Chen committed
      * move randperm to phi
      
      * fix npu
      
      * fix memory::Copy
    • Update record interface using part3 (#39695) · 1fcaab45
      chenjian committed
      * fix RecordEvent interface
      
      * modify default level to 4
      
      * update interface use
      
      * add const default trace level
      
      * update record event interface using
      
      * update record event interface using
      
      * update record event interface using
      
      * update operator.cc
      
      * update part2
      
      * update part1
      
      * update part3
      
      * fix include profiler.h header in ps server
      
      * fix include profiler.h header in ps server
      
      * fix profiler.h header
      
      * fix profiler.h header
      
      * fix merge buf
      
      * update
      
      * fix bug
      
      * fix bug
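The bullets above revolve around giving each recorded event a trace level. A minimal Python model of that idea follows, assuming an event is recorded only when its level does not exceed the tracer's configured maximum. `Tracer`, `record_event`, and `DEFAULT_TRACE_LEVEL` are hypothetical names, not Paddle's profiler API; the default of 4 only echoes the "modify default level to 4" bullet.

```python
import time
from contextlib import contextmanager
from typing import Iterator, List, Tuple

# Echoes the "modify default level to 4" bullet; its exact meaning in Paddle is not modeled.
DEFAULT_TRACE_LEVEL = 4


class Tracer:
    """Hypothetical host tracer that keeps (name, seconds) pairs."""

    def __init__(self, max_level: int) -> None:
        self.max_level = max_level
        self.events: List[Tuple[str, float]] = []

    @contextmanager
    def record_event(self, name: str, level: int = DEFAULT_TRACE_LEVEL) -> Iterator[None]:
        if level > self.max_level:
            # Too detailed for the configured trace level: run the body untimed.
            yield
            return
        start = time.perf_counter()
        try:
            yield
        finally:
            self.events.append((name, time.perf_counter() - start))
```

Filtering by level at the recording site keeps fine-grained events cheap when tracing is configured coarsely.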
  6. 22 Feb 2022 · 1 commit
  7. 20 Feb 2022 · 1 commit
  8. 15 Feb 2022 · 1 commit
    • [PluggableDevice] Add custom runtime support (#38740) · 3e7825f3
      ronnywang committed
      * [CustomRuntime] Add DeviceManager
      
      * [CustomRuntime] Add DeviceInterface
      
      * [CustomRuntime] Add Stream, Event, DeviceGuard, CallbackManager
      
      * [CustomRuntime] Add plug-in device
      
      * [CustomRuntime] Memory module support PluggableDevice
      
      * [CustomRuntime] Add WITH_PLUGGABLE_DEVICE cmake option
      
      * update
      
      * [API] update API doc based on comments, test=develop
      Co-authored-by: qili93 <qili93@qq.com>
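The DeviceManager/DeviceInterface split above follows a registry pattern: each plug-in device implements a common interface and registers it under a device-type string, and framework code looks backends up by that string. The Python model below is a hypothetical illustration of that pattern, not Paddle's C++ API; `fake_npu` is a made-up device type.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class DeviceInterface(ABC):
    """Hypothetical plug-in device backend: one instance per device type."""

    def __init__(self, device_type: str) -> None:
        self.device_type = device_type

    @abstractmethod
    def device_count(self) -> int:
        """Number of visible devices of this type."""


class DeviceManager:
    """Registry keyed by device-type string."""

    _registry: Dict[str, DeviceInterface] = {}

    @classmethod
    def register(cls, impl: DeviceInterface) -> bool:
        # Refuse duplicate registration so one plug-in cannot shadow another.
        if impl.device_type in cls._registry:
            return False
        cls._registry[impl.device_type] = impl
        return True

    @classmethod
    def device_types(cls) -> List[str]:
        return sorted(cls._registry)

    @classmethod
    def get(cls, device_type: str) -> DeviceInterface:
        return cls._registry[device_type]


class FakeNpu(DeviceInterface):
    """Made-up backend used only to exercise the registry."""

    def __init__(self) -> None:
        super().__init__("fake_npu")

    def device_count(self) -> int:
        return 2
```

Keeping the framework dependent only on the interface is what lets new hardware ship as a plug-in without rebuilding the core.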
  9. 14 Feb 2022 · 1 commit
  10. 08 Feb 2022 · 1 commit
  11. 06 Feb 2022 · 1 commit
  12. 27 Jan 2022 · 1 commit
  13. 26 Jan 2022 · 1 commit
  14. 25 Jan 2022 · 1 commit
  15. 24 Jan 2022 · 1 commit
  16. 21 Jan 2022 · 1 commit
    • [PTEN] Add cpu context (#38979) · 064bc4b8
      Wilber committed
      * add cpu_context.
      
      * update
      
      * update
      
      * update
      
      * update
      
      * update
      
      * fix ci problem
      
      * fix npu ci problem
      
      * update
      
      * fix ci compile
  17. 20 Jan 2022 · 1 commit
  18. 17 Jan 2022 · 1 commit
    • [Pten] Replace platform::Place with pten::Place. (#38899) · c48a9ad5
      Wilber committed
      * add pten::Place data structure.
      
      * update ci problem
      
      * fix ci problem
      
      * update
      
      * using platform::Place=pten::Place
      
      * remove BOOST_GET_CONST for CPUPlace and GPUPlace
      
      * compile pass 25%.
      
      * compile pass 45%
      
      * compile pass 60%
      
      * remove boost_get for xpu npu mlu and ipu
      
      * compile pass on cpu and gpu.
      
      * fix compile problem
      
      * fix compile error.
      
      * update
      
      * fix ci problem
      
      * update
      
      * ci approve
      
      * fix ci problem
      
      * fix ci eager test problem
      
      * remove BOOST_GET_CONST
      
      * fix npu compile
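The `using platform::Place=pten::Place` and `remove BOOST_GET_CONST` bullets describe replacing a variant of per-device place types with a single place type. A minimal Python model of that shape follows, assuming a place is just an allocation type plus a device id; the enum members and helper names are illustrative, not the exact pten definitions.

```python
from dataclasses import dataclass
from enum import Enum


class AllocationType(Enum):
    # Illustrative members; not the exact pten enum.
    CPU = 1
    GPU = 2
    GPUPINNED = 3
    XPU = 4
    NPU = 5


@dataclass(frozen=True)
class Place:
    """One place type for every device: which kind of memory, and which device."""

    alloc_type: AllocationType
    device: int = 0


def cpu_place() -> Place:
    return Place(AllocationType.CPU)


def gpu_place(device_id: int) -> Place:
    return Place(AllocationType.GPU, device_id)


def is_gpu_place(place: Place) -> bool:
    # A plain field comparison; no variant unpacking (the old BOOST_GET_CONST style) needed.
    return place.alloc_type is AllocationType.GPU
```

Because `Place` is a frozen dataclass, places compare and hash by value, so they can serve directly as dictionary keys, for example one allocator per place.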
  19. 04 Jan 2022 · 1 commit
  20. 30 Dec 2021 · 1 commit
    • Add cusparse and unittest (#38431) · 667dc9f0
      zhangkaihuo committed
      
      
          将cuSparse的handle与DeviceContext进行绑定,避免op中进行创建和销毁
          添加对cuSparse中dense和sparse转换的API进行封装
          添加对封装的API的单测
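The dense/sparse conversion that the wrappers expose can be stated precisely with a reference implementation. The sketch assumes the CSR layout (row offsets, column indices, values); the commit does not say which sparse formats the wrappers cover, and these pure-Python helpers are hypothetical, not cuSparse calls.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def dense_to_csr(dense: Matrix) -> Tuple[List[int], List[int], List[float]]:
    """Return (crows, cols, values); row i's entries live in [crows[i], crows[i + 1])."""
    crows = [0]
    cols: List[int] = []
    values: List[float] = []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0.0:
                cols.append(j)
                values.append(v)
        crows.append(len(values))  # running count of non-zeros closes this row
    return crows, cols, values


def csr_to_dense(crows: List[int], cols: List[int], values: List[float], n_cols: int) -> Matrix:
    """Inverse of dense_to_csr: scatter each row's non-zeros back into place."""
    dense: Matrix = []
    for i in range(len(crows) - 1):
        row = [0.0] * n_cols
        for k in range(crows[i], crows[i + 1]):
            row[cols[k]] = values[k]
        dense.append(row)
    return dense
```

A round trip through both helpers should reproduce the original matrix, which makes a convenient unit-test property for the real wrappers as well.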
  21. 23 Dec 2021 · 2 commits
  22. 20 Dec 2021 · 1 commit
  23. 09 Dec 2021 · 1 commit
  24. 03 Dec 2021 · 1 commit
  25. 27 Nov 2021 · 1 commit
    • [NPU] reorganization for device API abstraction (#37110) · 72241a6a
      Aganlengzi committed
      * [NPU] reorganization for device API abstraction
      
      * [NPU] delete old files
      
      * [NPU] fix npu_collective_helper
      
      * [NPU] fix collective_helper
      
      * [NPU] fix ut
      
      * [NPU] mod memory allocation and hccl_helper
      
      * [NPU] fix place_type
      
      * [NPU] split enforce.h
      
      * move acl* call into npu_info
      
      * merge conflict
      
      * fix merge
      
      * merge conflict
      
      * merge conflict
  26. 24 Nov 2021 · 1 commit
  27. 02 Nov 2021 · 1 commit
  28. 01 Nov 2021 · 1 commit
  29. 14 Oct 2021 · 1 commit
  30. 13 Oct 2021 · 1 commit
  31. 15 Sep 2021 · 1 commit
  32. 03 Aug 2021 · 1 commit
  33. 15 Jul 2021 · 1 commit
  34. 09 Jun 2021 · 1 commit
  35. 12 May 2021 · 1 commit
  36. 28 Apr 2021 · 1 commit
  37. 19 Apr 2021 · 1 commit
    • [NPU] cherry-pick gc/dataloader/save&load/optimization from ascendrc to develop (#32294) · cbe5c9f8
      Leo Chen committed
      * [NPU] support GarbageCollector for npu (#31874)
      
      * support GarbageCollector for npu
      
      * fix typo
      
      * fix gather_grad
      
      * disable NPUDefaultStreamGarbageCollector on NPU
      
      * [NPU] support npu for memcpy op (#31808)
      
      * support npu for memcpy op
      
      * add ut
      
      * fix ut
      
      * fix typo
      
      * [NPU] fix bug of using temp vector (#31963)
      
      * fix bug when beta1_pow on cpu (#31995)
      
      * [NPU] support npu profiler (#31684)
      
      * support npu profiler
      
      * add python api
      
      * fix bugs
      
      * add wrapper for incomplete type
      
      * update profile proto
      
      * record npu wait
      
      * add xpu placeholder
      
      * fix adam (#32016)
      
      * [NPU] enable async copy and add wait before sync operation (#31956)
      
      * enable async copy and add wait before sync operation
      
      * remove unnecessary wait
      
      * add FillNpuTensorWithConstant
      
      * refine
      
      * fix fill_constant
      
      * make TensorFromVector/TensorToVector sync
      
      * [NPU] Support dataloader on npu place. (#31867)
      
      * [NPU] Wait on NPUPlace (#32086)
      
      * [NPU] fix cast op (#32121)
      
      * fix npu kernel of cast op to handle casting to same dtype
      
      * add comments
      
      * [NPU] support cann 20.3 (#32044)
      
      * fix compile problem on cann 20.3
      
      * fix ut
      
      * fix test_mul
      
      * fix check_finite_and_scale
      
      * fix lookup_table_v2_grad
      
      * fix cmake
      
      * support print op
      
      * [NPU] Support npu save load (#31893)
      
      * support save load for NPU
      
      * add save load npu unittest
      
      * support np.array transform in NPU
      
      * fix errors
      
      * delete dygraph in unittest
      
      * add Wait
      
      * fix unittest
      
      * fix review comment
      
      * fix unittest problem
      
      * fix little problem
      
      * change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance (#32196)
      
      * change aclrtSynchronizeDevice to aclrtSynchronizeStream for better performance
      
      * refine code
      
      * fix NPUDeviceContext in all c++ unittest (#32198)
      
      * fix NPUDeviceContext in all c++ unittest
      
      * refine log
      Co-authored-by: pangyoki <pangyoki@126.com>
      
      * [NPU] Remove TensorFromVector and avoid sync copy in npu op kernel for better performance (#31994)
      
      * enable async copy and add wait before sync operation
      
      * remove unnecessary wait
      
      * add FillNpuTensorWithConstant
      
      * refine
      
      * fix fill_constant
      
      * change TensorFromVector to FillNpuTensorWithConstant
      
      * fix ignored api
      
      * delete extra unittest
      
      * fix little error
      
      * fix update_loss_scaling_op_npu and check_finite_and_unscale_op_npu
      
      * change TensorCopySync to TensorCopy
      
      * delete useless Wait and add StreamWait
      
      * fix npu_stream error
      
      * fix check_finite_and_unscale_op_npu TensorCopy
      
      * only save stream wait
      
      * fix NPUDeviceContext in all c++ unittest
      
      * delete wait
      Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
      
      * delete useless unittest file (#32206)
      
      * Fix op test (#32231)
      
      * fix conditional block (#32243)
      
      * fix adam bug again (#32246)
      
      * fix compile
      
      * fix ut
      
      * fix ut
      Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>
      Co-authored-by: pangyoki <pangyoki@126.com>
  38. 09 Apr 2021 · 1 commit
    • [NPU] cherry-pick basic NPU components/allocator/operator/executor supports from ascendrc (#32144) · ccf5709d
      Leo Chen committed
      * [feature] support npu allocator (#30840)
      
      [feature] support npu allocator
      
      * [feature] support npu operator (#30951)
      
      [feature] support npu operator
      
      * [feature] support npu allocator, part 2 (#30972)
      
      * support npu allocator
      
      * add npu device context
      
      * fix some compile problem
      
      * fix some compile problem
      
      * add npu info
      
      * compile ok
      
      * fix include dir
      
      * support naive_best_fit_allocator
      
      * run ut ok, but failed to exit
      
      * call aclrtResetDevice before exit
      
      * fix aclFinalize
      
      * add system allocator test
      
      * add selected_gpus in gtest
      
      * add tensor_test for npu
      
      * support npu op, initial commit
      
      * add npu stream
      
      * add elementwise_add_op
      
      * compile ok
      
      * fix typo
      
      * fix elementwise_add_op_npu_test
      
      * support op run
      
      * test can run but failed
      
      * change aclopExecuteV2 to aclopCompileAndExecute
      
      * support parsing ascend rank table file (#31000)
      
      support parsing ascend rank table file
      
      * Fix reshape on GE graph. (#31084)
      
      Fix reshape on GE graph
      
      * add npu kernel for elementwise_sub and elementwise_sub_grad (#30973)
      
      * add npu sub op
      
      * fix typo
      
      * rename test
      
      * fix bug
      
      * fix bug
      
      * add fp16 kernel
      
      * fix typo
      
      * support sub grad op
      
      * support elementwise_sub_grad op
      Co-authored-by: frankwhzhang <frankwhzhang@126.com>
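elementwise_sub_grad follows from Out = X - Y: dX = dOut and dY = -dOut, and when Y was broadcast across rows its gradient is summed over those rows. A pure-Python sketch under that assumption (Y either the same shape as X or a single row); the function name is hypothetical and this is not the NPU kernel.

```python
from typing import List, Tuple, Union

Matrix = List[List[float]]


def elementwise_sub_grad(
    d_out: Matrix, y_broadcast_over_rows: bool
) -> Tuple[Matrix, Union[Matrix, List[float]]]:
    # d(X - Y)/dX = 1, so dX is dOut itself (copied here).
    d_x = [list(row) for row in d_out]
    # d(X - Y)/dY = -1.
    neg = [[-g for g in row] for row in d_out]
    if y_broadcast_over_rows:
        # Y had shape [N] and was repeated across M rows: sum the M contributions.
        return d_x, [sum(col) for col in zip(*neg)]
    return d_x, neg
```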
      
      * Fix compilation problem (#31100)
      
      Fix compilation problem (#31100)
      
      * fix compile
      
      * fix code style
      
      * remove const_cast
      
      * support adding correct npu op in pybind.h (#31143)
      
      * support adding correct npu op in pybind.h
      
      * refine code
      
      * [NPU] Support executor with NPU (#31057)
      
      * [NPU] Support executor with NPU
      
      * Fix code according to reviews
      
      * Fix code
      
      * Add unittest for sub op npu
      
      * refactor npu device manager (#31154)
      
      refactor npu device manager (#31154)
      
      * fix selected npus
      
      * fix compile
      
      * fix reading flags from env
      
      * format
      Co-authored-by: xiayanming <41795079@qq.com>
      Co-authored-by: gongweibao <weibao.gong@gmail.com>
      Co-authored-by: frankwhzhang <frankwhzhang@126.com>
      Co-authored-by: liym27 <33742067+liym27@users.noreply.github.com>