1. 17 3月, 2022 1 次提交
    • H
      CopyFromCpu and CopyToCpu of Onnxruntime back-end optimize (#40561) · fcbb7440
      heliqi 提交于
      * add onnxruntime predictor
      
      * Add code comments
      
      * support link paddle2onnx onnxruntime
      
      * support onnxruntime with python
      
      * support onnxruntime with python
      
      * support onnxruntime with windows
      
      * paddle2onnx compile with windows
      
      * supoort windows compile
      
      * supoort windows compile with onnxruntime
      
      * supoort windows compile with paddle2onnx
      
      * supoort mac compile
      
      * compile with mac
      
      * compile with mac
      
      * add code comments
      
      * fix remind word
      
      * code optimization
      
      * add test case
      
      * add test case
      
      * add inference demo_ci test case
      
      * fix compile paddle2onnx with no python
      
      * add inference demo_ci test case
      
      * add inference demo_ci test case
      
      * add inference infer_ut test case
      
      * support c go api and test cases
      
      * add converage test case
      
      * add converage test case
      
      * add capi test case
      
      * add capi test case
      
      * fix onnxruntime copyfromcpu and copytocpu
      
      * fix goapi
      
      * modify code
      fcbb7440
  2. 14 3月, 2022 1 次提交
  3. 12 3月, 2022 1 次提交
  4. 10 3月, 2022 2 次提交
    • H
      Inference add ONNXRuntime back-end (#39988) · 431afc39
      heliqi 提交于
      * add onnxruntime predictor
      
      * Add code comments
      
      * support link paddle2onnx onnxruntime
      
      * support onnxruntime with python
      
      * support onnxruntime with python
      
      * support onnxruntime with windows
      
      * paddle2onnx compile with windows
      
      * supoort windows compile
      
      * supoort windows compile with onnxruntime
      
      * supoort windows compile with paddle2onnx
      
      * supoort mac compile
      
      * compile with mac
      
      * compile with mac
      
      * add code comments
      
      * fix remind word
      
      * code optimization
      
      * add test case
      
      * add test case
      
      * add inference demo_ci test case
      
      * fix compile paddle2onnx with no python
      
      * add inference demo_ci test case
      
      * add inference demo_ci test case
      
      * add inference infer_ut test case
      
      * support c go api and test cases
      
      * add converage test case
      
      * add converage test case
      
      * add capi test case
      
      * add capi test case
      431afc39
    • z8hanghuan's avatar
      add tril_triu for xpu, *test=kunlun (#40246) · 1128db30
      z8hanghuan 提交于
      * add tril_triu for xpu, *test=kunlun
      
      * add tril_triu for xpu, *test=kunlun
      
      * add tril_triu for xpu, *test=kunlun
      
      * add tril_triu for xpu, *test=kunlun
      
      * add tril_triu for xpu, *test=kunlun
      1128db30
  5. 08 3月, 2022 2 次提交
  6. 07 3月, 2022 2 次提交
    • b798fb07
    • M
      cuBlasLt Epilogue To Fuse Linear + ReLU|GeLU (#39437) · 2a3d9eca
      Ming-Xu Huang 提交于
      * Added cuBlasLtHandle_t to device context.
      
      * Added fused_gemm_epilogue op.
      
      1. Added fused_gemm_epilogue op to leverage cuBlastLt Epilogue.
      2. Support fusion Act(X*Y + bias), X'dims >=2 and Y'dims shoule be 2.
      2. Act currently only be supported ReLU. (Will add GeLU in the future).
      
      * Added UT to fused_gemm_epilogue op.
      
      * Added LinearAct Pattern
      
      1. Added LinearAct into graph_pattern_detector.* to define (2.)'s
      pattern.
      2. LinearAct is used to detect act(element_add(matmul_v2(x, w), bias)).
      3. act currently only support ReLU (Will support GeLU in the future).
      
      * Added FuseGemmEpiloguePass
      
      1, Added FuseGemmEpiloguePass to handle nn.Linear + Act{ReLU}
      fusion (GeLU will be supported in the future).
      2. Only support matmul_v2 from nn.Linear.
      
      * Added pybind to BuildStrageter.fuse_gemm_epilogue_.
      
      * Added UT for fuse_gemm_epilogue_pass.
      
      * GeLU support and EpilogueSingleton
      
      1. Added GeLU support to fused_gemm_epilogue op.
      2. Added EpilogueSingleton to cache auxiliary pointer.
      3. Added related UTs.
      
      * Rename cublaslt_epilogue_opto gemm_epilogue_op.*.
      
      * Added both train and infer pattern to LinearAct.
      
      1. Added support of fwd graph with grap_ops linking to LinearAct.
      2. Added related changes to fuse_gemm_epilogue_pass for above
      modification.
      
      * Changed CUDA requirement from 11.4 to 11.6 for fuse_gemm_epilogue_pass.
      
      * Added identity activation support to gemm_epilogue_op.
      
      * Added Linear Fusion (matmul_v2 + ele_add)
      
      1. Added matmul_v2 + ele_add pattern to LinearActPattern.
      2. Added matmul_v2 + ele_add support to fuse_gemm_epilogue_pass.
      
      * Rename gemm_epilogue_op.* to fused_gemm_epilogue_op.*
      
      * Add fused_gemm_epilogue_grad op.
      
      1. Added fused_gemm_epilogue_grad to support backward epilogue fusion.
      
      * Add UTs to fused_gemm_epilogue_grad_op.
      
      * Change attribute name in fused_gemm_epilogue_grad_op for clearing.
      
      * Allow DX and DBias be dispensable to fused_gemm_epilogue_grad op.
      
      * Added ElementwiseAdd+Matmul+Act graph pattern detection.
      
      * Fuse backward of Linear( Act(x))
      
      1. Added backward fusion pass to Linear( Act(x)).
      2. Added backward fusion pass to Linear(x).
      
      * Added UTs to backward fusion of Linear(Act(x)).
      
      * Complete document of arguments to fused_gemm_epilogue_op.
      
      * Made arguments of some functions pass by reference.
      
      * Modify code with review comments.
      
      1. Made arguments of some function pass by reference.
      2. Removed redundant code.
      3. Followed Google code style to change code.
      
      * Made 'const' code style be consistent
      
      * Fixed random seed of python UTs.
      
      * Set Compiling constrains to cuBlasLt
      
      1. Require CUDA 11.6+
      2. Remove fuse_gemm_epilogue related tests when CUDA < 11.6.
      
      * Code Reivew from Paddle
      
      1. Changed arguments name is_first_gemm to without_x_gradient for
      clearing.
      2. Applied PADDLE_THROW in fused_gemm_epilogue_op.
      
      * Remove EpilogueSingleton
      
      1. Applied ReserveSpace to replace Epilogue for passing auxiliary
      pointers between FWD and BWD.
      
      * Fix a logical error and enhance UTs.
      
      1. Added act op count checking in UTs.
      2. Fix issue to fuse backward or ReLU(Linear(X)).
      3. TODO: solve GELU fusion issues.
      
      * Fix Linear and GeLU fusion issues.
      
      1. Modified graph_detech_pattern to fit with both linear wiht gelu or
      relu.
      2. Modified data range in Uts to allow negative values.
      
      * Removed fused_gemm_epilogue_op.h.
      
      * Rename namespace pten to phi.
      
      * Rename name of arguments in fused_gemm_epilogue_op
      
      1. bias -> Bias.
      2. out -> Out.
      3. reserve_space -> ReserveSpace.
      
      * Change EpiloguePassActivationCache as local variable.
      
      1. Removed singleton in EpiloguePassActivationCache.
      2. Made EpiloguePassActivationCache as an argument to each pass
      functions.
      2a3d9eca
  7. 04 3月, 2022 1 次提交
  8. 02 3月, 2022 2 次提交
  9. 01 3月, 2022 2 次提交
  10. 28 2月, 2022 2 次提交
  11. 25 2月, 2022 1 次提交
    • C
      [Phi] Support cudnn kernel moving & move softmax kernels (#39547) · 8895379a
      Chen Weihang 提交于
      * support cudnn kernel moving
      
      * polish cmake rules
      
      * add unittest for coverage
      
      * remove orig kernel
      
      * remove softmax cudnn kernel
      
      * fix softmax test failed
      
      * fix npu func error
      
      * resolve conflict
      
      * rename gpu dnn kernels
      
      * fix name rule error
      
      * fix compile error
      
      * update fp16 namespace
      8895379a
  12. 24 2月, 2022 2 次提交
  13. 23 2月, 2022 1 次提交
  14. 22 2月, 2022 2 次提交
  15. 20 2月, 2022 1 次提交
  16. 19 2月, 2022 2 次提交
  17. 18 2月, 2022 2 次提交
  18. 17 2月, 2022 1 次提交
    • H
      add softplus op for kunlun2. test=kunlun (#39555) · 9f99b591
      houj04 提交于
      * add softplus op for kunlun2. test=kunlun
      
      * add softplus op for kunlun2. test=kunlun
      
      * fix code style. test=kunlun
      
      * fix code style. test=kunlun
      
      * add more test cases. test=kunlun
      9f99b591
  19. 15 2月, 2022 3 次提交
  20. 14 2月, 2022 1 次提交
  21. 11 2月, 2022 1 次提交
  22. 02 2月, 2022 1 次提交
  23. 30 1月, 2022 1 次提交
  24. 29 1月, 2022 2 次提交
    • L
      Add xpu2 compiler (#37254) · 92da5055
      Liu-xiandong 提交于
      * Add XPU compiler for paddle, test=develop
      
      * clean code
      
      * clean useless code
      
      * clean useless code
      
      * clean useless code
      
      * test
      
      * add include path
      
      * use clang compiler
      
      * xpu2.cmake
      
      * XPU2 compiler passed
      
      * update
      
      * update after pten
      
      * combination the WITH_XPU and WITH_XPU2
      
      * update the fuse operation in WITH_XPU and WITH_XPU2
      
      * update
      
      * update
      
      * update
      
      * fix the merge error
      
      * update
      
      * update the code
      
      * update the code
      
      * add run_kp_kernel flag
      
      * update
      
      * update
      
      * fix prepared type_ bug
      
      * clean and update the code
      
      * reset the kernel_primitives
      
      * update
      
      * clean the code
      
      * delete useless comment
      
      * fix the bug in WITH_XPU
      
      * update
      
      * update
      
      * modify the abi
      
      * delete some useless code
      
      * Parameter automation in xpu compilation
      
      * Parameter automation in xpu compilation
      
      * delete kps in cmake
      
      * delete useless comment
      
      * clean the code
      
      * clean the code
      92da5055
    • J
  25. 28 1月, 2022 1 次提交
  26. 27 1月, 2022 1 次提交
  27. 26 1月, 2022 1 次提交
    • L
      [pten] remove deprecated fluid op kernel for pten (#38842) · 3ab9aef1
      Leo Chen 提交于
      * update cmake file to remove fluid kernel
      
      * add pten declaration.h to where pybind.h used
      
      * fix sync_bn and tensorrt_engine
      
      * refine detection_library
      
      * fix interpreter_core
      
      * support eager legacy
      
      * fit eager legacy for pten
      
      * fall back to cpu if not found kernel
      
      * fix compile problem
      
      * fix compile problem
      
      * refine fallback logic
      
      * fit operator.run()
      
      * fix xpu compile
      
      * fit for new_exec
      
      * add REGISTER_OP_WITHOUT_GRADIENT
      
      * un-cache pt_kernel_context
      
      * fix compile
      
      * fix cudnn
      
      * fix compiling with on_infer
      
      * fix mkldnn
      
      * fix isfinite_v2
      
      * fix xpu problem
      
      * fix op_device
      
      * refine fallback for xpu
      
      * fix xpu compile
      
      * merge develop
      
      * refine code format
      
      * fix compile
      
      * fix compile
      
      * add data_transfer
      
      * fix PreparePtenData
      
      * fix cpu context
      
      * merge develop
      
      * fix compile
      
      * fix error device context
      
      * fix xpu
      
      * fix dev_ctx
      3ab9aef1