1. 03 Dec, 2021 1 commit
  2. 01 Dec, 2021 1 commit
  3. 29 Nov, 2021 2 commits
  4. 26 Nov, 2021 1 commit
  5. 25 Nov, 2021 2 commits
  6. 23 Nov, 2021 1 commit
  7. 19 Nov, 2021 2 commits
  8. 17 Nov, 2021 1 commit
    • P
      Upgrade oneDNN to v2.4.4 (#36226) · d08753df
      piotrekobiIntel committed
      * upgrade oneDNN to v2.4-rc
      
      * Removed failing test
      
      * Revert "Removed failing test"
      
      This reverts commit 60e70e717fac2c86b7beb24dfa1343a5804ea455.
      
      * Remove most tests for debugging purposes
      
      * Update hash to oneDNN 2.4
      
      * Revert test change
      
      * Update oneDNN to 2.4.2
      
      * Update oneDNN to 2.4.3
      
      * Change oneDNN version to 2.3 for Jenkins test
      
      * Revert "Change oneDNN version to 2.3 for Jenkins test"
      
      This reverts commit 0b176defc3b63f65dd0ba85873a018534f287000.
      
      * Update oneDNN to 2.4.4
      
      * Change version of oneDNN to 2.3 for new Jenkins test
      
      * Revert "Change version of oneDNN to 2.3 for new Jenkins test"
      
      This reverts commit e005a0f78f2b41cdcf4d7de3a21df7f910b78268.
      d08753df
  9. 15 Nov, 2021 1 commit
    • C
      [Pten] Refactor the implementation of custom operator (#37122) · 1e598f1a
      Chen Weihang committed
      * move extension into pten [no-verify]
      
      * append tensor methods by ext_tensor [no-verify]
      
      * append other tensor methods [no-verify]
      
      * ext related files tidy [no-verify]
      
      * include relation tidy [no-verify]
      
      * add pten tensor test [no-verify]
      
      * replace tensor in custom op & compile success
      
      * refine tensor constructor for unittest
      
      * custom relu jit run success
      
      * fix all custom op unittests
      
      * add inference cmake adapt [no-verify]
      
      * fix failed unittests
      
      * fix windows failed unittests
      
      * try to fix kunlun and inference failed
      
      * fix test_elementwise_api error
      
      * try to fix win compile failed
      
      * fix kunlun fp16 type error
      
      * remove useless handle error macro
      
      * add custom linear op test
      
      * fix compile failed & add win symbols
      
      * fix non pten kernel cast failed
      
      * add dll decl for api
      
      * polish several details
      
      * polish details by review comment
      
      * add dll_decl for register
      1e598f1a
  10. 11 Nov, 2021 1 commit
  11. 09 Nov, 2021 2 commits
  12. 06 Nov, 2021 1 commit
  13. 04 Nov, 2021 1 commit
  14. 02 Nov, 2021 1 commit
  15. 01 Nov, 2021 3 commits
    • C
      update cinn commit id tag to the newest one to fix some bugs (#36890) · fe81306c
      CtfGo committed
      update cinn commit id tag to the newest one to fix some bugs
      fe81306c
    • C
      Paddle Tensor Operation Library initial implementation (#34425) · b9fdd3bc
      Chen Weihang committed
      * initial tensor design & sign kernel demo
      
      * add move constructor for meta & add lodtensor
      
      * add dirs & sign xpu kernel
      
      * add mean cpu&cuda kernel impl
      
      * move sign & mean xpu & npu kernel
      
      * add selected_rows basic impl
      
      * refactor design, BaseTensor to DenseTensor, etc.
      
      * add scale mkldnn kernel
      
      * polish xpu & npu impl details
      
      * fix mkldnn reuse compile failed
      
      * change tensor operation lib name
      
      * rename util filename
      
      * add more comments
      
      * change TensorImplInterface to TensorInterface
      
      * add kernel key and factory
      
      * remove MKLDNNTensorMeta, add MKLDNNDenseTensor
      
      * change XXDeviceContext to XXContext
      
      * add base kernel registrar utils & test on sign
      
      * replace boost::any by paddle::any
      
      * fix several ci failed
      
      * fix npu compile error
      
      * add ordered map util
      
      * fix multiple ordered_map compile errors
      
      * move dev into include dir
      
      * support sign op in static op run
      
      * fix static op run error
      
      * fix new executor compile failed
      
      * add dygraph branch & remove sign_op.h
      
      * fix test_infer_no_need_buffer_slots
      
      * fix rocm compile link error
      
      * fix unitybuild error & clear glog
      
      * fix npu compile failed
      
      * skip quant trans test
      
      * fix part windows compile problem
      
      * fix xpu enforce error
      
      * fix inference test failed
      
      * remove ordered_map to solve quant failed
      
      * fix part of rocm compile failed
      
      * add more register kernels
      
      * revert scale kernel temporarily
      
      * fix code format error
      
      * add new kernel registrar macro
      
      * rename top to tcmpt
      
      * revert xpu, npu, mkldnn impl & remove op def
      
      * add kernel args parse functor to auto parse args
      
      * revert some change & add scale kernels
      
      * add op proto in dygraph kernelcontext building
      
      * polish kernel dispatch logic & naming rule
      
      * fix scale kernel match error
      
      * fix scale test failed
      
      * add mean API and unittest
      
      * test mean api success
      
      * add branch to solve compiled error
      
      * skip clang format error
      
      * add mean skip rule in op_library
      
      * add dot kernel, api and unittest (#6)
      
      * remove old kernel and add symbol link
      
      * fix dot compiled failed
      
      * add macro for module declare
      
      * fix npu and xpu compile error
      
      * revert sign, mean, scale, dot kernel removing
      
      * add comment for keeping old kernel impl
      
      * fix mutable_data error
      
      * fix bfloat16 conflict
      
      * fix inference undef error
      
      * adapt to msvc compile rules
      
      * polish comment for template inst
      
      * add cmake template instantiation for win
      
      * fix backend to place device id bug
      
      * fix ifdef error
      
      * Op2functor (#7)
      
      * add kernel args maker class
      
      * make args maker non-const
      
      * remove debug log
      
      * modify codes by review options
      
      * split constructPrKernelContext function
      
      * fix output name bug
      
      * fix test_mean_op test_sign_op failed
      
      * fill_any_like kernel refactor (#10)
      
      * fill_any_like kernel refactor
      
      * remove useless code of full_like c++ api
      
      * skip dtype for fill_any_like
      
      * add attrs for kernel key construct
      
      * add use_pt_kernel Flags to control whether to use pt kernel (#13)
      
      * add use_pt_kernel Flags to control whether to use pt kernel
      
      * change the default value to true for checking pt kernels
      
      * fix mutable_data cuda place error
      
      * move high level apis into hapi
      
      * remove selectedrows adapting temporarily
      
      * Support Scalar in Tensor Compute Library (#14)
      
      * fill_any_like kernel refactor
      
      * remove useless code of full_like c++ api
      
      * Support Scalar in Tensor Compute Library
      
      * add scalar in dygraph and static graph mode
      
      * keep the basic type for attr, instead of using scalar for all
      
      * merge the code
      
      * remove mkldnn tensor & polish details
      
      * use flat_hash_map and small_vector in kernel factory
      
      * Refactor flatten kernel (#12)
      
      * refactor flatten kernel
      
      * update infershape function
      
      * fix compile bugs
      
      * fix bugs when merge
      
      * fix compiler bugs
      
      * fix bugs when run test_flatten_api
      
      * fix bugs when run test
      
      * Revert "use flat_hash_map and small_vector in kernel factory"
      
      This reverts commit 23091495cfdd3df8cc1be592d30f09ea66a7c72b.
      
      * Move cpu, cuda and other device code into kernels (#15)
      
      * fill_any_like kernel refactor
      
      * remove useless code of full_like c++ api
      
      * Support Scalar in Tensor Compute Library
      
      * add scalar in dygraph and static graph mode
      
      * keep the basic type for attr, instead of using scalar for all
      
      * merge the code
      
      * start refactor matmul
      
      * move cpu, cuda and other device modules into kernels
      
      * merge code
      
      * polish code in operator.cc
      
      * Perfect unittests (#16)
      
      * perfect unittest
      
      * update license
      
      * replace with flat_hash_map, small_vector (#19)
      
      * fix small_vector build error on windows platform
      
      * replace with flat_hash_map, small_vector
      
      * remove todo
      
      * Perfect unittests (#20)
      
      * perfect unittest
      
      * update license
      
      * fix bug when run tcmpt_utils_test
      
      * refactor execution adapting impl
      
      * fix insert conflict
      
      * Fix CI bug of test_yolov3 (#21)
      
      * fill_any_like kernel refactor
      
      * remove useless code of full_like c++ api
      
      * Support Scalar in Tensor Compute Library
      
      * add scalar in dygraph and static graph mode
      
      * keep the basic type for attr, instead of using scalar for all
      
      * merge the code
      
      * start refactor matmul
      
      * move cpu, cuda and other device modules into kernels
      
      * merge code
      
      * polish code in operator.cc
      
      * Fix CI bug of test_yolov3
      
      * add the tensor base class, test=develop (#17)
      
      * update the tensor base class, test=develop
      
      * remove two funcs, test=develop
      
      * update the error msg, test=develop
      Co-authored-by: Chen Weihang <chenweihang@baidu.com>
      
      * [no-verify] commit backend and tensor signature changes
      
      * Rename tcmpt to pten (#23)
      
      * rename tcmpt to pten
      
      * update omitted files for rename to pten
      
      * update omitted file for rename to pten
      
      * remove k of all enum var
      
      * remove kernel_instantiate (#26)
      
      * remove symbols and spatial_tensor
      
      * change common to functions
      
      * readd share tensor impl methods
      
      * add a candidate dense tensor class, test=develop (#28)
      
      * change all Pt to Pten
      
      * resolve conflict with xiaowei
      
      * Op2functor opt1 (#27)
      
      * replace to small vector and change to const &
      
      * add std::move
      Co-authored-by: Chen Weihang <chenweihang@baidu.com>
      
      * polish kernel factory and kernel registry
      
      * fix operator test error msg mismatch
      
      * remove tensor signature and backend set member
      
      * move scalar and polish enforce
      
      * revert dtype layout change to fix error
      
      * fix enum operator override error
      
      * add several base unittests
      
      * add pten utils tests
      
      * polish some details
      
      * Dev/op2func refactor 3 (#30)
      
      * add a candidate dense tensor class, test=develop
      
      * remove TensorBase::backend(), test=develop
      
      * remove some ops, test=develop
      
      * cherry-pick the pr of tensor meta, test=develop
      
      * moves the dense tensor and some ops, test=develop
      
      * update the linalg operator, test=develop
      
      * update other operators, test=develop
      
      * fix errors, test=develop
      
      * fix bugs, test=develop
      
      * try to resolve the problem of windows ci, test=develop
      
      * updates codes, test=develop
      
      * fix the tensor_utils.cc, test=develop
      
      * modify the dense tensor, test=develop
      
      * fix the data type, test=develop
      Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
      
      * polish some details
      
      * polish kernel signature details
      
      * fix a bug about offsets of the tensor, test=develop (#31)
      Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
      
      * polish some details
      Co-authored-by: chentianyu03 <ctychentianyu@gmail.com>
      Co-authored-by: zyfncg <1370305206@qq.com>
      Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
      Co-authored-by: 石晓伟 <39303645+Shixiaowei02@users.noreply.github.com>
      b9fdd3bc
    • S
  16. 28 Oct, 2021 1 commit
    • Z
      Fix several bugs for enabling Paddle to train with CINN. (#36739) · c93331c5
      Zhen Wang committed
      * Update the content of `test_parallel_executor_run_cinn.py`.
      
      * Fix some bugs in the topological sort and `CreateNewSubGraph`.
      
      * Update the CINN commit id used by Paddle.
      
      * Update the unit test to `add+relu`.
      
      * Update according to reviewers' suggestion.
      c93331c5
  17. 27 Oct, 2021 1 commit
    • H
      add paddle.linalg.eigvalsh API (#35615) · 9f9ed3ae
      huangjun12 committed
      * add eigvalsh with is_test
      
      * add eigvalsh op
      
      * fix backward bug
      
      * forward and backward, float and complex, unittest
      
      * remove eigvalsh_helper.h
      
      * remove changes of cusolver.h
      
      * fix unittest
      
      * fix unittest bug
      
      * update code following eigh
      
      * fix test
      
      * update lapack
      
      * pull develop
      
      * update functor
      
      * fix unittest bug
      
      * fix details
      
      * add tensor_method_func
      
      * fix notes
      9f9ed3ae
  18. 25 Oct, 2021 2 commits
    • T
      add some ops to train ssd on kunlun (#36407) · 50778ad6
      TTerror committed
      * add some ops to train ssd on kunlun
      
      * add some ops to train ssd on kunlun
      
      * add some ops to train ssd on kunlun
      
      * update cast op unittest
      
      * update cast op unittest
      
      * update cast op unittest
      
      * update xpu cmake
      
      * update cast unittest
      50778ad6
    • Z
      add op: fused_feedforward(forward) (#35843) · b18cbfb2
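      zhangkaihuo committed
      This PR contains only the forward code of fused_feedforward.
      
      Related kernel implementations: fused_dropout_act_bias, fused_residual_dropout_bias, fused_layernorm_residual_dropout_bias
      
      fused_feedforward is a fused operator. It fuses and wraps the operators of the transformer model's feed-forward layer so that the front end exposes a single interface. Fusion reduces some memory accesses and kernel launch time, which improves performance.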
      b18cbfb2
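The effect of fusing a bias add, an activation, and a dropout step can be sketched in plain Python. This is only an illustration of the idea behind a kernel such as fused_dropout_act_bias, not the PR's CUDA code: the function names, the choice of ReLU as the activation, and the scaling of kept values by `1 / keep_prob` are all assumptions made for the example.

```python
def relu(v):
    # Assumed activation for this sketch; the real operator may use another one.
    return max(v, 0.0)


def unfused_bias_act_dropout(x, bias, keep_mask, keep_prob):
    """Three separate passes; each one materializes an intermediate list."""
    biased = [xi + bi for xi, bi in zip(x, bias)]
    activated = [relu(v) for v in biased]
    return [v * m / keep_prob for v, m in zip(activated, keep_mask)]


def fused_bias_act_dropout(x, bias, keep_mask, keep_prob):
    """One pass; intermediates live only in local variables."""
    out = []
    for xi, bi, m in zip(x, bias, keep_mask):
        v = relu(xi + bi)
        out.append(v * m / keep_prob)
    return out


if __name__ == "__main__":
    x = [1.0, -2.0, 3.0]
    bias = [0.5, 0.5, 0.5]
    keep_mask = [1, 1, 0]  # hypothetical, pre-sampled dropout mask
    print(unfused_bias_act_dropout(x, bias, keep_mask, 0.5))
    print(fused_bias_act_dropout(x, bias, keep_mask, 0.5))
```

Both functions perform the same arithmetic per element, so they return the same values; the fused form simply avoids writing and re-reading the two intermediate buffers, which is the memory-access saving the commit message refers to.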
  19. 24 Oct, 2021 1 commit
  20. 23 Oct, 2021 1 commit
    • H
      New Paddle-CINN Compile PR (#36584) · ab732884
      Huihuang Zheng committed
      This PR added some changes to match the CINN change for compilation. It also tried to fix JiangCheng's Problem in PR: https://github.com/PaddlePaddle/Paddle/pull/36100
      
      These changes include:
      1. Set `CINN_GIT_TAG` to a newer tag
      2. CINN is now built with just `make cinnapi -j`
      3. We have to add `-DPY_VERSION=${PY_VERSION} -DWITH_TESTING=ON` to CINN cmake args
      4. For CINN's third party dependencies, we could just include headers without target_link_libraries
      5. Moved `cinn.cmake` from `paddle/cmake` to `paddle/cmake/external` to match the old style. The external folder contains `lite`, which is at the same level as `cinn`
      6. CINN added `-DNAMESPACE=cinn_gflags` in `gflags.cmake` so that CINN and Paddle use different gflags namespaces. This solved the redefinition problem.
      7. Change namespace of `::google::` in gflags to `::GFLAGS_NAMESPACE`
      ab732884
  21. 22 Oct, 2021 1 commit
    • L
      Fused attention op forward (#35905) · d4906214
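      Li Min committed
      Function: the goal of this PR is to improve the computational performance of the attention module.
      To reduce the framework's op-scheduling overhead, this PR implements the attention module by hand in C++ and exposes it as one large attention op.
      To reduce memory-access overhead, this PR applies two optimizations:
      (1) when computing q, k, and v, the input X is shared, so the gemm, transpose, and bias add at that point are reduced from three calls to one;
      (2) kernel fusion is used to pass data between different CUDA kernels through registers.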
      d4906214
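Optimization (1) above relies on the fact that three projections of the same input can be computed as one matrix multiply against column-concatenated weights. The following pure-Python sketch checks that equivalence on small matrices. It is not the PR's implementation: the helper names are hypothetical, the transpose step is omitted, and the weight layout (one `d`-column block each for q, k, and v) is an assumption.

```python
def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of rows)."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def add_bias(m, bias):
    """Add a bias vector to every row of a matrix."""
    return [[v + bias[j] for j, v in enumerate(row)] for row in m]


def qkv_separate(x, wq, wk, wv, bq, bk, bv):
    """Three gemm + bias-add calls, one per projection."""
    return (add_bias(matmul(x, wq), bq),
            add_bias(matmul(x, wk), bk),
            add_bias(matmul(x, wv), bv))


def qkv_packed(x, wq, wk, wv, bq, bk, bv):
    """One gemm + bias add on concatenated weights, then a split."""
    w = [rq + rk + rv for rq, rk, rv in zip(wq, wk, wv)]  # concat columns
    b = bq + bk + bv
    out = add_bias(matmul(x, w), b)
    d = len(bq)  # assumes q, k, v share the same width
    q = [row[:d] for row in out]
    k = [row[d:2 * d] for row in out]
    v = [row[2 * d:] for row in out]
    return q, k, v


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    wq, wk, wv = [[1, 0], [0, 1]], [[2, 1], [1, 2]], [[0, 1], [1, 0]]
    bq, bk, bv = [1, 1], [0, 0], [2, 3]
    print(qkv_separate(x, wq, wk, wv, bq, bk, bv) ==
          qkv_packed(x, wq, wk, wv, bq, bk, bv))
```

The packed version reads `x` once instead of three times and issues a single multiply, which is where the reduced call count and memory traffic come from.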
  22. 20 Oct, 2021 3 commits
    • S
      Add FasterTokenizer Operator (#34491) · 3f2d6a3f
      Steffy-zxf committed
      Add Tokenizer-related functionality for Transformer models so that training and prediction use the same tokenization process.
      
      * support the text string as an input Tensor
      * support the "VOCAB" unordered_map<wstring, int> as an input Tensor to lookup tokens
      * Tokenizer used for BERT. This tokenizer applies an end-to-end, text string to wordpiece tokenization.
      * It first applies basic tokenization, followed by wordpiece tokenization.
      3f2d6a3f
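The two-stage process described above (basic tokenization, then wordpiece tokenization) can be sketched in Python as follows. This is a simplified illustration rather than the operator's C++ code: the function names are hypothetical, the basic step only lowercases and splits on whitespace and punctuation, and the `##` continuation prefix and `[UNK]` token follow the common BERT convention, which is assumed here.

```python
def basic_tokenize(text):
    """Lowercase, split on whitespace, and split off non-alphanumeric chars."""
    tokens = []
    for word in text.lower().split():
        current = ""
        for ch in word:
            if ch.isalnum():
                current += ch
            else:
                if current:
                    tokens.append(current)
                    current = ""
                tokens.append(ch)
        if current:
            tokens.append(current)
    return tokens


def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


def tokenize(text, vocab):
    """End-to-end: text string to a list of vocabulary ids."""
    ids = []
    for word in basic_tokenize(text):
        for piece in wordpiece_tokenize(word, vocab):
            ids.append(vocab[piece])
    return ids


if __name__ == "__main__":
    vocab = {"[UNK]": 0, "un": 1, "##aff": 2, "##able": 3, "hello": 4, "!": 5}
    print(tokenize("Hello unaffable!", vocab))
```

The vocabulary passed in must contain the unknown token; the dictionary plays the role of the "VOCAB" map mentioned in the commit message.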
    • W
      adapt to cann5.0.3_alpha3. (#36106) · 873ee4e3
      wuhuachaocoding committed
      873ee4e3
    • H
      Add CINN Compile Option (#36292) · 6524fa8d
      Huihuang Zheng committed
      Add CINN compile option in CMake.
      
      You can now use CINN in Paddle by passing `-DWITH_CINN=ON` to `cmake`.
      
      To test it, run `make cinn_lib_test -j` and then `ctest -R cinn_lib_test`.
      
      Note:
      1. You should set
      ```
      export runtime_include_dir=${CINN_SOURCE_DIR}/cinn/runtime/cuda 
      ```
      When running the test, set `${CINN_SOURCE_DIR}` to your CINN directory.
      
      2. CINN is still under development, so you may have to change `CINN_GIT_TAG` to the git commit you need.
      6524fa8d
  23. 19 Oct, 2021 2 commits
    • Q
      [NPU] update inference cmake, test=develop (#36505) · 49d7bd38
      Qi Li committed
      * [NPU] update inference cmake, test=develop
      
      * address review comments, test=develop
      
      * fix compile error when WITH_ASCEND_CXX11 ON, test=develop
      49d7bd38
    • Y
      [paddle.linalg.qr] Add the Qr Operator (#35742) · 34d785c2
      Yulong Ao committed
      * Add QR decomposition op
      
      * Change codes to adapt to new svd_helper
      
      * Update linalg.py
      
      Restore the deleted comma
      
      * Restore the deleted line
      
      * Update linalg.py
      
      * Update linalg.py
      
      * Improve the qr code by reviews
      
      * Update QR based on CI results
      
      * Update qr doc, test=document_fix
      
      * Change unsafe and ill-formed codes
      34d785c2
  24. 14 Oct, 2021 2 commits
  25. 11 Oct, 2021 1 commit
  26. 28 Sep, 2021 2 commits
  27. 27 Sep, 2021 1 commit
  28. 24 Sep, 2021 1 commit