1. 06 Mar, 2023 1 commit
    • oneDNN kernels code cleanup (#50743) · e2054925
      Sławomir Siwek committed
      * matmul refactored
      
      * fc
      
      * SetOutMemDescWithLogicalLayoutFusesSupport
      
      * matmul_v2
      
      * alpha support
      
      * group repetitive funcs
      
      * matmul utils
      
      * execute matmul methods
      
      * restore registered kernel names
      
      * split header and impl files
      
      * remove double negatives
      
      * increase coverage
      
      * add onednn tests to ctest
      
      * remove fusion logic from base matmuls
  2. 03 Mar, 2023 5 commits
  3. 02 Mar, 2023 9 commits
  4. 01 Mar, 2023 12 commits
    • Integration flash attention (#49869) · 61611786
      Chitsing KUI committed
      * flash attn
      
      * seed
      
      * almost
      
      * softmax
      
      * fix workspace
      
      * add unit test; linux only
      
      * fix setup
      
      * fix datatype include
      
      * fix setup typo
      
      * fix def scope
      
      * new error api
      
      * use paddle fork
      
      * fix attr bug; complete ut
      
      * update flash hash
      
      * fix rng reset
      
      * fix offset
      
      * fix comments
    • [Tensor Operants & Prim-Relevant] Tensor supports logical operants (#50983) · 1794927b
      HongyuJia committed
      * Add comments for #50886
      
      * [Tensor Operants & Prim-Relevant] Tensor supports logical operants
      
      * add prim dynamic unit test
      
      * add prim static unit test
    • add topk prim backward (#50679) · 296b3ff0
      zqw_1997 committed
      * tmp gather vjp
      
      * support gather
      
      * remove useless code
      
      * fix compiling error
      
      * fix ut
      
      * add eager test
      
      * add eager test
      
      * add seed
      
      * small change
      
      * fix cpu error
      
      * fix transpose op compat
      
      * remove tensor index case
      
      * fix prim_cinn
      
      * small commit
      
      * add cumsum prim backward
      
      * small commit
      
      * skip axis=None test case
      
      * fix op generate error
      
      * fix static test error
      
      * remove unused code
      
      * fix static test error
      
      * small commit
      
      * skip cpu float16 test case
      
      * skip eager cpu cumsum float16 test case
      
      * add eager and static UT
      
      * fix ut
      
      * add composite backward rule
      
      * fix error
      
      * fix type error and format error
      
      * add try cpu+float16 test
      
      * fix test bugs
      
      * remove test for cpu+float16 and make y[0] be the grad arg
      
      * add cinn test
      
      * fix UT
      
      * fix the wrong dim of v in test cases
      
      * change y[0] to y[1] for grad in UT
      
      * reshape flatten out
      
      * Disable cinn single test
      
      * use scatter_nd_add
      
      * modify the reshape part of topk_grad
      
      * delete useless build file
      
      * to make the syntax right
      
      * modify bug
      
      * try use of put_along_axis
      
      * remove cinn test
      
      * reformat todo
      
      * add silu composite rule
      
      * fix code style.
      
      * add cinn test
      
      * fix composite grad maker code gen
      
      * add prim in cumsum op test
      
      * remove old test
      
      * fix typo
      
      * pass the static test
      
      * fix typo
      
      * modify optest and delete old test files
      
      * remove normal test_top_k_op test
      
      * fix typo
      
      * pass axis=None test case
      
      * buffer comment
      
      * for debug
      
      * add silu fp16 unit test.
      
      * add static guard
      
      * remove forward prim test
      
      * remove same name axis
      
      * modify the test_top_v2_op.py to pass all local tests
      
      * delete the useless testcase
      
      * fix mistake
      
      * add more testcases to test dtype16 and dtype32
      
      ---------
      Co-authored-by: JiabinYang <360788950@qq.com>
      Co-authored-by: GGBond8488 <857631483@qq.com>
      Co-authored-by: zxcd <228587199@qq.com>
      Co-authored-by: Charles-hit <wanghao107@baidu.com>
    • [Zero-Dim] Add Expand/Expand_as/Top_k for XPU to support Zero Dim Input. (#50947) · 226b4a95
      yunyaoXYY committed
      * Add unit test from shilong
      
      * Add kernel code from shilong
      
      * fix codestyle
      
      * add broadcast_shape test
      
      * fix unit test
      
      * fix unit tests
      
      * fix unit test
      
      * add 0D grad support
      
      * add 0D grad support
      
      * add 0D grad support
      
      * fix 0D tensor
      
      * fix 0D
      
      * fix xpu 0D
      
      * fix expand kernel
      
      * fix xpu expand
      
      * Fix 0D kernel
      
      * fix 0D
      
      * fix 0D
      
      * fix 0D
      
      * fix 0D
      
      * fix XPU top_k
      
      * cancel the modify of xpu
      
      * add XPU 0D tensor
      
      * fix 0D
    • fix the backward bug of cumsum (#50997) · 934934d8
      wawltor committed
    • M
    • fix zero bug of case18: paddle.logsumexp (#51034) · 2f900965
      chenxiao120660 committed
      * fix bug of logsumexp
      
      * fix bug for logsumexp
      
      * fix bug for logsumexp
    • add op map (#51026) · 83f61bd5
      cyber-pioneer committed
    • Add multiprecision for rms op (#50132) · 48060b2e
      niuliling123 committed
    • [XPU] Add kernels for VITDET (#50992) · 798b527c
      duanyanhui committed
      * add support of int64 add for xpu
      
      * add transpose support for int64
      
      * add randperm kernel
      
      * fix randperm
      
      * add distribute_fpn_proposal kernel
      
      * fix comment
      
      * add reduce_sum_int32
    • fix custom plugin include headers error (#51013) · a548e70c
      engineer1109 committed
    • fix gcc12 error (#51037) · ed511175
      risemeup1 committed
  5. 28 Feb, 2023 9 commits
  6. 27 Feb, 2023 4 commits
    • [XPU] add fp16 support for shape and lookup_table_v2 op. (#50773) · d2a0577a
      houj04 committed
      * [XPU] add fp16 support for shape op.
      
      * [XPU] add fp16 support for lookup_table_v2 op.
      
      * update approval list: add qingshu's id.
    • 【Hackathon No.68】Remove utils in phi (#50833) · 6c181d1d
      张春乔 committed
      * remove utils
      
      * remove utils
      
      * remove utils
      
      * remove utils
      
      * Update get_data_from_tensor.h
      
      * Update rnn_functor.h
      
      * Update rnn_grad_kernel.cu.cc
      
      * Update rnn_kernel.cu.cc
      
      * Update rnn_kernel.cc
      
      * Update rnn_grad_kernel.cu.cc
      
      * Update rnn_functor.h
      
      * Update rnn_kernel.cu.cc
      
      * Update rnn_kernel.cc
      
      * remove utils
      
      * Update rnn_functor.h
      
      * remove utils
      
      * remove utils
      
      * remove utils
      
      * remove utils
      
      * remove utils
      
      * Update rnn_functor.h
      
      * Update unsqueeze_op.h
      
      * Update utils.h
      
      * roll back
      
      * Update tensor_utils.h
      
      * Update tensor_utils.h
      
      * Update tensor_utils.h
      
      * Update tensor_utils.h
      
      * Update tensor_utils.h
      
      * use TensorToVector
      
      * use TensorToVector
      
      * use TensorToVector
      
      * use TensorToVector
      
      * use TensorToVector
      
      * Update rnn_kernel.cc
      
      * Update rnn_grad_kernel.cc
      
      * Update rnn_functor.h
      
      * Update rnn_grad_kernel.cu.cc
      
      * Update rnn_kernel.cu.cc
      
      * Update rnn_functor.h
      
      * Update rnn_grad_kernel.cu.cc
      
      * Update rnn_kernel.cu.cc
      
      * Update rnn_functor.h
      
      * Update rnn_grad_kernel.cu.cc
      
      * Update rnn_kernel.cu.cc
      
      * add TensorToVector
      
      * roll back
      
      * Update tensor_utils.h
      
      * Update rnn_functor.h
      
      * Update rnn_grad_kernel.cu.cc
      
      * Update tensor_utils.h
      
      * Update rnn_kernel.cu.cc
      
      * Update rnn_grad_kernel.cc
      
      * Update rnn_kernel.cc
      
      * Update rnn_grad_kernel.cu.cc
      
      * Update rnn_kernel.cu.cc
      
      * Update rnn_grad_kernel.cc
      
      * Update rnn_kernel.cc
      
      * TensorCopySync to phi::Copy
      
      * fix codestyle
      
      * rnn_kernel.cc: add ;
      
      * replace all GetDataFromTensor with phi::GetVectorFromTensor
      
      * delete include of util.h
    • [Tensor Operants & Prim] Tensor pow API uses elementwise_pow (#50886) · 8a097399
      HongyuJia committed
      * [Tensor Operants & Prim] Tensor pow API uses elementwise_pow
      
      * unittest change to fill_constant+elementwise_pow
    • Reduce redundant cpu computation in slice compute (#50348) · 8aec0580
      Bo Zhang committed
      * conflict
      
      * add UpdateSliceAttrs