1. 27 Mar, 2023 7 commits
  2. 25 Mar, 2023 1 commit
  3. 24 Mar, 2023 7 commits
    • add phi operator allreduce/reduce (#51857) · 47f87ad3
      TaoTao Li committed
      * add all_reduce, reduce kernel and api
      
      * fix all_reduce reduce ut
      
      fix reduce op maker conflict
      
      fix merge conflicts
      
      * fix conflicts, rename ReduceOp->ReduceBaseOp in reduce_ops
      
      rename allreduce op, to remove
      
      * fix code format
      
      fix comments
      
      * modify test_collective_reduce_api ut timeout
      
      * fix PR-CI-Build
      
      fix comments: format phi operator
    • [PHI Decoupling]Remove memory header (Part3) (#51288) · 3d78e759
      YuanRisheng committed
      * decouple memory copy
      
      * fix ci bugs
      
      * fix ci compile bugs
      
      * fix rocm compile
      
      * fix ci bugs
      
      * decouple memory
      
      * deal with conflict
      
      * fix xpu compile bugs
      
      * fix xpu bugs
      
      * deal with xpu bugs
      
      * fix cmake bugs
      
      * fix windows bugs
      
      * fix ci bugs
      
      * fix ci bugs
      
      * delete redundant code
      
      * add code for pybind
      
      * fix py3 bugs
      
      * fix ci bugs
    • [PHI]fix momentum dtype infer (#51353) · 648ec795
      PuQing committed
      * fix momentum dtype infer
      
      * fix momentum datatype
      
      * fix on cpu
      
      * add momentum
    • [PaddlePaddle Hackathon 4 No.40] Optimize the GPU computation performance of the kthvalue op for Paddle (#51835) · e18f5339
      thunder95 committed
      * untracked files
      
      * kthvalue perf
      
      * remove unused files
      
      * fix isnan
      
      * fix isnan2
      
      * fix bug
      
      * try to fix rocm error
    • Memory Efficient Attention (#51867) · e5ad3859
      ZhangDY-6483 committed
      * first version, notest
      
      * return final rst, notest
      
      * use infinity() instead of max
      
      * ut structure
      
      * start up of ut
      
      * generate lse
      
      * update
      
      * add depense
      
      * reconstruct cmake
      
      * move file
      
      * add memory efficient attention and fix blasimpl
      
      * update
      
      * update cmake
      
      * add namespace
      
      * update cmake
      
      * use .cu
      
      * update for pad3d
      
      * bug fix
      
      * bug fix
      
      * update
      
      * bug fix
      
      * update enforce
      
      * add test case
      
      * merge the lse pad
      
      * fix kernel_fn of backward
      
      * fix PADDLE_ENFORCE_EQ and phi_api
      
      * fix PADDLE_ENFORCE
      
      * fix PADDLE_ENFORCE
      
      * rerun coverage
      
      * fix memory efficient attention test
      
      * rerun ci
      
      * add cuda version condition
      
      * add cuda version condition
      
      * delete WIP test
      
      * replace PADDLE_ENFORCE
      
      * edit the namespace of datatype in multiple.cc
      
      * rerun
      
      * rerun
      
      ---------
      Co-authored-by: Nliuyuang <liuyuang@baidu.com>
    • Fix roll kernel gpu bug. (#52012) · b6d0dac9
      Yuang Liu committed
  4. 23 Mar, 2023 18 commits
  5. 22 Mar, 2023 7 commits
    • Support optimizers operator to be generated (#51767) · 0b008e0c
      HappyHeavyRain committed
      * test_get_kernel
      
      * add invoke signature
      
      * change reduce_max
      
      * change frobenius_norm
      
      * reset reduce_max according to composite and change reduce_all
      
      * fix the bug when Scalar(*)
      
      * fix 'scalar when support_tensor'
      
      * change code according to review
      
      * change 'keep_signature' to 'manual_signature' and add some error info
      
      * support optimizers autogen
      
      * change sgd yaml
      
      * change generate signature
      
      * fix test/cpp/new_executor/CM
      
      * reset signature generated function
      
      * change signature function
      
      * change signature function
    • [Zero-Dim] Support 0-D tensor for some oneDNN unary kernels (#51687) · 2a3d75bc
      YangQun committed
      * support 0-d tensor for element wise unary ops
      
      * fix python code style check
      
      * fix approval check
      
      * support 0-d tensor for onednn softmax and logsoftmax kernels
      
      * fix comments
      
      * fix some unittests
    • add fused dropout add (#51752) · 6ba0507d
      ShenLiang committed
    • [XPU] fix distribute_fpn_proposals (#51873) · a10718e8
      duanyanhui committed
      * fix distribute_fpn_proposals
      
      * fix bug
    • Extract fused_transpose op dedicated for oneDNN fuse passes (#50021) · 02296977
      Sławomir Siwek committed
      * extract common methods to reuse
      
      * add header for transpose ops
      
      * fused_transpose
      
      * Split big function
      
      * transpose2 tests
      
      * fused_transpose
      
      * Apply extra attributes
      
      * add pbtxt file
      
      * update pbtxt
      
      * Merge develop
      
      * add more strict op compats
      
      * code style
      
      * remove mkldnn_data_type
      
      * unify SetOutMemDescWithReshape2FuseSupport
      
      * adjust quantize-dequantize for transpose
      
      * remove appendact
      
      * transpose2 quantization
      
      * fix int8 tests
      
      * adjust transpose_op to current develop
      
      * delete fusion code from transpose_kernel
      
      * add fused transpose to NHWC unittest
      
      * change order
    • [PHI] Add multiclass_nms3 output defs (#51355) · 06cb6553
      PuQing committed
      * add nms3 register output defs
      
      * remove nms from set
      
      * remove nms from set
    • [AMP OP&Test] unit test for test_logit_op (#51051) · 289677e2
      Bo Zhang committed
      * test_logit_op
      
      * add cudaKernel to replace eigen impl
      
      * bf16 unit test CI