1. 27 Mar, 2023 4 commits
    • fix_gcc12_error (#52083) · f7267412
      risemeup1 committed
      * fix_gcc12_error
      
      * fix gcc12 error
      
      * fix gcc12 error
    • Fused elementwise_(mul/div) (#50428) · 968f7f24
      Sławomir Siwek committed
      * extract Op and OPMaker to .h
      
      * extend pattern for fused_op
      
      * set "with_residual" default to false
      
      * adjust fuse passes
      
      * remove fc+eltwise flag
      
      * fused_output_scale
      
      * activation attrs
      
      * remove extra attrs
      
      * fix int8/bf16 unit tests
      
      * simplify RecomputeOutputDims
      
      * remove unused method
      
      * Add description for attributes
      
      * add extra check
      
      * adjust op compats
      
      * update quantize test
      
      * fix protobuf parsing error
      
      * fix int8 performance
      
      * fused elementwises
      
      * merge develop
      
      * remove activation
      
      * restore activation for existing add/sub ops
    • 14abafa1
    • Fix memory efficient attention bug (#52117) · 019e1cf5
      sneaxiy committed
      * fix mea compile error
      
      * support 2-D bias
      
      * add inline to avoid compile error
      
      * polish codes
  2. 25 Mar, 2023 1 commit
  3. 24 Mar, 2023 7 commits
    • add phi operator allreduce/reduce (#51857) · 47f87ad3
      TaoTao Li committed
      * add all_reduce, reduce kernel and api
      
      * fix all_reduce reduce ut
      
      fix reduce op maker conflict
      
      fix merge conflicts
      
      * fix conflicts, rename ReduceOp->ReduceBaseOp in reduce_ops
      
      rename allreduce op, to remove
      
      * fix code format
      
      fix comments
      
      * modify test_collective_reduce_api ut timeout
      
      * fix PR-CI-Build
      
      fix comments: format phi operator
    • [PHI Decoupling]Remove memory header (Part3) (#51288) · 3d78e759
      YuanRisheng committed
      * decouple memory copy
      
      * fix ci bugs
      
      * fix ci compile bugs
      
      * fix rocm compile
      
      * fix ci bugs
      
      * decouple memory
      
      * deal with conflict
      
      * fix xpu compile bugs
      
      * fix xpu bugs
      
      * deal with xpu bugs
      
      * fix cmake bugs
      
      * fix windows bugs
      
      * fix ci bugs
      
      * fix ci bugs
      
      * delete redundance code
      
      * add code for pybind
      
      * fix py3 bugs
      
      * fix ci bugs
    • [PHI]fix momentum dtype infer (#51353) · 648ec795
      PuQing committed
      * fix momentum dtype infer
      
      * fix momentum datatype
      
      * fix on cpu
      
      * add momentum
    • [PaddlePaddle Hackathon 4 No.40] Optimize the GPU performance of the kthvalue op for Paddle (#51835) · e18f5339
      thunder95 committed
      * untracked files
      
      * kthvalue perf
      
      * remove unused files
      
      * fix isnan
      
      * fix isnan2
      
      * fix bug
      
      * try to fix rocm error
    • Memory Efficient Attention (#51867) · e5ad3859
      ZhangDY-6483 committed
      * first version, notest
      
      * return final rst, notest
      
      * use infinity() instead of max
      
      * ut structure
      
      * start up of ut
      
      * generate lse
      
      * update
      
      * add depense
      
      * reconstruct cmake
      
      * move file
      
      * add memory efficient attention and fix blasimpl
      
      * update
      
      * update cmake
      
      * add namespace
      
      * update cmake
      
      * use .cu
      
      * update for pad3d
      
      * bug fix
      
      * bug fix
      
      * update
      
      * bug fix
      
      * update enforce
      
      * add test case
      
      * merge the lse pad
      
      * fix kernel_fn of backward
      
      * fix PADDLE_ENFORCE_EQ and phi_api
      
      * fix PADDLE_ENFORCE
      
      * fix PADDLE_ENFORCE
      
      * rerun coverage
      
      * fix memory efficient attention test
      
      * rerun ci
      
      * add cuda version condition
      
      * add cuda version condition
      
      * delete WIP test
      
      * replace PADDLE_ENFORCE
      
      * edit the namespace of datatype in multiple.cc
      
      * rerun
      
      * rerun
      
      ---------
      Co-authored-by: liuyuang <liuyuang@baidu.com>
    • Fix roll kernel gpu bug. (#52012) · b6d0dac9
      Yuang Liu committed
  4. 23 Mar, 2023 9 commits
  5. 22 Mar, 2023 12 commits
  6. 21 Mar, 2023 7 commits