1. 19 Nov, 2021 4 commits
    • W
      Add fuse_resnet_unit pass (#36818) · 3cd3bf29
      wuhuanzhou committed
      * GeneratePass support attr condition and mapping, test=develop
      
      * fix coverage, test=develop
      
      * Add fuse_resnet_unit pass, test=develop
      
      * fix CI errors, test=develop
      
      * fix CI errors, test=develop
      
      * fix unittest error when compiling without CUDA, test=develop
      
      * fix static ci error, test=develop
      
      * limit kernel size must equal 1, test=develop
      3cd3bf29
    • S
      Add paddle.incubate.graph_send_recv API (#37205) · 39012536
      Siming Dai committed
      * add cpu version, using set: sum, min, max
      
      * add cpu version: mean
      
      * improve cpu code and fix dynamic memory allocation problem
      
      * fix arg error, add index judge, delete fp16
      
      * fix bug in CudaAtomicMax and CudaAtomicMin
      
      * add CUDA version
      
      * fix grad_op bug for index
      
      * add op test, add correct cpu grad op
      
      * Add correct CUDA Mean grad
      
      * [Add] Successful MEAN and SUM
      
      * [Add] Successful MIN and MAX in CPU
      
      * [Add] Successful MIN and MAX in CUDA
      
      * fix windows dtype ci
      
      * fix ROCM ci by adding HIP flag
      
      * rename fused_gather_scatter to send_recv
      
      * unify name as send and recv
      
      * change zero index return time
      
      * add send_recv incubate api
      
      * fix index data type, add unittest case for API
      
      * delete redundant input tensor
      
      * fix en example and docs, add default value in pool_type
      
      * add shape judge and max grid judge
      
      * fix comment
      
      * fix index type bug
      
      * add const &
      
      * fix en docs
      
      * delete numpy in examples
      
      * add unittest for int input
      
      * fix send_recv comment
      
      * change send_recv to graph_send_recv
      39012536
    • Y
      [fleet_executor] Parse pipeline config (#37319) · ca088f92
      Yuang Liu committed
      ca088f92
    • 0
      [Dy2stat]Support `for i in [1,2,3]` statements in dy2stat (#37259) · d772a9aa
      0x45f committed
      * support `for i in [1,2,3]` statements in dy2stat
      
      * add test case
      
      * fix ci
      
      * remove wrong code
      d772a9aa
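As a rough illustration of what the `graph_send_recv` commit above (#37205) describes, the sketch below gathers rows of `x` by a source index and reduces them into rows chosen by a destination index, using the four pooling modes named in the commit log (sum, mean, max, min). It is a plain-Python reference sketch, not Paddle's CPU or CUDA kernel: the function name `graph_send_recv_ref` is hypothetical, and filling rows that receive no message with zeros is an assumption.

```python
def graph_send_recv_ref(x, src_index, dst_index, pool_type="sum"):
    """Reference sketch: gather x[src], then reduce into rows given by dst.

    x is a list of equal-length rows; the result has the same number of rows.
    """
    if len(src_index) != len(dst_index):
        raise ValueError("src_index and dst_index must have the same length")
    if pool_type not in ("sum", "mean", "max", "min"):
        raise ValueError("unsupported pool_type: %r" % (pool_type,))

    width = len(x[0]) if x else 0
    # Collect the messages (source rows) that arrive at each destination row.
    inbox = [[] for _ in x]
    for s, d in zip(src_index, dst_index):
        inbox[d].append(x[s])

    out = []
    for messages in inbox:
        if not messages:
            # Assumption: rows that receive nothing are filled with zeros.
            out.append([0] * width)
            continue
        columns = list(zip(*messages))  # reduce column by column
        if pool_type == "sum":
            out.append([sum(c) for c in columns])
        elif pool_type == "mean":
            out.append([sum(c) / len(c) for c in columns])
        elif pool_type == "max":
            out.append([max(c) for c in columns])
        else:
            out.append([min(c) for c in columns])
    return out


x = [[0, 2, 3], [1, 4, 5], [2, 6, 7]]
src = [0, 1, 2, 0]
dst = [1, 2, 1, 0]
print(graph_send_recv_ref(x, src, dst, "sum"))  # [[0, 2, 3], [2, 8, 10], [1, 4, 5]]
```

Row 1 of the result is `x[0] + x[2]` because edges `0 -> 1` and `2 -> 1` both target it; the other pooling modes only change the per-column reduction.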
  2. 18 Nov, 2021 7 commits
  3. 17 Nov, 2021 6 commits
  4. 16 Nov, 2021 5 commits
    • A
      Added BF16 Pool2d grad (#37081) · f95d44a2
      arlesniak committed
      * Added BF16 Pool2d grad
      
      * upstream pulled
      
      * fix for CI
      
      * fixes after review
      f95d44a2
    • W
      Fix the logic of VarBase _to func (#37193) · f29a3c68
      Weilong Wu committed
      f29a3c68
    • Z
      Make Distributed Pass UT Timeout Smaller (#37199) · a01e27cc
      Zeng Jinle committed
      * make pass ut timeout smaller
      
      * increase ut timeout
      a01e27cc
    • J
      added onednn elu kernel (#37149) · ae40ee32
      jakpiase committed
      ae40ee32
    • L
      Fix attn_bias_add bug. (#37147) · a9e7a854
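      Li Min committed
      The fused_attention_op implementation uses bias_add, which is itself implemented with kernel primitives. The kernel primitive WriteData API and its internal implementation were later changed so that the out-of-bounds check moved into a template parameter. As a result, the call took the wrong branch and performed out-of-bounds writes, corrupting the contents of other GPU memory. Symptom: a single run of test_fused_attention_op_api.py usually does not fail, but running it repeatedly in a loop with inputs of different shapes produces incorrect results intermittently, so the bug is hard to notice.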
      a9e7a854
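The failure mode described in the attn_bias_add commit above (#37147) can be sketched with a small simulation: a tiled write that only clips the last, partial tile when a boundary flag is set. This is a Python model under stated assumptions, not Paddle's kernel primitive code. The names `write_tile` and `bias_add_tiled` and the flag `check_last_tile` are hypothetical, and the real `WriteData` signature is not reproduced here.

```python
def write_tile(memory, base, tile, tile_size, remaining, is_boundary):
    """Copy one tile into memory; clip to `remaining` only when is_boundary."""
    count = min(tile_size, remaining) if is_boundary else tile_size
    for i in range(count):
        memory[base + i] = tile[i]


def bias_add_tiled(memory, out_base, values, bias, tile_size, check_last_tile):
    """Write values + bias into memory[out_base:], one fixed-size tile at a time."""
    n = len(values)
    for start in range(0, n, tile_size):
        remaining = n - start
        tile = [0.0] * tile_size  # padded scratch tile
        for i in range(min(tile_size, remaining)):
            tile[i] = values[start + i] + bias[start + i]
        # Taking the unchecked path on a partial tile is the modeled bug.
        is_boundary = check_last_tile and remaining < tile_size
        write_tile(memory, out_base + start, tile, tile_size, remaining, is_boundary)


values = [1.0, 2.0, 3.0, 4.0, 5.0]
bias = [0.5] * 5

# Slots 0..4 are the output buffer; slots 5..7 belong to another tensor.
good = [0.0] * 5 + [9.0, 9.0, 9.0]
bias_add_tiled(good, 0, values, bias, tile_size=4, check_last_tile=True)
print(good[5:])  # [9.0, 9.0, 9.0]  neighbor untouched

bad = [0.0] * 5 + [9.0, 9.0, 9.0]
bias_add_tiled(bad, 0, values, bias, tile_size=4, check_last_tile=False)
print(bad[5:])   # [0.0, 0.0, 0.0]  neighbor overwritten by tile padding
```

In both runs the output region itself holds the correct sums, which is consistent with the commit's note that a single test run rarely fails: the damage lands in whichever buffer happens to sit next to the output.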
  5. 15 Nov, 2021 9 commits
  6. 12 Nov, 2021 7 commits
  7. 11 Nov, 2021 2 commits