1. 19 Nov, 2021 4 commits
    • W
      Add fuse_resnet_unit pass (#36818) · 3cd3bf29
      wuhuanzhou committed
      * GeneratePass support attr condition and mapping, test=develop
      
      * fix coverage, test=develop
      
      * Add fuse_resnet_unit pass, test=develop
      
      * fix CI errors, test=develop
      
      * fix CI errors, test=develop
      
      * fix unittest error when compiling without CUDA, test=develop
      
      * fix static ci error, test=develop
      
      * require kernel size to equal 1, test=develop
      3cd3bf29
    • F
    • S
      Add paddle.incubate.graph_send_recv API (#37205) · 39012536
      Siming Dai committed
      * add cpu version, using set: sum, min, max
      
      * add cpu version: mean
      
      * improve cpu code and fix dynamic memory allocation problem
      
      * fix arg error, add index judge, delete fp16
      
      * fix bug in CudaAtomicMax and CudaAtomicMin
      
      * add CUDA version
      
      * fix grad_op bug for index
      
      * add op test, add correct cpu grad op
      
      * Add correct CUDA Mean grad
      
      * [Add] Successful MEAN and SUM
      
      * [Add] Successful MIN and MAX in CPU
      
      * [Add] Successful MIN and MAX in CUDA
      
      * fix windows dtype ci
      
      * fix ROCM ci by adding HIP flag
      
      * rename fused_gather_scatter to send_recv
      
      * unify name as send and recv
      
      * change zero index return time
      
      * add send_recv incubate api
      
      * fix index data type, add unittest case for API
      
      * delete redundant input tensor
      
      * fix en example and docs, add default value in pool_type
      
      * add shape judge and max grid judge
      
      * fix comment
      
      * fix index type bug
      
      * add const &
      
      * fix en docs
      
      * delete numpy in examples
      
      * add unittest for int input
      
      * fix send_recv comment
      
      * change send_recv to graph_send_recv
      39012536
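The commit messages above describe a gather-then-scatter operation with `sum`, `mean`, `min` and `max` pooling. As a rough illustration of those semantics, here is a NumPy reference sketch. It is not the Paddle kernel: the function name `graph_send_recv_ref` is hypothetical, and it assumes that the output has the same shape as `x` and that rows receiving no message are zero.

```python
import numpy as np


def graph_send_recv_ref(x, src_index, dst_index, pool_type="sum"):
    """NumPy sketch: gather rows x[src_index] and reduce them into rows dst_index.

    Assumptions (not taken from the commit): output has the shape of x,
    and destination rows that receive nothing are left as zero.
    """
    x = np.asarray(x)
    src_index = np.asarray(src_index)
    dst_index = np.asarray(dst_index)
    if src_index.shape != dst_index.shape:
        raise ValueError("src_index and dst_index must have the same shape")

    msgs = x[src_index]  # one message per edge
    # Shape used to broadcast per-row vectors over the trailing dimensions.
    row_shape = (-1,) + (1,) * (x.ndim - 1)

    if pool_type in ("sum", "mean"):
        out = np.zeros_like(x)
        np.add.at(out, dst_index, msgs)  # unbuffered scatter-add
        if pool_type == "mean":
            count = np.bincount(dst_index, minlength=x.shape[0])
            out = out / np.maximum(count, 1).reshape(row_shape)
        return out

    if pool_type in ("min", "max"):
        reduce_op = np.minimum if pool_type == "min" else np.maximum
        init = np.inf if pool_type == "min" else -np.inf
        tmp = np.full(x.shape, init, dtype=float)
        reduce_op.at(tmp, dst_index, msgs)  # unbuffered scatter-min/max
        received = np.zeros(x.shape[0], dtype=bool)
        received[dst_index] = True
        # Rows with no incoming message fall back to zero.
        return np.where(received.reshape(row_shape), tmp, 0).astype(x.dtype)

    raise ValueError("pool_type must be one of sum, mean, min, max")
```

For example, with `x = [[0, 2, 3], [1, 4, 5], [2, 6, 7]]`, `src_index = [0, 1, 2, 0]` and `dst_index = [1, 2, 1, 0]`, row 1 of the `sum` result is `x[0] + x[2]`.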
    • L
      fix cmake dependence error (#37304) · 6653ac5e
      LiYuRio committed
      6653ac5e
  2. 18 Nov, 2021 4 commits
    • L
      fix bug to support dropout eval grad computing. (#37305) · c3d3001f
      Li Min committed
      * fix bug to support dropout eval grad computing.
      
      * Remove useless code.
      c3d3001f
    • Y
      [PTen]elementwise_sub kernel refactor (#37260) · 36a95654
      YuanRisheng committed
      * elementwise_add kernel refactor
      
      * fix compile bugs in elementwise_add refactor
      
      * fix compile bugs when run in npu/xpu
      
      * fix bugs when run unit test
      
      * fix bugs when run ci-windows
      
      * modify code as recommended
      
      * code format adjust
      
      * fix bugs when run ci
      
      * fix compile bug when run in ci-windows
      
      * elementwise_sub refactor
      
      * add PD_DLL_DECL for elementwise_sub
      
      * fix bugs when compiling
      36a95654
    • Z
      Add the `GetFetchNames` method in CinnGraphSymbolization. (#37218) · 3ad495e8
      Zhen Wang committed
      * Add the `GetFetchNames` method in CinnGraphSymbolization.
      
      * Use unordered_set instead of vector as the type of fetch_var_names.
      
      * Reuse the definition of kCompilationKey.
      
      * Use CompileOptions to set fetch_var_ids.
      
      * Update the argument passing of GraphCompiler.Build.
      
      * Fix some bugs in CinnGraphSymbolization::GetFetchIds.
      3ad495e8
    • Z
      Opt topk (#37256) · c4862d99
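      zhangkaihuo committed
      topk has two implementations: cub and a hand-written kernel. cub obtains the top k by sorting, and tests on multiple data sets show that it outperforms the hand-written kernel only when input_width >= 128 and k exceeds 75% of input_width.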
      c4862d99
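The selection rule described in this commit message can be sketched as a small dispatch function. This is only an illustration in NumPy: the names `prefers_sort_based_topk` and `topk_last_axis` are hypothetical, a full `argsort` stands in for the cub sort path, and `argpartition` stands in for the hand-written kernel.

```python
import numpy as np


def prefers_sort_based_topk(input_width, k):
    """Rule from the commit message: sort-based top-k wins only when
    input_width >= 128 and k exceeds 75% of input_width."""
    # Integer form of k > 0.75 * input_width, avoiding float rounding.
    return input_width >= 128 and 4 * k > 3 * input_width


def topk_last_axis(x, k):
    """Return (values, indices) of the k largest entries along the last axis."""
    x = np.asarray(x)
    width = x.shape[-1]
    if not 1 <= k <= width:
        raise ValueError("k must satisfy 1 <= k <= input_width")

    if prefers_sort_based_topk(width, k):
        # Stand-in for the sort-based path: sort everything, keep the first k.
        order = np.argsort(-x, axis=-1, kind="stable")[..., :k]
    else:
        # Stand-in for the selection path: pick k candidates, then order them.
        part = np.argpartition(-x, k - 1, axis=-1)[..., :k]
        vals = np.take_along_axis(x, part, axis=-1)
        rank = np.argsort(-vals, axis=-1, kind="stable")
        order = np.take_along_axis(part, rank, axis=-1)

    return np.take_along_axis(x, order, axis=-1), order
```

Both branches return the same values; only the strategy differs, which is the point of the heuristic.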
  3. 17 Nov, 2021 6 commits
  4. 16 Nov, 2021 5 commits
    • A
      Added BF16 Pool2d grad (#37081) · f95d44a2
      arlesniak committed
      * Added BF16 Pool2d grad
      
      * upstream pulled
      
      * fix for CI
      
      * fixes after review
      f95d44a2
    • Y
      Add API and unit test for reshape (#37232) · 79b49c20
      YuanRisheng committed
      * reshape kernel refactor
      
      * fix compile bugs when run ci
      
      * support xpu for reshape
      
      * fix bugs when run unittest in kunlun ci
      
      * fix compile bugs when run kunlun
      
      * polish code according to suggestions
      
      * add api and unit test for reshape
      79b49c20
    • Y
      Make FLAGS_determinstic effective in conv2d forward. (#37173) · ea47d211
      Yiqun Liu committed
      * Make FLAGS_determinstic effective in conv2d forward.
      
      * Add call of SetCinnCudnnDeterministic in cinn_launch op.
      ea47d211
    • J
      added onednn elu kernel (#37149) · ae40ee32
      jakpiase committed
      ae40ee32
    • L
      Fix attn_bias_add bug. (#37147) · a9e7a854
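      Li Min committed
      The implementation of fused_attention_op uses bias_add, which is built on kernel primitives. The WriteData API of the kernel primitives and its internal implementation were later changed so that the out-of-bounds check moved into a template parameter. As a result, the wrong branch was called, which produced out-of-bounds writes and corrupted the contents of other GPU memory. Symptoms: a single run of test_fused_attention_op_api.py generally reports no error, but running it repeatedly in a loop with inputs of different shapes gives incorrect results. The failure is intermittent, so the bug is hard to notice.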
      a9e7a854
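The failure mode described in this commit message can be illustrated with a small Python simulation. It is not the kernel primitive API: `write_block`, `bias_add` and the `is_boundary` flag are hypothetical stand-ins for a block-wise write whose bounds check is selected by the caller, and a flat list stands in for GPU memory.

```python
def write_block(memory, offset, block, valid, is_boundary):
    """Write one block into memory.

    Only the first `valid` elements are real data. When is_boundary is
    False the bounds check is skipped and the whole block is written.
    """
    count = min(len(block), valid) if is_boundary else len(block)
    for i in range(count):
        memory[offset + i] = block[i]


def bias_add(memory, out_offset, x, bias, block_size, check_boundary=True):
    """Compute x + bias block by block and store the result in memory."""
    n = len(x)
    for start in range(0, n, block_size):
        valid = min(block_size, n - start)
        block = [0.0] * block_size  # padded working buffer
        for i in range(valid):
            block[i] = x[start + i] + bias[(start + i) % len(bias)]
        # The last, partial block must take the bounds-checked branch.
        is_boundary = check_boundary and valid < block_size
        write_block(memory, out_offset + start, block, valid, is_boundary)
```

With six outputs, a block size of four, and two unrelated values stored right after the output region, the checked call leaves those two values intact, while `check_boundary=False` overwrites them with padding. Whether the damage is visible depends on what happens to sit next to the output, which is consistent with the intermittent failures reported above.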
  5. 15 Nov, 2021 6 commits
    • C
      [Pten] Refactor the implementation of custom operator (#37122) · 1e598f1a
      Chen Weihang committed
      * move extension into pten [no-verify]
      
      * append tensor methods by ext_tensor [no-verify]
      
      * append other tensor methods [no-verify]
      
      * ext related files tidy [no-verify]
      
      * include relation tidy [no-verify]
      
      * add pten tensor test [no-verify]
      
      * replace tensor in custom op & compile success
      
      * refine tensor constructor for unittest
      
      * custom relu jit run success
      
      * fix all custom op unittests
      
      * add inference cmake adapt [no-verify]
      
      * fix failed unittests
      
      * fix windows failed unittests
      
      * try to fix kunlun and inference failed
      
      * fix test_elementwise_api error
      
      * try to fix win compile failed
      
      * fix kunlun fp16 type error
      
      * remove useless handle error macro
      
      * add custom linear op test
      
      * fix compile failed & add win symbols
      
      * fix non pten kernel cast failed
      
      * add dll decl for api
      
      * polish several details
      
      * polish details by review comment
      
      * add dll_decl for register
      1e598f1a
    • F
      fix:delete macro INFERENCE (#37130) · b628c316
      feng_shuai committed
      b628c316
    • A
      Added BF16 to mean op (#37104) · df7cc457
      arlesniak committed
      * Added BF16 to mean op
      
      * fix for CI
      
      * fix for CI
      
      * fix for CI
      df7cc457
    • W
      [New features] Add elementwise_mul triple grad kernel (#37152) · 59fdf4da
      Weilong Wu committed
      * Add elementwise_mul triple grad kernel
      
      * Removed InplaceInferer and polished code
      59fdf4da
    • Z
      Add distributed pass framework: including PassBase/PassTest/PassUtils (#36643) · 12339fa0
      Zeng Jinle committed
      * add split_program
      
      * make ut faster
      
      * increase ut timeout
      
      * make result deterministic
      
      * add fuse_all_reduce pass
      
      * add ut framework, update
      
      * fix ut framework
      
      * remove useless code
      
      * add coverage support
      
      * update
      
      * fix CI
      
      * fix some bugs and fix ci coverage
      
      * fix conflict
      12339fa0
    • Z
      [heterps]bug fix for local training with --heter_worker_num (#37166) · 31cd9145
      zmx committed
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix ut. test=develop
      
      * fix ut. test=develop
      
      * fix ut. test=develop
      31cd9145
  6. 14 Nov, 2021 1 commit
    • Y
      [PTen]Reshape Kernel Refactor (#37164) · 895692e3
      YuanRisheng committed
      * reshape kernel refactor
      
      * fix compile bugs when run ci
      
      * support xpu for reshape
      
      * fix bugs when run unittest in kunlun ci
      
      * fix compile bugs when run kunlun
      
      * polish code according to suggestions
      895692e3
  7. 13 Nov, 2021 1 commit
  8. 12 Nov, 2021 5 commits
  9. 11 Nov, 2021 4 commits
    • Z
      a41447f0
    • T
      add where/where_index/masked_select for kunlun (#37053) · f5e7b02a
      TTerror committed
      * add where/where_index/masked_select for kunlun
      
      * fix where/where_index
      
      * update where/masked_select
      f5e7b02a
    • J
      Added softplus + activation oneDNN fuse pass (#36657) · a346c4dc
      jakpiase committed
      * added softplus + activation fuse pass
      
      * minor change
      
      * implemented reviewer suggestion
      
      * minor fix
      
      * minor fix
      
      * added scale_out parameter
      
      * minor fix
      
      * fix for iScan CI
      
      * conditionally disabled logs
      
      * refactored pass builder
      a346c4dc
    • Z
      [Heterps]Refactor Heter Pipeline Parameter Server (#36845) · a2da1efa
      zmx committed
      * change username
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * fix
      
      * update
      
      * update
      
      * update unittests
      
      * fix
      
      * update
      
      * fix
      
      * update
      
      * fix
      
      * fix
      
      * fix
      
      * update
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update send_and_recv op. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * update. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix ut. test=develop
      
      * fix unit. notest,test=coverage
      
      * fix ut. notest, test=coverage
      
      * update. notest,test=coverage
      
      * fix ut. notest, test=coverage
      
      * fix ut. notest, test=coverage
      
      * fix. notest, test=coverage
      
      * fix. notest, test=coverage
      
      * fix ut. notest, test=coverage
      
      * fix ut. notest, test=coverage
      
      * fix ut. notest, test=coverage
      
      * fix ut. notest, test=coverage
      
      * add func. notest, test=coverage
      
      * fix ut. notest, test=coverage
      
      * fix. test=develop
      
      * fix. test=develop
      a2da1efa
  10. 10 Nov, 2021 3 commits
  11. 09 Nov, 2021 1 commit