1. 22 Nov 2021, 6 commits
    • F
      disable copying of datatype when sharing buffer between two tensors. (#37247) · 9ec1432d
      Feiyu Chan committed
      * disable copying of datatype when sharing buffer between two tensors.
      * fix for mkldnn operator kernels (elementwise_add, sum, softplus, softmax, scale, activation), manually set the data type when reusing memory by ShareBufferWith.
      9ec1432d
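The behaviour change above can be illustrated with a toy model. This is a minimal sketch, not Paddle code: `Holder`, `Tensor`, `share_buffer_with` and `reuse_memory` are hypothetical names. Sharing a buffer now hands over only the allocation, so a kernel that reuses memory has to set the data type itself.

```python
class Holder:
    """Hypothetical stand-in for a shared memory allocation."""

    def __init__(self, nbytes):
        self.buf = bytearray(nbytes)


class Tensor:
    """Toy tensor: per-tensor metadata (dtype) plus a possibly shared holder."""

    def __init__(self, dtype, holder=None):
        self.dtype = dtype
        self.holder = holder

    def share_buffer_with(self, other):
        # Share the allocation only; the dtype is intentionally left untouched.
        self.holder = other.holder


def reuse_memory(dst, src, dtype):
    """What a kernel now has to do: share memory, then set the dtype explicitly."""
    dst.share_buffer_with(src)
    dst.dtype = dtype
    return dst
```

After `t.share_buffer_with(s)` the two toy tensors point at the same holder, but `t` keeps its own dtype until the caller overwrites it.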
    • A
      Add isclose op (#37135) · d2200e97
      andyjpaddle committed
      * add isclose op, test=develop
      
      * add isclose op, test=develop
      
      * add isclose api, test=develop
      
      * rm useless code
      
      * rm useless code
      
      * update python api of isclose
      
      * add some unittest of isclose op, test=develop
      d2200e97
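For reference, an element-wise closeness test of this kind can be sketched in pure Python. The sketch assumes the op follows the NumPy-style rule |x - y| <= atol + rtol * |y| with NumPy's default tolerances (rtol=1e-05, atol=1e-08, equal_nan=False); it is not the operator's implementation.

```python
import math


def isclose(x, y, rtol=1e-05, atol=1e-08, equal_nan=False):
    """Element-wise closeness test over two equal-length sequences.

    Uses the NumPy-style (asymmetric) rule |x - y| <= atol + rtol * |y|.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    result = []
    for a, b in zip(x, y):
        if math.isnan(a) or math.isnan(b):
            # NaNs only compare equal when equal_nan is set and both are NaN.
            result.append(equal_nan and math.isnan(a) and math.isnan(b))
        elif a == b:
            # Exact matches, including equal infinities.
            result.append(True)
        elif math.isinf(a) or math.isinf(b):
            result.append(False)
        else:
            result.append(abs(a - b) <= atol + rtol * abs(b))
    return result
```

Note that the rule is asymmetric: `rtol` scales with `|y|` only, so swapping the arguments can change the result near the tolerance boundary.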
    • Z
      elu support alpha < 0 (#37316) · e3503de8
      zhupengyang committed
      e3503de8
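ELU is defined as x for x > 0 and alpha * (exp(x) - 1) otherwise. A scalar sketch (illustrative only, not the kernel) shows why alpha < 0 needs care: negative inputs then produce positive outputs, so a backward pass that infers the branch from the sign of the output would pick the wrong one. Computing the gradient from the input, as below, avoids that; whether the actual fix works this way is an assumption.

```python
import math


def elu(x, alpha=1.0):
    """ELU forward: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def elu_grad(x, alpha=1.0):
    """dELU/dx from the input x: 1 for x > 0, alpha * exp(x) otherwise.

    Branching on x (not on the output) stays correct when alpha < 0.
    """
    return 1.0 if x > 0 else alpha * math.exp(x)
```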
    • Z
      Support zero value in dimension for slice (#37313) · e788c7b5
      zyfncg committed
      * support zero dim for slice op
      
      * support zero dim Tensor in set_value op
      
      * polish some debug log
      e788c7b5
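Supporting zero-sized dimensions mostly comes down to computing slice output shapes without rejecting empty ranges. A minimal sketch, assuming Python-style handling of negative and out-of-range indices (Paddle's exact clamping rules are not stated in the commit); `slice_shape` is a hypothetical helper.

```python
def slice_shape(shape, axes, starts, ends):
    """Output shape of a slice along `axes`, allowing zero-sized dimensions."""
    out = list(shape)
    for axis, start, end in zip(axes, starts, ends):
        dim = shape[axis]
        # Negative indices count from the end, as in Python slicing.
        if start < 0:
            start += dim
        if end < 0:
            end += dim
        # Clamp into [0, dim]; an empty range yields 0 rather than an error.
        start = min(max(start, 0), dim)
        end = min(max(end, 0), dim)
        out[axis] = max(end - start, 0)
    return out
```

For example, slicing `[1:3]` on axis 0 of a `[4, 0, 3]` tensor gives `[2, 0, 3]`, and slicing a dimension of size 0 leaves it at 0.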
    • C
      [PTen] Add variable transform to/from ptenTensor and add cast kernel (#36916) · 5caa6fc5
      chentianyu03 committed
      * add cast kernel
      
      * add cast cuda kernel
      
      * add cast kernel
      
      * make cast kernel output dtype undefined
      
      * get cast dtype from vardesc
      
      * move cast to manipulation and add test case
      
      * add castinfershape
      
      * avoid reinitializing variable
      
      * InitializeVariable support datatype
      
      * merge develop branch
      
      * fix merge bug
      
      * revert modify initializeVariable
      
      * revert modify on InitializeVariable
      
      * revert modify on InitializeVariable
      
      * mutable support reset dtype
      
      * enable make pten tensor from variable when def_arg.type is undefined
      
      * fix build pten ctx start_idx error
      
      * copy pten out tensor to variable
      
      * merge develop branch
      
      * fix non pten kernel cast failed
      
      * add reset allocation place for remake tensor
      
      * fix inplace realloc error
      
      * add mutable on pten kernels and remove unused cast files
      
      * rename function names
      
      * fix output type error
      
      * fix conflict with develop branch
      
      * set data type to variable with pten's dtype
      
      * fix test_cast_api type mismatch
      
      * DenseTensor mutable_data supports 0-byte values
      
      * fix the inplace bug of reshape kernel
      
      * fix pten.backend != variable.place when moving storage, place mismatch bug
      
      * fix conflict with develop branch
      
      * Fix bug of paddle::experimental::MovesStorage
      
      * fix ReMakePtenDenseTensor place mismatch bug
      
      * Revert "fix ReMakePtenDenseTensor place mismatch bug"
      
      This reverts commit 86336032f60b8a15eacd2c1ff2fa513f5d8dfd1a.
      
      * fix ReMakePtenDenseTensor place mismatch bug
      
      * reverts the set_lod interface, test=develop
      
      * modify by the review options
      
      * modify error message
      
      * add & for const input arguments
      
      * add reference in params
      
      * elementwise_sub add mutable_data
      
      * fix ResetHolderWithType check size bug
      
      * add dependence pten_tensor to test_cast_api object
      
      * remove unused code to pass ci coverage
      Co-authored-by: Chen Weihang <chenweihang@baidu.com>
      Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
      Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
      5caa6fc5
    • L
      [new feature] add local scope for interpretercore (#37379) · 1f0512be
      Leo Chen committed
      1f0512be
  2. 19 Nov 2021, 6 commits
    • L
      bug fix shard_index (#37042) · b505ff96
      lilong12 committed
      b505ff96
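The commit message does not describe the bug itself. For context, the mapping shard_index performs can be sketched in pure Python, assuming the behaviour described in Paddle's API documentation: shard size is ceil(index_num / nshards), labels owned by `shard_id` are re-based into that shard, and all others become `ignore_value`. This is not the operator's code and does not reproduce the fix.

```python
def shard_index(labels, index_num, nshards, shard_id, ignore_value=-1):
    """Map global label ids to shard-local ids, or ignore_value if not owned."""
    shard_size = (index_num + nshards - 1) // nshards  # ceiling division
    out = []
    for v in labels:
        if v // shard_size == shard_id:
            out.append(v % shard_size)  # local id within this shard
        else:
            out.append(ignore_value)    # label belongs to another shard
    return out
```

With `index_num=20` and `nshards=2` each shard holds 10 ids, so label 16 maps to local id 6 on shard 1 and is ignored on shard 0.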
    • J
      Optimize cinn_cache_key by replace GraphToProgram to Dot string (#37317) · edc3496f
      jiangcheng committed
      * optimize cache-key by replace GraphToProgram to Dot string
      
      * fix compile failure bug
      edc3496f
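The idea of keying a compilation cache on a graph's Dot string can be sketched as follows. Everything here is hypothetical (`graph_to_dot`, `cache_key`, the SHA-256 choice, including input shapes in the key); the commit only states that a Dot string replaces GraphToProgram. The essential property is that the serialization is deterministic, so identical graphs map to identical keys regardless of construction order.

```python
import hashlib


def graph_to_dot(nodes, edges):
    """Serialize a graph to a DOT string with a deterministic ordering."""
    lines = ["digraph G {"]
    for name, label in sorted(nodes.items()):
        lines.append(f'  "{name}" [label="{label}"];')
    for src, dst in sorted(edges):
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


def cache_key(nodes, edges, input_shapes):
    """Hash the graph's DOT string together with the input shapes."""
    h = hashlib.sha256()
    h.update(graph_to_dot(nodes, edges).encode())
    for name, shape in sorted(input_shapes.items()):
        # Separator keeps adjacent entries from running together.
        h.update(f"|{name}:{shape}".encode())
    return h.hexdigest()
```

Building a string is typically cheaper than constructing a full program object, which is presumably where the optimization comes from.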
    • W
      Add fuse_resnet_unit pass (#36818) · 3cd3bf29
      wuhuanzhou committed
      * GeneratePass support attr condition and mapping, test=develop
      
      * fix coverage, test=develop
      
      * Add fuse_resnet_unit pass, test=develop
      
      * fix CI errors, test=develop
      
      * fix CI errors, test=develop
      
      * fix unittest error when compiling without CUDA, test=develop
      
      * fix static ci error, test=develop
      
      * limit kernel size must equal 1, test=develop
      3cd3bf29
    • S
      Add paddle.incubate.graph_send_recv API (#37205) · 39012536
      Siming Dai committed
      * add cpu version, using set: sum, min, max
      
      * add cpu version: mean
      
      * improve cpu code and fix dynamic memory allocation problem
      
      * fix arg error, add index judge, delete fp16
      
      * fix bug in CudaAtomicMax and CudaAtomicMin
      
      * add CUDA version
      
      * fix grad_op bug for index
      
      * add op test, add correct cpu grad op
      
      * Add correct CUDA Mean grad
      
      * [Add] Successful MEAN and SUM
      
      * [Add] Successful MIN and MAX in CPU
      
      * [Add] Successful MIN and MAX in CUDA
      
      * fix windows dtype ci
      
      * fix ROCM ci by adding HIP flag
      
      * rename fused_gather_scatter to send_recv
      
      * unify name as send and recv
      
      * change zero index return time
      
      * add send_recv incubate api
      
      * fix index data type, add unittest case for API
      
      * delete redundant input tensor
      
      * fix en example and docs, add default value in pool_type
      
      * add shape judge and max grid judge
      
      * fix comment
      
      * fix index type bug
      
      * add const &
      
      * fix en docs
      
      * delete numpy in examples
      
      * add unittest for int input
      
      * fix send_recv comment
      
      * change send_recv to graph_send_recv
      39012536
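The send/recv pattern (gather rows by `src_index`, pool them into rows given by `dst_index` with sum, mean, max or min) can be sketched in pure Python. This is an illustration under assumptions, not the API's implementation: the output is assumed to have as many rows as `x`, and rows that receive no message are filled with zeros.

```python
def graph_send_recv(x, src_index, dst_index, pool_type="sum"):
    """Send rows x[src_index[i]] as messages to dst_index[i] and pool them."""
    if pool_type not in ("sum", "mean", "max", "min"):
        raise ValueError(f"unsupported pool_type: {pool_type}")
    if len(src_index) != len(dst_index):
        raise ValueError("src_index and dst_index must have the same length")
    n = len(x)
    width = len(x[0]) if n else 0
    acc = [None] * n
    count = [0] * n
    reduce_fn = {"sum": lambda a, b: a + b, "mean": lambda a, b: a + b,
                 "max": max, "min": min}[pool_type]
    for s, d in zip(src_index, dst_index):
        msg = x[s]
        if acc[d] is None:
            acc[d] = list(msg)  # first message initializes the accumulator
        else:
            acc[d] = [reduce_fn(a, m) for a, m in zip(acc[d], msg)]
        count[d] += 1
    out = []
    for row, c in zip(acc, count):
        if row is None:
            out.append([0] * width)  # no incoming messages: zeros
        elif pool_type == "mean":
            out.append([v / c for v in row])
        else:
            out.append(row)
    return out
```

Usage: with `x = [[0, 2, 3], [1, 4, 5], [2, 6, 7]]`, `src_index = [0, 1, 2, 0]` and `dst_index = [1, 2, 1, 0]`, sum pooling gives `[[0, 2, 3], [2, 8, 10], [1, 4, 5]]`.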
    • L
      fix cmake dependence error (#37304) · 6653ac5e
      LiYuRio committed
      6653ac5e
  3. 18 Nov 2021, 4 commits
    • L
      fix bug to support dropout eval grad computing. (#37305) · c3d3001f
      Li Min committed
      * fix bug to support dropout eval grad computing.
      
      * Remove useless code.
      c3d3001f
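In eval mode dropout draws no random mask, so its gradient is just the constant scale applied in the forward pass. A minimal sketch, with mode names borrowed from the `mode` argument of Paddle's dropout API ("upscale_in_train", "downscale_in_infer"); the kernel-level fix itself is not shown in the commit.

```python
def dropout_eval_scale(p, mode="upscale_in_train"):
    """Factor dropout applies at inference time, where no random mask is drawn."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if mode == "upscale_in_train":
        return 1.0        # training already divided by (1 - p): eval is identity
    if mode == "downscale_in_infer":
        return 1.0 - p    # training keeps raw values: eval multiplies by (1 - p)
    raise ValueError(f"unknown mode: {mode}")


def dropout_eval_forward(x, p, mode="upscale_in_train"):
    scale = dropout_eval_scale(p, mode)
    return [v * scale for v in x]


def dropout_eval_backward(dout, p, mode="upscale_in_train"):
    # out = scale * x, so dx = scale * dout; no mask is involved in eval mode.
    scale = dropout_eval_scale(p, mode)
    return [g * scale for g in dout]
```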
    • Y
      [PTen]elementwise_sub kernel refactor (#37260) · 36a95654
      YuanRisheng committed
      * elementwise_add kernel refactor
      
      * fix compile bugs in elementwise_add refactor
      
      * fix compile bugs when run in npu/xpu
      
      * fix bugs when run unit test
      
      * fix bugs when run ci-windows
      
      * modify code as recommended
      
      * code format adjust
      
      * fix bugs when run ci
      
      * fix compile bug when run in ci-windows
      
      * elementwise_sub refactor
      
      * add PD_DLL_DECL for elementwise_sub
      
      * fix bugs when compiling
      36a95654
    • Z
      Add the `GetFetchNames` method in CinnGraphSymbolization. (#37218) · 3ad495e8
      Zhen Wang committed
      * Add the `GetFetchNames` method in CinnGraphSymbolization.
      
      * Use unordered_set instead vector as the type of fetch_var_names.
      
      * Reuse the definition of kCompilationKey.
      
      * Use CompileOptions to set fetch_var_ids.
      
      * Update the argument passing of GraphCompiler.Build.
      
      * Fix some bugs in CinnGraphSymbolization::GetFetchIds.
      3ad495e8
    • Z
      Opt topk (#37256) · c4862d99
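      zhangkaihuo committed
      topk has two implementations: one based on cub and a hand-written kernel. The cub version obtains the top k by sorting. Measurements on multiple data sets showed that it outperforms the hand-written kernel only when input_width >= 128 and k exceeds 75% of input_width.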
      c4862d99
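The dispatch rule stated in the commit message can be written down directly. In this sketch `use_sort_based_topk` and `topk` are hypothetical names, and `sorted` and `heapq.nlargest` merely stand in for the two GPU code paths (full sort via cub versus a selection kernel).

```python
import heapq


def use_sort_based_topk(input_width, k):
    """Rule from the commit: the sort-based (cub) path wins only when
    input_width >= 128 and k exceeds 75% of input_width."""
    return input_width >= 128 and k > 0.75 * input_width


def topk(row, k):
    """Return the k largest values of `row` in descending order."""
    if use_sort_based_topk(len(row), k):
        # Stand-in for the sort-based path: sort everything, keep the first k.
        return sorted(row, reverse=True)[:k]
    # Stand-in for the hand-written path: select k items without a full sort.
    return heapq.nlargest(k, row)
```

The intuition is that a full sort only pays off when nearly the whole row has to be ordered anyway.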
  4. 17 Nov 2021, 6 commits
  5. 16 Nov 2021, 5 commits
    • A
      Added BF16 Pool2d grad (#37081) · f95d44a2
      arlesniak committed
      * Added BF16 Pool2d grad
      
      * upstream pulled
      
      * fix for CI
      
      * fixes after review
      f95d44a2
    • Y
      Add API and unit test for reshape (#37232) · 79b49c20
      YuanRisheng committed
      * reshape kernel refactor
      
      * fix compile bugs when run ci
      
      * support xpu for reshape
      
      * fix bugs when run unittest in kunlun ci
      
      * fix compile bugs when run kunlun
      
      * perfect code according to suggestion
      
      * add api and unit test for reshape
      79b49c20
    • Y
      Make FLAGS_determinstic effective in conv2d forward. (#37173) · ea47d211
      Yiqun Liu committed
      * Make FLAGS_determinstic effective in conv2d forward.
      
      * Add call of SetCinnCudnnDeterministic in cinn_launch op.
      ea47d211
    • J
      added onednn elu kernel (#37149) · ae40ee32
      jakpiase committed
      ae40ee32
    • L
      Fix attn_bias_add bug. (#37147) · a9e7a854
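      Li Min committed
      The fused_attention_op implementation uses bias_add, which is built on kernel primitives. Later, the WriteData API of the kernel primitives and its internal implementation changed: the out-of-bounds check was moved into a template parameter, so the call took the wrong branch and performed out-of-bounds writes that corrupted other GPU memory. Symptoms: a single run of test_fused_attention_op_api.py almost never fails, but running it repeatedly in a loop with inputs of different shapes produces incorrect results. The failure is intermittent and the bug is hard to notice.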
      a9e7a854
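Why a missing tail-chunk bounds check corrupts neighbouring memory can be shown with a toy simulation. This is hypothetical Python, not the kernel primitives' WriteData API: `buffer` models device memory whose first `len(x)` slots are the output and whose remaining slots belong to some other tensor.

```python
def bias_add_blocked(x, bias, buffer, vec_size=4, check_bounds=True):
    """Add a per-column bias to a row-major matrix, vec_size elements at a time.

    x is flattened row-major; bias has one entry per column. The first len(x)
    slots of `buffer` are the output; later slots belong to another tensor.
    """
    n, cols = len(x), len(bias)
    for base in range(0, n, vec_size):
        # The tail chunk may be shorter than vec_size; only the bounds check
        # stops the write from spilling past the end of the output.
        count = min(vec_size, n - base) if check_bounds else vec_size
        for i in range(count):
            idx = base + i
            # Out-of-range lanes have no valid input; 0.0 models garbage.
            buffer[idx] = x[idx] + bias[idx % cols] if idx < n else 0.0
    return buffer
```

With 6 output elements and `vec_size=4`, the second chunk covers indices 4 to 7; without the check, slots 6 and 7 of the neighbouring tensor are overwritten. Because the overrun depends on the shape, the damage appears only for some inputs, consistent with the intermittent failures described above.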
  6. 15 Nov 2021, 6 commits
    • C
      [Pten] Refactor the implementation of custom operator (#37122) · 1e598f1a
      Chen Weihang committed
      * move extension into pten [no-verify]
      
      * append tensor methods by ext_tensor [no-verify]
      
      * append other tensor methods [no-verify]
      
      * ext related files tidy [no-verify]
      
      * include relation tidy [no-verify]
      
      * add pten tensor test [no-verify]
      
      * replace tensor in custom op & compile success
      
      * refine tensor constructor for unittest
      
      * custom relu jit run success
      
      * fix all custom op unittests
      
      * add inference cmake adapt [no-verify]
      
      * fix failed unittests
      
      * fix windows failed unittests
      
      * try to fix kunlun and inference failed
      
      * fix test_elementwise_api error
      
      * try to fix win compile failed
      
      * fix kunlun fp16 type error
      
      * remove useless haddle error macro
      
      * add custom linear op test
      
      * fix compile failed & add win symbols
      
      * fix non pten kernel cast failed
      
      * add dll decl for api
      
      * polish several details
      
      * polish details by review comment
      
      * add dll_decl for register
      1e598f1a
    • F
      fix:delete macro INFERENCE (#37130) · b628c316
      feng_shuai committed
      b628c316
    • A
      Added BF16 to mean op (#37104) · df7cc457
      arlesniak committed
      * Added BF16 to mean op
      
      * fix for CI
      
      * fix for CI
      
      * fix for CI
      df7cc457
    • W
      [New features] Add elementwise_mul triple grad kernel (#37152) · 59fdf4da
      Weilong Wu committed
      * Add elementwise_mul triple grad kernel
      
      * Removed InplaceInferer and polished code
      59fdf4da
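The chain rule behind an elementwise_mul triple-grad kernel can be worked out by hand. The derivation below is a sketch. It ignores broadcasting: a real kernel must also reduce gradients over broadcast axes. Its symbol names are illustrative and are not necessarily Paddle's operator input and output names. With Out = X ⊙ Y, the double-grad op maps (X, Y, dOut, ddX, ddY) to (dX', dY', ddOut). Given upstream gradients G for those three outputs, the triple-grad op must produce the five gradients marked with a bar:

```latex
\begin{aligned}
\text{grad:}\quad & dX = dOut \odot Y, \qquad dY = dOut \odot X \\
\text{double grad:}\quad & ddOut = ddX \odot Y + ddY \odot X, \\
& dX' = ddY \odot dOut, \qquad dY' = ddX \odot dOut \\
\text{triple grad:}\quad & \bar{X} = G_{ddOut} \odot ddY, \qquad \bar{Y} = G_{ddOut} \odot ddX, \\
& \overline{dOut} = G_{dX'} \odot ddY + G_{dY'} \odot ddX, \\
& \overline{ddX} = G_{dY'} \odot dOut + G_{ddOut} \odot Y, \\
& \overline{ddY} = G_{dX'} \odot dOut + G_{ddOut} \odot X
\end{aligned}
```

Each barred term sums the contributions of every double-grad output that depends on that input. For example, ddX appears in both dY' and ddOut, so its gradient has two terms.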
    • Z
      Add distributed pass framework: including PassBase/PassTest/PassUtils (#36643) · 12339fa0
      Zeng Jinle committed
      * add split_program
      
      * make ut faster
      
      * increase ut timeout
      
      * make result deterministic
      
      * add fuse_all_reduce pass
      
      * add ut framework, update
      
      * fix ut framework
      
      * remove useless code
      
      * add coverage support
      
      * update
      
      * fix CI
      
      * fix some bugs and fix ci coverage
      
      * fix conflict
      12339fa0
    • Z
      [heterps]bug fix for local training with --heter_worker_num (#37166) · 31cd9145
      zmx committed
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix. test=develop
      
      * fix ut. test=develop
      
      * fix ut. test=develop
      
      * fix ut. test=develop
      31cd9145
  7. 14 Nov 2021, 1 commit
    • Y
      [PTen]Reshape Kernel Refactor (#37164) · 895692e3
      YuanRisheng committed
      * reshape kernel refactor
      
      * fix compile bugs when run ci
      
      * support xpu for reshape
      
      * fix bugs when run unittest in kunlun ci
      
      * fix compile bugs when run kunlun
      
      * perfect code according to suggestion
      895692e3
  8. 13 Nov 2021, 1 commit
  9. 12 Nov 2021, 5 commits