1. 29 Nov, 2021 1 commit
  2. 27 Nov, 2021 1 commit
    • A
      [NPU] reorganization for device API abstraction (#37110) · 72241a6a
      Committed by Aganlengzi
      * [NPU] reorganization for device API abstraction
      
      * [NPU] delete old files
      
      * [NPU] fix npu_collective_helper
      
      * [NPU] fix collective_helper
      
      * [NPU] fix ut
      
      * [NPU] mod memory allocation and hccl_helper
      
      * [NPU] fix place_type
      
      * [NPU] split enforce.h
      
      * move acl* call into npu_info
      
      * merge conflict
      
      * fix merge
      
      * merge conflict
      
      * merge conflict
      72241a6a
  3. 26 Nov, 2021 2 commits
  4. 25 Nov, 2021 7 commits
  5. 24 Nov, 2021 5 commits
  6. 23 Nov, 2021 6 commits
  7. 22 Nov, 2021 6 commits
    • F
      disable copying of datatype when sharing buffer between two tensors. (#37247) · 9ec1432d
      Committed by Feiyu Chan
      * disable copying of datatype when sharing buffer between two tensors.
      * fix for mkldnn operator kernels (elementwise_add, sum, softplus, softmax, scale, activation): manually set the data type when reusing memory by ShareBufferWith.
      9ec1432d
    • A
      Add isclose op (#37135) · d2200e97
      Committed by andyjpaddle
      * add isclose op, test=develop
      
      * add isclose op, test=develop
      
      * add isclose api, test=develop
      
      * rm useless code
      
      * rm useless code
      
      * update python api of isclose
      
      * add some unittest of isclose op, test=develop
      d2200e97
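The commit message does not state the comparison rule. A minimal pure-Python sketch, assuming the op follows the NumPy-style convention `|x - y| <= atol + rtol * |y|`; the name `isclose_ref` and the default tolerances are illustrative and not taken from the commit:

```python
import math


def isclose_ref(x, y, rtol=1e-05, atol=1e-08, equal_nan=False):
    """Elementwise closeness check on two equal-length lists of floats.

    Assumed rule (not confirmed by the commit): |a - b| <= atol + rtol * |b|.
    """
    out = []
    for a, b in zip(x, y):
        if math.isnan(a) or math.isnan(b):
            # NaN compares close only to NaN, and only when equal_nan is set.
            out.append(equal_nan and math.isnan(a) and math.isnan(b))
        elif math.isinf(a) or math.isinf(b):
            # Infinities are close only when they are the same infinity.
            out.append(a == b)
        else:
            out.append(abs(a - b) <= atol + rtol * abs(b))
    return out
```

Note that the rule is asymmetric in `x` and `y`, because the relative tolerance is scaled by `|y|` only.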
    • Z
      elu support alpha < 0 (#37316) · e3503de8
      Committed by zhupengyang
      e3503de8
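For context, a minimal sketch of the standard ELU forward formula, `x` for `x > 0` and `alpha * (exp(x) - 1)` otherwise, evaluated with a negative `alpha`. The function name `elu_ref` is hypothetical, and the sketch says nothing about how the commit changes the kernels:

```python
import math


def elu_ref(x, alpha=1.0):
    """Scalar ELU forward pass using the textbook definition."""
    if x > 0:
        return x
    # For x <= 0, exp(x) - 1 lies in (-1, 0], so a negative alpha
    # makes this branch non-negative.
    return alpha * (math.exp(x) - 1.0)
```

With `alpha < 0` the function is no longer monotonic, since both branches produce non-negative outputs.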
    • Z
      Support zero value in dimension for slice (#37313) · e788c7b5
      Committed by zyfncg
      * support zero dim for slice op
      
      * support zero dim Tensor in set_value op
      
      * polish some debug log
      e788c7b5
    • C
      [PTen] Add variable transform to/from ptenTensor and add cast kernel (#36916) · 5caa6fc5
      Committed by chentianyu03
      * add cast kernel
      
      * add cast cuda kernel
      
      * add cast kernel
      
      * make cast kernel output dtype undefined
      
      * get cast dtype from vardesc
      
      * move cast to manipulation and add test case
      
      * add castinfershape
      
      * avoid reinitializing variable
      
      * InitializeVariable support datatype
      
      * merge develop branch
      
      * fix merge bug
      
      * revert modify initializeVariable
      
      * revert modify on InitializeVariable
      
      * revert modify on InitializeVariable
      
      * mutable support reset dtype
      
      * enable make pten tensor from variable when def_arg.type is undefined
      
      * fix build pten ctx start_idx error
      
      * copy pten out tensor to variable
      
      * merge develop branch
      
      * fix non pten kernel cast failed
      
      * add reset allocation place for remake tensor
      
      * fix inplace realloc error
      
      * add mutable on pten kernels and remove unused cast files
      
      * rename function names
      
      * fix output type error
      
      * fix conflict with develop branch
      
      * set data type to variable with pten's dtype
      
      * fix test_cast_api type mismatch
      
      * DenseTensor mutable_data support 0 bytes value
      
      * fix the inplace bug of reshape kernel
      
      * fix pten.backend != variable.place when moving storage, place mismatch bug
      
      * fix conflict with develop branch
      
      * Fix bug of paddle::experimental::MovesStorage
      
      * fix ReMakePtenDenseTensor place mismatch bug
      
      * Revert "fix ReMakePtenDenseTensor place mismatch bug"
      
      This reverts commit 86336032f60b8a15eacd2c1ff2fa513f5d8dfd1a.
      
      * fix ReMakePtenDenseTensor place mismatch bug
      
      * reverts the set_lod interface, test=develop
      
      * modify according to the review comments
      
      * modify error message
      
      * add & for const input arguments
      
      * add reference in params
      
      * elementwise_sub add mutable_data
      
      * fix ResetHolderWithType check size bug
      
      * add dependence pten_tensor to test_cast_api object
      
      * remove unused code to pass ci coverage
      Co-authored-by: Chen Weihang <chenweihang@baidu.com>
      Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
      Co-authored-by: shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
      5caa6fc5
    • L
      [new feature] add local scope for interpretercore (#37379) · 1f0512be
      Committed by Leo Chen
      1f0512be
  8. 19 Nov, 2021 6 commits
    • L
      bug fix shard_index (#37042) · b505ff96
      Committed by lilong12
      b505ff96
    • J
      Optimize cinn_cache_key by replacing GraphToProgram with Dot string (#37317) · edc3496f
      Committed by jiangcheng
      * optimize cache-key by replacing GraphToProgram with Dot string
      
      * fix compile failure bug
      edc3496f
    • W
      Add fuse_resnet_unit pass (#36818) · 3cd3bf29
      Committed by wuhuanzhou
      * GeneratePass support attr condition and mapping, test=develop
      
      * fix coverage, test=develop
      
      * Add fuse_resnet_unit pass, test=develop
      
      * fix CI errors, test=develop
      
      * fix CI errors, test=develop
      
      * fix unittest error when compiling without CUDA, test=develop
      
      * fix static ci error, test=develop
      
      * limit kernel size must equal 1, test=develop
      3cd3bf29
    • F
    • S
      Add paddle.incubate.graph_send_recv API (#37205) · 39012536
      Committed by Siming Dai
      * add cpu version, using set: sum, min, max
      
      * add cpu version: mean
      
      * improve cpu code and fix dynamic memory allocation problem
      
      * fix arg error, add index judge, delete fp16
      
      * fix bug in CudaAtomicMax and CudaAtomicMin
      
      * add CUDA version
      
      * fix grad_op bug for index
      
      * add op test, add correct cpu grad op
      
      * Add correct CUDA Mean grad
      
      * [Add] Successful MEAN and SUM
      
      * [Add] Successful MIN and MAX in CPU
      
      * [Add] Successful MIN and MAX in CUDA
      
      * fix windows dtype ci
      
      * fix ROCM ci by adding HIP flag
      
      * rename fused_gather_scatter to send_recv
      
      * unify name as send and recv
      
      * change zero index return time
      
      * add send_recv incubate api
      
      * fix index data type, add unittest case for API
      
      * delete redundant input tensor
      
      * fix en example and docs, add default value in pool_type
      
      * add shape judge and max grid judge
      
      * fix comment
      
      * fix index type bug
      
      * add const &
      
      * fix en docs
      
      * delete numpy in examples
      
      * add unittest for int input
      
      * fix send_recv comment
      
      * change send_recv to graph_send_recv
      39012536
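The commit message names the pooling modes (sum, mean, min, max) but not the data flow. A pure-Python reference sketch, assuming the op gathers rows of `x` at `src_index` and reduces them into the rows given by `dst_index`; the name `graph_send_recv_ref`, the list-of-lists representation, and the zero fill for rows that receive no message are assumptions for illustration:

```python
def graph_send_recv_ref(x, src_index, dst_index, pool_type="sum"):
    """Gather rows x[src] and reduce them per destination row.

    x is a list of equal-length rows; src_index and dst_index are
    equal-length lists of row indices into x.
    """
    n_rows, width = len(x), len(x[0])

    # Collect the messages (source rows) arriving at each destination row.
    buckets = [[] for _ in range(n_rows)]
    for s, d in zip(src_index, dst_index):
        buckets[d].append(x[s])

    reducers = {
        "sum": sum,
        "mean": lambda col: sum(col) / len(col),
        "min": min,
        "max": max,
    }
    reduce_col = reducers[pool_type]

    out = []
    for msgs in buckets:
        if not msgs:
            # Assumption: rows with no incoming message are filled with zeros.
            out.append([0] * width)
        else:
            # Reduce column by column across all messages for this row.
            out.append([reduce_col([m[j] for m in msgs]) for j in range(width)])
    return out
```

For example, with `x = [[0, 2, 3], [1, 4, 5], [2, 6, 7]]`, `src_index = [0, 1, 2, 0]` and `dst_index = [1, 2, 1, 0]`, row 1 of the output combines rows 0 and 2 of `x`.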
    • L
      fix cmake dependence error (#37304) · 6653ac5e
      Committed by LiYuRio
      6653ac5e
  9. 18 Nov, 2021 4 commits
    • L
      fix bug to support dropout eval grad computing. (#37305) · c3d3001f
      Committed by Li Min
      * fix bug to support dropout eval grad computing.
      
      * Remove useless code.
      c3d3001f
    • Y
      [PTen]elementwise_sub kernel refactor (#37260) · 36a95654
      Committed by YuanRisheng
      * elementwise_add kernel refactor
      
      * fix compile bugs in elementwise_add refactor
      
      * fix compile bugs when run in npu/xpu
      
      * fix bugs when run unit test
      
      * fix bugs when run ci-windows
      
      * modify code as recommended
      
      * code format adjust
      
      * fix bugs when run ci
      
      * fix compile bug when run in ci-windows
      
      * elementwise_sub refactor
      
      * add PD_DLL_DECL for elementwise_sub
      
      * fix bugs when compiling
      36a95654
    • Z
      Add the `GetFetchNames` method in CinnGraphSymbolization. (#37218) · 3ad495e8
      Committed by Zhen Wang
      * Add the `GetFetchNames` method in CinnGraphSymbolization.
      
      * Use unordered_set instead of vector as the type of fetch_var_names.
      
      * Reuse the definition of kCompilationKey.
      
      * Use CompileOptions to set fetch_var_ids.
      
      * Update the argument passing of GraphCompiler.Build.
      
      * Fix some bugs in CinnGraphSymbolization::GetFetchIds.
      3ad495e8
    • Z
      Opt topk (#37256) · c4862d99
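      Committed by zhangkaihuo
      topk has two implementations: cub and a hand-written kernel. cub obtains the top-k by sorting, and measurements on several data sets show that it outperforms the hand-written kernel only when input_width >= 128 and k exceeds 75% of input_width.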
      c4862d99
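The selection rule described in this commit message can be sketched as a small predicate. This is an illustration only: the function name `prefer_cub_sort` is hypothetical, and whether the real kernel uses a strict or non-strict comparison at the 75% boundary is an assumption:

```python
def prefer_cub_sort(input_width, k):
    """Return True when the sort-based cub path is expected to be faster.

    Encodes the rule from the commit message: cub wins only when the row
    is at least 128 wide and k exceeds 75% of the row width.
    """
    return input_width >= 128 and k > 0.75 * input_width
```

In every other case the hand-written kernel would be chosen, for example for narrow rows or for a small `k` on a wide row.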
  10. 17 Nov, 2021 2 commits