1. 30 Dec, 2021 13 commits
    • Add cusparse and unittest (#38431) · 667dc9f0
      zhangkaihuo committed
      * Bind the cuSparse handle to DeviceContext to avoid creating and destroying it inside ops

      * Add wrappers for the cuSparse dense/sparse conversion APIs

      * Add unit tests for the wrapped APIs
    • [Fleet Executor] Support multi carrier (#38535) · 3658405c
      LiYuRio committed
    • Support test imperative basic with fixed retain grad interface (#38548) · 2421a25a
      Jiabin Yang committed
      * Rearranged Eager AutoCodeGen directory structure
      
      * Removed USE_OP in Eager AutoCodeGen
      
      * Enabled generation for Operators without Grad/Inputs/Outputs
      
      * Resolved operators without input
      
      * Fixed merge conflicts
      
      * Enabled Eager AutoCodeGen for 10+ more operators
      
      * Refactored Eager AutoCodeGen with more organized helper objects
      
      * Enabled Eager AutoCodeGen for operators with multiple OpBases
      
      * Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument
      
      * Handled Dispensable Inputs/Outputs in Eager AutoCodeGen
      
      * Adjusted function generation/call between Python-C API & Dygraph API
      
      * Synchronized auto-generated Python-C API with Dygraph Forward Functions
      
      * support more eager tensor api
      
      * fix merge compile error
      
      * fix compile error and fit develop code
      
      * support pure CPU
      
      * fix some logic error in eager_mode
      
      * support _varbase_creator in eager mode
      
      * Added safe_initialized interface to EagerTensor for use in processing dispensable inputs
      
      * for eager mode
      
      * refine
      
      * support multiple constructor for eager tensor
      
      * add place related code
      
      * polish code
      
      * specific randint with dtype of int64
      
      * Support pure cpu test
      
      * eager logic
      
      * refine test in pure cpu
      
      * eager logic
      
      * eager logic
      
      * eager logic, test=develop
      
      * skip core.eager when in inference, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * call RetainGrad after run forward kernel, test=develop
      
      * refine, test=develop
      
      * support dygraph util, meta, guard test
      
      * support inference test
      
      * refine test and fix initializer failed
      
      * support create varbase and fix retain grad error
      
      * fix windows error
      
      * support test_imperative_basic test in eager mode
      
      * remove additional log in variable.h
      
      * remove additional log in variable.h
      
      * remove additional code created in merge
      Co-authored-by: jim19930609 <jim19930609@gmail.com>
      Co-authored-by: Wang Huan <wanghuan29@baidu.com>
    • dynamic shape clone (#38520) · 339c34e6
      wenbin committed
      * dynamic shape clone supported
    • first commit (#38590) · ebc72ac2
      limingshu committed
    • [New-Exe] Fix word2vec hang problem using InterpreterCore (#38584) · e683ab50
      xiongkun committed
      * fix wait for tiexing
      
      * fix word2vec model. new_exe supports EOF Exception in ReadOp now
    • refine run_program_op_grad output var name (#38470) · 1c094d3e
      xiongkun committed
      * refine run_program_op_grad output var name
      
      * add default for global_block. for pass the eagle_generator_cmd
      
      * fix
      
      * ;
      
      * fix
      
      * const cast
      
      * mutable block
    • Added Conv2D BF16 BWD oneDNN kernel (#38507) · ed8ba011
      jakpiase committed
      * working test for padding only
      
      * added full conv2d grad kernel
      
      * removed some trash
      
      * minor change
      
      * CI fix
      
      * format fix
    • params file will not be a necessary file (#38579) · de26b88b
      JingZhuangzhuang committed
    • [PTen] Remove offset in storage (#38472) · a504ff3f
      Chen Weihang committed
      * remove offset in storage
      
      * revert api change
      
      * fix custom op slice bug
      
      * fix mutable_data error
    • Replace shared_ptr with unique_ptr in base_ptr_test (#38530) · 3f6229c6
      From00 committed
    • add dirichlet random sample op in cpu and gpu kernel (#38244) · c5bf09bb
      Xiaoxu Chen committed
      * add dirichlet sample op and cpu backend kernel
      
      * add Dirichlet op cuda kernel  (#6)
      
      * add dirichlet op hip kernel
      Co-authored-by: Feiyu Chan <chenfeiyu@baidu.com>
    • Fix the bug of batch_norm and batch_norm_grad op. (#38288) · cc83c95f
      Leo Guo committed
      * Fix the bug of batch_norm and batch_norm_grad op. Add the "roi_align" and "roi_align_grad" op in xpu2 op list.
      
      * Fix the bug of batch_norm and batch_norm_grad op. Add the "roi_align" and "roi_align_grad" op in xpu2 op list. test=kunlun
      Co-authored-by: Zibin <guozibin@baidu.com>
  2. 29 Dec, 2021 13 commits
  3. 28 Dec, 2021 12 commits
    • Support multi-output feature for elementwise (#38410) · 48f061fb
      limingshu committed
      * first commit
      
      * pass ctest of  elementwise_div_grad
    • Utilize StreamSafeCUDAAllocator to support fast GC in new executor (#37642) · 0c7153a4
      From00 committed
      * fix reshape move storage error
      
      * remove needless set type
      
      * alloc tensor by shared storage
      
      * Utilize StreamSafeCUDAAllocator to support fast GC in new executor
      
      * Fix compile error for Windows and ROCm
      
      * Fix compile error for Windows
      
      * Modify UT stream_safe_cuda_alloc_test
      
      * Modify UT stream_safe_cuda_alloc_test
      
      * Rewrite fast GC
      
      * Rewrite fast GC
      
      * Fix compile error for BOOST_GET_CONST
      
      * Fix compile error for BOOST_GET_CONST
      
      * Changes default stream for StreamSafeCUDAAllocator
      
      * Fix a small CI error
      
      * Remove some redundant code
      
      * Fix conflict
      
      * Fix compile error for ROCm
      
      * Fix Windows CI error
      
      * Fix CI error
      
      * Remove some unnecessary code
      
      * Fix CI error
      
      * Add UT for fast GC
      
      * Fix CI error
      
      * add device-agnostic stream class
      
      * add stream.h
      
      * fix ut
      
      * fix cpu compile
      
      * Use RWLock in GetAllocator
      
      * Fix CI error
      Co-authored-by: Chen Weihang <chenweihang@baidu.com>
      Co-authored-by: zhiqiu <chenqiuliang@baidu.com>
    • Support test basic of Var and Layer (#38426) · 1fb80a6a
      Jiabin Yang committed
      * Rearranged Eager AutoCodeGen directory structure
      
      * Removed USE_OP in Eager AutoCodeGen
      
      * Enabled generation for Operators without Grad/Inputs/Outputs
      
      * Resolved operators without input
      
      * Fixed merge conflicts
      
      * Enabled Eager AutoCodeGen for 10+ more operators
      
      * Refactored Eager AutoCodeGen with more organized helper objects
      
      * Enabled Eager AutoCodeGen for operators with multiple OpBases
      
      * Adjusted Eager AutoCodeGen to Enable Passing Output Tensor as Input Argument
      
      * Handled Dispensable Inputs/Outputs in Eager AutoCodeGen
      
      * Adjusted function generation/call between Python-C API & Dygraph API
      
      * Synchronized auto-generated Python-C API with Dygraph Forward Functions
      
      * support more eager tensor api
      
      * fix merge compile error
      
      * fix compile error and fit develop code
      
      * support pure CPU
      
      * fix some logic error in eager_mode
      
      * support _varbase_creator in eager mode
      
      * Added safe_initialized interface to EagerTensor for use in processing dispensable inputs
      
      * for eager mode
      
      * refine
      
      * support multiple constructor for eager tensor
      
      * add place related code
      
      * polish code
      
      * specific randint with dtype of int64
      
      * Support pure cpu test
      
      * eager logic
      
      * refine test in pure cpu
      
      * eager logic
      
      * eager logic
      
      * eager logic, test=develop
      
      * skip core.eager when in inference, test=develop
      
      * refine, test=develop
      
      * refine, test=develop
      
      * call RetainGrad after run forward kernel, test=develop
      
      * refine, test=develop
      
      * support dygraph util, meta, guard test
      
      * support inference test
      
      * refine test and fix initializer failed
      
      * support create varbase and fix retain grad error
      
      * fix windows error
      
      * support test code coverage
      
      * support test code coverage
      
      * support test code coverage
      Co-authored-by: jim19930609 <jim19930609@gmail.com>
      Co-authored-by: Wang Huan <wanghuan29@baidu.com>
    • refactor matmul directory in pten (#38227) · 982bf444
      zyfncg committed
      * refactor matmul directory in pten
      
      * fix merge conflict
    • Add API and op for take_along_axis (#38396) · 3310f519
      huangxu96 committed
      * add API and op for take_along_axis
      
      * fix compile dependency problem and add example code and doc
      
      * add unittest
      
      * delete some code for CI coverage
      
      * fix code style problem
      
      * fix as review
    • fix adamw epsilon in cuda kernel (#37746) · 6f1bb3d6
      Guoxia Wang committed
    • Add Amax and Amin API (#38417) · 340dfb26
      Tao Luo committed
      * add amax/amin
      
      * support axis given as a list
    • [pten] remove in_type arg in cast kernel (#38486) · 0637b9a6
      chentianyu03 committed
      * remove intype arg in cast kernel
      
      * modify conj config in api.yaml by dictionary order
      
      * rm unused code in cast_kernel.cu
    • add reduce_prod_xpu. fix reduce_mean_xpu bug. (#38481) · 78836bb7
      houj04 committed
      * add reduce_prod_xpu. fix reduce_mean_xpu bug.
      
      * add reduce_prod_xpu. fix reduce_mean_xpu bug. test=kunlun
    • [new-exec] add completion_nofifier (#38447) · 404a4a6a
      Leo Chen committed
      * add completion_nofifier
      
      * fix bug
      
      * unregister event waiter
    • add mul_lstm_fuse_pass ut (#37795) · 1db61c3e
      baoachun committed
      * add mul_lstm_fuse_pass ut
      
      * update mul_lstm_fuse_pass ut
      
      * update ut
      
      * update ut
      
      * update ut
      
      * add CPU ut cmake setting
      
      * update ut
    • L
      f9e8a775
  4. 27 Dec, 2021 2 commits