1. 18 Jan, 2023 5 commits
    • R
      [PHI] remove bitwise and, or, xor (#49916) · 9056cc8b
      RuohengMa committed
      * add reduce_sum_int64 and reduce_sum_int8 xpu kernels
      
      * [PHI] add clip grad kernel with support type float32 and int32
      
      * [PHI unittest] add clip_grad unit test
      
      * adapt code to clang-format
      
      * update xpu api output with clip_grad api
      
      * remove int8 support of reduce_sum xpu kernel since it can not pass unit tests
      
      * adapt license date, add code for XPUDataType conversion
      
      * add int8 support of reduce_sum
      
      * add reduce_sum unit tests for dtype int64, int8, and add more test cases
      
      * update license date
      
      * remove buggy bitwise and, or and xor xpu kernels, refine bitwise not xpu kernel
      
      * change license date
      9056cc8b
    • H
      [XPU] add logical_not op. (#49911) · 60d1199a
      houj04 committed
      60d1199a
    • W
      [0 Tensor support] support the 0d tensor for the cumsum (#49518) · 5fca45ea
      wawltor committed
      * Add the cumsum 0d tensor
      
      * xpu and cpu judge the 0d  tensor
      
      * change 2022 to 2023 in new commit
      
      * fix the reverse logic
      5fca45ea
    • Z
      [Zero-Dim] Fix bug in masked_select for XPU (#49904) · 1a8be158
      Zhang Zheng committed
      1a8be158
    • J
      use default XPU stream for computing (#49806) · f6b23d6d
      jameszhang committed
      * revert to use default XPU stream for computing
      
      XPUContext now has a null stream by default. If you want to use a separate stream
      (e.g. in async collective communication), you should create a dedicated XPUContext
      and invoke its XPUContext::CreateStream().
      
      * minor
      f6b23d6d
  2. 17 Jan, 2023 6 commits
    • Z
      Refine munmap freq for RefcountedMemoryMapAllocation (#49691) · 3fdc105f
      zhangbo9674 committed
      * refine munmap freq for ref_cnt_mmap_allocator
      
      * add shm reuse logic
      
      * fix compile bug
      
      * fix compile bug
      
      * fix bug of file refcount
      
      * fix compile bug
      
      * fix compile bug
      
      * refine code for delete shm case
      
      * polish code
      
      * refine shm cache pool size setting logic
      
      * set buffer is 2
      
      * refine shm cache size logic
      
      * refine max shm cache
      
      * refine shm cache size
      3fdc105f
    • Y
      [Zero-Dim] support input 0D Tensor for equal_all (#49845) · f287b1e9
      yeliang2258 committed
      * add zero dims test
      
      * update code
      
      * fix zero dims
      
      * update code
      f287b1e9
    • P
      support CUDA Graph for new executor (#49708) · 8e5ed04d
      pangyoki committed
      * new exe supports CUDA Graph
      
      * fix
      
      * fix
      
      * fix
      
      * fix FLAGS_use_stream_safe_cuda_allocator in unittest
      
      * insert output of coalesce_tensor op to skip_gc_var
      
      * fix
      8e5ed04d
    • Y
      [PHI]Change feed_op to phi kernel (#49116) · f7f1dc03
      YuanRisheng committed
      * change feed_op to phi kernel
      
      * fix ci bugs
      
      * fix build bugs
      
      * fix ci bugs
      
      * fix compile bugs
      
      * fix ci bugs
      
      * perfect code
      
      * perfect comment code
      
      * fix install bugs
      
      * modify code according comment
      
      * remove visitor in feed_op
      
      * modify according comment
      
      * perfect code according comment
      
      * add infershape
      
      * fix py3 bugs
      
      * fix getexpected kernel type
      
      * fix getexpected kernel type
      
      * fix ci bugs
      
      * add registry for custom device
      
      * fix py3 bugs
      
      * fix floating point error
      
      * fix py3 test bugs
      f7f1dc03
    • H
      SetDevice when parse TensorBase (#49860) · 4c576870
      HongyuJia committed
      4c576870
    • X
      【Prim】Add multiply,expand,div vjp rules (#49831) · 39c6765a
      Xiaoxu Chen committed
      * support elementwise base func
      
      * fix compiling error and add test
      
      * support vjp for div using comp
      
      * remove additional change
      
      * fix dy2st error with magic num
      
      * fix dy magic num
      
      * another magic
      
      * another magic
      
      * another magic
      
      * add skip rename strategy
      
      * support add vjp
      
      * support add with new axis cal
      
      * support sub vjp
      
      * [prim] add multiply vjp rules
      
      * [prim] add multiply vjp rules
      
      * [prim] fix no infershape with composite in _append_backward_ops
      
      * [prim] add expand vjp rule
      
      * [prim] add exp vjp rule
      
      * uncomment infer shape for reshape/sum static prim api
      
      * [prim] fix tanh nullptr error
      
      * remove some print message
      
      * fix magic number in run_program relative tests @JiaBinYang
      
      * [prim] add expand,multiply,exp vjp rules
      
      * fix only support single direction reduce error
      
      * infer reduce dims using out dims
      Co-authored-by: JiabinYang <360788950@qq.com>
      39c6765a
  3. 16 Jan, 2023 7 commits
  4. 13 Jan, 2023 16 commits
  5. 12 Jan, 2023 6 commits