1. 06 Feb, 2022 1 commit
  2. 30 Jan, 2022 2 commits
  3. 29 Jan, 2022 2 commits
    • Add xpu2 compiler (#37254) · 92da5055
      Liu-xiandong committed
      * Add XPU compiler for paddle, test=develop
      
      * clean code
      
      * clean useless code
      
      * clean useless code
      
      * clean useless code
      
      * test
      
      * add include path
      
      * use clang compiler
      
      * xpu2.cmake
      
      * XPU2 compiler passed
      
      * update
      
      * update after pten
      
      * combination the WITH_XPU and WITH_XPU2
      
      * update the fuse operation in WITH_XPU and WITH_XPU2
      
      * update
      
      * update
      
      * update
      
      * fix the merge error
      
      * update
      
      * update the code
      
      * update the code
      
      * add run_kp_kernel flag
      
      * update
      
      * update
      
      * fix prepared type_ bug
      
      * clean and update the code
      
      * reset the kernel_primitives
      
      * update
      
      * clean the code
      
      * delete useless comment
      
      * fix the bug in WITH_XPU
      
      * update
      
      * update
      
      * modify the abi
      
      * delete some useless code
      
      * Parameter automation in xpu compilation
      
      * Parameter automation in xpu compilation
      
      * delete kps in cmake
      
      * delete useless comment
      
      * clean the code
      
      * clean the code
    • fix kunlun2 softmax unitest bug (#39274) · 23bb2836
      QingshuChen committed
      * fix kunlun2 softmax unitest bug
      *test=kunlun
      
      * minor
  4. 28 Jan, 2022 2 commits
  5. 27 Jan, 2022 4 commits
    • [PTen]Support AllocateFrom in Tensor and Alloc/HostAlloc in Context (#39022) · 5631da9c
      Aurelius84 committed
      * Support allocate_from in Tensor and allocate_data in Context
      
      * fix #ifdef CUDA
      
      * fix cycle depends
      
      * fix test_xxx_dev_api failed
      
      * fix windows compiling error
      
      * fix unittest
      
      * modify into PImpl
      
      * fix selected rows
      
      * add TODO comment
      
      * refine interface according reviewer
    • [MLU] add compile ci scripts for MLU, test=mlu_ci (#39122) · 56410b4a
      Qi Li committed
    • [PluggableDevice] Add custom kernel support based on pten kernel management (#38848) · a8879215
      Aganlengzi committed
      * [Demo] custom kernel based on pten kernel
      
      * merge and npu custom work well
      
      * del comments
      
      * delete other code
      
      * fix CUDAContext
      
      * fix not found small_vector.h
      
      * support NPU
      
      * fix NPUContext
      
      * fix DeviceContext support
      
      * add UT
      
      * fix call
      
      * add UT
      
      * fix
      
      * fix for comments and ut
      
      * add MACRO control
      
      * fix multi input output
      
      * support env CUSTOM_DEVICE_ROOT
      
      * deal with special cases
      
      * fix for Windows
      
      * try coverage with test_custom_kernel_dot.py
      
      * fix test_custom_kernel_dot
      
      * fix test_custom_kernel_dot
      
      * fix merge
      
      * fix merge
      
      * fix CI
      
      * update
      
      * merge and fix
      
      * remove WITH_CUSTOM_KERNEL
      
      * fix merge
      
      * merge and fix
      
      * fix ut
      
      * fix ut for mac
      
      * add more UT
      
      * add more UT
      
      * fix
    • [NPU] fix aarch64 deps (#39257) · 80dfa010
      Aganlengzi committed
  6. 26 Jan, 2022 4 commits
  7. 25 Jan, 2022 5 commits
  8. 24 Jan, 2022 3 commits
  9. 21 Jan, 2022 4 commits
  10. 20 Jan, 2022 2 commits
  11. 19 Jan, 2022 1 commit
  12. 18 Jan, 2022 3 commits
  13. 17 Jan, 2022 4 commits
  14. 15 Jan, 2022 1 commit
  15. 14 Jan, 2022 1 commit
  16. 13 Jan, 2022 1 commit
    • Added mul BF16/FP32 FWD/BWD oneDNN kernel (#38552) · fc6eed5b
      jakpiase committed
      * base changes for mul reimplementation
      
      * empty commit
      
      * tmp save
      
      * full implementation of mul bf16/fp32 fwd bwd
      
      * CI fix
      
      * CI rerun
      
      * changed unity build cmake to avoid gpu issues
      
      * removed mul mkldnn from unity build
      
      * added skipping tests if not cpu_bf16
      
      * CI fix
      
      * CI fix
      
      * CI fix