1. 01 Feb, 2023 7 commits
    • support grid_sampler_grad op for XPU (#49857) · 520f48d6
      zhangyikun02 committed
    • [Divide by 0 Error] add lu check (#49974) · f71796b6
      gouzil committed
      * [Divide by 0 Error] add lu check
      
      * [Divide by 0 Error] lu check migrate to c++
    • [Divide by 0 Error] add eig check (#49971) · 226a6567
      gouzil committed
      * [Divide by 0 Error] add eig check
      
      * [Divide by 0 Error] eig check migrate to c++
      
      * [Divide by 0 Error] Fix class name error
    • [Divide by 0 Error] add norm check (#49966) · 5dfddaea
      gouzil committed
      * [Divide by 0 Error] add norm check
      
      * [Divide by 0 Error] fix x AttributeError
      
      * [Divide by 0 Error] norm check migrate to c++
    • Combination of multiple paddle::memory::allocate operation into one for ops (#49126) · bdae5481
      limingshu committed
      * A leap of try for cudaLaunchCooperativeKernel
      
      * fix bugs
      
      * Totally replace the lar cuda kernel
      
      * Fix bugs
      
      * fix code according to comments
      
      * fix codes according to review comments
      
      * adding some function overload
      
      * relocate the power operation.
      
      * add bf16 support for index select relevant ops
      
      * revert bf16 type change.
      
      * add changes for more op
      
      * fix code writing bugs
    • Fix UFA illegal address access of case4: paddle.unbind (#49995) · 9ce8cfcf
      RedContritio committed
      * add axis check for unbind
      
      * add axis range check for unbind
      
      * update unittest and axis validation for unbind
      
      * add unittest invalid axis for unbind
      
      * restore axis extract for unbind
    • H2D data transfer optimization for split kernel (#49086) · 057ba778
      limingshu committed
      * profile reduce kernel for fp16 and reduceHigherdim
      
      * use reinterpret_cast
      
      * fix for CI on ROCm
      
      * add Macro for ROCm
      
      * ROCm CI config
      
      * ROCm CI config
      
      * unit test repair
      
      * pull
      
      * add common_funcs.h
      
      * reduceType
      
      * Update reduce_function.h
      
      * not higher
      
      * rename
      
      * implement of matmul using cublasLt instead of cublas
      
      * cublasLt bugfix
      
      * Update matmul_kernel_impl.h
      
      * Update matmul_kernel_impl_via_blasLt.h
      
      * for-loop-algo
      
      * PR comments changes
      
      * add macro
      
      * ci unused variable isCublasLt
      
      * ci unused variable isCublasLt macro
      
      * split matmul to autotune
      
      * rewrite the split kernel with segmented_array
      
      * rewrite the split kernel with segmented_array
      
      * rewrite the split kernel with segmented_array
      
      * add some method for cuda_graph
      
      * fix bugs for rocm
      
      * change for ci-error
      
      * I don't know why ci-model-benchmark reports an error, so I restored the original code to see whether it works.
      
      * add some changes for passing mode_benchmark and coverage ci
      
      * fix ci error
      
      * fix ci-rocm error
      
      * add some changes for header
      
      ---------
      Co-authored-by: zhangbopd <1299246947@qq.com>
      Co-authored-by: Bo Zhang <105368690+zhangbopd@users.noreply.github.com>
  2. 31 Jan, 2023 19 commits
  3. 30 Jan, 2023 5 commits
  4. 25 Jan, 2023 1 commit
  5. 20 Jan, 2023 3 commits
  6. 19 Jan, 2023 2 commits
    • Fix paddle.squeeze_ bug (#49903) · 11e34ae0
      heliqi committed
      * fix squeeze_ bug
      
      * fix solve use squeeze_kernel
      
      * fix solve use squeeze_kernel
      
      * fix solve use squeeze_kernel
      
      * add test case
    • [KUNLUN] add op: maxpool_with_index (#49505) · f71f77e9
      jameszhang committed
      * [KUNLUN] add op: maxpool_with_index
      
      * use DeviceContext::Alloc() instead of DenseTensor::mutable_data()
      
      * fix file format
      
      * solve clip unittest failure
      
      * minor fix
      
      * Revert "solve clip unittest failure" since the issue is fixed
      in #49535
      
      This reverts commit 1127adc66e79afe35ac3c00bb34e6aaa7cd7d78b.
      
      * align with xdnn on the definition of mask in max_pool_with_index
      
      * minor
  7. 18 Jan, 2023 3 commits