1. 07 Jul, 2021 3 commits
  2. 06 Jul, 2021 3 commits
    • Z
      Add gpu implementation of shuffle_batch_op (#33938) · c6b6ba1f
      Zeng Jinle committed
      * add gpu implementation of shuffle batch
      test=develop
      
      * add thrust cuda patches
      test=develop
      
      * fix macro guard
      
      * fix shuffle batch compile on windows/hip
      
      * fix hip compilation error
      
      * refine CMakeLists.txt
      
      * fix windows compile error
      
      * try to fix windows CI compilation error
      
      * fix windows compilation again
      
      * fix shuffle_batch op test on Windows
      c6b6ba1f
    • X
      Enhance error message for interpolate_v2 (#33941) · f2068eec
      xiaoting committed
      * fix interpolate for shape[i]=0, test=develop
      
      * fix test_trilinear_interp_v2 random failure, test=develop
      f2068eec
    • D
      【HETERPS】pipeline adaptive for heterps (#33159) · bfef7feb
      danleifeng committed
      * pipeline adaptive for heterps;test=develop
      * fix finalize hang;test=develop
      * add is_compiled_with_heterps for dataset;test=develop
      * fix hashtable core dump when ins_num=0 is passed;test=develop
      bfef7feb
  3. 05 Jul, 2021 5 commits
  4. 04 Jul, 2021 1 commit
  5. 02 Jul, 2021 1 commit
  6. 01 Jul, 2021 4 commits
  7. 30 Jun, 2021 2 commits
    • J
      Added matmul_v2 BF16/FP32 FWD kernel (#33750) · 24783c84
      jakpiase committed
      * added matmul_v2 bf16/fp32 FWD kernel
      
      added matmul_v2 bf16/fp32 FWD kernel
      
      * added formatting
      
      * removed some tests due to timeout in CI
      
      * refactored tests
      
      * merged tests classes into one file
      
      * minor change
      
      * removed test guard for CUDA
      
      * remove skipIf
      
      * changes after review
      
      * formatted one file
      
      * minor change
      
      * added skipping UT in CUDA place
      24783c84
    • H
      [NPU] support set_device (#33815) · 8225a6a1
      houj04 committed
      * support set_device for NPU.
      
      * minor update doc and add more unit test.
      8225a6a1
  8. 28 Jun, 2021 3 commits
  9. 25 Jun, 2021 1 commit
  10. 24 Jun, 2021 7 commits
  11. 23 Jun, 2021 4 commits
  12. 22 Jun, 2021 3 commits
    • Z
      [API/OP]Add a new API paddle.diagonal (#33586) · ad106290
      zhangbo9674 committed
      * new api diagonal, test=develop
      
      * add new api diagonal, test=develop
      
      * new api diagonal, test=develop
      
      * add new api paddle.diagonal, test=develop
      
      * use framework::stride to replace ComputeDimStride
      
      * replace cudaMalloc/cudaMemcpy with TensorFromVector in cudaKernel and cudaGradKernel
      
      * perfect function: when attr(offset) exceeds attr(axis1) or attr(axis2), set the diagonal dim to 0
      
      * fix RP-Mac-CI bug: replace framework::stride() with ComputeDimStride.
      
      * perfect code-block
      
      * perfect code of python API diagonal
      
      * api supports dtype of float16 and bool
      
      * api supports dtype of float16 and bool
      
      * modify unittest code
      
      * modify unittest code
      
      * perfect dtype description
      
      * perfect code-block
      ad106290
    • Z
      8a5bbae6
    • C
      transform complex scale to tensor (#33699) · 5db0c84b
      chentianyu03 committed
      * transform complex scale to tensor
      
      * add test_case for complex scalar
      
      * modify import paddle
      5db0c84b
  13. 21 Jun, 2021 3 commits
    • L
      Add AXPY oneDNN handler (#33632) · 773aabc7
      lidanqing committed
      * Add oneDNN AXPY handler.
      
      * Add fallback for small tensors.
      
      * Fix ifdefs
      
      * Remove unnecessary namespace prefixes and add missing headers.
      
      * Guard handler_axpy with proper ifdefs.
      
      * Compilation of this function is possible only when Paddle is built
      with neither CUDA nor HIP.
      
      * Move AXPY handler code to separate files.
      
      * Use oneDNN AXPY handler in SGD op.
      
      * Use axpy handler only when Paddle is built with oneDNN.
      
      * Add test for SUM BF16 with big rows.
      
      * Fix SFINAE rules for elementwise_add_to.
      
      * Add test case for SGD with big rows.
      
      * update
      
      * update
      Co-authored-by: Adam Osewski <adam.osewski@intel.com>
      773aabc7
    • Y
      e0e0c0fa
    • P
      [NPU] optimize mul op by implementing it with BatchMatMul (#33616) · f91dfe15
      pangyoki committed
      * use BatchMatMul
      
      * replace TensorCopy with ShareDataWith
      
      * remove check fp16 grad
      
      * fix format
      
      * add grad_check
      
      * fix grad check
      f91dfe15