1. 10 May, 2023 · 5 commits
    • [cherry-pick] Fix the index calculation in cross_entropy_kernel. (#53659) (#53666) · 1ab562ca
      Yiqun Liu committed
      cherry-pick #53659
    • [Cherry-Pick] Fix bug in log_softmax kernel when lastdim is larger than 100000 (#53657) · a7cad386
      Zhang Zheng committed
      Fix bug in log_softmax kernel when lastdim is larger than 100000
      
      There is an unexpected log in the calculation
      
      Cherry-Pick: #53654
    • revert argsort to fix OOM bug (#53647) · 6707142a
      Qi Shao committed
      Revert argsort to the version without full sort algorithm implemented
    • [cherry-pick 2.5] Broadcast && Dropout_nd Performance Optimization into Release/2.5 (#53623) · f9ea2301
      Bo Zhang committed
      * Support different dtypes of inputs for broadcast for dropout optimization (#52093)
      
      * change judgement for DropoutGradGPUKernelDriver
      
      * add UnrollerWithoutVecSize and after this Loaddata to be refined
      
      * pass unittest
      
      * use same unroller with XPU
      
      * BroadcastWithInt64Index
      
      * BroadcastDataLoader template partial specialization
      
      * fix compile errors in ROCm
      
      * PR comment
      
      * dropout_nd_optimization (#51479)
      
      * with printf
      
      * add DropOutNdForwardKernel
      
      * PR comment
      
      * Dropout optimize & clean broadcast inT and ElementwiseType (#52969)
      
      * change judgement for DropoutGradGPUKernelDriver
      
      * add UnrollerWithoutVecSize and after this Loaddata to be refined
      
      * pass unittest
      
      * use same unroller with XPU
      
      * BroadcastWithInt64Index
      
      * BroadcastDataLoader template partial specialization
      
      * fix compile errors in ROCm
      
      * clean ElementwiseT and InT for BroadcastKernel
      
      * default axis and clean inT
      
      * remove redundant fast divmod computation
      
      * optimize drop_nd & drop_nd_grad
      
      * optimize BroadcastDataLoader bf16 fp16
      
      * rm InT etc. after merge develop
      
      * delete constexpr for windows ci
      
      * fix conflict
      
      * fix conflict with develop
      
      * fix conflict
      
      * new clean
      
      * clean
      
      * Fix xpu2 kp compile error (#53548)
      
      * fix conflict
      
      * conflict
    • [Zero-Dim] add 0D Tensor UT case for XPU (#53611) · 3a247cba
      zhouweiwei2014 committed
  2. 09 May, 2023 · 4 commits
    • Cherry pick fused linear (#53621) · f21b6f08
      limingshu committed
      Cherry pick fused linear
    • [cherry-pick] Op test add complex support (#53604) · c8504d86
      GGBond8488 committed
      * add complex support for op test
      
      * add complex grad test
      
      * append one
      
      * move some debug info
      
      * move some debug info
      
      * move some debug info
      
      * move some debug info
      
      * add more complex test
      
      * Fix naming ambiguity
      
      * Revert "add more complex test"
      
      This reverts commit dbcb0516b8e53ba42e2d6089878a39b395345969.
      
      * change backward gradient, add TODO
    • [cherry-pick 2.5][Zero-Dim] support paddle.sum/mean/loss api output 0D (#53601) · b6e23774
      zhouweiwei2014 committed
      * [Zero-Dim] make functools.reduce safer with an initial value, to support empty lists (#53182)
      
      * [Zero-Dim] support 0d tensor for shape and squeeze onednn kernel (#52832)
      
      * support 0d tensor for shape and squeeze onednn kernel
      
      * set python api for shape op ut
      
      * [Zero-Dim] distributed scatter/all_to_all support input 0D tensor (#53186)
      
      * [Zero-Dim] Support paddle.sum/mean/loss api output 0D,test=allcase (#52739)
      
      * [CINN Support 0D-Tensor] CINN supports 0D-Tensor with trick temporarily (#53382)
      
      * [CINN Support 0D-Tensor] CINN supports 0D-Tensor with trick temporarily
      
      * Add unittest
      
      * [CINN Support 0D-Tensor] CINN hack squeeze2 with trick temporarily (#53454)
      
      * fix test_autograd_dynamic (#53473)
      Co-authored-by: zhwesky2010 <zhouwei25@baidu.com>
      
      ---------
      Co-authored-by: YangQun <qun.yang@intel.com>
      Co-authored-by: HongyuJia <jiahongyu@baidu.com>
      Co-authored-by: HydrogenSulfate <490868991@qq.com>
    • [Cherry-pick] zero-dim: support 0-D for getitem/setitem (#53441) · 767e7b3f
      JYChen committed
      * support 0-D output and 0-D as index in __getitem__
      
      * fix tests
      
      * fix inference and UT
      
      * add unittest for setitem
      
      * fix xpu test
      
      * fix xpu 0-d
      
      * fix right value is 0d and index is List/Tensor
      
      * Hack __getitem__ from 0-d to 1-d with FLAGS_set_to_1d
      
      * change PHI_DECLARE_xxx to DECLARE_xxx since the change not merged to 2.5
      
      * hack 1-D tensor to Scalar
      
      * throw warning at __getitem__, not slice_utils
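  3. 08 May, 2023 · 3 commits
    • [Cherry-Pick] Fix the calculation of y_grad in divide_backward (#53584) · e63fb1e6
      Zhang Zheng committed
      Cherry-Pick: #53582
      Change: for the division out = x / y, the backward formula for y is changed from dy = -dout * out / y to dy = -dout * ((x / y) / y).
      Reason: using the forward result as an input to the backward pass already carries some precision loss from the cast at low precision, so recomputing x / y gives a more accurate result.
      Impact: this change makes the result more accurate, with a negligible effect on performance.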
    • Cherry-pick #53432 and #53556 (#53576) · 6583c390
      Yiqun Liu committed
      * Add fused_gate_attention API. (#53432)
      * Add PADDLE_THROW in take_along_axis kernel when the datatype of index is wrong. (#53556)
    • [Cherry-pick]Cherry pick 0d output (#53538) · 2d02b0c1
      GGBond8488 committed
      * add 0D output support for linalg.slogdet,test=allcase
      
      * fix zero-dim test error test=allcase
      
      * fix test error test=allcase
      
      * add static backward test, test=allcase
      
      * support_0D_output_for_matrix_rank_multi_dot, test=allcase
      
      * add 0D output test for matrix_rank and multi_dot test=allcase
      
      * fix assert error, test=allcase
      
      * fix test error, test=allcase
      
      * fix other test error, test=allcase
      
      * fix other test error, test=allcase
      
      * fix test error, test=allcase
      
      * fix matrix_rank and multi dot test err test=allcase
      
      * fix test error test=allcase
      
      * fix test zero dim test, test=allcase
      
      * add static backward test for multi_dot, test=allcase
      
      * add tol 2d broadcast test case, test=allcase
      
      * fix test error test=allcase
      
      * fix test error test=allcase
      
      * test=allcase
      
      * support_0d_output_for_linalg.norm
      
      * fix test error test=allcase
      
      * fix 0D test
      
      * fix test error test=allcase
      
      * fix test error test=allcase
      
      * fix tests, test=allcase
      
      * fix error, test=allcase
      
      * fix errors, test=allcase
      
      * add static backward , test=allcase
      
      * add static backward test, test=allcase
      
      * slogdet_support_0D_output
      
      * add new case
      
      * fix tests, test=allcase
      
      * cherry-pick
      
      * cherry-pick
      
      * fix trace gpu kernel 0d error, test=allcase
      
      * fix windows error, test=allcase
      
      * add matrixrank cherry-pick
  4. 06 May, 2023 · 2 commits
  5. 27 Apr, 2023 · 2 commits
    • [cherry-pick2.5] [Zero-Dim] Support... · b6996598
      zhouweiwei2014 committed
      [cherry-pick2.5] [Zero-Dim] Support all/any/min/max/prod/logsumexp/amax/amin/some loss output 0D (#53192)
      
    • [Cherry-Pick]Support output 0D for... · f84ac449
      wangfengsheng1999 committed
      [Cherry-Pick]Support output 0D for is_empty/as_complex/inner/dot/rank/tensordot/squeeze_/static.accuracy/static.auc/metric.accuracy (#53199)
      
      * support output 0D for is_empty/as_complex/inner/dot/rank/tensordot/squeeze_/static.accuracy/static.auc/metric.accuracy
      
      * test_dot_py
      
      * test_dot_py
  6. 25 Apr, 2023 · 2 commits
  7. 24 Apr, 2023 · 1 commit
  8. 23 Apr, 2023 · 1 commit
    • Cherry pick getitem/setitem 0d (#53125) · a79c04f3
      JYChen committed
      * support 0-D output and 0-D as index in __getitem__
      
      * fix tests
      
      * fix inference and UT
      
      * add unittest for setitem
      
      * fix xpu test
      
      * fix xpu 0-d
  9. 21 Apr, 2023 · 1 commit
    • Cherry pick fix set value cpu (#53127) · e4178284
      JYChen committed
      * fix the set_value error in cpu
      
      * add a unittest for set_value OP
      
      * fix platform::is_gpu_place
      
      * add todo note for set_value
      
      * fix test
  10. 20 Apr, 2023 · 1 commit
    • [Cherry-Pick]Support 0D for slogdet (#53087) · 3f5058e6
      GGBond8488 committed
      * add 0D output support for linalg.slogdet,test=allcase
      
      * fix zero-dim test error test=allcase
      
      * fix test error test=allcase
      
      * add static backward test, test=allcase
  11. 19 Apr, 2023 · 1 commit
  12. 17 Apr, 2023 · 7 commits
  13. 14 Apr, 2023 · 10 commits
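The divide_backward change in e63fb1e6 (#53584) rests on the identity that for out = x / y the partial derivative with respect to y is -x / y^2. That means dy = -dout * out / y and dy = -dout * ((x / y) / y) agree in exact arithmetic and can only differ once `out` has been rounded to a low-precision type. The scalar Python sketch below illustrates that effect and is not Paddle's kernel. It assumes, hypothetically, fp16 inputs whose arithmetic runs at higher precision with only the forward result stored back as fp16, and it models that cast with the `struct` module's half-precision `"e"` format. The helper names `to_fp16`, `dy_from_output` and `dy_recomputed` are illustrative only.

```python
import struct


def to_fp16(v: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", v))[0]


def dy_from_output(dout: float, out: float, y: float) -> float:
    # Previous formula: dy = -dout * out / y, reusing the stored forward result.
    return -dout * out / y


def dy_recomputed(dout: float, x: float, y: float) -> float:
    # Updated formula: dy = -dout * ((x / y) / y), recomputing x / y.
    return -dout * ((x / y) / y)


# Assumption: fp16 inputs, arithmetic done in double precision, and only the
# forward output `out` is cast back to fp16 before the backward pass.
x, y, dout = to_fp16(1.0), to_fp16(3.0), 1.0
out = to_fp16(x / y)              # forward result after the cast to fp16

reference = -dout * x / (y * y)   # -dout * x / y**2, evaluated in double precision
old = dy_from_output(dout, out, y)
new = dy_recomputed(dout, x, y)

print(f"reference = {reference:.12f}")
print(f"old       = {old:.12f}  error = {abs(old - reference):.2e}")
print(f"new       = {new:.12f}  error = {abs(new - reference):.2e}")
```

With x = 1 and y = 3, the fp16-rounded output 0.333251953125 leaves the old formula off by roughly 2.7e-5, while the recomputed form matches the double-precision reference to within rounding of the last bit.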