- 19 May, 2023 1 commit
-
-
Committed by Zhang Jun
* remove kSPARSE_WEIGHTS
* remove kFASTER_DYNAMIC_SHAPES_0805 and add 'TrtMajorVersion' function
-
- 18 May, 2023 1 commit
-
-
Committed by zhouweiwei2014
* [Zero-Dim] update 0d tensor api en doc, test=document_fix
* [BUG] fix windows kernel dispatch of _lzcnt bug (#53728)
-
- 16 May, 2023 3 commits
-
-
Committed by Yiqun Liu
[AMP] Allow switching whether the promote strategy is used to choose kernels for O2 training. (#53742) (#53841)
Pcard-70458
cherry-pick #53742
Chinese documentation: PaddlePaddle/docs#5882
-
Committed by YuanRisheng
* delete log
* filter some kernel signatures
-
Committed by shaojie_wang
Pcard-70458 cherry-pick: #53770
-
- 15 May, 2023 1 commit
-
-
Committed by Zhang Ting
Pcard-70458 cherry-pick #53712
-
- 13 May, 2023 1 commit
-
-
Committed by Zhang Jun
* scale, square, sum, swish trt op converter support zero dim (#53660)
* [Paddle-Inference] Support trt 0dims of expand_as_v2 and mish. (#53627)
* support_expand_mish
* add unit test for reshape 0 dims (#53685)
* Add trt pow converter. (#53462)
* Add trt pow converter.
* update to use AddConstantLayer
* add dims=0 ut
* [inference Zero-Dim] add equal, elementwise_op trt 0d (#53704)
* [inference Zero-Dim] prelu trt converter support zero dim tensor (#53634)
* prelu op trt converter support zero dim
* [Inference Zero-Dim] Support trt 0dim of gelu, hard_swish, hard_sigmoid and leaky_relu (#53714)
* support_act
* delete_silu
* [inference zero dim] softmax, stack op trt converter support zero dim (#53729)
* softmax support
* support stack
* remove unused code
* update
---------
Co-authored-by: Yuanle Liu <yuanlehome@163.com>
Co-authored-by: xiaoxiaohehe001 <49090790+xiaoxiaohehe001@users.noreply.github.com>
Co-authored-by: zhoutianzi666 <39978853+zhoutianzi666@users.noreply.github.com>
Co-authored-by: Wilber <jiweibo@baidu.com>
-
- 12 May, 2023 3 commits
-
-
Committed by 傅剑寒
This PR fixes a docs error in index_put; the related dev PR is #53727.
-
Committed by HydrogenSulfate
-
Committed by 傅剑寒
This PR adds data_type for selecting which argument's data type is used to instantiate the template type T for the index_put kernel.
Related PR: #53652
-
- 11 May, 2023 5 commits
-
-
Committed by lijialin03
-
Committed by WangZhen
* Fix div error when dtype is int64 in static mode
* Fix out dtype
-
Committed by limingshu
Fix static_assert bug in Windows CUDA 11.6 compilation. This may be a bug in MSVC.
-
Committed by feifei-111
[cherry-pick][BugFix] fix err of api `to_tensor`, which was caused by a numpy version update (#53534) (#53624)
* [BugFix] fix err of api `to_tensor`, which was caused by a numpy version update (#53534)
* fix
* update code
* pre-commit
* remove scale check (0-D tensor is usable)
* fix data dtype err
* fix numpy default dtype diff
* fix data dtype
* update
* fix coverage
* fix old test which is not correct when 0-D tensor is usable
-
Committed by JYChen
* up warning level
* numpy still vlog-0
-
- 10 May, 2023 9 commits
-
-
Committed by 傅剑寒
This PR adds the index_put API to Paddle.
-
Committed by Yiqun Liu
cherry-pick #53659
-
Committed by Zhang Zheng
Fix bug in log_softmax kernel when lastdim is larger than 100000.
There is an unexpected log in the calculation.
Cherry-Pick: #53654
-
Committed by RedContritio
-
Committed by Qi Shao
Revert argsort to the version without full sort algorithm implemented
-
Committed by Bo Zhang
* Support different dtypes of inputs for broadcast for dropout optimization (#52093)
* change judgement for DropoutGradGPUKernelDriver
* add UnrollerWithoutVecSize and after this Loaddata to be refined
* pass unittest
* use same unroller with XPU
* BroadcastWithInt64Index
* BroadcastDataLoader template partial specialization
* fix compile errs in ROCm
* PR comment
* dropout_nd_optimization (#51479)
* with printf
* add DropOutNdForwardKernel
* PR comment
* Dropout optimize & clean broadcast inT and ElementwiseType (#52969)
* change judgement for DropoutGradGPUKernelDriver
* add UnrollerWithoutVecSize and after this Loaddata to be refined
* pass unittest
* use same unroller with XPU
* BroadcastWithInt64Index
* BroadcastDataLoader template partial specialization
* fix compile errs in ROCm
* clean ElementwiseT and InT for BroadcastKernel
* default axis and clean inT
* remove redundant fast divmod computation
* optimize drop_nd & drop_nd_grad
* optimize BroadcastDataLoader bf16 fp16
* rm InT etc. after merge develop
* delete constexpr for windows ci
* fix conflict
* fix conflict with develop
* fix conflict
* new clean
* clean
* Fix xpu2 kp compile error (#53548)
* fix conflict
* conflict
-
Committed by zqw_1997
[Cherry-pick 2.5][Zero-Dim] paddle.static.data, squeeze, unbind, unstack, gather_nd and einsum support 0D (#53602)
* add test cases, test=allcase
* fix test cases, test=allcase
* assert_allclose, test=allcase
* 1e-5 to 1e-4, test=allcase
* change rtol from 1e-4 to 1e-3, test=allcase
* test=allcase
* fix test cases, test=allcase
* modify the test_squeeze to not use Tensor type axis, test=allcase
* add grad check for unbind and unstack, test=allcase
* check for squeeze axis tensor type, test=allcase
* fix bug, test=allcase
* test=allcase
-
Committed by zhouweiwei2014
-
Committed by GGBond8488
-
- 09 May, 2023 11 commits
-
-
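Committed by niuliling123
fix static promote
Move the operators that were placed in unsupported_list because of performance problems into the black list, so that in O2 mode weights stay in fp32 in only 3 scenarios.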
-
Committed by zhangkaihuo
cherry-pick #53430
-
Committed by zqw_1997
* fix doc errors, test=allcase
* conflict
* test=allcase
* fix doc errors, test=allcase
* fix the to_tensor error
-
Committed by zhouweiwei2014
-
Committed by limingshu
Cherry pick fused linear
-
Committed by Zhang Jun
* [inference][trt] trt support 0 dims (#53383)
* trt support 0 dim
* update activation ut
* fix trt Unary operation not supporting 0d when TRT < 8.6
* Update op_teller.cc
* update unary ut
* add rsqrt to unary_list
* move rsqrt to act_list
-
Committed by GGBond8488
* add complex support for optest
* add complex grad test
* append one
* move some debug info
* add more complex test
* Fix naming ambiguity
* Revert "add more complex test"
  This reverts commit dbcb0516b8e53ba42e2d6089878a39b395345969.
* change backward gradient, add TODO
-
Committed by zhouweiwei2014
* [Zero-Dim] make functools.reduce safer with an initial value, to support empty lists (#53182)
* [Zero-Dim] support 0d tensor for shape and squeeze onednn kernel (#52832)
* support 0d tensor for shape and squeeze onednn kernel
* set python api for shape op ut
* [Zero-Dim] distributed scatter/all_to_all support input 0D tensor (#53186)
* [Zero-Dim] Support paddle.sum/mean/loss api output 0D, test=allcase (#52739)
* [CINN Support 0D-Tensor] CINN supports 0D-Tensor with trick temporarily (#53382)
* [CINN Support 0D-Tensor] CINN supports 0D-Tensor with trick temporarily
* Add unittest
* [CINN Support 0D-Tensor] CINN hack squeeze2 with trick temporarily (#53454)
* fix test_autograd_dynamic (#53473)
Co-authored-by: zhwesky2010 <zhouwei25@baidu.com>
---------
Co-authored-by: YangQun <qun.yang@intel.com>
Co-authored-by: HongyuJia <jiahongyu@baidu.com>
Co-authored-by: HydrogenSulfate <490868991@qq.com>
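As a rough illustration of why the initial value matters for the reduce fix (#53182) listed above: a 0-D tensor has the empty shape `[]`, and Python's `functools.reduce` raises `TypeError` on an empty sequence unless an initial value is supplied. The sketch below is not taken from the Paddle source; `numel` and `numel_unsafe` are hypothetical helper names used only for illustration.

```python
from functools import reduce
import operator


def numel(shape):
    """Hypothetical helper: element count implied by a shape list."""
    # The initial value 1 makes the empty shape [] (a 0-D tensor) yield 1.
    return reduce(operator.mul, shape, 1)


def numel_unsafe(shape):
    """Same product without an initial value; raises TypeError on []."""
    return reduce(operator.mul, shape)
```

With the initial value, `numel([])` evaluates to 1, which matches the single element a 0-D tensor holds, while `numel_unsafe([])` fails before any multiplication happens.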
-
Committed by jzhang533
-
Committed by JYChen
* support 0-D output and 0-D as index in __getitem__
* fix tests
* fix inference and UT
* add unittest for setitem
* fix xpu test
* fix xpu 0-d
* fix the case where the right value is 0d and the index is List/Tensor
* Hack __getitem__ from 0-d to 1-d with FLAGS_set_to_1d
* change PHI_DECLARE_xxx to DECLARE_xxx since the change is not merged to 2.5
* hack 1-D tensor to Scalar
* throw warning at __getitem__, not slice_utils
-
Committed by cyber-pioneer
-
- 08 May, 2023 5 commits
-
-
Committed by zhoutianzi666
* add `converter_type` for op converter
-
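Committed by Zhang Zheng
Cherry-Pick: #53582
Change: in the division out = x / y, the backward formula for y is changed from dy = -dout * out / y to dy = -dout * ((x / y) / y).
Reason: when the forward result is used as the backward input at low precision, it has already lost some precision in the cast, so recomputing it gives a more accurate result.
Impact: the change makes the result more accurate, with a negligible effect on performance.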
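The effect described in the commit above can be sketched in plain Python. This is an illustration under stated assumptions, not the Paddle kernel: half precision is emulated by round-tripping through the `struct` format code `e`, the recomputation is done in ordinary double precision, and the function names and the inputs x = 1, y = 3, dout = 1 are hypothetical.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def grad_y_from_out(dout, out, y):
    # Old formula: reuses the forward output, which was already rounded.
    return -dout * out / y


def grad_y_recomputed(dout, x, y):
    # New formula: recomputes x / y instead of reusing the rounded output.
    return -dout * ((x / y) / y)


# Assumed example inputs, chosen so that x / y is not exactly representable.
x, y, dout = 1.0, 3.0, 1.0
out_fp16 = to_fp16(x / y)      # forward result after the cast to fp16
exact = -dout * x / (y * y)    # reference value of d(out)/dy times dout

err_old = abs(grad_y_from_out(dout, out_fp16, y) - exact)
err_new = abs(grad_y_recomputed(dout, x, y) - exact)
```

For these inputs `err_new` is smaller than `err_old`, because the old formula carries the rounding error of the stored forward output into the gradient.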
-
Committed by niuliling123
Fix the optimizer precision check bug
-
Committed by Yiqun Liu
* Add fused_gate_attention API. (#53432)
* Add PADDLE_THROW in take_along_axis kernel when the datatype of index is wrong. (#53556)
-
Committed by GGBond8488
* add 0D output support for linalg.slogdet, test=allcase
* fix zero dim test error test=allcase
* fix test error test=allcase
* add static backward test, test=allcase
* support_0D_output_for_matrix_rank_multi_dot, test=allcase
* add 0D output test for matrix_rank and multi_dot test=allcase
* fix assert error, test=allcase
* fix test error, test=allcase
* fix other test error, test=allcase
* fix test error, test=allcase
* fix matrix_rank and multi dot test err test=allcase
* fix test error test=allcase
* fix test zero dim test, test=allcase
* add static backward test for multi_dot, test=allcase
* add tol 2d broadcast test case, test=allcase
* fix test error test=allcase
* test=allcase
* support_0d_output_for_linalg.norm
* fix test error test=allcase
* fix 0D test
* fix test error test=allcase
* fix tests, test=allcase
* fix error, test=allcase
* fix errors, test=allcase
* add static backward, test=allcase
* add static backward test, test=allcase
* slogdet_support_0D_output
* add new case
* fix tests, test=allcase
* cherry-pick
* fix trace gpu kernel 0d error, test=allcase
* fix windows error, test=allcase
* add matrixrank cherry-pick
-