- 22 May 2023 (10 commits)

- Committed by zhupengyang
- Committed by Yuanle Liu
- Committed by JYChen
- Committed by Wilber
- Committed by Yuanle Liu
  [Inference] Add the config.enable_low_precision_io API and remove the reliance on AnalysisConfig::Precision in TRT (#52485)
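
  The commit above adds a low-precision I/O switch to the inference config. Below is a minimal sketch of how it might be called from the Python inference API; it assumes the Python binding uses the same name as the commit title (enable_low_precision_io), and the model paths are placeholders.

  ```python
  import paddle.inference as paddle_infer

  # Placeholder model files; any exported inference model works here.
  config = paddle_infer.Config("model.pdmodel", "model.pdiparams")
  config.enable_use_gpu(1000, 0)   # 1000 MB initial memory pool on GPU 0

  # Run the TensorRT subgraph engine in FP16.
  config.enable_tensorrt_engine(
      workspace_size=1 << 30,
      precision_mode=paddle_infer.PrecisionType.Half,
  )

  # Assumption: per the commit title, this keeps feed/fetch tensors in low
  # precision instead of forcing them back to FP32 at the model boundary.
  config.enable_low_precision_io(True)

  predictor = paddle_infer.create_predictor(config)
  ```
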
- Committed by zhoutianzi666
  * Fix transfer_layout when the input size is too big
  * Do not add TransferLayoutKernelGPU
  * Add int64 support and add a check
- Committed by zhangyikun02
- Committed by Tian Zheng
  * Add a GPU kernel for the multiclass_nms3 op
  * Make the multiclass_nms3 GPU kernel output consistent with the CPU kernel
  * Fix API incompatibility
  * Fix unit tests on builds without CUDA
  * Fix the ROCm build
  * Remove fluid headers; use the default atol for the unit test
  * Change function and variable naming
  * Add comments; reduce redundant code
  * Use the paddle test framework
- Committed by niuliling123
  Print a Python traceback when the debug mode is CHECK_NAN_INF_AND_ABORT and the backward pass has NaN/Inf (#52808)
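
  The NaN/Inf check that this commit extends is selected by the debug mode named in its title. A minimal sketch of turning it on from Python follows; it assumes the TensorCheckerConfig / DebugMode interface in paddle.amp.debugging exposes the CHECK_NAN_INF_AND_ABORT mode referenced above, and the tiny forward/backward pass exists only to produce an Inf in a gradient.

  ```python
  import paddle
  from paddle.amp.debugging import (
      DebugMode,
      TensorCheckerConfig,
      enable_tensor_checker,
  )

  # Assumption: this mode aborts (and, per the commit above, now prints the
  # Python traceback) as soon as a checked tensor contains NaN/Inf.
  config = TensorCheckerConfig(
      enable=True, debug_mode=DebugMode.CHECK_NAN_INF_AND_ABORT
  )
  enable_tensor_checker(config)

  x = paddle.to_tensor([4.0, 0.0, 1.0], stop_gradient=False)
  y = paddle.sqrt(x)      # forward values are finite
  y.sum().backward()      # d/dx sqrt(x) = 1/(2*sqrt(x)) -> Inf at x = 0
  ```
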
- Committed by wangshengxiang
  * Bind the XPU op: 3D grid sample
  * Fix edge cases in the XPU ops reshape & slice

- 20 May 2023 (3 commits)

- Committed by ShenLiang
- Committed by zhangbo9674
- Committed by zhangbo9674
  * Add types and attributes
  * Remove some const_cast
  * Refine code

- 19 May 2023 (25 commits)

- Committed by shentanyue
- Committed by Frank Lin
  * Improve readability and overall clarity of logging
  * Add the set_input_type API for specifying input data types
  * Specify input data types
- Committed by wz1qqx
- Committed by warrentdrew
  * Add minimum grad composite rules
  * Add a public Python API
  * Fix format
  * Fix format
  * Update the test case
  * Fix the test case
  * Fix format
  * Fix cmakelist.txt
  * Fix format
  * Fix a param problem
  * Fix the op and composite rule
  * Fix a bf16 CPU support problem
  * Fix a bf16 CPU issue
  * Fix the axis error log
  * Add axis for maximum
  * Revert a commit
  * Remove .orig
  * Fix a generic problem
  * Revert the max op
  * Fix an axis error
  * Fix the maximum axis
  * Fix test_check_output
  * Fix CINN
  * Fix the minimum/maximum axis check
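
  The first two bullets of the commit above add composite (primitive-op) gradient rules and a public Python API for minimum/maximum. The sketch below only exercises the op those rules target, paddle.minimum with broadcasting and a backward pass; it does not show the rule mechanism itself.

  ```python
  import paddle

  # The gradient of minimum routes the upstream grad to whichever input is
  # smaller at each position (ties conventionally go to x).
  x = paddle.to_tensor([[1.0, 5.0], [3.0, 2.0]], stop_gradient=False)
  y = paddle.to_tensor([2.0, 4.0], stop_gradient=False)  # broadcast over rows

  out = paddle.minimum(x, y)
  out.sum().backward()

  print(out)      # [[1., 4.], [2., 2.]]
  print(x.grad)   # 1 where x is the smaller operand, else 0
  print(y.grad)   # per-column grads summed over the broadcast (row) axis
  ```
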
- Committed by 王明冬
- Committed by limingshu
  * Reorganize the forward code of flash-attention
  * Fix the forward pass
  * Remove some unused code
  * Simplify the code and fix the backward pass
  * Change all LOG(INFO) to VLOG and fix the backward pass
  * Add scale for the AF2 flash_attn; many thanks to xreki and shaojie for debugging this code
  * Decrease the effect of debug printing on performance
  * Unify the initialization of flashattn arguments
  * Rewrite the reshape of temp_mask and temp_bias
  * API support for use_flash_attn
  * Fix a compiling error on CI
  * Try to crop the flash-attention lib
  * Correct the condition for whether flash-attn can be used
  * Remove the softmax_out argument
  * Remove is_causal
  * Polish the code
  * Fix qkv_transpose_out's shape and the scaling of Q * K
  * Update the commit of flash-attention
  Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
- Committed by limingshu
- Committed by RedContritio
- Committed by gouzil
  * [tools] Add a PADDLE_API check to file diff approvals
  * [tools] Fix determine
  * [tools] Fix determine
  * [tools] Change to full character matching
    Co-authored-by: 张春乔 <83450930+Liyulingyue@users.noreply.github.com>
  * [tools] Update echo_line
    Co-authored-by: 张春乔 <83450930+Liyulingyue@users.noreply.github.com>
  * [tools] Update check_approval
    Co-authored-by: 张春乔 <83450930+Liyulingyue@users.noreply.github.com>
  Co-authored-by: 张春乔 <83450930+Liyulingyue@users.noreply.github.com>
- Committed by GGBond8488
  * Remove user-defined grad
  * Fix errors
  * Remove unused self.x_grad, self.out_grad
- Committed by Zhang Zheng
  * Add a large-dim test of log_softmax
  * Fix
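
  The log_softmax commit above only adds a test for very large reduction dimensions. A small sketch of the API it covers follows, with an illustrative (not the test's actual) dimension size.

  ```python
  import paddle
  import paddle.nn.functional as F

  # A wide last dimension exercises the large-dim path the new test targets.
  x = paddle.randn([4, 100000], dtype="float32")
  out = F.log_softmax(x, axis=-1)

  # Exponentiating log-probabilities should sum to 1 along the softmax axis.
  print(paddle.exp(out).sum(axis=-1))   # ~[1., 1., 1., 1.]
  ```
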
- Committed by Galaxy1458
- Committed by Galaxy1458
- Committed by Galaxy1458
- Committed by xiaoguoguo626807
  * Review
  * Fix an opcompat bug
  * Modify pybind
- Committed by Charles-hit
- Committed by Danyang Zhang
  * Delete bf16 of cross entropy
  * Delete bf16 of cross entropy
- Committed by zhoutianzi666
  * decrease_peak_memory
- Committed by Galaxy1458
- Committed by Galaxy1458
- Committed by Galaxy1458
- Committed by Galaxy1458
- Committed by Galaxy1458
- Committed by ronnywang
- Committed by zhangyuqin1998

- 18 May 2023 (2 commits)

- Committed by houj04
- Committed by Galaxy1458