- 25 May 2023, 5 commits
- Committed by zhangkaihuo
- Committed by thunder95
- Committed by zhouweiwei2014
- Committed by Leo Chen
  * add log for memory stats
  * fix string_split in einsum
- Committed by 张春乔
- 24 May 2023, 8 commits
- Committed by Yiqun Liu
  * Try to increase the repeat of autotune and fix the setting of allow_tf32_cublas.
  * Change the repeat of cublaslt to 10.
  * Use FLAGS_cublaslt_exhaustive_search_times as repeats.
  * Fix compiling error on CI.
  * Polish the key and simplify codes.
- Committed by zhangyuqin1998
- Committed by zhangyuqin1998
  * move raw kernels to legacy
  * Update elementwise_add_kernel.cu
  * fix
- Committed by wz1qqx
- Committed by Leo Guo
  Fixed a bug in api.cc where, when a phi kernel's output is set to std::vector<DenseTensor*>, the type declared in the function-pointer kernel_signature (std::vector<DenseTensor*>&) was inconsistent with the type of the phi kernel parameter (std::vector<DenseTensor*>). test=kunlun (#54053)
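  For readers unfamiliar with this kind of mismatch, below is a minimal, hypothetical C++ sketch (not actual Paddle code; DenseTensor and the names used are stand-ins) showing why a function-pointer type whose parameter is std::vector<DenseTensor*>& cannot point at a kernel whose parameter is std::vector<DenseTensor*> by value:

  ```cpp
  #include <vector>

  struct DenseTensor {};  // stand-in for phi::DenseTensor

  // Signature assumed by the dispatcher: outputs passed by reference.
  using KernelFn = void (*)(std::vector<DenseTensor*>&);

  // Actual kernel: outputs passed by value.
  void ExampleKernel(std::vector<DenseTensor*> outs) { (void)outs; }

  int main() {
    // KernelFn fn = &ExampleKernel;  // would not compile: the parameter types
    //                                // (T& vs T) make the function types differ
    return 0;
  }
  ```

  Aligning the two declarations (both by reference or both by value) resolves the inconsistency; the commit above fixes this on the api.cc side.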
- Committed by xiaoguoguo626807
- Committed by Winters Montagne
  Removed unnecessarily introduced header files
- Committed by lijin23
  [XPU][PHI Kernels] bind bitwise_add kernel & add int32/int64 support to scatter_nd_add kernel for xpu (#54066)
  * bind new kernels to xpu
  * refine code
  * fix bugs in unittest
- 23 May 2023, 14 commits
- Committed by Zhang Zheng
  * [AMP OP&Test] Support float16 in selu
  * fix
- Committed by LiYuRio
- Committed by RuohengMa
- Committed by zhenhailiu
  * merge code from forsish * polish * paddle/fluid/pybind/auto_parallel_py.cc * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish * polish
- Committed by gouzil
  * [phi] autogen code tril_triu
  * [phi][api] fix tril_triu_grad args
  * [fluid] clean cmake; [phi] fix infer_meta
- Committed by co63oc
- Committed by weishengying
- Committed by co63oc
  * Fix typos
  * Fix
- Committed by cyberslack_lee
- Committed by huangjiyi
  * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update * update HostAlloc * update param name * update cpu kernel * remove kernel header * update * update
- Committed by huangjiyi
  * update * update * update * set out dtype
- Committed by Wang Xin
  * static graph autogen code support for pad3d op
  * bug fixed
  * add ut for pad3d mkldnn op
  * fix coverage
  * fix bug
  * fix bug
  * Delete test_pad3d_mkldnn_op.py
- Committed by zhangyikun02
- Committed by LoneRanger
  * fix the static op generation for group_norm
  * fix bug of mismatch
  * fix bug of AssertionError
  * fix setting of composite
- 22 May 2023, 9 commits
- Committed by risemeup1
  * update_c++14_to_c++17_on_windows
  * disable test_audio_logmel_feature and test_audio_mel_feature
- Committed by risemeup1
- Committed by lijin23
  * fix empty bugs for xpu
  * fix empty bugs for xpu
- Committed by zhupengyang
- Committed by zhupengyang
- Committed by zhoutianzi666
  * fix transfer_layout when input size is too big
  * do not add TransferLayoutKernelGPU
  * add int64 and add check
- Committed by zhangyikun02
- Committed by Tian Zheng
  * Add GPU kernel for multiclass_nms3 op
  * Make multiclass_nms3 gpu kernel output consistent with cpu kernel
  * Fix API incompatibility
  * Fix unittests on builds without CUDA
  * Fix ROCM build
  * Remove fluid headers; Use default atol for unittest
  * Change function and variable naming
  * Add comments; Reduce redundant code
  * Use paddle test framework
- Committed by wangshengxiang
  * bind xpu op: 3D grid sample
  * fix edge cases in xpu op: reshape & slice
- 19 May 2023, 4 commits
- Committed by wz1qqx
- Committed by warrentdrew
  * add minimum grad composite rules
  * add public python api
  * fix format
  * fix format
  * update testcase
  * fix testcase
  * fix format
  * fix cmakelist.txt
  * fix format
  * fix param problem
  * fix op and composite rule
  * fix bf16 cpu support problem
  * fix bf16 cpu issue
  * fix axis error log
  * add axis for maximum
  * revert commit
  * remove .orig
  * fix generic problem
  * revert max op
  * fix axis error
  * fix maximum axis
  * fix test_check_output
  * fix cinn
  * fix minimum maximum axis check
- Committed by limingshu
  * Reorganize the forward codes of flash-attention.
  * Fix forward.
  * Remove some unused codes.
  * Simplify codes and fix backward.
  * Change all LOG(INFO) to VLOG and fix the backward.
  * add scale for AF2 flash_attn, much thanks to xreki and shaojie for debugging these codes
  * decrease the effect of debug print on performance
  * Unify the initialization of flashattn arguments.
  * Rewrite the reshape of temp_mask and temp_bias.
  * API support use_flash_attn.
  * Fix compiling error on CI.
  * Try to crop the flash-attention lib.
  * Correct the condition of whether flash-attn can be used.
  * Remove the softmax_out argument.
  * Remove is_causal.
  * Polish codes.
  * Fix qkv_transpose_out's shape and scaling of Q * K.
  * Update commit of flash-attention.
  Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
- Committed by limingshu