- 26 Jun, 2023 1 commit
-
-
Committed by XiaociZhang
* [XPU] support xpu runtime profiler: follow up
* fix compile issue
-
- 20 Jun, 2023 1 commit
-
-
Committed by XiaociZhang
* [kunlun] avoid compile issue in non-xpu env also rename macro WITH_XPU_XPTI to WITH_XPTI
* move get_xpti_dependency.sh to tools/xpu
* move get_xpti_dependency.sh to tools/xpu
* call get_xpti_dependency.sh only in need
-
- 16 Jun, 2023 1 commit
-
-
Committed by jameszhang
* [kunlun] support xpu runtime profiler
* fix cmake error
* add libxpti.so to paddle package
* fix for style check
* sync change in setup.py and python/setup.py.in
* remove libxpti.so from paddle output dir in this PR
-
- 26 May, 2023 1 commit
-
-
Committed by YuanRisheng
* create phi so
* fix ci bugs
* fix py3 bugs
* add file
* fix py3 bugs
* fix windows bugs
* perfect so
* fix py3 bugs
* delete all static target in phi
* fix windows bugs
* fix py3 bugs
* fix ci bugs
* fix windows bugs
* fix bugs: gflags can't be linked by dynamic and static lib
* fix bugs that can not load 3rd party
* fix ci bugs
* fix compile bugs
* fix py3 bugs
* fix conflict
* fix xpu bugs
* fix mac compile bugs
* fix psgpu bugs
* fix inference failed
* deal with conflict
* fix LIBRARY_PATH bug
* fix windows bugs
* fix onednn error
* fix windows compile bugs
* fix windows compile bugs
* fix test_cuda_graph_static_mode_error aborted
* fix windows bugs
* fix mac-python3 error
* fix hip compile bugs
* change mode to static
* change to static mode
* fix ci bugs
* fix py3 bugs
* fix windows bugs
* fix bugs
* add static flag
* add PADDLE_API
* change position of PADDLE_API
* fix windows bugs
* change mode to dynamic lib
* fix windows static bugs
* deal with conflict
* fix windows unit bug
* fix coverage
* deal with conflict
* fix windows-inference
* fix py3 bugs
* fix bugs when compile type_info
* fix compile bugs
* fix py3 bugs
* fix windows bugs
* fix windows openblas
* fix xpu bugs
* fix enforce_test in windows
* update code according comment
* fix windows cmake bug
* fix windows bugs
* fix windows bugs
* delete cinn unittest
* fix cinn bugs
---------
Co-authored-by: lzydev <1528794076@qq.com>
-
- 23 May, 2023 1 commit
-
-
Committed by co63oc
-
- 19 May, 2023 1 commit
-
-
Committed by limingshu
* Reorganize the forward codes of flash-attention.
* Fix forward.
* Remove some unused codes.
* Simplify codes and fix backward.
* Change all LOG(INFO) to VLOG and fix the backward.
* add scale for AF2 flash_attn, much thanks to xreki and shaojie for debug these codes
* decrease the effect of debug print on performance
* Unify the initialize of flashattn arguments.
* Rewrite the reshape of temp_mask and temp_bias.
* API support use_flash_attn.
* Fix compiling error on CI.
* Try to crop the flash-attention lib.
* Correct the condition of whether can use flash-attn.
* Remove the softmax_out argument.
* Remove is_causal.
* Polish codes.
* Fix qkv_transpose_out's shape and scaling of Q * K.
* Update commit of flash-attention.
---------
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>
-
- 25 Apr, 2023 1 commit
-
-
Committed by YuanRisheng
* add flags for phi
* fix compile bugs
* fix ci bugs
* fix inference bugs
* fix cinn bugs
* fix cinn bugs
* perfect code according comment
* fix ci bugs
* fix ci bugs
-
- 17 Apr, 2023 1 commit
-
-
Committed by 张春乔
-
- 14 Apr, 2023 1 commit
-
-
Committed by umiswing
-
- 03 Apr, 2023 1 commit
-
-
Committed by engineer1109
-
- 01 Mar, 2023 1 commit
-
-
Committed by Chitsing KUI
* flash attn
* seed
* almost
* softmax
* fix workspace
* add unittest; linux only
* fix setup
* fix datatype include
* fix setup typo
* fix def scope
* new error api
* use paddle fork
* fix attr bug; complete ut
* update flash hash
* fix rng reset
* fix offset
* fix comments
-
- 06 Jan, 2023 1 commit
-
-
Committed by 张春乔
-
- 23 Dec, 2022 1 commit
-
-
Committed by Hui Zhang
* add warp transducer code
-
- 12 Dec, 2022 1 commit
-
-
Committed by 傅剑寒
* fix codestyle
* add double complex<float> complex<double> dtype support for syevj_batched
* fix use_syevj flag for precision loss when input dtype of syevj_batch is complex128 in some case
* optimize eigh in different case
* fix missing ; bug
* fix use_syevj bug
* fix use_cusolver_syevj_batched flag
-
- 24 Nov, 2022 1 commit
-
-
Committed by PuQing
-
- 15 Nov, 2022 1 commit
-
-
Committed by huangjiyi
* rm "paddle/fluid/platform/complex.h" in phi
* fix codestyle with pre-commit
-
- 10 Nov, 2022 1 commit
-
-
Committed by huangjiyi
[PHI Decoupling] remove dependency on "paddle/fluid/platform/errors.h" and "paddle/fluid/platform/fast_divmod.h" in phi. (#47815)
* rm "paddle/fluid/platform/errors.h" in phi
* rm "paddle/fluid/platform/fast_divmod.h" in phi
-
- 03 Nov, 2022 1 commit
-
-
Committed by zhouweiwei2014
-
- 02 Nov, 2022 1 commit
-
-
Committed by Tian Zheng
* Add build option for CUDNN Frontend API
* Fix review comments
* Change namespace for cudnn_frontend.h
-
- 19 Oct, 2022 1 commit
-
-
Committed by Yuanle Liu
-
- 17 Oct, 2022 1 commit
-
-
Committed by RedContritio
-
- 18 Sep, 2022 1 commit
-
-
Committed by RichardWooSJTU
-
- 14 Sep, 2022 1 commit
-
-
Committed by JingZhuangzhuang
* Delay TensorRT registry
* Add unused define
* Fix TensorRT test
* fix function to reference
* Update trt_plugin.h
-
- 01 Aug, 2022 1 commit
-
-
Committed by zhouweiwei2014
-
- 22 Jul, 2022 1 commit
-
-
Committed by yuguo
-
- 18 Jul, 2022 1 commit
-
-
Committed by zhouweiwei2014
-
- 12 Jul, 2022 1 commit
-
-
Committed by Chen Weihang
* clean glog header in public header
* move macro pos
-
- 28 Jun, 2022 1 commit
-
-
Committed by zhouweiwei2014
* [Sparse] add SparseTensor mv kernel (csr*dense_vec->dense_vec, coo*dense_vec->dense_vec)
* fix CI
-
- 24 Jun, 2022 2 commits
-
-
Committed by zhouweiwei2014
-
Committed by xiongkun
-
- 18 Jun, 2022 1 commit
-
-
Committed by zhouweiwei2014
-
- 15 Jun, 2022 2 commits
-
-
Committed by zhouweiwei2014
* add some kernel (csr*dense->csr, dense*dense->csr) of SparseTensor matmul
* fix CI
* fix CI
* fix comment
* fix comment
-
Committed by Ruibiao Chen
* Refactor port.h
* Remove some unnecessary code
* Fix CI errors
-
- 13 Jun, 2022 1 commit
-
-
Committed by Ruibiao Chen
-
- 09 Jun, 2022 1 commit
-
-
Committed by minghaoBD
-
- 05 Jun, 2022 1 commit
-
-
Committed by Sing_chan
-
- 04 Jun, 2022 1 commit
-
-
Committed by Sing_chan
-
- 04 May, 2022 1 commit
-
-
Committed by XiaoguangHu
-
- 22 Apr, 2022 1 commit
-
-
Committed by Ming-Xu Huang
* Fix leading dimension setting error in fused_gemm_epilogue_grad_op.
* Add dyload to cuBlasLt functions.
* Added cublasLtMatmulAlgoGetHeuristic to improve performance.
* Added FLAGS_cublaslt_exhaustive_search_times to cublasLt epilogue
* Added UTs to FLAGS_cublaslt_exhaustive_search_times
* Added warmup runs in algo searching of Gemm epilogue.
* Update copyright and documents.
* Fixed error handling.
-
- 11 Mar, 2022 1 commit
-
-
Committed by zhouweiwei2014
-