- 31 8月, 2023 1 次提交
-
-
由 Tian Zheng 提交于
* Add fused_scale_bias_relu_conv_bnstats op * Review changes * Fix no CUDNN Frontend build * Fix PADDLE_ENFORCE format * Fix PADDLE_ENFORCE CI error * Rename kernel filename * Refactor unittest to use paddle eager_op_test * Fix padding bugs * Review changes * test=cuda117 * test=cuda117
-
- 10 8月, 2023 1 次提交
-
-
由 lzy 提交于
* add variable_length_memory_efficient_attention * update variable_length_memory_efficient_attention unittest * update variable_length_mem_eff_attn's docs and unittest * update variable_length_mem_eff_attn's docs * Update test_variable_length_memory_efficient_attention.py * Update variable_length_memory_efficient_attention.cu * fix codestyle * fix variable_length_fmha's docs and unittest * fix variable_length_fmha's docs
-
- 08 8月, 2023 1 次提交
-
-
由 huangjiyi 提交于
-
- 31 7月, 2023 1 次提交
-
-
由 wanghuancoder 提交于
support stride
-
- 01 6月, 2023 1 次提交
-
-
由 YuanRisheng 提交于
-
- 26 5月, 2023 1 次提交
-
-
由 YuanRisheng 提交于
* create phi so * fix ci bugs * fix py3 bugs * add file * fix py3 bugs * fix windows bugs * perfect so * fix py3 bugs * delete all static target in phi * fix windows bugs * fix py3 bugs * fix ci bugs * fix windows bugs * fix bugs: gflags can't be linked by dynamic and static lib * fix bugs that can not load 3rd party * fix ci bugs * fix compile bugs * fix py3 bugs * fix conflict * fix xpu bugs * fix mac compile bugs * fix psgpu bugs * fix inference failed * deal with conflict * fix LIBRARY_PATH bug * fix windows bugs * fix onednn error * fix windows compile bugs * fix windows compile bugs * fix test_cuda_graph_static_mode_error aborted * fix windows bugs * fix mac-python3 error * fix hip compile bugs * change mode to static * change to static mode * fix ci bugs * fix py3 bugs * fix windows bugs * fix bugs * add static flag * add PADDLE_API * change position of PADDLE_API * fix windows bugs * change mode to dynamic lib * fix windows static bugs * deal with conflict * fix windows unit bug * fix coverage * deal with conflict * fix windows-inference * fix py3 bugs * fix bugs when compile type_info * fix compile bugs * fix py3 bugs * fix windows bugs * fix windows openblas * fix xpu bugs * fix enforce_test in windows * update code according comment * fix windows cmake bug * fix windows bugs * fix windows bugs * delete cinn unittest * fix cinn bugs --------- Co-authored-by: lzydev <1528794076@qq.com>
-
- 24 5月, 2023 1 次提交
-
-
由 zhangyuqin1998 提交于
-
- 18 5月, 2023 1 次提交
-
-
由 huangjiyi 提交于
-
- 15 5月, 2023 1 次提交
-
-
由 chalsliu 提交于
* Reduce inference library size and compile time * resolve conflicts
-
- 09 5月, 2023 1 次提交
-
-
由 RuohengMa 提交于
* bind sparse_coo_tensor, reduce_max/max_int32, range/arange_int32, equal_bool, scatter_grad_float32, nearest_interp_int64 kernels * add more unit tests; modify compilation logic of xpu sparse kernels
-
- 06 5月, 2023 1 次提交
-
-
由 zhangyuqin1998 提交于
* move UniformRawKernel to legacy * Update uniform_kernel.cc * Update uniform_kernel.cu * Update uniform_kernel.cc * Update uniform_kernel.cu * Update uniform_kernel.h * Update uniform_kernel.cc * Empty Commit to setup deployments
-
- 20 4月, 2023 1 次提交
-
-
由 zhangyuqin1998 提交于
* setup * Update elementwise_kernel.cc * Update elementwise_kernel.cc * fix * fix * Update elementwise_kernel.cu * fix * Update elementwise_kernel.cc * Update elementwise_kernel.cc * Update elementwise_kernel.cc * Update elementwise_kernel.cc * Update elementwise_kernel.cc * Update elementwise_kernel.cc
-
- 17 4月, 2023 1 次提交
-
-
由 zhoutianzi666 提交于
* initial commit for cutlass_teller * second commit for cutlass_teller * add conv2d_depthwise python template * add conv2d_depthwise cutlass template * /zhoukangkang/paddle_cutlass/Paddle/paddle/fluid/framework/ir/cutlass_teller.h * refine code in Conv2dFusionCanSupport * add macro in cutlass_teller.h * add 3x3 5x5 teller * add groups not 1 or conv2d_depthwise teller * 只生成ic是8的倍数的conv2d_depthwise 的kernel * add EXPLICIT in cutlass_teller.h * final commit * add split_k_slices in conv2d_depthwise * make stages == 2 * 重构部分代码 * add CutlassFusionType * solve illegal memory * make stride_h=stride_w && make dilation==1 * must check HasAttr(use_cutlass) before GetAttrIfExists * add CONV2D_DEPTHWISE_BIAS_SILU to OpType2String * modify decl.h and util.cu
-
- 14 4月, 2023 1 次提交
-
-
由 gouzil 提交于
* [phi] move sequence_pool kernel to phi * [phi] mv sequence_pooling to phi funcs * [phi] mv sequence_pooling_test * [phi] RollBACK `paddle/fluid/operators/sequence_ops/sequence_pool_op.cc` * [phi][funcs] fix mutable_data * [phi][funcs] fix mutable_data
-
- 04 4月, 2023 1 次提交
-
-
由 YuanRisheng 提交于
-
- 31 3月, 2023 1 次提交
-
-
由 YuanRisheng 提交于
* remove distribute * fix py3 bugs * fix gpu-ps bugs * fix compile bugs * fix unittest bugs
-
- 29 3月, 2023 1 次提交
-
-
由 sneaxiy 提交于
* fix generate_kernels.py in CUDA 12.0 * fix attrs bug
-
- 24 3月, 2023 1 次提交
-
-
由 ZhangDY-6483 提交于
* first version, notest * return final rst, notest * use infinity() instead of max * ut structure * start up of ut * generate lse * update * add depense * reconstruct cmake * move file * add memory efficient attention and fix blasimpl * update * update cmake * add namespace * update cmake * use .cu * update for pad3d * bug fix * bug fix * update * bug fix * update enforce * add test case * merge the lse pad * fix kernel_fn of backward * fix PADDLE_ENFORCE_EQ and phi_api * fix PADDLE_ENFORCE * fix PADDLE_ENFORCE * rerun coverage * fix memory efficient attention test * rerun ci * add cuda version condition * add cuda version condition * delete WIP test * replace PADDLE_ENFORCE * edit the namespace of datatype in multiple.cc * rerun * rerun --------- Co-authored-by: Nliuyuang <liuyuang@baidu.com>
-
- 22 3月, 2023 2 次提交
- 16 3月, 2023 1 次提交
-
-
由 Huang Jiyi 提交于
* remove contexts in tensor_utils * update from_blob * update from_blob * update from_blob * fix bug * fix bug
-
- 15 3月, 2023 1 次提交
-
-
由 umiswing 提交于
-
- 13 3月, 2023 1 次提交
-
-
由 zhoutianzi666 提交于
* use python to generate cutlass code * refine CommonConvKernelPart1, CommonConvKernelPart2 * remove useless code in generate_cutlass_code.sh * add more config in conv2d_residual * CommonCutlassConvKernelPart1 and CommonCutlassConvKernelPart2 * add group conv support in util.cu * remove .sh * refine name * make name goodgit status! * add fuse_alpha * make code easy to understand * mot fopen generate in py * use python script to generate conv2d,group=1 cutlass code * use const & * use const & && use python script to generate conv2d/group=1 code
-
- 10 3月, 2023 1 次提交
-
-
由 HappyHeavyRain 提交于
* Add function node in phi_kernel for MKLDNN * fix the bug in 'BuildInferVarKernelContext' * add infer_varkernel_utils.cc * fix the bug:the first two parametes of 'BuildInferVarKernelContext' can't be template variable * change the code according to first review * change the code according to first review * change the mode of paddle_build.sh * change 'infer_var_kernel_fn_' to 'get_kerneltype_forvar_fn_' * add the error information * fix NotFound infomation warning * fix NotFound infomation warning * fix NotFound infomation warning
-
- 09 3月, 2023 1 次提交
-
-
由 TaoTao Li 提交于
* * add comm context for device context * add broadcast phi operator kernel and api * add broadcast support dtype, update ut * fix broadcast bfloat16 type * fix ut * update test_collective_broadcast_api timeout to 300
-
- 01 3月, 2023 1 次提交
-
-
由 Chitsing KUI 提交于
* flash attn * seed * almost * softmax * fix workspace * add unitest; linux only * fix setup * fix datatype include * fix setup typo * fix def scope * new error api * use paddle fork * fix attr bug; complete ut * update flash hash * fix rng reset * fix offset * fix comments
-
- 16 2月, 2023 1 次提交
-
-
由 Huang Jiyi 提交于
* move variable_utils from phi_api_utils to fluid * fix coment * update include * fix bugs * fix bugs * fix bugs * fix bugs * fix bugs * update * update * fix CI-Windows-OpenBLAS * fix bugs * fix bugs * fix bugs * update include * move variable_utils to phi_utils * fix namespace
-
- 03 1月, 2023 1 次提交
-
-
由 zhoutianzi666 提交于
* Implement conv2d_fusion NHWC format using CUTLASS * Add unit testing for CUTLASS Conv in inference * Add experimental API for CUTLASS.
-
- 23 12月, 2022 1 次提交
-
-
由 Hui Zhang 提交于
* add warp transducer code
-
- 22 12月, 2022 1 次提交
-
-
由 xiaoxiaohehe001 提交于
-
- 19 12月, 2022 1 次提交
-
-
由 huangjiyi 提交于
* move gather_scatter_kernel from fluid to phi * mv gather_scatter_kernel to gather_scatter_functor
-
- 17 12月, 2022 1 次提交
-
-
由 Wen Sun 提交于
-
- 16 12月, 2022 1 次提交
-
-
由 Wen Sun 提交于
-
- 06 12月, 2022 1 次提交
-
-
由 zyfncg 提交于
* delete Bias and ResidualData in OpMaker of conv2d * delete extra input of conv3d * refactor pass of conv_bias_fusion * fix mkldnn dependency * fix mkldnn compile * fix test_conv_bias_mkldnn_fuse_pass * police some code * remove useless log * fix analyzer_vit_ocr_tester * fix conv_activation_mkldnn_fuse_pass * fix test_analyzer_ocr * add fused_conv_sig * fix performence regression * fix performance regression
-
- 05 12月, 2022 1 次提交
-
-
由 huangjiyi 提交于
-
- 18 11月, 2022 1 次提交
-
-
由 Tian Zheng 提交于
* Refactor conv_kernel and conv_grad_kernel to provide interface for CUDNNv8 implementation * Fix macro * Add implementation for conv_kernel and conv_grad_kernel * Modification after rebase onto latest develop * Modify plan cache to comply with the API of phi::autotune * Refactor to reduce duplicate code * Review fix: - move functions in conv_kernel_impl_v8.h and conv_grad_kernel_impl_v8.h to conv_kernel.cu and conv_grad_kernelk.cu - add const specifier for input tensor - add logging when plans fail to execute - move CudnnConvBwdFilterV8 and CudnnConvBwdDataV8 to conv_cudnn_frontend.h * - move plan building outside of cache * Fix ROCM build
-
- 31 10月, 2022 1 次提交
-
-
由 ronnywang 提交于
* [CustomDevice] GetCCLComm add custom device support * update * update * update
-
- 20 10月, 2022 1 次提交
-
-
由 JingZhuangzhuang 提交于
* Add infer prune function * Update phi.cmake * Update operators.cmake * add fusion op
-
- 19 9月, 2022 1 次提交
-
-
由 zhangkaihuo 提交于
* sparse infer_meta
-
- 09 9月, 2022 1 次提交
-
-
由 Chen Weihang 提交于
* add fusion dir and fuse_softmax_mask kernel * remove fusion kernel dir * migrate infershape * fix code errror
-