- 03 Jan, 2023 1 commit
Committed by zhoutianzi666
* Implement conv2d_fusion NHWC format using CUTLASS
* Add unit testing for CUTLASS Conv in inference
* Add experimental API for CUTLASS.
- 29 Dec, 2022 2 commits
Committed by MarDino
Committed by Wang Bojun
* fusedAttenGrad_noGrad
* code style fix
* add ut
* remove unnecessary log
- 23 Dec, 2022 1 commit
Committed by lzy
- 20 Dec, 2022 1 commit
Committed by huangjiyi
* move dropout_impl from fluid to phi
* move cuda_graph_with_memory_pool from fluid to phi
* update namespace
* remove cuda_graph in fluid
* fix mac-build
* fix bugs
* correct CodeStyle
* fix mac-build
* fix mutable_data
* fix stl include
* fix copy param
- 19 Dec, 2022 1 commit
Committed by Wen Sun
- 16 Dec, 2022 1 commit
Committed by Wen Sun
- 15 Dec, 2022 2 commits
Committed by huangjiyi
Committed by Sławomir Siwek
* fix wrong handler name
* mkldnn_engine -> onednn_engine
* remove fluid/errors.h imports
* remove fluid/enforce.h imports
* remove note and unnecessary import
* remove fluid/pretty_log.h imports
* remove fluid/place.h imports
* remove fluid/data_layout_transform.h imports
* remove fluid/device_context.h imports
* remove mkldnn_helper code
* remove fluid/mkldnn_reuse.h imports
* pretty_log import
- 14 Dec, 2022 2 commits
Committed by Ming-Xu Huang
Committed by zqw_1997
* modify cmake file for cuda11.8 compile
* add op_library(fused_embedding_eltwise_layernorm_op DEPS bert_encoder_functor)
- 13 Dec, 2022 1 commit
Committed by sneaxiy
* save fused_attention memory when dropout_rate = 0.0
* add ut
* fix ut bug
* fix fused_layernorm_residual_dropout_bias_test.cu
- 12 Dec, 2022 1 commit
Committed by huangjiyi
* move norm_utils.cu.h from fluid to phi
* remove norm_utils.h in fluid
* fix bugs and replace mutable_data with Alloc
* replace mutable_data with Alloc
- 09 Dec, 2022 2 commits
- 08 Dec, 2022 1 commit
Committed by limingshu
- 07 Dec, 2022 1 commit
Committed by 张春乔
- 06 Dec, 2022 1 commit
Committed by zyfncg
* delete Bias and ResidualData in OpMaker of conv2d
* delete extra input of conv3d
* refactor pass of conv_bias_fusion
* fix mkldnn dependency
* fix mkldnn compile
* fix test_conv_bias_mkldnn_fuse_pass
* polish some code
* remove useless log
* fix analyzer_vit_ocr_tester
* fix conv_activation_mkldnn_fuse_pass
* fix test_analyzer_ocr
* add fused_conv_sig
* fix performance regression
* fix performance regression
- 05 Dec, 2022 2 commits
Committed by limingshu
* first commit
* fix bugs according to ci
* add some changes
* change file name into function.cu.h
* remove const_cast
Committed by zhoutianzi666
- 01 Dec, 2022 1 commit
Committed by minghaoBD
* fuse-mt passes compatible with structured pruning
- 30 Nov, 2022 4 commits
Committed by Netpunk
* migrate transpose_op.cu.h and gpu_utils.h
* format code style
* fix some problems
* format code
* reset transpose_op.cc
* test commit
* recover transpose_op.h
* delete transpose_op.h
* adjust header files order in transpose_op.cc
Committed by MarDino
* add activation support
* fix cublasLt bug
* remove useless code and fix test random range
Committed by zhangbo9674
* add fuse act add grad pass
* polish code
* refine code
* add test
* refine code
Committed by RichardWooSJTU
* delete unnecessary shape and slice op
Co-authored-by: NYour Name <you@example.com>
- 29 Nov, 2022 2 commits
Committed by lzy
* fix mma_tensorcore (__CUDA_ARCH__)
* disable tensorcore by default, because the check on __CUDA_ARCH__ causes undefined behavior in some environments; it can be enabled manually on a machine that supports tensorcore.
Committed by Sławomir Siwek
- 28 Nov, 2022 4 commits
Committed by Wang Bojun
* add trt support
Committed by huangjiyi
* decouple cudnn_desc.h from fluid
* move cudnn_desc.h from fluid to phi
* fix bugs
* decouple cudnn_helper.h from fluid
* fix bugs
* move cudnn_helper.h from fluid to phi
* add fluid cudnn_helper.h
* move miopen_desc.h from fluid to phi
* move miopen_helper.h from fluid to phi
* fix bugs
* move gpu_dnn.h from fluid to phi
* fix bugs
* update copyright year
* simplify gpu_dnn.h in fluid
* fix bugs
* fix xpu build bug
* fix compile bug
* fix bug
Committed by 张春乔
Committed by MarDino
- 23 Nov, 2022 1 commit
Committed by MarDino
* use fused mlp in multi transformer
* Restruct code
* use cublaslt to fuse ffn
* fix conflict
- 22 Nov, 2022 2 commits
Committed by Tian Zheng
* Skip tests that use fused_ops on H100
* Add error message to FusedOps on H100
Committed by huangjiyi
* move "paddle/phi/backends/gpu/gpu_device_function.h" to phi
* update copyright years
* rm "fluid/platform/device/gpu/gpu_device_function.h" in phi
* rm dependence to "gpu_device_function.h" in fluid
* rm gpu_device_function.h etc in fluid
* fix rocm-compile bugs
* fix cuda_helper_test.cu bugs
- 21 Nov, 2022 1 commit
Committed by lzy
* use mma for QK dot computing in fused_multi_transformer.
* Update fused_multi_transformer_op.cu.h
- 18 Nov, 2022 3 commits
Committed by MarDino
* fused qkvBiasAdd and transpose with split qkv
* fix typo
* fix format
* fix name
* add annotation
* fix comment
Committed by MarDino
* Add quick gelu and fused bias add kernel
* fix annotation
* remove useless code
* add fast gelu option and set it in multi transformer op
* add flag to restrict if use fast gelu approximate
* fix flags conflict
* fix use tanh function instead
* add cudart version limit
* use phi fast tanh func
* fix comment
Committed by Wang Xin
* remove "gpu_primitives.h" in fluid namespace
* fix PR-CI-GpuPS fail
* fix PR-CI-GpuPS fail
- 17 Nov, 2022 2 commits
Committed by YuanRisheng
* standard api
* fix xpu bugs
Committed by taixiurong