- 06 2月, 2023 1 次提交
-
-
由 engineer1109 提交于
-
- 03 2月, 2023 2 次提交
-
-
由 Sławomir Siwek 提交于
* replace matmul with matmul_v2 in fuse passes * Remove fusion logic from matmul * removing fusion methods * add proper name * adjust namespaces * clean attrs in python tests * delete checkpoint and restore matmul version * remove unused code * matmul and reshape/transpose fuses migrated * split MatmulOneDNN headers * fuse activation and eltwise_add * add fuse_activation * matmul_transpose_reshape/reshape_transpose_matmul * matmul + elementwise_add (fused) * activation temporary modifciation * merge newest develop * remove depedency from other PR * revert pbtxt * remove placeholders from matmul_v2 * add description in OPMaker * remove matmul_v2_op.h and all depedencies * remove dims changing in base op * add possibility to fuse already fused_matmul * restart broken CI * Empty-Commit * revert matmul_utils.h * codestyle * adjust imports * add pbtxt file * 100% matmul unit tests coverage * trigger CI with minimal changes to develop * adjust changes to develop * add fused_matmul op * inherit base ops * add "v2" * move OPMaker * Gradually add fused_matmul files * second batch of fused_matmul changes * split infershapes of matmul_v2 and fused_matmul * inherit fused_matmul from matmul_v2 * Update paddle/phi/backends/onednn/onednn_reuse.h Co-authored-by: NTomasz Socha <tomasz.socha@intel.com> * Update paddle/phi/kernels/fusion/onednn/fused_matmul_kernel.cc Co-authored-by: NTomasz Socha <tomasz.socha@intel.com> --------- Co-authored-by: NTomasz Socha <tomasz.socha@intel.com>
-
由 Yuang Liu 提交于
-
- 01 2月, 2023 1 次提交
-
-
由 Wang Bojun 提交于
* preln_residual 2 fused_bias_residual * skip layernorm fix and ut * code refine * code style refine * fix ut * fix output * add trt layer fall back info * refine op teller and ut * DropoutMaskOut output fix
-
- 13 1月, 2023 1 次提交
-
-
由 Yuanle Liu 提交于
-
- 06 1月, 2023 1 次提交
-
-
由 MarDino 提交于
-
- 05 1月, 2023 1 次提交
-
-
由 Yuang Liu 提交于
-
- 04 1月, 2023 3 次提交
-
-
由 Wilber 提交于
-
由 Yuanle Liu 提交于
-
由 HongyuJia 提交于
* execute use kernel_key first * change OpKernelType->KernelKey * fix py3 compile error, remove redundant header files * fix build_strategy_test * fix DataType::RAW * fix custom_type test: operator_test.cc * fix transform place * fix backends_are_same_class * try fix place TransDataDevice * support all KernelKey * fix TransformData * fix place_are_same_class * fix merge * fix test_params_no_grad * fix specific place of GetExpectedKernelType * fix specific place of GetExpectedKernelType * fix GetKernelTypeForVar * fix dtype error * fix fetch_v2 * change GetKernelTypeForVar * fix interpreter * fix typo error * polish codes * polish codes * polish codes * fix conflict
-
- 03 1月, 2023 1 次提交
-
-
由 zhoutianzi666 提交于
* Implement conv2d_fusion NHWC format using CUTLASS * Add unit testing for CUTLASS Conv in inference * Add experimental API for CUTLASS.
-
- 29 12月, 2022 2 次提交
-
-
由 MarDino 提交于
-
由 Wang Bojun 提交于
* fusedAttenGrad_noGrad * code style fix * add ut * remove unnecessary log
-
- 23 12月, 2022 1 次提交
-
-
由 lzy 提交于
-
- 20 12月, 2022 1 次提交
-
-
由 huangjiyi 提交于
* move dropout_impl from fluid to phi * move cuda_graph_with_memory_pool from fluid to phi * update namespace * remove cuad_graph in fluid * fix mac-build * fix bugs * correct CodeStyle * fix mac-build * fix mutable_data * fix stl include * fix copy param
-
- 19 12月, 2022 1 次提交
-
-
由 Wen Sun 提交于
-
- 16 12月, 2022 1 次提交
-
-
由 Wen Sun 提交于
-
- 15 12月, 2022 2 次提交
-
-
由 huangjiyi 提交于
-
由 Sławomir Siwek 提交于
* fix wrong handler name * mkldnn_engine -> onednn_engine * remove fluid/errors.h imports * remove fluid/enforce.h imports * remove note and unnecessary import * remove fluid/pretty_log.h imports * remove fluid/place.h imports * remove fluid/data_layout_transform.h imports * remove fluid/device_context.h imports * remove mkldnn_helper code * remove fluid/mkldnn_reuse.h imports * pretty_log import
-
- 14 12月, 2022 2 次提交
-
-
由 Ming-Xu Huang 提交于
-
由 zqw_1997 提交于
* modify cmake file for cuda11.8 compile * add op_library(fused_embedding_eltwise_layernorm_op DEPS bert_encoder_functor)
-
- 13 12月, 2022 1 次提交
-
-
由 sneaxiy 提交于
* save fused_attention memory when dropout_rate = 0.0 * add ut * fix ut bug * fix fused_layernorm_residual_dropout_bias_test.cu
-
- 12 12月, 2022 1 次提交
-
-
由 huangjiyi 提交于
* move norm_utils.cu.h from fluid to phi * remove norm_utils.h in fluid * fix bugs and replace mutable_data with Alloc * replace mutable_data with Alloc
-
- 09 12月, 2022 2 次提交
- 08 12月, 2022 1 次提交
-
-
由 limingshu 提交于
-
- 07 12月, 2022 1 次提交
-
-
由 张春乔 提交于
-
- 06 12月, 2022 1 次提交
-
-
由 zyfncg 提交于
* delete Bias and ResidualData in OpMaker of conv2d * delete extra input of conv3d * refactor pass of conv_bias_fusion * fix mkldnn dependency * fix mkldnn compile * fix test_conv_bias_mkldnn_fuse_pass * police some code * remove useless log * fix analyzer_vit_ocr_tester * fix conv_activation_mkldnn_fuse_pass * fix test_analyzer_ocr * add fused_conv_sig * fix performence regression * fix performance regression
-
- 05 12月, 2022 2 次提交
-
-
由 limingshu 提交于
* first commit * fix bugs according to ci * add some changes * change file name into function.cu.h * remove const_cast
-
由 zhoutianzi666 提交于
-
- 01 12月, 2022 1 次提交
-
-
由 minghaoBD 提交于
* fuse-mt passes compatible with structured pruning
-
- 30 11月, 2022 4 次提交
-
-
由 Netpunk 提交于
* migrate transpose_op.cu.h and gpu_utils.h * format code style * fix some problems * format code * reset tranpose_op.cc * test commit * recover transpose_op.h * delete transpose_op.h * adjust header files order in transpose_op.cc
-
由 MarDino 提交于
* add activation support * fix cublasLt bug * remove useless code and fix test random range
-
由 zhangbo9674 提交于
* add fuse act add grad pass * polish code * refine code * add test * refine code
-
由 RichardWooSJTU 提交于
* delete unnecessary shape and slice op Co-authored-by: NYour Name <you@example.com>
-
- 29 11月, 2022 2 次提交
-
-
由 lzy 提交于
* fix mma_tensorcore (__CUDA_ARCH__) * disable tensorcore by default. disable tensorcore by default, because the judgment of __CUDA_ARCH__ will cause undefined behavior in some environments, can manually enable it on a machine that supports tensorcore.
-
由 Sławomir Siwek 提交于
-
- 28 11月, 2022 3 次提交
-
-
由 Wang Bojun 提交于
* add trt support
-
由 huangjiyi 提交于
* decouple cudnn_desc.h from fluid * move cudnn_desc.h from fluid to phi * fix bugs * decouple cudnn_helper.h from fluid * fix bugs * move cudnn_helper.h from fluid to phi * add fluid cudnn_helper.h * move miopen_desc.h from fluid to phi * move miopen_helper.h from fluid to phi * fix bugs * move gpu_dnn.h from fluid to phi * fix bugs * update copyright year * simplify gpu_dnn.h in fluid * fix bugs * fix xpu build bug * fix compile bug * fix bug
-
由 张春乔 提交于
-