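- 25 Oct, 2021 2 commits
-
-
Committed by Li Min
Purpose: this PR aims to improve the computational performance of the attention module. To reduce the framework's per-op scheduling overhead, this PR implements the attention module by hand in the C++ layer and exposes it as a single large attention op. To reduce memory-access overhead, this PR applies two optimizations: (1) when computing q, k and v, the shared input X is used so that the gemm, transpose and bias add at that point go from three calls to one; (2) kernel fusion is applied so that data passes through registers between the fused CUDA kernels.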
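Optimization (1) above can be illustrated with a minimal pure-Python sketch. It is not the code from this PR: the helper names (`matmul`, `concat_columns`, `split_columns`) and the tiny weight matrices are hypothetical, and the bias add and transpose are left out. It only shows that, because q, k and v share the input X, one gemm against the column-wise concatenation of the three weight matrices yields the same values as three separate gemms.

```python
def matmul(a, b):
    """Plain row-by-column matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def concat_columns(*mats):
    """Concatenate matrices that have the same number of rows, side by side."""
    return [sum((m[r] for m in mats), []) for r in range(len(mats[0]))]


def split_columns(m, widths):
    """Split a matrix back into column blocks of the given widths."""
    blocks, start = [], 0
    for w in widths:
        blocks.append([row[start:start + w] for row in m])
        start += w
    return blocks


# Hypothetical data: 3 tokens with hidden size 2.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
w_q = [[1.0, 0.0], [0.0, 1.0]]
w_k = [[2.0, 1.0], [0.0, -1.0]]
w_v = [[0.5, 0.5], [1.0, -1.0]]

# Unfused: three gemm calls that each read X.
separate = [matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)]

# Fused: one gemm against the concatenated weights, then split into q, k, v.
w_qkv = concat_columns(w_q, w_k, w_v)
fused = split_columns(matmul(x, w_qkv), [2, 2, 2])
```

In this sketch the saving is only in the number of calls and passes over X; the arithmetic per output element is unchanged.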
-
Committed by Li Min
In fused_attention op and fused_ffn op, the fused bias_add+dropout+residual+layernorm kernel or the bias_add+dropout+residual kernel is used. To make this kernel easier to use, this PR provides a wrapper. 1. To reuse the increment computing code, we extract the corresponding code into the "GetSeedDataAndIncrement" routine in dropout_impl_util.h. 2. fused_dropout_helper.h provides the fused dropout kernel wrapper. Note: the tests for this wrapper will be provided in the following fused_attention_op and fused_ffn PRs.
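As a reference for what the fused bias_add+dropout+residual+layernorm kernel computes, the unfused sequence can be sketched in pure Python. This is an illustrative sketch, not the wrapper from fused_dropout_helper.h: the function name, the explicit `keep_mask` argument, the `eps` default and the choice to rescale kept values by 1/(1 - p) during training are all assumptions made for the example.

```python
import math


def bias_dropout_residual_layernorm(x, bias, residual, keep_mask,
                                    dropout_prob, gamma, beta, eps=1e-5):
    """Row-wise reference: layer_norm(dropout(x + bias) + residual).

    keep_mask holds 1 for kept elements and 0 for dropped ones; it is
    passed in explicitly (an assumption) so the result is deterministic.
    """
    # Assumed convention: kept values are rescaled at training time.
    scale = 1.0 / (1.0 - dropout_prob)
    out = []
    for row_x, row_res, row_mask in zip(x, residual, keep_mask):
        # bias add, dropout and residual add in one pass over the row
        h = [(xv + b) * m * scale + r
             for xv, b, m, r in zip(row_x, bias, row_mask, row_res)]
        # layer norm over the last dimension
        mean = sum(h) / len(h)
        var = sum((v - mean) ** 2 for v in h) / len(h)
        inv_std = 1.0 / math.sqrt(var + eps)
        out.append([(v - mean) * inv_std * g + bt
                    for v, g, bt in zip(h, gamma, beta)])
    return out


# Hypothetical single-row example with dropout disabled.
y = bias_dropout_residual_layernorm(
    x=[[1.0, 2.0, 3.0, 4.0]],
    bias=[0.0, 0.0, 0.0, 0.0],
    residual=[[0.0, 0.0, 0.0, 0.0]],
    keep_mask=[[1, 1, 1, 1]],
    dropout_prob=0.0,
    gamma=[1.0, 1.0, 1.0, 1.0],
    beta=[0.0, 0.0, 0.0, 0.0],
)
```

A fused kernel would produce the same values while keeping the intermediate row `h` in registers instead of writing it out between separate ops.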
-
- 22 Oct, 2021 1 commit
-
-
Committed by niuliling123
* Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1 * Update the implementation of reduceAnyKernel according to the kernel primitive API
-
- 17 Sep, 2021 2 commits
-
-
Committed by feng_shuai
* broadcast qkv_op * use PADDLE_ENFORCE_GT to replace assert
-
Committed by zhangkaihuo
Fuse elementwise_add, dropout, elementwise_add and layer_norm into one operator; only the forward pass is supported. No Python API changes.
-
- 16 Sep, 2021 1 commit
-
-
Committed by zhangkaihuo
-
- 14 Sep, 2021 1 commit
-
-
Committed by Yiqun Liu
Implement FunctionTraits to support two kinds of elementwise functors and remove some old broadcast code. (#35688)
-
- 13 Sep, 2021 2 commits
- 09 Sep, 2021 1 commit
-
-
Committed by zhangkaihuo
-
- 08 Sep, 2021 1 commit
-
-
Committed by niuliling123
-
- 06 Sep, 2021 1 commit
-
-
Committed by joanna.wozna.intel
* Add fusion_lstm INT8 PTQ * Correct mkldnn_cache_capacity and enable fc_lstm_fuse_pass only for this test * Change mkldnn_cache_capacity
-
- 03 Sep, 2021 1 commit
-
-
Committed by Yiqun Liu
-
- 26 Aug, 2021 1 commit
-
-
Committed by Li Min
Add feed_forward for fused attention op. (1) Encapsulate the matmul impl (forward and backward) used in attention op. (2) Implement bias_add (forward and backward) used in attention op.
-
- 23 Aug, 2021 1 commit
-
-
Committed by Li Min
Refactor the organization of the layer_norm cuda impl so that it can be reused in fused attention op. Extract the layer_norm cuda impl from layer_norm_op.cu to layer_norm_kernel.cu.h. Define fused/attention_layer_norm.h, which can be used in fused attention op in the next PR.
-
- 12 Aug, 2021 1 commit
-
-
Committed by Feng Xing
This PR adds fused transformer related files that define the C interface, including classes, functions, etc.
-
- 05 Jul, 2021 1 commit
-
-
Committed by WangXi
-
- 12 Jun, 2021 1 commit
-
-
Committed by joanna.wozna.intel
* Small changes related to BF16 fusion_gru and fusion_lstm * Correct to pass arg by value * Add conditions to rnn op * Correct the spelling mistake * Improving the test with checking activation * Trigger CI
-
- 14 May, 2021 1 commit
-
-
Committed by Kqnonrime
* fix two error message * fix two error message * fix error * fix error * fix error * fix error * fix some error message * fix some error * fix error * fix some error * fix some error * fix some error * fix one error * fix some error * fix seven error message * fix error * fix error * fix error * fix error * fix some error message * fix error * fix some error * fix some error * fix four error message * fix error * fix error
-
- 06 May, 2021 1 commit
-
-
Committed by ronnywang
* fix test_unpool_op * fix test_inplace_addto_strategy * fix test_conv2d_fusion_op * fix test_imperative_lod_tensor_to_selected_rows, test_imperative_selected_rows_to_lod_tensor * fix test_dot_op * fix test_correlation_op * fix tracer * fix test_memcpy_op
-
- 15 Apr, 2021 1 commit
-
-
Committed by AshburnLee
-
- 30 Mar, 2021 1 commit
-
-
Committed by jakpiase
-
- 26 Mar, 2021 1 commit
-
-
Committed by tianshuo78520a
* delete include framework.pb.h * fix error
-
- 04 Mar, 2021 1 commit
-
-
Committed by jakpiase
-
- 03 Mar, 2021 1 commit
-
-
Committed by Qi Li
* [ROCM] update fluid operators for rocm (part3), test=develop * fix clang format error, test=develop
-
- 19 Feb, 2021 1 commit
-
-
Committed by Wojciech Uss
* Modify relu native implementation * fix GPU performance
-
- 27 Jan, 2021 1 commit
-
-
Committed by jakpiase
* added external reorder to profiler * resolved conflict * added enable_static * initial version of lstm, not working yet * added lstm to operators.cmake * added vanilla lstm mkldnn op * added peephole weights integration * minor changes * added formatting * added fusion_lstm_mkldnn to static_whitelist * added formatting * removed comment * moved use_peepholes attribute inside is_cached block * reverted wrong changes * minor formatting change * minor changes * changed stream handling * minor change * added datatype to GetExpectedKernelType() * added reading stream from TLS
-
- 26 Jan, 2021 2 commits
-
-
Committed by jakpiase
* added external reorder to profiler * resolved conflict * added enable_static * initial version of lstm, not working yet * added lstm to operators.cmake * added vanilla lstm mkldnn op * added peephole weights integration * minor changes * added formatting * added fusion_lstm_mkldnn to static_whitelist * added formatting * removed comment * moved use_peepholes attribute inside is_cached block * reverted wrong changes * minor formatting change * minor changes
- 25 Jan, 2021 2 commits
-
-
Committed by arlesniak
* More precise mkldnn kernel choice in GetExpectedKernelType * Fixes after review * Refresh develop for CI * CI experiment * get back from CI exper
-
Committed by Jacek Czaja
-
- 11 Jan, 2021 3 commits
-
-
Committed by 石晓伟
-
Committed by wangchaochaohu
-
Committed by AshburnLee
-
- 10 Jan, 2021 1 commit
-
-
Committed by wangchaochaohu
Reduce the memory occupied by the fused pattern of the elementwise_add op and an activation op (the relu op, for example) (#29885)
-
- 06 Jan, 2021 1 commit
-
-
Committed by 石晓伟
-
- 28 Dec, 2020 1 commit
-
-
Committed by Jack Zhou
* add gru op_register_version; test=op_version; * Update fc,mul version;test=op_version;
-
- 14 Dec, 2020 1 commit
-
-
Committed by Jacek Czaja
-
- 07 Dec, 2020 1 commit
-
-
Committed by LoveAn
* Compiling operator libraries with Unity Build on Windows CPU. * Compiling operator libraries with Unity Build on Windows GPU, no_test, test=windows_ci * Add option in windows ci script, no_test, test=windows_ci * Optimize parallel compiling, test=develop * remove limit of parallel compile and skip some ops in UB, test=develop * remove changes of header file, test=develop * remove changes of header file, test=develop * fix test_eye_op unittest failed, test=develop * Compiling operator libraries with Unity Build on Linux, test=develop * set default WITH_UNITY_BUILD=OFF, test=develop * Move unity build rules into a single file and add comment, test=develop * optimize parallel compilation, test=develop * fix undefined reference error on coverage ci, test=develop
-
- 27 Nov, 2020 1 commit
-
-
Committed by arlesniak
-