- 26 10月, 2021 1 次提交
-
-
由 Li Min 提交于
功能:本PR的目标是提高attention模块的计算性能。 为了减少框架层对op的调度开销,本PR通过在C++层手动实现attention模块,对外提供attention 大op; 为了减少防存开销,本PR采取了两种优化方法: (1)在q,k,v计算时通过共享输入X,将该处的gemm,transpose和bias add从三次调用减少为一次; (2)使用kernel融合优化技术,在不同cuda kernel之间通过寄存器传输数据;
-
- 17 9月, 2021 1 次提交
-
-
由 zhangkaihuo 提交于
Fused elementwise_add, dropout, elementwise_add and layer_norm into one operator, only support Forward. No Python API changed.
-
- 16 9月, 2021 1 次提交
-
-
由 zhangkaihuo 提交于
-
- 09 9月, 2021 1 次提交
-
-
由 zhangkaihuo 提交于
-
- 12 8月, 2021 1 次提交
-
-
由 Feng Xing 提交于
This PR adds fused transformer related files defining c interface including class, function etc..
-
- 06 5月, 2021 1 次提交
-
-
由 ronnywang 提交于
* fix test_unpool_op * fix test_inplace_addto_strategy * fix test_conv2d_fusion_op * fix test_imperative_lod_tensor_to_selected_rows, test_imperative_selected_rows_to_lod_tensor * fix test_dot_op * fix test_correlation_op * fix tracer * fix test_memcpy_op
-
- 03 3月, 2021 1 次提交
-
-
由 Qi Li 提交于
* [ROCM] update fluid operators for rocm (part3), test=develop * fix clang format error, test=develop
-
- 27 1月, 2021 1 次提交
-
-
由 jakpiase 提交于
* added external reorder to profiler * resolved conflict * added enable_static * initial version of lstm, not working yet * added lstm to operators.cmake * added vanilla lstm mkldnn op * added peephole weights integration * minor changes * added formatting * added fusion_lstm_mkldnn to static_whitelist * added formatting * removed comment * moved use_peepholes attribute inside is_cached block * reverted wrong changes * minor formatting change * minor changes * changed stream handling * minor change * added datatype to GetExpectedKernelType() * added reading stream from TLS
-
- 26 1月, 2021 2 次提交
-
-
由 jakpiase 提交于
* added external reorder to profiler * resolved conflict * added enable_static * initial version of lstm, not working yet * added lstm to operators.cmake * added vanilla lstm mkldnn op * added peephole weights integration * minor changes * added formatting * added fusion_lstm_mkldnn to static_whitelist * added formatting * removed comment * moved use_peepholes attribute inside is_cached block * reverted wrong changes * minor formatting change * minor changes
- 07 12月, 2020 1 次提交
-
-
由 LoveAn 提交于
* Compiling operator libraries with Unity Build on Windows CPU. * Compiling operator libraries with Unity Build on Windows GPU, no_test, test=windows_ci * Add option in windows ci script, no_test, test=windows_ci * Optimize parallel compiling, test=develop * remove limit of parallel compile and skip some ops in UB, test=develop * remove changes of header file, test=develop * remove changes of header file, test=develop * fix test_eye_op unittest failed, test=develop * Compiling operator libraries with Unity Build on Linux, test=develop * set default WITH_UNITY_BUILD=OFF, test=develop * Move unity build rules into a single file and add comment, test=develop * optimize parallel compilation, test=develop * fix undefined reference error on coverage ci, test=develop
-
- 12 11月, 2020 1 次提交
-
-
由 Shang Zhizhou 提交于
* skip_layernorm_op done * add unittest * slice op convertor support trt < 6 * skip_layernorm only work in ernie
-
- 23 9月, 2020 1 次提交
-
-
由 Zhang Ting 提交于
* add fused_bn_add_relu op
-
- 06 8月, 2020 1 次提交
-
-
由 Adam 提交于
* Add oneDNN fusion_gru kernel and fix fc+gru pass test=develop * Formatting changes test=develop * Lint fixes test=develop * Add memory::format_tag::any to GRU weights test=develop * Fix build with CUDA * Fix build with CUDA v2
-
- 11 3月, 2020 1 次提交
-
-
由 Zhaolong Xing 提交于
* 1. add embedding eltwise layernorm fuse 2. add embedding eltwise layernorm op 3. refine inplace_add_relu 4. refine fc_eltwise_layernorm test=develop * 1. refine fc test=develop * fix comments test=develop * fix comments test=develop
-
- 10 1月, 2020 1 次提交
-
-
由 Zhen Wang 提交于
* add bn and relu fuse pass * add op attr assert and dtype assert * fix some inputs&&outputs bugs for the fused op and pattern. * add the unittest for fuse_bn_act_pass. test=develop * use normative enforce statements. test=develop * add the cpu test. test=develop * add the support of batch_size=1 for the bn with relu op. test=develop * add the error type for paddle throws. test=develop * add fused_batch_norm_act and fused_batch_norm_act_grad to op_has_unsed_vars_white_list. test=develop
-
- 03 1月, 2020 1 次提交
-
-
由 Yiqun Liu 提交于
* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc. test=develop * Call CUDA driver api to launch the kernel compiled by nvrtc. test=develop * Disable for mac and windows. test=develop * Refine the codes to support manually specified num_threads and workload_per_thread. test=develop * Refine the CUDA kernel to support large dims. test=develop * Add DeviceCodePool to manage all device codes. * Add the first implementation fusion_group op. * Add unit-test for fusion_group op. * Add the check of result. * Add the check of nvrtc in unit-test. test=develop * Add comment to explain the inputs, outputs and features of fusion_group op. test=develop * Disable fusion_group op for mac and windows. test=develop * Make the compiling of device code return status instead of hanging up. test=develop * Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API. * Unify fusion_group_op's input and output names. test=develop * Add the check of CUDA driver library in unittest. test=develop * Refine the calling of PADDLE_ENFORCE. test=develop
-
- 30 10月, 2019 1 次提交
-
-
由 Yiqun Liu 提交于
* Move the codes of fused operators to operators/fused directory. test=develop * Correct the op name in cmake. * Change the use of PADDLE_ENFORCE. test=develop
-
- 19 9月, 2019 1 次提交
-
-
由 Yiqun Liu 提交于
* Add fc_elementwise_layernorm_fuse pass and unittest. * Add fused_fc_elementwise_layernorm op and its GPU kernel. test=develop * Apply fc_elementwise_layernorm_fuse_pass to GPU inference. * Add the setting of attrs in the definition of binary_op. test=develop * Add comment. * Implement the unittest. test=develop * Change the unittest name of layer_norm. test=develop
-
- 03 1月, 2019 1 次提交
-
-
由 qingqing01 提交于
test=develop
-
- 28 12月, 2018 1 次提交
-
-
由 qingqing01 提交于
* Inception fusion operator. * Support horizontal layer fusion in conv_fusion_op. * Search conv algo strategy for variable-length input. search N times and cache the searched algos. For other input, choose the algo of input whose area is closest to this input.
-
- 26 11月, 2018 1 次提交
-
-
由 qingqing01 提交于
* Transpose-Flatten-Concat fusion operator. * Add unit testing and fix bug.
-
- 16 11月, 2018 1 次提交
-
-
由 Wu Yi 提交于
* wip simplify operator framework * wip * wip * done test=develop * clean test=develop * fix test=develop * fix deps test=develop * fix cpu build test=develop * fix tensorrt build test=develop * fix tests test=develop * fix test=develop * fix cpu build test=develop
-