Commits · d2be870a49144987eec5a3b1b18d14a8eec03858 · BaiXuePrincess / Paddle

26 Oct, 2021 · 1 commit

[cherry-pick-2.2] Fused attention op forward (#35905) (#36708) · d2be870a

Committed by Li Min on Oct 26, 2021

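Purpose: this PR aims to improve the computational performance of the attention module.
To reduce the framework's per-op scheduling overhead, this PR implements the attention module by hand in C++ and exposes it as a single large attention op.
To reduce memory-access overhead, this PR applies two optimizations:
(1) when computing q, k, and v, the shared input X is exploited so that the gemm, transpose, and bias add at that step go from three calls to one;
(2) kernel fusion is used so that data is passed through registers between the different CUDA kernels.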
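
Optimization (1) can be illustrated outside the framework. The following is a minimal NumPy sketch, not the PR's C++/CUDA code: it assumes the three projection weights are concatenated along the output dimension, and all names (`w_qkv`, `split_heads`, the toy shapes) are hypothetical. It only checks that one gemm, one bias add, and one transpose reproduce the three separate projections.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, embed_dim, num_heads = 2, 4, 8, 2
head_dim = embed_dim // num_heads

x = rng.standard_normal((batch, seq_len, embed_dim))  # shared input X
# Hypothetical per-projection weights and biases (illustrative names only).
w_q, w_k, w_v = (rng.standard_normal((embed_dim, embed_dim)) for _ in range(3))
b_q, b_k, b_v = (rng.standard_normal(embed_dim) for _ in range(3))


def split_heads(t):
    # [batch, seq, embed] -> [batch, heads, seq, head_dim]
    return t.reshape(batch, seq_len, num_heads, head_dim).transpose(0, 2, 1, 3)


# Unfused reference: three gemms, three bias adds, three transposes.
q_ref = split_heads(x @ w_q + b_q)
k_ref = split_heads(x @ w_k + b_k)
v_ref = split_heads(x @ w_v + b_v)

# Fused: one gemm against the concatenated weight, one bias add, one transpose.
w_qkv = np.concatenate([w_q, w_k, w_v], axis=1)  # [embed, 3 * embed]
b_qkv = np.concatenate([b_q, b_k, b_v])          # [3 * embed]
qkv = x @ w_qkv + b_qkv                          # [batch, seq, 3 * embed]
qkv = qkv.reshape(batch, seq_len, 3, num_heads, head_dim)
qkv = qkv.transpose(2, 0, 3, 1, 4)               # [3, batch, heads, seq, head_dim]
q, k, v = qkv[0], qkv[1], qkv[2]

fused_matches_reference = all(
    np.allclose(a, b) for a, b in ((q, q_ref), (k, k_ref), (v, v_ref))
)
```

The numerical result is the same either way; the saving the PR describes comes from reading X once and launching one kernel sequence instead of three.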

d2be870a

17 Sep, 2021 · 1 commit

add a fusion op: fused_layernorm_residual_dropout_bias (#35151) · 7975dfcf

Committed by zhangkaihuo on Sep 17, 2021

Fuses elementwise_add, dropout, elementwise_add, and layer_norm into one operator; only the forward pass is supported.
No Python API changes.
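
As a reading aid, the four fused steps can be written as an unfused NumPy reference. This is a sketch under assumptions, not the operator's kernel: the ordering `layer_norm(residual + dropout(x + bias))`, the upscale-in-train dropout variant, and every parameter name below are assumed from the commit message rather than taken from the source code.

```python
import numpy as np


def layernorm_residual_dropout_bias(x, residual, bias, gamma, beta,
                                    dropout_prob=0.1, epsilon=1e-5,
                                    training=True, rng=None):
    """Unfused reference for the four steps; dropout_prob must be < 1."""
    # Step 1: elementwise_add (bias is broadcast over the last axis).
    h = x + bias
    # Step 2: dropout (assumed upscale-in-train: survivors are rescaled).
    if training and dropout_prob > 0.0:
        if rng is None:
            rng = np.random.default_rng()
        keep_mask = rng.random(h.shape) >= dropout_prob
        h = h * keep_mask / (1.0 - dropout_prob)
    # Step 3: elementwise_add (residual connection).
    h = h + residual
    # Step 4: layer_norm over the last axis, then scale and shift.
    mean = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    return (h - mean) / np.sqrt(var + epsilon) * gamma + beta


x = np.arange(12, dtype=np.float64).reshape(3, 4)
out = layernorm_residual_dropout_bias(
    x, residual=np.ones_like(x), bias=np.zeros(4),
    gamma=np.ones(4), beta=np.zeros(4), dropout_prob=0.0,
)
```

Run as separate operators, each step writes its intermediate tensor to memory and reads it back; a fused kernel can keep `h` on chip between the steps.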

7975dfcf

16 Sep, 2021 · 1 commit

add a fusion op: fused_dropout_act_bias (#35129) · cee70434

Committed by zhangkaihuo on Sep 16, 2021

cee70434

09 Sep, 2021 · 1 commit

add a fusion op: fused_residual_dropout_bias (#34963) · cf8bf032

Committed by zhangkaihuo on Sep 09, 2021

cf8bf032

12 Aug, 2021 · 1 commit

transformer c files (#34706) · 016cc56d

Committed by Feng Xing on Aug 12, 2021

This PR adds the fused-transformer files that define the C interface, including classes and functions.

016cc56d

06 May, 2021 · 1 commit

[ROCM] bugfix for unittest (#32392) · 31392627

Committed by ronnywang on May 06, 2021

* fix test_unpool_op

* fix test_inplace_addto_strategy

* fix test_conv2d_fusion_op

* fix test_imperative_lod_tensor_to_selected_rows, test_imperative_selected_rows_to_lod_tensor

* fix test_dot_op

* fix test_correlation_op

* fix tracer

* fix test_memcpy_op

31392627

03 Mar, 2021 · 1 commit

[ROCM] update fluid operators for rocm (part3), test=develop (#31213) · 84639b61

Committed by Qi Li on Mar 03, 2021

* [ROCM] update fluid operators for rocm (part3), test=develop

* fix clang format error, test=develop

84639b61

27 Jan, 2021 · 1 commit

REUPLOAD Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30719) · f8da5536

Committed by jakpiase on Jan 27, 2021

* added external reorder to profiler

* resolved conflict

* added enable_static

* initial version of lstm, not working yet

* added lstm to operators.cmake

* added vanilla lstm mkldnn op

* added peephole weights integration

* minor changes

* added formatting

* added fusion_lstm_mkldnn to static_whitelist

* added formatting

* removed comment

* moved use_peepholes attribute inside is_cached block

* reverted wrong changes

* minor formatting change

* minor changes

* changed stream handling

* minor change

* added datatype to GetExpectedKernelType()

* added reading stream from TLS

f8da5536

26 Jan, 2021 · 2 commits

Revert "Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30661)" (#30708) · 824a79d3

Committed by Tao Luo on Jan 26, 2021

This reverts commit d834f4e6.

824a79d3

Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30661) · d834f4e6

Committed by jakpiase on Jan 26, 2021

* added external reorder to profiler

* resolved conflict

* added enable_static

* initial version of lstm, not working yet

* added lstm to operators.cmake

* added vanilla lstm mkldnn op

* added peephole weights integration

* minor changes

* added formatting

* added fusion_lstm_mkldnn to static_whitelist

* added formatting

* removed comment

* moved use_peepholes attribute inside is_cached block

* reverted wrong changes

* minor formatting change

* minor changes

d834f4e6

07 Dec, 2020 · 1 commit

Compiling operator libraries with Unity build (#29130) · 671555ed

Committed by LoveAn on Dec 07, 2020

* Compiling operator libraries with Unity Build on Windows CPU.

* Compiling operator libraries with Unity Build on Windows GPU, no_test, test=windows_ci

* Add option in windows ci script, no_test, test=windows_ci

* Optimize parallel compiling, test=develop

* remove limit of parallel compile and skip some ops in UB, test=develop

* remove changes of header file, test=develop

* remove changes of header file, test=develop

* fix test_eye_op unittest failed, test=develop

* Compiling operator libraries with Unity Build on Linux, test=develop

* set default WITH_UNITY_BUILD=OFF, test=develop

* Move unity build rules into a single file and add comment, test=develop

* optimize parallel compilation, test=develop

* fix undefined reference error on coverage ci, test=develop

671555ed

12 Nov, 2020 · 1 commit

TRT support for the pruned transformer model; fix the bug where TensorRT did not support DeletePass (#28517) · 8699f38d

Committed by Shang Zhizhou on Nov 12, 2020

* skip_layernorm_op done

* add unittest

* slice op convertor support trt < 6

* skip_layernorm only work in ernie

8699f38d

23 Sep, 2020 · 1 commit

add fuse_bn_act op (#27230) · 906e7f92

Committed by Zhang Ting on Sep 23, 2020

* add fused_bn_add_relu op

906e7f92

06 Aug, 2020 · 1 commit

Add oneDNN fusion_gru kernel (#25594) · 68c6160e

Committed by Adam on Aug 06, 2020

* Add oneDNN fusion_gru kernel and fix fc+gru pass
test=develop

* Formatting changes
test=develop

* Lint fixes
test=develop

* Add memory::format_tag::any to GRU weights
test=develop

* Fix build with CUDA

* Fix build with CUDA v2

68c6160e

11 Mar, 2020 · 1 commit

[Ernie GPU Optimize]: Embedding_eltwise_layernorm Fuse (#22494) · 8d6dc102

Committed by Zhaolong Xing on Mar 11, 2020

* 1. add embedding eltwise layernorm fuse
2. add embedding eltwise layernorm op
3. refine inplace_add_relu
4. refine fc_eltwise_layernorm
test=develop

* 1. refine fc
test=develop

* fix comments
test=develop

* fix comments

test=develop

8d6dc102

10 Jan, 2020 · 1 commit

Add bn and relu fuse pass (#22048) · 46189b16

Committed by Zhen Wang on Jan 10, 2020

* add bn and relu fuse pass

* add op attr assert and dtype assert

* fix some inputs&&outputs bugs for the fused op and pattern.

* add the unittest for fuse_bn_act_pass. test=develop

* use normative enforce statements. test=develop

* add the cpu test. test=develop

* add the support of batch_size=1 for the bn with relu op. test=develop

* add the error type for paddle throws. test=develop

* add fused_batch_norm_act and fused_batch_norm_act_grad to op_has_unsed_vars_white_list. test=develop

46189b16

03 Jan, 2020 · 1 commit

Add the first implementation of fusion_group op (#19621) · d4832077

Committed by Yiqun Liu on Jan 03, 2020

* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop

* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop

* Disable for mac and windows.
test=develop

* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop

* Refine the CUDA kernel to support large dims.
test=develop

* Add DeviceCodePool to manage all device codes.

* Add the first implementation fusion_group op.

* Add unit-test for fusion_group op.

* Add the check of result.

* Add the check of nvrtc in unit-test.
test=develop

* Add comment to explain the inputs, outputs and features of fusion_group op.
test=develop

* Disable fusion_group op for mac and windows.
test=develop

* Make the compiling of device code return status instead of hanging up.
test=develop

* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.

* Unify fusion_group_op's input and output names.
test=develop

* Add the check of CUDA driver library in unittest.
test=develop

* Refine the calling of PADDLE_ENFORCE.
test=develop

d4832077

30 Oct, 2019 · 1 commit

Move the codes of fused operators to operators/fused directory. (#20881) · 03ba0fda

Committed by Yiqun Liu on Oct 30, 2019

* Move the codes of fused operators to operators/fused directory.
test=develop

* Correct the op name in cmake.

* Change the use of PADDLE_ENFORCE.
test=develop

03ba0fda

19 Sep, 2019 · 1 commit

Add a pass to fuse fc+elementwise_add+layernorm (#19776) · 3cd985a6

Committed by Yiqun Liu on Sep 19, 2019

* Add fc_elementwise_layernorm_fuse pass and unittest.

* Add fused_fc_elementwise_layernorm op and its GPU kernel.
test=develop

* Apply fc_elementwise_layernorm_fuse_pass to GPU inference.

* Add the setting of attrs in the definition of binary_op.
test=develop

* Add comment.

* Implement the unittest.
test=develop

* Change the unittest name of layer_norm.
test=develop

3cd985a6

03 Jan, 2019 · 1 commit

Fix compiling error with cuDNN v5 (#15148) · c981bf0f

Committed by qingqing01 on Jan 03, 2019

test=develop

c981bf0f

28 Dec, 2018 · 1 commit

Inception fusion operator. (#14968) · 6f0a1d7b

Committed by qingqing01 on Dec 28, 2018

* Inception fusion operator.
* Support horizontal layer fusion in conv_fusion_op.
* Search conv algo strategy for variable-length input:
   search N times and cache the algos found. For any other input, choose the algo of the cached input whose area is closest to that input's area.
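
The caching strategy in the last bullet can be sketched in a few lines of plain Python. This is an illustration of the idea as described in the commit message, not the operator's code: the class name, the `search_fn` callback, and the use of `height * width` as the "area" are all assumptions.

```python
class ConvAlgoCache:
    """Runs a real search for the first N distinct input sizes, then reuses results."""

    def __init__(self, search_fn, max_searches):
        if max_searches < 1:
            raise ValueError("max_searches must be at least 1")
        self._search_fn = search_fn        # hypothetical: (height, width) -> algo
        self._max_searches = max_searches  # the "N" in "search N times"
        self._algos = {}                   # (height, width) -> cached algo

    def get(self, height, width):
        key = (height, width)
        if key in self._algos:
            return self._algos[key]
        if len(self._algos) < self._max_searches:
            # Search budget not yet used up: run the search and cache it.
            self._algos[key] = self._search_fn(height, width)
            return self._algos[key]
        # Budget exhausted: reuse the algo of the cached input whose area
        # is closest to this input's area.
        area = height * width
        nearest = min(self._algos, key=lambda hw: abs(hw[0] * hw[1] - area))
        return self._algos[nearest]


# Usage with a stand-in search function that just labels the input size.
cache = ConvAlgoCache(lambda h, w: f"algo_{h}x{w}", max_searches=2)
first = cache.get(32, 32)     # searched and cached
second = cache.get(64, 64)    # searched and cached
fallback = cache.get(60, 60)  # no search; area 3600 is closest to 64 * 64
```

The trade-off is that an input served by the fallback may get a suboptimal algorithm, in exchange for bounding the number of expensive searches.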

6f0a1d7b

26 Nov, 2018 · 1 commit

Transpose-Flatten-Concat fusion operator. (#14568) · 6224e61f

Committed by qingqing01 on Nov 26, 2018

* Transpose-Flatten-Concat fusion operator.
* Add unit testing and fix bug.

6224e61f
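
For readers unfamiliar with the pattern named in this commit, here is an unfused NumPy reference of a transpose, flatten, concat chain. It is a sketch only: the parameter names `trans_axis`, `flatten_axis`, and `concat_axis`, and the choice to flatten to 2-D, are assumptions made for illustration and are not confirmed by the commit message.

```python
import numpy as np


def transpose_flatten_concat(inputs, trans_axis, flatten_axis, concat_axis):
    """Apply transpose -> flatten-to-2D -> concat to a list of arrays."""
    flattened = []
    for x in inputs:
        t = np.transpose(x, trans_axis)
        # Collapse the axes before flatten_axis into rows, the rest into columns.
        rows = int(np.prod(t.shape[:flatten_axis]))
        flattened.append(t.reshape(rows, -1))
    return np.concatenate(flattened, axis=concat_axis)


# Two NCHW feature maps with different channel counts.
a = np.arange(2 * 3 * 4 * 4, dtype=np.float32).reshape(2, 3, 4, 4)
b = np.arange(2 * 5 * 4 * 4, dtype=np.float32).reshape(2, 5, 4, 4)
# NCHW -> NHWC, flatten everything after the batch axis, concat the columns.
out = transpose_flatten_concat([a, b], trans_axis=(0, 2, 3, 1),
                               flatten_axis=1, concat_axis=1)
```

Done as three separate operators per input, each intermediate is materialized in memory; a fused operator can write each input's transposed elements straight into its slot in the output.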
16 Nov, 2018 · 1 commit

Refine operator cmake (#14413) · a2d9b344

Committed by Wu Yi on Nov 16, 2018

* wip simplify operator framework

* wip

* wip

* done test=develop

* clean test=develop

* fix test=develop

* fix deps test=develop

* fix cpu build test=develop

* fix tensorrt build test=develop

* fix tests test=develop

* fix test=develop

* fix cpu build test=develop

a2d9b344

BaiXuePrincess / Paddle: in sync with the fork's source project