Commits · 9516108a852dcc0f14fe787045cf0eb388f41b80 · BaiXuePrincess / Paddle

26 Oct, 2021 2 commits

Add fused attention op backward and python layer. (#36498) · 5119428e

Committed by Li Min on Oct 26, 2021

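Purpose: this PR aims to improve the computational performance of the attention module.
To reduce the framework's per-op scheduling overhead, this PR implements the attention module by hand in the C++ layer and exposes it as a single large attention op.
To reduce memory-access overhead, this PR applies two optimizations:
(1) When computing q, k, and v, the shared input X is used to reduce the gemm, transpose, and bias add at that step from three calls to one;
(2) Kernel fusion is used, so that data is passed through registers between what would otherwise be separate CUDA kernels.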
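
Optimization (1) can be illustrated with a minimal plain-Python sketch. It is not the PR's C++/CUDA code: the helper names (`matmul`, `add_bias`, `qkv_separate`, `qkv_fused`) and the toy weights are hypothetical, and the transpose step is omitted. It only shows why one GEMM over column-concatenated weights gives the same q, k, and v as three separate GEMMs on the shared input X.

```python
# Illustrative sketch only; all names and values here are hypothetical.

def matmul(a, b):
    """Multiply an (m x k) matrix by a (k x n) matrix given as nested lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)]
            for row in a]

def add_bias(m, bias):
    """Add a bias vector to every row of a matrix."""
    return [[v + b for v, b in zip(row, bias)] for row in m]

def qkv_separate(x, w_q, w_k, w_v, b_q, b_k, b_v):
    """Three GEMM + bias-add calls, one per projection."""
    return (add_bias(matmul(x, w_q), b_q),
            add_bias(matmul(x, w_k), b_k),
            add_bias(matmul(x, w_v), b_v))

def qkv_fused(x, w_q, w_k, w_v, b_q, b_k, b_v):
    """One GEMM + bias-add over concatenated weights, then a split."""
    # Concatenate the three weight matrices column-wise, and the biases.
    w_qkv = [rq + rk + rv for rq, rk, rv in zip(w_q, w_k, w_v)]
    b_qkv = b_q + b_k + b_v
    out = add_bias(matmul(x, w_qkv), b_qkv)
    d = len(b_q)  # width of each projection
    q = [row[:d] for row in out]
    k = [row[d:2 * d] for row in out]
    v = [row[2 * d:] for row in out]
    return q, k, v

x = [[1.0, 2.0], [3.0, 4.0]]
w_q = [[1.0, 0.0], [0.0, 1.0]]
w_k = [[2.0, 0.0], [0.0, 2.0]]
w_v = [[0.0, 1.0], [1.0, 0.0]]
b_q, b_k, b_v = [0.5, 0.5], [0.0, 1.0], [1.0, 0.0]

separate = qkv_separate(x, w_q, w_k, w_v, b_q, b_k, b_v)
fused = qkv_fused(x, w_q, w_k, w_v, b_q, b_k, b_v)
print(fused == separate)
```

In a GPU setting the benefit would come from reading X once and launching one kernel instead of three; the sketch only checks that the two formulations agree numerically.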

5119428e

Move fused_attention and fused_feedforward functional api path to incubate (#36704) · 9aeca2f1
Committed by Li Min on Oct 26, 2021
```
Move the Python API interfaces added in PRs #35905 and #35843 into the incubate directory.
```
9aeca2f1

25 Oct, 2021 2 commits

add op: fused_feedforward(backward) (#35611) · 2dd0a46a

Committed by zhangkaihuo on Oct 25, 2021

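This PR contains the backward code for fused_feedforward.

Related kernel implementations: fused_dropout_act_bias, fused_residual_dropout_bias, fused_layernorm_residual_dropout_bias

fused_feedforward is a fused operator. It fuses and encapsulates the operators of the transformer model's feed-forward layer so that the front end exposes a single interface. The fusion removes part of the memory-access and kernel-launch time, which improves performance.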

2dd0a46a

add op: fused_feedforward(forward) (#35843) · b18cbfb2

Committed by zhangkaihuo on Oct 25, 2021

This PR contains only the forward code for fused_feedforward.

Related kernel implementations: fused_dropout_act_bias, fused_residual_dropout_bias, fused_layernorm_residual_dropout_bias

b18cbfb2

22 Oct, 2021 1 commit

Fused attention op forward (#35905) · d4906214

Committed by Li Min on Oct 22, 2021

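Purpose: this PR aims to improve the computational performance of the attention module.
To reduce the framework's per-op scheduling overhead, this PR implements the attention module by hand in the C++ layer and exposes it as a single large attention op.
To reduce memory-access overhead, this PR applies two optimizations:
(1) When computing q, k, and v, the shared input X is used to reduce the gemm, transpose, and bias add at that step from three calls to one;
(2) Kernel fusion is used, so that data is passed through registers between what would otherwise be separate CUDA kernels.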

d4906214

21 Oct, 2021 1 commit

Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1 (#36373) · 921c0917

Committed by niuliling123 on Oct 21, 2021

* Update the implementation of reduceAnyKernel according to kernel primitive api
* Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1

921c0917

15 Oct, 2021 1 commit
- Z
  
  Add ResNetUnit Python API (#35426) · 12882b2f
  Committed by Zhang Zheng on Oct 15, 2021
  
  12882b2f
14 Oct, 2021 1 commit
- Z
  
  Add the complete code and related files of resnet_unit_op (#36366) · 12e6dbbc
  Committed by Zhang Zheng on Oct 14, 2021
  
  12e6dbbc
12 Oct, 2021 1 commit
- Z
  
  Change the input param of fusion op interface from pointer to tensor (#36349) · 3e2dec5b
  Committed by Zhang Zheng on Oct 12, 2021
  
  3e2dec5b
11 Oct, 2021 1 commit
- Z
  
  Add more tests and fix bugs for cudnn_norm_conv_test and cudnn_bn_and_relu_test (#36314) · a679fcbb
  Committed by Zhang Zheng on Oct 11, 2021
  
  a679fcbb
09 Oct, 2021 1 commit
- Z
  
  Implement Fused BN + Add + Relu with cudnnFusedOps API. (#35955) · 7e6c0cee
  Committed by Zhang Zheng on Oct 09, 2021
  
  7e6c0cee
29 Sep, 2021 2 commits
- Y
  
  Implement the grad and enhance the cache of norm_convolution fusion ops. (#36168) · 767050d9
  Committed by Yiqun Liu on Sep 29, 2021
  
  767050d9
- L
  
  Add fused_dropout wrapper to ease use. (#36185) · 092d45c3
  Committed by Li Min on Sep 29, 2021
  
  092d45c3
23 Sep, 2021 1 commit
- L
  
  Add fused_attention_op: add impl wrappers. (#35903) · 88ea8e6f
  Committed by Li Min on Sep 23, 2021
  
  88ea8e6f
22 Sep, 2021 1 commit
- Z
  
  ResnetUnitOp implemented by cuDNN fused op(backend code) (#35557) · 736a7388
  Committed by Zhang Zheng on Sep 22, 2021
  
  736a7388
17 Sep, 2021 2 commits
- F
  broadcast qkv_op (#35780) · cf9eae4c
  Committed by feng_shuai on Sep 17, 2021
```
* broadcast qkv_op

* use PADDLE_ENFORCE_GT to replace assert
```
  cf9eae4c
- Z
  add a fusion op: fused_layernorm_residual_dropout_bias (#35151) · 7975dfcf
  Committed by zhangkaihuo on Sep 17, 2021
```
Fuse elementwise_add, dropout, elementwise_add, and layer_norm into one operator; only the forward pass is supported.
No Python API changed.
```
  7975dfcf
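
For readers unfamiliar with the sequence that fused_layernorm_residual_dropout_bias combines, the following is a minimal unfused reference in plain Python, not the CUDA kernel itself. Several details are assumptions made for illustration: the operation order follows the listing in the commit message (bias add, dropout, residual add, layer_norm), dropout uses an explicit keep mask with 1/(1-p) rescaling, and layer_norm has no learned scale or shift. The function and parameter names are hypothetical.

```python
import math

def layernorm_residual_dropout_bias(x, bias, residual, keep_mask,
                                    dropout_p, eps=1e-5):
    """Unfused reference: layer_norm(residual + dropout(x + bias)) per row.

    Assumptions (not taken from the PR): inverted dropout with an explicit
    0/1 keep mask, and layer_norm without learned scale/shift parameters.
    """
    scale = 1.0 / (1.0 - dropout_p)
    out = []
    for x_row, r_row, m_row in zip(x, residual, keep_mask):
        # elementwise_add #1: add the bias vector
        biased = [v + b for v, b in zip(x_row, bias)]
        # dropout: zero the dropped elements, rescale the kept ones
        dropped = [v * m * scale for v, m in zip(biased, m_row)]
        # elementwise_add #2: add the residual connection
        summed = [v + r for v, r in zip(dropped, r_row)]
        # layer_norm over the row (biased variance)
        mean = sum(summed) / len(summed)
        var = sum((v - mean) ** 2 for v in summed) / len(summed)
        inv_std = 1.0 / math.sqrt(var + eps)
        out.append([(v - mean) * inv_std for v in summed])
    return out

x = [[1.0, 2.0, 3.0, 4.0]]
bias = [0.5, -0.5, 0.5, -0.5]
residual = [[0.0, 1.0, 0.0, 1.0]]
keep_mask = [[1, 0, 1, 1]]  # the second element is dropped

normalized = layernorm_residual_dropout_bias(x, bias, residual, keep_mask,
                                             dropout_p=0.5)
print(normalized)
```

Written this way, each step reads and writes a full intermediate row. A fused kernel would presumably keep those intermediates on chip instead of writing them back to memory between steps, which is the saving the fusion targets.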
16 Sep, 2021 1 commit
- Z
  
  add a fusion op: fused_dropout_act_bias (#35129) · cee70434
  Committed by zhangkaihuo on Sep 16, 2021
  
  cee70434
14 Sep, 2021 1 commit
- Y
  Implement FunctionTraits to support two kinds of elementwise functor and... · 12bf0502
  Committed by Yiqun Liu on Sep 14, 2021
```
Implement FunctionTraits to support two kinds of elementwise functor and remove some old codes for broadcast. (#35688)
```
  12bf0502
13 Sep, 2021 2 commits
- Y
  Revert "Implement FunctionTraits to support two kinds of elementwise functor... · 40d4a295
  Committed by Yiqun Liu on Sep 13, 2021
```
Revert "Implement FunctionTraits to support two kinds of elementwise functor and remove some old codes for broadcast. (#35487)" (#35686)
```
  40d4a295
- Y
  Implement FunctionTraits to support two kinds of elementwise functor and... · d4f84d46
  Committed by Yiqun Liu on Sep 13, 2021
```
Implement FunctionTraits to support two kinds of elementwise functor and remove some old codes for broadcast. (#35487)
```
  d4f84d46
09 Sep, 2021 1 commit
- Z
  
  add a fusion op: fused_residual_dropout_bias (#34963) · cf8bf032
  Committed by zhangkaihuo on Sep 09, 2021
  
  cf8bf032
08 Sep, 2021 1 commit
- N
  
  Modify the reduce op according to the kernel primitive api (#35282) · 82b33be3
  Committed by niuliling123 on Sep 08, 2021
  
  82b33be3
06 Sep, 2021 1 commit

Add fusion_lstm INT8 PTQ (#35334) · 7ef04da6

Committed by joanna.wozna.intel on Sep 06, 2021

* Add fusion_lstm INT8 PTQ

* Correct mkldnn_cache_capacity and enable fc_lstm_fuse_pass only for this test

* Change mkldnn_cache_capacity

7ef04da6

03 Sep, 2021 1 commit
- Y
  
  Unify the implementation of AlignedVector and simplify the codes of dropout and cast. (#35373) · c171eca2
  Committed by Yiqun Liu on Sep 03, 2021
  
  c171eca2
26 Aug, 2021 1 commit

Add feed_forward for fused attention op. (#34945) · d1a33bc7

Committed by Li Min on Aug 26, 2021

Describe

Add feed_forward for fused attention op.
(1) Encapsulate matmul impl (forward and backward) used in attention op.
(2) Implement bias_add (forward and backward) used in attention op.
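
As a rough illustration of what a bias_add forward and backward pair computes, here is a minimal plain-Python sketch. It is not the PR's implementation, and the function names are hypothetical. It relies only on the standard rule that the bias gradient is the column-wise sum of the upstream gradient, while the input gradient passes through unchanged.

```python
def bias_add_forward(x, bias):
    """Add a bias vector to every row of a 2-D input (nested lists)."""
    return [[v + b for v, b in zip(row, bias)] for row in x]

def bias_add_backward(grad_out):
    """Return (grad_x, grad_bias) for out = x + bias.

    grad_x equals the upstream gradient; grad_bias sums it over rows,
    because the same bias element contributes to every row.
    """
    grad_x = [row[:] for row in grad_out]
    grad_bias = [sum(col) for col in zip(*grad_out)]
    return grad_x, grad_bias

x = [[1.0, 2.0], [3.0, 4.0]]
bias = [10.0, 20.0]
out = bias_add_forward(x, bias)

grad_out = [[1.0, 2.0], [3.0, 4.0]]
grad_x, grad_bias = bias_add_backward(grad_out)
print(out, grad_x, grad_bias)
```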

d1a33bc7

23 Aug, 2021 1 commit

Refactor the organization of layer_norm cuda impl. (#34883) · 7f5eb533

Committed by Li Min on Aug 23, 2021

Refactor the organization of layer_norm cuda impl so that it can be reused in fused attention op.

Extract the layer_norm cuda impl from layer_norm_op.cu to layer_norm_kernel.cu.h.
Define fused/attention_layer_norm.h, which can be used in fused attention op in next PR.

7f5eb533

12 Aug, 2021 1 commit

transformer c files (#34706) · 016cc56d

Committed by Feng Xing on Aug 12, 2021

This PR adds fused-transformer-related files that define the C interface, including classes, functions, etc.

016cc56d

05 Jul, 2021 1 commit
- W
  
  Add fused elemwise gelu and optimize performance (#33480) · eae31856
  Committed by WangXi on Jul 05, 2021
  
  eae31856
12 Jun, 2021 1 commit

Committed by joanna.wozna.intel on Jun 11, 2021

* Small changes related to BF16 fusion_gru and fusion_lstm

* Correct to pass arg by value

* Add conditions to rnn op

* Correct the spelling mistake

* Improving the test with checking activation

* Trigger CI

cd95ea82

14 May, 2021 1 commit

Fix four error messages (#32899) · c4787d76

Committed by Kqnonrime on May 14, 2021

* fix two error message

* fix two error message

* fix error

* fix error

* fix error

* fix error

* fix some error message

* fix some error

* fix error

* fix some error

* fix some error

* fix some error

* fix one error

* fix some error

* fix seven error message

* fix error

* fix error

* fix error

* fix error

* fix some error message

* fix error

* fix some error

* fix some error

* fix four error message

* fix error

* fix error

c4787d76

06 May, 2021 1 commit

[ROCM] bugfix for unittest (#32392) · 31392627

Committed by ronnywang on May 06, 2021

* fix test_unpool_op

* fix test_inplace_addto_strategy

* fix test_conv2d_fusion_op

* fix test_imperative_lod_tensor_to_selected_rows, test_imperative_selected_rows_to_lod_tensor

* fix test_dot_op

* fix test_correlation_op

* fix tracer

* fix test_memcpy_op

31392627

15 Apr, 2021 1 commit
- A
  
  Correct typos (#32288) · 825d4957
  Committed by AshburnLee on Apr 15, 2021
  
  825d4957
30 Mar, 2021 1 commit
- J
  
  Added int8 kernel for oneDNN LSTM op (#31894) · 6dca7a1d
  Committed by jakpiase on Mar 30, 2021
  
  6dca7a1d
26 Mar, 2021 1 commit
- T
  delete include framework.pb.h (#31859) · e804f085
  Committed by tianshuo78520a on Mar 26, 2021
```
* delete include framework.pb.h

* fix error
```
  e804f085
04 Mar, 2021 1 commit
- J
  
  Added LSTM BF16 and fixed GRU BF16 (#31234) · 5b4f8aac
  Committed by jakpiase on Mar 04, 2021
  
  5b4f8aac
03 Mar, 2021 1 commit
- Q
  [ROCM] update fluid operators for rocm (part3), test=develop (#31213) · 84639b61
  Committed by Qi Li on Mar 03, 2021
```
* [ROCM] update fluid operators for rocm (part3), test=develop

* fix clang format error, test=develop
```
  84639b61
19 Feb, 2021 1 commit
- W
  Modify relu native implementation 2 (#30996) · 615d8a22
  Committed by Wojciech Uss on Feb 18, 2021
```
* Modify relu native implementation

* fix GPU performance
```
  615d8a22
27 Jan, 2021 1 commit

REUPLOAD Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30719) · f8da5536

Committed by jakpiase on Jan 27, 2021

* added external reorder to profiler

* resolved conflict

* added enable_static

* initial version of lstm, not working yet

* added lstm to operators.cmake

* added vanilla lstm mkldnn op

* added peephole weights integration

* minor changes

* added formatting

* added fusion_lstm_mkldnn to static_whitelist

* added formatting

* removed comment

* moved use_peepholes attribute inside is_cached block

* reverted wrong changes

* minor formatting change

* minor changes

* changed stream handling

* minor change

* added datatype to GetExpectedKernelType()

* added reading stream from TLS

f8da5536

26 Jan, 2021 1 commit
- T
  Revert "Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30661)" (#30708) · 824a79d3
  Committed by Tao Luo on Jan 26, 2021
```
This reverts commit d834f4e6.
```
  824a79d3

BaiXuePrincess / Paddle: in sync with the fork's source project