Commits · 36dd295e26d8d3e61fb410858f4e3b17ba4e3992 · PaddlePaddle / Paddle

16 Nov, 2021 · 1 commit

[cherry-pick-2.2.1]fix fused_transformer_encoder_layer bug (#37229) · 36dd295e

Committed by zhangkaihuo on Nov 16, 2021

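Fixes several issues found while fine-tuning fused_transformer_encoder_layer:

* fused_attention_op: add support for attn_mask=None (PR)
* Fix the handling of pre_layer_norm (PR)
* Fix parameter handling and computation errors (PR)
* Fix an incorrect add_bias computation (PR)
* Add support for pure fp16 (PR)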

36dd295e
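The attn_mask=None fix is easiest to see in a reference version of scaled dot-product attention in which the additive mask is simply skipped when it is None. The plain-Python sketch below assumes a single head, list-of-lists matrices and an additive mask (0 keeps a position, a large negative value hides it); `attention` is a hypothetical helper, not the fused CUDA op.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]

def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(q: Matrix, k: Matrix, v: Matrix,
              attn_mask: Optional[Matrix] = None) -> Matrix:
    """Hypothetical single-head attention: softmax(q k^T / sqrt(d) + mask) v."""
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i, qi in enumerate(q):
        # Scaled scores of query i against every key.
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        if attn_mask is not None:
            # Only add the mask when one is given; None means "attend everywhere".
            scores = [s + m for s, m in zip(scores, attn_mask[i])]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out
```

With a mask of `[[0.0, -1e9]]` the second key is effectively hidden and the output collapses to the first value row; with no mask every key contributes.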

15 Nov, 2021 · 1 commit
  MLPerf Optimization for Release/2.2 (#37109) · 287ca7d5
  Committed by Zeng Jinle on Nov 15, 2021
```
* add mlperf optimization PRs

* update
```
  287ca7d5
28 Oct, 2021 · 1 commit
  Fix fused_attention_op and fused_feedforward_op bug when pre_layer_norm is false. (#36793) (#36816) · ae592233
  Committed by Li Min on Oct 28, 2021
```
* Fix bug when pre_layer_norm is false.
```
  ae592233
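The pre_layer_norm flag decides where layer normalization sits relative to the residual connection. The sketch below shows the two conventional placements for a generic sub-layer; `layer_norm` and `sublayer_with_residual` are hypothetical plain-Python helpers (learnable scale/bias and dropout omitted), not Paddle APIs.

```python
import math
from typing import Callable, List

Vec = List[float]

def layer_norm(x: Vec, eps: float = 1e-5) -> Vec:
    # Normalize to zero mean and unit variance (gamma/beta omitted).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def sublayer_with_residual(x: Vec, f: Callable[[Vec], Vec],
                           pre_layer_norm: bool) -> Vec:
    if pre_layer_norm:
        # Pre-LN: normalize the input, run the sub-layer, add the raw input back.
        return [a + b for a, b in zip(x, f(layer_norm(x)))]
    # Post-LN: run the sub-layer on the raw input, add the residual, then normalize.
    return layer_norm([a + b for a, b in zip(x, f(x))])
```

In the pre-LN branch the output is not normalized, while in the post-LN branch it is; mixing up the two paths is the kind of error that only appears when pre_layer_norm is false.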
27 Oct, 2021 · 1 commit

Add fused attention op backward and python layer. (#36498) (#36752) · 64643d50

Committed by Li Min on Oct 27, 2021

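Purpose: this PR aims to improve the computational performance of the attention module.
To reduce the framework's overhead of scheduling many small ops, this PR implements the attention module by hand in C++ and exposes it as a single large attention op.
To reduce memory-access overhead, this PR applies two optimizations:
(1) Because q, k and v are all computed from the same input X, the gemm, transpose and bias add for them are reduced from three calls to one;
(2) Kernel fusion is used so that data is passed through registers between what would otherwise be separate CUDA kernels.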

64643d50
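The shared-input optimization for q, k and v works because all three projections read the same X: stacking W_q, W_k and W_v along the output dimension turns three GEMMs into one, whose result is then split. A plain-Python sketch of that equivalence, with hypothetical helper names and the bias add and transpose steps omitted:

```python
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Naive (n x k) @ (k x m) product.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def concat_cols(*mats: Matrix) -> Matrix:
    # Stack matrices side by side: [W_q | W_k | W_v].
    return [sum((m[r] for m in mats), []) for r in range(len(mats[0]))]

def fused_qkv(x: Matrix, wq: Matrix, wk: Matrix,
              wv: Matrix) -> Tuple[Matrix, Matrix, Matrix]:
    # One GEMM against the stacked weight instead of three separate GEMMs.
    h = len(wq[0])
    qkv = matmul(x, concat_cols(wq, wk, wv))
    # Split the combined output back into q, k and v.
    q = [row[:h] for row in qkv]
    k = [row[h:2 * h] for row in qkv]
    v = [row[2 * h:] for row in qkv]
    return q, k, v
```

The result matches three independent products `x @ wq`, `x @ wk`, `x @ wv`, but X is read only once.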

26 Oct, 2021 · 3 commits

[cherry pick] add op: fused_feedforward(backward) (#36730) · 76c1bae1

Committed by zhangkaihuo on Oct 26, 2021

* add op: fused_feedforward(backward) (#35611)

This PR contains the backward code for fused_feedforward.

Related kernel implementations: fused_dropout_act_bias, fused_residual_dropout_bias, fused_layernorm_residual_dropout_bias

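fused_feedforward is a fused operator that fuses and wraps the operators of the feed-forward layer of a transformer model, so that the front end exposes a single interface. The fusion saves some memory accesses and kernel-launch time, which improves performance.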

* Move fused_attention and fused_feedforward functional api path to incubate (#36704)

Move the Python API interfaces added in PRs #35905 and #35843 into the incubate directory.

76c1bae1
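The computation that fused_feedforward folds into one op can be written out as a reference to show which steps the fused kernels named in this commit cover. This sketch assumes a ReLU activation, treats dropout as the identity (inference mode) and omits layer normalization for brevity; `feed_forward` and `matmul` are hypothetical helpers, not the Paddle implementation.

```python
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Naive (n x k) @ (k x m) product.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def feed_forward(x: Matrix, w1: Matrix, b1: List[float],
                 w2: Matrix, b2: List[float]) -> Matrix:
    # First GEMM: x @ W1.
    h = matmul(x, w1)
    # bias add + ReLU (+ dropout, identity here): the fused_dropout_act_bias step.
    h = [[max(0.0, v + b) for v, b in zip(row, b1)] for row in h]
    # Second GEMM: h @ W2.
    y = matmul(h, w2)
    # bias add (+ dropout) + residual add: the fused_residual_dropout_bias step.
    return [[v + b + r for v, b, r in zip(row_y, b2, row_x)]
            for row_y, row_x in zip(y, x)]
```

Unfused, each commented step would be its own kernel with its own round trip to global memory; fusing the elementwise steps into the two helper kernels is where the savings come from.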

[cherry-pick]add op: fused_feedforward(forward) (#36729) · 77034fc3
Committed by zhangkaihuo on Oct 26, 2021
```
This is a fusion operator to compute feed forward layer in transformer model architecture.
```
77034fc3

[cherry-pick-2.2] Fused attention op forward (#35905) (#36708) · d2be870a

Committed by Li Min on Oct 26, 2021

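Purpose: this PR aims to improve the computational performance of the attention module.
To reduce the framework's overhead of scheduling many small ops, this PR implements the attention module by hand in C++ and exposes it as a single large attention op.
To reduce memory-access overhead, this PR applies two optimizations:
(1) Because q, k and v are all computed from the same input X, the gemm, transpose and bias add for them are reduced from three calls to one;
(2) Kernel fusion is used so that data is passed through registers between what would otherwise be separate CUDA kernels.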

d2be870a

25 Oct, 2021 · 2 commits

Add fused_attention_op: add impl wrappers. (#35903) (#36673) · 8c0bacd4

Committed by Li Min on Oct 25, 2021

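Purpose: this PR aims to improve the computational performance of the attention module.
To reduce the framework's overhead of scheduling many small ops, this PR implements the attention module by hand in C++ and exposes it as a single large attention op.
To reduce memory-access overhead, this PR applies two optimizations:
(1) Because q, k and v are all computed from the same input X, the gemm, transpose and bias add for them are reduced from three calls to one;
(2) Kernel fusion is used so that data is passed through registers between what would otherwise be separate CUDA kernels.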

8c0bacd4

Add fused_dropout wrapper to ease use. (#36185) (#36640) · 05d7e2fd

Committed by Li Min on Oct 25, 2021

In the fused_attention op and the fused_ffn op, either the fused bias_add+dropout+residual+layernorm kernel or the fused bias_add+dropout+residual kernel is used. To make these kernels easier to use, this PR provides a wrapper.
1. To reuse the increment-computing code, the corresponding code is extracted into the "GetSeedDataAndIncrement" routine in dropout_impl_util.h.
2. fused_dropout_helper.h provides the fused dropout kernel wrapper.

Note: the tests for this wrapper will be provided in the following fused_attention_op and fused_ffn PRs.

05d7e2fd
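One way to picture such a wrapper is a single object that applies bias add, dropout and the residual add, optionally followed by layer norm. The sketch below is hypothetical: the class name, the upscale-in-train dropout convention and the use of Python's `random.Random` in place of the seed/offset generator are assumptions for illustration, not the contents of fused_dropout_helper.h.

```python
import math
import random
from typing import List

Vec = List[float]

class FusedDropoutHelper:
    """Hypothetical wrapper: residual + dropout(x + bias), optionally layer-normed."""

    def __init__(self, dropout_prob: float, seed: int = 0):
        self.p = dropout_prob
        # Stands in for the seed/offset state that the real kernels advance per call.
        self.rng = random.Random(seed)

    def _dropout(self, v: Vec, is_test: bool) -> Vec:
        if is_test or self.p == 0.0:
            return list(v)
        keep = 1.0 - self.p
        # Upscale-in-train: kept elements are scaled by 1/keep during training.
        return [x / keep if self.rng.random() < keep else 0.0 for x in v]

    def residual_dropout_bias(self, x: Vec, bias: Vec, residual: Vec,
                              is_test: bool = False) -> Vec:
        d = self._dropout([a + b for a, b in zip(x, bias)], is_test)
        return [r + v for r, v in zip(residual, d)]

    def layernorm_residual_dropout_bias(self, x: Vec, bias: Vec, residual: Vec,
                                        is_test: bool = False,
                                        eps: float = 1e-5) -> Vec:
        y = self.residual_dropout_bias(x, bias, residual, is_test)
        mean = sum(y) / len(y)
        var = sum((v - mean) ** 2 for v in y) / len(y)
        return [(v - mean) / math.sqrt(var + eps) for v in y]
```

Keeping both variants behind one object means the attention and FFN ops only choose a method, while the random-state bookkeeping lives in one place.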

22 Oct, 2021 · 1 commit

Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1 (#36373) (#36616) · 6840cf55

Committed by niuliling123 on Oct 22, 2021

* Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1
* Update the implementation of reduceAnyKernel according to the kernel primitive API

6840cf55

17 Sep, 2021 · 2 commits
  broadcast qkv_op (#35780) · cf9eae4c
  Committed by feng_shuai on Sep 17, 2021
```
* broadcast qkv_op

* use PADDLE_ENFORCE_GT to replace assert
```
  cf9eae4c
  add a fusion op: fused_layernorm_residual_dropout_bias (#35151) · 7975dfcf
  Committed by zhangkaihuo on Sep 17, 2021
```
Fuse elementwise_add, dropout, elementwise_add and layer_norm into one operator; only the forward pass is supported.
No Python API changes.
```
  7975dfcf
16 Sep, 2021 · 1 commit
  add a fusion op: fused_dropout_act_bias (#35129) · cee70434
  Committed by zhangkaihuo on Sep 16, 2021
  
  cee70434
14 Sep, 2021 · 1 commit
  Implement FunctionTraits to support two kinds of elementwise functor and... · 12bf0502
  Committed by Yiqun Liu on Sep 14, 2021
```
Implement FunctionTraits to support two kinds of elementwise functor and remove some old codes for broadcast. (#35688)
```
  12bf0502
13 Sep, 2021 · 2 commits
  Revert "Implement FunctionTraits to support two kinds of elementwise functor... · 40d4a295
  Committed by Yiqun Liu on Sep 13, 2021
```
Revert "Implement FunctionTraits to support two kinds of elementwise functor and remove some old codes for broadcast. (#35487)" (#35686)
```
  40d4a295
  Implement FunctionTraits to support two kinds of elementwise functor and... · d4f84d46
  Committed by Yiqun Liu on Sep 13, 2021
```
Implement FunctionTraits to support two kinds of elementwise functor and remove some old codes for broadcast. (#35487)
```
  d4f84d46
09 Sep, 2021 · 1 commit
  add a fusion op: fused_residual_dropout_bias (#34963) · cf8bf032
  Committed by zhangkaihuo on Sep 09, 2021
  
  cf8bf032
08 Sep, 2021 · 1 commit
  Modify the reduce op according to the kernel primitive api (#35282) · 82b33be3
  Committed by niuliling123 on Sep 08, 2021
  
  82b33be3
06 Sep, 2021 · 1 commit

Add fusion_lstm INT8 PTQ (#35334) · 7ef04da6

Committed by joanna.wozna.intel on Sep 06, 2021

* Add fusion_lstm INT8 PTQ

* Correct mkldnn_cache_capacity and enable fc_lstm_fuse_pass only for this test

* Change mkldnn_cache_capacity

7ef04da6

03 Sep, 2021 · 1 commit
  Unify the implementation of AlignedVector and simplify the codes of dropout and cast. (#35373) · c171eca2
  Committed by Yiqun Liu on Sep 03, 2021
  
  c171eca2
26 Aug, 2021 · 1 commit

Add feed_forward for fused attention op. (#34945) · d1a33bc7

Committed by Li Min on Aug 26, 2021

Description

Add feed_forward for fused attention op.
(1) Encapsulate matmul impl (forward and backward) used in attention op.
(2) Implement bias_add (forward and backward) used in attention op.

d1a33bc7
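The matmul and bias_add pieces wrapped here follow the standard rules for Y = XW + b: dX = dY·Wᵀ, dW = Xᵀ·dY, and db is dY summed over the rows. A plain-Python reference sketch of those rules (hypothetical helper names; this is not the CUDA code in the op):

```python
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Naive (n x k) @ (k x m) product.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]

def linear_forward(x: Matrix, w: Matrix, b: List[float]) -> Matrix:
    # Y = X W, then broadcast-add the bias to every row.
    return [[v + bj for v, bj in zip(row, b)] for row in matmul(x, w)]

def linear_backward(x: Matrix, w: Matrix,
                    dy: Matrix) -> Tuple[Matrix, Matrix, List[float]]:
    dx = matmul(dy, transpose(w))        # dL/dX = dY W^T
    dw = matmul(transpose(x), dy)        # dL/dW = X^T dY
    db = [sum(col) for col in zip(*dy)]  # dL/db = column sums of dY
    return dx, dw, db
```

The bias gradient is a reduction over the batch dimension, which is why bias_add needs its own backward kernel rather than another GEMM.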

23 Aug, 2021 · 1 commit

Refactor the organization of layer_norm cuda impl. (#34883) · 7f5eb533

Committed by Li Min on Aug 23, 2021

Refactor the organization of layer_norm cuda impl so that it can be reused in fused attention op.

Extract the layer_norm cuda impl from layer_norm_op.cu into layer_norm_kernel.cu.h.
Define fused/attention_layer_norm.h, which can be used in fused attention op in next PR.

7f5eb533

12 Aug, 2021 · 1 commit

transformer c files (#34706) · 016cc56d

Committed by Feng Xing on Aug 12, 2021

This PR adds fused-transformer-related files that define the C interface, including classes, functions, etc.

016cc56d

05 Jul, 2021 · 1 commit
  Add fused elemwise gelu and optimize performance (#33480) · eae31856
  Committed by WangXi on Jul 05, 2021
  
  eae31856
12 Jun, 2021 · 1 commit

Committed by joanna.wozna.intel on Jun 11, 2021

* Small changes related to BF16 fusion_gru and fusion_lstm

* Correct to pass arg by value

* Add conditions to rnn op

* Correct the spelling mistake

* Improving the test with checking activation

* Trigger CI

cd95ea82

14 May, 2021 · 1 commit

Fix four error messages (#32899) · c4787d76

Committed by Kqnonrime on May 14, 2021

* fix two error message

* fix two error message

* fix error

* fix error

* fix error

* fix error

* fix some error message

* fix some error

* fix error

* fix some error

* fix some error

* fix some error

* fix one error

* fix some error

* fix seven error message

* fix error

* fix error

* fix error

* fix error

* fix some error message

* fix error

* fix some error

* fix some error

* fix four error message

* fix error

* fix error

c4787d76

06 May, 2021 · 1 commit

[ROCM] bugfix for unittest (#32392) · 31392627

Committed by ronnywang on May 06, 2021

* fix test_unpool_op

* fix test_inplace_addto_strategy

* fix test_conv2d_fusion_op

* fix test_imperative_lod_tensor_to_selected_rows, test_imperative_selected_rows_to_lod_tensor

* fix test_dot_op

* fix test_correlation_op

* fix tracer

* fix test_memcpy_op

31392627

15 Apr, 2021 · 1 commit
  Correct typos (#32288) · 825d4957
  Committed by AshburnLee on Apr 15, 2021
  
  825d4957
30 Mar, 2021 · 1 commit
  Added int8 kernel for oneDNN LSTM op (#31894) · 6dca7a1d
  Committed by jakpiase on Mar 30, 2021
  
  6dca7a1d
26 Mar, 2021 · 1 commit
  delete include framework.pb.h (#31859) · e804f085
  Committed by tianshuo78520a on Mar 26, 2021
```
* delete include framework.pb.h

* fix error
```
  e804f085
04 Mar, 2021 · 1 commit
  Added LSTM BF16 and fixed GRU BF16 (#31234) · 5b4f8aac
  Committed by jakpiase on Mar 04, 2021
  
  5b4f8aac
03 Mar, 2021 · 1 commit
  [ROCM] update fluid operators for rocm (part3), test=develop (#31213) · 84639b61
  Committed by Qi Li on Mar 03, 2021
```
* [ROCM] update fluid operators for rocm (part3), test=develop

* fix clang format error, test=develop
```
  84639b61
19 Feb, 2021 · 1 commit
  Modify relu native implementation 2 (#30996) · 615d8a22
  Committed by Wojciech Uss on Feb 18, 2021
```
* Modify relu native implementation

* fix GPU performance
```
  615d8a22
27 Jan, 2021 · 1 commit

REUPLOAD Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30719) · f8da5536

Committed by jakpiase on Jan 27, 2021

* added external reorder to profiler

* resolved conflict

* added enable_static

* initial version of lstm, not working yet

* added lstm to operators.cmake

* added vanilla lstm mkldnn op

* added peephole weights integration

* minor changes

* added formatting

* added fusion_lstm_mkldnn to static_whitelist

* added formatting

* removed comment

* moved use_peepholes attribute inside is_cached block

* reverted wrong changes

* minor formatting change

* minor changes

* changed stream handling

* minor change

* added datatype to GetExpectedKernelType()

* added reading stream from TLS

f8da5536
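The difference between the vanilla and peephole variants added here is that peephole connections let the gates read the cell state directly. The scalar (hidden size 1) sketch below follows the standard peephole formulation, in which the input and forget gates see c_{t-1} and the output gate sees the new c_t; the parameter names are hypothetical and this is not the oneDNN weight layout.

```python
import math
from typing import Dict, Tuple

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x: float, h_prev: float, c_prev: float, p: Dict[str, float],
              use_peepholes: bool) -> Tuple[float, float]:
    """One step of a hidden-size-1 LSTM cell.

    p holds hypothetical weights w_*, u_*, b_* for gates i, f, c, o
    and peephole weights pi, pf, po.
    """
    def pre(gate: str) -> float:
        # Input and recurrent contributions plus bias for one gate.
        return p["w_" + gate] * x + p["u_" + gate] * h_prev + p["b_" + gate]

    peep = 1.0 if use_peepholes else 0.0
    i = sigmoid(pre("i") + peep * p["pi"] * c_prev)
    f = sigmoid(pre("f") + peep * p["pf"] * c_prev)
    g = math.tanh(pre("c"))
    c = f * c_prev + i * g
    # The output-gate peephole looks at the updated cell state c_t.
    o = sigmoid(pre("o") + peep * p["po"] * c)
    h = o * math.tanh(c)
    return h, c
```

With all peephole weights set to zero the two variants coincide, which makes the vanilla LSTM a special case of the peephole one.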

26 Jan, 2021 · 2 commits

Revert "Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30661)" (#30708) · 824a79d3
Committed by Tao Luo on Jan 26, 2021
```
This reverts commit d834f4e6.
```
824a79d3

Added vanilla LSTM and LSTM with peepholes oneDNN fp32 kernel (#30661) · d834f4e6

Committed by jakpiase on Jan 26, 2021

* added external reorder to profiler

* resolved conflict

* added enable_static

* initial version of lstm, not working yet

* added lstm to operators.cmake

* added vanilla lstm mkldnn op

* added peephole weights integration

* minor changes

* added formatting

* added fusion_lstm_mkldnn to static_whitelist

* added formatting

* removed comment

* moved use_peepholes attribute inside is_cached block

* reverted wrong changes

* minor formatting change

* minor changes

d834f4e6

25 Jan, 2021 · 2 commits
  More precise mkldnn kernel rules in GetExpectedKernelType (#29840) · 5bf25d1e
  Committed by arlesniak on Jan 25, 2021
```
* More precise mkldnn kernel choice in GetExpectedKernelType

* Fixes after review

* Refresh develop for CI

* CI experiment

* get back from CI exper
```
  5bf25d1e
  [oneDNN] Cache oneDNN stream not to recreate in each oneDNN op (#30358) · 173660be
  Committed by Jacek Czaja on Jan 25, 2021
  
  173660be
11 Jan, 2021 · 2 commits
  enhance error msgs of fusion_seqpool_cvm_concat_op.cc, test=develop (#30240) · a0ee0914
  Committed by 石晓伟 on Jan 11, 2021
  
  a0ee0914
  register OPMaker and Infer Shape Check for fused_elementwise_add (#30259) · 8dcae0c5
  Committed by wangchaochaohu on Jan 11, 2021
  
  8dcae0c5
