Commits · 6977df8c59113ac84784ffa05b96fae2dd66f2b4 · 机器未来 / Paddle

12 Oct, 2022 1 commit

[CodeStyle][F401] remove unused imports in... · 6977df8c

Committed by Shuangchi He on Oct 12, 2022

[CodeStyle][F401] remove unused imports in python_paddle/inference_device_profiler_text_metric_incubate_quantization_libs_audio_amp_jit. (#46762)

6977df8c

10 Oct, 2022 1 commit

make fused_multi_transformer support dynamically set the cache_kvs' shape and... · 9ea279a4

Committed by carryyu on Oct 10, 2022

make fused_multi_transformer support dynamically set the cache_kvs' shape and support input prefix_caches. (#46777)

* make fused_multi_transformer support dynamically set the cache_kvs' shape and support input prefix_caches.

9ea279a4

26 Aug, 2022 1 commit

[Eager] delete final state pre-name (#45306) · 126940b3

Committed by wanghuancoder on Aug 26, 2022

126940b3

30 Jun, 2022 1 commit

Add new attr of fused_multi_transformer (#43730) · c2a5bb91

Committed by Zhang Zheng on Jun 30, 2022

* Add new attr of fused_multi_transformer

* fix format

* add note

* add in layer

* fixfixfixfix

c2a5bb91

28 Jun, 2022 1 commit

[fused_transformer] update transformer fustion for dygraph, test=allcases (#43858) · 99b3727d

Committed by Yuang Liu on Jun 28, 2022

99b3727d

21 Jun, 2022 1 commit

Fix code example of fused_attention and fused_feedforward. (#43635) · 223fb7b3

Committed by Yiqun Liu on Jun 21, 2022

223fb7b3

17 Jun, 2022 1 commit

Support optional residual add in fused_attention and fused_feedforward. (#43474) · 19e866f9

Committed by Yiqun Liu on Jun 17, 2022

* Support optional residual add in fused_attention and fused_feedforward.

* Add checkpoint and add the check of add_residual when pre_layer_norm is false.

* Add TODO and change the python api to add add_residual argument.

19e866f9

14 Jun, 2022 1 commit

fix is_test bug in fused_feedforward. (#43508) · 193ab32c

Committed by Li Min on Jun 14, 2022

193ab32c

13 Jun, 2022 1 commit

fused_attention fused_feedforward api support Model Tensor Parallel (#42985) · 31ddaae2

Committed by WangXi on Jun 13, 2022

31ddaae2

05 Jun, 2022 1 commit

【code format check upgrade】 step2：yapf (#42944) · a072fca8

Committed by Sing_chan on Jun 05, 2022

* use yapf to format all python file

* yapf exclude two unittests file for they rely on writing and reading file, and format will break them

* disable diff_py_file because too many diff files cause command following failed

a072fca8

31 May, 2022 1 commit

Rename dropout is test (#43098) · 67497119

Committed by Li Min on May 31, 2022

* replace dropout_is_test with is_test.

* improve atol on a100.

67497119

30 May, 2022 1 commit

Add fused_bias_dropout_residual_ln op and layer. (#43062) · dceccd9d

Committed by Li Min on May 30, 2022

* add fused_bias_dropout_residual_ln op and layer.

dceccd9d

26 Apr, 2022 1 commit

Add fused_multi_transformer op to optimize transformer generation performance (#41814) · 9dadf7df

Committed by WangXi on Apr 26, 2022

9dadf7df

25 Mar, 2022 1 commit

Refactor Dygraph Flags (#40786) · 3085d5e4

Committed by Jiabin Yang on Mar 25, 2022

* refactor eager flags

* fix flags error when we switch from eager to dygraph

* fix ci problem

* fix ci

* fix ci

* merge develop and fix code style

* merge develop and fix code style

* fix op test error

* fix op test error

* fix op test error

* fix op test error

* fix op test error

* merge develop

3085d5e4

11 Mar, 2022 1 commit

[hybrid] Support tensor parallel and cache structure for fused attention op. (#40101) · 1882c496

Committed by Yuang Liu on Mar 11, 2022

1882c496

24 Feb, 2022 1 commit

fix 'invalid escape sequence' (#39842) · 4e26fa57

Committed by Leo Chen on Feb 24, 2022

* fix 'invalid escape sequence'

* fix assert error

4e26fa57

26 Nov, 2021 1 commit

Fix bugs when bias add none in static graph for fused_attention op. (#37566) · 097e098d

Committed by Li Min on Nov 26, 2021

* Fix bugs when bias is none for static graph for fused_attention op.

097e098d

23 Nov, 2021 1 commit

Add support bias is none for fused_attention op. (#37411) · 1a8786cf

Committed by Li Min on Nov 23, 2021

Add support for bias is none for fused_attention op.

1a8786cf

16 Nov, 2021 1 commit

Fix attn_bias_add bug. (#37147) · a9e7a854

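Committed by Li Min on Nov 16, 2021

The fused_attention_op implementation uses bias_add, which is itself implemented with kernel primitives. The kernel primitive WriteData API and its internal implementation were later changed so that the out-of-bounds check moved into a template parameter. As a result, the call took the wrong branch and performed out-of-bounds writes, corrupting the contents of other GPU memory. Symptom: a single run of test_fused_attention_op_api.py almost never fails, but when inputs of different shapes are run repeatedly in a loop, the results are computed incorrectly. The bug is intermittent and hard to notice.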
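The failure mode described above can be imitated in plain Python. This is only a toy sketch, not Paddle's kernel primitive code: `TILE`, `write_data`, and `run` are hypothetical names, and the real boundary switch is a compile-time template parameter rather than a runtime flag. It shows how a tile write that skips the bounds check on the last, partially filled tile spills into memory owned by a neighbouring tensor.

```python
TILE = 4  # hypothetical number of elements written per call


def write_data(memory, offset, values, valid, is_boundary):
    """Write one tile into memory; `valid` is honoured only when is_boundary is True."""
    count = min(valid, TILE) if is_boundary else TILE
    for i in range(count):
        memory[offset + i] = values[i]


def run(check_tail):
    """Fill a 6-element output that shares a flat buffer with a neighbouring tensor."""
    n = 6
    memory = [0] * n + [7] * 4  # output owns [0, 6), the neighbour owns [6, 10)
    for start in range(0, n, TILE):
        is_tail = start + TILE > n  # the last tile is only partially valid
        write_data(memory, start, [1] * TILE, n - start,
                   is_boundary=is_tail and check_tail)
    return memory


correct = run(check_tail=True)   # tail tile takes the bounds-checked branch
broken = run(check_tail=False)   # tail tile takes the unchecked branch
```

In this toy model, a size that is an exact multiple of `TILE` never produces a partial tile, so the unchecked branch is harmless for such sizes. That is at least consistent with a bug that only shows up for some input shapes.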

a9e7a854

12 Nov, 2021 1 commit

[fix]fix the bug of fused_attention and fused_feedforward (#36972) · 6486e242

Committed by zhangkaihuo on Nov 12, 2021

* fix bug:
  1. atten: set the default value of attn_dropout_rate to None
  2. ffn: add activation parameter

6486e242

28 Oct, 2021 1 commit

[fix-doc-bug] Fix fused_attention_op english doc test=document_fix (#36803) · 11c2874e

Committed by Li Min on Oct 28, 2021

* Fix fused_attention english doc test=document_fix

11c2874e

27 Oct, 2021 1 commit

Fused transformer encoder layer and fused feedforward layer (#36604) · 9f3613f3

Committed by zhangkaihuo on Oct 27, 2021

This PR contains the layer-level code for fused_transformer: the FusedFeedForward layer and the FusedTransformerEncoderLayer.

9f3613f3

26 Oct, 2021 2 commits

Add fused attention op backward and python layer. (#36498) · 5119428e

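Committed by Li Min on Oct 26, 2021

Purpose: this PR aims to improve the computational performance of the attention module.
To reduce the framework's op scheduling overhead, this PR implements the attention module by hand in C++ and exposes it as a single large attention op.
To reduce memory-access overhead, this PR applies two optimizations:
(1) when computing q, k, v, the input X is shared, so the gemm, transpose, and bias add at that point are reduced from three calls to one;
(2) kernel fusion is used, passing data between different CUDA kernels through registers.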
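Optimization (1) can be illustrated with a small pure-Python sketch. It is not the C++ implementation from the PR: `matmul`, `add_bias`, `qkv_separate`, and `qkv_fused` are hypothetical helpers, the transpose step is left out, and the tiny integer matrices are made up for the example. The point is only that concatenating the three projection weights lets one gemm and one bias add produce q, k, and v together.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_bias(m, bias):
    """Add a bias vector to every row of a matrix."""
    return [[v + b for v, b in zip(row, bias)] for row in m]


def qkv_separate(x, weights, biases):
    """Unfused path: one gemm and one bias add per projection (three of each)."""
    return [add_bias(matmul(x, w), b) for w, b in zip(weights, biases)]


def qkv_fused(x, weights, biases):
    """Fused path: concatenate the weights column-wise, run a single gemm and a
    single bias add on the shared input, then split the result into q, k, v."""
    w_cat = [sum((w[i] for w in weights), []) for i in range(len(weights[0]))]
    b_cat = sum(biases, [])
    out = add_bias(matmul(x, w_cat), b_cat)
    width = len(biases[0])
    return [[row[j * width:(j + 1) * width] for row in out]
            for j in range(len(weights))]


# Made-up example data: 2 tokens, hidden size 2.
x = [[1, 2], [3, 4]]
weights = [[[1, 0], [0, 1]], [[2, 1], [0, 1]], [[1, 1], [1, 1]]]  # W_q, W_k, W_v
biases = [[1, 1], [0, 0], [2, 3]]                                 # b_q, b_k, b_v

q, k, v = qkv_fused(x, weights, biases)
```

Both paths read the same input `x`; the fused one reads it once instead of three times, which is where the saving in memory traffic comes from.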

5119428e

Move fused_attention and fused_feedforward functional api path to incubate (#36704) · 9aeca2f1

Committed by Li Min on Oct 26, 2021

Move the Python API interfaces added in PRs #35905 and #35843 into the incubate directory.

9aeca2f1

25 Oct, 2021 1 commit

add op: fused_feedforward(forward) (#35843) · b18cbfb2

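Committed by zhangkaihuo on Oct 25, 2021

This PR contains only the forward code of fused_feedforward.

Related kernel implementations: fused_dropout_act_bias, fused_residual_dropout_bias, fused_layernorm_residual_dropout_bias

fused_feedforward is a fused operator. It fuses and wraps the operators of a transformer model's feed forward layer so that the front end exposes a single interface. The fusion removes some memory accesses and kernel launch time, which improves performance.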
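For orientation, here is an unfused reference of what a transformer feed forward layer computes for one token, written as a minimal sketch. The assumptions are mine, not the PR's: dropout is treated as the identity (inference behaviour), layer norm has no learnable scale or shift, the activation is ReLU, and the helper names (`layer_norm`, `linear`, `relu`, `feed_forward_reference`) are hypothetical. The fused operator performs the same chain of steps behind a single interface.

```python
import math


def layer_norm(row, eps=1e-5):
    """Normalize one token vector to zero mean and unit variance."""
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]


def linear(row, weight, bias):
    """Compute row @ weight + bias, with weight stored as [in_features][out_features]."""
    return [sum(r * w for r, w in zip(row, col)) + b
            for col, b in zip(zip(*weight), bias)]


def relu(row):
    return [max(0.0, v) for v in row]


def feed_forward_reference(x, w1, b1, w2, b2, pre_layer_norm=True):
    """Layer norm, two linear layers with an activation in between, residual add.
    Each step here is a separate pass over the data; dropout is omitted."""
    out = layer_norm(x) if pre_layer_norm else x
    out = linear(relu(linear(out, w1, b1)), w2, b2)
    out = [r + o for r, o in zip(x, out)]          # residual connection
    return out if pre_layer_norm else layer_norm(out)


# Made-up example data: hidden size 2, feed forward size 2.
x = [1.0, -1.0]
w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
w2, b2 = [[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0]

pre = feed_forward_reference(x, w1, b1, w2, b2, pre_layer_norm=True)
post = feed_forward_reference(x, w1, b1, w2, b2, pre_layer_norm=False)
```

Every helper call in this reference would be its own kernel launch and its own round trip through memory in an unfused framework implementation, which is the overhead the fusion targets.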

b18cbfb2

22 Oct, 2021 1 commit

Fused attention op forward (#35905) · d4906214

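Committed by Li Min on Oct 22, 2021

Purpose: this PR aims to improve the computational performance of the attention module.
To reduce the framework's op scheduling overhead, this PR implements the attention module by hand in C++ and exposes it as a single large attention op.
To reduce memory-access overhead, this PR applies two optimizations:
(1) when computing q, k, v, the input X is shared, so the gemm, transpose, and bias add at that point are reduced from three calls to one;
(2) kernel fusion is used, passing data between different CUDA kernels through registers.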

d4906214

机器未来 / Paddle is in sync with its fork source project