Commits · f706d95dfe9301e18ee6575c3e58e7ba37d6e78a · BaiXuePrincess / Paddle

16 Aug, 2022 1 commit

convert multihead to oss (#45019) · f706d95d

feng_shuai authored Aug 16, 2022

* convert multihead to oss

* fix:bug

* fix:delete const cast

* fix:don't support bias_qk

* add vit pass

* fix:convert bug and add preln_residual_bias

* support length=-1

* add UT for convert

* add no_bias_qk support for gpu_multihead_op

* delete infer_shape depends on bias_qk

* oss just can be used in T4 and A*

* fix:change api for ROCM CI


15 Aug, 2022 2 commits
- fused_embedding_eltwise_layernorm_op and skip_layernorm_op support fp16 (#44969) · ac0553a0
  Yuanle Liu authored Aug 15, 2022
- convert_fp16 support multi block (#45050) · 9aecf286
  Wilber authored Aug 15, 2022
```
* convert_fp16 support multi block

* update

* update
```
09 Aug, 2022 1 commit
- fix format for paddle/phi/api/lib/tensor.cc (#44972) · b54abbe8
  Allen Guo authored Aug 09, 2022
05 Aug, 2022 1 commit
- enhance fused_multi_transformer_op(post_layer_norm) (#44789) · 643c94e4
  carryyu authored Aug 05, 2022
```
* add fused_multi_transformer post_layer_norm

* add test post_layer_norm
```
02 Aug, 2022 1 commit

Multihead matmul fp16 (#44792) · 0fd8ee63

Wilber authored Aug 02, 2022

* multihead matmul add fp16

* fix windows error

* fix rocm error

* fix rocm error

01 Aug, 2022 1 commit

unify gpu context (#44740) · 86763023

Leo Chen authored Aug 01, 2022

* remove cudaDeviceContext

* remove more template

* fix rocm compile

* remove alias name CUDADeviceContext

* fix compile

* fix tests

* revert changes

29 Jul, 2022 3 commits
- unify fluid::CUDADeviceContext and phi::GpuContext (#44723) · 88490567
  Leo Chen authored Jul 29, 2022
```
* remove cudaDeviceContext

* remove more template

* fix rocm compile
```
- add some fp16 op for kunlun resnet50 model (#44672) · fecbc958
  QingshuChen authored Jul 29, 2022
```
* add some fp16 op for kunlun resnet50 model
*test=kunlun

* tmp
*test=kunlun
```
- fused_fc_elementwise_layernorm_op support fp16 (#44710) · 856f741a
  ming1753 authored Jul 29, 2022
```
* fused_fc_elementwise_layernorm support fp16

* fused_fc_elementwise_layernorm support double
```
26 Jul, 2022 1 commit
- inference multi stream support handle lazy init. (#44563) · 1892a441
  Wilber authored Jul 26, 2022
```
* multi stream support handle lazy init.

* support eigen lazy init

* update

* fix ci problem
```
19 Jul, 2022 2 commits
- Rename BOOST_GET macros (#44368) · 4b085c57
  Ruibiao Chen authored Jul 19, 2022
```
* Rename BOOST_GET macros

* Fix conflicts
```
- remove include of all.h in resnet_basic_block_op_xpu.cc, test=kunlun (#44423) · d4bb2ad7
  zhangyikun02 authored Jul 19, 2022
18 Jul, 2022 1 commit
- add xpu resnet_unit (#44297) · 02e9453f
  QingshuChen authored Jul 18, 2022
```
* add xpu resnet_unit
*test=kunlun

* tmp
*test=kunlun
```
13 Jul, 2022 1 commit
- add ResNetBasicBlock python api for kunlun, test=kunlun (#44171) · 917235be
  zhangyikun02 authored Jul 13, 2022
12 Jul, 2022 1 commit
- fix fused attention, ffn, fm under new process group (#44259) · f6ff2221
  Yuang Liu authored Jul 12, 2022
08 Jul, 2022 2 commits
- conv_fusion_fp16 (#44173) · 9900b42b
  xiaoxiaohehe001 authored Jul 08, 2022
- add implement of resnet_basic_block op for XPU2, test=kunlun (#44143) · d7be46b3
  zhangyikun02 authored Jul 08, 2022
07 Jul, 2022 2 commits
- Fix nan in fast_ln_fwd_kernel when cols > 1024 (#44125) · 33540e10
  Zhang Zheng authored Jul 07, 2022
```
* Fix nan in fast_ln_fwd_kernel when cols > 1024

* delete blas
```
- add resnet_basic_block for kunlun, test=kunlun (#43949) · 1e6137b5
  zhangyikun02 authored Jul 07, 2022
06 Jul, 2022 2 commits
- Fix nan in fused multi transformer (#44093) · d7f4599d
  LiYuRio authored Jul 06, 2022
- [Paddle Inference] Add conv_elementwise_act. (#43871) · 4c269ccb
  xiaoxiaohehe001 authored Jul 06, 2022
```
* conv_fusion
```
02 Jul, 2022 1 commit

unify cpu context, part2 (#44012) · 755438a7

Leo Chen authored Jul 02, 2022

* fix init()

* delete test_device_context

* replace CPUDeviceContext with CPUContext

* fix test_scalar

* remove dot_op.cc

* fix compile

01 Jul, 2022 1 commit

Addition of switch_auto_tune option for transpose op (#43310) · 53d5abe3

limingshu authored Jul 01, 2022

* 2nd part of transpose update

* add switch_auto_tune option.

* add some changes according to Ci

* refine the structure of auto_tune_base.

* merge develop changes

* reset the switch_set_range and change unittest of transpose auto-tune

* change the kernel auto-tune logits

30 Jun, 2022 2 commits
- fused_gate_attention manual code in eager (#43897) · 73f957cf
  wanghuancoder authored Jun 30, 2022
```
* fused_gate_attention manual code in eager

* refine

* refine

* refine

* refine

* refine

* refine
```
- Add new attr of fused_multi_transformer (#43730) · c2a5bb91
  Zhang Zheng authored Jun 30, 2022
```
* Add new attr of fused_multi_transformer

* fix format

* add note

* add in layer

* fixfixfixfix
```
28 Jun, 2022 1 commit
- fix cublasLt workspace size (#43877) · 6d436f6e
  sneaxiy authored Jun 28, 2022
26 Jun, 2022 1 commit
- format all files in fluid using new config (#43776) · 576236a0
  Sing_chan authored Jun 26, 2022
21 Jun, 2022 1 commit
- Fix code example of fused_attention and fused_feedforward. (#43635) · 223fb7b3
  Yiqun Liu authored Jun 21, 2022
20 Jun, 2022 2 commits
- Add passes and plugins for distributed inference of NLU (#43049) · 007f3614
  whs authored Jun 20, 2022
- Support more dimensions in MMHA (#43612) · 03f9e598
  Zhang Zheng authored Jun 20, 2022
```
* support more dimensions

* fix
```
17 Jun, 2022 1 commit

Support optional residual add in fused_attention and fused_feedforward. (#43474) · 19e866f9

Yiqun Liu authored Jun 17, 2022

* Support optional residual add in fused_attention and fused_feedforward.

* Add checkpoint and add the check of add_residual when pre_layer_norm is false.

* Add TODO and change the python api to add add_residual argument.

15 Jun, 2022 1 commit

Optimize prod's python implementation for dygraph. (#43309) · 9b7126d0

Yiqun Liu authored Jun 15, 2022

* Optimize prod's python implementation for dygraph.

* Change key_dim to head_dim.

* Add comment in unittest.

* Disable TF32 in unittest.

14 Jun, 2022 1 commit
- 【code format check upgrade】 step3：enable clang-format sort these infrt files's headers (#43333) · 403b127b
  Sing_chan authored Jun 14, 2022
10 Jun, 2022 1 commit
- optimize bwd layer_norm kernel with fast method (#42491) · b4a93884
  limingshu authored Jun 10, 2022
09 Jun, 2022 1 commit
- Implement dropout_nd operator to optimize dropout with axis not None. (#42463) · caa57498
  crystal authored Jun 09, 2022
```
Co-authored-by: NLiu Yiqun <liuyiqun01@baidu.com>
```
08 Jun, 2022 1 commit

Fix wrong reduce_dims in fused_gate_attention and optimize the memory usage. (#43216) · 10f8637c

Yiqun Liu authored Jun 08, 2022

* Polish codes and memory usage for fused_gate_attention.

* Fix wrong reduce_dims in fused_gate_attention when computing gradient of nonbatched_bias.

07 Jun, 2022 1 commit
- Supoort more dimensions in forward fast layer_norm kernel (#43226) · d9f8636c
  Zhang Zheng authored Jun 07, 2022
05 Jun, 2022 2 commits
- revert modification for format PR passing CI; not sort headers for windows CI test failed (#43200) · 58d2949d
  Sing_chan authored Jun 05, 2022
- 【code format check upgrade】 step2：clang-format (#42840) · a3730dc8
  Sing_chan authored Jun 05, 2022

BaiXuePrincess / Paddle is in sync with its fork source project
