Commits · b7b231a668ac51365cdce11dfafe6f7da04b2350 · PaddlePaddle / Paddle

30 Sep, 2022 1 commit

support pure bfloat16 for more ops (#46364) · b7b231a6

Authored by sneaxiy on Sep 30, 2022

* support pure bfloat16

* support bf16 linear

* update PR to pass CI

* tiny fix where_grad_kernel.cu

* add bfloat16 to selu_grad to pass CI

* fix selu grad compilation error

28 Sep, 2022 1 commit

Remove the declaration of using Tensor in framework/tensor.h (#46432) · e12a905e

Authored by Chen Weihang on Sep 28, 2022

* remove needless using tensor

* remove needless using tensor

* resolve conflict

* replace tensor using

* fix format error

* revert needless changing

* fix rocm and npu compile error

* fix cinn compile error

* fix format error

* fix mkldnn format error

* fix mkldnn format error

* fix cinn compile error

* fix cinn compile error

* fix cinn compile error

* resolve conflict

21 Sep, 2022 1 commit
- refine mkldnn code · 4b8d4ade
  Authored by jiahongyu on Sep 20, 2022
18 Sep, 2022 1 commit
- Add INT8 support for fused_multi_transformer_op (#45284) · 3d7e2118
  Authored by RichardWooSJTU on Sep 18, 2022
15 Sep, 2022 1 commit
- [CodeStyle] trim trailing whitespace in .h, .cc, .cu, etc. (#46006) · 8dde7aea
  Authored by Nyakku Shigure on Sep 15, 2022
09 Sep, 2022 2 commits
- convfusion_cache (#45902) · 3bad26ec
  Authored by xiaoxiaohehe001 on Sep 09, 2022
- fix fused_gemm_epilogue compile error (#45899) · 7d000112
  Authored by sneaxiy on Sep 09, 2022
08 Sep, 2022 2 commits
- xpu-paddlepaddle-40 [Task] fused_gemm_epilogue supports XPU (#45706) · 7085cb97
  Authored by taixiurong on Sep 08, 2022
  ```
  * add gemm_epilogue

  * xpu-paddlepaddle-40 [Task] fused_gemm_epilogue support test=kunlun
  ```
- fix fused_gemm_epilogue_op compile error (#45862) · 569d6c5b
  Authored by sneaxiy on Sep 08, 2022
07 Sep, 2022 1 commit
- Fix fused cuda op's mutable data [2] (#45562) · 4bbbed9a
  Authored by Wilber on Sep 07, 2022
01 Sep, 2022 1 commit
- update alloc usage (#45654) · e3e92c9a
  Authored by Leo Chen on Sep 01, 2022
31 Aug, 2022 1 commit
- Fix fused cuda op's mutable data [3] (#45564) · 657c69bc
  Authored by Wilber on Aug 31, 2022
23 Aug, 2022 1 commit
- Delete the template parameter BLockSize in Kernel Primitive API (#45220) · 1a0cd447
  Authored by niuliling123 on Aug 23, 2022
17 Aug, 2022 1 commit
- fix multi stream error. (#45196) · a79d4a75
  Authored by Wilber on Aug 17, 2022
  ```
  * fix multi stream error.
  ```
16 Aug, 2022 1 commit

convert multihead to oss (#45019) · f706d95d

Authored by feng_shuai on Aug 16, 2022

* convert multihead to oss

* fix:bug

* fix:delete const cast

* fix:don't support bias_qk

* add vit pass

* fix:convert bug and add preln_residual_bias

* support length=-1

* add UT for convert

* add no_bias_qk support for gpu_multihead_op

* delete infer_shape depends on bias_qk

* oss just can be used in T4 and A*

* fix:change api for ROCM CI

15 Aug, 2022 2 commits
- fused_embedding_eltwise_layernorm_op and skip_layernorm_op support fp16 (#44969) · ac0553a0
  Authored by Yuanle Liu on Aug 15, 2022
- convert_fp16 support multi block (#45050) · 9aecf286
  Authored by Wilber on Aug 15, 2022
  ```
  * convert_fp16 support multi block

  * update

  * update
  ```
09 Aug, 2022 1 commit
- fix format for paddle/phi/api/lib/tensor.cc (#44972) · b54abbe8
  Authored by Allen Guo on Aug 09, 2022
05 Aug, 2022 1 commit
- enhance fused_multi_transformer_op(post_layer_norm) (#44789) · 643c94e4
  Authored by carryyu on Aug 05, 2022
  ```
  * add fused_multi_transformer post_layer_norm

  * add test post_layer_norm
  ```
02 Aug, 2022 1 commit

Multihead matmul fp16 (#44792) · 0fd8ee63

Authored by Wilber on Aug 02, 2022

* multihead matmul add fp16

* fix windows error

* fix rocm error

* fix rocm error

01 Aug, 2022 1 commit

unify gpu context (#44740) · 86763023

Authored by Leo Chen on Aug 01, 2022

* remove cudaDeviceContext

* remove more template

* fix rocm compile

* remove alias name CUDADeviceContext

* fix compile

* fix tests

* revert changes

29 Jul, 2022 3 commits
- unify fluid::CUDADeviceContext and phi::GpuContext (#44723) · 88490567
  Authored by Leo Chen on Jul 29, 2022
  ```
  * remove cudaDeviceContext

  * remove more template

  * fix rocm compile
  ```
- add some fp16 op for kunlun resnet50 model (#44672) · fecbc958
  Authored by QingshuChen on Jul 29, 2022
  ```
  * add some fp16 op for kunlun resnet50 model
  *test=kunlun

  * tmp
  *test=kunlun
  ```
- fused_fc_elementwise_layernorm_op support fp16 (#44710) · 856f741a
  Authored by ming1753 on Jul 29, 2022
  ```
  * fused_fc_elementwise_layernorm support fp16

  * fused_fc_elementwise_layernorm support double
  ```
26 Jul, 2022 1 commit
- inference multi stream support handle lazy init. (#44563) · 1892a441
  Authored by Wilber on Jul 26, 2022
  ```
  * multi stream support handle lazy init.

  * support eigen lazy init

  * update

  * fix ci problem
  ```
19 Jul, 2022 2 commits
- Rename BOOST_GET macros (#44368) · 4b085c57
  Authored by Ruibiao Chen on Jul 19, 2022
  ```
  * Rename BOOST_GET macros

  * Fix conflicts
  ```
- remove include of all.h in resnet_basic_block_op_xpu.cc, test=kunlun (#44423) · d4bb2ad7
  Authored by zhangyikun02 on Jul 19, 2022
18 Jul, 2022 1 commit
- add xpu resnet_unit (#44297) · 02e9453f
  Authored by QingshuChen on Jul 18, 2022
  ```
  * add xpu resnet_unit
  *test=kunlun

  * tmp
  *test=kunlun
  ```
13 Jul, 2022 1 commit
- add ResNetBasicBlock python api for kunlun, test=kunlun (#44171) · 917235be
  Authored by zhangyikun02 on Jul 13, 2022
12 Jul, 2022 1 commit
- fix fused attention, ffn, fm under new process group (#44259) · f6ff2221
  Authored by Yuang Liu on Jul 12, 2022
08 Jul, 2022 2 commits
- conv_fusion_fp16 (#44173) · 9900b42b
  Authored by xiaoxiaohehe001 on Jul 08, 2022
- add implement of resnet_basic_block op for XPU2, test=kunlun (#44143) · d7be46b3
  Authored by zhangyikun02 on Jul 08, 2022
07 Jul, 2022 2 commits
- Fix nan in fast_ln_fwd_kernel when cols > 1024 (#44125) · 33540e10
  Authored by Zhang Zheng on Jul 07, 2022
  ```
  * Fix nan in fast_ln_fwd_kernel when cols > 1024

  * delete blas
  ```
- add resnet_basic_block for kunlun, test=kunlun (#43949) · 1e6137b5
  Authored by zhangyikun02 on Jul 07, 2022
06 Jul, 2022 2 commits
- Fix nan in fused multi transformer (#44093) · d7f4599d
  Authored by LiYuRio on Jul 06, 2022
- [Paddle Inference] Add conv_elementwise_act. (#43871) · 4c269ccb
  Authored by xiaoxiaohehe001 on Jul 06, 2022
  ```
  * conv_fusion
  ```
02 Jul, 2022 1 commit

unify cpu context, part2 (#44012) · 755438a7

Authored by Leo Chen on Jul 02, 2022

* fix init()

* delete test_device_context

* replace CPUDeviceContext with CPUContext

* fix test_scalar

* remove dot_op.cc

* fix compile

01 Jul, 2022 1 commit

Addition of switch_auto_tune option for transpose op (#43310) · 53d5abe3

Authored by limingshu on Jul 01, 2022

* 2nd part of transpose update

* add switch_auto_tune option.

* add some changes according to Ci

* refine the structure of auto_tune_base.

* merge develop changes

* reset the switch_set_range and change unittest of transpose auto-tune

* change the kernel auto-tune logits

30 Jun, 2022 2 commits
- fused_gate_attention manual code in eager (#43897) · 73f957cf
  Authored by wanghuancoder on Jun 30, 2022
  ```
  * fused_gate_attention manual code in eager

  * refine

  * refine

  * refine

  * refine

  * refine

  * refine
  ```
- Add new attr of fused_multi_transformer (#43730) · c2a5bb91
  Authored by Zhang Zheng on Jun 30, 2022
  ```
  * Add new attr of fused_multi_transformer

  * fix format

  * add note

  * add in layer

  * fixfixfixfix
  ```

PaddlePaddle / Paddle synced successfully about 1 year ago
