Commits · cdab3a44b6a2c2866c2cac4ccdf32544631f654c · PaddlePaddle / Paddle

Dec 20, 2022 · 1 commit
  Fix nullptr to TestFuseGemmEpilogueReluBWDFP* (#48997) (#49090) · cdab3a44
  Committed by ShenLiang on Dec 20, 2022
```
Co-authored-by: Ming-Xu Huang <mingh@nvidia.com>
```
Nov 28, 2022 · 1 commit

Cherrypick NV fixes to release/2.4 (#48263) · 7a0b8625

Committed by zlsh80826 on Nov 28, 2022

* Reduce squeeze2_matmul_fuse_pass and flatten test time (#47098)

* Add missing fp32 config and reduce the testing combination

* Reduce trt matmul pass test max examples

* Loosen TRT fp16 test tolerance (#47100)

* Loosen TRT half test tolerance to 1e-3 (#47101)

* Loosen TRT half test tolerance to 1e-3 (#47106)

* Update distributed_strategy.proto (#46531)

* Close popen pipe after use (#47053)

* Add launch_bounds (#47285)

* Fix TRT UT failures (#47488)

* Format cherry-picked commits

* CudnnNormConvolution is no longer supported on NVIDIA Hopper GPUs (#48203)

* Skip tests that use fused_ops on H100

* Add error message to FusedOps on H100
Co-authored-by: Shijie <505749828@qq.com>
Co-authored-by: Leo Chen <39020268+leo0519@users.noreply.github.com>
Co-authored-by: Tian Zheng <tizheng@nvidia.com>


Oct 26, 2022 · 1 commit

[Cherry-pick][Release/2.4]Refine the memory usage of fused_attention and... · 9a6dd8f8

Committed by sneaxiy on Oct 26, 2022

[Cherry-pick][Release/2.4]Refine the memory usage of fused_attention and fused_feedforward ops (#47235)

* fix fused_attention fused_feedforward

* fix ci

* fix ci

* fix ci PADDLE_GET_CONST

* fix ci ut


Oct 20, 2022 · 1 commit
  [Cherry-pick][Release/2.4] support pure bfloat16 for more ops · da7d2f29
  Committed by sneaxiy on Oct 20, 2022
```
support pure bfloat16 for more ops
```
Oct 18, 2022 · 1 commit

[cherry-pick] Fix perf issues of mp/pp/fuse in eager mode (#47071) · b84edd90

Committed by Haohongxiang on Oct 18, 2022

* [Dygraph] Fix performance of pp+mp by using send/recv_calc_stream instead of send/recv (#46116)

* [Dygraph] Fix Perf of FusedFeedForward and FusedAttention with AllReduce (#46780)

* update


Sep 19, 2022 · 2 commits
  Add INT8 support for fused_multi_transformer_op (#45284) (#46169) · db368d5b
  Committed by minghaoBD on Sep 19, 2022
```
Co-authored-by: RichardWooSJTU <37864677+RichardWooSJTU@users.noreply.github.com>
```
  
  convfusion_cache (#46054) · f4ec1563
  Committed by xiaoxiaohehe001 on Sep 19, 2022
  
Sep 09, 2022 · 1 commit
  
  fix fused_gemm_epilogue compile error (#45899) · 7d000112
  Committed by sneaxiy on Sep 09, 2022
  
Sep 08, 2022 · 2 commits
  xpu-paddlepaddle-40 [Task] fused_gemm_epilogue supports XPU (#45706) · 7085cb97
  Committed by taixiurong on Sep 08, 2022
```
* add gemm_epilogue

* xpu-paddlepaddle-40 [Task] fused_gemm_epilogue support test=kunlun
```
  
  fix fused_gemm_epilogue_op compile error (#45862) · 569d6c5b
  Committed by sneaxiy on Sep 08, 2022
  
Sep 07, 2022 · 1 commit
  
  Fix fused cuda op's mutable data [2] (#45562) · 4bbbed9a
  Committed by Wilber on Sep 07, 2022
  
Sep 01, 2022 · 1 commit
  
  update alloc usage (#45654) · e3e92c9a
  Committed by Leo Chen on Sep 01, 2022
  
Aug 31, 2022 · 1 commit
  
  Fix fused cuda op's mutable data [3] (#45564) · 657c69bc
  Committed by Wilber on Aug 31, 2022
  
Aug 23, 2022 · 1 commit
  
  Delete the template parameter BLockSize in Kernel Primitive API (#45220) · 1a0cd447
  Committed by niuliling123 on Aug 23, 2022
  
Aug 17, 2022 · 1 commit
  fix multi stream error. (#45196) · a79d4a75
  Committed by Wilber on Aug 17, 2022
```
* fix multi stream error.
```
Aug 16, 2022 · 1 commit

convert multihead to oss (#45019) · f706d95d

Committed by feng_shuai on Aug 16, 2022

* convert multihead to oss

* fix:bug

* fix:delete const cast

* fix:don't support bias_qk

* add vit pass

* fix:convert bug and add preln_residual_bias

* support length=-1

* add UT for convert

* add no_bias_qk support for gpu_multihead_op

* delete infer_shape depends on bias_qk

* oss can only be used on T4 and A*

* fix:change api for ROCM CI


Aug 15, 2022 · 2 commits
  
  fused_embedding_eltwise_layernorm_op and skip_layernorm_op support fp16 (#44969) · ac0553a0
  Committed by Yuanle Liu on Aug 15, 2022
  
  convert_fp16 support multi block (#45050) · 9aecf286
  Committed by Wilber on Aug 15, 2022
```
* convert_fp16 support multi block

* update

* update
```
Aug 09, 2022 · 1 commit
  
  fix format for paddle/phi/api/lib/tensor.cc (#44972) · b54abbe8
  Committed by Allen Guo on Aug 09, 2022
  
Aug 05, 2022 · 1 commit
  enhance fused_multi_transformer_op(post_layer_norm) (#44789) · 643c94e4
  Committed by carryyu on Aug 05, 2022
```
* add fused_multi_transformer post_layer_norm

* add test post_layer_norm
```
Aug 02, 2022 · 1 commit

Multihead matmul fp16 (#44792) · 0fd8ee63

Committed by Wilber on Aug 02, 2022

* multihead matmul add fp16

* fix windows error

* fix rocm error

* fix rocm error


Aug 01, 2022 · 1 commit

unify gpu context (#44740) · 86763023

Committed by Leo Chen on Aug 01, 2022

* remove cudaDeviceContext

* remove more template

* fix rocm compile

* remove alias name CUDADeviceContext

* fix compile

* fix tests

* revert changes


Jul 29, 2022 · 3 commits
  unify fluid::CUDADeviceContext and phi::GpuContext (#44723) · 88490567
  Committed by Leo Chen on Jul 29, 2022
```
* remove cudaDeviceContext

* remove more template

* fix rocm compile
```
  add some fp16 op for kunlun resnet50 model (#44672) · fecbc958
  Committed by QingshuChen on Jul 29, 2022
```
* add some fp16 op for kunlun resnet50 model
*test=kunlun

* tmp
*test=kunlun
```
  fused_fc_elementwise_layernorm_op support fp16 (#44710) · 856f741a
  Committed by ming1753 on Jul 29, 2022
```
* fused_fc_elementwise_layernorm support fp16

* fused_fc_elementwise_layernorm support double
```
Jul 26, 2022 · 1 commit
  inference multi stream support handle lazy init. (#44563) · 1892a441
  Committed by Wilber on Jul 26, 2022
```
* multi stream support handle lazy init.

* support eigen lazy init

* update

* fix ci problem
```
Jul 19, 2022 · 2 commits
  Rename BOOST_GET macros (#44368) · 4b085c57
  Committed by Ruibiao Chen on Jul 19, 2022
```
* Rename BOOST_GET macros

* Fix conflicts
```
  
  remove include of all.h in resnet_basic_block_op_xpu.cc, test=kunlun (#44423) · d4bb2ad7
  Committed by zhangyikun02 on Jul 19, 2022
  
Jul 18, 2022 · 1 commit
  add xpu resnet_unit (#44297) · 02e9453f
  Committed by QingshuChen on Jul 18, 2022
```
* add xpu resnet_unit
*test=kunlun

* tmp
*test=kunlun
```
Jul 13, 2022 · 1 commit
  
  add ResNetBasicBlock python api for kunlun, test=kunlun (#44171) · 917235be
  Committed by zhangyikun02 on Jul 13, 2022
  
Jul 12, 2022 · 1 commit
  
  fix fused attention, ffn, fm under new process group (#44259) · f6ff2221
  Committed by Yuang Liu on Jul 12, 2022
  
Jul 08, 2022 · 2 commits
  
  conv_fusion_fp16 (#44173) · 9900b42b
  Committed by xiaoxiaohehe001 on Jul 08, 2022
  
  
  add implement of resnet_basic_block op for XPU2, test=kunlun (#44143) · d7be46b3
  Committed by zhangyikun02 on Jul 08, 2022
  
Jul 07, 2022 · 2 commits
  Fix nan in fast_ln_fwd_kernel when cols > 1024 (#44125) · 33540e10
  Committed by Zhang Zheng on Jul 07, 2022
```
* Fix nan in fast_ln_fwd_kernel when cols > 1024

* delete blas
```
  
  add resnet_basic_block for kunlun, test=kunlun (#43949) · 1e6137b5
  Committed by zhangyikun02 on Jul 07, 2022
  
Jul 06, 2022 · 2 commits
  
  Fix nan in fused multi transformer (#44093) · d7f4599d
  Committed by LiYuRio on Jul 06, 2022
  
  [Paddle Inference] Add conv_elementwise_act. (#43871) · 4c269ccb
  Committed by xiaoxiaohehe001 on Jul 06, 2022
```
* conv_fusion
```
Jul 02, 2022 · 1 commit

unify cpu context, part2 (#44012) · 755438a7

Committed by Leo Chen on Jul 02, 2022

* fix init()

* delete test_device_context

* replace CPUDeviceContext with CPUContext

* fix test_scalar

* remove dot_op.cc

* fix compile


Jul 01, 2022 · 1 commit

Addition of switch_auto_tune option for transpose op (#43310) · 53d5abe3

Committed by limingshu on Jul 01, 2022

* 2nd part of transpose update

* add switch_auto_tune option.

* add some changes according to CI

* refine the structure of auto_tune_base.

* merge develop changes

* reset the switch_set_range and change unittest of transpose auto-tune

* change the kernel auto-tune logic


Jun 30, 2022 · 1 commit
  fused_gate_attention manual code in eager (#43897) · 73f957cf
  Committed by wanghuancoder on Jun 30, 2022
```
* fused_gate_attention manual code in eager

* refine

* refine

* refine

* refine

* refine

* refine
```

PaddlePaddle / Paddle · synced successfully over a year ago
