提交 · c775bc69d02d2d3b4eae6c1b9862d57cecb8bba0 · BaiXuePrincess / Paddle

19 11月, 2022 1 次提交

[CustomPlace] fix amp (#48090) · c775bc69

由 Aganlengzi 提交于 11月 19, 2022

* [CustomPlace] fix amp

* [CustomPlace] fix amp

* fix ut because of too long time matmul fp16

c775bc69

18 11月, 2022 15 次提交

W

refine save hook (#48124) · 04709310
由 wanghuancoder 提交于 11月 18, 2022

04709310

Fused QKVBiasAdd and Transpose with Split Q, KV (#47680) · d595928e

由 MarDino 提交于 11月 18, 2022

* fused qkvBiasAdd and transpose with split qkv

* fix typo

* fix format

* fix name

* add annotation

* fix comment

d595928e

[PHI] Migrate matmul_grad kernel (#48023) · 4ab18ada

由 Sławomir Siwek 提交于 11月 18, 2022

* cleanup unused code

* unify is_int8 is_bfloat16

* Simplify matmul_v2 FWD kernel

* remove RunKernel methods

* remove import namespace

* remove headers

* clean fluid/phi cross imports

* remove fluid axpy_handler

* delete fluid methods

* activations

* OneDNNMemDesc

* MKLDNNFormatForSize

* MatchShapeToLayout

* MKLDNNMemoryFormat

* MKLDNNFormat

* ReorderMKLDNNHandler

* to_void_cast

* review suggestions

* interpolate

* remove fluid depedency

* init

* ExecuteMatMulV2

* rm fluid kernel

* matmul_grad

* remove mutable_data

4ab18ada

[PHI] Migrate conv_transpose kernel (#48119) · 9aacb31b

由 Zuza Gawrysiak 提交于 11月 18, 2022

* Migrate conv_transpose to phi

* Move handler to kernel

* kernel m

* Fix formatting

* handler

* remove fluid

* revert tcp_store

* tcp_store

* remove unused

* Fix declaration

* add dnn input

* Fix typo
Co-authored-by: NSławomir Siwek <slawomir.siwek@intel.com>

9aacb31b

Z
Fix bug of zero_allocator in HostAlloc (#48108) · 7f92e27e
由 zyfncg 提交于 11月 18, 2022
```
* fix bug of zero_allocator in host

* fix test compile bug

* add unittest

* update test
```
7f92e27e

Optimize FusedBiasAddGelu Kernel (#47679) · b0e28540

由 MarDino 提交于 11月 18, 2022

* Add quick gelu and fused bias add kernel

* fix annotation

* remove useless code

* add fast gelu option and set it in multi transformer op

* add flag to restrict if use fast gelu approximate

* fix flags conflict

* fix use tanh function instead

* add cudart version limit

* use phi fast tanh func

* fix comment

b0e28540

W

Refactor collective communication reduce, scatter, reduce_scatter C++ API (#48115) · edda13cd
由 Wen Sun 提交于 11月 18, 2022

edda13cd

correct sync behavior for XPU distributed training (#47882) · aafa9820

由 james 提交于 11月 18, 2022

* correct sync behavior for XPU distributed training

XPU support event mechanism similar to cuda event, so it is advisable to
use an event to sync compute/comm streams for performance. However this
mechanism is never fully tested, and inconsistent loss/ending_epochs are
reported. Therefore, this PR replaces event sync with stream waiting as
a temporary solution.

* remove compile warning

aafa9820

fix device id issue for xpu eager mode (#48076) · 3b18d96b

由 james 提交于 11月 18, 2022

* fix device id issue for xpu eager

xpu device id is not correctly set in eager mode, thus vars are on dev0 unless
XPUDeviceGurad is called, leading to this error message for all node rank != 0:
"NotImplementedError: (Unimplemented) Place Place(xpu:0) is not supported."

* fix typo

* fix pybind error

3b18d96b

CUDNN v8 Implementation of Convolution Kernels (#47454) · 14a6e67b

由 Tian Zheng 提交于 11月 18, 2022

* Refactor conv_kernel and conv_grad_kernel to provide interface for CUDNNv8 implementation

* Fix macro

* Add implementation for conv_kernel and conv_grad_kernel

* Modification after rebase onto latest develop

* Modify plan cache to comply with the API of phi::autotune

* Refactor to reduce duplicate code

* Review fix:
- move functions in  conv_kernel_impl_v8.h and conv_grad_kernel_impl_v8.h to conv_kernel.cu and conv_grad_kernelk.cu
- add const specifier for input tensor
- add logging when plans fail to execute
- move CudnnConvBwdFilterV8 and CudnnConvBwdDataV8 to conv_cudnn_frontend.h

* - move plan building outside of cache

* Fix ROCM build

14a6e67b

W
[PHI decoupling] remove "gpu_primitives.h" in fluid (#48063) · 9918bf9c
由 Wang Xin 提交于 11月 18, 2022
```
* remove "gpu_primitives.h" in fluid namespace

* fix PR-CI-GpuPS fail

* fix PR-CI-GpuPS fail
```
9918bf9c
Z

cast and gradient_accumulator support double for xpu, test=kunlun (#47800) · 982d5ff7
由 zhangyikun02 提交于 11月 18, 2022

982d5ff7
F

optimize: vectorize transpose_padding (#48116) · 635958d9
由 feng_shuai 提交于 11月 18, 2022

635958d9
F

fix: supoort huge length of attention (#48053) · 42f35841
由 feng_shuai 提交于 11月 18, 2022

42f35841
H

rm "paddle/fluid/operators/amp/fp16_type_traits.h" in phi (#48051) · e4670d80
由 huangjiyi 提交于 11月 18, 2022

e4670d80

17 11月, 2022 13 次提交
- Z
  Clip intermediate output of op when save inference model (#48026) · fafc7be2
  由 zyfncg 提交于 11月 17, 2022
```
* clip extra and intermediate output of op

* fix bug

* fix bug

* polich code

* polich log
```
  fafc7be2
- W
  
  Refactor collective communication all_to_all, all_to_all_single C++ API (#48059) · 3f480af2
  由 Wen Sun 提交于 11月 17, 2022
  
  3f480af2
- W
  support int input for scale (#48044) · dbc63555
  由 wenbin 提交于 11月 17, 2022
```
* int scale

* round

* revert commit
```
  dbc63555
- H
  
  fix new executor gc dep bug (#48068) · 04dcb9d7
  由 hong 提交于 11月 17, 2022
  
  04dcb9d7
- H
  
  rm "paddle/fluid/framework/convert_utils.h" in phi (#48001) · 2f34fc7a
  由 huangjiyi 提交于 11月 17, 2022
  
  2f34fc7a
- Y
  [PHI]Standardise some C++ API (Part5) (#47860) · f3650201
  由 YuanRisheng 提交于 11月 17, 2022
```
* standard api

* fix xpu bugs
```
  f3650201
- T
  
  xpu-paddlepaddle-41 [任务] ffn and attention test=kunlun (#46658) · 071708fa
  由 taixiurong 提交于 11月 17, 2022
  
  071708fa
- W
  
  move "function_traits.h" from fluid to phi (#48065) · b7841a2b
  由 Wang Xin 提交于 11月 17, 2022
  
  b7841a2b
- X
  [Paddle Inference] Support cast trt converter of bool input and output . (#48043) · ff44df18
  由 xiaoxiaohehe001 提交于 11月 17, 2022
```
* add_cast_bool

* cast
```
  ff44df18
- H
  [PHI decoupling] move "paddle/fluid/operators/math.h" to phi (#48062) · f62bd3b4
  由 huangjiyi 提交于 11月 17, 2022
```
* rm "paddle/fluid/operators/math.h" in phi

* rm "paddle/fluid/operators/math.h" in fluit
```
  f62bd3b4
- Y
  Support bfloat16 for adamw and adam optimizer. Fit the lr for pure bf16... · e5ed5257
  由 Yuang Liu 提交于 11月 17, 2022
```
Support bfloat16 for adamw and adam optimizer. Fit the lr for pure bf16 training with tensor fusion. (#48041)

* add bfloat16 for adamw

* set lr not to bfloat16 for pure bf16 training

* update the logic

* update the adamw optimizer

* support bfloat for adam
```
  e5ed5257
- S
  Add vectorized bfloat16 atomicAdd (#48056) · ccbd03d5
  由 sneaxiy 提交于 11月 17, 2022
```
* add vectorized bfloat16 atomicAdd

* fix compile error

* fix compile error again

* fix V100 compile error

* fix V100 compile again
```
  ccbd03d5
- Z
  
  generate static graph code for some op (#48036) · 7cc0d171
  由 zyfncg 提交于 11月 17, 2022
  
  7cc0d171
16 11月, 2022 9 次提交
- X
  [Paddle Inference] Add fill_any_like trt converter. (#47974) · d6be9000
  由 xiaoxiaohehe001 提交于 11月 16, 2022
```
* add_fill_any_like

* add_fill_any_like
```
  d6be9000
- W
  elementwise_floordiv (#47944) · b4b78060
  由 wenbin 提交于 11月 16, 2022
```
* elementwise_op

* add teller

* modify ut

* comments

* modify ut

* return

* modify
```
  b4b78060
- Z
  
  trt memory set change from setMaxWorkspaceSize to setMemoryPoolLimit since trt 8.3+ (#47795) · 9cf3aa61
  由 Zhang Jun 提交于 11月 16, 2022
  
  9cf3aa61
- Z
  
  [inference][trt] update trt hardswish plugin to layer (#47745) · 6c54e0e8
  由 Zhang Jun 提交于 11月 16, 2022
  
  6c54e0e8
- P
  Add bf16 data type support to oneDNN bilinear_interp kernel (#46770) · 8e6315e4
  由 Piotr Paturej 提交于 11月 16, 2022
```
* Enable bf16 in oneDNN bilinear_interp kernel

* Fix bilinear_interp_v2 not enabled in models

* Remove unnecessary checks
```
  8e6315e4
- H
  remove avx check (#48003) · a762d68e
  由 hong 提交于 11月 16, 2022
```
* remove avx check

* fix bug;
```
  a762d68e
- L
  
  increase the level of some log (#47990) · 2f8901cb
  由 Leo Chen 提交于 11月 16, 2022
  
  2f8901cb
- W
  Update `ProcessGroupCustom` for `sync_op` compatibility (#47976) · e4ebf383
  由 Wen Sun 提交于 11月 16, 2022
```
* refactor: update pg custom

* fix: use new api in ut

* fix: typo

* revert: recover legacy apis

* fix: add GetDeviceContext
```
  e4ebf383
- C
  
  feat(ipu): add paddle inference support for model_runtime. (#47364) · 39c85064
  由 czr-gc 提交于 11月 16, 2022
  
  39c85064
15 11月, 2022 2 次提交
- Y
  
  fix onednn bugs, test=document_fix (#48013) · 21d4fa02
  由 YuanRisheng 提交于 11月 15, 2022
  
  21d4fa02
- J
  Added optimization pass for oneDNN layernorm kernel (#47782) · 519e7426
  由 jakpiase 提交于 11月 15, 2022
```
* optimization for ln

* fix

* added output to gpd

* added formatting

* fix
```
  519e7426

BaiXuePrincess / Paddle 与 Fork 源项目一致

BaiXuePrincess / Paddle
与 Fork 源项目一致