Commits · edda13cd88b269c932e1d8fafa5a6fabbbda72a2 · PaddlePaddle / Paddle

18 Nov, 2022 7 commits

correct sync behavior for XPU distributed training (#47882) · aafa9820

Committed by james on Nov 18, 2022

* correct sync behavior for XPU distributed training

XPU supports an event mechanism similar to CUDA events, so it is advisable to
use an event to sync compute/comm streams for performance. However, this
mechanism has never been fully tested, and inconsistent loss/ending_epochs
have been reported. Therefore, this PR replaces event sync with stream waiting
as a temporary solution.

* remove compile warning


CUDNN v8 Implementation of Convolution Kernels (#47454) · 14a6e67b

Committed by Tian Zheng on Nov 18, 2022

* Refactor conv_kernel and conv_grad_kernel to provide interface for CUDNNv8 implementation

* Fix macro

* Add implementation for conv_kernel and conv_grad_kernel

* Modification after rebase onto latest develop

* Modify plan cache to comply with the API of phi::autotune

* Refactor to reduce duplicate code

* Review fix:
- move functions in conv_kernel_impl_v8.h and conv_grad_kernel_impl_v8.h to conv_kernel.cu and conv_grad_kernel.cu
- add const specifier for input tensor
- add logging when plans fail to execute
- move CudnnConvBwdFilterV8 and CudnnConvBwdDataV8 to conv_cudnn_frontend.h

* move plan building outside of cache

* Fix ROCM build


add bf16 for numel (#48121) · a7d306af
Committed by Yuang Liu on Nov 18, 2022

[PHI decoupling] remove "gpu_primitives.h" in fluid (#48063) · 9918bf9c
Committed by Wang Xin on Nov 18, 2022
```
* remove "gpu_primitives.h" in fluid namespace

* fix PR-CI-GpuPS fail

* fix PR-CI-GpuPS fail
```

cast and gradient_accumulator support double for xpu, test=kunlun (#47800) · 982d5ff7
Committed by zhangyikun02 on Nov 18, 2022

fix onednn prelu header (#48064) · 85598e31
Committed by Sylwester Fraczek on Nov 18, 2022

rm "paddle/fluid/operators/amp/fp16_type_traits.h" in phi (#48051) · e4670d80
Committed by huangjiyi on Nov 18, 2022

17 Nov, 2022 13 commits
- [NPU] add _npu_identity op and api, test=develop (#47850) · 099c2302
  Committed by Qi Li on Nov 17, 2022
```
* [NPU] add _npu_identity op and api, test=develop

* fix doc

* address comments
```
- fix the thread number to ensure deterministic of embedding kernel (#48073) · 5329187d
  Committed by xiongkun on Nov 17, 2022
- rm "paddle/fluid/framework/convert_utils.h" in phi (#48001) · 2f34fc7a
  Committed by huangjiyi on Nov 17, 2022
- [PHI]Standardise some C++ API (Part5) (#47860) · f3650201
  Committed by YuanRisheng on Nov 17, 2022
```
* standard api

* fix xpu bugs
```
- optimizing a bit tensor_array initialization (#48066) · c374894d
  Committed by Mountagha on Nov 17, 2022
- xpu-paddlepaddle-41 [Task] ffn and attention test=kunlun (#46658) · 071708fa
  Committed by taixiurong on Nov 17, 2022
- move "function_traits.h" from fluid to phi (#48065) · b7841a2b
  Committed by Wang Xin on Nov 17, 2022
- Implement a common dimension simplifier. (#47981) · bf6af816
  Committed by Yiqun Liu on Nov 17, 2022
```
* Implement a common dims simplifier.

* Fix the include position error.

* Reduce the cpu overhead of broadcast computing.
```
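The commit message for #47981 does not describe how the simplifier works. As a rough illustration only, one common way to simplify dimensions before a broadcast is to merge adjacent axes that every input treats the same way (all broadcast or all full-size), so the kernel sees a lower rank. The function name `simplify_broadcast_dims` and its behavior are assumptions made for this sketch, not Paddle's implementation.

```python
from typing import List


def simplify_broadcast_dims(shapes: List[List[int]]) -> List[List[int]]:
    """Merge adjacent axes that have the same broadcast pattern in every input.

    Hypothetical helper for illustration; assumes the shapes are already
    broadcast-compatible (NumPy-style, aligned from the last axis).
    """
    rank = max(len(s) for s in shapes)
    # Left-pad with 1s so every shape has the same rank.
    padded = [[1] * (rank - len(s)) + list(s) for s in shapes]
    merged: List[List[int]] = [[] for _ in padded]
    prev_pattern = None
    for axis in range(rank):
        sizes = [s[axis] for s in padded]
        if max(sizes) == 1:
            continue  # size 1 in every input: the axis carries no data
        # Which inputs are broadcast (size 1) along this axis?
        pattern = tuple(size == 1 for size in sizes)
        if pattern == prev_pattern:
            # Same pattern as the previous kept axis: fold into it.
            for m, size in zip(merged, sizes):
                m[-1] *= size
        else:
            for m, size in zip(merged, sizes):
                m.append(size)
        prev_pattern = pattern
    # All-scalar inputs collapse to a single axis of size 1.
    return [m if m else [1] for m in merged]


print(simplify_broadcast_dims([[2, 3, 4, 5], [4, 5]]))
```

For the shapes `[2, 3, 4, 5]` and `[4, 5]`, the sketch returns `[[6, 20], [1, 20]]`: a rank-4 broadcast becomes a rank-2 one with the same element count.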
- rm "paddle/phi/kernels/gpu/batch_norm_utils.h" in phi (#48057) · b7e120d2
  Committed by huangjiyi on Nov 17, 2022
- [PHI decoupling] move "paddle/fluid/operators/math.h" to phi (#48062) · f62bd3b4
  Committed by huangjiyi on Nov 17, 2022
```
* rm "paddle/fluid/operators/math.h" in phi

* rm "paddle/fluid/operators/math.h" in fluid
```
- Support bfloat16 for adamw and adam optimizer. Fit the lr for pure bf16... · e5ed5257
  Committed by Yuang Liu on Nov 17, 2022
```
Support bfloat16 for adamw and adam optimizer. Fit the lr for pure bf16 training with tensor fusion. (#48041)

* add bfloat16 for adamw

* set lr not to bfloat16 for pure bf16 training

* update the logic

* update the adamw optimizer

* support bfloat for adam
```
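The commit message for #48041 says the learning rate is kept out of bfloat16 but does not say why. One plausible reason, offered here as an assumption, is precision: bfloat16 keeps only 8 significant bits, so small relative changes to a value are rounded away. The sketch below emulates bfloat16 rounding with the standard library; `to_bfloat16` and `relative_error` are illustrative helpers, not Paddle APIs.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value and return it as a Python float.

    Illustrative helper (not a Paddle API); handles finite values only.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 is the upper 16 bits of the float32 pattern; add a bias so the
    # truncation below rounds to nearest, ties to even.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def relative_error(x: float) -> float:
    """Relative rounding error introduced by storing x as bfloat16."""
    return abs(to_bfloat16(x) - x) / abs(x)


# A decay factor of 0.999 is not representable and rounds up to 1.0.
print(to_bfloat16(0.999))
print(relative_error(1e-3))
```

In this emulation a per-step decay factor of 0.999 becomes exactly 1.0, so a schedule expressed that way would stop decaying if it were stored in bfloat16.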
- Add vectorized bfloat16 atomicAdd (#48056) · ccbd03d5
  Committed by sneaxiy on Nov 17, 2022
```
* add vectorized bfloat16 atomicAdd

* fix compile error

* fix compile error again

* fix V100 compile error

* fix V100 compile again
```
- generate static graph code for some op (#48036) · 7cc0d171
  Committed by zyfncg on Nov 17, 2022
16 Nov, 2022 7 commits
- rm "paddle/fluid/framework/gpu_utils.h" in phi (#48020) · 29a0987a
  Committed by huangjiyi on Nov 16, 2022
- [NPU] update npu prop, test=develop (#47859) · ad8847aa
  Committed by Qi Li on Nov 16, 2022
```
* [NPU] update npu prop, test=develop

* remove ddim.h

* remove diff

* update storage prop, test=develop
```
- [Opt depthwise_conv2d] Simplify depthwise_conv2d use_cudnn attribute (#48010) · 7c304580
  Committed by HongyuJia on Nov 16, 2022
```
* simplify depthwise_conv2d phi kernel selection

* fix depthwise_conv2d
```
- Add bf16 data type support to oneDNN bilinear_interp kernel (#46770) · 8e6315e4
  Committed by Piotr Paturej on Nov 16, 2022
```
* Enable bf16 in oneDNN bilinear_interp kernel

* Fix bilinear_interp_v2 not enabled in models

* Remove unnecessary checks
```
- Fix paddle rec, kim, dsin models' bugs (#47792) · e23dfed9
  Committed by ykkk2333 on Nov 16, 2022
```
* add stat tool

* add roll and roll_grad kernels and strided_slice and strided_slice_grad kernels, test=kunlun

* embedding and embedding_grad add int32 input, test=kunlun
```
- increase the level of some log (#47990) · 2f8901cb
  Committed by Leo Chen on Nov 16, 2022
- move "gpu_primitives.h" to phi (#48015) · 9adca1e7
  Committed by Wang Xin on Nov 16, 2022
15 Nov, 2022 8 commits

add gather dtype err msg (#48002) · 5859d0a6
Committed by sneaxiy on Nov 15, 2022

[Opt Error Message] Opt error message when selecting kernels under phi (#47970) · fd550c1b

Committed by HongyuJia on Nov 15, 2022

* opt error message when selecting kernels under phi

* fix for loop

* polish error message

* polish error message, split into 3 error condition

* polish error message

Update for scatter support fake 2d index (#47946) · e65bac28
Committed by Yuang Liu on Nov 15, 2022

[Zero-Dim] support input 0D Tensor for xpu kernel, test=kunlun (#47849) · d4d3d7ed
Committed by zhouweiwei2014 on Nov 15, 2022

mkldnn directory cleanup (#47779) · 8a339d24

Committed by Sławomir Siwek on Nov 15, 2022

* cleanup unused code

* unify is_int8 is_bfloat16

* Simplify matmul_v2 FWD kernel

* remove RunKernel methods

* remove import namespace

* remove headers

* clean fluid/phi cross imports

* remove fluid axpy_handler

* delete fluid methods

* activations

* OneDNNMemDesc

* MKLDNNFormatForSize

* MatchShapeToLayout

* MKLDNNMemoryFormat

* MKLDNNFormat

* ReorderMKLDNNHandler

* to_void_cast

* review suggestions

* interpolate

* remove fluid dependency

[PHI decoupling] remove "paddle/fluid/platform/complex.h" in phi (#47926) · aa08b769
Committed by huangjiyi on Nov 15, 2022
```
* rm "paddle/fluid/platform/complex.h" in phi

* fix codestyle with pre-commit
```

remove 'paddle/fluid/operators/conv_op.h' from phi (#47914) · f7bf2930
Committed by Wang Xin on Nov 15, 2022

[PHI decoupling] remove dependency on "paddle/fluid/operators/elementwise/xxx.h" in phi (#47870) · 04c29558

Committed by huangjiyi on Nov 15, 2022

* rm "paddle/fluid/operators/elementwise/xxx.h" in phi

* fix bugs

* add LaunchElementwiseCudaKernel in phi

* Revert "add LaunchElementwiseCudaKernel in phi"

This reverts commit 588f45bbdad2372ec7bff0c567a29bff675d22e1.

* rm indirect dependence to "elementwise_op_impl.cu.h"

* rm LaunchSameDimsElementwiseCudaKernel and LaunchElementwiseCudaKernel in phi

14 Nov, 2022 4 commits
- [Zero-Dim] support input 0D Tensor as scalar attribute for some api (#47689) · e0be4b94
  Committed by zhouweiwei2014 on Nov 14, 2022
```
* [Zero-Dim] support input 0D Tensor as scalar attribute for some api

* fix doc
```
- add cos double and triple grad operator (#47796) · 1a145aab
  Committed by cyber-pioneer on Nov 14, 2022
- Modified mem_desc() to return reference to Tensor::memory::desc to (#47844) · 2182a4f9
  Committed by Jacek Czaja on Nov 14, 2022
```
avoid copying
```
- Fix HOSTDEVICE redefinition during XPU KP compilation, test=kunlun (#47885) · 81e16a85
  Committed by niuliling123 on Nov 14, 2022
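For the `cos` double grad commit (#47796) in the list above, the underlying math can be sketched in a few lines. The first-order backward of `y = cos(x)` is `dx = -dout * sin(x)`, and a double grad differentiates that expression again with respect to both `x` and `dout`. The scalar functions below are illustrative only: their names and signatures are assumptions, not the Paddle kernel interfaces, and the triple grad is omitted.

```python
import math


def cos_grad(x: float, dout: float) -> float:
    """First-order backward of y = cos(x): dx = -dout * sin(x)."""
    return -dout * math.sin(x)


def cos_double_grad(x: float, dout: float, ddx: float):
    """Backward of cos_grad, given ddx (the incoming gradient for dx).

    Returns (dx, ddout):
      dx    = ddx * d(cos_grad)/dx    = -ddx * dout * cos(x)
      ddout = ddx * d(cos_grad)/ddout = -ddx * sin(x)
    """
    dx = -ddx * dout * math.cos(x)
    ddout = -ddx * math.sin(x)
    return dx, ddout


# Check the analytic result against a central finite difference.
x, dout, ddx, eps = 0.7, 1.3, 0.4, 1e-6
dx, ddout = cos_double_grad(x, dout, ddx)
numeric_dx = ddx * (cos_grad(x + eps, dout) - cos_grad(x - eps, dout)) / (2 * eps)
print(abs(dx - numeric_dx) < 1e-6)
```

The finite-difference comparison at the end is a quick sanity check that the analytic second-order term matches a numerical derivative of the first-order backward.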
11 Nov, 2022 1 commit
- [Zero-Dim] fix batch_norm op infermeta bug (#47858) · 18549417
  Committed by zhouweiwei2014 on Nov 11, 2022

PaddlePaddle / Paddle: synced successfully about 1 year ago
