Commits · aafa9820ea46343de4881f65030f5fa452be3c6c · PaddlePaddle / Paddle

18 Nov, 2022 13 commits

correct sync behavior for XPU distributed training (#47882) · aafa9820

Committed by james on Nov 18, 2022

* correct sync behavior for XPU distributed training

XPU supports an event mechanism similar to CUDA events, so it is advisable to
use an event to sync compute/comm streams for performance. However, this
mechanism has never been fully tested, and inconsistent loss/ending_epochs
have been reported. Therefore, this PR replaces event sync with stream waiting
as a temporary solution.

* remove compile warning

Add description to `nn.functional.celu` (#48074) · 1fb4d90b
Committed by Dandelight on Nov 18, 2022

fix device id issue for xpu eager mode (#48076) · 3b18d96b

Committed by james on Nov 18, 2022

* fix device id issue for xpu eager

The XPU device id is not correctly set in eager mode, so vars are on dev0 unless
XPUDeviceGuard is called, leading to this error message for all node ranks != 0:
"NotImplementedError: (Unimplemented) Place Place(xpu:0) is not supported."

* fix typo

* fix pybind error

CUDNN v8 Implementation of Convolution Kernels (#47454) · 14a6e67b

Committed by Tian Zheng on Nov 18, 2022

* Refactor conv_kernel and conv_grad_kernel to provide interface for CUDNNv8 implementation

* Fix macro

* Add implementation for conv_kernel and conv_grad_kernel

* Modification after rebase onto latest develop

* Modify plan cache to comply with the API of phi::autotune

* Refactor to reduce duplicate code

* Review fix:
- move functions in conv_kernel_impl_v8.h and conv_grad_kernel_impl_v8.h to conv_kernel.cu and conv_grad_kernel.cu
- add const specifier for input tensor
- add logging when plans fail to execute
- move CudnnConvBwdFilterV8 and CudnnConvBwdDataV8 to conv_cudnn_frontend.h

* Move plan building outside of cache

* Fix ROCM build

remove no used fluid beam_search_decoder (#48096) · 593bc4e2
Committed by GGBond8488 on Nov 18, 2022

add bf16 for numel (#48121) · a7d306af
Committed by Yuang Liu on Nov 18, 2022

[PHI decoupling] remove "gpu_primitives.h" in fluid (#48063) · 9918bf9c
Committed by Wang Xin on Nov 18, 2022
```
* remove "gpu_primitives.h" in fluid namespace

* fix PR-CI-GpuPS fail

* fix PR-CI-GpuPS fail
```

Allow to specify train_bs and eval_bs separately in hapi.fit() (#48032) · a33d563c

Committed by parap1uie-s on Nov 18, 2022

* Fix hAPI bug of not compatible with LayerHook

https://github.com/PaddlePaddle/Paddle/issues/47000

* Fix hAPI bug of not compatible with LayerHook

* Allow to specify train_bs and eval_bs separately in hapi.fit()

* Update model.py

* Update Model.py

* Update test_model.py

* update model.py

cast and gradient_accumulator support double for xpu, test=kunlun (#47800) · 982d5ff7
Committed by zhangyikun02 on Nov 18, 2022

optimize: vectorize transpose_padding (#48116) · 635958d9
Committed by feng_shuai on Nov 18, 2022

fix: supoort huge length of attention (#48053) · 42f35841
Committed by feng_shuai on Nov 18, 2022

fix onednn prelu header (#48064) · 85598e31
Committed by Sylwester Fraczek on Nov 18, 2022

rm "paddle/fluid/operators/amp/fp16_type_traits.h" in phi (#48051) · e4670d80
Committed by huangjiyi on Nov 18, 2022

17 Nov, 2022 27 commits
- Clip intermediate output of op when save inference model (#48026) · fafc7be2
  Committed by zyfncg on Nov 17, 2022
```
* clip extra and intermediate output of op

* fix bug

* fix bug

* polish code

* polish log
```
- remove fluid.layers.soft_relu in nn.py under fluid (#47925) · ef51bbfd
  Committed by 傅剑寒 on Nov 17, 2022
- [fluid clear] remove unstack in nn.py under fluid (#47927) · 8d08c9e0
  Committed by 傅剑寒 on Nov 17, 2022
```
* remove unstack in nn.py under fluid

* remove unstack under fluid
```
- [Clean fluid] Clean fluid elementwise_min/pow/mod/floordiv, remove API (#48040) · 74e3f26f
  Committed by HongyuJia on Nov 17, 2022
```
* clean fluid elementwise_pow, remove API

* clean elem_pow doc

* clean elementwise_mod

* clean elementwise min, floordiv, mod
```
- [NPU] add _npu_identity op and api, test=develop (#47850) · 099c2302
  Committed by Qi Li on Nov 17, 2022
```
* [NPU] add _npu_identity op and api, test=develop

* fix doc

* address comments
```
- (fluid cleanup) remove swish in nn.py under fluid (#47891) · 7619188a
  Committed by 傅剑寒 on Nov 17, 2022
```
* remove swish in nn.py under fluid

* fix tswish test case
```
- remove stanh in nn.py under fluid (#47889) · 209f684c
  Committed by 傅剑寒 on Nov 17, 2022
- Refactor collective communication all_to_all, all_to_all_single C++ API (#48059) · 3f480af2
  Committed by Wen Sun on Nov 17, 2022
- support int input for scale (#48044) · dbc63555
  Committed by wenbin on Nov 17, 2022
```
* int scale

* round

* revert commit
```
- fix the thread number to ensure deterministic of embedding kernel (#48073) · 5329187d
  Committed by xiongkun on Nov 17, 2022
- fix new executor gc dep bug (#48068) · 04dcb9d7
  Committed by hong on Nov 17, 2022
- rm "paddle/fluid/framework/convert_utils.h" in phi (#48001) · 2f34fc7a
  Committed by huangjiyi on Nov 17, 2022
- [PHI]Standardise some C++ API (Part5) (#47860) · f3650201
  Committed by YuanRisheng on Nov 17, 2022
```
* standard api

* fix xpu bugs
```
- optimizing a bit tensor_array initialization (#48066) · c374894d
  Committed by Mountagha on Nov 17, 2022
- xpu-paddlepaddle-41 [Task] ffn and attention test=kunlun (#46658) · 071708fa
  Committed by taixiurong on Nov 17, 2022
- remove fluid.layers.affine_grid API (#47851) · b4460eee
  Committed by 傅剑寒 on Nov 17, 2022
- move "function_traits.h" from fluid to phi (#48065) · b7841a2b
  Committed by Wang Xin on Nov 17, 2022
- [Paddle Inference] Support cast trt converter of bool input and output . (#48043) · ff44df18
  Committed by xiaoxiaohehe001 on Nov 17, 2022
```
* add_cast_bool

* cast
```
- Implement a common dimension simplifier. (#47981) · bf6af816
  Committed by Yiqun Liu on Nov 17, 2022
```
* Implement a common dims simplifier.

* Fix the include position error.

* Reduce the cpu overhead of broadcast computing.
```
- fix bug of p2p (#48045) · cb087beb
  Committed by ShenLiang on Nov 17, 2022
- support stage2 for gradient merge. (#47711) · c20eb7a6
  Committed by wuhuachaocoding on Nov 17, 2022
- Remove reduntant numpy input in Example code, test=document_fix (#47916) · 460d5040
  Committed by Kevin吴嘉文 on Nov 17, 2022
- rm "paddle/phi/kernels/gpu/batch_norm_utils.h" in phi (#48057) · b7e120d2
  Committed by huangjiyi on Nov 17, 2022
- [PHI decoupling] move "paddle/fluid/operators/math.h" to phi (#48062) · f62bd3b4
  Committed by huangjiyi on Nov 17, 2022
```
* rm "paddle/fluid/operators/math.h" in phi

* rm "paddle/fluid/operators/math.h" in fluid
```
- Support bfloat16 for adamw and adam optimizer. Fit the lr for pure bf16... · e5ed5257
  Committed by Yuang Liu on Nov 17, 2022
```
Support bfloat16 for adamw and adam optimizer. Fit the lr for pure bf16 training with tensor fusion. (#48041)

* add bfloat16 for adamw

* set lr not to bfloat16 for pure bf16 training

* update the logic

* update the adamw optimizer

* support bfloat for adam
```
- [Zero-Dim] temporarily revert create_scalar due to input 0D is not fully supported (#48058) · 4f57da5f
  Committed by zhouweiwei2014 on Nov 17, 2022
- Add vectorized bfloat16 atomicAdd (#48056) · ccbd03d5
  Committed by sneaxiy on Nov 17, 2022
```
* add vectorized bfloat16 atomicAdd

* fix compile error

* fix compile error again

* fix V100 compile error

* fix V100 compile again
```

PaddlePaddle / Paddle synced successfully about 1 year ago
