提交 · 796499fd93076b107b149ce4abbb0b16c1f3a75e · BaiXuePrincess / Paddle

05 12月, 2022 1 次提交
- H
  
  move device_memory_aligment from fluid to phi (#48694) · 796499fd
  由 huangjiyi 提交于 12月 05, 2022
  
  796499fd
02 12月, 2022 1 次提交

add silu, silu_grad, unfold and unfold_grad xpu kernels (#48325) · f71de378

由 ykkk2333 提交于 12月 02, 2022

* add stat tool

* add roll and roll_grad kernels and strided_slice and strided_slice_grad kernels, test=kunlun

* add silu, unfold and their grads,test=kunlun

f71de378

30 11月, 2022 2 次提交

Z
Fix bug of wrong eigen dependency (#48485) · 35902ec6
由 zyfncg 提交于 11月 30, 2022
```
* fix bug of eigen_dependency

* fix xpu compile
```
35902ec6

use correct xpu stream for synchronization (#48470) · 16562a9d

由 james 提交于 11月 30, 2022

some legacy code still use xpu_wait() for stream sync -- it only syncs
default stream. this PR replaces them with dev_ctx.Wait() to ensure
that correct stream is always used

16562a9d

29 11月, 2022 2 次提交
- A
  [PHI decoupling]migrate enforce_custom.h from fluid to phi (#48422) · 9896ac1e
  由 Asthestarsfalll 提交于 11月 29, 2022
```
* migrate enforce_custom.h from fluid to phi

* move to backends/custom/
```
  9896ac1e
- H
  
  add floor fp32 op *test=kunlun (#48458) · 9d4b4be3
  由 haosicheng 提交于 11月 29, 2022
  
  9d4b4be3
28 11月, 2022 4 次提交

[PHI decoupling] move several header files from fluid to phi (#48415) · fd9c91c3

由 huangjiyi 提交于 11月 28, 2022

* decouple cudnn_desc.h from fluid

* move cudnn_desc.h from fluid to phi

* fix bugs

* decouple cudnn_helper.h from fluid

* fix bugs

* move cudnn_helper.h from fluid to phi

* add fluid cudnn_helper.h

* move miopen_desc.h from fluid to phi

* move miopen_helper.h from fluid to phi

* fix bugs

* move gpu_dnn.h from fluid to phi

* fix bugs

* update copyright year

* simplify gpu_dnn.h in fluid

* fix bugs

* fix xpu build bug

* fix compile bug

* fix bug

fd9c91c3

[Phi decouple] remove dependece to "paddle/fluid/platform/device/xpu/xxx.h" in phi (#48420) · 2bae75ed

由 huangjiyi 提交于 11月 28, 2022

* rm fluid “xpu_header.h” deps in phi

* move part of xpu_op_list.h from fluid to phi

* add fluid xpu_op_list deps

* add glog deps for xpu_op_list in phi

* fix PR-CI-Kunlun

2bae75ed

H

add square fp16 *test=kunlun (#48095) · 81d0a3cc
由 haosicheng 提交于 11月 28, 2022

81d0a3cc
张
Remove LoDTensor and Tensor in fluid except operators folder (#48416) · 4527d249
由张春乔提交于 11月 28, 2022
```
* Update communicator.cc

* Update communicator.cc

* remove LoDTensor

* remove LoDTensor and Tensor
```
4527d249

25 11月, 2022 1 次提交
- H
  
  fix xpu compile on phi::enforce. (#48345) · d90469a4
  由 houj04 提交于 11月 25, 2022
  
  d90469a4
24 11月, 2022 3 次提交

Z

add exp_grad, hard_sigmoid and hard_sigmoid_grad for xpu, test=kunlun (#48307) · d2f87d96
由 zhangyikun02 提交于 11月 24, 2022

d2f87d96
Z

add pad3d and pad3d_grad op for xpu, test=kunlun (#48306) · 22555e96
由 zhangyikun02 提交于 11月 24, 2022

22555e96

[PHI decoupling] simplify "convert_utils.h" in fluid (#48168) · de4310e6

由 huangjiyi 提交于 11月 24, 2022

* rm dependence to "convert_utils.h" in some files

* fix bugs

* replace DataType2String with DataTypeToString

* replace framework::DataTypeSize with phi::SizeOf

* mv convert_function from fluid to phi and rm old map

* recommit with pre-commit

* repalce ProtoVarType with ProtoDataType and update comment.

* fix error about include "dnnl.hpp"

* revert add dep mkldnn to convert_utils in phi

* add mkldnn deps in convert_utils.h in phi

* move deps to convert_utils.h in phi

de4310e6

23 11月, 2022 2 次提交
- Y
  add masked_select_grad kernel (#48137) · db0ea0ce
  由 ykkk2333 提交于 11月 23, 2022
```
* add stat tool

* add roll and roll_grad kernels and strided_slice and strided_slice_grad kernels, test=kunlun

* add masked_selected_grad kernel,test=kunlun
```
  db0ea0ce
- Z
  
  add warpctc kernel and change cast_v2 to cast for xpu, test=kunlun (#48134) · 25ffe9c2
  由 zhangyikun02 提交于 11月 23, 2022
  
  25ffe9c2
22 11月, 2022 1 次提交

[PHI decoupling] remove "gpu_device_function.h" in fluid. (#48117) · 4da1a0fe

由 huangjiyi 提交于 11月 22, 2022

* move "paddle/phi/backends/gpu/gpu_device_function.h" to phi

* update copyright years

* rm "fluid/platform/device/gpu/gpu_device_function.h" in phi

* rm dependence to "gpu_device_function.h" in fluid

* rm gpu_device_function.h etc in fluid

* fix rocm-complie bugs

* fix cuda_helper_test.cu bugs

4da1a0fe

21 11月, 2022 1 次提交
- T
  
  add adamw suppor xpu, test=kunlun (#48114) · 27e252d9
  由 taixiurong 提交于 11月 21, 2022
  
  27e252d9
18 11月, 2022 4 次提交

Z
Fix bug of zero_allocator in HostAlloc (#48108) · 7f92e27e
由 zyfncg 提交于 11月 18, 2022
```
* fix bug of zero_allocator in host

* fix test compile bug

* add unittest

* update test
```
7f92e27e

CUDNN v8 Implementation of Convolution Kernels (#47454) · 14a6e67b

由 Tian Zheng 提交于 11月 18, 2022

* Refactor conv_kernel and conv_grad_kernel to provide interface for CUDNNv8 implementation

* Fix macro

* Add implementation for conv_kernel and conv_grad_kernel

* Modification after rebase onto latest develop

* Modify plan cache to comply with the API of phi::autotune

* Refactor to reduce duplicate code

* Review fix:
- move functions in  conv_kernel_impl_v8.h and conv_grad_kernel_impl_v8.h to conv_kernel.cu and conv_grad_kernelk.cu
- add const specifier for input tensor
- add logging when plans fail to execute
- move CudnnConvBwdFilterV8 and CudnnConvBwdDataV8 to conv_cudnn_frontend.h

* - move plan building outside of cache

* Fix ROCM build

14a6e67b

W
[PHI decoupling] remove "gpu_primitives.h" in fluid (#48063) · 9918bf9c
由 Wang Xin 提交于 11月 18, 2022
```
* remove "gpu_primitives.h" in fluid namespace

* fix PR-CI-GpuPS fail

* fix PR-CI-GpuPS fail
```
9918bf9c
Z

cast and gradient_accumulator support double for xpu, test=kunlun (#47800) · 982d5ff7
由 zhangyikun02 提交于 11月 18, 2022

982d5ff7

17 11月, 2022 2 次提交
- T
  
  xpu-paddlepaddle-41 [任务] ffn and attention test=kunlun (#46658) · 071708fa
  由 taixiurong 提交于 11月 17, 2022
  
  071708fa
- S
  Add vectorized bfloat16 atomicAdd (#48056) · ccbd03d5
  由 sneaxiy 提交于 11月 17, 2022
```
* add vectorized bfloat16 atomicAdd

* fix compile error

* fix compile error again

* fix V100 compile error

* fix V100 compile again
```
  ccbd03d5
11 11月, 2022 1 次提交

[IPU]: add model_runtime backend support in IPU (#47363) · 21b901cb

由 czr-gc 提交于 11月 11, 2022

* feat(ipu): add model_runtime backend support in IPU.

* fix(ipu_executor): fix error message format.

* fix(ipu_executor): fix format.

* fix(ipu_executor): fix format again.

* fix(ipu_executor): fix format again.

* fix(ipu_executor): fix format again.

21b901cb

10 11月, 2022 1 次提交

XPU multi-card support eager mode (#47445) · 3b91f8f3

由 james 提交于 11月 10, 2022

* XPU support eager mode

* add unittest for XPU eager mode

* minor bugfix

* minor bugfix, test=kunlun

* correct copyright info

* 1. remove unsed vars/funcs
2. ProcessGroupBKCL inherit from ProcessGroupStream

* bugfix for fp16 in eager mode multi-card, test=kunlun

* rebase & fix a few issues

* use new processgroup interface, test=kunlun

* fix compile issue, test=kunlun

3b91f8f3

08 11月, 2022 2 次提交
- Z
  
  add adadelta op for xpu, test=kunlun (#47661) · 047971f0
  由 zhangyikun02 提交于 11月 08, 2022
  
  047971f0
- Z
  
  argsort support n > 16384 and add argsort_grad op for xpu, test=kunlun (#47701) · 6a6a3ff1
  由 zhangyikun02 提交于 11月 08, 2022
  
  6a6a3ff1
07 11月, 2022 3 次提交

Q
support kldiv_loss/kldiv_loss_grad for kunlun (#47638) · 5f0a8adc
由 QingshuChen 提交于 11月 07, 2022
```
*test=kunlun
```
5f0a8adc

add roll and roll_grad kernels and strided_slice and strided_slice_grad... · 5a4d2186

由 ykkk2333 提交于 11月 07, 2022

add roll and roll_grad kernels and strided_slice and strided_slice_grad kernels, test=kunlun (#47368)

* add stat tool

* add roll and roll_grad kernels and strided_slice and strided_slice_grad kernels, test=kunlun

5a4d2186

[Restore PR] Remove hard code of PADDLE_WITH_CUDA (#47630) · 908a381d

由 HongyuJia 提交于 11月 07, 2022

* move cudnn hardcode outside GetExpectedKernelType

* add header file

* debug

* update interpreter_util with hardcode

* update interpreter_util headerfile

* solve activation hardcode

* debug with CI

* add mkldnn_op_list header file

* temporarily uncomment mkldnn

* temporarily uncomment mkldnn

* delete sequence_softmax cudnn hardcode

* add hardcode to data_transfer.cc

* update data_transfer headerfile

* try fix segment fault

* update cudnn&miopen_helper

* reset HasAttr of DygraphExctnCtx

* debug, this commit should pass all CI

* debug should pass CI, temporarily disable activation

* debug should pass CI

* fix default_attr=nullptr bug

* clean debug code

* Call SetDnnFallback function in the base class

* activation fallback to plain kernel

* fix default GetExpectedKernelType find wrong kernel

* search cudnn kernel instead of fallback

* fix cudnn_handle bug

* remove tanh use_cudnn

* restore tanh use_cudnn

* debug tanh

* fix tanh bug

* delete activation cudnn kernel

* polish code

908a381d

04 11月, 2022 2 次提交
- H
  [XPU] add cumsum op. test=kunlun (#47585) · ac2a94c7
  由 houj04 提交于 11月 04, 2022
```
* [XPU] add cumsum op. test=kunlun

* try to fix linker. test=kunlun

* try to fix linker. test=kunlun

* try to fix linker. test=kunlun

* debug. test=kunlun

* update xpu.cmake. remove unnecessary codes. test=kunlun.
```
  ac2a94c7
- Y
  
  fix deepfm and deep_wide bug, add embedding_sparse_grad kernel, test=kunlun (#47365) · f53e920d
  由 ykkk2333 提交于 11月 04, 2022
  
  f53e920d
02 11月, 2022 2 次提交

H
Revert "[Kernel Selection] Remove hard code of PADDLE_WITH_CUDA (#47325)" (#47582) · a57a19ea
由 HongyuJia 提交于 11月 02, 2022
```
This reverts commit f9134045.
```
a57a19ea

[XPU] add int64 support for slice and subtract. (#47409) · 77395619

由 houj04 提交于 11月 02, 2022

* [XPU] add int64 support for slice and subtract. test=kunlun

* try to fix xpu compile. test=kunlun

* try to fix xpu compile. test=kunlun

* try to fix xpu compile. test=kunlun

* remove unnecessary modification. test=kunlun

77395619

01 11月, 2022 2 次提交

[Kernel Selection] Remove hard code of PADDLE_WITH_CUDA (#47325) · f9134045

由 HongyuJia 提交于 11月 01, 2022

* move cudnn hardcode outside GetExpectedKernelType

* add header file

* debug

* update interpreter_util with hardcode

* update interpreter_util headerfile

* solve activation hardcode

* debug with CI

* add mkldnn_op_list header file

* temporarily uncomment mkldnn

* temporarily uncomment mkldnn

* delete sequence_softmax cudnn hardcode

* add hardcode to data_transfer.cc

* update data_transfer headerfile

* try fix segment fault

* update cudnn&miopen_helper

* reset HasAttr of DygraphExctnCtx

* debug, this commit should pass all CI

* debug should pass CI, temporarily disable activation

* debug should pass CI

* fix default_attr=nullptr bug

* clean debug code

f9134045

Adapting device-specific Extra Attributes for the PHI kernel (#46342) · c923e6c9

由 Chen Weihang 提交于 10月 31, 2022

* add extra attr property set

* add type_info for all context

* add onednn context to all context

* fix context compile error

* simplify conv kernel args

* pass runtime attr into dev_ctx

* fix marco error

* clear conv_grad_kernel extra args

* merge conv_grad_grad into conv_grad

* clear conv2d_grad_grad extra attrs

* clear yaml and eager extra attr

* fix conv1d error

* change to thread local

* fix npu compile failed

* try to fix windows compile failed

* add conv2d onednn phi kernel

* fix ci bugs (#36)

* fix compile bugs (#38)

* fix extra input transform bug (#39)

* support dynamic created attr (#40)

* reset extra info gen code

* rm conv_grad_grad kernel

* reimpl pass attr adapting

* add int attr support

* remove vector inputnames creating

* fix map at error

* Update paddle/phi/kernels/onednn/conv_grad_kernel.cc
Co-authored-by: NSławomir Siwek <slawomir.siwek@intel.com>

* remove useless extra attrs

* replace mkldnn_engine by onednn_engine
Co-authored-by: NYuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: NSławomir Siwek <slawomir.siwek@intel.com>

c923e6c9

25 10月, 2022 1 次提交
- H
  
  opt conv_transpose cudnn (#47294) · afd5a96b
  由 HongyuJia 提交于 10月 25, 2022
  
  afd5a96b
21 10月, 2022 1 次提交
- Y
  fix nvprof_nvtx_push interface bug (#47232) · 340009d6
  由 Yuanle Liu 提交于 10月 21, 2022
```
* fix nvprof_nvtx_push interface bug
```
  340009d6
19 10月, 2022 1 次提交
- Y
  
  add nvtxRangePush/Pop for naive_executor and refine some code (#47139) · de6e7431
  由 Yuanle Liu 提交于 10月 19, 2022
  
  de6e7431

BaiXuePrincess / Paddle 与 Fork 源项目一致

BaiXuePrincess / Paddle
与 Fork 源项目一致