Commits · 4638a62ea04e2d11dbb9a34d33fc8864e82c5481 · PaddlePaddle / Paddle

21 Mar, 2023 1 commit

[PHI decoupling] Move DataType* from paddle::experimental to phi namespace (#51716) · 4638a62e

Committed by iSerendipity on Mar 21, 2023

* move DataType from paddle::experimental to phi

* convert namespace

* convert namespace

* convert namespace

* clarify namespace

* convert more datatype

* Revert "convert more datatype"

This reverts commit 083b462959e6a22d4d8767707b628b95b396642e.

* convert more in auto_code_generator

* fix conflicts for XPU

* fix namespace conflicts

* fix errors

* Revert "fix errors"

This reverts commit f9d9958b54ee32141112274c8a5c3c381ab0f876.

* fix errors

* fix formatting

16 Mar, 2023 1 commit

[phi decoupling] remove fluid gpu_info usage in phi (#51699) · 907433a7

Committed by Huang Jiyi on Mar 16, 2023

* remove fluid thread_data_registry

* update

* fix bug

15 Mar, 2023 1 commit

modify cmake rules temporarily (#51644) · 521bba9c

Committed by JingZhuangzhuang on Mar 15, 2023

16 Jan, 2023 1 commit

CUDA12.0 integration (#49539) · 1885d55a

Committed by zlsh80826 on Jan 16, 2023

* Update warpctc for cuda-12

* Deprecate cudaProfilerInitialize for CUDA > 11

* Deprecate CUSPARSE_MV_ALG_DEFAULT for CUDA_VERSION >= 11040

* Add the missing thrust header

20 Dec, 2022 1 commit

[PHI decouple] move dropout_impl and cuda_graph_with_memory_pool from fluid to phi (#49139) · 579784e2

Committed by huangjiyi on Dec 20, 2022

* move dropout_impl from fluid to phi

* move cuda_graph_with_memory_pool from fluid to phi

* update namespace

* remove cuda_graph in fluid

* fix mac-build

* fix bugs

* correct CodeStyle

* fix mac-build

* fix mutable_data

* fix stl include

* fix copy param

09 Dec, 2022 1 commit

[PHI decoupling] move "flags.h" from fluid to phi (#48696) · 39ffef0d

Committed by PuQing on Dec 09, 2022

08 Dec, 2022 1 commit

[PHI decoupling] move cuda_graph from fluid to phi (#48686) · a4d9851b

Committed by huangjiyi on Dec 08, 2022

* move cuda_graph from fluid to phi

* move device_memory_aligment from fluid to phi

* Revert "move device_memory_aligment from fluid to phi"

This reverts commit b92fcd39a0a50fdac13278f49be0237a85f3a13f.

* update xpu cmake

05 Dec, 2022 1 commit

move device_memory_aligment from fluid to phi (#48694) · 796499fd

Committed by huangjiyi on Dec 05, 2022

28 Nov, 2022 1 commit

[PHI decoupling] move several header files from fluid to phi (#48415) · fd9c91c3

Committed by huangjiyi on Nov 28, 2022

* decouple cudnn_desc.h from fluid

* move cudnn_desc.h from fluid to phi

* fix bugs

* decouple cudnn_helper.h from fluid

* fix bugs

* move cudnn_helper.h from fluid to phi

* add fluid cudnn_helper.h

* move miopen_desc.h from fluid to phi

* move miopen_helper.h from fluid to phi

* fix bugs

* move gpu_dnn.h from fluid to phi

* fix bugs

* update copyright year

* simplify gpu_dnn.h in fluid

* fix bugs

* fix xpu build bug

* fix compile bug

* fix bug

24 Nov, 2022 1 commit

[PHI decoupling] simplify "convert_utils.h" in fluid (#48168) · de4310e6

Committed by huangjiyi on Nov 24, 2022

* rm dependence to "convert_utils.h" in some files

* fix bugs

* replace DataType2String with DataTypeToString

* replace framework::DataTypeSize with phi::SizeOf

* mv convert_function from fluid to phi and rm old map

* recommit with pre-commit

* replace ProtoVarType with ProtoDataType and update comment.

* fix error about include "dnnl.hpp"

* revert add dep mkldnn to convert_utils in phi

* add mkldnn deps in convert_utils.h in phi

* move deps to convert_utils.h in phi

22 Nov, 2022 1 commit

[PHI decoupling] remove "gpu_device_function.h" in fluid. (#48117) · 4da1a0fe

Committed by huangjiyi on Nov 22, 2022

* move "paddle/phi/backends/gpu/gpu_device_function.h" to phi

* update copyright years

* rm "fluid/platform/device/gpu/gpu_device_function.h" in phi

* rm dependence to "gpu_device_function.h" in fluid

* rm gpu_device_function.h etc in fluid

* fix rocm-compile bugs

* fix cuda_helper_test.cu bugs

18 Nov, 2022 3 commits

Fix bug of zero_allocator in HostAlloc (#48108) · 7f92e27e

Committed by zyfncg on Nov 18, 2022

* fix bug of zero_allocator in host

* fix test compile bug

* add unittest

* update test

CUDNN v8 Implementation of Convolution Kernels (#47454) · 14a6e67b

Committed by Tian Zheng on Nov 18, 2022

* Refactor conv_kernel and conv_grad_kernel to provide interface for CUDNNv8 implementation

* Fix macro

* Add implementation for conv_kernel and conv_grad_kernel

* Modification after rebase onto latest develop

* Modify plan cache to comply with the API of phi::autotune

* Refactor to reduce duplicate code

* Review fix:
- move functions in conv_kernel_impl_v8.h and conv_grad_kernel_impl_v8.h to conv_kernel.cu and conv_grad_kernel.cu
- add const specifier for input tensor
- add logging when plans fail to execute
- move CudnnConvBwdFilterV8 and CudnnConvBwdDataV8 to conv_cudnn_frontend.h

* move plan building outside of cache

* Fix ROCM build

[PHI decoupling] remove "gpu_primitives.h" in fluid (#48063) · 9918bf9c

Committed by Wang Xin on Nov 18, 2022

* remove "gpu_primitives.h" in fluid namespace

* fix PR-CI-GpuPS fail

* fix PR-CI-GpuPS fail

17 Nov, 2022 1 commit

Add vectorized bfloat16 atomicAdd (#48056) · ccbd03d5

Committed by sneaxiy on Nov 17, 2022

* add vectorized bfloat16 atomicAdd

* fix compile error

* fix compile error again

* fix V100 compile error

* fix V100 compile again

07 Nov, 2022 1 commit

[Restore PR] Remove hard code of PADDLE_WITH_CUDA (#47630) · 908a381d

Committed by HongyuJia on Nov 07, 2022

* move cudnn hardcode outside GetExpectedKernelType

* add header file

* debug

* update interpreter_util with hardcode

* update interpreter_util headerfile

* solve activation hardcode

* debug with CI

* add mkldnn_op_list header file

* temporarily uncomment mkldnn

* temporarily uncomment mkldnn

* delete sequence_softmax cudnn hardcode

* add hardcode to data_transfer.cc

* update data_transfer headerfile

* try fix segment fault

* update cudnn&miopen_helper

* reset HasAttr of DygraphExctnCtx

* debug, this commit should pass all CI

* debug should pass CI, temporarily disable activation

* debug should pass CI

* fix default_attr=nullptr bug

* clean debug code

* Call SetDnnFallback function in the base class

* activation fallback to plain kernel

* fix default GetExpectedKernelType find wrong kernel

* search cudnn kernel instead of fallback

* fix cudnn_handle bug

* remove tanh use_cudnn

* restore tanh use_cudnn

* debug tanh

* fix tanh bug

* delete activation cudnn kernel

* polish code

02 Nov, 2022 1 commit

Revert "[Kernel Selection] Remove hard code of PADDLE_WITH_CUDA (#47325)" (#47582) · a57a19ea

Committed by HongyuJia on Nov 02, 2022

This reverts commit f9134045.

01 Nov, 2022 1 commit

[Kernel Selection] Remove hard code of PADDLE_WITH_CUDA (#47325) · f9134045

Committed by HongyuJia on Nov 01, 2022

* move cudnn hardcode outside GetExpectedKernelType

* add header file

* debug

* update interpreter_util with hardcode

* update interpreter_util headerfile

* solve activation hardcode

* debug with CI

* add mkldnn_op_list header file

* temporarily uncomment mkldnn

* temporarily uncomment mkldnn

* delete sequence_softmax cudnn hardcode

* add hardcode to data_transfer.cc

* update data_transfer headerfile

* try fix segment fault

* update cudnn&miopen_helper

* reset HasAttr of DygraphExctnCtx

* debug, this commit should pass all CI

* debug should pass CI, temporarily disable activation

* debug should pass CI

* fix default_attr=nullptr bug

* clean debug code

25 Oct, 2022 1 commit

opt conv_transpose cudnn (#47294) · afd5a96b

Committed by HongyuJia on Oct 25, 2022

21 Oct, 2022 1 commit

fix nvprof_nvtx_push interface bug (#47232) · 340009d6

Committed by Yuanle Liu on Oct 21, 2022

* fix nvprof_nvtx_push interface bug

19 Oct, 2022 1 commit

add nvtxRangePush/Pop for naive_executor and refine some code (#47139) · de6e7431

Committed by Yuanle Liu on Oct 19, 2022

11 Oct, 2022 1 commit

Completes bfloat16 dtype for collective api in eager mode (#45844) · e4eb8d36

Committed by Wen Sun on Oct 11, 2022

30 Sep, 2022 1 commit

support pure bfloat16 for more ops (#46364) · b7b231a6

Committed by sneaxiy on Sep 30, 2022

* support pure bfloat16

* support bf16 linear

* update PR to pass CI

* tiny fix where_grad_kernel.cu

* add bfloat16 to selu_grad to pass CI

* fix selu grad compilation error

28 Sep, 2022 1 commit

Remove the declaration of using Tensor in framework/tensor.h (#46432) · e12a905e

Committed by Chen Weihang on Sep 28, 2022

* remove needless using tensor

* remove needless using tensor

* resolve conflict

* replace tensor using

* fix format error

* revert needless changing

* fix rocm and npu compile error

* fix cinn compile error

* fix format error

* fix mkldnn format error

* fix mkldnn format error

* fix cinn compile error

* fix cinn compile error

* fix cinn compile error

* resolve conflict

16 Sep, 2022 1 commit

Support broadcast elementwise operators with int64 index type (#45741) · 20b5bf84

Committed by sneaxiy on Sep 16, 2022

* support int64 non-broadcast

* support broadcast case for int64 index

* fix bug

* support more Arity

* remove some codes

* upgrade patchelf to v0.15.0 to pass CI build

* fix bug

* fix patchelf installation

* add debug flags

* remove useless codes

* fix viterbi_decode and set_value op uts

* remove always enable int64

05 Sep, 2022 2 commits

Fix jetson compile error (#45692) · cfaee812

Committed by chalsliu on Sep 05, 2022

fix some op int32 exceed range (#45711) · a1dbee23

Committed by sneaxiy on Sep 05, 2022

12 Aug, 2022 1 commit

[geometric]Add paddle.geometric.send_ue_recv API (#43174) · 615b15a3

Committed by Siming Dai on Aug 12, 2022

* add init file

* add op definition and infermeta

* add kernel definition funcs

* add broadcast infer shape

* add gpu forward kernel

* delete SUB and DIV

* add x_grad

* add template

* add e_grad for min and max

* fix small bug

* temp commit

* temp commit

* add e_grad for sum and mean

* fix some compile bug

* fix compile bugs

* fix compile problem

* add sum forward unittest

* fix broadcast error, add kernel sig, register e_grad, change unit test

* fix grad

* add temp grad fix

* temp commit

* add min max unittest

* add max, min unittest, fix mul bug

* add cpu forward sum and mean

* add forward min max, fix mean unittest

* add cpu backward min max

* fix code-style

* add backward sum mean

* fix rocm ci

* set unittest timeout

* fix bug of x broadcast to e, gpu grad

* fix bug of x broadcast to e, cpu grad

* rename BOOST_GET_CONST macro

* fix rocm ci

* mv graph_send_e_recv to graph_send_ue_recv

* move out_size to IntArray

* add eager op test

* fix max pool type bug, add unittest for api

* revise api doc

* add fp16 for atomic min and max, add unittest

* add unittest

* add fp16 support for graph_send_recv

* fix unittest fp16 bug

* change OutSizeTensor to Out_size

* move E to Y

* add copyright, fix comment

* review code

* fix thread block size

* fix thread block size

* change api attribute name: pool_type to reduce_op, compute_type to message_op

* change api attribute name, move pool_type to reduce_op, move compute_type to message_op

01 Aug, 2022 1 commit

unify gpu context (#44740) · 86763023

Committed by Leo Chen on Aug 01, 2022

* remove cudaDeviceContext

* remove more template

* fix rocm compile

* remove alias name CUDADeviceContext

* fix compile

* fix tests

* revert changes

28 Jul, 2022 1 commit

Complete the dtypes for all_gather, add all_gather_object api (#44417) · d4cf02bc

Committed by LiYuRio on Jul 28, 2022

27 Jul, 2022 1 commit

[DCU] Fix NAN problem when training BERT on DCU platform (#44643) · 28aa0c61

Committed by Yuang Liu on Jul 27, 2022

19 Jul, 2022 1 commit

compile phi/backends into one static library (#44373) · 1047cb17

Committed by Leo Chen on Jul 19, 2022

* compile into one static library

* fix xpu compile

* fix xpu compile

* fix inference compile

* fix inference compile

* add custom test

* revert one file

14 Jul, 2022 2 commits

refine allocation cmake (#44241) · dc5a0420

Committed by Leo Chen on Jul 14, 2022

* build into one static library

* move memory/detail to memory/allocation

* fix bug

* fix profiler

* fix framework_proto

* fix deps

* fix inference compilation

* fix rocm compile

* follow comments

* fix buddy_allocator_test

Compilation optimization (#44242) · 4baf0dbe

Committed by wanghuancoder on Jul 14, 2022

* Compilation optimization

26 Jun, 2022 1 commit

format all files in fluid using new config (#43776) · 576236a0

Committed by Sing_chan on Jun 26, 2022

24 Jun, 2022 1 commit

record memory and op supplement info (#43550) · 8dd0a3b9

Committed by chenjian on Jun 24, 2022

* record memory and op supplement info

* update

* update

* fix a bug

* fix memory recording

* fix a bug

* update

* update

* fix a bug

* update

* fix a bug

* fix a bug

* fix a bug

* Revert "fix a bug"

This reverts commit c1d4df52762ba9ae7c7e27cd2ba4fc3a7ed9c7a5.

* fix a bug

* fix format

* fix

15 Jun, 2022 1 commit

add some kernels(csr*dense->csr, dense*dense->csr) of SparseTensor matmul (#42935) · 346efe96

Committed by zhouweiwei2014 on Jun 15, 2022

* add some kernel(csr*dense->csr, dense*dense->csr) of SparseTensor matmul

* fix CI

* fix CI

* fix comment

* fix comment

05 Jun, 2022 1 commit

【code format check upgrade】 step2：clang-format (#42840) · a3730dc8

Committed by Sing_chan on Jun 05, 2022

04 Jun, 2022 1 commit

【code format check upgrade】 step2：cmake-format (#43057) · 92568edb

Committed by Sing_chan on Jun 04, 2022

02 Jun, 2022 1 commit

Fix bug of CUDAGraph kernel parameter comparison (#43163) · 3fcfcd51

Committed by sneaxiy on Jun 02, 2022

* fix cuda graph sizeof

* fix tuple type
