Commits · b4b926f4a693699f2d0dcad1f10f6019725b31e0 · BaiXuePrincess / Paddle

25 Nov, 2022 1 commit
- fix cuda 116 compile error (#48342) · 080349cd
  Authored by sneaxiy on Nov 25, 2022
  
  080349cd
24 Nov, 2022 1 commit
- [PHI decoupling] remove "paddle/fluid/platform/enforce.h" in phi (#48049) · df23c7c3
  Authored by PuQing on Nov 24, 2022
  
  df23c7c3
23 Nov, 2022 1 commit
- Make bfloat16 implicitly convert to float/double (#48238) · 1066094a
  Authored by sneaxiy on Nov 23, 2022
```
* make bfloat16 implicitly convert to float/double

* fix bfloat16_test ut compile
```
  1066094a
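
The implicit conversion described in this commit can be illustrated with a minimal C++ sketch. The `BFloat16` struct and the `Twice` helper below are hypothetical stand-ins written for this page, not Paddle's actual type; the sketch also truncates instead of rounding when narrowing from `float`, which is a simplifying assumption.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a bfloat16 type; not Paddle's actual class.
struct BFloat16 {
  std::uint16_t bits = 0;

  BFloat16() = default;

  // Keep the upper 16 bits of the float's IEEE-754 encoding
  // (truncation, no rounding, to keep the sketch short).
  explicit BFloat16(float value) {
    std::uint32_t raw;
    std::memcpy(&raw, &value, sizeof(raw));
    bits = static_cast<std::uint16_t>(raw >> 16);
  }

  // Non-explicit conversion operators: these are what make the type
  // implicitly convertible to float and double.
  operator float() const {
    std::uint32_t raw = static_cast<std::uint32_t>(bits) << 16;
    float value;
    std::memcpy(&value, &raw, sizeof(value));
    return value;
  }
  operator double() const {
    return static_cast<double>(this->operator float());
  }
};

// Takes a plain float; a BFloat16 argument converts implicitly.
inline float Twice(float x) { return 2.0f * x; }
```

With these operators, `Twice(BFloat16(1.5f))` and `double d = BFloat16(1.5f);` compile without a cast. Mixed arithmetic such as `h + 1.0f` can become ambiguous when both operators are present, so a real implementation would likely also define arithmetic overloads.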
21 Nov, 2022 1 commit
- add new map instance (#48145) · 2a47416c
  Authored by LiYuRio on Nov 21, 2022
  
  2a47416c
18 Nov, 2022 2 commits

[PHI decoupling] move "gpu_device_function.h" from fluid to phi (#48097) · 27ee6e71

Authored by huangjiyi on Nov 18, 2022

* move "paddle/phi/backends/gpu/gpu_device_function.h" to phi

* update copyright years

* rm "fluid/platform/device/gpu/gpu_device_function.h" in phi

* fix rocm-compile bugs

27ee6e71

[PHI decoupling] remove "gpu_primitives.h" in fluid (#48063) · 9918bf9c
Authored by Wang Xin on Nov 18, 2022
```
* remove "gpu_primitives.h" in fluid namespace

* fix PR-CI-GpuPS fail

* fix PR-CI-GpuPS fail
```
9918bf9c

17 Nov, 2022 1 commit

Add vectorized bfloat16 atomicAdd (#48056) · ccbd03d5

Authored by sneaxiy on Nov 17, 2022

* add vectorized bfloat16 atomicAdd

* fix compile error

* fix compile error again

* fix V100 compile error

* fix V100 compile again

ccbd03d5

16 Nov, 2022 1 commit
- move "gpu_primitives.h" to phi (#48015) · 9adca1e7
  Authored by Wang Xin on Nov 16, 2022
  
  9adca1e7
10 Nov, 2022 1 commit

change cudnn error to cuda error if compiled cuda version is incompatible with... · b96a21df

Authored by pangyoki on Nov 10, 2022

change cudnn error to cuda error if compiled cuda version is incompatible with installed cuda version (#47743)

* fix cudnn error

* fix

* fix

* fix

b96a21df

04 Nov, 2022 1 commit
- add cudnn error (#47666) · eb9e4601
  Authored by pangyoki on Nov 04, 2022
  
  eb9e4601
01 Nov, 2022 1 commit

Adapting device-specific Extra Attributes for the PHI kernel (#46342) · c923e6c9

Authored by Chen Weihang on Oct 31, 2022

* add extra attr property set

* add type_info for all context

* add onednn context to all context

* fix context compile error

* simplify conv kernel args

* pass runtime attr into dev_ctx

* fix macro error

* clear conv_grad_kernel extra args

* merge conv_grad_grad into conv_grad

* clear conv2d_grad_grad extra attrs

* clear yaml and eager extra attr

* fix conv1d error

* change to thread local

* fix npu compile failed

* try to fix windows compile failed

* add conv2d onednn phi kernel

* fix ci bugs (#36)

* fix compile bugs (#38)

* fix extra input transform bug (#39)

* support dynamic created attr (#40)

* reset extra info gen code

* rm conv_grad_grad kernel

* reimpl pass attr adapting

* add int attr support

* remove vector inputnames creating

* fix map at error

* Update paddle/phi/kernels/onednn/conv_grad_kernel.cc
Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>

* remove useless extra attrs

* replace mkldnn_engine by onednn_engine
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>

c923e6c9

16 Sep, 2022 1 commit

Support broadcast elementwise operators with int64 index type (#45741) · 20b5bf84

Authored by sneaxiy on Sep 16, 2022

* support int64 non-broadcast

* support broadcast case for int64 index

* fix bug

* support more Arity

* remove some codes

* upgrade patchelf to v0.15.0 to pass CI build

* fix bug

* fix patchelf installation

* add debug flags

* remove useless codes

* fix viterbi_decode and set_value op uts

* remove always enable int64

20b5bf84

06 Sep, 2022 1 commit
- enable memory optimize when fp16. (#45792) · 1967c6a6
  Authored by Wilber on Sep 06, 2022
  
  1967c6a6
05 Sep, 2022 1 commit
- fix some op int32 exceed range (#45711) · a1dbee23
  Authored by sneaxiy on Sep 05, 2022
  
  a1dbee23
24 Aug, 2022 1 commit

[Hackathon No.34] Optimize poisson op (#45160) · 3c14b094

Authored by Rayman on Aug 24, 2022

* [Hackathon No.34] Optimize poisson op

* [poisson] code style fix

* modify code style

* prevent from big number

* modify code style

* modify code style

* modify import

* modify import

* modify code style

3c14b094

10 Aug, 2022 1 commit
- [new-exec] set cuda device before run (#44985) · 68b06ba6
  Authored by Leo Chen on Aug 10, 2022
```
* set cuda device before run

* add header file

* fix compile
```
  68b06ba6
05 Aug, 2022 1 commit
- [DCU] fix hipDeviceAttributeManagedMemory not support on DTK, test=develop (#44816) · 075d7219
  Authored by Qi Li on Aug 05, 2022
  
  075d7219
01 Aug, 2022 1 commit
- infer context fix place error. (#44726) · 74e46a93
  Authored by Wilber on Aug 01, 2022
```
* infer context fix place error.

* update

* update
```
  74e46a93
29 Jul, 2022 1 commit

move CUDAStream to phi (#44529) · da3743fd

Authored by Leo Chen on Jul 29, 2022

* init

* move CUDAStream to phi

* fix compilation

* merge develop

* add stream_owned_ member

* split cuda_stream.h

* fix cpu compile

* fix constructor

* fix bug

* fix windows compile

* fix inference test_levit

* fix windows tests

da3743fd

26 Jul, 2022 1 commit
- inference multi stream support handle lazy init. (#44563) · 1892a441
  Authored by Wilber on Jul 26, 2022
```
* multi stream support handle lazy init.

* support eigen lazy init

* update

* fix ci problem
```
  1892a441
19 Jul, 2022 1 commit

compile phi/backends into one static library (#44373) · 1047cb17

Authored by Leo Chen on Jul 19, 2022

* compile into one static library

* fix xpu compile

* fix xpu compile

* fix inference compile

* fix inference compile

* add custom test

* revert one file

1047cb17

12 Jul, 2022 1 commit
- [PHI] Clean glog header in public header (#44216) · b0c9f24a
  Authored by Chen Weihang on Jul 12, 2022
```
* clean glog header in public header

* move macro pos
```
  b0c9f24a
15 Jun, 2022 2 commits

add some kernels(csr*dense->csr, dense*dense->csr) of SparseTensor matmul (#42935) · 346efe96
Authored by zhouweiwei2014 on Jun 15, 2022
```
* add some kernel(csr*dense->csr, dense*dense->csr) of SparseTensor matmul

* fix CI

* fix CI

* fix comment

* fix comment
```
346efe96

Use int64_t in GetGpuLaunchConfig1D and ElementwiseKernel as index type to... · 15577630

Authored by Yiqun Liu on Jun 15, 2022

Use int64_t in GetGpuLaunchConfig1D and ElementwiseKernel as index type to support large tensor. (#43506)

* Change some data type from int to int64_t in GetGpuLaunchConfig1D to support large tensor.

* Use int64_t in ElementwiseKernel as index type to support large tensor.

15577630
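
The motivation for 64-bit indices can be sketched on the host side in plain C++. The `LaunchConfig1D` struct and the two helpers below are hypothetical names for illustration and do not reproduce Paddle's `GetGpuLaunchConfig1D`; the block size and the block-count limit are caller-supplied assumptions.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical 1-D launch configuration; not Paddle's GpuLaunchConfig.
struct LaunchConfig1D {
  std::int64_t threads_per_block;
  std::int64_t blocks;
};

// Ceil-divides numel by the block size using 64-bit arithmetic, then clamps
// the block count to max_blocks. With 32-bit int, the sum
// numel + threads_per_block - 1 would overflow for very large tensors.
inline LaunchConfig1D MakeLaunchConfig1D(std::int64_t numel,
                                         std::int64_t threads_per_block,
                                         std::int64_t max_blocks) {
  std::int64_t blocks = (numel + threads_per_block - 1) / threads_per_block;
  blocks = std::max<std::int64_t>(1, std::min(blocks, max_blocks));
  return {threads_per_block, blocks};
}

// Number of elements each thread visits in a grid-stride loop
// when the grid is smaller than the tensor.
inline std::int64_t ElementsPerThread(std::int64_t numel,
                                      const LaunchConfig1D& cfg) {
  std::int64_t total_threads = cfg.threads_per_block * cfg.blocks;
  return (numel + total_threads - 1) / total_threads;
}
```

For a tensor with 3,000,000,000 elements, which does not fit in a signed 32-bit integer, the helpers still return a valid configuration; with a small block limit, each thread simply loops over more elements.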

13 Jun, 2022 1 commit

sparse conversion kernel support secondary dispatch (#43345) · 5752643b

Authored by zhangkaihuo on Jun 13, 2022

* use GpuMemcpy and GpuMemset

* sparse convert kernel support double dispatch by indices dtype

* cudaMemcpyKind->gpuMemcpyKind

5752643b

08 Jun, 2022 1 commit
- call_once (#43206) · cad139a7
  Authored by xiaoxiaohehe001 on Jun 08, 2022
  
  cad139a7
07 Jun, 2022 1 commit
- [multi-stream] Fix split and concat problem. (#43039) · 8c3777df
  Authored by Wilber on Jun 07, 2022
  
  8c3777df
05 Jun, 2022 1 commit
- [code format check upgrade] step2: clang-format (#42840) · a3730dc8
  Authored by Sing_chan on Jun 05, 2022
  
  a3730dc8
04 Jun, 2022 1 commit
- [code format check upgrade] step2: cmake-format (#43057) · 92568edb
  Authored by Sing_chan on Jun 04, 2022
  
  92568edb
19 May, 2022 1 commit
- [CompileOpt] Refine enforce code and remove boost/variant include (#41093) · ca359fec
  Authored by Chen Weihang on May 19, 2022
```
* refine enforce code

* refine enforce code

* fix compile failed

* fix infrt failed
```
  ca359fec
13 May, 2022 1 commit
- add gpu resources. (#42723) · 1280f294
  Authored by Wilber on May 13, 2022
  
  1280f294
12 Apr, 2022 2 commits

[CustomOp] Add context pool unittests (#41085) · 59ec9599

Authored by Chen Weihang on Apr 12, 2022

* add context pool unittests

* fix timeout

* polish details

* change option pos

* add dll decl for windows

* fix pre-commit error

* move dll_decl and export DeviceContext

* replace lost dll_decl.h

59ec9599

fix_paddle_numel_check (#41607) · 51cae7f7
Authored by JingZhuangzhuang on Apr 12, 2022
```
* fix_paddle_numel_check

* fix_paddle_numel_check
```
51cae7f7

09 Apr, 2022 1 commit

Autotune the workspace_size_limit in conv. (#40338) · b937cdc5

Authored by limingshu on Apr 09, 2022

* Use the maximum workspace_size of all algorithms to limit the workspace size in exhaustive search mode.

* Use the system cudaMalloc and cudaFree to allocate workspace during searching.

* Enable switching between the two kinds of workspace setting methods.
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

b937cdc5
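
The first bullet of this commit can be read as taking a maximum over the candidate algorithms. The sketch below is a hypothetical C++ illustration of that idea only: `AlgoCandidate` and `WorkspaceLimit` are invented names, and capping the result by the available memory is an added assumption that the commit message does not state.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical record of one candidate convolution algorithm.
struct AlgoCandidate {
  int id;
  std::size_t workspace_bytes;
};

// Returns the largest workspace any candidate requests, capped by the
// memory that is available (the cap is an assumption of this sketch).
inline std::size_t WorkspaceLimit(const std::vector<AlgoCandidate>& candidates,
                                  std::size_t available_bytes) {
  std::size_t max_needed = 0;
  for (const AlgoCandidate& c : candidates) {
    max_needed = std::max(max_needed, c.workspace_bytes);
  }
  return std::min(max_needed, available_bytes);
}
```

Sizing the limit from the candidates, instead of from a fixed flag, means an exhaustive search never rules out an algorithm only because a hand-set limit was too small.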

01 Apr, 2022 1 commit

[Phi] Interpolate kernels into phi (#40855) · d65a7a46

Authored by chentianyu03 on Apr 01, 2022

* add interpolate cpu kernel

* fix nullptr bug

* add interpolate gpu kernel

* fix unit test error

* remove raw kernels

* add cuda kernel impl

* add infermeta

* recover accidentally deleted kernels in interpolate op

* fix grad x_grad name error

* remove interpolate_v2_op.h

* rm unused codes

* fix xpu build error

* fix build error

* fix namespace error

* add register header for nup

* fix infermeta error

* modify by review

* add the missing args in test_trt_convert_nearest_interp_v2

d65a7a46

25 Mar, 2022 2 commits
- add maximum limit for grid of reduce, elementwise, gather and scatter (#40813) · 608a5f55
  Authored by FlyingQianMM on Mar 25, 2022
```
* add maximum limit for grid of reduce, elementwise and gather

* add {} after if
```
  608a5f55
- [ROCm] fix compile error on DTK21.10, test=develop (#40893) · 41f813e9
  Authored by Qi Li on Mar 25, 2022
  
  41f813e9
17 Mar, 2022 1 commit

Trt engine. (#40532) · 3082ed46

Authored by Wilber on Mar 17, 2022

* infrt add trt engine

* fix register

* file generate

* fix ci error

* fix conflict

* add copyright

* update

* update

* update

* update engine name

* refactor trt code

* update

* update

* update

* update

* fix conflict

* update

* fix compile with cuda

3082ed46

14 Mar, 2022 1 commit

fix gpu callback (#40445) · 2c21d240

Authored by Leo Chen on Mar 14, 2022

* fix gpu context callback

* fix gpu callback

* fix callback early destruct problem

2c21d240

07 Mar, 2022 1 commit

cuBlasLt Epilogue To Fuse Linear + ReLU|GeLU (#39437) · 2a3d9eca

Authored by Ming-Xu Huang on Mar 07, 2022

* Added cuBlasLtHandle_t to device context.

* Added fused_gemm_epilogue op.

1. Added fused_gemm_epilogue op to leverage cuBlasLt Epilogue.
2. Support fusion Act(X*Y + bias); X's dims must be >= 2 and Y's dims should be 2.
3. Act currently only supports ReLU. (Will add GeLU in the future).

* Added UT to fused_gemm_epilogue op.

* Added LinearAct Pattern

1. Added LinearAct into graph_pattern_detector.* to define (2.)'s
pattern.
2. LinearAct is used to detect act(element_add(matmul_v2(x, w), bias)).
3. act currently only supports ReLU (Will support GeLU in the future).

* Added FuseGemmEpiloguePass

1. Added FuseGemmEpiloguePass to handle nn.Linear + Act{ReLU}
fusion (GeLU will be supported in the future).
2. Only support matmul_v2 from nn.Linear.

* Added pybind to BuildStrategy.fuse_gemm_epilogue_.

* Added UT for fuse_gemm_epilogue_pass.

* GeLU support and EpilogueSingleton

1. Added GeLU support to fused_gemm_epilogue op.
2. Added EpilogueSingleton to cache auxiliary pointer.
3. Added related UTs.

* Rename cublaslt_epilogue_op to gemm_epilogue_op.*.

* Added both train and infer pattern to LinearAct.

1. Added support of fwd graph with grad_ops linking to LinearAct.
2. Added related changes to fuse_gemm_epilogue_pass for above
modification.

* Changed CUDA requirement from 11.4 to 11.6 for fuse_gemm_epilogue_pass.

* Added identity activation support to gemm_epilogue_op.

* Added Linear Fusion (matmul_v2 + ele_add)

1. Added matmul_v2 + ele_add pattern to LinearActPattern.
2. Added matmul_v2 + ele_add support to fuse_gemm_epilogue_pass.

* Rename gemm_epilogue_op.* to fused_gemm_epilogue_op.*

* Add fused_gemm_epilogue_grad op.

1. Added fused_gemm_epilogue_grad to support backward epilogue fusion.

* Add UTs to fused_gemm_epilogue_grad_op.

* Change attribute name in fused_gemm_epilogue_grad_op for clarity.

* Allow DX and DBias to be dispensable in fused_gemm_epilogue_grad op.

* Added ElementwiseAdd+Matmul+Act graph pattern detection.

* Fuse backward of Linear( Act(x))

1. Added backward fusion pass to Linear( Act(x)).
2. Added backward fusion pass to Linear(x).

* Added UTs to backward fusion of Linear(Act(x)).

* Complete document of arguments to fused_gemm_epilogue_op.

* Made arguments of some functions pass by reference.

* Modify code with review comments.

1. Made arguments of some function pass by reference.
2. Removed redundant code.
3. Followed Google code style to change code.

* Made 'const' code style be consistent

* Fixed random seed of python UTs.

* Set compile constraints for cuBlasLt

1. Require CUDA 11.6+
2. Remove fuse_gemm_epilogue related tests when CUDA < 11.6.

* Code Review from Paddle

1. Changed argument name is_first_gemm to without_x_gradient for
clarity.
2. Applied PADDLE_THROW in fused_gemm_epilogue_op.

* Remove EpilogueSingleton

1. Applied ReserveSpace to replace Epilogue for passing auxiliary
pointers between FWD and BWD.

* Fix a logical error and enhance UTs.

1. Added act op count checking in UTs.
2. Fix issue to fuse backward of ReLU(Linear(X)).
3. TODO: solve GELU fusion issues.

* Fix Linear and GeLU fusion issues.

1. Modified graph_detech_pattern to fit both linear with gelu and linear with
relu.
2. Modified data range in UTs to allow negative values.

* Removed fused_gemm_epilogue_op.h.

* Rename namespace pten to phi.

* Rename name of arguments in fused_gemm_epilogue_op

1. bias -> Bias.
2. out -> Out.
3. reserve_space -> ReserveSpace.

* Change EpiloguePassActivationCache as local variable.

1. Removed singleton in EpiloguePassActivationCache.
2. Made EpiloguePassActivationCache as an argument to each pass
functions.

2a3d9eca
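
The fused pattern named in this commit, Act(X*Y + bias) with ReLU as the activation, can be written as a plain C++ reference computation. This is only a sketch of the arithmetic under the assumption of row-major storage; `LinearBiasRelu` is a hypothetical helper and does not call cuBLASLt or reproduce the fused_gemm_epilogue op.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Reference computation of out = ReLU(x * w + bias) in a single pass.
// x is M x K, w is K x N (both row-major), bias has N entries.
// Hypothetical helper for illustration; it does not call cuBLASLt.
inline std::vector<float> LinearBiasRelu(const std::vector<float>& x,
                                         const std::vector<float>& w,
                                         const std::vector<float>& bias,
                                         std::size_t m, std::size_t k,
                                         std::size_t n) {
  std::vector<float> out(m * n, 0.0f);
  for (std::size_t i = 0; i < m; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      float acc = bias[j];  // bias add folded into the accumulation
      for (std::size_t p = 0; p < k; ++p) {
        acc += x[i * k + p] * w[p * n + j];
      }
      // Epilogue: the activation is applied before the result is stored,
      // so no intermediate matmul output is written and re-read.
      out[i * n + j] = std::max(acc, 0.0f);
    }
  }
  return out;
}
```

The point of fusing is visible in the inner loop: bias and activation are applied while the accumulator is still in hand, which is what separate matmul, add, and activation ops cannot do.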

BaiXuePrincess / Paddle is in sync with the fork source project
