提交 · 268f097ec0aaa903497ea72f6ad8536567d2d9e2 · PaddlePaddle / Paddle

16 9月, 2022 1 次提交

[CustomDevice] add new executor support (#46038) · 268f097e

由 ronnywang 提交于 9月 16, 2022

* [CustomDevice] add custom_device_resource_pool & device_event_custom_device

* update

* update

* update

* update

268f097e

09 9月, 2022 1 次提交
- R
  [CustomDevice] add dy2static support (#45878) · abc85c50
  由 ronnywang 提交于 9月 09, 2022
```
* [CustomDevice] add dy2static support

* update
```
  abc85c50
04 8月, 2022 1 次提交
- 王
  
  add xpu garbage collector for standalone executor. (#44572) · 0e26361c
  由王明冬提交于 8月 04, 2022
  
  0e26361c
03 8月, 2022 1 次提交
- L
  
  clean class EigenCudaStreamDevice and CudnnWorkspaceHandle in device_context.cc (#44829) · 7eb37a7e
  由 Leo Chen 提交于 8月 03, 2022
  
  7eb37a7e
01 8月, 2022 1 次提交

unify gpu context (#44740) · 86763023

由 Leo Chen 提交于 8月 01, 2022

* remove cudaDeviceContext

* remove more template

* fix rocm compile

* remove alias name CUDADeviceContext

* fix compile

* fix tests

* revert changes

86763023

29 7月, 2022 2 次提交

L
unify fluid::CUDADeviceContext and phi::GpuContext (#44723) · 88490567
由 Leo Chen 提交于 7月 29, 2022
```
* remove cudaDeviceContext

* remove more template

* fix rocm compile
```
88490567

move CUDAStream to phi (#44529) · da3743fd

由 Leo Chen 提交于 7月 29, 2022

* init

* move CUDAStream to phi

* fix compilation

* merge develop

* add stream_owned_ member

* split cuda_stream.h

* fix cpu compile

* fix constructor

* fix bug

* fix windows compile

* fix inference test_levit

* fix windows tests

da3743fd

19 7月, 2022 1 次提交
- W
  
  update (#44418) · d5f0ed4b
  由 Wilber 提交于 7月 19, 2022
  
  d5f0ed4b
14 7月, 2022 1 次提交

[Phi]Improve the mechanism for mkldnn kernel in PHI (#43941) · e9b4d0be

由 YuanRisheng 提交于 7月 14, 2022

* adapt mkldnn kernel in PHI

* fix ci compile bugs

* fix compile bugs

* fix compile bugs

* fix compile bugs

* fix compile bugs

* delete comment

* fix compile bugs in windows-inference

* delete code for converage

* modify code by review

* modify code by review

* add todo

* fix compile bugs

* fix compile bugs

* fix compile bugs

* fix unittest bugsx

e9b4d0be

04 7月, 2022 1 次提交
- L
  
  unify cpu context (#44049) · 8f8a6848
  由 Leo Chen 提交于 7月 04, 2022
  
  8f8a6848
02 7月, 2022 2 次提交

unify cpu context, part2 (#44012) · 755438a7

由 Leo Chen 提交于 7月 02, 2022

* fix init()

* delete test_device_context

* replace CPUDeviceContext with CPUContext

* fix test_scalar

* remove dot_op.cc

* fix compile

755438a7

unify cpu context (#43989) · 09096aeb

由 Leo Chen 提交于 7月 01, 2022

* unify cpu context

* fix init()

* delete test_device_context

* fix test_scalar

09096aeb

29 6月, 2022 1 次提交
- C
  
  fix device context init error (#43910) · d1ac85e5
  由 Chen Weihang 提交于 6月 29, 2022
  
  d1ac85e5
26 6月, 2022 1 次提交
- S
  
  format all files in fluid using new config (#43776) · 576236a0
  由 Sing_chan 提交于 6月 26, 2022
  
  576236a0
10 6月, 2022 1 次提交
- R
  Refactor DeviceContextPool (#42901) · 114723c9
  由 Ruibiao Chen 提交于 6月 10, 2022
```
* Refactor DeviceContextPool

* Adjust header file order
```
  114723c9
08 6月, 2022 1 次提交
- W
  
  thread_local method to support predictor stream. (#42785) · cab0f2f5
  由 Wilber 提交于 6月 08, 2022
  
  cab0f2f5
05 6月, 2022 1 次提交
- S
  
  【code format check upgrade】 step2：clang-format (#42840) · a3730dc8
  由 Sing_chan 提交于 6月 05, 2022
  
  a3730dc8
23 5月, 2022 1 次提交

[Internal reviewing] NHWC fix to am_vocoder model for oneDNN 2.6 (#42729) · d414af94

由 Jacek Czaja 提交于 5月 23, 2022

* - prototype of reimplemented fixes

* - compilation fixes

* - compilation fix

* - cosmetic info

* - hopefully fix

* - compilation fix

* - supported for nested blocking of cache clearing

* - fix

* - Unit test to changes

* - Compilation fix to windows (hopefully)

* - Moved resetting layout to ResetBlob

* - fixes after review

d414af94

12 5月, 2022 1 次提交

add xpu buffer_reader, *test=kunlun (#42578) · cc343a41

由 z8hanghuan 提交于 5月 12, 2022

* add xpu buffer_reader, *test=kunlun

* xpu buffer_reader, use XPUDeviceGuard, *test=kunlun

* modify xpu.cmake, *test=kunlun

* modify xpu.cmake, *test=kunlun

* modify xpu.cmake, *test=kunlun

* add xpu buffer_reader, *test=kunlun

* add xpu buffer reader, *test=kunlun

* add xpu buffer reader, *test=kunlun

cc343a41

21 3月, 2022 1 次提交

[Phi] Add phi device context pool (#40635) · 0e1191f4

由 Chen Weihang 提交于 3月 21, 2022

* add phi device context pool

* change year

* fix compile error

* fix operator = error

* refine init impl

* polish details

* refine init impl

0e1191f4

07 3月, 2022 1 次提交

cuBlasLt Epilogue To Fuse Linear + ReLU|GeLU (#39437) · 2a3d9eca

由 Ming-Xu Huang 提交于 3月 07, 2022

* Added cuBlasLtHandle_t to device context.

* Added fused_gemm_epilogue op.

1. Added fused_gemm_epilogue op to leverage cuBlastLt Epilogue.
2. Support fusion Act(X*Y + bias), X'dims >=2 and Y'dims shoule be 2.
2. Act currently only be supported ReLU. (Will add GeLU in the future).

* Added UT to fused_gemm_epilogue op.

* Added LinearAct Pattern

1. Added LinearAct into graph_pattern_detector.* to define (2.)'s
pattern.
2. LinearAct is used to detect act(element_add(matmul_v2(x, w), bias)).
3. act currently only support ReLU (Will support GeLU in the future).

* Added FuseGemmEpiloguePass

1, Added FuseGemmEpiloguePass to handle nn.Linear + Act{ReLU}
fusion (GeLU will be supported in the future).
2. Only support matmul_v2 from nn.Linear.

* Added pybind to BuildStrageter.fuse_gemm_epilogue_.

* Added UT for fuse_gemm_epilogue_pass.

* GeLU support and EpilogueSingleton

1. Added GeLU support to fused_gemm_epilogue op.
2. Added EpilogueSingleton to cache auxiliary pointer.
3. Added related UTs.

* Rename cublaslt_epilogue_opto gemm_epilogue_op.*.

* Added both train and infer pattern to LinearAct.

1. Added support of fwd graph with grap_ops linking to LinearAct.
2. Added related changes to fuse_gemm_epilogue_pass for above
modification.

* Changed CUDA requirement from 11.4 to 11.6 for fuse_gemm_epilogue_pass.

* Added identity activation support to gemm_epilogue_op.

* Added Linear Fusion (matmul_v2 + ele_add)

1. Added matmul_v2 + ele_add pattern to LinearActPattern.
2. Added matmul_v2 + ele_add support to fuse_gemm_epilogue_pass.

* Rename gemm_epilogue_op.* to fused_gemm_epilogue_op.*

* Add fused_gemm_epilogue_grad op.

1. Added fused_gemm_epilogue_grad to support backward epilogue fusion.

* Add UTs to fused_gemm_epilogue_grad_op.

* Change attribute name in fused_gemm_epilogue_grad_op for clearing.

* Allow DX and DBias be dispensable to fused_gemm_epilogue_grad op.

* Added ElementwiseAdd+Matmul+Act graph pattern detection.

* Fuse backward of Linear( Act(x))

1. Added backward fusion pass to Linear( Act(x)).
2. Added backward fusion pass to Linear(x).

* Added UTs to backward fusion of Linear(Act(x)).

* Complete document of arguments to fused_gemm_epilogue_op.

* Made arguments of some functions pass by reference.

* Modify code with review comments.

1. Made arguments of some function pass by reference.
2. Removed redundant code.
3. Followed Google code style to change code.

* Made 'const' code style be consistent

* Fixed random seed of python UTs.

* Set Compiling constrains to cuBlasLt

1. Require CUDA 11.6+
2. Remove fuse_gemm_epilogue related tests when CUDA < 11.6.

* Code Reivew from Paddle

1. Changed arguments name is_first_gemm to without_x_gradient for
clearing.
2. Applied PADDLE_THROW in fused_gemm_epilogue_op.

* Remove EpilogueSingleton

1. Applied ReserveSpace to replace Epilogue for passing auxiliary
pointers between FWD and BWD.

* Fix a logical error and enhance UTs.

1. Added act op count checking in UTs.
2. Fix issue to fuse backward or ReLU(Linear(X)).
3. TODO: solve GELU fusion issues.

* Fix Linear and GeLU fusion issues.

1. Modified graph_detech_pattern to fit with both linear wiht gelu or
relu.
2. Modified data range in Uts to allow negative values.

* Removed fused_gemm_epilogue_op.h.

* Rename namespace pten to phi.

* Rename name of arguments in fused_gemm_epilogue_op

1. bias -> Bias.
2. out -> Out.
3. reserve_space -> ReserveSpace.

* Change EpiloguePassActivationCache as local variable.

1. Removed singleton in EpiloguePassActivationCache.
2. Made EpiloguePassActivationCache as an argument to each pass
functions.

2a3d9eca

03 3月, 2022 1 次提交
- R
  
  [CustomRuntime] migrate CustomRuntime into phi (#39908) · b4665d23
  由 ronnywang 提交于 3月 03, 2022
  
  b4665d23
23 2月, 2022 1 次提交

[KP] Add elementwise add xpu after phi, test=develop (#39787) · 1a1a2ce8

由 Liu-xiandong 提交于 2月 23, 2022

* [KP] Add elementwise add xpu, test=develop

* modify the File Permissions

* modify the copyright time

* modify code style

* modify code style

1a1a2ce8

22 2月, 2022 1 次提交
- R
  
  [CustomRuntime] fix CustomDeviceContext (#39766) · 60fc555e
  由 ronnywang 提交于 2月 22, 2022
  
  60fc555e
20 2月, 2022 1 次提交

[PTen->Phi PR1] Change pten dirname and namespace to phi (#39748) · dcfe1986

由 Chen Weihang 提交于 2月 20, 2022

* rename pten dir to phi

* rename namespace to phi

* rename infrt pten dir to phi

* resolve conflict

* rename pten to phi in cmake

* revert all infrt change

* change needed files

* fix infrt failed

* fix inference failed

dcfe1986

18 2月, 2022 1 次提交
- R
  
  [CustomRuntime] add pten::Backend support (#39606) · d6d0820e
  由 ronnywang 提交于 2月 18, 2022
  
  d6d0820e
15 2月, 2022 1 次提交

[PluggableDevice] Add custom runtime support (#38740) · 3e7825f3

由 ronnywang 提交于 2月 15, 2022

* [CustomRuntime] Add DeviceManager

* [CustomRuntime] Add DeviceInterface

* [CustomRuntime] Add Stream, Event, DeviceGuard, CallbackManager

* [CustomRuntime] Add plug-in device

* [CustomRuntime] Memory module support PluggableDevice

* [CustomRuntime] Add WITH_PLUGGABLE_DEVICE cmake option

* update

* [API] update API doc based on comments, test=develop
Co-authored-by: Nqili93 <qili93@qq.com>

3e7825f3

08 2月, 2022 1 次提交
- W
  [PTEN] Update gpu_context. (#39359) · 24103cbb
  由 Wilber 提交于 2月 08, 2022
```
* gpu_context..

* update

* update

* update
```
  24103cbb
06 2月, 2022 1 次提交
- W
  
  [PTEN] Add Gpu context (#39305) · a821c4a9
  由 Wilber 提交于 2月 06, 2022
  
  a821c4a9
26 1月, 2022 1 次提交

[IPU] sync misc changes 01 (#38876) · 4efbebea

由 Allen Guo 提交于 1月 26, 2022

* sync misc changes

* apply comments 01

* fix compile error

* remove is_ipu_place check

* add authors
Co-authored-by: NXiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: NAllen Guo <alleng@graphcore.ai>
Co-authored-by: NZhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: NHaicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: NHan Zhao <hanzhao@graphcore.ai>

* sync changes

* restore cmake

* update ir cmake and setup.py

* update inference_lib cmake

* split PR
Co-authored-by: NXiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: NZhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: NHaicheng Jiang <haichengj@graphcore.ai>
Co-authored-by: NHan Zhao <hanzhao@graphcore.ai>

4efbebea

25 1月, 2022 1 次提交
- W
  
  [PTEN] Add xpu context. (#39098) · c1e5a393
  由 Wilber 提交于 1月 25, 2022
  
  c1e5a393
21 1月, 2022 1 次提交

[PTEN] Add cpu context (#38979) · 064bc4b8

由 Wilber 提交于 1月 21, 2022

* add cpu_context.

* update

* update

* update

* update

* update

* fix ci problem

* fix npu ci problem

* update

* fix ci compile

064bc4b8

30 12月, 2021 1 次提交

Add cusparse and unittest (#38431) · 667dc9f0

由 zhangkaihuo 提交于 12月 30, 2021

将cuSparse的handle与DeviceContext进行绑定，避免op中进行创建和销毁
添加对cuSparse中dense和sparse转换的API进行封装
添加对封装的API的单测

667dc9f0

23 12月, 2021 1 次提交
- W
  Support external stream. (#38373) · 15ad7ee4
  由 Wilber 提交于 12月 23, 2021
```
* support external stream.

* update

* update

* update
```
  15ad7ee4
20 12月, 2021 1 次提交
- F
  
  [MLU]add mlu backend (#38207) · 76514a1f
  由 fwenguang 提交于 12月 20, 2021
  
  76514a1f
07 12月, 2021 1 次提交
- J
  
  add ipu device p1 (#37841) · c9a3c669
  由 jianghaicheng 提交于 12月 07, 2021
  
  c9a3c669
03 12月, 2021 1 次提交
- R
  refine structure for cuda and rocm (#37202) · a6d2fddb
  由 ronnywang 提交于 12月 03, 2021
```
* refine structure for cuda and rocm

* update

* update

* update

* update
```
  a6d2fddb
29 11月, 2021 1 次提交
- P
  
  Add third batch of deprecated mkldnn namespace name changes (#37558) · 1ba81500
  由 piotrekobiIntel 提交于 11月 29, 2021
  
  1ba81500
27 11月, 2021 1 次提交

[NPU] reorganization for device API abstraction (#37110) · 72241a6a

由 Aganlengzi 提交于 11月 27, 2021

* [NPU] reorganization for device API abstraction

* [NPU] delete old files

* [NPU] fix npu_collective_helper

* [NPU] fix collective_helper

* [NPU] fix ut

* [NPU] mod memory allocation and hccl_helper

* [NPU] fix place_type

* [NPU] split enfoce.h

* move acl* call into npu_info

* merge conflict

* fix merge

* merge conflict

* merge conflict

72241a6a

24 11月, 2021 1 次提交
- P
  Changed second batch of deprecated mkldnn header and function names to new oneDNN names (#37351) · 7db7a0ec
  由 piotrekobiIntel 提交于 11月 24, 2021
```
* Add second batch of deprecated mkldnn namespace and macro changes

* Unlock CI

* Fix temporary namespace alias placing
```
  7db7a0ec

PaddlePaddle / Paddle 大约 1 年 前同步成功

PaddlePaddle / Paddle
大约 1 年前同步成功