Commits · e75c01f91350fce6d6051e4eec351514db005692 · PaddlePaddle / Paddle

07 Apr, 2023 · 1 commit

clean up WITH_MLU (#52546) · e75c01f9

Committed by Wang Xin on Apr 07, 2023

06 Apr, 2023 · 1 commit

mv PADDLE_WITH_ASCEND_CL (#52535) · 80dd1672

Committed by 张春乔 on Apr 06, 2023

03 Apr, 2023 · 1 commit

remove WITH_ASCEND_CL PADDLE_WITH_ASCEND_CL WITH_ASCEND_CXX11 (#52448) · 0b60f28c

Committed by engineer1109 on Apr 03, 2023

24 Mar, 2023 · 1 commit

[PHI Decoupling]Remove memory header (Part3) (#51288) · 3d78e759

Committed by YuanRisheng on Mar 24, 2023

* decouple memory copy

* fix ci bugs

* fix ci compile bugs

* fix rocm compile

* fix ci bugs

* decouple memory

* deal with conflict

* fix xpu compile bugs

* fix xpu bugs

* deal with xpu bugs

* fix cmake bugs

* fix windows bugs

* fix ci bugs

* fix ci bugs

* delete redundant code

* add code for pybind

* fix py3 bugs

* fix ci bugs

13 Mar, 2023 · 1 commit

[phi decoupling] decouple dependency to device_context in phi (Part 2) (#51541) · 2305089f

Committed by Huang Jiyi on Mar 13, 2023

* platform::CUDAPinnedDeviceContext -> phi::GPUPinnedContext

* replace platform::TraceEventCollector

06 Mar, 2023 · 1 commit

[phi decoupling] decouple dependency to device_context in phi (Part 1) (#50865) · a1006b2b

Committed by Huang Jiyi on Mar 06, 2023

* move DeviceContextPool to phi

* add EmplaceExternalContextFunc

* update namespace

* update cmake

* fix bugs and create context_pool_impl.h

* replace platform::is_xxx_place

* fix bugs

* update generator

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix bugs

* fix enforce usage

* Revert "fix enforce usage"

This reverts commit 5f521f08a69713cee506e64a00ec6d9fba709e27.

* fix bugs

* rm XPUDeviceContext and CustomDeviceContext

* fix bugs

* fix context init bug

* fix bugs after merge

* fix bugs

* fix name

* fix mutable_data

* update and fix bugs

* fix bugs

* update

* fix bugs

* fix name

* fix bugs

* merge

* fix bugs

* create context_pool in phi/backends

* create context_pool in phi/backends

* fix bugs

* fix xpu bugs

* fix rocm bugs

* fix bugs

* fix bugs

* fix bugs

* fix xpu bugs

* update

* update

* fix bugs

* fix bugs

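The first bullet of this commit moves `DeviceContextPool` to phi. The following is a minimal Python sketch of the general pattern such a pool follows: one device context per place, created on first request and reused afterwards. It is not Paddle's C++ implementation. The class name `DeviceContextPoolSketch`, the `factories` mapping, and the use of plain strings as places are all hypothetical, and the process-wide singleton aspect of the real pool is deliberately left out.

```python
import threading
from typing import Callable, Dict, Hashable


class DeviceContextPoolSketch:
    """Hypothetical place -> context pool with lazy, thread-safe creation."""

    def __init__(self, factories: Dict[Hashable, Callable[[], object]]):
        # One factory per place that the pool is allowed to serve.
        self._factories = dict(factories)
        # Contexts that have already been created, keyed by place.
        self._contexts: Dict[Hashable, object] = {}
        self._lock = threading.Lock()

    def get(self, place: Hashable) -> object:
        # The lock ensures concurrent callers never build two contexts for one place.
        with self._lock:
            ctx = self._contexts.get(place)
            if ctx is None:
                if place not in self._factories:
                    raise KeyError(f"place {place!r} is not registered in the pool")
                ctx = self._factories[place]()  # created lazily on first use
                self._contexts[place] = ctx
            return ctx
```

Repeated `get("cpu")` calls return the same object, and the factory runs only once. Creating contexts lazily avoids paying device-initialization cost for places that are never used.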
23 Feb, 2023 · 1 commit

[phi decoupling] move generator implementation from fluid to phi (#50746) · 4e417409

Committed by Huang Jiyi on Feb 23, 2023

* move fluid generator to phi

* move fluid generator to phi

* update .gitignore

* fix bugs

* fix cannot find "glog/logging.h" in "generator.h"

* fix bugs

30 Jan, 2023 · 1 commit

Support stream priority for standalone executor (#49939) · 172d1de6

Committed by Ruibiao Chen on Jan 30, 2023

* Support stream priority for standalone executor

* Fix compile error

* Fix compile error

* Fix compile error

* Fix compile error

* Fix compile error

23 Dec, 2022 · 1 commit

support recompute for kunlun (#49069) · 98c17a68

Committed by QingshuChen on Dec 23, 2022

05 Dec, 2022 · 1 commit

Replace mutable_data with DeviceContext.Alloc in phi kernels (#48500) · 34a957e3

Committed by Ruibiao Chen on Dec 05, 2022

* Replace mutable_data with DeviceContext.Alloc in phi kernels

* Fix CI errors

* Fix CI errors

* Fix CI errors, test=kunlun

* Fix CI errors, test=kunlun

* Handle rnn_functor

* Update approvals

29 Nov, 2022 · 1 commit

[PHI decoupling] Move MKLDNN code (#48352) · fa051eec

Committed by Sławomir Siwek on Nov 29, 2022

18 Nov, 2022 · 1 commit

Fix bug of zero_allocator in HostAlloc (#48108) · 7f92e27e

Committed by zyfncg on Nov 18, 2022

* fix bug of zero_allocator in host

* fix test compile bug

* add unittest

* update test

01 Nov, 2022 · 1 commit

Adapting device-specific Extra Attributes for the PHI kernel (#46342) · c923e6c9

Committed by Chen Weihang on Oct 31, 2022

* add extra attr property set

* add type_info for all context

* add onednn context to all context

* fix context compile error

* simplify conv kernel args

* pass runtime attr into dev_ctx

* fix macro error

* clear conv_grad_kernel extra args

* merge conv_grad_grad into conv_grad

* clear conv2d_grad_grad extra attrs

* clear yaml and eager extra attr

* fix conv1d error

* change to thread local

* fix npu compile failed

* try to fix windows compile failed

* add conv2d onednn phi kernel

* fix ci bugs (#36)

* fix compile bugs (#38)

* fix extra input transform bug (#39)

* support dynamic created attr (#40)

* reset extra info gen code

* rm conv_grad_grad kernel

* reimpl pass attr adapting

* add int attr support

* remove vector inputnames creating

* fix map at error

* Update paddle/phi/kernels/onednn/conv_grad_kernel.cc
Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>

* remove useless extra attrs

* replace mkldnn_engine by onednn_engine
Co-authored-by: YuanRisheng <yuanrisheng@baidu.com>
Co-authored-by: Sławomir Siwek <slawomir.siwek@intel.com>

25 Sep, 2022 · 1 commit

move some singleton to cc file (#46470) · e8b9ae20

Committed by sneaxiy on Sep 25, 2022

09 Sep, 2022 · 1 commit

[CustomDevice] add dy2static support (#45878) · abc85c50

Committed by ronnywang on Sep 09, 2022

* [CustomDevice] add dy2static support

* update

01 Sep, 2022 · 1 commit

remove circular dependency of device_context and allocator (#45455) · 934171ae

Committed by Leo Chen on Sep 01, 2022

* refine cmake of framework

* add deps for dense tensor

* fix deps

* remove alloc(ctx)

* add depends on mkldnn

03 Aug, 2022 · 1 commit

clean class EigenCudaStreamDevice and CudnnWorkspaceHandle in device_context.cc (#44829) · 7eb37a7e

Committed by Leo Chen on Aug 03, 2022

01 Aug, 2022 · 1 commit

unify gpu context (#44740) · 86763023

Committed by Leo Chen on Aug 01, 2022

* remove cudaDeviceContext

* remove more template

* fix rocm compile

* remove alias name CUDADeviceContext

* fix compile

* fix tests

* revert changes

29 Jul, 2022 · 2 commits

unify fluid::CUDADeviceContext and phi::GpuContext (#44723) · 88490567

Committed by Leo Chen on Jul 29, 2022

* remove cudaDeviceContext

* remove more template

* fix rocm compile

move CUDAStream to phi (#44529) · da3743fd

Committed by Leo Chen on Jul 29, 2022

* init

* move CUDAStream to phi

* fix compilation

* merge develop

* add stream_owned_ member

* split cuda_stream.h

* fix cpu compile

* fix constructor

* fix bug

* fix windows compile

* fix inference test_levit

* fix windows tests

19 Jul, 2022 · 1 commit

update (#44418) · d5f0ed4b

Committed by Wilber on Jul 19, 2022

14 Jul, 2022 · 1 commit

[Phi]Improve the mechanism for mkldnn kernel in PHI (#43941) · e9b4d0be

Committed by YuanRisheng on Jul 14, 2022

* adapt mkldnn kernel in PHI

* fix ci compile bugs

* fix compile bugs

* fix compile bugs

* fix compile bugs

* fix compile bugs

* delete comment

* fix compile bugs in windows-inference

* delete code for coverage

* modify code by review

* modify code by review

* add todo

* fix compile bugs

* fix compile bugs

* fix compile bugs

* fix unittest bugs

11 Jul, 2022 · 1 commit

[NPU] add npu support for new executor. test=develop (#43403) · 5988553f

Committed by 王明冬 on Jul 11, 2022

07 Jul, 2022 · 1 commit

[IPU] support dy2static for IPU merge code (#43770) · 6984fbca

Committed by Allen Guo on Jul 07, 2022

* feat(): dynamic_to_static support for ipu.

* fix(): format fix.

* fix format

* fix cpplint error

* use phi::errors

* fix format

* fix format

* fix(): add api to restore patched function.

* fix(): identity_loss uses cpu place as expected kernel type.

* doc(): add IPU dy2static related docs.

* fix(): combine test cases.

* fix format

* fix comment

* fix format

* apply comment

* fix compiling

* fix(): align docs.

* fix(): fix identity_loss function docs.

* fix(): adjust mean and sum in identity_loss.

* fix(): minor docs.

* move API to paddle.incubate.identity_loss

* fix UT
Co-authored-by: zhaorui chen <zhaoruic@graphcore.ai>

02 Jul, 2022 · 2 commits

unify cpu context, part2 (#44012) · 755438a7

Committed by Leo Chen on Jul 02, 2022

* fix init()

* delete test_device_context

* replace CPUDeviceContext with CPUContext

* fix test_scalar

* remove dot_op.cc

* fix compile

unify cpu context (#43989) · 09096aeb

Committed by Leo Chen on Jul 01, 2022

* unify cpu context

* fix init()

* delete test_device_context

* fix test_scalar

26 Jun, 2022 · 1 commit

format all files in fluid using new config (#43776) · 576236a0

Committed by Sing_chan on Jun 26, 2022

10 Jun, 2022 · 1 commit

Refactor DeviceContextPool (#42901) · 114723c9

Committed by Ruibiao Chen on Jun 10, 2022

* Refactor DeviceContextPool

* Adjust header file order

08 Jun, 2022 · 1 commit

thread_local method to support predictor stream. (#42785) · cab0f2f5

Committed by Wilber on Jun 08, 2022

07 Jun, 2022 · 1 commit

[multi-stream] Fix split and concat problem. (#43039) · 8c3777df

Committed by Wilber on Jun 07, 2022

05 Jun, 2022 · 1 commit

[code format check upgrade] step2: clang-format (#42840) · a3730dc8

Committed by Sing_chan on Jun 05, 2022

23 May, 2022 · 2 commits

[Internal reviewing] NHWC fix to am_vocoder model for oneDNN 2.6 (#42729) · d414af94

Committed by Jacek Czaja on May 23, 2022

* - prototype of reimplemented fixes

* - compilation fixes

* - compilation fix

* - cosmetic info

* - hopefully fix

* - compilation fix

* - supported for nested blocking of cache clearing

* - fix

* - Unit test to changes

* - Compilation fix to windows (hopefully)

* - Moved resetting layout to ResetBlob

* - fixes after review

remove is_init_py of RandomGenerator, and use Global RandomGenerator by default (#42876) · 3b488bae

Committed by zhouweiwei2014 on May 23, 2022

* remove is_init_py of RandomGenerator, and use Global Generator if not OP seed

* fix comment

09 Apr, 2022 · 1 commit

Autotune the workspace_size_limit in conv. (#40338) · b937cdc5

Committed by limingshu on Apr 09, 2022

* Use the maximum workspace_size of all algorithms to limit the workspace size in exhaustive search mode.

* Use the system cudaMalloc and cudaFree to allocate workspace during searching.

* Enable switching between the two kinds of workspace setting methods.
Co-authored-by: Liu Yiqun <liuyiqun01@baidu.com>

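The first and third bullets of this commit describe two ways of setting the workspace limit during conv algorithm search. One uses the largest workspace any candidate algorithm needs (in exhaustive search mode); the other uses a fixed cap. The Python sketch below is one possible reading of that idea, not Paddle's implementation. `AlgoResult`, `workspace_limit`, `pick_algorithm`, and `default_limit` are hypothetical names, and the timings are supplied directly here, whereas a real search would measure them by running each algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlgoResult:
    name: str             # hypothetical algorithm label
    workspace_bytes: int  # scratch memory the algorithm needs
    time_ms: float        # measured (here: given) run time


def workspace_limit(candidates: List[AlgoResult], exhaustive: bool, default_limit: int) -> int:
    # Exhaustive mode: allow the largest workspace any candidate needs,
    # so no algorithm is ruled out only because of a fixed cap.
    if exhaustive:
        return max(c.workspace_bytes for c in candidates)
    # Otherwise use a fixed, configured cap.
    return default_limit


def pick_algorithm(candidates: List[AlgoResult], exhaustive: bool, default_limit: int) -> AlgoResult:
    limit = workspace_limit(candidates, exhaustive, default_limit)
    eligible = [c for c in candidates if c.workspace_bytes <= limit]
    if not eligible:
        raise RuntimeError("no algorithm fits in the workspace limit")
    # Choose the fastest algorithm that fits.
    return min(eligible, key=lambda c: c.time_ms)
```

Under a fixed cap, a fast but memory-hungry algorithm can be skipped. Raising the limit to the candidates' maximum lets exhaustive search consider it.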
01 Apr, 2022 · 1 commit

[Eager] Support pinned (#41035) · f3270fc8

Committed by wanghuancoder on Apr 01, 2022

* support pinned, test=develop

* support async_write, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine,test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

* refine, test=develop

31 Mar, 2022 · 1 commit

[new-exec] fit mkldnn op (#41058) · 02cf6764

Committed by Leo Chen on Mar 31, 2022

* fix bug that some op has no op_role attr

* add mkldnn support for new executor

* fit for mkldnn data_transfer

* fit for mkldnn data_transfer

27 Mar, 2022 · 1 commit

[new-exec] fit for mkldnn and inplace op (#40955) · afa0e82c

Committed by Leo Chen on Mar 27, 2022

* fit for mkldnn and inplace op

* fix compile

* refine ut

* register op version

* fix inplace op

* fix transfer_layout

23 Mar, 2022 · 1 commit

Performance optimization for StreamSafeCudaAllocator (#40718) · d8bff988

Committed by From00 on Mar 23, 2022

* Performance optimize

* Optimize GetAllocator, RWLock and ProcessUnfreedAllocation

* Remove test file

* Fix CI error

* Fix CI errors

* Fix CI errors

07 Mar, 2022 · 1 commit

cuBlasLt Epilogue To Fuse Linear + ReLU|GeLU (#39437) · 2a3d9eca

Committed by Ming-Xu Huang on Mar 07, 2022

* Added cuBlasLtHandle_t to device context.

* Added fused_gemm_epilogue op.

1. Added fused_gemm_epilogue op to leverage cuBlasLt Epilogue.
2. Support fusion Act(X*Y + bias), where X's dims >= 2 and Y's dims should be 2.
3. Act currently only supports ReLU. (Will add GeLU in the future).

* Added UT to fused_gemm_epilogue op.

* Added LinearAct Pattern

1. Added LinearAct into graph_pattern_detector.* to define (2.)'s
pattern.
2. LinearAct is used to detect act(element_add(matmul_v2(x, w), bias)).
3. act currently only supports ReLU (Will support GeLU in the future).

* Added FuseGemmEpiloguePass

1. Added FuseGemmEpiloguePass to handle nn.Linear + Act{ReLU}
fusion (GeLU will be supported in the future).
2. Only support matmul_v2 from nn.Linear.

* Added pybind to BuildStrategy.fuse_gemm_epilogue_.

* Added UT for fuse_gemm_epilogue_pass.

* GeLU support and EpilogueSingleton

1. Added GeLU support to fused_gemm_epilogue op.
2. Added EpilogueSingleton to cache auxiliary pointer.
3. Added related UTs.

* Rename cublaslt_epilogue_op to gemm_epilogue_op.*.

* Added both train and infer pattern to LinearAct.

1. Added support of fwd graph with grad_ops linking to LinearAct.
2. Added related changes to fuse_gemm_epilogue_pass for above
modification.

* Changed CUDA requirement from 11.4 to 11.6 for fuse_gemm_epilogue_pass.

* Added identity activation support to gemm_epilogue_op.

* Added Linear Fusion (matmul_v2 + ele_add)

1. Added matmul_v2 + ele_add pattern to LinearActPattern.
2. Added matmul_v2 + ele_add support to fuse_gemm_epilogue_pass.

* Rename gemm_epilogue_op.* to fused_gemm_epilogue_op.*

* Add fused_gemm_epilogue_grad op.

1. Added fused_gemm_epilogue_grad to support backward epilogue fusion.

* Add UTs to fused_gemm_epilogue_grad_op.

* Change attribute name in fused_gemm_epilogue_grad_op for clarity.

* Allow DX and DBias be dispensable to fused_gemm_epilogue_grad op.

* Added ElementwiseAdd+Matmul+Act graph pattern detection.

* Fuse backward of Linear( Act(x))

1. Added backward fusion pass to Linear( Act(x)).
2. Added backward fusion pass to Linear(x).

* Added UTs to backward fusion of Linear(Act(x)).

* Complete document of arguments to fused_gemm_epilogue_op.

* Made arguments of some functions pass by reference.

* Modify code with review comments.

1. Made arguments of some function pass by reference.
2. Removed redundant code.
3. Followed Google code style to change code.

* Made 'const' code style be consistent

* Fixed random seed of python UTs.

* Set compiling constraints to cuBlasLt

1. Require CUDA 11.6+
2. Remove fuse_gemm_epilogue related tests when CUDA < 11.6.

* Code Review from Paddle

1. Renamed argument is_first_gemm to without_x_gradient for
clarity.
2. Applied PADDLE_THROW in fused_gemm_epilogue_op.

* Remove EpilogueSingleton

1. Applied ReserveSpace to replace Epilogue for passing auxiliary
pointers between FWD and BWD.

* Fix a logical error and enhance UTs.

1. Added act op count checking in UTs.
2. Fix issue to fuse backward of ReLU(Linear(X)).
3. TODO: solve GELU fusion issues.

* Fix Linear and GeLU fusion issues.

1. Modified graph_pattern_detector to fit both Linear with GeLU and Linear with
ReLU.
2. Modified data range in UTs to allow negative values.

* Removed fused_gemm_epilogue_op.h.

* Rename namespace pten to phi.

* Rename name of arguments in fused_gemm_epilogue_op

1. bias -> Bias.
2. out -> Out.
3. reserve_space -> ReserveSpace.

* Change EpiloguePassActivationCache as local variable.

1. Removed singleton in EpiloguePassActivationCache.
2. Made EpiloguePassActivationCache an argument to each pass
function.

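The op added in this commit fuses `Act(X*Y + bias)` into one cuBlasLt call. To make the fused computation concrete, here is an unfused pure-Python reference for the 2-D case, with ReLU, GeLU, and identity activations. It is a sketch for checking results, not Paddle's kernel. The function names are hypothetical, only 2-D `X` is handled (the op itself accepts X with dims >= 2), and GeLU uses the erf form, which may differ slightly from the approximation a GPU epilogue uses.

```python
import math
from typing import Callable, Dict, List

Matrix = List[List[float]]


def relu(v: float) -> float:
    return max(v, 0.0)


def gelu(v: float) -> float:
    # erf-based GeLU; GPU kernels may use a tanh approximation instead
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def identity(v: float) -> float:
    return v


ACTIVATIONS: Dict[str, Callable[[float], float]] = {"relu": relu, "gelu": gelu, "none": identity}


def gemm_epilogue_reference(x: Matrix, y: Matrix, bias: List[float], activation: str = "relu") -> Matrix:
    """Unfused reference for out = act(x @ y + bias); x: [M, K], y: [K, N], bias: [N]."""
    k, n = len(y), len(y[0])
    if any(len(row) != k for row in x):
        raise ValueError("inner dimensions of x and y must match")
    if len(bias) != n:
        raise ValueError("bias length must equal the number of columns of y")
    act = ACTIVATIONS[activation]
    out: Matrix = []
    for row in x:
        # GEMM, then bias add, then activation: the three steps the epilogue fuses.
        out.append([act(sum(row[p] * y[p][j] for p in range(k)) + bias[j]) for j in range(n)])
    return out
```

For example, with `x = [[1.0, 2.0], [3.0, 4.0]]`, `y = [[1.0, 0.0], [0.0, -1.0]]` and `bias = [0.5, 0.5]`, the ReLU variant returns `[[1.5, 0.0], [3.5, 0.0]]`. A reference like this is useful in unit tests, which compare it against the fused op's output.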
03 Mar, 2022 · 1 commit

[CustomRuntime] migrate CustomRuntime into phi (#39908) · b4665d23

Committed by ronnywang on Mar 03, 2022
