Commits · 5dcfb6995bd1913f2b046b4610420dff4995eff6 · PaddlePaddle / Paddle

23 Mar, 2022 3 commits

[NPU] add npu support for conv3d and conv3d_grad (#38480) · ff568afa

Committed by furnace on Mar 23, 2022

* [NPU] add npu support for conv3d and conv3d_grad

* [NPU] delete failed unittests due to Ascend not support

* [NPU] delete debug codes

* [NPU] optimize codes, notest

* [NPU] remove const_cast

* [NPU] optimize for remove const_cast

* [NPU] fix written errors

ff568afa

Performance optimization for StreamSafeCudaAllocator (#40718) · d8bff988

Committed by From00 on Mar 23, 2022

* Performance optimize

* Optimize GetAllocator, RWLock and ProcessUnfreedAllocation

* Remove test file

* Fix CI error

* Fix CI errors

* Fix CI errors

d8bff988

Add profiler features (#40357) · c15e3823

Committed by chenjian on Mar 23, 2022

* add event record for model profiling

* fix format

* fix format

* fix code example bug

* no

* add profiler statistic

* add profiler feature

* fix bug

* fix bug

* fix bug

* fix bug

* required: gpu

* required: gpu

* fix bug

* required: gpu

* fix ci bug

* fix ci error

* fix ci error

* upgrade document

* fix doc

* fix ci bug

* add doc and fix bug

* nothing

* fix bug

* fix format bug

* modify format

* add deprecated description for old profiler

* fix bug

* fix bug

* fix

* add load_profiler_reuslt doc

* add load_profiler_reuslt doc

* add load_profiler_reuslt doc

* help fix old profiler sample code

* add api doc

* fix format

* fix api doc

* fix api doc format

* fix api doc format

* fix api doc c format

* fix api doc format

c15e3823

21 Mar, 2022 4 commits

[Phi] Add phi device context pool (#40635) · 0e1191f4

Committed by Chen Weihang on Mar 21, 2022

* add phi device context pool

* change year

* fix compile error

* fix operator = error

* refine init impl

* polish details

* refine init impl

0e1191f4

conv2d support FP16 on xpu and update unittest for conv2d, test=kunlun (#40395) · 276017bb

Committed by zhangyikun02 on Mar 21, 2022

276017bb

[IPU] add more ops (#40691) · df3ae18a

Committed by Allen Guo on Mar 21, 2022

* add more ops

* add authors
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Allen Guo <alleng@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

* rm ipu_strategy.check()

* fix UT fail

* fix typo
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

df3ae18a

[IPU] update ipu_backend (#40685) · d67fe921

Committed by Allen Guo on Mar 21, 2022

* sync changes

* copy sOpNamescope

* fix UTs

* add authors
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Allen Guo <alleng@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

* fix code-format

* fix compile error

* add comments for feed_op
Co-authored-by: Xiaobing Wang <xiaobingw@graphcore.ai>
Co-authored-by: Zhixin Yao <zhixiny@graphcore.ai>
Co-authored-by: Zhaorui Chen <zhaoruic@graphcore.ai>
Co-authored-by: Han Zhao <hanzhao@graphcore.ai>

d67fe921

15 Mar, 2022 1 commit

oneDNN NHWC fixes (#40049) · dde9cec0

Committed by Jacek Czaja on Mar 15, 2022

* - Prototype of third solution

- fix

- compilation fixes

- fix

- fix

- fix

- fix

- compilation fix

- comment fix

- lint

update mkldnn conv_elementwise_add_fuse_pass ut

- NHWC changes to prelu

- alpha dims

- UT fix

- fix to UT

- lint

- Some fixes

- added to BWD of prelu NHWC support

- reverted removal of resetting cu_layout in clearing of caching

* - Small changes

* - compilation fix

* - fix

* - fix

* lint

* - fixes after internal review

* - compilation fix

* - lint

dde9cec0

14 Mar, 2022 3 commits

Add an elementwise + activation fusion pass. (#36541) · 3f219160

Committed by Tomasz Socha on Mar 14, 2022

* Add elementwise add and activation fuse pass

* Fix copy elision

* More flexible pattern detector

* More flexible fusion pass

* Update lists for pass

* Add support for Pow operator

* Add support for more activation types

* Style

* Rename fusion pass

* First version of tests

* Dirty version of pass

* Polished version

* Update pbtxt

* Style

* Update names

* Style

* Use PADDLE_ENFORCE_EQ

* Save error message to variable

* WO for error checks

* CR

* Static style check

* Add missing 'activation_scale' attribute

* Add relu6 and sigmoid activations

* Style

* Fix fuse list formatting

* Sync filenames for fuse pass files

* Fix cmake after move

* Fix registration

* Fix pass name in tests

* Add missing activations to checker

* WIPS

* Working mul op

* Working sub

* Working Add

* Remove pten includes

* Remove some forward declarations

* Remove Includes

* Fixes

* Remove default kernels

* Add check if post_ops attributes are available

* Style

* Code adjustment

* Register default kernels

* We have year 2022 not 2021...
Co-authored-by: jakpiase <jakpia21@gmail.com>
Co-authored-by: Sylwester Fraczek <sylwester.fraczek@intel.com>

* Fast review fixes
Co-authored-by: jakpiase <jakpia21@gmail.com>
Co-authored-by: Sylwester Fraczek <sylwester.fraczek@intel.com>

* Review Fix

* Rename one_dnn -> onednn

* Style after review

* Fast and dirty fix for quantization

* Update tests

* Style

* Fix mkldnn_quantizer config

* Add Joanna's suggestion.

* Check if operator is explicitly disabled on OneDNN

* Try to use unregistered attributes

* Style

* Test new framework

* FXI

* FXII

* Update test

* Style
Co-authored-by: jakpiase <jakpia21@gmail.com>
Co-authored-by: Sylwester Fraczek <sylwester.fraczek@intel.com>

3f219160

[KP] Add unittests for... · f269ca3f

Committed by Lijunhui on Mar 14, 2022

[KP] Add unittests for brelu,ceil,celu,elu,floor,hard_shrink,hard_sigmoid,log1p,logsigmoid,relu6,silu,soft_relu,softsign,swish (#40448)

* solve unexecuted UT

* add 24 activation op UT

* append swish&thresholded_relu to kpfirst_list

* rm thresholded_relu

f269ca3f

Update profiler (#40460) · 89a70c76

Committed by liutiexing on Mar 14, 2022

89a70c76

11 Mar, 2022 2 commits
- [Phi]migrate cholesky_solve op to phi (#40387) · e24ca55e
  Committed by zhouweiwei2014 on Mar 11, 2022
  
  e24ca55e
- minor fix matmul and onehot xpu. test=kunlun (#40419) · 594e412d
  Committed by houj04 on Mar 11, 2022
  
  594e412d
10 Mar, 2022 2 commits

solve unexecuted UT (#40397) · bd4dc3be

Committed by Lijunhui on Mar 10, 2022

bd4dc3be

add tril_triu for xpu, *test=kunlun (#40246) · 1128db30

Committed by z8hanghuan on Mar 10, 2022

* add tril_triu for xpu, *test=kunlun

* add tril_triu for xpu, *test=kunlun

* add tril_triu for xpu, *test=kunlun

* add tril_triu for xpu, *test=kunlun

* add tril_triu for xpu, *test=kunlun

1128db30

09 Mar, 2022 1 commit
- [MLU] add mlu buffer reader (#40131) · b5a8a0d9
  Committed by fwenguang on Mar 09, 2022
  
  b5a8a0d9
08 Mar, 2022 3 commits

add the implementation of process group for hccl (#40228) · 73583f86
Committed by lilong12 on Mar 08, 2022
```
* add pg_hccl
```
73583f86

[custom kernel]Upgrade support for multiple libs (#40223) · c39aa18e
Committed by Aganlengzi on Mar 08, 2022
```
* [custom kernel]Upgrade support for multi libs

* upgrade phi_custom_kernel deps
```
c39aa18e

add python profiler package (#40065) · 10325a82

Committed by chenjian on Mar 08, 2022

* add python profiler package

* update according to review

* fix bug

* fix bug

* fix bug

* add unit test

* Revert "add unit test"

This reverts commit 4e69ff71b0645e069afe5dd8fea0d07717852c48.

* reduce for pr

* add unit test

* modify for pr

* fix unittest

* update for ci coverage

* modify according to review

* fix bug

* improve coverage

10325a82

07 Mar, 2022 1 commit

cuBlasLt Epilogue To Fuse Linear + ReLU|GeLU (#39437) · 2a3d9eca

Committed by Ming-Xu Huang on Mar 07, 2022

* Added cuBlasLtHandle_t to device context.

* Added fused_gemm_epilogue op.

1. Added fused_gemm_epilogue op to leverage cuBlasLt Epilogue.
2. Support fusion Act(X*Y + bias), X's dims >= 2 and Y's dims should be 2.
3. Act currently supports only ReLU. (Will add GeLU in the future).
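
For readers unfamiliar with the fused form, the computation Act(X*Y + bias) described above can be sketched as an unfused reference in plain Python. This is only an illustration, not the cuBlasLt kernel: the function name `linear_act_reference`, the `"none"` identity option, and the erf-based GeLU formula are assumptions made for the sketch, and it handles only a 2-D X.

```python
import math


def relu(v):
    return max(0.0, v)


def gelu(v):
    # Exact (erf-based) GeLU; which GeLU variant the fused kernel uses
    # is not stated in the commit message, so this is an assumption.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


# Hypothetical activation table for this sketch only.
ACTIVATIONS = {"none": lambda v: v, "relu": relu, "gelu": gelu}


def linear_act_reference(x, y, bias, activation="relu"):
    """Unfused reference for act(x @ y + bias).

    x is an M x K list of rows, y is a K x N list of rows,
    bias has N entries and is broadcast over the rows of x @ y.
    """
    act = ACTIVATIONS[activation]
    k, n = len(y), len(y[0])
    if any(len(row) != k for row in x):
        raise ValueError("inner dimensions of x and y do not match")
    if len(bias) != n:
        raise ValueError("bias length must equal the number of columns of y")
    return [
        [act(sum(row[i] * y[i][j] for i in range(k)) + bias[j]) for j in range(n)]
        for row in x
    ]
```

A fused kernel produces the same values in a single GEMM call with an epilogue, which avoids writing the intermediate X*Y + bias result back to memory before applying the activation.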

* Added UT to fused_gemm_epilogue op.

* Added LinearAct Pattern

1. Added LinearAct into graph_pattern_detector.* to define (2.)'s
pattern.
2. LinearAct is used to detect act(element_add(matmul_v2(x, w), bias)).
3. act currently supports only ReLU (Will support GeLU in the future).
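
The pattern described above, act(element_add(matmul_v2(x, w), bias)), can be illustrated with a toy matcher over a flat list of ops. This is a sketch under stated assumptions, not the code in graph_pattern_detector.*: the dictionary-based op representation, the function name `find_linear_act`, the `elementwise_add` type string, and the single-consumer rule are all hypothetical choices made for the example.

```python
from collections import defaultdict


def find_linear_act(ops, activations=("relu",)):
    """Return (matmul, add, act) index triples that form the chain
    act(elementwise_add(matmul_v2(x, w), bias)).

    Each op is a dict with "type", "inputs" (list of variable names)
    and "output" (one variable name). Intermediate results must have
    exactly one consumer, so that fusing them away is safe.
    """
    # Map every variable name to the indices of the ops that read it.
    consumers = defaultdict(list)
    for idx, op in enumerate(ops):
        for name in op["inputs"]:
            consumers[name].append(idx)

    def sole_consumer(var, wanted_types):
        users = consumers.get(var, [])
        if len(users) == 1 and ops[users[0]]["type"] in wanted_types:
            return users[0]
        return None

    matches = []
    for idx, op in enumerate(ops):
        if op["type"] != "matmul_v2":
            continue
        add_idx = sole_consumer(op["output"], ("elementwise_add",))
        if add_idx is None:
            continue
        act_idx = sole_consumer(ops[add_idx]["output"], activations)
        if act_idx is not None:
            matches.append((idx, add_idx, act_idx))
    return matches


# Toy graph: one Linear + ReLU chain followed by an unrelated matmul.
graph = [
    {"type": "matmul_v2", "inputs": ["x", "w"], "output": "xw"},
    {"type": "elementwise_add", "inputs": ["xw", "bias"], "output": "xw_b"},
    {"type": "relu", "inputs": ["xw_b"], "output": "out"},
    {"type": "matmul_v2", "inputs": ["out", "w2"], "output": "y"},
]
```

Calling `find_linear_act(graph)` on the toy graph reports the first three ops as one match; passing `activations=("relu", "gelu")` would extend the matcher once GeLU is supported.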

* Added FuseGemmEpiloguePass

1. Added FuseGemmEpiloguePass to handle nn.Linear + Act{ReLU}
fusion (GeLU will be supported in the future).
2. Only support matmul_v2 from nn.Linear.

* Added pybind to BuildStrategy.fuse_gemm_epilogue_.

* Added UT for fuse_gemm_epilogue_pass.

* GeLU support and EpilogueSingleton

1. Added GeLU support to fused_gemm_epilogue op.
2. Added EpilogueSingleton to cache auxiliary pointer.
3. Added related UTs.

* Rename cublaslt_epilogue_op to gemm_epilogue_op.*.

* Added both train and infer pattern to LinearAct.

1. Added support of fwd graph with grad_ops linking to LinearAct.
2. Added related changes to fuse_gemm_epilogue_pass for above
modification.

* Changed CUDA requirement from 11.4 to 11.6 for fuse_gemm_epilogue_pass.

* Added identity activation support to gemm_epilogue_op.

* Added Linear Fusion (matmul_v2 + ele_add)

1. Added matmul_v2 + ele_add pattern to LinearActPattern.
2. Added matmul_v2 + ele_add support to fuse_gemm_epilogue_pass.

* Rename gemm_epilogue_op.* to fused_gemm_epilogue_op.*

* Add fused_gemm_epilogue_grad op.

1. Added fused_gemm_epilogue_grad to support backward epilogue fusion.

* Add UTs to fused_gemm_epilogue_grad_op.

* Change attribute name in fused_gemm_epilogue_grad_op for clarity.

* Allow DX and DBias to be dispensable to fused_gemm_epilogue_grad op.

* Added ElementwiseAdd+Matmul+Act graph pattern detection.

* Fuse backward of Linear( Act(x))

1. Added backward fusion pass to Linear( Act(x)).
2. Added backward fusion pass to Linear(x).

* Added UTs to backward fusion of Linear(Act(x)).

* Complete document of arguments to fused_gemm_epilogue_op.

* Made arguments of some functions pass by reference.

* Modify code with review comments.

1. Made arguments of some functions pass by reference.
2. Removed redundant code.
3. Followed Google code style to change code.

* Made 'const' code style be consistent

* Fixed random seed of python UTs.

* Set Compiling constraints to cuBlasLt

1. Require CUDA 11.6+
2. Remove fuse_gemm_epilogue related tests when CUDA < 11.6.

* Code Review from Paddle

1. Changed argument name is_first_gemm to without_x_gradient for
clarity.
2. Applied PADDLE_THROW in fused_gemm_epilogue_op.

* Remove EpilogueSingleton

1. Applied ReserveSpace to replace Epilogue for passing auxiliary
pointers between FWD and BWD.

* Fix a logical error and enhance UTs.

1. Added act op count checking in UTs.
2. Fix issue with fusing backward of ReLU(Linear(X)).
3. TODO: solve GELU fusion issues.

* Fix Linear and GeLU fusion issues.

1. Modified graph_detech_pattern to fit linear with either gelu or
relu.
2. Modified data range in UTs to allow negative values.

* Removed fused_gemm_epilogue_op.h.

* Rename namespace pten to phi.

* Rename name of arguments in fused_gemm_epilogue_op

1. bias -> Bias.
2. out -> Out.
3. reserve_space -> ReserveSpace.

* Change EpiloguePassActivationCache as local variable.

1. Removed singleton in EpiloguePassActivationCache.
2. Made EpiloguePassActivationCache an argument to each pass
function.

2a3d9eca

04 Mar, 2022 2 commits
- clean distribution_helper, index_impl, aligned_vector code in fluid (#40071) · b9672a1e
  Committed by Leo Chen on Mar 04, 2022
```
* clean distribution_helper, index_impl, aligned_vector code in fluid

* fix conflicts
```
  b9672a1e
- fix warning (#40133) · 14e98a0f
  Committed by chenjian on Mar 04, 2022
  
  14e98a0f
03 Mar, 2022 3 commits

[CustomRuntime] migrate CustomRuntime into phi (#39908) · b4665d23

Committed by ronnywang on Mar 03, 2022

b4665d23

Workqueue threadnames (#40035) · b8a16911

Committed by liutiexing on Mar 03, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* Set thread name for WorkQueue

* Add thread names

* fix ut
Co-authored-by: liutiexing <liutiexing@google.com>

b8a16911

bugfix in is_xpu_support_op (#40070) · 34d93bee

Committed by zhangxiaoci on Mar 03, 2022

34d93bee

02 Mar, 2022 3 commits
- [bf16] add bf16 kernel: softmax & log_softmax (#39999) · 4a4215ff
  Committed by zhangbo9674 on Mar 02, 2022
```
* add softmax log_softmax

* refine rocm

* refine unittest
```
  4a4215ff
- Upgrade new profiler (#39984) · 0c3f7fbc
  Committed by chenjian on Mar 02, 2022
```
* add new profiler components

* fix bug

* upgrade new profiler

* fix operator.cc

* fix operator.cc

* fix cmakelists.txt

* fix bug

* fix according to pr

* fix bug

* fix cmake

* fix bug

* fix a bug

* fix bug

* fix bug
```
  0c3f7fbc
- [KP] Activation op registration for XPU2. part 1/2 (#40002) · 90ab7403
  Committed by Lijunhui on Mar 02, 2022
  
  90ab7403
01 Mar, 2022 2 commits

fix compiling and running with ipu (#39920) · 69ab2700

Committed by Allen Guo on Mar 01, 2022

69ab2700

[bf16] add bf16 kernel: scale gather sum (#39683) · 6d26b332

Committed by zhangbo9674 on Mar 01, 2022

* add scale gather sum

* refine CUDA_ATOMIC_WRAPPER ADD for bf16

* add gather unittest

* solve conflict

* add scale unittest

* add sum unittest

* solve conflict

* refine gather unittest

* refine unittest

6d26b332

28 Feb, 2022 6 commits

Trace level env (#39926) · f335d9e1

Committed by liutiexing on Feb 28, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* Add host_trace_level env variable

* Revert "Optimize perf of softmax_with_cross_entropy (#39553)"

This reverts commit bbe5228c.
Co-authored-by: liutiexing <liutiexing@google.com>
Co-authored-by: ZzSean <18818272991@163.com>
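
The "Add host_trace_level env variable" item in this commit suggests that host-side tracing verbosity is selected through the environment. A minimal sketch of that idea follows. The variable name `HOST_TRACE_LEVEL`, the default of 1, and the "record when the event level is at or below the configured level" rule are all assumptions for illustration; the commit message does not spell out the actual name or semantics.

```python
import os

# Assumed default for this sketch; the real default is not given in the log.
DEFAULT_TRACE_LEVEL = 1


def read_trace_level(environ=os.environ, name="HOST_TRACE_LEVEL"):
    """Read an integer trace level from the environment.

    Falls back to DEFAULT_TRACE_LEVEL when the variable is unset or
    does not parse as an integer. `name` is a hypothetical variable name.
    """
    raw = environ.get(name)
    if raw is None:
        return DEFAULT_TRACE_LEVEL
    try:
        return int(raw)
    except ValueError:
        return DEFAULT_TRACE_LEVEL


def should_record(event_level, trace_level):
    """Record an event only if its level is within the configured level."""
    return event_level <= trace_level
```

Reading the level once at startup and passing it to `should_record` keeps the per-event check to a single integer comparison, which matters for a tracer that sits on hot paths.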

f335d9e1

[Pten->Phi PR4] Rename pten in funcs to phi (#39961) · eb42dd52

Committed by Chen Weihang on Feb 28, 2022

* rename pten_utils to phi_utils

* rename pten_utils target

* rename Pten to Phi

* replace pten with phi

* resolve conflict

eb42dd52

Update host tracer (#39975) · 406f1b96

Committed by liutiexing on Feb 28, 2022

* add align for WorkQueue

* add spinlock

* merge develop

* merge

* Add EventsWaiter

* Revert "Add EventsWaiter"

This reverts commit e206173aa9be7401b83a53581627bfaf557c8fb2.

* update HostTracer

* fix

* update

* update
Co-authored-by: liutiexing <liutiexing@google.com>

406f1b96

infrt add trt engine (#39885) · 27536a32

Committed by Wilber on Feb 28, 2022

27536a32

add new profiler components (#39964) · d4ae1775
Committed by chenjian on Feb 28, 2022
```
* add new profiler components

* fix bug
```
d4ae1775

[KP] Unify .cu and .xpu files with .kps files (#39917) · 0ff72e5d

Committed by Liu-xiandong on Feb 28, 2022

* [KP] Unify .cu and .xpu files with .kps files

* fix CI bug in GPU and modify the list

* fix conflict

* modify the date

0ff72e5d

25 Feb, 2022 3 commits
- move for_range into phi (#39931) · 94d8f392
  Committed by Chen Weihang on Feb 25, 2022
  
  94d8f392
- [phi] update code for mkl based fft (#39889) · 687902fc
  Committed by Feiyu Chan on Feb 25, 2022
  
  687902fc
- [Fix bug] fix fp16 atomicAdd compiler error on different cuda_arch. (#39886) · ef96ffb6
  Committed by Li Min on Feb 25, 2022
```
* Fix compile error on cuda_arch less than 700.
```
  ef96ffb6
24 Feb, 2022 1 commit
- [PTen->Phi PR3] Rename pten make target to phi (#39832) · f77019a0
  Committed by Chen Weihang on Feb 24, 2022
```
* rename pten to phi

* fix infrt compile failed

* resolve conflict
```
  f77019a0

PaddlePaddle / Paddle synced successfully more than 1 year ago
