Commits · 8259d9bfe54d20583df60c6fc53add6d345ff2f2 · PaddlePaddle / Paddle

May 26, 2021 · 5 commits

[NPU] refine NpuOpRunner (#32869) · 8259d9bf

Committed by Leo Chen on May 26, 2021

* refine ~npuOpRunner

* implement destructor and forbid copy

* use reference to avoid copy

* use const reference

* relax adam precision

* fix top_k

optimize OP's compilation time (#32617) · 78ecb668

Committed by wuhuanzhou on May 26, 2021

* optimize OP's compilation time, test=develop

* add more op and run ci test, test=develop

* CUDA Kernel register in cc file, test=develop

* fix macros, test=develop

* fix undefined symbol error, test=develop

* fix compilation error and undefined symbol, test=develop

* fix compilation error on Windows, test=develop

* fix compilation error on Windows, test=develop

Marker op for profiling (#33034) · 5c79dbb2
Committed by Yuang Liu on May 26, 2021

Add double grad op for sigmoid activation, test=develop (#32971) · c711e913

Committed by Zhanlue Yang on May 26, 2021

Sigmoid: Out = Sigmoid(X)
SigmoidGrad: DX = DOut*(1-Out)*Out

[This Patch]
SigmoidGradGrad: inputs Out, DOut, DDX; outputs DOutNew, DDOut

DDOut = (1-Out)*Out*DDX
DOutNew = (1-2*Out)*DOut*DDX
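Both formulas come from differentiating DX = DOut*(1-Out)*Out with respect to DOut (giving DDOut) and with respect to Out (giving DOutNew), each scaled by the incoming DDX. The sketch below is a scalar, plain-Python check of that algebra against central finite differences; it is not Paddle's kernel, and the helper names (`sigmoid_grad_grad`, `finite_diff_check`) are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(out, dout):
    # First-order backward: DX = DOut * (1 - Out) * Out
    return dout * (1.0 - out) * out


def sigmoid_grad_grad(out, dout, ddx):
    # Partial of DX w.r.t. DOut, times DDX
    ddout = (1.0 - out) * out * ddx
    # Partial of DX w.r.t. Out, times DDX
    dout_new = (1.0 - 2.0 * out) * dout * ddx
    return ddout, dout_new


def finite_diff_check(x=0.3, dout=0.7, ddx=1.5, eps=1e-6):
    out = sigmoid(x)
    ddout, dout_new = sigmoid_grad_grad(out, dout, ddx)
    # Central differences of sigmoid_grad, scaled by DDX
    num_ddout = (sigmoid_grad(out, dout + eps)
                 - sigmoid_grad(out, dout - eps)) / (2 * eps) * ddx
    num_dout_new = (sigmoid_grad(out + eps, dout)
                    - sigmoid_grad(out - eps, dout)) / (2 * eps) * ddx
    return ddout, num_ddout, dout_new, num_dout_new
```

Because DX is linear in DOut and quadratic in Out, central differences are exact up to floating-point rounding here, so the analytic and numeric values should agree closely.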

Added cast op oneDNN kernel for bf16/fp32 datatypes casting(FWD/BWD) (#33056) · a2a45d8d

Committed by jakpiase on May 26, 2021

* added op cast functionality for fp32/bf16

* added newline

* added entries in static mode white list and unity build

* fixed failing tests

* changes after review

* added formatting

* upgraded tests file as reviewer suggested

* changes after review

* minor change

May 25, 2021 · 5 commits
- modify complex template for elementwise ops (#33071) · dbc08d69
  Committed by chentianyu03 on May 25, 2021
```
* modify complex template for elementwise ops

* modify mul, div grad struct

* add complex template for CudaShuffleDownSync CudaShuffleXorSync funcs and fix the bug when delete cuda<9000

* fix shuffle func args bug

* fix shuffle func args bug

* fix shuffle func args bug
```
- add the op def proto, test=develop (#33098) · 3a7b9ed7
  Committed by 石晓伟 on May 25, 2021
```
* add the op def proto, test=develop

* add while.pbtxt
```
- modify Ops to complex template (#33041) · 5fa44c34
  Committed by chentianyu03 on May 25, 2021
```
* modify conj, real, imag OP to complex template

* replace with complex template to dot Op

* replace with complex template to Abs Op

* add support for complex64 and complex128
```
- Added scale op FP32/BF16 FWD/BWD kernels (#32975) · 86ea8dce
  Committed by jakpiase on May 25, 2021
- Add a new high performance framework for reduce ops (#32697) · 88b43b51
  Committed by niuliling123 on May 25, 2021
May 24, 2021 · 1 commit
- Support OutType template argument in elementwise_broadcast branch (#33060) · d6aea4ac
  Committed by limingshu on May 24, 2021
May 22, 2021 · 1 commit

Added oneDNN matmul grad BF16/FP32 kernel (#32968) · e2a3a6f7

Committed by jakpiase on May 22, 2021

* added support for most matmul cases

* added more functionality

* full functionality of matmul op, fp32 only

* added bf16 tests and functionality

* added formatting

* changes after review

* minor change

* added reviewers suggestions

May 21, 2021 · 3 commits
- replace complex64/128 with complex template in cast Op (#33019) · 79d918d9
  Committed by chentianyu03 on May 21, 2021
```
* replace complex in set tensor from and to numpy

* replace complex template in cast op
```
- optimize softmax with cross entropy hard label (#32290) · 7be6191b
  Committed by Feng Xing on May 21, 2021
```
* optimize softmax with cross entropy hard label

* label ignore_index cleaning
```
- [NPU] cast indices and label if their type is not consistent in accuracy npu op (#33016) · 70dc5f49
  Committed by pangyoki on May 21, 2021
```
* cast indices and label if their type is not consistent

* fix bug

* add unittest
```
May 20, 2021 · 4 commits

fix gather op and add logsumexp op on kunlun (#32931) · a96e8bc9

Committed by TTerror on May 20, 2021

* fix gather op and add logsumexp op on kunlun

* update xpu dependency

* update tests and fix elementwise_add

revert_matmulv2_npu (#33014) · be8e94aa
Committed by Baibaifan on May 20, 2021

Add complex template type (#32857) · 738bf20e

Committed by chentianyu03 on May 20, 2021

* add complex template file

* add numtraits for complex template

* add complex template type register

* modify specify template of complex

* modify specify template of complex

* modify specify template of complex

* modify specify template of complex

* make TensorCheckerVisitor support complex type

* fix operator= error

* add complex template

* add complex template type

* add complex template type to pyarray transform

* add complex template type to pyarray transform

* remove complex type for dlpack register

* set dlpack support complex type

* set dlpack support complex type

* set dlpack support complex type

* remove explicit for complex constructor

* add complex unit test file

Binary functor invoking of elementwise broadcast (#32928) · 14949521
Committed by limingshu on May 20, 2021

May 19, 2021 · 2 commits

[Rocm] fix test of random_crop_op & logsumexp (#32824) · aa4a56fc

Committed by zhulei on May 19, 2021

* [Rocm] fix test of random_crop_op

* [Rocm] fix test of random_crop_op

* [Rocm] fix test of random_crop_op & simple_rnn_op

* [Rocm] fix test of random_crop_op & simple_rnn_op & logsumexp

* [Rocm] fix test of random_crop_op & simple_rnn_op & logsumexp

* [Rocm] fix test of random_crop_op & simple_rnn_op & logsumexp

* [Rocm] fix test of random_crop_op & logsumexp

[oneDNN] Pool softmax and LRN access to cache optimized (#32922) · 56008aa1
Committed by Jacek Czaja on May 19, 2021

May 18, 2021 · 4 commits
- [NPU] fix accuracy npu op bug and change top_k's output to int64 (#32935) · c66586b4
  Committed by pangyoki on May 18, 2021
```
* Output indices of top_k npu op change to int64

* fix accuracy npu bug

* fix errors

* change cast method to FillNpuTensorWithConstant

* change cast method to FillNpuTensorWithConstant
```
- add uint8 for concat (#32850) · 53580bb4
  Committed by liuyuhui on May 18, 2021
- relu supports bfloat16 data type (#32542) · bcd40f21
  Committed by wuhuanzhou on May 18, 2021
- fix the paddle compare op for the broadcast when the element equal (#32941) · c72ed824
  Committed by wawltor on May 18, 2021
```
* fix the paddle compare op for the broadcast

* fix compare op in for in the cuda device
```
May 14, 2021 · 4 commits

Fix four error messages (#32899) · c4787d76

Committed by Kqnonrime on May 14, 2021

* fix two error message

* fix two error message

* fix error

* fix error

* fix error

* fix error

* fix some error message

* fix some error

* fix error

* fix some error

* fix some error

* fix some error

* fix one error

* fix some error

* fix seven error message

* fix error

* fix error

* fix error

* fix error

* fix some error message

* fix error

* fix some error

* fix some error

* fix four error message

* fix error

* fix error

[oneDNN] Refactoring of softmax grad onednn kernel to match common API (#32851) · 479689f6
Committed by Jacek Czaja on May 14, 2021

solove_matmulv2_npu_bugs (#32896) · 2d9d8f57
Committed by Baibaifan on May 14, 2021

Optimization the broadcast performance of elementwise_add (#32512) · b035c8b0
Committed by limingshu on May 14, 2021

May 13, 2021 · 4 commits
- [NPU] support global accumulator for adam (#32780) · dace3fd5
  Committed by Leo Chen on May 13, 2021
```
* add use_global_beta_pow

* add use_global_beta_pow

* update npu kernel

* update python api

* refine code

* add ut for use_global_beta_pow

* fix npu kernel

* add ut for api

* add ut for exception

* add ut for save/load
```
- solved some npu bugs (#32793) · c3ae0d40
  Committed by Baibaifan on May 13, 2021
- fix stack grad gpu (#32781) (#32877) · 3e47eee9
  Committed by Jiawei Wang on May 13, 2021
- change unique op VisitaDataType from small to tiny (#32872) · b60ab6b6
  Committed by chentianyu03 on May 13, 2021
May 12, 2021 · 2 commits
- [NPU] Support async copy for TensorFromVector with event (#32563) · 85512d60
  Committed by liym27 on May 12, 2021
- [NPU] Support npu pinned allocator and manage Tensor on NPUPinnedPlace (#32840) · 6b3bb796
  Committed by liym27 on May 12, 2021
May 10, 2021 · 3 commits
- Support different data type between input and output (#32823) · 3419de53
  Committed by Zhang Zheng on May 10, 2021
- [pslib] pslib with cmake (#32800) · fbbc3394
  Committed by Thunderbrook on May 10, 2021
```
* pslib with cmake

* heter util

* vlog

* heter server test

* add dtor

* cmake
```
- fix npu compile error (#32820) · 5aa8faa2
  Committed by Leo Chen on May 10, 2021
May 8, 2021 · 2 commits

add c_identity op npu (#32787) · c8affff0
Committed by Baibaifan on May 8, 2021
```
* add c_identity_op_npu
```

[NPU] refine update_loss_scaling npu kernel (#32580) · 4628b6f8

Committed by pangyoki on May 8, 2021

* refine update_loss_scaling npu kernel

* add mutable_data

* change Zerolike op to MemcpyAsync

* delete useless code

* add found_inf_vec

* add memcpy if not finite

* fix unittest
