Commits · 8c0bacd45eb385941f6d8663a500aa6c46b0a038 · BaiXuePrincess / Paddle

25 Oct, 2021 1 commit

Add fused_attention_op: add impl wrappers. (#35903) (#36673) · 8c0bacd4

Committed by Li Min on Oct 25, 2021

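Purpose: this PR aims to improve the compute performance of the attention module.
To reduce the framework's op-scheduling overhead, this PR implements the attention module by hand in C++ and exposes it as a single large attention op.
To reduce memory-access overhead, this PR applies two optimizations:
(1) when computing q, k, v, the shared input X is used to reduce the gemm, transpose and bias add at that step from three calls to one;
(2) kernel fusion is used to pass data between different CUDA kernels through registers.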
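
Optimization (1) can be illustrated with a small pure-Python sketch. It is not the PR's C++/CUDA code: the function names (`matmul`, `add_bias`, `qkv_separate`, `qkv_fused`) and the list-of-lists matrix layout are assumptions made for illustration, and the transpose step is left out. The idea shown is that concatenating the q, k, v weight matrices column-wise lets one GEMM plus one bias add produce all three projections of the shared input X.

```python
def matmul(a, b):
    """Multiply two row-major matrices stored as lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_bias(m, bias):
    """Add a bias vector to every row of a matrix."""
    return [[v + b for v, b in zip(row, bias)] for row in m]


def qkv_separate(x, weights, biases):
    """Baseline: one GEMM + bias add per projection (three calls in total)."""
    return [add_bias(matmul(x, w), b) for w, b in zip(weights, biases)]


def qkv_fused(x, weights, biases):
    """One GEMM + bias add over column-concatenated weights, then split."""
    # Row i of the fused weight is [Wq[i] | Wk[i] | Wv[i]].
    w_cat = [sum((w[i] for w in weights), []) for i in range(len(weights[0]))]
    b_cat = sum(biases, [])
    out = add_bias(matmul(x, w_cat), b_cat)
    d = len(biases[0])  # width of each projection
    return [[row[k * d:(k + 1) * d] for row in out] for k in range(len(weights))]
```

Both functions return `[q, k, v]`; the fused variant reads X once instead of three times, which is the memory-access saving the description refers to.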

8c0bacd4

24 Oct, 2021 1 commit

Add viterbi decode (#35778) (#36615) · 1906c746

Committed by Jack Zhou on Oct 24, 2021

* add viterbi decode cpu kernel

* add viterbi decoder api in paddle.text

* add a data buffer once to avoid create many small pieces of data buffer frequently

* fix viterbi max_seq_length bug

* fix seq_len=1 bug

* fix device context

* move split out of for loop

* remove INVERSE_SUB

* remove 2 GET_CAST_MASK

* remove 1 loop

* remove Functor

* add to_static deploy code

* use MAX_FUNC instead of ELE_MAX

* add MaxFunctor

* impl max_func

* remove MaxFunctor

* remove cast op

* use REGISTER_OP_WITHOUT_GRADIENT

* add viterbi cuda kernel

* add FIX_BLOCKDIM_CASE macro

* add MKL add, mul; add get data mask

* add arange mkl impl

* add CPU Argmax

* add cpu gather

* use EXECUTE_MKL_ELEMENT_BINARY_OP instead of some ADD, MUL

* use SameDimsBinaryOP instead of EXECUTE_MKL_ELEMENT_BINARY_OP

* use SAME_DIMS_ELEMENT_BINARY_OP

* add SimpleBroadcastBinaryOP

* use int instead of int64_t to accelerate

* optimize SimpleBroadcastBinaryOP

* optimize SimpleBroadcastBinaryOP

* optimize performance in both single thread and multithread situation

* remove useless line

* remove useless code

* add CREATE_TENSOR_BUFFER macro

* add INIT_REQUIRED_TENSOR macro

* add comment

* fix windows ci

* add viterbi unittest

* remove cuda add functor

* remove cuda equal

* remove a template function

* fix windows ci

* fix windows dtype

* remove some template instance

* remove useless header file

* remove some blockdim

* remove transpose impl

* accelerate cpu performance on single thread situation

* viterbi_decode->crf_decode

* rename crf params name

* add viterbi api test

* remove useless import

* add enable_static

* use viterbi decoder

* fix viterbi len=1

* fix  viterbi unittest

* remove useless comments

* reconstruct viterbi decode

* remove ADD,SUB,MUL structure

* fix coverage

* remove CREATE_TENSOR

* add name args

* crf.py->ops.py; with_start_stop_tag->include_start_end_tag

* update crf_decode en docs

* fix viterbi decode en docs

* fix some review comments

* add FIXED_BLOCK_DIM_CASE in cuda

* push_back->emplace_back

* crf_decode->viterbi_decode; include_start_end_tag->include_bos_eos_tag

* paddle.text.ops.viterbi_decode->paddle.text.viterbi_decode

* fix viterbi_decode en docs
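
For readers unfamiliar with the algorithm this commit adds, here is a minimal pure-Python sketch of Viterbi decoding for a single sequence. It is not the CPU or CUDA kernel from the commit and does not model the `include_bos_eos_tag` option; the function signature and the list-of-lists inputs are assumptions made for illustration.

```python
def viterbi_decode(emissions, transitions):
    """Return (best_score, best_path) for one sequence.

    emissions[t][j]   : score of tag j at time step t
    transitions[i][j] : score of moving from tag i to tag j
    """
    num_tags = len(transitions)
    scores = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []            # one list of best previous tags per step
    for emit in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(num_tags):
            # Best previous tag for reaching tag j at this step.
            best_i = max(range(num_tags),
                         key=lambda i: scores[i] + transitions[i][j])
            pointers.append(best_i)
            new_scores.append(scores[best_i] + transitions[best_i][j] + emit[j])
        scores = new_scores
        backpointers.append(pointers)
    # Backtrack from the best final tag.
    last = max(range(num_tags), key=lambda j: scores[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return scores[last], path
```

A sequence of length 1 skips the loop entirely and returns the argmax of the first emission row, which is the edge case the "fix seq_len=1 bug" items above are concerned with.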

1906c746

22 Oct, 2021 1 commit

Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1 (#36373) (#36616) · 6840cf55

Committed by niuliling123 on Oct 22, 2021

* Fix a bug in ReadData, ReadDataBc and ReadDataReduce when NX != 1
* Update the implement of reduceAnyKernel according to kernel primitive api

6840cf55

30 Sep, 2021 1 commit

support fp16 (#35888) (#36191) · 87cc8d48

Committed by Guoxia Wang on Sep 30, 2021

87cc8d48

22 Sep, 2021 1 commit

[Cherry-pick 2.2] Correct the return type of elementwise kernel to avoid many... · 0f344838

Committed by Yiqun Liu on Sep 22, 2021

 [Cherry-pick 2.2] Correct the return type of elementwise kernel to avoid many compiling warnings. (#35839) (#35868)

Cherry-pick #35839

0f344838

18 Sep, 2021 1 commit

[oneDNN] Disable caching of Reorder operation (#35664) · e4c2a854

Committed by Jacek Czaja on Sep 18, 2021

* - REorder disabling caching

* - compilation fix

* - another compilation fix

* - another compilation fix

* - compilation fix

* - Fix

* - yet another compilation fix

* - suppresingly another compilation fix

* - lint

* - fix after review

* - fix

e4c2a854

15 Sep, 2021 1 commit

Unify the functor definition of elementwise add, sub, mul, div, floordiv, max, min. (#35684) · 2367cca6

Committed by Yiqun Liu on Sep 15, 2021

2367cca6

14 Sep, 2021 2 commits

fix elementwise_div npu op (#35700) · 0bbff93c

Committed by pangyoki on Sep 14, 2021

0bbff93c

Implement FunctionTraits to support two kinds of elementwise functor and... · 12bf0502

Committed by Yiqun Liu on Sep 14, 2021

```
Implement FunctionTraits to support two kinds of elementwise functor and remove some old codes for broadcast. (#35688)
```

12bf0502

13 Sep, 2021 2 commits

Revert "Implement FunctionTraits to support two kinds of elementwise functor... · 40d4a295

Committed by Yiqun Liu on Sep 13, 2021

```
Revert "Implement FunctionTraits to support two kinds of elementwise functor and remove some old codes for broadcast. (#35487)" (#35686)
```

40d4a295

Implement FunctionTraits to support two kinds of elementwise functor and... · d4f84d46

Committed by Yiqun Liu on Sep 13, 2021

```
Implement FunctionTraits to support two kinds of elementwise functor and remove some old codes for broadcast. (#35487)
```

d4f84d46

08 Sep, 2021 1 commit

multiply supports bool · db5fd2a1

Committed by will-jl944 on Sep 08, 2021

```
multiply supports bool
```

db5fd2a1

07 Sep, 2021 1 commit

Modify the elementwise op according to the kernel primitive API (#34456) · eae4bf5b

Committed by niuliling123 on Sep 07, 2021

eae4bf5b

06 Sep, 2021 1 commit

Add the extra flag for the some ops (#35442) · 49797d85

Committed by wawltor on Sep 06, 2021

```
* Add the extra flag for the some ops

* fix the compile problem in matmul extra
```

49797d85

03 Sep, 2021 2 commits

Unify the implementation of AlignedVector and simplify the codes of dropout and cast. (#35373) · c171eca2

Committed by Yiqun Liu on Sep 03, 2021

c171eca2

[NPU] Add elementwise_pow_grad npu op (#35278) · e913796c

Committed by WJJ1995 on Sep 03, 2021

```
* add elementwise_pow_grad_npu

* fixed bug for CI

* deal with comments

* fixed bug for CI

* deal with comments
```

e913796c

02 Sep, 2021 1 commit

add axis check for elementwise op while the dimension of x is equal to the... · 25871e0e

Committed by wangxinxin08 on Sep 02, 2021

```
add axis check for elementwise op while the dimension of x is equal to the dimension of tensor (#35340)
```

25871e0e

31 Aug, 2021 1 commit

NPU add elementwise_mod (#35245) · 561841d2

Committed by Aganlengzi on Aug 31, 2021

561841d2

27 Aug, 2021 1 commit

add elementwise max grad op for npu (#34862) · 5310ceab

Committed by baoachun on Aug 27, 2021

* add elementwise max grad op for npu

* add elementwise max grad op for npu

* add elementwise max grad op for npu

* add elementwise max grad op for npu

* add elementwise max grad op for npu

5310ceab

26 Aug, 2021 1 commit

[oneDNN] disable caching oneDNN primitives in matmul v2, Reduce grad and... · 31f0221f

Committed by Jacek Czaja on Aug 26, 2021

[oneDNN] disable caching oneDNN primitives in  matmul v2, Reduce grad and elementwise_add grad, expand_v2 (#35132)

* - grad caching disabled of matmul_v1

- compilation fix

- compilation fix

* - reduction removed

* - Matmul v2 disabled caching

* Draft of further changes

* - workaround for reducegrad

* - fixes to UT

* - fix to compilation

* - another fix

* - fix

31f0221f

25 Aug, 2021 2 commits

[NPU] Fix the performance problem when 'axis' is not specified (#35116) · 91ba86b1

Committed by ronnywang on Aug 25, 2021

91ba86b1

update elementwise api in kunlun (#35021) · ff96a7d5

Committed by taixiurong on Aug 25, 2021

ff96a7d5

22 Aug, 2021 1 commit

implementation of broadcast add backward by reduce (#34143) · 56c5e210

Committed by Zhang Zheng on Aug 22, 2021

56c5e210

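
The title of the commit above describes computing the backward pass of a broadcast add with a reduction. As a rough illustration only (this is not the commit's CUDA code, and the function names are hypothetical): for `out = x + y`, where `x` has shape `[m][n]` and `y` has shape `[n]` and is broadcast over rows, the gradient of `x` is the upstream gradient itself, and the gradient of `y` is the upstream gradient summed over the broadcast axis.

```python
def broadcast_add(x, y):
    """Forward: add vector y to every row of matrix x."""
    return [[xv + yv for xv, yv in zip(row, y)] for row in x]


def broadcast_add_grad(dout):
    """Backward: dx is a copy of dout; dy sums dout over the broadcast rows."""
    dx = [list(row) for row in dout]
    dy = [sum(col) for col in zip(*dout)]  # column-wise reduce_sum
    return dx, dy
```

Each element of `y` contributes to one output per row, so its gradient accumulates one term per row; that accumulation is the reduction.
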
16 Aug, 2021 1 commit

[oneDNN] Fix to 34554 (same as previous PR but should build with GPU) (#34859) · 9cb65653

Committed by Jacek Czaja on Aug 16, 2021

* - Added softmax without caching

* - Binary is no longer manually cached

* - Activation onednn caching removed

* - Removed manual caching of activation

* - modified UT

* - fix

* - fix

* - fixes to building

* - fix

* - fix

* - fix to UT

* - Faulty UT workaround

* - approval workaround

* - Fixes after review

* - compilation fixes

* - more lint fixes

* - more fixes after review

* - fixes after another round of review

* - hopefully compilation fix

- compilation fix

9cb65653

12 Aug, 2021 1 commit

Revert "[oneDNN] Fix to issue #34554 (#34623)" (#34838) · dc62a227

Committed by Chen Weihang on Aug 12, 2021

```
This reverts commit 0a5c99e8.
```

dc62a227

11 Aug, 2021 2 commits

[oneDNN] Fix to issue #34554 (#34623) · 0a5c99e8

Committed by Jacek Czaja on Aug 11, 2021

* - Added softmax without caching

* - Binary is no longer manually cached

* - Activation onednn caching removed

* - Removed manual caching of activation

* - modified UT

* - fix

* - fix

* - fixes to building

* - fix

* - fix

* - fix to UT

* - Faulty UT workaround

* - approval workaround

* - Fixes after review

* - compilation fixes

* - more lint fixes

* - more fixes after review

* - fixes after another round of review

0a5c99e8

[NPU] add elementwise_min_grad_op_npu,test=develop (#34731) · 45af4f2a

Committed by andyjpaddle on Aug 11, 2021

45af4f2a

09 Aug, 2021 1 commit

[NPU] add broadcast supporting for elementwise_add_op_npu (#34057) · b7355d8e

Committed by ronnywang on Aug 08, 2021

* add broadcast supporting for elementwise_add

* add broadcast supporting for elementwise_add

* add more tests

* remove the redundant code

* update

* fix place error in unittest

* remove skip.If

b7355d8e

05 Aug, 2021 1 commit

Support Ternary ops in elmentwise and broadcast (#33976) · 1d7b75dd

Committed by limingshu on Aug 05, 2021

1d7b75dd

07 Jul, 2021 1 commit

[xpu] add dropout & amp ops in xpu place (#33891) · 84e813e3

Committed by taixiurong on Jul 07, 2021

84e813e3

05 Jul, 2021 2 commits

Add fused elemwise gelu and optimize performance (#33480) · eae31856

Committed by WangXi on Jul 05, 2021

eae31856

Enhance error message when x or y is empty in elementwise_op (#33928) · 70100e4f

Committed by Leo Chen on Jul 05, 2021

```
* enhance error message when x or y is empty in elementwise_op

* format code

* format code
```

70100e4f

24 Jun, 2021 1 commit

[oneDNN] Fix to #33282 , added support of X input broadcasting to oneDNN elementwise ops (#33549) · 049dd853

Committed by Jacek Czaja on Jun 24, 2021

```
* - fix to #33282

* - Increased threshold for elementwise_mul_bf16 grad

* -disabled faulty UT

* - fix to approval
```

049dd853

23 Jun, 2021 1 commit

Support Mod in elementwise system (#33052) · 10171806

Committed by limingshu on Jun 23, 2021

10171806

12 Jun, 2021 1 commit

Support Div and FloorDiv functor in elementwise system (#33053) · fcd93b32

Committed by limingshu on Jun 12, 2021

fcd93b32

04 Jun, 2021 1 commit

Reimplement logical functors with the new optimized elementwise function (#33089) · 941308c2

Committed by limingshu on Jun 04, 2021

941308c2

02 Jun, 2021 2 commits

Support Add Sub Mul Max Min Pow binary functors in elementwise system (#33050) · b432d024

Committed by limingshu on Jun 02, 2021

b432d024

Reimplement the comparision binary ops using the new optimized CUDA function (#33064) · 0f154961

Committed by limingshu on Jun 02, 2021

0f154961

26 May, 2021 1 commit

[NPU] refine NpuOpRunner (#32869) · 8259d9bf

Committed by Leo Chen on May 26, 2021

* refine ~npuOpRunner

* implement destructor and forbid copy

* use reference to avoid copy

* use const reference

* relax adam precision

* fix top_k

8259d9bf

25 May, 2021 1 commit

modify complex template for elementwise ops (#33071) · dbc08d69

Committed by chentianyu03 on May 25, 2021

* modify complex template for elementwise ops

* modify mul, div grad struct

* add complex template for CudaShuffleDownSync CudaShuffleXorSync funcs and fix the bug when delete cuda<9000

* fix shuffle func args bug

* fix shuffle func args bug

* fix shuffle func args bug

dbc08d69

BaiXuePrincess / Paddle: in sync with the fork source project