Commits · 718183f1c37ed7c1b4bcba924e08fc0ca0932ad6 · PaddlePaddle / Paddle

05 Jan, 2022 · 1 commit

implementation of broadcast div backward by reduce (#38044) · 55cd9cb8

Committed by crystal on Jan 05, 2022

* add elementwise div

* move mul and div grad functor

* Combine multiple CUDA kernels

* Update the reduce interface call

* add multi-output

* add multi-output div

* add branch judge

* Package branch

* Combine the x and y functions into one

55cd9cb8
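The idea named in this commit title (computing the backward pass of a broadcast division through a reduction) can be illustrated with a minimal sketch. It is not Paddle's CUDA implementation: the function name `div_backward_broadcast` and the fixed shapes (`x` is 2-D, `y` is 1-D and broadcast along axis 0) are assumptions chosen for illustration only.

```python
def div_backward_broadcast(x, y, dout):
    """Gradients of out = x / y, where x is (rows, cols) and y is (cols,).

    Hypothetical helper for illustration; y is broadcast along axis 0,
    so its gradient has to be summed (reduced) over that axis.
    """
    rows, cols = len(x), len(y)
    # dx keeps the shape of x: d(out)/dx = 1 / y
    dx = [[dout[i][j] / y[j] for j in range(cols)] for i in range(rows)]
    # Elementwise partial for y at the broadcast shape: -dout * x / y^2
    dy_full = [
        [-dout[i][j] * x[i][j] / (y[j] * y[j]) for j in range(cols)]
        for i in range(rows)
    ]
    # Reduce over the broadcast axis so dy matches the shape of y
    dy = [sum(dy_full[i][j] for i in range(rows)) for j in range(cols)]
    return dx, dy


x = [[2.0, 4.0], [6.0, 8.0]]
y = [2.0, 4.0]
dout = [[1.0, 1.0], [1.0, 1.0]]
dx, dy = div_backward_broadcast(x, y, dout)
```

Here `dx` keeps the shape of `x`, while `dy` is collapsed back to the shape of `y` by the final sum; that sum is the "reduce" step.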

04 Jan, 2022 · 1 commit

[Pten]Move CPU_implementation of elementwise kernel in new directory (#38651) · 7c020c71

Committed by YuanRisheng on Jan 04, 2022

* change 'math' to 'math_kernel'

* fix compile bugs

* merge develop

* fix compile bugs

* move cpu_impl of elementwise kernel to new directory

7c020c71

18 Dec, 2021 · 1 commit

add complex op (#37918) · 31e874b1

Committed by Feiyu Chan on Dec 18, 2021

* add complex op and `paddle.complex`.

31e874b1

15 Dec, 2021 · 1 commit

Change a comment to avoid disturbing the op benchmark CI. (#38148) · 4d8242df

Committed by Yiqun Liu on Dec 15, 2021

test=document_fix

4d8242df

09 Dec, 2021 · 1 commit

adjust main dir (#37916) · 1911b6f0

Committed by Chen Weihang on Dec 08, 2021

1911b6f0

03 Dec, 2021 · 1 commit

refine structure for cuda and rocm (#37202) · a6d2fddb

Committed by ronnywang on Dec 03, 2021

* refine structure for cuda and rocm

* update

* update

* update

* update

a6d2fddb

12 Nov, 2021 · 1 commit

[Pten]Refactor the Elementwise_add Kernel (#37043) · c1310343

Committed by YuanRisheng on Nov 12, 2021

* elementwise_add kernel refactor

* fix compile bugs in elementwise_add refactor

* fix compile bugs when run in npu/xpu

* fix bugs when run unit test

* fix bugs when run ci-windows

* modify code as recommended

* code format adjust

* fix bugs when run ci

* fix compile bug when run in ci-windows

c1310343

21 Oct, 2021 · 1 commit

Add viterbi decode (#35778) · 6072aecb

Committed by Jack Zhou on Oct 21, 2021

* add viterbi decode cpu kernel

* add viterbi decoder api in paddle.text

* allocate a data buffer once to avoid frequently creating many small data buffers

* fix viterbi max_seq_length bug

* fix seq_len=1 bug

* fix device context

* move split out of for loop

* remove INVERSE_SUB

* remove 2 GET_CAST_MASK

* remove 1 loop

* remove Functor

* add to_static deploy code

* use MAX_FUNC instead of ELE_MAX

* add MaxFunctor

* impl max_func

* remove MaxFunctor

* remove cast op

* use REGISTER_OP_WITHOUT_GRADIENT

* add viterbi cuda kernel

* add FIX_BLOCKDIM_CASE macro

* add MKL add, mul; add get data mask

* add arange mkl impl

* add CPU Argmax

* add cpu gather

* use EXECUTE_MKL_ELEMENT_BINARY_OP instead of some ADD, MUL

* use SameDimsBinaryOP instead of EXECUTE_MKL_ELEMENT_BINARY_OP

* use SAME_DIMS_ELEMENT_BINARY_OP

* add SimpleBroadcastBinaryOP

* use int instead of int64_t to accelerate

* optimize SimpleBroadcastBinaryOP

* optimize SimpleBroadcastBinaryOP

* optimize performance in both single thread and multithread situation

* remove useless line

* remove useless code

* add CREATE_TENSOR_BUFFER macro

* add INIT_REQUIRED_TENSOR macro

* add comment

* fix windows ci

* add viterbi unittest

* remove cuda add functor

* remove cuda equal

* remove a template function

* fix windows ci

* fix windows dtype

* remove some template instance

* remove useless header file

* remove some blockdim

* remove transpose impl

* accelerate cpu performance on single thread situation

* viterbi_decode->crf_decode

* rename crf params name

* add viterbi api test

* remove useless import

* add enable_static

* use viterbi decoder

* fix viterbi len=1

* fix viterbi unittest

* remove useless comments

* reconstruct viterbi decode

* remove ADD,SUB,MUL structure

* fix coverage

* remove CREATE_TENSOR

* add name args

* crf.py->ops.py; with_start_stop_tag->include_start_end_tag

* update crf_decode en docs

* fix viterbi decode en docs

* fix some review comments

* add FIXED_BLOCK_DIM_CASE in cuda

* push_back->emplace_back

* crf_decode->viterbi_decode; include_start_end_tag->include_bos_eos_tag

* paddle.text.ops.viterbi_decode->paddle.text.viterbi_decode

* fix viterbi_decode en docs

6072aecb
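For readers unfamiliar with the algorithm this commit adds, the following is a minimal pure-Python sketch of Viterbi decoding for a single sequence. It is not the Paddle kernel and does not reproduce the `paddle.text.viterbi_decode` signature: the function name, argument layout, and the omission of BOS/EOS tag handling (`include_bos_eos_tag`) are simplifying assumptions.

```python
def viterbi_decode(emissions, transitions):
    """Return (best_score, best_path) for one sequence.

    Illustrative sketch only, not Paddle's API.
    emissions[t][k]: score of tag k at step t.
    transitions[i][j]: score of moving from tag i to tag j.
    """
    num_tags = len(transitions)
    scores = list(emissions[0])  # best score ending in each tag at step 0
    backpointers = []            # backpointers[t-1][j]: best previous tag for tag j at step t
    for t in range(1, len(emissions)):
        step_scores, step_back = [], []
        for j in range(num_tags):
            # Pick the previous tag that maximizes the score of reaching tag j
            best_i = max(range(num_tags), key=lambda i: scores[i] + transitions[i][j])
            step_scores.append(scores[best_i] + transitions[best_i][j] + emissions[t][j])
            step_back.append(best_i)
        scores = step_scores
        backpointers.append(step_back)
    # Trace back from the best final tag
    last = max(range(num_tags), key=lambda k: scores[k])
    path = [last]
    for step_back in reversed(backpointers):
        path.append(step_back[path[-1]])
    path.reverse()
    return scores[last], path


score, path = viterbi_decode(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],
    [[0.0, 0.0], [0.0, 0.0]],
)
```

A sequence of length 1 needs no backpointers in this sketch: the loop body never runs and the path is simply the best tag at step 0.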

15 Sep, 2021 · 1 commit

Unify the functor definition of elementwise add, sub, mul, div, floordiv, max, min. (#35684) · 2367cca6

Committed by Yiqun Liu on Sep 15, 2021

2367cca6

14 Sep, 2021 · 1 commit

Implement FunctionTraits to support two kinds of elementwise functor and... · 12bf0502

Committed by Yiqun Liu on Sep 14, 2021

Implement FunctionTraits to support two kinds of elementwise functor and remove some old codes for broadcast. (#35688)

12bf0502

13 Sep, 2021 · 2 commits

Revert "Implement FunctionTraits to support two kinds of elementwise functor... · 40d4a295

Committed by Yiqun Liu on Sep 13, 2021

Revert "Implement FunctionTraits to support two kinds of elementwise functor and remove some old codes for broadcast. (#35487)" (#35686)

40d4a295

Implement FunctionTraits to support two kinds of elementwise functor and... · d4f84d46

Committed by Yiqun Liu on Sep 13, 2021

Implement FunctionTraits to support two kinds of elementwise functor and remove some old codes for broadcast. (#35487)

d4f84d46

22 Aug, 2021 · 1 commit

implementation of broadcast add backward by reduce (#34143) · 56c5e210

Committed by Zhang Zheng on Aug 22, 2021

56c5e210

05 Jul, 2021 · 2 commits

Add fused elemwise gelu and optimize performance (#33480) · eae31856

Committed by WangXi on Jul 05, 2021

eae31856

Enhance error message when x or y is empty in elementwise_op (#33928) · 70100e4f

Committed by Leo Chen on Jul 05, 2021

* enhance error message when x or y is empty in elementwise_op

* format code

* format code

70100e4f

04 Jun, 2021 · 1 commit

Reimplement logical functors with the new optimized elementwise function (#33089) · 941308c2

Committed by limingshu on Jun 04, 2021

941308c2

02 Jun, 2021 · 2 commits

Support Add Sub Mul Max Min Pow binary functors in elementwise system (#33050) · b432d024

Committed by limingshu on Jun 02, 2021

b432d024

Reimplement the comparison binary ops using the new optimized CUDA function (#33064) · 0f154961

Committed by limingshu on Jun 02, 2021

0f154961

12 Apr, 2021 · 1 commit

[ROCM] fix some unittests (#32129) · bd2a4e23

Committed by ronnywang on Apr 12, 2021

* [ROCM] fix test_gru_rnn_op

* [ROCM] fix test_expand_op

* [ROCM] fix test_cross_entropy_loss

* [ROCM] fix test_conv_nn_grad

* [ROCM] fix test_bilinear_tensor_product_op

* [ROCM] fix elementwise_op_function

* [ROCM] fix test_lstm_cudnn_op

* [ROCM] fix test_gpu_package_without_gpu_device

* [ROCM] fix test_gru_unit_op

* [ROCM] fix test_imperative_optimizer

* [ROCM] fix rnn

* [ROCM] fix group_norm_op

* [ROCM] fix test_pool3d_api

* [ROCM] fix test_pool3d_op

bd2a4e23

10 Mar, 2021 · 1 commit

Optimization of elementwise CUDA kernel (#30801) · 45c7d905

Committed by JamesLim on Mar 10, 2021

45c7d905

03 Mar, 2021 · 1 commit

[ROCM] update fluid elementwise op for rocm (part10), test=develop (#31361) · 7cdf6ea7

Committed by Qi Li on Mar 03, 2021

* [ROCM] update fluid elementwise op for rocm (part10), test=develop

* update, test=develop

* address review comments, test=develop

7cdf6ea7

03 Feb, 2021 · 1 commit

fix the broadcast for the large second input (#30818) · b7560a59

Committed by wawltor on Feb 03, 2021

fix the broadcast for the large second input

b7560a59

10 Jan, 2021 · 1 commit

reduce the occupied size of memory for the fused pattern of elementwise_add... · af80859d

Committed by wangchaochaohu on Jan 10, 2021

reduce the occupied size of memory for the fused pattern of elementwise_add Op and activation Op (relu Op for example) (#29885)

af80859d

05 Aug, 2020 · 1 commit

add eltwise clip cuda impl. (#25689) · 5970871a

Committed by Zhaolong Xing on Aug 05, 2020

test=develop

5970871a

16 Jun, 2020 · 1 commit

fix dtype error of compare op, test=develop (#25059) · 028de857

Committed by Leo Chen on Jun 16, 2020

028de857

12 May, 2020 · 1 commit

Fix the elementwise ops in broadcast in the process of backward (#24319) · 2de5075a

Committed by wawltor on May 12, 2020

* Remove the error in the elementwise op, use the backup mode to calculate

2de5075a

11 May, 2020 · 1 commit

Add macro BOOST_GET to enrich the error information of boost::get (#24175) · aa0f254f

Committed by Chen Weihang on May 11, 2020

* add new macro BOOST_GET_SAFELY & unittests, test=develop

* add different macro type, test=develop

* fix get macro type in executor, test=develop

* four macro part change backup

* using one macro for all case, test=develop

* revert attribute change, test=develop

* change to three func to solve gcc4.8 bug, test=develop

* polish some details, test=develop

aa0f254f

13 Apr, 2020 · 1 commit

Enhance elementwise op error messages; the Python error messages were added earlier · 289edf39

Committed by LutaoChu on Apr 13, 2020

The kernel error messages are enhanced for the following ops:

* paddle.fluid.layers.elementwise_add
* paddle.fluid.layers.elementwise_div
* paddle.fluid.layers.elementwise_floordiv
* paddle.fluid.layers.elementwise_max
* paddle.fluid.layers.elementwise_min
* paddle.fluid.layers.elementwise_mod
* paddle.fluid.layers.elementwise_mul
* paddle.fluid.layers.elementwise_pow
* paddle.fluid.layers.elementwise_sub

289edf39

03 Apr, 2020 · 2 commits

Fix elementwise compile error, test=develop (#23381) · 01d7ccd4

Committed by zhaoyuchen2018 on Apr 03, 2020

The elementwise function was used before its definition, which failed in CUDA 8; move the definition ahead.

01d7ccd4

improve elementwise performance. (#23405) · 4fe9ca69

Committed by zhaoyuchen2018 on Apr 03, 2020

* improve elementwise performance.

* Add contiguous check, test=develop

4fe9ca69

29 Mar, 2020 · 1 commit

Improve elementwise performance. (#23001) · 58615a62

Committed by zhaoyuchen2018 on Mar 29, 2020

* Improve elementwise performance.

Elementwise performance is poor when execution falls into CommonGradBroadcastCUDA, so add some new kernels for different data patterns.

* Add some cuda kernel to speedup common broadcast cases. test=develop

* Add more test cases and fix cuda kernel bug. test=develop

* Remove tests as CPU precision fails. test=develop

* Refine SplitDims, test=develop

* Change file mode, test=develop

58615a62

25 Mar, 2020 · 1 commit

add Tensor::IsSharedBufferWith method, test=develop (#23175) · 7ca77a90

Committed by Zeng Jinle on Mar 25, 2020

7ca77a90

17 Jan, 2020 · 1 commit

Fix infer_shape in compiling for elementwise_op (#22291) · 2d20869c

Committed by qingqing01 on Jan 17, 2020

2d20869c

19 Nov, 2019 · 1 commit

extend elementwise broadcast function (#20957) · 0e7baabe

Committed by danleifeng on Nov 19, 2019

0e7baabe

10 Oct, 2019 · 1 commit

fix error message for elementwise_add/mul (#20283) · 3a0f93b3

Committed by danleifeng on Oct 10, 2019

3a0f93b3

04 Sep, 2019 · 1 commit

elementwise broadcast function enhancement (#19536) · 8672e153

Committed by danleifeng on Sep 04, 2019

elementwise broadcast function enhancement

8672e153

20 Aug, 2019 · 1 commit

Fix elementwise performance poor issue (#19278) · 5296294d

Committed by zhaoyuchen2018 on Aug 20, 2019

For small cases, a 1D block is better than a 2D block.

Refer to this issue: #19275

5296294d

14 Jun, 2019 · 1 commit

Optimize fused_elewise_activation_grad op. (#18041) · 660c1a65

Committed by Yiqun Liu on Jun 14, 2019

test=develop

660c1a65

20 May, 2019 · 1 commit

Double backward elementwise div (#17416) · 10b23a72

Committed by lvmengsi on May 20, 2019

* double backward, elementwise_div

* fix dx empty. test=develop

* bug fix (#17392)

fix secure bug

* Enable stack operator for Ngraph, test=develop (#17406)

* fix sqrt_grad_grad unittest. test=develop (#17410)

* fix sqrt_grad_grad unittest. test=develop

* disable sqrt_grad_grad unittest. test=develop

* test=develop, fix unittest

* test=develop, fix unittest

* test=develop, fix unittest

* test=develop, fix bug

* fix unittest. test=develop

* fix unittest dx. test=develop

* tmp fix! for test... test=develop

* reduce tmp, test=develop

* test=develop, reduce tmp

* fix broadcast unittest. test=develop

* fix format. test=develop

* refine code. test=develop

* refine code. test=develop

* refine GetDoubleGradSafeTensor. test=develop

* fix format. test=develop

10b23a72

13 May, 2019 · 1 commit

add double grad for elementwise_mul op (#17255) · 8bae8590

Committed by Kaipeng Deng on May 13, 2019

* add double grad for elementwise_mul. test=develop

* remove comment. test=develop

* fix grad sum. test=develop

* fix for axis expand. test=develop

* add test for axis expand. test=develop

8bae8590

PaddlePaddle / Paddle · synced successfully about 1 year ago