Commits · e59524f86d472f0f36e09cc41c3ca882d3fc2841 · 机器未来 / Paddle

11 Jan, 2021 · 1 commit

[cherry-pick]Elementwise add grad GPU kernel optimization (#30276) · e59524f8

Committed by wangchaochaohu on Jan 11, 2021

* elementwise_add_grad Op optimization  (#29575)

* optimize for long width for elementwise (#29602)

* refine (#29622)

* delete the code for fp16 optimization because it is not faster than common template code (#29715)

* fix the shape choose of vectorize for cuda

* optimization for fp16 elementwise add (#29744)

* Fix the compiler error for half type (#29799)

* refine the compiler error for half2 operation (#29816)

* fix the compiler error when gcc4 cuda9.0 (#29997)
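For context on what the optimized op computes: since out = x + y, the gradient of elementwise add passes dout through to x unchanged, and when y was broadcast along leading dimensions its gradient is dout summed over those dimensions. The following is a minimal CPU sketch, assuming x is a rows × cols matrix and y is a length-cols vector. `AddGrad` and `ElementwiseAddGrad` are hypothetical names for illustration, not the GPU kernel from this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for illustration only.
struct AddGrad {
  std::vector<float> dx;  // same shape as x: rows * cols
  std::vector<float> dy;  // same shape as y: cols
};

// Gradient of out = x + y, where y (length cols) was broadcast over the rows of x.
AddGrad ElementwiseAddGrad(const std::vector<float>& dout,
                           std::size_t rows, std::size_t cols) {
  assert(dout.size() == rows * cols);
  AddGrad g;
  g.dx = dout;              // d(out)/dx = 1, so dx is a plain copy of dout
  g.dy.assign(cols, 0.0f);  // dy accumulates dout over the broadcast (row) axis
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      g.dy[c] += dout[r * cols + c];
    }
  }
  return g;
}
```

When x and y have identical shapes, the reduction disappears and both gradients are copies of dout, which is why that case is a natural target for a vectorized copy kernel.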


07 Jan, 2021 · 1 commit

[cherry pick] Some optimizations of elementwise_add, gelu and dropout for AMP (#30152) · 07f68fad

Committed by Leo Chen on Jan 07, 2021

* Improve performance of elementwise_add grad op (#29187)

* pass stop_gradient for cast op

* improve performance of elementwise_add grad

* use tensor copy async

* dygraph branch

* fix dygraph branch

* add ut

* make gelu fp16 computing more robust (#29484)

* Add fast path for dropout when p == 0  (#29553)

* add fast path for p == 0 in dropout

* add ut
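The idea behind a fast path for p == 0 can be sketched as follows: when the drop probability is zero, every element is kept, so the output is a copy of the input and no random numbers are needed. This is an illustration only, not the code from #29553. `DropoutResult` and `DropoutForward` are hypothetical names, and the 1 / (1 - p) rescaling of kept elements is an assumption about the dropout variant in use.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical result type for illustration only.
struct DropoutResult {
  std::vector<float> out;
  std::vector<unsigned char> mask;  // 1 = kept, 0 = dropped
};

// Training-mode dropout over a flat buffer with drop probability p.
DropoutResult DropoutForward(const std::vector<float>& x, float p, unsigned seed) {
  assert(p >= 0.0f && p <= 1.0f);
  DropoutResult r;
  if (p == 0.0f) {
    // Fast path: nothing is dropped, so copy the input and skip the RNG.
    r.out = x;
    r.mask.assign(x.size(), 1);
    return r;
  }
  r.out.assign(x.size(), 0.0f);
  r.mask.assign(x.size(), 0);
  if (p == 1.0f) {
    return r;  // everything is dropped; also avoids dividing by zero below
  }
  std::mt19937 gen(seed);
  std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
  const float scale = 1.0f / (1.0f - p);  // assumed "upscale in train" variant
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (uniform(gen) >= p) {
      r.mask[i] = 1;
      r.out[i] = x[i] * scale;
    }
  }
  return r;
}
```

The general path would also give the right answer for p == 0 (scale is 1 and every sample passes the test), so the fast path changes cost rather than results.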


24 Sep, 2020 · 1 commit

use iwyu clean include (#27267) · df43905f

Committed by wanghuancoder on Sep 24, 2020

* use iwyu clean include, test=develop, test=win

* compilation error, test=develop

* fix compilation error2, test=develop

* fix compilation error3, test=develop

* fix compilation error4, test=develop

* fix compilation error5, test=develop

* fix compilation error6, test=develop

* fix compilation error7, test=develop

* fix compilation error8, test=develop

* fix compilation error8, test=develop

* fix compilation error10, test=develop

* fix compilation error11, test=develop


30 Dec, 2019 · 1 commit
- fix broadcast bug;test=develop (#21898) · b7697f62
  Committed by danleifeng on Dec 30, 2019

19 Nov, 2019 · 1 commit
- extend elementwise broadcast function (#20957) · 0e7baabe
  Committed by danleifeng on Nov 19, 2019

30 Sep, 2019 · 1 commit
- Improve elementwise operators performance in same dimensions. (#19763) · 425279a5
  Committed by danleifeng on Sep 30, 2019
  ```
  Improve elementwise operators performance in same dimensions
  ```

14 May, 2019 · 1 commit
- add elementwise_add_grad_grad op (#17366) · bd9bef5a
  Committed by Kaipeng Deng on May 14, 2019
  ```
  * add elementwise_add_grad_grad op. test=develop

  * use defined GradMaker. test=develop
  ```

13 May, 2019 · 1 commit

Optimize the elementwise op using eigen (#15494) · dcda2023

Committed by Yiqun Liu on May 13, 2019

* Optimize the elementwise op with CUDA kernels.
test=develop

* Support setting of attr in op config file.
test=develop

* Add support for setting dtype and initializer in config.
test=develop

* Save workspace.

* Add initializer "zeros".
test=develop

* Fix compiling error.

* Support the use of an existing file to initialize a tensor in op_tester.

* Use eigen to optimize the elementwise_add/mul for the case that x and y have the same dims.
test=develop


16 Nov, 2018 · 1 commit

Refine operator cmake (#14413) · a2d9b344

Committed by Wu Yi on Nov 16, 2018

* wip simplify operator framework

* wip

* wip

* done test=develop

* clean test=develop

* fix test=develop

* fix deps test=develop

* fix cpu build test=develop

* fix tensorrt build test=develop

* fix tests test=develop

* fix test=develop

* fix cpu build test=develop


08 Nov, 2018 · 1 commit

Fix input<tensor> (#14208) · c5b6573a

Committed by chengduo on Nov 08, 2018

* fix input<tensor>
test=develop

* fix split_ids
test=develop

* ElementwiseMul should not support SelectedRows

* fix scale op
test=develop

* change GetTensorFromVar() method to GetTensorOrSelectedRowsFromVar()

* fix operator

* refine MultiOutput

* fix MultiOutput
test=develop

* disable test_dist_save_load
test=develop

* fix elementwise_op
test=develop

* add get_sparse_as_op
test=develop

* add info for check
test=develop

* rename get_sparse_as_op with extract_rows_as_op.
test=develop

* elementwise doesn't support selected_rows

* fix regularizer

* remove extract_rows_as
test=develop

* fix ci
test=develop

* add test for sum_op

* fix regularizer
test=develop

*  test=develop

* fix pserver weight decay multi inputs test=develop


22 Aug, 2018 · 1 commit
- Process elemwise grad op's lod. mul_op's lod · 211d8186
  Committed by Yu Yang on Aug 22, 2018

14 Aug, 2018 · 1 commit
- Revert "Refine elementwise_add op" · 6a2a9a83
  Committed by tensor-tang on Aug 14, 2018

06 Aug, 2018 · 1 commit
- refine elementwise_add op · b2d0ee51
  Committed by sneaxiy on Aug 06, 2018

01 Aug, 2018 · 1 commit

explicit gradient of elementwise_add/elementwise_sub (#11970) · 595a2c83

Committed by dzhwinter on Aug 01, 2018

* "add gradient register"

* "make some enhance"

* "better format"

* "fix typo"

* "fix reuse"

* "fix get expected kernel"

* "change the mkldnn code"

* "fix mkldnn"

* "fix mkldnn failed test"

* "add comment"


24 May, 2018 · 8 commits
- MKL optimized elementwise add: fix style check · 3e876b3e
  Committed by Tomasz Patejko on May 24, 2018
- MKL elementwise add backward: backward works for integral types with fall back to default impl · 9241011b
  Committed by Tomasz Patejko on May 22, 2018
- MKL elementwise add backward: grad inputs copied when they are not null · fde47aae
  Committed by Tomasz Patejko on May 22, 2018
- MKL optimized elementwise add backward: coding style fixes · 996d12f1
  Committed by Tomasz Patejko on May 21, 2018
- MKL elementwise add backward: Initial implementation with vector copy · 5a622c29
  Committed by Tomasz Patejko on May 21, 2018
- MKL elementwise add: default implementation used for integral types, float16 and/or GPU · 01fb2be9
  Committed by Tomasz Patejko on May 21, 2018
- MKL elementwise_add: BLAS version compiles with integral types · 6f932482
  Committed by Tomasz Patejko on May 19, 2018
- MKL elementwise add: elementwise_add uses vAdd VML function when MKL is used · e43c8f33
  Committed by Tomasz Patejko on May 17, 2018

23 Feb, 2018 · 1 commit
- Speed up elemwise grad (#8402) · 88c22e9d
  Committed by Yu Yang on Feb 23, 2018
  ```
  * Speed up elemwise grad

  * Fix bug

  * Add macro for MAX_BLOCK_DIM
  ```

12 Feb, 2018 · 1 commit
- Fix the grammar in copyright. (#8403) · 24509f4a
  Committed by qingqing01 on Feb 12, 2018

10 Feb, 2018 · 2 commits
- Correct #include path · fc374821
  Committed by Yi Wang on Feb 09, 2018
- Move file to fluid/; Edit CMakeLists.txt · 90648f33
  Committed by Yi Wang on Feb 09, 2018

03 Feb, 2018 · 1 commit
- Add layer norm [GPU] · 76e188e5
  Committed by chengduoZH on Feb 02, 2018

02 Feb, 2018 · 1 commit
- refine elementwise_op · affce733
  Committed by chengduoZH on Feb 02, 2018

16 Jan, 2018 · 1 commit
- refine elementwise_add_op · f59599a3
  Committed by fengjiayi on Jan 16, 2018

15 Jan, 2018 · 1 commit
- remove unnecessary functor1 · 6ee8a2e1
  Committed by fengjiayi on Jan 15, 2018

26 Dec, 2017 · 1 commit
- unify the indentation of license · 761b3297
  Committed by Luo Tao on Dec 26, 2017

12 Dec, 2017 · 1 commit

Refine device context (#6433) · 61ec0b95

Committed by QI JUN on Dec 12, 2017

The main changes are:

- take `DeviceContext` as the template parameter of math functors and OpKernel instead of `Place`
- remove `eigen_device` interface in base class  `DeviceContext`
- remove `GetEigenDevice` interface in `ExecutionContext` and base class `DeviceContext`
- remove unused `platform::EigenDeviceConverter`
- rename `REGISTER_OP_GPU_KERNEL` to `REGISTER_OP_CUDA_KERNEL`
- rename `USE_GPU_ONLY_OP` to `USE_CUDA_ONLY_OP`
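The first item in the list above, taking the device context rather than the place as the template parameter, can be sketched roughly as below. This is a simplified illustration under stated assumptions, not Paddle's actual code: `CPUDeviceContext` here is an empty stand-in, `AddFunctor` and `ElementwiseAddKernel` are hypothetical, and tensors are replaced by `std::vector`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a device context class; a real one would own streams,
// library handles and similar per-device state.
struct CPUDeviceContext {};

// Math functor selected by DeviceContext type (previously by Place type).
template <typename DeviceContext, typename T>
struct AddFunctor;

// CPU specialization: a plain loop. The context is passed in so that a
// device-specific specialization could use its resources.
template <typename T>
struct AddFunctor<CPUDeviceContext, T> {
  void operator()(const CPUDeviceContext& /*ctx*/, const std::vector<T>& x,
                  const std::vector<T>& y, std::vector<T>* out) const {
    assert(x.size() == y.size());
    out->resize(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
      (*out)[i] = x[i] + y[i];
    }
  }
};

// Kernel templated on the same DeviceContext, so it can hand the context
// straight to the functor without a separate conversion step.
template <typename DeviceContext, typename T>
class ElementwiseAddKernel {
 public:
  std::vector<T> Compute(const DeviceContext& ctx, const std::vector<T>& x,
                         const std::vector<T>& y) const {
    std::vector<T> out;
    AddFunctor<DeviceContext, T>()(ctx, x, y, &out);
    return out;
  }
};
```

A GPU build would add a second specialization keyed on a CUDA device context type; the kernel template itself would not need to change.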


06 Dec, 2017 · 1 commit
- refine code · 8711a9a2
  Committed by chengduoZH on Dec 06, 2017

05 Dec, 2017 · 1 commit
- follow comments · 9e244a8c
  Committed by chengduoZH on Dec 05, 2017

04 Dec, 2017 · 1 commit
- code refine · fbbfe8b8
  Committed by chengduoZH on Dec 04, 2017

28 Sep, 2017 · 1 commit
- Add Skeleton of Double support · 3a5693e0
  Committed by Yu Yang on Sep 27, 2017

27 Sep, 2017 · 1 commit
- split elementwise_op.h into two header files · 0fa4b985
  Committed by qiaolongfei on Sep 26, 2017

23 Sep, 2017 · 1 commit
- Fix progma_once error · 948d1d78
  Committed by fengjiayi on Sep 22, 2017

22 Sep, 2017 · 1 commit
- Elementwise operator. (#4139) · f99841dd
  Committed by gongweibao on Sep 22, 2017
  ```
  Elementwise operator add/sub/mul/div
  ```
