Commits · b207b8a7bf312f0aaf30b9f4296928a3e0787eaf · BaiXuePrincess / Paddle

12 Jan, 2021 · 1 commit

[cherry-pick]memory optimization for fuse pattern of elemwise_add + act (#30303) · b207b8a7

Committed by wangchaochaohu on Jan 12, 2021

* reduce the memory occupied by the fused pattern of elementwise_add Op and activation Op (relu Op, for example) (#29885)

* register OPMaker and Infer Shape Check for fused_elementwise_add (#30259)
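
The commit message does not say how the memory saving is achieved. One common way a fused elementwise_add + relu pattern can avoid extra memory is sketched below, under the assumption that the intermediate sum is never materialized and that the backward pass reads its mask from the output. The function names `FusedAddRelu` and `FusedAddReluGrad` are hypothetical and are not Paddle APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not Paddle's kernel: computes relu(x + y) in one pass.
// The intermediate sum lives only in a local variable, so no buffer of
// size n is allocated for it.
std::vector<float> FusedAddRelu(const std::vector<float>& x,
                                const std::vector<float>& y) {
  assert(x.size() == y.size());
  std::vector<float> out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    const float sum = x[i] + y[i];  // kept in a local, never stored as a tensor
    out[i] = std::max(sum, 0.0f);
  }
  return out;
}

// Backward sketch: for relu, out > 0 exactly when (x + y) > 0, so the
// gradient mask can be read from `out` and the intermediate sum does not
// need to stay alive between forward and backward.
std::vector<float> FusedAddReluGrad(const std::vector<float>& out,
                                    const std::vector<float>& dout) {
  assert(out.size() == dout.size());
  std::vector<float> dx(out.size());
  for (std::size_t i = 0; i < out.size(); ++i) {
    dx[i] = out[i] > 0.0f ? dout[i] : 0.0f;  // the same value serves as dy
  }
  return dx;
}
```

This shortcut relies on relu's mask being recoverable from its output; other activations may need the intermediate value or a different formulation.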

16 Oct, 2019 · 1 commit

Support fp16 in GPU impl of fused_elemwise_activation_op. (#20636) · 01eddc1a

Committed by qingqing01 on Oct 16, 2019

* Support fp16 in fused_elemwise_activation_op.

* Fix unit testing in ONLY-CPU mode.

16 Nov, 2018 · 1 commit

Refine operator cmake (#14413) · a2d9b344

Committed by Wu Yi on Nov 16, 2018

* wip simplify operator framework

* wip

* wip

* done test=develop

* clean test=develop

* fix test=develop

* fix deps test=develop

* fix cpu build test=develop

* fix tensorrt build test=develop

* fix tests test=develop

* fix test=develop

* fix cpu build test=develop

08 Aug, 2018 · 1 commit

Feature/op fusion (#12240) · 7c8b69c7

Committed by chengduo on Aug 08, 2018

* Add Preface

* Add demo code

* Save file

* Refine code

* seems can work

* use elementwise strategy

* Use ElementwiseComputeEx

* Add comments

* extract functions from operator

* Refine code

* Follow comment

* code refine

* follow comments

* follow comments

12 Feb, 2018 · 1 commit

Fix the grammar in copyright. (#8403) · 24509f4a

Committed by qingqing01 on Feb 12, 2018

10 Feb, 2018 · 2 commits

Correct #include path · fc374821

Committed by Yi Wang on Feb 09, 2018

Move file to fluid/; Edit CMakeLists.txt · 90648f33

Committed by Yi Wang on Feb 09, 2018

26 Dec, 2017 · 1 commit

unify the indentation of license · 761b3297

Committed by Luo Tao on Dec 26, 2017

12 Dec, 2017 · 1 commit

Refine device context (#6433) · 61ec0b95

Committed by QI JUN on Dec 12, 2017

The main changes are:

- take `DeviceContext` as the template parameter of math functors and OpKernel instead of `Place`
- remove `eigen_device` interface in base class `DeviceContext`
- remove `GetEigenDevice` interface in `ExecutionContext` and base class `DeviceContext`
- remove unused `platform::EigenDeviceConverter`
- rename `REGISTER_OP_GPU_KERNEL` to `REGISTER_OP_CUDA_KERNEL`
- rename `USE_GPU_ONLY_OP` to `USE_CUDA_ONLY_OP`
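
The first change in the list above can be illustrated with a minimal sketch. It assumes simplified stand-in types: `CPUDeviceContext` and `CUDADeviceContext` here are empty placeholders rather than the real Paddle classes, and `SetConstant` is written only for this example. The point is that the functor is parameterized on the context type and receives a context object, instead of being parameterized on a `Place` type.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for device context types; the real classes would
// carry streams, library handles, and allocators.
struct CPUDeviceContext {
  std::string Name() const { return "cpu"; }
};
struct CUDADeviceContext {
  std::string Name() const { return "cuda"; }
};

// A math functor templated on the DeviceContext type rather than on a Place.
// It returns the name of the context it ran on, purely so that the dispatch
// is observable in this example.
template <typename DeviceContext, typename T>
struct SetConstant {
  std::string operator()(const DeviceContext& ctx, std::vector<T>* data,
                         T value) const {
    for (auto& v : *data) v = value;  // host loop; a real CUDA path would launch a kernel
    return ctx.Name();
  }
};
```

A caller would instantiate `SetConstant<CPUDeviceContext, float>` or `SetConstant<CUDADeviceContext, float>` and pass the matching context object, so everything device-specific is reachable through that one argument.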

16 Nov, 2017 · 1 commit

feature/while_grad_op (#5554) · 18f0c40a

Committed by Yang Yang(Tony) on Nov 16, 2017

* first commit

* Python API for while op

* Python Unittest for simple while_op forward

* fix out to be list

* Fix UT

* VarType

* Fix several bugs

* Fix bug

* Fix bug

* Fix Bug

* Fix bug

* Fix unittest

* Remove debug log

* Add comments

* add PADDLE_ENFORCE

* while_grad_op first commit

* Add `BlockDescBind::FindRecursiveOrCreateVar()` and fix bugs

* not sure how to setdim of while outputs

* push for test

* add executor vlog

* fix bug of while_op cond

* Several enhancements to the code

1. Backward always infers shape and var type, because RENAME variables
are created when the backward operator is built, but their shapes and
var types were not inferred.
2. Never use SomePtr-> directly, since any pointer returned from a
function could be nullptr. Add `detail::Ref` to cast a pointer to a
reference safely.
3. Enhance error messages for backward.
4. Infer the data type of variables in `sum` and `tensor_write`

* Fix bugs of while_op gradient

* Fix several bugs of while_op grad

* fix fill zeros like

* fix 3 >= 3

* fix place holder shouldn't be null

* fail on sum op

* Fix SumOp of TensorList

* clean up

* pass while test

* fix test_array_write_read

* pass sum op

* Support int/int64 for fill_constant_batch_size_like

* Fix compile
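
Item 2 of the enhancement list in this commit message mentions `detail::Ref`, a helper that turns a pointer into a reference safely. The commit message does not show its implementation, so the following is only a sketch of the idea. It assumes the check throws `std::runtime_error`, and the `what` parameter is an illustrative addition.

```cpp
#include <stdexcept>
#include <string>

namespace detail {
// Sketch of a checked pointer-to-reference cast (assumed behavior, not the
// actual Paddle implementation): fail loudly on nullptr instead of
// dereferencing it.
template <typename T>
T& Ref(T* ptr, const std::string& what = "pointer") {
  if (ptr == nullptr) {
    throw std::runtime_error(what + " must not be null");
  }
  return *ptr;
}
}  // namespace detail
```

With such a helper, a call like `detail::Ref(scope.FindVar(name), name).Get()` (hypothetical call site) reports a clear error where `scope.FindVar(name)->Get()` would crash on a null return.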

11 Nov, 2017 · 1 commit

Use G++ to compile some cu operators. · f5e36765

Committed by dangqingqing on Nov 11, 2017

09 Nov, 2017 · 1 commit

remove header file paddle/framework/eigen.h · cceed081

Committed by dangqingqing on Nov 09, 2017

08 Nov, 2017 · 1 commit

Remove fill_constant_batch_size_like_op.h and clean some operator codes. · e5791dd1

Committed by dangqingqing on Nov 08, 2017

26 Oct, 2017 · 1 commit

add fill constant batch size like op (#5057) · 6cc2ce01

Committed by Yang Yang(Tony) on Oct 25, 2017

10 Oct, 2017 · 1 commit

Implementing the fill constant op for the executor · 6efacc14

Committed by Abhinav Arora on Oct 09, 2017

08 Aug, 2017 · 2 commits

"fix clang format error" · 2c553e4f

Committed by dongzhihong on Aug 08, 2017

Update fill_zeros_like_op.cu · 7945572c

Committed by dzhwinter on Aug 08, 2017

07 Aug, 2017 · 1 commit

"remove type alias done." · 72fb86a2

Committed by dongzhihong on Aug 07, 2017

04 Aug, 2017 · 2 commits

fix bug · 5d7e8bfb

Committed by fengjiayi on Aug 03, 2017

Add cpplint for *.h and cuda *.cu · b58725bd

Committed by liaogang on Aug 04, 2017

26 Jul, 2017 · 1 commit

Add fill_zeros_like op · a2dc9614

Committed by fengjiayi on Jul 26, 2017
