Commits · 7d4002e06a94a7a066ecf70bd476e62c3cdea6a4 · BaiXuePrincess / Paddle

20 Apr, 2020 1 commit
- restrict block num of layer_norm_grad cuda block to 128 (#23878) · 7d4002e0
  Authored by mapingshuo on Apr 20, 2020
```
restrict block num of layer_norm_grad cuda kernel to 128, test=develop
```
06 Jan, 2020 1 commit

Add TRT support for BERT (#21135) · 0a51098a

Authored by Pei Yang on Jan 06, 2020

* add gelu plugin

* align trt bert with gpu

* add support for fused fc with relu

* add unittest for bert trt

22 Nov, 2019 1 commit

Fix the crash issue when scale or bias was null-pointer. (#21284) · 69dd5152

Authored by Yihua Xu on Nov 22, 2019

* Fix the crash issue when scale or bias was null-pointer.

test=develop

* Add the error message for passing CI.

test=develop

19 Nov, 2019 1 commit
- extend elementwise broadcast function (#20957) · 0e7baabe
  Authored by danleifeng on Nov 19, 2019
20 Mar, 2019 1 commit
- fix op grad maker · 023a3a3d
  Authored by sneaxiy on Mar 19, 2019
```
test=develop
```
08 Mar, 2019 1 commit
- simplify the jitkernel templates and tests · 14a764c9
  Authored by tensor-tang on Mar 08, 2019
```
test=develop
```
07 Mar, 2019 1 commit
- unify the kernelfuncs cache and add unit test · 802f362a
  Authored by tensor-tang on Mar 07, 2019
```
test=develop
```
20 Dec, 2018 1 commit
- fix enum style · 1aaec571
  Authored by tensor-tang on Dec 20, 2018
```
test=develop
```
18 Dec, 2018 1 commit
- fix build · 6648995f
  Authored by tensor-tang on Dec 17, 2018
17 Dec, 2018 1 commit
- enable crf decoding and layer norm refer code · 720b55cb
  Authored by tensor-tang on Dec 17, 2018
19 Nov, 2018 1 commit

Optimize the layer_norm operator with AVX intrinsic function (#14417) · f4c869d8

Authored by Yihua Xu on Nov 19, 2018

* Optimize layer_norm operator with AVX intrinsic functions

* Revert the wrong modifications

* Implement the jit kernel for layer_norm operator

* Add math header file to fix the compile issue (test=develop)

* Add math header file to fix the compile issue (test=develop)

* Fixed the intrinsic header file issue (test=develop)

* Fix the conflicts (test=develop)

* Revert for CUDA compiler (test=develop)

* Fixed the CUDA dependency (test=develop)

* Fix the macro issues (test=develop)

16 Nov, 2018 1 commit

Refine operator cmake (#14413) · a2d9b344

Authored by Wu Yi on Nov 16, 2018

* wip simplify operator framework

* wip

* wip

* done test=develop

* clean test=develop

* fix test=develop

* fix deps test=develop

* fix cpu build test=develop

* fix tensorrt build test=develop

* fix tests test=develop

* fix test=develop

* fix cpu build test=develop

04 May, 2018 1 commit
- Clean and extract blas · ef6ea790
  Authored by Yu Yang on May 04, 2018
25 Mar, 2018 2 commits
- Pass cpu build · 1a4be55a
  Authored by Xin Pan on Mar 25, 2018
- Improve layer_norm speed · 904fa05f
  Authored by Xin Pan on Mar 25, 2018
```
    transformer on a single device step time
    reduces from 0.157 to 0.125
```
15 Feb, 2018 1 commit

Update tensor_util.h (#8422) · cfffb1a3

Authored by Yi Wang on Feb 14, 2018

* Update tensor_util.h

* Update with moved TensorDesc

* Fix tensur_utils.cu

* Update

* Update

* Update

* Update

* Make tensor_util.cu a symbolic link

12 Feb, 2018 1 commit
- Fix the grammar in copyright. (#8403) · 24509f4a
  Authored by qingqing01 on Feb 12, 2018
10 Feb, 2018 2 commits
- Correct #include path · fc374821
  Authored by Yi Wang on Feb 09, 2018
- Move file to fluid/; Edit CMakeLists.txt · 90648f33
  Authored by Yi Wang on Feb 09, 2018
05 Feb, 2018 3 commits
- code refine · 67731297
  Authored by chengduoZH on Feb 05, 2018
- unify GPU and CPU implementation · df0e74db
  Authored by chengduoZH on Feb 05, 2018
- Separate GPU and CPU implementation · 5092f529
  Authored by chengduoZH on Feb 03, 2018
03 Feb, 2018 1 commit
- unify GPU and CPU implementation · e0333735
  Authored by chengduoZH on Feb 03, 2018
24 Jan, 2018 1 commit
- add layer_norm · ca017719
  Authored by chengduoZH on Jan 22, 2018
22 Dec, 2017 1 commit
- add data layout (#6832) · 6b475981
  Authored by QI JUN on Dec 22, 2017
```
* add data layout

* fix ci
```
12 Dec, 2017 1 commit

Refine device context (#6433) · 61ec0b95

Authored by QI JUN on Dec 12, 2017

The main changes are:

- take `DeviceContext` as the template parameter of math functors and OpKernel instead of `Place`
- remove `eigen_device` interface in base class `DeviceContext`
- remove `GetEigenDevice` interface in `ExecutionContext` and base class `DeviceContext`
- remove unused `platform::EigenDeviceConverter`
- rename `REGISTER_OP_GPU_KERNEL` to `REGISTER_OP_CUDA_KERNEL`
- rename `USE_GPU_ONLY_OP` to `USE_CUDA_ONLY_OP`

25 Oct, 2017 1 commit

CPU Batch Norm Op (#4964) · ee998a9c

Authored by Qiao Longfei on Oct 24, 2017

* init batch norm op

* prepare input output

* compute mean_out var_out save_mean save_var on CPU

* active is test

* use eigen to do computation

* complete batch norm forward

* set default momentum to 0.9

* add batch norm grad op in CPU

* add tensor_format and NHWC support, add python test

* add test training

* add batch norm gradient test

* improve comment, fix forward Python UnitTest

* add gradient test

* fix eigen warning

* follow name style

* fix a bug

* change float to T

* add simple forward test

* test with different place

* add backward test

* refine python test

* remove old python test code

* code clean

* follow code style

* update comment

10 Oct, 2017 1 commit
- Implementing the fill constant op for the executor · 6efacc14
  Authored by Abhinav Arora on Oct 09, 2017
28 Sep, 2017 1 commit
- Add Skeleton of Double support · 3a5693e0
  Authored by Yu Yang on Sep 27, 2017
20 Sep, 2017 1 commit
- Share LoD between input and output of each operator. · b65709e4
  Authored by dangqingqing on Sep 19, 2017
23 Aug, 2017 1 commit
- Remove set functor and add comapre_grad test · f188e22b
  Authored by dangqingqing on Aug 23, 2017
11 Aug, 2017 1 commit
- Fix python unit tests · c99f84ac
  Authored by Yu Yang on Aug 11, 2017
08 Aug, 2017 1 commit
- fix bug · 28476676
  Authored by fengjiayi on Aug 07, 2017
07 Aug, 2017 1 commit
- "remove type alias done." · 72fb86a2
  Authored by dongzhihong on Aug 07, 2017
05 Aug, 2017 1 commit
- Reformat paddle/operators/* strictly following Google Style Guide · 9620df44
  Authored by Yi Wang on Aug 04, 2017
02 Aug, 2017 1 commit
- Add unittest for `FillZerosLikeOp` · 8bd73159
  Authored by fengjiayi on Aug 01, 2017
01 Aug, 2017 1 commit
- Follow comments and merge develop · e2fd2bd0
  Authored by Yu Yang on Aug 01, 2017
26 Jul, 2017 1 commit
- Add fill_zeros_like op · a2dc9614
  Authored by fengjiayi on Jul 26, 2017
25 Jul, 2017 1 commit
- Add type_alias to import framework into ops · efc119b4
  Authored by Yu Yang on Jul 25, 2017
```
Make implementing an operator less noisy.
```
19 Jul, 2017 1 commit
- add Flatten method to EigenVector · d9fa6159
  Authored by qijun on Jul 19, 2017
