Commits · 676995c86cb4b49f9a41c7a32c5e054b16201753 · Crayon鑫 / Paddle

22 Feb, 2019: 1 commit

Optimize Gelu with MKL Erf function (#15770) · 676995c8

Committed by Yihua Xu on Feb 22, 2019

* Optimize for gelu operator

* Set up the low accuracy mode of MKL ERF function.

test=develop

* Only enable MKLML ERF when OS is linux

* Use the special mklml version that includes the vmsErf function to verify the gelu mkl kernel.

test=develop

* Add the CUDA macro to avoid NVCC's compile issue.

test=develop

* Add the TODO comments for mklml library modification.

test=develop

* Clean Code

test=develop

* Add a comment on the macro for the NVCC compiler.

test=develop
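For reference, the gelu operator computes GELU(x) = 0.5 · x · (1 + erf(x / √2)). Its cost is therefore dominated by the erf evaluation, which this commit hands to MKL's vectorized vmsErf routine. The sketch below illustrates the formula in plain Python using the standard library's `math.erf`. It is not Paddle's kernel, and `gelu` / `gelu_batch` are hypothetical names.

```python
import math
from typing import Iterable, List


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with the standard normal CDF Phi written in terms of erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_batch(xs: Iterable[float]) -> List[float]:
    # A vectorized erf over a whole buffer (as vmsErf provides) would replace this per-element loop.
    return [gelu(x) for x in xs]


if __name__ == "__main__":
    print(gelu_batch([-1.0, 0.0, 1.0]))
```

Because erf is odd, GELU(x) − GELU(−x) = x for every x. This identity is a cheap sanity check when comparing a fast, lower-accuracy erf against a reference.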

21 Feb, 2019: 5 commits

disable dam temporarily (#15860) · e3dd6970
Committed by Tao Luo on Feb 21, 2019
```
test=develop
```

test=develop · 35a90e06
Committed by Dun Liang on Feb 21, 2019

test=develop · c9080f51
Committed by Dun Liang on Feb 21, 2019

test=develop · 1c7bb0e4
Committed by Dun Liang on Feb 21, 2019

Profiler refine and add CUDA runtime api tracer (#15301) · a83e4704

Committed by Dun on Feb 21, 2019

* refine profiler && add runtime tracer

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* test=develop

* fix bug && test=develop

* add thread id map && test=develop

* test=develop

* testing

* bug fix

* remove cuda event && refine code && test=develop

* test=develop

* test=develop

* test=develop

* fix windows temp file && test=develop

* test=develop

* fix windows bug && test=develop

* fix start up issue && test=develop

* code polish && test=develop

* remove unused code && test=develop

* add some cupti cbid && test=develop

* add FLAGS_multiple_of_cupti_buffer_size && test=develop

* fix compile error && test=develop

* add keyword && test=develop

* fix && test=develop

* code polish && test=develop
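The bullet "add thread id map" is not explained further. Profilers often keep such a map to replace long native thread identifiers with small sequential ids, which keeps trace rows readable. Assuming that reading, a thread-safe map could be sketched as follows. `ThreadIdMap` is a hypothetical name, not Paddle's API.

```python
import threading
from typing import Dict


class ThreadIdMap:
    """Hypothetical map from native thread identifiers to compact ids 0, 1, 2, ..."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._ids: Dict[int, int] = {}

    def lookup(self, native_id: int) -> int:
        with self._lock:
            # The first time a thread is seen, it receives the next unused compact id.
            if native_id not in self._ids:
                self._ids[native_id] = len(self._ids)
            return self._ids[native_id]

    def current(self) -> int:
        """Compact id of the calling thread."""
        return self.lookup(threading.get_ident())


if __name__ == "__main__":
    tid_map = ThreadIdMap()
    print(tid_map.current(), tid_map.current())
```

Python's `threading.get_ident()` values may be recycled after a thread exits. A long-running profiler might therefore need a more stable key.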

20 Feb, 2019: 2 commits
- Enable momentum operator for a ngraph engine (#15673) · 13ec2d33
  Committed by mozga-intel on Feb 20, 2019
```
* Enable momentum operator for a ngraph engine
test=develop

* Update tests
test=develop

* Unnecessary line of the code as intended was removed
test=develop
```
- remove legacy any.cmake · c797a1f0
  Committed by Tao Luo on Feb 20, 2019
19 Feb, 2019: 3 commits
- fix warnings (#15790) · e1c707fe
  Committed by tensor-tang on Feb 19, 2019
```
* fix warnings

test=develop

* fix enforce test

test=develop
```
- fix enforce_test · 9b8e0e2f
  Committed by sneaxiy on Feb 19, 2019
```
test=develop
```
- fix many warning · 209b3557
  Committed by sneaxiy on Feb 19, 2019
```
test=develop
```
14 Feb, 2019: 1 commit
- fix enforce · f0590947
  Committed by sneaxiy on Feb 13, 2019
```
test=develop
```
11 Feb, 2019: 2 commits
- add details. test=develop · 04e9776a
  Committed by dzhwinter on Feb 11, 2019
- Enable batch_norm operator for a ngraph engine · 1198ccae
  Committed by mozga-intel on Feb 11, 2019
```
test=develop
```
03 Feb, 2019: 1 commit
- fix the lib_any dependency · 883d2209
  Committed by peizhilin on Feb 03, 2019
```
test=develop
```
02 Feb, 2019: 2 commits
- fix dependency · 061299be
  Committed by peizhilin on Feb 02, 2019
```
test=develop
```
- Enable accuracy op for ngraph engine (#15592) · ac4cde00
  Committed by baojun on Feb 02, 2019
```
* Added accuracy ngraph op test=develop

* fixed name type test=develop
```
31 Jan, 2019: 2 commits

expose peak gpu memory API to python test=develop (#15529) · 6e84eb13

Committed by liuwei1031 on Jan 31, 2019

* expose peak gpu memory API to python test=develop

* add unittest for peak gpu memory monitoring test=develop

* add pybind change test=develop

* add mutex to gpu mem usage monitor test=develop

* update benchmark flag definition file test=develop

* tweak unittest for memory monitoring test=develop
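Monitoring peak GPU memory amounts to keeping a running total of allocated bytes and remembering its maximum. The "add mutex" bullet suggests these counters are updated from more than one thread. A minimal sketch under those assumptions follows. `MemoryUsageMonitor` and its methods are hypothetical and do not mirror Paddle's Python API.

```python
import threading


class MemoryUsageMonitor:
    """Hypothetical tracker of current and peak allocated bytes for one device."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # allocations may be reported from several threads
        self._current = 0
        self._peak = 0

    def on_alloc(self, nbytes: int) -> None:
        with self._lock:
            self._current += nbytes
            # Update the high-water mark in the same critical section as the running total.
            self._peak = max(self._peak, self._current)

    def on_free(self, nbytes: int) -> None:
        with self._lock:
            self._current -= nbytes

    def current(self) -> int:
        with self._lock:
            return self._current

    def peak(self) -> int:
        with self._lock:
            return self._peak


if __name__ == "__main__":
    monitor = MemoryUsageMonitor()
    monitor.on_alloc(100)
    monitor.on_alloc(50)
    monitor.on_free(120)
    print(monitor.current(), monitor.peak())
```

Without the lock, two threads could read the same total before either writes it back. That race would lose updates and could under-report the peak.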

To make CUDA_LAUNCH_KERNEL_HELPER support large size. · 5dfce931
Committed by guoshengCS on Jan 31, 2019
```
test=develop
```
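The commit message does not say how large sizes are supported. A widely used CUDA technique is to cap the grid size and have each thread walk the data in a grid-stride loop, using 64-bit index arithmetic to avoid overflow. The Python simulation below only illustrates that technique and does not claim to reproduce this commit. `launch_config`, `grid_stride_indices`, `covered`, and the `max_grid` cap are hypothetical.

```python
from typing import List, Tuple


def launch_config(n: int, block_dim: int = 256, max_grid: int = 65535) -> Tuple[int, int]:
    """Pick (grid_dim, block_dim); max_grid is a hypothetical cap on the number of blocks."""
    grid_dim = min((n + block_dim - 1) // block_dim, max_grid)
    return max(grid_dim, 1), block_dim


def grid_stride_indices(n: int, grid_dim: int, block_dim: int,
                        block_idx: int, thread_idx: int) -> List[int]:
    """Indices one simulated CUDA thread visits: start at its global id, step by the total thread count."""
    start = block_idx * block_dim + thread_idx
    stride = grid_dim * block_dim
    # In real CUDA code, start and stride should be 64-bit so very large n cannot overflow.
    return list(range(start, n, stride))


def covered(n: int, grid_dim: int, block_dim: int) -> List[int]:
    """Collect every index touched by all simulated threads, sorted."""
    seen: List[int] = []
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            seen.extend(grid_stride_indices(n, grid_dim, block_dim, block_idx, thread_idx))
    return sorted(seen)


if __name__ == "__main__":
    grid, block = launch_config(100_000, block_dim=256, max_grid=64)
    # Every element is processed exactly once even though grid * block < n.
    print(grid, block, covered(100_000, grid, block) == list(range(100_000)))
```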

28 Jan, 2019: 2 commits
- add jit kernel hsum, hmax and softmax refer code · 81177258
  Committed by tensor-tang on Jan 25, 2019
```
test=develop
```
- fix compile error in distributed mode · ba4f43fd
  Committed by sneaxiy on Jan 28, 2019
```
test=develop
```
24 Jan, 2019: 2 commits

Add the CUDA kernel for beam_search op (#15020) · 3008fa12

Committed by Yiqun Liu on Jan 24, 2019

* Refine the beam_search op and test.

* A basic CUDA implementation of beam_search for small batch_size.

* Implement CUDA kernel for beam_search_op.

* Use multiple CUDA threads in the same block to select the top beam.

* Update the python api of beam_search op.

* Enable extend function in CPU kernel of beam_search op.

* Unify the CUDA codes.
test=develop

* Unify the CPU kernel of beam_search op.

* Ensure the selected items of beam_search_op's CPU kernel are sorted by scores.

* Update the description of beam_search in API.spec.

* Enable the use of CUDA kernel in beam_search op.

* Exclude the beam_search's CUDA unittest when there is no CUDA gpu, and delete some debugging statements.
test=develop

* Follow comments.
test=develop

* Call the CPU kernel for beam_search op when batch_size > 4.
test=develop

* Remove the except of is_empty op in PrepareData.
test=develop
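Each beam search step expands every live beam by every candidate token and keeps the `beam_size` best candidates. The bullets above note that the CPU kernel returns these sorted by score. The sketch below shows only that selection step and assumes scores are accumulated log-probabilities. `beam_search_step` is a hypothetical helper that ignores end-of-sequence handling and batching. It is not the signature of Paddle's beam_search op.

```python
import heapq
from typing import List, Sequence, Tuple

# (accumulated score, index of the source beam, token id)
Candidate = Tuple[float, int, int]


def beam_search_step(
    beam_scores: Sequence[float],
    step_log_probs: Sequence[Sequence[float]],
    beam_size: int,
) -> List[Candidate]:
    """Expand each beam by each token and keep the beam_size best candidates, highest score first."""
    candidates = (
        (beam_scores[b] + lp, b, token)
        for b, row in enumerate(step_log_probs)
        for token, lp in enumerate(row)
    )
    # heapq.nlargest returns its result already sorted in descending order of the key.
    return heapq.nlargest(beam_size, candidates, key=lambda c: c[0])


if __name__ == "__main__":
    scores = [0.0, -1.0]
    log_probs = [[-0.5, -0.9], [-0.1, -2.3]]
    for score, beam, token in beam_search_step(scores, log_probs, beam_size=3):
        print(f"beam={beam} token={token} score={score:.2f}")
```

A partial selection with `heapq.nlargest` avoids fully sorting all `num_beams × vocab` candidates. The GPU bullets describe a similar goal: several threads in a block cooperate to select the top beam.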

lazy_allocator · 51227bd4
Committed by sneaxiy on Jan 23, 2019
```
test=develop
```

23 Jan, 2019: 1 commit
- checkpoint at distributed training (#14854) · 8b50ad80
  Committed by tangwei12 on Jan 23, 2019
```
checkpoint for distributed training.
```
16 Jan, 2019: 1 commit
- Add single GPU support to imperative · 315b133e
  Committed by minqiyang on Jan 16, 2019
14 Jan, 2019: 2 commits
- fix issue when type is invalid · eea75a1d
  Committed by peizhilin on Jan 14, 2019
```
test=develop
```
- Revert "Revert "Remove workspace_handle in conv_cudnn (#15186)"" (#15290) · 46d01d79
  Committed by chengduo on Jan 13, 2019
```
test=develop
This reverts commit 358e657f.
```
11 Jan, 2019: 4 commits

fix thread safe bug · c4eced98
Committed by chengduozh on Jan 11, 2019
```
test=develop
```
Revert "Remove workspace_handle in conv_cudnn (#15186)" · 358e657f
Committed by chengduozh on Jan 11, 2019
```
test=develop
This reverts commit 064512aa.
```
Fix performance drop when with MKL-DNN · cb2ba584
Committed by Wojciech Uss on Jan 11, 2019
```
test=develop
```

Remove workspace_handle in conv_cudnn (#15186) · 064512aa

Committed by chengduo on Jan 10, 2019

* remove workspace_handle in conv2d_cudnn
test=develop

* remove workspace_handle
test=develop

* fix bug
test=develop

* make test_conv2d_op SERIAL
test=develop

* save memory in conv_cudnn
test=develop

* enhance thread safety
test=develop

* enhance temporary allocator
test=develop

* Add excess fraction
test=develop

* follow comments
test=develop

* fix bug and code refine
test=develop

* fix memory size check
test=develop

* rename reuse_tmp_allocation_excess_fraction
test=develop

10 Jan, 2019: 2 commits

Conv int8 residual (#15145) · 8f17c714

Committed by xiaolil1 on Jan 10, 2019

* Enable basic MKL-DNN INT8 Conv OP
test=develop

* Modify test case
test=develop

* Clean unittest code
test=develop

* Fix test
test=develop

* Modify test
test=develop

* Enable MKL-DNN INT8 Conv with Relu Fusion OP
test=develop

* Enable INT8 Conv with residual fusion OP
test=develop

* Modify code.
test=develop

* Modify basic INT8 Conv
test=develop

* Modify Conv.
test=develop

* fix style
test=develop

* Fix style
test=develop

* Fix test
test=develop

* Modify code.
test=develop

* Fix test
test=develop

adjust the shlwapi on windows · 439691f5
Committed by peizhilin on Jan 10, 2019
```
test=develop
```

09 Jan, 2019: 1 commit
- add the enable_debug flag · c1235c93
  Committed by peizhilin on Jan 09, 2019
```
test=develop
```
08 Jan, 2019: 6 commits
- Enable element_wise_add operator for a ngraph · a42f8f4f
  Committed by mozga-intel on Jan 07, 2019
```
test=develop
```
- use thread local instance test=develop · 1cd95d8a
  Committed by peizhilin on Jan 08, 2019
- Revert "Revert "Remove op handle lock"" · ed409ac9
  Committed by sneaxiy on Jan 08, 2019
```
test=develop
```
- not include the numeric under linux test=develop · d54133ea
  Committed by peizhilin on Jan 08, 2019
- add the python callstack for debug support test=develop · a6f5ceee
  Committed by peizhilin on Jan 08, 2019
- Revert "Remove op handle lock" · dacfaaa9
  Committed by Zeng Jinle on Jan 08, 2019
```
test=develop
```

Crayon鑫 / Paddle is identical to the upstream project it was forked from.