Commits · 2a42250699dd29494840e7c05a45c35dbaf5a280 · PaddlePaddle / Paddle

07 Dec, 2020 1 commit

Compiling operator libraries with Unity build (#29130) · 671555ed

Committed by LoveAn on Dec 07, 2020

* Compiling operator libraries with Unity Build on Windows CPU.

* Compiling operator libraries with Unity Build on Windows GPU, no_test, test=windows_ci

* Add option in windows ci script, no_test, test=windows_ci

* Optimize parallel compiling, test=develop

* remove limit of parallel compile and skip some ops in UB, test=develop

* remove changes of header file, test=develop

* remove changes of header file, test=develop

* fix test_eye_op unittest failed, test=develop

* Compiling operator libraries with Unity Build on Linux, test=develop

* set default WITH_UNITY_BUILD=OFF, test=develop

* Move unity build rules into a single file and add comment, test=develop

* optimize parallel compilation, test=develop

* fix undefined reference error on coverage ci, test=develop


08 Nov, 2020 1 commit

exec ut no more than 15s 1 (#28439) · ba075632

Committed by YUNSHEN XIE on Nov 08, 2020

* disable ut test_parallel_executor_fetch_isolated_var,test=document_fix

* test for limiting ut exec time to 15s

* fix an error caused by cannot find ut

* fix some error

* can not find test_transformer

* fix error caused by ut not run in windows

* fix error caused by Compiler Options

* fix error caused by setting timeout value as 15 in python/paddle/tests/CMakeLists.txt

* setting timeout value to 120s for old ut

* add the timeout value setting

* fix error caused by ut only run in coverage_ci

* add analyzer_transformer_profile_tester

* fix some error

* fix some error

* fix error with inference option

* fix error with inference option setting as ON_INFER

* add some ut to set timeout

* modified some option

* fix error

* fix some timeout error

* fix error

* fix error

* fix timeout for test_analyzer_bfloat16_resnet50

* fix error

* setting timeout property for some ut

* first pr for new ut timeout of 15s


22 Sep, 2020 1 commit
- Add the cpu version of segment sum mean max min op · f4c750d7
  Committed by Zhong Hui on Sep 22, 2020
```
Add the cpu version of segment sum mean max min op
```
09 Sep, 2020 1 commit
- [cuda11 support] change the CMakeLists to support the cuda11 (#27124) · c71d79b1
  Committed by wangchaochaohu on Sep 09, 2020
27 Apr, 2020 1 commit
- Add the implementation of inverse (#23310) · ecfddebb
  Committed by Yiqun Liu on Apr 27, 2020
24 Apr, 2020 1 commit
- fix compilation failure (#24091) · ab2e2842
  Committed by Zeng Jinle on Apr 24, 2020
26 Mar, 2020 1 commit

[Paddle-TRT]: Ernie Dynamic shape support. (#23138) · 430b0099

Committed by Zhaolong Xing on Mar 26, 2020

* add dynamic plugin support.
test=develop

* change emb eltwise layernorm to math function
test=develop

* add emb eltwise layernorm
test=develop

* can run dynamic shape ernie
test=develop

* fix ci
test=develop

* add ut for trt ernie dynamic

test=develop

* refine dynamic shape c++ interface.
test=develop

* fix comments
test=develop

* fix comments
test=develop


11 Sep, 2019 1 commit

Implement the GPU kernel of fc operator (#19687) · a65c728e

Committed by Yiqun Liu on Sep 11, 2019

* Refine the codes related to fc op.

* Add GPU implementation for fc functor.

* Apply fc_fuse_pass in GPU inference.
test=develop

* Change the cmake for fc op.

* Change PADDLE_ENFORCE to PADDLE_ENFORCE_EQ.

* Add an attribute to set the activation type in fc_op.

* Enhance the unittest of fc_op.
test=develop

* Move the declaration of FCOpGrad back to the header file.
test=develop

* Set default value for newly added arguments in test_fc_op.
test=develop


05 Sep, 2019 1 commit

unify PADDLE_ASSERT_MSG into PADDLE_ENFORCE(error_message) (#19631) · 3ae939e4

Committed by Tao Luo on Sep 05, 2019

* remove assert.h

* change PADDLE_ASSERT_MSG to PADDLE_ENFORCE

test=develop

* fix tensorrt paddle_enforce

test=develop


02 Feb, 2019 1 commit
- fix dependency · 061299be
  Committed by peizhilin on Feb 02, 2019
```
test=develop
```
30 Jan, 2019 1 commit
- add sample_logits op · 58ad40cc
  Committed by xuezhong on Jan 30, 2019
29 Jan, 2019 1 commit
- refine softmax and use with cache · d59f7335
  Committed by tensor-tang on Jan 28, 2019
```
test=develop
```
24 Jan, 2019 1 commit

Add the CUDA kernel for beam_search op (#15020) · 3008fa12

Committed by Yiqun Liu on Jan 24, 2019

* Refine the beam_search op and test.

* A basic CUDA implementation of beam_search for small batch_size.

* Implement CUDA kernel for beam_search_op.

* Use multiple CUDA threads in the same block to select the top beam.

* Update the python api of beam_search op.

* Enable extend function in CPU kernel of beam_search op.

* Unify the CUDA codes.
test=develop

* Unify the CPU kernel of beam_search op.

* Ensure the selected items of beam_search_op's CPU kernel are sorted by scores.

* Update the description of beam_search in API.spec.

* Enable the use of CUDA kernel in beam_search op.

* Exclude the beam_search's CUDA unittest when there is no CUDA GPU, and delete some debugging statements.
test=develop

* Follow comments.
test=develop

* Call the CPU kernel for beam_search op when batch_size > 4.
test=develop

* Remove the except of is_empty op in PrepareData.
test=develop


18 Jan, 2019 1 commit

Tree conv op (#15217) · e2ba9668

Committed by zhaozhehao on Jan 18, 2019

* refactor tree2col operator with new memory mechanism test=develop

* test=develop

* test=develop

* Modified API according to panyx0718 test=develop

* fix API change according to heavengate test=develop

* Modify API comment test=develop


04 Jan, 2019 1 commit
- use seqpool jitkernel · e58a569c
  Committed by tensor-tang on Dec 28, 2018
17 Dec, 2018 1 commit
- use vadd, vaddrelu, lstm and gru jitkernel · 64a90b2f
  Committed by tensor-tang on Dec 17, 2018
05 Dec, 2018 1 commit
- init jitkernel · 77236e33
  Committed by tensor-tang on Nov 26, 2018
03 Dec, 2018 1 commit
- add prelu gpu inference · f75815b7
  Committed by nhzlx on Dec 03, 2018
22 Nov, 2018 1 commit

Windows/online (#14474) · d9a1f3e5

Committed by wopeizl on Nov 22, 2018

* add recordio support

* disable the openblas multi-thread on windows since no support
adjust the python script

* code style

* code style
test=develop

* add create_recordio_file_reader back

* fix code style
test=develop

* fix the gtest.cmake on windows

* fix cc_test on windows

* fix the win build
test=develop

* remove fused compile support on windows
test=develop

* add the jit support
test=develop

* add the jit support, test=develop

* add the jit support, test=develop

* add the jit back
fix compile error on windows

* rollback test=develop

* test case fix

* disable DSO by default on windows

* exclude warpctc_op on windows

* exclude the dynload_warpctc out on windows
test=develop

* fix the scripts error
test=develop

* disable avx on windows by default
test=develop

* re-organize the cmake file

* disable mkl on windows by default

* add warp_ctc back

* fix the dependency

* fix the dependency

* fix the build issue on windows

* remove unsupported flag on windows

* code style

* code style
test=develop

* fix issue

* add profiler, parallel_executor back

* clean up the pre-definitions on windows

* fix build issue

* test=develop


19 Nov, 2018 1 commit

Optimize the layer_norm operator with AVX intrinsic function (#14417) · f4c869d8

Committed by Yihua Xu on Nov 19, 2018

* Optimize layer_norm operator with AVX intrinsic functions

* Revert the wrong modifications

* Implement the jit kernel for layer_norm operator

* Add math header file to fix the compile issue (test=develop)

* Add math header file to fix the compile issue (test=develop)

* Fixed the intrinsic header file issue (test=develop)

* Fix the conflicts (test=develop)

* Revert for CUDA compiler (test=develop)

* Fixed the cuda dependency (test=develop)

* Fix the macro issues (test=develop)


18 Nov, 2018 1 commit
- add the jit back · a3e952f4
  Committed by peizhilin on Nov 18, 2018
```
fix compile error on windows
```
17 Nov, 2018 1 commit
- add the jit support · c75dc885
  Committed by peizhilin on Nov 17, 2018
```
test=develop
```
16 Nov, 2018 1 commit

Make nce support more distribution. (#13549) · 17226782

Committed by whs on Nov 16, 2018

* Fix truncated normal.

* Fix.

* Make nce support more distribution.

* Fix API.spec.

* Fix python API.

* Fix.
test=develop

* Fix API.spec
test=develop

* Fix sampler.

* Fix order of arguments in python API.
test=develop


08 Nov, 2018 3 commits
- remove duplicate · 41b423d4
  Committed by peizhilin on Nov 08, 2018
- merge from develop · dcfab111
  Committed by peizhilin on Nov 08, 2018
- Revert "cherry picked windows patches." · ba8b5619
  Committed by Zhaolong Xing on Nov 08, 2018
06 Nov, 2018 1 commit
- fix jit on mac · b81e1b65
  Committed by tensor-tang on Nov 06, 2018
```
test=develop
```
05 Nov, 2018 1 commit
- cpu build support · 9d67c1fb
  Committed by peizhilin on Nov 05, 2018
01 Nov, 2018 3 commits
- refine jitcode and add vmul jitcode implementation · a3377f7b
  Committed by tensor-tang on Nov 01, 2018
- refine and init jitkernel vmul · a53b1b0b
  Committed by tensor-tang on Nov 01, 2018
- add jit gencode · 2139b9f6
  Committed by tensor-tang on Nov 01, 2018
31 Oct, 2018 1 commit
- add back jit simd instructions. stage. · 31676583
  Committed by dzhwinter on Oct 31, 2018
30 Oct, 2018 1 commit
- cleard. staged · bf2e4cb1
  Committed by dzhwinter on Oct 30, 2018
26 Oct, 2018 1 commit
- add crf decode jit kernel · 21487d78
  Committed by tensor-tang on Oct 23, 2018
24 Oct, 2018 1 commit
- Add unit-test for sequence_pooling functor · 047fa2f9
  Committed by minqiyang on Oct 24, 2018
23 Oct, 2018 1 commit

Refine Split op (#13967) · a7497653

Committed by chengduo on Oct 23, 2018

* speedup split_op
test=develop

* speedup split_op
test=develop

* rename ConcatGrad to Split

* refine concat and split
test=develop

* fix compile error


22 Oct, 2018 1 commit
- add fusion gru jit kernel · 640e789d
  Committed by tensor-tang on Oct 22, 2018
18 Oct, 2018 1 commit
- fix illegal instruction of rnn1 and text · 36588b33
  Committed by tensor-tang on Oct 18, 2018
17 Oct, 2018 1 commit
- Add ceil model pooling for trt (ocr attention) · 2b5edfbc
  Committed by nhzlx on Oct 17, 2018
```
test=develop
```
11 Oct, 2018 1 commit
- Accelerate SequencePool Op on SUM mode · 0385b0a1
  Committed by minqiyang on Oct 11, 2018
```
test=develop
```

PaddlePaddle / Paddle: synced successfully about 1 year ago
