Commits · 458b16f42a03fd68af4da05bb93fbc6bf2a75f9e · 机器未来 / Paddle

Oct 29, 2018 · 1 commit

Rebase of seqpool-max optimization · 458b16f4

Committed by Jacek Czaja on Oct 23, 2018

test=develop

- Added rough profiling

- Profiled maxpool itself

- First draft of max seqpool optimization (is_test added)

- Added unit tests to seqpool

- Cosmetic fixes

- Fix to UT of Seq pool

Disabled grad checking for sequence max pool when is_test is set to True

- Cosmetic fix to comment

test=develop

- Fix to GPU build

test=develop

- yet another GPU fix for sequence max pool

- Fix to comment

test=develop

- Change to API of sequence_pool

test=develop

- Yet another API spec change

test=develop

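The seqpool-max commit above adds an `is_test` flag and disables gradient checking when it is set. One plausible reading is that inference can skip recording which row produced each maximum, since that index is only needed by the backward pass. A minimal C++ sketch of that idea, assuming row-major input, LoD-style sequence offsets, and non-empty sequences; `SeqMaxPool` and its signature are hypothetical, not Paddle's actual functor:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch (not Paddle's functor): max-pool each sequence of a
// batch whose rows are stored row-major in `in`, `dim` floats per row.
// `offsets` marks sequence boundaries, e.g. {0, 2, 5} = rows [0,2) and [2,5).
// Assumes every sequence has at least one row.
void SeqMaxPool(const std::vector<float>& in,
                const std::vector<std::size_t>& offsets, std::size_t dim,
                bool is_test, std::vector<float>* out,
                std::vector<int>* index) {
  const std::size_t num_seq = offsets.size() - 1;
  out->assign(num_seq * dim, 0.f);
  // Training mode keeps the argmax row of every output element for backward.
  if (!is_test) index->assign(num_seq * dim, -1);
  for (std::size_t s = 0; s < num_seq; ++s) {
    const std::size_t begin = offsets[s], end = offsets[s + 1];
    for (std::size_t d = 0; d < dim; ++d) {
      float best = in[begin * dim + d];
      if (is_test) {
        // Inference: only the maximum value is needed, no index bookkeeping.
        for (std::size_t r = begin + 1; r < end; ++r)
          best = std::max(best, in[r * dim + d]);
      } else {
        int best_row = static_cast<int>(begin);
        for (std::size_t r = begin + 1; r < end; ++r) {
          if (in[r * dim + d] > best) {
            best = in[r * dim + d];
            best_row = static_cast<int>(r);
          }
        }
        (*index)[s * dim + d] = best_row;
      }
      (*out)[s * dim + d] = best;
    }
  }
}
```

Both branches produce the same pooled values; only the training branch pays for tracking and storing `best_row`, and with `is_test` set the `index` pointer may be null.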

Oct 25, 2018 · 1 commit
- fix avx error · d24d282a
  Committed by tensor-tang on Oct 25, 2018
```
test=develop
```
Oct 24, 2018 · 6 commits
- Move code to SumSeqPoolGradFunctor · 2468057d
  Committed by minqiyang on Oct 24, 2018
```
test=develop
```
- Fix copy wrong pos bug · 9725db0d
  Committed by minqiyang on Oct 24, 2018
```
test=develop
```
- Accelerate sequence_pool functor · 9c687090
  Committed by minqiyang on Oct 24, 2018
- Add gpu support for unittest · 14ebc424
  Committed by minqiyang on Oct 24, 2018
- Polish unit test code · bd5a82e1
  Committed by minqiyang on Oct 24, 2018
- Add unit-test for sequence_pooling functor · 047fa2f9
  Committed by minqiyang on Oct 24, 2018
Oct 23, 2018 · 2 commits
- optimize fusion gru kernel at size 8 · 159be8cc
  Committed by tensor-tang on Oct 23, 2018
- Refine Split op (#13967) · a7497653
  Committed by chengduo on Oct 23, 2018
```
* speedup split_op
test=develop

* speedup split_op
test=develop

* rename ConcatGrad to Split

* refine concat and split
test=develop

* fix compile error
```
Oct 22, 2018 · 1 commit
- add fusion gru jit kernel · 640e789d
  Committed by tensor-tang on Oct 22, 2018
Oct 19, 2018 · 2 commits
- refine and add eltadd_relu unit test · e5ce9659
  Committed by tensor-tang on Oct 19, 2018
- fuse elementwise_add and relu · 7cb19a59
  Committed by tensor-tang on Oct 19, 2018
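The Oct 19 commits fuse `elementwise_add` and `relu`. As a rough illustration of what such a fusion buys, assuming both inputs share one shape (no broadcasting), a fused loop computes max(x + y, 0) in a single pass instead of writing the sum to a temporary tensor and reading it back for the activation. `FusedAddRelu` is a hypothetical name, not Paddle's kernel:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: elementwise_add followed by relu, fused into one loop.
// Assumes x and y have identical shapes (no broadcasting). The sum x + y is
// never stored to a temporary buffer before the activation is applied.
std::vector<float> FusedAddRelu(const std::vector<float>& x,
                                const std::vector<float>& y) {
  assert(x.size() == y.size());
  std::vector<float> out(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    out[i] = std::max(x[i] + y[i], 0.f);
  }
  return out;
}
```

The unfused version would need two full passes over memory and an extra buffer the size of the output; the fused loop touches each element once.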
Oct 18, 2018 · 3 commits
- clean code exp avx · 74843558
  Committed by tensor-tang on Oct 18, 2018
- fix illegal instruction of rnn2 · b4751a34
  Committed by tensor-tang on Oct 18, 2018
- fix illegal instruction of rnn1 and text · 36588b33
  Committed by tensor-tang on Oct 18, 2018
Oct 17, 2018 · 3 commits
- fix warning and mac compile · e69328c3
  Committed by tensor-tang on Oct 17, 2018
```
test=develop
```
- Add ceil mode pooling for trt (ocr attention) · 2b5edfbc
  Committed by nhzlx on Oct 17, 2018
```
test=develop
```
- test=develop · 4b4af84e
  Committed by sneaxiy on Oct 16, 2018
Oct 12, 2018 · 3 commits
- refine and replace lstm peephole kernel · 8e182170
  Committed by tensor-tang on Oct 12, 2018
- optimize depthwise conv by register memory (#13778) · 5f2e8378
  Committed by Dun on Oct 12, 2018
```
* optimize depthwise conv by register memory
* test=develop
```
- Polish code · 3f6ec900
  Committed by minqiyang on Oct 12, 2018
```
test=develop
```
Oct 11, 2018 · 3 commits
- init peephole runtime kernel · 7ef2699e
  Committed by tensor-tang on Oct 11, 2018
- Accelerate SequencePool Op on SUM mode · 0385b0a1
  Committed by minqiyang on Oct 11, 2018
```
test=develop
```
- Accelerate SelectedRows Functors: · 8ec748cf
  Committed by minqiyang on Oct 11, 2018
```
  1. Accelerate SelectedRows MergeAdd functor

  2. Add SelectedRowsSumTo functor to support MergeAdd multiple SelectedRows into one

test=develop
```
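Commit 8ec748cf accelerates the SelectedRows MergeAdd functor and adds SelectedRowsSumTo so several SelectedRows can be merged into one. A minimal sketch of those merge semantics, assuming a SelectedRows value is a list of possibly repeated row ids plus one value row per id. `SparseRows` and `MergeAdd` are hypothetical stand-ins, and the ordered `std::map` is chosen for clarity rather than speed:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for a SelectedRows tensor: `rows[i]` is the row id of
// the i-th value row, and ids may repeat.
struct SparseRows {
  std::vector<std::int64_t> rows;
  std::vector<float> values;  // rows.size() * width floats, row-major
  std::size_t width;
};

// Merge any number of inputs into one SparseRows, summing all value rows that
// share an id. std::map keeps the output ids sorted and unique.
SparseRows MergeAdd(const std::vector<SparseRows>& inputs, std::size_t width) {
  std::map<std::int64_t, std::vector<float>> acc;
  for (const SparseRows& in : inputs) {
    assert(in.width == width);
    for (std::size_t i = 0; i < in.rows.size(); ++i) {
      std::vector<float>& dst = acc[in.rows[i]];
      dst.resize(width, 0.f);  // no-op once the row has been seen
      for (std::size_t d = 0; d < width; ++d) {
        dst[d] += in.values[i * width + d];
      }
    }
  }
  SparseRows out{{}, {}, width};
  for (const auto& kv : acc) {
    out.rows.push_back(kv.first);
    out.values.insert(out.values.end(), kv.second.begin(), kv.second.end());
  }
  return out;
}
```

Passing a single input deduplicates its rows (the MergeAdd case); passing several inputs sums them into one result (the SumTo case). Paddle's own functors may use a different container or run on GPU.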
Oct 09, 2018 · 4 commits
- thread local jit kernels · 3ee8f2c6
  Committed by tensor-tang on Oct 09, 2018
```
test=develop
```
- replace the lstm compute with jitkernel · 9131a356
  Committed by tensor-tang on Oct 09, 2018
```
test=develop
```
- add lstm compute unit test · b55c2476
  Committed by tensor-tang on Oct 09, 2018
- optimize lstm jitkernel keq8 · 2a009691
  Committed by tensor-tang on Oct 09, 2018
```
test=develop
```
Oct 08, 2018 · 3 commits
- add vrelu and lstm kernel · f2adaf1c
  Committed by tensor-tang on Oct 08, 2018
```
test=develop
```
- refine code and fix · e6d8aca3
  Committed by tensor-tang on Oct 08, 2018
- fix bug vtanh · 2513b2cc
  Committed by tensor-tang on Sep 30, 2018
Sep 30, 2018 · 2 commits
- add vtanh and unit test · cf8c8e72
  Committed by tensor-tang on Sep 30, 2018
- "fix compile error" (#13579) · 26771f41
  Committed by dzhwinter on Sep 30, 2018
```
* "fix compile error"

* "fix ci"

* rerun ci
test=develop

* test=develop

rerun ci
```
Sep 29, 2018 · 6 commits
- add vaddbias and unit test · d10a9df7
  Committed by tensor-tang on Sep 29, 2018
- add vsigmoid avx implementations and unit test · 3c8b6511
  Committed by tensor-tang on Sep 29, 2018
- refine code and init vsigmoid · 55e44761
  Committed by tensor-tang on Sep 29, 2018
- Avoid multiple definitions of lstm_compute_ctht when linking libpaddle_fluid.so · 1940bc2d
  Committed by wangguibao on Sep 29, 2018
```
test=develop
```
- fix sparse rmsprop · 584c3f04
  Committed by sneaxiy on Sep 29, 2018
- Optimization of Kernels that related to DeepLabv3+ (#13534) · 161c3e31
  Committed by Dun on Sep 29, 2018
```
* refine reduce by cub
* optimize KernelDepthwiseConvFilterGrad
* optimize depthwise conv and reduce mean and reduce sum
* fix bug: dilation
* cuda arch and cuda 8 compatible
```

机器未来 / Paddle is in sync with the upstream project it was forked from.
