Commits · 87648f8edfdeb13a0605dc878e600730ae2ef135 · 兽拳 / Paddle

27 Nov, 2018 (4 commits)

polish code, test=develop · c3c3c0b3
Committed by JiabinYang on Nov 27, 2018

Make NCE_OP more efficient and support SelectedRows (#14469) · 56a4912b

Committed by tangwei12 on Nov 27, 2018

* Fix truncated normal.

* Fix.

* Make nce support more distribution.

* Fix API.spec.

* Fix python API.

* Fix.
test=develop

* Fix API.spec
test=develop

* Fix sampler.

* Fix order of arguments in python API.
test=develop

* NCE add selectedrows support

* NCE update weighted sampling

* fix bugs in nce_op, and assign_value_op optimized

* fix bugs in nce_op, revert assign_value_op

* nce_op optimize

* nce_op optimize

* nce_op optimize

* add selectedRows test later

test=develop

* add selectedRows supported

* add selectedRows supported

test=develop

* add selectedRows supported

* add nce selectedRows supported, test=develop

* add nce selectedRows supported

* add nce selectedRows supported, test=develop

* fix height in nce, test=develop

* add ut

* add ut, test=develop

* make AutoGrownIndex inline
test=develop

* fix tiny error, test=develop

minor fix · 38715e6f
Committed by peizhilin on Nov 27, 2018

refine code and add none bias ut, test=develop · b10df8bc
Committed by JiabinYang on Nov 27, 2018

26 Nov, 2018 (3 commits)
- refine code and comments, test=develop · 2f6b529a
  Committed by JiabinYang on Nov 26, 2018
- add comments and follow comments · 1f0291a5
  Committed by tensor-tang on Nov 26, 2018
  ```
  test=develop
  ```
- add sparse bias grad, test=develop · 02d68051
  Committed by JiabinYang on Nov 26, 2018

23 Nov, 2018 (4 commits)
- add Set/GetCPUNumThreads api · e21edb26
  Committed by luotao1 on Nov 22, 2018
- test=develop · 42470f14
  Committed by JiabinYang on Nov 23, 2018
- enable gru jitcode and refine act and lstm jitcode · 6a7f83d4
  Committed by tensor-tang on Nov 23, 2018
  ```
  test=develop
  ```
- temp · 0fca1684
  Committed by JiabinYang on Nov 23, 2018

22 Nov, 2018 (5 commits)

Refine cublas to support CUBLAS_TENSOR_OP_MATH (#13929) · 00b9e9a1

Committed by chengduo on Nov 22, 2018

* refine cublas
test=develop

* code refine

* refine cublas

* add GEMM_EX

* add enable_cublas_tensor_op_math doc and add cublasCall
test=develop

* fix CublasCall for cuda version
test=develop

* fix error
test=develop

* fix GEMM_EX to be compatible with gcc 4.8
test=develop

* add GEMM_EX
test=develop

* to be compatible with gcc4.8
test=develop

fix unit test cases · 7c8c9dc9
Committed by peizhilin on Nov 22, 2018

enable peephole jitcode · 0c5ed5f6
Committed by tensor-tang on Nov 22, 2018
```
test=develop
```

init gru jitcode and fix lstm jitcode · e3b61cf5
Committed by tensor-tang on Nov 22, 2018
```
test=develop
```

Windows/online (#14474) · d9a1f3e5

Committed by wopeizl on Nov 22, 2018

* add recordio support

* disable the openblas multi-thread on windows since no support
adjust the python script

* code style

* code style
test=develop

* add create_recordio_file_reader back

* fix code style
test=develop

* fix the gtest.cmake on windows

* fix cc_test on windows

* fix the win build
test=develop

* remove fused compile support on windows
test=develop

* add the jit support
test=develop

* add the jit support, test=develop

* add the jit support, test=develop

* add the jit back
fix compile error on windows

* rollback test=develop

* test case fix

* disable DSO by default on windows

* exclude warpctc_op on windows

* exclude the dynload_warpctc out on windows
test=develop

* fix the scripts error
test=develop

* disable avx on windows by default
test=develop

* re-organize the cmake file

* disable mkl on windows by default

* add warp_ctc back

* fix the dependency

* fix the dependency

* fix the build issue on windows

* remove unsupported flag on windows

* code style

* code style
test=develop

* fix issue

* add profiler, parallel_executor back

* clean up the pre-definitions on windows

* fix build issue

* test=develop

21 Nov, 2018 (3 commits)
- add gru refer code and remove redundant avx code · 35620513
  Committed by tensor-tang on Nov 21, 2018
  ```
  test=develop
  ```
- jitkernel lstm refer support peephole · f9138608
  Committed by tensor-tang on Nov 21, 2018
  ```
  test=develop
  ```
- test=develop · 014e50c2
  Committed by JiabinYang on Nov 21, 2018

20 Nov, 2018 (2 commits)
- refine refer code and add lstm refer code · ce31deb7
  Committed by tensor-tang on Nov 20, 2018
  ```
  test=develop
  ```
- add lstm jitcode · c2cfb03a
  Committed by tensor-tang on Nov 20, 2018

19 Nov, 2018 (1 commit)

Optimize the layer_norm operator with AVX intrinsic function (#14417) · f4c869d8

Committed by Yihua Xu on Nov 19, 2018

* Optimize layer_norm operator with AVX intrinsic functions

* Revert the wrong modifications

* Implement the jit kernel for layer_norm operator

* Add math header file to fix the compile issue (test=develop)

* Add math header file to fix the compile issue (test=develop)

* Fixed the intrinsic header file issue (test=develop)

* Fix the conflicts (test=develop)

* Revert for CUDA compiler (test=develop)

* Fixed the cuda dependency (test=develop)

* Fix the macro issues (test=develop)

18 Nov, 2018 (3 commits)
- Removing partial specialization of softmax for inference for GPU · 9b0eae30
  Committed by Jacek Czaja on Nov 18, 2018
  ```
  test=develop
  ```
- rollback test=develop · a1fa1854
  Committed by peizhilin on Nov 18, 2018
- add the jit back · a3e952f4
  Committed by peizhilin on Nov 18, 2018
  ```
  fix compile error on windows
  ```

17 Nov, 2018 (6 commits)
- fix jitcode small size · a19b3225
  Committed by tensor-tang on Nov 17, 2018
  ```
  test=develop
  ```
- add the jit support, test=develop · 928efeed
  Committed by peizhilin on Nov 17, 2018
- add the jit support, test=develop · 5e46c983
  Committed by peizhilin on Nov 17, 2018
- add the jit support · c75dc885
  Committed by peizhilin on Nov 17, 2018
  ```
  test=develop
  ```
- sigmoid and tanh support all size · 4dbdfa60
  Committed by tensor-tang on Nov 16, 2018
  ```
  test=develop
  ```
- refine exp jitcode with all size · ccb89637
  Committed by tensor-tang on Nov 16, 2018
  ```
  test=develop
  ```

16 Nov, 2018 (9 commits)
- refine relu and fix addrelu test · d3eae8f6
  Committed by tensor-tang on Nov 16, 2018
- refine act and vxx with all size · 4e67fe6a
  Committed by tensor-tang on Nov 16, 2018
- exp support all size · ba3eaed7
  Committed by tensor-tang on Nov 16, 2018
- fix build error on noavx · 1ffce8c0
  Committed by tensor-tang on Nov 16, 2018
  ```
  test=develop
  ```
- MKLDNN elementwise_mul: Move Kernel to KernelPool to avoid segfaults · c69c4160
  Committed by Michal Gallus on Nov 15, 2018
  ```
  test=develop
  ```
- Squashing MKL based softmax for inference · 513bb6c1
  Committed by Jacek Czaja on Nov 08, 2018
  ```
  test=develop

  - Added profiling to softmax functors

  - MKL based softmax inference op

  - Fix to softmax computation via MKL

  - cleaning

  - Cosmetic fixes to softmax MKL

  - Fix to ON_INFER lack of propagation
  ```
- add macro for pool2dDirectCUDAFunctor · 9b64aac4
  Committed by nhzlx on Nov 16, 2018
  ```
  test=develop
  ```
- Make nce support more distribution. (#13549) · 17226782
  Committed by whs on Nov 16, 2018
  ```
  * Fix truncated normal.

  * Fix.

  * Make nce support more distribution.

  * Fix API.spec.

  * Fix python API.

  * Fix.
  test=develop

  * Fix API.spec
  test=develop

  * Fix sampler.

  * Fix order of arguments in python API.
  test=develop
  ```
- fix avg pool trt bug and fix cpplint · b9691169
  Committed by nhzlx on Nov 16, 2018
