Commits · dfbdece55cadb88f1eeeed7c8159a3b4f37f0c0d · BaiXuePrincess / Paddle

Nov 22, 2018 · 3 commits

Refine cublas to support CUBLAS_TENSOR_OP_MATH (#13929) · 00b9e9a1

Committed by chengduo on Nov 22, 2018

* refine cublas
test=develop

* code refine

* refine cublas

* add GEMME_EX

* add enable_cublas_tensor_op_math doc and add cublasCall
test=develop

* fix CublasCall for cuda version
test=develop

* fix error
test=develop

* fix GEMM_EX to be compatible with gcc 4.8
test=develop

* add GEMM_EX
test=develop

* to be compatible with gcc 4.8
test=develop

Group Norm (#13843) · ae7d2286
Committed by Dun on Nov 22, 2018
```
Add group normalization operator.
```
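
Group normalization splits the C channels of each sample into G groups and normalizes every group with its own mean and (biased) variance, then optionally applies a per-channel scale and bias. The sketch below is a minimal pure-Python illustration of that idea for a single sample stored as a list of channels. It is not the Paddle operator's implementation; the helper name `group_norm`, its argument layout, and the optional `scale`/`bias` lists are assumptions made for illustration.

```python
import math
from typing import List, Optional


def group_norm(x: List[List[float]], num_groups: int, eps: float = 1e-5,
               scale: Optional[List[float]] = None,
               bias: Optional[List[float]] = None) -> List[List[float]]:
    """Group-normalize one sample laid out as C channels x spatial values.

    Hypothetical helper for illustration only; not Paddle's API.
    """
    channels = len(x)
    if num_groups <= 0 or channels % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    per_group = channels // num_groups
    out: List[List[float]] = []
    for g in range(num_groups):
        # Channels [g*per_group, (g+1)*per_group) share one set of statistics.
        group = x[g * per_group:(g + 1) * per_group]
        values = [v for ch in group for v in ch]
        mean = sum(values) / len(values)
        # Biased (population) variance over every element in the group.
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        for offset, ch in enumerate(group):
            c = g * per_group + offset
            # Assumed optional per-channel affine parameters.
            gamma = scale[c] if scale is not None else 1.0
            beta = bias[c] if bias is not None else 0.0
            out.append([(v - mean) * inv_std * gamma + beta for v in ch])
    return out
```

With `num_groups=1` all channels share one set of statistics (layer-norm-like over C×H×W); with `num_groups` equal to the channel count each channel is normalized on its own (instance-norm-like).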

Windows/online (#14474) · d9a1f3e5

Committed by wopeizl on Nov 22, 2018

* add recordio support

* disable the openblas multi-thread on windows since it is not supported
adjust the python script

* code style

* code style
test=develop

* add create_recordio_file_reader back

* fix code style
test=develop

* fix the gtest.cmake on windows

* fix cc_test on windows

* fix the win build
test=develop

* remove fused compile support on windows
test=develop

* add the jit support
test=develop

* add the jit support, test=develop

* add the jit support, test=develop

* add the jit back
fix compile error on windows

* rollback test=develop

* test case fix

* disable DSO by default on windows

* exclude warpctc_op on windows

* exclude the dynload_warpctc out on windows
test=develop

* fix the scripts error
test=develop

* disable avx on windows by default
test=develop

* re-organize the cmake file

* disable mkl on windows by default

* add warp_ctc back

* fix the dependency

* fix the dependency

* fix the build issue on windows

* remove unsupported flag on windows

* code style

* code style
test=develop

* fix issue

* add profiler, parallel_executor back

* clean up the pre-definitions on windows

* fix build issue

* test=develop

Nov 21, 2018 · 2 commits
- fix(Compile): fix depends error when compile op using cub · 3edd32d0
  Committed by Yu Yang on Nov 21, 2018
```
Some operators depend on cub and xxhash through their headers. The dependency should be declared explicitly on those operators rather than on pybind.

test=develop
```
- Fix compiling with cuDNN v5 · cda60311
  Committed by Dang Qingqing on Nov 20, 2018
```
test=develop
```
Nov 20, 2018 · 3 commits
- Add the macro for NVCC (test=develop) · a906a361
  Committed by Yihua Xu on Nov 20, 2018
- Revert "Remove the remnant code (test=develop)" · d91740ac
  Committed by Yihua Xu on Nov 20, 2018
```
This reverts commit be506703.
```
- Remove the remnant code (test=develop) · be506703
  Committed by Yihua Xu on Nov 20, 2018
Nov 19, 2018 · 4 commits

Modify some infer-shape about detection operators in compile-time. (#14483) · 9eefd2c7
Committed by qingqing01 on Nov 19, 2018
```
* Modify some infer-shape in compile-time.
```

Optimize the layer_norm operator with AVX intrinsic function (#14417) · f4c869d8

Committed by Yihua Xu on Nov 19, 2018

* Optimize layer_norm operator with AVX intrinsic functions

* Revert the wrong modifications

* Implement the jit kernel for layer_norm operator

* Add math headfile to fix the compile issue (test=develop)

* Add math headfile to fix the compile issue (test=develop)

* Fixed the intrinsic headfile issue (test=develop)

* Fix the conflicts (test=develop)

* Revert for CUDA compiler (test=develop)

* Fixed the cuda dependency (test=develop)

* Fix the macro issues (test=develop)

Convolution fusion operator. (#14449) · fd7e6431
Committed by qingqing01 on Nov 19, 2018
```
* Convolution fusion operator.
* Clean code
test=develop
```

fix dist deps (#14471) · d7bd0361

Committed by Wu Yi on Nov 19, 2018

* fix dist deps test=develop

* update test=develop

* update test=develop

* update test=develop

* update test=develop

Nov 18, 2018 · 1 commit
- Removing partial specialization of softmax for inference for GPU · 9b0eae30
  Committed by Jacek Czaja on Nov 18, 2018
```
test=develop
```
Nov 17, 2018 · 4 commits
- fix jitcode small size · a19b3225
  Committed by tensor-tang on Nov 17, 2018
```
test=develop
```
- Fix to GPU · be80bb4f
  Committed by Jacek Czaja on Nov 16, 2018
```
test=develop
```
- sigmoid and tanh support all size · 4dbdfa60
  Committed by tensor-tang on Nov 16, 2018
```
test=develop
```
- refine exp jitcode with all size · ccb89637
  Committed by tensor-tang on Nov 16, 2018
```
test=develop
```
Nov 16, 2018 · 23 commits
- refine relu and fix addrelu test · d3eae8f6
  Committed by tensor-tang on Nov 16, 2018
- refine act and vxx with all size · 4e67fe6a
  Committed by tensor-tang on Nov 16, 2018
- exp support all size · ba3eaed7
  Committed by tensor-tang on Nov 16, 2018
- fix build error on noavx · 1ffce8c0
  Committed by tensor-tang on Nov 16, 2018
```
test=develop
```
- MKLDNN elementwise_mul: Move Kernel to KernelPool to avoid segfaults · c69c4160
  Committed by Michal Gallus on Nov 15, 2018
```
test=develop
```
- MKLDNN elementwise_mul: Check if AVX512 is available · 785066eb
  Committed by Michal Gallus on Nov 13, 2018
```
test=develop
```
- MKLDNN elementwise_mul: Lint changes to UT & integration · 08f63c4d
  Committed by Michal Gallus on Nov 13, 2018
```
test=develop
```
- MKLDNN elementwise_mul: Reorder on non-nchw input, fallback on non-16 divisible fm · 49b09327
  Committed by Michal Gallus on Nov 9, 2018
```
test=develop
```
- MKLDNN elementwise_mul: Parallelize mul · d14858e4
  Committed by Michal Gallus on Nov 6, 2018
- MKLDNN elementwise_mul: Support NCHW, update UT · ed31936b
  Committed by Michal Gallus on Nov 6, 2018
- MKLDNN elementwise_mul: h and w loops implemented in xbyak · 700bcbf7
  Committed by Tomasz Patejko on Oct 28, 2018
- MKLDNN elementwise_mul: CPU tests initially refactored. MKLDNN mul test for broadcast added · ad09faca
  Committed by Tomasz Patejko on Oct 26, 2018
- MKLDNN elementwise_mul: simple xbyak version for AVX512 · 2d73ad18
  Committed by Tomasz Patejko on Oct 25, 2018
- MKLDNN elementwise_add: simple initial implementation of the operator for MKLDNN format · 213ec37d
  Committed by Tomasz Patejko on Oct 25, 2018
- Refine operator cmake (#14413) · a2d9b344
  Committed by Wu Yi on Nov 16, 2018
```
* wip simplify operator framework

* wip

* wip

* done test=develop

* clean test=develop

* fix test=develop

* fix deps test=develop

* fix cpu build test=develop

* fix tensorrt build test=develop

* fix tests test=develop

* fix test=develop

* fix cpu build test=develop
```
- fix space_to_depth_op unicode problem (#14430) · 28bd5b7b
  Committed by Jiabin Yang on Nov 16, 2018
```
* fix space_to_depth_op unicode problem

* test=develop
```
- Squashing MKL based softmax for inference · 513bb6c1
  Committed by Jacek Czaja on Nov 8, 2018
```
test=develop

- Added profiling to softmax functors

- MKL based softmax inference op

- Fix to softmax computation via MKL

- cleaning

- Cosmetic fixes to softmax MKL

- Fix to ON_INFER lack of propagation
```
- add macro for pool2dDirectCUDAFunctor · 9b64aac4
  Committed by nhzlx on Nov 16, 2018
```
test=develop
```
- Make nce support more distribution. (#13549) · 17226782
  Committed by whs on Nov 16, 2018
```
* Fix truncated normal.

* Fix.

* Make nce support more distribution.

* Fix API.spec.

* Fix python API.

* Fix.
test=develop

* Fix API.spec
test=develop

* Fix sampler.

* Fix order of arguments in python API.
test=develop
```
- fix avg pool trt bug and fix cpplint · b9691169
  Committed by nhzlx on Nov 16, 2018
- exp, sigmoid, tanh jitcode support more size · 1f00723f
  Committed by tensor-tang on Nov 16, 2018
```
test=develop
```
- Add cudnn ctc loss (#12366) · b32c13dc
  Committed by Wu Yi on Nov 16, 2018
```
* add cudnn ctc loss

* wip add test test=develop

* wip

* wip

* done test=develop

* move include cudnn test=develop

* test test=develop

* fix build test=develop

* fix build test=develop

* fix build on cudnn5 test=develop

* fix cudnn5 build test=develop

* fix cudnn5 build test=develop

* merge develop softmax functor change test=develop
```
- remove ComputeDeprecated · e2d6eddd
  Committed by tensor-tang on Nov 16, 2018
```
test=develop
```

BaiXuePrincess / Paddle is in sync with the upstream project it was forked from
