Commits · ef129718eac8758124605b674af07a040c235119 · BaiXuePrincess / Paddle

14 Jan, 2018 1 commit

"cudnn operators change to cudnn kernel" (#6660) · 5ad1aef0

Committed by dzhwinter on Jan 14, 2018

* "unified operators"

* "add CUDNN register"

* "add use cudnn attribute"

* "add attribute"

* "test conv transpose op"

* "remove duplicated attr"

* "fix op test"

* "add attribute to set cudnn"

* "add more log"

* "need layout op register support"

* "add more log"

* "change GetExpectedKernelType"

* "fix Get attr in conv_op"

* "fix CI"

* "fix tests"

* "removed kernel priority fallback"

* "fix CI"

* "fix stack pointer bug"

* "refine buggy interface"

* "add const cast to save life"

* "fix get_output_with_grad"

* "fix op test with dataformat"

* "fix pooling"

* "fix pooling test"

* "fix CI"

* "fix with_gpu error"

* "add transform needed functional check"

* "fix unpack list error"

* "comment out parallel.do temporarily"

* "fix CI"

* "fix compile doc error"

* "make threshold larger"

5ad1aef0

09 Jan, 2018 1 commit

Port WarpCTC Operator (#5107) · b5fda272

Committed by Yiqun Liu on Jan 09, 2018

* Add Seq2BatchFunctor, which will be used in WarpCTCOp.

* Implement WarpCTCFunctor and WarpCTCKernel.

* Add unittest of warpctc_op.

* Modify the check_output interface in the python unittest framework to allow checking a subset of outputs.

* Use absolute offset lod in warpctc_op and related functors.

* Refine the comments of warpctc_op.

* The new python unittest supports checking a subset of the outputs, so revoke the previous change.

* Rename the transform from LoDTensor to Tensor with shape [max_sequence_length, num_sequences, sequence_width] to PaddingSequenceFunctor.

* Update to the newest codes.

* Rename the PaddingSequenceFunctor to PaddingLoDTensorFunctor and move the computation of dimensions out of the functors.

b5fda272
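
The padding transform named in this commit (a LoDTensor turned into a Tensor of shape [max_sequence_length, num_sequences, sequence_width], using absolute-offset LoD) can be illustrated with a minimal pure-Python sketch. The function name `pad_lod_tensor`, its arguments, and the zero pad value are assumptions made for illustration; this is not the code of PaddingLoDTensorFunctor.

```python
def pad_lod_tensor(rows, lod, pad_value=0.0):
    """Pad packed sequences to shape [max_len, num_seqs, width].

    rows: packed time steps, one list of `width` numbers per step.
    lod:  absolute offsets; sequence i occupies rows[lod[i]:lod[i + 1]].
    """
    num_seqs = len(lod) - 1
    lengths = [lod[i + 1] - lod[i] for i in range(num_seqs)]
    max_len = max(lengths) if lengths else 0
    width = len(rows[0]) if rows else 0

    # Start from a fully padded buffer, then copy each real time step in.
    padded = [[[pad_value] * width for _ in range(num_seqs)]
              for _ in range(max_len)]
    for i in range(num_seqs):
        for t in range(lengths[i]):
            padded[t][i] = list(rows[lod[i] + t])
    return padded


# Two sequences of lengths 2 and 3, each step of width 1 (hypothetical data).
rows = [[1.0], [2.0], [3.0], [4.0], [5.0]]
padded = pad_lod_tensor(rows, lod=[0, 2, 5])
```

Here the shorter first sequence is padded at its third time step, so the time-major layout has one row per step and one column per sequence.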

26 Dec, 2017 1 commit

unify the indentation of license · 761b3297

Committed by Luo Tao on Dec 26, 2017

761b3297

24 Dec, 2017 1 commit

Feature/operator run place (#6783) · 735eba29

Committed by dzhwinter on Dec 24, 2017

* "change operator interface"

* "move devicepool to device_context"

* "fix operator test"

* "fix op_registry Run interface"

* "net op passed. Need to fix nccl multi-Context"

* "add nccl group function"

* "add nccl group function"

* "fix gpu count exceed 32 error"

* "fix recurrent op, nccl op"

* "change the other operators interface with Place"

* "fix typo"

* "fix pybind"

* "fix device in python side"

* "fix pybind failed"

* "add init for test"

* "fix CI"

735eba29

15 Dec, 2017 1 commit

Fix compile on CUDA9.1 & MacOS (#6642) · d5cab4f0

Committed by Yu Yang on Dec 15, 2017

d5cab4f0

07 Dec, 2017 1 commit

Add HasCUDNN to detect if CUDNN is installed or not (#6349) · f291abfc

Committed by Yu Yang on Dec 07, 2017

* Add HasCUDNN to detect if CUDNN is installed or not

* Fix CI

f291abfc

29 Nov, 2017 1 commit

Fix compile on cudnn7 (#5982) · 4ecbab42

Committed by 武毅 (Wu Yi) on Nov 29, 2017

* fix compile on cudnn7

* update

* update

* make silent

4ecbab42

24 Nov, 2017 1 commit

Make enforce target (#5889) · c9172c1c

Committed by Qiao Longfei on Nov 24, 2017

* make enforce a target and dependent on nccl when gpu is enabled

* add some more dependency

c9172c1c

11 Nov, 2017 2 commits

Use G++ to compile some cu operators. · f5e36765

Committed by dangqingqing on Nov 11, 2017

f5e36765

Fix a dead lock bug for dyload/nccl.h when nccl lib cannot be loaded (#5533) · 2378679a

Committed by emailweixu on Nov 10, 2017

It is caused by a bug in std::call_once described in https://stackoverflow.com/questions/41717579/stdcall-once-hangs-on-second-call-after-callable-threw-on-first-call. It is likely caused by a deeper bug in pthread_once, which is discussed in https://patchwork.ozlabs.org/patch/482350/

2378679a

26 Oct, 2017 1 commit

Cudnn batch norm op (#5067) · 56b723c4

Committed by Qiao Longfei on Oct 25, 2017

* init cudnn batch norm op

* rename batch_norm_cudnn_op.cc to batch_norm_op.cu

* correct name style

* add ExtractNCWHD, simplify code

* fix ExtractNCWHD

* use CUDNN_ENFORCE instead of PADDLE_ENFORCE

56b723c4

24 Oct, 2017 2 commits

Use external project for NCCL (#5028) · 94e741d6

Committed by Yu Yang on Oct 23, 2017

94e741d6

Feature/nccl dso (#5001) · 43c6ff21

Committed by Yu Yang on Oct 23, 2017

* "add nccl enforce"

* Dev

* Update comment

* Add nccl test

* Follow comments

43c6ff21

18 Oct, 2017 1 commit

MatMul operator (#4856) · 16489827

Committed by Markus Kliegl on Oct 17, 2017

* initial matmul operator

Similar to np.matmul, but also has transpose_X and transpose_Y flags,
and only supports tensors from rank 1 to 3 inclusive.

For GPU, uses cublas?gemmStridedBatched. For CPU, uses
cblas_?gemm_batch if available via MKL; otherwise a simple serial
implementation that loops over the batch dimension is employed for now.

16489827
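
The behaviour described in this commit message (transpose flags plus a serial loop over the batch dimension on the CPU fallback path) can be sketched in plain Python. This is only an illustration under stated assumptions: the names `matmul_2d` and `batched_matmul` are hypothetical, the flags are spelled `transpose_x` and `transpose_y` here, and rank-1 inputs are left out for brevity. It is not the operator's actual implementation.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul_2d(x, y, transpose_x=False, transpose_y=False):
    """Multiply two rank-2 inputs, optionally transposing either one first."""
    if transpose_x:
        x = transpose(x)
    if transpose_y:
        y = transpose(y)
    if len(x[0]) != len(y):
        raise ValueError("inner dimensions do not match")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def batched_matmul(xs, ys, transpose_x=False, transpose_y=False):
    """Serial fallback for rank-3 inputs: one 2-D product per batch entry."""
    if len(xs) != len(ys):
        raise ValueError("batch sizes do not match")
    return [matmul_2d(x, y, transpose_x, transpose_y)
            for x, y in zip(xs, ys)]


# Hypothetical 2x2 inputs.
x = [[1, 2], [3, 4]]
y = [[5, 6], [7, 8]]
plain = matmul_2d(x, y)
with_ty = matmul_2d(x, y, transpose_y=True)
batched = batched_matmul([x, x], [y, y], transpose_x=True)
```

A batched GEMM routine would replace the Python loop in `batched_matmul` with a single library call; the loop form only mirrors the "simple serial implementation" mentioned above.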

16 Oct, 2017 1 commit

"fix enforce error" · d8aebaf5

Committed by Dong Zhihong on Oct 15, 2017

d8aebaf5

15 Oct, 2017 1 commit

"add enforce check" · 54d3dbd8

Committed by Dong Zhihong on Oct 14, 2017

54d3dbd8

31 Aug, 2017 1 commit

Add unit testing for cuDNN wrapper. · 20713222

Committed by dangqingqing on Aug 31, 2017

20713222

10 Aug, 2017 4 commits

Add curandGenerateNormal to curand.h · d2995288

Committed by Yu Yang on Aug 10, 2017

d2995288

format code · 688c43b1

Committed by qijun on Aug 10, 2017

688c43b1

Fix gaussian_random_op compile error · 45911102

Committed by Yu Yang on Aug 10, 2017

* Should always use `dynload::` for cuda function.

* Fix cublas.h without DSO load.

45911102

fix bug in dynload · 5f1081d8

Committed by qijun on Aug 10, 2017

5f1081d8

04 Aug, 2017 1 commit

Add cpplint for *.h and cuda *.cu · b58725bd

Committed by liaogang on Aug 04, 2017

b58725bd

15 Jul, 2017 1 commit

ENH: unify PADDLE_ENFORCE · f812de2c

Committed by liaogang on Jul 15, 2017

f812de2c

13 Jul, 2017 1 commit

fix bug in dynload · 4e918377

Committed by qijun on Jul 13, 2017

4e918377

12 Jul, 2017 1 commit

split device_context · 14d2c399

Committed by qijun on Jul 12, 2017

14d2c399

11 Jul, 2017 2 commits

fix cublas dynload bug · 69d76812

Committed by qijun on Jul 11, 2017

69d76812

Refine CUDA Related libraries · a0466053

Committed by Yu Yang on Jul 11, 2017

a0466053

04 Jul, 2017 3 commits

fix wrong including header-file in files in paddle/platform/dynload dir · e6fcdd47

Committed by qijun on Jul 04, 2017

e6fcdd47

Delete cmake in dynload · 379434b2

Committed by liaogang on Jul 04, 2017

379434b2

move to dynload directory · 3567ea6d

Committed by qijun on Jul 04, 2017

3567ea6d
