Commits · 4d8f39b8530c30a190f354204003153c2a11450d · BaiXuePrincess / Paddle

12 Dec, 2017 1 commit

Refine device context (#6433) · 61ec0b95

Committed by QI JUN on Dec 12, 2017

The main changes are:

- take `DeviceContext` as the template parameter of math functors and OpKernel instead of `Place`
- remove `eigen_device` interface in base class `DeviceContext`
- remove `GetEigenDevice` interface in `ExecutionContext` and base class `DeviceContext`
- remove unused `platform::EigenDeviceConverter`
- rename `REGISTER_OP_GPU_KERNEL` to `REGISTER_OP_CUDA_KERNEL`
- rename `USE_GPU_ONLY_OP` to `USE_CUDA_ONLY_OP`
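
The first item in the list above can be pictured with a minimal sketch. All type names here (`CPUDeviceContext`, `ScaleFunctor`, `ScaleKernel`) are hypothetical stand-ins chosen for illustration, not Paddle's actual classes; the point is only that both the math functor and the kernel are specialized on the device context type, so the kernel can hand its context straight to the functor.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a device context; not Paddle's real class.
struct CPUDeviceContext {};

// Math functor parameterized on the device context type (primary template).
template <typename DeviceContext, typename T>
struct ScaleFunctor;

// CPU specialization: scales every element in place.
template <typename T>
struct ScaleFunctor<CPUDeviceContext, T> {
  void operator()(const CPUDeviceContext& /*ctx*/, T factor,
                  std::vector<T>* data) const {
    assert(data != nullptr);
    for (T& v : *data) v *= factor;
  }
};

// Kernel parameterized on the same type, so it forwards its context
// to the functor without any place-to-device conversion step.
template <typename DeviceContext, typename T>
struct ScaleKernel {
  void Compute(const DeviceContext& ctx, T factor,
               std::vector<T>* data) const {
    ScaleFunctor<DeviceContext, T>()(ctx, factor, data);
  }
};
```

Under this assumed layout, a GPU build would add a second specialization of `ScaleFunctor` for a CUDA context type, and `ScaleKernel` itself would not change.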


07 Dec, 2017 2 commits

- Remove DeviceContext::Finish · 6b9567e0
  Committed by Yang Yu on Dec 07, 2017
- Add HasCUDNN to detect if CUDNN is installed or not (#6349) · f291abfc
  Committed by Yu Yang on Dec 07, 2017
  ```
  * Add HasCUDNN to detect if CUDNN is installed or not

  * Fix CI
  ```
05 Dec, 2017 1 commit

- fix bug in gpu default memory allocating policy (#6268) · 96a5f96c
  Committed by QI JUN on Dec 05, 2017
01 Dec, 2017 3 commits

- change GPU memory allocating policy (#6159) · d066b07f
  Committed by QI JUN on Dec 01, 2017
  ```
  * change GPU memory allocating policy

  * fix potential overflow bug
  ```
- code refine (#6164) · e50f3570
  Committed by chengduo on Dec 01, 2017
- Fix the performance problem of enforce (#6085) · 8ac02279
  Committed by Yu Yang on Dec 01, 2017
  ```
  * Fix performance problem of enforce

  * Fix missing `;` in code

  * Fix CI
  ```
29 Nov, 2017 1 commit

- Fix compile on cudnn7 (#5982) · 4ecbab42
  Committed by 武毅 on Nov 29, 2017
  ```
  * fix compile on cudnn7

  * update

  * update

  * make silent
  ```
28 Nov, 2017 2 commits

- Refine paddle/v2/fluid/profiler.py. · 5e7e90ce
  Committed by dangqingqing on Nov 28, 2017
- Refine paddle/v2/fluid/profiler.py. · 696b0253
  Committed by dangqingqing on Nov 28, 2017
27 Nov, 2017 3 commits

- Add cuda profiler tools and expose it in Python. · 623f62a7
  Committed by dangqingqing on Nov 27, 2017
- Add cuda profiler tools. · 6cf2dcbc
  Committed by dangqingqing on Nov 27, 2017
- Conv cudnn 3d (#5783) · a06bec12
  Committed by 武毅 on Nov 27, 2017
  ```
  * conv cudnn 3d

  * update test case

  * update

  * update

  * follow comments and remove groups from helper

  * update

  * refine

  * update

  * follow comments2

  * update

  * fix compile
  ```
24 Nov, 2017 1 commit

Make enforce target (#5889) · c9172c1c

Committed by Qiao Longfei on Nov 24, 2017

* make enforce a target and dependent on nccl when gpu is enabled

* add some more dependency


23 Nov, 2017 1 commit

- Feature/support int64 for sum (#5832) · c077a6d5
  Committed by Yu Yang on Nov 23, 2017
  ```
  * Support int64 for sum op

  * Refine code
  ```
16 Nov, 2017 1 commit

- "fix accuracy kernel bug" (#5673) · e97b8987
  Committed by dzhwinter on Nov 15, 2017
  ```
  * "fix accuracy kernel bug"

  * "relaunch ci"
  ```
15 Nov, 2017 1 commit

- fix data layout · 74912c7d
  Committed by chengduoZH on Nov 15, 2017
13 Nov, 2017 3 commits

- add cudnn_pool3d unit test · ec1e2fc9
  Committed by chengduoZH on Nov 13, 2017
- add cudnn 3d unit test · a93a59ec
  Committed by chengduoZH on Nov 13, 2017
- Fix GPU Compile on Linux · 17405027
  Committed by Yang Yu on Nov 13, 2017
11 Nov, 2017 2 commits

Use G++ to compile some cu operators. · f5e36765

Committed by dangqingqing on Nov 11, 2017

Fix a deadlock bug for dyload/nccl.h when nccl lib cannot be loaded (#5533) · 2378679a

Committed by emailweixu on Nov 10, 2017

It is caused by a bug in std::call_once described in https://stackoverflow.com/questions/41717579/stdcall-once-hangs-on-second-call-after-callable-threw-on-first-call. It is likely caused by a deeper bug in pthread_once, which is discussed in https://patchwork.ozlabs.org/patch/482350/
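
The commit message does not say how the hang was avoided. One possible workaround, sketched below purely as an assumption, is to skip `std::call_once` and guard a one-shot attempt with a mutex and a flag, so a failing load is recorded instead of leaving the once-state half-initialized. The class name `LazyLoader` and its members are hypothetical and are not taken from dyload/nccl.h.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <mutex>
#include <string>

// Hypothetical one-shot loader; illustrative only.
class LazyLoader {
 public:
  // Runs try_load at most once. If it throws a std::exception, the
  // failure is cached and later calls return false without retrying.
  bool EnsureLoaded(const std::function<void()>& try_load) {
    assert(static_cast<bool>(try_load));  // callable must be non-empty
    std::lock_guard<std::mutex> guard(mu_);
    if (!attempted_) {
      attempted_ = true;  // set before the call so a throw never retries
      try {
        try_load();
        loaded_ = true;
      } catch (const std::exception& e) {
        error_ = e.what();  // remember why loading failed
      }
    }
    return loaded_;
  }

  const std::string& error() const { return error_; }

 private:
  std::mutex mu_;
  bool attempted_ = false;
  bool loaded_ = false;
  std::string error_;
};
```

With this shape, a second call after a failed load returns immediately with the cached result, which is the situation in which the linked Stack Overflow question reports `std::call_once` hanging.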


08 Nov, 2017 2 commits

- CompareOp's kernel device type is decided by input tensor place · 3187451a
  Committed by Yang Yu on Nov 07, 2017
  ```
  CompareOp can run on CPU even when other operators are running on GPU, since
  operations like comparing control flags should be performed only on CPU
  ```
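
The rule in the CompareOp commit message above can be pictured with a small sketch. The types and function names (`Place`, `Tensor`, `DefaultKernelPlace`, `CompareKernelPlace`) are hypothetical simplifications, not Paddle's real API; the sketch only contrasts "run where the executor runs" with "run where the input tensor lives".

```cpp
#include <cassert>

// Hypothetical, simplified types for illustration; not Paddle's real API.
enum class Place { kCPU, kCUDA };

struct Tensor {
  Place place;  // where the tensor's memory lives
};

// Default rule: the kernel runs wherever the executor is running.
inline Place DefaultKernelPlace(Place executor_place, const Tensor& /*input*/) {
  return executor_place;
}

// Rule described in the commit message: the compare kernel follows its input.
inline Place CompareKernelPlace(Place /*executor_place*/, const Tensor& input) {
  return input.place;
}

// Usage: a control flag kept on the CPU is compared on the CPU
// even while the rest of the program runs on CUDA.
inline void Example() {
  Tensor flag{Place::kCPU};
  assert(CompareKernelPlace(Place::kCUDA, flag) == Place::kCPU);
  assert(DefaultKernelPlace(Place::kCUDA, flag) == Place::kCUDA);
}
```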
- Check errors for the cuda kernel calls. (#5436) · 58db07b7
  Committed by qingqing01 on Nov 08, 2017
31 Oct, 2017 1 commit

- remove unused code (#5219) · afd1e844
  Committed by QI JUN on Oct 30, 2017
  ```
  * remove unused code

  * fix cmake file

  * fix build error
  ```
26 Oct, 2017 1 commit

Cudnn batch norm op (#5067) · 56b723c4

Committed by Qiao Longfei on Oct 25, 2017

* init cudnn batch norm op

* rename batch_norm_cudnn_op.cc batch_norm_op.cu

* correct name style

* add ExtractNCWHD, simplify code

* fix ExtractNCWHD

* use CUDNN_ENFORCE instead of PADDLE_ENFORCE


25 Oct, 2017 1 commit

- checkin nccl operator · 0990c87b
  Committed by Dong Zhihong on Oct 24, 2017
24 Oct, 2017 2 commits

- Use external project for NCCL (#5028) · 94e741d6
  Committed by Yu Yang on Oct 23, 2017
- Feature/nccl dso (#5001) · 43c6ff21
  Committed by Yu Yang on Oct 23, 2017
  ```
  * "add nccl enforce"

  * Dev

  * Update comment

  * Add nccl test

  * Follow comments
  ```
18 Oct, 2017 1 commit

MatMul operator (#4856) · 16489827

Committed by Markus Kliegl on Oct 17, 2017

* initial matmul operator

Similar to np.matmul, but also has transpose_X and transpose_Y flags,
and only supports tensors from rank 1 to 3 inclusive.

For GPU, uses cublas?gemmStridedBatched. For CPU, uses
cblas_?gemm_batch if available via MKL; otherwise a simple serial
implementation that loops over the batch dimension is employed for now.
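
The serial fallback described above can be sketched as follows. This is an illustrative assumption about its general shape, not the code from the commit: the function name `BatchedMatMul`, the row-major `std::vector<float>` layout, and the parameter list are all hypothetical, and only the rank-3 (batched) case is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes out[b] = op(X[b]) * op(Y[b]) for each batch index b, where op
// optionally transposes. X[b] is stored as x_rows by x_cols and Y[b] as
// y_rows by y_cols, both row-major and packed back to back.
std::vector<float> BatchedMatMul(const std::vector<float>& x,
                                 const std::vector<float>& y,
                                 std::size_t batch,
                                 std::size_t x_rows, std::size_t x_cols,
                                 std::size_t y_rows, std::size_t y_cols,
                                 bool transpose_x, bool transpose_y) {
  const std::size_t m = transpose_x ? x_cols : x_rows;   // rows of op(X)
  const std::size_t k = transpose_x ? x_rows : x_cols;   // inner dimension
  const std::size_t k2 = transpose_y ? y_cols : y_rows;  // rows of op(Y)
  const std::size_t n = transpose_y ? y_rows : y_cols;   // cols of op(Y)
  assert(k == k2);
  assert(x.size() == batch * x_rows * x_cols);
  assert(y.size() == batch * y_rows * y_cols);

  std::vector<float> out(batch * m * n, 0.0f);
  for (std::size_t b = 0; b < batch; ++b) {  // serial loop over the batch
    const float* xb = x.data() + b * x_rows * x_cols;
    const float* yb = y.data() + b * y_rows * y_cols;
    float* ob = out.data() + b * m * n;
    for (std::size_t i = 0; i < m; ++i) {
      for (std::size_t j = 0; j < n; ++j) {
        float acc = 0.0f;
        for (std::size_t p = 0; p < k; ++p) {
          const float xv = transpose_x ? xb[p * x_cols + i] : xb[i * x_cols + p];
          const float yv = transpose_y ? yb[j * y_cols + p] : yb[p * y_cols + j];
          acc += xv * yv;
        }
        ob[i * n + j] = acc;
      }
    }
  }
  return out;
}
```

The transpose flags only change how each element is indexed, so no transposed copy of the inputs is materialized.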


16 Oct, 2017 1 commit

- "fix enforce error" · d8aebaf5
  Committed by Dong Zhihong on Oct 15, 2017
15 Oct, 2017 1 commit

- "add enforce check" · 54d3dbd8
  Committed by Dong Zhihong on Oct 14, 2017
14 Oct, 2017 1 commit

- "nccl add interface" · d1443104
  Committed by Dong Zhihong on Oct 13, 2017
12 Oct, 2017 1 commit

Cudnn conv op (#4195) · a3ccbdb3

Committed by 武毅 on Oct 12, 2017

* add cudnn_conv_op

* WIP

* update

* update

* fix grad check

* use platform::memory

* add support group for cudnn

* update

* follow comments

* fix onlycpu build

* update cuda define

* follow comments

* follow comments

* merge with updates

* fix compile error

* follow comments

* follow comments


10 Oct, 2017 2 commits

- remove unused PADDLE_ONLY_CPU comment · 871a3f6e
  Committed by Luo Tao on Oct 10, 2017
- clean up for review · e5155713
  Committed by Yang Yang on Oct 09, 2017
07 Oct, 2017 1 commit

- fix executor gpu unittest · 1f5192a2
  Committed by qijun on Oct 06, 2017
05 Oct, 2017 3 commits

- Rename platform::GetDeviceCount into platform::GetCUDADeviceCount · 2b204f04
  Committed by Yi Wang on Oct 04, 2017
- fix gpu build error · fe10e86d
  Committed by qijun on Oct 04, 2017
- Use PADDLE_WITH_CUDA instead of PADDLE_WITH_GPU · 4558807c
  Committed by Yi Wang on Oct 04, 2017
