Commits · 1b0c7d7c7afd4c5f09eb43358dd4851ba1735c3f · BaiXuePrincess / Paddle

15 Dec, 2017 · 5 commits
- Simplify system_allocator and fix GPU_INFO (#6653) · 1b0c7d7c
  Committed by Yu Yang on Dec 15, 2017
- Fix compile on CUDA9.1 & MacOS (#6642) · d5cab4f0
  Committed by Yu Yang on Dec 15, 2017
- fix place_test on MKLDNNPlace · bf269d67
  Committed by tensor-tang on Dec 15, 2017
- fix conflict of Place · a92f057e
  Committed by tensor-tang on Dec 15, 2017
- fix undefined issue when with_gpu · f2712105
  Committed by tensor-tang on Dec 15, 2017
14 Dec, 2017 · 2 commits
- add MKLDNNPlace · e0c33176
  Committed by tensor-tang on Dec 14, 2017
- "derived cudnnDevice context" (#6585) · 0e9b393b
  Committed by dzhwinter on Dec 14, 2017
```
* "derived cudnnDevice context"

* "leave remove cudnn handle from CUDADeviceContext"

* "fix math function error"
```
12 Dec, 2017 · 1 commit

Refine device context (#6433) · 61ec0b95

Committed by QI JUN on Dec 12, 2017

The main changes are:

- take `DeviceContext` as the template parameter of math functors and OpKernel instead of `Place`
- remove `eigen_device` interface in base class `DeviceContext`
- remove `GetEigenDevice` interface in `ExecutionContext` and base class `DeviceContext`
- remove unused `platform::EigenDeviceConverter`
- rename `REGISTER_OP_GPU_KERNEL` to `REGISTER_OP_CUDA_KERNEL`
- rename `USE_GPU_ONLY_OP` to `USE_CUDA_ONLY_OP`

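A minimal sketch of the first change: a math functor and the kernel that calls it take a `DeviceContext` type, rather than a `Place`, as their template parameter. `CPUDeviceContext`, `ScaleFunctor`, and `ScaleKernel` are hypothetical names chosen for illustration, not Paddle's actual classes.

```cpp
#include <vector>

// Hypothetical device-context type; not Paddle's actual class.
struct CPUDeviceContext {};

// Primary template: the functor is parameterized on the DeviceContext
// type instead of on a Place.
template <typename DeviceContext, typename T>
struct ScaleFunctor;

// CPU specialization. Receiving the context lets a functor use whatever
// the context owns (thread pools, streams, library handles).
template <typename T>
struct ScaleFunctor<CPUDeviceContext, T> {
  void operator()(const CPUDeviceContext& /*ctx*/, std::vector<T>* x,
                  T scale) const {
    for (auto& v : *x) v *= scale;
  }
};

// A kernel templated on the same DeviceContext forwards its context to the
// functor, so the template argument selects the device at compile time.
template <typename DeviceContext, typename T>
void ScaleKernel(const DeviceContext& ctx, std::vector<T>* x, T scale) {
  ScaleFunctor<DeviceContext, T>()(ctx, x, scale);
}
```

A CUDA build would add a second specialization for a GPU context type; the kernel body itself would not need to change.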

07 Dec, 2017 · 2 commits
- Remove DeviceContext::Finish · 6b9567e0
  Committed by Yang Yu on Dec 07, 2017
- Add HasCUDNN to detect if CUDNN is installed or not (#6349) · f291abfc
  Committed by Yu Yang on Dec 07, 2017
```
* Add HasCUDNN to detect if CUDNN is installed or not

* Fix CI
```
05 Dec, 2017 · 1 commit
- fix bug in gpu default memory allocating policy (#6268) · 96a5f96c
  Committed by QI JUN on Dec 05, 2017
01 Dec, 2017 · 3 commits
- change GPU memory allocating policy (#6159) · d066b07f
  Committed by QI JUN on Dec 01, 2017
```
* change GPU memory allocating policy

* fix potential overflow bug
```
- code refine (#6164) · e50f3570
  Committed by chengduo on Dec 01, 2017
- Fix the performance problem of enforce (#6085) · 8ac02279
  Committed by Yu Yang on Dec 01, 2017
```
* Fix performance problem of enforce

* Fix missing `;` in code

* Fix CI
```
29 Nov, 2017 · 1 commit
- Fix compile on cudnn7 (#5982) · 4ecbab42
  Committed by 武毅 on Nov 29, 2017
```
* fix compile on cudnn7

* update

* update

* make silent
```
28 Nov, 2017 · 2 commits
- Refine paddle/v2/fluid/profiler.py. · 5e7e90ce
  Committed by dangqingqing on Nov 28, 2017
- Refine paddle/v2/fluid/profiler.py. · 696b0253
  Committed by dangqingqing on Nov 28, 2017
27 Nov, 2017 · 3 commits
- Add cuda profiler tools and expose it in Python. · 623f62a7
  Committed by dangqingqing on Nov 27, 2017
- Add cuda profiler tools. · 6cf2dcbc
  Committed by dangqingqing on Nov 27, 2017
- Conv cudnn 3d (#5783) · a06bec12
  Committed by 武毅 on Nov 27, 2017
```
* conv cudnn 3d

* update test case

* update

* update

* follow comments and remove groups from helper

* update

* refine

* update

* follow comments2

* update

* fix compile
```
24 Nov, 2017 · 1 commit

Make enforce target (#5889) · c9172c1c

Committed by Qiao Longfei on Nov 24, 2017

* make enforce a target and dependent on nccl when gpu is enabled

* add some more dependency


23 Nov, 2017 · 1 commit
- Feature/support int64 for sum (#5832) · c077a6d5
  Committed by Yu Yang on Nov 23, 2017
```
* Support int64 for sum op

* Refine code
```
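Supporting an extra element type in an operator often comes down to instantiating one templated kernel for that type as well. A minimal sketch, assuming a hypothetical `SumInputs` helper (not Paddle's actual sum kernel) that adds equally sized inputs element by element and is explicitly instantiated for `int64_t` next to the floating-point types:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical element-wise sum of several equally sized inputs.
template <typename T>
std::vector<T> SumInputs(const std::vector<std::vector<T>>& inputs) {
  if (inputs.empty()) return {};
  std::vector<T> out(inputs[0].size(), T(0));
  for (const auto& in : inputs) {
    if (in.size() != out.size()) {
      throw std::invalid_argument("all inputs must have the same size");
    }
    for (std::size_t i = 0; i < in.size(); ++i) out[i] += in[i];
  }
  return out;
}

// Explicit instantiations for the element types this sketch supports.
template std::vector<float> SumInputs<float>(
    const std::vector<std::vector<float>>&);
template std::vector<double> SumInputs<double>(
    const std::vector<std::vector<double>>&);
template std::vector<int64_t> SumInputs<int64_t>(
    const std::vector<std::vector<int64_t>>&);
```

With `int64_t`, a sum such as 3000000000 + 3000000000 is represented exactly, which a 32-bit integer kernel could not do.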
16 Nov, 2017 · 1 commit
- "fix accuracy kernel bug" (#5673) · e97b8987
  Committed by dzhwinter on Nov 15, 2017
```
* "fix accuracy kernel bug"

* "relaunch ci"
```
15 Nov, 2017 · 1 commit
- fix data layout · 74912c7d
  Committed by chengduoZH on Nov 15, 2017
13 Nov, 2017 · 3 commits
- add cudnn_pool3d unit test · ec1e2fc9
  Committed by chengduoZH on Nov 13, 2017
- add cudnn 3d unit test · a93a59ec
  Committed by chengduoZH on Nov 13, 2017
- Fix GPU Compile on Linux · 17405027
  Committed by Yang Yu on Nov 13, 2017
11 Nov, 2017 · 2 commits

Use G++ to compile some cu operators. · f5e36765

Committed by dangqingqing on Nov 11, 2017

Fix a dead lock bug for dyload/nccl.h when nccl lib cannot be loaded (#5533) · 2378679a

Committed by emailweixu on Nov 10, 2017

It is caused by a bug in std::call_once, described in https://stackoverflow.com/questions/41717579/stdcall-once-hangs-on-second-call-after-callable-threw-on-first-call. The underlying cause is likely a bug in pthread_once, which is discussed in https://patchwork.ozlabs.org/patch/482350/

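The C++ standard says that when the callable passed to `std::call_once` throws, the flag stays unset and a later call retries; the linked reports describe implementations that hang on that retry instead. One way to stay off that path is to keep the callable from throwing, record the result, and report failure after `call_once` returns. A minimal sketch, assuming a hypothetical `LazyLibrary` class and a loader function that returns `nullptr` on failure (for example a wrapper around `dlopen`); this is not the code from the commit.

```cpp
#include <mutex>
#include <stdexcept>

// Hypothetical lazy loader; not Paddle's dynload code.
class LazyLibrary {
 public:
  // `loader` must not throw: it reports failure by returning nullptr.
  explicit LazyLibrary(void* (*loader)()) : loader_(loader) {}

  void* Get() {
    // The callable never throws, so call_once always completes normally
    // and the exceptional-retry path is never taken.
    std::call_once(flag_, [this] { handle_ = loader_(); });
    if (handle_ == nullptr) {
      // Report the failure outside call_once, on every call.
      throw std::runtime_error("shared library could not be loaded");
    }
    return handle_;
  }

 private:
  void* (*loader_)();
  std::once_flag flag_;
  void* handle_ = nullptr;
};
```

The trade-off is that a failed load is cached and never retried, which suits a library that is simply not installed.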

08 Nov, 2017 · 2 commits
- CompareOp's kernel device type is decided by input tensor place · 3187451a
  Committed by Yang Yu on Nov 07, 2017
```
CompareOp can run on CPU even when other operators are running on GPU, since
operations like comparing control flags should be performed only on CPU
```
- Check errors for the cuda kernel calls. (#5436) · 58db07b7
  Committed by qingqing01 on Nov 08, 2017
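The CompareOp entry above describes choosing a kernel's device from the place of its input tensor instead of from the executor's device. A minimal sketch of that policy next to a default one, assuming hypothetical `Place` and `TensorSketch` types (not Paddle's real `Place` or `Tensor`):

```cpp
// Hypothetical place tags and tensor type; not Paddle's real classes.
enum class Place { kCPU, kGPU };

struct TensorSketch {
  Place place;  // where the tensor's data currently lives
};

// Default policy for most operators: run on the device the executor uses.
inline Place DefaultKernelPlace(const TensorSketch& /*input*/,
                                Place executor_place) {
  return executor_place;
}

// CompareOp policy: follow the input tensor. A control flag kept on the
// CPU is then compared on the CPU even when the executor targets the GPU,
// so the flag is never copied to the device.
inline Place CompareKernelPlace(const TensorSketch& input,
                                Place /*executor_place*/) {
  return input.place;
}
```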
31 Oct, 2017 · 1 commit
- remove unused code (#5219) · afd1e844
  Committed by QI JUN on Oct 30, 2017
```
* remove unused code

* fix cmake file

* fix build error
```
26 Oct, 2017 · 1 commit

Cudnn batch norm op (#5067) · 56b723c4

Committed by Qiao Longfei on Oct 25, 2017

* init cudnn batch norm op

* rename batch_norm_cudnn_op.cc to batch_norm_op.cu

* correct name style

* add ExtractNCWHD, simplify code

* fix ExtractNCWHD

* use CUDNN_ENFORCE instead of PADDLE_ENFORCE

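The ExtractNCWHD bullet suggests a helper that reads batch, channel and spatial sizes from a tensor shape so that one code path serves several ranks. A hypothetical sketch of such a helper; the name `ExtractExtents`, the `Layout` enum and the layout handling are assumptions, not the commit's code.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical layout tags; not Paddle's DataLayout enum.
enum class Layout { kNCHW, kNHWC };

struct Extents {
  int64_t n, c, h, w, d;
};

// Reads N, C and up to three spatial extents from a rank 2..5 shape.
// Missing spatial dims default to 1, so 2-D, 4-D and 5-D inputs can share
// one code path (e.g. when filling a cuDNN tensor descriptor).
Extents ExtractExtents(const std::vector<int64_t>& dims, Layout layout) {
  if (dims.size() < 2 || dims.size() > 5) {
    throw std::invalid_argument("expected a tensor of rank 2 to 5");
  }
  Extents e{dims[0], 1, 1, 1, 1};
  // Channel-first puts C at index 1; channel-last puts it at the end.
  const bool channel_first = (layout == Layout::kNCHW);
  e.c = channel_first ? dims[1] : dims.back();
  const std::size_t first_spatial = channel_first ? 2 : 1;
  const std::size_t num_spatial = dims.size() - 2;
  int64_t* spatial[3] = {&e.h, &e.w, &e.d};
  for (std::size_t i = 0; i < num_spatial; ++i) {
    *spatial[i] = dims[first_spatial + i];
  }
  return e;
}
```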

25 Oct, 2017 · 1 commit
- checkin nccl operator · 0990c87b
  Committed by Dong Zhihong on Oct 24, 2017
24 Oct, 2017 · 2 commits
- Use external project for NCCL (#5028) · 94e741d6
  Committed by Yu Yang on Oct 23, 2017
- Feature/nccl dso (#5001) · 43c6ff21
  Committed by Yu Yang on Oct 23, 2017
```
* "add nccl enforce"

* Dev

* Update comment

* Add nccl test

* Follow comments
```
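NCCL, cuDNN and similar libraries report failures through status codes, and several entries here add or refine enforce checks for them. A minimal sketch of such a check, assuming a hypothetical `LibStatus` enum in place of a real status type like `cudnnStatus_t` or `ncclResult_t`, and a hypothetical `LIB_ENFORCE` macro (not Paddle's `PADDLE_ENFORCE`):

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical status type standing in for a library status enum.
enum class LibStatus { kSuccess, kBadParam, kInternalError };

inline const char* LibStatusString(LibStatus s) {
  switch (s) {
    case LibStatus::kSuccess: return "success";
    case LibStatus::kBadParam: return "bad param";
    case LibStatus::kInternalError: return "internal error";
  }
  return "unknown status";
}

// Evaluates `expr` once; on a non-success status, throws with the
// expression text, the status string and the call site. The message is
// only built on the failure path.
#define LIB_ENFORCE(expr)                                 \
  do {                                                    \
    const LibStatus lib_status_ = (expr);                 \
    if (lib_status_ != LibStatus::kSuccess) {             \
      std::ostringstream lib_msg_;                        \
      lib_msg_ << #expr << " failed: "                    \
               << LibStatusString(lib_status_) << " at "  \
               << __FILE__ << ":" << __LINE__;            \
      throw std::runtime_error(lib_msg_.str());           \
    }                                                     \
  } while (0)
```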
18 Oct, 2017 · 1 commit

MatMul operator (#4856) · 16489827

Committed by Markus Kliegl on Oct 17, 2017

* initial matmul operator

Similar to np.matmul, but also has transpose_X and transpose_Y flags,
and only supports tensors from rank 1 to 3 inclusive.

For GPU, uses cublas?gemmStridedBatched. For CPU, uses
cblas_?gemm_batch if available via MKL; otherwise a simple serial
implementation that loops over the batch dimension is employed for now.

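The CPU fallback described above, a serial loop over the batch dimension, can be sketched as follows. This covers only the rank-3 case with row-major storage; the function name and signature are hypothetical, and it is not the operator's actual implementation. Rank-1 and rank-2 inputs would need reshaping to rank 3 first, for example by treating a matrix as a batch of one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive batched matrix multiply. Shapes (row-major):
//   X:   [batch, M, K], or [batch, K, M] when transpose_x
//   Y:   [batch, K, N], or [batch, N, K] when transpose_y
//   Out: [batch, M, N]
std::vector<float> BatchedMatMul(const std::vector<float>& x,
                                 const std::vector<float>& y,
                                 std::size_t batch, std::size_t m,
                                 std::size_t k, std::size_t n,
                                 bool transpose_x, bool transpose_y) {
  assert(x.size() == batch * m * k);
  assert(y.size() == batch * k * n);
  std::vector<float> out(batch * m * n, 0.0f);
  for (std::size_t b = 0; b < batch; ++b) {  // serial loop over the batch
    const float* xb = x.data() + b * m * k;
    const float* yb = y.data() + b * k * n;
    float* ob = out.data() + b * m * n;
    for (std::size_t i = 0; i < m; ++i) {
      for (std::size_t j = 0; j < n; ++j) {
        float acc = 0.0f;
        for (std::size_t p = 0; p < k; ++p) {
          // Logical X[i][p] is stored at [p][i] when X is transposed.
          const float xv = transpose_x ? xb[p * m + i] : xb[i * k + p];
          // Logical Y[p][j] is stored at [j][p] when Y is transposed.
          const float yv = transpose_y ? yb[j * k + p] : yb[p * n + j];
          acc += xv * yv;
        }
        ob[i * n + j] = acc;
      }
    }
  }
  return out;
}
```

The batched BLAS routines named above replace this whole loop nest with a single library call per batch.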

16 Oct, 2017 · 1 commit
- "fix enforce error" · d8aebaf5
  Committed by Dong Zhihong on Oct 15, 2017
15 Oct, 2017 · 1 commit
- "add enforce check" · 54d3dbd8
  Committed by Dong Zhihong on Oct 14, 2017
14 Oct, 2017 · 1 commit
- "nccl add interface" · d1443104
  Committed by Dong Zhihong on Oct 13, 2017

BaiXuePrincess / Paddle is in sync with the upstream repository it was forked from.
