Commits · 6ac42378ac410c4cdc983275c1d0254e42132bce · 机器未来 / Paddle

16 Nov, 2017 · 1 commit
- "fix accuracy kernel bug" (#5673) · e97b8987
  Committed by dzhwinter on Nov 15, 2017
```
* "fix accuracy kernel bug"

* "relaunch ci"
```
15 Nov, 2017 · 1 commit
- fix data layout · 74912c7d
  Committed by chengduoZH on Nov 15, 2017
13 Nov, 2017 · 3 commits
- add cudnn_pool3d unit test · ec1e2fc9
  Committed by chengduoZH on Nov 13, 2017
- add cudnn 3d unit test · a93a59ec
  Committed by chengduoZH on Nov 13, 2017
- Fix GPU Compile on Linux · 17405027
  Committed by Yang Yu on Nov 13, 2017
11 Nov, 2017 · 2 commits
- Use G++ to compile some cu operators. · f5e36765
  Committed by dangqingqing on Nov 11, 2017
- Fix a dead lock bug for dyload/nccl.h when nccl lib cannot be loaded (#5533) · 2378679a
  Committed by emailweixu on Nov 10, 2017

  The deadlock is caused by a std::call_once bug (the call hangs on the second invocation after the callable threw on the first), described in https://stackoverflow.com/questions/41717579/stdcall-once-hangs-on-second-call-after-callable-threw-on-first-call

  It is likely caused by a deeper bug in pthread_once, which is discussed in https://patchwork.ozlabs.org/patch/482350/

08 Nov, 2017 · 2 commits
- CompareOp's kernel device type is decided by input tensor place · 3187451a
  Committed by Yang Yu on Nov 07, 2017
```
CompareOp can run on CPU even when other operators are running on GPU, since
operations like comparing control flags should be performed only on CPU
```
- Check errors for the cuda kernel calls. (#5436) · 58db07b7
  Committed by qingqing01 on Nov 08, 2017
31 Oct, 2017 · 1 commit
- remove unused code (#5219) · afd1e844
  Committed by QI JUN on Oct 30, 2017
```
* remove unused code

* fix cmake file

* fix build error
```
26 Oct, 2017 · 1 commit
- Cudnn batch norm op (#5067) · 56b723c4
  Committed by Qiao Longfei on Oct 25, 2017

  * init cudnn batch norm op
  * rename batch_norm_cudnn_op.cc batch_norm_op.cu
  * correct name style
  * add ExtractNCWHD, simplify code
  * fix ExtractNCWHD
  * use CUDNN_ENFORCE instead of PADDLE_ENFORCE

25 Oct, 2017 · 1 commit
- checkin nccl operator · 0990c87b
  Committed by Dong Zhihong on Oct 24, 2017
24 Oct, 2017 · 2 commits
- Use external project for NCCL (#5028) · 94e741d6
  Committed by Yu Yang on Oct 23, 2017
- Feature/nccl dso (#5001) · 43c6ff21
  Committed by Yu Yang on Oct 23, 2017
```
* "add nccl enforce"

* Dev

* Update comment

* Add nccl test

* Follow comments
```
18 Oct, 2017 · 1 commit
- MatMul operator (#4856) · 16489827
  Committed by Markus Kliegl on Oct 17, 2017

  * initial matmul operator

  Similar to np.matmul, but also has transpose_X and transpose_Y flags,
  and only supports tensors from rank 1 to 3 inclusive.

  For GPU, uses cublas?gemmStridedBatched. For CPU, uses
  cblas_?gemm_batch if available via MKL; otherwise a simple serial
  implementation that loops over the batch dimension is employed for now.
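The serial fallback described in the MatMul commit message can be pictured with a small pure-Python sketch. This is an illustration only, not the operator's actual code: the function names (`transpose`, `matmul_2d`, `rank`, `matmul`) and the nested-list tensor representation are assumptions made for the example, and rank-1 operands are left out for brevity.

```python
def transpose(m):
    """Transpose a 2-D matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul_2d(a, b):
    """Plain triple-loop product of two 2-D matrices."""
    assert len(a[0]) == len(b), "inner dimensions must match"
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def rank(t):
    """Nesting depth of a rectangular nested list."""
    depth = 0
    while isinstance(t, list):
        t = t[0]
        depth += 1
    return depth


def matmul(x, y, transpose_x=False, transpose_y=False):
    """Serial matmul sketch for rank-2 or rank-3 operands (hypothetical).

    A rank-3 operand is a batch of matrices; a rank-2 operand is reused
    for every batch entry. The transpose flags apply to each matrix.
    """
    rx, ry = rank(x), rank(y)
    assert rx in (2, 3) and ry in (2, 3), "sketch handles rank 2 and 3 only"
    if rx == 3 and ry == 3:
        assert len(x) == len(y), "batch sizes must match"
    batch = len(x) if rx == 3 else (len(y) if ry == 3 else 1)
    out = []
    for b in range(batch):  # the serial loop over the batch dimension
        xm = x[b] if rx == 3 else x
        ym = y[b] if ry == 3 else y
        if transpose_x:
            xm = transpose(xm)
        if transpose_y:
            ym = transpose(ym)
        out.append(matmul_2d(xm, ym))
    # Return a batch only when at least one operand was batched.
    return out if (rx == 3 or ry == 3) else out[0]
```

For example, `matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], transpose_y=True)` multiplies the first matrix by the transpose of the second, and passing a rank-3 first operand repeats the product once per batch entry.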

16 Oct, 2017 · 1 commit
- "fix enforce error" · d8aebaf5
  Committed by Dong Zhihong on Oct 15, 2017
15 Oct, 2017 · 1 commit
- "add enforce check" · 54d3dbd8
  Committed by Dong Zhihong on Oct 14, 2017
14 Oct, 2017 · 1 commit
- "nccl add interface" · d1443104
  Committed by Dong Zhihong on Oct 13, 2017
12 Oct, 2017 · 1 commit
- Cudnn conv op (#4195) · a3ccbdb3
  Committed by 武毅 on Oct 12, 2017

  * add cudnn_conv_op
  * WIP
  * update
  * update
  * fix grad check
  * use platform::memory
  * add support group for cudnn
  * update
  * follow comments
  * fix onlycpu build
  * update cuda define
  * follow comments
  * follow comments
  * merge with updates
  * fix compile error
  * follow comments
  * follow comments

10 Oct, 2017 · 2 commits
- remove unused PADDLE_ONLY_CPU comment · 871a3f6e
  Committed by Luo Tao on Oct 10, 2017
- clean up for review · e5155713
  Committed by Yang Yang on Oct 09, 2017
07 Oct, 2017 · 1 commit
- fix executor gpu unittest · 1f5192a2
  Committed by qijun on Oct 06, 2017
05 Oct, 2017 · 4 commits
- Rename platform::GetDeviceCount into platform::GetCUDADeviceCount · 2b204f04
  Committed by Yi Wang on Oct 04, 2017
- fix gpu build error · fe10e86d
  Committed by qijun on Oct 04, 2017
- Use PADDLE_WITH_CUDA instead of PADDLE_WITH_GPU · 4558807c
  Committed by Yi Wang on Oct 04, 2017
- Change `PADDLE_ONLY_CPU` to `PADDLE_WITH_GPU` · 84500f94
  Committed by Yu Yang on Oct 04, 2017

  By shell command:

```bash
sed -i 's#ifdef PADDLE_ONLY_CPU#ifndef PADDLE_WITH_GPU#g' `find ./paddle/ -name '*.h' -o -name '*.cc' -o -name '*.cpp' -o -name '*.c' -o -name '*.cu'`
sed -i 's#ifndef PADDLE_ONLY_CPU#ifdef PADDLE_WITH_GPU#g' `find ./paddle/ -name '*.h' -o -name '*.cc' -o -name '*.cpp' -o -name '*.c' -o -name '*.cu'`
```
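The two `sed` substitutions in commit 84500f94 flip each preprocessor guard, because "`PADDLE_ONLY_CPU` is defined" means the same as "`PADDLE_WITH_GPU` is not defined". A minimal Python sketch of that string-level rewrite follows; the names `REPLACEMENTS` and `flip_guards` are hypothetical, and the file traversal done by `find` is left out.

```python
# Each pair maps an old guard to its logically equivalent new guard:
# "defined ONLY_CPU" becomes "not defined WITH_GPU", and vice versa.
# The order mirrors the two sed commands in the commit message.
REPLACEMENTS = [
    ("ifdef PADDLE_ONLY_CPU", "ifndef PADDLE_WITH_GPU"),
    ("ifndef PADDLE_ONLY_CPU", "ifdef PADDLE_WITH_GPU"),
]


def flip_guards(source):
    """Apply both guard substitutions to the text of one source file."""
    for old, new in REPLACEMENTS:
        source = source.replace(old, new)
    return source


before = (
    "#ifndef PADDLE_ONLY_CPU\n"
    "#include <cuda.h>\n"
    "#endif\n"
    "#ifdef PADDLE_ONLY_CPU\n"
    "int gpu_count = 0;\n"
    "#endif\n"
)
after = flip_guards(before)
print(after)
```

The first pattern cannot match inside `ifndef PADDLE_ONLY_CPU`, since `ifdef` is not a substring of `ifndef`, so the two passes do not interfere with each other.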
04 Oct, 2017 · 2 commits
- remove device context manager · 39505151
  Committed by qijun on Oct 03, 2017
- refine codes · 6c4d1f55
  Committed by qijun on Oct 03, 2017
03 Oct, 2017 · 2 commits
- follow comments · b5dbe88b
  Committed by qijun on Oct 02, 2017
- Unify Map in OpDescBind · 04e604b7
  Committed by Yu Yang on Sep 30, 2017
02 Oct, 2017 · 1 commit
- format · 5423cb3e
  Committed by dongzhihong on Oct 01, 2017
01 Oct, 2017 · 1 commit
- Unify Map in OpDescBind · 8fd845e0
  Committed by Yu Yang on Sep 30, 2017
29 Sep, 2017 · 3 commits
- fix gpu build error · b611a479
  Committed by qijun on Sep 28, 2017
- move EigenDeviceConverter to device_context.h · 7a6fcc7d
  Committed by qijun on Sep 28, 2017
- Follow comments · f2feb333
  Committed by Yu Yang on Sep 28, 2017
28 Sep, 2017 · 1 commit
- Add Skeleton of Double support · 3a5693e0
  Committed by Yu Yang on Sep 27, 2017
26 Sep, 2017 · 2 commits
- fix nv_library (#4370) · d0ad82cf
  Committed by Qiao Longfei on Sep 25, 2017
```
* fix nv_library

* fix symbol in gpu_info.h
```
- Use `bool` for PADDLE_ENFORCE, not int · 699dbe3b
  Committed by Yu Yang on Sep 25, 2017
```
* If stat is an integer, a bool value is implicitly cast to int before
  being passed to PADDLE_ENFORCE
```
23 Sep, 2017 · 1 commit
- Sync computation when Python invoke `run` · ba1f5b5c
  Committed by Yu Yang on Sep 22, 2017

  * The GPU is an asynchronous device by default, so computation should be
    synchronized when Python invokes `run`. Python can then get the correct
    computation result.

22 Sep, 2017 · 1 commit
- fix framework::LoDTensor => Tensor · 0417e4e4
  Committed by chengduoZH on Sep 22, 2017
