Commits · fcd74e06b8f8ed1e7cd13a0255f207f25e638992 · 机器未来 / Paddle

18 Oct, 2017 · 1 commit

Committed by Markus Kliegl on Oct 17, 2017

* initial matmul operator

Similar to np.matmul, but also has transpose_X and transpose_Y flags,
and only supports tensors from rank 1 to 3 inclusive.

For GPU, uses cublas?gemmStridedBatched. For CPU, uses
cblas_?gemm_batch if available via MKL; otherwise a simple serial
implementation that loops over the batch dimension is employed for now.

16489827
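
The behavior described in the matmul commit message above can be sketched in pure Python. This is an illustrative sketch only, not Paddle's operator: the names `matmul`, `transpose_x`, and `transpose_y` are hypothetical stand-ins for the operator and its `transpose_X` / `transpose_Y` flags, rank-1 inputs are left out, and the assumptions that the flags transpose only the last two dimensions and that a rank-2 operand is reused for every batch entry follow `np.matmul` conventions rather than anything stated in the commit. The batch loop mirrors the "simple serial implementation" the message mentions as the CPU fallback.

```python
def _transpose(m):
    # Swap rows and columns of a rank-2 nested list.
    return [list(col) for col in zip(*m)]


def _gemm(a, b):
    # Plain rank-2 matrix product with an inner-dimension check.
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)]
            for row in a]


def _is_batched(t):
    # Rank 3 means the innermost rows are themselves lists.
    return isinstance(t[0][0], list)


def matmul(x, y, transpose_x=False, transpose_y=False):
    """Rank-2/rank-3 matrix product with optional per-matrix transposes.

    Assumption: the flags act on the last two dimensions only, and a
    rank-2 operand is shared across the batch of a rank-3 operand.
    """
    def prep(m, flag):
        return _transpose(m) if flag else m

    x_batched, y_batched = _is_batched(x), _is_batched(y)
    if not (x_batched or y_batched):
        return _gemm(prep(x, transpose_x), prep(y, transpose_y))
    if x_batched and y_batched and len(x) != len(y):
        raise ValueError("batch sizes do not match")

    batch = len(x) if x_batched else len(y)
    out = []
    for i in range(batch):  # serial loop over the batch dimension
        a = x[i] if x_batched else x
        b = y[i] if y_batched else y
        out.append(_gemm(prep(a, transpose_x), prep(b, transpose_y)))
    return out
```

For example, `matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], transpose_y=True)` multiplies the first matrix by the transpose of the second. A batched GPU or MKL path would replace the Python loop with a single strided or batched GEMM call.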

12 Oct, 2017 · 1 commit

Cudnn conv op (#4195) · a3ccbdb3

Committed by 武毅 on Oct 12, 2017

* add cudnn_conv_op

* WIP

* update

* update

* fix grad check

* use platform::memory

* add support group for cudnn

* update

* follow comments

* fix onlycpu build

* update cuda define

* follow comments

* follow comments

* merge with updates

* fix compile error

* follow comments

* follow comments


10 Oct, 2017 · 2 commits

- remove unused PADDLE_ONLY_CPU comment · 871a3f6e
  Committed by Luo Tao on Oct 10, 2017
- clean up for review · e5155713
  Committed by Yang Yang on Oct 09, 2017

07 Oct, 2017 · 1 commit

- fix executor gpu unittest · 1f5192a2
  Committed by qijun on Oct 06, 2017

05 Oct, 2017 · 4 commits

- Rename platform::GetDeviceCount into platform::GetCUDADeviceCount · 2b204f04
  Committed by Yi Wang on Oct 04, 2017
- fix gpu build error · fe10e86d
  Committed by qijun on Oct 04, 2017
- Use PADDLE_WITH_CUDA instead of PADDLE_WITH_GPU · 4558807c
  Committed by Yi Wang on Oct 04, 2017
- Change `PADDLE_ONLY_CPU` to `PADDLE_WITH_GPU` · 84500f94
  Committed by Yu Yang on Oct 04, 2017

  By shell command

  ```bash
  sed -i 's#ifdef PADDLE_ONLY_CPU#ifndef PADDLE_WITH_GPU#g' `find ./paddle/ -name '*.h' -o -name '*.cc' -o -name '*.cpp' -o -name '*.c' -o -name '*.cu'`
  sed -i 's#ifndef PADDLE_ONLY_CPU#ifdef PADDLE_WITH_GPU#g' `find ./paddle/ -name '*.h' -o -name '*.cc' -o -name '*.cpp' -o -name '*.c' -o -name '*.cu'`
  ```
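
The effect of the two `sed` substitutions in commit 84500f94 can be illustrated on a single string. The sketch below is a hypothetical Python mirror of those two expressions, applied in the same order; `flip_guards` is an illustrative name, not something from the Paddle tree. It shows that each guard is inverted exactly once: the text `ifdef PADDLE_ONLY_CPU` is not a substring of `ifndef PADDLE_ONLY_CPU`, and a line rewritten by the first rule no longer contains `PADDLE_ONLY_CPU` for the second rule to match.

```python
# Same (pattern, replacement) pairs as the two sed expressions, same order.
SUBSTITUTIONS = [
    ("ifdef PADDLE_ONLY_CPU", "ifndef PADDLE_WITH_GPU"),
    ("ifndef PADDLE_ONLY_CPU", "ifdef PADDLE_WITH_GPU"),
]


def flip_guards(source: str) -> str:
    """Invert CPU-only preprocessor guards into GPU-enabled guards."""
    for old, new in SUBSTITUTIONS:
        # Plain substring replacement is enough here: the sed patterns
        # contain no regex metacharacters.
        source = source.replace(old, new)
    return source


if __name__ == "__main__":
    sample = (
        "#ifndef PADDLE_ONLY_CPU\n"
        "void LaunchOnGpu();\n"
        "#endif\n"
        "#ifdef PADDLE_ONLY_CPU\n"
        "void CpuFallback();\n"
        "#endif\n"
    )
    print(flip_guards(sample))
```

Like the original commands, this only touches the `#ifdef` / `#ifndef` spellings; guards written as `#if defined(PADDLE_ONLY_CPU)` would need separate handling.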

04 Oct, 2017 · 2 commits

- remove device context manager · 39505151
  Committed by qijun on Oct 03, 2017
- refine codes · 6c4d1f55
  Committed by qijun on Oct 03, 2017

03 Oct, 2017 · 2 commits

- follow comments · b5dbe88b
  Committed by qijun on Oct 02, 2017
- Unify Map in OpDescBind · 04e604b7
  Committed by Yu Yang on Sep 30, 2017

02 Oct, 2017 · 1 commit

- format · 5423cb3e
  Committed by dongzhihong on Oct 01, 2017

01 Oct, 2017 · 1 commit

- Unify Map in OpDescBind · 8fd845e0
  Committed by Yu Yang on Sep 30, 2017

29 Sep, 2017 · 3 commits

- fix gpu build error · b611a479
  Committed by qijun on Sep 28, 2017
- move EigenDeviceConverter to device_context.h · 7a6fcc7d
  Committed by qijun on Sep 28, 2017
- Follow comments · f2feb333
  Committed by Yu Yang on Sep 28, 2017

28 Sep, 2017 · 1 commit

- Add Skeleton of Double support · 3a5693e0
  Committed by Yu Yang on Sep 27, 2017

26 Sep, 2017 · 2 commits

- fix nv_library (#4370) · d0ad82cf
  Committed by Qiao Longfei on Sep 25, 2017

  ```
  * fix nv_library

  * fix symbol in gpu_info.h
  ```
- Use `bool` for PADDLE_ENFORCE, not int · 699dbe3b
  Committed by Yu Yang on Sep 25, 2017

  ```
  * If stat is an integer, bool value will implicit cast to int before
    pass to PADDLE_ENFORCE
  ```

23 Sep, 2017 · 1 commit

Sync computation when Python invoke `run` · ba1f5b5c

Committed by Yu Yang on Sep 22, 2017

* Since GPU is an async device by default. We should sync computation
  when Python invoke `run`. So Python can get the correct computation
  result


22 Sep, 2017 · 1 commit

- fix framework::LoDTensor => Tensor · 0417e4e4
  Committed by chengduoZH on Sep 22, 2017

19 Sep, 2017 · 3 commits

- Refine platform::Transform function and fix prelu_op testing. · 41a2321a
  Committed by dangqingqing on Sep 19, 2017
- Change Transform API · 87e4e25d
  Committed by Yu Yang on Sep 18, 2017

  ```
  Using DeviceContext, not Place to get stream
  ```
- Remove lazy-initialization in device_context · 81d56ca8
  Committed by Yu Yang on Sep 18, 2017

  ```
  * Also use `const DeviceContext&` all the time, to prevent `const_cast`

  Fix #4169
  Fix #3468
  Fix #3475
  ```

18 Sep, 2017 · 1 commit

- Refine accuracy_op CUDA kernel (#4097) · 8580dce3
  Committed by 武毅 on Sep 18, 2017

  ```
  * refind accuracy_op

  * follow comments

  * follow comments
  ```

14 Sep, 2017 · 2 commits

- Fix enforce test failed · 59d661b9
  Committed by liaogang on Sep 14, 2017

  ```
  Note: If no symbol with a suitable value is found, both this field and dli_saddr shall be set to NULL.
  ```
- fix relu functor and revert some codes · 0957fa7b
  Committed by qijun on Sep 14, 2017

13 Sep, 2017 · 2 commits

- move EigenDeviceConverter to device_context.h · 3c49e7b1
  Committed by qijun on Sep 13, 2017
- Extract DevPtrCast to device_ptr_cast.h · f8c6792a
  Committed by Yu Yang on Sep 12, 2017

12 Sep, 2017 · 1 commit

- Mark thrust::device_ptr in transform · 6fbf097b
  Committed by Yu Yang on Sep 11, 2017

  ```
  Fix TravisCI
  ```

11 Sep, 2017 · 1 commit

Remove enforce demangle · dad5421a

Committed by Yu Yang on Sep 10, 2017

It is buggy in some Linux because the unique_ptr will be free however
the std::string trying to use that char*.

Moreover, it is no need to demangle for error log by Paddle.
Just use `c++filt` or other shell utilities to do this.


09 Sep, 2017 · 1 commit

- Host and device transform API · c5fa417c
  Committed by Yu Yang on Sep 08, 2017

  ```
  * with unit-tests
  * Also complete `memcpy`
  ```

07 Sep, 2017 · 1 commit

- Pass CI · ed346f1d
  Committed by Yu Yang on Sep 06, 2017

04 Sep, 2017 · 1 commit

- Remove cudnn_helper.cc · 8c048aa0
  Committed by dangqingqing on Sep 04, 2017

31 Aug, 2017 · 2 commits

- Add unit testing for cuDNN wrapper. · 20713222
  Committed by dangqingqing on Aug 31, 2017
- Add cuDNN Wrapper. · c20a01d6
  Committed by dangqingqing on Aug 31, 2017

23 Aug, 2017 · 1 commit

- Remove set functor and add comapre_grad test · f188e22b
  Committed by dangqingqing on Aug 23, 2017

22 Aug, 2017 · 1 commit

- fix cuda_helper.h · 9bc1a1a1
  Committed by dangqingqing on Aug 22, 2017

机器未来 / Paddle · in sync with the fork's source project
