Commits · 7b5e23c0345ca717f7af2c2fb05d36e5655fb2a3 · BaiXuePrincess / Paddle

05 Sep, 2019 1 commit

Integrate NVRTC to support compiling CUDA kernel at runtime (#19422) · 42b5bec6

Committed by Yiqun Liu on Sep 05, 2019

* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop

* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop

* Disable on macOS and Windows.
test=develop

* Refine the code to support manually specified num_threads and workload_per_thread.
test=develop

* Refine the CUDA kernel to support large dims.
test=develop
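
The commit message names `num_threads` and `workload_per_thread` but does not say how they are combined. As a rough sketch only, one common way such parameters determine a launch configuration is a ceiling division over the element count. The function name `NumBlocks` and the formula are assumptions for illustration, not code taken from this commit; 64-bit arithmetic is used here so that large dimensions do not overflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of Paddle: given `n` elements, a block of
// `num_threads` threads, and `workload_per_thread` elements handled by each
// thread, return how many blocks are needed to cover all elements.
std::int64_t NumBlocks(std::int64_t n, std::int64_t num_threads,
                       std::int64_t workload_per_thread) {
  assert(n > 0 && num_threads > 0 && workload_per_thread > 0);
  // Elements covered by one block.
  const std::int64_t per_block = num_threads * workload_per_thread;
  // Ceiling division so a partial final block is still launched.
  return (n + per_block - 1) / per_block;
}
```

Under these assumptions, `NumBlocks(1025, 256, 2)` yields 3 blocks: two full blocks of 512 elements and one block for the remaining element.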

23 Nov, 2018 1 commit

fix cublas warp error · f7847ca6

Committed by chengduozh on Nov 23, 2018

test=develop

22 Nov, 2018 1 commit

Refine cublas to support CUBLAS_TENSOR_OP_MATH (#13929) · 00b9e9a1

Committed by chengduo on Nov 22, 2018

* refine cublas
test=develop

* code refine

* refine cublas

* add GEMM_EX

* add enable_cublas_tensor_op_math doc and add cublasCall
test=develop

* fix CublasCall for cuda version
test=develop

* fix error
test=develop

* fix GEMM_EX to be compatible with gcc 4.8
test=develop

* add GEMM_EX
test=develop

* make compatible with gcc 4.8
test=develop

28 Sep, 2018 1 commit

namespace issue (#13543) · 2d00e658

Committed by dzhwinter on Sep 28, 2018

* flags

* "follow comment"

15 Sep, 2018 1 commit

debug version · 85f8dd1c

Committed by dzhwinter on Sep 15, 2018

21 Aug, 2018 1 commit

status (#12764) · e23ddf6a

Committed by dzhwinter on Aug 21, 2018

17 Aug, 2018 1 commit

dlfnh · 335398f1

Committed by dzhwinter on Aug 17, 2018

01 Jun, 2018 1 commit

Use static for dlsym · c5115950

Committed by yuyang18 on Jun 01, 2018

25 Apr, 2018 1 commit

Make dyload strictly use the same ABI in header · 3d53631b

Committed by Yu Yang on Apr 25, 2018

11 Apr, 2018 1 commit

Fix cuda 7.5 error with cublas GEMM (#9811) · 7ed457e7

Committed by Kexin Zhao on Apr 10, 2018

* fix gemm error for cuda 7.5

* fix version number

08 Apr, 2018 1 commit

Fix cpplint errors with paddle/fluid/platform/dynload (#9715) · e185502e

Committed by Yi Wang on Apr 07, 2018

* Update source files.

* Update headers

* Update

* Update

* Update

* Update

* Fix a CMake dependency

07 Apr, 2018 1 commit

Update the cuda API and enable tensor core for GEMM (#9622) · d00bd9eb

Committed by Kexin Zhao on Apr 06, 2018

* change from hgemm to gemmEx

* fix cpplint

09 Mar, 2018 1 commit

Add float16 GEMM math function on GPU (#8695) · 90215b78

Committed by kexinzhao on Mar 08, 2018

* test cpu float16 data transform

* add isnan etc

* small fix

* fix containsNAN test error

* add data_type transform GPU test

* add float16 GPU example

* fix error

* fix GPU test error

* initial commit

* fix error

* small fix

* add more gemm fp16 tests

* fix error

* add utility function

12 Feb, 2018 1 commit

Fix the grammar in copyright. (#8403) · 24509f4a

Committed by qingqing01 on Feb 12, 2018

10 Feb, 2018 2 commits

Correct #include path · fc374821

Committed by Yi Wang on Feb 09, 2018

Move file to fluid/; Edit CMakeLists.txt · 90648f33

Committed by Yi Wang on Feb 09, 2018

11 Nov, 2017 1 commit

Use G++ to compile some cu operators. · f5e36765

Committed by dangqingqing on Nov 11, 2017

18 Oct, 2017 1 commit

MatMul operator (#4856) · 16489827

Committed by Markus Kliegl on Oct 17, 2017

* initial matmul operator

Similar to np.matmul, but also has transpose_X and transpose_Y flags,
and only supports tensors from rank 1 to 3 inclusive.

For GPU, uses cublas?gemmStridedBatched. For CPU, uses
cblas_?gemm_batch if available via MKL; otherwise a simple serial
implementation that loops over the batch dimension is employed for now.
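
The serial CPU fallback described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the operator's actual code: the name `BatchedMatMul`, the row-major layout, and the flat `std::vector<float>` storage are all hypothetical. The `transpose_x` and `transpose_y` parameters mirror the `transpose_X` and `transpose_Y` flags from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not Paddle's API. For each batch index i it computes
// C[i] = op(X[i]) * op(Y[i]), where op(X) is m x k and op(Y) is k x n.
// Matrices are stored row-major and packed contiguously per batch. When a
// transpose flag is set, the stored matrix has the transposed shape
// (k x m for X, n x k for Y).
std::vector<float> BatchedMatMul(const std::vector<float>& x,
                                 const std::vector<float>& y,
                                 std::size_t batch, std::size_t m,
                                 std::size_t k, std::size_t n,
                                 bool transpose_x, bool transpose_y) {
  assert(x.size() == batch * m * k);
  assert(y.size() == batch * k * n);
  std::vector<float> c(batch * m * n, 0.0f);
  // Simple serial loop over the batch dimension.
  for (std::size_t i = 0; i < batch; ++i) {
    const float* xi = x.data() + i * m * k;
    const float* yi = y.data() + i * k * n;
    float* ci = c.data() + i * m * n;
    for (std::size_t r = 0; r < m; ++r) {
      for (std::size_t col = 0; col < n; ++col) {
        float sum = 0.0f;
        for (std::size_t p = 0; p < k; ++p) {
          // Element (r, p) of op(X) and element (p, col) of op(Y).
          const float xv = transpose_x ? xi[p * m + r] : xi[r * k + p];
          const float yv = transpose_y ? yi[col * k + p] : yi[p * n + col];
          sum += xv * yv;
        }
        ci[r * n + col] = sum;
      }
    }
  }
  return c;
}
```

A rank-2 input corresponds to `batch == 1` in this sketch. The GPU path named in the commit would replace the outer loop with a single strided batched GEMM call.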

10 Aug, 2017 3 commits

format code · 688c43b1

Committed by qijun on Aug 10, 2017

Fix gaussian_random_op compile error · 45911102

Committed by Yu Yang on Aug 10, 2017

* Should always use `dynload::` for cuda function.
* Fix cublas.h without DSO load.

fix bug in dynload · 5f1081d8

Committed by qijun on Aug 10, 2017

13 Jul, 2017 1 commit

fix bug in dynload · 4e918377

Committed by qijun on Jul 13, 2017

11 Jul, 2017 2 commits

fix cublas dynload bug · 69d76812

Committed by qijun on Jul 11, 2017

Refine CUDA Related libraries · a0466053

Committed by Yu Yang on Jul 11, 2017

04 Jul, 2017 3 commits

fix wrong including header-file in files in paddle/platform/dynload dir · e6fcdd47

Committed by qijun on Jul 04, 2017

move to dynload directory · 3567ea6d

Committed by qijun on Jul 04, 2017

follow comments · 9eeabe98

Committed by qijun on Jul 04, 2017

03 Jul, 2017 2 commits

fix cuda compile error · a77fcef3

Committed by qijun on Jul 03, 2017

add dynamic_load · 3ba7a738

Committed by qijun on Jul 03, 2017

BaiXuePrincess / Paddle is in sync with its fork source project