提交 · 76738504ad8101aa8221c9a6d9bef2b6cbfc52ca · Crayon鑫 / Paddle

16 12月, 2020 1 次提交

由 Y_Xuan 提交于 12月 16, 2020

* 添加rocm平台支持代码

* 修改一些问题

* 修改一些歧义并添加备注

* 修改代码格式

* 解决冲突后的代码修改

* 修改operators.cmake

* 修改格式

* 修正错误

* 统一接口

* 修改日期

76738504

24 9月, 2020 1 次提交

use iwyu clean include (#27267) · df43905f

由 wanghuancoder 提交于 9月 24, 2020

* use iwyu clean include, test=develop, test=win

* compilation error, test=develop

* fix compilation error2, test=develop

* fix compilation error3, test=develop

* fix compilation error4, test=develop

* fix compilation error5, test=develop

* fix compilation error6, test=develop

* fix compilation error7, test=develop

* fix compilation error8, test=develop

* fix compilation error8, test=develop

* fix compilation error10, test=develop

* fix compilation error11, test=develop

df43905f

09 7月, 2020 1 次提交
- C
  
  remove WITH_DSO compile option (#25444) · 172d4ecb
  由 Chen Weihang 提交于 7月 09, 2020
  
  172d4ecb
18 5月, 2020 1 次提交

Add some check for CUDA Driver API and NVRTC (#22719) · 560c8153

由 Yiqun Liu 提交于 5月 18, 2020

* Add the check for whether CUDA Driver and NVRTC is available for the runtime system.

* Call cuInit to initialize the CUDA Driver API before all CUDA callings.
test=develop

* Change the behavior when libnvrtc.so can not be found, printing a warning instead of exiting.
test=develop

* Do not initialize CUDA Driver API for windows and macos.
test=develop

* Remove the call of cuInit when entering paddle and enable the test_code_generator.
test=develop

* Add some built-in functions for __half.
test=develop

* Change save_intermediate_out to false in unittest.
test=develop

* Fix error reference to tempropary variable when seting including path for device_code.
test=develop

560c8153

03 1月, 2020 1 次提交

Add the first implememtation of fusion_group op (#19621) · d4832077

由 Yiqun Liu 提交于 1月 03, 2020

* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop

* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop

* Disable for mac and windows.
test=develop

* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop

* Refine the CUDA kernel to support large dims.
test=develop

* Add DeviceCodePool to manage all device codes.

* Add the first implementation fusion_group op.

* Add unit-test for fusion_group op.

* Add the check of result.

* Add the check of nvrtc in unit-test.
test=develop

* Add comment to explain the inputs, outputs and features of fusion_group op.
test=develop

* Disable fusion_group op for mac and windows.
test=develop

* Make the compiling of device code return status instead of hanging up.
test=develop

* Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.

* Unify fusion_group_op's input and output names.
test=develop

* Add the check of CUDA driver library in unittest.
test=develop

* Refine the calling of PADDLE_ENFORCE.
test=develop

d4832077

05 9月, 2019 1 次提交

Integrate NVRTC to support compiling CUDA kernel at runtime (#19422) · 42b5bec6

由 Yiqun Liu 提交于 9月 05, 2019

* Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
test=develop

* Call CUDA driver api to launch the kernel compiled by nvrtc.
test=develop

* Disable for mac and windows.
test=develop

* Refine the codes to support manually specified num_threads and workload_per_thread.
test=develop

* Refine the CUDA kernel to support large dims.
test=develop

42b5bec6

Crayon鑫 / Paddle 与 Fork 源项目一致

Crayon鑫 / Paddle
与 Fork 源项目一致