- 18 May, 2020 1 commit
Committed by Yiqun Liu
* Add a check for whether the CUDA Driver and NVRTC are available on the runtime system.
* Call cuInit to initialize the CUDA Driver API before any CUDA calls. test=develop
* Change the behavior when libnvrtc.so cannot be found: print a warning instead of exiting. test=develop
* Do not initialize the CUDA Driver API on Windows and macOS. test=develop
* Remove the call to cuInit when entering paddle and enable test_code_generator. test=develop
* Add some built-in functions for __half. test=develop
* Change save_intermediate_out to false in the unit test. test=develop
* Fix an erroneous reference to a temporary variable when setting the include path for device_code. test=develop
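The availability check described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the helper names `LibraryIsAvailable` and `RuntimeCompilationIsAvailable` are hypothetical, and the library file names `libcuda.so` and `libnvrtc.so` are assumptions (some systems only ship versioned names such as `libcuda.so.1`). It only probes whether the libraries can be opened with `dlopen` and prints a warning instead of exiting when one is missing; it does not call `cuInit`.

```cpp
#include <dlfcn.h>

#include <cstdio>
#include <string>

// Hypothetical helper: returns true if the named shared library can be
// opened by the dynamic loader. On failure it prints a warning to stderr
// and returns false instead of terminating the process.
bool LibraryIsAvailable(const std::string& name) {
  void* handle = dlopen(name.c_str(), RTLD_LAZY | RTLD_LOCAL);
  if (handle == nullptr) {
    const char* err = dlerror();
    std::fprintf(stderr, "Warning: cannot load %s (%s).\n", name.c_str(),
                 err != nullptr ? err : "unknown error");
    return false;
  }
  dlclose(handle);
  return true;
}

// Hypothetical helper: runtime compilation of CUDA kernels needs both the
// CUDA driver library and NVRTC. The file names below are assumptions.
bool RuntimeCompilationIsAvailable() {
  const bool has_driver = LibraryIsAvailable("libcuda.so");
  const bool has_nvrtc = LibraryIsAvailable("libnvrtc.so");
  return has_driver && has_nvrtc;
}
```

A caller could guard any runtime-compiled code path with `if (RuntimeCompilationIsAvailable()) { ... }` and fall back to precompiled kernels otherwise. On older glibc versions the program may need to be linked with `-ldl`.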
-
- 21 February, 2020 1 commit
Committed by Yiqun Liu
-
- 03 January, 2020 1 commit
Committed by Yiqun Liu
* Add dynamic loading of nvrtc, and support runtime compilation of CUDA kernels using nvrtc. test=develop
* Call the CUDA driver API to launch the kernel compiled by nvrtc. test=develop
* Disable for Mac and Windows. test=develop
* Refine the code to support manually specified num_threads and workload_per_thread. test=develop
* Refine the CUDA kernel to support large dims. test=develop
* Add DeviceCodePool to manage all device code.
* Add the first implementation of the fusion_group op.
* Add a unit test for the fusion_group op.
* Add a check of the result.
* Add a check for nvrtc in the unit test. test=develop
* Add comments to explain the inputs, outputs and features of the fusion_group op. test=develop
* Disable the fusion_group op for Mac and Windows. test=develop
* Make the compilation of device code return a status instead of hanging. test=develop
* Add a check for whether the CUDA driver library is present, and do not core dump when a call to the CUDA driver API fails.
* Unify the input and output names of fusion_group_op. test=develop
* Add a check for the CUDA driver library in the unit test. test=develop
* Refine the calls to PADDLE_ENFORCE. test=develop
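As an illustration of how manually specified `num_threads` and `workload_per_thread` might feed into a kernel launch, the sketch below derives a block count from the element count. The commit does not spell out this arithmetic, so the formula, the `LaunchConfig` struct and the `MakeLaunchConfig` function are all assumptions made for the example. A 64-bit element count is used so that large dims do not overflow.

```cpp
#include <cstdint>

// Hypothetical launch description: number of thread blocks and threads
// per block.
struct LaunchConfig {
  int64_t num_blocks;
  int num_threads;
};

// Assumed scheme: each thread handles workload_per_thread elements, so one
// block covers num_threads * workload_per_thread elements. The block count
// is the ceiling of n divided by that coverage, and at least 1.
LaunchConfig MakeLaunchConfig(int64_t n, int num_threads,
                              int workload_per_thread) {
  const int64_t per_block =
      static_cast<int64_t>(num_threads) * workload_per_thread;
  int64_t num_blocks = (n + per_block - 1) / per_block;
  if (num_blocks < 1) {
    num_blocks = 1;
  }
  return {num_blocks, num_threads};
}
```

For example, under these assumptions 1000 elements with 256 threads and a workload of 1 need 4 blocks, while the same input with a workload of 4 fits in a single block.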
-
- 05 September, 2019 1 commit
Committed by Yiqun Liu
* Add dynamic loading of nvrtc, and support runtime compilation of CUDA kernels using nvrtc. test=develop
* Call the CUDA driver API to launch the kernel compiled by nvrtc. test=develop
* Disable for Mac and Windows. test=develop
* Refine the code to support manually specified num_threads and workload_per_thread. test=develop
* Refine the CUDA kernel to support large dims. test=develop
-