    Integrate NVRTC to support compiling CUDA kernels at runtime (#19422) · 42b5bec6
    Yiqun Liu committed
    * Add dynamic loading of NVRTC, and support runtime compilation of CUDA kernels using NVRTC.
    test=develop
    
    * Call the CUDA driver API to launch the kernel compiled by NVRTC.
    test=develop
    
    * Disable on macOS and Windows.
    test=develop
    
    * Refine the code to support manually specified num_threads and workload_per_thread.
    test=develop
    
    * Refine the CUDA kernel to support large dims.
    test=develop
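
    For illustration only: the commit message mentions manually specified num_threads and workload_per_thread but does not show how they are used. The sketch below assumes they determine a 1-D launch configuration, where each block covers num_threads * workload_per_thread elements. LaunchConfig and MakeLaunchConfig are hypothetical names and are not taken from the commit.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical 1-D launch configuration (not from the commit).
struct LaunchConfig {
  unsigned int grid_x;   // number of thread blocks
  unsigned int block_x;  // threads per block (num_threads)
};

// Assumption: every thread handles `workload_per_thread` elements, so one
// block covers num_threads * workload_per_thread elements of the input.
inline LaunchConfig MakeLaunchConfig(std::size_t n,
                                     unsigned int num_threads,
                                     unsigned int workload_per_thread) {
  if (num_threads == 0 || workload_per_thread == 0) {
    throw std::invalid_argument(
        "num_threads and workload_per_thread must be positive");
  }
  const std::size_t per_block =
      static_cast<std::size_t>(num_threads) * workload_per_thread;
  // Ceiling division written so that n + per_block cannot overflow.
  std::size_t blocks = n / per_block + (n % per_block != 0 ? 1 : 0);
  if (blocks == 0) {
    blocks = 1;  // launch at least one block, even for an empty input
  }
  return LaunchConfig{static_cast<unsigned int>(blocks), num_threads};
}
```

    Under these assumptions, the resulting grid_x and block_x would be the grid and block dimensions passed to the driver API launch call. A real implementation would also need to check them against the device limits.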
cublas.h 4.6 KB