    Add the first implementation of fusion_group op (#19621) · d4832077
    Committed by Yiqun Liu
    * Add dynamic loading of nvrtc, and support runtime compilation of CUDA kernels using nvrtc.
    test=develop
    
    * Call CUDA driver api to launch the kernel compiled by nvrtc.
    test=develop
    
    * Disable for mac and windows.
    test=develop
    
    * Refine the code to support manually specified num_threads and workload_per_thread.
    test=develop
    
    * Refine the CUDA kernel to support large dims.
    test=develop
    
    * Add DeviceCodePool to manage all device codes.
    
    * Add the first implementation of fusion_group op.
    
    * Add unit-test for fusion_group op.
    
    * Add the check of result.
    
    * Add the check of nvrtc in unit-test.
    test=develop
    
    * Add comment to explain the inputs, outputs and features of fusion_group op.
    test=develop
    
    * Disable fusion_group op for mac and windows.
    test=develop
    
    * Make device code compilation return a status instead of hanging.
    test=develop
    
    * Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.
    
    * Unify fusion_group_op's input and output names.
    test=develop
    
    * Add the check of CUDA driver library in unittest.
    test=develop
    
    * Refine the calling of PADDLE_ENFORCE.
    test=develop
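
    The items above on num_threads, workload_per_thread and large dims can be illustrated with a small host-side sketch. This is not the code from this commit: `LaunchConfig`, `MakeLaunchConfig`, `ElementsVisitedBy` and the `max_blocks` cap are hypothetical names, and the sketch assumes that workload_per_thread means the number of elements each thread processes and that large inputs are covered with a grid-stride loop.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type (not PaddlePaddle's API): a 1-D launch shape.
struct LaunchConfig {
  int64_t num_blocks;   // grid size in x
  int64_t num_threads;  // block size in x
};

// Ceiling division for positive integers.
inline int64_t CeilDiv(int64_t a, int64_t b) { return (a + b - 1) / b; }

// Assumption: each thread handles up to `workload_per_thread` elements, so one
// block covers num_threads * workload_per_thread elements. The grid is capped
// at `max_blocks`; a grid-stride loop picks up whatever the cap leaves over.
inline LaunchConfig MakeLaunchConfig(int64_t n, int64_t num_threads,
                                     int64_t workload_per_thread,
                                     int64_t max_blocks) {
  assert(n > 0 && num_threads > 0 && workload_per_thread > 0 && max_blocks > 0);
  int64_t per_block = num_threads * workload_per_thread;
  int64_t blocks = CeilDiv(n, per_block);
  if (blocks > max_blocks) blocks = max_blocks;
  return LaunchConfig{blocks, num_threads};
}

// Host-side emulation of the grid-stride loop a kernel thread would run:
// returns how many of the n elements the given thread visits.
inline int64_t ElementsVisitedBy(int64_t block_id, int64_t thread_id,
                                 const LaunchConfig& cfg, int64_t n) {
  int64_t stride = cfg.num_blocks * cfg.num_threads;
  int64_t count = 0;
  for (int64_t i = block_id * cfg.num_threads + thread_id; i < n; i += stride) {
    ++count;
  }
  return count;
}
```

    With n = 1000, num_threads = 256 and workload_per_thread = 2, this sketch yields 2 blocks, and the first thread visits 2 elements. Using 64-bit indices keeps the loop valid when the element count exceeds the 32-bit range.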
CMakeLists.txt 1.4 KB