    Enable the detection of subgraph composed of grad ops (#21223) · dcfb6038
    Committed by Yiqun Liu
    * Add the first implementation of fusion_group op #19621 (#3)
    
    * Add the dynamic load of nvrtc, and support runtime compiling of CUDA kernel using nvrtc.
    test=develop
    
    * Call CUDA driver api to launch the kernel compiled by nvrtc.
    test=develop
    
    * Disable for mac and windows.
    test=develop
    
    * Refine the code to support manually specified num_threads and workload_per_thread.
    test=develop
    
    * Refine the CUDA kernel to support large dims.
    test=develop
    
    * Add DeviceCodePool to manage all device codes.
    
    * Add the first implementation of the fusion_group op.
    
    * Add unit-test for fusion_group op.
    
    * Add the check of result.
    
    * Add the check of nvrtc in unit-test.
    test=develop
    
    * Add comment to explain the inputs, outputs and features of fusion_group op.
    test=develop
    
    * Disable fusion_group op for mac and windows.
    test=develop
    
    * Make the compiling of device code return status instead of hanging up.
    test=develop
    
    * Add the check of whether there is CUDA driver library, and do not core dump when failing to call the CUDA driver API.
    
    * Unify fusion_group_op's input and output names.
    test=develop
    
    * Add the check of CUDA driver library in unittest.
    test=develop
    
    * Enable generating code for a given subgraph. #21126 (#4)
    
    * Enable generating code for a given subgraph.
    
    * Support sorting the subgraph.
    
    * Remove the rearrangement of expressions because we use the sorted subgraph directly.
    
    * Enable generating code for a subgraph which is composed of grad ops.
    
    * Use expression information to check the accuracy in unittest.
    
    * Separate load and store from computation expressions.
    test=develop
    
    * Improve the load statements in the generated code.
    test=develop
    
    * Remove unused arguments from formal list.
    test=develop
    
    * Enable the detection of subgraph of grad ops.
    
    * Generate code for detected subgraph in fusion_group_pass.
    
    * Add an option in BuildStrategy to enable fusion_group_pass and add unittest.
    test=develop
    
    * Fix a bug when checking whether the shapes of all inputs are the same.
    
    * Add debug information.
    
    * Move subgraph_detector from inference/analysis to the common framework/ir directory. (#5)
    
    test=develop
    
    * Call subgraph_detector in fusion_group pass.
    test=develop
    
    * Disable fusion_group when WITH_GPU is OFF.
    test=develop
    
    * Refine all PADDLE_ENFORCE messages.
    test=develop
    
    * Fix the case that some inputs are not defined in grad ops, and set op_role for fused op.
    test=develop
    
    * Follow review comments.
    test=develop
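
The code-generation items above (sorting the subgraph, then separating load and store from the computation expressions) can be illustrated with a small sketch. This is not the generator added by this commit: the function name `generate_kernel`, the `EXPR` table, the `t_` temporaries, and the tuple format for ops are all hypothetical, and only a few elementwise ops are assumed.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical table: op type -> C expression template (illustration only).
EXPR = {
    "elementwise_add": "{0} + {1}",
    "elementwise_mul": "{0} * {1}",
    "relu": "{0} > 0.0f ? {0} : 0.0f",
}


def generate_kernel(name, ops, inputs, outputs):
    """Emit CUDA source for a subgraph of elementwise ops.

    ops is a list of (op_type, input_names, output_name) in any order.
    """
    # Map each variable to the index of the op that produces it.
    producer = {out: i for i, (_, _, out) in enumerate(ops)}
    # An op depends on the ops that produce its inputs.
    deps = {
        i: {producer[v] for v in ins if v in producer}
        for i, (_, ins, _) in enumerate(ops)
    }
    # Sort the subgraph so every op follows its producers.
    order = list(TopologicalSorter(deps).static_order())

    args = [f"const float* {v}" for v in inputs]
    args += [f"float* {v}" for v in outputs]
    arg_list = ", ".join(args)

    lines = [
        f'extern "C" __global__ void {name}(int n, {arg_list}) {{',
        "  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;",
        "       i += gridDim.x * blockDim.x) {",
    ]
    # 1. Load statements, kept apart from the computation.
    lines += [f"    float t_{v} = {v}[i];" for v in inputs]
    # 2. Computation expressions, in sorted order.
    for i in order:
        op, ins, out = ops[i]
        expr = EXPR[op].format(*(f"t_{v}" for v in ins))
        lines.append(f"    float t_{out} = {expr};")
    # 3. Store statements.
    lines += [f"    {v}[i] = t_{v};" for v in outputs]
    lines += ["  }", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    # The ops are deliberately listed out of order.
    subgraph = [
        ("relu", ["tmp"], "out"),
        ("elementwise_add", ["x", "y"], "tmp"),
    ]
    print(generate_kernel("fused_kernel", subgraph, ["x", "y"], ["out"]))
```

Because the ops are sorted before emission, the expressions can be written out in order with no later rearrangement, which matches the rationale given in the commit message.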