    Merge lars op (#35476) · 0c31579c
    Committed by limingshu
    * A first attempt at using cudaLaunchCooperativeKernel
    
    * fix bugs
    
    * Completely replace the lars CUDA kernel
    
    * Fix bugs
    
    * a test for lars merge
    
    * Add infer_shape for lars_momentum_op
    
    * Fix codes
    
    * use avg_numel instead of max_numel to compute the grid size
    
    * modify unittest files about lars op
    
    * Finally converge when merged-lars works
    
    * fix ctest files
    
    * add merged_operation kernel for CUDA versions older than 11
    
    * Fix code style
    
    * fix ctest failure
    
    * fix error
    
    * fix all ctest errors and change the lars compute code for CPU
    
    * fix bugs on v100.
    
    * revert python modification about lars
    
    * revert python modification code
lars_momentum_op.cu 23.3 KB