    Improve elementwise performance. (#23001) · 58615a62
    zhaoyuchen2018 committed
    * Improve elementwise performance.
    
    Elementwise performance is poor when execution falls into CommonGradBroadcastCUDA, so add some new kernels for different data patterns.
    
    * Add some CUDA kernels to speed up common broadcast cases. test=develop
    
    * Add more test cases and fix a CUDA kernel bug. test=develop
    
    * Remove tests because CPU precision checks fail. test=develop
    
    * Refine SplitDims, test=develop
    
    * Change file mode, test=develop
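The commit message does not say exactly which broadcast patterns the new kernels target. The C++ sketch below is one plausible reading, assuming x is viewed as (pre, n, post) around the block of dims that y occupies starting at `axis`. `MidDims`, `SplitDimsSketch`, `BroadcastPattern`, `ClassifyPattern`, and `ReduceGradForY` are hypothetical names, not the actual functions in elementwise_op_function.h. The sketch shows that the gradient for y is a reduction of dout over pre and post. When post == 1 or pre == 1, that reduction collapses to a plain 2-D column sum or row sum, which a dedicated kernel can handle more cheaply than a general broadcast path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical: x's shape collapsed into pre x n x post, where n spans y's dims.
struct MidDims {
  int64_t pre = 1, n = 1, post = 1;
};

// Hypothetical helper (not Paddle's SplitDims): requires y's dims to match
// x's dims exactly starting at `axis`.
MidDims SplitDimsSketch(const std::vector<int64_t>& x_dims,
                        const std::vector<int64_t>& y_dims, int axis) {
  if (axis < 0 || axis + y_dims.size() > x_dims.size())
    throw std::invalid_argument("y does not fit inside x at axis");
  MidDims d;
  for (int i = 0; i < axis; ++i) d.pre *= x_dims[i];
  for (size_t i = 0; i < y_dims.size(); ++i) {
    if (x_dims[axis + i] != y_dims[i])
      throw std::invalid_argument("x and y dims mismatch");
    d.n *= y_dims[i];
  }
  for (size_t i = axis + y_dims.size(); i < x_dims.size(); ++i)
    d.post *= x_dims[i];
  return d;
}

enum class BroadcastPattern { kSumOverHeight, kSumOverWidth, kGeneral };

// Picks a specialized path when the reduction is really 2-D.
BroadcastPattern ClassifyPattern(const MidDims& d) {
  if (d.post == 1) return BroadcastPattern::kSumOverHeight;  // dout is pre x n
  if (d.pre == 1) return BroadcastPattern::kSumOverWidth;    // dout is n x post
  return BroadcastPattern::kGeneral;
}

// Reference (CPU) gradient for y: dy[j] = sum_{i,k} dout[(i*n + j)*post + k].
std::vector<float> ReduceGradForY(const std::vector<float>& dout,
                                  const MidDims& d) {
  assert(static_cast<int64_t>(dout.size()) == d.pre * d.n * d.post);
  std::vector<float> dy(static_cast<size_t>(d.n), 0.0f);
  for (int64_t i = 0; i < d.pre; ++i)
    for (int64_t j = 0; j < d.n; ++j)
      for (int64_t k = 0; k < d.post; ++k)
        dy[j] += dout[(i * d.n + j) * d.post + k];
  return dy;
}
```

For example, x of shape {2, 3} with y of shape {3} at axis 1 gives pre = 2, n = 3, post = 1, so dy is just the column sums of dout viewed as a 2 x 3 matrix. On a GPU, a kernel specialized for that case could give each thread one output column and loop over the pre rows, keeping reads across a warp coalesced. This is offered as a rationale for specializing, not as a description of the kernels in this commit.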
elementwise_op_function.h