    Enable the convolution/relu6(bounded_relu) fusion for FP32 on Intel platform. (#17130) · 2281ebf0
    guomingz committed on
    * Relu6 is the bottleneck op for Mobilenet-v2. Since MKLDNN supports conv/relu6 fusion, we implement the fusion as a fuse pass. Because int8 support for this fusion will only arrive in MKLDNN v0.20, this PR focuses on the fp32 optimization.
    
    The table below shows the benchmark throughput (FPS) measured on skx-8180 (28 cores):
    Batch size | with fusion | without fusion
    -- | -- | --
    1 | 214.7 | 53.4
    50 | 1219.727 | 137.280
    
    test=develop
    
    * Fix the format issue
    
    test=develop
    
    * Add the missing nolint comments.
    
    test=develop
    
    * Fix the typos.
    
    test=develop
    
    * Register the conv_brelu_mkldnn_fuse_pass for the MKLDNN engine.
    
    test=develop
    
    * Adjust the indentation.
    
    test=develop
    
    * Add the test_conv_brelu_mkldnn_fuse_pass case.
    
    test=develop
    
    * Slightly update the code per Baidu's review comments.
    Embed the parameter definitions directly in the code,
    which makes the code easier to understand.
    
    test=develop
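
For readers unfamiliar with the conv/relu6 (bounded_relu) fusion in the first item, here is a minimal sketch of the idea on a toy 1-D convolution. It is not the Paddle pass or the MKLDNN kernel: the names `BoundedRelu`, `ConvThenBrelu`, and `FusedConvBrelu` are hypothetical and exist only for illustration. The unfused version writes an intermediate buffer and reads it back in a second pass; the fused version clamps each accumulator before it is stored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Bounded ReLU: clamp x to [0, threshold]. Relu6 is the case threshold = 6.
inline float BoundedRelu(float x, float threshold) {
  return std::min(std::max(x, 0.0f), threshold);
}

// Unfused (hypothetical helper): a "valid" 1-D convolution writes an
// intermediate buffer, then a second pass applies the activation.
std::vector<float> ConvThenBrelu(const std::vector<float>& in,
                                 const std::vector<float>& w,
                                 float threshold) {
  assert(!w.empty() && in.size() >= w.size());
  const std::size_t out_len = in.size() - w.size() + 1;
  std::vector<float> conv_out(out_len, 0.0f);
  for (std::size_t i = 0; i < out_len; ++i) {
    for (std::size_t k = 0; k < w.size(); ++k) {
      conv_out[i] += in[i + k] * w[k];
    }
  }
  std::vector<float> out(out_len);
  for (std::size_t i = 0; i < out_len; ++i) {
    out[i] = BoundedRelu(conv_out[i], threshold);
  }
  return out;
}

// Fused (hypothetical helper): the activation is applied to the accumulator
// before the store, so no intermediate buffer or second pass is needed.
std::vector<float> FusedConvBrelu(const std::vector<float>& in,
                                  const std::vector<float>& w,
                                  float threshold) {
  assert(!w.empty() && in.size() >= w.size());
  const std::size_t out_len = in.size() - w.size() + 1;
  std::vector<float> out(out_len);
  for (std::size_t i = 0; i < out_len; ++i) {
    float acc = 0.0f;
    for (std::size_t k = 0; k < w.size(); ++k) {
      acc += in[i + k] * w[k];
    }
    out[i] = BoundedRelu(acc, threshold);
  }
  return out;
}
```

Both functions accumulate in the same order, so they return identical results. Any difference in a real kernel would come from the memory traffic saved, which this toy example does not measure.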
conv_transpose_mkldnn_op.cc 9.6 KB