Created by: guomingz
Relu6 is the bottleneck op for MobileNet-v2. Since MKL-DNN supports conv/relu6 fusion, we implement this fusion as a cpass. Because INT8 support for this fusion will only be available in MKL-DNN v0.20, this PR focuses on the FP32 optimization.
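For readers unfamiliar with the idea, here is a minimal pure-Python sketch of what fusing conv and relu6 means. It is not the MKL-DNN kernel or the pass added in this PR; the function names and the 1-D convolution are illustrative assumptions. The unfused path writes the full conv output and then reads it again to clamp it, while the fused path clamps each value as it is produced.

```python
def relu6(x):
    # ReLU6 clamps its input to the range [0, 6].
    return min(max(x, 0.0), 6.0)


def conv1d(signal, kernel):
    # Plain "valid" 1-D convolution (no padding, stride 1), illustrative only.
    n = len(signal) - len(kernel) + 1
    return [
        sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(n)
    ]


def conv1d_then_relu6(signal, kernel):
    # Unfused: materialize the conv output, then make a second pass for relu6.
    out = conv1d(signal, kernel)
    return [relu6(v) for v in out]


def conv1d_relu6_fused(signal, kernel):
    # Fused: apply relu6 to each accumulator before storing it,
    # so the intermediate conv output is never written out separately.
    n = len(signal) - len(kernel) + 1
    out = []
    for i in range(n):
        acc = 0.0
        for j in range(len(kernel)):
            acc += signal[i + j] * kernel[j]
        out.append(relu6(acc))
    return out


signal = [1.0, -2.0, 3.0, 4.0, 5.0]
kernel = [1.0, 2.0]
unfused = conv1d_then_relu6(signal, kernel)
fused = conv1d_relu6_fused(signal, kernel)
```

Both paths give the same result; the fused one just avoids the extra pass over the intermediate tensor, which is where a saving of this kind would come from.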
The table below shows the benchmark results (FPS), measured on an SKX-8180 (28 cores):
Batch size | w/ fusion | w/o fusion |
---|---|---|
1 | 214.7 | 53.4 |
50 | 1219.727 | 137.280 |
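As a quick sanity check on the numbers above, the speedup per batch size can be derived from the table. This is just arithmetic on the reported FPS values; the variable and function names are illustrative.

```python
# FPS values copied from the table: batch size -> (with fusion, without fusion)
fps = {
    1: (214.7, 53.4),
    50: (1219.727, 137.280),
}


def speedup(with_fusion, without_fusion):
    # Ratio of fused throughput to unfused throughput.
    return with_fusion / without_fusion


speedups = {bs: speedup(w, wo) for bs, (w, wo) in fps.items()}

for bs, s in speedups.items():
    print(f"batch size {bs}: {s:.2f}x")
```

This works out to roughly 4x at batch size 1 and roughly 8.9x at batch size 50.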
test=develop