Created by: luotao1
related #9629 (closed)
- fuse batch norm into the conv operator, for conv both with and without bias
- after fusing, the elapsed time on resnet (test_inference_image_classification) drops from 11.2s to 9.3s, i.e. about 17% less inference time.
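
For reference, the arithmetic behind folding an inference-time batch norm into the preceding conv can be sketched in NumPy as below. This is only an illustration under assumptions, not the C++ code in this PR: the function names (`fold_batch_norm`, `conv1x1`, `batch_norm_inference`), the `(out_c, in_c, kh, kw)` weight layout, and the `eps` default are all hypothetical choices made for the example. A 1x1 conv is used only to keep the check short.

```python
import numpy as np


def fold_batch_norm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold inference-time batch norm parameters into conv weight and bias.

    weight: (out_c, in_c, kh, kw); bias: (out_c,) or None for a conv without bias.
    gamma, beta, mean, var: per-output-channel batch norm parameters, shape (out_c,).
    """
    if bias is None:
        # A conv without bias behaves like a conv with an all-zero bias.
        bias = np.zeros_like(mean)
    scale = gamma / np.sqrt(var + eps)
    # Scale every filter by its channel's batch norm factor.
    fused_weight = weight * scale.reshape(-1, 1, 1, 1)
    # Shift the bias by the running mean, rescale, then add beta.
    fused_bias = (bias - mean) * scale + beta
    return fused_weight, fused_bias


def conv1x1(x, weight, bias):
    """Hypothetical helper: 1x1 convolution on NCHW input, used only for checking."""
    y = np.einsum("nchw,oc->nohw", x, weight[:, :, 0, 0])
    return y + bias.reshape(1, -1, 1, 1)


def batch_norm_inference(y, gamma, beta, mean, var, eps=1e-5):
    """Hypothetical helper: batch norm in inference mode (fixed statistics)."""
    s = (1, -1, 1, 1)
    normalized = (y - mean.reshape(s)) / np.sqrt(var.reshape(s) + eps)
    return gamma.reshape(s) * normalized + beta.reshape(s)
```

With these helpers, `conv1x1(x, *fold_batch_norm(w, b, gamma, beta, mean, var))` should match `batch_norm_inference(conv1x1(x, w, b), gamma, beta, mean, var)` up to floating-point error, which is why the separate batch norm op can be dropped at inference time.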
Note that this PR modifies the program desc from the C++ side; after discussion with @jacquesqiao, it would be better to modify the program from the Python side.