Created by: Jie-Fang
Optimize NHWC data format for batchnorm.
Use the new cuDNN batchnorm API for NHWC. Combined with the NHWC optimization for conv2d (see PR #20597), ResNet50 performance improves by about 15%-20%.
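
For reviewers less familiar with the layout, here is a minimal NumPy sketch of what batchnorm computes on NHWC data. It is only an illustration and is not the cuDNN call or the kernel code in this PR. The function names `batchnorm_nhwc` and `batchnorm_nchw`, the tensor shapes, and the `eps` value are assumptions made for the example. It shows that the per-channel statistics are the same in both layouts, so an NHWC path can skip the transposes to and from NCHW.

```python
import numpy as np

def batchnorm_nhwc(x, gamma, beta, eps=1e-5):
    # x has shape (N, H, W, C); channels are the last axis, so the
    # per-channel statistics reduce over axes N, H, W and broadcast directly.
    mean = x.mean(axis=(0, 1, 2))
    var = x.var(axis=(0, 1, 2))
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def batchnorm_nchw(x, gamma, beta, eps=1e-5):
    # x has shape (N, C, H, W); reduce over N, H, W and reshape the
    # per-channel parameters so they broadcast along axis 1.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    g = gamma.reshape(1, -1, 1, 1)
    b = beta.reshape(1, -1, 1, 1)
    return g * (x - mean) / np.sqrt(var + eps) + b

# Hypothetical small input: batch 2, 4x4 spatial, 3 channels.
rng = np.random.default_rng(0)
x_nhwc = rng.standard_normal((2, 4, 4, 3))
gamma = np.ones(3)
beta = np.zeros(3)

y_nhwc = batchnorm_nhwc(x_nhwc, gamma, beta)
# Same data routed through NCHW needs a transpose in and a transpose out.
y_nchw = batchnorm_nchw(x_nhwc.transpose(0, 3, 1, 2), gamma, beta)
same = np.allclose(y_nhwc, y_nchw.transpose(0, 2, 3, 1))
```

The two transposes in the NCHW route are the kind of extra data movement that a native NHWC path avoids when the surrounding conv2d ops also run in NHWC.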