Skip BatchNorm when a feature has only 1 element.
Created by: qingqing01
y = (x - mean(x)) / (std(x) + eps)
If x has only 1 element, then mean(x) = x and std(x) = 0, so the output is entirely zero (ignoring the bias) and the feature becomes meaningless. In this case, we should not use feature-wise batch normalization.
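
A minimal pure-Python sketch of the degenerate case, not the code in this change. The function names `normalize` and `maybe_normalize`, the use of the population standard deviation, and the `eps` default are all assumptions made for illustration.

```python
import math


def normalize(xs, eps=1e-5):
    """Apply y = (x - mean(x)) / (std(x) + eps) over the values in xs."""
    mean = sum(xs) / len(xs)
    # Population standard deviation (an assumption for this sketch).
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return [(x - mean) / (std + eps) for x in xs]


def maybe_normalize(xs, eps=1e-5):
    """Hypothetical guard: skip normalization when xs has a single element."""
    if len(xs) <= 1:
        return list(xs)  # pass the value through unchanged
    return normalize(xs, eps)


single = normalize([3.7])         # mean == x and std == 0, so the output is zero
guarded = maybe_normalize([3.7])  # the guard keeps the original value
pair = normalize([1.0, 3.0])      # with more than one element the formula is useful
```

With a single element the numerator x - mean(x) is exactly zero whatever the input value, so every input maps to the same output; the guard simply leaves such a feature untouched.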