Created by: wangxicoding
See issue https://github.com/PaddlePaddle/Paddle/issues/21804. Resolves the first and second problems described there.
- When scale_grad is null but bias_grad is not null, batch_norm_grad does not set bias_grad, so its shape will be 0. Now enforce that scale and bias are either both trainable or both non-trainable.
- Enforce that the allreduce size is > 0.
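
The two checks above can be sketched in plain Python as follows. This is only an illustration of the intended logic, not the actual change in Paddle: the function names `check_scale_bias_trainable` and `check_allreduce_size`, their parameters, and the error messages are hypothetical.

```python
def check_scale_bias_trainable(scale_trainable: bool, bias_trainable: bool) -> None:
    """Hypothetical check: scale and bias must share the same trainable state."""
    if scale_trainable != bias_trainable:
        raise ValueError(
            "batch_norm scale and bias must both be trainable "
            "or both be non-trainable"
        )


def check_allreduce_size(numel: int) -> None:
    """Hypothetical check: a tensor passed to allreduce must be non-empty."""
    if numel <= 0:
        raise ValueError(f"allreduce size must be > 0, got {numel}")
```

Failing early on a mismatched scale/bias pair avoids reaching allreduce with a bias_grad that was never set, which is the situation the second check would otherwise catch.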