Created by: yaoxuefeng6
The data_norm op, which is usually used in pslib mode, forcibly changes the distribution of the embedding using the historical mean and square sum, without applying scale and shift. As a result, when a slot is empty or new, the value after normalization may be meaningless. To fix this bug, we now check whether the show count of a slot is zero. If it is zero, we skip the forward normalization and the backward update of the statistics for that slot.
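
The behavior described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the actual kernel: it assumes the show count sits in the first position of each slot, that each slot spans `slot_dim` values, that the scale is `sqrt(batch_size / batch_square_sum)`, and that skipped slots keep their input values. The function names, the `EPS` tolerance, and the statistics layout are hypothetical.

```python
import math

EPS = 1e-7  # hypothetical tolerance for treating a show count as zero


def data_norm_forward(x, batch_size, batch_sum, batch_square_sum, slot_dim):
    """Normalize one row, skipping every slot whose show count is zero."""
    y = list(x)  # assumption: skipped slots keep their input values
    for start in range(0, len(x), slot_dim):
        show = x[start]  # assumption: show count is the first value of a slot
        if abs(show) < EPS:
            continue  # empty or new slot: no forward normalization
        for j in range(start, start + slot_dim):
            mean = batch_sum[j] / batch_size[j]
            scale = math.sqrt(batch_size[j] / batch_square_sum[j])
            y[j] = (x[j] - mean) * scale
    return y


def accumulate_stats(x, batch_size, batch_sum, batch_square_sum, slot_dim):
    """Update the statistics in place, ignoring slots whose show count is zero."""
    for start in range(0, len(x), slot_dim):
        if abs(x[start]) < EPS:
            continue  # empty or new slot: statistics stay untouched
        for j in range(start, start + slot_dim):
            mean = batch_sum[j] / batch_size[j]  # mean before this update
            batch_size[j] += 1.0
            batch_sum[j] += x[j]
            batch_square_sum[j] += (x[j] - mean) ** 2


# One row with two slots of width 2; the first slot has show == 0.
row = [0.0, 0.0, 2.0, 3.0]
size = [4.0, 4.0, 4.0, 4.0]
total = [4.0, 4.0, 4.0, 4.0]
square = [16.0, 16.0, 16.0, 16.0]

normalized = data_norm_forward(row, size, total, square, slot_dim=2)
accumulate_stats(row, size, total, square, slot_dim=2)
```

In this sketch the first slot is left alone in both passes, so its statistics are not polluted by an empty slot, while the second slot is normalized and contributes to the running statistics as usual.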