Created by: Aurelius84
In distributed mode, a tensor is split into several parts, and each part is initialized independently.
In this case, the numel of each split tensor is smaller than the numel of the full tensor, so the
PADDLE_ENFORCE check fails in distributed mode.
This PR skips the parameter check in distributed mode.
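
A minimal Python sketch of the behavior described above, for illustration only. It is not the code changed in this PR: `check_param_numel` and `is_distributed` are hypothetical names, and the shapes are made up. It only shows why a shard-sized tensor fails a full-size numel check, and how skipping the check in distributed mode avoids the failure.

```python
from math import prod


def check_param_numel(expected_shape, actual_numel, is_distributed):
    """Hypothetical stand-in for the numel check; not Paddle's actual code."""
    if is_distributed:
        # Each worker holds only one part of the tensor, so comparing
        # against the full shape does not apply. Skip the check.
        return
    expected_numel = prod(expected_shape)
    if actual_numel != expected_numel:
        raise ValueError(
            f"numel mismatch: expected {expected_numel}, got {actual_numel}"
        )


full_shape = (8, 4)        # assumed full parameter shape: 32 elements
shard_numel = 8 * 4 // 2   # one of two equal parts: 16 elements

# Without the skip, a shard-sized tensor is rejected.
try:
    check_param_numel(full_shape, shard_numel, is_distributed=False)
    rejected = False
except ValueError:
    rejected = True

# With the skip, the same shard passes in distributed mode.
check_param_numel(full_shape, shard_numel, is_distributed=True)
```

The sketch assumes the caller already knows whether it is running in distributed mode; how that is detected is outside its scope.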