Created by: Jie-Fang
In the previous mixed precision module, we had to cast the input to fp16 first, store both fp16 and fp32 weights in the startup program, and cast the fp32 weights back to fp16 after each update. With the black/white list, we no longer need to maintain a master weight copy, and the input can stay in fp32. We rewrite the program and insert the appropriate cast ops for fp16 execution; as a result, the backward pass also gets cast ops that convert the fp16 gradients to fp32, so the update is applied to the fp32 weights. Another advantage is that we don't need extra functions for saving and loading a pretrained fp32 model, because the parameters are always kept in fp32.
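
The data flow described above can be illustrated with a minimal sketch. This is not the code in this PR and does not use any Paddle API: it is plain Python that emulates the inserted cast ops with the half-precision format of the standard `struct` module. The names `cast_fp16`, `cast_fp32`, and `train_step` are hypothetical, and the toy op `y = w * x` with a squared-error loss stands in for a white-list op.

```python
import struct


def cast_fp16(value):
    """Emulate a cast op to fp16: round to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def cast_fp32(value):
    """Emulate a cast op to fp32: round to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def train_step(w_fp32, x_fp32, target, lr):
    """One SGD step on loss = 0.5 * (w * x - target) ** 2 (hypothetical toy op)."""
    # Forward: cast ops inserted in front of the fp16 op; the stored
    # weight and the input themselves stay in fp32.
    w16 = cast_fp16(w_fp32)
    x16 = cast_fp16(x_fp32)
    y16 = cast_fp16(w16 * x16)

    # Backward: the gradient of the fp16 op is computed in fp16.
    dy16 = cast_fp16(y16 - cast_fp16(target))
    dw16 = cast_fp16(dy16 * x16)

    # Cast op inserted in the backward pass: fp16 gradient -> fp32.
    dw32 = cast_fp32(dw16)

    # The update is applied to the fp32 weight.
    return cast_fp32(w_fp32 - lr * dw32)


if __name__ == "__main__":
    new_w = train_step(w_fp32=1.0, x_fp32=1.0, target=0.0, lr=1e-4)
    print("updated fp32 weight:", new_w)
    print("same weight rounded to fp16:", cast_fp16(new_w))
```

In this toy setting the update is smaller than the fp16 spacing around 1.0, so it would be rounded away if the weight were stored only in fp16. Keeping the parameter in fp32 preserves it, which is also why an fp32 checkpoint can be saved and loaded without any conversion step.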