Created by: pangyoki
Add support for pre-training tasks in the dygraph BERT model, including single-GPU and multi-GPU training. Also fix several problems in the BERT model and add the masked language model's accuracy to the output.
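
For reviewers unfamiliar with the metric, here is a minimal sketch of how a masked language model accuracy is commonly computed: the fraction of masked positions where the highest-scoring vocabulary id matches the true token id. This is an illustration only, not the code in this PR; the function name `masked_lm_accuracy` and the plain-list inputs are assumptions made for the example.

```python
from typing import Sequence


def masked_lm_accuracy(
    logits: Sequence[Sequence[float]], labels: Sequence[int]
) -> float:
    """Return the fraction of masked positions predicted correctly.

    Hypothetical helper for illustration. `logits` holds one row of
    vocabulary scores per masked position; `labels` holds the true
    token id for each of those positions.
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        # No masked positions: report 0.0 rather than dividing by zero.
        return 0.0

    correct = 0
    for row, label in zip(logits, labels):
        # Argmax over the vocabulary dimension.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


if __name__ == "__main__":
    scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
    targets = [1, 0, 0]
    print(masked_lm_accuracy(scores, targets))
```

In the example call, two of the three masked positions are predicted correctly, so the reported accuracy is two thirds.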