Created by: xyzhou-puck
Add the dygraph BERT model (version 1).
I tested it on GLUE (mainly MNLI and CoLA) and got results similar to those of the static-graph version.
I will add the source code for SQuAD and pre-training later.