Commit 68d50765, authored by Jacob Devlin

Adding BERT-Large cased

Parent 32ac4763
......@@ -212,6 +212,10 @@ These models are all released under the same license as the source code (Apache
For information about the Multilingual and Chinese model, see the
[Multilingual README](https://github.com/google-research/bert/blob/master/multilingual.md).
**When using a cased model, make sure to pass `--do_lower_case=False` to the training
scripts. (Or pass `do_lower_case=False` directly to `FullTokenizer` if you're
using your own script.)**
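The archive names linked below encode the casing of each model (`uncased_...`, `cased_...`, `multi_cased_...`, and the original uncased `multilingual_...`). As a minimal sketch, assuming you keep those directory or file names, a helper can derive the `do_lower_case` value from the model path instead of setting it by hand. `infer_do_lower_case` is hypothetical and is not part of the BERT code.

```python
import os


def infer_do_lower_case(model_path: str) -> bool:
    """Hypothetical helper: guess do_lower_case from a released BERT model name.

    Relies only on the naming of the archives linked in this README, e.g.
    "uncased_L-12_H-768_A-12" or "cased_L-24_H-1024_A-16".
    """
    # Take the last path component, ignoring any trailing slash.
    name = os.path.basename(os.path.normpath(model_path))
    if name.endswith(".zip"):
        name = name[: -len(".zip")]
    if name.startswith(("cased_", "multi_cased_")):
        return False  # cased models: keep the original casing
    if name.startswith(("uncased_", "multilingual_")):
        return True  # uncased models, including the original multilingual one
    # Unknown names fail loudly instead of silently picking a default.
    raise ValueError(f"cannot infer casing from {name!r}; set do_lower_case explicitly")


for path in ["models/uncased_L-12_H-768_A-12",
             "models/cased_L-24_H-1024_A-16.zip",
             "multi_cased_L-12_H-768_A-12/"]:
    print(path, "->", infer_do_lower_case(path))
```

The returned value is what you would pass as `do_lower_case` to `FullTokenizer`. Raising on unrecognized names, rather than defaulting to `True`, avoids silently lower-casing input for a cased model.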
The links to the models are here (right-click, 'Save link as...' on the name):
* **[`BERT-Base, Uncased`](https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip)**:
......@@ -220,8 +224,8 @@ The links to the models are here (right-click, 'Save link as...' on the name):
24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Base, Cased`](https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip)**:
12-layer, 768-hidden, 12-heads, 110M parameters
* **`BERT-Large, Cased`**: 24-layer, 1024-hidden, 16-heads, 340M parameters
(Not available yet. Needs to be re-generated).
* **[`BERT-Large, Cased`](https://storage.googleapis.com/bert_models/2018_10_18/cased_L-24_H-1024_A-16.zip)**:
24-layer, 1024-hidden, 16-heads, 340M parameters
* **[`BERT-Base, Multilingual Cased (New, recommended)`](https://storage.googleapis.com/bert_models/2018_11_23/multi_cased_L-12_H-768_A-12.zip)**:
104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
* **[`BERT-Base, Multilingual Uncased (Orig, not recommended)`](https://storage.googleapis.com/bert_models/2018_11_03/multilingual_L-12_H-768_A-12.zip)**:
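The parameter counts in the list follow from the layer and hidden sizes. A rough sketch of that arithmetic, assuming a standard BERT encoder with a 4x-wide feed-forward layer, 512 positions, 2 segment types, a pooler, and no pre-training heads. The default `vocab_size=30522` is the uncased English vocabulary and is an assumption here; cased and multilingual models use different vocabularies, so read the real value from each model's `bert_config.json`. `bert_param_count` is a hypothetical helper.

```python
def bert_param_count(num_layers: int, hidden: int, vocab_size: int = 30522,
                     max_position: int = 512, type_vocab: int = 2) -> int:
    """Rough parameter count for a BERT encoder plus pooler (no pre-training heads)."""
    h = hidden
    # Token, position and segment embeddings, plus one LayerNorm (gamma, beta).
    embeddings = (vocab_size + max_position + type_vocab) * h + 2 * h
    # Self-attention: Q, K, V and output projections, each h x h with a bias.
    attention = 4 * (h * h + h)
    # Feed-forward h -> 4h -> h with biases (intermediate size assumed to be 4h).
    ffn = (h * 4 * h + 4 * h) + (4 * h * h + h)
    # Two LayerNorms per layer: after attention and after the feed-forward.
    layer_norms = 2 * (2 * h)
    per_layer = attention + ffn + layer_norms
    # Pooler: one dense h x h layer with a bias on the [CLS] position.
    pooler = h * h + h
    return embeddings + num_layers * per_layer + pooler


print(bert_param_count(12, 768))   # 109482240, i.e. the ~110M listed for BERT-Base
print(bert_param_count(24, 1024))  # 335141888, in line with the ~340M listed for BERT-Large
```

The embedding term grows linearly with `vocab_size`, and the per-layer term grows with the square of the hidden size, which is why going from 768 to 1024 hidden units and from 12 to 24 layers roughly triples the total.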
......@@ -826,6 +830,10 @@ accuracy numbers.
### Pre-training tips and caveats
* **If using your own vocabulary, make sure to change `vocab_size` in
`bert_config.json`. If you use a larger vocabulary without changing this,
you will likely get NaNs when training on GPU or TPU due to unchecked
out-of-bounds access.**
* If your task has a large domain-specific corpus available (e.g., "movie
reviews" or "scientific papers"), it will likely be beneficial to run
additional steps of pre-training on your corpus, starting from the BERT
......
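The `vocab_size` tip above comes down to one invariant: token IDs run from 0 to (number of vocabulary entries - 1), and the embedding table has `vocab_size` rows, so any ID at or above `vocab_size` indexes past the end of the table. A minimal sketch that keeps the two in sync, assuming a `vocab.txt` with one token per line next to `bert_config.json`. `sync_vocab_size` is a hypothetical helper, not part of the BERT repository, and it rewrites the config in place.

```python
import json
import os
import tempfile


def sync_vocab_size(vocab_path: str, config_path: str) -> tuple:
    """Make vocab_size in a BERT config match the number of lines in vocab.txt.

    Assumes one token per line, so token IDs run from 0 to (lines - 1).
    Returns (old_vocab_size, new_vocab_size).
    """
    with open(vocab_path, encoding="utf-8") as f:
        num_tokens = sum(1 for _ in f)

    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)

    old_size = config.get("vocab_size")
    if old_size != num_tokens:
        # An ID >= vocab_size would read past the end of the embedding table,
        # the unchecked out-of-bounds access that can produce NaNs.
        config["vocab_size"] = num_tokens
        with open(config_path, "w", encoding="utf-8") as f:
            json.dump(config, f, indent=2)
    return old_size, num_tokens


if __name__ == "__main__":
    # Demo on throwaway files: a 6-token vocabulary with a config that says 5.
    with tempfile.TemporaryDirectory() as d:
        vocab = os.path.join(d, "vocab.txt")
        config = os.path.join(d, "bert_config.json")
        with open(vocab, "w", encoding="utf-8") as f:
            f.write("\n".join(["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "movie"]) + "\n")
        with open(config, "w", encoding="utf-8") as f:
            json.dump({"vocab_size": 5, "hidden_size": 768}, f)
        print(sync_vocab_size(vocab, config))  # (5, 6)
```

If you would rather fail loudly than edit the config automatically, replace the rewrite with a `raise ValueError(...)` when `num_tokens > old_size`; a config that is larger than the vocabulary only wastes embedding rows, while a smaller one causes the out-of-bounds reads.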