Adding BERT-Large cased

68d50765 · Jacob Devlin · 32ac4763 · 68d50765

Showing 1 changed file with 10 additions and 2 deletions

README.md +10 -2
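
The README note added in this commit asks for `do_lower_case=False` whenever a cased checkpoint is used. A minimal sketch of one way to avoid a mismatch, assuming the unpacked model directory keeps the released archive name (for example `cased_L-24_H-1024_A-16`). `infer_do_lower_case` is a hypothetical helper for illustration and is not part of the BERT repository:

```python
import os


def infer_do_lower_case(model_dir):
    """Guess the tokenizer casing from a released BERT directory name.

    Assumption: the directory is still named like the downloaded archive,
    e.g. "cased_L-24_H-1024_A-16" or "uncased_L-12_H-768_A-12".
    """
    name = os.path.basename(os.path.normpath(model_dir)).lower()
    # Check "uncased" first, because "cased" is a substring of it.
    if "uncased" in name:
        return True
    if "cased" in name:
        return False
    raise ValueError(
        "Cannot infer casing from %r; set do_lower_case explicitly." % model_dir
    )


if __name__ == "__main__":
    for path in ("models/cased_L-24_H-1024_A-16", "models/uncased_L-12_H-768_A-12"):
        print(path, "-> do_lower_case =", infer_do_lower_case(path))
```

In your own script the result would be passed on as `FullTokenizer(vocab_file=..., do_lower_case=infer_do_lower_case(model_dir))`. Names that carry no casing hint raise an error instead of silently defaulting to lower-casing.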

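The pre-training caveat added in this commit warns that a vocabulary larger than `vocab_size` in `bert_config.json` can lead to NaNs through unchecked out-of-bounds access. A minimal sketch of a pre-flight check, assuming a one-token-per-line vocabulary file in which each line receives the next integer ID. `check_vocab_size` is a hypothetical helper, not part of the BERT repository:

```python
import json


def check_vocab_size(config_path, vocab_path):
    """Raise ValueError if the vocabulary has more entries than vocab_size.

    Assumption: token IDs are assigned by line order, so the largest ID is
    (number of lines - 1) and must stay below config["vocab_size"].
    """
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    with open(vocab_path, encoding="utf-8") as f:
        num_tokens = sum(1 for _ in f)

    if num_tokens > config["vocab_size"]:
        raise ValueError(
            "vocab file has %d entries but bert_config.json sets vocab_size=%d"
            % (num_tokens, config["vocab_size"])
        )
    return num_tokens


# Example call (paths are placeholders):
# check_vocab_size("my_model/bert_config.json", "my_model/vocab.txt")
```

Running a check like this before launching pre-training turns a mismatch into an immediate error on the host, rather than NaNs that surface later on the accelerator.
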
--- a/README.md
+++ b/README.md
@@ -212,6 +212,10 @@ These models are all released under the same license as the source code (Apache
 For information about the Multilingual and Chinese model, see the
 [Multilingual README](https://github.com/google-research/bert/blob/master/multilingual.md).

+**When using a cased model, make sure to pass `--do_lower_case=False` to the
+training scripts. (Or pass `do_lower_case=False` directly to `FullTokenizer` if
+you're using your own script.)**
+
 The links to the models are here (right-click, 'Save link as...' on the name):

 *   **[`BERT-Base, Uncased`](https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip)**:
@@ -220,8 +224,8 @@ The links to the models are here (right-click, 'Save link as...' on the name):
    24-layer, 1024-hidden, 16-heads, 340M parameters
 *   **[`BERT-Base, Cased`](https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip)**:
    12-layer, 768-hidden, 12-heads , 110M parameters
-*   **`BERT-Large, Cased`**: 24-layer, 1024-hidden, 16-heads, 340M parameters
-    (Not available yet. Needs to be re-generated).
+*   **[`BERT-Large, Cased`](https://storage.googleapis.com/bert_models/2018_10_18/cased_L-24_H-1024_A-16.zip)**:
+    24-layer, 1024-hidden, 16-heads, 340M parameters
 *   **[`BERT-Base, Multilingual Cased (New, recommended)`](https://storage.googleapis.com/bert_models/2018_11_23/multi_cased_L-12_H-768_A-12.zip)**:
    104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
 *   **[`BERT-Base, Multilingual Uncased (Orig, not recommended)`](https://storage.googleapis.com/bert_models/2018_11_03/multilingual_L-12_H-768_A-12.zip)**:
@@ -826,6 +830,10 @@ accuracy numbers.

 ### Pre-training tips and caveats

+*   **If using your own vocabulary, make sure to change `vocab_size` in
+    `bert_config.json`. If you use a larger vocabulary without changing this,
+    you will likely get NaNs when training on GPU or TPU due to unchecked
+    out-of-bounds access.**
 *   If your task has a large domain-specific corpus available (e.g., "movie
    reviews" or "scientific papers"), it will likely be beneficial to run
    additional steps of pre-training on your corpus, starting from the BERT