diluosixu / bert

Unverified commit 67a4537b
Authored Jul 15, 2019 by Slav Petrov; committed via GitHub on Jul 15, 2019.
Parent: 0fce551b

Update multilingual.md

Correct Wikipedia size correlation comment.

Showing 1 changed file with 4 additions and 6 deletions (+4, -6): multilingual.md
multilingual.md @ 67a4537b

@@ -69,7 +69,7 @@ Note that the English result is worse than the 84.2 MultiNLI baseline because
 this training used Multilingual BERT rather than English-only BERT. This implies
 that for high-resource languages, the Multilingual model is somewhat worse than
 a single-language model. However, it is not feasible for us to train and
-maintain dozens of single-language model. Therefore, if your goal is to maximize
+maintain dozens of single-language models. Therefore, if your goal is to maximize
 performance with a language other than English or Chinese, you might find it
 beneficial to run pre-training for additional steps starting from our
 Multilingual model on data from your language of interest.

@@ -152,11 +152,9 @@ taken as the training data for each language
 However, the size of the Wikipedia for a given language varies greatly, and
 therefore low-resource languages may be "under-represented" in terms of the
 neural network model (under the assumption that languages are "competing" for
-limited model capacity to some extent).
-
-However, the size of a Wikipedia also correlates with the number of speakers of
-a language, and we also don't want to overfit the model by performing thousands
-of epochs over a tiny Wikipedia for a particular language.
+limited model capacity to some extent). At the same time, we also don't want
+to overfit the model by performing thousands of epochs over a tiny Wikipedia
+for a particular language.
 To balance these two factors, we performed exponentially smoothed weighting of
 the data during pre-training data creation (and WordPiece vocab creation). In
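The context line that closes the second hunk refers to "exponentially smoothed weighting" of the pre-training data. Elsewhere in multilingual.md this is described as raising each language's sampling probability to a power S < 1 and renormalizing, which down-samples large Wikipedias and up-samples small ones. Below is a minimal sketch of that weighting; the helper name smoothed_weights, the corpus sizes, and the exact exponent (the README cites S=0.7) are illustrative assumptions, not code from this repository.

def smoothed_weights(corpus_sizes, s=0.7):
    """Raise each language's data fraction to the power s, then renormalize."""
    total = sum(corpus_sizes.values())
    probs = {lang: n / total for lang, n in corpus_sizes.items()}
    unnorm = {lang: p ** s for lang, p in probs.items()}
    z = sum(unnorm.values())
    return {lang: w / z for lang, w in unnorm.items()}

# Hypothetical Wikipedia sizes (millions of tokens); English dominates.
sizes = {"en": 2500, "de": 800, "is": 7}
print(smoothed_weights(sizes))
# en falls from ~75.6% of the raw data to ~68.2% of the sampled data, while
# is (Icelandic) rises from ~0.2% to ~1.1%: a middle ground between purely
# proportional sampling (which under-represents small languages) and uniform
# sampling (which would run thousands of epochs over a tiny Wikipedia).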