Unverified commit f5c7f8b1, authored by Y Yam, committed by GitHub

add ERNIE-M to model center (#5570)

* add ERNIE-M to model center

* add benchmark_en info.yaml

* remove unused cell

* rm image from repo and fix ipynb

* add introduction_en and fix some annotations

* add task to info.yaml

* add model_size and the number of parameters while modify ipynb following comments

* rewrite pretrained model into chinese

* modify task info
Parent 2e011b69
## 1. Cross-lingual Natural Language Inference Benchmark
### 1.1 Environment
* ERNIE-M models were trained on Tesla V100 SXM 32GB GPUs
### 1.2 Dataset
XNLI is a subset of MNLI that has been translated into 14 different languages, including some low-resource ones. As in MNLI, the goal is to predict textual entailment: whether sentence A implies, contradicts, or is neutral with respect to sentence B.
### 1.3 Metrics
| Model | en | fr | es | de | el | bg | ru | tr | ar | vi | th | zh | hi | sw | ur | Avg |
| ---------------------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
| Cross-lingual Transfer | | | | | | | | | | | | | | | | |
| XLM | 85.0 | 78.7 | 78.9 | 77.8 | 76.6 | 77.4 | 75.3 | 72.5 | 73.1 | 76.1 | 73.2 | 76.5 | 69.6 | 68.4 | 67.3 | 75.1 |
| Unicoder | 85.1 | 79.0 | 79.4 | 77.8 | 77.2 | 77.2 | 76.3 | 72.8 | 73.5 | 76.4 | 73.6 | 76.2 | 69.4 | 69.7 | 66.7 | 75.4 |
| XLM-R | 85.8 | 79.7 | 80.7 | 78.7 | 77.5 | 79.6 | 78.1 | 74.2 | 73.8 | 76.5 | 74.6 | 76.7 | 72.4 | 66.5 | 68.3 | 76.2 |
| INFOXLM | **86.4** | **80.6** | 80.8 | 78.9 | 77.8 | 78.9 | 77.6 | 75.6 | 74.0 | 77.0 | 73.7 | 76.7 | 72.0 | 66.4 | 67.1 | 76.2 |
| **ERNIE-M** | 85.5 | 80.1 | **81.2** | **79.2** | **79.1** | **80.4** | **78.1** | **76.8** | **76.3** | **78.3** | **75.8** | **77.4** | **72.9** | **69.5** | **68.8** | **77.3** |
| XLM-R Large | 89.1 | 84.1 | 85.1 | 83.9 | 82.9 | 84.0 | 81.2 | 79.6 | 79.8 | 80.8 | 78.1 | 80.2 | 76.9 | 73.9 | 73.8 | 80.9 |
| INFOXLM Large | **89.7** | 84.5 | 85.5 | 84.1 | 83.4 | 84.2 | 81.3 | 80.9 | 80.4 | 80.8 | 78.9 | 80.9 | 77.9 | 74.8 | 73.7 | 81.4 |
| VECO Large | 88.2 | 79.2 | 83.1 | 82.9 | 81.2 | 84.2 | 82.8 | 76.2 | 80.3 | 74.3 | 77.0 | 78.4 | 71.3 | **80.4** | **79.1** | 79.9 |
| **ERNIE-M Large**      | 89.3     | **85.1** | **85.7** | **84.4** | **83.7** | **84.5** | 82.0     | **81.2** | **81.2** | **81.9** | **79.2** | **81.0** | **78.6** | 76.2     | 75.4     | **82.0** |
| Translate-Train-All | | | | | | | | | | | | | | | | |
| XLM | 85.0 | 80.8 | 81.3 | 80.3 | 79.1 | 80.9 | 78.3 | 75.6 | 77.6 | 78.5 | 76.0 | 79.5 | 72.9 | 72.8 | 68.5 | 77.8 |
| Unicoder | 85.6 | 81.1 | 82.3 | 80.9 | 79.5 | 81.4 | 79.7 | 76.8 | 78.2 | 77.9 | 77.1 | 80.5 | 73.4 | 73.8 | 69.6 | 78.5 |
| XLM-R | 85.4 | 81.4 | 82.2 | 80.3 | 80.4 | 81.3 | 79.7 | 78.6 | 77.3 | 79.7 | 77.9 | 80.2 | 76.1 | 73.1 | 73.0 | 79.1 |
| INFOXLM | 86.1 | 82.0 | 82.8 | 81.8 | 80.9 | 82.0 | 80.2 | 79.0 | 78.8 | 80.5 | 78.3 | 80.5 | 77.4 | 73.0 | 71.6 | 79.7 |
| **ERNIE-M** | **86.2** | **82.5** | **83.8** | **82.6** | **82.4** | **83.4** | **80.2** | **80.6** | **80.5** | **81.1** | **79.2** | **80.5** | **77.7** | **75.0** | **73.3** | **80.6** |
| XLM-R Large | 89.1 | 85.1 | 86.6 | 85.7 | 85.3 | 85.9 | 83.5 | 83.2 | 83.1 | 83.7 | 81.5 | **83.7** | **81.6** | 78.0 | 78.1 | 83.6 |
| VECO Large | 88.9 | 82.4 | 86.0 | 84.7 | 85.3 | 86.2 | **85.8** | 80.1 | 83.0 | 77.2 | 80.9 | 82.8 | 75.3 | **83.1** | **83.0** | 83.0 |
| **ERNIE-M Large** | **89.5** | **86.5** | **86.9** | **86.1** | **86.0** | **86.8** | 84.1 | **83.8** | **84.1** | **84.5** | **82.1** | 83.5 | 81.1 | 79.4 | 77.9 | **84.2** |
## 2. Benchmarks on More Downstream Tasks
See the paper: [Ouyang X, Wang S, Pang C, et al. ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora[J]. 2020.](https://arxiv.org/abs/2012.15674)
### 2.1 Cross-lingual Named Entity Recognition
* Dataset: CoNLL
| Model | en | nl | es | de | Avg |
| ------------------------------ | --------- | --------- | --------- | --------- | --------- |
| *Fine-tune on English dataset* | | | | | |
| mBERT | 91.97 | 77.57 | 74.96 | 69.56 | 78.52 |
| XLM-R | 92.25 | **78.08** | 76.53 | **69.60** | 79.11 |
| **ERNIE-M** | **92.78** | 78.01 | **79.37** | 68.08 | **79.56** |
| XLM-R LARGE | 92.92 | 80.80 | 78.64 | 71.40 | 80.94 |
| **ERNIE-M LARGE** | **93.28** | **81.45** | **78.83** | **72.99** | **81.64** |
| *Fine-tune on all dataset* | | | | | |
| XLM-R | 91.08 | 89.09 | 87.28 | 83.17 | 87.66 |
| **ERNIE-M** | **93.04** | **91.73** | **88.33** | **84.20** | **89.32** |
| XLM-R LARGE | 92.00 | 91.60 | **89.52** | 84.60 | 89.43 |
| **ERNIE-M LARGE** | **94.01** | **93.81** | 89.23 | **86.20** | **90.81** |
### 2.2 Cross-lingual Question Answering
* Dataset: MLQA
| Model | en | es | de | ar | hi | vi | zh | Avg |
| ----------------- | --------------- | --------------- | --------------- | --------------- | --------------- | --------------- | --------------- | --------------- |
| mBERT | 77.7 / 65.2 | 64.3 / 46.6 | 57.9 / 44.3 | 45.7 / 29.8 | 43.8 / 29.7 | 57.1 / 38.6 | 57.5 / 37.3 | 57.7 / 41.6 |
| XLM | 74.9 / 62.4 | 68.0 / 49.8 | 62.2 / 47.6 | 54.8 / 36.3 | 48.8 / 27.3 | 61.4 / 41.8 | 61.1 / 39.6 | 61.6 / 43.5 |
| XLM-R | 77.1 / 64.6 | 67.4 / 49.6 | 60.9 / 46.7 | 54.9 / 36.6 | 59.4 / 42.9 | 64.5 / 44.7 | 61.8 / 39.3 | 63.7 / 46.3 |
| INFOXLM | 81.3 / 68.2 | 69.9 / 51.9 | 64.2 / 49.6 | 60.1 / 40.9 | 65.0 / 47.5 | 70.0 / 48.6 | 64.7 / **41.2** | 67.9 / 49.7 |
| **ERNIE-M** | **81.6 / 68.5** | **70.9 / 52.6** | **65.8 / 50.7** | **61.8 / 41.9** | **65.4 / 47.5** | **70.0 / 49.2** | **65.6** / 41.0 | **68.7 / 50.2** |
| XLM-R LARGE | 80.6 / 67.8 | 74.1 / 56.0 | 68.5 / 53.6 | 63.1 / 43.5 | 62.9 / 51.6 | 71.3 / 50.9 | 68.0 / 45.4 | 70.7 / 52.7 |
| INFOXLM LARGE | **84.5 / 71.6** | **75.1 / 57.3** | **71.2 / 56.2** | **67.6 / 47.6** | 72.5 / 54.2 | **75.2 / 54.1** | 69.2 / 45.4 | 73.6 / 55.2 |
| **ERNIE-M LARGE** | 84.4 / 71.5 | 74.8 / 56.6 | 70.8 / 55.9 | 67.4 / 47.2 | **72.6 / 54.7** | 75.0 / 53.7 | **71.1 / 47.5** | **73.7 / 55.3** |
### 2.3 Cross-lingual Paraphrase Identification
* Dataset: PAWS-X
| Model | en | de | es | fr | ja | ko | zh | Avg |
| ---------------------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
| Cross-lingual Transfer | | | | | | | | |
| mBERT | 94.0 | 85.7 | 87.4 | 87.0 | 73.0 | 69.6 | 77.0 | 81.9 |
| XLM | 94.0 | 85.9 | 88.3 | 87.4 | 69.3 | 64.8 | 76.5 | 80.9 |
| MMTE | 93.1 | 85.1 | 87.2 | 86.9 | 72.0 | 69.2 | 75.9 | 81.3 |
| XLM-R LARGE | 94.7 | 89.7 | 90.1 | 90.4 | 78.7 | 79.0 | 82.3 | 86.4 |
| VECO LARGE | **96.2** | 91.3 | 91.4 | 92.0 | 81.8 | 82.9 | 85.1 | 88.7 |
| **ERNIE-M LARGE** | 96.0 | **91.9** | **91.4** | **92.2** | **83.9** | **84.5** | **86.9** | **89.5** |
| Translate-Train-All | | | | | | | | |
| VECO LARGE | 96.4 | 93.0 | 93.0 | 93.5 | 87.2 | 86.8 | 87.9 | 91.1 |
| **ERNIE-M LARGE** | **96.5** | **93.5** | **93.3** | **93.8** | **87.9** | **88.4** | **89.2** | **91.8** |
### 2.4 Cross-lingual Sentence Retrieval
* Dataset: Tatoeba
| Model | Avg |
| --------------------------------------- | -------- |
| XLM-R LARGE | 75.2 |
| VECO LARGE | 86.9 |
| **ERNIE-M LARGE** | **87.9** |
| **ERNIE-M LARGE (after fine-tuning)**   | 93.3     |
## 1. Benchmark of Cross-lingual Natural Language Inference
### 1.1 Environment
* ERNIE-M models were trained on Tesla V100 SXM 32GB GPUs
### 1.2 Dataset
XNLI is a subset of MNLI that has been translated into 14 different languages, including some low-resource ones. As in MNLI, the goal of the task is to predict textual entailment: whether sentence A implies, contradicts, or is neutral with respect to sentence B.
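As a toy illustration (plain Python; the field names and label strings below are illustrative, not the actual XNLI schema), each example pairs a premise with a hypothesis and one of three labels:

```python
# Toy sketch of the three-way NLI labeling scheme used by XNLI/MNLI.
# Field names ("premise", "hypothesis", "label") are illustrative only.
LABELS = ["entailment", "neutral", "contradiction"]

example = {
    "premise": "A man is playing a guitar on stage.",
    "hypothesis": "A musician is performing.",
    "label": "entailment",
}

def is_valid(ex: dict) -> bool:
    """Check that an example has both sentences and one of the three labels."""
    return (
        isinstance(ex.get("premise"), str)
        and isinstance(ex.get("hypothesis"), str)
        and ex.get("label") in LABELS
    )

print(is_valid(example))  # True
```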
### 1.3 Benchmark
| Model | en | fr | es | de | el | bg | ru | tr | ar | vi | th | zh | hi | sw | ur | Avg |
| ---------------------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
| Cross-lingual Transfer | | | | | | | | | | | | | | | | |
| XLM | 85.0 | 78.7 | 78.9 | 77.8 | 76.6 | 77.4 | 75.3 | 72.5 | 73.1 | 76.1 | 73.2 | 76.5 | 69.6 | 68.4 | 67.3 | 75.1 |
| Unicoder | 85.1 | 79.0 | 79.4 | 77.8 | 77.2 | 77.2 | 76.3 | 72.8 | 73.5 | 76.4 | 73.6 | 76.2 | 69.4 | 69.7 | 66.7 | 75.4 |
| XLM-R | 85.8 | 79.7 | 80.7 | 78.7 | 77.5 | 79.6 | 78.1 | 74.2 | 73.8 | 76.5 | 74.6 | 76.7 | 72.4 | 66.5 | 68.3 | 76.2 |
| INFOXLM | **86.4** | **80.6** | 80.8 | 78.9 | 77.8 | 78.9 | 77.6 | 75.6 | 74.0 | 77.0 | 73.7 | 76.7 | 72.0 | 66.4 | 67.1 | 76.2 |
| **ERNIE-M** | 85.5 | 80.1 | **81.2** | **79.2** | **79.1** | **80.4** | **78.1** | **76.8** | **76.3** | **78.3** | **75.8** | **77.4** | **72.9** | **69.5** | **68.8** | **77.3** |
| XLM-R Large | 89.1 | 84.1 | 85.1 | 83.9 | 82.9 | 84.0 | 81.2 | 79.6 | 79.8 | 80.8 | 78.1 | 80.2 | 76.9 | 73.9 | 73.8 | 80.9 |
| INFOXLM Large | **89.7** | 84.5 | 85.5 | 84.1 | 83.4 | 84.2 | 81.3 | 80.9 | 80.4 | 80.8 | 78.9 | 80.9 | 77.9 | 74.8 | 73.7 | 81.4 |
| VECO Large | 88.2 | 79.2 | 83.1 | 82.9 | 81.2 | 84.2 | 82.8 | 76.2 | 80.3 | 74.3 | 77.0 | 78.4 | 71.3 | **80.4** | **79.1** | 79.9 |
| **ERNIE-M Large**      | 89.3     | **85.1** | **85.7** | **84.4** | **83.7** | **84.5** | 82.0     | **81.2** | **81.2** | **81.9** | **79.2** | **81.0** | **78.6** | 76.2     | 75.4     | **82.0** |
| Translate-Train-All | | | | | | | | | | | | | | | | |
| XLM | 85.0 | 80.8 | 81.3 | 80.3 | 79.1 | 80.9 | 78.3 | 75.6 | 77.6 | 78.5 | 76.0 | 79.5 | 72.9 | 72.8 | 68.5 | 77.8 |
| Unicoder | 85.6 | 81.1 | 82.3 | 80.9 | 79.5 | 81.4 | 79.7 | 76.8 | 78.2 | 77.9 | 77.1 | 80.5 | 73.4 | 73.8 | 69.6 | 78.5 |
| XLM-R | 85.4 | 81.4 | 82.2 | 80.3 | 80.4 | 81.3 | 79.7 | 78.6 | 77.3 | 79.7 | 77.9 | 80.2 | 76.1 | 73.1 | 73.0 | 79.1 |
| INFOXLM | 86.1 | 82.0 | 82.8 | 81.8 | 80.9 | 82.0 | 80.2 | 79.0 | 78.8 | 80.5 | 78.3 | 80.5 | 77.4 | 73.0 | 71.6 | 79.7 |
| **ERNIE-M** | **86.2** | **82.5** | **83.8** | **82.6** | **82.4** | **83.4** | **80.2** | **80.6** | **80.5** | **81.1** | **79.2** | **80.5** | **77.7** | **75.0** | **73.3** | **80.6** |
| XLM-R Large | 89.1 | 85.1 | 86.6 | 85.7 | 85.3 | 85.9 | 83.5 | 83.2 | 83.1 | 83.7 | 81.5 | **83.7** | **81.6** | 78.0 | 78.1 | 83.6 |
| VECO Large | 88.9 | 82.4 | 86.0 | 84.7 | 85.3 | 86.2 | **85.8** | 80.1 | 83.0 | 77.2 | 80.9 | 82.8 | 75.3 | **83.1** | **83.0** | 83.0 |
| **ERNIE-M Large** | **89.5** | **86.5** | **86.9** | **86.1** | **86.0** | **86.8** | 84.1 | **83.8** | **84.1** | **84.5** | **82.1** | 83.5 | 81.1 | 79.4 | 77.9 | **84.2** |
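The Avg column in these tables is the unweighted mean of the 15 per-language accuracies. A quick check in plain Python, with the values copied from the ERNIE-M base row under cross-lingual transfer:

```python
# Per-language XNLI accuracies for ERNIE-M (base), cross-lingual transfer,
# in table order: en, fr, es, de, el, bg, ru, tr, ar, vi, th, zh, hi, sw, ur.
ernie_m_xnli = [85.5, 80.1, 81.2, 79.2, 79.1, 80.4, 78.1, 76.8,
                76.3, 78.3, 75.8, 77.4, 72.9, 69.5, 68.8]

# The reported Avg is the plain arithmetic mean over the 15 languages.
avg = sum(ernie_m_xnli) / len(ernie_m_xnli)
print(round(avg, 1))  # 77.3, matching the table
```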
## 2. More Benchmark of Downstream Tasks
See the paper: [Ouyang X, Wang S, Pang C, et al. ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora[J]. 2020.](https://arxiv.org/abs/2012.15674)
### 2.1 Cross-lingual Named Entity Recognition
* Dataset: CoNLL
| Model | en | nl | es | de | Avg |
| ------------------------------ | --------- | --------- | --------- | --------- | --------- |
| *Fine-tune on English dataset* | | | | | |
| mBERT | 91.97 | 77.57 | 74.96 | 69.56 | 78.52 |
| XLM-R | 92.25 | **78.08** | 76.53 | **69.60** | 79.11 |
| **ERNIE-M** | **92.78** | 78.01 | **79.37** | 68.08 | **79.56** |
| XLM-R LARGE | 92.92 | 80.80 | 78.64 | 71.40 | 80.94 |
| **ERNIE-M LARGE** | **93.28** | **81.45** | **78.83** | **72.99** | **81.64** |
| *Fine-tune on all dataset* | | | | | |
| XLM-R | 91.08 | 89.09 | 87.28 | 83.17 | 87.66 |
| **ERNIE-M** | **93.04** | **91.73** | **88.33** | **84.20** | **89.32** |
| XLM-R LARGE | 92.00 | 91.60 | **89.52** | 84.60 | 89.43 |
| **ERNIE-M LARGE** | **94.01** | **93.81** | 89.23 | **86.20** | **90.81** |
### 2.2 Cross-lingual Question Answering
* Dataset: MLQA
| Model | en | es | de | ar | hi | vi | zh | Avg |
| ----------------- | --------------- | --------------- | --------------- | --------------- | --------------- | --------------- | --------------- | --------------- |
| mBERT | 77.7 / 65.2 | 64.3 / 46.6 | 57.9 / 44.3 | 45.7 / 29.8 | 43.8 / 29.7 | 57.1 / 38.6 | 57.5 / 37.3 | 57.7 / 41.6 |
| XLM | 74.9 / 62.4 | 68.0 / 49.8 | 62.2 / 47.6 | 54.8 / 36.3 | 48.8 / 27.3 | 61.4 / 41.8 | 61.1 / 39.6 | 61.6 / 43.5 |
| XLM-R | 77.1 / 64.6 | 67.4 / 49.6 | 60.9 / 46.7 | 54.9 / 36.6 | 59.4 / 42.9 | 64.5 / 44.7 | 61.8 / 39.3 | 63.7 / 46.3 |
| INFOXLM | 81.3 / 68.2 | 69.9 / 51.9 | 64.2 / 49.6 | 60.1 / 40.9 | 65.0 / 47.5 | 70.0 / 48.6 | 64.7 / **41.2** | 67.9 / 49.7 |
| **ERNIE-M** | **81.6 / 68.5** | **70.9 / 52.6** | **65.8 / 50.7** | **61.8 / 41.9** | **65.4 / 47.5** | **70.0 / 49.2** | **65.6** / 41.0 | **68.7 / 50.2** |
| XLM-R LARGE | 80.6 / 67.8 | 74.1 / 56.0 | 68.5 / 53.6 | 63.1 / 43.5 | 62.9 / 51.6 | 71.3 / 50.9 | 68.0 / 45.4 | 70.7 / 52.7 |
| INFOXLM LARGE | **84.5 / 71.6** | **75.1 / 57.3** | **71.2 / 56.2** | **67.6 / 47.6** | 72.5 / 54.2 | **75.2 / 54.1** | 69.2 / 45.4 | 73.6 / 55.2 |
| **ERNIE-M LARGE** | 84.4 / 71.5 | 74.8 / 56.6 | 70.8 / 55.9 | 67.4 / 47.2 | **72.6 / 54.7** | 75.0 / 53.7 | **71.1 / 47.5** | **73.7 / 55.3** |
### 2.3 Cross-lingual Paraphrase Identification
* Dataset: PAWS-X
| Model | en | de | es | fr | ja | ko | zh | Avg |
| ---------------------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
| Cross-lingual Transfer | | | | | | | | |
| mBERT | 94.0 | 85.7 | 87.4 | 87.0 | 73.0 | 69.6 | 77.0 | 81.9 |
| XLM | 94.0 | 85.9 | 88.3 | 87.4 | 69.3 | 64.8 | 76.5 | 80.9 |
| MMTE | 93.1 | 85.1 | 87.2 | 86.9 | 72.0 | 69.2 | 75.9 | 81.3 |
| XLM-R LARGE | 94.7 | 89.7 | 90.1 | 90.4 | 78.7 | 79.0 | 82.3 | 86.4 |
| VECO LARGE | **96.2** | 91.3 | 91.4 | 92.0 | 81.8 | 82.9 | 85.1 | 88.7 |
| **ERNIE-M LARGE** | 96.0 | **91.9** | **91.4** | **92.2** | **83.9** | **84.5** | **86.9** | **89.5** |
| Translate-Train-All | | | | | | | | |
| VECO LARGE | 96.4 | 93.0 | 93.0 | 93.5 | 87.2 | 86.8 | 87.9 | 91.1 |
| **ERNIE-M LARGE** | **96.5** | **93.5** | **93.3** | **93.8** | **87.9** | **88.4** | **89.2** | **91.8** |
### 2.4 Cross-lingual Sentence Retrieval
* Dataset: Tatoeba
| Model | Avg |
| --------------------------------------- | -------- |
| XLM-R LARGE | 75.2 |
| VECO LARGE | 86.9 |
| **ERNIE-M LARGE** | **87.9** |
| **ERNIE-M LARGE (after fine-tuning)**   | 93.3     |
# Supported task scenarios, inference and pretrained model files:
|Model | Task | Model size | Parameters | Max position embeddings | Download |
|---|---|---|---|---| --- |
|ernie-m-base | General-purpose pretrained model for NLP tasks | 1.04G | 279M | 514 | [Pretrained model](https://paddlenlp.bj.bcebos.com/models/transformers/ernie_m/ernie_m_base.pdparams) |
|ernie-m-large | General-purpose pretrained model for NLP tasks | 2.09G | 560M | 514 | [Pretrained model](https://paddlenlp.bj.bcebos.com/models/transformers/ernie_m/ernie_m_large.pdparams) |
# Supported task scenarios, inference and pretrained model files:
|model | task | model_size | number_of_parameters | max_position_embeddings | download |
|---|---|---|---|---| --- |
|ernie-m-base | General-purpose pretrained model for NLP tasks | 1.04G | 279M | 514 | [Pretrained model](https://paddlenlp.bj.bcebos.com/models/transformers/ernie_m/ernie_m_base.pdparams) |
|ernie-m-large | General-purpose pretrained model for NLP tasks | 2.09G | 560M | 514 | [Pretrained model](https://paddlenlp.bj.bcebos.com/models/transformers/ernie_m/ernie_m_large.pdparams) |
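As a sanity check, the listed model sizes are consistent with float32 storage of the listed parameter counts (4 bytes per parameter, reading "G" as GiB). A minimal check in plain Python:

```python
# Relate parameter count to checkpoint size: float32 weights take 4 bytes each.
# "G" in the table is interpreted as GiB (2**30 bytes); counts come from the table.
def fp32_size_gib(num_params: int) -> float:
    return num_params * 4 / 2**30

base = fp32_size_gib(279_000_000)   # ernie-m-base: 279M parameters
large = fp32_size_gib(560_000_000)  # ernie-m-large: 560M parameters
print(f"{base:.2f} GiB, {large:.2f} GiB")  # 1.04 GiB, 2.09 GiB
```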
---
Model_Info:
name: "ERNIE-M"
description: "多语言的语言模型"
description_en: "A cross-lingual model"
icon: "@后续UE统一设计之后,会存到bos上某个位置"
from_repo: "PaddleNLP"
Task:
- tag_en: "Natural Language Processing"
tag: "自然语言处理"
sub_tag_en: "Pretrained Model"
sub_tag: "预训练模型"
Example: ""
Datasets: ""
Publisher: "Baidu"
License: "Apache-2.0"
Paper:
- title: "ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual\
\ Semantics with Monolingual Corpora"
url: "https://arxiv.org/pdf/2012.15674.pdf"
IfTraining: 0
IfOnlineDemo: 0
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Introduction to ERNIE-M\n",
"\n",
"\n",
"[ERNIE-M](https://arxiv.org/abs/2012.15674) is a multilingual language model proposed by Baidu. The paper introduces a training method that aligns the representations of multiple languages with monolingual corpora, overcoming the constraint that parallel-corpus size places on model performance. The key idea is to integrate back-translation into the pre-training process: pseudo-parallel sentence pairs are generated on monolingual corpora so the model can learn semantic alignments between languages, enhancing cross-lingual semantic modeling. Experimental results show that ERNIE-M outperforms existing cross-lingual models and delivers new state-of-the-art results on a variety of cross-lingual downstream tasks.\n",
"\n",
"This project is a PaddlePaddle dynamic-graph implementation of ERNIE-M, covering model training and validation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Model Effects and Application Scenarios\n",
"\n",
"### 2.1 Natural Language Inference\n",
"\n",
"#### 2.1.1 Dataset\n",
"\n",
"XNLI is a subset of MNLI that has been translated into 14 different languages, including some low-resource ones. As in MNLI, the goal is to predict textual entailment: whether sentence A implies, contradicts, or is neutral with respect to sentence B.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 3. How to Use the Model\n",
"\n",
"## 3.1 Model Training\n",
"\n",
"* Install PaddleNLP\n",
"\n",
"```shell\n",
"pip install --upgrade paddlenlp\n",
"pip install datasets\n",
"```\n",
"\n",
"* Download the fine-tuning script\n",
"\n",
"```shell\n",
"# download the script (the Gitee mirror is faster; use the raw-file URL so wget fetches the script itself, not an HTML page)\n",
"wget https://gitee.com/paddlepaddle/PaddleNLP/raw/develop/model_zoo/ernie-m/run_classifier.py\n",
"```\n",
"\n",
"* Single-GPU training\n",
"\n",
"```shell\n",
"python run_classifier.py \\\n",
" --task_type cross-lingual-transfer \\\n",
" --batch_size 16 \\\n",
" --model_name_or_path ernie-m-base \\\n",
" --save_steps 12272 \\\n",
" --output_dir output\n",
"```\n",
"\n",
"* Multi-GPU training\n",
"\n",
"```shell\n",
"python -m paddle.distributed.launch --gpus 0,1 --log_dir output run_classifier.py \\\n",
" --task_type cross-lingual-transfer \\\n",
" --batch_size 16 \\\n",
" --model_name_or_path ernie-m-base \\\n",
" --save_steps 12272 \\\n",
" --output_dir output\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.2 Common Parameters\n",
"\n",
"- `task_type` the natural language inference setting. Supported values are \"cross-lingual-transfer\" (fine-tune on the English training set and evaluate on the test sets of all 15 languages) and \"translate-train-all\" (train and evaluate on the datasets of all 15 languages).\n",
"- `model_name_or_path` the pretrained model (and its tokenizer) used for fine-tuning. Supported models are \"ernie-m-base\" and \"ernie-m-large\". A local directory containing a saved model can also be given, e.g. \"./checkpoint/model_xx/\".\n",
"- `output_dir` directory where model checkpoints are saved.\n",
"- `max_seq_length` maximum sequence length; longer sequences are truncated and shorter ones padded.\n",
"- `learning_rate` base learning rate; it is multiplied by the value produced by the learning-rate scheduler to give the current learning rate.\n",
"- `num_train_epochs` number of training epochs.\n",
"- `logging_steps` interval, in steps, between log outputs.\n",
"- `save_steps` interval, in steps, between checkpoint saves and evaluations.\n",
"- `batch_size` number of samples **per card** per iteration.\n",
"- `weight_decay` weight-decay coefficient for AdamW.\n",
"- `layerwise_decay` per-layer decay coefficient for AdamW with layerwise decay.\n",
"- `warmup_proportion` proportion of training steps used for learning-rate warmup.\n",
"- `max_steps` maximum number of training steps. If training for `num_train_epochs` epochs would exceed this value, training stops early at `max_steps`.\n",
"- `seed` random seed.\n",
"- `device` device used for training; 'gpu' for GPU, 'xpu' for Baidu Kunlun, 'cpu' for CPU.\n",
"- `use_amp` whether to enable automatic mixed-precision training.\n",
"- `scale_loss` loss-scaling parameter for automatic mixed-precision training."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Model Principles"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The paper proposes two methods for modeling alignment across languages:\n",
"\n",
"- **Cross-Attention Masked Language Modeling (CAMLM)**: learns alignment from a small amount of bilingual data. The model must restore masked tokens from the target sentence without access to the source-sentence context, giving it an initial model of cross-lingual alignment.\n",
"- **Back-Translation Masked Language Modeling (BTMLM)**: learns alignment from monolingual data via back-translation. CAMLM generates pseudo-parallel sentence pairs, which the model then learns from, so monolingual corpora can be used to better model semantic alignment.\n",
"\n",
"\n",
"![](https://foruda.gitee.com/images/1668157409546826003/f78cb949_5218658.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Related Papers and Citation\n",
"\n",
"```bibtex\n",
"@article{Ouyang2021ERNIEMEM,\n",
" title={ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora},\n",
" author={Xuan Ouyang and Shuohuan Wang and Chao Pang and Yu Sun and Hao Tian and Hua Wu and Haifeng Wang},\n",
" journal={ArXiv},\n",
" year={2021},\n",
" volume={abs/2012.15674}\n",
"}\n",
"```"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.7.13 ('model_center')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.13"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "de1ffcbce2b3061b5001e2c22f3a27594f323d4a49b789ebdbef6534581834bd"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. ERNIE-M Introduction\n",
"\n",
"[ERNIE-M](https://arxiv.org/abs/2012.15674) is a multilingual language model proposed by Baidu. The paper introduces a training method that encourages the model to align the representations of multiple languages with monolingual corpora, overcoming the constraint that parallel-corpus size places on model performance. The key insight is to integrate back-translation into the pre-training process: pseudo-parallel sentence pairs are generated on monolingual corpora so the model can learn semantic alignments between languages, thereby enhancing the semantic modeling of cross-lingual models. Experimental results show that ERNIE-M outperforms existing cross-lingual models and delivers new state-of-the-art results on various cross-lingual downstream tasks.\n",
"\n",
"This project is a PaddlePaddle dynamic-graph implementation of ERNIE-M, covering model training and validation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Model Effects and Application Scenarios\n",
"\n",
"### 2.1 Natural Language Inference\n",
"\n",
"#### 2.1.1 Dataset\n",
"\n",
"XNLI is a subset of MNLI that has been translated into 14 different languages, including some low-resource ones. As in MNLI, the goal of the task is to predict textual entailment: whether sentence A implies, contradicts, or is neutral with respect to sentence B.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 3. How to Use the Model\n",
"\n",
"## 3.1 Model Training\n",
"\n",
"* Install PaddleNLP\n",
"\n",
"```shell\n",
"pip install --upgrade paddlenlp\n",
"pip install datasets\n",
"```\n",
"\n",
"* Download the fine-tuning script\n",
"\n",
"```shell\n",
"# download from the Gitee mirror (use the raw-file URL so wget fetches the script itself, not an HTML page)\n",
"wget https://gitee.com/paddlepaddle/PaddleNLP/raw/develop/model_zoo/ernie-m/run_classifier.py\n",
"```\n",
"\n",
"* Single-GPU training\n",
"\n",
"```shell\n",
"python run_classifier.py \\\n",
" --task_type cross-lingual-transfer \\\n",
" --batch_size 16 \\\n",
" --model_name_or_path ernie-m-base \\\n",
" --save_steps 12272 \\\n",
" --output_dir output\n",
"```\n",
"\n",
"* Multi-GPU training\n",
"\n",
"```shell\n",
"python -m paddle.distributed.launch --gpus 0,1 --log_dir output run_classifier.py \\\n",
" --task_type cross-lingual-transfer \\\n",
" --batch_size 16 \\\n",
" --model_name_or_path ernie-m-base \\\n",
" --save_steps 12272 \\\n",
" --output_dir output\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.2 Parameter Description\n",
"\n",
"- `task_type` the natural language inference setting. Supported values are \"cross-lingual-transfer\" (fine-tune on the English training set and evaluate on the test sets of all 15 XNLI languages) and \"translate-train-all\" (fine-tune on the concatenation of all languages' training data and evaluate on each language's test set).\n",
"- `model_name_or_path` path to the pretrained model and its tokenizer. Supported values are \"ernie-m-base\" and \"ernie-m-large\"; a directory of a model saved on the local device can also be given, e.g. \"./checkpoint/model_xx/\".\n",
"- `output_dir` The output directory where the model predictions and checkpoints will be written.\n",
"- `max_seq_length` The maximum total input sequence length after tokenization. Sequences longer than this will be truncated, sequences shorter will be padded.\n",
"- `learning_rate` The initial learning rate for AdamW.\n",
"- `num_train_epochs` Total number of training epochs to perform.\n",
"- `logging_steps` Log every X updates steps.\n",
"- `save_steps` Save checkpoint every X updates steps.\n",
"- `batch_size` Batch size per GPU/CPU/XPU for training.\n",
"- `weight_decay` Weight decay ratio for AdamW.\n",
"- `layerwise_decay` Layerwise decay ratio.\n",
"- `warmup_proportion` Proportion of training steps used for linear learning-rate warmup.\n",
"- `max_steps` If > 0: set total number of training steps to perform. Override num_train_epochs.\n",
"- `seed` random seed for initialization.\n",
"- `device` The device used for training; must be one of cpu/gpu/xpu.\n",
"- `use_amp` Enable mixed precision training.\n",
"- `scale_loss` The value of scale_loss for fp16."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Model Principles"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The paper proposes two novel methods to align the representations of multiple languages:\n",
"\n",
"- **Cross-Attention Masked Language Modeling (CAMLM)**: CAMLM learns multilingual semantic representations by restoring the masked tokens in the input sentences.\n",
"- **Back-Translation Masked Language Modeling (BTMLM)**: BTMLM trains the model to generate pseudo-parallel sentence pairs from monolingual sentences. The generated pairs are then used as model input to further align cross-lingual semantics, enhancing the multilingual representation.\n",
"\n",
"![](https://foruda.gitee.com/images/1668157409546826003/f78cb949_5218658.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Related Papers and Citation\n",
"\n",
"```bibtex\n",
"@article{Ouyang2021ERNIEMEM,\n",
" title={ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora},\n",
" author={Xuan Ouyang and Shuohuan Wang and Chao Pang and Yu Sun and Hao Tian and Hua Wu and Haifeng Wang},\n",
" journal={ArXiv},\n",
" year={2021},\n",
" volume={abs/2012.15674}\n",
"}\n",
"```"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.7.13 ('model_center')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.13"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "de1ffcbce2b3061b5001e2c22f3a27594f323d4a49b789ebdbef6534581834bd"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}