diff --git a/ernie-gen/README.md b/ernie-gen/README.md
index 0976d283de96b55ba483f1d3af8c623d9a03d2a1..34997eb576a00f67b4835c8f8fd32cc626dbcd8a 100644
--- a/ernie-gen/README.md
+++ b/ernie-gen/README.md
@@ -43,11 +43,11 @@ Specifically, the span-by-span generation task and word-by-word generation task
## Pre-trained Models
-We release the checkpoints for **ERNIE-GEN _base_** model and **ERNIE-GEN _large_** model which are both pre-trained on English Wikipedia and [BookCorpus](https://arxiv.org/abs/1506.06724) (totally 16GB). Besides, **ERNIE-GEN _large_** pre-trained on the 160GB corpus (used by [RoBERTa](https://arxiv.org/abs/1907.11692) and [BART](https://arxiv.org/abs/1910.13461)) is available as well.
+We release checkpoints for the **ERNIE-GEN _base_** and **ERNIE-GEN _large_** models, both pre-trained on English Wikipedia and [BookCorpus](https://arxiv.org/abs/1506.06724) (16GB in total). In addition, an **ERNIE-GEN _large_** model pre-trained on a 430GB corpus (see [ERNIE-GEN Appendix A.1](https://arxiv.org/abs/2001.11314) for a description of the corpus) is available as well.
- [**ERNIE-GEN _base_**](https://ernie.bj.bcebos.com/ernie_gen_base.tgz) (_lowercased | 12-layer, 768-hidden, 12-heads, 110M parameters_)
- [**ERNIE-GEN _large_**](https://ernie.bj.bcebos.com/ernie_gen_large.tgz) (_lowercased | 24-layer, 1024-hidden, 16-heads, 340M parameters_)
-- [**ERNIE-GEN _large with 160G_**](https://ernie.bj.bcebos.com/ernie_gen_large_160g.tgz) (_lowercased | 24-layer, 1024-hidden, 16-heads, 340M parameters_)
+- [**ERNIE-GEN _large with 430G_**](https://ernie.bj.bcebos.com/ernie_gen_large_430g.tgz) (_lowercased | 24-layer, 1024-hidden, 16-heads, 340M parameters_)
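+
+For example, the 430G checkpoint can be fetched and unpacked as follows (a minimal sketch; we assume the tarball extracts to an `ernie_gen_large_430g/` directory containing `vocab.txt`, `ernie_config.json` and `params`, matching the paths referenced in `configs/large_430g/`):
+
+```script
+wget https://ernie.bj.bcebos.com/ernie_gen_large_430g.tgz
+tar -xzf ernie_gen_large_430g.tgz
+```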
## Fine-tuning on Downstream Tasks
@@ -65,7 +65,7 @@ The results on Gigaword-10k (10K examples of Gigaword) are presented as follows:
| UniLM | 16G / 340M | 34.21 | 15.28 | 31.54 |
| **ERNIE-GEN** _base_ | 16G / 110M | 33.75 | 15.23 | 31.35 |
| **ERNIE-GEN** _large_ | 16G / 340M | 35.05 | 16.10 | 32.50 |
-| **ERNIE-GEN** _large_ (160G) | 160G / 340M | **35.51** | **16.79** | **33.23** |
+| **ERNIE-GEN** _large_ (430G) | 430G / 340M | **35.51** | **16.79** | **33.23** |
The results on Gigaword are presented as follows:
@@ -78,7 +78,7 @@ The results on Gigaword are presented as follows:
| PEGASUS (_HugeNews_) | 3.8T / 568M | 39.12 | 19.86 | 36.24 |
| **ERNIE-GEN** _base_ | 16G / 110M | 38.83 | 20.04 | 36.20 |
| **ERNIE-GEN** _large_ | 16G / 340M | 39.25 | 20.25 | 36.53 |
-| **ERNIE-GEN** _large_ (160G) | 160G / 340M | **39.46** | **20.34** | **36.74** |
+| **ERNIE-GEN** _large_ (430G) | 430G / 340M | **39.46** | **20.34** | **36.74** |
We preprocess the raw Gigaword dataset following UniLM; the preprocessed data is available at [Gigaword](https://ernie.bj.bcebos.com/gigaword.tgz).
@@ -97,7 +97,7 @@ The results on CNN/Daily Mail are presented as follows:
| PEGASUS (_HugeNews_) | 3.8T / 568M | 44.17 | 21.47 | 41.11 |
| **ERNIE-GEN** _base_ | 16G / 110M | 42.30 | 19.92 | 39.68 |
| **ERNIE-GEN** _large_ | 16G / 340M | 44.02 | 21.17 | 41.26 |
-| **ENRIE-GEN** _large_ (160G) | 160G / 340M | **44.31** | 21.35 | **41.60** |
+| **ERNIE-GEN** _large_ (430G) | 430G / 340M | **44.31** | 21.35 | **41.60** |
We preprocess the raw CNN/Daily Mail dataset following UniLM; the preprocessed data is available at [CNN/Daily Mail](https://ernie.bj.bcebos.com/cnndm.tgz).
@@ -114,7 +114,7 @@ The results on the [SQuAD 1.1](https://arxiv.org/abs/1806.03822) dataset followi
| **ERNIE-GEN** _base_ (beam size=1) | 22.28 | 25.13 | 50.38 |
| **ERNIE-GEN** _large_ (beam size=1) | 24.03 | 26.31 | 52.36 |
| **ERNIE-GEN** _large_ (beam size=5) | 25.40 | **26.92** | 52.84 |
-| **ERNIE-GEN** _large_ (beam size=5) + (160G) | **25.41** | 26.77 | **52.91** |
+| **ERNIE-GEN** _large_ (beam size=5) + (430G) | **25.41** | 26.77 | **52.91** |
The results following the reversed dev-test data split in [[Zhao et al., 2018]](https://www.aclweb.org/anthology/D18-1424/) are presented as follows:
@@ -125,7 +125,7 @@ The results following the reversed dev-test data split in [[Zhao et al., 2018]](
| **ERNIE-GEN** _base_ (beam size=1) | 23.52 | 25.61 | 51.45 |
| **ERNIE-GEN** _large_ (beam size=1) | 25.57 | 26.89 | 53.31 |
| **ERNIE-GEN** _large_ (beam size=5) | 26.95 | **27.57** | 53.77 |
-| **ERNIE-GEN** _large_ (beam size=5) + (160G) | **27.05** | 27.43 | **53.83** |
+| **ERNIE-GEN** _large_ (beam size=5) + (430G) | **27.05** | 27.43 | **53.83** |
*_Note that we also report results with a larger beam size of 5._
@@ -161,24 +161,6 @@ Results of development set on CoQA task is presented as follows:
We preprocess the raw [CoQA](https://arxiv.org/abs/1808.07042) dataset; the preprocessed data is available at [CoQA-preprocessed](https://ernie.bj.bcebos.com/coqa.tgz).
-Finally, we also compared with a concurrent work [ProphetNet](https://arxiv.org/abs/2001.04063), the fine-tuning results on Gigaword, CNN/Daily Mail and SQuAD are reported as follows:
-
-- _**Abstractive Summarization**_
-
-| Model / Task | Data / Params | Gigaword |CNN/Daily Mail|
-| :-------------------------------------------------------- | :----------------------------: | :----------------------: | :----------------------: |
-| Metric | - | Rouge-1 / Rouge-2 / Rouge-L |Rouge-1 / Rouge-2 / Rouge-L|
-| **ProphetNet** _large_ (160G) | 160G / 340M | **39.51** / **20.42** / 36.69 |44.20 / 21.17 / 41.30|
-| **ERNIE-GEN** _large_ (160G) | 160G / 340M | 39.46 / 20.34 / **36.74** |**44.31** / **21.35** / **41.60**|
-
-- _**Question Generation**_
-
-| Model | Data / Params | BLEU-4 / METEOR / Rouge-L |BLEU-4 / METEOR / Rouge-L|
-| :-------------------------------------------------------- | :----------------------------: | :----------------------: |:----------------------: |
-| Data split | - | Original |Reversed dev-test|
-| **ProphetNet** _large_ (16G) | 16G / 340M | 25.01 / 26.83 / 52.57 |26.72 / **27.64** / **53.79** |
-| **ERNIE-GEN** _large_ (16G) | 16G / 340M | **25.40** / **26.92** / **52.84** |**26.95** / 27.57 / **53.77**|
-
## Usage
### Install PaddlePaddle
@@ -191,7 +173,7 @@ pip install -r requirements.txt
### Fine-tuning
Please add the CUDA, cuDNN, and NCCL2 library paths to LD_LIBRARY_PATH before running ERNIE-GEN. We have put the parameter configurations for the above downstream tasks in `configs/`, so you can easily run fine-tuning through these configuration files. For example, you can fine-tune the ERNIE-GEN base model on Gigaword by:
```script
-MODEL="base" # base or large or large_160g
+MODEL="base" # base or large or large_430g
TASK="gigaword" # cnndm, coqa, gigaword, squad_qg or persona-chat
sh run_seq2seq.sh ./configs/${MODEL}/${TASK}_conf
```
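+
+For example, a typical setup might look like the following (a sketch only; the actual library locations depend on your CUDA, cuDNN and NCCL2 installation):
+
+```script
+export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cudnn/lib64:/usr/local/nccl/lib:$LD_LIBRARY_PATH
+```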
diff --git a/ernie-gen/README.zh.md b/ernie-gen/README.zh.md
index f748ed3303366102a3cbf5cef21790bb419d651a..9d66dd0a23fb44780ca08e61e28fd72310b0d1c5 100644
--- a/ernie-gen/README.zh.md
+++ b/ernie-gen/README.zh.md
@@ -43,11 +43,11 @@
## Pre-trained Models
-We release the **ERNIE-GEN _base_** and **ERNIE-GEN _large_** models, pre-trained on English Wikipedia and BookCorpus (16GB in total). In addition, we release an **ERNIE-GEN _large_** model pre-trained on a 160GB corpus, which was also used to pre-train [RoBERTa](https://arxiv.org/abs/1907.11692) and [BART](https://arxiv.org/abs/1910.13461).
+We release the **ERNIE-GEN _base_** and **ERNIE-GEN _large_** models, pre-trained on English Wikipedia and BookCorpus (16GB in total). In addition, we release an **ERNIE-GEN _large_** model pre-trained on a 430GB corpus (see [ERNIE-GEN Appendix A.1](https://arxiv.org/abs/2001.11314) for a description of the corpus).
- [**ERNIE-GEN _base_**](https://ernie.bj.bcebos.com/ernie_gen_base.tgz) (_lowercased | 12-layer, 768-hidden, 12-heads, 110M parameters_)
- [**ERNIE-GEN _large_**](https://ernie.bj.bcebos.com/ernie_gen_large.tgz) (_lowercased | 24-layer, 1024-hidden, 16-heads, 340M parameters_)
-- [**ERNIE-GEN _large with 160G_**](https://ernie.bj.bcebos.com/ernie_gen_large_160g.tgz) (_lowercased | 24-layer, 1024-hidden, 16-heads, 340M parameters_)
+- [**ERNIE-GEN _large with 430G_**](https://ernie.bj.bcebos.com/ernie_gen_large_430g.tgz) (_lowercased | 24-layer, 1024-hidden, 16-heads, 340M parameters_)
## Fine-tuning Tasks
@@ -65,7 +65,7 @@
| UniLM | 16G / 340M | 34.21 | 15.28 | 31.54 |
| **ERNIE-GEN** _base_ | 16G / 110M | 33.75 | 15.23 | 31.35 |
| **ERNIE-GEN** _large_ | 16G / 340M | 35.05 | 16.10 | 32.50 |
-| **ERNIE-GEN** _large_ (160G) | 160G / 340M | **35.51** | **16.79** | **33.23** |
+| **ERNIE-GEN** _large_ (430G) | 430G / 340M | **35.51** | **16.79** | **33.23** |
Results on Gigaword:
@@ -78,7 +78,7 @@
| PEGASUS (_HugeNews_) | 3.8T / 568M | 39.12 | 19.86 | 36.24 |
| **ERNIE-GEN** _base_ | 16G / 110M | 38.83 | 20.04 | 36.20 |
| **ERNIE-GEN** _large_ | 16G / 340M | 39.25 | 20.25 | 36.53 |
-| **ERNIE-GEN** _large_ (160G) | 160G / 340M | **39.46** | **20.34** | **36.74** |
+| **ERNIE-GEN** _large_ (430G) | 430G / 340M | **39.46** | **20.34** | **36.74** |
We preprocessed the data following UniLM; download link: [Gigaword](https://ernie.bj.bcebos.com/gigaword.tgz).
@@ -97,7 +97,7 @@
| PEGASUS (_HugeNews_) | 3.8T / 568M | 44.17 | 21.47 | 41.11 |
| **ERNIE-GEN** _base_ | 16G / 110M | 42.30 | 19.92 | 39.68 |
| **ERNIE-GEN** _large_ | 16G / 340M | 44.02 | 21.17 | 41.26 |
-| **ENRIE-GEN** _large_ (160G) | 160G / 340M | **44.31** | 21.35 | **41.60** |
+| **ERNIE-GEN** _large_ (430G) | 430G / 340M | **44.31** | 21.35 | **41.60** |
We preprocessed the data following UniLM; download link: [CNN/Daily Mail](https://ernie.bj.bcebos.com/cnndm.tgz).
@@ -114,7 +114,7 @@
| **ERNIE-GEN** _base_ (beam size=1) | 22.28 | 25.13 | 50.38 |
| **ERNIE-GEN** _large_ (beam size=1) | 24.03 | 26.31 | 52.36 |
| **ERNIE-GEN** _large_ (beam size=5) | 25.40 | **26.92** | 52.84 |
-| **ERNIE-GEN** _large_ (beam size=5) + (160G) | **25.41** | 26.77 | **52.91** |
+| **ERNIE-GEN** _large_ (beam size=5) + (430G) | **25.41** | 26.77 | **52.91** |
Following [[Zhao et al., 2018]](https://www.aclweb.org/anthology/D18-1424/), we reverse the dev and test sets; the results are as follows:
@@ -125,7 +125,7 @@
| **ERNIE-GEN** _base_ (beam size=1) | 23.52 | 25.61 | 51.45 |
| **ERNIE-GEN** _large_ (beam size=1) | 25.57 | 26.89 | 53.31 |
| **ERNIE-GEN** _large_ (beam size=5) | 26.95 | **27.57** | 53.77 |
-| **ERNIE-GEN** _large_ (beam size=5) + (160G) | **27.05** | 27.43 | **53.83** |
+| **ERNIE-GEN** _large_ (beam size=5) + (430G) | **27.05** | 27.43 | **53.83** |
*_We also report results with the beam size increased to 5._
@@ -159,23 +159,6 @@
We preprocessed the raw CoQA dataset; download link: [CoQA](https://ernie.bj.bcebos.com/coqa.tgz).
-In addition, we compared with the concurrent work [ProphetNet](https://arxiv.org/abs/2001.04063) on the Gigaword, CNN/Daily Mail and SQuAD datasets:
-
-- _**Abstractive Summarization**_
-
-| Model / Task | Data / Params | Gigaword |CNN/Daily Mail|
-| :-------------------------------------------------------- | :------------------------------: | :----------------------: | :----------------------: |
-| Metric | - | Rouge-1 / Rouge-2 / Rouge-L |Rouge-1 / Rouge-2 / Rouge-L|
-| ProphetNet _large_ (160G) | 160G / 340M | **39.51** / **20.42** / 36.69 |44.20 / 21.17 / 41.30|
-| **ERNIE-GEN** _large_ (160G) | 160G / 340M | 39.46 / 20.34 / **36.74** |**44.31** / **21.35** / **41.60**|
-
-- _**Question Generation**_
-
-| Model | Data / Params | BLEU-4 / METEOR / Rouge-L |BLEU-4 / METEOR / Rouge-L|
-| :-------------------------------------------------------- | :------------------------------: | :----------------------: |:----------------------: |
-| Data split | - | Original |Reversed dev-test|
-| ProphetNet** _large_ (16G) | 16G / 340M | 25.01 / 26.83 / 52.57 |26.72 / **27.64** / **53.79** |
-| **ERNIE-GEN** _large_ (16G) | 16G / 340M | **25.40** / **26.92** / **52.84** |**26.95** / 27.57 / **53.77**|
## Usage
@@ -189,7 +172,7 @@ pip install -r requirements.txt
### Fine-tuning
Before running ERNIE-GEN, add the CUDA, cuDNN, and NCCL2 dynamic library paths to LD_LIBRARY_PATH. We have put the parameter configuration files for the downstream tasks in `configs/`, so fine-tuning can be run simply through these configuration files. For example, you can fine-tune the ERNIE-GEN base model on the Gigaword dataset with:
```script
-MODEL="base" # base or large or large_160g
+MODEL="base" # base or large or large_430g
TASK="gigaword" # cnndm, coqa, gigaword, squad_qg or persona-chat
sh run_seq2seq.sh ./configs/${MODEL}/${TASK}_conf
```
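+
+To fine-tune the large model pre-trained on the 430GB corpus instead, switch the configuration directory (a sketch; it assumes the checkpoint has been extracted to `ernie_gen_large_430g/`, the path referenced by these configuration files):
+
+```script
+MODEL="large_430g"  # cnndm, gigaword, gigaword-10k and squad-qg configs are provided under configs/large_430g/
+TASK="cnndm"
+sh run_seq2seq.sh ./configs/${MODEL}/${TASK}_conf
+```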
diff --git a/ernie-gen/configs/large_160g/coqa_conf b/ernie-gen/configs/large_160g/coqa_conf
deleted file mode 100644
index fb8178733e62c13acf355002d021df43727a3e95..0000000000000000000000000000000000000000
--- a/ernie-gen/configs/large_160g/coqa_conf
+++ /dev/null
@@ -1,52 +0,0 @@
-#load model
-vocab_path="ernie_gen_large/vocab.txt"
-config_path="ernie_gen_large/ernie_config.json"
-init_model="ernie_gen_large/params"
-
-#for multi-turn dialog/qa
-task_type="dialog"
-role_type_size=3
-turn_type_size=16
-
-#input
-max_src_len=480
-max_tgt_len=32
-tokenized_input="true"
-continuous_position="true"
-batch_size=4
-in_tokens="false"
-#tgt_type_id=1
-
-#decode
-do_decode="true"
-max_dec_len=30
-beam_size=3
-length_penalty=0.0
-use_multi_gpu_test="true"
-
-#train
-epoch=10
-weight_decay=0.01
-label_smooth=0.1
-hidden_dropout_prob=0.1
-save_and_valid_by_epoch="true"
-#lr
-warmup_proportion=0.1
-lr_scheduler="linear_warmup_decay"
-learning_rate=1e-5
-#noise
-random_noise="false"
-noise_prob=0.5
-
-#dataset
-data_path="./datasets/coqa/"
-train_set="train.tsv"
-dev_set="dev.tsv"
-do_train="true"
-do_val="true"
-do_test="false"
-do_pred="false"
-
-#evaluate
-eval_script="sh ./eval/tasks/coqa/eval.sh"
-eval_mertrics="f1"
diff --git a/ernie-gen/configs/large_160g/persona-chat_conf b/ernie-gen/configs/large_160g/persona-chat_conf
deleted file mode 100644
index 66e2379ff34de4d4781b814809f7f70b6fbbbe34..0000000000000000000000000000000000000000
--- a/ernie-gen/configs/large_160g/persona-chat_conf
+++ /dev/null
@@ -1,53 +0,0 @@
-#load model
-vocab_path="ernie_gen_large/vocab.txt"
-config_path="ernie_gen_large/ernie_config.json"
-init_model="ernie_gen_large/params"
-
-#for multi-turn dialog/qa
-task_type="dialog"
-role_type_size=3
-turn_type_size=16
-
-#input
-max_src_len=472
-max_tgt_len=40
-tokenized_input="true"
-continuous_position="true"
-batch_size=8
-in_tokens="false"
-
-#decode
-do_decode="true"
-max_dec_len=32
-beam_size=10
-length_penalty=1.3
-use_multi_gpu_test="true"
-
-#train
-epoch=30
-weight_decay=0.01
-label_smooth=0.0
-hidden_dropout_prob=0.1
-save_and_valid_by_epoch="true"
-#lr
-warmup_proportion=0.1
-lr_scheduler="linear_warmup_decay"
-learning_rate=1e-4
-#noise
-random_noise="false"
-noise_prob=0.0
-
-#dataset
-data_path="./datasets/persona_chat/"
-train_set="train.tsv"
-dev_set="dev.2k.tsv"
-pred_set="test.tsv"
-do_train="true"
-do_val="true"
-do_test="false"
-do_pred="true"
-do_decode="true"
-
-#evaluate
-eval_script="sh ./eval/tasks/persona_chat/eval.sh"
-eval_mertrics="bleu_1,bleu_2,distinct_1,distinct_2"
diff --git a/ernie-gen/configs/large_160g/cnndm_conf b/ernie-gen/configs/large_430g/cnndm_conf
similarity index 84%
rename from ernie-gen/configs/large_160g/cnndm_conf
rename to ernie-gen/configs/large_430g/cnndm_conf
index 4b0bd7dbe0d3fa97a7df2fcad879266af3f532e0..886baa665b44a610c515ebf1654258a7534a8a16 100644
--- a/ernie-gen/configs/large_160g/cnndm_conf
+++ b/ernie-gen/configs/large_430g/cnndm_conf
@@ -1,7 +1,7 @@
#load model
-vocab_path="ernie_gen_large_160g/vocab.txt"
-config_path="ernie_gen_large_160g/ernie_config.json"
-init_model="ernie_gen_large_160g/params"
+vocab_path="ernie_gen_large_430g/vocab.txt"
+config_path="ernie_gen_large_430g/ernie_config.json"
+init_model="ernie_gen_large_430g/params"
#input
max_src_len=640
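+
+# Note (assumption based on the README usage): *_conf files are plain shell
+# variable assignments read by run_seq2seq.sh, so the three paths above must
+# match the directory the ernie_gen_large_430g.tgz checkpoint extracts to.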
diff --git a/ernie-gen/configs/large_160g/gigaword-10k_conf b/ernie-gen/configs/large_430g/gigaword-10k_conf
similarity index 84%
rename from ernie-gen/configs/large_160g/gigaword-10k_conf
rename to ernie-gen/configs/large_430g/gigaword-10k_conf
index 89a4f90350817be87b33c02b6fef55f93b58d3a9..df7dece5a4e78103306d0dab8009c5c6f37fdec6 100644
--- a/ernie-gen/configs/large_160g/gigaword-10k_conf
+++ b/ernie-gen/configs/large_430g/gigaword-10k_conf
@@ -1,7 +1,7 @@
#load model
-vocab_path="ernie_gen_large_160g/vocab.txt"
-config_path="ernie_gen_large_160g/ernie_config.json"
-init_model="ernie_gen_large_160g/params"
+vocab_path="ernie_gen_large_430g/vocab.txt"
+config_path="ernie_gen_large_430g/ernie_config.json"
+init_model="ernie_gen_large_430g/params"
#input
max_src_len=192
diff --git a/ernie-gen/configs/large_160g/gigaword_conf b/ernie-gen/configs/large_430g/gigaword_conf
similarity index 84%
rename from ernie-gen/configs/large_160g/gigaword_conf
rename to ernie-gen/configs/large_430g/gigaword_conf
index 4d31e9a19b6b73c4eaeba2e448de9d9f96c374ea..9ddec1ceb5115e033c1a6527f9fce702b306f552 100644
--- a/ernie-gen/configs/large_160g/gigaword_conf
+++ b/ernie-gen/configs/large_430g/gigaword_conf
@@ -1,7 +1,7 @@
#load model
-vocab_path="ernie_gen_large_160g/vocab.txt"
-config_path="ernie_gen_large_160g/ernie_config.json"
-init_model="ernie_gen_large_160g/params"
+vocab_path="ernie_gen_large_430g/vocab.txt"
+config_path="ernie_gen_large_430g/ernie_config.json"
+init_model="ernie_gen_large_430g/params"
#input
max_src_len=192
diff --git a/ernie-gen/configs/large_160g/squad-qg_conf b/ernie-gen/configs/large_430g/squad-qg_conf
similarity index 83%
rename from ernie-gen/configs/large_160g/squad-qg_conf
rename to ernie-gen/configs/large_430g/squad-qg_conf
index 85953f8a705deb85ccd50070175c65777d271b01..ac35468c1e87ddcced10d2a5a303d66de45886de 100644
--- a/ernie-gen/configs/large_160g/squad-qg_conf
+++ b/ernie-gen/configs/large_430g/squad-qg_conf
@@ -1,7 +1,7 @@
#load model
-vocab_path="ernie_gen_large_160g/vocab.txt"
-config_path="ernie_gen_large_160g/ernie_config.json"
-init_model="ernie_gen_large_160g/params"
+vocab_path="ernie_gen_large_430g/vocab.txt"
+config_path="ernie_gen_large_430g/ernie_config.json"
+init_model="ernie_gen_large_430g/params"
#input
max_src_len=512