From 5b56b529cc4f560163eeeccbe573afde2dc23623 Mon Sep 17 00:00:00 2001
From: zhanghan17
Date: Wed, 10 Jun 2020 15:30:05 +0800
Subject: [PATCH] 160g to 430g

---
 ernie-gen/README.md                           | 34 +++---------
 ernie-gen/README.zh.md                        | 33 +++--------
 ernie-gen/configs/large_160g/coqa_conf        | 52 ------------------
 .../configs/large_160g/persona-chat_conf      | 53 -------------------
 .../{large_160g => large_430g}/cnndm_conf     |  6 +--
 .../gigaword-10k_conf                         |  6 +--
 .../{large_160g => large_430g}/gigaword_conf  |  6 +--
 .../{large_160g => large_430g}/squad-qg_conf  |  6 +--
 8 files changed, 28 insertions(+), 168 deletions(-)
 delete mode 100644 ernie-gen/configs/large_160g/coqa_conf
 delete mode 100644 ernie-gen/configs/large_160g/persona-chat_conf
 rename ernie-gen/configs/{large_160g => large_430g}/cnndm_conf (84%)
 rename ernie-gen/configs/{large_160g => large_430g}/gigaword-10k_conf (84%)
 rename ernie-gen/configs/{large_160g => large_430g}/gigaword_conf (84%)
 rename ernie-gen/configs/{large_160g => large_430g}/squad-qg_conf (83%)

diff --git a/ernie-gen/README.md b/ernie-gen/README.md
index 0976d28..34997eb 100644
--- a/ernie-gen/README.md
+++ b/ernie-gen/README.md
@@ -43,11 +43,11 @@ Specifically, the span-by-span generation task and word-by-word generation task
 
 ## Pre-trained Models
 
-We release the checkpoints for **ERNIE-GEN _base_** model and **ERNIE-GEN _large_** model which are both pre-trained on English Wikipedia and [BookCorpus](https://arxiv.org/abs/1506.06724) (totally 16GB). Besides, **ERNIE-GEN _large_** pre-trained on the 160GB corpus (used by [RoBERTa](https://arxiv.org/abs/1907.11692) and [BART](https://arxiv.org/abs/1910.13461)) is available as well.
+We release the checkpoints for the **ERNIE-GEN _base_** and **ERNIE-GEN _large_** models, both pre-trained on English Wikipedia and [BookCorpus](https://arxiv.org/abs/1506.06724) (16GB in total). In addition, an **ERNIE-GEN _large_** model pre-trained on a 430GB corpus (see [ERNIE-GEN Appendix A.1](https://arxiv.org/abs/2001.11314) for a description of the corpus) is available as well.
 
 - [**ERNIE-GEN _base_**](https://ernie.bj.bcebos.com/ernie_gen_base.tgz) (_lowercased | 12-layer, 768-hidden, 12-heads, 110M parameters_)
 - [**ERNIE-GEN _large_**](https://ernie.bj.bcebos.com/ernie_gen_large.tgz) (_lowercased | 24-layer, 1024-hidden, 16-heads, 340M parameters_)
-- [**ERNIE-GEN _large with 160G_**](https://ernie.bj.bcebos.com/ernie_gen_large_160g.tgz) (_lowercased | 24-layer, 1024-hidden, 16-heads, 340M parameters_)
+- [**ERNIE-GEN _large with 430G_**](https://ernie.bj.bcebos.com/ernie_gen_large_430g.tgz) (_lowercased | 24-layer, 1024-hidden, 16-heads, 340M parameters_)
 
 ## Fine-tuning on Downstream Tasks
 
@@ -65,7 +65,7 @@ The results on Gigaword-10k (10K examples of Gigaword) are presented as follows:
 | UniLM                        | 16G / 340M  | 34.21     | 15.28     | 31.54     |
 | **ERNIE-GEN** _base_         | 16G / 110M  | 33.75     | 15.23     | 31.35     |
 | **ERNIE-GEN** _large_        | 16G / 340M  | 35.05     | 16.10     | 32.50     |
-| **ERNIE-GEN** _large_ (160G) | 160G / 340M | **35.51** | **16.79** | **33.23** |
+| **ERNIE-GEN** _large_ (430G) | 430G / 340M | **35.51** | **16.79** | **33.23** |
 
 The results on Gigaword are presented as follows:
 
@@ -78,7 +78,7 @@ The results on Gigaword are presented as follows:
 | PEGASUS (_C4_)               | 750G / 568M | 38.75     | 19.96     | 36.14     |
 | PEGASUS (_HugeNews_)         | 3.8T / 568M | 39.12     | 19.86     | 36.24     |
 | **ERNIE-GEN** _base_         | 16G / 110M  | 38.83     | 20.04     | 36.20     |
 | **ERNIE-GEN** _large_        | 16G / 340M  | 39.25     | 20.25     | 36.53     |
-| **ERNIE-GEN** _large_ (160G) | 160G / 340M | **39.46** | **20.34** | **36.74** |
+| **ERNIE-GEN** _large_ (430G) | 430G / 340M | **39.46** | **20.34** | **36.74** |
 
 We preprocess the raw Gigaword dataset following UniLM; the preprocessed data is available at [Gigaword](https://ernie.bj.bcebos.com/gigaword.tgz).
 
@@ -97,7 +97,7 @@ The results on CNN/Daily Mail are presented as follows:
 | PEGASUS (_HugeNews_)         | 3.8T / 568M | 44.17     | 21.47     | 41.11     |
 | **ERNIE-GEN** _base_         | 16G / 110M  | 42.30     | 19.92     | 39.68     |
 | **ERNIE-GEN** _large_        | 16G / 340M  | 44.02     | 21.17     | 41.26     |
-| **ENRIE-GEN** _large_ (160G) | 160G / 340M | **44.31** | 21.35     | **41.60** |
+| **ERNIE-GEN** _large_ (430G) | 430G / 340M | **44.31** | 21.35     | **41.60** |
 
 We preprocess the raw CNN/Daily Mail dataset following UniLM; the preprocessed data is available at [CNN/Daily Mail](https://ernie.bj.bcebos.com/cnndm.tgz).
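
For reference, a minimal sketch of fetching one of the released checkpoints and a preprocessed dataset named above. The URLs come from the README; the extraction layout — an `ernie_gen_large_430g/` directory holding `vocab.txt`, `ernie_config.json`, and `params`, plus a `datasets/` directory for the data — is an assumption inferred from the config files later in this patch, not verified against the archives:

```script
# Sketch: download a pre-trained checkpoint and the preprocessed CNN/Daily Mail data.
# URLs are taken from the README above; the extracted directory names are assumptions.
wget https://ernie.bj.bcebos.com/ernie_gen_large_430g.tgz
wget https://ernie.bj.bcebos.com/cnndm.tgz
tar xzf ernie_gen_large_430g.tgz   # assumed to yield ernie_gen_large_430g/{vocab.txt,ernie_config.json,params}
mkdir -p datasets
tar xzf cnndm.tgz -C datasets/     # assumed to yield a datasets/cnndm/ directory with train/dev TSV files
```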
@@ -114,7 +114,7 @@ The results on the [SQuAD 1.1](https://arxiv.org/abs/1806.03822) dataset followi
 | **ERNIE-GEN** _base_ (beam size=1)           | 22.28     | 25.13     | 50.38     |
 | **ERNIE-GEN** _large_ (beam size=1)          | 24.03     | 26.31     | 52.36     |
 | **ERNIE-GEN** _large_ (beam size=5)          | 25.40     | **26.92** | 52.84     |
-| **ERNIE-GEN** _large_ (beam size=5) + (160G) | **25.41** | 26.77     | **52.91** |
+| **ERNIE-GEN** _large_ (beam size=5) + (430G) | **25.41** | 26.77     | **52.91** |
 
 The results following the reversed dev-test data split in [[Zhao et al., 2018]](https://www.aclweb.org/anthology/D18-1424/) are presented as follows:
 
@@ -125,7 +125,7 @@ The results following the reversed dev-test data split in [[Zhao et al., 2018]](
 | **ERNIE-GEN** _base_ (beam size=1)           | 23.52     | 25.61     | 51.45     |
 | **ERNIE-GEN** _large_ (beam size=1)          | 25.57     | 26.89     | 53.31     |
 | **ERNIE-GEN** _large_ (beam size=5)          | 26.95     | **27.57** | 53.77     |
-| **ERNIE-GEN** _large_ (beam size=5) + (160G) | **27.05** | 27.43     | **53.83** |
+| **ERNIE-GEN** _large_ (beam size=5) + (430G) | **27.05** | 27.43     | **53.83** |
 
 *_Note that we also report results with a larger beam size of 5._
 
@@ -161,24 +161,6 @@ Results on the CoQA development set are presented as follows:
 
 We preprocess the raw [CoQA](https://arxiv.org/abs/1808.07042) dataset; the preprocessed data is available at [CoQA-preprocessed](https://ernie.bj.bcebos.com/coqa.tgz).
 
-Finally, we also compared with a concurrent work [ProphetNet](https://arxiv.org/abs/2001.04063), the fine-tuning results on Gigaword, CNN/Daily Mail and SQuAD are reported as follows:
-
-- _**Abstractive Summarization**_
-
-| Model / Task                  | Data / Params | Gigaword                      | CNN/Daily Mail                    |
-| :---------------------------- | :-----------: | :---------------------------: | :-------------------------------: |
-| Metric                        | -             | Rouge-1 / Rouge-2 / Rouge-L   | Rouge-1 / Rouge-2 / Rouge-L       |
-| **ProphetNet** _large_ (160G) | 160G / 340M   | **39.51** / **20.42** / 36.69 | 44.20 / 21.17 / 41.30             |
-| **ERNIE-GEN** _large_ (160G)  | 160G / 340M   | 39.46 / 20.34 / **36.74**     | **44.31** / **21.35** / **41.60** |
-
-- _**Question Generation**_
-
-| Model                         | Data / Params | BLEU-4 / METEOR / Rouge-L          | BLEU-4 / METEOR / Rouge-L     |
-| :---------------------------- | :-----------: | :--------------------------------: | :---------------------------: |
-| Data split                    | -             | Original                           | Reversed dev-test             |
-| **ProphetNet** _large_ (16G)  | 16G / 340M    | 25.01 / 26.83 / 52.57              | 26.72 / **27.64** / **53.79** |
-| **ERNIE-GEN** _large_ (16G)   | 16G / 340M    | **25.40** / **26.92** / **52.84**  | **26.95** / 27.57 / **53.77** |
-
 ## Usage
 
 ### Install PaddlePaddle
 
@@ -191,7 +173,7 @@ pip install -r requirements.txt
 ### Fine-tuning
 Please add the CUDA, cuDNN, and NCCL2 library paths to LD_LIBRARY_PATH before running ERNIE-GEN. We have put the parameter configurations of the above downstream tasks in `config/`; you can easily run fine-tuning through these configuration files.
 For example, you can fine-tune the ERNIE-GEN base model on Gigaword by
 ```script
-MODEL="base" # base or large or large_160g
+MODEL="base" # base or large or large_430g
 TASK="gigaword" # cnndm, coqa, gigaword, squad_qg or persona-chat
 sh run_seq2seq.sh ./configs/${MODEL}/${TASK}_conf
 ```
diff --git a/ernie-gen/README.zh.md b/ernie-gen/README.zh.md
index f748ed3..9d66dd0 100644
--- a/ernie-gen/README.zh.md
+++ b/ernie-gen/README.zh.md
@@ -43,11 +43,11 @@
 
 ## Pre-trained Models
 
-We release the **ERNIE-GEN _base_** and **ERNIE-GEN _large_** models, pre-trained on English Wikipedia and BookCorpus (16GB in total). We also release an **ERNIE-GEN _large_** model pre-trained on a 160GB corpus, the same corpus used to pre-train [RoBERTa](https://arxiv.org/abs/1907.11692) and [BART](https://arxiv.org/abs/1910.13461).
+We release the **ERNIE-GEN _base_** and **ERNIE-GEN _large_** models, pre-trained on English Wikipedia and BookCorpus (16GB in total). We also release an **ERNIE-GEN _large_** model pre-trained on a 430GB corpus (see [ERNIE-GEN Appendix A.1](https://arxiv.org/abs/2001.11314) for a description of the data).
 
 - [**ERNIE-GEN _base_**](https://ernie.bj.bcebos.com/ernie_gen_base.tgz) (_lowercased | 12-layer, 768-hidden, 12-heads, 110M parameters_)
 - [**ERNIE-GEN _large_**](https://ernie.bj.bcebos.com/ernie_gen_large.tgz) (_lowercased | 24-layer, 1024-hidden, 16-heads, 340M parameters_)
-- [**ERNIE-GEN _large with 160G_**](https://ernie.bj.bcebos.com/ernie_gen_large_160g.tgz) (_lowercased | 24-layer, 1024-hidden, 16-heads, 340M parameters_)
+- [**ERNIE-GEN _large with 430G_**](https://ernie.bj.bcebos.com/ernie_gen_large_430g.tgz) (_lowercased | 24-layer, 1024-hidden, 16-heads, 340M parameters_)
 
 ## Fine-tuning Tasks
 
@@ -65,7 +65,7 @@
 | UniLM                        | 16G / 340M  | 34.21     | 15.28     | 31.54     |
 | **ERNIE-GEN** _base_         | 16G / 110M  | 33.75     | 15.23     | 31.35     |
 | **ERNIE-GEN** _large_        | 16G / 340M  | 35.05     | 16.10     | 32.50     |
-| **ERNIE-GEN** _large_ (160G) | 160G / 340M | **35.51** | **16.79** | **33.23** |
+| **ERNIE-GEN** _large_ (430G) | 430G / 340M | **35.51** | **16.79** | **33.23** |
 
 Results on Gigaword:
 
@@ -78,7 +78,7 @@
 | PEGASUS (_C4_)               | 750G / 568M | 38.75     | 19.96     | 36.14     |
 | PEGASUS (_HugeNews_)         | 3.8T / 568M | 39.12     | 19.86     | 36.24     |
 | **ERNIE-GEN** _base_         | 16G / 110M  | 38.83     | 20.04     | 36.20     |
 | **ERNIE-GEN** _large_        | 16G / 340M  | 39.25     | 20.25     | 36.53     |
-| **ERNIE-GEN** _large_ (160G) | 160G / 340M | **39.46** | **20.34** | **36.74** |
+| **ERNIE-GEN** _large_ (430G) | 430G / 340M | **39.46** | **20.34** | **36.74** |
 
 We preprocess the data following UniLM; download link: [Gigaword](https://ernie.bj.bcebos.com/gigaword.tgz).
 
@@ -97,7 +97,7 @@
 | PEGASUS (_HugeNews_)         | 3.8T / 568M | 44.17     | 21.47     | 41.11     |
 | **ERNIE-GEN** _base_         | 16G / 110M  | 42.30     | 19.92     | 39.68     |
 | **ERNIE-GEN** _large_        | 16G / 340M  | 44.02     | 21.17     | 41.26     |
-| **ENRIE-GEN** _large_ (160G) | 160G / 340M | **44.31** | 21.35     | **41.60** |
+| **ERNIE-GEN** _large_ (430G) | 430G / 340M | **44.31** | 21.35     | **41.60** |
 
 We preprocess the data following UniLM; download link: [CNN/Daily Mail](https://ernie.bj.bcebos.com/cnndm.tgz).
 
@@ -114,7 +114,7 @@
 | **ERNIE-GEN** _base_ (beam size=1)           | 22.28     | 25.13     | 50.38     |
 | **ERNIE-GEN** _large_ (beam size=1)          | 24.03     | 26.31     | 52.36     |
 | **ERNIE-GEN** _large_ (beam size=5)          | 25.40     | **26.92** | 52.84     |
-| **ERNIE-GEN** _large_ (beam size=5) + (160G) | **25.41** | 26.77     | **52.91** |
+| **ERNIE-GEN** _large_ (beam size=5) + (430G) | **25.41** | 26.77     | **52.91** |
 
 Following [[Zhao et al., 2018]](https://www.aclweb.org/anthology/D18-1424/), with the dev and test sets reversed, the results are:
 
@@ -125,7 +125,7 @@
 | **ERNIE-GEN** _base_ (beam size=1)           | 23.52     | 25.61     | 51.45     |
 | **ERNIE-GEN** _large_ (beam size=1)          | 25.57     | 26.89     | 53.31     |
 | **ERNIE-GEN** _large_ (beam size=5)          | 26.95     | **27.57** | 53.77     |
-| **ERNIE-GEN** _large_ (beam size=5) + (160G) | **27.05** | 27.43     | **53.83** |
+| **ERNIE-GEN** _large_ (beam size=5) + (430G) | **27.05** | 27.43     | **53.83** |
 
 *_We also report results with the beam size increased to 5._
 
@@ -159,23 +159,6 @@
 
 We processed the raw CoQA dataset; download link: [CoQA](https://ernie.bj.bcebos.com/coqa.tgz).
 
-In addition, we compared ERNIE-GEN with the concurrent work [ProphetNet](https://arxiv.org/abs/2001.04063) on Gigaword, CNN/Daily Mail, and SQuAD:
-
-- _**Abstractive Summarization**_
-
-| Model / Task                 | Data / Params | Gigaword                      | CNN/Daily Mail                    |
-| :--------------------------- | :-----------: | :---------------------------: | :-------------------------------: |
-| Metric                       | -             | Rouge-1 / Rouge-2 / Rouge-L   | Rouge-1 / Rouge-2 / Rouge-L       |
-| ProphetNet _large_ (160G)    | 160G / 340M   | **39.51** / **20.42** / 36.69 | 44.20 / 21.17 / 41.30             |
-| **ERNIE-GEN** _large_ (160G) | 160G / 340M   | 39.46 / 20.34 / **36.74**     | **44.31** / **21.35** / **41.60** |
-
-- _**Question Generation**_
-
-| Model                        | Data / Params | BLEU-4 / METEOR / Rouge-L          | BLEU-4 / METEOR / Rouge-L     |
-| :--------------------------- | :-----------: | :--------------------------------: | :---------------------------: |
-| Data split                   | -             | Original                           | Reversed dev-test             |
-| ProphetNet _large_ (16G)     | 16G / 340M    | 25.01 / 26.83 / 52.57              | 26.72 / **27.64** / **53.79** |
-| **ERNIE-GEN** _large_ (16G)  | 16G / 340M    | **25.40** / **26.92** / **52.84**  | **26.95** / 27.57 / **53.77** |
 
 ## Usage
 
@@ -189,7 +172,7 @@ pip install -r requirements.txt
 ### Fine-tuning
 Before running ERNIE-GEN, add the CUDA, cuDNN, and NCCL2 dynamic library paths to LD_LIBRARY_PATH. We have placed the parameter configuration files for the downstream tasks in `config/`, so fine-tuning can be run directly from these files. For example, you can fine-tune the ERNIE-GEN base model on the Gigaword dataset with:
 ```script
-MODEL="base" # base or large or large_160g
+MODEL="base" # base or large or large_430g
 TASK="gigaword" # cnndm, coqa, gigaword, squad_qg or persona-chat
 sh run_seq2seq.sh ./configs/${MODEL}/${TASK}_conf
 ```
diff --git a/ernie-gen/configs/large_160g/coqa_conf b/ernie-gen/configs/large_160g/coqa_conf
deleted file mode 100644
index fb81787..0000000
--- a/ernie-gen/configs/large_160g/coqa_conf
+++ /dev/null
@@ -1,52 +0,0 @@
-#load model
-vocab_path="ernie_gen_large/vocab.txt"
-config_path="ernie_gen_large/ernie_config.json"
-init_model="ernie_gen_large/params"
-
-#for multi-turn dialog/qa
-task_type="dialog"
-role_type_size=3
-turn_type_size=16
-
-#input
-max_src_len=480
-max_tgt_len=32
-tokenized_input="true"
-continuous_position="true"
-batch_size=4
-in_tokens="false"
-#tgt_type_id=1
-
-#decode
-do_decode="true"
-max_dec_len=30
-beam_size=3
-length_penalty=0.0
-use_multi_gpu_test="true"
-
-#train
-epoch=10
-weight_decay=0.01
-label_smooth=0.1
-hidden_dropout_prob=0.1
-save_and_valid_by_epoch="true"
-#lr
-warmup_proportion=0.1
-lr_scheduler="linear_warmup_decay"
-learning_rate=1e-5
-#noise
-random_noise="false"
-noise_prob=0.5
-
-#dataset
-data_path="./datasets/coqa/"
-train_set="train.tsv"
-dev_set="dev.tsv"
-do_train="true"
-do_val="true"
-do_test="false"
-do_pred="false"
-
-#evaluate
-eval_script="sh ./eval/tasks/coqa/eval.sh"
-eval_mertrics="f1"
diff --git a/ernie-gen/configs/large_160g/persona-chat_conf b/ernie-gen/configs/large_160g/persona-chat_conf
deleted file mode 100644
index 66e2379..0000000
--- a/ernie-gen/configs/large_160g/persona-chat_conf
+++ /dev/null
@@ -1,53 +0,0 @@
-#load model
-vocab_path="ernie_gen_large/vocab.txt"
-config_path="ernie_gen_large/ernie_config.json"
-init_model="ernie_gen_large/params"
-
-#for multi-turn dialog/qa
-task_type="dialog"
-role_type_size=3
-turn_type_size=16
-
-#input
-max_src_len=472
-max_tgt_len=40
-tokenized_input="true"
-continuous_position="true"
-batch_size=8
-in_tokens="false"
-
-#decode
-do_decode="true"
-max_dec_len=32
-beam_size=10
-length_penalty=1.3
-use_multi_gpu_test="true"
-
-#train
-epoch=30
-weight_decay=0.01
-label_smooth=0.0
-hidden_dropout_prob=0.1
-save_and_valid_by_epoch="true"
-#lr
-warmup_proportion=0.1
-lr_scheduler="linear_warmup_decay"
-learning_rate=1e-4
-#noise
-random_noise="false"
-noise_prob=0.0
-
-#dataset
-data_path="./datasets/persona_chat/"
-train_set="train.tsv"
-dev_set="dev.2k.tsv"
-pred_set="test.tsv"
-do_train="true"
-do_val="true"
-do_test="false"
-do_pred="true"
-do_decode="true"
-
-#evaluate
-eval_script="sh ./eval/tasks/persona_chat/eval.sh"
-eval_mertrics="bleu_1,bleu_2,distinct_1,distinct_2"
diff --git a/ernie-gen/configs/large_160g/cnndm_conf b/ernie-gen/configs/large_430g/cnndm_conf
similarity index 84%
rename from ernie-gen/configs/large_160g/cnndm_conf
rename to ernie-gen/configs/large_430g/cnndm_conf
index 4b0bd7d..886baa6 100644
--- a/ernie-gen/configs/large_160g/cnndm_conf
+++ b/ernie-gen/configs/large_430g/cnndm_conf
@@ -1,7 +1,7 @@
 #load model
-vocab_path="ernie_gen_large_160g/vocab.txt"
-config_path="ernie_gen_large_160g/ernie_config.json"
-init_model="ernie_gen_large_160g/params"
+vocab_path="ernie_gen_large_430g/vocab.txt"
+config_path="ernie_gen_large_430g/ernie_config.json"
+init_model="ernie_gen_large_430g/params"
 
 #input
 max_src_len=640
diff --git a/ernie-gen/configs/large_160g/gigaword-10k_conf b/ernie-gen/configs/large_430g/gigaword-10k_conf
similarity index 84%
rename from ernie-gen/configs/large_160g/gigaword-10k_conf
rename to ernie-gen/configs/large_430g/gigaword-10k_conf
index 89a4f90..df7dece 100644
--- a/ernie-gen/configs/large_160g/gigaword-10k_conf
+++ b/ernie-gen/configs/large_430g/gigaword-10k_conf
@@ -1,7 +1,7 @@
 #load model
-vocab_path="ernie_gen_large_160g/vocab.txt"
-config_path="ernie_gen_large_160g/ernie_config.json"
-init_model="ernie_gen_large_160g/params"
+vocab_path="ernie_gen_large_430g/vocab.txt"
+config_path="ernie_gen_large_430g/ernie_config.json"
+init_model="ernie_gen_large_430g/params"
 
 #input
 max_src_len=192
diff --git a/ernie-gen/configs/large_160g/gigaword_conf b/ernie-gen/configs/large_430g/gigaword_conf
similarity index 84%
rename from ernie-gen/configs/large_160g/gigaword_conf
rename to ernie-gen/configs/large_430g/gigaword_conf
index 4d31e9a..9ddec1c 100644
--- a/ernie-gen/configs/large_160g/gigaword_conf
+++ b/ernie-gen/configs/large_430g/gigaword_conf
@@ -1,7 +1,7 @@
 #load model
-vocab_path="ernie_gen_large_160g/vocab.txt"
-config_path="ernie_gen_large_160g/ernie_config.json"
-init_model="ernie_gen_large_160g/params"
+vocab_path="ernie_gen_large_430g/vocab.txt"
+config_path="ernie_gen_large_430g/ernie_config.json"
+init_model="ernie_gen_large_430g/params"
 
 #input
 max_src_len=192
diff --git a/ernie-gen/configs/large_160g/squad-qg_conf b/ernie-gen/configs/large_430g/squad-qg_conf
similarity index 83%
rename from ernie-gen/configs/large_160g/squad-qg_conf
rename to ernie-gen/configs/large_430g/squad-qg_conf
index 85953f8..ac35468 100644
--- a/ernie-gen/configs/large_160g/squad-qg_conf
+++ b/ernie-gen/configs/large_430g/squad-qg_conf
@@ -1,7 +1,7 @@
 #load model
-vocab_path="ernie_gen_large_160g/vocab.txt"
-config_path="ernie_gen_large_160g/ernie_config.json"
-init_model="ernie_gen_large_160g/params"
+vocab_path="ernie_gen_large_430g/vocab.txt"
+config_path="ernie_gen_large_430g/ernie_config.json"
+init_model="ernie_gen_large_430g/params"
 
 #input
 max_src_len=512
-- 
GitLab
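
With the configs renamed, a 430G fine-tuning run follows the same pattern as the README example above. A minimal sketch, assuming the `ernie_gen_large_430g` checkpoint archive has been downloaded and extracted into the working directory (these configs load `ernie_gen_large_430g/vocab.txt`, `ernie_config.json`, and `params`):

```script
# Sketch: fine-tune the 430G large model with one of the renamed configs.
# Assumes ernie_gen_large_430g/ and the preprocessed dataset are already in place.
MODEL="large_430g"
TASK="cnndm"   # or gigaword, gigaword-10k, squad-qg; the coqa and persona-chat configs were removed for this model
sh run_seq2seq.sh ./configs/${MODEL}/${TASK}_conf
```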