Commit 2a3d001d (unverified)
add unimo large and most of tasks (#678)

Repository: PaddlePaddle/ERNIE
Authored by xfcygaocan on May 25, 2021; committed via GitHub on May 25, 2021
Parent: 7cd60539
Showing 49 changed files with 4635 additions and 42 deletions (+4635 −42)
Changed files:

- ernie-unimo/README.md (+553 −13)
- ernie-unimo/model_files/download_unimo_base_mnli_en.sh (+14 −0)
- ernie-unimo/model_files/download_unimo_large_en.sh (+14 −0)
- ernie-unimo/model_files/download_unimo_large_mnli_en.sh (+14 −0)
- ernie-unimo/script/classification/CoLA/model_conf (+29 −0)
- ernie-unimo/script/classification/CoLA/run.sh (+101 −0)
- ernie-unimo/script/classification/CoLA_large/model_conf (+29 −0)
- ernie-unimo/script/classification/CoLA_large/run.sh (+101 −0)
- ernie-unimo/script/classification/MNLI-AX/model_conf (+33 −0)
- ernie-unimo/script/classification/MNLI-AX/run.sh (+109 −0)
- ernie-unimo/script/classification/MNLI-AX_large/model_conf (+33 −0)
- ernie-unimo/script/classification/MNLI-AX_large/run.sh (+109 −0)
- ernie-unimo/script/classification/SST-2_large/model_conf (+29 −0)
- ernie-unimo/script/classification/SST-2_large/run.sh (+99 −0)
- ernie-unimo/script/img2txt/coco/model_conf (+74 −0)
- ernie-unimo/script/img2txt/coco/run.sh (+124 −0)
- ernie-unimo/script/img2txt/coco_large/model_conf (+75 −0)
- ernie-unimo/script/img2txt/coco_large/run.sh (+125 −0)
- ernie-unimo/script/regression/STS-B/model_conf (+29 −0)
- ernie-unimo/script/regression/STS-B/run.sh (+99 −0)
- ernie-unimo/script/regression/STS-B_large/model_conf (+29 −0)
- ernie-unimo/script/regression/STS-B_large/run.sh (+99 −0)
- ernie-unimo/script/retrieval/Flickr30k/run.sh (+1 −3)
- ernie-unimo/script/retrieval/Flickr30k_large/model_conf (+28 −0)
- ernie-unimo/script/retrieval/Flickr30k_large/run.sh (+98 −0)
- ernie-unimo/script/seq2seq/cnndm/model_conf (+67 −0)
- ernie-unimo/script/seq2seq/cnndm/run.sh (+115 −0)
- ernie-unimo/script/seq2seq/cnndm_large/model_conf (+70 −0)
- ernie-unimo/script/seq2seq/cnndm_large/run.sh (+114 −0)
- ernie-unimo/script/seq2seq/coqa_large/model_conf (+73 −0)
- ernie-unimo/script/seq2seq/coqa_large/run.sh (+117 −0)
- ernie-unimo/script/seq2seq/gigaword/model_conf (+68 −0)
- ernie-unimo/script/seq2seq/gigaword/run.sh (+114 −0)
- ernie-unimo/script/seq2seq/gigaword_large/model_conf (+67 −0)
- ernie-unimo/script/seq2seq/gigaword_large/run.sh (+114 −0)
- ernie-unimo/script/seq2seq/squad_qg/model_conf (+68 −0)
- ernie-unimo/script/seq2seq/squad_qg/run.sh (+114 −0)
- ernie-unimo/script/seq2seq/squad_qg_large/model_conf (+68 −0)
- ernie-unimo/script/seq2seq/squad_qg_large/run.sh (+114 −0)
- ernie-unimo/script/visual_entailment/SNLI-VE/model_conf (+28 −0)
- ernie-unimo/script/visual_entailment/SNLI-VE/run.sh (+114 −0)
- ernie-unimo/script/visual_entailment/SNLI-VE_large/model_conf (+28 −0)
- ernie-unimo/script/visual_entailment/SNLI-VE_large/run.sh (+114 −0)
- ernie-unimo/src/args/visual_entailment_args.py (+112 −0)
- ernie-unimo/src/finetune/visual_entailment.py (+280 −0)
- ernie-unimo/src/model/unimo_finetune.py (+39 −25)
- ernie-unimo/src/reader/visual_entailment_reader.py (+241 −0)
- ernie-unimo/src/run_retrieval.py (+1 −1)
- ernie-unimo/src/run_visual_entailment.py (+347 −0)
ernie-unimo/README.md

...
@@ -28,29 +28,35 @@ Results on single-modal understanding and generation tasks:

---
## TODOs
- [ ] Add all downstream tasks
- [ ] Add VQA tasks
- [ ] Add unimo large model

## Dependencies
python 3.7.4 \
paddlepaddle-gpu==1.8.4.post107 \
pyrouge==0.1.3
regex==2020.7.14

## Pre-trained Models
`UNIMO` adopts large-scale text corpus, image collections and image-text aligned datasets as the pre-training data.
We provide the `UNIMO` pre-trained models below:

[UNIMO base](https://unimo.bj.bcebos.com/model/unimo_base_en.tar.gz) (lowercased | 12 layers)

[UNIMO-mnli base](https://unimo.bj.bcebos.com/model/unimo_mnli_base_en.tar.gz) (lowercased | 12 layers)

[UNIMO large](https://unimo.bj.bcebos.com/model/unimo_large_en.tar.gz) (lowercased | 24 layers)

[UNIMO-mnli large](https://unimo.bj.bcebos.com/model/unimo_mnli_large_en.tar.gz) (lowercased | 24 layers)

```
MODEL_SIZE=base   # base | mnli_base | large | mnli_large
cd /path/to/model_files
wget --no-check-certificate -q https://unimo.bj.bcebos.com/model/unimo_${MODEL_SIZE}_en.tar.gz
tar -zxf unimo_${MODEL_SIZE}_en.tar.gz
```
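The snippet above fetches one checkpoint at a time. Since all four archives follow the same naming scheme, looping over the listed sizes also works; a minimal sketch of the same download, assuming nothing beyond the URLs shown above:

```
cd /path/to/model_files
for MODEL_SIZE in base mnli_base large mnli_large; do
    wget --no-check-certificate -q https://unimo.bj.bcebos.com/model/unimo_${MODEL_SIZE}_en.tar.gz
    tar -zxf unimo_${MODEL_SIZE}_en.tar.gz   # unpacks unimo_${MODEL_SIZE}_en/
done
```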
## Experiments
Our fine-tuning experiments are carried out on V100 GPUs.
The following are the startup methods and basic settings of all downstream tasks:

<table>
<tr>
...
@@ -62,29 +68,165 @@ Our fine-tuning experiments are carried on V100 GPU. Here are the results from t
<td><strong><center>Running Time</strong></td>
</tr>
<tr><td rowspan="8"><center>Text Understanding<center></td><td rowspan="2"><center>SST-2<center></td><td><center>UNIMO base</td><td><center>sh ./script/classification/SST-2/run.sh</td><td><center>8</td><td><center>9h</td></tr>
<tr><td><center>UNIMO large</td><td><center>sh ./script/classification/SST-2_large/run.sh</td><td><center>8</td><td><center>14h</td></tr>
<tr><td rowspan="2"><center>CoLA<center></td><td><center>UNIMO base</td><td><center>sh ./script/classification/CoLA/run.sh</td><td><center>4</td><td><center>2h</td></tr>
<tr><td><center>UNIMO large</td><td><center>sh ./script/classification/CoLA_large/run.sh</td><td><center>4</td><td><center>4h</td></tr>
<tr><td rowspan="2"><center>MNLI-AX<center></td><td><center>UNIMO base</td><td><center>sh ./script/classification/MNLI-AX/run.sh</td><td><center>8</td><td><center>1d20h</td></tr>
<tr><td><center>UNIMO large</td><td><center>sh ./script/classification/MNLI-AX_large/run.sh</td><td><center>8</td><td><center>2d13h</td></tr>
<tr><td rowspan="2"><center>STS-B<center></td><td><center>UNIMO-mnli base</td><td><center>sh ./script/regression/STS-B/run.sh</td><td><center>8</td><td><center>2h</td></tr>
<tr><td><center>UNIMO-mnli large</td><td><center>sh ./script/regression/STS-B_large/run.sh</td><td><center>8</td><td><center>4h</td></tr>
<tr><td rowspan="8"><center>Text Generation<center></td><td rowspan="2"><center>CNN/DailyMail<center></td><td><center>UNIMO base</td><td><center>sh ./script/seq2seq/cnndm/run.sh</td><td><center>4</td><td><center>1d8h</td></tr>
<tr><td><center>UNIMO large</td><td><center>sh ./script/seq2seq/cnndm_large/run.sh</td><td><center>4</td><td><center>3d18h</td></tr>
<tr><td rowspan="2"><center>Gigaword<center></td><td><center>UNIMO base</td><td><center>sh ./script/seq2seq/gigaword/run.sh</td><td><center>4</td><td><center>1d3h</td></tr>
<tr><td><center>UNIMO large</td><td><center>sh ./script/seq2seq/gigaword_large/run.sh</td><td><center>4</td><td><center>2d3h</td></tr>
<tr><td rowspan="2"><center>CoQA<center></td><td><center>UNIMO base</td><td><center>sh ./script/seq2seq/coqa/run.sh</td><td><center>4</td><td><center>7h</td></tr>
<tr><td><center>UNIMO large</td><td><center>sh ./script/seq2seq/coqa_large/run.sh</td><td><center>4</td><td><center>22h</td></tr>
<tr><td rowspan="2"><center>Squad_QG<center></td><td><center>UNIMO base</td><td><center>sh ./script/seq2seq/squad_qg/run.sh</td><td><center>4</td><td><center>4h</td></tr>
<tr><td><center>UNIMO large</td><td><center>sh ./script/seq2seq/squad_qg_large/run.sh</td><td><center>4</td><td><center>8h</td></tr>
<tr><td rowspan="6"><center>Multi-Modal Understanding<center></td><td rowspan="2"><center>Flickr30k<center></td><td><center>UNIMO base</td><td><center>sh ./script/retrieval/Flickr30k/run.sh</td><td><center>16</td><td><center>3d</td></tr>
<tr><td><center>UNIMO large</td><td><center>sh ./script/retrieval/Flickr30k_large/run.sh</td><td><center>16</td><td><center>3d</td></tr>
<tr><td rowspan="2"><center>SNLI-VE<center></td><td><center>UNIMO base</td><td><center>sh ./script/visual_entailment/SNLI-VE/run.sh</td><td><center>16</td><td><center>16h</td></tr>
<tr><td><center>UNIMO large</td><td><center>sh ./script/visual_entailment/SNLI-VE_large/run.sh</td><td><center>16</td><td><center>2d</td></tr>
<tr><td rowspan="2"><center>VQA<center></td><td><center>UNIMO base</td><td><center>-</td><td><center>-</td><td><center>-</td></tr>
<tr><td><center>UNIMO large</td><td><center>-</td><td><center>-</td><td><center>-</td></tr>
<tr><td rowspan="2"><center>Multi-Modal Generation<center></td><td rowspan="2"><center>COCO Caption<center></td><td><center>UNIMO base</td><td><center>sh ./script/img2txt/coco/run.sh</td><td><center>16</td><td><center>3d</td></tr>
<tr><td><center>UNIMO large</td><td><center>sh ./script/img2txt/coco_large/run.sh</td><td><center>16</td><td><center>4d</td></tr>
</table>
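The Cards column corresponds to the `CUDA_VISIBLE_DEVICES` list hard-coded near the top of each `run.sh`; to run on fewer GPUs than listed, edit that export before the training loop (a sketch, assuming a 4-GPU machine for a task tabled at 8 cards):

```
# inside e.g. ./script/classification/MNLI-AX/run.sh
export CUDA_VISIBLE_DEVICES=0,1,2,3   # was 0,1,2,3,4,5,6,7
```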
---

...
@@ -105,6 +247,10 @@ For base model:
```
bash ./script/classification/SST-2/run.sh
```
For large model:
```
bash ./script/classification/SST-2_large/run.sh
```
#### Evaluation Results:

...
@@ -117,13 +263,285 @@ bash ./script/classification/SST-2/run.sh
<tr><td><center>UNIMO-base</td><td><center>95.1</td></tr>
<tr><td><center>UNIMO-large</td><td><center>96.8</td></tr>
</table>

### (2) Natural Language Inference
#### Download MNLI-AX dataset:
```
cd /path/to/data
wget --no-check-certificate -q https://unimo.bj.bcebos.com/data/MNLI-AX.tar.gz
tar -zxf MNLI-AX.tar.gz
```
#### Run the following command to train and evaluate on the MNLI-AX dataset:
For base model:
```
bash ./script/classification/MNLI-AX/run.sh
```
For large model:
```
bash ./script/classification/MNLI-AX_large/run.sh
```
#### Evaluation Results:
<table>
<tr><td><strong><center>Model</strong></td><td><strong><center>Acc-(m/mm)</strong></td></tr>
<tr><td><center>UNIMO-base</td><td><center>86.8/86.7</td></tr>
<tr><td><center>UNIMO-large</td><td><center>89.8/89.5</td></tr>
</table>

### (3) Similarity Tasks
#### Download STS-B dataset:
```
cd /path/to/data
wget --no-check-certificate -q https://unimo.bj.bcebos.com/data/STS-B.tar.gz
tar -zxf STS-B.tar.gz
```
#### Run the following command to train and evaluate on the STS-B dataset:
For base model:
```
bash ./script/regression/STS-B/run.sh
```
For large model:
```
bash ./script/regression/STS-B_large/run.sh
```
#### Evaluation Results:
<table>
<tr><td><strong><center>Model</strong></td><td><strong><center>Pearson correlation</strong></td></tr>
<tr><td><center>UNIMO-base</td><td><center>91.0</td></tr>
<tr><td><center>UNIMO-large</td><td><center>92.6</td></tr>
</table>

### (4) Linguistic Acceptability Judgments
#### Download CoLA dataset:
```
cd /path/to/data
wget --no-check-certificate -q https://unimo.bj.bcebos.com/data/CoLA.tar.gz
tar -zxf CoLA.tar.gz
```
#### Run the following command to train and evaluate on the CoLA dataset:
For base model:
```
bash ./script/classification/CoLA/run.sh
```
For large model:
```
bash ./script/classification/CoLA_large/run.sh
```
#### Evaluation Results:
<table>
<tr><td><strong><center>Model</strong></td><td><strong><center>Matthews correlation</strong></td></tr>
<tr><td><center>UNIMO-base</td><td><center>65.4</td></tr>
<tr><td><center>UNIMO-large</td><td><center>68.5</td></tr>
</table>
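CoLA is scored with the Matthews correlation coefficient (`eval_mertrics=matthews_corrcoef` in the task's `model_conf`). For reference, the standard definition over binary confusion counts is:

```
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}
```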
## Text Generation Tasks
### (1) Document Summarization
#### Download CNN/DailyMail dataset:
```
cd /path/to/data
wget --no-check-certificate -q https://unimo.bj.bcebos.com/data/cnndm.tar.gz
tar -zxf cnndm.tar.gz
```
#### Download evaluation script:
```
cd src/eval/tasks
wget --no-check-certificate -q https://unimo.bj.bcebos.com/eval_script/cnndm.tar.gz
tar -zxf cnndm.tar.gz
```
#### Run the following command to train and evaluate on the CNN/DailyMail dataset:
For base model:
```
bash ./script/seq2seq/cnndm/run.sh
```
For large model:
```
bash ./script/seq2seq/cnndm_large/run.sh
```
#### Evaluation Results:
<table>
<tr><td><strong><center>Model</strong></td><td><strong><center>ROUGE-1</strong></td><td><strong><center>ROUGE-2</strong></td><td><strong><center>ROUGE-L</strong></td></tr>
<tr><td><center>UNIMO-base</td><td><center>42.42</td><td><center>20.12</td><td><center>39.61</td></tr>
<tr><td><center>UNIMO-large</td><td><center>43.51</td><td><center>20.65</td><td><center>40.63</td></tr>
</table>

### (2) Sentence Compression
#### Download Gigaword dataset:
```
cd /path/to/data
wget --no-check-certificate -q https://unimo.bj.bcebos.com/data/gigaword.tar.gz
tar -zxf gigaword.tar.gz
```
#### Download evaluation script:
```
cd src/eval/tasks
wget --no-check-certificate -q https://unimo.bj.bcebos.com/eval_script/gigaword.tar.gz
tar -zxf gigaword.tar.gz
```
#### Run the following command to train and evaluate on the Gigaword dataset:
For base model:
```
bash ./script/seq2seq/gigaword/run.sh
```
For large model:
```
bash ./script/seq2seq/gigaword_large/run.sh
```
#### Evaluation Results:
<table>
<tr><td><strong><center>Model</strong></td><td><strong><center>ROUGE-1</strong></td><td><strong><center>ROUGE-2</strong></td><td><strong><center>ROUGE-L</strong></td></tr>
<tr><td><center>UNIMO-base</td><td><center>38.80</td><td><center>19.99</td><td><center>36.27</td></tr>
<tr><td><center>UNIMO-large</td><td><center>39.71</td><td><center>20.37</td><td><center>36.88</td></tr>
</table>

### (3) Question Generation
#### Download Squad dataset:
```
cd /path/to/data
wget --no-check-certificate -q https://unimo.bj.bcebos.com/data/squad_qg.tar.gz
tar -zxf squad_qg.tar.gz
```
#### Download evaluation script:
```
cd src/eval/tasks
wget --no-check-certificate -q https://unimo.bj.bcebos.com/eval_script/squad_qg.tar.gz
tar -zxf squad_qg.tar.gz
```
#### Run the following command to train and evaluate on the Squad dataset:
For base model:
```
bash ./script/seq2seq/squad_qg/run.sh
```
For large model:
```
bash ./script/seq2seq/squad_qg_large/run.sh
```
#### Evaluation Results:
<table>
<tr><td><strong><center>Model</strong></td><td><strong><center>BLEU-4</strong></td><td><strong><center>METEOR</strong></td><td><strong><center>ROUGE-L</strong></td></tr>
<tr><td><center>UNIMO-base</td><td><center>22.78</td><td><center>25.24</td><td><center>51.34</td></tr>
<tr><td><center>UNIMO-large</td><td><center>24.59</td><td><center>26.39</td><td><center>52.47</td></tr>
</table>

### (4) Conversation Question Answering
#### Download CoQA dataset:
```
cd /path/to/data
wget --no-check-certificate -q https://unimo.bj.bcebos.com/data/coqa.tar.gz
```

...
@@ -144,6 +562,10 @@ For base model:
```
bash ./script/seq2seq/coqa/run.sh
```
For large model:
```
bash ./script/seq2seq/coqa_large/run.sh
```
#### Evaluation Results:

...
@@ -156,6 +578,10 @@ bash ./script/seq2seq/coqa/run.sh
<tr><td><center>UNIMO-base</td><td><center>80.2</td></tr>
<tr><td><center>UNIMO-large</td><td><center>84.9</td></tr>
</table>
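The ROUGE numbers above come from the downloaded per-task eval scripts, which in turn rely on the `pyrouge==0.1.3` pin from Dependencies. If evaluation fails before training does, a plausible first check (an assumption, not part of this commit) is that pyrouge is installed and pointed at a local ROUGE-1.5.5 copy:

```
pip install pyrouge==0.1.3
# pyrouge must be told where ROUGE-1.5.5 lives; the path here is an example
pyrouge_set_rouge_path /path/to/ROUGE-1.5.5
```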
...
@@ -179,6 +605,10 @@ For base model:
```
bash ./script/retrieval/Flickr30k/run.sh
```
For large model:
```
bash ./script/retrieval/Flickr30k_large/run.sh
```
#### Evaluation Results:

...
@@ -197,6 +627,12 @@ Results of Image Retrieval task on Flickr30k dataset
<td><center>93.40</td>
<td><center>96.08</td>
</tr>
<tr><td><center>UNIMO-large</td><td><center>78.04</td><td><center>94.24</td><td><center>97.12</td></tr>
</table>
Results of Text Retrieval task on Flickr30k dataset

...
@@ -214,6 +650,110 @@ Results of Text Retrieval task on Flickr30k dataset
<td><center>98.40</td>
<td><center>99.10</td>
</tr>
<tr><td><center>UNIMO-large</td><td><center>89.40</td><td><center>98.90</td><td><center>99.80</td></tr>
</table>

### (2) Visual Entailment
#### Download SNLI-VE dataset:
##### Note: Visual features are extracted by [bottom-up-attention](https://github.com/peteanderson80/bottom-up-attention)
```
cd /path/to/data
wget --no-check-certificate -q https://unimo.bj.bcebos.com/data/SNLI-VE.tar.gz
tar -zxf SNLI-VE.tar.gz
```
#### Run the following command to train and evaluate on the SNLI-VE dataset:
For base model:
```
bash ./script/visual_entailment/SNLI-VE/run.sh
```
For large model:
```
bash ./script/visual_entailment/SNLI-VE_large/run.sh
```
#### Evaluation Results:
Results of Visual Entailment task on SNLI-VE dataset
<table>
<tr><td><strong><center>Model</strong></td><td><strong><center>dev</strong></td><td><strong><center>test</strong></td></tr>
<tr><td><center>UNIMO-base</td><td><center>80.00</td><td><center>79.10</td></tr>
<tr><td><center>UNIMO-large</td><td><center>81.11</td><td><center>80.63</td></tr>
</table>

## Multi-Modal Generation Tasks
### (1) Image Caption Generation
#### Download COCO Caption dataset:
##### Note: Visual features are extracted by [bottom-up-attention](https://github.com/peteanderson80/bottom-up-attention)
```
cd /path/to/data
wget --no-check-certificate -q https://unimo.bj.bcebos.com/data/coco.tar.gz
tar -zxf coco.tar.gz
```
#### Download evaluation script:
```
cd src/eval/tasks
wget --no-check-certificate -q https://unimo.bj.bcebos.com/eval_script/coco.tar.gz
tar -zxf coco.tar.gz
```
#### Run the following command to train and evaluate on the COCO Caption dataset:
For base model:
```
bash ./script/img2txt/coco/run.sh
```
For large model:
```
bash ./script/img2txt/coco_large/run.sh
```
#### Evaluation Results:
<table>
<tr><td><strong><center>Model</strong></td><td><strong><center>BLEU-4</strong></td><td><strong><center>CIDEr</strong></td></tr>
<tr><td><center>UNIMO-base</td><td><center>38.8</td><td><center>124.4</td></tr>
<tr><td><center>UNIMO-large</td><td><center>39.6</td><td><center>127.7</td></tr>
</table>
---

...
ernie-unimo/model_files/download_unimo_base_mnli_en.sh (new file, mode 100644)
```
data_name=unimo_mnli_base_en
data_tar=${data_name}.tar.gz
bos_url=https://unimo.bj.bcebos.com/model/$data_tar

rm -rf $data_name
wget --no-check-certificate -q $bos_url
if [[ $? -ne 0 ]]; then
    echo "url link: $bos_url"
    echo "download data failed"
    exit 1
fi

tar zxf $data_tar
rm -f $data_tar
exit 0
```
ernie-unimo/model_files/download_unimo_large_en.sh (new file, mode 100644)
```
data_name=unimo_large_en
data_tar=${data_name}.tar.gz
bos_url=https://unimo.bj.bcebos.com/model/$data_tar

rm -rf $data_name
wget --no-check-certificate -q $bos_url
if [[ $? -ne 0 ]]; then
    echo "url link: $bos_url"
    echo "download data failed"
    exit 1
fi

tar zxf $data_tar
rm -f $data_tar
exit 0
```
ernie-unimo/model_files/download_unimo_large_mnli_en.sh (new file, mode 100644)
```
data_name=unimo_mnli_large_en
data_tar=${data_name}.tar.gz
bos_url=https://unimo.bj.bcebos.com/model/$data_tar

rm -rf $data_name
wget --no-check-certificate -q $bos_url
if [[ $? -ne 0 ]]; then
    echo "url link: $bos_url"
    echo "download data failed"
    exit 1
fi

tar zxf $data_tar
rm -f $data_tar
exit 0
```
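All three helpers wrap the same fetch-and-unpack sequence and exit non-zero when the download fails, so they can be chained safely. Typical use, assuming they are invoked from inside `model_files/` so the checkpoints unpack beside the configs:

```
cd /path/to/model_files
bash download_unimo_large_en.sh        # unpacks unimo_large_en/
bash download_unimo_large_mnli_en.sh   # unpacks unimo_mnli_large_en/
```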
ernie-unimo/script/classification/CoLA/model_conf (new file, mode 100644)
```
output_name="classification"
task=CoLA

## hyper param
use_fp16="False"
do_train="True"
do_val="True"
do_test="False"
do_pred="True"
num_labels=2
weight_decay=0
max_len=512
warmup_ratio=0.06
save_checkpoints="False"
save_steps=500
validation_steps=500
skip_steps=10
eval_mertrics=matthews_corrcoef

EPOCH=("10")
BATCH_SIZE=("16" "32")
LR_RATE=("1e-5" "2e-5" "3e-5")
DD_RAND_SEED=("1" "2" "3" "4" "5")

init_model="./model_files/unimo_base_en"
config_path="./model_files/config/unimo_base_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
```
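The accompanying `run.sh` sweeps the full cross product of the four arrays above: 1 epoch setting × 2 batch sizes × 3 learning rates × 5 seeds = 30 fine-tuning runs for this task. A minimal sketch of the same enumeration, using only the arrays defined in this file:

```
# prints the 30 (seed, epoch, lr, batch_size) combinations run.sh will launch
for seed in "${DD_RAND_SEED[@]}"; do
  for epoch in "${EPOCH[@]}"; do
    for lr in "${LR_RATE[@]}"; do
      for bs in "${BATCH_SIZE[@]}"; do
        echo "$seed $epoch $lr $bs"
      done
    done
  done
done
```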
ernie-unimo/script/classification/CoLA/run.sh (new file, mode 100644)
```
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR; pwd`
cd ${MYDIR}/../../../

# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
check_iplist

export CUDA_VISIBLE_DEVICES=0,1,2,3

output_dir=./output/${task}
log_dir=${output_dir}/log
save_model_base_dir=$output_dir/save_model
mkdir -p $output_dir $log_dir $save_model_base_dir

if [[ ${do_pred} == "True" ]]; then
    pred_save_prefix="${output_dir}/predict"
    mkdir -p $pred_save_prefix
fi

for seed in "${DD_RAND_SEED[@]}"; do
  echo "seed "$seed
  for epoch in "${EPOCH[@]}"; do
    echo "epoch "$epoch
    for lr in "${LR_RATE[@]}"; do
      echo "learning rate "$lr
      for bs in "${BATCH_SIZE[@]}"; do
        echo "batch_size "$bs
        log_prefix=$seed"_"$epoch"_"$lr"_"$bs"."
        if [[ ${do_pred} == "True" ]]; then
            pred_save="${pred_save_prefix}/test.${seed}.${epoch}.${lr}.${bs}"
        fi
        if [[ ${save_checkpoints} == "True" ]]; then
            save_model_dir="${save_model_base_dir}/params.${seed}.${epoch}.${lr}.${bs}"
            mkdir -p $save_model_dir
        fi
        if [[ ${bs} == "32" ]]; then
            validation_steps=250
        fi

        python -u ./src/run_classifier.py --use_cuda "True" \
            --is_distributed ${is_distributed:-"False"} --weight_sharing ${weight_sharing:-"True"} \
            --use_fast_executor ${e_executor:-"true"} --use_fp16 ${use_fp16:-"false"} \
            --nccl_comm_num ${nccl_comm_num:-1} --use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
            --in_tokens ${in_tokens:-"false"} --use_dynamic_loss_scaling ${use_fp16} --init_loss_scaling ${loss_scaling:-12800} \
            --beta1 ${beta1:-0.9} --beta2 ${beta2:-0.98} --epsilon ${epsilon:-1e-06} --verbose true \
            --do_train ${do_train:-"True"} --do_val ${do_val:-"True"} --do_test ${do_test:-"True"} --do_pred ${do_pred:-"True"} \
            --pred_save ${pred_save:-"./output/predict/test"} --batch_size ${bs:-16} \
            --init_pretraining_params ${init_model:-""} \
            --train_set ./data/CoLA/train.tsv --dev_set ./data/CoLA/dev.tsv --test_set ./data/CoLA/test.tsv \
            --checkpoints ${save_model_dir:-""} --save_checkpoints ${save_checkpoints:-"True"} \
            --save_steps ${save_steps:-1000} --weight_decay ${weight_decay:-"0.1"} \
            --warmup_proportion ${warmup_ratio:-"0.06"} --validation_steps ${validation_steps:-"100"} \
            --epoch $epoch --max_seq_len ${max_len:-512} --learning_rate ${lr:-"5e-5"} \
            --lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} --skip_steps ${skip_steps:-"10"} \
            --num_iteration_per_drop_scope 10 --num_labels ${num_labels:-2} \
            --unimo_vocab_file ${vocab_file} --encoder_json_file ${bpe_json} --vocab_bpe_file ${bpe_file} \
            --unimo_config_path ${config_path} --eval_mertrics ${eval_mertrics:-"simple_accuracy"} \
            --random_seed ${seed:-1} >> $log_dir/${log_prefix}lanch.log 2>&1
      done
    done
  done
done

if [[ $? -ne 0 ]]; then
    echo "run failed"
    exit 1
fi

python ./src/utils/stat_res.py --log_dir=$log_dir
exit 0
```
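The script cd's to the repository root itself, but it does assume `env.sh` and `utils.sh` exist there (`check_iplist` comes from `utils.sh`) and that the CoLA data and the base model have already been downloaded. Under those assumptions:

```
# per-run logs accumulate under ./output/CoLA/log/ and are
# aggregated by ./src/utils/stat_res.py when the sweep finishes
bash ./script/classification/CoLA/run.sh
```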
ernie-unimo/script/classification/CoLA_large/model_conf (new file, mode 100644)
```
output_name="classification"
task=CoLA_large

## hyper param
use_fp16="False"
do_train="True"
do_val="True"
do_test="False"
do_pred="True"
num_labels=2
weight_decay=0
max_len=512
warmup_ratio=0.06
save_checkpoints="False"
save_steps=500
validation_steps=500
skip_steps=10
eval_mertrics=matthews_corrcoef

EPOCH=("10")
BATCH_SIZE=("16" "32")
LR_RATE=("1e-5" "2e-5" "3e-5")
DD_RAND_SEED=("1" "2" "3" "4" "5")

init_model="./model_files/unimo_large_en"
config_path="./model_files/config/unimo_large_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
```
ernie-unimo/script/classification/CoLA_large/run.sh (new file, mode 100644)
```
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR; pwd`
cd ${MYDIR}/../../../

# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
check_iplist

export CUDA_VISIBLE_DEVICES=0,1,2,3

output_dir=./output/${task}
log_dir=${output_dir}/log
save_model_base_dir=$output_dir/save_model
mkdir -p $output_dir $log_dir $save_model_base_dir

if [[ ${do_pred} == "True" ]]; then
    pred_save_prefix="${output_dir}/predict"
    mkdir -p $pred_save_prefix
fi

for seed in "${DD_RAND_SEED[@]}"; do
  echo "seed "$seed
  for epoch in "${EPOCH[@]}"; do
    echo "epoch "$epoch
    for lr in "${LR_RATE[@]}"; do
      echo "learning rate "$lr
      for bs in "${BATCH_SIZE[@]}"; do
        echo "batch_size "$bs
        log_prefix=$seed"_"$epoch"_"$lr"_"$bs"."
        if [[ ${do_pred} == "True" ]]; then
            pred_save="${pred_save_prefix}/test.${seed}.${epoch}.${lr}.${bs}"
        fi
        if [[ ${save_checkpoints} == "True" ]]; then
            save_model_dir="${save_model_base_dir}/params.${seed}.${epoch}.${lr}.${bs}"
            mkdir -p $save_model_dir
        fi
        if [[ ${bs} == "32" ]]; then
            validation_steps=250
        fi

        python -u ./src/run_classifier.py --use_cuda "True" \
            --is_distributed ${is_distributed:-"False"} --weight_sharing ${weight_sharing:-"True"} \
            --use_fast_executor ${e_executor:-"true"} --use_fp16 ${use_fp16:-"false"} \
            --nccl_comm_num ${nccl_comm_num:-1} --use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
            --in_tokens ${in_tokens:-"false"} --use_dynamic_loss_scaling ${use_fp16} --init_loss_scaling ${loss_scaling:-12800} \
            --beta1 ${beta1:-0.9} --beta2 ${beta2:-0.98} --epsilon ${epsilon:-1e-06} --verbose true \
            --do_train ${do_train:-"True"} --do_val ${do_val:-"True"} --do_test ${do_test:-"True"} --do_pred ${do_pred:-"True"} \
            --pred_save ${pred_save:-"./output/predict/test"} --batch_size ${bs:-16} \
            --init_pretraining_params ${init_model:-""} \
            --train_set ./data/CoLA/train.tsv --dev_set ./data/CoLA/dev.tsv --test_set ./data/CoLA/test.tsv \
            --checkpoints ${save_model_dir:-""} --save_checkpoints ${save_checkpoints:-"True"} \
            --save_steps ${save_steps:-1000} --weight_decay ${weight_decay:-"0.1"} \
            --warmup_proportion ${warmup_ratio:-"0.06"} --validation_steps ${validation_steps:-"100"} \
            --epoch $epoch --max_seq_len ${max_len:-512} --learning_rate ${lr:-"5e-5"} \
            --lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} --skip_steps ${skip_steps:-"10"} \
            --num_iteration_per_drop_scope 10 --num_labels ${num_labels:-2} \
            --unimo_vocab_file ${vocab_file} --encoder_json_file ${bpe_json} --vocab_bpe_file ${bpe_file} \
            --unimo_config_path ${config_path} --eval_mertrics ${eval_mertrics:-"simple_accuracy"} \
            --random_seed ${seed:-1} >> $log_dir/${log_prefix}lanch.log 2>&1
      done
    done
  done
done

if [[ $? -ne 0 ]]; then
    echo "run failed"
    exit 1
fi

python ./src/utils/stat_res.py --log_dir=$log_dir
exit 0
```
ernie-unimo/script/classification/MNLI-AX/model_conf (new file, mode 100644)
```
output_name="classification"
task=MNLI-AX

## hyper param
use_fp16="False"
do_train="True"
do_val="True"
do_val_hard="True"
do_test="False"
do_test_hard="False"
do_pred="True"
do_pred_hard="True"
do_diagnostic="True"
num_labels=3
weight_decay=0
max_len=512
warmup_ratio=0.06
save_checkpoints="False"
save_steps=10000
validation_steps=20000
skip_steps=100
eval_mertrics=simple_accuracy

EPOCH=("10")
BATCH_SIZE=("16" "32")
LR_RATE=("1e-5" "2e-5" "3e-5")
DD_RAND_SEED=("1" "2" "3" "4" "5")

init_model="./model_files/unimo_base_en"
config_path="./model_files/config/unimo_base_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
```
ernie-unimo/script/classification/MNLI-AX/run.sh (new file, mode 100644)
```
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR; pwd`
cd ${MYDIR}/../../../

# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
check_iplist

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

output_dir=./output/${task}
log_dir=${output_dir}/log
save_model_base_dir=$output_dir/save_model
mkdir -p $output_dir $log_dir $save_model_base_dir

if [[ ${do_pred} == "True" ]]; then
    pred_save_prefix="${output_dir}/predict"
    mkdir -p $pred_save_prefix
fi

for seed in "${DD_RAND_SEED[@]}"; do
  echo "seed "$seed
  for epoch in "${EPOCH[@]}"; do
    echo "epoch "$epoch
    for lr in "${LR_RATE[@]}"; do
      echo "learning rate "$lr
      for bs in "${BATCH_SIZE[@]}"; do
        echo "batch_size "$bs
        log_prefix=$seed"_"$epoch"_"$lr"_"$bs"."
        if [[ ${do_pred} == "True" ]]; then
            pred_save="${pred_save_prefix}/test.${seed}.${epoch}.${lr}.${bs}"
        fi
        if [[ ${save_checkpoints} == "True" ]]; then
            save_model_dir="${save_model_base_dir}/params.${seed}.${epoch}.${lr}.${bs}"
            mkdir -p $save_model_dir
        fi
        if [[ ${bs} == "32" ]]; then
            validation_steps=10000
        fi

        python -u ./src/run_classifier.py --use_cuda "True" \
            --is_distributed ${is_distributed:-"False"} --weight_sharing ${weight_sharing:-"True"} \
            --use_fast_executor ${e_executor:-"true"} --use_fp16 ${use_fp16:-"false"} \
            --nccl_comm_num ${nccl_comm_num:-1} --use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
            --in_tokens ${in_tokens:-"false"} --use_dynamic_loss_scaling ${use_fp16} --init_loss_scaling ${loss_scaling:-12800} \
            --beta1 ${beta1:-0.9} --beta2 ${beta2:-0.98} --epsilon ${epsilon:-1e-06} --verbose true \
            --do_train ${do_train:-"True"} --do_val ${do_val:-"True"} --do_val_hard ${do_val_hard:-"False"} \
            --do_test ${do_test:-"True"} --do_test_hard ${do_test_hard:-"False"} \
            --do_pred ${do_pred:-"True"} --do_pred_hard ${do_pred_hard:-"False"} \
            --do_diagnostic ${do_diagnostic:-"True"} \
            --pred_save ${pred_save:-"./output/predict/test"} --batch_size ${bs:-16} \
            --init_pretraining_params ${init_model:-""} \
            --train_set ./data/MNLI-AX/train.tsv \
            --dev_set ./data/MNLI-AX/m/dev.tsv --dev_hard_set ./data/MNLI-AX/mm/dev.tsv \
            --test_set ./data/MNLI-AX/m/test.tsv --test_hard_set ./data/MNLI-AX/mm/test.tsv \
            --diagnostic_set ./data/MNLI-AX/diagnostic.tsv \
            --checkpoints ${save_model_dir:-""} --save_checkpoints ${save_checkpoints:-"True"} \
            --save_steps ${save_steps:-1000} --weight_decay ${weight_decay:-"0.1"} \
            --warmup_proportion ${warmup_ratio:-"0.06"} --validation_steps ${validation_steps:-"100"} \
            --epoch $epoch --max_seq_len ${max_len:-512} --learning_rate ${lr:-"5e-5"} \
            --lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} --skip_steps ${skip_steps:-"10"} \
            --num_iteration_per_drop_scope 10 --num_labels ${num_labels:-3} \
            --unimo_vocab_file ${vocab_file} --encoder_json_file ${bpe_json} --vocab_bpe_file ${bpe_file} \
            --unimo_config_path ${config_path} --eval_mertrics ${eval_mertrics:-"simple_accuracy"} \
            --random_seed ${seed:-1} >> $log_dir/${log_prefix}lanch.log 2>&1
      done
    done
  done
done

if [[ $? -ne 0 ]]; then
    echo "run failed"
    exit 1
fi

python ./src/utils/stat_res.py --log_dir=$log_dir --line_prefix="Best validation result:" --final_res_file="final_res.m.txt"
python ./src/utils/stat_res.py --log_dir=$log_dir --line_prefix="Best validation_hard result:" --final_res_file="final_res.mm.txt"
exit 0
```
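MNLI-AX evaluates the matched (m) and mismatched (mm) dev sets separately, which is why the sweep ends with two `stat_res.py` passes keyed on different log-line prefixes. Assuming the summary files land where their names suggest (an assumption; check `stat_res.py` for the actual output directory), inspecting them looks like:

```
cat final_res.m.txt    # best matched-dev results across the sweep
cat final_res.mm.txt   # best mismatched-dev results across the sweep
```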
ernie-unimo/script/classification/MNLI-AX_large/model_conf (new file, mode 100644)
```
output_name="classification"
task=MNLI-AX_large

## hyper param
use_fp16="False"
do_train="True"
do_val="True"
do_val_hard="True"
do_test="False"
do_test_hard="False"
do_pred="True"
do_pred_hard="True"
do_diagnostic="True"
num_labels=3
weight_decay=0
max_len=512
warmup_ratio=0.06
save_checkpoints="False"
save_steps=10000
validation_steps=20000
skip_steps=100
eval_mertrics=simple_accuracy

EPOCH=("10")
BATCH_SIZE=("16" "32")
LR_RATE=("1e-5" "2e-5" "3e-5")
DD_RAND_SEED=("1" "2" "3" "4" "5")

init_model="./model_files/unimo_large_en"
config_path="./model_files/config/unimo_large_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
```
ernie-unimo/script/classification/MNLI-AX_large/run.sh (new file, mode 100644)
```
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR; pwd`
cd ${MYDIR}/../../../

# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
check_iplist

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

output_dir=./output/${task}
log_dir=${output_dir}/log
save_model_base_dir=$output_dir/save_model
mkdir -p $output_dir $log_dir $save_model_base_dir

if [[ ${do_pred} == "True" ]]; then
    pred_save_prefix="${output_dir}/predict"
    mkdir -p $pred_save_prefix
fi

for seed in "${DD_RAND_SEED[@]}"; do
  echo "seed "$seed
  for epoch in "${EPOCH[@]}"; do
    echo "epoch "$epoch
    for lr in "${LR_RATE[@]}"; do
      echo "learning rate "$lr
      for bs in "${BATCH_SIZE[@]}"; do
        echo "batch_size "$bs
        log_prefix=$seed"_"$epoch"_"$lr"_"$bs"."
        if [[ ${do_pred} == "True" ]]; then
            pred_save="${pred_save_prefix}/test.${seed}.${epoch}.${lr}.${bs}"
        fi
        if [[ ${save_checkpoints} == "True" ]]; then
            save_model_dir="${save_model_base_dir}/params.${seed}.${epoch}.${lr}.${bs}"
            mkdir -p $save_model_dir
        fi
        if [[ ${bs} == "32" ]]; then
            validation_steps=10000
        fi

        python -u ./src/run_classifier.py --use_cuda "True" \
            --is_distributed ${is_distributed:-"False"} --weight_sharing ${weight_sharing:-"True"} \
            --use_fast_executor ${e_executor:-"true"} --use_fp16 ${use_fp16:-"false"} \
            --nccl_comm_num ${nccl_comm_num:-1} --use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
            --in_tokens ${in_tokens:-"false"} --use_dynamic_loss_scaling ${use_fp16} --init_loss_scaling ${loss_scaling:-12800} \
            --beta1 ${beta1:-0.9} --beta2 ${beta2:-0.98} --epsilon ${epsilon:-1e-06} --verbose true \
            --do_train ${do_train:-"True"} --do_val ${do_val:-"True"} --do_val_hard ${do_val_hard:-"False"} \
            --do_test ${do_test:-"True"} --do_test_hard ${do_test_hard:-"False"} \
            --do_pred ${do_pred:-"True"} --do_pred_hard ${do_pred_hard:-"False"} \
            --do_diagnostic ${do_diagnostic:-"True"} \
            --pred_save ${pred_save:-"./output/predict/test"} --batch_size ${bs:-16} \
            --init_pretraining_params ${init_model:-""} \
            --train_set ./data/MNLI-AX/train.tsv \
            --dev_set ./data/MNLI-AX/m/dev.tsv --dev_hard_set ./data/MNLI-AX/mm/dev.tsv \
            --test_set ./data/MNLI-AX/m/test.tsv --test_hard_set ./data/MNLI-AX/mm/test.tsv \
            --diagnostic_set ./data/MNLI-AX/diagnostic.tsv \
            --checkpoints ${save_model_dir:-""} --save_checkpoints ${save_checkpoints:-"True"} \
            --save_steps ${save_steps:-1000} --weight_decay ${weight_decay:-"0.1"} \
            --warmup_proportion ${warmup_ratio:-"0.06"} --validation_steps ${validation_steps:-"100"} \
            --epoch $epoch --max_seq_len ${max_len:-512} --learning_rate ${lr:-"5e-5"} \
            --lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} --skip_steps ${skip_steps:-"10"} \
            --num_iteration_per_drop_scope 10 --num_labels ${num_labels:-3} \
            --unimo_vocab_file ${vocab_file} --encoder_json_file ${bpe_json} --vocab_bpe_file ${bpe_file} \
            --unimo_config_path ${config_path} --eval_mertrics ${eval_mertrics:-"simple_accuracy"} \
            --random_seed ${seed:-1} >> $log_dir/${log_prefix}lanch.log 2>&1
      done
    done
  done
done

if [[ $? -ne 0 ]]; then
    echo "run failed"
    exit 1
fi

python ./src/utils/stat_res.py --log_dir=$log_dir --line_prefix="Best validation result:" --final_res_file="final_res.m.txt"
python ./src/utils/stat_res.py --log_dir=$log_dir --line_prefix="Best validation_hard result:" --final_res_file="final_res.mm.txt"
exit 0
```
ernie-unimo/script/classification/SST-2_large/model_conf (new file, mode 100644)
```
output_name="classification"
task=SST-2_large

## hyper param
use_fp16="False"
do_train="True"
do_val="True"
do_test="False"
do_pred="True"
num_labels=2
weight_decay=0
max_len=512
warmup_ratio=0.06
save_checkpoints="False"
save_steps=2000
validation_steps=2000
skip_steps=10
eval_mertrics=simple_accuracy

EPOCH=("10")
BATCH_SIZE=("16" "32")
LR_RATE=("1e-5" "2e-5" "3e-5")
DD_RAND_SEED=("1" "2" "3" "4" "5")

init_model="./model_files/unimo_large_en"
config_path="./model_files/config/unimo_large_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
```
ernie-unimo/script/classification/SST-2_large/run.sh (new file, mode 100644)
```
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR; pwd`
cd ${MYDIR}/../../../

# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
check_iplist

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

output_dir=./output/${task}
log_dir=${output_dir}/log
save_model_base_dir=$output_dir/save_model
mkdir -p $output_dir $log_dir $save_model_base_dir

if [[ ${do_pred} == "True" ]]; then
    pred_save_prefix="${output_dir}/predict"
    mkdir -p $pred_save_prefix
fi

for seed in "${DD_RAND_SEED[@]}"; do
  echo "seed "$seed
  for epoch in "${EPOCH[@]}"; do
    echo "epoch "$epoch
    for lr in "${LR_RATE[@]}"; do
      echo "learning rate "$lr
      for bs in "${BATCH_SIZE[@]}"; do
        echo "batch_size "$bs
        log_prefix=$seed"_"$epoch"_"$lr"_"$bs"."
        if [[ ${do_pred} == "True" ]]; then
            pred_save="${pred_save_prefix}/test.${seed}.${epoch}.${lr}.${bs}"
        fi
        if [[ ${save_checkpoints} == "True" ]]; then
            save_model_dir="${save_model_base_dir}/params.${seed}.${epoch}.${lr}.${bs}"
            mkdir -p $save_model_dir
        fi
        if [[ ${bs} == "32" ]]; then
            validation_steps=1000
        fi

        python -u ./src/run_classifier.py --use_cuda "True" \
            --is_distributed ${is_distributed:-"False"} --weight_sharing ${weight_sharing:-"True"} \
            --use_fast_executor ${e_executor:-"true"} --use_fp16 ${use_fp16:-"false"} \
            --nccl_comm_num ${nccl_comm_num:-1} --use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
            --in_tokens ${in_tokens:-"false"} --use_dynamic_loss_scaling ${use_fp16} --init_loss_scaling ${loss_scaling:-12800} \
            --beta1 ${beta1:-0.9} --beta2 ${beta2:-0.98} --epsilon ${epsilon:-1e-06} --verbose true \
            --do_train ${do_train:-"True"} --do_val ${do_val:-"True"} --do_test ${do_test:-"True"} --do_pred ${do_pred:-"True"} \
            --pred_save ${pred_save:-"./output/predict/test"} --batch_size ${bs:-16} \
            --init_pretraining_params ${init_model:-""} \
            --train_set ./data/SST-2/train.tsv --dev_set ./data/SST-2/dev.tsv --test_set ./data/SST-2/test.tsv \
            --checkpoints ${save_model_dir:-""} --save_checkpoints ${save_checkpoints:-"True"} \
            --save_steps ${save_steps:-1000} --weight_decay ${weight_decay:-"0.1"} \
            --warmup_proportion ${warmup_ratio:-"0.06"} --validation_steps ${validation_steps:-"100"} \
            --epoch $epoch --max_seq_len ${max_len:-512} --learning_rate ${lr:-"5e-5"} \
            --lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} --skip_steps ${skip_steps:-"10"} \
            --num_iteration_per_drop_scope 10 --num_labels ${num_labels:-2} \
            --unimo_vocab_file ${vocab_file} --encoder_json_file ${bpe_json} --vocab_bpe_file ${bpe_file} \
            --unimo_config_path ${config_path} --eval_mertrics ${eval_mertrics:-"simple_accuracy"} \
            --random_seed ${seed:-1} >> $log_dir/${log_prefix}lanch.log 2>&1
      done
    done
  done
done

if [[ $? -ne 0 ]]; then
    echo "run failed"
    exit 1
fi

python ./src/utils/stat_res.py --log_dir=$log_dir
exit 0
```
ernie-unimo/script/img2txt/coco/model_conf (new file, mode 100644)
```
output_name="img2txt"
init_model="./model_files/unimo_base_en"
data_path="./data/coco"
object_file_local_path="coco_object_0.35_tot.ids"

# hyper param
lr_scheduler="linear_warmup_decay"
use_fp16="False"
# Merge the AllReduce calls of a layer
use_fuse="True"
use_hierarchical_allreduce="True"
loss_scaling=12800
skip_steps=100
save_steps=10000
validation_steps=10000
label_smooth=0.1
weight_decay=0.01
max_seq_len=50
random_seed=666

# decoding params
do_decode="true"
max_img_len=101
max_obj_len=100
max_tgt_len=50
max_out_len=50
min_out_len=5
beam_size=5
length_penalty=0.6
block_trigram="False"
use_multi_gpu_test="True"

# adam optimizer
beta1=0.9
beta2=0.98
epsilon=1e-06

# dataset
train_filelist="train_filelist"
valid_filelist="valid_filelist"
test_filelist="test_filelist"
do_train="true"
do_val="false"
do_test="true"
do_pred="false"

# evaluate
eval_script="bash ./src/eval/tasks/coco/eval.sh"
eval_mertrics="Bleu_1,Bleu_2,Bleu_3,Bleu_4,METEOR,ROUGE_L,CIDEr,SPICE"

## tuning params
pred_batch_size=8
epoch=20
BATCH_SIZE=("4")
LR_RATE=("7e-6")
DD_RAND_SEED=("1")
WARMUP_PROP=("0.06")

## adversarial training params
adv_type="villa"
adv_step=4
adv_lr=0.03
norm_type="l2"
adv_max_norm=0
adv_init_mag=0.4
adv_kl_weight=1.0

## configuration
config_path="./model_files/config/unimo_base_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
```
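The `adv_*` block enables VILLA-style adversarial fine-tuning (`adv_type="villa"`). Since `run.sh` falls back to `${adv_type:-"None"}` when the variable is unset, a non-adversarial baseline is a one-line edit to this file (a sketch; nothing else needs to change):

```
# in model_conf: turn adversarial training off for a baseline run
adv_type="None"
```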
ernie-unimo/script/img2txt/coco/run.sh (new file, mode 100644)
```
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR; pwd`
cd ${MYDIR}/../../../

# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh

timestamp=`date "+%Y%m%d-%H%M%S"`
echo $timestamp

# check
check_iplist
set -eu

output_dir=../output-coco
log_dir=../log-coco
mkdir -p $output_dir $log_dir

e_executor=$(echo ${use_experimental_executor-'True'} | tr '[A-Z]' '[a-z]')
use_fuse=$(echo ${use_fuse-'False'} | tr '[A-Z]' '[a-z]')
if [[ ${use_fuse} == "true" ]]; then
    # MB
    export FLAGS_fuse_parameter_memory_size=64
fi

export EVAL_SCRIPT_LOG=${MYDIR}/../../../${output_dir}/eval.log
export TASK_DATA_PATH=${data_path}

distributed_args="--node_ips ${PADDLE_TRAINERS} \
                --node_id ${PADDLE_TRAINER_ID} \
                --current_node_ip ${POD_IP} \
                --selected_gpus 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 \
                --split_log_path $log_dir \
                --nproc_per_node 16"

skip_steps=10
save_steps=10000
validation_steps=10000

for random_seed in "${DD_RAND_SEED[@]}"; do
  echo "random_seed "${random_seed}
  for batch_size in "${BATCH_SIZE[@]}"; do
    echo "batch_size "${batch_size}
    for warmup_proportion in "${WARMUP_PROP[@]}"; do
      echo "warmup_proportion "${warmup_proportion}
      for learning_rate in "${LR_RATE[@]}"; do
        echo "learning rate "${learning_rate}
        python -u ./src/launch.py ${distributed_args} \
            ./src/run_img2txt.py --use_cuda "True" \
            --is_distributed "True" --use_multi_gpu_test ${use_multi_gpu_test:-"True"} \
            --use_fp16 ${use_fp16:-"False"} --use_dynamic_loss_scaling ${use_fp16} \
            --init_loss_scaling ${loss_scaling:-128} --use_fast_executor ${e_executor:-"True"} \
            --use_fuse ${use_fuse:-"False"} --nccl_comm_num ${nccl_comm_num:-1} \
            --use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
            --do_train ${do_train:-"true"} --do_val ${do_val:-"false"} \
            --do_test ${do_test:-"true"} --do_pred ${do_pred:-"false"} --do_decode ${do_decode:-"True"} \
            --train_filelist ${data_path}/${train_filelist:-""} \
            --valid_filelist ${data_path}/${valid_filelist:-""} \
            --test_filelist ${data_path}/${test_filelist:-""} \
            --object_file ${data_path}/${object_file_local_path:-""} \
            --epoch ${epoch} --task_type ${task_type:-"img2txt"} \
            --max_seq_len ${max_seq_len} --max_img_len ${max_img_len} --max_obj_len ${max_obj_len} \
            --max_tgt_len ${max_tgt_len} --max_out_len ${max_out_len} --min_out_len ${min_out_len} \
            --block_trigram ${block_trigram:-"True"} --beam_size ${beam_size:-5} \
            --length_penalty ${length_penalty:-0.6} \
            --hidden_dropout_prob ${hidden_dropout_prob:-0.1} \
            --attention_probs_dropout_prob ${attention_probs_dropout_prob:-0.1} \
            --beta1 ${beta1:-0.9} --beta2 ${beta2:-0.98} --epsilon ${epsilon:-1e-06} \
            --tgt_type_id ${tgt_type_id:-1} \
            --batch_size ${batch_size} --pred_batch_size ${pred_batch_size} \
            --learning_rate ${learning_rate} --lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
            --warmup_proportion ${warmup_proportion:-0.02} --weight_decay ${weight_decay:-0.01} \
            --weight_sharing ${weight_sharing:-"True"} --label_smooth ${label_smooth:-0.1} \
            --init_pretraining_params ${init_model:-""} \
            --unimo_vocab_file ${vocab_file} --encoder_json_file ${bpe_json} \
            --vocab_bpe_file ${bpe_file} --unimo_config_path ${config_path} \
            --checkpoints $output_dir \
            --adv_step ${adv_step:-2} --adv_lr ${adv_lr:-0.05} --adv_type ${adv_type:-"None"} \
            --norm_type ${norm_type:-"l2"} --adv_max_norm ${adv_max_norm:-0.4} \
            --adv_init_mag ${adv_init_mag:-0.4} --adv_kl_weight ${adv_kl_weight:-1.5} \
            --save_steps ${save_steps:-10000} --validation_steps ${validation_steps:-10000} \
            --skip_steps ${skip_steps:-10} --save_and_valid_by_epoch ${save_and_valid_by_epoch:-"False"} \
            --eval_script ${eval_script:-""} --eval_mertrics ${eval_mertrics:-""} \
            --random_seed ${random_seed:-"1"} >> $log_dir/lanch.log 2>&1
      done
    done
  done
done

python ./src/utils/extract_eval_res.py --log_dir=$log_dir
exit 0
```
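`./src/launch.py` reads the node topology from `PADDLE_TRAINERS`, `PADDLE_TRAINER_ID`, and `POD_IP`, which `env.sh` is expected to export. For a single machine with 16 visible GPUs, a plausible setup (an assumption, not part of this commit) is:

```
export PADDLE_TRAINERS=127.0.0.1   # comma-separated list of node IPs
export PADDLE_TRAINER_ID=0         # index of this node in PADDLE_TRAINERS
export POD_IP=127.0.0.1            # IP of the current node
bash ./script/img2txt/coco/run.sh
```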
ernie-unimo/script/img2txt/coco_large/model_conf (new file, mode 100644)
```
output_name="img2txt"
init_model="./model_files/unimo_large_en"
data_path="./data/coco"
object_file_local_path="coco_object_0.2_tot.ids"

# hyper param
lr_scheduler="linear_warmup_decay"
use_fp16="False"
# Merge the AllReduce calls of a layer
use_fuse="True"
use_hierarchical_allreduce="True"
loss_scaling=12800
skip_steps=100
save_steps=10000
validation_steps=10000
label_smooth=0.1
weight_decay=0.01
max_seq_len=50
random_seed=666

# decoding params
do_decode="true"
max_img_len=101
max_obj_len=100
max_tgt_len=50
max_out_len=50
min_out_len=5
beam_size=5
length_penalty=0.6
block_trigram="False"
use_multi_gpu_test="True"

# adam optimizer
beta1=0.9
beta2=0.98
epsilon=1e-06

# dataset
train_filelist="train_filelist"
valid_filelist="valid_filelist"
test_filelist="test_filelist"
do_train="true"
do_val="false"
do_test="true"
do_pred="false"

# evaluate
eval_script="bash ./src/eval/tasks/coco/eval.sh"
eval_mertrics="Bleu_1,Bleu_2,Bleu_3,Bleu_4,METEOR,ROUGE_L,CIDEr,SPICE"

## tuning params
pred_batch_size=1
epoch=10
BATCH_SIZE=("2")
LR_RATE=("7e-6")
DD_RAND_SEED=("1")
WARMUP_PROP=("0.06")

## adversarial training params
adv_type="villa"
adv_step=4
adv_lr=0.05
norm_type="l2"
adv_max_norm=0.4
adv_init_mag=0.4
adv_kl_weight=1.0
with_pure_model="True"

## configuration
config_path="./model_files/config/unimo_large_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
```
ernie-unimo/script/img2txt/coco_large/run.sh
0 → 100644
浏览文件 @
2a3d001d
#!/usr/bin/env bash
set
-eux
R_DIR
=
`
dirname
$0
`
;
MYDIR
=
`
cd
$R_DIR
;
pwd
`
cd
${
MYDIR
}
/../../../
# config env
source
${
MYDIR
}
/model_conf
source
./env.sh
source
./utils.sh
timestamp
=
`
date
"+%Y%m%d-%H%M%S"
`
echo
$timestamp
# check
check_iplist
set
-eu
output_dir
=
../output-coco_large
log_dir
=
../log-coco_large
mkdir
-p
$output_dir
$log_dir
e_executor
=
$(
echo
${
use_experimental_executor
-
'True'
}
|
tr
'[A-Z]'
'[a-z]'
)
use_fuse
=
$(
echo
${
use_fuse
-
'False'
}
|
tr
'[A-Z]'
'[a-z]'
)
if
[[
${
use_fuse
}
==
"true"
]]
;
then
#MB
export
FLAGS_fuse_parameter_memory_size
=
64
fi
export
EVAL_SCRIPT_LOG
=
${
MYDIR
}
/../../../
${
output_dir}/eval.log
export TASK_DATA_PATH=${data_path}

distributed_args="--node_ips ${PADDLE_TRAINERS} \
                --node_id ${PADDLE_TRAINER_ID} \
                --current_node_ip ${POD_IP} \
                --selected_gpus 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 \
                --split_log_path $log_dir \
                --nproc_per_node 16"

skip_steps=10
save_steps=10000
validation_steps=10000

for random_seed in "${DD_RAND_SEED[@]}"; do
  echo "random_seed "${random_seed}
  for batch_size in "${BATCH_SIZE[@]}"; do
    echo "batch_size "${batch_size}
    for warmup_proportion in "${WARMUP_PROP[@]}"; do
      echo "warmup_proportion "${warmup_proportion}
      for learning_rate in "${LR_RATE[@]}"; do
        echo "learning rate "${learning_rate}
        python -u ./src/launch.py ${distributed_args} \
            ./src/run_img2txt.py --use_cuda "True" \
            --is_distributed "True" \
            --use_multi_gpu_test ${use_multi_gpu_test:-"True"} \
            --use_fp16 ${use_fp16:-"False"} \
            --use_dynamic_loss_scaling ${use_fp16} \
            --init_loss_scaling ${loss_scaling:-128} \
            --use_fast_executor ${e_executor:-"True"} \
            --use_fuse ${use_fuse:-"False"} \
            --nccl_comm_num ${nccl_comm_num:-1} \
            --use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
            --do_train ${do_train:-"true"} \
            --do_val ${do_val:-"false"} \
            --do_test ${do_test:-"true"} \
            --do_pred ${do_pred:-"false"} \
            --do_decode ${do_decode:-"True"} \
            --train_filelist ${data_path}/${train_filelist:-""} \
            --valid_filelist ${data_path}/${valid_filelist:-""} \
            --test_filelist ${data_path}/${test_filelist:-""} \
            --object_file ${data_path}/${object_file_local_path:-""} \
            --epoch ${epoch} \
            --task_type ${task_type:-"img2txt"} \
            --max_seq_len ${max_seq_len} \
            --max_img_len ${max_img_len} \
            --max_obj_len ${max_obj_len} \
            --max_tgt_len ${max_tgt_len} \
            --max_out_len ${max_out_len} \
            --min_out_len ${min_out_len} \
            --block_trigram ${block_trigram:-"True"} \
            --beam_size ${beam_size:-5} \
            --length_penalty ${length_penalty:-0.6} \
            --hidden_dropout_prob ${hidden_dropout_prob:-0.1} \
            --attention_probs_dropout_prob ${attention_probs_dropout_prob:-0.1} \
            --beta1 ${beta1:-0.9} \
            --beta2 ${beta2:-0.98} \
            --epsilon ${epsilon:-1e-06} \
            --tgt_type_id ${tgt_type_id:-1} \
            --batch_size ${batch_size} \
            --pred_batch_size ${pred_batch_size} \
            --learning_rate ${learning_rate} \
            --lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
            --warmup_proportion ${warmup_proportion:-0.02} \
            --weight_decay ${weight_decay:-0.01} \
            --weight_sharing ${weight_sharing:-"True"} \
            --label_smooth ${label_smooth:-0.1} \
            --init_pretraining_params ${init_model:-""} \
            --unimo_vocab_file ${vocab_file} \
            --encoder_json_file ${bpe_json} \
            --vocab_bpe_file ${bpe_file} \
            --unimo_config_path ${config_path} \
            --checkpoints $output_dir \
            --adv_step ${adv_step:-2} \
            --adv_lr ${adv_lr:-0.05} \
            --adv_type ${adv_type:-"None"} \
            --norm_type ${norm_type:-"l2"} \
            --adv_max_norm ${adv_max_norm:-0.4} \
            --adv_init_mag ${adv_init_mag:-0.4} \
            --adv_kl_weight ${adv_kl_weight:-1.5} \
            --with_pure_model ${with_pure_model:-"True"} \
            --save_steps ${save_steps:-10000} \
            --validation_steps ${validation_steps:-10000} \
            --skip_steps ${skip_steps:-10} \
            --save_and_valid_by_epoch ${save_and_valid_by_epoch:-"False"} \
            --eval_script ${eval_script:-""} \
            --eval_mertrics ${eval_mertrics:-""} \
            --random_seed ${random_seed:-"1"} >> $log_dir/lanch.log 2>&1
      done
    done
  done
done

python ./src/utils/extract_eval_res.py --log_dir=$log_dir
exit 0
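All of these scripts route their flags through Bash's ${var:-default} expansion, so any value exported by env.sh or set in model_conf overrides the inline fallback. A minimal standalone sketch of the pattern, using one of the variables above:

    unset beam_size
    echo "--beam_size ${beam_size:-5}"    # beam_size unset -> fallback 5 is used
    beam_size=10
    echo "--beam_size ${beam_size:-5}"    # model_conf value wins -> 10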
ernie-unimo/script/regression/STS-B/model_conf
0 → 100644
output_name="regression"
task=STS-B
## hyper param
use_fp16="False"
do_train="True"
do_val="True"
do_test="False"
do_pred="True"
num_labels=1
weight_decay=0
max_len=512
warmup_ratio=0.06
save_checkpoints="False"
save_steps=500
validation_steps=500
skip_steps=20
eval_mertrics=pearson_and_spearman
EPOCH=("10")
BATCH_SIZE=("16" "32")
LR_RATE=("1e-5" "2e-5" "3e-5")
DD_RAND_SEED=("1" "2" "3" "4" "5")
init_model="./model_files/unimo_mnli_base_en"
config_path="./model_files/config/unimo_base_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
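The arrays above define the tuning grid: run.sh (next file) nests one loop per array and launches a separate fine-tuning run for every combination. A quick size check, assuming this model_conf has been sourced:

    source ./model_conf
    # 5 seeds x 1 epoch value x 3 learning rates x 2 batch sizes
    echo $(( ${#DD_RAND_SEED[@]} * ${#EPOCH[@]} * ${#LR_RATE[@]} * ${#BATCH_SIZE[@]} ))    # 30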
ernie-unimo/script/regression/STS-B/run.sh
0 → 100644
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR; pwd`
cd ${MYDIR}/../../../

# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
check_iplist

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

output_dir=./output/${task}
log_dir=${output_dir}/log
save_model_base_dir=$output_dir/save_model
mkdir -p $output_dir $log_dir $save_model_base_dir

if [[ ${do_pred} == "True" ]]; then
    pred_save_prefix="${output_dir}/predict"
    mkdir -p $pred_save_prefix
fi

for seed in "${DD_RAND_SEED[@]}"; do
  echo "seed "$seed
  for epoch in "${EPOCH[@]}"; do
    echo "epoch "$epoch
    for lr in "${LR_RATE[@]}"; do
      echo "learning rate "$lr
      for bs in "${BATCH_SIZE[@]}"; do
        echo "batch_size "$bs

        log_prefix=$seed"_"$epoch"_"$lr"_"$bs"."
        if [[ ${do_pred} == "True" ]]; then
            pred_save="${pred_save_prefix}/test.${seed}.${epoch}.${lr}.${bs}"
        fi
        if [[ ${save_checkpoints} == "True" ]]; then
            save_model_dir="${save_model_base_dir}/params.${seed}.${epoch}.${lr}.${bs}"
            mkdir -p $save_model_dir
        fi
        if [[ ${bs} == "32" ]]; then
            validation_steps=250
        fi

        python -u ./src/run_regression.py --use_cuda "True" \
            --is_distributed ${is_distributed:-"False"} \
            --weight_sharing ${weight_sharing:-"True"} \
            --use_fast_executor ${e_executor:-"true"} \
            --use_fp16 ${use_fp16:-"false"} \
            --nccl_comm_num ${nccl_comm_num:-1} \
            --use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
            --in_tokens ${in_tokens:-"false"} \
            --use_dynamic_loss_scaling ${use_fp16} \
            --init_loss_scaling ${loss_scaling:-12800} \
            --beta1 ${beta1:-0.9} \
            --beta2 ${beta2:-0.98} \
            --epsilon ${epsilon:-1e-06} \
            --verbose true \
            --do_train ${do_train:-"True"} \
            --do_val ${do_val:-"True"} \
            --do_test ${do_test:-"True"} \
            --do_pred ${do_pred:-"True"} \
            --pred_save ${pred_save:-"./output/predict/test"} \
            --batch_size ${bs:-16} \
            --init_pretraining_params ${init_model:-""} \
            --train_set ./data/STS-B/train.tsv \
            --dev_set ./data/STS-B/dev.tsv \
            --test_set ./data/STS-B/test.tsv \
            --checkpoints ${save_model_dir:-""} \
            --save_checkpoints ${save_checkpoints:-"True"} \
            --save_steps ${save_steps:-1000} \
            --weight_decay ${weight_decay:-"0.1"} \
            --warmup_proportion ${warmup_ratio:-"0.06"} \
            --validation_steps ${validation_steps:-"100"} \
            --epoch $epoch \
            --max_seq_len ${max_len:-512} \
            --learning_rate ${lr:-"5e-5"} \
            --lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
            --skip_steps ${skip_steps:-"10"} \
            --num_iteration_per_drop_scope 10 \
            --num_labels ${num_labels:-2} \
            --unimo_vocab_file ${vocab_file} \
            --encoder_json_file ${bpe_json} \
            --vocab_bpe_file ${bpe_file} \
            --unimo_config_path ${config_path} \
            --eval_mertrics ${eval_mertrics:-"pearson_and_spearman"} \
            --random_seed ${seed:-1} >> $log_dir/${log_prefix}lanch.log 2>&1
      done
    done
  done
done

if [[ $? -ne 0 ]]; then
    echo "run failed"
    exit 1
fi

python ./src/utils/stat_res.py --log_dir=$log_dir
exit 0
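Each grid point appends to its own log, keyed by the log_prefix built from the loop variables, and stat_res.py then sweeps the whole directory. For the STS-B grid above, the log directory should end up looking roughly like this (names derived from the ${seed}_${epoch}_${lr}_${bs}. pattern; stat_res.py itself is not part of this diff):

    ls ./output/STS-B/log/
    # 1_10_1e-5_16.lanch.log  1_10_1e-5_32.lanch.log  1_10_2e-5_16.lanch.log ...
    # 30 files in all, one per (seed, epoch, lr, batch_size) combination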
ernie-unimo/script/regression/STS-B_large/model_conf
0 → 100644
output_name="regression"
task=STS-B_large
## hyper param
use_fp16="False"
do_train="True"
do_val="True"
do_test="False"
do_pred="True"
num_labels=1
weight_decay=0
max_len=512
warmup_ratio=0.06
save_checkpoints="False"
save_steps=500
validation_steps=500
skip_steps=20
eval_mertrics=pearson_and_spearman
EPOCH=("10")
BATCH_SIZE=("16" "32")
LR_RATE=("1e-5" "2e-5" "3e-5")
DD_RAND_SEED=("1" "2" "3" "4" "5")
init_model="./model_files/unimo_mnli_large_en"
config_path="./model_files/config/unimo_large_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
ernie-unimo/script/regression/STS-B_large/run.sh
0 → 100644
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR; pwd`
cd ${MYDIR}/../../../

# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
check_iplist

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

output_dir=./output/${task}
log_dir=${output_dir}/log
save_model_base_dir=$output_dir/save_model
mkdir -p $output_dir $log_dir $save_model_base_dir

if [[ ${do_pred} == "True" ]]; then
    pred_save_prefix="${output_dir}/predict"
    mkdir -p $pred_save_prefix
fi

for seed in "${DD_RAND_SEED[@]}"; do
  echo "seed "$seed
  for epoch in "${EPOCH[@]}"; do
    echo "epoch "$epoch
    for lr in "${LR_RATE[@]}"; do
      echo "learning rate "$lr
      for bs in "${BATCH_SIZE[@]}"; do
        echo "batch_size "$bs

        log_prefix=$seed"_"$epoch"_"$lr"_"$bs"."
        if [[ ${do_pred} == "True" ]]; then
            pred_save="${pred_save_prefix}/test.${seed}.${epoch}.${lr}.${bs}"
        fi
        if [[ ${save_checkpoints} == "True" ]]; then
            save_model_dir="${save_model_base_dir}/params.${seed}.${epoch}.${lr}.${bs}"
            mkdir -p $save_model_dir
        fi
        if [[ ${bs} == "32" ]]; then
            validation_steps=250
        fi

        python -u ./src/run_regression.py --use_cuda "True" \
            --is_distributed ${is_distributed:-"False"} \
            --weight_sharing ${weight_sharing:-"True"} \
            --use_fast_executor ${e_executor:-"true"} \
            --use_fp16 ${use_fp16:-"false"} \
            --nccl_comm_num ${nccl_comm_num:-1} \
            --use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
            --in_tokens ${in_tokens:-"false"} \
            --use_dynamic_loss_scaling ${use_fp16} \
            --init_loss_scaling ${loss_scaling:-12800} \
            --beta1 ${beta1:-0.9} \
            --beta2 ${beta2:-0.98} \
            --epsilon ${epsilon:-1e-06} \
            --verbose true \
            --do_train ${do_train:-"True"} \
            --do_val ${do_val:-"True"} \
            --do_test ${do_test:-"True"} \
            --do_pred ${do_pred:-"True"} \
            --pred_save ${pred_save:-"./output/predict/test"} \
            --batch_size ${bs:-16} \
            --init_pretraining_params ${init_model:-""} \
            --train_set ./data/STS-B/train.tsv \
            --dev_set ./data/STS-B/dev.tsv \
            --test_set ./data/STS-B/test.tsv \
            --checkpoints ${save_model_dir:-""} \
            --save_checkpoints ${save_checkpoints:-"True"} \
            --save_steps ${save_steps:-1000} \
            --weight_decay ${weight_decay:-"0.1"} \
            --warmup_proportion ${warmup_ratio:-"0.06"} \
            --validation_steps ${validation_steps:-"100"} \
            --epoch $epoch \
            --max_seq_len ${max_len:-512} \
            --learning_rate ${lr:-"5e-5"} \
            --lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
            --skip_steps ${skip_steps:-"10"} \
            --num_iteration_per_drop_scope 10 \
            --num_labels ${num_labels:-2} \
            --unimo_vocab_file ${vocab_file} \
            --encoder_json_file ${bpe_json} \
            --vocab_bpe_file ${bpe_file} \
            --unimo_config_path ${config_path} \
            --eval_mertrics ${eval_mertrics:-"pearson_and_spearman"} \
            --random_seed ${seed:-1} >> $log_dir/${log_prefix}lanch.log 2>&1
      done
    done
  done
done

if [[ $? -ne 0 ]]; then
    echo "run failed"
    exit 1
fi

python ./src/utils/stat_res.py --log_dir=$log_dir
exit 0
ernie-unimo/script/retrieval/Flickr30k/run.sh
@@ -2,7 +2,6 @@
 set -eux
 R_DIR=`dirname $0`; MYDIR=`cd $R_DIR; pwd`
 cd ${MYDIR}/../../../
 # config env
 source ${MYDIR}/model_conf
 source ./env.sh
@@ -11,7 +10,6 @@ source ./utils.sh
 check_iplist
 export FLAGS_fuse_parameter_memory_size=64
-set -eu
 output_dir=./output/${task}
 log_dir=${output_dir}/log
 save_model_base_dir=$output_dir/save_model
@@ -41,7 +39,7 @@ python $lanch_start ./src/run_retrieval.py \
     --use_fuse ${use_fuse:-"True"} \
     --use_fast_executor ${e_executor:-"true"} \
     --use_fp16 ${use_fp16:-"false"} \
-    --nccl_comm_num ${nccl_comm_num:-2} \
+    --nccl_comm_num ${nccl_comm_num:-1} \
     --use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
     --use_dynamic_loss_scaling ${use_fp16:-"False"} \
     --use_sigmoid ${use_sigmoid:-"False"} \
ernie-unimo/script/retrieval/Flickr30k_large/model_conf
0 → 100644
output_name="retrieval"
task=Flickr30k_large
## hyper param
epoch=40
do_train="True"
do_val="True"
do_test="True"
save_checkpoints="False"
save_steps=10000
validation_steps=10000
samples_num=20
bbox="bbox100"
max_img_len=101
seed=1
batch_size=2
test_batch_size=96
lr=5e-6
learning_rate_scale=0.1
learning_rate_decay_epoch1=24
learning_rate_decay_epoch2=32
init_model="./model_files/unimo_large_en"
config_path="./model_files/config/unimo_large_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
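With lr_scheduler defaulting to "scale_by_epoch_decay" in run.sh below, the decay settings here read as a two-step schedule: multiply the learning rate by learning_rate_scale once training passes epoch 24, and again at epoch 32. A hedged sketch of the implied schedule (the real scheduler lives in src/, which this commit does not show):

    lr_at_epoch() {    # hypothetical helper mirroring the assumed step decay
        local e=$1 lr=5e-6
        [ "$e" -ge 24 ] && lr=5e-7    # 5e-6 * 0.1
        [ "$e" -ge 32 ] && lr=5e-8    # 5e-6 * 0.1 * 0.1
        echo $lr
    }
    lr_at_epoch 30    # -> 5e-7 (of the 40 training epochs)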
ernie-unimo/script/retrieval/Flickr30k_large/run.sh
0 → 100644
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR; pwd`
cd ${MYDIR}/../../../

# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
check_iplist

export FLAGS_fuse_parameter_memory_size=64

output_dir=./output/${task}
log_dir=${output_dir}/log
save_model_base_dir=$output_dir/save_model
mkdir -p $output_dir $log_dir $save_model_base_dir

log_prefix=$seed"_"$epoch"_"$lr"_"$batch_size"."
eval_dir="${output_dir}/tmp/params.${seed}.${epoch}.${lr}.${batch_size}"
mkdir -p $eval_dir

if [[ ${save_checkpoints} == "True" ]]; then
    save_model_dir="${save_model_base_dir}/params.${seed}.${epoch}.${lr}.${batch_size}"
    mkdir -p $save_model_dir
fi

distributed_args="--node_ips ${PADDLE_TRAINERS} \
                --node_id ${PADDLE_TRAINER_ID} \
                --current_node_ip ${POD_IP} \
                --selected_gpus 0,1,2,3,4,5,6,7 \
                --split_log_path $log_dir \
                --log_prefix $log_prefix \
                --nproc_per_node 8"
lanch_start=" -u ./src/launch.py ${distributed_args}"

python $lanch_start ./src/run_retrieval.py \
    --use_cuda "True" \
    --is_distributed ${is_distributed:-"True"} \
    --weight_sharing ${weight_sharing:-"True"} \
    --use_fuse ${use_fuse:-"True"} \
    --use_fast_executor ${e_executor:-"true"} \
    --use_fp16 ${use_fp16:-"false"} \
    --nccl_comm_num ${nccl_comm_num:-1} \
    --use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
    --use_dynamic_loss_scaling ${use_fp16:-"False"} \
    --use_sigmoid ${use_sigmoid:-"False"} \
    --init_loss_scaling ${loss_scaling:-12800} \
    --beta1 ${beta1:-0.9} \
    --beta2 ${beta2:-0.98} \
    --epsilon ${epsilon:-1e-06} \
    --scale_circle ${scale_circle:-1.0} \
    --margin ${margin:-0.2} \
    --verbose true \
    --samples_num ${samples_num:-20} \
    --run_random ${run_random:-"False"} \
    --do_train ${do_train:-"True"} \
    --do_val ${do_val:-"True"} \
    --do_test ${do_test:-"True"} \
    --batch_size ${batch_size:-16} \
    --test_batch_size ${test_batch_size:-96} \
    --init_pretraining_params ${init_model:-""} \
    --train_image_caption ./data/Flickr30k/flickr30k-textids/train.ids \
    --train_image_feature_dir ./data/Flickr30k/flickr30k-features/$bbox/train \
    --dev_image_caption ./data/Flickr30k/flickr30k-textids/val.all.ids \
    --dev_image_feature_dir ./data/Flickr30k/flickr30k-features/$bbox/dev \
    --test_image_caption ./data/Flickr30k/flickr30k-textids/test.all.ids \
    --test_image_feature_dir ./data/Flickr30k/flickr30k-features/$bbox/test \
    --img_id_path ./data/Flickr30k/flickr30k-textids/dataset_flickr30k_name_id.txt \
    --checkpoints ${save_model_dir:-""} \
    --save_checkpoints ${save_checkpoints:-"True"} \
    --save_steps ${save_steps:-1000} \
    --weight_decay ${weight_decay:-"0.1"} \
    --warmup_step ${warmup_step:-"1"} \
    --validation_steps ${validation_steps:-"100"} \
    --epoch $epoch \
    --max_seq_len ${max_len:-512} \
    --max_img_len ${max_img_len:-37} \
    --learning_rate ${lr:-"5e-6"} \
    --learning_rate_scale ${learning_rate_scale:-0.1} \
    --learning_rate_decay_epoch1 ${learning_rate_decay_epoch1:-24} \
    --learning_rate_decay_epoch2 ${learning_rate_decay_epoch2:-32} \
    --lr_scheduler ${lr_scheduler:-"scale_by_epoch_decay"} \
    --skip_steps ${skip_steps:-"50"} \
    --num_iteration_per_drop_scope 10 \
    --unimo_vocab_file ${vocab_file} \
    --encoder_json_file ${bpe_json} \
    --vocab_bpe_file ${bpe_file} \
    --unimo_config_path ${config_path} \
    --eval_mertrics ${eval_mertrics:-"recall@k"} \
    --eval_dir $eval_dir \
    --random_seed ${seed:-1} \
    >> $log_dir/${log_prefix}lanch.log 2>&1

if [[ $? -ne 0 ]]; then
    echo "run failed"
    exit 1
fi
exit 0
ernie-unimo/script/seq2seq/cnndm/model_conf
0 → 100644
output_name="seq2seq"
init_model="./model_files/unimo_base_en"
data_path='./data/cnndm'
## hyper params
lr_scheduler="linear_warmup_decay"
use_fp16="False"
# merge the AllReduce calls of a layer
use_fuse="True"
use_hierarchical_allreduce="True"
loss_scaling=12800
skip_steps=100
save_steps=20000
validation_steps=20000
label_smooth=0.1
weight_decay=0.01
max_seq_len=512
#decoding params
do_decode="true"
max_src_len=512
max_tgt_len=128
max_out_len=128
min_out_len=20
beam_size=5
length_penalty=0.6
block_trigram="True"
use_multi_gpu_test="True"
#adam optimizer
beta1=0.9
beta2=0.98
epsilon=1e-06
#data
tokenized_input="True"
continuous_position="False"
#dataset
train_set="train.tsv"
dev_set="dev.2k.tsv"
test_set="test.tsv"
pred_set="test.tsv"
do_train="true"
do_val="false"
do_test="true"
do_pred="false"
#evaluate
eval_script="bash ./src/eval/tasks/cnndm/eval.sh"
eval_mertrics="rouge-1,rouge-2,rouge-l"
## tuning params
in_tokens="False"
pred_batch_size=10
epoch=20
BATCH_SIZE=("8")
LR_RATE=("4e-5")
DD_RAND_SEED=("1")
WARMUP_PROP=("0.06")
config_path="./model_files/config/unimo_base_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
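Since in_tokens="False", batch_size here counts examples per GPU rather than tokens; under the single-node, 4-GPU launch in run.sh below, the effective batch per optimizer step is presumably per-GPU batch times GPU count:

    # assumed data-parallel scaling for the 4-GPU launch in run.sh
    echo $(( 8 * 4 ))    # 32 examples per step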
ernie-unimo/script/seq2seq/cnndm/run.sh
0 → 100644
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR; pwd`
cd ${MYDIR}/../../../

# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
# check
check_iplist

set -eu
output_dir=../output-cnndm
log_dir=../log-cnndm
mkdir -p $output_dir $log_dir

e_executor=$(echo ${use_experimental_executor-'True'} | tr '[A-Z]' '[a-z]')
use_fuse=$(echo ${use_fuse-'False'} | tr '[A-Z]' '[a-z]')
if [[ ${use_fuse} == "true" ]]; then
    #MB
    export FLAGS_fuse_parameter_memory_size=64
fi

export DEV_PREFIX=`echo ${dev_set:-"dev.tsv"} | sed 's/\.tsv$//'`
export TEST_PREFIX=`echo ${test_set:-"test.tsv"} | sed 's/\.tsv$//'`
export PRED_PREFIX=`echo ${pred_set:-"pred.tsv"} | sed 's/\.tsv$//'`
export EVAL_SCRIPT_LOG=${MYDIR}/../../../${output_dir}/eval.log
export TASK_DATA_PATH=${data_path}

distributed_args="--node_ips ${PADDLE_TRAINERS} \
                --node_id ${PADDLE_TRAINER_ID} \
                --current_node_ip ${POD_IP} \
                --selected_gpus 0,1,2,3 \
                --split_log_path $log_dir \
                --nproc_per_node 4"

for random_seed in "${DD_RAND_SEED[@]}"; do
  echo "random_seed "${random_seed}
  for batch_size in "${BATCH_SIZE[@]}"; do
    echo "batch_size "${batch_size}
    for warmup_proportion in "${WARMUP_PROP[@]}"; do
      echo "warmup_proportion "${warmup_proportion}
      for learning_rate in "${LR_RATE[@]}"; do
        echo "learning rate "${learning_rate}
        python -u ./src/launch.py ${distributed_args} \
            ./src/run_seq2seq.py --use_cuda "True" \
            --is_distributed "True" \
            --use_multi_gpu_test ${use_multi_gpu_test:-"True"} \
            --use_fp16 ${use_fp16:-"False"} \
            --use_dynamic_loss_scaling ${use_fp16} \
            --init_loss_scaling ${loss_scaling:-128} \
            --use_fast_executor ${e_executor:-"True"} \
            --use_fuse ${use_fuse:-"False"} \
            --nccl_comm_num ${nccl_comm_num:-1} \
            --use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
            --do_train ${do_train:-"true"} \
            --do_val ${do_val:-"false"} \
            --do_test ${do_test:-"true"} \
            --do_pred ${do_pred:-"false"} \
            --do_decode ${do_decode:-"True"} \
            --train_set ${data_path}/${train_set:-""} \
            --dev_set ${data_path}/${dev_set:-""} \
            --test_set ${data_path}/${test_set:-""} \
            --pred_set ${data_path}/${pred_set:-""} \
            --epoch ${epoch} \
            --tokenized_input ${tokenized_input:-"True"} \
            --task_type ${task_type:-"normal"} \
            --max_seq_len ${max_seq_len} \
            --max_src_len ${max_src_len} \
            --max_tgt_len ${max_tgt_len} \
            --max_out_len ${max_out_len} \
            --min_out_len ${min_out_len} \
            --block_trigram ${block_trigram:-"True"} \
            --beam_size ${beam_size:-5} \
            --length_penalty ${length_penalty:-0.6} \
            --hidden_dropout_prob ${hidden_dropout_prob:-0.1} \
            --attention_probs_dropout_prob ${attention_probs_dropout_prob:-0.1} \
            --beta1 ${beta1:-0.9} \
            --beta2 ${beta2:-0.98} \
            --epsilon ${epsilon:-1e-06} \
            --continuous_position ${continuous_position:-"false"} \
            --tgt_type_id ${tgt_type_id:-1} \
            --batch_size ${batch_size} \
            --pred_batch_size ${pred_batch_size} \
            --in_tokens ${in_tokens:-"True"} \
            --learning_rate ${learning_rate} \
            --lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
            --warmup_proportion ${warmup_proportion:-0.02} \
            --weight_decay ${weight_decay:-0.01} \
            --weight_sharing ${weight_sharing:-"True"} \
            --label_smooth ${label_smooth:-0.1} \
            --init_pretraining_params ${init_model:-""} \
            --unimo_vocab_file ${vocab_file} \
            --encoder_json_file ${bpe_json} \
            --vocab_bpe_file ${bpe_file} \
            --unimo_config_path ${config_path} \
            --checkpoints $output_dir \
            --save_steps ${save_steps:-10000} \
            --validation_steps ${validation_steps:-10000} \
            --skip_steps ${skip_steps:-10} \
            --save_and_valid_by_epoch ${save_and_valid_by_epoch:-"False"} \
            --eval_script ${eval_script:-""} \
            --eval_mertrics ${eval_mertrics:-"bleu"} \
            --random_seed ${random_seed:-"1"} >> $log_dir/lanch.log 2>&1
      done
    done
  done
done

python ./src/utils/extract_eval_res.py --log_dir=$log_dir
exit 0
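The *_PREFIX exports near the top of the script only strip a trailing .tsv, so the external eval script can locate per-split prediction files by name, e.g.:

    echo "dev.2k.tsv" | sed 's/\.tsv$//'    # -> dev.2k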
ernie-unimo/script/seq2seq/cnndm_large/model_conf
0 → 100644
output_name="seq2seq"
init_model="./model_files/unimo_large_en"
data_path='./data/cnndm'
## hyper params
lr_scheduler="linear_warmup_decay"
use_fp16="False"
# merge the AllReduce calls of a layer
use_fuse="True"
use_hierarchical_allreduce="True"
loss_scaling=12800
skip_steps=100
save_steps=20000
validation_steps=20000
label_smooth=0.1
weight_decay=0.01
max_seq_len=512
#decoding params
do_decode="true"
max_src_len=512
max_tgt_len=128
max_out_len=128
min_out_len=20
beam_size=6
length_penalty=1.2
block_trigram="true"
use_multi_gpu_test="True"
#adam optimizer
beta1=0.9
beta2=0.98
epsilon=1e-06
#data
tokenized_input="True"
continuous_position="False"
#dataset
train_set="train.tsv"
dev_set="dev.2k.tsv"
test_set="test.tsv"
pred_set="test.tsv"
do_train="true"
do_val="false"
do_test="true"
do_pred="false"
#evaluate
eval_script="bash ./src/eval/tasks/cnndm/eval.sh"
eval_mertrics="rouge-1,rouge-2,rouge-l"
## tuning params
in_tokens="False"
pred_batch_size=8
epoch=20
BATCH_SIZE=("8")
LR_RATE=("2e-5")
DD_RAND_SEED=("1")
WARMUP_PROP=("0.06")
## configuration
config_path="./model_files/config/unimo_large_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
\ No newline at end of file
ernie-unimo/script/seq2seq/cnndm_large/run.sh
0 → 100644
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR; pwd`
cd ${MYDIR}/../../../

# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
# check
check_iplist

set -eu
output_dir=../output-cnndm_large
log_dir=../log-cnndm_large
mkdir -p $output_dir $log_dir

e_executor=$(echo ${use_experimental_executor-'True'} | tr '[A-Z]' '[a-z]')
use_fuse=$(echo ${use_fuse-'False'} | tr '[A-Z]' '[a-z]')
if [[ ${use_fuse} == "true" ]]; then
    #MB
    export FLAGS_fuse_parameter_memory_size=64
fi

export DEV_PREFIX=`echo ${dev_set:-"dev.tsv"} | sed 's/\.tsv$//'`
export TEST_PREFIX=`echo ${test_set:-"test.tsv"} | sed 's/\.tsv$//'`
export PRED_PREFIX=`echo ${pred_set:-"pred.tsv"} | sed 's/\.tsv$//'`
export EVAL_SCRIPT_LOG=${MYDIR}/../../../${output_dir}/eval.log
export TASK_DATA_PATH=${data_path}

distributed_args="--node_ips ${PADDLE_TRAINERS} \
                --node_id ${PADDLE_TRAINER_ID} \
                --current_node_ip ${POD_IP} \
                --selected_gpus 0,1,2,3 \
                --split_log_path $log_dir \
                --nproc_per_node 4"

for random_seed in "${DD_RAND_SEED[@]}"; do
  echo "random_seed "${random_seed}
  for batch_size in "${BATCH_SIZE[@]}"; do
    echo "batch_size "${batch_size}
    for warmup_proportion in "${WARMUP_PROP[@]}"; do
      echo "warmup_proportion "${warmup_proportion}
      for learning_rate in "${LR_RATE[@]}"; do
        echo "learning rate "${learning_rate}
        python -u ./src/launch.py ${distributed_args} \
            ./src/run_seq2seq.py --use_cuda "True" \
            --is_distributed "True" \
            --use_multi_gpu_test ${use_multi_gpu_test:-"True"} \
            --use_fp16 ${use_fp16:-"False"} \
            --use_dynamic_loss_scaling ${use_fp16} \
            --init_loss_scaling ${loss_scaling:-128} \
            --use_fast_executor ${e_executor:-"True"} \
            --use_fuse ${use_fuse:-"False"} \
            --nccl_comm_num ${nccl_comm_num:-1} \
            --use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
            --do_train ${do_train:-"true"} \
            --do_val ${do_val:-"false"} \
            --do_test ${do_test:-"true"} \
            --do_pred ${do_pred:-"false"} \
            --do_decode ${do_decode:-"True"} \
            --train_set ${data_path}/${train_set:-""} \
            --dev_set ${data_path}/${dev_set:-""} \
            --test_set ${data_path}/${test_set:-""} \
            --pred_set ${data_path}/${pred_set:-""} \
            --epoch ${epoch} \
            --tokenized_input ${tokenized_input:-"True"} \
            --task_type ${task_type:-"normal"} \
            --max_seq_len ${max_seq_len} \
            --max_src_len ${max_src_len} \
            --max_tgt_len ${max_tgt_len} \
            --max_out_len ${max_out_len} \
            --min_out_len ${min_out_len} \
            --block_trigram ${block_trigram:-"True"} \
            --beam_size ${beam_size:-5} \
            --length_penalty ${length_penalty:-0.6} \
            --hidden_dropout_prob ${hidden_dropout_prob:-0.1} \
            --attention_probs_dropout_prob ${attention_probs_dropout_prob:-0.1} \
            --beta1 ${beta1:-0.9} \
            --beta2 ${beta2:-0.98} \
            --epsilon ${epsilon:-1e-06} \
            --continuous_position ${continuous_position:-"false"} \
            --tgt_type_id ${tgt_type_id:-1} \
            --batch_size ${batch_size} \
            --pred_batch_size ${pred_batch_size} \
            --in_tokens ${in_tokens:-"True"} \
            --learning_rate ${learning_rate} \
            --lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
            --warmup_proportion ${warmup_proportion:-0.02} \
            --weight_decay ${weight_decay:-0.01} \
            --weight_sharing ${weight_sharing:-"True"} \
            --label_smooth ${label_smooth:-0.1} \
            --init_pretraining_params ${init_model:-""} \
            --unimo_vocab_file ${vocab_file} \
            --encoder_json_file ${bpe_json} \
            --vocab_bpe_file ${bpe_file} \
            --unimo_config_path ${config_path} \
            --checkpoints $output_dir \
            --save_steps ${save_steps:-10000} \
            --validation_steps ${validation_steps:-10000} \
            --skip_steps ${skip_steps:-10} \
            --save_and_valid_by_epoch ${save_and_valid_by_epoch:-"False"} \
            --eval_script ${eval_script:-""} \
            --eval_mertrics ${eval_mertrics:-"bleu"} \
            --random_seed ${random_seed:-"1"} >> $log_dir/lanch.log 2>&1
      done
    done
  done
done

python ./src/utils/extract_eval_res.py --log_dir=$log_dir
exit 0
ernie-unimo/script/seq2seq/coqa_large/model_conf
0 → 100644
output_name="seq2seq"
init_model="./model_files/unimo_large_en"
data_path='./data/coqa'
# hyper param
lr_scheduler="linear_warmup_decay"
use_fp16="False"
# merge the AllReduce calls of a layer
use_fuse="True"
use_hierarchical_allreduce="True"
loss_scaling=12800
skip_steps=100
save_steps=10000
validation_steps=10000
label_smooth=0.1
weight_decay=0.01
max_seq_len=512
random_seed=666
#for multi-turn dialog/qa
task_type="dialog"
role_type_size=3
turn_type_size=16
#decoding params
do_decode="true"
max_src_len=480
max_tgt_len=32
max_out_len=30
min_out_len=0
beam_size=3
length_penalty=0.0
block_trigram="False"
use_multi_gpu_test="True"
#adam optimizer
beta1=0.9
beta2=0.98
epsilon=1e-06
#data
tokenized_input="True"
continuous_position="False"
#dataset
train_set="train.tsv"
dev_set="dev.tsv"
test_set="dev.tsv"
do_train="true"
do_val="true"
do_test="false"
do_pred="false"
#evaluate
eval_script="bash ./src/eval/tasks/coqa/eval.sh"
eval_mertrics="f1"
## tuning params
in_tokens="False"
pred_batch_size=8
epoch=20
BATCH_SIZE=("8")
LR_RATE=("8e-6")
DD_RAND_SEED=("1")
WARMUP_PROP=("0.06")
config_path="./model_files/config/unimo_large_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
\ No newline at end of file
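Note how the dialog lengths are budgeted: source and target together exactly fill the sequence window, assuming the two limits are additive:

    # max_src_len + max_tgt_len = max_seq_len
    echo $(( 480 + 32 ))    # 512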
ernie-unimo/script/seq2seq/coqa_large/run.sh
0 → 100644
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR; pwd`
cd ${MYDIR}/../../../

# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
# check
check_iplist

set -eu
output_dir=../output-coqa_large
log_dir=../log-coqa_large
mkdir -p $output_dir $log_dir

e_executor=$(echo ${use_experimental_executor-'True'} | tr '[A-Z]' '[a-z]')
use_fuse=$(echo ${use_fuse-'False'} | tr '[A-Z]' '[a-z]')
if [[ ${use_fuse} == "true" ]]; then
    #MB
    export FLAGS_fuse_parameter_memory_size=64
fi

export DEV_PREFIX=`echo ${dev_set:-"dev.tsv"} | sed 's/\.tsv$//'`
export TEST_PREFIX=`echo ${test_set:-"test.tsv"} | sed 's/\.tsv$//'`
export PRED_PREFIX=`echo ${pred_set:-"pred.tsv"} | sed 's/\.tsv$//'`
export EVAL_SCRIPT_LOG=${MYDIR}/../../../${output_dir}/eval.log
export TASK_DATA_PATH=${data_path}

distributed_args="--node_ips ${PADDLE_TRAINERS} \
                --node_id ${PADDLE_TRAINER_ID} \
                --current_node_ip ${POD_IP} \
                --selected_gpus 4,5,6,7 \
                --split_log_path $log_dir \
                --nproc_per_node 4"

for random_seed in "${DD_RAND_SEED[@]}"; do
  echo "random_seed "${random_seed}
  for batch_size in "${BATCH_SIZE[@]}"; do
    echo "batch_size "${batch_size}
    for warmup_proportion in "${WARMUP_PROP[@]}"; do
      echo "warmup_proportion "${warmup_proportion}
      for learning_rate in "${LR_RATE[@]}"; do
        echo "learning rate "${learning_rate}
        python -u ./src/launch.py ${distributed_args} \
            ./src/run_seq2seq.py --use_cuda "True" \
            --is_distributed "True" \
            --use_multi_gpu_test ${use_multi_gpu_test:-"True"} \
            --use_fp16 ${use_fp16:-"False"} \
            --use_dynamic_loss_scaling ${use_fp16} \
            --init_loss_scaling ${loss_scaling:-128} \
            --use_fast_executor ${e_executor:-"True"} \
            --use_fuse ${use_fuse:-"False"} \
            --nccl_comm_num ${nccl_comm_num:-1} \
            --use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
            --do_train ${do_train:-"true"} \
            --do_val ${do_val:-"false"} \
            --do_test ${do_test:-"true"} \
            --do_pred ${do_pred:-"false"} \
            --do_decode ${do_decode:-"True"} \
            --train_set ${data_path}/${train_set:-""} \
            --dev_set ${data_path}/${dev_set:-""} \
            --test_set ${data_path}/${test_set:-""} \
            --pred_set ${data_path}/${pred_set:-""} \
            --epoch ${epoch} \
            --tokenized_input ${tokenized_input:-"True"} \
            --task_type ${task_type:-"dialog"} \
            --role_type_size ${role_type_size:-3} \
            --turn_type_size ${turn_type_size:-16} \
            --max_seq_len ${max_seq_len} \
            --max_src_len ${max_src_len} \
            --max_tgt_len ${max_tgt_len} \
            --max_out_len ${max_out_len} \
            --min_out_len ${min_out_len} \
            --block_trigram ${block_trigram:-"True"} \
            --beam_size ${beam_size:-5} \
            --length_penalty ${length_penalty:-0.6} \
            --hidden_dropout_prob ${hidden_dropout_prob:-0.1} \
            --attention_probs_dropout_prob ${attention_probs_dropout_prob:-0.1} \
            --beta1 ${beta1:-0.9} \
            --beta2 ${beta2:-0.98} \
            --epsilon ${epsilon:-1e-06} \
            --continuous_position ${continuous_position:-"false"} \
            --tgt_type_id ${tgt_type_id:-1} \
            --batch_size ${batch_size} \
            --pred_batch_size ${pred_batch_size} \
            --in_tokens ${in_tokens:-"True"} \
            --learning_rate ${learning_rate} \
            --lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
            --warmup_proportion ${warmup_proportion:-0.02} \
            --weight_decay ${weight_decay:-0.01} \
            --weight_sharing ${weight_sharing:-"True"} \
            --label_smooth ${label_smooth:-0.1} \
            --init_pretraining_params ${init_model:-""} \
            --unimo_vocab_file ${vocab_file} \
            --encoder_json_file ${bpe_json} \
            --vocab_bpe_file ${bpe_file} \
            --unimo_config_path ${config_path} \
            --checkpoints $output_dir \
            --save_steps ${save_steps:-10000} \
            --validation_steps ${validation_steps:-10000} \
            --skip_steps ${skip_steps:-10} \
            --save_and_valid_by_epoch ${save_and_valid_by_epoch:-"False"} \
            --eval_script ${eval_script:-""} \
            --eval_mertrics ${eval_mertrics:-"bleu_1"} \
            --random_seed ${random_seed:-"666"} >> $log_dir/lanch.log 2>&1
      done
    done
  done
done

python ./src/utils/extract_eval_res.py --log_dir=$log_dir
exit 0
ernie-unimo/script/seq2seq/gigaword/model_conf
0 → 100644
output_name="seq2seq"
init_model="./model_files/unimo_base_en"
data_path='./data/gigaword'
# hyper params
lr_scheduler="linear_warmup_decay"
use_fp16="False"
# merge the AllReduce calls of a layer
use_fuse="True"
use_hierarchical_allreduce="True"
loss_scaling=12800
skip_steps=100
save_steps=20000
validation_steps=20000
label_smooth=0.1
weight_decay=0.01
max_seq_len=512
#decoding params
do_decode="true"
max_src_len=128
max_tgt_len=32
max_out_len=32
min_out_len=5
beam_size=5
length_penalty=0.6
block_trigram="False"
use_multi_gpu_test="True"
#adam optimizer
beta1=0.9
beta2=0.98
epsilon=1e-06
#data
tokenized_input="True"
continuous_position="False"
#dataset
train_set="train.tsv"
dev_set="dev.20k.tsv"
pred_set="test.tsv"
test_set="test.tsv"
do_train="true"
do_val="false"
do_test="true"
do_pred="false"
#evaluate
eval_script="bash ./src/eval/tasks/gigaword/eval.sh"
eval_mertrics="rouge-1,rouge-2,rouge-l"
## tuning params
in_tokens="False"
pred_batch_size=32
epoch=10
BATCH_SIZE=("32")
LR_RATE=("3e-5")
DD_RAND_SEED=("1")
WARMUP_PROP=("0.06")
config_path="./model_files/config/unimo_base_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
\ No newline at end of file
ernie-unimo/script/seq2seq/gigaword/run.sh
0 → 100644
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR; pwd`
cd ${MYDIR}/../../../

# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
# check
check_iplist

set -eu
output_dir=../output-gigaword
log_dir=../log-gigaword
mkdir -p $output_dir $log_dir

e_executor=$(echo ${use_experimental_executor-'True'} | tr '[A-Z]' '[a-z]')
use_fuse=$(echo ${use_fuse-'False'} | tr '[A-Z]' '[a-z]')
if [[ ${use_fuse} == "true" ]]; then
    #MB
    export FLAGS_fuse_parameter_memory_size=64
fi

export DEV_PREFIX=`echo ${dev_set:-"dev.tsv"} | sed 's/\.tsv$//'`
export TEST_PREFIX=`echo ${test_set:-"test.tsv"} | sed 's/\.tsv$//'`
export PRED_PREFIX=`echo ${pred_set:-"pred.tsv"} | sed 's/\.tsv$//'`
export EVAL_SCRIPT_LOG=${MYDIR}/../../../${output_dir}/eval.log
export TASK_DATA_PATH=${data_path}

distributed_args="--node_ips ${PADDLE_TRAINERS} \
                --node_id ${PADDLE_TRAINER_ID} \
                --current_node_ip ${POD_IP} \
                --selected_gpus 0,1,2,3 \
                --split_log_path $log_dir \
                --nproc_per_node 4"

for random_seed in "${DD_RAND_SEED[@]}"; do
  echo "random_seed "${random_seed}
  for batch_size in "${BATCH_SIZE[@]}"; do
    echo "batch_size "${batch_size}
    for warmup_proportion in "${WARMUP_PROP[@]}"; do
      echo "warmup_proportion "${warmup_proportion}
      for learning_rate in "${LR_RATE[@]}"; do
        echo "learning rate "${learning_rate}
        python -u ./src/launch.py ${distributed_args} \
            ./src/run_seq2seq.py --use_cuda "True" \
            --is_distributed True \
            --use_multi_gpu_test ${use_multi_gpu_test:-"True"} \
            --use_fp16 ${use_fp16:-"False"} \
            --use_dynamic_loss_scaling ${use_fp16} \
            --init_loss_scaling ${loss_scaling:-128} \
            --use_fast_executor ${e_executor:-"True"} \
            --use_fuse ${use_fuse:-"False"} \
            --nccl_comm_num ${nccl_comm_num:-1} \
            --use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
            --do_train ${do_train:-"true"} \
            --do_val ${do_val:-"false"} \
            --do_test ${do_test:-"true"} \
            --do_pred ${do_pred:-"false"} \
            --do_decode ${do_decode:-"True"} \
            --train_set ${data_path}/${train_set:-""} \
            --dev_set ${data_path}/${dev_set:-""} \
            --test_set ${data_path}/${test_set:-""} \
            --pred_set ${data_path}/${pred_set:-""} \
            --epoch ${epoch} \
            --tokenized_input ${tokenized_input:-"True"} \
            --task_type ${task_type:-"normal"} \
            --max_seq_len ${max_seq_len} \
            --max_src_len ${max_src_len} \
            --max_tgt_len ${max_tgt_len} \
            --max_out_len ${max_out_len} \
            --min_out_len ${min_out_len} \
            --block_trigram ${block_trigram:-"True"} \
            --beam_size ${beam_size:-5} \
            --length_penalty ${length_penalty:-0.6} \
            --hidden_dropout_prob ${hidden_dropout_prob:-0.1} \
            --attention_probs_dropout_prob ${attention_probs_dropout_prob:-0.1} \
            --beta1 ${beta1:-0.9} \
            --beta2 ${beta2:-0.98} \
            --epsilon ${epsilon:-1e-06} \
            --continuous_position ${continuous_position:-"false"} \
            --tgt_type_id ${tgt_type_id:-1} \
            --in_tokens ${in_tokens:-"True"} \
            --batch_size ${batch_size} \
            --pred_batch_size ${pred_batch_size} \
            --learning_rate ${learning_rate} \
            --lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
            --warmup_proportion ${warmup_proportion:-0.02} \
            --weight_decay ${weight_decay:-0.01} \
            --weight_sharing ${weight_sharing:-"True"} \
            --label_smooth ${label_smooth:-0.1} \
            --init_pretraining_params ${init_model:-""} \
            --unimo_vocab_file ${vocab_file} \
            --encoder_json_file ${bpe_json} \
            --vocab_bpe_file ${bpe_file} \
            --unimo_config_path ${config_path} \
            --checkpoints $output_dir \
            --save_steps ${save_steps:-10000} \
            --validation_steps ${validation_steps:-10000} \
            --skip_steps ${skip_steps:-10} \
            --save_and_valid_by_epoch ${save_and_valid_by_epoch:-"False"} \
            --eval_script ${eval_script:-""} \
            --eval_mertrics ${eval_mertrics:-"bleu"} \
            --random_seed ${random_seed:-"1"} >> $log_dir/lanch.log 2>&1
      done
    done
  done
done

python ./src/utils/extract_eval_res.py --log_dir=$log_dir
exit 0
ernie-unimo/script/seq2seq/gigaword_large/model_conf
0 → 100644
output_name="seq2seq"
init_model="./model_files/unimo_large_en"
data_path='./data/gigaword'
# hyper params
lr_scheduler="linear_warmup_decay"
use_fp16="False"
# merge the AllReduce calls of a layer
use_fuse="True"
use_hierarchical_allreduce="True"
loss_scaling=12800
skip_steps=100
save_steps=20000
validation_steps=20000
label_smooth=0.1
weight_decay=0.01
max_seq_len=512
#decoding params
do_decode="true"
max_src_len=128
max_tgt_len=32
max_out_len=32
min_out_len=5
beam_size=6
length_penalty=1.2
block_trigram="false"
use_multi_gpu_test="True"
#adam optimizer
beta1=0.9
beta2=0.98
epsilon=1e-06
#data
tokenized_input="True"
continuous_position="False"
#dataset
train_set="train.tsv"
dev_set="dev.20k.tsv"
pred_set="test.tsv"
test_set="test.tsv"
do_train="true"
do_val="false"
do_test="true"
do_pred="false"
#evaluate
eval_script="bash ./src/eval/tasks/gigaword/eval.sh"
eval_mertrics="rouge-1,rouge-2,rouge-l"
## tuning params
in_tokens="False"
pred_batch_size=32
epoch=10
BATCH_SIZE=("32")
LR_RATE=("3e-5")
DD_RAND_SEED=("1")
WARMUP_PROP=("0.06")
config_path="./model_files/config/unimo_large_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
\ No newline at end of file
ernie-unimo/script/seq2seq/gigaword_large/run.sh
0 → 100644
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR; pwd`
cd ${MYDIR}/../../../

# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
# check
check_iplist

set -eu
output_dir=../output-gigaword_large
log_dir=../log-gigaword_large
mkdir -p $output_dir $log_dir

e_executor=$(echo ${use_experimental_executor-'True'} | tr '[A-Z]' '[a-z]')
use_fuse=$(echo ${use_fuse-'False'} | tr '[A-Z]' '[a-z]')
if [[ ${use_fuse} == "true" ]]; then
    #MB
    export FLAGS_fuse_parameter_memory_size=64
fi

export DEV_PREFIX=`echo ${dev_set:-"dev.tsv"} | sed 's/\.tsv$//'`
export TEST_PREFIX=`echo ${test_set:-"test.tsv"} | sed 's/\.tsv$//'`
export PRED_PREFIX=`echo ${pred_set:-"pred.tsv"} | sed 's/\.tsv$//'`
export EVAL_SCRIPT_LOG=${MYDIR}/../../../${output_dir}/eval.log
export TASK_DATA_PATH=${data_path}

distributed_args="--node_ips ${PADDLE_TRAINERS} \
                --node_id ${PADDLE_TRAINER_ID} \
                --current_node_ip ${POD_IP} \
                --selected_gpus 0,1,2,3 \
                --split_log_path $log_dir \
                --nproc_per_node 4"

for random_seed in "${DD_RAND_SEED[@]}"; do
  echo "random_seed "${random_seed}
  for batch_size in "${BATCH_SIZE[@]}"; do
    echo "batch_size "${batch_size}
    for warmup_proportion in "${WARMUP_PROP[@]}"; do
      echo "warmup_proportion "${warmup_proportion}
      for learning_rate in "${LR_RATE[@]}"; do
        echo "learning rate "${learning_rate}
        python -u ./src/launch.py ${distributed_args} \
            ./src/run_seq2seq.py --use_cuda "True" \
            --is_distributed True \
            --use_multi_gpu_test ${use_multi_gpu_test:-"True"} \
            --use_fp16 ${use_fp16:-"False"} \
            --use_dynamic_loss_scaling ${use_fp16} \
            --init_loss_scaling ${loss_scaling:-128} \
            --use_fast_executor ${e_executor:-"True"} \
            --use_fuse ${use_fuse:-"False"} \
            --nccl_comm_num ${nccl_comm_num:-1} \
            --use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
            --do_train ${do_train:-"true"} \
            --do_val ${do_val:-"false"} \
            --do_test ${do_test:-"true"} \
            --do_pred ${do_pred:-"false"} \
            --do_decode ${do_decode:-"True"} \
            --train_set ${data_path}/${train_set:-""} \
            --dev_set ${data_path}/${dev_set:-""} \
            --test_set ${data_path}/${test_set:-""} \
            --pred_set ${data_path}/${pred_set:-""} \
            --epoch ${epoch} \
            --tokenized_input ${tokenized_input:-"True"} \
            --task_type ${task_type:-"normal"} \
            --max_seq_len ${max_seq_len} \
            --max_src_len ${max_src_len} \
            --max_tgt_len ${max_tgt_len} \
            --max_out_len ${max_out_len} \
            --min_out_len ${min_out_len} \
            --block_trigram ${block_trigram:-"True"} \
            --beam_size ${beam_size:-5} \
            --length_penalty ${length_penalty:-0.6} \
            --hidden_dropout_prob ${hidden_dropout_prob:-0.1} \
            --attention_probs_dropout_prob ${attention_probs_dropout_prob:-0.1} \
            --beta1 ${beta1:-0.9} \
            --beta2 ${beta2:-0.98} \
            --epsilon ${epsilon:-1e-06} \
            --continuous_position ${continuous_position:-"false"} \
            --tgt_type_id ${tgt_type_id:-1} \
            --in_tokens ${in_tokens:-"True"} \
            --batch_size ${batch_size} \
            --pred_batch_size ${pred_batch_size} \
            --learning_rate ${learning_rate} \
            --lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
            --warmup_proportion ${warmup_proportion:-0.02} \
            --weight_decay ${weight_decay:-0.01} \
            --weight_sharing ${weight_sharing:-"True"} \
            --label_smooth ${label_smooth:-0.1} \
            --init_pretraining_params ${init_model:-""} \
            --unimo_vocab_file ${vocab_file} \
            --encoder_json_file ${bpe_json} \
            --vocab_bpe_file ${bpe_file} \
            --unimo_config_path ${config_path} \
            --checkpoints $output_dir \
            --save_steps ${save_steps:-10000} \
            --validation_steps ${validation_steps:-10000} \
            --skip_steps ${skip_steps:-10} \
            --save_and_valid_by_epoch ${save_and_valid_by_epoch:-"False"} \
            --eval_script ${eval_script:-""} \
            --eval_mertrics ${eval_mertrics:-"bleu"} \
            --random_seed ${random_seed:-"1"} >> $log_dir/lanch.log 2>&1
      done
    done
  done
done

python ./src/utils/extract_eval_res.py --log_dir=$log_dir
exit 0
ernie-unimo/script/seq2seq/squad_qg/model_conf
0 → 100644
output_name="seq2seq"
init_model="./model_files/unimo_base_en"
data_path='./data/squad_qg'
# hyper param
lr_scheduler="linear_warmup_decay"
use_fp16="False"
# merge the AllReduce calls of a layer
use_fuse="True"
use_hierarchical_allreduce="True"
loss_scaling=12800
skip_steps=100
save_steps=5000
validation_steps=5000
label_smooth=0.1
weight_decay=0.01
max_seq_len=512
random_seed=666
#decoding params
do_decode="true"
max_src_len=416
max_tgt_len=96
max_out_len=48
min_out_len=5
beam_size=5
length_penalty=1.0
block_trigram="false"
use_multi_gpu_test="True"
#adam optimizer
beta1=0.9
beta2=0.98
epsilon=1e-06
#data
tokenized_input="True"
continuous_position="False"
#dataset
train_set="train.tsv"
dev_set="dev.tsv"
test_set="test.tsv"
pred_set="test.tsv"
do_train="true"
do_val="false"
do_test="true"
do_pred="false"
#evaluate
eval_script="bash ./src/eval/tasks/squad_qg/eval.sh"
eval_mertrics="Bleu_4,METEOR,ROUGE_L"
## tuning params
in_tokens="False"
pred_batch_size=8
epoch=20
BATCH_SIZE=("8")
LR_RATE=("1.25e-5")
DD_RAND_SEED=("1")
WARMUP_PROP=("0.06")
config_path="./model_files/config/unimo_base_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
\ No newline at end of file
ernie-unimo/script/seq2seq/squad_qg/run.sh
0 → 100644
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR; pwd`
cd ${MYDIR}/../../../

# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
# check
check_iplist

set -eu
output_dir=../output-squad_qg
log_dir=../log-squad_qg
mkdir -p $output_dir $log_dir

e_executor=$(echo ${use_experimental_executor-'True'} | tr '[A-Z]' '[a-z]')
use_fuse=$(echo ${use_fuse-'False'} | tr '[A-Z]' '[a-z]')
if [[ ${use_fuse} == "true" ]]; then
    #MB
    export FLAGS_fuse_parameter_memory_size=64
fi

export DEV_PREFIX=`echo ${dev_set:-"dev.tsv"} | sed 's/\.tsv$//'`
export TEST_PREFIX=`echo ${test_set:-"test.tsv"} | sed 's/\.tsv$//'`
export PRED_PREFIX=`echo ${pred_set:-"pred.tsv"} | sed 's/\.tsv$//'`
export EVAL_SCRIPT_LOG=${MYDIR}/../../../${output_dir}/eval.log
export TASK_DATA_PATH=${data_path}

distributed_args="--node_ips ${PADDLE_TRAINERS} \
                --node_id ${PADDLE_TRAINER_ID} \
                --current_node_ip ${POD_IP} \
                --selected_gpus 0,1,2,3 \
                --split_log_path $log_dir \
                --nproc_per_node 4"

for random_seed in "${DD_RAND_SEED[@]}"; do
  echo "random_seed "${random_seed}
  for batch_size in "${BATCH_SIZE[@]}"; do
    echo "batch_size "${batch_size}
    for warmup_proportion in "${WARMUP_PROP[@]}"; do
      echo "warmup_proportion "${warmup_proportion}
      for learning_rate in "${LR_RATE[@]}"; do
        echo "learning rate "${learning_rate}
        python -u ./src/launch.py ${distributed_args} \
            ./src/run_seq2seq.py --use_cuda "True" \
            --is_distributed "True" \
            --use_multi_gpu_test ${use_multi_gpu_test:-"True"} \
            --use_fp16 ${use_fp16:-"False"} \
            --use_dynamic_loss_scaling ${use_fp16} \
            --init_loss_scaling ${loss_scaling:-128} \
            --use_fast_executor ${e_executor:-"True"} \
            --use_fuse ${use_fuse:-"False"} \
            --nccl_comm_num ${nccl_comm_num:-1} \
            --use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
            --do_train ${do_train:-"true"} \
            --do_val ${do_val:-"false"} \
            --do_test ${do_test:-"true"} \
            --do_pred ${do_pred:-"false"} \
            --do_decode ${do_decode:-"True"} \
            --train_set ${data_path}/${train_set:-""} \
            --dev_set ${data_path}/${dev_set:-""} \
            --test_set ${data_path}/${test_set:-""} \
            --pred_set ${data_path}/${pred_set:-""} \
            --epoch ${epoch} \
            --tokenized_input ${tokenized_input:-"True"} \
            --task_type ${task_type:-"normal"} \
            --max_seq_len ${max_seq_len} \
            --max_src_len ${max_src_len} \
            --max_tgt_len ${max_tgt_len} \
            --max_out_len ${max_out_len} \
            --min_out_len ${min_out_len} \
            --block_trigram ${block_trigram:-"True"} \
            --beam_size ${beam_size:-5} \
            --length_penalty ${length_penalty:-0.6} \
            --hidden_dropout_prob ${hidden_dropout_prob:-0.1} \
            --attention_probs_dropout_prob ${attention_probs_dropout_prob:-0.1} \
            --beta1 ${beta1:-0.9} \
            --beta2 ${beta2:-0.98} \
            --epsilon ${epsilon:-1e-06} \
            --continuous_position ${continuous_position:-"false"} \
            --tgt_type_id ${tgt_type_id:-1} \
            --batch_size ${batch_size} \
            --pred_batch_size ${pred_batch_size} \
            --in_tokens ${in_tokens:-"True"} \
            --learning_rate ${learning_rate} \
            --lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
            --warmup_proportion ${warmup_proportion:-0.02} \
            --weight_decay ${weight_decay:-0.01} \
            --weight_sharing ${weight_sharing:-"True"} \
            --label_smooth ${label_smooth:-0.1} \
            --init_pretraining_params ${init_model:-""} \
            --unimo_vocab_file ${vocab_file} \
            --encoder_json_file ${bpe_json} \
            --vocab_bpe_file ${bpe_file} \
            --unimo_config_path ${config_path} \
            --checkpoints $output_dir \
            --save_steps ${save_steps:-10000} \
            --validation_steps ${validation_steps:-10000} \
            --skip_steps ${skip_steps:-10} \
            --save_and_valid_by_epoch ${save_and_valid_by_epoch:-"False"} \
            --eval_script ${eval_script:-""} \
            --eval_mertrics ${eval_mertrics:-"bleu"} \
            --random_seed ${random_seed:-"1"} >> $log_dir/lanch.log 2>&1
      done
    done
  done
done

python ./src/utils/extract_eval_res.py --log_dir=$log_dir
exit 0
ernie-unimo/script/seq2seq/squad_qg_large/model_conf
0 → 100644
output_name="seq2seq"
init_model="./model_files/unimo_large_en"
data_path='./data/squad_qg'
# hyper param
lr_scheduler="linear_warmup_decay"
use_fp16="False"
# merge the AllReduce calls of a layer
use_fuse="True"
use_hierarchical_allreduce="True"
loss_scaling=12800
skip_steps=100
save_steps=5000
validation_steps=5000
label_smooth=0.1
weight_decay=0.01
max_seq_len=512
random_seed=666
#decoding params
do_decode="true"
max_src_len=416
max_tgt_len=96
max_out_len=48
min_out_len=5
beam_size=6
length_penalty=1.2
block_trigram="false"
use_multi_gpu_test="True"
#adam optimizer
beta1=0.9
beta2=0.98
epsilon=1e-06
#data
tokenized_input="True"
continuous_position="False"
#dataset
train_set="train.tsv"
dev_set="dev.tsv"
test_set="test.tsv"
pred_set="test.tsv"
do_train="true"
do_val="false"
do_test="true"
do_pred="false"
#evaluate
eval_script="bash ./src/eval/tasks/squad_qg/eval.sh"
eval_mertrics="Bleu_4,METEOR,ROUGE_L"
## tuning params
in_tokens="False"
pred_batch_size=8
epoch=20
BATCH_SIZE=("8")
LR_RATE=("5e-6")
DD_RAND_SEED=("1")
WARMUP_PROP=("0.06")
config_path="./model_files/config/unimo_large_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
\ No newline at end of file
ernie-unimo/script/seq2seq/squad_qg_large/run.sh
0 → 100644
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR; pwd`
cd ${MYDIR}/../../../

# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
# check
check_iplist

set -eu
output_dir=../output-squad_qg_large
log_dir=../log-squad_qg_large
mkdir -p $output_dir $log_dir

e_executor=$(echo ${use_experimental_executor-'True'} | tr '[A-Z]' '[a-z]')
use_fuse=$(echo ${use_fuse-'False'} | tr '[A-Z]' '[a-z]')
if [[ ${use_fuse} == "true" ]]; then
    #MB
    export FLAGS_fuse_parameter_memory_size=64
fi

export DEV_PREFIX=`echo ${dev_set:-"dev.tsv"} | sed 's/\.tsv$//'`
export TEST_PREFIX=`echo ${test_set:-"test.tsv"} | sed 's/\.tsv$//'`
export PRED_PREFIX=`echo ${pred_set:-"pred.tsv"} | sed 's/\.tsv$//'`
export EVAL_SCRIPT_LOG=${MYDIR}/../../../${output_dir}/eval.log
export TASK_DATA_PATH=${data_path}

distributed_args="--node_ips ${PADDLE_TRAINERS} \
                --node_id ${PADDLE_TRAINER_ID} \
                --current_node_ip ${POD_IP} \
                --selected_gpus 0,1,2,3 \
                --split_log_path $log_dir \
                --nproc_per_node 4"

for random_seed in "${DD_RAND_SEED[@]}"; do
  echo "random_seed "${random_seed}
  for batch_size in "${BATCH_SIZE[@]}"; do
    echo "batch_size "${batch_size}
    for warmup_proportion in "${WARMUP_PROP[@]}"; do
      echo "warmup_proportion "${warmup_proportion}
      for learning_rate in "${LR_RATE[@]}"; do
        echo "learning rate "${learning_rate}
        python -u ./src/launch.py ${distributed_args} \
            ./src/run_seq2seq.py --use_cuda "True" \
            --is_distributed "True" \
            --use_multi_gpu_test ${use_multi_gpu_test:-"True"} \
            --use_fp16 ${use_fp16:-"False"} \
            --use_dynamic_loss_scaling ${use_fp16} \
            --init_loss_scaling ${loss_scaling:-128} \
            --use_fast_executor ${e_executor:-"True"} \
            --use_fuse ${use_fuse:-"False"} \
            --nccl_comm_num ${nccl_comm_num:-1} \
            --use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
            --do_train ${do_train:-"true"} \
            --do_val ${do_val:-"false"} \
            --do_test ${do_test:-"true"} \
            --do_pred ${do_pred:-"false"} \
            --do_decode ${do_decode:-"True"} \
            --train_set ${data_path}/${train_set:-""} \
            --dev_set ${data_path}/${dev_set:-""} \
            --test_set ${data_path}/${test_set:-""} \
            --pred_set ${data_path}/${pred_set:-""} \
            --epoch ${epoch} \
            --tokenized_input ${tokenized_input:-"True"} \
            --task_type ${task_type:-"normal"} \
            --max_seq_len ${max_seq_len} \
            --max_src_len ${max_src_len} \
            --max_tgt_len ${max_tgt_len} \
            --max_out_len ${max_out_len} \
            --min_out_len ${min_out_len} \
            --block_trigram ${block_trigram:-"True"} \
            --beam_size ${beam_size:-5} \
            --length_penalty ${length_penalty:-0.6} \
            --hidden_dropout_prob ${hidden_dropout_prob:-0.1} \
            --attention_probs_dropout_prob ${attention_probs_dropout_prob:-0.1} \
            --beta1 ${beta1:-0.9} \
            --beta2 ${beta2:-0.98} \
            --epsilon ${epsilon:-1e-06} \
            --continuous_position ${continuous_position:-"false"} \
            --tgt_type_id ${tgt_type_id:-1} \
            --batch_size ${batch_size} \
            --pred_batch_size ${pred_batch_size} \
            --in_tokens ${in_tokens:-"True"} \
            --learning_rate ${learning_rate} \
            --lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
            --warmup_proportion ${warmup_proportion:-0.02} \
            --weight_decay ${weight_decay:-0.01} \
            --weight_sharing ${weight_sharing:-"True"} \
            --label_smooth ${label_smooth:-0.1} \
            --init_pretraining_params ${init_model:-""} \
            --unimo_vocab_file ${vocab_file} \
            --encoder_json_file ${bpe_json} \
            --vocab_bpe_file ${bpe_file} \
            --unimo_config_path ${config_path} \
            --checkpoints $output_dir \
            --save_steps ${save_steps:-10000} \
            --validation_steps ${validation_steps:-10000} \
            --skip_steps ${skip_steps:-10} \
            --save_and_valid_by_epoch ${save_and_valid_by_epoch:-"False"} \
            --eval_script ${eval_script:-""} \
            --eval_mertrics ${eval_mertrics:-"bleu"} \
            --random_seed ${random_seed:-"1"} >> $log_dir/lanch.log 2>&1
      done
    done
  done
done

python ./src/utils/extract_eval_res.py --log_dir=$log_dir
exit 0
ernie-unimo/script/visual_entailment/SNLI-VE/model_conf
0 → 100644
output_name="visual_entailment"
task=SNLI-VE
bbox="bbox100"
weight_decay=0
max_len=512
warmup_ratio=0.06
eval_mertrics=simple_accuracy
do_train="True"
do_val="True"
do_test="True"
do_test_hard="False"
test_batch_size=24
save_checkpoints="False"
save_steps=2000
validation_steps=1000
EPOCH=("10")
BATCH_SIZE=("12")
LR_RATE=("1e-5")
DD_RAND_SEED=("1")
init_model="./model_files/unimo_base_en"
config_path="./model_files/config/unimo_base_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
ernie-unimo/script/visual_entailment/SNLI-VE/run.sh
0 → 100644
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR; pwd`
cd ${MYDIR}/../../../

source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh

check_iplist
export FLAGS_fuse_parameter_memory_size=64

output_dir=./output/${task}
log_dir=${output_dir}/log
eval_dir=${output_dir}/tmp
save_model_base_dir=$output_dir/save_model
mkdir -p $output_dir $log_dir $eval_dir $save_model_base_dir

for seed in "${DD_RAND_SEED[@]}"; do
    echo "seed "$seed
    for epoch in "${EPOCH[@]}"; do
        echo "epoch "$epoch
        for lr in "${LR_RATE[@]}"; do
            echo "learning rate "$lr
            for bs in "${BATCH_SIZE[@]}"; do
                echo "batch_size "$bs

                log_prefix=$seed"_"$epoch"_"$lr"_"$bs"."
                eval_dir="${output_dir}/tmp/params.${seed}.${epoch}.${lr}.${bs}"
                mkdir -p $eval_dir

                if [[ ${bs} == "32" ]]; then
                    validation_steps=2000
                fi

                if [[ ${save_checkpoints} == "True" ]]; then
                    save_model_dir="${save_model_base_dir}/params.${seed}.${epoch}.${lr}.${bs}"
                    mkdir -p $save_model_dir
                fi

                distributed_args="--node_ips ${PADDLE_TRAINERS} \
                    --node_id ${PADDLE_TRAINER_ID} \
                    --current_node_ip ${POD_IP} \
                    --selected_gpus 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 \
                    --split_log_path $log_dir \
                    --log_prefix $log_prefix \
                    --nproc_per_node 16"

                python -u ./src/launch.py ${distributed_args} \
                    ./src/run_visual_entailment.py --use_cuda "True" \
                    --is_distributed ${is_distributed:-"True"} \
                    --weight_sharing ${weight_sharing:-"True"} \
                    --use_fuse ${use_fuse:-"True"} \
                    --use_fast_executor ${e_executor:-"true"} \
                    --use_fp16 ${use_fp16:-"false"} \
                    --nccl_comm_num ${nccl_comm_num:-1} \
                    --use_hierarchical_allreduce ${use_hierarchical_allreduce:-"True"} \
                    --in_tokens ${in_tokens:-"false"} \
                    --use_dynamic_loss_scaling ${use_fp16:-"false"} \
                    --init_loss_scaling ${loss_scaling:-12800} \
                    --beta1 ${beta1:-0.9} \
                    --beta2 ${beta2:-0.999} \
                    --epsilon ${epsilon:-1e-06} \
                    --verbose true \
                    --do_train ${do_train:-"True"} \
                    --do_val ${do_val:-"True"} \
                    --do_test ${do_test:-"True"} \
                    --do_test_hard ${do_test_hard:-"True"} \
                    --num_train_examples ${num_train_examples:-529527} \
                    --adv_step ${adv_step:-4} \
                    --adv_lr ${adv_lr:-0.05} \
                    --norm_type ${norm_type:-"l2"} \
                    --adv_max_norm ${adv_max_norm:-0.4} \
                    --adv_init_mag ${adv_init_mag:-0.4} \
                    --batch_size ${bs:-16} \
                    --test_batch_size ${test_batch_size:-16} \
                    --init_pretraining_params ${init_model:-""} \
                    --train_filelist "./data/SNLI-VE/$bbox/train_filelist" \
                    --dev_filelist "./data/SNLI-VE/$bbox/dev_filelist" \
                    --test_filelist "./data/SNLI-VE/$bbox/test_filelist" \
                    --test_hard_filelist ${test_hard_filelist:-""} \
                    --checkpoints ${save_model_dir:-""} \
                    --save_checkpoints ${save_checkpoints:-"True"} \
                    --save_steps ${save_steps:-1000} \
                    --weight_decay ${weight_decay:-"0.1"} \
                    --warmup_proportion ${warmup_ratio:-"0.06"} \
                    --validation_steps ${validation_steps:-"100"} \
                    --epoch $epoch \
                    --max_seq_len ${max_len:-512} \
                    --max_img_len ${max_img_len:-101} \
                    --learning_rate ${lr:-"5e-5"} \
                    --lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
                    --skip_steps ${skip_steps:-"100"} \
                    --num_iteration_per_drop_scope 10 \
                    --num_labels ${num_labels:-3} \
                    --unimo_vocab_file ${vocab_file} \
                    --encoder_json_file ${bpe_json} \
                    --vocab_bpe_file ${bpe_file} \
                    --unimo_config_path ${config_path} \
                    --eval_mertrics ${eval_mertrics:-"simple_accuracy"} \
                    --eval_dir ${eval_dir:-"./output/tmp"} \
                    --random_seed ${seed:-1} \
                    >> $log_dir/${log_prefix}lanch.log 2>&1
            done
        done
    done
done

if [[ $? -ne 0 ]]; then
    echo "run failed"
    exit 1
fi

python ./src/utils/stat_res.py --log_dir=$log_dir --key_words=job.log.0
exit 0
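The four nested loops above are a plain grid search: one launch.py job is started per combination of the DD_RAND_SEED, EPOCH, LR_RATE and BATCH_SIZE arrays declared in model_conf, and every ${name:-default} in the command line only applies when model_conf leaves that variable unset. A minimal Python sketch of the same sweep (array values mirror the SNLI-VE_large model_conf below; the actual launch command is elided):

from itertools import product

# Values mirror the SNLI-VE_large model_conf in this commit.
DD_RAND_SEED = ["1"]
EPOCH = ["10"]
LR_RATE = ["1e-5"]
BATCH_SIZE = ["4"]

for seed, epoch, lr, bs in product(DD_RAND_SEED, EPOCH, LR_RATE, BATCH_SIZE):
    log_prefix = "%s_%s_%s_%s." % (seed, epoch, lr, bs)
    print("would launch run_visual_entailment.py, logging to", log_prefix + "lanch.log")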
ernie-unimo/script/visual_entailment/SNLI-VE_large/model_conf
0 → 100644
output_name="visual_entailment"
task=SNLI-VE_large
bbox="bbox100"
weight_decay=0
max_len=512
warmup_ratio=0.06
eval_mertrics=simple_accuracy
do_train="True"
do_val="True"
do_test="True"
do_test_hard="False"
test_batch_size=8
save_checkpoints="False"
save_steps=2000
validation_steps=1000
EPOCH=("10")
BATCH_SIZE=("4")
LR_RATE=("1e-5")
DD_RAND_SEED=("1")
init_model="./model_files/unimo_large_en"
config_path="./model_files/config/unimo_large_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
ernie-unimo/script/visual_entailment/SNLI-VE_large/run.sh
0 → 100644
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR; pwd`
cd ${MYDIR}/../../../

source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh

check_iplist
export FLAGS_fuse_parameter_memory_size=64

output_dir=./output/${task}
log_dir=${output_dir}/log
eval_dir=${output_dir}/tmp
save_model_base_dir=$output_dir/save_model
mkdir -p $output_dir $log_dir $eval_dir $save_model_base_dir

for seed in "${DD_RAND_SEED[@]}"; do
    echo "seed "$seed
    for epoch in "${EPOCH[@]}"; do
        echo "epoch "$epoch
        for lr in "${LR_RATE[@]}"; do
            echo "learning rate "$lr
            for bs in "${BATCH_SIZE[@]}"; do
                echo "batch_size "$bs

                log_prefix=$seed"_"$epoch"_"$lr"_"$bs"."
                eval_dir="${output_dir}/tmp/params.${seed}.${epoch}.${lr}.${bs}"
                mkdir -p $eval_dir

                if [[ ${bs} == "32" ]]; then
                    validation_steps=2000
                fi

                if [[ ${save_checkpoints} == "True" ]]; then
                    save_model_dir="${save_model_base_dir}/params.${seed}.${epoch}.${lr}.${bs}"
                    mkdir -p $save_model_dir
                fi

                distributed_args="--node_ips ${PADDLE_TRAINERS} \
                    --node_id ${PADDLE_TRAINER_ID} \
                    --current_node_ip ${POD_IP} \
                    --selected_gpus 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 \
                    --split_log_path $log_dir \
                    --log_prefix $log_prefix \
                    --nproc_per_node 16"

                python -u ./src/launch.py ${distributed_args} \
                    ./src/run_visual_entailment.py --use_cuda "True" \
                    --is_distributed ${is_distributed:-"True"} \
                    --weight_sharing ${weight_sharing:-"True"} \
                    --use_fuse ${use_fuse:-"True"} \
                    --use_fast_executor ${e_executor:-"true"} \
                    --use_fp16 ${use_fp16:-"false"} \
                    --nccl_comm_num ${nccl_comm_num:-1} \
                    --use_hierarchical_allreduce ${use_hierarchical_allreduce:-"True"} \
                    --in_tokens ${in_tokens:-"false"} \
                    --use_dynamic_loss_scaling ${use_fp16:-"false"} \
                    --init_loss_scaling ${loss_scaling:-12800} \
                    --beta1 ${beta1:-0.9} \
                    --beta2 ${beta2:-0.999} \
                    --epsilon ${epsilon:-1e-06} \
                    --verbose true \
                    --do_train ${do_train:-"True"} \
                    --do_val ${do_val:-"True"} \
                    --do_test ${do_test:-"True"} \
                    --do_test_hard ${do_test_hard:-"True"} \
                    --num_train_examples ${num_train_examples:-529527} \
                    --adv_step ${adv_step:-4} \
                    --adv_lr ${adv_lr:-0.05} \
                    --norm_type ${norm_type:-"l2"} \
                    --adv_max_norm ${adv_max_norm:-0.4} \
                    --adv_init_mag ${adv_init_mag:-0.4} \
                    --batch_size ${bs:-16} \
                    --test_batch_size ${test_batch_size:-16} \
                    --init_pretraining_params ${init_model:-""} \
                    --train_filelist "./data/SNLI-VE/$bbox/train_filelist" \
                    --dev_filelist "./data/SNLI-VE/$bbox/dev_filelist" \
                    --test_filelist "./data/SNLI-VE/$bbox/test_filelist" \
                    --test_hard_filelist ${test_hard_filelist:-""} \
                    --checkpoints ${save_model_dir:-""} \
                    --save_checkpoints ${save_checkpoints:-"True"} \
                    --save_steps ${save_steps:-1000} \
                    --weight_decay ${weight_decay:-"0.1"} \
                    --warmup_proportion ${warmup_ratio:-"0.06"} \
                    --validation_steps ${validation_steps:-"100"} \
                    --epoch $epoch \
                    --max_seq_len ${max_len:-512} \
                    --max_img_len ${max_img_len:-101} \
                    --learning_rate ${lr:-"5e-5"} \
                    --lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
                    --skip_steps ${skip_steps:-"100"} \
                    --num_iteration_per_drop_scope 10 \
                    --num_labels ${num_labels:-3} \
                    --unimo_vocab_file ${vocab_file} \
                    --encoder_json_file ${bpe_json} \
                    --vocab_bpe_file ${bpe_file} \
                    --unimo_config_path ${config_path} \
                    --eval_mertrics ${eval_mertrics:-"simple_accuracy"} \
                    --eval_dir ${eval_dir:-"./output/tmp"} \
                    --random_seed ${seed:-1} \
                    >> $log_dir/${log_prefix}lanch.log 2>&1
            done
        done
    done
done

if [[ $? -ne 0 ]]; then
    echo "run failed"
    exit 1
fi

python ./src/utils/stat_res.py --log_dir=$log_dir --key_words=job.log.0
exit 0
ernie-unimo/src/args/visual_entailment_args.py
0 → 100644
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""args for visual_entailment task"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
from utils.args import ArgumentGroup

# yapf: disable
parser = argparse.ArgumentParser(__doc__)

model_g = ArgumentGroup(parser, "model", "model configuration and paths.")
model_g.add_arg("init_checkpoint", str, None, "Init checkpoint to resume training from.")
model_g.add_arg("init_pretraining_params", str, None,
                "Init pre-training params which performs fine-tuning from. If the "
                "arg 'init_checkpoint' has been set, this argument wouldn't be valid.")
model_g.add_arg("checkpoints", str, "checkpoints", "Path to save checkpoints.")
model_g.add_arg("save_checkpoints", bool, True, "Whether to save checkpoints")
model_g.add_arg("weight_sharing", bool, True, "If set, share weights between word embedding and masked lm.")
model_g.add_arg("unimo_vocab_file", str, './model_files/dict/unimo_en.vocab.txt', "unimo vocab")
model_g.add_arg("encoder_json_file", str, './model_files/dict/unimo_en.encoder.json', 'bpe map')
model_g.add_arg("vocab_bpe_file", str, './model_files/dict/unimo_en.vocab.bpe', "vocab bpe")
model_g.add_arg("unimo_config_path", str, "./model_files/config/unimo_base_en.json",
                "The file to save unimo configuration.")

train_g = ArgumentGroup(parser, "training", "training options.")
train_g.add_arg("epoch", int, 3, "Number of epochs for fine-tuning.")
train_g.add_arg("learning_rate", float, 5e-5, "Learning rate used to train with warmup.")
train_g.add_arg("lr_scheduler", str, "linear_warmup_decay", "scheduler of learning rate.",
                choices=['linear_warmup_decay', 'noam_decay'])
train_g.add_arg("weight_decay", float, 0.01, "Weight decay rate for L2 regularizer.")
train_g.add_arg("warmup_proportion", float, 0.1,
                "Proportion of training steps to perform linear learning rate warmup for.")
train_g.add_arg("save_steps", int, 10000, "The steps interval to save checkpoints.")
train_g.add_arg("validation_steps", int, 1000, "The steps interval to evaluate model performance.")
train_g.add_arg("nccl_comm_num", int, 1, "NCCL comm num.")
train_g.add_arg("hierarchical_allreduce_inter_nranks", int, 8, "Hierarchical allreduce inter ranks.")
train_g.add_arg("use_hierarchical_allreduce", bool, False, "Use hierarchical allreduce or not.")
train_g.add_arg("use_fp16", bool, False, "Whether to use fp16 mixed precision training.")
train_g.add_arg("use_dynamic_loss_scaling", bool, False, "Whether to use dynamic loss scaling.")
train_g.add_arg("init_loss_scaling", float, 1.0,
                "Loss scaling factor for mixed precision training, only valid when use_fp16 is enabled.")
train_g.add_arg("incr_every_n_steps", int, 100, "Increases loss scaling every n consecutive.")
train_g.add_arg("decr_every_n_nan_or_inf", int, 2,
                "Decreases loss scaling every n accumulated steps with nan or inf gradients.")
train_g.add_arg("incr_ratio", float, 2.0, "The multiplier to use when increasing the loss scaling.")
train_g.add_arg("decr_ratio", float, 0.8, "The less-than-one-multiplier to use when decreasing.")
train_g.add_arg("use_fuse", bool, False, "Whether to use fuse_allreduce_ops.")

# args for villa: adv_step, adv_lr, norm_type, adv_max_norm, adv_init_mag
train_g.add_arg("adv_step", int, 4, "adv_step")
train_g.add_arg("adv_lr", float, 0.05, "adv_lr")
train_g.add_arg("norm_type", str, 'l2', "norm_type")
train_g.add_arg("adv_max_norm", float, 0.4, "adv_max_norm")
train_g.add_arg("adv_init_mag", float, 0.4, "adv_init_mag")

# args for adam optimizer
train_g.add_arg("beta1", float, 0.9, "beta1 for adam")
train_g.add_arg("beta2", float, 0.98, "beta2 for adam.")
train_g.add_arg("epsilon", float, 1e-06, "epsilon for adam.")

log_g = ArgumentGroup(parser, "logging", "logging related.")
log_g.add_arg("skip_steps", int, 10, "The steps interval to print loss.")
log_g.add_arg("verbose", bool, False, "Whether to output verbose log.")
log_g.add_arg("eval_dir", str, "", "eval_dir to save tmp data")

data_g = ArgumentGroup(parser, "data", "Data paths, vocab paths and data processing options")
data_g.add_arg("train_filelist", str, None, "Path to training data.")
data_g.add_arg("test_filelist", str, None, "Path to test data.")
data_g.add_arg("test_hard_filelist", str, None, "Path to test_hard data.")
data_g.add_arg("dev_filelist", str, None, "Path to validation data.")
data_g.add_arg("max_seq_len", int, 512, "Number of words of the longest sequence.")
data_g.add_arg("max_img_len", int, 37, "Image feature size==2048.")
data_g.add_arg("num_train_examples", int, 0, "num_train_examples")
data_g.add_arg("batch_size", int, 32, "Total examples' number in batch for training. see also --in_tokens.")
data_g.add_arg("test_batch_size", int, 24, "Total examples' number in batch for training. see also --in_tokens.")
data_g.add_arg("in_tokens", bool, False,
               "If set, the batch size will be the maximum number of tokens in one batch. "
               "Otherwise, it will be the maximum number of examples in one batch.")
data_g.add_arg("do_lower_case", bool, True,
               "Whether to lower case the input text. Should be True for uncased models and False for cased models.")
data_g.add_arg("random_seed", int, 0, "Random seed.")
data_g.add_arg("num_labels", int, 3, "label number")

run_type_g = ArgumentGroup(parser, "run_type", "running type options.")
run_type_g.add_arg("use_cuda", bool, True, "If set, use GPU for training.")
run_type_g.add_arg("is_distributed", bool, False, "If set, then start distributed training.")
run_type_g.add_arg("use_fast_executor", bool, False, "If set, use fast parallel executor (in experiment).")
run_type_g.add_arg("num_iteration_per_drop_scope", int, 10, "Iteration intervals to drop scope.")
run_type_g.add_arg("do_train", bool, True, "Whether to perform training.")
run_type_g.add_arg("do_val", bool, True, "Whether to perform evaluation on dev data set.")
run_type_g.add_arg("do_test", bool, True, "Whether to perform evaluation on test data set.")
run_type_g.add_arg("do_test_hard", bool, False, "Whether to perform evaluation on test_hard data set.")
run_type_g.add_arg("eval_mertrics", str, "simple_accuracy", "eval_mertrics")
# yapf: enable
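For readers without the rest of the repo at hand: ArgumentGroup comes from src/utils/args.py, which is not part of this commit, so the following is a minimal sketch of the interface these files assume rather than the actual helper. The real class must at least wrap an argparse argument group and coerce the string booleans ("True"/"False") that run.sh passes on the command line:

import argparse

class ArgumentGroup(object):
    """Sketch only; the real class lives in src/utils/args.py."""
    def __init__(self, parser, title, des):
        self._group = parser.add_argument_group(title=title, description=des)

    def add_arg(self, name, dtype, default, help_text, **kwargs):
        # Command-line booleans arrive as strings ("True"/"False"), so map them.
        if dtype == bool:
            dtype = lambda s: str(s).lower() in ("true", "t", "yes", "1")
        self._group.add_argument(
            "--" + name, type=dtype, default=default,
            help=help_text + " Default: %(default)s.", **kwargs)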
ernie-unimo/src/finetune/visual_entailment.py
0 → 100644
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Model for visual_entailment."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import glob
import time
import numpy as np
import paddle.fluid as fluid

from model.unimo_finetune import UNIMOModel
from eval import glue_eval
from collections import OrderedDict
from utils.utils import print_eval_log


def kl_divergence_with_logits(q_logits, p_logits):
    """
    symmetric KL-divergence (See SMART, Sec 3.1)
    q_logits: logits
    p_logits: delta_logits
    """
    q = fluid.layers.softmax(input=q_logits)
    p = fluid.layers.softmax(input=p_logits)
    kl_qp = fluid.layers.reduce_sum(q * (fluid.layers.log(q) - fluid.layers.log(p)), -1)
    kl_pq = fluid.layers.reduce_sum(p * (fluid.layers.log(p) - fluid.layers.log(q)), -1)
    vat_loss = fluid.layers.mean(x=kl_qp + kl_pq)
    return vat_loss


def create_model(args, config, pyreader_name="train_reader", is_train=True):
    """create_model"""
    shapes = [[-1, args.max_seq_len, 1],  # src_ids
              [-1, args.max_seq_len, 1],  # pos_ids
              [-1, args.max_seq_len, 1],  # sent_ids
              [-1, args.max_img_len + args.max_seq_len,
               args.max_img_len + args.max_seq_len],  # input_mask
              [-1, args.max_img_len, 1],  # v_mask
              [-1, args.max_seq_len, 1],  # t_mask
              [-1, args.max_img_len, config["image_embedding_size"]],  # image_embedding
              [-1, args.max_img_len, 5],  # image_loc
              [-1, 1]  # labels
              ]
    dtypes = ['int64', 'int64', 'int64', 'float32', 'float32', 'float32',
              'float32', 'float32', 'int64']
    lod_levels = [0, 0, 0, 0, 0, 0, 0, 0, 0]

    pyreader = fluid.layers.py_reader(
        capacity=70,
        shapes=shapes,
        dtypes=dtypes,
        lod_levels=lod_levels,
        name=pyreader_name,
        use_double_buffer=True)

    (src_ids, pos_ids, sent_ids, input_mask, v_mask, t_mask,
     image_embedding, image_loc, labels) = fluid.layers.read_file(pyreader)

    emb_ids = {"word_embedding": src_ids, "sent_embedding": sent_ids, "pos_embedding": pos_ids}
    image_input = {"image_embedding": image_embedding, "loc_embedding": image_loc}

    adv_step, adv_lr, norm_type, adv_max_norm, adv_init_mag = \
        args.adv_step, args.adv_lr, args.norm_type, args.adv_max_norm, args.adv_init_mag
    assert adv_step > 0 and adv_init_mag > 0

    def get_loss_and_logits(text_feats, image_feats):
        feats = text_feats + image_feats
        cls_params_name = ["cls_out_w_0", "cls_out_b_0"]
        feats = fluid.layers.fc(
            input=feats,
            size=2048,
            param_attr=fluid.ParamAttr(
                name=cls_params_name[0],
                initializer=fluid.initializer.TruncatedNormal(scale=0.02)),
            bias_attr=fluid.ParamAttr(
                name=cls_params_name[1],
                initializer=fluid.initializer.Constant(0.)))
        feats = fluid.layers.dropout(
            x=feats,
            dropout_prob=0.1,
            dropout_implementation="upscale_in_train")
        cls_params_name = ["cls_out_w_1", "cls_out_b_1"]
        logits = fluid.layers.fc(
            input=feats,
            size=args.num_labels,
            param_attr=fluid.ParamAttr(
                name=cls_params_name[0],
                initializer=fluid.initializer.TruncatedNormal(scale=0.02)),
            bias_attr=fluid.ParamAttr(
                name=cls_params_name[1],
                initializer=fluid.initializer.Constant(0.)))
        ce_loss, probs = fluid.layers.softmax_with_cross_entropy(
            logits=logits, label=labels, return_softmax=True)
        loss = fluid.layers.mean(x=ce_loss) / adv_step
        return loss, logits, probs

    def init_delta(input, mask, shape, name='text'):
        real_seq_len = fluid.layers.shape(input)[1]

        fake = fluid.layers.data(name=name + "_fake", shape=shape, dtype='float32')
        mask_slice = fluid.layers.slice(mask, axes=[1], starts=[0],
                                        ends=fluid.layers.shape(mask)[1])
        length = fluid.layers.reduce_sum(mask_slice, dim=1, keep_dim=True) * shape[-1]

        # l2 norm
        delta = fluid.layers.uniform_random_batch_size_like(
            mask, shape=fake.shape, min=-1.0, max=1.0)
        delta = fluid.layers.slice(delta, axes=[1], starts=[0], ends=real_seq_len)
        delta = delta * mask_slice
        mag = adv_init_mag / fluid.layers.sqrt(length)
        delta = delta * mag
        return delta

    if is_train:
        text_emb_shape = [-1, args.max_seq_len, config['hidden_size']]
        text_delta = init_delta(src_ids, t_mask, text_emb_shape, name='text')
        image_emb_shape = [-1, args.max_img_len, config['image_embedding_size']]
        image_delta = init_delta(image_embedding, v_mask, image_emb_shape, name='img')
    else:
        text_delta, image_delta = None, None

    def pgd_with_l2(loss, delta):
        # grad
        delta_grad = fluid.backward.gradients(loss, delta)[0]

        # l2 norm
        delta_norm = fluid.layers.sqrt(
            fluid.layers.reduce_sum(
                fluid.layers.pow(
                    fluid.layers.reshape(delta_grad,
                                         [fluid.layers.shape(delta_grad)[0], -1]),
                    factor=2),
                dim=1,
                keep_dim=True))
        delta_norm = fluid.layers.clamp(delta_norm, min=float(1e-8))

        # pgd
        delta = delta + adv_lr * delta_grad / delta_norm

        # projection
        if adv_max_norm > 0:
            exceed_mask = (delta_norm > adv_max_norm).astype('float32')
            reweights = (adv_max_norm / delta_norm) * exceed_mask + (1 - exceed_mask)
            delta = delta * reweights

        delta_grad.stop_gradient = True
        return delta

    loss = None
    for iter in range(adv_step):
        vl_pure = UNIMOModel(
            emb_ids=emb_ids,
            input_mask=input_mask,
            config=config,
            image_input=image_input,
            weight_sharing=args.weight_sharing)
        vl_text = UNIMOModel(
            text_adv_delta=text_delta,
            emb_ids=emb_ids,
            input_mask=input_mask,
            config=config,
            image_input=image_input,
            weight_sharing=args.weight_sharing)
        vl_image = UNIMOModel(
            image_adv_delta=image_delta,
            emb_ids=emb_ids,
            input_mask=input_mask,
            config=config,
            image_input=image_input,
            weight_sharing=args.weight_sharing)

        h_pure_text, h_pure_image = vl_pure.get_pooled_output()
        h_text_text, h_text_image = vl_text.get_pooled_output()
        h_image_text, h_image_image = vl_image.get_pooled_output()

        loss_pure, logit_pure, probs_pure = get_loss_and_logits(h_pure_text, h_pure_image)
        loss_text, logit_text, probs_text = get_loss_and_logits(h_text_text, h_text_image)
        loss_image, logit_image, probs_image = get_loss_and_logits(h_image_text, h_image_image)

        if is_train:
            text_delta = pgd_with_l2(loss_text, text_delta)
            image_delta = pgd_with_l2(loss_image, image_delta)

        kl_adv_text_loss = kl_divergence_with_logits(logit_pure, logit_text)
        kl_adv_image_loss = kl_divergence_with_logits(logit_pure, logit_image)
        cur_loss = loss_pure + loss_text + loss_image + kl_adv_text_loss + kl_adv_image_loss
        loss = cur_loss if loss is None else loss + cur_loss

    num_seqs = fluid.layers.create_tensor(dtype='int64')
    accuracy = fluid.layers.accuracy(input=probs_pure, label=labels, total=num_seqs)

    graph_vars = {
        "loss": loss,
        "probs": probs_pure,
        "accuracy": accuracy,
        "labels": labels,
        "num_seqs": num_seqs
    }

    for k, v in graph_vars.items():
        v.persistable = False

    return pyreader, graph_vars


def evaluate(args, exe, test_pyreader, graph_vars, eval_phase, dev_count=1, gpu_id=0):
    """evaluate"""
    all_mat = []
    test_pyreader.start()
    time_begin = time.time()
    fetch_list = [graph_vars["probs"].name, graph_vars["labels"].name]

    while True:
        try:
            np_probs, np_labels = exe.run(fetch_list=fetch_list)
            np_preds = np.argmax(np_probs, axis=1).reshape((-1, 1))
            np_labels = np_labels.reshape((-1, 1))
            mat = np.concatenate([np_preds, np_labels], axis=1)
            all_mat.extend(mat.tolist())
        except fluid.core.EOFException:
            test_pyreader.reset()
            break
    all_mat = np.array(all_mat)
    time_end = time.time()

    save_file = "%s/%s.trainers_%d.part_%d.npy" % (args.eval_dir, eval_phase, dev_count, gpu_id)
    np.save(save_file, all_mat)

    tmp_file = "%s/%s.trainers_%d.part_%d.finish" % (args.eval_dir, eval_phase, dev_count, gpu_id)
    tmp_writer = open(tmp_file, "w")
    tmp_writer.close()

    if gpu_id == 0:
        while True:
            ret = os.popen('find %s -maxdepth 1 -name "%s.trainers_%d.part_*.finish"' %
                           (args.eval_dir, eval_phase, dev_count)).readlines()
            if len(ret) != dev_count:
                time.sleep(1)
                continue
            else:
                break

        all_mats = []
        save_files = glob.glob("%s/%s.trainers_%d.part_*.npy" %
                               (args.eval_dir, eval_phase, dev_count))
        for cur_save_file in save_files:
            mat = np.load(cur_save_file).tolist()
            all_mats.extend(mat)
        all_mats = np.array(all_mats)

        cur_time = str(int(time.time()))
        os.system("mkdir %s/%s" % (args.eval_dir, cur_time))
        os.system("mv %s/%s.trainers_%d.* %s/%s" %
                  (args.eval_dir, eval_phase, dev_count, args.eval_dir, cur_time))

        ret = OrderedDict()
        ret['phase'] = eval_phase
        ret['loss'] = -1
        ret['data_num'] = all_mats.shape[0]
        ret['used_time'] = round(time_end - time_begin, 4)

        metrics = OrderedDict()
        metrics["simple_accuracy"] = glue_eval.simple_accuracy

        if args.eval_mertrics in metrics:
            ret_metric = metrics[args.eval_mertrics](all_mats[:, 0], all_mats[:, 1])
            ret.update(ret_metric)
            print_eval_log(ret)
        else:
            raise ValueError('unsupported metric {}'.format(args.eval_mertrics))
        return ret
    else:
        return None
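As a reading aid: kl_divergence_with_logits above implements the symmetric KL regularizer its docstring attributes to SMART (Sec 3.1). With q = softmax(q_logits) and p = softmax(p_logits), it computes

\mathcal{L}_{\mathrm{vat}} \;=\; \mathbb{E}\big[\,\mathrm{KL}(q \,\|\, p) + \mathrm{KL}(p \,\|\, q)\,\big],
\qquad
\mathrm{KL}(q \,\|\, p) \;=\; \sum_{c} q_{c}\,\big(\log q_{c} - \log p_{c}\big),

where the sum runs over the label classes and the expectation is the batch mean. In create_model this term couples the clean logits with each adversarially perturbed branch (text and image), the VILLA-style pairing suggested by the "args for villa" block in visual_entailment_args.py.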
ernie-unimo/src/model/unimo_finetune.py
@@ -92,17 +92,6 @@ class UNIMOModel(object):
         self._is_img2txt_task = (task_type == "img2txt")
         self._is_multimodal_task = (image_input is not None)
-        if emb_ids is not None and image_input is not None and emb_obj_ids is not None:
-            self._input_type = 'vol'
-        elif emb_ids is not None and image_input is not None:
-            self._input_type = 'vl'
-        elif emb_ids is not None:
-            self._input_type = 'l'
-        elif image_input is not None and emb_obj_ids is not None:
-            self._input_type = 'vo'
-        else:
-            raise ValueError('input feature error')
-
         if self._is_dialogue_task:
             self._role_type_size = config["role_type_size"]
             self._turn_type_size = config["turn_type_size"]
@@ -156,28 +145,39 @@ class UNIMOModel(object):
     def _build_model(self, emb_ids=None, input_mask=None, image_input=None, emb_obj_ids=None, gather_idx=None):
         """build unimo model"""
+        if emb_ids is not None and image_input is not None and emb_obj_ids is not None:
+            input_type = 'vol'
+        elif emb_ids is not None and image_input is not None:
+            input_type = 'vl'
+        elif emb_ids is not None:
+            input_type = 'l'
+        elif image_input is not None and emb_obj_ids is not None:
+            input_type = 'vo'
+        else:
+            raise ValueError('input feature error')
+
         self._enc_vol_out = None
         self._enc_vl_out = None
         self._enc_v_out = None
         self._enc_l_out = None
-        if self._input_type == 'vol':
+        if input_type == 'vol':
             self._enc_vol_out, self._enc_v_out, self._enc_l_out = self.encode(emb_ids=emb_ids,
                                                                               input_mask=input_mask,
                                                                               image_input=image_input,
                                                                               emb_obj_ids=emb_obj_ids,
                                                                               gather_idx=gather_idx)
-        elif self._input_type == 'vl':
+        elif input_type == 'vl':
             self._enc_vl_out, self._enc_v_out, self._enc_l_out = self.encode(emb_ids=emb_ids,
                                                                              input_mask=input_mask,
                                                                              image_input=image_input,
                                                                              gather_idx=gather_idx)
-        elif self._input_type == 'vo':
+        elif input_type == 'vo':
             self._enc_v_out = self.encode(input_mask=input_mask,
                                           image_input=image_input,
                                           emb_obj_ids=emb_obj_ids,
                                           gather_idx=gather_idx)
-        elif self._input_type == 'l':
+        elif input_type == 'l':
             self._enc_l_out = self.encode(emb_ids=emb_ids,
                                           input_mask=input_mask,
                                           gather_idx=gather_idx)
@@ -186,10 +186,22 @@ class UNIMOModel(object):
     def encode(self, emb_ids=None, input_mask=None, image_input=None, emb_obj_ids=None, gather_idx=None):
         """unimo encoder"""
+        if emb_ids is not None and image_input is not None and emb_obj_ids is not None:
+            input_type = 'vol'
+        elif emb_ids is not None and image_input is not None:
+            input_type = 'vl'
+        elif emb_ids is not None:
+            input_type = 'l'
+        elif image_input is not None and emb_obj_ids is not None:
+            input_type = 'vo'
+        else:
+            raise ValueError('input feature error')
+
         emb_feature, n_head_self_attn_mask, _v_seq_len, _o_seq_len = self._gen_input(emb_ids=emb_ids,
                                                                                      input_mask=input_mask,
                                                                                      image_input=image_input,
-                                                                                     emb_obj_ids=emb_obj_ids)
+                                                                                     emb_obj_ids=emb_obj_ids,
+                                                                                     input_type=input_type)
         enc_out = encoder(
             enc_input=emb_feature,
             attn_bias=n_head_self_attn_mask,
@@ -210,7 +222,7 @@ class UNIMOModel(object):
             caches=self.caches,
             gather_idx=gather_idx)
-        if self._input_type == 'vol':
+        if input_type == 'vol':
             assert _v_seq_len is not None and _o_seq_len is not None, "the input is invalid"
             _vol_seq_len = layers.shape(enc_out)[1]
             enc_v_out = fluid.layers.slice(
@@ -221,7 +233,7 @@ class UNIMOModel(object):
                 input=enc_out, axes=[1], starts=[_v_seq_len + _o_seq_len], ends=[_vol_seq_len])
             enc_vol_out = enc_out
             return enc_vol_out, enc_v_out, enc_l_out
-        elif self._input_type == 'vl':
+        elif input_type == 'vl':
             assert _v_seq_len is not None and _o_seq_len is None, "the input is invalid"
             _vl_seq_len = layers.shape(enc_out)[1]
             enc_v_out = fluid.layers.slice(
@@ -230,20 +242,22 @@ class UNIMOModel(object):
                 input=enc_out, axes=[1], starts=[_v_seq_len], ends=[_vl_seq_len])
             enc_vl_out = enc_out
             return enc_vl_out, enc_v_out, enc_l_out
-        elif self._input_type == 'vo':
+        elif input_type == 'vo':
             assert _v_seq_len is not None and _o_seq_len is not None, "the input is invalid"
             enc_v_out = fluid.layers.slice(
                 input=enc_out, axes=[1], starts=[0], ends=[_v_seq_len])
             return enc_v_out
-        elif self._input_type == 'l':
+        elif input_type == 'l':
             assert _v_seq_len is None and _o_seq_len is None, "the input is invalid"
             enc_l_out = enc_out
             return enc_l_out
         else:
             raise ValueError("The input type is invalid")

-    def _gen_input(self, emb_ids=None, input_mask=None, image_input=None, emb_obj_ids=None):
+    def _gen_input(self, emb_ids=None, input_mask=None, image_input=None, emb_obj_ids=None, input_type=None):
         assert input_mask is not None, "input_mask should not be none"
+        assert input_type is not None, "input_type should not be none"
         self_attn_mask = input_mask
         self_attn_mask = fluid.layers.scale(
             x=self_attn_mask, scale=1e4, bias=-1.0, bias_after_scale=False)
@@ -320,16 +334,16 @@ class UNIMOModel(object):
             emb_obj_out, 'nd', self._prepostprocess_dropout, name="pre_encoder")
         _o_seq_len = layers.shape(emb_obj_out)[1]
-        if self._input_type == 'vol':
+        if input_type == 'vol':
             assert emb_ids is not None and image_input is not None and emb_obj_ids is not None, "the input is invalid"
             emb_feature = fluid.layers.concat([emb_v_out, emb_obj_out, emb_out], axis=1)
-        elif self._input_type == 'vl':
+        elif input_type == 'vl':
             assert emb_ids is not None and image_input is not None and emb_obj_ids is None, "the input is invalid"
             emb_feature = fluid.layers.concat([emb_v_out, emb_out], axis=1)
-        elif self._input_type == 'l':
+        elif input_type == 'l':
             assert emb_ids is not None and image_input is None and emb_obj_ids is None, "the input is invalid"
             emb_feature = emb_out
-        elif self._input_type == 'vo':
+        elif input_type == 'vo':
             assert emb_ids is None and image_input is not None and emb_obj_ids is not None, "the input is invalid"
             emb_feature = fluid.layers.concat([emb_v_out, emb_obj_out], axis=1)
         else:
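The change above is a refactor: the modality combination is no longer cached on the instance as self._input_type at construction time; instead _build_model and encode each re-derive it per call and pass it down to _gen_input explicitly. The repeated dispatch is equivalent to this small sketch (a restatement for clarity, not code from the commit):

def infer_input_type(emb_ids, image_input, emb_obj_ids):
    """Sketch of the dispatch logic repeated in the diff above."""
    if emb_ids is not None and image_input is not None and emb_obj_ids is not None:
        return 'vol'  # vision + objects + language
    if emb_ids is not None and image_input is not None:
        return 'vl'   # vision + language
    if emb_ids is not None:
        return 'l'    # language only
    if image_input is not None and emb_obj_ids is not None:
        return 'vo'   # vision + objects
    raise ValueError('input feature error')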
ernie-unimo/src/reader/visual_entailment_reader.py
0 → 100644
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""data reader for multimodal pretraining"""
from
__future__
import
print_function
from
__future__
import
division
import
json
import
base64
import
os
import
numpy
as
np
import
gzip
import
six
import
functools
import
paddle.fluid
as
fluid
from
reader.batching
import
pad_feature_data
,
pad_batch_data
class
ClassifyReader
(
object
):
"""ClassifyReader"""
def
__init__
(
self
,
filelist
,
max_seq_len
,
tokenizer
):
self
.
files
=
open
(
filelist
).
readlines
()
self
.
current_file_index
=
0
self
.
total_file
=
len
(
self
.
files
)
self
.
current_file
=
None
self
.
tot_examples_nums
=
0
self
.
max_seq_len
=
max_seq_len
self
.
pad_id
=
tokenizer
.
pad_token_id
self
.
sep_id
=
tokenizer
.
sep_token_id
self
.
trainer_id
=
int
(
os
.
getenv
(
"PADDLE_TRAINER_ID"
,
"0"
))
self
.
trainer_nums
=
int
(
os
.
getenv
(
"PADDLE_TRAINERS_NUM"
,
"1"
))
def
get_num_examples
(
self
):
"""get_num_examples"""
for
index
,
file_
in
enumerate
(
self
.
files
):
self
.
tot_examples_nums
+=
int
(
os
.
popen
(
'wc -l '
+
file_
.
strip
()).
read
().
split
()[
0
])
return
self
.
tot_examples_nums
def
get_progress
(
self
):
"""return current progress of traning data
"""
return
self
.
current_epoch
,
self
.
current_example
,
self
.
current_file_index
,
self
.
total_file
,
self
.
current_file
def
parse_line
(
self
,
line
,
max_seq_len
=
512
):
""" parse one line to token_ids, sentence_ids, pos_ids, label
"""
line
=
line
.
strip
(
'
\r\n
'
).
split
(
";"
)
if
len
(
line
)
==
14
:
(
image_id
,
data_id
,
label
,
token_ids
,
sent_ids
,
pos_ids
,
_
,
image_w
,
image_h
,
\
number_box
,
boxes
,
image_embeddings
,
_
,
_
)
=
line
else
:
raise
ValueError
(
"One sample have %d fields!"
%
len
(
line
))
def
decode_feature
(
base64_str
,
size
):
fea_base64
=
base64
.
b64decode
(
base64_str
)
fea_decode
=
np
.
frombuffer
(
fea_base64
,
dtype
=
np
.
float32
)
shape
=
size
,
int
(
fea_decode
.
shape
[
0
]
/
size
)
features
=
np
.
resize
(
fea_decode
,
shape
)
return
features
token_ids
=
[
int
(
token
)
for
token
in
token_ids
.
split
(
" "
)]
sent_ids
=
[
int
(
token
)
for
token
in
sent_ids
.
split
(
" "
)]
pos_ids
=
[
int
(
token
)
for
token
in
pos_ids
.
split
(
" "
)]
assert
len
(
token_ids
)
==
len
(
sent_ids
)
==
len
(
pos_ids
),
\
"[Must be true]len(token_ids) == len(sent_ids) == len(pos_ids)"
number_box
=
int
(
number_box
)
boxes
=
decode_feature
(
boxes
,
number_box
)
image_embeddings
=
decode_feature
(
image_embeddings
,
number_box
)
image_embeddings_cls
=
np
.
mean
(
image_embeddings
,
axis
=
0
,
keepdims
=
True
)
image_embeddings
=
np
.
concatenate
([
image_embeddings_cls
,
image_embeddings
],
0
)
image_location
=
np
.
zeros
((
boxes
.
shape
[
0
],
5
),
dtype
=
np
.
float32
)
image_location
[:,
:
4
]
=
boxes
image_location
[:,
4
]
=
(
image_location
[:,
3
]
-
image_location
[:,
1
])
*
(
image_location
[:,
2
]
-
image_location
[:,
0
])
/
(
float
(
image_w
)
*
float
(
image_h
))
image_location
[:,
0
]
=
image_location
[:,
0
]
/
float
(
image_w
)
image_location
[:,
1
]
=
image_location
[:,
1
]
/
float
(
image_h
)
image_location
[:,
2
]
=
image_location
[:,
2
]
/
float
(
image_w
)
image_location
[:,
3
]
=
image_location
[:,
3
]
/
float
(
image_h
)
g_location
=
np
.
array
([
0
,
0
,
1
,
1
,
1
])
image_location
=
np
.
concatenate
([
np
.
expand_dims
(
g_location
,
axis
=
0
),
image_location
],
axis
=
0
)
image_loc
=
image_location
if
len
(
token_ids
)
>
max_seq_len
:
token_ids
=
token_ids
[:
max_seq_len
-
1
]
+
[
self
.
sep_id
]
sent_ids
=
sent_ids
[:
max_seq_len
]
pos_ids
=
pos_ids
[:
max_seq_len
]
return
[
token_ids
,
sent_ids
,
pos_ids
,
label
,
image_loc
,
image_embeddings
,
number_box
+
1
]
def
_prepare_batch_data
(
self
,
insts
,
pad_id
=
None
):
batch_src_ids
=
[
inst
[
0
]
for
inst
in
insts
]
batch_sent_ids
=
[
inst
[
1
]
for
inst
in
insts
]
batch_pos_ids
=
[
inst
[
2
]
for
inst
in
insts
]
batch_labels
=
[
inst
[
3
]
for
inst
in
insts
]
batch_image_loc
=
[
inst
[
4
]
for
inst
in
insts
]
batch_image_embedding
=
[
inst
[
5
]
for
inst
in
insts
]
batch_image_size
=
[
inst
[
6
]
for
inst
in
insts
]
batch_labels
=
np
.
array
(
batch_labels
).
astype
(
"int64"
).
reshape
([
-
1
,
1
])
src_ids
,
token_mask
=
pad_batch_data
(
batch_src_ids
,
pretraining_task
=
'nlu'
,
pad_idx
=
pad_id
,
return_input_mask
=
True
)
sent_ids
=
pad_batch_data
(
batch_sent_ids
,
pretraining_task
=
'nlu'
,
pad_idx
=
pad_id
)
pos_ids
=
pad_batch_data
(
batch_pos_ids
,
pretraining_task
=
'nlu'
,
pad_idx
=
pad_id
)
image_loc
=
pad_feature_data
(
batch_image_loc
)
image_embedding
,
image_mask
=
pad_feature_data
(
batch_image_embedding
,
return_mask
=
True
,
batch_image_size
=
batch_image_size
)
input_mask
=
np
.
concatenate
((
image_mask
,
token_mask
),
axis
=
1
)
input_mask
=
np
.
matmul
(
input_mask
,
np
.
transpose
(
input_mask
,
(
0
,
2
,
1
)))
return_list
=
[
src_ids
,
pos_ids
,
sent_ids
,
input_mask
,
image_mask
,
token_mask
,
image_embedding
,
image_loc
,
batch_labels
]
return
return_list
def
read_file
(
self
,
file
):
"""read_file"""
if
file
.
endswith
(
'.gz'
):
with
gzip
.
open
(
file
,
"rt"
)
as
f
:
for
line
in
f
:
parsed_line
=
self
.
parse_line
(
line
,
max_seq_len
=
self
.
max_seq_len
)
if
parsed_line
is
None
:
continue
yield
parsed_line
else
:
with
open
(
file
,
"r"
)
as
f
:
for
line
in
f
:
parsed_line
=
self
.
parse_line
(
line
,
max_seq_len
=
self
.
max_seq_len
)
if
parsed_line
is
None
:
continue
yield
parsed_line
def
shuffle_samples
(
self
,
sample_generator
,
buffer
=
1000
):
"""shuffle_samples"""
samples
=
[]
try
:
while
True
:
while
len
(
samples
)
<
buffer
:
sample
=
next
(
sample_generator
)
samples
.
append
(
sample
)
np
.
random
.
shuffle
(
samples
)
for
sample
in
samples
:
yield
sample
samples
=
[]
except
StopIteration
:
if
len
(
samples
)
==
0
:
yield
None
else
:
np
.
random
.
shuffle
(
samples
)
for
sample
in
samples
:
yield
sample
def
data_generator
(
self
,
batch_size
,
epoch
,
phase
):
"""
data_generator
"""
if
phase
!=
"train"
:
epoch
=
1
def
wrapper
():
"""wrapper"""
def
batch_reader
():
"""batch_reader"""
for
epoch_index
in
range
(
epoch
):
self
.
global_rng
=
np
.
random
.
RandomState
(
epoch_index
)
self
.
current_epoch
=
epoch_index
self
.
current_example
=
0
if
phase
==
"train"
:
self
.
global_rng
.
shuffle
(
self
.
files
)
for
index
,
file_
in
enumerate
(
self
.
files
):
self
.
current_file_index
=
index
+
1
self
.
current_file
=
file_
batch_records
=
[]
for
sample
in
self
.
shuffle_samples
(
self
.
read_file
(
file
=
file_
.
strip
())):
self
.
current_example
=
self
.
current_example
+
1
if
sample
is
None
:
continue
if
len
(
batch_records
)
<
batch_size
:
batch_records
.
append
(
sample
)
else
:
yield
self
.
_prepare_batch_data
(
batch_records
,
self
.
pad_id
)
batch_records
=
[
sample
]
if
batch_records
:
yield
self
.
_prepare_batch_data
(
batch_records
,
self
.
pad_id
)
all_dev_batches
=
[]
for
batch_data
in
batch_reader
():
if
len
(
all_dev_batches
)
<
self
.
trainer_nums
:
all_dev_batches
.
append
(
batch_data
)
if
len
(
all_dev_batches
)
==
self
.
trainer_nums
:
yield
all_dev_batches
[
self
.
trainer_id
]
all_dev_batches
=
[]
if
phase
==
"train"
:
all_dev_batches
=
all_dev_batches
*
self
.
trainer_nums
np
.
random
.
shuffle
(
all_dev_batches
)
if
self
.
trainer_id
<
len
(
all_dev_batches
):
yield
all_dev_batches
[
self
.
trainer_id
]
return
wrapper
if
__name__
==
"__main__"
:
pass
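The geometry feature built in parse_line is worth spelling out: every detected region contributes a 5-dim vector of its corner coordinates normalized by image width/height plus its relative area, and a whole-image box [0, 0, 1, 1, 1] is prepended to pair with the mean-pooled CLS feature. A self-contained NumPy sketch of that step (the function name is ours, not the repo's):

import numpy as np

def box_features(boxes, image_w, image_h):
    """boxes: (N, 4) array of [x1, y1, x2, y2] in pixels; mirrors parse_line above."""
    loc = np.zeros((boxes.shape[0], 5), dtype=np.float32)
    loc[:, :4] = boxes
    # relative area first (computed from pixel coords), then corners normalized to [0, 1]
    loc[:, 4] = (loc[:, 3] - loc[:, 1]) * (loc[:, 2] - loc[:, 0]) / (float(image_w) * float(image_h))
    loc[:, [0, 2]] /= float(image_w)
    loc[:, [1, 3]] /= float(image_h)
    g = np.array([[0, 0, 1, 1, 1]], dtype=np.float32)  # global whole-image box
    return np.concatenate([g, loc], axis=0)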
ernie-unimo/src/run_retrieval.py
@@ -11,7 +11,7 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-"""Finetuning on classification tasks."""
+"""Finetuning on retrieval tasks."""
 from __future__ import absolute_import
 from __future__ import division
ernie-unimo/src/run_visual_entailment.py
0 → 100644
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""visual entailment tasks."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import time
import multiprocessing
import numpy as np
import paddle.fluid as fluid

from utils.optimization import optimization
from utils.utils import get_time
from utils.init import init_pretraining_params, init_checkpoint
from utils.args import print_arguments
from model.tokenization import GptBpeTokenizer
from args.visual_entailment_args import parser
from collections import OrderedDict
from model.unimo_finetune import UNIMOConfig
from finetune.visual_entailment import create_model, evaluate
from reader.visual_entailment_reader import ClassifyReader

args = parser.parse_args()


def main(args):
    """main"""
    model_config = UNIMOConfig(args.unimo_config_path)
    model_config.print_config()

    gpu_id = 0
    gpus = fluid.core.get_cuda_device_count()
    if args.is_distributed and os.getenv("FLAGS_selected_gpus") is not None:
        gpu_list = os.getenv("FLAGS_selected_gpus").split(",")
        gpus = len(gpu_list)
        gpu_id = int(gpu_list[0])

    if args.use_cuda:
        place = fluid.CUDAPlace(gpu_id)
        dev_count = gpus
    else:
        place = fluid.CPUPlace()
        dev_count = int(os.environ.get('CPU_NUM', multiprocessing.cpu_count()))

    tokenizer = GptBpeTokenizer(vocab_file=args.unimo_vocab_file,
                                encoder_json_file=args.encoder_json_file,
                                vocab_bpe_file=args.vocab_bpe_file,
                                do_lower_case=args.do_lower_case)

    if not (args.do_train or args.do_val or args.do_test or args.do_test_hard):
        raise ValueError("For args `do_train`, `do_val`, `do_test`, `do_test_hard`, at "
                         "least one of them must be True.")

    startup_prog = fluid.Program()
    if args.random_seed is not None:
        startup_prog.random_seed = args.random_seed

    trainers_num = int(os.getenv("PADDLE_TRAINERS_NUM", "1"))

    if args.do_train:
        train_data_reader = ClassifyReader(args.train_filelist, args.max_seq_len, tokenizer)
        train_data_generator = train_data_reader.data_generator(
            batch_size=args.batch_size,
            epoch=args.epoch,
            phase="train")

        if args.num_train_examples:
            num_train_examples = args.num_train_examples
        else:
            num_train_examples = train_data_reader.get_num_examples()
        step_num_per_epoch = num_train_examples // args.batch_size // trainers_num
        max_train_steps = args.epoch * step_num_per_epoch
        warmup_steps = int(max_train_steps * args.warmup_proportion)

        print("Device count: %d, gpu_id: %d" % (dev_count, gpu_id))
        print("Num train examples: %d" % num_train_examples)
        print("Max train steps: %d" % max_train_steps)
        print("Num warmup steps: %d" % warmup_steps)

        train_program = fluid.Program()

        with fluid.program_guard(train_program, startup_prog):
            with fluid.unique_name.guard():
                train_pyreader, graph_vars = create_model(
                    args, config=model_config, pyreader_name="train_reader", is_train=True)
                scheduled_lr, loss_scaling = optimization(
                    loss=graph_vars["loss"],
                    warmup_steps=warmup_steps,
                    num_train_steps=max_train_steps,
                    learning_rate=args.learning_rate,
                    train_program=train_program,
                    weight_decay=args.weight_decay,
                    scheduler=args.lr_scheduler,
                    use_fp16=args.use_fp16,
                    use_dynamic_loss_scaling=args.use_dynamic_loss_scaling,
                    init_loss_scaling=args.init_loss_scaling,
                    beta1=args.beta1,
                    beta2=args.beta2,
                    epsilon=args.epsilon)

    if args.do_val or args.do_test or args.do_test_hard:
        test_prog = fluid.Program()
        with fluid.program_guard(test_prog, startup_prog):
            with fluid.unique_name.guard():
                test_pyreader, test_graph_vars = create_model(
                    args, config=model_config, pyreader_name="dev_reader", is_train=False)
        test_prog = test_prog.clone(for_test=True)

        if args.do_val:
            dev_data_reader = ClassifyReader(args.dev_filelist, args.max_seq_len, tokenizer)
            dev_data_generator = dev_data_reader.data_generator(
                batch_size=args.test_batch_size, epoch=1, phase="dev")
        if args.do_test:
            test_data_reader = ClassifyReader(args.test_filelist, args.max_seq_len, tokenizer)
            test_data_generator = test_data_reader.data_generator(
                batch_size=args.test_batch_size, epoch=1, phase="test")
        if args.do_test_hard:
            test_hard_data_reader = ClassifyReader(args.test_hard_filelist, args.max_seq_len, tokenizer)
            test_hard_data_generator = test_hard_data_reader.data_generator(
                batch_size=args.test_batch_size, epoch=1, phase="test_hard")

    nccl2_num_trainers = 1
    nccl2_trainer_id = 0
    print("args.is_distributed:", args.is_distributed)
    if args.is_distributed:
        trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "0"))
        worker_endpoints_env = os.getenv("PADDLE_TRAINER_ENDPOINTS")
        current_endpoint = os.getenv("PADDLE_CURRENT_ENDPOINT")
        worker_endpoints = worker_endpoints_env.split(",")
        trainers_num = len(worker_endpoints)

        print("worker_endpoints:{} trainers_num:{} current_endpoint:{} trainer_id:{}"
              .format(worker_endpoints, trainers_num, current_endpoint, trainer_id))

        # prepare nccl2 env.
        config = fluid.DistributeTranspilerConfig()
        config.mode = "nccl2"
        if args.nccl_comm_num > 1:
            config.nccl_comm_num = args.nccl_comm_num
        if args.use_hierarchical_allreduce and trainers_num > args.hierarchical_allreduce_inter_nranks:
            config.use_hierarchical_allreduce = args.use_hierarchical_allreduce
            config.hierarchical_allreduce_inter_nranks = args.hierarchical_allreduce_inter_nranks

            assert config.hierarchical_allreduce_inter_nranks > 1
            assert trainers_num % config.hierarchical_allreduce_inter_nranks == 0

            config.hierarchical_allreduce_exter_nranks = \
                trainers_num / config.hierarchical_allreduce_inter_nranks

        t = fluid.DistributeTranspiler(config=config)
        t.transpile(
            trainer_id,
            trainers=worker_endpoints_env,
            current_endpoint=current_endpoint,
            program=train_program if args.do_train else test_prog,
            startup_program=startup_prog)
        nccl2_num_trainers = trainers_num
        nccl2_trainer_id = trainer_id

    exe = fluid.Executor(place)
    exe.run(startup_prog)

    if args.do_train:
        if args.init_checkpoint and args.init_pretraining_params:
            print("WARNING: args 'init_checkpoint' and 'init_pretraining_params' "
                  "both are set! Only arg 'init_checkpoint' is made valid.")
        if args.init_checkpoint:
            init_checkpoint(exe, args.init_checkpoint, main_program=train_program)
        elif args.init_pretraining_params:
            init_pretraining_params(exe, args.init_pretraining_params, main_program=train_program)
    elif args.do_val or args.do_test or args.do_test_hard:
        args.init_checkpoint = args.init_pretraining_params
        if not args.init_checkpoint:
            raise ValueError("args 'init_checkpoint' should be set if "
                             "only doing validation or testing!")
        init_checkpoint(exe, args.init_checkpoint, main_program=startup_prog)

    if args.do_train:
        exec_strategy = fluid.ExecutionStrategy()
        if args.use_fast_executor:
            exec_strategy.use_experimental_executor = True
        exec_strategy.num_threads = 4 if args.use_fp16 else 2
        exec_strategy.num_iteration_per_drop_scope = min(
            args.num_iteration_per_drop_scope, args.skip_steps)

        build_strategy = fluid.BuildStrategy()
        build_strategy.remove_unnecessary_lock = False

        if args.use_fuse:
            build_strategy.fuse_all_reduce_ops = True

        train_exe = fluid.ParallelExecutor(
            use_cuda=args.use_cuda,
            loss_name=graph_vars["loss"].name,
            build_strategy=build_strategy,
            exec_strategy=exec_strategy,
            main_program=train_program,
            num_trainers=nccl2_num_trainers,
            trainer_id=nccl2_trainer_id)
        train_pyreader.decorate_tensor_provider(train_data_generator)
    else:
        train_exe = None

    if args.do_val or args.do_test or args.do_test_hard:
        test_exe = fluid.ParallelExecutor(
            use_cuda=args.use_cuda,
            main_program=test_prog,
            share_vars_from=train_exe)

    dev_ret_history = []  # (steps, key_eval, eval)
    test_ret_history = []  # (steps, key_eval, eval)
    test_hard_ret_history = []  # (steps, key_eval, eval)
    steps = 0

    if args.do_train:
        train_pyreader.start()
        time_begin = time.time()
        skip_steps = args.skip_steps
        while True:
            try:
                steps += 1
                if steps % skip_steps == 0:
                    train_fetch_list = [graph_vars["loss"].name, scheduled_lr.name]
                    res = train_exe.run(fetch_list=train_fetch_list)
                    outputs = {"loss": np.mean(res[0]), 'learning_rate': float(res[1][0])}
                    if args.verbose:
                        verbose = "train pyreader queue size: %d, learning_rate: %.10f" % \
                            (train_pyreader.queue.size(), outputs['learning_rate'])
                        print(verbose)

                    current_epoch, current_example, current_file_index, total_file, current_file = \
                        train_data_reader.get_progress()

                    time_end = time.time()
                    used_time = time_end - time_begin
                    print("%s - epoch: %d, progress: %d/%d, %d/%d, step: %d, ave loss: %f, speed: %f steps/s" % \
                        (get_time(), current_epoch, current_example, num_train_examples, current_file_index, \
                        total_file, steps, outputs["loss"], args.skip_steps / used_time))
                    time_begin = time.time()
                else:
                    train_exe.run(fetch_list=[])

                if nccl2_trainer_id == 0:
                    if steps % args.save_steps == 0 and args.save_checkpoints:
                        save_path = os.path.join(args.checkpoints, "step_" + str(steps))
                        fluid.io.save_persistables(exe, save_path, train_program)

                if steps % args.validation_steps == 0:
                    # evaluate dev set
                    if args.do_val:
                        test_pyreader.decorate_tensor_provider(dev_data_generator)
                        outputs = evaluate(args, test_exe, test_pyreader, test_graph_vars,
                                           "dev", trainers_num, nccl2_trainer_id)
                        if nccl2_trainer_id == 0:
                            dev_ret_history.append(
                                (steps, outputs['key_eval'], outputs[outputs['key_eval']]))
                    # evaluate test set
                    if args.do_test:
                        test_pyreader.decorate_tensor_provider(test_data_generator)
                        outputs = evaluate(args, test_exe, test_pyreader, test_graph_vars,
                                           "test", trainers_num, nccl2_trainer_id)
                        if nccl2_trainer_id == 0:
                            test_ret_history.append(
                                (steps, outputs['key_eval'], outputs[outputs['key_eval']]))
                    # evaluate test_hard set
                    if args.do_test_hard:
                        test_pyreader.decorate_tensor_provider(test_hard_data_generator)
                        outputs = evaluate(args, test_exe, test_pyreader, test_graph_vars,
                                           "test_hard", trainers_num, nccl2_trainer_id)
                        if nccl2_trainer_id == 0:
                            test_hard_ret_history.append(
                                (steps, outputs['key_eval'], outputs[outputs['key_eval']]))

            except fluid.core.EOFException:
                if args.save_checkpoints:
                    save_path = os.path.join(args.checkpoints, "step_" + str(steps))
                    fluid.io.save_persistables(exe, save_path, train_program)
                train_pyreader.reset()
                break

    # final eval on dev set
    if args.do_val:
        test_pyreader.decorate_tensor_provider(dev_data_generator)
        outputs = evaluate(args, test_exe, test_pyreader, test_graph_vars,
                           "dev", trainers_num, nccl2_trainer_id)
        if nccl2_trainer_id == 0:
            dev_ret_history.append((steps, outputs['key_eval'], outputs[outputs['key_eval']]))

    # final eval on test set
    if args.do_test:
        test_pyreader.decorate_tensor_provider(test_data_generator)
        outputs = evaluate(args, test_exe, test_pyreader, test_graph_vars,
                           "test", trainers_num, nccl2_trainer_id)
        if nccl2_trainer_id == 0:
            test_ret_history.append((steps, outputs['key_eval'], outputs[outputs['key_eval']]))

    # final eval on test_hard set
    if args.do_test_hard:
        test_pyreader.decorate_tensor_provider(test_hard_data_generator)
        outputs = evaluate(args, test_exe, test_pyreader, test_graph_vars,
                           "test_hard", trainers_num, nccl2_trainer_id)
        if nccl2_trainer_id == 0:
            test_hard_ret_history.append((steps, outputs['key_eval'], outputs[outputs['key_eval']]))

    if nccl2_trainer_id == 0:
        if args.do_val:
            dev_ret_history = sorted(dev_ret_history, key=lambda a: a[2], reverse=True)
            print("Best validation result: step %d %s %f" % \
                (dev_ret_history[0][0], dev_ret_history[0][1], dev_ret_history[0][2]))


if __name__ == '__main__':
    print_arguments(args)
    main(args)
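main() infers its rank in the NCCL2 ring entirely from environment variables that src/launch.py exports per worker (PADDLE_TRAINER_ID, PADDLE_TRAINERS_NUM, PADDLE_TRAINER_ENDPOINTS, PADDLE_CURRENT_ENDPOINT, FLAGS_selected_gpus). For a quick single-machine sanity check one could set them by hand before running the script; the values below are illustrative, not from the commit:

import os

# Illustrative two-worker layout; launch.py normally writes these per process.
os.environ["PADDLE_TRAINER_ID"] = "0"
os.environ["PADDLE_TRAINERS_NUM"] = "2"
os.environ["PADDLE_TRAINER_ENDPOINTS"] = "127.0.0.1:6170,127.0.0.1:6171"
os.environ["PADDLE_CURRENT_ENDPOINT"] = "127.0.0.1:6170"
os.environ["FLAGS_selected_gpus"] = "0"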