Unverified commit 2a3d001d authored by xfcygaocan, committed by GitHub

add unimo large and most of tasks (#678)

Parent 7cd60539
@@ -28,29 +28,35 @@ Results on single-modal understanding and generation tasks:
---
## TODOs
- [ ] Add all downstream tasks
- [ ] Add unimo large model
- [ ] Add VQA tasks
## Dependencies
python 3.7.4\
paddlepaddle-gpu==1.8.4.post107\
pyrouge==0.1.3\
regex==2020.7.14
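A minimal environment setup sketch based on the versions above (assuming `pip` is available and the machine's CUDA version matches `paddlepaddle-gpu==1.8.4.post107`; Python 3.7.4 itself is installed separately):
```
pip install paddlepaddle-gpu==1.8.4.post107 pyrouge==0.1.3 regex==2020.7.14
```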
## Pre-trained Models
`UNIMO` adopts a large-scale text corpus, image collections and image-text aligned datasets as its pre-training data.
We provide `UNIMO` pre-trained models below:
[UNIMO base](https://unimo.bj.bcebos.com/model/unimo_base_en.tar.gz) (lowercased | 12 layers)\
[UNIMO-mnli base](https://unimo.bj.bcebos.com/model/unimo_mnli_base_en.tar.gz) (lowercased | 12 layers)\
[UNIMO large](https://unimo.bj.bcebos.com/model/unimo_large_en.tar.gz) (lowercased | 24 layers)\
[UNIMO-mnli large](https://unimo.bj.bcebos.com/model/unimo_mnli_large_en.tar.gz) (lowercased | 24 layers)
```
MODEL_SIZE=base # base | mnli_base | large | mnli_large
cd /path/to/model_files
wget --no-check-certificate -q https://unimo.bj.bcebos.com/model/unimo_${MODEL_SIZE}_en.tar.gz
tar -zxf unimo_${MODEL_SIZE}_en.tar.gz
```
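To fetch all four checkpoints at once, a small loop over the size tags listed above works as well (a sketch; adjust the target directory to your setup):
```
cd /path/to/model_files
for MODEL_SIZE in base mnli_base large mnli_large; do
    wget --no-check-certificate -q https://unimo.bj.bcebos.com/model/unimo_${MODEL_SIZE}_en.tar.gz
    tar -zxf unimo_${MODEL_SIZE}_en.tar.gz
done
```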
## Experiments
Our fine-tuning experiments are carried out on V100 GPUs. The following table lists the launch command and basic settings for each downstream task (a generic launch example follows the table):
<table>
<tr>
@@ -62,29 +68,165 @@ Our fine-tuning experiments are carried on V100 GPU. Here are the results from t
<td><strong><center>Running Time</strong></td>
</tr>
<tr>
<td rowspan="1"><center>Text Understanding<center></td>
<td rowspan="1"><center>SST-2<center></td>
<td rowspan="8"><center>Text Understanding<center></td>
<td rowspan="2"><center>SST-2<center></td>
<td><center>UNIMO base</td>
<td><center>sh ./script/classification/SST-2/run.sh</td>
<td><center>8</td>
<td><center>9h</td>
</tr>
<tr>
<td rowspan="1"><center>Text Generation<center></td>
<td rowspan="1"><center>CoQA<center></td>
<td><center>UNIMO large</td>
<td><center>sh ./script/classification/SST-2_large/run.sh</td>
<td><center>8</td>
<td><center>14h</td>
</tr>
<tr>
<td rowspan="2"><center>CoLA<center></td>
<td><center>UNIMO base</td>
<td><center>sh ./script/classification/CoLA/run.sh</td>
<td><center>4</td>
<td><center>2h</td>
</tr>
<tr>
<td><center>UNIMO large</td>
<td><center>sh ./script/classification/CoLA_large/run.sh</td>
<td><center>4</td>
<td><center>4h</td>
</tr>
<tr>
<td rowspan="2"><center>MNLI-AX<center></td>
<td><center>UNIMO base</td>
<td><center>sh ./script/classification/MNLI-AX/run.sh</td>
<td><center>8</td>
<td><center>1d20h</td>
</tr>
<tr>
<td><center>UNIMO large</td>
<td><center>sh ./script/classification/MNLI-AX_large/run.sh</td>
<td><center>8</td>
<td><center>2d13h</td>
</tr>
<tr>
<td rowspan="2"><center>STS-B<center></td>
<td><center>UNIMO-mnli base</td>
<td><center>sh ./script/regression/STS-B/run.sh</td>
<td><center>8</td>
<td><center>2h</td>
</tr>
<tr>
<td><center>UNIMO-mnli large</td>
<td><center>sh ./script/regression/STS-B_large/run.sh</td>
<td><center>8</td>
<td><center>4h</td>
</tr>
<tr>
<td rowspan="8"><center>Text Generation<center></td>
<td rowspan="2"><center>CNN/DailyMail<center></td>
<td><center>UNIMO base</td>
<td><center>sh ./script/seq2seq/cnndm/run.sh</td>
<td><center>4</td>
<td><center>1d8h</td>
</tr>
<tr>
<td><center>UNIMO large</td>
<td><center>sh ./script/seq2seq/cnndm_large/run.sh</td>
<td><center>4</td>
<td><center>3d18h</td>
</tr>
<tr>
<td rowspan="2"><center>Gigaword<center></td>
<td><center>UNIMO base</td>
<td><center>sh ./script/seq2seq/gigaword/run.sh</td>
<td><center>4</td>
<td><center>1d3h</td>
</tr>
<tr>
<td><center>UNIMO large</td>
<td><center>sh ./script/seq2seq/gigaword_large/run.sh</td>
<td><center>4</td>
<td><center>2d3h</td>
</tr>
<tr>
<td rowspan="2"><center>CoQA<center></td>
<td><center>UNIMO base</td>
<td><center>sh ./script/seq2seq/coqa/run.sh</td>
<td><center>4</td>
<td><center>7h</td>
</tr>
<tr>
<td rowspan="1"><center>Multi-Modal Understanding<center></td>
<td rowspan="1"><center>Flickr30k<center></td>
<td><center>UNIMO large</td>
<td><center>sh ./script/seq2seq/coqa_large/run.sh</td>
<td><center>4</td>
<td><center>22h</td>
</tr>
<tr>
<td rowspan="2"><center>Squad_QG<center></td>
<td><center>UNIMO base</td>
<td><center>sh ./script/seq2seq/squad_qg/run.sh</td>
<td><center>4</td>
<td><center>4h</td>
</tr>
<tr>
<td><center>UNIMO large</td>
<td><center>sh ./script/seq2seq/squad_qg_large/run.sh</td>
<td><center>4</td>
<td><center>8h</td>
</tr>
<tr>
<td rowspan="6"><center>Multi-Modal Understanding<center></td>
<td rowspan="2"><center>Flickr30k<center></td>
<td><center>UNIMO base</td>
<td><center>sh ./script/retrieval/Flickr30k/run.sh</td>
<td><center>16</td>
<td><center>3d</td>
</tr>
<tr>
<td><center>UNIMO large</td>
<td><center>sh ./script/retrieval/Flickr30k_large/run.sh</td>
<td><center>16</td>
<td><center>3d</td>
</tr>
<tr>
<td rowspan="2"><center>SNLI-VE<center></td>
<td><center>UNIMO base</td>
<td><center>sh ./script/visual_entailment/SNLI-VE/run.sh</td>
<td><center>16</td>
<td><center>16h</td>
</tr>
<tr>
<td><center>UNIMO large</td>
<td><center>sh ./script/visual_entailment/SNLI-VE_large/run.sh</td>
<td><center>16</td>
<td><center>2d</td>
</tr>
<tr>
<td rowspan="2"><center>VQA<center></td>
<td><center>UNIMO base</td>
<td><center>-</td>
<td><center>-</td>
<td><center>-</td>
</tr>
<tr>
<td><center>UNIMO large</td>
<td><center>-</td>
<td><center>-</td>
<td><center>-</td>
</tr>
<tr>
<td rowspan="6"><center>Multi-Modal Generation<center></td>
<td rowspan="2"><center>COCO Caption<center></td>
<td><center>UNIMO base</td>
<td><center>sh ./script/img2txt/coco/run.sh</td>
<td><center>16</td>
<td><center>3d</td>
</tr>
<tr>
<td><center>UNIMO large</td>
<td><center>sh ./script/img2txt/coco_large/run.sh</td>
<td><center>16</td>
<td><center>4d</td>
</tr>
</table>
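Every entry in the table is launched the same way: run the listed script from the repository root, for example for the large model on SNLI-VE:
```
sh ./script/visual_entailment/SNLI-VE_large/run.sh
```
Hyper-parameters are read from the `model_conf` file next to each `run.sh`, while the GPU list is set inside `run.sh` itself (e.g. `CUDA_VISIBLE_DEVICES`).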
---
@@ -105,6 +247,10 @@ For base model:
```
bash ./script/classification/SST-2/run.sh
```
For large model:
```
bash ./script/classification/SST-2_large/run.sh
```
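Each classification script sweeps the random seeds, learning rates and batch sizes defined in its `model_conf`, writing one log per combination to `./output/<task>/log` and aggregating the best results at the end. To re-aggregate them manually (a sketch, run from the repository root):
```
python ./src/utils/stat_res.py --log_dir=./output/SST-2_large/log
```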
#### Evaluation Results:
@@ -117,13 +263,285 @@ bash ./script/classification/SST-2/run.sh
<td><center>UNIMO-base</td>
<td><center>95.1</td>
</tr>
<tr>
<td><center>UNIMO-large</td>
<td><center>96.8</td>
</tr>
</table>
### (2) Natural Language Inference
#### Download MNLI-AX dataset:
```
cd /path/to/data
wget --no-check-certificate -q https://unimo.bj.bcebos.com/data/MNLI-AX.tar.gz
tar -zxf MNLI-AX.tar.gz
```
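STS-B and CoLA, used in the sections below, are hosted under the same prefix, so the three datasets can also be fetched in one loop (a sketch):
```
cd /path/to/data
for DATASET in MNLI-AX STS-B CoLA; do
    wget --no-check-certificate -q https://unimo.bj.bcebos.com/data/${DATASET}.tar.gz
    tar -zxf ${DATASET}.tar.gz
done
```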
#### Run the following command to train and evaluate on the MNLI-AX dataset:
For base model:
```
bash ./script/classification/MNLI-AX/run.sh
```
For large model:
```
bash ./script/classification/MNLI-AX_large/run.sh
```
#### Evaluation Results:
<table>
<tr>
<td><strong><center>Model</strong></td>
<td><strong><center>Acc-(m/mm)</strong></td>
</tr>
<tr>
<td><center>UNIMO-base</td>
<td><center>86.8/86.7</td>
</tr>
<tr>
<td><center>UNIMO-large</td>
<td><center>89.8/89.5</td>
</tr>
</table>
### (3) Similarity Tasks
#### Download STS-B dataset:
```
cd /path/to/data
wget --no-check-certificate -q https://unimo.bj.bcebos.com/data/STS-B.tar.gz
tar -zxf STS-B.tar.gz
```
#### Run the following command to train and evaluate on the STS-B dataset:
For base model:
```
bash ./script/regression/STS-B/run.sh
```
For large model:
```
bash ./script/regression/STS-B_large/run.sh
```
#### Evaluation Results:
<table>
<tr>
<td><strong><center>Model</strong></td>
<td><strong><center>Pearson correlation</strong></td>
</tr>
<tr>
<td><center>UNIMO-base</td>
<td><center>91.0</td>
</tr>
<tr>
<td><center>UNIMO-large</td>
<td><center>92.6</td>
</tr>
</table>
### (4) Linguistic Acceptability Judgments
#### Download CoLA dataset:
```
cd /path/to/data
wget --no-check-certificate -q https://unimo.bj.bcebos.com/data/CoLA.tar.gz
tar -zxf CoLA.tar.gz
```
#### Run the following command to train and evaluate on the CoLA dataset:
For base model:
```
bash ./script/classification/CoLA/run.sh
```
For large model:
```
bash ./script/classification/CoLA_large/run.sh
```
#### Evaluation Results:
<table>
<tr>
<td><strong><center>Model</strong></td>
<td><strong><center>Matthews correlation</strong></td>
</tr>
<tr>
<td><center>UNIMO-base</td>
<td><center>65.4</td>
</tr>
<tr>
<td><center>UNIMO-large</td>
<td><center>68.5</td>
</tr>
</table>
## Text Generation Tasks
### (1) Document Summarization
#### Download CNN/DailyMail dataset:
```
cd /path/to/data
wget --no-check-certificate -q https://unimo.bj.bcebos.com/data/cnndm.tar.gz
tar -zxf cnndm.tar.gz
```
#### Download evaluation script:
```
cd src/eval/tasks
wget --no-check-certificate -q https://unimo.bj.bcebos.com/eval_script/cnndm.tar.gz
tar -zxf cnndm.tar.gz
```
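The evaluation scripts for the other generation tasks below (Gigaword, question generation, COCO Caption) live under the same prefix; a sketch that fetches all four at once:
```
cd src/eval/tasks
for TASK in cnndm gigaword squad_qg coco; do
    wget --no-check-certificate -q https://unimo.bj.bcebos.com/eval_script/${TASK}.tar.gz
    tar -zxf ${TASK}.tar.gz
done
```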
#### Run the following command to train and evaluate on the CNN/DailyMail dataset:
For base model:
```
bash ./script/seq2seq/cnndm/run.sh
```
For large model:
```
bash ./script/seq2seq/cnndm_large/run.sh
```
#### Evaluation Results:
<table>
<tr>
<td><strong><center>Model</strong></td>
<td><strong><center>ROUGE-1</strong></td>
<td><strong><center>ROUGE-2</strong></td>
<td><strong><center>ROUGE-L</strong></td>
</tr>
<tr>
<td><center>UNIMO-base</td>
<td><center>42.42</td>
<td><center>20.12</td>
<td><center>39.61</td>
</tr>
<tr>
<td><center>UNIMO-large</td>
<td><center>43.51</td>
<td><center>20.65</td>
<td><center>40.63</td>
</tr>
</table>
### (2) Sentence Compression
#### Download Gigaword dataset:
```
cd /path/to/data
wget --no-check-certificate -q https://unimo.bj.bcebos.com/data/gigaword.tar.gz
tar -zxf gigaword.tar.gz
```
#### Download evaluation script:
```
cd src/eval/tasks
wget --no-check-certificate -q https://unimo.bj.bcebos.com/eval_script/gigaword.tar.gz
tar -zxf gigaword.tar.gz
```
#### Run the following command to train and evaluate on the Gigaword dataset:
For base model:
```
bash ./script/seq2seq/gigaword/run.sh
```
For large model:
```
bash ./script/seq2seq/gigaword_large/run.sh
```
#### Evaluation Results:
<table>
<tr>
<td><strong><center>Model</strong></td>
<td><strong><center>ROUGE-1</strong></td>
<td><strong><center>ROUGE-2</strong></td>
<td><strong><center>ROUGE-L</strong></td>
</tr>
<tr>
<td><center>UNIMO-base</td>
<td><center>38.80</td>
<td><center>19.99</td>
<td><center>36.27</td>
</tr>
<tr>
<td><center>UNIMO-large</td>
<td><center>39.71</td>
<td><center>20.37</td>
<td><center>36.88</td>
</tr>
</table>
### (3) Question Generation
#### Download SQuAD dataset:
```
cd /path/to/data
wget --no-check-certificate -q https://unimo.bj.bcebos.com/data/squad_qg.tar.gz
tar -zxf squad_qg.tar.gz
```
#### Download evaluation script:
```
cd src/eval/tasks
wget --no-check-certificate -q https://unimo.bj.bcebos.com/eval_script/squad_qg.tar.gz
tar -zxf squad_qg.tar.gz
```
#### Run the following command to train and evaluate on the SQuAD dataset:
For base model:
```
bash ./script/seq2seq/squad_qg/run.sh
```
For large model:
```
bash ./script/seq2seq/squad_qg_large/run.sh
```
#### Evaluation Results:
<table>
<tr>
<td><strong><center>Model</strong></td>
<td><strong><center>BLEU4</strong></td>
<td><strong><center>METEOR</strong></td>
<td><strong><center>ROUGE-L</strong></td>
</tr>
<tr>
<td><center>UNIMO-base</td>
<td><center>22.78</td>
<td><center>25.24</td>
<td><center>51.34</td>
</tr>
<tr>
<td><center>UNIMO-large</td>
<td><center>24.59</td>
<td><center>26.39</td>
<td><center>52.47</td>
</tr>
</table>
### (4) Conversation Question Answering
#### Download CoQA dataset:
```
cd /path/to/data
wget --no-check-certificate -q https://unimo.bj.bcebos.com/data/coqa.tar.gz
@@ -144,6 +562,10 @@ For base model:
```
bash ./script/seq2seq/coqa/run.sh
```
For large model:
```
bash ./script/seq2seq/coqa_large/run.sh
```
#### Evaluation Results:
@@ -156,6 +578,10 @@ bash ./script/seq2seq/coqa/run.sh
<td><center>UNIMO-base</td>
<td><center>80.2</td>
</tr>
<tr>
<td><center>UNIMO-large</td>
<td><center>84.9</td>
</tr>
</table>
@@ -179,6 +605,10 @@ For base model:
```
bash ./script/retrieval/Flickr30k/run.sh
```
For large model:
```
bash ./script/retrieval/Flickr30k_large/run.sh
```
#### Evaluation Results:
@@ -197,6 +627,12 @@ Results of Image Retrieval task on Flickr30k dataset
<td><center>93.40</td>
<td><center>96.08</td>
</tr>
<tr>
<td><center>UNIMO-large</td>
<td><center>78.04</td>
<td><center>94.24</td>
<td><center>97.12</td>
</tr>
</table>
Results of Text Retrieval task on Flickr30k dataset
@@ -214,6 +650,110 @@ Results of Text Retrieval task on Flickr30k dataset
<td><center>98.40</td>
<td><center>99.10</td>
</tr>
<tr>
<td><center>UNIMO-large</td>
<td><center>89.40</td>
<td><center>98.90</td>
<td><center>99.80</td>
</tr>
</table>
### (2) Visual Entailment
#### Download SNLI-VE dataset:
##### Note: Visual features are extracted by [bottom-up-attention](https://github.com/peteanderson80/bottom-up-attention)
```
cd /path/to/data
wget --no-check-certificate -q https://unimo.bj.bcebos.com/data/SNLI-VE.tar.gz
tar -zxf SNLI-VE.tar.gz
```
#### Run the following command to train and evaluate on the SNLI-VE dataset:
For base model:
```
bash ./script/visual_entailment/SNLI-VE/run.sh
```
For large model:
```
bash ./script/visual_entailment/SNLI-VE_large/run.sh
```
#### Evaluation Results:
Results of Visual Entailment task on SNLI-VE dataset
<table>
<tr>
<td><strong><center>Model</strong></td>
<td><strong><center>dev</strong></td>
<td><strong><center>test</strong></td>
</tr>
<tr>
<td><center>UNIMO-base</td>
<td><center>80.00</td>
<td><center>79.10</td>
</tr>
<tr>
<td><center>UNIMO-large</td>
<td><center>81.11</td>
<td><center>80.63</td>
</tr>
</table>
## Multi-Modal Generation Tasks
### (1) Image Caption Generation
#### Download COCO Caption dataset:
##### Note: Visual features are extracted by [bottom-up-attention](https://github.com/peteanderson80/bottom-up-attention)
```
cd /path/to/data
wget --no-check-certificate -q https://unimo.bj.bcebos.com/data/coco.tar.gz
tar -zxf coco.tar.gz
```
#### Download evaluation script:
```
cd src/eval/tasks
wget --no-check-certificate -q https://unimo.bj.bcebos.com/eval_script/coco.tar.gz
tar -zxf coco.tar.gz
```
#### Run the following command to train and evaluate on the COCO Caption dataset:
For base model:
```
bash ./script/img2txt/coco/run.sh
```
For large model:
```
bash ./script/img2txt/coco_large/run.sh
```
#### Evaluation Results:
<table>
<tr>
<td><strong><center>Model</strong></td>
<td><strong><center>BLEU4</strong></td>
<td><strong><center>CIDEr</strong></td>
</tr>
<tr>
<td><center>UNIMO-base</td>
<td><center>38.8</td>
<td><center>124.4</td>
</tr>
<tr>
<td><center>UNIMO-large</td>
<td><center>39.6</td>
<td><center>127.7</td>
</tr>
</table>
---
@@ -235,4 +775,4 @@ Contact information
For help or issues using `UNIMO`, please submit a GitHub issue.
For personal communication related to `UNIMO`, please contact Wei Li (liwei85@baidu.com), Guocheng Niu (niuguocheng@baidu.com), Can Gao (gaocan01@baidu.com).
data_name=unimo_mnli_base_en
data_tar=${data_name}.tar.gz
bos_url=https://unimo.bj.bcebos.com/model/$data_tar
rm -rf $data_name
wget --no-check-certificate -q $bos_url
if [[ $? -ne 0 ]]; then
echo "url link: $bos_url"
echo "download data failed"
exit 1
fi
tar zxf $data_tar
rm -f $data_tar
exit 0
data_name=unimo_large_en
data_tar=${data_name}.tar.gz
bos_url=https://unimo.bj.bcebos.com/model/$data_tar
rm -rf $data_name
wget --no-check-certificate -q $bos_url
if [[ $? -ne 0 ]]; then
echo "url link: $bos_url"
echo "download data failed"
exit 1
fi
tar zxf $data_tar
rm -f $data_tar
exit 0
data_name=unimo_mnli_large_en
data_tar=${data_name}.tar.gz
bos_url=https://unimo.bj.bcebos.com/model/$data_tar
rm -rf $data_name
wget --no-check-certificate -q $bos_url
if [[ $? -ne 0 ]]; then
echo "url link: $bos_url"
echo "download data failed"
exit 1
fi
tar zxf $data_tar
rm -f $data_tar
exit 0
output_name="classification"
task=CoLA
## hyper param
use_fp16="False"
do_train="True"
do_val="True"
do_test="False"
do_pred="True"
num_labels=2
weight_decay=0
max_len=512
warmup_ratio=0.06
save_checkpoints="False"
save_steps=500
validation_steps=500
skip_steps=10
eval_mertrics=matthews_corrcoef
EPOCH=("10")
BATCH_SIZE=("16" "32")
LR_RATE=("1e-5" "2e-5" "3e-5")
DD_RAND_SEED=("1" "2" "3" "4" "5")
init_model="./model_files/unimo_base_en"
config_path="./model_files/config/unimo_base_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR;pwd`
cd ${MYDIR}/../../../
# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
check_iplist
export CUDA_VISIBLE_DEVICES=0,1,2,3
output_dir=./output/${task}
log_dir=${output_dir}/log
save_model_base_dir=$output_dir/save_model
mkdir -p $output_dir $log_dir $save_model_base_dir
if [[ ${do_pred} == "True" ]]; then
pred_save_prefix="${output_dir}/predict"
mkdir -p $pred_save_prefix
fi
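# grid search: train one run for every combination of random seed, epoch count, learning rate and batch size defined in model_conf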
for seed in "${DD_RAND_SEED[@]}"; do
echo "seed "$seed
for epoch in "${EPOCH[@]}"; do
echo "epoch "$epoch
for lr in "${LR_RATE[@]}"; do
echo "learning rate "$lr
for bs in "${BATCH_SIZE[@]}"; do
echo "batch_size "$bs
log_prefix=$seed"_"$epoch"_"$lr"_"$bs"."
if [[ ${do_pred} == "True" ]]; then
pred_save="${pred_save_prefix}/test.${seed}.${epoch}.${lr}.${bs}"
fi
if [[ ${save_checkpoints} == "True" ]]; then
save_model_dir="${save_model_base_dir}/params.${seed}.${epoch}.${lr}.${bs}"
mkdir -p $save_model_dir
fi
if [[ ${bs} == "32" ]]; then
validation_steps=250
fi
python -u ./src/run_classifier.py --use_cuda "True" \
--is_distributed ${is_distributed:-"False"} \
--weight_sharing ${weight_sharing:-"True"} \
--use_fast_executor ${e_executor:-"true"} \
--use_fp16 ${use_fp16:-"false"} \
--nccl_comm_num ${nccl_comm_num:-1} \
--use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
--in_tokens ${in_tokens:-"false"} \
--use_dynamic_loss_scaling ${use_fp16} \
--init_loss_scaling ${loss_scaling:-12800} \
--beta1 ${beta1:-0.9} \
--beta2 ${beta2:-0.98} \
--epsilon ${epsilon:-1e-06} \
--verbose true \
--do_train ${do_train:-"True"} \
--do_val ${do_val:-"True"} \
--do_test ${do_test:-"True"} \
--do_pred ${do_pred:-"True"} \
--pred_save ${pred_save:-"./output/predict/test"} \
--batch_size ${bs:-16} \
--init_pretraining_params ${init_model:-""} \
--train_set ./data/CoLA/train.tsv \
--dev_set ./data/CoLA/dev.tsv \
--test_set ./data/CoLA/test.tsv \
--checkpoints ${save_model_dir:-""} \
--save_checkpoints ${save_checkpoints:-"True"} \
--save_steps ${save_steps:-1000} \
--weight_decay ${weight_decay:-"0.1"} \
--warmup_proportion ${warmup_ratio:-"0.06"} \
--validation_steps ${validation_steps:-"100"} \
--epoch $epoch \
--max_seq_len ${max_len:-512} \
--learning_rate ${lr:-"5e-5"} \
--lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
--skip_steps ${skip_steps:-"10"} \
--num_iteration_per_drop_scope 10 \
--num_labels ${num_labels:-2} \
--unimo_vocab_file ${vocab_file} \
--encoder_json_file ${bpe_json} \
--vocab_bpe_file ${bpe_file} \
--unimo_config_path ${config_path} \
--eval_mertrics ${eval_mertrics:-"simple_accuracy"} \
--random_seed ${seed:-1} >> $log_dir/${log_prefix}lanch.log 2>&1
done
done
done
done
if [[ $? -ne 0 ]]; then
echo "run failed"
exit 1
fi
python ./src/utils/stat_res.py --log_dir=$log_dir
exit 0
output_name="classification"
task=CoLA_large
## hyper param
use_fp16="False"
do_train="True"
do_val="True"
do_test="False"
do_pred="True"
num_labels=2
weight_decay=0
max_len=512
warmup_ratio=0.06
save_checkpoints="False"
save_steps=500
validation_steps=500
skip_steps=10
eval_mertrics=matthews_corrcoef
EPOCH=("10")
BATCH_SIZE=("16" "32")
LR_RATE=("1e-5" "2e-5" "3e-5")
DD_RAND_SEED=("1" "2" "3" "4" "5")
init_model="./model_files/unimo_large_en"
config_path="./model_files/config/unimo_large_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR;pwd`
cd ${MYDIR}/../../../
# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
check_iplist
export CUDA_VISIBLE_DEVICES=0,1,2,3
output_dir=./output/${task}
log_dir=${output_dir}/log
save_model_base_dir=$output_dir/save_model
mkdir -p $output_dir $log_dir $save_model_base_dir
if [[ ${do_pred} == "True" ]]; then
pred_save_prefix="${output_dir}/predict"
mkdir -p $pred_save_prefix
fi
for seed in "${DD_RAND_SEED[@]}"; do
echo "seed "$seed
for epoch in "${EPOCH[@]}"; do
echo "epoch "$epoch
for lr in "${LR_RATE[@]}"; do
echo "learning rate "$lr
for bs in "${BATCH_SIZE[@]}"; do
echo "batch_size "$bs
log_prefix=$seed"_"$epoch"_"$lr"_"$bs"."
if [[ ${do_pred} == "True" ]]; then
pred_save="${pred_save_prefix}/test.${seed}.${epoch}.${lr}.${bs}"
fi
if [[ ${save_checkpoints} == "True" ]]; then
save_model_dir="${save_model_base_dir}/params.${seed}.${epoch}.${lr}.${bs}"
mkdir -p $save_model_dir
fi
if [[ ${bs} == "32" ]]; then
validation_steps=250
fi
python -u ./src/run_classifier.py --use_cuda "True" \
--is_distributed ${is_distributed:-"False"} \
--weight_sharing ${weight_sharing:-"True"} \
--use_fast_executor ${e_executor:-"true"} \
--use_fp16 ${use_fp16:-"false"} \
--nccl_comm_num ${nccl_comm_num:-1} \
--use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
--in_tokens ${in_tokens:-"false"} \
--use_dynamic_loss_scaling ${use_fp16} \
--init_loss_scaling ${loss_scaling:-12800} \
--beta1 ${beta1:-0.9} \
--beta2 ${beta2:-0.98} \
--epsilon ${epsilon:-1e-06} \
--verbose true \
--do_train ${do_train:-"True"} \
--do_val ${do_val:-"True"} \
--do_test ${do_test:-"True"} \
--do_pred ${do_pred:-"True"} \
--pred_save ${pred_save:-"./output/predict/test"} \
--batch_size ${bs:-16} \
--init_pretraining_params ${init_model:-""} \
--train_set ./data/CoLA/train.tsv \
--dev_set ./data/CoLA/dev.tsv \
--test_set ./data/CoLA/test.tsv \
--checkpoints ${save_model_dir:-""} \
--save_checkpoints ${save_checkpoints:-"True"} \
--save_steps ${save_steps:-1000} \
--weight_decay ${weight_decay:-"0.1"} \
--warmup_proportion ${warmup_ratio:-"0.06"} \
--validation_steps ${validation_steps:-"100"} \
--epoch $epoch \
--max_seq_len ${max_len:-512} \
--learning_rate ${lr:-"5e-5"} \
--lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
--skip_steps ${skip_steps:-"10"} \
--num_iteration_per_drop_scope 10 \
--num_labels ${num_labels:-2} \
--unimo_vocab_file ${vocab_file} \
--encoder_json_file ${bpe_json} \
--vocab_bpe_file ${bpe_file} \
--unimo_config_path ${config_path} \
--eval_mertrics ${eval_mertrics:-"simple_accuracy"} \
--random_seed ${seed:-1} >> $log_dir/${log_prefix}lanch.log 2>&1
done
done
done
done
if [[ $? -ne 0 ]]; then
echo "run failed"
exit 1
fi
python ./src/utils/stat_res.py --log_dir=$log_dir
exit 0
output_name="classification"
task=MNLI-AX
## hyper param
use_fp16="False"
do_train="True"
do_val="True"
do_val_hard="True"
do_test="False"
do_test_hard="False"
do_pred="True"
do_pred_hard="True"
do_diagnostic="True"
num_labels=3
weight_decay=0
max_len=512
warmup_ratio=0.06
save_checkpoints="False"
save_steps=10000
validation_steps=20000
skip_steps=100
eval_mertrics=simple_accuracy
EPOCH=("10")
BATCH_SIZE=("16" "32")
LR_RATE=("1e-5" "2e-5" "3e-5")
DD_RAND_SEED=("1" "2" "3" "4" "5")
init_model="./model_files/unimo_base_en"
config_path="./model_files/config/unimo_base_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR;pwd`
cd ${MYDIR}/../../../
# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
check_iplist
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
output_dir=./output/${task}
log_dir=${output_dir}/log
save_model_base_dir=$output_dir/save_model
mkdir -p $output_dir $log_dir $save_model_base_dir
if [[ ${do_pred} == "True" ]]; then
pred_save_prefix="${output_dir}/predict"
mkdir -p $pred_save_prefix
fi
for seed in "${DD_RAND_SEED[@]}"; do
echo "seed "$seed
for epoch in "${EPOCH[@]}"; do
echo "epoch "$epoch
for lr in "${LR_RATE[@]}"; do
echo "learning rate "$lr
for bs in "${BATCH_SIZE[@]}"; do
echo "batch_size "$bs
log_prefix=$seed"_"$epoch"_"$lr"_"$bs"."
if [[ ${do_pred} == "True" ]]; then
pred_save="${pred_save_prefix}/test.${seed}.${epoch}.${lr}.${bs}"
fi
if [[ ${save_checkpoints} == "True" ]]; then
save_model_dir="${save_model_base_dir}/params.${seed}.${epoch}.${lr}.${bs}"
mkdir -p $save_model_dir
fi
if [[ ${bs} == "32" ]]; then
validation_steps=10000
fi
python -u ./src/run_classifier.py --use_cuda "True" \
--is_distributed ${is_distributed:-"False"} \
--weight_sharing ${weight_sharing:-"True"} \
--use_fast_executor ${e_executor:-"true"} \
--use_fp16 ${use_fp16:-"false"} \
--nccl_comm_num ${nccl_comm_num:-1} \
--use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
--in_tokens ${in_tokens:-"false"} \
--use_dynamic_loss_scaling ${use_fp16} \
--init_loss_scaling ${loss_scaling:-12800} \
--beta1 ${beta1:-0.9} \
--beta2 ${beta2:-0.98} \
--epsilon ${epsilon:-1e-06} \
--verbose true \
--do_train ${do_train:-"True"} \
--do_val ${do_val:-"True"} \
--do_val_hard ${do_val_hard:-"False"} \
--do_test ${do_test:-"True"} \
--do_test_hard ${do_test_hard:-"False"} \
--do_pred ${do_pred:-"True"} \
--do_pred_hard ${do_pred_hard:-"False"} \
--do_diagnostic ${do_diagnostic:-"True"} \
--pred_save ${pred_save:-"./output/predict/test"} \
--batch_size ${bs:-16} \
--init_pretraining_params ${init_model:-""} \
--train_set ./data/MNLI-AX/train.tsv \
--dev_set ./data/MNLI-AX/m/dev.tsv \
--dev_hard_set ./data/MNLI-AX/mm/dev.tsv \
--test_set ./data/MNLI-AX/m/test.tsv \
--test_hard_set ./data/MNLI-AX/mm/test.tsv \
--diagnostic_set ./data/MNLI-AX/diagnostic.tsv \
--checkpoints ${save_model_dir:-""} \
--save_checkpoints ${save_checkpoints:-"True"} \
--save_steps ${save_steps:-1000} \
--weight_decay ${weight_decay:-"0.1"} \
--warmup_proportion ${warmup_ratio:-"0.06"} \
--validation_steps ${validation_steps:-"100"} \
--epoch $epoch \
--max_seq_len ${max_len:-512} \
--learning_rate ${lr:-"5e-5"} \
--lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
--skip_steps ${skip_steps:-"10"} \
--num_iteration_per_drop_scope 10 \
--num_labels ${num_labels:-3} \
--unimo_vocab_file ${vocab_file} \
--encoder_json_file ${bpe_json} \
--vocab_bpe_file ${bpe_file} \
--unimo_config_path ${config_path} \
--eval_mertrics ${eval_mertrics:-"simple_accuracy"} \
--random_seed ${seed:-1} >> $log_dir/${log_prefix}lanch.log 2>&1
done
done
done
done
if [[ $? -ne 0 ]]; then
echo "run failed"
exit 1
fi
python ./src/utils/stat_res.py --log_dir=$log_dir --line_prefix="Best validation result:" --final_res_file="final_res.m.txt"
python ./src/utils/stat_res.py --log_dir=$log_dir --line_prefix="Best validation_hard result:" --final_res_file="final_res.mm.txt"
exit 0
output_name="classification"
task=MNLI-AX_large
## hyper param
use_fp16="False"
do_train="True"
do_val="True"
do_val_hard="True"
do_test="False"
do_test_hard="False"
do_pred="True"
do_pred_hard="True"
do_diagnostic="True"
num_labels=3
weight_decay=0
max_len=512
warmup_ratio=0.06
save_checkpoints="False"
save_steps=10000
validation_steps=20000
skip_steps=100
eval_mertrics=simple_accuracy
EPOCH=("10")
BATCH_SIZE=("16" "32")
LR_RATE=("1e-5" "2e-5" "3e-5")
DD_RAND_SEED=("1" "2" "3" "4" "5")
init_model="./model_files/unimo_large_en"
config_path="./model_files/config/unimo_large_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR;pwd`
cd ${MYDIR}/../../../
# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
check_iplist
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
output_dir=./output/${task}
log_dir=${output_dir}/log
save_model_base_dir=$output_dir/save_model
mkdir -p $output_dir $log_dir $save_model_base_dir
if [[ ${do_pred} == "True" ]]; then
pred_save_prefix="${output_dir}/predict"
mkdir -p $pred_save_prefix
fi
for seed in "${DD_RAND_SEED[@]}"; do
echo "seed "$seed
for epoch in "${EPOCH[@]}"; do
echo "epoch "$epoch
for lr in "${LR_RATE[@]}"; do
echo "learning rate "$lr
for bs in "${BATCH_SIZE[@]}"; do
echo "batch_size "$bs
log_prefix=$seed"_"$epoch"_"$lr"_"$bs"."
if [[ ${do_pred} == "True" ]]; then
pred_save="${pred_save_prefix}/test.${seed}.${epoch}.${lr}.${bs}"
fi
if [[ ${save_checkpoints} == "True" ]]; then
save_model_dir="${save_model_base_dir}/params.${seed}.${epoch}.${lr}.${bs}"
mkdir -p $save_model_dir
fi
if [[ ${bs} == "32" ]]; then
validation_steps=10000
fi
python -u ./src/run_classifier.py --use_cuda "True" \
--is_distributed ${is_distributed:-"False"} \
--weight_sharing ${weight_sharing:-"True"} \
--use_fast_executor ${e_executor:-"true"} \
--use_fp16 ${use_fp16:-"false"} \
--nccl_comm_num ${nccl_comm_num:-1} \
--use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
--in_tokens ${in_tokens:-"false"} \
--use_dynamic_loss_scaling ${use_fp16} \
--init_loss_scaling ${loss_scaling:-12800} \
--beta1 ${beta1:-0.9} \
--beta2 ${beta2:-0.98} \
--epsilon ${epsilon:-1e-06} \
--verbose true \
--do_train ${do_train:-"True"} \
--do_val ${do_val:-"True"} \
--do_val_hard ${do_val_hard:-"False"} \
--do_test ${do_test:-"True"} \
--do_test_hard ${do_test_hard:-"False"} \
--do_pred ${do_pred:-"True"} \
--do_pred_hard ${do_pred_hard:-"False"} \
--do_diagnostic ${do_diagnostic:-"True"} \
--pred_save ${pred_save:-"./output/predict/test"} \
--batch_size ${bs:-16} \
--init_pretraining_params ${init_model:-""} \
--train_set ./data/MNLI-AX/train.tsv \
--dev_set ./data/MNLI-AX/m/dev.tsv \
--dev_hard_set ./data/MNLI-AX/mm/dev.tsv \
--test_set ./data/MNLI-AX/m/test.tsv \
--test_hard_set ./data/MNLI-AX/mm/test.tsv \
--diagnostic_set ./data/MNLI-AX/diagnostic.tsv \
--checkpoints ${save_model_dir:-""} \
--save_checkpoints ${save_checkpoints:-"True"} \
--save_steps ${save_steps:-1000} \
--weight_decay ${weight_decay:-"0.1"} \
--warmup_proportion ${warmup_ratio:-"0.06"} \
--validation_steps ${validation_steps:-"100"} \
--epoch $epoch \
--max_seq_len ${max_len:-512} \
--learning_rate ${lr:-"5e-5"} \
--lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
--skip_steps ${skip_steps:-"10"} \
--num_iteration_per_drop_scope 10 \
--num_labels ${num_labels:-3} \
--unimo_vocab_file ${vocab_file} \
--encoder_json_file ${bpe_json} \
--vocab_bpe_file ${bpe_file} \
--unimo_config_path ${config_path} \
--eval_mertrics ${eval_mertrics:-"simple_accuracy"} \
--random_seed ${seed:-1} >> $log_dir/${log_prefix}lanch.log 2>&1
done
done
done
done
if [[ $? -ne 0 ]]; then
echo "run failed"
exit 1
fi
python ./src/utils/stat_res.py --log_dir=$log_dir --line_prefix="Best validation result:" --final_res_file="final_res.m.txt"
python ./src/utils/stat_res.py --log_dir=$log_dir --line_prefix="Best validation_hard result:" --final_res_file="final_res.mm.txt"
exit 0
output_name="classification"
task=SST-2_large
## hyper param
use_fp16="False"
do_train="True"
do_val="True"
do_test="False"
do_pred="True"
num_labels=2
weight_decay=0
max_len=512
warmup_ratio=0.06
save_checkpoints="False"
save_steps=2000
validation_steps=2000
skip_steps=10
eval_mertrics=simple_accuracy
EPOCH=("10")
BATCH_SIZE=("16" "32")
LR_RATE=("1e-5" "2e-5" "3e-5")
DD_RAND_SEED=("1" "2" "3" "4" "5")
init_model="./model_files/unimo_large_en"
config_path="./model_files/config/unimo_large_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR;pwd`
cd ${MYDIR}/../../../
# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
check_iplist
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
output_dir=./output/${task}
log_dir=${output_dir}/log
save_model_base_dir=$output_dir/save_model
mkdir -p $output_dir $log_dir $save_model_base_dir
if [[ ${do_pred} == "True" ]]; then
pred_save_prefix="${output_dir}/predict"
mkdir -p $pred_save_prefix
fi
for seed in "${DD_RAND_SEED[@]}"; do
echo "seed "$seed
for epoch in "${EPOCH[@]}"; do
echo "epoch "$epoch
for lr in "${LR_RATE[@]}"; do
echo "learning rate "$lr
for bs in "${BATCH_SIZE[@]}"; do
echo "batch_size "$bs
log_prefix=$seed"_"$epoch"_"$lr"_"$bs"."
if [[ ${do_pred} == "True" ]]; then
pred_save="${pred_save_prefix}/test.${seed}.${epoch}.${lr}.${bs}"
fi
if [[ ${save_checkpoints} == "True" ]]; then
save_model_dir="${save_model_base_dir}/params.${seed}.${epoch}.${lr}.${bs}"
mkdir -p $save_model_dir
fi
if [[ ${bs} == "32" ]]; then
validation_steps=1000
fi
python -u ./src/run_classifier.py --use_cuda "True" \
--is_distributed ${is_distributed:-"False"} \
--weight_sharing ${weight_sharing:-"True"} \
--use_fast_executor ${e_executor:-"true"} \
--use_fp16 ${use_fp16:-"false"} \
--nccl_comm_num ${nccl_comm_num:-1} \
--use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
--in_tokens ${in_tokens:-"false"} \
--use_dynamic_loss_scaling ${use_fp16} \
--init_loss_scaling ${loss_scaling:-12800} \
--beta1 ${beta1:-0.9} \
--beta2 ${beta2:-0.98} \
--epsilon ${epsilon:-1e-06} \
--verbose true \
--do_train ${do_train:-"True"} \
--do_val ${do_val:-"True"} \
--do_test ${do_test:-"True"} \
--do_pred ${do_pred:-"True"} \
--pred_save ${pred_save:-"./output/predict/test"} \
--batch_size ${bs:-16} \
--init_pretraining_params ${init_model:-""} \
--train_set ./data/SST-2/train.tsv \
--dev_set ./data/SST-2/dev.tsv \
--test_set ./data/SST-2/test.tsv \
--checkpoints ${save_model_dir:-""} \
--save_checkpoints ${save_checkpoints:-"True"} \
--save_steps ${save_steps:-1000} \
--weight_decay ${weight_decay:-"0.1"} \
--warmup_proportion ${warmup_ratio:-"0.06"} \
--validation_steps ${validation_steps:-"100"} \
--epoch $epoch \
--max_seq_len ${max_len:-512} \
--learning_rate ${lr:-"5e-5"} \
--lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
--skip_steps ${skip_steps:-"10"} \
--num_iteration_per_drop_scope 10 \
--num_labels ${num_labels:-2} \
--unimo_vocab_file ${vocab_file} \
--encoder_json_file ${bpe_json} \
--vocab_bpe_file ${bpe_file} \
--unimo_config_path ${config_path} \
--eval_mertrics ${eval_mertrics:-"simple_accuracy"} \
--random_seed ${seed:-1} >> $log_dir/${log_prefix}lanch.log 2>&1
done
done
done
done
if [[ $? -ne 0 ]]; then
echo "run failed"
exit 1
fi
python ./src/utils/stat_res.py --log_dir=$log_dir
exit 0
output_name="img2txt"
init_model="./model_files/unimo_base_en"
data_path="./data/coco"
object_file_local_path="coco_object_0.35_tot.ids"
# hyper param
lr_scheduler="linear_warmup_decay"
use_fp16="False"
# fuse parameter gradients to reduce the number of AllReduce calls
use_fuse="True"
use_hierarchical_allreduce="True"
loss_scaling=12800
skip_steps=100
save_steps=10000
validation_steps=10000
label_smooth=0.1
weight_decay=0.01
max_seq_len=50
random_seed=666
#decoding params
do_decode="true"
max_img_len=101
max_obj_len=100
max_tgt_len=50
max_out_len=50
min_out_len=5
beam_size=5
length_penalty=0.6
block_trigram="False"
use_multi_gpu_test="True"
#adam optimizer
beta1=0.9
beta2=0.98
epsilon=1e-06
#dataset
train_filelist="train_filelist"
valid_filelist="valid_filelist"
test_filelist="test_filelist"
do_train="true"
do_val="false"
do_test="true"
do_pred="false"
#evaluate
eval_script="bash ./src/eval/tasks/coco/eval.sh"
eval_mertrics="Bleu_1,Bleu_2,Bleu_3,Bleu_4,METEOR,ROUGE_L,CIDEr,SPICE"
## tuning params
pred_batch_size=8
epoch=20
BATCH_SIZE=("4")
LR_RATE=("7e-6")
DD_RAND_SEED=("1")
WARMUP_PROP=("0.06")
## adversarial training params
adv_type="villa"
adv_step=4
adv_lr=0.03
norm_type="l2"
adv_max_norm=0
adv_init_mag=0.4
adv_kl_weight=1.0
##configuration
config_path="./model_files/config/unimo_base_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
\ No newline at end of file
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR;pwd`
cd ${MYDIR}/../../../
# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
timestamp=`date "+%Y%m%d-%H%M%S"`
echo $timestamp
# check
check_iplist
set -eu
output_dir=../output-coco
log_dir=../log-coco
mkdir -p $output_dir $log_dir
e_executor=$(echo ${use_experimental_executor-'True'} | tr '[A-Z]' '[a-z]')
use_fuse=$(echo ${use_fuse-'False'} | tr '[A-Z]' '[a-z]')
if [[ ${use_fuse} == "true" ]]; then
#MB
export FLAGS_fuse_parameter_memory_size=64
fi
export EVAL_SCRIPT_LOG=${MYDIR}/../../../${output_dir}/eval.log
export TASK_DATA_PATH=${data_path}
distributed_args="--node_ips ${PADDLE_TRAINERS} \
--node_id ${PADDLE_TRAINER_ID} \
--current_node_ip ${POD_IP} \
--selected_gpus 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 \
--split_log_path $log_dir \
--nproc_per_node 16"
skip_steps=10
save_steps=10000
validation_steps=10000
for random_seed in "${DD_RAND_SEED[@]}"; do
echo "random_seed "${random_seed}
for batch_size in "${BATCH_SIZE[@]}"; do
echo "batch_size "${batch_size}
for warmup_proportion in "${WARMUP_PROP[@]}"; do
echo "warmup_proportion "${warmup_proportion}
for learning_rate in "${LR_RATE[@]}"; do
echo "learning rate "${learning_rate}
python -u ./src/launch.py ${distributed_args} \
./src/run_img2txt.py --use_cuda "True" \
--is_distributed "True" \
--use_multi_gpu_test ${use_multi_gpu_test:-"True"} \
--use_fp16 ${use_fp16:-"False"} \
--use_dynamic_loss_scaling ${use_fp16} \
--init_loss_scaling ${loss_scaling:-128} \
--use_fast_executor ${e_executor:-"True"} \
--use_fuse ${use_fuse:-"False"} \
--nccl_comm_num ${nccl_comm_num:-1} \
--use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
--do_train ${do_train:-"true"} \
--do_val ${do_val:-"false"} \
--do_test ${do_test:-"true"} \
--do_pred ${do_pred:-"false"} \
--do_decode ${do_decode:-"True"} \
--train_filelist ${data_path}/${train_filelist:-""} \
--valid_filelist ${data_path}/${valid_filelist:-""} \
--test_filelist ${data_path}/${test_filelist:-""} \
--object_file ${data_path}/${object_file_local_path:-""} \
--epoch ${epoch} \
--task_type ${task_type:-"img2txt"} \
--max_seq_len ${max_seq_len} \
--max_img_len ${max_img_len} \
--max_obj_len ${max_obj_len} \
--max_tgt_len ${max_tgt_len} \
--max_out_len ${max_out_len} \
--min_out_len ${min_out_len} \
--block_trigram ${block_trigram:-"True"} \
--beam_size ${beam_size:-5} \
--length_penalty ${length_penalty:-0.6} \
--hidden_dropout_prob ${hidden_dropout_prob:-0.1} \
--attention_probs_dropout_prob ${attention_probs_dropout_prob:-0.1} \
--beta1 ${beta1:-0.9} \
--beta2 ${beta2:-0.98} \
--epsilon ${epsilon:-1e-06} \
--tgt_type_id ${tgt_type_id:-1}\
--batch_size ${batch_size} \
--pred_batch_size ${pred_batch_size} \
--learning_rate ${learning_rate} \
--lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
--warmup_proportion ${warmup_proportion:-0.02} \
--weight_decay ${weight_decay:-0.01} \
--weight_sharing ${weight_sharing:-"True"} \
--label_smooth ${label_smooth:-0.1} \
--init_pretraining_params ${init_model:-""} \
--unimo_vocab_file ${vocab_file} \
--encoder_json_file ${bpe_json} \
--vocab_bpe_file ${bpe_file} \
--unimo_config_path ${config_path} \
--checkpoints $output_dir \
--adv_step ${adv_step:-2} \
--adv_lr ${adv_lr:-0.05} \
--adv_type ${adv_type:-"None"} \
--norm_type ${norm_type:-"l2"} \
--adv_max_norm ${adv_max_norm:-0.4} \
--adv_init_mag ${adv_init_mag:-0.4} \
--adv_kl_weight ${adv_kl_weight:-1.5} \
--save_steps ${save_steps:-10000} \
--validation_steps ${validation_steps:-10000} \
--skip_steps ${skip_steps:-10} \
--save_and_valid_by_epoch ${save_and_valid_by_epoch:-"False"} \
--eval_script ${eval_script:-""} \
--eval_mertrics ${eval_mertrics:-""} \
--random_seed ${random_seed:-"1"} >> $log_dir/lanch.log 2>&1
done
done
done
done
python ./src/utils/extract_eval_res.py --log_dir=$log_dir
exit 0
output_name="img2txt"
init_model="./model_files/unimo_large_en"
data_path="./data/coco"
object_file_local_path="coco_object_0.2_tot.ids"
# hyper param
lr_scheduler="linear_warmup_decay"
use_fp16="False"
# fuse parameter gradients to reduce the number of AllReduce calls
use_fuse="True"
use_hierarchical_allreduce="True"
loss_scaling=12800
skip_steps=100
save_steps=10000
validation_steps=10000
label_smooth=0.1
weight_decay=0.01
max_seq_len=50
random_seed=666
#decoding params
do_decode="true"
max_img_len=101
max_obj_len=100
max_tgt_len=50
max_out_len=50
min_out_len=5
beam_size=5
length_penalty=0.6
block_trigram="False"
use_multi_gpu_test="True"
#adam optimizer
beta1=0.9
beta2=0.98
epsilon=1e-06
#dataset
train_filelist="train_filelist"
valid_filelist="valid_filelist"
test_filelist="test_filelist"
do_train="true"
do_val="false"
do_test="true"
do_pred="false"
#evaluate
eval_script="bash ./src/eval/tasks/coco/eval.sh"
eval_mertrics="Bleu_1,Bleu_2,Bleu_3,Bleu_4,METEOR,ROUGE_L,CIDEr,SPICE"
## tuning params
pred_batch_size=1
epoch=10
BATCH_SIZE=("2")
LR_RATE=("7e-6")
DD_RAND_SEED=("1")
WARMUP_PROP=("0.06")
## adversarial training params
adv_type="villa"
adv_step=4
adv_lr=0.05
norm_type="l2"
adv_max_norm=0.4
adv_init_mag=0.4
adv_kl_weight=1.0
with_pure_model="True"
## configuration
config_path="./model_files/config/unimo_large_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
\ No newline at end of file
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR;pwd`
cd ${MYDIR}/../../../
# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
timestamp=`date "+%Y%m%d-%H%M%S"`
echo $timestamp
# check
check_iplist
set -eu
output_dir=../output-coco_large
log_dir=../log-coco_large
mkdir -p $output_dir $log_dir
e_executor=$(echo ${use_experimental_executor-'True'} | tr '[A-Z]' '[a-z]')
use_fuse=$(echo ${use_fuse-'False'} | tr '[A-Z]' '[a-z]')
if [[ ${use_fuse} == "true" ]]; then
#MB
export FLAGS_fuse_parameter_memory_size=64
fi
export EVAL_SCRIPT_LOG=${MYDIR}/../../../${output_dir}/eval.log
export TASK_DATA_PATH=${data_path}
distributed_args="--node_ips ${PADDLE_TRAINERS} \
--node_id ${PADDLE_TRAINER_ID} \
--current_node_ip ${POD_IP} \
--selected_gpus 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 \
--split_log_path $log_dir \
--nproc_per_node 16"
skip_steps=10
save_steps=10000
validation_steps=10000
for random_seed in "${DD_RAND_SEED[@]}"; do
echo "random_seed "${random_seed}
for batch_size in "${BATCH_SIZE[@]}"; do
echo "batch_size "${batch_size}
for warmup_proportion in "${WARMUP_PROP[@]}"; do
echo "warmup_proportion "${warmup_proportion}
for learning_rate in "${LR_RATE[@]}"; do
echo "learning rate "${learning_rate}
python -u ./src/launch.py ${distributed_args} \
./src/run_img2txt.py --use_cuda "True" \
--is_distributed "True" \
--use_multi_gpu_test ${use_multi_gpu_test:-"True"} \
--use_fp16 ${use_fp16:-"False"} \
--use_dynamic_loss_scaling ${use_fp16} \
--init_loss_scaling ${loss_scaling:-128} \
--use_fast_executor ${e_executor:-"True"} \
--use_fuse ${use_fuse:-"False"} \
--nccl_comm_num ${nccl_comm_num:-1} \
--use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
--do_train ${do_train:-"true"} \
--do_val ${do_val:-"false"} \
--do_test ${do_test:-"true"} \
--do_pred ${do_pred:-"false"} \
--do_decode ${do_decode:-"True"} \
--train_filelist ${data_path}/${train_filelist:-""} \
--valid_filelist ${data_path}/${valid_filelist:-""} \
--test_filelist ${data_path}/${test_filelist:-""} \
--object_file ${data_path}/${object_file_local_path:-""} \
--epoch ${epoch} \
--task_type ${task_type:-"img2txt"} \
--max_seq_len ${max_seq_len} \
--max_img_len ${max_img_len} \
--max_obj_len ${max_obj_len} \
--max_tgt_len ${max_tgt_len} \
--max_out_len ${max_out_len} \
--min_out_len ${min_out_len} \
--block_trigram ${block_trigram:-"True"} \
--beam_size ${beam_size:-5} \
--length_penalty ${length_penalty:-0.6} \
--hidden_dropout_prob ${hidden_dropout_prob:-0.1} \
--attention_probs_dropout_prob ${attention_probs_dropout_prob:-0.1} \
--beta1 ${beta1:-0.9} \
--beta2 ${beta2:-0.98} \
--epsilon ${epsilon:-1e-06} \
--tgt_type_id ${tgt_type_id:-1}\
--batch_size ${batch_size} \
--pred_batch_size ${pred_batch_size} \
--learning_rate ${learning_rate} \
--lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
--warmup_proportion ${warmup_proportion:-0.02} \
--weight_decay ${weight_decay:-0.01} \
--weight_sharing ${weight_sharing:-"True"} \
--label_smooth ${label_smooth:-0.1} \
--init_pretraining_params ${init_model:-""} \
--unimo_vocab_file ${vocab_file} \
--encoder_json_file ${bpe_json} \
--vocab_bpe_file ${bpe_file} \
--unimo_config_path ${config_path} \
--checkpoints $output_dir \
--adv_step ${adv_step:-2} \
--adv_lr ${adv_lr:-0.05} \
--adv_type ${adv_type:-"None"} \
--norm_type ${norm_type:-"l2"} \
--adv_max_norm ${adv_max_norm:-0.4} \
--adv_init_mag ${adv_init_mag:-0.4} \
--adv_kl_weight ${adv_kl_weight:-1.5} \
--with_pure_model ${with_pure_model:-"True"} \
--save_steps ${save_steps:-10000} \
--validation_steps ${validation_steps:-10000} \
--skip_steps ${skip_steps:-10} \
--save_and_valid_by_epoch ${save_and_valid_by_epoch:-"False"} \
--eval_script ${eval_script:-""} \
--eval_mertrics ${eval_mertrics:-""} \
--random_seed ${random_seed:-"1"} >> $log_dir/lanch.log 2>&1
done
done
done
done
python ./src/utils/extract_eval_res.py --log_dir=$log_dir
exit 0
output_name="regression"
task=STS-B
## hyper param
use_fp16="False"
do_train="True"
do_val="True"
do_test="False"
do_pred="True"
num_labels=1
weight_decay=0
max_len=512
warmup_ratio=0.06
save_checkpoints="False"
save_steps=500
validation_steps=500
skip_steps=20
eval_mertrics=pearson_and_spearman
EPOCH=("10")
BATCH_SIZE=("16" "32")
LR_RATE=("1e-5" "2e-5" "3e-5")
DD_RAND_SEED=("1" "2" "3" "4" "5")
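# STS-B fine-tuning starts from the MNLI-fine-tuned checkpoint (UNIMO-mnli base), as listed in the README table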
init_model="./model_files/unimo_mnli_base_en"
config_path="./model_files/config/unimo_base_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR;pwd`
cd ${MYDIR}/../../../
# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
check_iplist
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
output_dir=./output/${task}
log_dir=${output_dir}/log
save_model_base_dir=$output_dir/save_model
mkdir -p $output_dir $log_dir $save_model_base_dir
if [[ ${do_pred} == "True" ]]; then
pred_save_prefix="${output_dir}/predict"
mkdir -p $pred_save_prefix
fi
for seed in "${DD_RAND_SEED[@]}"; do
echo "seed "$seed
for epoch in "${EPOCH[@]}"; do
echo "epoch "$epoch
for lr in "${LR_RATE[@]}"; do
echo "learning rate "$lr
for bs in "${BATCH_SIZE[@]}"; do
echo "batch_size "$bs
log_prefix=$seed"_"$epoch"_"$lr"_"$bs"."
if [[ ${do_pred} == "True" ]]; then
pred_save="${pred_save_prefix}/test.${seed}.${epoch}.${lr}.${bs}"
fi
if [[ ${save_checkpoints} == "True" ]]; then
save_model_dir="${save_model_base_dir}/params.${seed}.${epoch}.${lr}.${bs}"
mkdir -p $save_model_dir
fi
if [[ ${bs} == "32" ]]; then
validation_steps=250
fi
python -u ./src/run_regression.py --use_cuda "True" \
--is_distributed ${is_distributed:-"False"} \
--weight_sharing ${weight_sharing:-"True"} \
--use_fast_executor ${e_executor:-"true"} \
--use_fp16 ${use_fp16:-"false"} \
--nccl_comm_num ${nccl_comm_num:-1} \
--use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
--in_tokens ${in_tokens:-"false"} \
--use_dynamic_loss_scaling ${use_fp16} \
--init_loss_scaling ${loss_scaling:-12800} \
--beta1 ${beta1:-0.9} \
--beta2 ${beta2:-0.98} \
--epsilon ${epsilon:-1e-06} \
--verbose true \
--do_train ${do_train:-"True"} \
--do_val ${do_val:-"True"} \
--do_test ${do_test:-"True"} \
--do_pred ${do_pred:-"True"} \
--pred_save ${pred_save:-"./output/predict/test"} \
--batch_size ${bs:-16} \
--init_pretraining_params ${init_model:-""} \
--train_set ./data/STS-B/train.tsv \
--dev_set ./data/STS-B/dev.tsv \
--test_set ./data/STS-B/test.tsv \
--checkpoints ${save_model_dir:-""} \
--save_checkpoints ${save_checkpoints:-"True"} \
--save_steps ${save_steps:-1000} \
--weight_decay ${weight_decay:-"0.1"} \
--warmup_proportion ${warmup_ratio:-"0.06"} \
--validation_steps ${validation_steps:-"100"} \
--epoch $epoch \
--max_seq_len ${max_len:-512} \
--learning_rate ${lr:-"5e-5"} \
--lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
--skip_steps ${skip_steps:-"10"} \
--num_iteration_per_drop_scope 10 \
--num_labels ${num_labels:-2} \
--unimo_vocab_file ${vocab_file} \
--encoder_json_file ${bpe_json} \
--vocab_bpe_file ${bpe_file} \
--unimo_config_path ${config_path} \
--eval_mertrics ${eval_mertrics:-"pearson_and_spearman"} \
--random_seed ${seed:-1} >> $log_dir/${log_prefix}lanch.log 2>&1
done
done
done
done
if [[ $? -ne 0 ]]; then
echo "run failed"
exit 1
fi
python ./src/utils/stat_res.py --log_dir=$log_dir
exit 0
output_name="regression"
task=STS-B_large
## hyper param
use_fp16="False"
do_train="True"
do_val="True"
do_test="False"
do_pred="True"
num_labels=1
weight_decay=0
max_len=512
warmup_ratio=0.06
save_checkpoints="False"
save_steps=500
validation_steps=500
skip_steps=20
eval_mertrics=pearson_and_spearman
EPOCH=("10")
BATCH_SIZE=("16" "32")
LR_RATE=("1e-5" "2e-5" "3e-5")
DD_RAND_SEED=("1" "2" "3" "4" "5")
init_model="./model_files/unimo_mnli_large_en"
config_path="./model_files/config/unimo_large_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR;pwd`
cd ${MYDIR}/../../../
# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
check_iplist
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
output_dir=./output/${task}
log_dir=${output_dir}/log
save_model_base_dir=$output_dir/save_model
mkdir -p $output_dir $log_dir $save_model_base_dir
if [[ ${do_pred} == "True" ]]; then
pred_save_prefix="${output_dir}/predict"
mkdir -p $pred_save_prefix
fi
for seed in "${DD_RAND_SEED[@]}"; do
echo "seed "$seed
for epoch in "${EPOCH[@]}"; do
echo "epoch "$epoch
for lr in "${LR_RATE[@]}"; do
echo "learning rate "$lr
for bs in "${BATCH_SIZE[@]}"; do
echo "batch_size "$bs
log_prefix=$seed"_"$epoch"_"$lr"_"$bs"."
if [[ ${do_pred} == "True" ]]; then
pred_save="${pred_save_prefix}/test.${seed}.${epoch}.${lr}.${bs}"
fi
if [[ ${save_checkpoints} == "True" ]]; then
save_model_dir="${save_model_base_dir}/params.${seed}.${epoch}.${lr}.${bs}"
mkdir -p $save_model_dir
fi
if [[ ${bs} == "32" ]]; then
validation_steps=250
fi
python -u ./src/run_regression.py --use_cuda "True" \
--is_distributed ${is_distributed:-"False"} \
--weight_sharing ${weight_sharing:-"True"} \
--use_fast_executor ${e_executor:-"true"} \
--use_fp16 ${use_fp16:-"false"} \
--nccl_comm_num ${nccl_comm_num:-1} \
--use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
--in_tokens ${in_tokens:-"false"} \
--use_dynamic_loss_scaling ${use_fp16} \
--init_loss_scaling ${loss_scaling:-12800} \
--beta1 ${beta1:-0.9} \
--beta2 ${beta2:-0.98} \
--epsilon ${epsilon:-1e-06} \
--verbose true \
--do_train ${do_train:-"True"} \
--do_val ${do_val:-"True"} \
--do_test ${do_test:-"True"} \
--do_pred ${do_pred:-"True"} \
--pred_save ${pred_save:-"./output/predict/test"} \
--batch_size ${bs:-16} \
--init_pretraining_params ${init_model:-""} \
--train_set ./data/STS-B/train.tsv \
--dev_set ./data/STS-B/dev.tsv \
--test_set ./data/STS-B/test.tsv \
--checkpoints ${save_model_dir:-""} \
--save_checkpoints ${save_checkpoints:-"True"} \
--save_steps ${save_steps:-1000} \
--weight_decay ${weight_decay:-"0.1"} \
--warmup_proportion ${warmup_ratio:-"0.06"} \
--validation_steps ${validation_steps:-"100"} \
--epoch $epoch \
--max_seq_len ${max_len:-512} \
--learning_rate ${lr:-"5e-5"} \
--lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
--skip_steps ${skip_steps:-"10"} \
--num_iteration_per_drop_scope 10 \
--num_labels ${num_labels:-2} \
--unimo_vocab_file ${vocab_file} \
--encoder_json_file ${bpe_json} \
--vocab_bpe_file ${bpe_file} \
--unimo_config_path ${config_path} \
--eval_mertrics ${eval_mertrics:-"pearson_and_spearman"} \
--random_seed ${seed:-1} >> $log_dir/${log_prefix}lanch.log 2>&1
done
done
done
done
if [[ $? -ne 0 ]]; then
echo "run failed"
exit 1
fi
python ./src/utils/stat_res.py --log_dir=$log_dir
exit 0
@@ -2,7 +2,6 @@
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR;pwd`
cd ${MYDIR}/../../../
# config env
source ${MYDIR}/model_conf
source ./env.sh
@@ -11,7 +10,6 @@ source ./utils.sh
check_iplist
export FLAGS_fuse_parameter_memory_size=64
set -eu
output_dir=./output/${task}
log_dir=${output_dir}/log
save_model_base_dir=$output_dir/save_model
@@ -41,7 +39,7 @@ python $lanch_start ./src/run_retrieval.py \
--use_fuse ${use_fuse:-"True"} \
--use_fast_executor ${e_executor:-"true"} \
--use_fp16 ${use_fp16:-"false"} \
--nccl_comm_num ${nccl_comm_num:-1} \
--use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
--use_dynamic_loss_scaling ${use_fp16:-"False"} \
--use_sigmoid ${use_sigmoid:-"False"} \
output_name="retrieval"
task=Flickr30k_large
## hyper param
epoch=40
do_train="True"
do_val="True"
do_test="True"
save_checkpoints="False"
save_steps=10000
validation_steps=10000
samples_num=20
bbox="bbox100"
max_img_len=101
seed=1
batch_size=2
test_batch_size=96
lr=5e-6
learning_rate_scale=0.1
learning_rate_decay_epoch1=24
learning_rate_decay_epoch2=32
init_model="./model_files/unimo_large_en"
config_path="./model_files/config/unimo_large_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR;pwd`
cd ${MYDIR}/../../../
# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
check_iplist
export FLAGS_fuse_parameter_memory_size=64
output_dir=./output/${task}
log_dir=${output_dir}/log
save_model_base_dir=$output_dir/save_model
mkdir -p $output_dir $log_dir $save_model_base_dir
log_prefix=$seed"_"$epoch"_"$lr"_"$batch_size"."
eval_dir="${output_dir}/tmp/params.${seed}.${epoch}.${lr}.${batch_size}"
mkdir -p $eval_dir
if [[ ${save_checkpoints} == "True" ]]; then
save_model_dir="${save_model_base_dir}/params.${seed}.${epoch}.${lr}.${batch_size}"
mkdir -p $save_model_dir
fi
distributed_args="--node_ips ${PADDLE_TRAINERS} \
--node_id ${PADDLE_TRAINER_ID} \
--current_node_ip ${POD_IP} \
--selected_gpus 0,1,2,3,4,5,6,7 \
--split_log_path $log_dir \
--log_prefix $log_prefix \
--nproc_per_node 8"
lanch_start=" -u ./src/launch.py ${distributed_args} "
python $lanch_start ./src/run_retrieval.py \
--use_cuda "True" \
--is_distributed ${is_distributed:-"True"} \
--weight_sharing ${weight_sharing:-"True"} \
--use_fuse ${use_fuse:-"True"} \
--use_fast_executor ${e_executor:-"true"} \
--use_fp16 ${use_fp16:-"false"} \
--nccl_comm_num ${nccl_comm_num:-1} \
--use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
--use_dynamic_loss_scaling ${use_fp16:-"False"} \
--use_sigmoid ${use_sigmoid:-"False"} \
--init_loss_scaling ${loss_scaling:-12800} \
--beta1 ${beta1:-0.9} \
--beta2 ${beta2:-0.98} \
--epsilon ${epsilon:-1e-06} \
--scale_circle ${scale_circle:-1.0} \
--margin ${margin:-0.2} \
--verbose true \
--samples_num ${samples_num:-20} \
--run_random ${run_random:-"False"} \
--do_train ${do_train:-"True"} \
--do_val ${do_val:-"True"} \
--do_test ${do_test:-"True"} \
--batch_size ${batch_size:-16} \
--test_batch_size ${test_batch_size:-96} \
--init_pretraining_params ${init_model:-""} \
--train_image_caption ./data/Flickr30k/flickr30k-textids/train.ids \
--train_image_feature_dir ./data/Flickr30k/flickr30k-features/$bbox/train \
--dev_image_caption ./data/Flickr30k/flickr30k-textids/val.all.ids \
--dev_image_feature_dir ./data/Flickr30k/flickr30k-features/$bbox/dev \
--test_image_caption ./data/Flickr30k/flickr30k-textids/test.all.ids \
--test_image_feature_dir ./data/Flickr30k/flickr30k-features/$bbox/test \
--img_id_path ./data/Flickr30k/flickr30k-textids/dataset_flickr30k_name_id.txt \
--checkpoints ${save_model_dir:-""} \
--save_checkpoints ${save_checkpoints:-"True"} \
--save_steps ${save_steps:-1000} \
--weight_decay ${weight_decay:-"0.1"} \
--warmup_step ${warmup_step:-"1"} \
--validation_steps ${validation_steps:-"100"} \
--epoch $epoch \
--max_seq_len ${max_len:-512} \
--max_img_len ${max_img_len:-37} \
--learning_rate ${lr:-"5e-6"} \
--learning_rate_scale ${learning_rate_scale:-0.1} \
--learning_rate_decay_epoch1 ${learning_rate_decay_epoch1:-24} \
--learning_rate_decay_epoch2 ${learning_rate_decay_epoch2:-32} \
--lr_scheduler ${lr_scheduler:-"scale_by_epoch_decay"} \
--skip_steps ${skip_steps:-"50"} \
--num_iteration_per_drop_scope 10 \
--unimo_vocab_file ${vocab_file} \
--encoder_json_file ${bpe_json} \
--vocab_bpe_file ${bpe_file} \
--unimo_config_path ${config_path} \
--eval_mertrics ${eval_mertrics:-"recall@k"} \
--eval_dir $eval_dir \
--random_seed ${seed:-1} \
>> $log_dir/${log_prefix}lanch.log 2>&1
if [[ $? -ne 0 ]]; then
echo "run failed"
exit 1
fi
exit 0
output_name="seq2seq"
init_model="./model_files/unimo_base_en"
data_path='./data/cnndm'
## hyper params
lr_scheduler="linear_warmup_decay"
use_fp16="False"
# Fuse a layer's AllReduce operations to reduce communication overhead
use_fuse="True"
use_hierarchical_allreduce="True"
loss_scaling=12800
skip_steps=100
save_steps=20000
validation_steps=20000
label_smooth=0.1
weight_decay=0.01
max_seq_len=512
#decoding params
do_decode="true"
max_src_len=512
max_tgt_len=128
max_out_len=128
min_out_len=20
beam_size=5
length_penalty=0.6
block_trigram="True"
use_multi_gpu_test="True"
#adam optimizer
beta1=0.9
beta2=0.98
epsilon=1e-06
#data
tokenized_input="True"
continuous_position="False"
#dataset
train_set="train.tsv"
dev_set="dev.2k.tsv"
test_set="test.tsv"
pred_set="test.tsv"
do_train="true"
do_val="false"
do_test="true"
do_pred="false"
#evaluate
eval_script="bash ./src/eval/tasks/cnndm/eval.sh"
eval_mertrics="rouge-1,rouge-2,rouge-l"
## tuning params
in_tokens="False"
pred_batch_size=10
epoch=20
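# Candidate hyper-parameters below are grid-searched by the companion run.sh
# (it loops over DD_RAND_SEED x BATCH_SIZE x WARMUP_PROP x LR_RATE).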
BATCH_SIZE=("8")
LR_RATE=("4e-5")
DD_RAND_SEED=("1")
WARMUP_PROP=("0.06")
config_path="./model_files/config/unimo_base_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR;pwd`
cd ${MYDIR}/../../../
# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
# check
check_iplist
set -eu
output_dir=../output-cnndm
log_dir=../log-cnndm
mkdir -p $output_dir $log_dir
e_executor=$(echo ${use_experimental_executor-'True'} | tr '[A-Z]' '[a-z]')
use_fuse=$(echo ${use_fuse-'False'} | tr '[A-Z]' '[a-z]')
if [[ ${use_fuse} == "true" ]]; then
#MB
export FLAGS_fuse_parameter_memory_size=64
fi
export DEV_PREFIX=`echo ${dev_set:-"dev.tsv"} | sed 's/\.tsv$//'`
export TEST_PREFIX=`echo ${test_set:-"test.tsv"} | sed 's/\.tsv$//'`
export PRED_PREFIX=`echo ${pred_set:-"pred.tsv"} | sed 's/\.tsv$//'`
export EVAL_SCRIPT_LOG=${MYDIR}/../../../${output_dir}/eval.log
export TASK_DATA_PATH=${data_path}
distributed_args="--node_ips ${PADDLE_TRAINERS} \
--node_id ${PADDLE_TRAINER_ID} \
--current_node_ip ${POD_IP} \
--selected_gpus 0,1,2,3 \
--split_log_path $log_dir \
--nproc_per_node 4"
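# Grid search over the seed / batch-size / warmup / learning-rate candidates defined in model_conf.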
for random_seed in "${DD_RAND_SEED[@]}"; do
echo "random_seed "${random_seed}
for batch_size in "${BATCH_SIZE[@]}"; do
echo "batch_size "${batch_size}
for warmup_proportion in "${WARMUP_PROP[@]}"; do
echo "warmup_proportion "${warmup_proportion}
for learning_rate in "${LR_RATE[@]}"; do
echo "learning rate "${learning_rate}
python -u ./src/launch.py ${distributed_args} \
./src/run_seq2seq.py --use_cuda "True" \
--is_distributed "True" \
--use_multi_gpu_test ${use_multi_gpu_test:-"True"} \
--use_fp16 ${use_fp16:-"False"} \
--use_dynamic_loss_scaling ${use_fp16} \
--init_loss_scaling ${loss_scaling:-128} \
--use_fast_executor ${e_executor:-"True"} \
--use_fuse ${use_fuse:-"False"} \
--nccl_comm_num ${nccl_comm_num:-1} \
--use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
--do_train ${do_train:-"true"} \
--do_val ${do_val:-"false"} \
--do_test ${do_test:-"true"} \
--do_pred ${do_pred:-"false"} \
--do_decode ${do_decode:-"True"} \
--train_set ${data_path}/${train_set:-""} \
--dev_set ${data_path}/${dev_set:-""} \
--test_set ${data_path}/${test_set:-""} \
--pred_set ${data_path}/${pred_set:-""} \
--epoch ${epoch} \
--tokenized_input ${tokenized_input:-"True"} \
--task_type ${task_type:-"normal"} \
--max_seq_len ${max_seq_len} \
--max_src_len ${max_src_len} \
--max_tgt_len ${max_tgt_len} \
--max_out_len ${max_out_len} \
--min_out_len ${min_out_len} \
--block_trigram ${block_trigram:-"True"} \
--beam_size ${beam_size:-5} \
--length_penalty ${length_penalty:-0.6} \
--hidden_dropout_prob ${hidden_dropout_prob:-0.1} \
--attention_probs_dropout_prob ${attention_probs_dropout_prob:-0.1} \
--beta1 ${beta1:-0.9} \
--beta2 ${beta2:-0.98} \
--epsilon ${epsilon:-1e-06} \
--continuous_position ${continuous_position:-"false"} \
--tgt_type_id ${tgt_type_id:-1}\
--batch_size ${batch_size} \
--pred_batch_size ${pred_batch_size} \
--in_tokens ${in_tokens:-"True"} \
--learning_rate ${learning_rate} \
--lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
--warmup_proportion ${warmup_proportion:-0.02} \
--weight_decay ${weight_decay:-0.01} \
--weight_sharing ${weight_sharing:-"True"} \
--label_smooth ${label_smooth:-0.1} \
--init_pretraining_params ${init_model:-""} \
--unimo_vocab_file ${vocab_file} \
--encoder_json_file ${bpe_json} \
--vocab_bpe_file ${bpe_file} \
--unimo_config_path ${config_path} \
--checkpoints $output_dir \
--save_steps ${save_steps:-10000} \
--validation_steps ${validation_steps:-10000} \
--skip_steps ${skip_steps:-10} \
--save_and_valid_by_epoch ${save_and_valid_by_epoch:-"False"} \
--eval_script ${eval_script:-""} \
--eval_mertrics ${eval_mertrics:-"bleu"} \
--random_seed ${random_seed:-"1"} >> $log_dir/lanch.log 2>&1
done
done
done
done
python ./src/utils/extract_eval_res.py --log_dir=$log_dir
exit 0
output_name="seq2seq"
init_model="./model_files/unimo_large_en"
data_path='./data/cnndm'
## hyper params
lr_scheduler="linear_warmup_decay"
use_fp16="False"
# Fuse a layer's AllReduce operations to reduce communication overhead
use_fuse="True"
use_hierarchical_allreduce="True"
loss_scaling=12800
skip_steps=100
save_steps=20000
validation_steps=20000
label_smooth=0.1
weight_decay=0.01
max_seq_len=512
#decoding params
do_decode="true"
max_src_len=512
max_tgt_len=128
max_out_len=128
min_out_len=20
beam_size=6
length_penalty=1.2
block_trigram="true"
use_multi_gpu_test="True"
#adam optimizer
beta1=0.9
beta2=0.98
epsilon=1e-06
#data
tokenized_input="True"
continuous_position="False"
#dataset
train_set="train.tsv"
dev_set="dev.2k.tsv"
test_set="test.tsv"
pred_set="test.tsv"
do_train="true"
do_val="false"
do_test="true"
do_pred="false"
#evaluate
eval_script="bash ./src/eval/tasks/cnndm/eval.sh"
eval_mertrics="rouge-1,rouge-2,rouge-l"
## tuning params
in_tokens="False"
pred_batch_size=8
epoch=20
BATCH_SIZE=("8")
LR_RATE=("2e-5")
DD_RAND_SEED=("1")
WARMUP_PROP=("0.06")
## configuration
config_path="./model_files/config/unimo_large_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
\ No newline at end of file
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR;pwd`
cd ${MYDIR}/../../../
# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
# check
check_iplist
set -eu
output_dir=../output-cnndm_large
log_dir=../log-cnndm_large
mkdir -p $output_dir $log_dir
e_executor=$(echo ${use_experimental_executor-'True'} | tr '[A-Z]' '[a-z]')
use_fuse=$(echo ${use_fuse-'False'} | tr '[A-Z]' '[a-z]')
if [[ ${use_fuse} == "true" ]]; then
#MB
export FLAGS_fuse_parameter_memory_size=64
fi
export DEV_PREFIX=`echo ${dev_set:-"dev.tsv"} | sed 's/\.tsv$//'`
export TEST_PREFIX=`echo ${test_set:-"test.tsv"} | sed 's/\.tsv$//'`
export PRED_PREFIX=`echo ${pred_set:-"pred.tsv"} | sed 's/\.tsv$//'`
export EVAL_SCRIPT_LOG=${MYDIR}/../../../${output_dir}/eval.log
export TASK_DATA_PATH=${data_path}
distributed_args="--node_ips ${PADDLE_TRAINERS} \
--node_id ${PADDLE_TRAINER_ID} \
--current_node_ip ${POD_IP} \
--selected_gpus 0,1,2,3 \
--split_log_path $log_dir \
--nproc_per_node 4"
for random_seed in "${DD_RAND_SEED[@]}"; do
echo "random_seed "${random_seed}
for batch_size in "${BATCH_SIZE[@]}"; do
echo "batch_size "${batch_size}
for warmup_proportion in "${WARMUP_PROP[@]}"; do
echo "warmup_proportion "${warmup_proportion}
for learning_rate in "${LR_RATE[@]}"; do
echo "learning rate "${learning_rate}
python -u ./src/launch.py ${distributed_args} \
./src/run_seq2seq.py --use_cuda "True" \
--is_distributed "True" \
--use_multi_gpu_test ${use_multi_gpu_test:-"True"} \
--use_fp16 ${use_fp16:-"False"} \
--use_dynamic_loss_scaling ${use_fp16} \
--init_loss_scaling ${loss_scaling:-128} \
--use_fast_executor ${e_executor:-"True"} \
--use_fuse ${use_fuse:-"False"} \
--nccl_comm_num ${nccl_comm_num:-1} \
--use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
--do_train ${do_train:-"true"} \
--do_val ${do_val:-"false"} \
--do_test ${do_test:-"true"} \
--do_pred ${do_pred:-"false"} \
--do_decode ${do_decode:-"True"} \
--train_set ${data_path}/${train_set:-""} \
--dev_set ${data_path}/${dev_set:-""} \
--test_set ${data_path}/${test_set:-""} \
--pred_set ${data_path}/${pred_set:-""} \
--epoch ${epoch} \
--tokenized_input ${tokenized_input:-"True"} \
--task_type ${task_type:-"normal"} \
--max_seq_len ${max_seq_len} \
--max_src_len ${max_src_len} \
--max_tgt_len ${max_tgt_len} \
--max_out_len ${max_out_len} \
--min_out_len ${min_out_len} \
--block_trigram ${block_trigram:-"True"} \
--beam_size ${beam_size:-5} \
--length_penalty ${length_penalty:-0.6} \
--hidden_dropout_prob ${hidden_dropout_prob:-0.1} \
--attention_probs_dropout_prob ${attention_probs_dropout_prob:-0.1} \
--beta1 ${beta1:-0.9} \
--beta2 ${beta2:-0.98} \
--epsilon ${epsilon:-1e-06} \
--continuous_position ${continuous_position:-"false"} \
--tgt_type_id ${tgt_type_id:-1}\
--batch_size ${batch_size} \
--pred_batch_size ${pred_batch_size} \
--in_tokens ${in_tokens:-"True"} \
--learning_rate ${learning_rate} \
--lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
--warmup_proportion ${warmup_proportion:-0.02} \
--weight_decay ${weight_decay:-0.01} \
--weight_sharing ${weight_sharing:-"True"} \
--label_smooth ${label_smooth:-0.1} \
--init_pretraining_params ${init_model:-""} \
--unimo_vocab_file ${vocab_file} \
--encoder_json_file ${bpe_json} \
--vocab_bpe_file ${bpe_file} \
--unimo_config_path ${config_path} \
--checkpoints $output_dir \
--save_steps ${save_steps:-10000} \
--validation_steps ${validation_steps:-10000} \
--skip_steps ${skip_steps:-10} \
--save_and_valid_by_epoch ${save_and_valid_by_epoch:-"False"} \
--eval_script ${eval_script:-""} \
--eval_mertrics ${eval_mertrics:-"bleu"} \
--random_seed ${random_seed:-"1"} >> $log_dir/lanch.log 2>&1
done
done
done
done
python ./src/utils/extract_eval_res.py --log_dir=$log_dir
exit 0
output_name="seq2seq"
init_model="./model_files/unimo_large_en"
data_path='./data/coqa'
# hyper param
lr_scheduler="linear_warmup_decay"
use_fp16="False"
# Fuse a layer's AllReduce operations to reduce communication overhead
use_fuse="True"
use_hierarchical_allreduce="True"
loss_scaling=12800
skip_steps=100
save_steps=10000
validation_steps=10000
label_smooth=0.1
weight_decay=0.01
max_seq_len=512
random_seed=666
#for multi-turn dialog/qa
task_type="dialog"
role_type_size=3
turn_type_size=16
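# Extra embeddings for multi-turn dialogue-style inputs; these values are passed to
# run_seq2seq.py via --role_type_size and --turn_type_size in run.sh.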
#decoding params
do_decode="true"
max_src_len=480
max_tgt_len=32
max_out_len=30
min_out_len=0
beam_size=3
length_penalty=0.0
block_trigram="False"
use_multi_gpu_test="True"
#adam optimizer
beta1=0.9
beta2=0.98
epsilon=1e-06
#data
tokenized_input="True"
continuous_position="False"
#dataset
train_set="train.tsv"
dev_set="dev.tsv"
test_set="dev.tsv"
do_train="true"
do_val="true"
do_test="false"
do_pred="false"
#evaluate
eval_script="bash ./src/eval/tasks/coqa/eval.sh"
eval_mertrics="f1"
## tuning params
in_tokens="False"
pred_batch_size=8
epoch=20
BATCH_SIZE=("8")
LR_RATE=("8e-6")
DD_RAND_SEED=("1")
WARMUP_PROP=("0.06")
config_path="./model_files/config/unimo_large_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
\ No newline at end of file
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR;pwd`
cd ${MYDIR}/../../../
# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
# check
check_iplist
set -eu
output_dir=../output-coqa_large
log_dir=../log-coqa_large
mkdir -p $output_dir $log_dir
e_executor=$(echo ${use_experimental_executor-'True'} | tr '[A-Z]' '[a-z]')
use_fuse=$(echo ${use_fuse-'False'} | tr '[A-Z]' '[a-z]')
if [[ ${use_fuse} == "true" ]]; then
#MB
export FLAGS_fuse_parameter_memory_size=64
fi
export DEV_PREFIX=`echo ${dev_set:-"dev.tsv"} | sed 's/\.tsv$//'`
export TEST_PREFIX=`echo ${test_set:-"test.tsv"} | sed 's/\.tsv$//'`
export PRED_PREFIX=`echo ${pred_set:-"pred.tsv"} | sed 's/\.tsv$//'`
export EVAL_SCRIPT_LOG=${MYDIR}/../../../${output_dir}/eval.log
export TASK_DATA_PATH=${data_path}
distributed_args="--node_ips ${PADDLE_TRAINERS} \
--node_id ${PADDLE_TRAINER_ID} \
--current_node_ip ${POD_IP} \
--selected_gpus 4,5,6,7 \
--split_log_path $log_dir \
--nproc_per_node 4"
for random_seed in "${DD_RAND_SEED[@]}"; do
echo "random_seed "${random_seed}
for batch_size in "${BATCH_SIZE[@]}"; do
echo "batch_size "${batch_size}
for warmup_proportion in "${WARMUP_PROP[@]}"; do
echo "warmup_proportion "${warmup_proportion}
for learning_rate in "${LR_RATE[@]}"; do
echo "learning rate "${learning_rate}
python -u ./src/launch.py ${distributed_args} \
./src/run_seq2seq.py --use_cuda "True" \
--is_distributed "True" \
--use_multi_gpu_test ${use_multi_gpu_test:-"True"} \
--use_fp16 ${use_fp16:-"False"} \
--use_dynamic_loss_scaling ${use_fp16} \
--init_loss_scaling ${loss_scaling:-128} \
--use_fast_executor ${e_executor:-"True"} \
--use_fuse ${use_fuse:-"False"} \
--nccl_comm_num ${nccl_comm_num:-1} \
--use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
--do_train ${do_train:-"true"} \
--do_val ${do_val:-"false"} \
--do_test ${do_test:-"true"} \
--do_pred ${do_pred:-"false"} \
--do_decode ${do_decode:-"True"} \
--train_set ${data_path}/${train_set:-""} \
--dev_set ${data_path}/${dev_set:-""} \
--test_set ${data_path}/${test_set:-""} \
--pred_set ${data_path}/${pred_set:-""} \
--epoch ${epoch} \
--tokenized_input ${tokenized_input:-"True"} \
--task_type ${task_type:-"dialog"} \
--role_type_size ${role_type_size:-3} \
--turn_type_size ${turn_type_size:-16} \
--max_seq_len ${max_seq_len} \
--max_src_len ${max_src_len} \
--max_tgt_len ${max_tgt_len} \
--max_out_len ${max_out_len} \
--min_out_len ${min_out_len} \
--block_trigram ${block_trigram:-"True"} \
--beam_size ${beam_size:-5} \
--length_penalty ${length_penalty:-0.6} \
--hidden_dropout_prob ${hidden_dropout_prob:-0.1} \
--attention_probs_dropout_prob ${attention_probs_dropout_prob:-0.1} \
--beta1 ${beta1:-0.9} \
--beta2 ${beta2:-0.98} \
--epsilon ${epsilon:-1e-06} \
--continuous_position ${continuous_position:-"false"} \
--tgt_type_id ${tgt_type_id:-1}\
--batch_size ${batch_size} \
--pred_batch_size ${pred_batch_size} \
--in_tokens ${in_tokens:-"True"} \
--learning_rate ${learning_rate} \
--lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
--warmup_proportion ${warmup_proportion:-0.02} \
--weight_decay ${weight_decay:-0.01} \
--weight_sharing ${weight_sharing:-"True"} \
--label_smooth ${label_smooth:-0.1} \
--init_pretraining_params ${init_model:-""} \
--unimo_vocab_file ${vocab_file} \
--encoder_json_file ${bpe_json} \
--vocab_bpe_file ${bpe_file} \
--unimo_config_path ${config_path} \
--checkpoints $output_dir \
--save_steps ${save_steps:-10000} \
--validation_steps ${validation_steps:-10000} \
--skip_steps ${skip_steps:-10} \
--save_and_valid_by_epoch ${save_and_valid_by_epoch:-"False"} \
--eval_script ${eval_script:-""} \
--eval_mertrics ${eval_mertrics:-"bleu_1"} \
--random_seed ${random_seed:-"666"} >> $log_dir/lanch.log 2>&1
done
done
done
done
python ./src/utils/extract_eval_res.py --log_dir=$log_dir
exit 0
output_name="seq2seq"
init_model="./model_files/unimo_base_en"
data_path='./data/gigaword'
# hyper params
lr_scheduler="linear_warmup_decay"
use_fp16="False"
# Fuse a layer's AllReduce operations to reduce communication overhead
use_fuse="True"
use_hierarchical_allreduce="True"
loss_scaling=12800
skip_steps=100
save_steps=20000
validation_steps=20000
label_smooth=0.1
weight_decay=0.01
max_seq_len=512
#decoding params
do_decode="true"
max_src_len=128
max_tgt_len=32
max_out_len=32
min_out_len=5
beam_size=5
length_penalty=0.6
block_trigram="False"
use_multi_gpu_test="True"
#adam optimizer
beta1=0.9
beta2=0.98
epsilon=1e-06
#data
tokenized_input="True"
continuous_position="False"
#dataset
train_set="train.tsv"
dev_set="dev.20k.tsv"
pred_set="test.tsv"
test_set="test.tsv"
do_train="true"
do_val="false"
do_test="true"
do_pred="false"
#evaluate
eval_script="bash ./src/eval/tasks/gigaword/eval.sh"
eval_mertrics="rouge-1,rouge-2,rouge-l"
## tuning params
in_tokens="False"
pred_batch_size=32
epoch=10
BATCH_SIZE=("32")
LR_RATE=("3e-5")
DD_RAND_SEED=("1")
WARMUP_PROP=("0.06")
config_path="./model_files/config/unimo_base_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
\ No newline at end of file
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR;pwd`
cd ${MYDIR}/../../../
# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
# check
check_iplist
set -eu
output_dir=../output-gigaword
log_dir=../log-gigaword
mkdir -p $output_dir $log_dir
e_executor=$(echo ${use_experimental_executor-'True'} | tr '[A-Z]' '[a-z]')
use_fuse=$(echo ${use_fuse-'False'} | tr '[A-Z]' '[a-z]')
if [[ ${use_fuse} == "true" ]]; then
#MB
export FLAGS_fuse_parameter_memory_size=64
fi
export DEV_PREFIX=`echo ${dev_set:-"dev.tsv"} | sed 's/\.tsv$//'`
export TEST_PREFIX=`echo ${test_set:-"test.tsv"} | sed 's/\.tsv$//'`
export PRED_PREFIX=`echo ${pred_set:-"pred.tsv"} | sed 's/\.tsv$//'`
export EVAL_SCRIPT_LOG=${MYDIR}/../../../${output_dir}/eval.log
export TASK_DATA_PATH=${data_path}
distributed_args="--node_ips ${PADDLE_TRAINERS} \
--node_id ${PADDLE_TRAINER_ID} \
--current_node_ip ${POD_IP} \
--selected_gpus 0,1,2,3 \
--split_log_path $log_dir \
--nproc_per_node 4"
for random_seed in "${DD_RAND_SEED[@]}"; do
echo "random_seed "${random_seed}
for batch_size in "${BATCH_SIZE[@]}"; do
echo "batch_size "${batch_size}
for warmup_proportion in "${WARMUP_PROP[@]}"; do
echo "warmup_proportion "${warmup_proportion}
for learning_rate in "${LR_RATE[@]}"; do
echo "learning rate "${learning_rate}
python -u ./src/launch.py ${distributed_args} \
./src/run_seq2seq.py --use_cuda "True" \
--is_distributed True \
--use_multi_gpu_test ${use_multi_gpu_test:-"True"} \
--use_fp16 ${use_fp16:-"False"} \
--use_dynamic_loss_scaling ${use_fp16} \
--init_loss_scaling ${loss_scaling:-128} \
--use_fast_executor ${e_executor:-"True"} \
--use_fuse ${use_fuse:-"False"} \
--nccl_comm_num ${nccl_comm_num:-1} \
--use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
--do_train ${do_train:-"true"} \
--do_val ${do_val:-"false"} \
--do_test ${do_test:-"true"} \
--do_pred ${do_pred:-"false"} \
--do_decode ${do_decode:-"True"} \
--train_set ${data_path}/${train_set:-""} \
--dev_set ${data_path}/${dev_set:-""} \
--test_set ${data_path}/${test_set:-""} \
--pred_set ${data_path}/${pred_set:-""} \
--epoch ${epoch} \
--tokenized_input ${tokenized_input:-"True"} \
--task_type ${task_type:-"normal"} \
--max_seq_len ${max_seq_len} \
--max_src_len ${max_src_len} \
--max_tgt_len ${max_tgt_len} \
--max_out_len ${max_out_len} \
--min_out_len ${min_out_len} \
--block_trigram ${block_trigram:-"True"} \
--beam_size ${beam_size:-5} \
--length_penalty ${length_penalty:-0.6} \
--hidden_dropout_prob ${hidden_dropout_prob:-0.1} \
--attention_probs_dropout_prob ${attention_probs_dropout_prob:-0.1} \
--beta1 ${beta1:-0.9} \
--beta2 ${beta2:-0.98} \
--epsilon ${epsilon:-1e-06} \
--continuous_position ${continuous_position:-"false"} \
--tgt_type_id ${tgt_type_id:-1} \
--in_tokens ${in_tokens:-"True"} \
--batch_size ${batch_size} \
--pred_batch_size ${pred_batch_size} \
--learning_rate ${learning_rate} \
--lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
--warmup_proportion ${warmup_proportion:-0.02} \
--weight_decay ${weight_decay:-0.01} \
--weight_sharing ${weight_sharing:-"True"} \
--label_smooth ${label_smooth:-0.1} \
--init_pretraining_params ${init_model:-""} \
--unimo_vocab_file ${vocab_file} \
--encoder_json_file ${bpe_json} \
--vocab_bpe_file ${bpe_file} \
--unimo_config_path ${config_path} \
--checkpoints $output_dir \
--save_steps ${save_steps:-10000} \
--validation_steps ${validation_steps:-10000} \
--skip_steps ${skip_steps:-10} \
--save_and_valid_by_epoch ${save_and_valid_by_epoch:-"False"} \
--eval_script ${eval_script:-""} \
--eval_mertrics ${eval_mertrics:-"bleu"} \
--random_seed ${random_seed:-"1"} >> $log_dir/lanch.log 2>&1
done
done
done
done
python ./src/utils/extract_eval_res.py --log_dir=$log_dir
exit 0
output_name="seq2seq"
init_model="./model_files/unimo_large_en"
data_path='./data/gigaword'
# hyper params
lr_scheduler="linear_warmup_decay"
use_fp16="False"
# Fuse a layer's AllReduce operations to reduce communication overhead
use_fuse="True"
use_hierarchical_allreduce="True"
loss_scaling=12800
skip_steps=100
save_steps=20000
validation_steps=20000
label_smooth=0.1
weight_decay=0.01
max_seq_len=512
#decoding params
do_decode="true"
max_src_len=128
max_tgt_len=32
max_out_len=32
min_out_len=5
beam_size=6
length_penalty=1.2
block_trigram="false"
use_multi_gpu_test="True"
#adam optimizer
beta1=0.9
beta2=0.98
epsilon=1e-06
#data
tokenized_input="True"
continuous_position="False"
#dataset
train_set="train.tsv"
dev_set="dev.20k.tsv"
pred_set="test.tsv"
test_set="test.tsv"
do_train="true"
do_val="false"
do_test="true"
do_pred="false"
#evaluate
eval_script="bash ./src/eval/tasks/gigaword/eval.sh"
eval_mertrics="rouge-1,rouge-2,rouge-l"
## tuning params
in_tokens="False"
pred_batch_size=32
epoch=10
BATCH_SIZE=("32")
LR_RATE=("3e-5")
DD_RAND_SEED=("1")
WARMUP_PROP=("0.06")
config_path="./model_files/config/unimo_large_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
\ No newline at end of file
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR;pwd`
cd ${MYDIR}/../../../
# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
# check
check_iplist
set -eu
output_dir=../output-gigaword_large
log_dir=../log-gigaword_large
mkdir -p $output_dir $log_dir
e_executor=$(echo ${use_experimental_executor-'True'} | tr '[A-Z]' '[a-z]')
use_fuse=$(echo ${use_fuse-'False'} | tr '[A-Z]' '[a-z]')
if [[ ${use_fuse} == "true" ]]; then
#MB
export FLAGS_fuse_parameter_memory_size=64
fi
export DEV_PREFIX=`echo ${dev_set:-"dev.tsv"} | sed 's/\.tsv$//'`
export TEST_PREFIX=`echo ${test_set:-"test.tsv"} | sed 's/\.tsv$//'`
export PRED_PREFIX=`echo ${pred_set:-"pred.tsv"} | sed 's/\.tsv$//'`
export EVAL_SCRIPT_LOG=${MYDIR}/../../../${output_dir}/eval.log
export TASK_DATA_PATH=${data_path}
distributed_args="--node_ips ${PADDLE_TRAINERS} \
--node_id ${PADDLE_TRAINER_ID} \
--current_node_ip ${POD_IP} \
--selected_gpus 0,1,2,3 \
--split_log_path $log_dir \
--nproc_per_node 4"
for random_seed in "${DD_RAND_SEED[@]}"; do
echo "random_seed "${random_seed}
for batch_size in "${BATCH_SIZE[@]}"; do
echo "batch_size "${batch_size}
for warmup_proportion in "${WARMUP_PROP[@]}"; do
echo "warmup_proportion "${warmup_proportion}
for learning_rate in "${LR_RATE[@]}"; do
echo "learning rate "${learning_rate}
python -u ./src/launch.py ${distributed_args} \
./src/run_seq2seq.py --use_cuda "True" \
--is_distributed True \
--use_multi_gpu_test ${use_multi_gpu_test:-"True"} \
--use_fp16 ${use_fp16:-"False"} \
--use_dynamic_loss_scaling ${use_fp16} \
--init_loss_scaling ${loss_scaling:-128} \
--use_fast_executor ${e_executor:-"True"} \
--use_fuse ${use_fuse:-"False"} \
--nccl_comm_num ${nccl_comm_num:-1} \
--use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
--do_train ${do_train:-"true"} \
--do_val ${do_val:-"false"} \
--do_test ${do_test:-"true"} \
--do_pred ${do_pred:-"false"} \
--do_decode ${do_decode:-"True"} \
--train_set ${data_path}/${train_set:-""} \
--dev_set ${data_path}/${dev_set:-""} \
--test_set ${data_path}/${test_set:-""} \
--pred_set ${data_path}/${pred_set:-""} \
--epoch ${epoch} \
--tokenized_input ${tokenized_input:-"True"} \
--task_type ${task_type:-"normal"} \
--max_seq_len ${max_seq_len} \
--max_src_len ${max_src_len} \
--max_tgt_len ${max_tgt_len} \
--max_out_len ${max_out_len} \
--min_out_len ${min_out_len} \
--block_trigram ${block_trigram:-"True"} \
--beam_size ${beam_size:-5} \
--length_penalty ${length_penalty:-0.6} \
--hidden_dropout_prob ${hidden_dropout_prob:-0.1} \
--attention_probs_dropout_prob ${attention_probs_dropout_prob:-0.1} \
--beta1 ${beta1:-0.9} \
--beta2 ${beta2:-0.98} \
--epsilon ${epsilon:-1e-06} \
--continuous_position ${continuous_position:-"false"} \
--tgt_type_id ${tgt_type_id:-1} \
--in_tokens ${in_tokens:-"True"} \
--batch_size ${batch_size} \
--pred_batch_size ${pred_batch_size} \
--learning_rate ${learning_rate} \
--lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
--warmup_proportion ${warmup_proportion:-0.02} \
--weight_decay ${weight_decay:-0.01} \
--weight_sharing ${weight_sharing:-"True"} \
--label_smooth ${label_smooth:-0.1} \
--init_pretraining_params ${init_model:-""} \
--unimo_vocab_file ${vocab_file} \
--encoder_json_file ${bpe_json} \
--vocab_bpe_file ${bpe_file} \
--unimo_config_path ${config_path} \
--checkpoints $output_dir \
--save_steps ${save_steps:-10000} \
--validation_steps ${validation_steps:-10000} \
--skip_steps ${skip_steps:-10} \
--save_and_valid_by_epoch ${save_and_valid_by_epoch:-"False"} \
--eval_script ${eval_script:-""} \
--eval_mertrics ${eval_mertrics:-"bleu"} \
--random_seed ${random_seed:-"1"} >> $log_dir/lanch.log 2>&1
done
done
done
done
python ./src/utils/extract_eval_res.py --log_dir=$log_dir
exit 0
output_name="seq2seq"
init_model="./model_files/unimo_base_en"
data_path='./data/squad_qg'
# hyper param
lr_scheduler="linear_warmup_decay"
use_fp16="False"
# Fuse a layer's AllReduce operations to reduce communication overhead
use_fuse="True"
use_hierarchical_allreduce="True"
loss_scaling=12800
skip_steps=100
save_steps=5000
validation_steps=5000
label_smooth=0.1
weight_decay=0.01
max_seq_len=512
random_seed=666
#decoding params
do_decode="true"
max_src_len=416
max_tgt_len=96
max_out_len=48
min_out_len=5
beam_size=5
length_penalty=1.0
block_trigram="false"
use_multi_gpu_test="True"
#adam optimizer
beta1=0.9
beta2=0.98
epsilon=1e-06
#data
tokenized_input="True"
continuous_position="False"
#dataset
train_set="train.tsv"
dev_set="dev.tsv"
test_set="test.tsv"
pred_set="test.tsv"
do_train="true"
do_val="false"
do_test="true"
do_pred="false"
#evaluate
eval_script="bash ./src/eval/tasks/squad_qg/eval.sh"
eval_mertrics="Bleu_4,METEOR,ROUGE_L"
## tuning params
in_tokens="False"
pred_batch_size=8
epoch=20
BATCH_SIZE=("8")
LR_RATE=("1.25e-5")
DD_RAND_SEED=("1")
WARMUP_PROP=("0.06")
config_path="./model_files/config/unimo_base_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
\ No newline at end of file
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR;pwd`
cd ${MYDIR}/../../../
# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
# check
check_iplist
set -eu
output_dir=../output-squad_qg
log_dir=../log-squad_qg
mkdir -p $output_dir $log_dir
e_executor=$(echo ${use_experimental_executor-'True'} | tr '[A-Z]' '[a-z]')
use_fuse=$(echo ${use_fuse-'False'} | tr '[A-Z]' '[a-z]')
if [[ ${use_fuse} == "true" ]]; then
#MB
export FLAGS_fuse_parameter_memory_size=64
fi
export DEV_PREFIX=`echo ${dev_set:-"dev.tsv"} | sed 's/\.tsv$//'`
export TEST_PREFIX=`echo ${test_set:-"test.tsv"} | sed 's/\.tsv$//'`
export PRED_PREFIX=`echo ${pred_set:-"pred.tsv"} | sed 's/\.tsv$//'`
export EVAL_SCRIPT_LOG=${MYDIR}/../../../${output_dir}/eval.log
export TASK_DATA_PATH=${data_path}
distributed_args="--node_ips ${PADDLE_TRAINERS} \
--node_id ${PADDLE_TRAINER_ID} \
--current_node_ip ${POD_IP} \
--selected_gpus 0,1,2,3 \
--split_log_path $log_dir \
--nproc_per_node 4"
for random_seed in "${DD_RAND_SEED[@]}"; do
echo "random_seed "${random_seed}
for batch_size in "${BATCH_SIZE[@]}"; do
echo "batch_size "${batch_size}
for warmup_proportion in "${WARMUP_PROP[@]}"; do
echo "warmup_proportion "${warmup_proportion}
for learning_rate in "${LR_RATE[@]}"; do
echo "learning rate "${learning_rate}
python -u ./src/launch.py ${distributed_args} \
./src/run_seq2seq.py --use_cuda "True" \
--is_distributed "True" \
--use_multi_gpu_test ${use_multi_gpu_test:-"True"} \
--use_fp16 ${use_fp16:-"False"} \
--use_dynamic_loss_scaling ${use_fp16} \
--init_loss_scaling ${loss_scaling:-128} \
--use_fast_executor ${e_executor:-"True"} \
--use_fuse ${use_fuse:-"False"} \
--nccl_comm_num ${nccl_comm_num:-1} \
--use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
--do_train ${do_train:-"true"} \
--do_val ${do_val:-"false"} \
--do_test ${do_test:-"true"} \
--do_pred ${do_pred:-"false"} \
--do_decode ${do_decode:-"True"} \
--train_set ${data_path}/${train_set:-""} \
--dev_set ${data_path}/${dev_set:-""} \
--test_set ${data_path}/${test_set:-""} \
--pred_set ${data_path}/${pred_set:-""} \
--epoch ${epoch} \
--tokenized_input ${tokenized_input:-"True"} \
--task_type ${task_type:-"normal"} \
--max_seq_len ${max_seq_len} \
--max_src_len ${max_src_len} \
--max_tgt_len ${max_tgt_len} \
--max_out_len ${max_out_len} \
--min_out_len ${min_out_len} \
--block_trigram ${block_trigram:-"True"} \
--beam_size ${beam_size:-5} \
--length_penalty ${length_penalty:-0.6} \
--hidden_dropout_prob ${hidden_dropout_prob:-0.1} \
--attention_probs_dropout_prob ${attention_probs_dropout_prob:-0.1} \
--beta1 ${beta1:-0.9} \
--beta2 ${beta2:-0.98} \
--epsilon ${epsilon:-1e-06} \
--continuous_position ${continuous_position:-"false"} \
--tgt_type_id ${tgt_type_id:-1}\
--batch_size ${batch_size} \
--pred_batch_size ${pred_batch_size} \
--in_tokens ${in_tokens:-"True"} \
--learning_rate ${learning_rate} \
--lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
--warmup_proportion ${warmup_proportion:-0.02} \
--weight_decay ${weight_decay:-0.01} \
--weight_sharing ${weight_sharing:-"True"} \
--label_smooth ${label_smooth:-0.1} \
--init_pretraining_params ${init_model:-""} \
--unimo_vocab_file ${vocab_file} \
--encoder_json_file ${bpe_json} \
--vocab_bpe_file ${bpe_file} \
--unimo_config_path ${config_path} \
--checkpoints $output_dir \
--save_steps ${save_steps:-10000} \
--validation_steps ${validation_steps:-10000} \
--skip_steps ${skip_steps:-10} \
--save_and_valid_by_epoch ${save_and_valid_by_epoch:-"False"} \
--eval_script ${eval_script:-""} \
--eval_mertrics ${eval_mertrics:-"bleu"} \
--random_seed ${random_seed:-"1"} >> $log_dir/lanch.log 2>&1
done
done
done
done
python ./src/utils/extract_eval_res.py --log_dir=$log_dir
exit 0
output_name="seq2seq"
init_model="./model_files/unimo_large_en"
data_path='./data/squad_qg'
# hyper param
lr_scheduler="linear_warmup_decay"
use_fp16="False"
# Fuse a layer's AllReduce operations to reduce communication overhead
use_fuse="True"
use_hierarchical_allreduce="True"
loss_scaling=12800
skip_steps=100
save_steps=5000
validation_steps=5000
label_smooth=0.1
weight_decay=0.01
max_seq_len=512
random_seed=666
#decoding params
do_decode="true"
max_src_len=416
max_tgt_len=96
max_out_len=48
min_out_len=5
beam_size=6
length_penalty=1.2
block_trigram="false"
use_multi_gpu_test="True"
#adam optimizer
beta1=0.9
beta2=0.98
epsilon=1e-06
#data
tokenized_input="True"
continuous_position="False"
#dataset
train_set="train.tsv"
dev_set="dev.tsv"
test_set="test.tsv"
pred_set="test.tsv"
do_train="true"
do_val="false"
do_test="true"
do_pred="false"
#evaluate
eval_script="bash ./src/eval/tasks/squad_qg/eval.sh"
eval_mertrics="Bleu_4,METEOR,ROUGE_L"
## tuning params
in_tokens="False"
pred_batch_size=8
epoch=20
BATCH_SIZE=("8")
LR_RATE=("5e-6")
DD_RAND_SEED=("1")
WARMUP_PROP=("0.06")
config_path="./model_files/config/unimo_large_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
\ No newline at end of file
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR;pwd`
cd ${MYDIR}/../../../
# config env
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
# check
check_iplist
set -eu
output_dir=../output-squad_qg_large
log_dir=../log-squad_qg_large
mkdir -p $output_dir $log_dir
e_executor=$(echo ${use_experimental_executor-'True'} | tr '[A-Z]' '[a-z]')
use_fuse=$(echo ${use_fuse-'False'} | tr '[A-Z]' '[a-z]')
if [[ ${use_fuse} == "true" ]]; then
#MB
export FLAGS_fuse_parameter_memory_size=64
fi
export DEV_PREFIX=`echo ${dev_set:-"dev.tsv"} | sed 's/\.tsv$//'`
export TEST_PREFIX=`echo ${test_set:-"test.tsv"} | sed 's/\.tsv$//'`
export PRED_PREFIX=`echo ${pred_set:-"pred.tsv"} | sed 's/\.tsv$//'`
export EVAL_SCRIPT_LOG=${MYDIR}/../../../${output_dir}/eval.log
export TASK_DATA_PATH=${data_path}
distributed_args="--node_ips ${PADDLE_TRAINERS} \
--node_id ${PADDLE_TRAINER_ID} \
--current_node_ip ${POD_IP} \
--selected_gpus 0,1,2,3 \
--split_log_path $log_dir \
--nproc_per_node 4"
for random_seed in "${DD_RAND_SEED[@]}"; do
echo "random_seed "${random_seed}
for batch_size in "${BATCH_SIZE[@]}"; do
echo "batch_size "${batch_size}
for warmup_proportion in "${WARMUP_PROP[@]}"; do
echo "warmup_proportion "${warmup_proportion}
for learning_rate in "${LR_RATE[@]}"; do
echo "learning rate "${learning_rate}
python -u ./src/launch.py ${distributed_args} \
./src/run_seq2seq.py --use_cuda "True" \
--is_distributed "True" \
--use_multi_gpu_test ${use_multi_gpu_test:-"True"} \
--use_fp16 ${use_fp16:-"False"} \
--use_dynamic_loss_scaling ${use_fp16} \
--init_loss_scaling ${loss_scaling:-128} \
--use_fast_executor ${e_executor:-"True"} \
--use_fuse ${use_fuse:-"False"} \
--nccl_comm_num ${nccl_comm_num:-1} \
--use_hierarchical_allreduce ${use_hierarchical_allreduce:-"False"} \
--do_train ${do_train:-"true"} \
--do_val ${do_val:-"false"} \
--do_test ${do_test:-"true"} \
--do_pred ${do_pred:-"false"} \
--do_decode ${do_decode:-"True"} \
--train_set ${data_path}/${train_set:-""} \
--dev_set ${data_path}/${dev_set:-""} \
--test_set ${data_path}/${test_set:-""} \
--pred_set ${data_path}/${pred_set:-""} \
--epoch ${epoch} \
--tokenized_input ${tokenized_input:-"True"} \
--task_type ${task_type:-"normal"} \
--max_seq_len ${max_seq_len} \
--max_src_len ${max_src_len} \
--max_tgt_len ${max_tgt_len} \
--max_out_len ${max_out_len} \
--min_out_len ${min_out_len} \
--block_trigram ${block_trigram:-"True"} \
--beam_size ${beam_size:-5} \
--length_penalty ${length_penalty:-0.6} \
--hidden_dropout_prob ${hidden_dropout_prob:-0.1} \
--attention_probs_dropout_prob ${attention_probs_dropout_prob:-0.1} \
--beta1 ${beta1:-0.9} \
--beta2 ${beta2:-0.98} \
--epsilon ${epsilon:-1e-06} \
--continuous_position ${continuous_position:-"false"} \
--tgt_type_id ${tgt_type_id:-1}\
--batch_size ${batch_size} \
--pred_batch_size ${pred_batch_size} \
--in_tokens ${in_tokens:-"True"} \
--learning_rate ${learning_rate} \
--lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
--warmup_proportion ${warmup_proportion:-0.02} \
--weight_decay ${weight_decay:-0.01} \
--weight_sharing ${weight_sharing:-"True"} \
--label_smooth ${label_smooth:-0.1} \
--init_pretraining_params ${init_model:-""} \
--unimo_vocab_file ${vocab_file} \
--encoder_json_file ${bpe_json} \
--vocab_bpe_file ${bpe_file} \
--unimo_config_path ${config_path} \
--checkpoints $output_dir \
--save_steps ${save_steps:-10000} \
--validation_steps ${validation_steps:-10000} \
--skip_steps ${skip_steps:-10} \
--save_and_valid_by_epoch ${save_and_valid_by_epoch:-"False"} \
--eval_script ${eval_script:-""} \
--eval_mertrics ${eval_mertrics:-"bleu"} \
--random_seed ${random_seed:-"1"} >> $log_dir/lanch.log 2>&1
done
done
done
done
python ./src/utils/extract_eval_res.py --log_dir=$log_dir
exit 0
output_name="visual_entailment"
task=SNLI-VE
bbox="bbox100"
weight_decay=0
max_len=512
warmup_ratio=0.06
eval_mertrics=simple_accuracy
do_train="True"
do_val="True"
do_test="True"
do_test_hard="False"
test_batch_size=24
save_checkpoints="False"
save_steps=2000
validation_steps=1000
EPOCH=("10")
BATCH_SIZE=("12")
LR_RATE=("1e-5")
DD_RAND_SEED=("1")
init_model="./model_files/unimo_base_en"
config_path="./model_files/config/unimo_base_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR;pwd`
cd ${MYDIR}/../../../
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
check_iplist
export FLAGS_fuse_parameter_memory_size=64
output_dir=./output/${task}
log_dir=${output_dir}/log
eval_dir=${output_dir}/tmp
save_model_base_dir=$output_dir/save_model
mkdir -p $output_dir $log_dir $eval_dir $save_model_base_dir
for seed in "${DD_RAND_SEED[@]}"; do
echo "seed "$seed
for epoch in "${EPOCH[@]}"; do
echo "epoch "$epoch
for lr in "${LR_RATE[@]}"; do
echo "learning rate "$lr
for bs in "${BATCH_SIZE[@]}"; do
echo "batch_size "$bs
log_prefix=$seed"_"$epoch"_"$lr"_"$bs"."
eval_dir="${output_dir}/tmp/params.${seed}.${epoch}.${lr}.${bs}"
mkdir -p $eval_dir
if [[ ${bs} == "32" ]]; then
validation_steps=2000
fi
if [[ ${save_checkpoints} == "True" ]]; then
save_model_dir="${save_model_base_dir}/params.${seed}.${epoch}.${lr}.${bs}"
mkdir -p $save_model_dir
fi
distributed_args="--node_ips ${PADDLE_TRAINERS} \
--node_id ${PADDLE_TRAINER_ID} \
--current_node_ip ${POD_IP} \
--selected_gpus 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 \
--split_log_path $log_dir \
--log_prefix $log_prefix \
--nproc_per_node 16"
python -u ./src/launch.py ${distributed_args} \
./src/run_visual_entailment.py --use_cuda "True" \
--is_distributed ${is_distributed:-"True"} \
--weight_sharing ${weight_sharing:-"True"} \
--use_fuse ${use_fuse:-"True"} \
--use_fast_executor ${e_executor:-"true"} \
--use_fp16 ${use_fp16:-"false"} \
--nccl_comm_num ${nccl_comm_num:-1} \
--use_hierarchical_allreduce ${use_hierarchical_allreduce:-"True"} \
--in_tokens ${in_tokens:-"false"} \
--use_dynamic_loss_scaling ${use_fp16:-"false"} \
--init_loss_scaling ${loss_scaling:-12800} \
--beta1 ${beta1:-0.9} \
--beta2 ${beta2:-0.999} \
--epsilon ${epsilon:-1e-06} \
--verbose true \
--do_train ${do_train:-"True"} \
--do_val ${do_val:-"True"} \
--do_test ${do_test:-"True"} \
--do_test_hard ${do_test_hard:-"True"} \
--num_train_examples ${num_train_examples:-529527} \
--adv_step ${adv_step:-4} \
--adv_lr ${adv_lr:-0.05} \
--norm_type ${norm_type:-"l2"} \
--adv_max_norm ${adv_max_norm:-0.4} \
--adv_init_mag ${adv_init_mag:-0.4} \
--batch_size ${bs:-16} \
--test_batch_size ${test_batch_size:-16} \
--init_pretraining_params ${init_model:-""} \
--train_filelist "./data/SNLI-VE/$bbox/train_filelist" \
--dev_filelist "./data/SNLI-VE/$bbox/dev_filelist" \
--test_filelist "./data/SNLI-VE/$bbox/test_filelist" \
--test_hard_filelist ${test_hard_filelist:-""} \
--checkpoints ${save_model_dir:-""} \
--save_checkpoints ${save_checkpoints:-"True"} \
--save_steps ${save_steps:-1000} \
--weight_decay ${weight_decay:-"0.1"} \
--warmup_proportion ${warmup_ratio:-"0.06"} \
--validation_steps ${validation_steps:-"100"} \
--epoch $epoch \
--max_seq_len ${max_len:-512} \
--max_img_len ${max_img_len:-101} \
--learning_rate ${lr:-"5e-5"} \
--lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
--skip_steps ${skip_steps:-"100"} \
--num_iteration_per_drop_scope 10 \
--num_labels ${num_labels:-3} \
--unimo_vocab_file ${vocab_file} \
--encoder_json_file ${bpe_json} \
--vocab_bpe_file ${bpe_file} \
--unimo_config_path ${config_path} \
--eval_mertrics ${eval_mertrics:-"simple_accuracy"} \
--eval_dir ${eval_dir:-"./output/tmp"} \
--random_seed ${seed:-1} >> $log_dir/${log_prefix}lanch.log 2>&1
done
done
done
done
if [[ $? -ne 0 ]]; then
echo "run failed"
exit 1
fi
python ./src/utils/stat_res.py --log_dir=$log_dir --key_words=job.log.0
exit 0
output_name="visual_entailment"
task=SNLI-VE_large
bbox="bbox100"
weight_decay=0
max_len=512
warmup_ratio=0.06
eval_mertrics=simple_accuracy
do_train="True"
do_val="True"
do_test="True"
do_test_hard="False"
test_batch_size=8
save_checkpoints="False"
save_steps=2000
validation_steps=1000
EPOCH=("10")
BATCH_SIZE=("4")
LR_RATE=("1e-5")
DD_RAND_SEED=("1")
init_model="./model_files/unimo_large_en"
config_path="./model_files/config/unimo_large_en.json"
vocab_file="./model_files/dict/unimo_en.vocab.txt"
bpe_json="./model_files/dict/unimo_en.encoder.json"
bpe_file="./model_files/dict/unimo_en.vocab.bpe"
#!/usr/bin/env bash
set -eux
R_DIR=`dirname $0`; MYDIR=`cd $R_DIR;pwd`
cd ${MYDIR}/../../../
source ${MYDIR}/model_conf
source ./env.sh
source ./utils.sh
check_iplist
export FLAGS_fuse_parameter_memory_size=64
output_dir=./output/${task}
log_dir=${output_dir}/log
eval_dir=${output_dir}/tmp
save_model_base_dir=$output_dir/save_model
mkdir -p $output_dir $log_dir $eval_dir $save_model_base_dir
for seed in "${DD_RAND_SEED[@]}"; do
echo "seed "$seed
for epoch in "${EPOCH[@]}"; do
echo "epoch "$epoch
for lr in "${LR_RATE[@]}"; do
echo "learning rate "$lr
for bs in "${BATCH_SIZE[@]}"; do
echo "batch_size "$bs
log_prefix=$seed"_"$epoch"_"$lr"_"$bs"."
eval_dir="${output_dir}/tmp/params.${seed}.${epoch}.${lr}.${bs}"
mkdir -p $eval_dir
if [[ ${bs} == "32" ]]; then
validation_steps=2000
fi
if [[ ${save_checkpoints} == "True" ]]; then
save_model_dir="${save_model_base_dir}/params.${seed}.${epoch}.${lr}.${bs}"
mkdir -p $save_model_dir
fi
distributed_args="--node_ips ${PADDLE_TRAINERS} \
--node_id ${PADDLE_TRAINER_ID} \
--current_node_ip ${POD_IP} \
--selected_gpus 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 \
--split_log_path $log_dir \
--log_prefix $log_prefix \
--nproc_per_node 16"
python -u ./src/launch.py ${distributed_args} \
./src/run_visual_entailment.py --use_cuda "True" \
--is_distributed ${is_distributed:-"True"} \
--weight_sharing ${weight_sharing:-"True"} \
--use_fuse ${use_fuse:-"True"} \
--use_fast_executor ${e_executor:-"true"} \
--use_fp16 ${use_fp16:-"false"} \
--nccl_comm_num ${nccl_comm_num:-1} \
--use_hierarchical_allreduce ${use_hierarchical_allreduce:-"True"} \
--in_tokens ${in_tokens:-"false"} \
--use_dynamic_loss_scaling ${use_fp16:-"false"} \
--init_loss_scaling ${loss_scaling:-12800} \
--beta1 ${beta1:-0.9} \
--beta2 ${beta2:-0.999} \
--epsilon ${epsilon:-1e-06} \
--verbose true \
--do_train ${do_train:-"True"} \
--do_val ${do_val:-"True"} \
--do_test ${do_test:-"True"} \
--do_test_hard ${do_test_hard:-"True"} \
--num_train_examples ${num_train_examples:-529527} \
--adv_step ${adv_step:-4} \
--adv_lr ${adv_lr:-0.05} \
--norm_type ${norm_type:-"l2"} \
--adv_max_norm ${adv_max_norm:-0.4} \
--adv_init_mag ${adv_init_mag:-0.4} \
--batch_size ${bs:-16} \
--test_batch_size ${test_batch_size:-16} \
--init_pretraining_params ${init_model:-""} \
--train_filelist "./data/SNLI-VE/$bbox/train_filelist" \
--dev_filelist "./data/SNLI-VE/$bbox/dev_filelist" \
--test_filelist "./data/SNLI-VE/$bbox/test_filelist" \
--test_hard_filelist ${test_hard_filelist:-""} \
--checkpoints ${save_model_dir:-""} \
--save_checkpoints ${save_checkpoints:-"True"} \
--save_steps ${save_steps:-1000} \
--weight_decay ${weight_decay:-"0.1"} \
--warmup_proportion ${warmup_ratio:-"0.06"} \
--validation_steps ${validation_steps:-"100"} \
--epoch $epoch \
--max_seq_len ${max_len:-512} \
--max_img_len ${max_img_len:-101} \
--learning_rate ${lr:-"5e-5"} \
--lr_scheduler ${lr_scheduler:-"linear_warmup_decay"} \
--skip_steps ${skip_steps:-"100"} \
--num_iteration_per_drop_scope 10 \
--num_labels ${num_labels:-3} \
--unimo_vocab_file ${vocab_file} \
--encoder_json_file ${bpe_json} \
--vocab_bpe_file ${bpe_file} \
--unimo_config_path ${config_path} \
--eval_mertrics ${eval_mertrics:-"simple_accuracy"} \
--eval_dir ${eval_dir:-"./output/tmp"} \
--random_seed ${seed:-1} >> $log_dir/${log_prefix}lanch.log 2>&1
done
done
done
done
if [[ $? -ne 0 ]]; then
echo "run failed"
exit 1
fi
python ./src/utils/stat_res.py --log_dir=$log_dir --key_words=job.log.0
exit 0
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""args for visual_entailment task"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
from utils.args import ArgumentGroup
# yapf: disable
parser = argparse.ArgumentParser(__doc__)
model_g = ArgumentGroup(parser, "model", "model configuration and paths.")
model_g.add_arg("init_checkpoint", str, None, "Init checkpoint to resume training from.")
model_g.add_arg("init_pretraining_params", str, None,
"Init pre-training params which preforms fine-tuning from. If the "
"arg 'init_checkpoint' has been set, this argument wouldn't be valid.")
model_g.add_arg("checkpoints", str, "checkpoints", "Path to save checkpoints.")
model_g.add_arg("save_checkpoints", bool, True, "Whether to save checkpoints")
model_g.add_arg("weight_sharing", bool, True, "If set, share weights between word embedding and masked lm.")
model_g.add_arg("unimo_vocab_file", str, './model_files/dict/unimo_en.vocab.txt', "unimo vocab")
model_g.add_arg("encoder_json_file", str, './model_files/dict/unimo_en.encoder.json', 'bpt map')
model_g.add_arg("vocab_bpe_file", str, './model_files/dict/unimo_en.vocab.bpe', "vocab bpe")
model_g.add_arg("unimo_config_path", str, "./model_files/config/unimo_base_en.json",
"The file to save unimo configuration.")
train_g = ArgumentGroup(parser, "training", "training options.")
train_g.add_arg("epoch", int, 3, "Number of epoches for fine-tuning.")
train_g.add_arg("learning_rate", float, 5e-5, "Learning rate used to train with warmup.")
train_g.add_arg("lr_scheduler", str, "linear_warmup_decay",
"scheduler of learning rate.", choices=['linear_warmup_decay', 'noam_decay'])
train_g.add_arg("weight_decay", float, 0.01, "Weight decay rate for L2 regularizer.")
train_g.add_arg("warmup_proportion", float, 0.1,
"Proportion of training steps to perform linear learning rate warmup for.")
train_g.add_arg("save_steps", int, 10000, "The steps interval to save checkpoints.")
train_g.add_arg("validation_steps", int, 1000, "The steps interval to evaluate model performance.")
train_g.add_arg("nccl_comm_num", int, 1, "NCCL comm num.")
train_g.add_arg("hierarchical_allreduce_inter_nranks", int, 8, "Hierarchical allreduce inter ranks.")
train_g.add_arg("use_hierarchical_allreduce", bool, False, "Use hierarchical allreduce or not.")
train_g.add_arg("use_fp16", bool, False, "Whether to use fp16 mixed precision training.")
train_g.add_arg("use_dynamic_loss_scaling", bool, False, "Whether to use dynamic loss scaling.")
train_g.add_arg("init_loss_scaling", float, 1.0,
"Loss scaling factor for mixed precision training, only valid when use_fp16 is enabled.")
train_g.add_arg("incr_every_n_steps", int, 100, "Increases loss scaling every n consecutive.")
train_g.add_arg("decr_every_n_nan_or_inf", int, 2,
"Decreases loss scaling every n accumulated steps with nan or inf gradients.")
train_g.add_arg("incr_ratio", float, 2.0,
"The multiplier to use when increasing the loss scaling.")
train_g.add_arg("decr_ratio", float, 0.8,
"The less-than-one-multiplier to use when decreasing.")
train_g.add_arg("use_fuse", bool, False, "Whether to use fuse_allreduce_ops.")
# args for villa adv_lr, norm_type, adv_max_norm, adv_init_mag
train_g.add_arg("adv_step", int, 4, "adv_step")
train_g.add_arg("adv_lr", float, 0.05, "adv_lr")
train_g.add_arg("norm_type", str, 'l2', "norm_type")
train_g.add_arg("adv_max_norm", float, 0.4, "adv_max_norm")
train_g.add_arg("adv_init_mag", float, 0.4, "adv_init_mag")
# args for adam optimizer
train_g.add_arg("beta1", float, 0.9, "beta1 for adam")
train_g.add_arg("beta2", float, 0.98, "beta2 for adam.")
train_g.add_arg("epsilon", float, 1e-06, "epsilon for adam.")
log_g = ArgumentGroup(parser, "logging", "logging related.")
log_g.add_arg("skip_steps", int, 10, "The steps interval to print loss.")
log_g.add_arg("verbose", bool, False, "Whether to output verbose log.")
log_g.add_arg("eval_dir", str, "", "eval_dir to save tmp data")
data_g = ArgumentGroup(parser, "data", "Data paths, vocab paths and data processing options")
data_g.add_arg("train_filelist", str, None, "Path to training data.")
data_g.add_arg("test_filelist", str, None, "Path to test data.")
data_g.add_arg("test_hard_filelist", str, None, "Path to test_hard data.")
data_g.add_arg("dev_filelist", str, None, "Path to validation data.")
data_g.add_arg("max_seq_len", int, 512, "Number of words of the longest seqence.")
data_g.add_arg("max_img_len", int, 37, "Image feature size==2048.")
data_g.add_arg("num_train_examples", int, 0, "num_train_examples")
data_g.add_arg("batch_size", int, 32, "Total examples' number in batch for training. see also --in_tokens.")
data_g.add_arg("test_batch_size", int, 24, "Total examples' number in batch for training. see also --in_tokens.")
data_g.add_arg("in_tokens", bool, False,
"If set, the batch size will be the maximum number of tokens in one batch. "
"Otherwise, it will be the maximum number of examples in one batch.")
data_g.add_arg("do_lower_case", bool, True,
"Whether to lower case the input text. Should be True for uncased models and False for cased models.")
data_g.add_arg("random_seed", int, 0, "Random seed.")
data_g.add_arg("num_labels", int, 3, "label number")
run_type_g = ArgumentGroup(parser, "run_type", "running type options.")
run_type_g.add_arg("use_cuda", bool, True, "If set, use GPU for training.")
run_type_g.add_arg("is_distributed", bool, False, "If set, then start distributed training.")
run_type_g.add_arg("use_fast_executor", bool, False, "If set, use fast parallel executor (in experiment).")
run_type_g.add_arg("num_iteration_per_drop_scope", int, 10, "Iteration intervals to drop scope.")
run_type_g.add_arg("do_train", bool, True, "Whether to perform training.")
run_type_g.add_arg("do_val", bool, True, "Whether to perform evaluation on dev data set.")
run_type_g.add_arg("do_test", bool, True, "Whether to perform evaluation on test data set.")
run_type_g.add_arg("do_test_hard", bool, False, "Whether to perform evaluation on test data set.")
run_type_g.add_arg("eval_mertrics", str, "simple_accuracy", "eval_mertrics")
# yapf: enable
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Model for visual_entailment."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import glob
import time
import numpy as np
import paddle.fluid as fluid
from model.unimo_finetune import UNIMOModel
from eval import glue_eval
from collections import OrderedDict
from utils.utils import print_eval_log
def kl_divergence_with_logits(q_logits, p_logits):
"""
symmetric KL-divergence (See SMART, Sec 3.1)
q_logits: logits
p_logits: delta_logits
"""
q = fluid.layers.softmax(input=q_logits)
p = fluid.layers.softmax(input=p_logits)
kl_qp = fluid.layers.reduce_sum(q * (fluid.layers.log(q) - fluid.layers.log(p)), -1)
kl_pq = fluid.layers.reduce_sum(p * (fluid.layers.log(p) - fluid.layers.log(q)), -1)
vat_loss = fluid.layers.mean(x=kl_qp+kl_pq)
return vat_loss
def create_model(args, config, pyreader_name="train_reader", is_train=True):
"""create_model"""
shapes = [[-1, args.max_seq_len, 1], # src_ids
[-1, args.max_seq_len, 1], # pos_ids
[-1, args.max_seq_len, 1], # sent_ids
[-1, args.max_img_len + args.max_seq_len, args.max_img_len + args.max_seq_len], # input_mask
[-1, args.max_img_len, 1], # v_mask
[-1, args.max_seq_len, 1], # t_mask
[-1, args.max_img_len, config["image_embedding_size"]], # image_embedding
[-1, args.max_img_len, 5], # image_loc
[-1, 1] # labels
]
dtypes = ['int64', 'int64', 'int64', 'float32', 'float32', 'float32', 'float32','float32', 'int64']
lod_levels = [0, 0, 0, 0, 0, 0, 0, 0, 0]
pyreader = fluid.layers.py_reader(
capacity=70,
shapes=shapes,
dtypes=dtypes,
lod_levels=lod_levels,
name=pyreader_name,
use_double_buffer=True)
(src_ids, pos_ids, sent_ids, input_mask, v_mask, t_mask, image_embedding, image_loc, labels) \
= fluid.layers.read_file(pyreader)
emb_ids = {"word_embedding": src_ids, "sent_embedding": sent_ids, "pos_embedding": pos_ids}
image_input = {"image_embedding": image_embedding, "loc_embedding": image_loc}
adv_step, adv_lr, norm_type, adv_max_norm, adv_init_mag = \
args.adv_step, args.adv_lr, args.norm_type, args.adv_max_norm, args.adv_init_mag
assert adv_step > 0 and adv_init_mag > 0
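# VILLA-style adversarial training: random perturbations are added to the text and image
# embeddings, refined by PGD for adv_step iterations, and regularized with symmetric-KL
# consistency losses against the clean forward pass.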
def get_loss_and_logits(text_feats, image_feats):
feats = text_feats + image_feats
cls_params_name = ["cls_out_w_0", "cls_out_b_0"]
feats = fluid.layers.fc(
input=feats,
size=2048,
param_attr=fluid.ParamAttr(
name=cls_params_name[0],
initializer=fluid.initializer.TruncatedNormal(scale=0.02)),
bias_attr=fluid.ParamAttr(
name=cls_params_name[1], initializer=fluid.initializer.Constant(0.)))
feats = fluid.layers.dropout(
x=feats,
dropout_prob=0.1,
dropout_implementation="upscale_in_train")
cls_params_name = ["cls_out_w_1", "cls_out_b_1"]
logits = fluid.layers.fc(
input=feats,
size=args.num_labels,
param_attr=fluid.ParamAttr(
name=cls_params_name[0],
initializer=fluid.initializer.TruncatedNormal(scale=0.02)),
bias_attr=fluid.ParamAttr(
name=cls_params_name[1], initializer=fluid.initializer.Constant(0.)))
ce_loss, probs = fluid.layers.softmax_with_cross_entropy(
logits=logits, label=labels, return_softmax=True)
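# Divide by adv_step so that the losses accumulated over the adversarial iterations average out.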
loss = fluid.layers.mean(x=ce_loss) / adv_step
return loss, logits, probs
def init_delta(input, mask, shape, name='text'):
real_seq_len = fluid.layers.shape(input)[1]
fake = fluid.layers.data(name=name+"_fake", shape=shape, dtype='float32')
mask_slice = fluid.layers.slice(mask, axes=[1], starts=[0], ends=fluid.layers.shape(mask)[1])
length = fluid.layers.reduce_sum(mask_slice, dim=1, keep_dim=True) * shape[-1]
# l2 norm
delta = fluid.layers.uniform_random_batch_size_like(mask, shape=fake.shape, min=-1.0, max=1.0)
delta = fluid.layers.slice(delta, axes=[1], starts=[0], ends=real_seq_len)
delta = delta * mask_slice
mag = adv_init_mag / fluid.layers.sqrt(length)
delta = delta * mag
return delta
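# Random perturbations are only initialized at training time; evaluation uses the clean inputs.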
if is_train:
text_emb_shape = [-1, args.max_seq_len, config['hidden_size']]
text_delta = init_delta(src_ids, t_mask, text_emb_shape, name='text')
image_emb_shape = [-1, args.max_img_len, config['image_embedding_size']]
image_delta = init_delta(image_embedding, v_mask, image_emb_shape, name='img')
else:
text_delta, image_delta = None, None
def pgd_with_l2(loss, delta):
# grad
delta_grad = fluid.backward.gradients(loss, delta)[0]
# l2 norm
delta_norm = fluid.layers.sqrt(fluid.layers.reduce_sum(fluid.layers.pow(fluid.layers.reshape(delta_grad, \
[fluid.layers.shape(delta_grad)[0], -1]), factor=2), dim=1, keep_dim=True))
delta_norm = fluid.layers.clamp(delta_norm, min=float(1e-8))
# pgd
delta = delta + adv_lr * delta_grad / delta_norm
# projection
if adv_max_norm > 0:
exceed_mask = (delta_norm > adv_max_norm).astype('float32')
reweights = (adv_max_norm / delta_norm) * exceed_mask + (1 - exceed_mask)
delta = delta * reweights
delta_grad.stop_gradient=True
return delta
loss = None
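# Accumulate the loss over adv_step iterations: clean pass, text-perturbed pass,
# image-perturbed pass, plus the two KL consistency terms.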
for iter in range(adv_step):
vl_pure = UNIMOModel(
emb_ids=emb_ids,
input_mask=input_mask,
config=config,
image_input=image_input,
weight_sharing=args.weight_sharing
)
vl_text = UNIMOModel(
text_adv_delta=text_delta,
emb_ids=emb_ids,
input_mask=input_mask,
config=config,
image_input=image_input,
weight_sharing=args.weight_sharing
)
vl_image = UNIMOModel(
image_adv_delta=image_delta,
emb_ids=emb_ids,
input_mask=input_mask,
config=config,
image_input=image_input,
weight_sharing=args.weight_sharing
)
h_pure_text, h_pure_image = vl_pure.get_pooled_output()
h_text_text, h_text_image = vl_text.get_pooled_output()
h_image_text, h_image_image = vl_image.get_pooled_output()
loss_pure, logit_pure, probs_pure = get_loss_and_logits(h_pure_text, h_pure_image)
loss_text, logit_text, probs_text = get_loss_and_logits(h_text_text, h_text_image)
loss_image, logit_image, probs_image = get_loss_and_logits(h_image_text, h_image_image)
if is_train:
text_delta = pgd_with_l2(loss_text, text_delta)
image_delta = pgd_with_l2(loss_image, image_delta)
kl_adv_text_loss = kl_divergence_with_logits(logit_pure, logit_text)
kl_adv_image_loss = kl_divergence_with_logits(logit_pure, logit_image)
cur_loss = loss_pure + loss_text + loss_image + kl_adv_text_loss + kl_adv_image_loss
loss = cur_loss if loss is None else loss + cur_loss
num_seqs = fluid.layers.create_tensor(dtype='int64')
accuracy = fluid.layers.accuracy(input=probs_pure, label=labels, total=num_seqs)
graph_vars = {
"loss": loss,
"probs": probs_pure,
"accuracy": accuracy,
"labels": labels,
"num_seqs": num_seqs
}
for k, v in graph_vars.items():
v.persistable = False
return pyreader, graph_vars
def evaluate(args, exe, test_pyreader, graph_vars, eval_phase, dev_count=1, gpu_id=0):
"""evaluate"""
all_mat = []
test_pyreader.start()
time_begin = time.time()
fetch_list = [graph_vars["probs"].name, graph_vars["labels"].name]
while True:
try:
np_probs, np_labels = exe.run(fetch_list=fetch_list)
np_preds = np.argmax(np_probs, axis=1).reshape((-1, 1))
np_labels = np_labels.reshape((-1, 1))
mat = np.concatenate([np_preds, np_labels], axis=1)
all_mat.extend(mat.tolist())
except fluid.core.EOFException:
test_pyreader.reset()
break
all_mat = np.array(all_mat)
time_end = time.time()
save_file = "%s/%s.trainers_%d.part_%d.npy" % (args.eval_dir, eval_phase, dev_count, gpu_id)
np.save(save_file, all_mat)
tmp_file = "%s/%s.trainers_%d.part_%d.finish" % (args.eval_dir, eval_phase, dev_count, gpu_id)
tmp_writer = open(tmp_file, "w")
tmp_writer.close()
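# Worker 0 waits until every trainer has written its ".finish" marker, then
# merges all partial ".npy" prediction files, archives them into a
# timestamped sub-directory and computes the requested metric; the other
# workers only dump their own shard and return None.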
if gpu_id == 0:
while True:
ret = os.popen('find %s -maxdepth 1 -name "%s.trainers_%d.part_*.finish"' %
(args.eval_dir, eval_phase, dev_count)).readlines()
if len(ret) != dev_count:
time.sleep(1)
continue
else:
break
all_mats = []
save_files = glob.glob("%s/%s.trainers_%d.part_*.npy" % (args.eval_dir, eval_phase, dev_count))
for cur_save_file in save_files:
mat = np.load(cur_save_file).tolist()
all_mats.extend(mat)
all_mats = np.array(all_mats)
cur_time = str(int(time.time()))
os.system("mkdir %s/%s" % (args.eval_dir, cur_time))
os.system("mv %s/%s.trainers_%d.* %s/%s" % (args.eval_dir, eval_phase, dev_count, args.eval_dir, cur_time))
ret = OrderedDict()
ret['phase'] = eval_phase
ret['loss'] = -1
ret['data_num'] = all_mats.shape[0]
ret['used_time'] = round(time_end - time_begin, 4)
metrics = OrderedDict()
metrics["simple_accuracy"] = glue_eval.simple_accuracy
if args.eval_mertrics in metrics:
ret_metric = metrics[args.eval_mertrics](all_mats[:, 0], all_mats[:, 1])
ret.update(ret_metric)
print_eval_log(ret)
else:
raise ValueError('unsupported metric {}'.format(args.eval_mertrics))
return ret
else:
return None
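The two KL consistency terms above call a `kl_divergence_with_logits` helper that is not shown in this diff. As a point of reference only, here is a minimal NumPy sketch of what such a helper typically computes: the KL divergence between the softmax distributions of two logit tensors, averaged over the batch. The repository's actual implementation may differ, for example by stopping gradients on the clean logits.

```
import numpy as np

def kl_divergence_with_logits(p_logits, q_logits):
    """KL(softmax(p_logits) || softmax(q_logits)), averaged over the batch."""
    def softmax(x):
        x = x - x.max(axis=-1, keepdims=True)  # subtract max for numerical stability
        e = np.exp(x)
        return e / e.sum(axis=-1, keepdims=True)

    p = softmax(p_logits)
    log_p = np.log(p + 1e-12)
    log_q = np.log(softmax(q_logits) + 1e-12)
    kl = (p * (log_p - log_q)).sum(axis=-1)  # per-example KL
    return kl.mean()

# toy usage with a batch of one 3-class example
print(kl_divergence_with_logits(np.array([[2.0, 0.5, -1.0]]),
                                np.array([[1.8, 0.7, -0.9]])))
```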
......@@ -92,17 +92,6 @@ class UNIMOModel(object):
self._is_img2txt_task = (task_type == "img2txt")
self._is_multimodal_task = (image_input is not None)
if emb_ids is not None and image_input is not None and emb_obj_ids is not None:
self._input_type = 'vol'
elif emb_ids is not None and image_input is not None:
self._input_type = 'vl'
elif emb_ids is not None:
self._input_type = 'l'
elif image_input is not None and emb_obj_ids is not None:
self._input_type = 'vo'
else:
raise ValueError('input feature error')
if self._is_dialogue_task:
self._role_type_size = config["role_type_size"]
self._turn_type_size = config["turn_type_size"]
......@@ -156,28 +145,39 @@ class UNIMOModel(object):
def _build_model(self, emb_ids=None, input_mask=None, image_input=None, emb_obj_ids=None, gather_idx=None):
"""build unimo model"""
if emb_ids is not None and image_input is not None and emb_obj_ids is not None:
input_type = 'vol'
elif emb_ids is not None and image_input is not None:
input_type = 'vl'
elif emb_ids is not None:
input_type = 'l'
elif image_input is not None and emb_obj_ids is not None:
input_type = 'vo'
else:
raise ValueError('input feature error')
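# 'v', 'o' and 'l' stand for the visual (image_input), object (emb_obj_ids)
# and language (emb_ids) inputs, so e.g. 'vl' denotes an image-text pair and
# 'l' a text-only batch.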
self._enc_vol_out = None
self._enc_vl_out = None
self._enc_v_out = None
self._enc_l_out = None
if self._input_type == 'vol':
if input_type == 'vol':
self._enc_vol_out, self._enc_v_out, self._enc_l_out = self.encode(emb_ids=emb_ids,
input_mask=input_mask,
image_input=image_input,
emb_obj_ids=emb_obj_ids,
gather_idx=gather_idx)
elif self._input_type == 'vl':
elif input_type == 'vl':
self._enc_vl_out, self._enc_v_out, self._enc_l_out = self.encode(emb_ids=emb_ids,
input_mask=input_mask,
image_input=image_input,
gather_idx=gather_idx)
elif self._input_type == 'vo':
elif input_type == 'vo':
self._enc_v_out = self.encode(input_mask=input_mask,
image_input=image_input,
emb_obj_ids=emb_obj_ids,
gather_idx=gather_idx)
elif self._input_type == 'l':
elif input_type == 'l':
self._enc_l_out = self.encode(emb_ids=emb_ids,
input_mask=input_mask,
gather_idx=gather_idx)
......@@ -186,10 +186,22 @@ class UNIMOModel(object):
def encode(self, emb_ids=None, input_mask=None, image_input=None, emb_obj_ids=None, gather_idx=None):
"""unimo encoder"""
if emb_ids is not None and image_input is not None and emb_obj_ids is not None:
input_type = 'vol'
elif emb_ids is not None and image_input is not None:
input_type = 'vl'
elif emb_ids is not None:
input_type = 'l'
elif image_input is not None and emb_obj_ids is not None:
input_type = 'vo'
else:
raise ValueError('input feature error')
emb_feature, n_head_self_attn_mask, _v_seq_len, _o_seq_len = self._gen_input(emb_ids=emb_ids,
input_mask=input_mask,
image_input=image_input,
emb_obj_ids=emb_obj_ids)
emb_obj_ids=emb_obj_ids,
input_type=input_type)
enc_out = encoder(
enc_input=emb_feature,
attn_bias=n_head_self_attn_mask,
......@@ -210,7 +222,7 @@ class UNIMOModel(object):
caches=self.caches,
gather_idx=gather_idx)
if self._input_type == 'vol':
if input_type == 'vol':
assert _v_seq_len is not None and _o_seq_len is not None, "the input is invalid"
_vol_seq_len = layers.shape(enc_out)[1]
enc_v_out = fluid.layers.slice(
......@@ -221,7 +233,7 @@ class UNIMOModel(object):
input=enc_out, axes=[1], starts=[_v_seq_len + _o_seq_len], ends=[_vol_seq_len])
enc_vol_out = enc_out
return enc_vol_out, enc_v_out, enc_l_out
elif self._input_type == 'vl':
elif input_type == 'vl':
assert _v_seq_len is not None and _o_seq_len is None, "the input is invalid"
_vl_seq_len = layers.shape(enc_out)[1]
enc_v_out = fluid.layers.slice(
......@@ -230,20 +242,22 @@ class UNIMOModel(object):
input=enc_out, axes=[1], starts=[_v_seq_len], ends=[_vl_seq_len])
enc_vl_out = enc_out
return enc_vl_out, enc_v_out, enc_l_out
elif self._input_type == 'vo':
elif input_type == 'vo':
assert _v_seq_len is not None and _o_seq_len is not None, "the input is invalid"
enc_v_out = fluid.layers.slice(
input=enc_out, axes=[1], starts=[0], ends=[_v_seq_len])
return enc_v_out
elif self._input_type == 'l':
elif input_type == 'l':
assert _v_seq_len is None and _o_seq_len is None, "the input is invalid"
enc_l_out = enc_out
return enc_l_out
else:
raise ValueError("The input type is invalid")
def _gen_input(self, emb_ids=None, input_mask=None, image_input=None, emb_obj_ids=None):
def _gen_input(self, emb_ids=None, input_mask=None, image_input=None, emb_obj_ids=None, input_type=None):
assert input_mask is not None, "input_mask should not be none"
assert input_type is not None, "input_type should not be none"
self_attn_mask = input_mask
self_attn_mask = fluid.layers.scale(
x=self_attn_mask, scale=1e4, bias=-1.0, bias_after_scale=False)
......@@ -320,16 +334,16 @@ class UNIMOModel(object):
emb_obj_out, 'nd', self._prepostprocess_dropout, name="pre_encoder")
_o_seq_len = layers.shape(emb_obj_out)[1]
if self._input_type == 'vol':
if input_type == 'vol':
assert emb_ids is not None and image_input is not None and emb_obj_ids is not None, "the input is invalid"
emb_feature = fluid.layers.concat([emb_v_out, emb_obj_out, emb_out], axis=1)
elif self._input_type == 'vl':
elif input_type == 'vl':
assert emb_ids is not None and image_input is not None and emb_obj_ids is None, "the input is invalid"
emb_feature = fluid.layers.concat([emb_v_out, emb_out], axis=1)
elif self._input_type == 'l':
elif input_type == 'l':
assert emb_ids is not None and image_input is None and emb_obj_ids is None, "the input is invalid"
emb_feature = emb_out
elif self._input_type == 'vo':
elif input_type == 'vo':
assert emb_ids is None and image_input is not None and emb_obj_ids is not None, "the input is invalid"
emb_feature = fluid.layers.concat([emb_v_out, emb_obj_out], axis=1)
else:
......
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""data reader for multimodal pretraining"""
from __future__ import print_function
from __future__ import division
import json
import base64
import os
import numpy as np
import gzip
import six
import functools
import paddle.fluid as fluid
from reader.batching import pad_feature_data, pad_batch_data
class ClassifyReader(object):
"""ClassifyReader"""
def __init__(self,
filelist,
max_seq_len,
tokenizer):
self.files = open(filelist).readlines()
self.current_file_index = 0
self.total_file = len(self.files)
self.current_file = None
self.tot_examples_nums = 0
self.max_seq_len = max_seq_len
self.pad_id = tokenizer.pad_token_id
self.sep_id = tokenizer.sep_token_id
self.trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "0"))
self.trainer_nums = int(os.getenv("PADDLE_TRAINERS_NUM", "1"))
def get_num_examples(self):
"""get_num_examples"""
for index, file_ in enumerate(self.files):
self.tot_examples_nums += int(os.popen('wc -l '+file_.strip()).read().split()[0])
return self.tot_examples_nums
def get_progress(self):
"""return current progress of traning data
"""
return self.current_epoch, self.current_example, self.current_file_index, self.total_file, self.current_file
def parse_line(self, line, max_seq_len=512):
""" parse one line to token_ids, sentence_ids, pos_ids, label
"""
line = line.strip('\r\n').split(";")
if len(line) == 14:
(image_id, data_id, label, token_ids, sent_ids, pos_ids, _, image_w, image_h, \
number_box, boxes, image_embeddings, _, _) = line
else:
raise ValueError("One sample have %d fields!" % len(line))
def decode_feature(base64_str, size):
fea_base64 = base64.b64decode(base64_str)
fea_decode = np.frombuffer(fea_base64, dtype=np.float32)
shape = size, int(fea_decode.shape[0] / size)
features = np.resize(fea_decode, shape)
return features
token_ids = [int(token) for token in token_ids.split(" ")]
sent_ids = [int(token) for token in sent_ids.split(" ")]
pos_ids = [int(token) for token in pos_ids.split(" ")]
assert len(token_ids) == len(sent_ids) == len(pos_ids), \
"[Must be true]len(token_ids) == len(sent_ids) == len(pos_ids)"
number_box = int(number_box)
boxes = decode_feature(boxes, number_box)
image_embeddings = decode_feature(image_embeddings, number_box)
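# Prepend a global image feature (the mean of all region features) so the
# first visual position acts like a CLS token for the whole image; the box
# geometry is likewise extended with a full-image box [0, 0, 1, 1, 1]. Each
# region location is a 5-d vector: the four box coordinates normalized by the
# image width/height plus the box area relative to the image area.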
image_embeddings_cls = np.mean(image_embeddings, axis=0, keepdims=True)
image_embeddings = np.concatenate([image_embeddings_cls, image_embeddings], 0)
image_location = np.zeros((boxes.shape[0], 5), dtype=np.float32)
image_location[:, :4] = boxes
image_location[:, 4] = (image_location[:, 3] - image_location[:, 1]) * (
image_location[:, 2] - image_location[:, 0]) / (float(image_w) * float(image_h))
image_location[:, 0] = image_location[:, 0] / float(image_w)
image_location[:, 1] = image_location[:, 1] / float(image_h)
image_location[:, 2] = image_location[:, 2] / float(image_w)
image_location[:, 3] = image_location[:, 3] / float(image_h)
g_location = np.array([0, 0, 1, 1, 1])
image_location = np.concatenate([np.expand_dims(g_location, axis=0), image_location], axis=0)
image_loc = image_location
if len(token_ids) > max_seq_len:
token_ids = token_ids[:max_seq_len - 1] + [self.sep_id]
sent_ids = sent_ids[:max_seq_len]
pos_ids = pos_ids[:max_seq_len]
return [token_ids, sent_ids, pos_ids, label, image_loc, image_embeddings, number_box + 1]
def _prepare_batch_data(self, insts, pad_id=None):
batch_src_ids = [inst[0] for inst in insts]
batch_sent_ids = [inst[1] for inst in insts]
batch_pos_ids = [inst[2] for inst in insts]
batch_labels = [inst[3] for inst in insts]
batch_image_loc = [inst[4] for inst in insts]
batch_image_embedding = [inst[5] for inst in insts]
batch_image_size = [inst[6] for inst in insts]
batch_labels = np.array(batch_labels).astype("int64").reshape([-1, 1])
src_ids, token_mask = pad_batch_data(
batch_src_ids, pretraining_task='nlu', pad_idx=pad_id, return_input_mask=True)
sent_ids = pad_batch_data(
batch_sent_ids, pretraining_task='nlu', pad_idx=pad_id)
pos_ids = pad_batch_data(
batch_pos_ids, pretraining_task='nlu', pad_idx=pad_id)
image_loc = pad_feature_data(batch_image_loc)
image_embedding, image_mask = pad_feature_data(batch_image_embedding,
return_mask=True,
batch_image_size=batch_image_size)
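# Build the joint attention mask: the padded image mask and token mask are
# concatenated along the sequence axis, and the outer product of that vector
# with itself yields a (batch, seq_len, seq_len) mask in which every real
# image/text position can attend to every other real position.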
input_mask = np.concatenate((image_mask, token_mask), axis=1)
input_mask = np.matmul(input_mask, np.transpose(input_mask, (0, 2, 1)))
return_list = [
src_ids, pos_ids, sent_ids, input_mask, image_mask, token_mask,
image_embedding, image_loc, batch_labels
]
return return_list
def read_file(self, file):
"""read_file"""
if file.endswith('.gz'):
with gzip.open(file, "rt") as f:
for line in f:
parsed_line = self.parse_line(
line, max_seq_len=self.max_seq_len)
if parsed_line is None:
continue
yield parsed_line
else:
with open(file, "r") as f:
for line in f:
parsed_line = self.parse_line(
line, max_seq_len=self.max_seq_len)
if parsed_line is None:
continue
yield parsed_line
def shuffle_samples(self, sample_generator, buffer=1000):
"""shuffle_samples"""
samples = []
try:
while True:
while len(samples) < buffer:
sample = next(sample_generator)
samples.append(sample)
np.random.shuffle(samples)
for sample in samples:
yield sample
samples = []
except StopIteration:
if len(samples) == 0:
yield None
else:
np.random.shuffle(samples)
for sample in samples:
yield sample
def data_generator(self,
batch_size,
epoch,
phase):
"""
data_generator
"""
if phase != "train":
epoch = 1
def wrapper():
"""wrapper"""
def batch_reader():
"""batch_reader"""
for epoch_index in range(epoch):
self.global_rng = np.random.RandomState(epoch_index)
self.current_epoch = epoch_index
self.current_example = 0
if phase == "train":
self.global_rng.shuffle(self.files)
for index, file_ in enumerate(self.files):
self.current_file_index = index + 1
self.current_file = file_
batch_records = []
for sample in self.shuffle_samples(self.read_file(file=file_.strip())):
self.current_example = self.current_example + 1
if sample is None:
continue
if len(batch_records) < batch_size:
batch_records.append(sample)
else:
yield self._prepare_batch_data(batch_records, self.pad_id)
batch_records = [sample]
if batch_records:
yield self._prepare_batch_data(batch_records, self.pad_id)
all_dev_batches = []
for batch_data in batch_reader():
if len(all_dev_batches) < self.trainer_nums:
all_dev_batches.append(batch_data)
if len(all_dev_batches) == self.trainer_nums:
yield all_dev_batches[self.trainer_id]
all_dev_batches = []
if phase == "train":
all_dev_batches = all_dev_batches * self.trainer_nums
np.random.shuffle(all_dev_batches)
if self.trainer_id < len(all_dev_batches):
yield all_dev_batches[self.trainer_id]
return wrapper
if __name__ == "__main__":
pass
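For reference, a minimal sketch of how `ClassifyReader` is typically driven; all file paths below are placeholders, `max_seq_len` and `batch_size` are arbitrary, and the real fine-tuning entry point wires the generator into a PyReader via `decorate_tensor_provider` (as in the training script further down) instead of iterating it directly.

```
# a minimal usage sketch; every path below is a placeholder
from model.tokenization import GptBpeTokenizer
from reader.visual_entailment_reader import ClassifyReader

tokenizer = GptBpeTokenizer(vocab_file="model_files/unimo_en.vocab.txt",      # placeholder path
                            encoder_json_file="model_files/en.encoder.json",  # placeholder path
                            vocab_bpe_file="model_files/en.vocab.bpe",        # placeholder path
                            do_lower_case=True)

reader = ClassifyReader("data/train_filelist",  # placeholder filelist
                        max_seq_len=512,
                        tokenizer=tokenizer)
print("num examples:", reader.get_num_examples())

# each yielded batch follows the field order built in _prepare_batch_data
wrapper = reader.data_generator(batch_size=8, epoch=1, phase="dev")
for batch in wrapper():
    (src_ids, pos_ids, sent_ids, input_mask, image_mask,
     token_mask, image_embedding, image_loc, labels) = batch
    print(src_ids.shape, image_embedding.shape, labels.shape)
    break
```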
......@@ -11,7 +11,7 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Finetuning on classification tasks."""
"""Finetuning on retrieval tasks."""
from __future__ import absolute_import
from __future__ import division
......
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""visual entailment tasks."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import time
import multiprocessing
import numpy as np
import paddle.fluid as fluid
from utils.optimization import optimization
from utils.utils import get_time
from utils.init import init_pretraining_params, init_checkpoint
from utils.args import print_arguments
from model.tokenization import GptBpeTokenizer
from args.visual_entailment_args import parser
from collections import OrderedDict
from model.unimo_finetune import UNIMOConfig
from finetune.visual_entailment import create_model, evaluate
from reader.visual_entailment_reader import ClassifyReader
args = parser.parse_args()
def main(args):
"""main"""
model_config = UNIMOConfig(args.unimo_config_path)
model_config.print_config()
gpu_id = 0
gpus = fluid.core.get_cuda_device_count()
if args.is_distributed and os.getenv("FLAGS_selected_gpus") is not None:
gpu_list = os.getenv("FLAGS_selected_gpus").split(",")
gpus = len(gpu_list)
gpu_id = int(gpu_list[0])
if args.use_cuda:
place = fluid.CUDAPlace(gpu_id)
dev_count = gpus
else:
place = fluid.CPUPlace()
dev_count = int(os.environ.get('CPU_NUM', multiprocessing.cpu_count()))
tokenizer = GptBpeTokenizer(vocab_file=args.unimo_vocab_file,
encoder_json_file=args.encoder_json_file,
vocab_bpe_file=args.vocab_bpe_file,
do_lower_case=args.do_lower_case)
if not (args.do_train or args.do_val or args.do_test or args.do_test_hard):
raise ValueError("For args `do_train`, `do_val`, `do_test`, `do_test_hard`, at "
"least one of them must be True.")
startup_prog = fluid.Program()
if args.random_seed is not None:
startup_prog.random_seed = args.random_seed
trainers_num = int(os.getenv("PADDLE_TRAINERS_NUM", "1"))
if args.do_train:
train_data_reader = ClassifyReader(args.train_filelist, args.max_seq_len, tokenizer)
train_data_generator = train_data_reader.data_generator(
batch_size=args.batch_size,
epoch=args.epoch,
phase="train")
if args.num_train_examples:
num_train_examples = args.num_train_examples
else:
num_train_examples = train_data_reader.get_num_examples()
step_num_per_epoch = num_train_examples // args.batch_size // trainers_num
max_train_steps = args.epoch * step_num_per_epoch
warmup_steps = int(max_train_steps * args.warmup_proportion)
print("Device count: %d, gpu_id: %d" % (dev_count, gpu_id))
print("Num train examples: %d" % num_train_examples)
print("Max train steps: %d" % max_train_steps)
print("Num warmup steps: %d" % warmup_steps)
train_program = fluid.Program()
with fluid.program_guard(train_program, startup_prog):
with fluid.unique_name.guard():
train_pyreader, graph_vars = create_model(
args,
config=model_config,
pyreader_name="train_reader",
is_train=True)
scheduled_lr, loss_scaling = optimization(
loss=graph_vars["loss"],
warmup_steps=warmup_steps,
num_train_steps=max_train_steps,
learning_rate=args.learning_rate,
train_program=train_program,
weight_decay=args.weight_decay,
scheduler=args.lr_scheduler,
use_fp16=args.use_fp16,
use_dynamic_loss_scaling=args.use_dynamic_loss_scaling,
init_loss_scaling=args.init_loss_scaling,
beta1=args.beta1,
beta2=args.beta2,
epsilon=args.epsilon)
if args.do_val or args.do_test or args.do_test_hard:
test_prog = fluid.Program()
with fluid.program_guard(test_prog, startup_prog):
with fluid.unique_name.guard():
test_pyreader, test_graph_vars = create_model(
args,
config=model_config,
pyreader_name="dev_reader",
is_train=False)
test_prog = test_prog.clone(for_test=True)
if args.do_val:
dev_data_reader = ClassifyReader(args.dev_filelist, args.max_seq_len, tokenizer)
dev_data_generator = dev_data_reader.data_generator(
batch_size=args.test_batch_size,
epoch=1,
phase="dev")
if args.do_test:
test_data_reader = ClassifyReader(args.test_filelist, args.max_seq_len, tokenizer)
test_data_generator = test_data_reader.data_generator(
batch_size=args.test_batch_size,
epoch=1,
phase="test")
if args.do_test_hard:
test_hard_data_reader = ClassifyReader(args.test_hard_filelist, args.max_seq_len, tokenizer)
test_hard_data_generator = test_hard_data_reader.data_generator(
batch_size=args.test_batch_size,
epoch=1,
phase="test_hard")
nccl2_num_trainers = 1
nccl2_trainer_id = 0
print("args.is_distributed:", args.is_distributed)
if args.is_distributed:
trainer_id = int(os.getenv("PADDLE_TRAINER_ID", "0"))
worker_endpoints_env = os.getenv("PADDLE_TRAINER_ENDPOINTS")
current_endpoint = os.getenv("PADDLE_CURRENT_ENDPOINT")
worker_endpoints = worker_endpoints_env.split(",")
trainers_num = len(worker_endpoints)
print("worker_endpoints:{} trainers_num:{} current_endpoint:{} \
trainer_id:{}".format(worker_endpoints, trainers_num,
current_endpoint, trainer_id))
# prepare nccl2 env.
config = fluid.DistributeTranspilerConfig()
config.mode = "nccl2"
if args.nccl_comm_num > 1:
config.nccl_comm_num = args.nccl_comm_num
if args.use_hierarchical_allreduce and trainers_num > args.hierarchical_allreduce_inter_nranks:
config.use_hierarchical_allreduce = args.use_hierarchical_allreduce
config.hierarchical_allreduce_inter_nranks = args.hierarchical_allreduce_inter_nranks
assert config.hierarchical_allreduce_inter_nranks > 1
assert trainers_num % config.hierarchical_allreduce_inter_nranks == 0
config.hierarchical_allreduce_exter_nranks = \
trainers_num // config.hierarchical_allreduce_inter_nranks
t = fluid.DistributeTranspiler(config=config)
t.transpile(
trainer_id,
trainers=worker_endpoints_env,
current_endpoint=current_endpoint,
program=train_program if args.do_train else test_prog,
startup_program=startup_prog)
nccl2_num_trainers = trainers_num
nccl2_trainer_id = trainer_id
exe = fluid.Executor(place)
exe.run(startup_prog)
if args.do_train:
if args.init_checkpoint and args.init_pretraining_params:
print(
"WARNING: args 'init_checkpoint' and 'init_pretraining_params' "
"both are set! Only arg 'init_checkpoint' is made valid.")
if args.init_checkpoint:
init_checkpoint(
exe,
args.init_checkpoint,
main_program=train_program)
elif args.init_pretraining_params:
init_pretraining_params(
exe,
args.init_pretraining_params,
main_program=train_program)
elif args.do_val or args.do_test or args.do_test_hard:
args.init_checkpoint = args.init_pretraining_params
if not args.init_checkpoint:
raise ValueError("args 'init_checkpoint' should be set if"
"only doing validation or testing!")
init_checkpoint(
exe,
args.init_checkpoint,
main_program=startup_prog)
if args.do_train:
exec_strategy = fluid.ExecutionStrategy()
if args.use_fast_executor:
exec_strategy.use_experimental_executor = True
exec_strategy.num_threads = 4 if args.use_fp16 else 2
exec_strategy.num_iteration_per_drop_scope = min(args.num_iteration_per_drop_scope, args.skip_steps)
build_strategy = fluid.BuildStrategy()
build_strategy.remove_unnecessary_lock = False
if args.use_fuse:
build_strategy.fuse_all_reduce_ops = True
train_exe = fluid.ParallelExecutor(
use_cuda=args.use_cuda,
loss_name=graph_vars["loss"].name,
build_strategy=build_strategy,
exec_strategy=exec_strategy,
main_program=train_program,
num_trainers=nccl2_num_trainers,
trainer_id=nccl2_trainer_id)
train_pyreader.decorate_tensor_provider(train_data_generator)
else:
train_exe = None
if args.do_val or args.do_test or args.do_test_hard:
test_exe = fluid.ParallelExecutor(use_cuda=args.use_cuda,
main_program=test_prog,
share_vars_from=train_exe)
dev_ret_history = [] # (steps, key_eval, eval)
test_ret_history = [] # (steps, key_eval, eval)
test_hard_ret_history = [] # (steps, key_eval, eval)
steps = 0
if args.do_train:
train_pyreader.start()
time_begin = time.time()
skip_steps = args.skip_steps
while True:
try:
steps += 1
if steps % skip_steps == 0:
train_fetch_list = [graph_vars["loss"].name, scheduled_lr.name]
res = train_exe.run(fetch_list=train_fetch_list)
outputs = {"loss": np.mean(res[0]), 'learning_rate': float(res[1][0])}
if args.verbose:
verbose = "train pyreader queue size: %d, learning_rate: %.10f" % \
(train_pyreader.queue.size(), outputs['learning_rate'])
print(verbose)
current_epoch, current_example, current_file_index, total_file, current_file = \
train_data_reader.get_progress()
time_end = time.time()
used_time = time_end - time_begin
print("%s - epoch: %d, progress: %d/%d, %d/%d, step: %d, ave loss: %f, speed: %f steps/s" % \
(get_time(), current_epoch, current_example, num_train_examples, current_file_index, \
total_file, steps, outputs["loss"], args.skip_steps / used_time))
time_begin = time.time()
else:
train_exe.run(fetch_list=[])
if nccl2_trainer_id == 0:
if steps % args.save_steps == 0 and args.save_checkpoints:
save_path = os.path.join(args.checkpoints,
"step_" + str(steps))
fluid.io.save_persistables(exe, save_path, train_program)
if steps % args.validation_steps == 0:
# evaluate dev set
if args.do_val:
test_pyreader.decorate_tensor_provider(dev_data_generator)
outputs = evaluate(args, test_exe, test_pyreader, test_graph_vars, \
"dev", trainers_num, nccl2_trainer_id)
if nccl2_trainer_id == 0:
dev_ret_history.append((steps, outputs['key_eval'], outputs[outputs['key_eval']]))
# evaluate test set
if args.do_test:
test_pyreader.decorate_tensor_provider(test_data_generator)
outputs = evaluate(args, test_exe, test_pyreader, test_graph_vars, \
"test", trainers_num, nccl2_trainer_id)
if nccl2_trainer_id == 0:
test_ret_history.append((steps, outputs['key_eval'], outputs[outputs['key_eval']]))
# evaluate test set
if args.do_test_hard:
test_pyreader.decorate_tensor_provider(test_hard_data_generator)
outputs = evaluate(args, test_exe, test_pyreader, test_graph_vars, \
"test_hard", trainers_num, nccl2_trainer_id)
if nccl2_trainer_id == 0:
test_hard_ret_history.append((steps, outputs['key_eval'], outputs[outputs['key_eval']]))
except fluid.core.EOFException:
if args.save_checkpoints:
save_path = os.path.join(args.checkpoints, "step_" + str(steps))
fluid.io.save_persistables(exe, save_path, train_program)
train_pyreader.reset()
break
# final eval on dev set
if args.do_val:
test_pyreader.decorate_tensor_provider(dev_data_generator)
outputs = evaluate(args, test_exe, test_pyreader, test_graph_vars, "dev", trainers_num, nccl2_trainer_id)
if nccl2_trainer_id == 0:
dev_ret_history.append((steps, outputs['key_eval'], outputs[outputs['key_eval']]))
# final eval on test set
if args.do_test:
test_pyreader.decorate_tensor_provider(test_data_generator)
outputs = evaluate(args, test_exe, test_pyreader, test_graph_vars, "test", trainers_num, nccl2_trainer_id)
if nccl2_trainer_id == 0:
test_ret_history.append((steps, outputs['key_eval'], outputs[outputs['key_eval']]))
# final eval on test_hard set
if args.do_test_hard:
test_pyreader.decorate_tensor_provider(test_hard_data_generator)
outputs = evaluate(args, test_exe, test_pyreader, test_graph_vars, "test_hard", trainers_num, nccl2_trainer_id)
if nccl2_trainer_id == 0:
test_hard_ret_history.append((steps, outputs['key_eval'], outputs[outputs['key_eval']]))
if nccl2_trainer_id == 0:
if args.do_val:
dev_ret_history = sorted(dev_ret_history, key=lambda a: a[2], reverse=True)
print("Best validation result: step %d %s %f" % \
(dev_ret_history[0][0], dev_ret_history[0][1], dev_ret_history[0][2]))
if __name__ == '__main__':
print_arguments(args)
main(args)