Version 1.7.1: BERT fine-tuning hangs for ten minutes at compile time
Created by: mapingshuo
Background
- PaddlePaddle version: 1.7.1, CommitID: 2a792de7c263fa038ba3e29285e4d9d7e86ab6ca
- GPU: V100, CUDA 9, cuDNN 7.3
- Build environment: bare-metal build from source
- Single-machine, multi-GPU training; GPU memory: 32 GB
Description
When running fine-tuning with the BERT repo, the program hangs during compilation. With `export GLOG_v=15` enabled, we found it stuck on lines like the following for as long as ten minutes:
```
I0412 11:43:24.269901 5808 op_desc.cc:673] begin to check attribute of elementwise_mul
I0412 11:43:24.269930 5808 op_desc.cc:679] CompileTime infer shape on elementwise_mul
I0412 11:43:24.269943 5808 op_desc.cc:695] From [sent_embedding@GRAD, elementwise_div_0, ] to [elementwise_mul_391, ]
I0412 11:43:24.270349 5808 op_desc.cc:673] begin to check attribute of elementwise_mul
I0412 11:43:24.270378 5808 op_desc.cc:679] CompileTime infer shape on elementwise_mul
I0412 11:43:24.270391 5808 op_desc.cc:695] From [word_embedding@GRAD, elementwise_div_0, ] to [elementwise_mul_392, ]
```
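To pinpoint which op the compiler stalls on, one option is to redirect the verbose log to a file and look for the largest timestamp gaps between consecutive lines. A minimal sketch, assuming the log was captured to a file named `glog.txt` (a placeholder):

```python
import re
from datetime import datetime

# Parse GLOG timestamps ("I0412 11:43:24.269901 ...") and print the largest
# gaps between consecutive log lines; the line *before* a big gap is where
# compilation spent its time. "glog.txt" is a placeholder file name.
gaps = []
prev = None
with open("glog.txt") as f:
    for line in f:
        m = re.match(r"[IWEF](\d{4} \d{2}:\d{2}:\d{2}\.\d{6})", line)
        if not m:
            continue
        t = datetime.strptime(m.group(1), "%m%d %H:%M:%S.%f")
        if prev is not None:
            gaps.append((t - prev[0], prev[1].rstrip()))
        prev = (t, line)

for gap, line in sorted(gaps, reverse=True)[:5]:
    print(gap, "|", line)
```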
How to reproduce
Install the Paddle version listed above, clone the BERT repo mentioned above, enable `export GLOG_v=15`, and run the following commands to reproduce:
```bash
export FLAGS_sync_nccl_allreduce=0
export FLAGS_eager_delete_tensor_gb=1
TASK_NAME='XNLI'
# customize your own path
BERT_BASE_PATH="uncased_L-24_H-1024_A-16"
DATA_PATH=xnli
CKPT_PATH=./checkpoints
python -u run_classifier.py --task_name ${TASK_NAME} \
--use_cuda true \
--do_train true \
--do_val true \
--do_test true \
--batch_size 8192 \
--in_tokens true \
--init_pretraining_params ${BERT_BASE_PATH}/params \
--data_dir ${DATA_PATH} \
--vocab_path ${BERT_BASE_PATH}/vocab.txt \
--checkpoints ${CKPT_PATH} \
--save_steps 1000 \
--weight_decay 0.01 \
--warmup_proportion 0.0 \
--validation_steps 25 \
--epoch 3 \
--max_seq_len 512 \
--bert_config_path ${BERT_BASE_PATH}/bert_config.json \
--learning_rate 1e-4 \
--skip_steps 10 \
--random_seed 1
```
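The ops in the log multiply each parameter gradient by a shared factor (`elementwise_div_0`), so the stall seems to happen while the backward and optimizer ops are being appended. One way to confirm is to time the two build phases separately; a minimal sketch of the idea, where `create_model()` is a hypothetical stand-in for the repo's model-building code, not its actual API:

```python
import time
import paddle.fluid as fluid

train_program = fluid.Program()
startup_program = fluid.Program()
with fluid.program_guard(train_program, startup_program):
    t0 = time.time()
    loss = create_model()  # hypothetical stand-in for the BERT classifier graph
    print("forward graph built in %.1fs" % (time.time() - t0))

    t1 = time.time()
    optimizer = fluid.optimizer.Adam(learning_rate=1e-4)
    # minimize() appends the backward pass plus gradient-scaling ops such as
    # the elementwise_mul/elementwise_div pairs seen in the log above
    optimizer.minimize(loss)
    print("backward/optimizer graph built in %.1fs" % (time.time() - t1))
```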
The latest develop branch shows the same problem; we hope it can be resolved with high priority.
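If it helps triage: the log pattern suggests (an assumption, not verified) that compile-time InferShape may get slower as the number of ops in a single block grows. A BERT-independent sketch to check that scaling, with arbitrary shapes and op count:

```python
import time
import paddle.fluid as fluid

main = fluid.Program()
with fluid.program_guard(main):
    x = fluid.layers.data(name="x", shape=[1024], dtype="float32")
    y = x
    start = time.time()
    # every layer call triggers a CompileTime InferShape like the ones in the log;
    # if the per-op cost were constant, each 500-op increment would take equal time
    for i in range(2000):
        y = fluid.layers.elementwise_mul(y, x)
        if (i + 1) % 500 == 0:
            print("%d ops appended in %.1fs" % (i + 1, time.time() - start))
```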