Classification task runs locally, but the evaluation step fails on the cluster with "PrecisionRecallEvaluator"
Created by: 0YuanZhang0
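The network is a classification model. It runs successfully on my local machine, but when the job is submitted to the cluster it crashes right after it starts running: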
............................*** Aborted at 1500366428 (unix time) try "date -d @1500366428" if you are using GNU date ***
*** Aborted at 1500366428 (unix time) try "date -d @1500366428" if you are using GNU date ***
PC: @ 0x70d722 paddle::PrecisionRecallEvaluator::calcStatsInfo()
*** SIGFPE (@0x70d722) received by PID 15143 (TID 0x7fe0872d0880) from PID 7395106; stack trace: ***
@ 0x7fe086ec0160 (unknown)
@ 0x70d722 paddle::PrecisionRecallEvaluator::calcStatsInfo()
@ 0x70f0f0 paddle::PrecisionRecallEvaluator::evalImp()
@ 0x70ecbe paddle::Evaluator::eval()
@ 0x745f98 paddle::CombinedEvaluator::eval()
@ 0x7393b4 paddle::MultiGradientMachine::eval()
@ 0x78d64a paddle::TrainerInternal::trainOneBatch()
@ 0x787dcf paddle::Trainer::trainOnePass()
@ 0x78b494 paddle::Trainer::train()
@ 0x5c02e3 main
@ 0x7fe08549bbd5 __libc_start_main
@ 0x5cf9a1 (unknown)
PC: @ 0x70d722 paddle::PrecisionRecallEvaluator::calcStatsInfo()
*** SIGFPE (@0x70d722) received by PID 25723 (TID 0x7ff8b3aaf880) from PID 7395106; stack trace: ***
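For readers unfamiliar with what a precision/recall evaluator computes, here is a minimal pure-Python sketch of per-class statistics. It is not Paddle's `calcStatsInfo` implementation; the function name `precision_recall_per_class` and its arguments are hypothetical. It shows two places where such a computation can go wrong: a label outside `[0, num_classes)` and a zero denominator for a class that never appears. On Linux, SIGFPE is typically raised by an integer division or modulo by zero rather than by a floating-point division, but the trace alone does not confirm that this is what happens here.

```python
from collections import Counter


def precision_recall_per_class(predictions, labels, num_classes):
    """Return {class_id: (precision, recall)} for integer class ids.

    Illustrative only: this is not how Paddle implements its evaluator.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for pred, gold in zip(predictions, labels):
        # Reject ids outside the declared class range instead of indexing blindly.
        if not (0 <= pred < num_classes and 0 <= gold < num_classes):
            raise ValueError("class id out of range: pred=%r gold=%r" % (pred, gold))
        if pred == gold:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class gets a false positive
            fn[gold] += 1  # true class gets a false negative

    stats = {}
    for c in range(num_classes):
        p_den = tp[c] + fp[c]
        r_den = tp[c] + fn[c]
        # Guard the denominators: a class may never be predicted or never occur.
        precision = tp[c] / p_den if p_den else 0.0
        recall = tp[c] / r_den if r_den else 0.0
        stats[c] = (precision, recall)
    return stats


if __name__ == "__main__":
    print(precision_recall_per_class([0, 1, 1, 2], [0, 1, 2, 2], num_classes=4))
```

Running a check like this over the training labels before submitting the job is one way to rule out out-of-range label ids as a difference between the local and cluster runs.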
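The network configuration is shown below. The layers.py in the cluster version does not have seq_reshape_layer, so I modified that file locally to add seq_reshape_layer and then submitted the job to the cluster: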
data_word = data_layer(name="word", size=num_word)
data_postag = data_layer(name="postag", size=num_postag)
data_arc = data_layer(name="arc", size=num_arc)
if not is_predict:
    data_label = data_layer(name="label", size=num_classes)
word_attr = ParameterAttribute(initial_std=1/8.0, initial_mean=0.0)
tag_attr = ParameterAttribute(initial_std=1/4.0, initial_mean=0.0)
label_attr = ParameterAttribute(initial_std=1/4.0, initial_mean=0.0)
embedding_word = embedding_layer(input=data_word, size=word_dim, param_attr=word_attr)
srl_word = seq_reshape_layer(input=embedding_word, reshape_size=20*word_dim)
embedding_postag = embedding_layer(input=data_postag, size=postag_dim, param_attr=tag_attr)
srl_tag = seq_reshape_layer(input=embedding_postag, reshape_size=20*postag_dim)
embedding_arc = embedding_layer(input=data_arc, size=arc_dim, param_attr=label_attr)
srl_arc = seq_reshape_layer(input=embedding_arc, reshape_size=12*arc_dim)
concat = concat_layer(input=[srl_word, srl_tag, srl_arc], act=LinearActivation())
bias_attr = ParameterAttribute(initial_std=0., l2_rate=0.0001)
w_attr = ParameterAttribute(initial_std=1e-4, initial_mean=0.0)
hidden1 = fc_layer(input=concat, size=hidden_dim, act=ReluActivation(), param_attr=w_attr, bias_attr=bias_attr)
hidden2 = fc_layer(input=hidden1, size=hidden_dim, act=ReluActivation(), param_attr=w_attr, bias_attr=bias_attr)
output = fc_layer(input=hidden2, size=num_classes, act=SoftmaxActivation(), param_attr=w_attr, bias_attr=bias_attr)
if not is_predict:
    cls_loss = classification_cost(input=output, label=data_label, evaluator=[precision_recall_evaluator, classification_error_evaluator])
    outputs(cls_loss)
else:
    outputs(output)
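The configuration above relies on seq_reshape_layer to turn a fixed-length window of token embeddings into a single wide vector before concatenation. The sketch below imitates that reshaping in pure Python, assuming the documented behaviour that a sequence of T instances of width M becomes T*M/N instances of width N. The helper `seq_reshape` and the value `word_dim = 3` are hypothetical and only serve the illustration; this is not the layer's actual implementation.

```python
def seq_reshape(sequence, reshape_size):
    """Re-chunk a sequence of equal-width rows into rows of width reshape_size."""
    flat = [value for row in sequence for value in row]
    # The total number of values must divide evenly into the new row width.
    if len(flat) % reshape_size != 0:
        raise ValueError(
            "total size %d is not divisible by reshape_size %d"
            % (len(flat), reshape_size)
        )
    return [flat[i:i + reshape_size] for i in range(0, len(flat), reshape_size)]


if __name__ == "__main__":
    word_dim = 3  # hypothetical embedding width, chosen only for this example
    # 20 token embeddings of width word_dim, as in the srl_word branch.
    embeddings = [[float(t)] * word_dim for t in range(20)]
    reshaped = seq_reshape(embeddings, reshape_size=20 * word_dim)
    print(len(reshaped), len(reshaped[0]))
```

Under this reading, the input to hidden1 has width 20*word_dim + 20*postag_dim + 12*arc_dim, and every sample must carry exactly 20 word ids, 20 postag ids, and 12 arc ids (or a multiple of those counts) for the reshape to be valid.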
Job link: http://yq01-idl-gpu-offline62.yq01.baidu.com:8880/output/list/9066