DSSM model exits during training (#568) · Issue · PaddlePaddle / models


Created by: wangbin11

Running train.py from the dssm directory on a single machine (model_type = 2 selects regression, model_arch = 1 selects cnn):

    python train.py --train_data_path ./data/regression/train.txt --test_data_path ./data/regression/test.txt -s ./data/vocab.txt --model_type 2 --num_workers 8 --model_arch 1

All other parameters use their default values.

[INFO 2018-01-08 23:06:47,145 utils.py:125] The arguments passed by command line is :
[INFO 2018-01-08 23:06:47,145 utils.py:127] batch_size: 32
[INFO 2018-01-08 23:06:47,146 utils.py:127] class_num: 0
[INFO 2018-01-08 23:06:47,146 utils.py:127] dnn_dims: 256,128,64,32
[INFO 2018-01-08 23:06:47,146 utils.py:127] model_arch: cnn
[INFO 2018-01-08 23:06:47,146 utils.py:127] model_output_prefix: ./
[INFO 2018-01-08 23:06:47,146 utils.py:127] model_type: regression
[INFO 2018-01-08 23:06:47,146 utils.py:127] num_batches_to_log: 100
[INFO 2018-01-08 23:06:47,146 utils.py:127] num_batches_to_save_model: 400
[INFO 2018-01-08 23:06:47,147 utils.py:127] num_batches_to_test: 200
[INFO 2018-01-08 23:06:47,147 utils.py:127] num_passes: 10
[INFO 2018-01-08 23:06:47,147 utils.py:127] num_workers: 8
[INFO 2018-01-08 23:06:47,147 utils.py:127] share_embed: False
[INFO 2018-01-08 23:06:47,147 utils.py:127] share_network_between_source_target: False
[INFO 2018-01-08 23:06:47,147 utils.py:127] source_dic_path: ./data/vocab.txt
[INFO 2018-01-08 23:06:47,147 utils.py:127] target_dic_path: ./data/vocab.txt
[INFO 2018-01-08 23:06:47,147 utils.py:127] test_data_path: ./data/regression/test.txt
[INFO 2018-01-08 23:06:47,148 utils.py:127] train_data_path: ./data/regression/train.txt
[INFO 2018-01-08 23:06:47,148 utils.py:127] use_gpu: False
I0108 23:06:47.283903 18257 Util.cpp:166] commandline: --use_gpu=False --trainer_count=8
[WARNING 2018-01-08 23:06:47,365 network_conf.py:53] Build DSSM model with config of regression, cnn
[INFO 2018-01-08 23:06:47,366 network_conf.py:54] The vocabulary size is : [57919, 57919]
[INFO 2018-01-08 23:06:47,366 network_conf.py:180] build regression model
[INFO 2018-01-08 23:06:47,368 network_conf.py:87] Create embedding table [source] whose dimention is 256.
[INFO 2018-01-08 23:06:47,371 network_conf.py:87] Create embedding table [target] whose dimention is 256.
[INFO 2018-01-08 23:06:47,372 network_conf.py:149] create a sequence_conv_pool whose context width is 3.
[INFO 2018-01-08 23:06:47,375 network_conf.py:151] create a sequence_conv_pool whose context width is 4.
[INFO 2018-01-08 23:06:47,378 network_conf.py:163] create fc layer [source_fc_0_128] which dimention is 128
[INFO 2018-01-08 23:06:47,379 network_conf.py:163] create fc layer [source_fc_1_64] which dimention is 64
[INFO 2018-01-08 23:06:47,380 network_conf.py:163] create fc layer [source_fc_2_32] which dimention is 32
[INFO 2018-01-08 23:06:47,381 network_conf.py:149] create a sequence_conv_pool whose context width is 3.
[INFO 2018-01-08 23:06:47,383 network_conf.py:151] create a sequence_conv_pool whose context width is 4.
[INFO 2018-01-08 23:06:47,386 network_conf.py:163] create fc layer [target_fc_0_128] which dimention is 128
[INFO 2018-01-08 23:06:47,387 network_conf.py:163] create fc layer [target_fc_1_64] which dimention is 64
[INFO 2018-01-08 23:06:47,388 network_conf.py:163] create fc layer [target_fc_2_32] which dimention is 32
I0108 23:06:47.985695 18257 GradientMachine.cpp:94] Initing parameters..
I0108 23:06:53.009483 18257 GradientMachine.cpp:101] Init parameters done.
[INFO 2018-01-08 23:06:53,422 reader.py:31] [reader] load trainset from ./data/regression/train.txt
[INFO 2018-01-08 23:06:54,772 train.py:229] Pass 0, Batch 0, Cost 0.306331, {'auc_evaluator_0': 0.6302083134651184}
[INFO 2018-01-08 23:08:39,500 train.py:229] Pass 0, Batch 100, Cost 0.153258, {'auc_evaluator_0': 0.9294871687889099}
[INFO 2018-01-08 23:10:28,211 train.py:229] Pass 0, Batch 200, Cost 0.088812, {'auc_evaluator_0': 0.7851851582527161}
[INFO 2018-01-08 23:12:18,108 train.py:229] Pass 0, Batch 300, Cost 0.083488, {'auc_evaluator_0': 0.7239583134651184}
[INFO 2018-01-08 23:14:13,401 train.py:229] Pass 0, Batch 400, Cost 0.084395, {'auc_evaluator_0': 0.7816091775894165}
F0108 23:15:27.318744 18257 Evaluator.cpp:460] Check failed: binIdx <= kBinNum_ bin index [4293613805] out of range, predict value[-0.0806744]

*** Check failure stack trace: ***
@ 0x7ff5af39c9ad google::LogMessage::Fail()
@ 0x7ff5af3a045c google::LogMessage::SendToLog()
@ 0x7ff5af39c4d3 google::LogMessage::Flush()
@ 0x7ff5af3a196e google::LogMessageFatal::~LogMessageFatal()
@ 0x7ff5af17e81d paddle::AucEvaluator::evalImp()
@ 0x7ff5af17e018 paddle::Evaluator::eval()
@ 0x7ff5af152ce8 paddle::CombinedEvaluator::eval()
@ 0x7ff5af16faaa paddle::MultiGradientMachine::eval()
@ 0x7ff5aeff7886 _wrap_GradientMachine_eval
@ 0x4cb45e PyEval_EvalFrameEx
@ 0x4c2765 PyEval_EvalCodeEx
@ 0x4ca8d1 PyEval_EvalFrameEx
@ 0x4c2765 PyEval_EvalCodeEx
@ 0x4ca099 PyEval_EvalFrameEx
@ 0x4c2765 PyEval_EvalCodeEx
@ 0x4ca099 PyEval_EvalFrameEx
@ 0x4c2765 PyEval_EvalCodeEx
@ 0x4c2509 PyEval_EvalCode
@ 0x4f1def (unknown)
@ 0x4ec652 PyRun_FileExFlags
@ 0x4eae31 PyRun_SimpleFileExFlags
@ 0x49e14a Py_Main
@ 0x7ff5dc6a4830 __libc_start_main
@ 0x49d9d9 _start
@ (nil) (unknown)
('model type: ', 'regression')
Aborted (core dumped)
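One possible reading of the failed check: the AUC evaluator appears to bucket each prediction into a histogram bin, which only works for values in [0, 1], while the regression model here produced a negative prediction. The sketch below is not taken from Evaluator.cpp. It assumes roughly 2**24 bins (a value inferred from the numbers in the log) and an unsigned 32-bit bin index that wraps around for negative inputs. The names `raw_bin_index` and `clamped_bin_index` are hypothetical. Under those assumptions, the predict value from the log maps to exactly the out-of-range index that was reported.

```python
# Assumption: the evaluator uses about 2**24 histogram bins. This value is
# inferred from the numbers in the log, not read from Evaluator.cpp.
K_BIN_NUM = 1 << 24
UINT32_MASK = 0xFFFFFFFF


def raw_bin_index(pred, bin_num=K_BIN_NUM):
    """Truncate pred * bin_num toward zero, then wrap it like an unsigned 32-bit int."""
    return int(pred * bin_num) & UINT32_MASK


def clamped_bin_index(pred, bin_num=K_BIN_NUM):
    """Hypothetical guard: clip the prediction into [0, 1] before binning."""
    clipped = min(max(pred, 0.0), 1.0)
    return int(clipped * bin_num)


if __name__ == "__main__":
    pred = -0.0806744  # predict value printed in the fatal log line
    print(raw_bin_index(pred))      # wraps around to a value far above K_BIN_NUM
    print(clamped_bin_index(pred))  # stays inside the valid bin range
```

If this reading is right, the crash would come from attaching an AUC evaluator to an unbounded regression output, not from the training data itself. Squashing the prediction into [0, 1], or not using AUC for the regression model, would be two candidate workarounds to try.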

Main training entry point: train.py
