Running the deepfm demo results in Segmentation fault (core dumped)
Created by: windy444
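Starting from scratch, I deployed the deepfm example from models and hit a segmentation fault (on a fresh machine, with the latest paddle installed and the deepfm example from models deployed). With batch_size=1 there is no problem, but as soon as batch_size is increased, it segfaults. Separately, the example contains this line, which I don't quite understand: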
buf_size=args.batch_size * 10000
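When batch_size is large, for example the example's default of 1000, this causes a buffer of more than 10 GB to be loaded at initialization. That takes a long time, and a buffer this large hardly seems necessary.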
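For illustration, here is a minimal sketch of how a buffered shuffle reader can end up holding buf_size samples in memory before the first batch is produced. It assumes buf_size is used by a shuffle-style reader decorator that fills a buffer, shuffles it, and then yields from it; the names buffered_shuffle and toy_reader are made up for this sketch, and it is not PaddlePaddle's actual implementation.

```python
import random


def buffered_shuffle(reader, buf_size, seed=None):
    """Hypothetical stand-in for a shuffle reader decorator.

    Collects up to buf_size samples, shuffles them, yields them,
    then refills. Peak memory grows with buf_size.
    """
    rng = random.Random(seed)

    def shuffled_reader():
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) >= buf_size:
                rng.shuffle(buf)
                for s in buf:
                    yield s
                buf = []
        # Flush whatever is left when the data runs out.
        if buf:
            rng.shuffle(buf)
            for s in buf:
                yield s

    return shuffled_reader


def toy_reader():
    # Toy data source standing in for the real training reader.
    for i in range(10):
        yield i


# With the example's formula and the default batch_size of 1000,
# the buffer would hold this many samples before yielding anything.
batch_size = 1000
samples_buffered = batch_size * 10000
print("samples buffered:", samples_buffered)

# A small buffer still shuffles, just within smaller windows.
small = list(buffered_shuffle(toy_reader, buf_size=4, seed=0)())
print(small)
```

Under this assumption, a smaller fixed buf_size trades shuffle quality (samples are only mixed within each window) for much lower start-up time and memory.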
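Below are my two runs; the only difference between them is batch_size. The code is the original, except that I fixed buf_size at 2000, and for the test and validation data I took the top 100 entries of each of the original test and validation sets.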
[work@XXX deep_fm]$ python train.py --train_data_path data/train2.txt --test_data_path data/valid2.txt --batch_size 1
I0101 21:55:06.438161 13166 Util.cpp:166] commandline: --use_gpu=False --trainer_count=1
I0101 21:55:06.519804 13166 GradientMachine.cpp:94] Initing parameters..
I0101 21:55:06.632671 13166 GradientMachine.cpp:101] Init parameters done.
[WARNING 2018-01-01 21:55:07,184 train.py:77] Pass 0, Batch 0, Samples 0, Cost 0.640660, {'auc': 0.0, 'classification_error': 0.0}
[WARNING 2018-01-01 21:55:07,719 train.py:88] Test 0-0, Cost 0.904218, {'auc': 0.5116550326347351, 'classification_error': 0.6600000262260437}
[WARNING 2018-01-01 21:55:18,102 train.py:77] Pass 1, Batch 0, Samples 0, Cost 0.392937, {'auc': 0.0, 'classification_error': 0.0}
[WARNING 2018-01-01 21:55:18,632 train.py:88] Test 1-0, Cost 0.663319, {'auc': 0.5314685106277466, 'classification_error': 0.25999999046325684}
[WARNING 2018-01-01 21:55:29,005 train.py:77] Pass 2, Batch 0, Samples 0, Cost 0.153824, {'auc': 0.0, 'classification_error': 0.0}
[WARNING 2018-01-01 21:55:29,538 train.py:88] Test 2-0, Cost 0.647524, {'auc': 0.5536130666732788, 'classification_error': 0.2199999988079071}
[WARNING 2018-01-01 21:55:39,917 train.py:77] Pass 3, Batch 0, Samples 0, Cost 0.287967, {'auc': 0.0, 'classification_error': 0.0}
[WARNING 2018-01-01 21:55:40,449 train.py:88] Test 3-0, Cost 0.646834, {'auc': 0.5879953503608704, 'classification_error': 0.2199999988079071}
[WARNING 2018-01-01 21:55:50,842 train.py:77] Pass 4, Batch 0, Samples 0, Cost 1.319486, {'auc': 0.0, 'classification_error': 1.0}
[WARNING 2018-01-01 21:55:51,374 train.py:88] Test 4-0, Cost 0.653498, {'auc': 0.5979021191596985, 'classification_error': 0.20999999344348907}
[WARNING 2018-01-01 21:56:01,749 train.py:77] Pass 5, Batch 0, Samples 0, Cost 1.486811, {'auc': 0.0, 'classification_error': 1.0}
[WARNING 2018-01-01 21:56:02,286 train.py:88] Test 5-0, Cost 0.643287, {'auc': 0.6095570921897888, 'classification_error': 0.20999999344348907}
[WARNING 2018-01-01 21:56:12,657 train.py:77] Pass 6, Batch 0, Samples 0, Cost 0.131564, {'auc': 0.0, 'classification_error': 0.0}
[WARNING 2018-01-01 21:56:13,220 train.py:88] Test 6-0, Cost 0.712368, {'auc': 0.6118881106376648, 'classification_error': 0.20000000298023224}
[WARNING 2018-01-01 21:56:23,595 train.py:77] Pass 7, Batch 0, Samples 0, Cost 0.007719, {'auc': 0.0, 'classification_error': 0.0}
[WARNING 2018-01-01 21:56:24,129 train.py:88] Test 7-0, Cost 0.728682, {'auc': 0.6130536198616028, 'classification_error': 0.20999999344348907}
[WARNING 2018-01-01 21:56:34,510 train.py:77] Pass 8, Batch 0, Samples 0, Cost 0.003461, {'auc': 0.0, 'classification_error': 0.0}
[WARNING 2018-01-01 21:56:35,045 train.py:88] Test 8-0, Cost 0.763806, {'auc': 0.6212121248245239, 'classification_error': 0.20000000298023224}
[WARNING 2018-01-01 21:56:45,418 train.py:77] Pass 9, Batch 0, Samples 0, Cost 0.011033, {'auc': 0.0, 'classification_error': 0.0}
[WARNING 2018-01-01 21:56:45,948 train.py:88] Test 9-0, Cost 0.753952, {'auc': 0.6340326070785522, 'classification_error': 0.23000000417232513}
[work@XXX deep_fm]$ python train.py --train_data_path data/train2.txt --test_data_path data/valid2.txt --batch_size 10
I0101 22:12:07.115480 26987 Util.cpp:166] commandline: --use_gpu=False --trainer_count=1
I0101 22:12:07.198402 26987 GradientMachine.cpp:94] Initing parameters..
I0101 22:12:07.312139 26987 GradientMachine.cpp:101] Init parameters done.
*** Aborted at 1514815927 (unix time) try "date -d @1514815927" if you are using GNU date ***
PC: @ 0x0 (unknown)
*** SIGSEGV (@0x7fe68ac1400c) received by PID 26987 (TID 0x7fe69801c700) from PID 18446744071742504972; stack trace: ***
@ 0x318b20f500 (unknown)
@ 0x7fe69b9125e0 sgemm_itcopy_SANDYBRIDGE
@ 0x7fe69a4cedf0 inner_thread
@ 0x7fe69a4d752f blas_thread_server
@ 0x318b207851 (unknown)
@ 0x318aee767d (unknown)
@ 0x0 (unknown)
Segmentation fault (core dumped)