Segmentation fault (core dumped) when running the deepfm demo (#561) · Issue · PaddlePaddle / models

Created by: windy444

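I deployed the deepfm example from models from scratch (on a new machine, with the latest paddle installed and the deepfm example from models deployed) and hit a segmentation fault. With batch_size=1 there is no problem, but as soon as batch_size is increased the segmentation fault appears. In addition, the example contains the following code, which I do not quite understand: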

buf_size=args.batch_size * 10000

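When batch_size is large (the example's default is 1000), this makes initialization load a buffer of more than 10 GB, which takes a long time. A buffer that large does not seem necessary, does it?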
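To make the buffer question concrete, here is a minimal sketch of a buffered shuffle reader. It is a plain-Python stand-in written for illustration, not the code from the deepfm example or from paddle itself; the names `buffered_shuffle`, `samples_buffered`, and `toy_reader` are hypothetical. It assumes the buffer is filled with up to `buf_size` samples, shuffled, and then drained, so memory use grows with `buf_size` rather than with the dataset size.

```python
import random


def buffered_shuffle(reader, buf_size, seed=None):
    """Wrap reader() so samples come out shuffled within chunks of buf_size.

    Hypothetical helper: at most buf_size samples are held in memory at once.
    """
    rng = random.Random(seed)

    def shuffled_reader():
        buf = []
        for sample in reader():
            buf.append(sample)
            if len(buf) >= buf_size:
                # Buffer is full: shuffle it and hand the samples out.
                rng.shuffle(buf)
                for item in buf:
                    yield item
                buf = []
        if buf:
            # Flush the final, partially filled buffer.
            rng.shuffle(buf)
            for item in buf:
                yield item

    return shuffled_reader


def samples_buffered(batch_size, factor=10000):
    """Number of samples held by buf_size = batch_size * factor."""
    return batch_size * factor


def toy_reader():
    # Stand-in for the real training data reader.
    for i in range(10):
        yield i


shuffled = list(buffered_shuffle(toy_reader, buf_size=4, seed=0)())
print(sorted(shuffled) == list(range(10)))  # every sample is still present
print(samples_buffered(1000))               # samples buffered at batch_size=1000
```

With batch_size=1000 the expression above asks for a buffer of 10,000,000 samples. If each sample costs on the order of 1 KB once loaded as Python objects (an assumption, not a measured value), that is roughly 10 GB, which matches the slow startup described. A fixed buf_size of a few thousand still shuffles within each chunk while keeping memory small.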

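Below are my two runs; the only difference between them is batch_size. The code is the original, except that I fixed buf_size at 2000. For the test and validation data, I took the top 100 lines of each of the original test and validation files.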

[work@XXX deep_fm]$ python train.py --train_data_path data/train2.txt --test_data_path data/valid2.txt --batch_size 1
I0101 21:55:06.438161 13166 Util.cpp:166] commandline:  --use_gpu=False --trainer_count=1 
I0101 21:55:06.519804 13166 GradientMachine.cpp:94] Initing parameters..
I0101 21:55:06.632671 13166 GradientMachine.cpp:101] Init parameters done.
[WARNING 2018-01-01 21:55:07,184 train.py:77] Pass 0, Batch 0, Samples 0, Cost 0.640660, {'auc': 0.0, 'classification_error': 0.0}
[WARNING 2018-01-01 21:55:07,719 train.py:88] Test 0-0, Cost 0.904218, {'auc': 0.5116550326347351, 'classification_error': 0.6600000262260437}
[WARNING 2018-01-01 21:55:18,102 train.py:77] Pass 1, Batch 0, Samples 0, Cost 0.392937, {'auc': 0.0, 'classification_error': 0.0}
[WARNING 2018-01-01 21:55:18,632 train.py:88] Test 1-0, Cost 0.663319, {'auc': 0.5314685106277466, 'classification_error': 0.25999999046325684}
[WARNING 2018-01-01 21:55:29,005 train.py:77] Pass 2, Batch 0, Samples 0, Cost 0.153824, {'auc': 0.0, 'classification_error': 0.0}
[WARNING 2018-01-01 21:55:29,538 train.py:88] Test 2-0, Cost 0.647524, {'auc': 0.5536130666732788, 'classification_error': 0.2199999988079071}
[WARNING 2018-01-01 21:55:39,917 train.py:77] Pass 3, Batch 0, Samples 0, Cost 0.287967, {'auc': 0.0, 'classification_error': 0.0}
[WARNING 2018-01-01 21:55:40,449 train.py:88] Test 3-0, Cost 0.646834, {'auc': 0.5879953503608704, 'classification_error': 0.2199999988079071}
[WARNING 2018-01-01 21:55:50,842 train.py:77] Pass 4, Batch 0, Samples 0, Cost 1.319486, {'auc': 0.0, 'classification_error': 1.0}
[WARNING 2018-01-01 21:55:51,374 train.py:88] Test 4-0, Cost 0.653498, {'auc': 0.5979021191596985, 'classification_error': 0.20999999344348907}
[WARNING 2018-01-01 21:56:01,749 train.py:77] Pass 5, Batch 0, Samples 0, Cost 1.486811, {'auc': 0.0, 'classification_error': 1.0}
[WARNING 2018-01-01 21:56:02,286 train.py:88] Test 5-0, Cost 0.643287, {'auc': 0.6095570921897888, 'classification_error': 0.20999999344348907}
[WARNING 2018-01-01 21:56:12,657 train.py:77] Pass 6, Batch 0, Samples 0, Cost 0.131564, {'auc': 0.0, 'classification_error': 0.0}
[WARNING 2018-01-01 21:56:13,220 train.py:88] Test 6-0, Cost 0.712368, {'auc': 0.6118881106376648, 'classification_error': 0.20000000298023224}
[WARNING 2018-01-01 21:56:23,595 train.py:77] Pass 7, Batch 0, Samples 0, Cost 0.007719, {'auc': 0.0, 'classification_error': 0.0}
[WARNING 2018-01-01 21:56:24,129 train.py:88] Test 7-0, Cost 0.728682, {'auc': 0.6130536198616028, 'classification_error': 0.20999999344348907}
[WARNING 2018-01-01 21:56:34,510 train.py:77] Pass 8, Batch 0, Samples 0, Cost 0.003461, {'auc': 0.0, 'classification_error': 0.0}
[WARNING 2018-01-01 21:56:35,045 train.py:88] Test 8-0, Cost 0.763806, {'auc': 0.6212121248245239, 'classification_error': 0.20000000298023224}
[WARNING 2018-01-01 21:56:45,418 train.py:77] Pass 9, Batch 0, Samples 0, Cost 0.011033, {'auc': 0.0, 'classification_error': 0.0}
[WARNING 2018-01-01 21:56:45,948 train.py:88] Test 9-0, Cost 0.753952, {'auc': 0.6340326070785522, 'classification_error': 0.23000000417232513}


[work@XXX deep_fm]$ python train.py --train_data_path data/train2.txt --test_data_path data/valid2.txt --batch_size 10
I0101 22:12:07.115480 26987 Util.cpp:166] commandline:  --use_gpu=False --trainer_count=1 
I0101 22:12:07.198402 26987 GradientMachine.cpp:94] Initing parameters..
I0101 22:12:07.312139 26987 GradientMachine.cpp:101] Init parameters done.

*** Aborted at 1514815927 (unix time) try "date -d @1514815927" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGSEGV (@0x7fe68ac1400c) received by PID 26987 (TID 0x7fe69801c700) from PID 18446744071742504972; stack trace: ***
    @       0x318b20f500 (unknown)
    @     0x7fe69b9125e0 sgemm_itcopy_SANDYBRIDGE
    @     0x7fe69a4cedf0 inner_thread
    @     0x7fe69a4d752f blas_thread_server
    @       0x318b207851 (unknown)
    @       0x318aee767d (unknown)
    @                0x0 (unknown)
Segmentation fault (core dumped)
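
The frames sgemm_itcopy_SANDYBRIDGE and blas_thread_server in the trace are OpenBLAS symbols, so the crash happens inside an OpenBLAS worker thread during a matrix multiply. One diagnostic step, offered as an assumption and not as a confirmed fix, is to rerun the failing command with OpenBLAS restricted to a single thread through its OPENBLAS_NUM_THREADS environment variable and see whether the fault still occurs. The helper name `single_thread_blas_env` below is hypothetical; the command is the failing one from above.

```python
import os


def single_thread_blas_env(base_env=None):
    """Return a copy of the environment with OpenBLAS limited to one thread."""
    env = dict(os.environ if base_env is None else base_env)
    env["OPENBLAS_NUM_THREADS"] = "1"
    return env


# The failing invocation from the second run above.
CMD = [
    "python", "train.py",
    "--train_data_path", "data/train2.txt",
    "--test_data_path", "data/valid2.txt",
    "--batch_size", "10",
]

# To actually launch it from the deep_fm directory:
#   import subprocess
#   subprocess.call(CMD, env=single_thread_blas_env())
```

If the run succeeds with one thread, that points toward the multithreaded BLAS path; if it still crashes, the cause lies elsewhere.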
