Can't get beyond the first batch when feeding in a dense_vector_sequence (#369) · Issue · PaddlePaddle / Paddle

Can't get beyond the first batch when feeding in a dense_vector_sequence

Created by: alvations

I have managed to feed numpy objects into Paddle by converting them with ndarray.tolist(), like this:

from paddle.trainer.PyDataProvider2 import *

import numpy as np

UNK_IDX = 2
START = "<s>"
END = "<e>"

def _get_ids(s, dictionary):
    words = s.strip().split()
    return [dictionary[START]] + \
           [dictionary.get(w, UNK_IDX) for w in words] + \
           [dictionary[END]]

def hook(settings, src_dict, trg_dict, file_list, **kwargs):
    # Some code ...
    # A numpy matrix that corresponds to the src (row) and target (column) vocabulary
    settings.thematrix = np.random.rand(len(src_dict), len(trg_dict))
    # ...
    settings.slots = [integer_value_sequence(len(settings.src_dict)),
                      dense_vector_sequence(len(settings.src_dict)),
                      integer_value_sequence(len(settings.trg_dict))]
    # ...

@provider(init_hook=hook, pool_size=50000)
def process(settings, file_name):
    # ...
    for line in f:
        src_seq, trg_seq = line.strip().split('\t')
        src_ids = _get_ids(src_seq, settings.src_dict)
        trg_words = trg_seq.split()
        trg_ids = [settings.trg_dict.get(w, UNK_IDX)
                   for w in trg_words]
        trg_ids = [settings.trg_dict[START]] + trg_ids
        # One sample per line: source ids, one matrix row per source id, target ids.
        yield src_ids, settings.thematrix[src_ids].tolist(), trg_ids

Somehow the vectors can't seem to get past the first batch, and Paddle throws this error:

~/Paddle/demo/rowrow$ bash train.sh 
I1104 18:59:42.636052 18632 Util.cpp:151] commandline: /home/ltan/Paddle/binary/bin/../opt/paddle/bin/paddle_trainer --config=train.conf --save_dir=/home/ltan/Paddle/demo/rowrow/model --use_gpu=true --num_passes=100 --show_parameter_stats_period=1000 --trainer_count=4 --log_period=10 --dot_period=5 
I1104 18:59:46.503566 18632 Util.cpp:126] Calling runInitFunctions
I1104 18:59:46.503810 18632 Util.cpp:139] Call runInitFunctions done.
[WARNING 2016-11-04 18:59:46,847 default_decorators.py:40] please use keyword arguments in paddle config.
[INFO 2016-11-04 18:59:46,856 networks.py:1125] The input order is [source_language_word, target_language_word, target_language_next_word]
[INFO 2016-11-04 18:59:46,857 networks.py:1132] The output order is [__cost_0__]
I1104 18:59:46.871026 18632 Trainer.cpp:170] trainer mode: Normal
I1104 18:59:46.871906 18632 MultiGradientMachine.cpp:108] numLogicalDevices=1 numThreads=4 numDevices=4
I1104 18:59:46.988584 18632 PyDataProvider2.cpp:247] loading dataprovider dataprovider::process
[INFO 2016-11-04 18:59:46,990 dataprovider.py:15] src dict len : 45661
[INFO 2016-11-04 18:59:47,316 dataprovider.py:26] trg dict len : 422
I1104 18:59:47.347944 18632 PyDataProvider2.cpp:247] loading dataprovider dataprovider::process
[INFO 2016-11-04 18:59:47,348 dataprovider.py:15] src dict len : 45661
[INFO 2016-11-04 18:59:47,657 dataprovider.py:26] trg dict len : 422
I1104 18:59:47.658279 18632 GradientMachine.cpp:134] Initing parameters..
I1104 18:59:49.244287 18632 GradientMachine.cpp:141] Init parameters done.
F1104 18:59:50.485621 18632 PythonUtil.h:213] Check failed: PySequence_Check(seq_) 
*** Check failure stack trace: ***
    @     0x7f71f521adaa  (unknown)
    @     0x7f71f521ace4  (unknown)
    @     0x7f71f521a6e6  (unknown)
    @     0x7f71f521d687  (unknown)
    @           0x54dac9  paddle::DenseScanner::fill()
    @           0x54f1d1  paddle::SequenceScanner::fill()
    @           0x5543cc  paddle::PyDataProvider2::getNextBatchInternal()
    @           0x5779b2  paddle::DataProvider::getNextBatch()
    @           0x6a01f7  paddle::Trainer::trainOnePass()
    @           0x6a3b57  paddle::Trainer::train()
    @           0x53a2b3  main
    @     0x7f71f4426f45  (unknown)
    @           0x545ae5  (unknown)
    @              (nil)  (unknown)
/home/ltan/Paddle/binary/bin/paddle: line 81: 18632 Aborted                 (core dumped) ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}

The dataprovider should also accept a numpy array directly, but without tolist() it throws the same error.
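Judging only from the stack trace, the failed check is PySequence_Check inside DenseScanner::fill, called from SequenceScanner::fill, which suggests that something reaching the dense slot was not a Python sequence where a per-timestep vector was expected. As a way to inspect samples before yielding them, here is a minimal sketch in plain Python. The helper names (is_int_sequence, is_dense_vector_sequence, describe_sample) are hypothetical and not part of Paddle; the sketch only encodes the shapes implied by the three slots declared in the hook.

```python
# Hypothetical debugging helpers, not part of Paddle.
# They check one sample against the shapes implied by the slots:
#   integer_value_sequence, dense_vector_sequence, integer_value_sequence.

def is_int_sequence(value):
    """True if value is a list/tuple of plain integers."""
    return (isinstance(value, (list, tuple))
            and all(isinstance(v, int) and not isinstance(v, bool)
                    for v in value))


def is_dense_vector_sequence(value, dim):
    """True if value is a list/tuple of vectors, each holding `dim` numbers."""
    if not isinstance(value, (list, tuple)):
        return False
    for vec in value:
        # Each timestep must itself be a sequence of the declared length.
        if not isinstance(vec, (list, tuple)) or len(vec) != dim:
            return False
        if not all(isinstance(x, (int, float)) for x in vec):
            return False
    return True


def describe_sample(sample, dense_dim):
    """Return a list of problems; an empty list means the shape looks right."""
    if len(sample) != 3:
        return ["expected 3 fields, got %d" % len(sample)]
    src_ids, vectors, trg_ids = sample
    problems = []
    if not is_int_sequence(src_ids):
        problems.append("field 0 is not a sequence of ints")
    if not is_dense_vector_sequence(vectors, dense_dim):
        problems.append("field 1 is not a sequence of %d-dim vectors" % dense_dim)
    if not is_int_sequence(trg_ids):
        problems.append("field 2 is not a sequence of ints")
    return problems


# Toy samples with a 2-dimensional dense slot.
good = ([0, 5, 1], [[0.3, 0.9], [0.5, 0.4], [1.0, 0.7]], [0, 3])
bad = ([0, 5, 1], [0.3, 0.9, 0.5], [0, 3])  # flat floats, not vectors

print(describe_sample(good, dense_dim=2))
print(describe_sample(bad, dense_dim=2))
```

Calling describe_sample on the tuple just before the yield in process() would show whether the middle field really is a list of lists. One thing this kind of check would also flag in the snippet above: each row of thematrix has len(trg_dict) entries, while the dense slot is declared with len(src_dict). Whether that mismatch is related to the crash is not confirmed here.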

More details are at http://stackoverflow.com/questions/40421248/why-is-paddle-throwing-errors-when-feeding-in-a-dense-vector-sequence-to-a-seqto and the data and code that I use to run train.sh are at https://github.com/alvations/rowrow .

What does the error mean? Why is it not getting past the first batch?

Although the vectors I am using here are random, I hope to eventually read a pickled file containing a similar matrix, with rows corresponding to src_dict and columns to trg_dict. I consider the vectors dense because every cell in the matrix contains a floating-point value in the range [0.3, 1.0].
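To make the pickled-matrix plan concrete, here is a minimal sketch using only the standard library. It assumes the matrix is stored as nested Python lists and uses a made-up file name (thematrix.pkl); the helper names are hypothetical. With a NumPy array the row lookup would instead be thematrix[src_ids].tolist(), as in the provider above.

```python
import os
import pickle
import tempfile


def save_matrix(matrix, path):
    """Pickle a matrix (here: a list of rows) to disk."""
    with open(path, "wb") as fout:
        pickle.dump(matrix, fout)


def load_matrix(path):
    """Load the pickled matrix back into memory."""
    with open(path, "rb") as fin:
        return pickle.load(fin)


def rows_for(matrix, src_ids):
    """Return one row (a list of floats) per source id, in order."""
    return [list(matrix[i]) for i in src_ids]


# Toy matrix: 3 source words (rows) x 2 target words (columns).
toy = [[0.3, 1.0], [0.5, 0.6], [0.9, 0.4]]

# Hypothetical file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "thematrix.pkl")
save_matrix(toy, path)
loaded = load_matrix(path)

print(rows_for(loaded, [2, 0]))  # -> [[0.9, 0.4], [0.3, 1.0]]
```

In the hook, the random matrix would then presumably be replaced by something like settings.thematrix = load_matrix(path), with the real path supplied through the provider's arguments.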

Is there any example of how to load a dense_vector_sequence in Paddle?
