Skip to content

  • 体验新版
    • 正在加载...
  • 登录
  • PaddlePaddle
  • Paddle
  • Issue
  • #369

P
Paddle
  • 项目概览

PaddlePaddle / Paddle
大约 2 年 前同步成功

通知 2325
Star 20933
Fork 5424
  • 代码
    • 文件
    • 提交
    • 分支
    • Tags
    • 贡献者
    • 分支图
    • Diff
  • Issue 1423
    • 列表
    • 看板
    • 标记
    • 里程碑
  • 合并请求 543
  • Wiki 0
    • Wiki
  • 分析
    • 仓库
    • DevOps
  • 项目成员
  • Pages
P
Paddle
  • 项目概览
    • 项目概览
    • 详情
    • 发布
  • 仓库
    • 仓库
    • 文件
    • 提交
    • 分支
    • 标签
    • 贡献者
    • 分支图
    • 比较
  • Issue 1,423
    • Issue 1,423
    • 列表
    • 看板
    • 标记
    • 里程碑
  • 合并请求 543
    • 合并请求 543
  • Pages
  • 分析
    • 分析
    • 仓库分析
    • DevOps
  • Wiki 0
    • Wiki
  • 成员
    • 成员
  • 收起侧边栏
  • 动态
  • 分支图
  • 创建新Issue
  • 提交
  • Issue看板
已关闭
开放中
Opened 11月 07, 2016 by saxon_zh@saxon_zhGuest

Can't get beyond the first batch when feeding in a dense_vector_sequence

Created by: alvations

I have managed to feed numpy objects into Paddle by using something like np.array.tolist():

from paddle.trainer.PyDataProvider2 import *

import numpy as np

UNK_IDX = 2
START = "<s>"
END = "<e>"

def _get_ids(s, dictionary):
    words = s.strip().split()
    return [dictionary[START]] + \
           [dictionary.get(w, UNK_IDX) for w in words] + \
           [dictionary[END]]

def hook(settings, src_dict, trg_dict, file_list, **kwargs):
    # Some code ...
    # A numpy matrix that corresponds to the src (row) and target (column) vocabulary
    settings.thematrix = np.random.rand(len(src_dict), len(trg_dict))
    # ...
    settings.slots = [ integer_value_sequence(len(settings.src_dict)),
                           dense_vector_sequence(len(setting.src_dict)),
                            integer_value_sequence(len(settings.trg_dict))]
    # ...

@provider(init_hook=hook, pool_size=50000)
def process(settings, file_name):
    # ...
    for line in enumerate(f):
        src_seq, trg_seq = line.strip().split('\t')
        src_ids = _get_ids(src_seq, settings.src_dict)
        trg_ids = [settings.trg_dict.get(w, UNK_IDX)
                           for w in trg_words]
        trg_ids = [settings.trg_dict[START]] + trg_ids
    yield src_ids , settings.thematrix[src_ids].tolist(), trg_ids

Somehow the vectors can't seem to get pass the first batch and Paddle throws this error:

~/Paddle/demo/rowrow$ bash train.sh 
I1104 18:59:42.636052 18632 Util.cpp:151] commandline: /home/ltan/Paddle/binary/bin/../opt/paddle/bin/paddle_trainer --config=train.conf --save_dir=/home/ltan/Paddle/demo/rowrow/model --use_gpu=true --num_passes=100 --show_parameter_stats_period=1000 --trainer_count=4 --log_period=10 --dot_period=5 
I1104 18:59:46.503566 18632 Util.cpp:126] Calling runInitFunctions
I1104 18:59:46.503810 18632 Util.cpp:139] Call runInitFunctions done.
[WARNING 2016-11-04 18:59:46,847 default_decorators.py:40] please use keyword arguments in paddle config.
[INFO 2016-11-04 18:59:46,856 networks.py:1125] The input order is [source_language_word, target_language_word, target_language_next_word]
[INFO 2016-11-04 18:59:46,857 networks.py:1132] The output order is [__cost_0__]
I1104 18:59:46.871026 18632 Trainer.cpp:170] trainer mode: Normal
I1104 18:59:46.871906 18632 MultiGradientMachine.cpp:108] numLogicalDevices=1 numThreads=4 numDevices=4
I1104 18:59:46.988584 18632 PyDataProvider2.cpp:247] loading dataprovider dataprovider::process
[INFO 2016-11-04 18:59:46,990 dataprovider.py:15] src dict len : 45661
[INFO 2016-11-04 18:59:47,316 dataprovider.py:26] trg dict len : 422
I1104 18:59:47.347944 18632 PyDataProvider2.cpp:247] loading dataprovider dataprovider::process
[INFO 2016-11-04 18:59:47,348 dataprovider.py:15] src dict len : 45661
[INFO 2016-11-04 18:59:47,657 dataprovider.py:26] trg dict len : 422
I1104 18:59:47.658279 18632 GradientMachine.cpp:134] Initing parameters..
I1104 18:59:49.244287 18632 GradientMachine.cpp:141] Init parameters done.
F1104 18:59:50.485621 18632 PythonUtil.h:213] Check failed: PySequence_Check(seq_) 
*** Check failure stack trace: ***
    @     0x7f71f521adaa  (unknown)
    @     0x7f71f521ace4  (unknown)
    @     0x7f71f521a6e6  (unknown)
    @     0x7f71f521d687  (unknown)
    @           0x54dac9  paddle::DenseScanner::fill()
    @           0x54f1d1  paddle::SequenceScanner::fill()
    @           0x5543cc  paddle::PyDataProvider2::getNextBatchInternal()
    @           0x5779b2  paddle::DataProvider::getNextBatch()
    @           0x6a01f7  paddle::Trainer::trainOnePass()
    @           0x6a3b57  paddle::Trainer::train()
    @           0x53a2b3  main
    @     0x7f71f4426f45  (unknown)
    @           0x545ae5  (unknown)
    @              (nil)  (unknown)
/home/ltan/Paddle/binary/bin/paddle: line 81: 18632 Aborted                 (core dumped) ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}

Without tolist(), the dataprovider would support numpy array too but it throws the same error.

More details on http://stackoverflow.com/questions/40421248/why-is-paddle-throwing-errors-when-feeding-in-a-dense-vector-sequence-to-a-seqto and the data+code that i use to run train.sh is in https://github.com/alvations/rowrow .

What does the error means? Why is it not going pass the first batch?

Although the vectors I am using are random vectors, I hope to read a pickled file that loads a similar matrix that corresponds to the src_dict and target_dict. I would consider them to be dense since all cells in the matrix contains floating point values that ranges between [0.3, 1.0].

Is there any example of how to load a dense_vector_sequence in Paddle?

指派人
分配到
无
里程碑
无
分配里程碑
工时统计
无
截止日期
无
标识: paddlepaddle/Paddle#369
渝ICP备2023009037号

京公网安备11010502055752号

网络110报警服务 Powered by GitLab CE v13.7
开源知识
Git 入门 Pro Git 电子书 在线学 Git
Markdown 基础入门 IT 技术知识开源图谱
帮助
使用手册 反馈建议 博客
《GitCode 隐私声明》 《GitCode 服务条款》 关于GitCode
Powered by GitLab CE v13.7