PaddlePaddle / Paddle · Issue #504
Closed · Opened Nov 17, 2016 by saxon_zh (Guest)

How to stack GRU layers in the seqToseq_net.py demo example?

Created by: alvations

In the seqToseq demo, the main code that implements the decoder is:

    def gru_decoder_with_attention(enc_vec, enc_proj, current_word):
        # Read back the decoder state from the previous timestep.
        decoder_mem = memory(
            name='gru_decoder', size=decoder_size, boot_layer=decoder_boot)

        # Attention context over the encoder outputs, conditioned on the state.
        context = simple_attention(
            encoded_sequence=enc_vec,
            encoded_proj=enc_proj,
            decoder_state=decoder_mem, )

        # Project context and current word into the size * 3 input that
        # gru_step_layer expects (update gate, reset gate, candidate).
        with mixed_layer(size=decoder_size * 3) as decoder_inputs:
            decoder_inputs += full_matrix_projection(input=context)
            decoder_inputs += full_matrix_projection(input=current_word)

        # One GRU step; writes the new state into the 'gru_decoder' memory.
        gru_step = gru_step_layer(
            name='gru_decoder',
            input=decoder_inputs,
            output_mem=decoder_mem,
            size=decoder_size)

        # Softmax over the target vocabulary.
        with mixed_layer(
                size=target_dict_dim, bias_attr=True,
                act=SoftmaxActivation()) as out:
            out += full_matrix_projection(input=gru_step)
        return out

How do I add more layers to the decoder, as in the Google NMT paper?

From my understanding (according to the documentation), the recurrent_group at https://github.com/baidu/Paddle/blob/develop/demo/seqToseq/seqToseq_net.py#L156 is where Paddle takes a timestep, so the extra layers should be added inside gru_decoder_with_attention(). Is that right?
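For reference, the demo invokes the step function roughly like this (paraphrased from seqToseq_net.py; decoder_group_name and group_inputs are the demo's own variables):

    decoder = recurrent_group(name=decoder_group_name,
                              step=gru_decoder_with_attention,
                              input=group_inputs)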

But I'm unsure what gru_step_layer() is doing at https://github.com/baidu/Paddle/blob/develop/demo/seqToseq/seqToseq_net.py#L124 . There isn't much information in the documentation; from the code it appears to wrap a single GRU step, implemented in GruStepLayer.cpp:

    /**
     * @brief GruStepLayer is like GatedRecurrentLayer, but used in a recurrent
     * layer group. GruStepLayer takes 2 input layers:
     * - input[0] with size * 3, divided into 3 equal parts: (xz_t, xr_t, xi_t).
     * - input[1] with size: {prev_out}.
     *
     * parameter and biasParameter are also divided into 3 equal parts:
     * - parameter consists of (U_z, U_r, U)
     * - biasParameter consists of (bias_z, bias_r, bias_o)
     *
     * \f[
     * update \ gate: z_t = actGate(xz_t + U_z * prev_out + bias_z) \\
     * reset \ gate: r_t = actGate(xr_t + U_r * prev_out + bias_r) \\
     * output \ candidate: {h}_t = actNode(xi_t + U * dot(r_t, prev_out) + bias_o) \\
     * output: h_t = dot((1 - z_t), prev_out) + dot(z_t, {h}_t)
     * \f]
     *
     * @note
     * - dot denotes "element-wise multiplication".
     * - actNode is defined by config active_type.
     * - actGate is defined by config active_gate_type.
     *
     * The config file API is gru_step_layer.
     */
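To make those equations concrete, here is a minimal NumPy sketch of one GRU step as described above (my own restatement, not Paddle code; it assumes actGate is sigmoid and actNode is tanh, and that the xz_t/xr_t/xi_t projections are precomputed, which is what the size * 3 mixed_layer provides):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def gru_step(x3, prev_out, U_z, U_r, U, b_z, b_r, b_o):
        # x3 = concat(xz_t, xr_t, xi_t): the size * 3 projection of the
        # step inputs, i.e. what the mixed_layer feeds into the step.
        size = prev_out.shape[-1]
        xz_t, xr_t, xi_t = x3[:size], x3[size:2 * size], x3[2 * size:]
        z_t = sigmoid(xz_t + U_z @ prev_out + b_z)            # update gate
        r_t = sigmoid(xr_t + U_r @ prev_out + b_r)            # reset gate
        h_cand = np.tanh(xi_t + U @ (r_t * prev_out) + b_o)   # output candidate
        return (1.0 - z_t) * prev_out + z_t * h_cand          # new state h_t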

When I tried to stack the layers like this:

    def gru_decoder_with_attention(enc_vec, enc_proj, current_word):
        decoder_mem = memory(name='gru_decoder',
                             size=decoder_size,
                             boot_layer=decoder_boot)

        context = simple_attention(encoded_sequence=enc_vec,
                                   encoded_proj=enc_proj,
                                   decoder_state=decoder_mem, )

        with mixed_layer(size=decoder_size * 3) as decoder_inputs:
            decoder_inputs += full_matrix_projection(input=context)
            decoder_inputs += full_matrix_projection(input=current_word)

        # First GRU step.
        gru_step = gru_step_layer(name='gru_decoder',
                                  input=decoder_inputs,
                                  output_mem=decoder_mem,
                                  size=decoder_size)

        with mixed_layer(size=decoder_size * 3) as decoder_inputs:
            decoder_inputs += full_matrix_projection(input=context)
            decoder_inputs += full_matrix_projection(input=gru_step)

        # Second GRU step, reusing the same layer name and the same memory.
        gru_step = gru_step_layer(name='gru_decoder',
                                  input=decoder_inputs,
                                  output_mem=decoder_mem,
                                  size=decoder_size)

        with mixed_layer(size=decoder_size,
                         bias_attr=True,
                         act=SoftmaxActivation()) as out:
            out += full_matrix_projection(input=gru_step)

        return out

it throws an error saying there's no input sequence:

    $ bash train.sh 
    I1117 14:37:45.858877 13026 Util.cpp:155] commandline: /home/ltan/Paddle/binary/bin/../opt/paddle/bin/paddle_trainer --config=train.conf --save_dir=/home/ltan/Paddle/demo/ibot/model-sub --use_gpu=true --num_passes=100 --show_parameter_stats_period=1000 --trainer_count=4 --log_period=10 --dot_period=5 
    I1117 14:37:51.345690 13026 Util.cpp:130] Calling runInitFunctions
    I1117 14:37:51.345918 13026 Util.cpp:143] Call runInitFunctions done.
    [WARNING 2016-11-17 14:37:51,529 layers.py:1133] You are getting the first instance for a time series, and it is a normal recurrent layer output. There is no time series information at all. Maybe you want to use last_seq instead.
    [WARNING 2016-11-17 14:37:51,533 default_decorators.py:40] please use keyword arguments in paddle config.
    [WARNING 2016-11-17 14:37:51,539 default_decorators.py:40] please use keyword arguments in paddle config.
    [WARNING 2016-11-17 14:37:51,540 default_decorators.py:40] please use keyword arguments in paddle config.
    [WARNING 2016-11-17 14:37:51,540 default_decorators.py:40] please use keyword arguments in paddle config.
    [WARNING 2016-11-17 14:37:51,540 default_decorators.py:40] please use keyword arguments in paddle config.
    [WARNING 2016-11-17 14:37:51,540 default_decorators.py:40] please use keyword arguments in paddle config.
    [INFO 2016-11-17 14:37:51,543 networks.py:1125] The input order is [source_language_word, target_language_word, target_language_next_word]
    [INFO 2016-11-17 14:37:51,543 networks.py:1132] The output order is [__cost_0__]
    I1117 14:37:51.551156 13026 Trainer.cpp:170] trainer mode: Normal
    I1117 14:37:51.552254 13026 MultiGradientMachine.cpp:108] numLogicalDevices=1 numThreads=4 numDevices=4
    I1117 14:37:51.656347 13026 PyDataProvider2.cpp:247] loading dataprovider dataprovider::process
    [INFO 2016-11-17 14:37:51,656 dataprovider.py:27] src dict len : 10000
    [INFO 2016-11-17 14:37:51,656 dataprovider.py:37] trg dict len : 7116
    I1117 14:37:51.676383 13026 PyDataProvider2.cpp:247] loading dataprovider dataprovider::process
    [INFO 2016-11-17 14:37:51,676 dataprovider.py:27] src dict len : 10000
    [INFO 2016-11-17 14:37:51,676 dataprovider.py:37] trg dict len : 7116
    I1117 14:37:51.676964 13026 GradientMachine.cpp:134] Initing parameters..
    I1117 14:37:52.880277 13026 GradientMachine.cpp:141] Init parameters done.
    F1117 14:37:53.244454 13049 GatedRecurrentLayer.cpp:79] Check failed: input.sequenceStartPositions 
    *** Check failure stack trace: ***
        @     0x7f3acc440daa  (unknown)
        @     0x7f3acc440ce4  (unknown)
        @     0x7f3acc4406e6  (unknown)
        @     0x7f3acc443687  (unknown)
        @           0x5a9d7c  paddle::GatedRecurrentLayer::forward()
        @           0x66c220  paddle::NeuralNetwork::forward()
        @           0x65cf6f  paddle::RecurrentGradientMachine::forward()
        @           0x5ef64a  paddle::RecurrentLayerGroup::forward()
        @           0x66c220  paddle::NeuralNetwork::forward()
        @           0x672617  paddle::TrainerThread::forward()
        @           0x674935  paddle::TrainerThread::computeThread()
        @     0x7f3acbfbda60  (unknown)
        @     0x7f3accff9184  start_thread
    F1117 14:37:53.255451 13041 GatedRecurrentLayer.cpp:79] Check failed: input.sequenceStartPositions 
    *** Check failure stack trace: ***
        @     0x7f3acb72537d  (unknown)
        @     0x7f3acc440daa  (unknown)
        @              (nil)  (unknown)
    /home/ltan/Paddle/binary/bin/paddle: line 81: 13026 Aborted                 (core dumped) ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}

I'm not sure what is wrong here. **Are my inputs to the gru_step_layer wrong?**

How should GRU layer stacking be done in the seqToseq GRU decoder?
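To make the question concrete: is the fix to give each stacked step its own layer name and its own memory, instead of reusing 'gru_decoder' for both? Here is an untested sketch of what I have in mind (the gru_decoder_1/gru_decoder_2 names and booting both memories from decoder_boot are my guesses):

    def gru_decoder_with_attention(enc_vec, enc_proj, current_word):
        # One memory per stacked GRU step, each with its own name.
        decoder_mem_1 = memory(name='gru_decoder_1', size=decoder_size,
                               boot_layer=decoder_boot)
        decoder_mem_2 = memory(name='gru_decoder_2', size=decoder_size,
                               boot_layer=decoder_boot)

        context = simple_attention(encoded_sequence=enc_vec,
                                   encoded_proj=enc_proj,
                                   decoder_state=decoder_mem_1)

        with mixed_layer(size=decoder_size * 3) as decoder_inputs_1:
            decoder_inputs_1 += full_matrix_projection(input=context)
            decoder_inputs_1 += full_matrix_projection(input=current_word)

        gru_step_1 = gru_step_layer(name='gru_decoder_1',
                                    input=decoder_inputs_1,
                                    output_mem=decoder_mem_1,
                                    size=decoder_size)

        # Second layer takes the first layer's output as its word input.
        with mixed_layer(size=decoder_size * 3) as decoder_inputs_2:
            decoder_inputs_2 += full_matrix_projection(input=context)
            decoder_inputs_2 += full_matrix_projection(input=gru_step_1)

        gru_step_2 = gru_step_layer(name='gru_decoder_2',
                                    input=decoder_inputs_2,
                                    output_mem=decoder_mem_2,
                                    size=decoder_size)

        with mixed_layer(size=target_dict_dim, bias_attr=True,
                         act=SoftmaxActivation()) as out:
            out += full_matrix_projection(input=gru_step_2)
        return out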
