Opened May 08, 2017 by saxon_zh (@saxon_zh)

Incorrect topological parsing with memory-layer referencing.

Created by: xinghai-sun

The PaddlePaddle V2 APIs appear to consider only explicit layer connections (those made through the "input" argument) when parsing the network topology. They neglect the fact that a memory-layer reference (made through the "name" argument of paddle.layer.memory) is also a connection, albeit an implicit one. As a result, a layer whose output is referenced only by a memory layer, and which is not explicitly connected to any final cost or output layer, is not created at all during the backward traversal of the topological graph.

Here is a simple example:

import paddle.v2 as paddle

def main():
    hidden_size = 128
    dict_size = 30000
    paddle.init(use_gpu=False, trainer_count=1)

    words = paddle.layer.data(
        name="words",
        type=paddle.data_type.integer_value_sequence(dict_size))
    next_words = paddle.layer.data(
        name='next_words',
        type=paddle.data_type.integer_value_sequence(dict_size))

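    def recurrent_step(embedding):
        # Refers, by name, to the previous-step output of the layer named "memory".
        last_memory = paddle.layer.memory(name="memory", size=hidden_size)
        # The layer named "memory": computes the current-step memory.
        # Nothing below takes it as an "input", so it has no explicit path to the cost.
        memory_update = paddle.layer.fc(
            name="memory", input=[last_memory, embedding], size=hidden_size)
        # The prediction depends on the previous-step memory only.
        predict = paddle.layer.fc(
            input=[embedding, last_memory],
            size=dict_size,
            act=paddle.activation.Softmax())
        return predict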

    predict_seq = paddle.layer.recurrent_group(
        step=recurrent_step,
        input=[paddle.layer.embedding(input=words, size=hidden_size)])
    cost = paddle.layer.classification_cost(
        input=predict_seq, label=next_words)

    parameters = paddle.parameters.create(cost)
    optimizer = paddle.optimizer.Adam(learning_rate=5e-5)
    trainer = paddle.trainer.SGD(
        cost=cost, parameters=parameters, update_equation=optimizer)

if __name__ == '__main__':
    main()

It fails with the following error:

Traceback (most recent call last):
  File "bug.py", line 39, in <module>
    main()
  File "bug.py", line 32, in main
    parameters = paddle.parameters.create(cost)
  File "/usr/local/lib/python2.7/site-packages/paddle/v2/parameters.py", line 19, in create
    topology = Topology(layers)
  File "/usr/local/lib/python2.7/site-packages/paddle/v2/topology.py", line 69, in __init__
    layers, extra_layers=extra_layers)
  File "/usr/local/lib/python2.7/site-packages/paddle/v2/layer.py", line 96, in parse_network
    return __parse__(__real_func__)
  File "/usr/local/lib/python2.7/site-packages/paddle/trainer_config_helpers/config_parser_utils.py", line 32, in parse_network_config
    config = config_parser.parse_config(network_conf, config_arg_str)
  File "/usr/local/lib/python2.7/site-packages/paddle/trainer/config_parser.py", line 3597, in parse_config
    trainer_config()
  File "/usr/local/lib/python2.7/site-packages/paddle/v2/layer.py", line 89, in __real_func__
    real_output = [each.to_proto(context=context) for each in output_layers]
  File "/usr/local/lib/python2.7/site-packages/paddle/v2/config_base.py", line 109, in to_proto
    context=context)
  File "/usr/local/lib/python2.7/site-packages/paddle/v2/config_base.py", line 116, in to_proto
    ret_val = self.to_proto_impl(**kwargs)
  File "/usr/local/lib/python2.7/site-packages/paddle/v2/layer.py", line 398, in to_proto_impl
    RecurrentLayerGroupEnd(name=self.__recurrent_name__)
  File "/usr/local/lib/python2.7/site-packages/paddle/trainer/config_parser.py", line 419, in RecurrentLayerGroupEnd
    layer = g_layer_map[pair.layer_name]
KeyError: u'memory@__recurrent_group_0__'

I think the cause is that the memory_update layer is never created, so PaddlePaddle cannot find any layer matching the name "memory" that the last_memory layer refers to. The likely reason is that memory_update is not explicitly connected to the cost layer, which leads PaddlePaddle to skip it when creating layers.

However, it is in fact connected to the cost layer, indirectly, in the next time step through the paddle.layer.memory component, so it should never be ignored.

I suspect that any recurrent model whose cost layer depends on the previous-step memory rather than the current-step memory (the one just updated) will hit the same problem, because the layer that updates the current-step memory then has no connection to the cost layer within the current time step.

To verify this, I changed a single line of the code so that the cost layer depends on the current-step memory instead of the previous-step memory: I replaced last_memory with memory_update, as below, so that memory_update is explicitly connected to the final cost. With that change, the model works.

From

        predict = paddle.layer.fc(
            input=[embedding, last_memory],
            size=dict_size,
            act=paddle.activation.Softmax())

to

        predict = paddle.layer.fc(
            input=[embedding, memory_update],
            size=dict_size,
            act=paddle.activation.Softmax())
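
The behavior described above can be reproduced with a toy graph in plain Python. This is only a sketch of the suspected mechanism, not PaddlePaddle's actual parser: the function names `reachable` and `step_graph` and the dictionary layout are hypothetical, and the layer names are taken from the example. It walks backward from the cost layer, first following only explicit "input" edges, then also following the implicit edge from a memory layer to the layer it names.

```python
# Toy model of the traversal; all names here are hypothetical,
# not PaddlePaddle internals.
def reachable(outputs, inputs_of, memory_target=None):
    """Return the set of layers reached by walking backward from `outputs`.

    inputs_of:     layer -> layers passed through its "input" argument
    memory_target: memory layer -> layer it refers to by name (implicit edge);
                   None means implicit edges are ignored.
    """
    memory_target = memory_target or {}
    seen, stack = set(), list(outputs)
    while stack:
        layer = stack.pop()
        if layer in seen:
            continue
        seen.add(layer)
        stack.extend(inputs_of.get(layer, []))
        if layer in memory_target:
            stack.append(memory_target[layer])
    return seen


def step_graph(predict_memory_input):
    """Explicit edges of the example network. `predict_memory_input` is the
    second input of the predict layer: "last_memory" or "memory_update"."""
    return {
        "memory_update": ["last_memory", "embedding"],
        "predict": ["embedding", predict_memory_input],
        "cost": ["predict", "next_words"],
        "embedding": ["words"],
    }


# last_memory refers to the layer named "memory", which is memory_update.
memory_refs = {"last_memory": "memory_update"}

original = step_graph("last_memory")
workaround = step_graph("memory_update")

explicit_only = reachable(["cost"], original)
with_memory_edges = reachable(["cost"], original, memory_refs)
workaround_explicit = reachable(["cost"], workaround)

print("memory_update" in explicit_only)        # False: the layer is skipped
print("memory_update" in with_memory_edges)    # True: implicit edge followed
print("memory_update" in workaround_explicit)  # True: the one-line change
```

In this toy model, following the memory reference as an extra edge is enough to reach memory_update without changing the network, which is the behavior the issue asks for.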

A Neural Turing Machine model that reads first and writes next (rather than the reverse) will have the same problem. Demos such as vanilla LSTM and GRU do not run into it only because their cost, or softmax output distribution, happens to depend on the updated memory (hidden state or cell state) instead of the previous memory.

This problem did not exist in the V1 APIs.

Is this a bug? Could anyone help resolve it?

Reference: paddlepaddle/Paddle#2061