Incorrect topological parsing with memory-layer referencing.
Created by: xinghai-sun
It seems that the PaddlePaddle V2 APIs consider only explicit layer connections (made through the "input" argument) when parsing the network topology, ignoring the fact that memory-layer referencing (through the "name" argument of paddle.layer.memory) should also be treated as an implicit connection. As a result, a layer whose output is referenced only by a memory layer, and which is not explicitly connected to any final cost/output layer, is never created during the backward traversal of the topological graph.
Here is a simple example:
import paddle.v2 as paddle


def main():
    hidden_size = 128
    dict_size = 30000
    paddle.init(use_gpu=False, trainer_count=1)

    words = paddle.layer.data(
        name="words",
        type=paddle.data_type.integer_value_sequence(dict_size))
    next_words = paddle.layer.data(
        name='next_words',
        type=paddle.data_type.integer_value_sequence(dict_size))

    def recurrent_step(embedding):
        last_memory = paddle.layer.memory(name="memory", size=hidden_size)
        memory_update = paddle.layer.fc(
            name="memory", input=[last_memory, embedding], size=hidden_size)
        predict = paddle.layer.fc(
            input=[embedding, last_memory],
            size=dict_size,
            act=paddle.activation.Softmax())
        return predict

    predict_seq = paddle.layer.recurrent_group(
        step=recurrent_step,
        input=[paddle.layer.embedding(input=words, size=hidden_size)])

    cost = paddle.layer.classification_cost(
        input=predict_seq, label=next_words)

    parameters = paddle.parameters.create(cost)
    optimizer = paddle.optimizer.Adam(learning_rate=5e-5)
    trainer = paddle.trainer.SGD(
        cost=cost, parameters=parameters, update_equation=optimizer)


if __name__ == '__main__':
    main()
Running it fails with the following error:
Traceback (most recent call last):
File "bug.py", line 39, in <module>
main()
File "bug.py", line 32, in main
parameters = paddle.parameters.create(cost)
File "/usr/local/lib/python2.7/site-packages/paddle/v2/parameters.py", line 19, in create
topology = Topology(layers)
File "/usr/local/lib/python2.7/site-packages/paddle/v2/topology.py", line 69, in __init__
layers, extra_layers=extra_layers)
File "/usr/local/lib/python2.7/site-packages/paddle/v2/layer.py", line 96, in parse_network
return __parse__(__real_func__)
File "/usr/local/lib/python2.7/site-packages/paddle/trainer_config_helpers/config_parser_utils.py", line 32, in parse_network_config
config = config_parser.parse_config(network_conf, config_arg_str)
File "/usr/local/lib/python2.7/site-packages/paddle/trainer/config_parser.py", line 3597, in parse_config
trainer_config()
File "/usr/local/lib/python2.7/site-packages/paddle/v2/layer.py", line 89, in __real_func__
real_output = [each.to_proto(context=context) for each in output_layers]
File "/usr/local/lib/python2.7/site-packages/paddle/v2/config_base.py", line 109, in to_proto
context=context)
File "/usr/local/lib/python2.7/site-packages/paddle/v2/config_base.py", line 116, in to_proto
ret_val = self.to_proto_impl(**kwargs)
File "/usr/local/lib/python2.7/site-packages/paddle/v2/layer.py", line 398, in to_proto_impl
RecurrentLayerGroupEnd(name=self.__recurrent_name__)
File "/usr/local/lib/python2.7/site-packages/paddle/trainer/config_parser.py", line 419, in RecurrentLayerGroupEnd
layer = g_layer_map[pair.layer_name]
KeyError: u'memory@__recurrent_group_0__'
I think this happens because the memory_update layer is never created, so PaddlePaddle cannot find any layer matching the name "memory" referenced by the last_memory layer. The likely reason is that memory_update is not explicitly connected to the cost layer, which misleads PaddlePaddle into ignoring it when creating layers.
However, it is in fact connected (indirectly, or implicitly) to the cost layer of the next time step through the paddle.layer.memory component, and of course it should never be ignored.
I suspect that any recurrent model whose cost layer depends on the previous-step memory rather than on the just-updated current-step memory will hit the same problem, because the layer that updates the memory then has no connection to the cost layer within the current time step.
To verify this, I changed only a single line, making the cost layer depend on the current-step memory instead of the previous-step memory: I replaced last_memory with memory_update as shown below (so that memory_update is explicitly connected to the final cost), and the code then works fine.
From:
    predict = paddle.layer.fc(
        input=[embedding, last_memory],
        size=dict_size,
        act=paddle.activation.Softmax())
to:
    predict = paddle.layer.fc(
        input=[embedding, memory_update],
        size=dict_size,
        act=paddle.activation.Softmax())
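For reference, the whole step function with this change applied (everything else unchanged from the example above) would be:

def recurrent_step(embedding):
    last_memory = paddle.layer.memory(name="memory", size=hidden_size)
    memory_update = paddle.layer.fc(
        name="memory", input=[last_memory, embedding], size=hidden_size)
    # The only change: predict now takes memory_update as input, so the layer
    # that writes "memory" is reachable from the cost and gets created.
    predict = paddle.layer.fc(
        input=[embedding, memory_update],
        size=dict_size,
        act=paddle.activation.Softmax())
    return predict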
A Neural Turing Machine model that reads first and writes next (not the reverse) will also hit this problem; a sketch of that pattern is given below. Demos such as a vanilla LSTM/GRU, however, will not run into it, since their cost or softmax output distribution happens, luckily, to depend on the updated memory (hidden state or cell state) rather than on the previous-step memory.
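To make the "read first, write next" case concrete, here is a minimal hypothetical step sketch in the same style as the example above (the names read_write_step, ntm_memory, prev_mem, read_vec, mem_size and controller_input are placeholders I made up, not from any existing demo):

def read_write_step(controller_input):
    # "Read first": the read result depends on the previous-step memory.
    prev_mem = paddle.layer.memory(name="ntm_memory", size=mem_size)
    read_vec = paddle.layer.fc(
        input=[prev_mem, controller_input], size=mem_size)
    # "Write next": the update is registered under the memory's name, but
    # no layer between here and the cost references it within this step.
    write = paddle.layer.fc(
        name="ntm_memory", input=[prev_mem, controller_input], size=mem_size)
    # The step output depends only on the read, i.e. on the previous-step
    # memory, which is exactly the pattern that triggers the KeyError above.
    return paddle.layer.fc(
        input=read_vec, size=dict_size, act=paddle.activation.Softmax())

Making the step output depend on the written memory instead (as in the one-line fix above) would avoid the error, but that changes the model, not the parser.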
Besides, this problem did not exist in the V1 APIs.
Is this a bug? Could anyone help solve this issue?