fluid.layers.lstm returns an unexpected shape and raises an exception after feeding values (#23283) · Issue · PaddlePaddle / Paddle

fluid.layers.lstm returns an unexpected shape and raises an exception after feeding values

Created by: Cli212

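I am using the paddle.fluid.layers.lstm module on GPU in static graph mode. With is_bidirec=True, the Paddle documentation states that rnn_out should have shape [seq_len, batch_size, hidden_size*2], but the output shape is still [seq_len, batch_size, hidden_size]. In addition, feeding values through the executor raises [operator < cudnn_lstm > error]. The code is as follows: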

import paddle.fluid as fluid
article_len = 512
sequence_len = 128
emb_size = 768
batch_size = 2
hidden_size = 512
num_layers = 2
dropout_prob = 0.0
context_emb = fluid.layers.data(name='context',shape = [None,article_len,emb_size],dtype = 'float32',append_batch_size=False)
query_emb = fluid.layers.data(name='query',shape = [None,sequence_len,emb_size],dtype = 'float32',append_batch_size=False)
query_integrals = fluid.layers.data(name = 'query_integral',shape = [None,emb_size],dtype = 'float32',append_batch_size=False)
labels = fluid.layers.data(name = 'label',shape = [batch_size],dtype='float32',append_batch_size=False)
init_h = fluid.layers.fill_constant( [num_layers*2, batch_size,hidden_size], 'float32', 0.0 )
init_c = fluid.layers.fill_constant( [num_layers*2, batch_size, hidden_size], 'float32', 0.0 )
context_rnn_out, context_last_h, context_last_c = fluid.layers.lstm(context_emb, init_h, init_c, article_len, hidden_size, num_layers,is_bidirec=True)
query_rnn_out, query_last_h, query_last_c = fluid.layers.lstm(query_emb, init_h, init_c, sequence_len, hidden_size, num_layers)
print(context_rnn_out.shape)
# self.r_q = fluid.layers.masked_select(input=query_rnn_out, mask=self.query_mask)
# self.r_p = fluid.layers.masked_select(input=context_rnn_out, mask=self.context_mask)
r_q = query_rnn_out  # [None, seq_len, hidden_size]
u_q = fluid.layers.fc(query_integrals, size=hidden_size)
r_p = context_rnn_out  # [None, article_len, hidden_size]

The print output is (-1, 512, 512); it should normally be (-1, 512, 1024).
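The expected value above follows from the documented rule that a bidirectional LSTM doubles the last dimension of rnn_out. A minimal sketch of that arithmetic, in plain Python with no Paddle dependency, is given here; the helper name `expected_rnn_out_shape` is hypothetical and is not part of the Paddle API.

```python
def expected_rnn_out_shape(input_shape, hidden_size, is_bidirec):
    """Return the rnn_out shape implied by the documented rule.

    The two leading dimensions are copied from the input; the last
    dimension is hidden_size, doubled when the LSTM is bidirectional.
    """
    directions = 2 if is_bidirec else 1
    return tuple(input_shape[:2]) + (hidden_size * directions,)


# Shapes from the reproduction: context input is [None, 512, 768],
# where None is reported as -1 in the static graph.
print(expected_rnn_out_shape((-1, 512, 768), 512, is_bidirec=True))
print(expected_rnn_out_shape((-1, 512, 768), 512, is_bidirec=False))
```

With the values from this report, the first call gives the shape the reporter expected, and the second gives the shape that was actually printed.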

import numpy as np
use_cuda = True
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
exe = fluid.Executor(place=place)
exe.run(fluid.default_startup_program())
optimizer = fluid.optimizer.SGD(learning_rate=0.5)
def reader():
    yield [[np.random.random([2,512,768]).astype(np.float32),np.random.random([2,128,768]).astype(np.float32),np.random.random([2,768]).astype(np.float32),np.array([0,1]).astype(np.float32)]]
# for i in range(100):
feeder = fluid.DataFeeder(['context','query','query_integral','label'],place)
for data in reader():
    loss_val = exe.run(fluid.default_main_program(), feed=feeder.feed(data), fetch_list=[r_q.name,u_q.name,r_p.name])
    # optimizer.minimize(loss_val)

After feeding the values, the following exception is raised:

  "The following exception is not an EOF exception.")
---------------------------------------------------------------------------
EnforceNotMet                             Traceback (most recent call last)
<ipython-input-3-49f106cee0b2> in <module>
     10 feeder = fluid.DataFeeder(['context','query','query_integral','label'],place)
     11 for data in reader():
---> 12     loss_val = exe.run(fluid.default_main_program(), feed=feeder.feed(data), fetch_list=[r_q.name,u_q.name,r_p.name])
     13     # optimizer.minimize(loss_val)
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py in run(self, program, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache)
    781                 warnings.warn(
    782                     "The following exception is not an EOF exception.")
--> 783             six.reraise(*sys.exc_info())
    784 
    785     def _run_impl(self, program, feed, fetch_list, feed_var_name,
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py in reraise(tp, value, tb)
    691             if value.__traceback__ is not tb:
    692                 raise value.with_traceback(tb)
--> 693             raise value
    694         finally:
    695             value = None
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py in run(self, program, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache)
    776                 scope=scope,
    777                 return_numpy=return_numpy,
--> 778                 use_program_cache=use_program_cache)
    779         except Exception as e:
    780             if not isinstance(e, core.EOFException):
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py in _run_impl(self, program, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache)
    829                 scope=scope,
    830                 return_numpy=return_numpy,
--> 831                 use_program_cache=use_program_cache)
    832 
    833         program._compile(scope, self.place)
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py in _run_program(self, program, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache)
    903         if not use_program_cache:
    904             self._default_executor.run(program.desc, scope, 0, True, True,
--> 905                                        fetch_var_name)
    906         else:
    907             self._default_executor.run_prepared_ctx(ctx, scope, False, False,
EnforceNotMet: 

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0   std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2   paddle::operators::CudnnLSTMGPUKernel<float>::Compute(paddle::framework::ExecutionContext const&) const
3   std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::CudnnLSTMGPUKernel<float> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
4   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
5   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
6   paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
7   paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool)
8   paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool, bool)

------------------------------------------
Python Call Stacks (More useful to users):
------------------------------------------
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2525, in append_op
    attrs=kwargs.get("attrs", None))
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
    return self.main_program.current_block().append_op(*args, **kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/rnn.py", line 2188, in lstm
    'seed': seed,
  File "<ipython-input-1-e0921ff7663d>", line 15, in <module>
    context_rnn_out, context_last_h, context_last_c = fluid.layers.lstm(context_emb, init_h, init_c, article_len, hidden_size, num_layers,is_bidirec=True)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3265, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3183, in run_ast_nodes
    if (yield from self.run_code(code, result)):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3018, in run_cell_async
    interactivity=interactivity, compiler=compiler, result=result)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/IPython/core/async_helpers.py", line 67, in _pseudo_sync_runner
    coro.send(None)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2843, in _run_cell
    return runner(coro)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2817, in run_cell
    raw_cell, store_history, silent, shell_futures)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/ipykernel/zmqshell.py", line 536, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/ipykernel/ipkernel.py", line 294, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/ipykernel/kernelbase.py", line 534, in execute_request
    user_expressions, allow_stdin,
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/ipykernel/kernelbase.py", line 267, in dispatch_shell
    yield gen.maybe_future(handler(stream, idents, msg))
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/tornado/gen.py", line 326, in wrapper
    yielded = next(result)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/ipykernel/kernelbase.py", line 357, in process_one
    yield gen.maybe_future(dispatch(*args))
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/tornado/gen.py", line 1147, in run
    yielded = self.gen.send(value)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/tornado/gen.py", line 1233, in inner
    self.run()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/tornado/stack_context.py", line 300, in null_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/tornado/ioloop.py", line 758, in _run_callback
    ret = callback()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/asyncio/base_events.py", line 1771, in _run_once
    handle._run()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/asyncio/base_events.py", line 534, in run_forever
    self._run_once()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/tornado/platform/asyncio.py", line 132, in start
    self.asyncio_loop.run_forever()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/ipykernel/kernelapp.py", line 505, in start
    self.io_loop.start()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/traitlets/config/application.py", line 664, in launch_instance
    app.start()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)

----------------------
Error Message Summary:
----------------------
Error: An error occurred here. There is no accurate error hint for this error yet. We are continuously in the process of increasing hint for this kind of error check. It would be helpful if you could inform us of how this conversion went by opening a github issue. And we will resolve it with high priority.
  - New issue link: https://github.com/PaddlePaddle/Paddle/issues/new
  - Recommended issue content: all error stack information
  [Hint: CUDNN_STATUS_EXECUTION_FAILED] at (/paddle/paddle/fluid/operators/cudnn_lstm_op.cu.cc:113)
  [operator < cudnn_lstm > error]
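
The error summary carries no precise hint, and this report does not establish a root cause. As a sketch only, the check below compares the initial-state shape used in the reproduction against the convention assumed here: [num_layers, batch_size, hidden_size] for a unidirectional LSTM and [num_layers*2, batch_size, hidden_size] for a bidirectional one. The helper `check_init_state` is hypothetical, and whether any mismatch it finds is related to CUDNN_STATUS_EXECUTION_FAILED is an open question.

```python
def check_init_state(state_shape, num_layers, batch_size, hidden_size, is_bidirec):
    """Return a list of mismatch messages; an empty list means consistent.

    Assumes the convention that the first axis of init_h / init_c is
    num_layers for a unidirectional LSTM and num_layers * 2 otherwise.
    """
    directions = 2 if is_bidirec else 1
    expected = (num_layers * directions, batch_size, hidden_size)
    if len(state_shape) != len(expected):
        return ["rank: got %d, expected %d" % (len(state_shape), len(expected))]
    problems = []
    for axis, (got, want) in enumerate(zip(state_shape, expected)):
        if got != want:
            problems.append("axis %d: got %d, expected %d" % (axis, got, want))
    return problems


# init_h and init_c in the reproduction are [num_layers*2, batch_size, hidden_size].
init_shape = (2 * 2, 2, 512)
print(check_init_state(init_shape, 2, 2, 512, is_bidirec=True))   # context call
print(check_init_state(init_shape, 2, 2, 512, is_bidirec=False))  # query call
```

Under that assumption, the shared initial state is consistent with the bidirectional context call but not with the query call, which leaves is_bidirec at its default. Running the two calls separately may help narrow down which one triggers the error.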
