Created by: Xreki
- When sum has more than 10 inputs, the code on the develop branch crashes. The log and some debug output are as follows:
I0310 06:39:21.452370 7244 code_generator.cc:191] Op(sum), inputs:{1,2,3,4,0,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19}, outputs:{20}
I0310 06:39:21.452404 7244 code_generator_helper.cc:50] input_size: 20
I0310 06:39:21.452409 7244 code_generator_helper.cc:51] sum_rhs: ${0} + ${1} + ${2} + ${3} + ${4} + ${5} + ${6} + ${7} + ${8} + ${9} + ${10} + ${110} + ${1210} + ${13210} + ${143210} + ${1543210} + ${16543210} + ${176543210} + ${1876543210} + ${19876543210}
/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py:855: UserWarning: The following exception is not an EOF exception.
"The following exception is not an EOF exception.")
Traceback (most recent call last):
File "train.py", line 485, in <module>
main()
File "train.py", line 478, in main
train()
File "train.py", line 417, in train
train_ppl = train_an_epoch_dataloader(epoch_id, batch_times)
File "train.py", line 368, in train_an_epoch_dataloader
use_program_cache=True)
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 856, in run
six.reraise(*sys.exc_info())
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 851, in run
return_merged=return_merged)
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 907, in _run_impl
program._compile(scope, self.place)
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/compiler.py", line 424, in _compile
places=self._places)
File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/compiler.py", line 377, in _compile_data_parallel
self._exec_strategy, self._build_strategy, self._graph)
paddle.fluid.core_avx.EnforceNotMet:
--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2 paddle::framework::ir::fusion_group::OperationExpression::GetRHS(std::unordered_set<int, std::hash<int>, std::equal_to<int>, std::allocator<int> >*, unsigned long) const
3 paddle::framework::ir::fusion_group::OperationExpression::GetExpression(std::string, std::unordered_set<int, std::hash<int>, std::equal_to<int>, std::allocator<int> >*) const
4 paddle::framework::ir::fusion_group::CodeGenerator::EmitComputeBody(std::vector<paddle::framework::ir::fusion_group::OperationExpression, std::allocator<paddle::framework::ir::fusion_group::OperationExpression> > const&, std::set<int, std::less<int>, std::allocator<int> > const&, std::set<int, std::less<int>, std::allocator<int> > const&, std::string)
5 paddle::framework::ir::fusion_group::CodeGenerator::Generate(std::string, std::string, std::vector<paddle::framework::ir::fusion_group::OperationExpression, std::allocator<paddle::framework::ir::fusion_group::OperationExpression> > const&)
6 paddle::framework::ir::fusion_group::CodeGenerator::Generate(paddle::framework::ir::fusion_group::SubGraph*)
7 paddle::framework::ir::FusionGroupPass::GenerateCode(paddle::framework::ir::fusion_group::SubGraph*) const
8 paddle::framework::ir::FusionGroupPass::DetectFusionGroup(paddle::framework::ir::Graph*, int) const
9 paddle::framework::ir::FusionGroupPass::ApplyImpl(paddle::framework::ir::Graph*) const
10 paddle::framework::ir::Pass::Apply(paddle::framework::ir::Graph*) const
11 paddle::framework::details::BuildStrategy::Apply(paddle::framework::ir::Graph*, std::vector<paddle::platform::Place, std::allocator<paddle::platform::Place> > const&, std::string const&, std::vector<paddle::framework::Scope*, std::allocator<paddle::framework::Scope*> > const&, unsigned long const&, bool, paddle::platform::NCCLCommunicator*) const
12 paddle::framework::ParallelExecutor::ParallelExecutor(std::vector<paddle::platform::Place, std::allocator<paddle::platform::Place> > const&, std::vector<std::string, std::allocator<std::string> > const&, std::string const&, paddle::framework::Scope*, std::vector<paddle::framework::Scope*, std::allocator<paddle::framework::Scope*> > const&, paddle::framework::details::ExecutionStrategy const&, paddle::framework::details::BuildStrategy const&, paddle::framework::ir::Graph*)
----------------------
Error Message Summary:
----------------------
InvalidArgumentError: Only 20 inputs are provided, but need 111 for operation < sum >.
[Hint: Expected index < input_ids_.size(), but received index:110 >= input_ids_.size():20.] at (/paddle/paddle/fluid/framework/ir/fusion_group/code_generator_helper.cc:77)
The cause is that std::string's replace modifies the string it is called on, so the template sum_rhs_component itself was changed on every iteration.
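The failure mode can be reproduced outside Paddle with a minimal sketch. The function names `ExpandBuggy` and `ExpandFixed` and the template string `" + ${?}"` are assumptions made for illustration, not the actual code in code_generator_helper.cc. The buggy variant calls `replace` on the shared template, so from index 11 onward it overwrites only the first digit of the previous index and produces terms such as `${110}` and `${1210}`, matching the sum_rhs line in the log. The fixed variant works on a fresh copy each time.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical names for illustration only; not the Paddle implementation.

// Buggy: the template is mutated in place, so the '?' placeholder is gone
// after the first iteration and later calls overwrite one stale character.
std::string ExpandBuggy(std::size_t input_size) {
  std::string component = " + ${?}";
  const std::size_t pos = component.find('?');
  assert(pos != std::string::npos);
  std::string rhs = "${0}";
  for (std::size_t i = 1; i < input_size; ++i) {
    // Replaces exactly one character at pos in the template itself.
    component.replace(pos, 1, std::to_string(i));
    rhs += component;
  }
  return rhs;
}

// Fixed: copy the template first, then substitute into the copy.
std::string ExpandFixed(std::size_t input_size) {
  const std::string component = " + ${?}";
  const std::size_t pos = component.find('?');
  assert(pos != std::string::npos);
  std::string rhs = "${0}";
  for (std::size_t i = 1; i < input_size; ++i) {
    std::string term = component;  // the template stays untouched
    term.replace(pos, 1, std::to_string(i));
    rhs += term;
  }
  return rhs;
}
```

With up to 10 inputs both variants agree, because every index is a single digit and the one-character overwrite happens to be correct. That is why the problem only shows up for larger sums.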
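- The Graph converted from a real program may contain control nodes. The affected code calls Var() directly without checking the node type, which also crashes during language_model training.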
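A guarded access along these lines avoids the crash. This is a sketch with simplified stand-in types: `VarDesc`, `Node`, and `CollectVarNames` below are hypothetical and only mimic the idea of a graph node whose variable description may be missing, as is assumed here for control nodes. It is not the real Paddle node class.

```cpp
#include <string>
#include <vector>

// Simplified stand-ins for illustration; NOT the real Paddle classes.
struct VarDesc {
  std::string name;
};

struct Node {
  enum class Type { kOperation, kVariable };
  Type type;
  const VarDesc* var_desc;  // assumed to be null for control nodes

  bool IsVar() const { return type == Type::kVariable; }
  const VarDesc* Var() const { return var_desc; }
};

// Collects names only from nodes that are variables and carry a description.
// Operation nodes and description-less (control) nodes are skipped instead of
// being dereferenced.
std::vector<std::string> CollectVarNames(const std::vector<const Node*>& nodes) {
  std::vector<std::string> names;
  for (const Node* n : nodes) {
    if (n == nullptr || !n->IsVar() || n->Var() == nullptr) {
      continue;
    }
    names.push_back(n->Var()->name);
  }
  return names;
}
```

The point of the sketch is the order of the checks: the node type and the null test come before any member of the description is read.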
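- language_model contains some sqrt and square operators that are connected to sum operators and can be fused into a single fusion_group. This PR adds the forward and backward formulas for the sqrt and square operators.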
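For reference, the underlying math is standard: for sqrt, out = sqrt(x) and dx = dout * 0.5 / out; for square, out = x * x and dx = dout * 2 * x. The sketch below writes these as plain scalar functions. The function names are hypothetical, and the exact expression strings registered for the code generator in this PR are not reproduced here.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar helpers for illustration; not the PR's registered
// expression templates.

// Forward: out = sqrt(x), defined for x >= 0.
double SqrtForward(double x) {
  assert(x >= 0.0);
  return std::sqrt(x);
}

// Backward: d(sqrt(x))/dx = 1 / (2 * sqrt(x)) = 0.5 / out.
// Expressed through the forward output, so out must be non-zero.
double SqrtBackward(double out, double dout) {
  return dout * 0.5 / out;
}

// Forward: out = x^2.
double SquareForward(double x) {
  return x * x;
}

// Backward: d(x^2)/dx = 2 * x.
double SquareBackward(double x, double dout) {
  return dout * 2.0 * x;
}
```

Writing the sqrt gradient in terms of the forward output avoids recomputing the square root in the backward pass.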