[Paddle-TRT] Can't do ERNIE inference with tensorrt in develop branch
Created by: zlsh80826
- PaddlePaddle version: develop
- GPU: CUDA 10.2.89 / cuDNN 7.6
- OS Platform: Ubuntu 16.04
- Python version: 3.7
- C++ version: 7.3.0
- API information

To Reproduce: Run the ERNIE TensorRT example in Paddle-Inference-Demo.

Other info / logs: Hello, we ran into a TensorRT bug on the latest commit when running the example above. (Actually, I think this bug has existed for a long time.) The example runs fine on an older commit (61ec30f0). The bug seems to be caused by an incorrect op conversion; the error log follows.
I0617 21:33:10.236757 7086 analysis_predictor.cc:140] Profiler is deactivated, and no profiling report will be generated.
I0617 21:33:10.243189 7086 analysis_predictor.cc:929] MODEL VERSION: 1.7.2
I0617 21:33:10.243207 7086 analysis_predictor.cc:931] PREDICTOR VERSION: 0.0.0
W0617 21:33:10.243245 7086 analysis_predictor.cc:944] - Version incompatible (1) dropout
W0617 21:33:10.243255 7086 analysis_predictor.cc:944] - Version incompatible (1) elementwise_add
W0617 21:33:10.243261 7086 analysis_predictor.cc:944] - Version incompatible (1) feed
W0617 21:33:10.243268 7086 analysis_predictor.cc:944] - Version incompatible (1) fetch
W0617 21:33:10.243274 7086 analysis_predictor.cc:944] - Version incompatible (3) layer_norm
W0617 21:33:10.243280 7086 analysis_predictor.cc:944] - Version incompatible (1) lookup_table
W0617 21:33:10.243288 7086 analysis_predictor.cc:944] - Version incompatible (2) matmul
W0617 21:33:10.243294 7086 analysis_predictor.cc:944] - Version incompatible (2) mul
W0617 21:33:10.243299 7086 analysis_predictor.cc:944] - Version incompatible (1) relu
W0617 21:33:10.243305 7086 analysis_predictor.cc:944] - Version incompatible (2) reshape2
W0617 21:33:10.243311 7086 analysis_predictor.cc:944] - Version incompatible (1) scale
W0617 21:33:10.243317 7086 analysis_predictor.cc:944] - Version incompatible (2) slice
W0617 21:33:10.243324 7086 analysis_predictor.cc:944] - Version incompatible (1) softmax
W0617 21:33:10.243330 7086 analysis_predictor.cc:944] - Version incompatible (1) stack
W0617 21:33:10.243336 7086 analysis_predictor.cc:944] - Version incompatible (1) tanh
W0617 21:33:10.243342 7086 analysis_predictor.cc:944] - Version incompatible (1) transpose2
W0617 21:33:10.243348 7086 analysis_predictor.cc:196] WARNING: Results may be DIFF! Please use the corresponding version of the model and prediction library, and do not use the develop branch.
I0617 21:33:10.243445 7086 analysis_predictor.cc:450] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [conv_affine_channel_fuse_pass]
--- Running IR pass [conv_eltwiseadd_affine_channel_fuse_pass]
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
I0617 21:33:11.094410 7086 graph_pattern_detector.cc:100] --- detected 7 subgraphs
--- Running IR pass [multihead_matmul_fuse_pass_v2]
I0617 21:33:11.144394 7086 graph_pattern_detector.cc:100] --- detected 3 subgraphs
--- Running IR pass [skip_layernorm_fuse_pass]
I0617 21:33:11.157673 7086 graph_pattern_detector.cc:100] --- detected 6 subgraphs
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [fc_fuse_pass]
I0617 21:33:11.159014 7086 graph_pattern_detector.cc:100] --- detected 3 subgraphs
I0617 21:33:11.159682 7086 graph_pattern_detector.cc:100] --- detected 8 subgraphs
--- Running IR pass [tensorrt_subgraph_pass]
I0617 21:33:11.161391 7086 tensorrt_subgraph_pass.cc:115] --- detect a sub-graph with 19 nodes
W0617 21:33:11.162122 7086 tensorrt_subgraph_pass.cc:285] The Paddle lib links the 7011 version TensorRT, make sure the runtime TensorRT you are using is no less than this version, otherwise, there might be Segfault!
I0617 21:33:11.162168 7086 tensorrt_subgraph_pass.cc:321] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0617 21:33:12.478950 7086 engine.cc:83] Run Paddle-TRT FP16 mode
I0617 21:33:12.479008 7086 engine.cc:151] Run Paddle-TRT Dynamic Shape mode.
W0617 21:33:16.834189 7086 device_context.cc:265] Please NOTE: device: 2, CUDA Capability: 75, Driver API Version: 11.0, Runtime API Version: 10.2
W0617 21:33:16.834365 7086 device_context.cc:273] device: 2, cuDNN Version: 7.6.
I0617 21:33:37.546492 7086 tensorrt_subgraph_pass.cc:115] --- detect a sub-graph with 4 nodes
I0617 21:33:37.546820 7086 tensorrt_subgraph_pass.cc:321] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
terminate called after throwing an instance of 'paddle::platform::EnforceNotMet'
what():
--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2 paddle::inference::tensorrt::OpConverter::ConvertBlockToTRTEngine(paddle::framework::BlockDesc*, paddle::framework::Scope const&, std::vector<std::string, std::allocator<std::string > > const&, std::unordered_set<std::string, std::hash<std::string >, std::equal_to<std::string >, std::allocator<std::string > > const&, std::vector<std::string, std::allocator<std::string > > const&, paddle::inference::tensorrt::TensorRTEngine*)
3 paddle::inference::analysis::TensorRtSubgraphPass::CreateTensorRTOp(paddle::framework::ir::Node*, paddle::framework::ir::Graph*, std::vector<std::string, std::allocator<std::string > > const&, std::vector<std::string, std::allocator<std::string > >*) const
4 paddle::inference::analysis::TensorRtSubgraphPass::ApplyImpl(paddle::framework::ir::Graph*) const
5 paddle::framework::ir::Pass::Apply(paddle::framework::ir::Graph*) const
6 paddle::inference::analysis::IRPassManager::Apply(std::unique_ptr<paddle::framework::ir::Graph, std::default_delete<paddle::framework::ir::Graph> >)
7 paddle::inference::analysis::IrAnalysisPass::RunImpl(paddle::inference::analysis::Argument*)
8 paddle::inference::analysis::Analyzer::RunAnalysis(paddle::inference::analysis::Argument*)
9 paddle::AnalysisPredictor::OptimizeInferenceProgram()
10 paddle::AnalysisPredictor::PrepareProgram(std::shared_ptr<paddle::framework::ProgramDesc> const&)
11 paddle::AnalysisPredictor::Init(std::shared_ptr<paddle::framework::Scope> const&, std::shared_ptr<paddle::framework::ProgramDesc> const&)
12 std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor> > paddle::CreatePaddlePredictor<paddle::AnalysisConfig, (paddle::PaddleEngineKind)2>(paddle::AnalysisConfig const&)
13 std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor> > paddle::CreatePaddlePredictor<paddle::AnalysisConfig>(paddle::AnalysisConfig const&)
----------------------
Error Message Summary:
----------------------
InvalidArgumentError: TensorRT's tensor input requires at least 2 dimensions, but input slice_0.tmp_0 has 1 dims.
[Hint: Expected shape.size() > 1UL, but received shape.size():1 <= 1UL:1.] at (/home/rewang/Paddle-dev/paddle/fluid/inference/tensorrt/engine.h:67)