PaddlePaddle / Paddle · Issue #21828 (Closed)
Opened December 19, 2019 by saxon_zh (Guest)

squeeze_op causes an error in inference mode

Created by: Meiyim

Test environment: CUDA 9, cuDNN 7.0.3. Forward prediction is run on an inference_model from C++ code. The relevant config settings are as follows:

  paddle::AnalysisConfig config;
  config.SetModel(FLAGS_model_dir);     // load the saved inference model
  config.EnableUseGpu(100, 0);          // initial GPU memory pool (MB), device id 0
  config.SwitchSpecifyInputNames(true); // feed inputs by name
  config.EnableCUDNN();
  config.SwitchIrOptim(true);           // enable IR graph optimization passes
  config.EnableMemoryOptim();

With fluid_inference 1.6.0 the program core-dumps immediately with no diagnostic output. With the fluid_inference develop build, whose version info is shown below:

GIT COMMIT ID: 0fe16539ef3651966080d5ae96850da4557751e0
WITH_MKL: ON
WITH_MKLDNN: ON
WITH_GPU: ON
CUDA version: 9.0
CUDNN version: v7

The runtime log is as follows:

--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [cudnn_placement_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [conv_affine_channel_fuse_pass]
--- Running IR pass [conv_eltwiseadd_affine_channel_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass]
--- Running IR pass [fc_fuse_pass]
I1219 10:18:14.103123 63015 graph_pattern_detector.cc:101] ---  detected 12 subgraphs
I1219 10:18:14.138665 63015 graph_pattern_detector.cc:101] ---  detected 62 subgraphs
--- Running IR pass [fc_elementwise_layernorm_fuse_pass]
I1219 10:18:14.181339 63015 graph_pattern_detector.cc:101] ---  detected 24 subgraphs
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running IR pass [runtime_context_cache_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I1219 10:18:14.225345 63015 ir_params_sync_among_devices_pass.cc:41] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I1219 10:18:14.448055 63015 memory_optimize_pass.cc:223] Cluster name : expand_1.tmp_0  size: 1572864
I1219 10:18:14.448089 63015 memory_optimize_pass.cc:223] Cluster name : cast_6.tmp_0  size: 786432
I1219 10:18:14.448096 63015 memory_optimize_pass.cc:223] Cluster name : where_0.tmp_0  size: 16
I1219 10:18:14.448108 63015 memory_optimize_pass.cc:223] Cluster name : fc_25.tmp_1  size: 3072
I1219 10:18:14.448114 63015 memory_optimize_pass.cc:223] Cluster name : layer_norm_4.tmp_2  size: 3072
I1219 10:18:14.448122 63015 memory_optimize_pass.cc:223] Cluster name : scatter_nd_add_22.tmp_0  size: 3072
I1219 10:18:14.448128 63015 memory_optimize_pass.cc:223] Cluster name : scatter_nd_add_23.tmp_0  size: 3072
I1219 10:18:14.448134 63015 memory_optimize_pass.cc:223] Cluster name : layer_norm_14.tmp_2  size: 3072
I1219 10:18:14.448140 63015 memory_optimize_pass.cc:223] Cluster name : shape_1.tmp_0  size: 12
I1219 10:18:14.448146 63015 memory_optimize_pass.cc:223] Cluster name : layer_norm_0.tmp_2  size: 393216
I1219 10:18:14.448158 63015 memory_optimize_pass.cc:223] Cluster name : eval_placeholder_1  size: 1024
--- Running analysis [ir_graph_to_program_pass]
I1219 10:18:14.516865 63015 analysis_predictor.cc:471] ======= optimize end =======
W1219 10:18:15.083250 63015 device_context.cc:236] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 9.2, Runtime API Version: 9.0
W1219 10:18:15.088192 63015 device_context.cc:244] device: 0, cuDNN Version: 7.3.
terminate called after throwing an instance of 'paddle::platform::EnforceNotMet'
  what():

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0   std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2   paddle::platform::CUDADeviceContext::Wait() const
3   paddle::framework::TransDataDevice(paddle::framework::Tensor const&, paddle::platform::Place const&, paddle::framework::Tensor*)
4   paddle::framework::TransformData(paddle::framework::OpKernelType const&, paddle::framework::OpKernelType const&, paddle::framework::Tensor const&, paddle::framework::Tensor*)
5   paddle::framework::OperatorWithKernel::PrepareData(paddle::framework::Scope const&, paddle::framework::OpKernelType const&, std::vector<std::string, std::allocator<std::string> >*, paddle::framework::RuntimeContext*) const
6   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
7   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
8   paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
9   paddle::framework::NaiveExecutor::Run()
10  paddle::AnalysisPredictor::Run(std::vector<paddle::PaddleTensor, std::allocator<paddle::PaddleTensor> > const&, std::vector<paddle::PaddleTensor, std::allocator<paddle::PaddleTensor> >*, int)

------------------------------------------
Python Call Stacks (More useful to users):
------------------------------------------
  File "/home/work/chenxuyi/dis/pp/fine/mergeWeb/app/lib/python3.6/site-packages/paddle/fluid/framework.py", line 2488, in append_op
    attrs=kwargs.get("attrs", None))
  File "/home/work/chenxuyi/dis/pp/fine/mergeWeb/app/lib/python3.6/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
    return self.main_program.current_block().append_op(*args, **kwargs)
  File "/home/work/chenxuyi/dis/pp/fine/mergeWeb/app/lib/python3.6/site-packages/paddle/fluid/layers/nn.py", line 9105, in squeeze
    "XShape": x_shape})
  File "/home/work/chenxuyi/gitlab/paddle-models/model/transformer_encoder.py", line 372, in encoder
    pad_idx = L.where(L.cast(L.squeeze(input_mask, axes=[2]), 'bool'))
  File "/home/work/chenxuyi/gitlab/paddle-models/model/ernie.py", line 187, in _build_model
    name='encoder')
  File "/home/work/chenxuyi/gitlab/paddle-models/model/ernie.py", line 124, in __init__
    input_mask)
  File "./ernie/xnli.py", line 57, in forward
    use_fp16=self.hparam['use_fp16']
  File "/home/work/chenxuyi/dis/pp/fine/mergeWeb/paddle-estimator/propeller/paddle/train/trainer.py", line 147, in _model_fn
    pred = model.forward(fea)
  File "/home/work/chenxuyi/dis/pp/fine/mergeWeb/paddle-estimator/propeller/paddle/train/trainer.py", line 83, in _build_net
    features=features, mode=mode, params=params, run_config=run_config)
  File "/home/work/chenxuyi/dis/pp/fine/mergeWeb/paddle-estimator/propeller/paddle/train/trainer.py", line 230, in _build_for_eval
    self.params, self.run_config)
  File "/home/work/chenxuyi/dis/pp/fine/mergeWeb/paddle-estimator/propeller/paddle/train/trainer.py", line 482, in __init__
    0])  #eval_datasets must have same output shapes
  File "/home/work/chenxuyi/dis/pp/fine/mergeWeb/paddle-estimator/propeller/paddle/train/trainer.py", line 530, in train_and_eval
    train_hooks.append(_EvalHookOnTrainLoop())
  File "./ernie/xnli.py", line 221, in <module>
    exporters=[best_exporter])

----------------------
Error Message Summary:
----------------------
FatalError: cudaStreamSynchronize raises error: unspecified launch failure, errono: 4: unspecified launch failure at (/work/paddle/fluid/platform/device_context.cc:330)
  [operator < squeeze2 > error]
./gpu.sh: line 13: 63015 Aborted                 (core dumped) ./build/inference --logtostderr --model_dir $2 --data $1 --repeat 1 --output_prediction true --use_gpu true --device 0

An excerpt of the model-building code is pasted below:

    d_shape = L.shape(L.cast(enc_input, 'float32'))
    input_hidden_dim = enc_input.shape[-1]
    pad_idx = L.where(L.cast(L.squeeze(input_mask, axes=[2]), 'bool')) #!!!!!!!!!!!!!
    attn_bias = L.matmul(input_mask, input_mask, transpose_y=True) 
    attn_bias = (1. - attn_bias) * -10000.
    attn_bias = L.unsqueeze(attn_bias, axes=[1])
    attn_bias = L.expand(attn_bias, [1, n_head, 1, 1]) 
    if attn_bias.dtype != enc_input.dtype:
        attn_bias = L.cast(attn_bias, enc_input.dtype)
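For context on the failing line, here is a minimal pure-Python sketch (hypothetical helper names, not the Paddle API) of what `L.squeeze(input_mask, axes=[2])` followed by `L.where(L.cast(..., 'bool'))` computes: the trailing size-1 axis of the `[batch, seq_len, 1]` mask is dropped, and then the indices of the nonzero (non-pad) positions are collected.

```python
def squeeze_axis2(mask_3d):
    """Drop the trailing size-1 axis: [batch, seq_len, 1] -> [batch, seq_len]."""
    return [[token[0] for token in seq] for seq in mask_3d]

def where_nonzero(mask_2d):
    """Return (row, col) index pairs where the mask is nonzero (truthy)."""
    return [(i, j)
            for i, row in enumerate(mask_2d)
            for j, v in enumerate(row) if v]

# input_mask for a batch of 2 sequences of length 3; 1.0 = real token, 0.0 = pad
input_mask = [[[1.0], [1.0], [0.0]],
              [[1.0], [0.0], [0.0]]]
pad_idx = where_nonzero(squeeze_axis2(input_mask))
# pad_idx -> [(0, 0), (0, 1), (1, 0)]: indices of the non-pad tokens
```

The crash itself comes from the `squeeze2` CUDA kernel (see `[operator < squeeze2 > error]` in the log), not from this index logic, so the sketch only clarifies the intended semantics of the line marked with `#!!!!!!!!!!!!!` above.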