PaddlePaddle / models
Opened June 19, 2020 by saxon_zh (@saxon_zh, Guest)

Image classification training fails with an error

Created by: mcl-stone

[image]

Running the command shown above fails with the following output:

[root@72b2d0dbbbe6 image_classification]# python3 train.py --data_dir=./data/mask/ --total_images=186 --class_dim=2 --validate=True --model=ResNet50_vd --batch_size=8 --lr_strategy=cosine_decay --lr=0.1 --num_epochs=200 --model_save_dir=output/ --l2_decay=7e-5 --use_mixup=True --use_label_smoothing=True --label_smoothing_epsilon=0.1
2020-06-19 10:08:37,855-INFO: ------------- Configuration Arguments -------------
2020-06-19 10:08:37,855-INFO: batch_size : 8
2020-06-19 10:08:37,855-INFO: checkpoint : None
2020-06-19 10:08:37,855-INFO: class_dim : 2
2020-06-19 10:08:37,855-INFO: data_dir : ./data/mask/
2020-06-19 10:08:37,855-INFO: data_format : NCHW
2020-06-19 10:08:37,855-INFO: decay_epochs : 2.4
2020-06-19 10:08:37,855-INFO: decay_rate : 0.97
2020-06-19 10:08:37,855-INFO: drop_connect_rate : 0.2
2020-06-19 10:08:37,855-INFO: ema_decay : 0.9999
2020-06-19 10:08:37,855-INFO: enable_ce : False
2020-06-19 10:08:37,855-INFO: finetune_exclude_pretrained_params : None
2020-06-19 10:08:37,856-INFO: fuse_bn_act_ops : False
2020-06-19 10:08:37,856-INFO: fuse_elewise_add_act_ops : False
2020-06-19 10:08:37,856-INFO: image_mean : [0.485, 0.456, 0.406]
2020-06-19 10:08:37,856-INFO: image_shape : [3, 224, 224]
2020-06-19 10:08:37,856-INFO: image_std : [0.229, 0.224, 0.225]
2020-06-19 10:08:37,856-INFO: interpolation : None
2020-06-19 10:08:37,856-INFO: is_profiler : False
2020-06-19 10:08:37,856-INFO: l2_decay : 7e-05
2020-06-19 10:08:37,856-INFO: label_smoothing_epsilon : 0.1
2020-06-19 10:08:37,856-INFO: lower_ratio : 0.75
2020-06-19 10:08:37,856-INFO: lower_scale : 0.08
2020-06-19 10:08:37,856-INFO: lr : 0.1
2020-06-19 10:08:37,856-INFO: lr_strategy : cosine_decay
2020-06-19 10:08:37,856-INFO: max_iter : 0
2020-06-19 10:08:37,856-INFO: mixup_alpha : 0.2
2020-06-19 10:08:37,856-INFO: model : ResNet50_vd
2020-06-19 10:08:37,856-INFO: model_save_dir : output/
2020-06-19 10:08:37,856-INFO: momentum_rate : 0.9
2020-06-19 10:08:37,856-INFO: num_epochs : 200
2020-06-19 10:08:37,857-INFO: padding_type : SAME
2020-06-19 10:08:37,857-INFO: pretrained_model : None
2020-06-19 10:08:37,857-INFO: print_step : 10
2020-06-19 10:08:37,857-INFO: profiler_path : ./profilier_files
2020-06-19 10:08:37,857-INFO: random_seed : None
2020-06-19 10:08:37,857-INFO: reader_buf_size : 8
2020-06-19 10:08:37,857-INFO: reader_thread : 8
2020-06-19 10:08:37,857-INFO: resize_short_size : 256
2020-06-19 10:08:37,857-INFO: same_feed : 0
2020-06-19 10:08:37,857-INFO: save_step : 1
2020-06-19 10:08:37,857-INFO: scale_loss : 1.0
2020-06-19 10:08:37,857-INFO: step_epochs : [30, 60, 90]
2020-06-19 10:08:37,857-INFO: test_batch_size : 8
2020-06-19 10:08:37,857-INFO: total_images : 186
2020-06-19 10:08:37,857-INFO: upper_ratio : 1.3333333333333333
2020-06-19 10:08:37,857-INFO: use_aa : False
2020-06-19 10:08:37,857-INFO: use_dali : False
2020-06-19 10:08:37,857-INFO: use_dynamic_loss_scaling : True
2020-06-19 10:08:37,857-INFO: use_ema : False
2020-06-19 10:08:37,857-INFO: use_fp16 : False
2020-06-19 10:08:37,857-INFO: use_gpu : True
2020-06-19 10:08:37,858-INFO: use_label_smoothing : 1
2020-06-19 10:08:37,858-INFO: use_mixup : 1
2020-06-19 10:08:37,858-INFO: use_se : True
2020-06-19 10:08:37,858-INFO: validate : 1
2020-06-19 10:08:37,858-INFO: warm_up_epochs : 5.0
2020-06-19 10:08:37,858-INFO: ----------------------------------------------------
W0619 10:08:39.362601 5105 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 75, Driver API Version: 10.2, Runtime API Version: 10.0
W0619 10:08:39.366940 5105 device_context.cc:260] device: 0, cuDNN Version: 7.6.
2020-06-19 10:08:41,431-WARNING: img(./data/mask/train/0/aada8594b5c91353a100d936490ecd3d.jpg) is None, pass it.
2020-06-19 10:08:41,884-INFO: [Pass 0, train batch 0] loss 0.65583, lr 0.10000, elapse 0.4723 sec
/usr/local/lib64/python3.6/site-packages/paddle/fluid/executor.py:1070: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "train.py", line 304, in <module>
    main()
  File "train.py", line 300, in main
    train(args)
  File "train.py", line 250, in train
    fetch_list=train_fetch_list)
  File "/usr/local/lib64/python3.6/site-packages/paddle/fluid/executor.py", line 1071, in run
    six.reraise(*sys.exc_info())
  File "/usr/local/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value
  File "/usr/local/lib64/python3.6/site-packages/paddle/fluid/executor.py", line 1066, in run
    return_merged=return_merged)
  File "/usr/local/lib64/python3.6/site-packages/paddle/fluid/executor.py", line 1167, in _run_impl
    return_merged=return_merged)
  File "/usr/local/lib64/python3.6/site-packages/paddle/fluid/executor.py", line 879, in _run_parallel
    tensors = exe.run(fetch_var_names, return_merged)._move_to_list()
paddle.fluid.core_avx.EnforceNotMet:


C++ Call Stacks (More useful to developers):

0 std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2 paddle::operators::CUDNNConvOpKernel::Compute(paddle::framework::ExecutionContext const&) const
3 std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::CUDNNConvOpKernel, paddle::operators::CUDNNConvOpKernel, paddle::operators::CUDNNConvOpKernel<paddle::platform::float16> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
4 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const
5 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
6 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
7 paddle::framework::details::ComputationOpHandle::RunImpl()
8 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*)
9 paddle::framework::details::FastThreadedSSAGraphExecutor::RunTracedOps(std::vector<paddle::framework::details::OpHandleBase*, std::allocator<paddle::framework::details::OpHandleBase*> > const&)
10 paddle::framework::details::FastThreadedSSAGraphExecutor::Run(std::vector<std::string, std::allocator<std::string> > const&, bool)
11 paddle::framework::details::ScopeBufferedMonitor::Apply(std::function<void ()> const&, bool)
12 paddle::framework::details::ScopeBufferedSSAGraphExecutor::Run(std::vector<std::string, std::allocator<std::string> > const&, bool)
13 paddle::framework::ParallelExecutor::Run(std::vector<std::string, std::allocator<std::string> > const&, bool)


Python Call Stacks (More useful to users):

  File "/usr/local/lib64/python3.6/site-packages/paddle/fluid/framework.py", line 2610, in append_op
    attrs=kwargs.get("attrs", None))
  File "/usr/local/lib64/python3.6/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
    return self.main_program.current_block().append_op(*args, **kwargs)
  File "/usr/local/lib64/python3.6/site-packages/paddle/fluid/layers/nn.py", line 2933, in conv2d
    "data_format": data_format,
  File "/host/Documents/models-release-1.8/models-release-1.8/PaddleCV/image_classification/models/resnet_vd.py", line 146, in conv_bn_layer
    bias_attr=False)
  File "/host/Documents/models-release-1.8/models-release-1.8/PaddleCV/image_classification/models/resnet_vd.py", line 67, in net
    name='conv1_1')
  File "/host/Documents/models-release-1.8/models-release-1.8/PaddleCV/image_classification/build_model.py", line 98, in _mixup_model
    net_out = model.net(input=image, class_dim=args.class_dim)
  File "/host/Documents/models-release-1.8/models-release-1.8/PaddleCV/image_classification/build_model.py", line 125, in create_model
    loss_out = _mixup_model(data, model, args, is_train)
  File "train.py", line 65, in build_program
    data_loader, loss_out = create_model(model, args, is_train)
  File "train.py", line 166, in train
    args=args)
  File "train.py", line 300, in main
    train(args)
  File "train.py", line 304, in <module>
    main()


Error Message Summary:

ExternalError: Cudnn error, CUDNN_STATUS_EXECUTION_FAILED at (/paddle/paddle/fluid/operators/conv_cudnn_op.cu:300) [operator < conv2d > error]

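I then checked the environment using [image], and it reported no errors. The dataset consists of images for mask classification. Any help resolving this would be appreciated!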
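
The log above contains the warning `img(./data/mask/train/0/aada8594b5c91353a100d936490ecd3d.jpg) is None, pass it.`, so at least one training file could not be decoded. Whether that is related to the cuDNN failure is not established here, but it may be worth ruling out. Below is a minimal sketch, using only the Python standard library, that flags image-named files that are empty or do not start with a JPEG or PNG signature. The helper names `looks_like_image` and `find_suspect_images` are hypothetical and not part of the PaddlePaddle models repository. The check is deliberately coarse: a file can pass it and still fail to decode in the training reader.

```python
import os

# Magic-byte prefixes for two common formats. This is a coarse header check,
# not a full decode; it is an assumption that the dataset uses JPEG/PNG files.
IMAGE_SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
}


def looks_like_image(path):
    """Return True if the file can be opened and starts with a known signature."""
    try:
        with open(path, "rb") as f:
            header = f.read(8)
    except OSError:
        return False
    return any(header.startswith(sig) for sig in IMAGE_SIGNATURES.values())


def find_suspect_images(root, extensions=(".jpg", ".jpeg", ".png")):
    """Walk `root` and return the sorted paths of image-named files that fail the check."""
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):
                path = os.path.join(dirpath, name)
                if not looks_like_image(path):
                    suspects.append(path)
    return sorted(suspects)


if __name__ == "__main__":
    # ./data/mask/ is the --data_dir value from the command in the report.
    for path in find_suspect_images("./data/mask/"):
        print("suspect image:", path)
```

Any file it reports could be re-downloaded or removed from the dataset and its list files before training is retried.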

Reference: paddlepaddle/models#4710