Opened Dec 12, 2019 by saxon_zh (@saxon_zh, Guest)

ERROR 2019-12-12 04:02:41,061 launch.py:269] ABORT!!! Out of all 4 trainers, the trainer process with rank=[0, 2] was aborted. Please check its log.

Created by: yanmeizhao

Local environment: CUDA 10, cuDNN 7.6.3, latest code from the develop branch of models, Paddle 1.6.1, gcc version 5.4.0.
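Every trainer's stack trace in this report ends in `PyInit_backend_impl`, called from `_PyImport_LoadDynamicModuleWithSpec`. That suggests the segfault happens while Python loads a native extension module named `backend_impl`, before training starts. A minimal, standard-library-only way to check each component of the environment above in isolation is sketched below. It assumes, without confirmation from the log, that `backend_impl` is DALI's extension (reached via `import nvidia.dali`). The module names in the `__main__` block are examples to adapt.

```python
import signal
import subprocess
import sys

# Code run in the child interpreter: import the module named in argv[1]
# and print its __version__ (or "unknown" if it has none).
_PROBE = (
    "import importlib, sys\n"
    "m = importlib.import_module(sys.argv[1])\n"
    "print(getattr(m, '__version__', 'unknown'))\n"
)


def describe_status(returncode, stdout, stderr):
    """Turn a child's exit status and output into a one-line summary."""
    if returncode == 0:
        return "ok, version " + stdout.strip()
    if returncode < 0:
        # On POSIX, -N means the child was killed by signal N (-11 is SIGSEGV).
        return "crashed with " + signal.Signals(-returncode).name
    lines = stderr.strip().splitlines()
    return "failed: " + (lines[-1] if lines else "exit code %d" % returncode)


def probe_import(module_name, timeout=120):
    """Import module_name in a fresh interpreter so a native crash only kills the child.

    Uses stdout/stderr PIPE and universal_newlines instead of capture_output/text,
    so it also runs on Python 3.5/3.6. Raises subprocess.TimeoutExpired on hang.
    """
    proc = subprocess.run(
        [sys.executable, "-c", _PROBE, module_name],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=timeout,
    )
    return describe_status(proc.returncode, proc.stdout, proc.stderr)


if __name__ == "__main__":
    # Example module names; adjust to what is installed in the environment.
    for name in ["paddle", "nvidia.dali"]:
        print(name, "->", probe_import(name))
```

If the DALI import reports `SIGSEGV` on its own, outside `train.py` and the launcher, the crash is probably in the DALI installation or its CUDA/driver pairing rather than in the training code.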

Training image_classification with DALI fails with the following error:

[root@5d564e9b351a image_classification]# sh train_dali.sh
----------- Configuration Arguments -----------
cluster_node_ips: 127.0.0.1
log_dir: None
node_ip: 127.0.0.1
print_config: True
selected_gpus: None
started_port: 6170
training_script: train.py
training_script_args: ['--model=ResNet50', '--batch_size=32', '--lr_strategy=cosine_decay_warmup', '--num_epochs=240', '--lr=0.05', '--l2_decay=3e-5', '--lower_scale=0.64', '--lower_ratio=0.8', '--upper_ratio=1.2', '--use_dali=True']
use_paddlecloud: True

trainers_endpoints: 127.0.0.1:6170,127.0.0.1:6171,127.0.0.1:6172,127.0.0.1:6173 , node_id: 0 , current_node_ip: 127.0.0.1 , num_nodes: 1 , node_ips: ['127.0.0.1'] , nranks: 4
------------- Configuration Arguments -------------
batch_size : 32
checkpoint : None
class_dim : 1000
data_dir : ./data/ILSVRC2012/
decay_epochs : 2.4
decay_rate : 0.97
drop_connect_rate : 0.2
ema_decay : 0.9999
enable_ce : False
image_mean : [0.485, 0.456, 0.406]
image_shape : [3, 224, 224]
image_std : [0.229, 0.224, 0.225]
interpolation : None
is_profiler : 0
l2_decay : 3e-05
label_smoothing_epsilon : 0.1
lower_ratio : 0.8
lower_scale : 0.64
lr : 0.05
lr_strategy : cosine_decay_warmup
max_iter : 0
mixup_alpha : 0.2
model : ResNet50
model_save_dir : ./output
momentum_rate : 0.9
num_epochs : 240
padding_type : SAME
pretrained_model : None
print_step : 10
profiler_path : ./
random_seed : None
reader_buf_size : 2048
reader_thread : 8
resize_short_size : 256
same_feed : 0
save_step : 1
step_epochs : [30, 60, 90]
test_batch_size : 16
total_images : 1281167
upper_ratio : 1.2
use_aa : False
use_dali : 1
use_ema : False
use_gpu : True
use_label_smoothing : False
use_mixup : False
use_se : True
validate : 1
warm_up_epochs : 5.0
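The launcher output above lists one endpoint per trainer on a single node, with ports counting up from `started_port` (6170 to 6173 for `nranks: 4`). The hypothetical helpers below only reproduce that arithmetic; they are not Paddle's launcher code. They also assume, as the ordering of the list suggests, that rank i uses the i-th endpoint, which makes it easier to see which processes the final `rank=[0, 2]` message refers to.

```python
def trainer_endpoints(node_ip, started_port, nranks):
    """Hypothetical helper: one endpoint per trainer, ports counting up from started_port."""
    return ["%s:%d" % (node_ip, started_port + rank) for rank in range(nranks)]


def endpoints_for_ranks(endpoints, ranks):
    """Hypothetical helper: assumes rank i listens on the i-th endpoint in the list."""
    return [endpoints[r] for r in ranks]


if __name__ == "__main__":
    eps = trainer_endpoints("127.0.0.1", 6170, 4)
    print(",".join(eps))  # matches the trainers_endpoints line in the log
    print(endpoints_for_ranks(eps, [0, 2]))
```

Under that assumption, the aborted ranks 0 and 2 are the trainers on 127.0.0.1:6170 and 127.0.0.1:6172.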

W1212 04:02:31.290369 24045 device_context.cc:235] Please NOTE: device: 3, CUDA Capability: 61, Driver API Version: 10.0, Runtime API Version: 10.0
W1212 04:02:31.302067 24045 device_context.cc:243] device: 3, cuDNN Version: 7.6.
W1212 04:02:31.330199 24044 device_context.cc:235] Please NOTE: device: 2, CUDA Capability: 61, Driver API Version: 10.0, Runtime API Version: 10.0
W1212 04:02:31.333243 24043 device_context.cc:235] Please NOTE: device: 1, CUDA Capability: 61, Driver API Version: 10.0, Runtime API Version: 10.0
W1212 04:02:31.337818 24044 device_context.cc:243] device: 2, cuDNN Version: 7.6.
W1212 04:02:31.340380 24043 device_context.cc:243] device: 1, cuDNN Version: 7.6.
W1212 04:02:31.444458 24042 device_context.cc:235] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.0, Runtime API Version: 10.0
W1212 04:02:31.451699 24042 device_context.cc:243] device: 0, cuDNN Version: 7.6.
W1212 04:02:33.282218 24043 init.cc:205] *** Aborted at 1576123353 (unix time) try "date -d @1576123353" if you are using GNU date ***
W1212 04:02:33.285073 24043 init.cc:205] PC: @ 0x0 (unknown)
W1212 04:02:33.285284 24043 init.cc:205] *** SIGSEGV (@0x0) received by PID 24043 (TID 0x7fe6e3b9a740) from PID 0; stack trace: ***
W1212 04:02:33.287878 24043 init.cc:205] @ 0x7fe6e37785d0 (unknown)
W1212 04:02:33.288434 24043 init.cc:205] @ 0x7fe68db87e3b (unknown)
W1212 04:02:33.288965 24043 init.cc:205] @ 0x7fe68dbc21df (unknown)
W1212 04:02:33.289500 24043 init.cc:205] @ 0x7fe68db91522 (unknown)
W1212 04:02:33.290030 24043 init.cc:205] @ 0x7fe68dbc1f02 PyInit_backend_impl
W1212 04:02:33.291326 24043 init.cc:205] @ 0x559a8cfd08e5 _PyImport_LoadDynamicModuleWithSpec
W1212 04:02:33.292248 24043 init.cc:205] @ 0x559a8cfd0ae5 _imp_create_dynamic
W1212 04:02:33.293475 24043 init.cc:205] @ 0x559a8cecca61 PyCFunction_Call
W1212 04:02:33.294788 24043 init.cc:205] @ 0x559a8cf80fdb _PyEval_EvalFrameDefault
W1212 04:02:33.295608 24043 init.cc:205] @ 0x559a8cf52a94 _PyEval_EvalCodeWithName
W1212 04:02:33.296420 24043 init.cc:205] @ 0x559a8cf53941 fast_function
W1212 04:02:33.297256 24043 init.cc:205] @ 0x559a8cf59755 call_function
W1212 04:02:33.298564 24043 init.cc:205] @ 0x559a8cf7bcba _PyEval_EvalFrameDefault
W1212 04:02:33.299376 24043 init.cc:205] @ 0x559a8cf5370b fast_function
W1212 04:02:33.300211 24043 init.cc:205] @ 0x559a8cf59755 call_function
W1212 04:02:33.301522 24043 init.cc:205] @ 0x559a8cf7bcba _PyEval_EvalFrameDefault
W1212 04:02:33.302337 24043 init.cc:205] @ 0x559a8cf5370b fast_function
W1212 04:02:33.303174 24043 init.cc:205] @ 0x559a8cf59755 call_function
W1212 04:02:33.304481 24043 init.cc:205] @ 0x559a8cf7bcba _PyEval_EvalFrameDefault
W1212 04:02:33.305295 24043 init.cc:205] @ 0x559a8cf5370b fast_function
W1212 04:02:33.306128 24043 init.cc:205] @ 0x559a8cf59755 call_function
W1212 04:02:33.307440 24043 init.cc:205] @ 0x559a8cf7bcba _PyEval_EvalFrameDefault
W1212 04:02:33.308254 24043 init.cc:205] @ 0x559a8cf5370b fast_function
W1212 04:02:33.309087 24043 init.cc:205] @ 0x559a8cf59755 call_function
W1212 04:02:33.310396 24043 init.cc:205] @ 0x559a8cf7bcba _PyEval_EvalFrameDefault
W1212 04:02:33.311619 24043 init.cc:205] @ 0x559a8cf53d7b _PyFunction_FastCallDict
W1212 04:02:33.312815 24043 init.cc:205] @ 0x559a8cec9f5f _PyObject_FastCallDict
W1212 04:02:33.314150 24043 init.cc:205] @ 0x559a8cf0e670 _PyObject_CallMethodIdObjArgs
W1212 04:02:33.315428 24043 init.cc:205] @ 0x559a8cec0a70 PyImport_ImportModuleLevelObject
W1212 04:02:33.316738 24043 init.cc:205] @ 0x559a8cf7e033 _PyEval_EvalFrameDefault
W1212 04:02:33.318001 24043 init.cc:205] @ 0x559a8cf54459 PyEval_EvalCodeEx
W1212 04:02:33.319205 24043 init.cc:205] @ 0x559a8cf551ec PyEval_EvalCode
W1212 04:02:33.400863 24044 init.cc:205] *** Aborted at 1576123353 (unix time) try "date -d @1576123353" if you are using GNU date ***
W1212 04:02:33.403725 24044 init.cc:205] PC: @ 0x0 (unknown)
W1212 04:02:33.403934 24044 init.cc:205] *** SIGSEGV (@0x0) received by PID 24044 (TID 0x7f6406bd7740) from PID 0; stack trace: ***
W1212 04:02:33.406607 24044 init.cc:205] @ 0x7f64067b55d0 (unknown)
W1212 04:02:33.407167 24044 init.cc:205] @ 0x7f63b0bc4e3b (unknown)
W1212 04:02:33.407702 24044 init.cc:205] @ 0x7f63b0bff1df (unknown)
W1212 04:02:33.408239 24044 init.cc:205] @ 0x7f63b0bce522 (unknown)
W1212 04:02:33.408773 24044 init.cc:205] @ 0x7f63b0bfef02 PyInit_backend_impl
W1212 04:02:33.410063 24044 init.cc:205] @ 0x55615a8c98e5 _PyImport_LoadDynamicModuleWithSpec
W1212 04:02:33.410984 24044 init.cc:205] @ 0x55615a8c9ae5 _imp_create_dynamic
W1212 04:02:33.412220 24044 init.cc:205] @ 0x55615a7c5a61 PyCFunction_Call
W1212 04:02:33.413537 24044 init.cc:205] @ 0x55615a879fdb _PyEval_EvalFrameDefault
W1212 04:02:33.414355 24044 init.cc:205] @ 0x55615a84ba94 _PyEval_EvalCodeWithName
W1212 04:02:33.415171 24044 init.cc:205] @ 0x55615a84c941 fast_function
W1212 04:02:33.416003 24044 init.cc:205] @ 0x55615a852755 call_function
W1212 04:02:33.417318 24044 init.cc:205] @ 0x55615a874cba _PyEval_EvalFrameDefault
W1212 04:02:33.418129 24044 init.cc:205] @ 0x55615a84c70b fast_function
W1212 04:02:33.418969 24044 init.cc:205] @ 0x55615a852755 call_function
W1212 04:02:33.419068 24045 init.cc:205] *** Aborted at 1576123353 (unix time) try "date -d @1576123353" if you are using GNU date ***
W1212 04:02:33.420289 24044 init.cc:205] @ 0x55615a874cba _PyEval_EvalFrameDefault
W1212 04:02:33.421099 24044 init.cc:205] @ 0x55615a84c70b fast_function
W1212 04:02:33.421872 24045 init.cc:205] PC: @ 0x0 (unknown)
W1212 04:02:33.421942 24044 init.cc:205] @ 0x55615a852755 call_function
W1212 04:02:33.422075 24045 init.cc:205] *** SIGSEGV (@0x0) received by PID 24045 (TID 0x7f8121536740) from PID 0; stack trace: ***
W1212 04:02:33.423307 24044 init.cc:205] @ 0x55615a874cba _PyEval_EvalFrameDefault
W1212 04:02:33.424119 24044 init.cc:205] @ 0x55615a84c70b fast_function
W1212 04:02:33.424721 24045 init.cc:205] @ 0x7f81211145d0 (unknown)
W1212 04:02:33.424958 24044 init.cc:205] @ 0x55615a852755 call_function
W1212 04:02:33.425238 24045 init.cc:205] @ 0x7f80c0f8de3b (unknown)
W1212 04:02:33.425730 24045 init.cc:205] @ 0x7f80c0fc81df (unknown)
W1212 04:02:33.426224 24045 init.cc:205] @ 0x7f80c0f97522 (unknown)
W1212 04:02:33.426277 24044 init.cc:205] @ 0x55615a874cba _PyEval_EvalFrameDefault
W1212 04:02:33.426712 24045 init.cc:205] @ 0x7f80c0fc7f02 PyInit_backend_impl
W1212 04:02:33.427100 24044 init.cc:205] @ 0x55615a84c70b fast_function
W1212 04:02:33.427959 24044 init.cc:205] @ 0x55615a852755 call_function
W1212 04:02:33.428014 24045 init.cc:205] @ 0x5573e9a5f8e5 _PyImport_LoadDynamicModuleWithSpec
W1212 04:02:33.428985 24045 init.cc:205] @ 0x5573e9a5fae5 _imp_create_dynamic
W1212 04:02:33.429309 24044 init.cc:205] @ 0x55615a874cba _PyEval_EvalFrameDefault
W1212 04:02:33.430250 24045 init.cc:205] @ 0x5573e995ba61 PyCFunction_Call
W1212 04:02:33.430562 24044 init.cc:205] @ 0x55615a84cd7b _PyFunction_FastCallDict
W1212 04:02:33.431604 24045 init.cc:205] @ 0x5573e9a0ffdb _PyEval_EvalFrameDefault
W1212 04:02:33.431788 24044 init.cc:205] @ 0x55615a7c2f5f _PyObject_FastCallDict
W1212 04:02:33.432447 24045 init.cc:205] @ 0x5573e99e1a94 _PyEval_EvalCodeWithName
W1212 04:02:33.433146 24044 init.cc:205] @ 0x55615a807670 _PyObject_CallMethodIdObjArgs
W1212 04:02:33.433285 24045 init.cc:205] @ 0x5573e99e2941 fast_function
W1212 04:02:33.434139 24045 init.cc:205] @ 0x5573e99e8755 call_function
W1212 04:02:33.434453 24044 init.cc:205] @ 0x55615a7b9a70 PyImport_ImportModuleLevelObject
W1212 04:02:33.435487 24045 init.cc:205] @ 0x5573e9a0acba _PyEval_EvalFrameDefault
W1212 04:02:33.435796 24044 init.cc:205] @ 0x55615a877033 _PyEval_EvalFrameDefault
W1212 04:02:33.436323 24045 init.cc:205] @ 0x5573e99e270b fast_function
W1212 04:02:33.437090 24044 init.cc:205] @ 0x55615a84d459 PyEval_EvalCodeEx
W1212 04:02:33.437180 24045 init.cc:205] @ 0x5573e99e8755 call_function
W1212 04:02:33.438321 24044 init.cc:205] @ 0x55615a84e1ec PyEval_EvalCode
W1212 04:02:33.438518 24045 init.cc:205] @ 0x5573e9a0acba _PyEval_EvalFrameDefault
W1212 04:02:33.439347 24045 init.cc:205] @ 0x5573e99e270b fast_function
W1212 04:02:33.440203 24045 init.cc:205] @ 0x5573e99e8755 call_function
W1212 04:02:33.441527 24045 init.cc:205] @ 0x5573e9a0acba _PyEval_EvalFrameDefault
W1212 04:02:33.442353 24045 init.cc:205] @ 0x5573e99e270b fast_function
W1212 04:02:33.443202 24045 init.cc:205] @ 0x5573e99e8755 call_function
W1212 04:02:33.444519 24045 init.cc:205] @ 0x5573e9a0acba _PyEval_EvalFrameDefault
W1212 04:02:33.445341 24045 init.cc:205] @ 0x5573e99e270b fast_function
W1212 04:02:33.446188 24045 init.cc:205] @ 0x5573e99e8755 call_function
W1212 04:02:33.447504 24045 init.cc:205] @ 0x5573e9a0acba _PyEval_EvalFrameDefault
W1212 04:02:33.448730 24045 init.cc:205] @ 0x5573e99e2d7b _PyFunction_FastCallDict
W1212 04:02:33.449932 24045 init.cc:205] @ 0x5573e9958f5f _PyObject_FastCallDict
W1212 04:02:33.451252 24045 init.cc:205] @ 0x5573e999d670 _PyObject_CallMethodIdObjArgs
W1212 04:02:33.452535 24045 init.cc:205] @ 0x5573e994fa70 PyImport_ImportModuleLevelObject
W1212 04:02:33.453855 24045 init.cc:205] @ 0x5573e9a0d033 _PyEval_EvalFrameDefault
W1212 04:02:33.455127 24045 init.cc:205] @ 0x5573e99e3459 PyEval_EvalCodeEx
W1212 04:02:33.456331 24045 init.cc:205] @ 0x5573e99e41ec PyEval_EvalCode
W1212 04:02:33.783942 24042 init.cc:205] *** Aborted at 1576123353 (unix time) try "date -d @1576123353" if you are using GNU date ***
W1212 04:02:33.786825 24042 init.cc:205] PC: @ 0x0 (unknown)
W1212 04:02:33.787039 24042 init.cc:205] *** SIGSEGV (@0x0) received by PID 24042 (TID 0x7efd88a12740) from PID 0; stack trace: ***
W1212 04:02:33.789687 24042 init.cc:205] @ 0x7efd885f05d0 (unknown)
W1212 04:02:33.790251 24042 init.cc:205] @ 0x7efd329ffe3b (unknown)
W1212 04:02:33.790787 24042 init.cc:205] @ 0x7efd32a3a1df (unknown)
W1212 04:02:33.791326 24042 init.cc:205] @ 0x7efd32a09522 (unknown)
W1212 04:02:33.791860 24042 init.cc:205] @ 0x7efd32a39f02 PyInit_backend_impl
W1212 04:02:33.793159 24042 init.cc:205] @ 0x55fcbae418e5 _PyImport_LoadDynamicModuleWithSpec
W1212 04:02:33.794075 24042 init.cc:205] @ 0x55fcbae41ae5 _imp_create_dynamic
W1212 04:02:33.795310 24042 init.cc:205] @ 0x55fcbad3da61 PyCFunction_Call
W1212 04:02:33.796627 24042 init.cc:205] @ 0x55fcbadf1fdb _PyEval_EvalFrameDefault
W1212 04:02:33.797446 24042 init.cc:205] @ 0x55fcbadc3a94 _PyEval_EvalCodeWithName
W1212 04:02:33.798260 24042 init.cc:205] @ 0x55fcbadc4941 fast_function
W1212 04:02:33.799093 24042 init.cc:205] @ 0x55fcbadca755 call_function
W1212 04:02:33.800408 24042 init.cc:205] @ 0x55fcbadeccba _PyEval_EvalFrameDefault
W1212 04:02:33.801223 24042 init.cc:205] @ 0x55fcbadc470b fast_function
W1212 04:02:33.802060 24042 init.cc:205] @ 0x55fcbadca755 call_function
W1212 04:02:33.803372 24042 init.cc:205] @ 0x55fcbadeccba _PyEval_EvalFrameDefault
W1212 04:02:33.804188 24042 init.cc:205] @ 0x55fcbadc470b fast_function
W1212 04:02:33.805022 24042 init.cc:205] @ 0x55fcbadca755 call_function
W1212 04:02:33.806337 24042 init.cc:205] @ 0x55fcbadeccba _PyEval_EvalFrameDefault
W1212 04:02:33.807153 24042 init.cc:205] @ 0x55fcbadc470b fast_function
W1212 04:02:33.807986 24042 init.cc:205] @ 0x55fcbadca755 call_function
W1212 04:02:33.809303 24042 init.cc:205] @ 0x55fcbadeccba _PyEval_EvalFrameDefault
W1212 04:02:33.810118 24042 init.cc:205] @ 0x55fcbadc470b fast_function
W1212 04:02:33.810961 24042 init.cc:205] @ 0x55fcbadca755 call_function
W1212 04:02:33.812273 24042 init.cc:205] @ 0x55fcbadeccba _PyEval_EvalFrameDefault
W1212 04:02:33.813495 24042 init.cc:205] @ 0x55fcbadc4d7b _PyFunction_FastCallDict
W1212 04:02:33.814693 24042 init.cc:205] @ 0x55fcbad3af5f _PyObject_FastCallDict
W1212 04:02:33.816009 24042 init.cc:205] @ 0x55fcbad7f670 _PyObject_CallMethodIdObjArgs
W1212 04:02:33.817297 24042 init.cc:205] @ 0x55fcbad31a70 PyImport_ImportModuleLevelObject
W1212 04:02:33.818611 24042 init.cc:205] @ 0x55fcbadef033 _PyEval_EvalFrameDefault
W1212 04:02:33.819876 24042 init.cc:205] @ 0x55fcbadc5459 PyEval_EvalCodeEx
W1212 04:02:33.821080 24042 init.cc:205] @ 0x55fcbadc61ec PyEval_EvalCode
ERROR 2019-12-12 04:02:41,061 launch.py:269] ABORT!!! Out of all 4 trainers, the trainer process with rank=[0, 2] was aborted. Please check its log.
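Because the four trainers write to the same console, the traces of 24044 and 24045 are interleaved. The sketch below regroups such lines by the numeric field that follows the timestamp (24042 to 24045 here, which match the PIDs in the SIGSEGV lines). It assumes the glog-style prefix seen in this log, `<severity><MMDD> <HH:MM:SS.micros> <thread id> <file>:<line>] <message>`, and also decodes the `Aborted at <unix time>` value that the log suggests checking with `date -d`.

```python
import re
from collections import OrderedDict
from datetime import datetime, timezone

# glog-style prefix: severity letter, MMDD, time, thread id, "file:line]", message.
GLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<tid>\d+) (?P<src>[^\]]+)\] (?P<msg>.*)$"
)
ABORTED_AT = re.compile(r"Aborted at (\d+) \(unix time\)")


def group_by_thread(lines):
    """Return {thread_id: [message, ...]} in first-seen order; non-glog lines are skipped."""
    groups = OrderedDict()
    for line in lines:
        m = GLOG_LINE.match(line.strip())
        if m:
            groups.setdefault(m.group("tid"), []).append(m.group("msg"))
    return groups


def abort_time_utc(message):
    """Decode 'Aborted at <unix time>' into an ISO-8601 UTC string, or None if absent."""
    m = ABORTED_AT.search(message)
    if not m:
        return None
    return datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc).isoformat()


if __name__ == "__main__":
    # Three lines copied from the log above.
    sample = [
        'W1212 04:02:33.419068 24045 init.cc:205] *** Aborted at 1576123353 (unix time) try "date -d @1576123353" if you are using GNU date ***',
        "W1212 04:02:33.420289 24044 init.cc:205] @ 0x55615a874cba _PyEval_EvalFrameDefault",
        "W1212 04:02:33.421872 24045 init.cc:205] PC: @ 0x0 (unknown)",
    ]
    for tid, msgs in group_by_thread(sample).items():
        print(tid, len(msgs), "lines")
    print(abort_time_utc(sample[0]))  # 2019-12-12T04:02:33+00:00
```

The decoded abort time, 04:02:33 UTC, matches the 04:02:33 prefixes on the crash lines, so the log's clock appears to be UTC.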

Assignee: None
Milestone: None
Time tracking: None
Due date: None
Reference: paddlepaddle/models#4064