PP-YOLO crashes for unknown reasons after reducing batch_size
Created by: baiyfbupt
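- paddle version: paddlepaddle-gpu==1.8.4.post97
- paddledetection version: latest master branch, commit id: 1394adde
- Machine environment: 8x P40 GPUs
The P40 does not have enough GPU memory to train with the batch_size in the original PP-YOLO config, so I tried lowering bs from 24 to 12, but hit the following error: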
➜ PaddleDetection git:(master) ✗ CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python tools/train.py -c configs/ppyolo/ppyolo.yml --eval
2020-08-18 16:47:19,342-INFO: If regularizer of a Parameter has been set by 'fluid.ParamAttr' or 'fluid.WeightNormParamAttr' already. The Regularization[L2Decay, regularization_coeff=0.000500] in Optimizer will not take effect, and it will only be applied to other Parameters!
loading annotations into memory...
Done (t=0.78s)
creating index...
index created!
2020-08-18 16:47:26,358-INFO: places would be ommited when DataLoader is not iterable
W0818 16:47:26.503942 24754 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.1, Runtime API Version: 9.0
W0818 16:47:26.508926 24754 device_context.cc:260] device: 0, cuDNN Version: 7.5.
W0818 16:47:28.675894 24754 device_context.h:155] WARNING: device: 0. The installed Paddle is compiled with CUDNN 7.6, but CUDNN version in your machine is 7.5, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDNN version.
2020-08-18 16:47:30,834-WARNING: /root/.cache/paddle/weights/ResNet50_vd_ssld_pretrained.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
/usr/local/lib/python3.7/site-packages/paddle/fluid/io.py:1998: UserWarning: This list is not set, Because of Paramerter not found in program. There are: fc_0.w_0 fc_0.b_0
format(" ".join(unused_para_list)))
loading annotations into memory...
Done (t=21.63s)
creating index...
index created!
2020-08-18 16:47:55,600-WARNING: Found an invalid bbox in annotations: im_id: 200365, area: 0.0 x1: 296.65, y1: 388.33, x2: 296.67999999999995, y2: 388.33.
2020-08-18 16:48:02,799-WARNING: Found an invalid bbox in annotations: im_id: 550395, area: 0.0 x1: 9.98, y1: 188.56, x2: 14.52, y2: 188.56.
2020-08-18 16:48:12,570-INFO: places would be ommited when DataLoader is not iterable
I0818 16:48:42.298118 24754 build_strategy.cc:361] set enable_sequential_execution:1
W0818 16:48:43.948408 24754 fuse_all_reduce_op_pass.cc:74] Find all_reduce operators: 240. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 171.
*** Error in `python': corrupted double-linked list: 0x00007fa3fc15d750 ***
W0818 16:50:59.118785 25205 init.cc:226] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0818 16:50:59.118944 25205 init.cc:228] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0818 16:50:59.118958 25205 init.cc:231] The detail failure signal is:
W0818 16:50:59.118989 25205 init.cc:234] *** Aborted at 1597740659 (unix time) try "date -d @1597740659" if you are using GNU date ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7faa208967e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x82970)[0x7faa208a1970]
/lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x54)[0x7faa208a3184]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(_Znwm+0x18)[0x7faa16f64e78]
/usr/local/lib/python3.7/site-packages/paddle/fluid/core_avx.so(_ZN6google8protobuf8internal14ArenaStringPtr21CreateInstanceNoArenaEPKSs+0x22)[0x7fa9706edf22]
W0818 16:50:59.121510 25205 init.cc:234] PC: @ 0x0 (unknown)
W0818 16:50:59.121645 25205 init.cc:234] *** SIGSEGV (@0x38) received by PID 24754 (TID 0x7fa485758700) from PID 56; stack trace: ***
/usr/local/lib/python3.7/site-packages/paddle/fluid/core_avx.so(_ZN6paddle8platform5proto11MessageDesc27MergePartialFromCodedStreamEPN6google8protobuf2io16CodedInputStreamE+0x1b2)[0x7fa973e2e982]
W0818 16:50:59.123709 25205 init.cc:234] @ 0x7faa20bfa390 (unknown)
/usr/local/lib/python3.7/site-packages/paddle/fluid/core_avx.so(_ZN6paddle8platform5proto14AllMessageDesc27MergePartialFromCodedStreamEPN6google8protobuf2io16CodedInputStreamE+0x1b1)[0x7fa973e2f1a1]
/usr/local/lib/python3.7/site-packages/paddle/fluid/core_avx.so(_ZN6paddle8platform5proto13cudaerrorDesc27MergePartialFromCodedStreamEPN6google8protobuf2io16CodedInputStreamE+0x12d)[0x7fa973e2fc0d]
/usr/local/lib/python3.7/site-packages/paddle/fluid/core_avx.so(_ZN6google8protobuf11MessageLite20ParseFromCodedStreamEPNS0_2io16CodedInputStreamE+0x20)[0x7fa97370f060]
W0818 16:50:59.127470 25205 init.cc:234] @ 0x7fa973e2ea7b paddle::platform::proto::AllMessageDesc::Clear()
/usr/local/lib/python3.7/site-packages/paddle/fluid/core_avx.so(_ZN6google8protobuf11MessageLite23ParseFromZeroCopyStreamEPNS0_2io19ZeroCopyInputStreamE+0x9a)[0x7fa97370f20a]
/usr/local/lib/python3.7/site-packages/paddle/fluid/core_avx.so(_ZN6google8protobuf7Message16ParseFromIstreamEPSi+0x29)[0x7fa973715969]
/usr/local/lib/python3.7/site-packages/paddle/fluid/core_avx.so(_ZN6paddle8platform22build_nvidia_error_msgE9cudaError+0x722)[0x7fa9709472d2]
W0818 16:50:59.131649 25205 init.cc:234] @ 0x7fa973e2e9f6 paddle::platform::proto::cudaerrorDesc::Clear()
/usr/local/lib/python3.7/site-packages/paddle/fluid/core_avx.so(_ZNK6paddle8platform6stream10CUDAStream4WaitEv+0x33)[0x7fa973dc3533]
/usr/local/lib/python3.7/site-packages/paddle/fluid/core_avx.so(_ZNK6paddle8platform17CUDADeviceContext4WaitEv+0x25)[0x7fa970a07655]
/usr/local/lib/python3.7/site-packages/paddle/fluid/core_avx.so(_ZN6paddle9framework15TransDataDeviceERKNS0_6TensorERKNS_8platform5PlaceEPS1_+0x136)[0x7fa973d28186]
/usr/local/lib/python3.7/site-packages/paddle/fluid/core_avx.so(_ZN6paddle9framework13TransformDataERKNS0_12OpKernelTypeES3_RKNS0_6TensorEPS4_+0x3ce)[0x7fa973d26dfe]
/usr/local/lib/python3.7/site-packages/paddle/fluid/core_avx.so(_ZNK6paddle9framework18OperatorWithKernel11PrepareDataERKNS0_5ScopeERKNS0_12OpKernelTypeEPSt6vectorISsSaISsEEPNS0_14RuntimeContextE+0x515)[0x7fa973d023a5]
/usr/local/lib/python3.7/site-packages/paddle/fluid/core_avx.so(_ZNK6paddle9framework18OperatorWithKernel7RunImplERKNS0_5ScopeERKNS_8platform5PlaceEPNS0_14RuntimeContextE+0x11e)[0x7fa973d0392e]
/usr/local/lib/python3.7/site-packages/paddle/fluid/core_avx.so(_ZNK6paddle9framework18OperatorWithKernel7RunImplERKNS0_5ScopeERKNS_8platform5PlaceE+0x211)[0x7fa973d042b1]
/usr/local/lib/python3.7/site-packages/paddle/fluid/core_avx.so(_ZN6paddle9framework12OperatorBase3RunERKNS0_5ScopeERKNS_8platform5PlaceE+0x171)[0x7fa973cfd271]
/usr/local/lib/python3.7/site-packages/paddle/fluid/core_avx.so(_ZN6paddle9framework7details19ComputationOpHandle7RunImplEv+0xa6)[0x7fa973a0cb56]
W0818 16:50:59.142904 25205 init.cc:234] @ 0x7fa97370f054 google::protobuf::MessageLite::ParseFromCodedStream()
/usr/local/lib/python3.7/site-packages/paddle/fluid/core_avx.so(_ZN6paddle9framework7details28FastThreadedSSAGraphExecutor9RunOpSyncEPNS1_12OpHandleBaseE+0x111)[0x7fa9739b2d31]
/usr/local/lib/python3.7/site-packages/paddle/fluid/core_avx.so(_ZN6paddle9framework7details28FastThreadedSSAGraphExecutor5RunOpEPNS1_12OpHandleBaseERKSt10shared_ptrINS0_13BlockingQueueImEEEPm+0x2f)[0x7fa9739b082f]
W0818 16:50:59.146567 25205 init.cc:234] @ 0x7fa97370f20a google::protobuf::MessageLite::ParseFromZeroCopyStream()
/usr/local/lib/python3.7/site-packages/paddle/fluid/core_avx.so(+0x505baf4)[0x7fa9739b0af4]
/usr/local/lib/python3.7/site-packages/paddle/fluid/core_avx.so(_ZNSt17_Function_handlerIFSt10unique_ptrINSt13__future_base12_Result_baseENS2_8_DeleterEEvENS1_12_Task_setterIS0_INS1_7_ResultIvEES3_EvEEE9_M_invokeERKSt9_Any_data+0x23)[0x7fa97092b603]
/usr/local/lib/python3.7/site-packages/paddle/fluid/core_avx.so(_ZNSt13__future_base11_State_base9_M_do_setERSt8functionIFSt10unique_ptrINS_12_Result_baseENS3_8_DeleterEEvEERb+0x27)[0x7fa9707252f7]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xea99)[0x7faa20bf7a99]
W0818 16:50:59.149616 25205 init.cc:234] @ 0x7fa973715969 google::protobuf::Message::ParseFromIstream()
/usr/local/lib/python3.7/site-packages/paddle/fluid/core_avx.so(+0x5057cc2)[0x7fa9739accc2]
/usr/local/lib/python3.7/site-packages/paddle/fluid/core_avx.so(_ZZN10ThreadPoolC1EmENKUlvE_clEv+0x194)[0x7fa970727754]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80)[0x7faa16f8fc80]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7faa20bf06ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7faa2092641d]
======= Memory map: ========
00400000-00401000 r-xp 00000000 fc:01 3426090 /usr/local/bin/python3.7
00600000-00601000 r--p 00000000 fc:01 3426090 /usr/local/bin/python3.7
00601000-00602000 rw-p 00001000 fc:01 3426090 /usr/local/bin/python3.7
00a81000-1fffec000 rw-p 00000000 00:00 0 [heap]
200000000-200200000 rw-s 00000000 00:20 199260 /dev/nvidiactl
200200000-200400000 ---p 00000000 00:00 0
200400000-200600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
200600000-202600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
202600000-205600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
205600000-206400000 ---p 00000000 00:00 0
206400000-206600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
206600000-206800000 rw-s 00000000 00:20 199260 /dev/nvidiactl
206800000-206a00000 rw-s 206800000 00:20 199250 /dev/nvidia-uvm
206a00000-206c00000 ---p 00000000 00:00 0
206c00000-206e00000 rw-s 00000000 00:20 199260 /dev/nvidiactl
206e00000-207000000 ---p 00000000 00:00 0
207000000-207200000 rw-s 00000000 00:20 199260 /dev/nvidiactl
207200000-207400000 ---p 00000000 00:00 0
207400000-207600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
207600000-209600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
209600000-20c600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
20c600000-20d400000 ---p 00000000 00:00 0
20d400000-20d600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
20d600000-20d800000 rw-s 00000000 00:20 199260 /dev/nvidiactl
20d800000-20da00000 rw-s 20d800000 00:20 199250 /dev/nvidia-uvm
20da00000-20dc00000 ---p 00000000 00:00 0
20dc00000-20de00000 rw-s 00000000 00:20 199260 /dev/nvidiactl
20de00000-20e000000 ---p 00000000 00:00 0
20e000000-20e200000 rw-s 00000000 00:20 199260 /dev/nvidiactl
20e200000-20e400000 ---p 00000000 00:00 0
20e400000-20e600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
20e600000-210600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
210600000-213600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
213600000-214400000 ---p 00000000 00:00 0
214400000-214600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
214600000-214800000 rw-s 00000000 00:20 199260 /dev/nvidiactl
214800000-214a00000 rw-s 214800000 00:20 199250 /dev/nvidia-uvm
214a00000-214c00000 ---p 00000000 00:00 0
214c00000-214e00000 rw-s 00000000 00:20 199260 /dev/nvidiactl
214e00000-215000000 ---p 00000000 00:00 0
215000000-215200000 rw-s 00000000 00:20 199260 /dev/nvidiactl
215200000-215400000 ---p 00000000 00:00 0
215400000-215600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
215600000-217600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
217600000-21a600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
21a600000-21b400000 ---p 00000000 00:00 0
21b400000-21b600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
21b600000-21b800000 rw-s 00000000 00:20 199260 /dev/nvidiactl
21b800000-21ba00000 rw-s 21b800000 00:20 199250 /dev/nvidia-uvm
21ba00000-21bc00000 ---p 00000000 00:00 0
21bc00000-21be00000 rw-s 00000000 00:20 199260 /dev/nvidiactl
21be00000-21c000000 ---p 00000000 00:00 0
21c000000-21c200000 rw-s 00000000 00:20 199260 /dev/nvidiactl
21c200000-21c400000 ---p 00000000 00:00 0
21c400000-21c600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
21c600000-21e600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
21e600000-221600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
221600000-222400000 ---p 00000000 00:00 0
222400000-222600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
222600000-222800000 rw-s 00000000 00:20 199260 /dev/nvidiactl
222800000-222a00000 rw-s 222800000 00:20 199250 /dev/nvidia-uvm
222a00000-222c00000 ---p 00000000 00:00 0
222c00000-222e00000 rw-s 00000000 00:20 199260 /dev/nvidiactl
222e00000-223000000 ---p 00000000 00:00 0
223000000-223200000 rw-s 00000000 00:20 199260 /dev/nvidiactl
223200000-223400000 ---p 00000000 00:00 0
223400000-223600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
223600000-225600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
225600000-228600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
228600000-229400000 ---p 00000000 00:00 0
229400000-229600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
229600000-229800000 rw-s 00000000 00:20 199260 /dev/nvidiactl
229800000-229a00000 rw-s 229800000 00:20 199250 /dev/nvidia-uvm
229a00000-229c00000 ---p 00000000 00:00 0
229c00000-229e00000 rw-s 00000000 00:20 199260 /dev/nvidiactl
229e00000-22a000000 ---p 00000000 00:00 0
22a000000-22a200000 rw-s 00000000 00:20 199260 /dev/nvidiactl
22a200000-22a400000 ---p 00000000 00:00 0
22a400000-22a600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
22a600000-22c600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
22c600000-22f600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
22f600000-230400000 ---p 00000000 00:00 0
230400000-230600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
230600000-230800000 rw-s 00000000 00:20 199260 /dev/nvidiactl
230800000-230a00000 rw-s 230800000 00:20 199250 /dev/nvidia-uvm
230a00000-230c00000 ---p 00000000 00:00 0
230c00000-230e00000 rw-s 00000000 00:20 199260 /dev/nvidiactl
230e00000-231000000 ---p 00000000 00:00 0
231000000-231200000 rw-s 00000000 00:20 199260 /dev/nvidiactl
231200000-231400000 ---p 00000000 00:00 0
231400000-231600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
231600000-233600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
233600000-236600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
236600000-237400000 ---p 00000000 00:00 0
237400000-237600000 rw-s 00000000 00:20 199260 /dev/nvidiactl
237600000-237800000 rw-s 00000000 00:20 199260 /dev/nvidiactl
237800000-237a00000 rw-s 237800000 00:20 199250 /dev/nvidia-uvm
237a00000-237c00000 ---p 00000000 00:00 0
237c00000-237e00000 rw-s 00000000 00:20 199260 /dev/nvidiactl
237e00000-a00200000 ---p 00000000 00:00 0
10000000000-11020000000 ---p 00000000 00:00 0
7f855cbfe000-7f855cbff000 rw-p 00000000 00:00 0
7f855cbff000-7f857bfff000 rw-p 00000000 00:00 0
7f857bfff000-7f857c000000 rw-p 00000000 00:00 0
7f857c000000-7f868c000000 ---p 00000000 00:00 0
7f868c000000-7f868cf4f000 rw-p 00000000 00:00 0
7f868cf4f000-7f8690000000 ---p 00000000 00:00 0
7f8692000000-7f87c0000000 ---p 00000000 00:00 0
7f87c0000000-7f87c3d3a000 rw-p 00000000 00:00 0
7f87c3d3a000-7f87c4000000 ---p 00000000 00:00 0
7f87c6000000-7f88c6000000 ---p 00000000 00:00 0
7f88c8000000-7f88cc000000 ---p 00000000 00:00 0
7f88d2000000-7f8a8e000000 ---p 00000000 00:00 0
7f8a92000000-7f8d02000000 ---p 00000000 00:00 0
7f8d04000000-7f8d0c000000 ---p 00000000 00:00 0
7f8d0e000000-7f8d88000000 ---p 00000000 00:00 0
7f8d8a000000-7f8d98000000 ---p 00000000 00:00 0
7f8d98bfe000-7f8d98bff000 rw-p 00000000 00:00 0
7f8d98bff000-7f8db7fff000 rw-p 00000000 00:00 0
7f8db7fff000-7f8db8000000 rw-p 00000000 00:00 0
7f8db8000000-7f8e3c000000 ---p 00000000 00:00 0
7f8e3e000000-7f8eca000000 ---p 00000000 00:00 0
7f8ecc000000-7f8f70000000 ---p 00000000 00:00 0
7f8f72000000-7f91e4000000 ---p 00000000 00:00 0
7f91e6000000-7f9562000000 ---p 00000000 00:00 0
7f9564000000-7f956c000000 ---p 00000000 00:00 0
7f956e000000-7f9588000000 ---p 00000000 00:00 0
7f9588bfe000-7f9588bff000 rw-p 00000000 00:00 0
7f9588bff000-7f95a7fff000 rw-p 00000000 00:00 0
7f95a7fff000-7f95a8000000 rw-p 00000000 00:00 0
7f95a8000000-7f9974000000 ---p 00000000 00:00 0
7f9976000000-7f9f84000000 ---p 00000000 00:00 0
7f9f86000000-7fa1d2000000 ---p 00000000 00:00 0
7fa1d2bfe000-7fa1d2bff000 rw-p 00000000 00:00 0
7fa1d2bff000-7fa1f1fff000 rw-p 00000000 00:00 0
7fa1f1fff000-7fa1f2000000 rw-p 00000000 00:00 0
7fa1f2000000-7fa212000000 ---p 00000000 00:00 0
7fa214000000-7fa250000000 ---p 00000000 00:00 0
7fa250bfe000-7fa250bff000 rw-p 00000000 00:00 0
7fa250bff000-7fa26ffff000 rw-p 00000000 00:00 0
7fa26ffff000-7fa270000000 rw-p 00000000 00:00 0
7fa270000000-7fa273d3a000 rw-p 00000000 00:00 0
7fa273d3a000-7fa274000000 ---p 00000000 00:00 0
7fa276000000-7fa278000000 ---p 00000000 00:00 0
7fa278000000-7fa27bd3a000 rw-p 00000000 00:00 0
7fa27bd3a000-7fa27c000000 ---p 00000000 00:00 0
7fa27e000000-7fa296000000 ---p 00000000 00:00 0
7fa298000000-7fa2c0000000 ---p 00000000 00:00 0
7fa2c2000000-7fa2e8000000 ---p 00000000 00:00 0
7fa2e8bfe000-7fa2e8bff000 rw-p 00000000 00:00 0
7fa2e8bff000-7fa307fff000 rw-p 00000000 00:00 0
7fa307fff000-7fa308000000 rw-p 00000000 00:00 0
7fa308000000-7fa30bdf4000 rw-p 00000000 00:00 0
7fa30bdf4000-7fa30c000000 ---p 00000000 00:00 0
7fa30c000000-7fa30fe14000 rw-p 00000000 00:00 0
7fa30fe14000-7fa310000000 ---p 00000000 00:00 0
7fa310000000-7fa313dbb000 rw-p 00000000 00:00 0
7fa313dbb000-7fa314000000 ---p 00000000 00:00 0
7fa316000000-7fa32e000000 ---p 00000000 00:00 0
7fa330000000-7fa344000000 ---p 00000000 00:00 0
7fa344000000-7fa347f72000 rw-p 00000000 00:00 0
7fa347f72000-7fa348000000 ---p 00000000 00:00 0
7fa34a000000-7fa34c000000 ---p 00000000 00:00 0
7fa34c000000-7fa36b6e0000 rw-s 00000000 00:03 3637518427 /dev/zero (deleted)
7fa36b6e0000-7fa370000000 ---p 00000000 00:00 0
7fa370000000-7fa373fba000 rw-p 00000000 00:00 0
7fa373fba000-7fa374000000 ---p 00000000 00:00 0
7fa376000000-7fa37c000000 ---p 00000000 00:00 0
7fa37c000000-7fa37c021000 rw-p 00000000 00:00 0
7fa37c021000-7fa380000000 ---p 00000000 00:00 0
7fa380000000-7fa380021000 rw-p 00000000 00:00 0
7fa380021000-7fa384000000 ---p 00000000 00:00 0
7fa384000000-7fa384021000 rw-p 00000000 00:00 0
7fa384021000-7fa388000000 ---p 00000000 00:00 0
7fa388000000-7fa388021000 rw-p 00000000 00:00 0
7fa388021000-7fa38c000000 ---p 00000000 00:00 0
7fa38c000000-7fa38c021000 rw-p 00000000 00:00 0
7fa38c021000-7fa390000000 ---p 00000000 00:00 0
7fa390000000-7fa390021000 rw-p 00000000 00:00 0
7fa390021000-7fa394000000 ---p 00000000 00:00 0
7fa394000000-7fa394021000 rw-p 00000000 00:00 0
7fa394021000-7fa398000000 ---p 00000000 00:00 0
7fa39a000000-7fa39c000000 ---p 00000000 00:00 0
7fa39c000000-7fa39c021000 rw-p 00000000 00:00 0
7fa39c021000-7fa3a0000000 ---p 00000000 00:00 0
7fa3a2000000-7fa3a4000000 ---p 00000000 00:00 0
7fa3a4000000-7fa3a4021000 rw-p 00000000 00:00 0
7fa3a4021000-7fa3a8000000 ---p 00000000 00:00 0
7fa3aa000000-7fa3ac000000 ---p 00000000 00:00 0
7fa3ac000000-7fa3ac175000 rw-p 00000000 00:00 0
7fa3ac175000-7fa3b0000000 ---p 00000000 00:00 0
7fa3b2000000-7fa3b4000000 ---p 00000000 00:00 0
7fa3b4000000-7fa3b417b000 rw-p 00000000 00:00 0
7fa3b417b000-7fa3b8000000 ---p 00000000 00:00 0
7fa3b8000000-7fa3b8021000 rw-p 00000000 00:00 0
7fa3b8021000-7fa3bc000000 ---p 00000000 00:00 0
7fa3bc000000-7fa3bc16e000 rw-p 00000000 00:00 0
7fa3bc16e000-7fa3c0000000 ---p 00000000 00:00 0
7fa3c2491000-7fa3c4000000 rw-p 00000000 00:00 0
7fa3c4000000-7fa3c4021000 rw-p 00000000 00:00 0
7fa3c4021000-7fa3c8000000 ---p 00000000 00:00 0
7fa3c87d8000-7fa3cc000000 rw-p 00000000 00:00 0
7fa3cc000000-7fa3cc161000 rw-p 00000000 00:00 0
7fa3cc161000-7fa3d0000000 ---p 00000000 00:00 0
7fa3d03ec000-7fa3d2000000 rw-p 00000000 00:00 0
7fa3d2000000-7fa3d4000000 ---p 00000000 00:00 0
7fa3d4000000-7fa3d4021000 rw-p 00000000 00:00 0
7fa3d4021000-7fa3d8000000 ---p 00000000 00:00 0
7fa3d8000000-7fa3d8021000 rw-p 00000000 00:00 0
7fa3d8021000-7fa3dc000000 ---p 00000000 00:00 0
7fa3dc000000-7fa3dc021000 rw-p 00000000 00:00 0
7fa3dc021000-7fa3e0000000 ---p 00000000 00:00 0
7fa3e03ec000-7fa3e2000000 rw-p 00000000 00:00 0
7fa3e2000000-7fa3e4000000 ---p 00000000 00:00 0
7fa3e4000000-7fa3e4192000 rw-p 00000000 00:00 0
7fa3e4192000-7fa3e8000000 ---p 00000000 00:00 0
7fa3e813b000-7fa3ea1fd000 rw-p 00000000 00:00 0
7fa3ea1fd000-7fa3ea1fe000 ---p 00000000 00:00 0
7fa3ea1fe000-7fa3eabfe000 rw-p 00000000 00:00 0
7fa3eabfe000-7fa3eabff000 ---p 00000000 00:00 0
7fa3eabff000-7fa3eb5ff000 rw-p 00000000 00:00 0
7fa3eb5ff000-7fa3eb600000 ---p 00000000 00:00 0
7fa3eb600000-7fa3ec000000 rw-p 00000000 00:00 0
7fa3ec000000-7fa3ec021000 rw-p 00000000 00:00 0
7fa3ec021000-7fa3f0000000 ---p 00000000 00:00 0
7fa3f0000000-7fa3f0021000 rw-p 00000000 00:00 0
7fa3f0021000-7fa3f4000000 ---p 00000000 00:00 0
7fa3f4000000-7fa3f4185000 rw-p 00000000 00:00 0
7fa3f4185000-7fa3f8000000 ---p 00000000 00:00 0
7fa3f8000000-7fa3f8171000 rw-p 00000000 00:00 0
7fa3f8171000-7fa3fc000000 ---p 00000000 00:00 0
7fa3fc000000-7fa3fc162000 rw-p 00000000 00:00 0
7fa3fc162000-7fa400000000 ---p 00000000 00:00 0
7fa400346000-7fa4007f4000 rw-p 00000000 00:00 0
7fa4007f4000-7fa4007f5000 ---p 00000000 00:00 0
7fa4007f5000-7fa4011f5000 rw-p 00000000 00:00 0
7fa4011f5000-7fa4011f6000 ---p 00000000 00:00 0
7fa4011f6000-7fa401bf6000 rw-p 00000000 00:00 0
7fa401bf6000-7fa401bf7000 ---p 00000000 00:00 0
7fa401bf7000-7fa4025f7000 rw-p 00000000 00:00 0
7fa4025f7000-7fa4025f8000 ---p 00000000 00:00 0
7fa4025f8000-7fa402ff8000 rw-p 00000000 00:00 0
7fa402ff8000-7fa402ff9000 ---p 00000000 00:00 0
7fa402ff9000-7fa4039f9000 rw-p 00000000 00:00 0
7fa4039f9000-7fa4039fa000 ---p 00000000 00:00 0
7fa4039fa000-7fa4043fa000 rw-p 00000000 00:00 0
7fa4043fa000-7fa4043fb000 ---p 00000000 00:00 0
7fa4043fb000-7fa404dfb000 rw-p 00000000 00:00 0
7fa404dfb000-7fa404dfc000 ---p 00000000 00:00 0
7fa404dfc000-7fa4057fc000 rw-p 00000000 00:00 0
7fa4057fc000-7fa4057fd000 ---p 00000000 00:00 0
7fa4057fd000-7fa4061fd000 rw-p 00000000 00:00 0
7fa4061fd000-7fa4061fe000 ---p 00000000 00:00 0
7fa4061fe000-7fa406bfe000 rw-p 00000000 00:00 0
7fa406bfe000-7fa406bff000 ---p 00000000 00:00 0
7fa406bff000-7fa4075ff000 rw-p 00000000 00:00 0
7fa4075ff000-7fa407600000 ---p 00000000 00:00 0
7fa407600000-7fa408000000 rw-p 00000000 00:00 0
7fa408000000-7fa468000000 ---p 00000000 00:00 0
7fa468000000-7fa46813a000 rw-p 00000000 00:00 0
7fa46813a000-7fa46c000000 ---p 00000000 00:00 0
7fa46c000000-7fa46c021000 rw-p 00000000
W0818 16:50:59.152913 25205 init.cc:234] @ 0x7fa9709472d2 paddle::platform::build_nvidia_error_msg()
W0818 16:50:59.157562 25205 init.cc:234] @ 0x7fa973dc3533 paddle::platform::stream::CUDAStream::Wait()
W0818 16:50:59.162530 25205 init.cc:234] @ 0x7fa970a07655 paddle::platform::CUDADeviceContext::Wait()
W0818 16:50:59.166571 25205 init.cc:234] @ 0x7fa973d28186 paddle::framework::TransDataDevice()
W0818 16:50:59.172070 25205 init.cc:234] @ 0x7fa973d26dfe paddle::framework::TransformData()
W0818 16:50:59.174443 25205 init.cc:234] @ 0x7fa973d023a5 paddle::framework::OperatorWithKernel::PrepareData()
W0818 16:50:59.178413 25205 init.cc:234] @ 0x7fa973d0392e paddle::framework::OperatorWithKernel::RunImpl()
W0818 16:50:59.183684 25205 init.cc:234] @ 0x7fa973d042b1 paddle::framework::OperatorWithKernel::RunImpl()
W0818 16:50:59.186779 25205 init.cc:234] @ 0x7fa973cfd271 paddle::framework::OperatorBase::Run()
W0818 16:50:59.190925 25205 init.cc:234] @ 0x7fa973a0cb56 paddle::framework::details::ComputationOpHandle::RunImpl()
W0818 16:50:59.194707 25205 init.cc:234] @ 0x7fa9739b2d31 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync()
W0818 16:50:59.198913 25205 init.cc:234] @ 0x7fa9739b082f paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp()
W0818 16:50:59.200784 25205 init.cc:234] @ 0x7fa9739b0af4 _ZNSt17_Function_handlerIFvvESt17reference_wrapperISt12_Bind_simpleIFS1_ISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS6_12OpHandleBaseESt6atomicIiESt4hashISA_ESt8equal_toISA_ESaISt4pairIKSA_SC_EEESA_RKSt10shared_ptrINS5_13BlockingQueueImEEEEUlvE_vEEEvEEEE9_M_invokeERKSt9_Any_data
W0818 16:50:59.205601 25205 init.cc:234] @ 0x7fa97092b603 std::_Function_handler<>::_M_invoke()
W0818 16:50:59.210278 25205 init.cc:234] @ 0x7fa9707252f7 std::__future_base::_State_base::_M_do_set()
W0818 16:50:59.212383 25205 init.cc:234] @ 0x7faa20bf7a99 __pthread_once_slow
W0818 16:50:59.214026 25205 init.cc:234] @ 0x7fa9739accc2 _ZNSt13__future_base11_Task_stateISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS4_12OpHandleBaseESt6atomicIiESt4hashIS8_ESt8equal_toIS8_ESaISt4pairIKS8_SA_EEES8_RKSt10shared_ptrINS3_13BlockingQueueImEEEEUlvE_vEESaIiEFvvEE6_M_runEv
W0818 16:50:59.218747 25205 init.cc:234] @ 0x7fa970727754 _ZZN10ThreadPoolC1EmENKUlvE_clEv
W0818 16:50:59.220773 25205 init.cc:234] @ 0x7faa16f8fc80 (unknown)
W0818 16:50:59.222688 25205 init.cc:234] @ 0x7faa20bf06ba start_thread
W0818 16:50:59.224601 25205 init.cc:234] @ 0x7faa2092641d clone
W0818 16:50:59.226652 25205 init.cc:234] @ 0x0 (unknown)
[1] 24754 segmentation fault CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python tools/train.py -c --eval
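A side note on the two "Found an invalid bbox" warnings in the log above: both boxes have y1 == y2, so their area is 0.0. These warnings are probably unrelated to the crash, but they can be reproduced offline. The following is a minimal sketch, assuming COCO-style annotation dicts with `bbox` stored as [x, y, width, height]. The helper name `find_degenerate_boxes` and the `eps` threshold are hypothetical, and the sample entry is reconstructed from the corner coordinates printed in the log, not read from the dataset.

```python
def find_degenerate_boxes(annotations, eps=1e-6):
    """Return (image_id, bbox) pairs whose width or height is effectively zero.

    `annotations` is a list of COCO-style dicts with `image_id` and
    `bbox` = [x, y, width, height]. `eps` is an illustrative threshold.
    """
    bad = []
    for ann in annotations:
        x, y, w, h = ann["bbox"]
        if w <= eps or h <= eps:
            bad.append((ann["image_id"], ann["bbox"]))
    return bad


# Sample data: the first entry mirrors im_id 550395 from the log
# (x1=9.98, y1=188.56, x2=14.52, y2=188.56, i.e. height 0).
sample = [
    {"image_id": 550395, "bbox": [9.98, 188.56, 4.54, 0.0]},
    {"image_id": 1, "bbox": [10.0, 20.0, 30.0, 40.0]},
]
print(find_degenerate_boxes(sample))
```

Running this over the `annotations` list of the training JSON would list the same boxes the reader warns about, which helps rule the dataset out as a factor.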
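With the bs=24 config, training completes one iteration and then reports insufficient GPU memory. With bs=12 it crashes immediately, as shown in the log above. Other bs settings also crash, as in the following log (bs=8):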
2020-08-18 16:57:15,941-WARNING: Found an invalid bbox in annotations: im_id: 200365, area: 0.0 x1: 296.65, y1: 388.33, x2: 296.67999999999995, y2: 388.33.
2020-08-18 16:57:22,808-WARNING: Found an invalid bbox in annotations: im_id: 550395, area: 0.0 x1: 9.98, y1: 188.56, x2: 14.52, y2: 188.56.
W0818 16:57:43.771173 26217 init.cc:226] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0818 16:57:43.771234 26217 init.cc:228] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0818 16:57:43.771245 26217 init.cc:231] The detail failure signal is:
W0818 16:57:43.771255 26217 init.cc:234] *** Aborted at 1597741063 (unix time) try "date -d @1597741063" if you are using GNU date ***
W0818 16:57:43.772841 26217 init.cc:234] PC: @ 0x0 (unknown)
W0818 16:57:43.773139 26217 init.cc:234] *** SIGBUS (@0x7f7885e39000) received by PID 26217 (TID 0x7f7b86983700) from PID 18446744071660867584; stack trace: ***
W0818 16:57:43.774380 26217 init.cc:234] @ 0x7f7b8600c390 (unknown)
W0818 16:57:43.775625 26217 init.cc:234] @ 0x7f7b85da3af8 (unknown)
W0818 16:57:43.776408 26217 init.cc:234] @ 0x7f7b45ad6e40 ffi_call_unix64
W0818 16:57:43.777185 26217 init.cc:234] @ 0x7f7b45ad68ab ffi_call
W0818 16:57:43.777987 26217 init.cc:234] @ 0x7f7b45ceaf4f _ctypes_callproc
W0818 16:57:43.778774 26217 init.cc:234] @ 0x7f7b45ce1ca3 PyCFuncPtr_call
W0818 16:57:43.780123 26217 init.cc:234] @ 0x7f7b862b2024 _PyObject_FastCallKeywords
W0818 16:57:43.781473 26217 init.cc:234] @ 0x7f7b862870db _PyEval_EvalFrameDefault
W0818 16:57:43.782737 26217 init.cc:234] @ 0x7f7b862808c0 function_code_fastcall
W0818 16:57:43.784080 26217 init.cc:234] @ 0x7f7b86288ae0 _PyEval_EvalFrameDefault
W0818 16:57:43.785297 26217 init.cc:234] @ 0x7f7b862808c0 function_code_fastcall
W0818 16:57:43.786645 26217 init.cc:234] @ 0x7f7b86288ae0 _PyEval_EvalFrameDefault
W0818 16:57:43.787860 26217 init.cc:234] @ 0x7f7b862808c0 function_code_fastcall
W0818 16:57:43.789204 26217 init.cc:234] @ 0x7f7b8628906c _PyEval_EvalFrameDefault
W0818 16:57:43.790540 26217 init.cc:234] @ 0x7f7b863964e4 _PyEval_EvalCodeWithName
W0818 16:57:43.791875 26217 init.cc:234] @ 0x7f7b862b17c0 _PyFunction_FastCallDict
W0818 16:57:43.793200 26217 init.cc:234] @ 0x7f7b862b2a2d _PyObject_Call_Prepend
W0818 16:57:43.794450 26217 init.cc:234] @ 0x7f7b86315c01 slot_tp_init
W0818 16:57:43.795683 26217 init.cc:234] @ 0x7f7b863102d3 type_call
W0818 16:57:43.797013 26217 init.cc:234] @ 0x7f7b862b2024 _PyObject_FastCallKeywords
W0818 16:57:43.798363 26217 init.cc:234] @ 0x7f7b86285eb1 _PyEval_EvalFrameDefault
W0818 16:57:43.799719 26217 init.cc:234] @ 0x7f7b863964e4 _PyEval_EvalCodeWithName
W0818 16:57:43.801051 26217 init.cc:234] @ 0x7f7b862b17c0 _PyFunction_FastCallDict
W0818 16:57:43.802392 26217 init.cc:234] @ 0x7f7b862b2a2d _PyObject_Call_Prepend
W0818 16:57:43.803627 26217 init.cc:234] @ 0x7f7b86315c01 slot_tp_init
W0818 16:57:43.804854 26217 init.cc:234] @ 0x7f7b863102d3 type_call
W0818 16:57:43.806183 26217 init.cc:234] @ 0x7f7b862b2024 _PyObject_FastCallKeywords
W0818 16:57:43.807528 26217 init.cc:234] @ 0x7f7b86285eb1 _PyEval_EvalFrameDefault
W0818 16:57:43.808724 26217 init.cc:234] @ 0x7f7b862808c0 function_code_fastcall
W0818 16:57:43.810065 26217 init.cc:234] @ 0x7f7b8628906c _PyEval_EvalFrameDefault
W0818 16:57:43.811403 26217 init.cc:234] @ 0x7f7b863964e4 _PyEval_EvalCodeWithName
W0818 16:57:43.812738 26217 init.cc:234] @ 0x7f7b862b16b7 _PyFunction_FastCallDict