YOLO v3 hangs during training
Created by: Knightsll
Environment: Ubuntu 16.04, paddlepaddle-1.5.0, RTX 2080Ti.

Command executed:

python train.py --model_save_dir output/ --pretrain weights/ --data_dir dataset/coco/ --class_num 80 --batch_size 4 --learning_rate 0.0001

Training hangs partway through the run. What could be causing this?

-----------  Configuration Arguments -----------
batch_size: 4
class_num: 80
data_dir: dataset/coco/
dataset: coco2017
debug: False
draw_thresh: 0.5
enable_ce: False
image_name: None
image_path: image
input_size: 608
label_smooth: True
learning_rate: 0.0001
max_iter: 500200
model_save_dir: output/
nms_posk: 100
nms_thresh: 0.45
nms_topk: 400
no_mixup_iter: 40000
pretrain: weights/
random_shape: True
snapshot_iter: 2000
start_iter: 0
syncbn: True
use_gpu: True
use_multiprocess_reader: True
valid_thresh: 0.005
weights: weights/yolov3
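For context, the flags passed on the command line presumably map one-to-one onto the entries in the "Configuration Arguments" banner above, with the remaining entries coming from parser defaults. A minimal, hypothetical sketch of how such a banner is typically produced (argument names are taken from the dump above; the parser itself is an illustration, not the actual train.py):

# Hypothetical sketch: how command-line flags could yield the banner above.
# Not the real train.py parser; only a few of the listed arguments are shown.
import argparse

parser = argparse.ArgumentParser("YOLOv3 training (sketch)")
parser.add_argument("--model_save_dir", type=str, default="output/")
parser.add_argument("--pretrain", type=str, default="weights/")
parser.add_argument("--data_dir", type=str, default="dataset/coco/")
parser.add_argument("--class_num", type=int, default=80)
parser.add_argument("--batch_size", type=int, default=4)
parser.add_argument("--learning_rate", type=float, default=0.0001)
args = parser.parse_args()

# Print every parsed argument, mirroring the "Configuration Arguments" banner.
print("-----------  Configuration Arguments -----------")
for name, value in sorted(vars(args).items()):
    print("%s: %s" % (name, value))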
Found 1 CUDA devices.
W0708 19:33:51.657316 2893 device_context.cc:259] Please NOTE: device: 0, CUDA Capability: 75, Driver API Version: 10.2, Runtime API Version: 10.0
W0708 19:33:52.296442 2893 device_context.cc:267] device: 0, cuDNN Version: 7.6.
Disable syncbn in single device
WARNING:root:
You can try our memory optimize feature to save your memory usage:
# create a build_strategy variable to set memory optimize option
build_strategy = compiler.BuildStrategy()
build_strategy.enable_inplace = True
build_strategy.memory_optimize = True
# pass the build_strategy to with_data_parallel API
compiled_prog = compiler.CompiledProgram(main).with_data_parallel(
loss_name=loss.name, build_strategy=build_strategy)
!!! Memory optimize is our experimental feature !!!
some variables may be removed/reused internal to save memory usage,
in order to fetch the right value of the fetch_list, please set the
persistable property to true for each variable in fetch_list
# Sample
conv1 = fluid.layers.conv2d(data, 4, 5, 1, act=None)
# if you need to fetch conv1, then:
conv1.persistable = True
loading annotations into memory...
Done (t=25.48s)
creating index...
index created!
Load in 80 categories.
I0708 19:34:19.215874 2893 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 1. And the Program will be copied 1 copies
I0708 19:34:19.342849 2893 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
^C^C^C^C^C (training hangs at this point and does not respond to repeated Ctrl+C)
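For reference, the memory-optimize advice printed in the warning above can be wired up roughly as follows. This is a minimal, self-contained sketch using the paddle.fluid 1.x API with a toy network; the names data, conv1, and loss are placeholders taken from the warning's own sample, not parts of the YOLOv3 model:

import numpy as np
import paddle.fluid as fluid
from paddle.fluid import compiler

# Toy network; data/conv1/loss are placeholder names, not from the YOLOv3 model.
main_prog = fluid.Program()
startup_prog = fluid.Program()
with fluid.program_guard(main_prog, startup_prog):
    data = fluid.layers.data(name="img", shape=[3, 32, 32], dtype="float32")
    conv1 = fluid.layers.conv2d(data, 4, 5, 1, act=None)
    # Variables in fetch_list should be persistable once memory optimize
    # starts reusing/removing intermediate variables (see warning above).
    conv1.persistable = True
    loss = fluid.layers.reduce_mean(conv1)
    loss.persistable = True
    fluid.optimizer.SGD(learning_rate=0.0001).minimize(loss)

place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(startup_prog)

# Build strategy with the experimental memory optimize options from the warning.
build_strategy = compiler.BuildStrategy()
build_strategy.enable_inplace = True
build_strategy.memory_optimize = True

# Pass the build_strategy to with_data_parallel, as the warning suggests.
compiled_prog = compiler.CompiledProgram(main_prog).with_data_parallel(
    loss_name=loss.name, build_strategy=build_strategy)

img = np.random.random((4, 3, 32, 32)).astype("float32")
loss_v, conv1_v = exe.run(compiled_prog,
                          feed={"img": img},
                          fetch_list=[loss.name, conv1.name])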