DALI error in image classification
Created by: dbcool
--model=SE_ResNet50_vd
--pretrained_model=pretrain_model/SE_ResNet50_vd_pretrained/
--batch_size=200
--data_dir=data/terror_detail/
--num_epochs=60
--total_images=437462
--class_dim=14
--model_save_dir=terror/output_terror_detail/
--lr_strategy=cosine_decay_warmup
--use_label_smoothing=True
--use_mixup=True
--use_dali=True
--reader_thread=4
--lr=0.01
The error is:
validate : 1
warm_up_epochs : 5.0
An error occurs when using DALI for image classification training.
Environment: nvidia-dali 0.16.0, paddle 1.6.1, gcc 5.4, CUDA 10
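For anyone trying to reproduce this, the installed versions can be confirmed with a small script like the one below. This is only a sketch: the helper name `installed_versions` is hypothetical, and the pip distribution names `nvidia-dali`, `paddlepaddle-gpu`, and `paddlepaddle` are assumptions about how the packages were installed, so adjust them to match the local setup.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_versions(packages):
    """Map each distribution name to its installed version, or "not installed"."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Assumed pip distribution names; change these if the local install differs.
    names = ["nvidia-dali", "paddlepaddle-gpu", "paddlepaddle"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version}")
```

Pasting the output next to the gcc and CUDA versions makes it easier to tell whether the crash is tied to a particular DALI/Paddle combination.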
My run commands:
export CUDA_VISIBLE_DEVICES=0,1,2,3
export FLAGS_fraction_of_gpu_memory_to_use=0.80
python -m paddle.distributed.launch train.py
W1204 09:21:20.665040 4048 device_context.cc:235] Please NOTE: device: 3, CUDA Capability: 61, Driver API Version: 10.1, Runtime API Version: 10.0
W1204 09:21:20.671504 4048 device_context.cc:243] device: 3, cuDNN Version: 7.6.
W1204 09:21:20.693971 4045 device_context.cc:235] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.1, Runtime API Version: 10.0
W1204 09:21:20.700013 4045 device_context.cc:243] device: 0, cuDNN Version: 7.6.
W1204 09:21:20.711027 4046 device_context.cc:235] Please NOTE: device: 1, CUDA Capability: 61, Driver API Version: 10.1, Runtime API Version: 10.0
W1204 09:21:20.716599 4046 device_context.cc:243] device: 1, cuDNN Version: 7.6.
W1204 09:21:20.730718 4047 device_context.cc:235] Please NOTE: device: 2, CUDA Capability: 61, Driver API Version: 10.1, Runtime API Version: 10.0
W1204 09:21:20.735841 4047 device_context.cc:243] device: 2, cuDNN Version: 7.6.
W1204 09:21:22.707733 4045 init.cc:205] *** Aborted at 1575451282 (unix time) try "date -d @1575451282" if you are using GNU date ***
W1204 09:21:22.709415 4045 init.cc:205] PC: @ 0x0 (unknown)
W1204 09:21:22.709626 4045 init.cc:205] *** SIGSEGV (@0x0) received by PID 4045 (TID 0x7fa1d202c740) from PID 0; stack trace: ***