ValueError: Operator "gen_nccl_id" has not been registered.
Created by: zhengbiqing
E:\projects\PaddleClas-master>python -m paddle.distributed.launch --selected_gpus='0' tools/train.py -c configs/quick_start/ResNet50_vd_finetune_my.yaml
----------- Configuration Arguments -----------
cluster_node_ips: 127.0.0.1
log_dir: None
node_ip: 127.0.0.1
print_config: True
selected_gpus: '0'
started_port: 6170
training_script: tools/train.py
training_script_args: ['-c', 'configs/quick_start/ResNet50_vd_finetune_my.yaml']
use_paddlecloud: False
trainers_endpoints: 127.0.0.1:6170 , node_id: 0 , current_node_ip: 127.0.0.1 , num_nodes: 1 , node_ips: ['127.0.0.1'] , nranks: 1
2020-05-13 23:57:14 INFO:
== PaddleClas is powered by PaddlePaddle ! ==
== For more info please go to the following website. ==
== https://github.com/PaddlePaddle/PaddleClas ==
2020-05-13 23:57:14 INFO: ARCHITECTURE :
2020-05-13 23:57:14 INFO: name : ResNet50_vd
2020-05-13 23:57:14 INFO: ------------------------------------------------------------
2020-05-13 23:57:14 INFO: LEARNING_RATE :
2020-05-13 23:57:14 INFO: function : Cosine
2020-05-13 23:57:14 INFO: params :
2020-05-13 23:57:14 INFO: lr : 0.00375
2020-05-13 23:57:14 INFO: ------------------------------------------------------------
2020-05-13 23:57:14 INFO: OPTIMIZER :
2020-05-13 23:57:14 INFO: function : Momentum
2020-05-13 23:57:14 INFO: params :
2020-05-13 23:57:14 INFO: momentum : 0.9
2020-05-13 23:57:14 INFO: regularizer :
2020-05-13 23:57:14 INFO: factor : 1e-06
2020-05-13 23:57:14 INFO: function : L2
2020-05-13 23:57:14 INFO: ------------------------------------------------------------
2020-05-13 23:57:14 INFO: TRAIN :
2020-05-13 23:57:14 INFO: batch_size : 32
2020-05-13 23:57:14 INFO: data_dir : G:/ai_data/paddle/0513/
2020-05-13 23:57:14 INFO: file_list : G:/ai_data/paddle/0513train.list
2020-05-13 23:57:14 INFO: num_workers : 4
2020-05-13 23:57:14 INFO: shuffle_seed : 0
2020-05-13 23:57:14 INFO: transforms :
2020-05-13 23:57:14 INFO: DecodeImage :
2020-05-13 23:57:14 INFO: channel_first : False
2020-05-13 23:57:14 INFO: to_np : False
2020-05-13 23:57:14 INFO: to_rgb : True
2020-05-13 23:57:14 INFO: RandCropImage :
2020-05-13 23:57:14 INFO: size : 224
2020-05-13 23:57:14 INFO: RandFlipImage :
2020-05-13 23:57:14 INFO: flip_code : 1
2020-05-13 23:57:14 INFO: NormalizeImage :
2020-05-13 23:57:14 INFO: mean : [0.485, 0.456, 0.406]
2020-05-13 23:57:14 INFO: order :
2020-05-13 23:57:14 INFO: scale : 1./255.
2020-05-13 23:57:14 INFO: std : [0.229, 0.224, 0.225]
2020-05-13 23:57:14 INFO: ToCHWImage : None
2020-05-13 23:57:14 INFO: ------------------------------------------------------------
2020-05-13 23:57:14 INFO: VALID :
2020-05-13 23:57:14 INFO: batch_size : 20
2020-05-13 23:57:14 INFO: data_dir : G:/ai_data/paddle/0513/
2020-05-13 23:57:14 INFO: file_list : G:/ai_data/paddle/0513test.list
2020-05-13 23:57:14 INFO: num_workers : 4
2020-05-13 23:57:14 INFO: shuffle_seed : 0
2020-05-13 23:57:14 INFO: transforms :
2020-05-13 23:57:14 INFO: DecodeImage :
2020-05-13 23:57:14 INFO: channel_first : False
2020-05-13 23:57:14 INFO: to_np : False
2020-05-13 23:57:14 INFO: to_rgb : True
2020-05-13 23:57:14 INFO: ResizeImage :
2020-05-13 23:57:14 INFO: resize_short : 256
2020-05-13 23:57:14 INFO: CropImage :
2020-05-13 23:57:14 INFO: size : 224
2020-05-13 23:57:14 INFO: NormalizeImage :
2020-05-13 23:57:14 INFO: mean : [0.485, 0.456, 0.406]
2020-05-13 23:57:14 INFO: order :
2020-05-13 23:57:14 INFO: scale : 1.0/255.0
2020-05-13 23:57:14 INFO: std : [0.229, 0.224, 0.225]
2020-05-13 23:57:14 INFO: ToCHWImage : None
2020-05-13 23:57:14 INFO: ------------------------------------------------------------
2020-05-13 23:57:14 INFO: classes_num : 3
2020-05-13 23:57:14 INFO: epochs : 20
2020-05-13 23:57:14 INFO: image_shape : [3, 224, 224]
2020-05-13 23:57:14 INFO: mode : train
2020-05-13 23:57:14 INFO: model_save_dir : E:/projects/PaddleClas-master/output/
2020-05-13 23:57:14 INFO: pretrained_model : E:/projects/PaddleClas-master/ResNet50_vd_pretrained
2020-05-13 23:57:14 INFO: save_interval : 1
2020-05-13 23:57:14 INFO: topk : 5
2020-05-13 23:57:14 INFO: total_images : 795
2020-05-13 23:57:14 INFO: valid_interval : 1
2020-05-13 23:57:14 INFO: validate : True
API is deprecated since 2.0.0 Please use FleetAPI instead. WIKI: https://github.com/PaddlePaddle/Fleet/blob/develop/markdown_doc/transpiler
Traceback (most recent call last):
  File "tools/train.py", line 124, in <module>
    main(args)
  File "tools/train.py", line 69, in main
    config, train_prog, startup_prog, is_train=True)
  File "E:\projects\PaddleClas-master\tools\program.py", line 341, in build
    optimizer.minimize(fetchs['loss'][0])
  File "C:\python\tf\lib\site-packages\paddle\fluid\incubate\fleet\collective\__init__.py", line 424, in minimize
    fleet.main_program = self._try_to_compile(startup_program, main_program)
  File "C:\python\tf\lib\site-packages\paddle\fluid\incubate\fleet\collective\__init__.py", line 358, in _try_to_compile
    self._transpile(startup_program, main_program)
  File "C:\python\tf\lib\site-packages\paddle\fluid\incubate\fleet\collective\__init__.py", line 285, in _transpile
    current_endpoint=current_endpoint)
  File "C:\python\tf\lib\site-packages\paddle\fluid\transpiler\distribute_transpiler.py", line 625, in transpile
    wait_port=self.config.wait_port)
  File "C:\python\tf\lib\site-packages\paddle\fluid\transpiler\distribute_transpiler.py", line 397, in _transpile_nccl2
    self.config.hierarchical_allreduce_inter_nranks
  File "C:\python\tf\lib\site-packages\paddle\fluid\framework.py", line 2525, in append_op
    attrs=kwargs.get("attrs", None))
  File "C:\python\tf\lib\site-packages\paddle\fluid\framework.py", line 1797, in __init__
    proto = OpProtoHolder.instance().get_op_proto(type)
  File "C:\python\tf\lib\site-packages\paddle\fluid\framework.py", line 1679, in get_op_proto
    raise ValueError("Operator \"%s\" has not been registered." % type)
ValueError: Operator "gen_nccl_id" has not been registered.
2020-05-13 15:57:16,981-ERROR: ABORT!!! Out of all 1 trainers, the trainer process with rank=[0] was aborted. Please check its log.
ERROR 2020-05-13 15:57:16,981 launch.py:284] ABORT!!! Out of all 1 trainers, the trainer process with rank=[0] was aborted. Please check its log.
What is causing this problem?