Same program on different servers: one machine trains stably for days, the other fails in under 3 hours
Created by: dragon515
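As the title says: the same program is run on different servers. One machine trains stably for several days, while the other fails after less than 3 hours of training with this error: PaddleCheckError: Expected index.dims()[0] > 0, but received index.dims()[0]:0 <= 0:0. The index of gather_op should not be empty when the index's rank is 1. at [/paddle/paddle/fluid/operators/gather.cu.h:82]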
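To make the failing check concrete, here is a minimal sketch in plain Python of what the error message describes: a gather over a rank-1 index that refuses an empty index. The function name `gather_rows` and the sample values are made up for illustration; this is not Paddle's GPUGather implementation, only a stand-in with the same precondition.

```python
def gather_rows(rows, index):
    """Return [rows[i] for i in index], refusing an empty index.

    Illustrative stand-in only; this is not Paddle's GPUGather.
    """
    if len(index) <= 0:
        # Mirrors the precondition quoted in the error: index.dims()[0] > 0
        raise ValueError(
            "index should not be empty: got length %d" % len(index))
    return [rows[i] for i in index]


# Made-up scores, one row per anchor (hypothetical data).
cls_logits = [[0.1], [0.7], [0.3]]

# A non-empty index selects rows 2 and 0, in that order.
print(gather_rows(cls_logits, [2, 0]))

try:
    # An empty index is rejected, which is the situation reported in the log.
    gather_rows(cls_logits, [])
except ValueError as err:
    print("rejected:", err)
```

In the log below, the index that ends up empty is `score_index`, passed to `nn.gather` inside `rpn_target_assign`.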
2019-12-08 09:36:34,183-INFO: 1076 samples in file dataset/coco/annotations/instances_val2017.json
2019-12-08 09:36:34,186-INFO: places would be ommited when DataLoader is not iterable
W1208 09:36:36.099746 317 device_context.cc:235] Please NOTE: device: 0, CUDA Capability: 60, Driver API Version: 10.0, Runtime API Version: 10.0
W1208 09:36:36.412878 317 device_context.cc:243] device: 0, cuDNN Version: 7.6.
2019-12-08 09:36:40,108-INFO: Loading checkpoint from output/faster_rcnn_dcn_x101_vd_64x4d_fpn_1x/27000...
2019-12-08 09:36:46,786-INFO: 20764 samples in file dataset/coco/annotations/instances_train2017.json
2019-12-08 09:36:46,893-INFO: places would be ommited when DataLoader is not iterable
I1208 09:36:47.311914 317 parallel_executor.cc:421] The number of CUDAPlace, which is used in ParallelExecutor, is 2. And the Program will be copied 2 copies
I1208 09:36:49.598440 317 graph_pattern_detector.cc:96] --- detected 40 subgraphs
I1208 09:36:49.672873 317 graph_pattern_detector.cc:96] --- detected 37 subgraphs
W1208 09:36:49.861197 317 fuse_all_reduce_op_pass.cc:72] Find all_reduce operators: 183. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 153.
I1208 09:36:49.873278 317 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I1208 09:36:50.447363 317 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I1208 09:36:50.538653 317 parallel_executor.cc:368] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
2019-12-08 09:37:11,792-INFO: iter: 27020, lr: 0.002500, 'loss_cls': '0.107255', 'loss_bbox': '0.027132', 'loss_rpn_cls': '0.029164', 'loss_rpn_bbox': '0.004501', 'loss': '0.162709', time: 1.196, eta: 9 days, 14:16:56
2019-12-08 09:37:33,380-INFO: iter: 27040, lr: 0.002500, 'loss_cls': '0.160085', 'loss_bbox': '0.037842', 'loss_rpn_cls': '0.029905', 'loss_rpn_bbox': '0.005318', 'loss': '0.248189', time: 1.069, eta: 8 days, 13:42:18
2019-12-08 09:37:56,325-INFO: iter: 27060, lr: 0.002500, 'loss_cls': '0.111756', 'loss_bbox': '0.042657', 'loss_rpn_cls': '0.022942', 'loss_rpn_bbox': '0.003554', 'loss': '0.176668', time: 1.153, eta: 9 days, 5:55:00
2019-12-08 09:38:18,266-INFO: iter: 27080, lr: 0.002500, 'loss_cls': '0.117590', 'loss_bbox': '0.037847', 'loss_rpn_cls': '0.028765', 'loss_rpn_bbox': '0.007621', 'loss': '0.183682', time: 1.103, eta: 8 days, 20:15:40
2019-12-08 09:38:38,424-INFO: iter: 27100, lr: 0.002500, 'loss_cls': '0.109035', 'loss_bbox': '0.036450', 'loss_rpn_cls': '0.020142', 'loss_rpn_bbox': '0.002935', 'loss': '0.167680', time: 1.001, eta: 8 days, 0:35:44
2019-12-08 09:39:01,073-INFO: iter: 27120, lr: 0.002500, 'loss_cls': '0.112554', 'loss_bbox': '0.030803', 'loss_rpn_cls': '0.027392', 'loss_rpn_bbox': '0.007306', 'loss': '0.210294', time: 1.125, eta: 9 days, 0:33:28
2019-12-08 09:39:22,075-INFO: iter: 27140, lr: 0.002500, 'loss_cls': '0.119142', 'loss_bbox': '0.032270', 'loss_rpn_cls': '0.024191', 'loss_rpn_bbox': '0.004826', 'loss': '0.176259', time: 1.051, eta: 8 days, 10:21:23
2019-12-08 09:39:43,539-INFO: iter: 27160, lr: 0.002500, 'loss_cls': '0.146193', 'loss_bbox': '0.047863', 'loss_rpn_cls': '0.028642', 'loss_rpn_bbox': '0.007621', 'loss': '0.238028', time: 1.064, eta: 8 days, 12:47:36
2019-12-08 09:40:05,350-INFO: iter: 27180, lr: 0.002500, 'loss_cls': '0.095980', 'loss_bbox': '0.035866', 'loss_rpn_cls': '0.022334', 'loss_rpn_bbox': '0.004785', 'loss': '0.186832', time: 1.113, eta: 8 days, 22:10:17
2019-12-08 09:40:27,442-INFO: iter: 27200, lr: 0.002500, 'loss_cls': '0.103658', 'loss_bbox': '0.030443', 'loss_rpn_cls': '0.023747', 'loss_rpn_bbox': '0.005142', 'loss': '0.174824', time: 1.100, eta: 8 days, 19:36:49
2019-12-08 09:40:48,639-INFO: iter: 27220, lr: 0.002500, 'loss_cls': '0.123171', 'loss_bbox': '0.041430', 'loss_rpn_cls': '0.026343', 'loss_rpn_bbox': '0.007074', 'loss': '0.197138', time: 1.065, eta: 8 days, 12:56:42
2019-12-08 09:41:09,553-INFO: iter: 27240, lr: 0.002500, 'loss_cls': '0.126446', 'loss_bbox': '0.030707', 'loss_rpn_cls': '0.021942', 'loss_rpn_bbox': '0.004218', 'loss': '0.185318', time: 1.044, eta: 8 days, 8:53:24
2019-12-08 09:41:29,916-INFO: iter: 27260, lr: 0.002500, 'loss_cls': '0.109329', 'loss_bbox': '0.035236', 'loss_rpn_cls': '0.029849', 'loss_rpn_bbox': '0.008709', 'loss': '0.188167', time: 1.012, eta: 8 days, 2:44:33
2019-12-08 09:41:51,827-INFO: iter: 27280, lr: 0.002500, 'loss_cls': '0.117947', 'loss_bbox': '0.035055', 'loss_rpn_cls': '0.022669', 'loss_rpn_bbox': '0.004247', 'loss': '0.183069', time: 1.100, eta: 8 days, 19:38:23
2019-12-08 09:42:13,266-INFO: iter: 27300, lr: 0.002500, 'loss_cls': '0.104928', 'loss_bbox': '0.032062', 'loss_rpn_cls': '0.031705', 'loss_rpn_bbox': '0.006268', 'loss': '0.186234', time: 1.072, eta: 8 days, 14:19:22
2019-12-08 09:42:35,486-INFO: iter: 27320, lr: 0.002500, 'loss_cls': '0.110162', 'loss_bbox': '0.032131', 'loss_rpn_cls': '0.037423', 'loss_rpn_bbox': '0.009977', 'loss': '0.179894', time: 1.113, eta: 8 days, 22:07:25
2019-12-08 09:42:57,989-INFO: iter: 27340, lr: 0.002500, 'loss_cls': '0.098853', 'loss_bbox': '0.036495', 'loss_rpn_cls': '0.022839', 'loss_rpn_bbox': '0.007259', 'loss': '0.177353', time: 1.103, eta: 8 days, 20:13:59
2019-12-08 09:43:18,957-INFO: iter: 27360, lr: 0.002500, 'loss_cls': '0.136675', 'loss_bbox': '0.037647', 'loss_rpn_cls': '0.024854', 'loss_rpn_bbox': '0.005528', 'loss': '0.243505', time: 1.051, eta: 8 days, 10:15:43
2019-12-08 09:43:40,055-INFO: iter: 27380, lr: 0.002500, 'loss_cls': '0.107223', 'loss_bbox': '0.036117', 'loss_rpn_cls': '0.025308', 'loss_rpn_bbox': '0.005180', 'loss': '0.191671', time: 1.074, eta: 8 days, 14:33:44
2019-12-08 09:44:01,801-INFO: iter: 27400, lr: 0.002500, 'loss_cls': '0.114322', 'loss_bbox': '0.032498', 'loss_rpn_cls': '0.021028', 'loss_rpn_bbox': '0.003865', 'loss': '0.180020', time: 1.088, eta: 8 days, 17:13:38
2019-12-08 09:44:23,235-INFO: iter: 27420, lr: 0.002500, 'loss_cls': '0.101029', 'loss_bbox': '0.029264', 'loss_rpn_cls': '0.028364', 'loss_rpn_bbox': '0.007907', 'loss': '0.181565', time: 1.067, eta: 8 days, 13:19:37
2019-12-08 09:44:44,491-INFO: iter: 27440, lr: 0.002500, 'loss_cls': '0.097345', 'loss_bbox': '0.030726', 'loss_rpn_cls': '0.029091', 'loss_rpn_bbox': '0.005249', 'loss': '0.170476', time: 1.069, eta: 8 days, 13:35:30
2019-12-08 09:45:06,253-INFO: iter: 27460, lr: 0.002500, 'loss_cls': '0.107105', 'loss_bbox': '0.038378', 'loss_rpn_cls': '0.025624', 'loss_rpn_bbox': '0.005551', 'loss': '0.181381', time: 1.087, eta: 8 days, 17:10:51
2019-12-08 09:45:28,042-INFO: iter: 27480, lr: 0.002500, 'loss_cls': '0.108549', 'loss_bbox': '0.029579', 'loss_rpn_cls': '0.024967', 'loss_rpn_bbox': '0.006547', 'loss': '0.215194', time: 1.064, eta: 8 days, 12:39:42
2019-12-08 09:45:49,157-INFO: iter: 27500, lr: 0.002500, 'loss_cls': '0.112655', 'loss_bbox': '0.038301', 'loss_rpn_cls': '0.025386', 'loss_rpn_bbox': '0.006977', 'loss': '0.210186', time: 1.081, eta: 8 days, 15:58:03
2019-12-08 09:46:10,362-INFO: iter: 27520, lr: 0.002500, 'loss_cls': '0.115814', 'loss_bbox': '0.031053', 'loss_rpn_cls': '0.022635', 'loss_rpn_bbox': '0.004697', 'loss': '0.195093', time: 1.056, eta: 8 days, 11:02:29
2019-12-08 09:46:32,177-INFO: iter: 27540, lr: 0.002500, 'loss_cls': '0.116692', 'loss_bbox': '0.036010', 'loss_rpn_cls': '0.027288', 'loss_rpn_bbox': '0.005695', 'loss': '0.193081', time: 1.066, eta: 8 days, 13:02:31
2019-12-08 09:46:54,590-INFO: iter: 27560, lr: 0.002500, 'loss_cls': '0.128424', 'loss_bbox': '0.034904', 'loss_rpn_cls': '0.025524', 'loss_rpn_bbox': '0.004873', 'loss': '0.208800', time: 1.151, eta: 9 days, 5:23:29
2019-12-08 09:47:16,635-INFO: iter: 27580, lr: 0.002500, 'loss_cls': '0.107845', 'loss_bbox': '0.033880', 'loss_rpn_cls': '0.027985', 'loss_rpn_bbox': '0.005409', 'loss': '0.170375', time: 1.102, eta: 8 days, 19:58:25
2019-12-08 09:47:37,165-INFO: iter: 27600, lr: 0.002500, 'loss_cls': '0.096917', 'loss_bbox': '0.027007', 'loss_rpn_cls': '0.023711', 'loss_rpn_bbox': '0.005217', 'loss': '0.149012', time: 1.022, eta: 8 days, 4:34:20
2019-12-08 09:47:59,157-INFO: iter: 27620, lr: 0.002500, 'loss_cls': '0.148022', 'loss_bbox': '0.041588', 'loss_rpn_cls': '0.034595', 'loss_rpn_bbox': '0.008284', 'loss': '0.243836', time: 1.097, eta: 8 days, 18:56:47
2019-12-08 09:48:21,268-INFO: iter: 27640, lr: 0.002500, 'loss_cls': '0.102161', 'loss_bbox': '0.033482', 'loss_rpn_cls': '0.022656', 'loss_rpn_bbox': '0.003083', 'loss': '0.176066', time: 1.113, eta: 8 days, 22:04:28
2019-12-08 09:48:42,021-INFO: iter: 27660, lr: 0.002500, 'loss_cls': '0.116623', 'loss_bbox': '0.031174', 'loss_rpn_cls': '0.021008', 'loss_rpn_bbox': '0.006698', 'loss': '0.181974', time: 1.032, eta: 8 days, 6:25:45
2019-12-08 10:03:30,895-INFO: iter: 28260, lr: 0.002500, 'loss_cls': '0.120618', 'loss_bbox': '0.036356', 'loss_rpn_cls': '0.028862', 'loss_rpn_bbox': '0.006762', 'loss': '0.196524', time: 1.062, eta: 8 days, 12:08:16
2019-12-08 10:03:54,389-INFO: iter: 28280, lr: 0.002500, 'loss_cls': '0.116200', 'loss_bbox': '0.037455', 'loss_rpn_cls': '0.026497', 'loss_rpn_bbox': '0.006574', 'loss': '0.179571', time: 1.176, eta: 9 days, 9:52:23
2019-12-08 10:04:16,664-INFO: iter: 28300, lr: 0.002500, 'loss_cls': '0.090389', 'loss_bbox': '0.022946', 'loss_rpn_cls': '0.028943', 'loss_rpn_bbox': '0.005928', 'loss': '0.150511', time: 1.106, eta: 8 days, 20:34:48
2019-12-08 10:04:39,204-INFO: iter: 28320, lr: 0.002500, 'loss_cls': '0.099024', 'loss_bbox': '0.030282', 'loss_rpn_cls': '0.027065', 'loss_rpn_bbox': '0.006674', 'loss': '0.180380', time: 1.125, eta: 9 days, 0:10:30
2019-12-08 10:05:00,137-INFO: iter: 28340, lr: 0.002500, 'loss_cls': '0.104252', 'loss_bbox': '0.031640', 'loss_rpn_cls': '0.023287', 'loss_rpn_bbox': '0.004962', 'loss': '0.179577', time: 1.055, eta: 8 days, 10:42:35
2019-12-08 10:05:20,870-INFO: iter: 28360, lr: 0.002500, 'loss_cls': '0.106998', 'loss_bbox': '0.036490', 'loss_rpn_cls': '0.027223', 'loss_rpn_bbox': '0.006033', 'loss': '0.196109', time: 1.037, eta: 8 days, 7:08:30
2019-12-08 11:26:26,361-INFO: iter: 32040, lr: 0.002500, 'loss_cls': '0.093410', 'loss_bbox': '0.025801', 'loss_rpn_cls': '0.020566', 'loss_rpn_bbox': '0.004328', 'loss': '0.155523', time: 1.130, eta: 8 days, 23:52:36
2019-12-08 11:26:48,282-INFO: iter: 32060, lr: 0.002500, 'loss_cls': '0.116218', 'loss_bbox': '0.030774', 'loss_rpn_cls': '0.028761', 'loss_rpn_bbox': '0.008732', 'loss': '0.198504', time: 1.111, eta: 8 days, 20:15:33
2019-12-08 11:27:10,345-INFO: iter: 32080, lr: 0.002500, 'loss_cls': '0.130258', 'loss_bbox': '0.032691', 'loss_rpn_cls': '0.036417', 'loss_rpn_bbox': '0.007845', 'loss': '0.220228', time: 1.090, eta: 8 days, 16:15:38
2019-12-08 11:27:32,476-INFO: iter: 32100, lr: 0.002500, 'loss_cls': '0.108925', 'loss_bbox': '0.037187', 'loss_rpn_cls': '0.021106', 'loss_rpn_bbox': '0.007838', 'loss': '0.178534', time: 1.119, eta: 8 days, 21:54:37
2019-12-08 11:27:54,745-INFO: iter: 32120, lr: 0.002500, 'loss_cls': '0.112486', 'loss_bbox': '0.036119', 'loss_rpn_cls': '0.026041', 'loss_rpn_bbox': '0.005284', 'loss': '0.181879', time: 1.102, eta: 8 days, 18:36:12
2019-12-08 11:28:17,569-INFO: iter: 32140, lr: 0.002500, 'loss_cls': '0.105735', 'loss_bbox': '0.034030', 'loss_rpn_cls': '0.024859', 'loss_rpn_bbox': '0.003535', 'loss': '0.172869', time: 1.147, eta: 9 days, 3:05:29
2019-12-08 11:28:38,749-INFO: iter: 32160, lr: 0.002500, 'loss_cls': '0.097556', 'loss_bbox': '0.032315', 'loss_rpn_cls': '0.027183', 'loss_rpn_bbox': '0.004453', 'loss': '0.168935', time: 1.063, eta: 8 days, 11:03:53
2019-12-08 11:29:01,167-INFO: iter: 32180, lr: 0.002500, 'loss_cls': '0.106844', 'loss_bbox': '0.027867', 'loss_rpn_cls': '0.027673', 'loss_rpn_bbox': '0.007006', 'loss': '0.180671', time: 1.103, eta: 8 days, 18:40:46
2019-12-08 11:29:22,644-INFO: iter: 32200, lr: 0.002500, 'loss_cls': '0.111045', 'loss_bbox': '0.039628', 'loss_rpn_cls': '0.023468', 'loss_rpn_bbox': '0.004395', 'loss': '0.191487', time: 1.086, eta: 8 days, 15:34:46
2019-12-08 11:29:44,434-INFO: iter: 32220, lr: 0.002500, 'loss_cls': '0.124690', 'loss_bbox': '0.027931', 'loss_rpn_cls': '0.019735', 'loss_rpn_bbox': '0.005108', 'loss': '0.175699', time: 1.097, eta: 8 days, 17:33:22
2019-12-08 11:30:05,626-INFO: iter: 32240, lr: 0.002500, 'loss_cls': '0.095456', 'loss_bbox': '0.035033', 'loss_rpn_cls': '0.022073', 'loss_rpn_bbox': '0.003227', 'loss': '0.156538', time: 1.059, eta: 8 days, 10:13:40
2019-12-08 11:30:26,897-INFO: iter: 32260, lr: 0.002500, 'loss_cls': '0.119746', 'loss_bbox': '0.046080', 'loss_rpn_cls': '0.019381', 'loss_rpn_bbox': '0.004442', 'loss': '0.182005', time: 1.066, eta: 8 days, 11:36:56
2019-12-08 11:30:48,440-INFO: iter: 32280, lr: 0.002500, 'loss_cls': '0.115690', 'loss_bbox': '0.028565', 'loss_rpn_cls': '0.028244', 'loss_rpn_bbox': '0.005486', 'loss': '0.200766', time: 1.057, eta: 8 days, 9:58:26
2019-12-08 11:31:09,633-INFO: iter: 32300, lr: 0.002500, 'loss_cls': '0.107902', 'loss_bbox': '0.033771', 'loss_rpn_cls': '0.021195', 'loss_rpn_bbox': '0.004134', 'loss': '0.175219', time: 1.079, eta: 8 days, 14:09:52
2019-12-08 11:31:30,260-INFO: iter: 32320, lr: 0.002500, 'loss_cls': '0.098927', 'loss_bbox': '0.037162', 'loss_rpn_cls': '0.021459', 'loss_rpn_bbox': '0.005403', 'loss': '0.158387', time: 1.029, eta: 8 days, 4:37:34
2019-12-08 11:31:53,241-INFO: iter: 32340, lr: 0.002500, 'loss_cls': '0.107896', 'loss_bbox': '0.033144', 'loss_rpn_cls': '0.033767', 'loss_rpn_bbox': '0.004966', 'loss': '0.197591', time: 1.142, eta: 9 days, 2:11:20
2019-12-08 11:32:13,843-INFO: iter: 32360, lr: 0.002500, 'loss_cls': '0.107879', 'loss_bbox': '0.041093', 'loss_rpn_cls': '0.033133', 'loss_rpn_bbox': '0.011063', 'loss': '0.211041', time: 1.035, eta: 8 days, 5:37:19
2019-12-08 11:32:34,803-INFO: iter: 32380, lr: 0.002500, 'loss_cls': '0.110426', 'loss_bbox': '0.039289', 'loss_rpn_cls': '0.027911', 'loss_rpn_bbox': '0.005713', 'loss': '0.184945', time: 1.052, eta: 8 days, 8:53:31
2019-12-08 11:32:55,430-INFO: iter: 32400, lr: 0.002500, 'loss_cls': '0.128478', 'loss_bbox': '0.041886', 'loss_rpn_cls': '0.026493', 'loss_rpn_bbox': '0.006556', 'loss': '0.188470', time: 1.031, eta: 8 days, 4:53:08
2019-12-08 11:33:17,293-INFO: iter: 32420, lr: 0.002500, 'loss_cls': '0.110536', 'loss_bbox': '0.032399', 'loss_rpn_cls': '0.030574', 'loss_rpn_bbox': '0.007749', 'loss': '0.206734', time: 1.086, eta: 8 days, 15:30:35
2019-12-08 11:33:38,599-INFO: iter: 32440, lr: 0.002500, 'loss_cls': '0.095937', 'loss_bbox': '0.035684', 'loss_rpn_cls': '0.020276', 'loss_rpn_bbox': '0.007348', 'loss': '0.159013', time: 1.070, eta: 8 days, 12:17:27
2019-12-08 11:34:00,011-INFO: iter: 32460, lr: 0.002500, 'loss_cls': '0.120759', 'loss_bbox': '0.040102', 'loss_rpn_cls': '0.025354', 'loss_rpn_bbox': '0.005780', 'loss': '0.201859', time: 1.072, eta: 8 days, 12:43:34
2019-12-08 11:34:20,885-INFO: iter: 32480, lr: 0.002500, 'loss_cls': '0.091575', 'loss_bbox': '0.030783', 'loss_rpn_cls': '0.027161', 'loss_rpn_bbox': '0.004689', 'loss': '0.153618', time: 1.039, eta: 8 days, 6:27:44
2019-12-08 11:34:41,995-INFO: iter: 32500, lr: 0.002500, 'loss_cls': '0.131100', 'loss_bbox': '0.042119', 'loss_rpn_cls': '0.024955', 'loss_rpn_bbox': '0.005969', 'loss': '0.209270', time: 1.054, eta: 8 days, 9:20:55
2019-12-08 11:35:03,078-INFO: iter: 32520, lr: 0.002500, 'loss_cls': '0.108701', 'loss_bbox': '0.025954', 'loss_rpn_cls': '0.019844', 'loss_rpn_bbox': '0.004977', 'loss': '0.180352', time: 1.053, eta: 8 days, 9:05:40
2019-12-08 11:35:24,807-INFO: iter: 32540, lr: 0.002500, 'loss_cls': '0.100763', 'loss_bbox': '0.031137', 'loss_rpn_cls': '0.025281', 'loss_rpn_bbox': '0.004916', 'loss': '0.175427', time: 1.068, eta: 8 days, 11:56:12
2019-12-08 11:35:47,071-INFO: iter: 32560, lr: 0.002500, 'loss_cls': '0.103370', 'loss_bbox': '0.027407', 'loss_rpn_cls': '0.025815', 'loss_rpn_bbox': '0.007054', 'loss': '0.167267', time: 1.132, eta: 9 days, 0:06:07
2019-12-08 11:36:09,177-INFO: iter: 32580, lr: 0.002500, 'loss_cls': '0.093084', 'loss_bbox': '0.023440', 'loss_rpn_cls': '0.023047', 'loss_rpn_bbox': '0.006587', 'loss': '0.151249', time: 1.100, eta: 8 days, 18:06:41
2019-12-08 11:36:30,414-INFO: iter: 32600, lr: 0.002500, 'loss_cls': '0.092012', 'loss_bbox': '0.028003', 'loss_rpn_cls': '0.021767', 'loss_rpn_bbox': '0.006627', 'loss': '0.159122', time: 1.050, eta: 8 days, 8:25:08
2019-12-08 11:36:52,008-INFO: iter: 32620, lr: 0.002500, 'loss_cls': '0.118538', 'loss_bbox': '0.029654', 'loss_rpn_cls': '0.031362', 'loss_rpn_bbox': '0.007302', 'loss': '0.238050', time: 1.106, eta: 8 days, 19:08:26
2019-12-08 11:37:13,449-INFO: iter: 32640, lr: 0.002500, 'loss_cls': '0.072994', 'loss_bbox': '0.026526', 'loss_rpn_cls': '0.022744', 'loss_rpn_bbox': '0.004115', 'loss': '0.134926', time: 1.055, eta: 8 days, 9:23:13
2019-12-08 11:37:34,177-INFO: iter: 32660, lr: 0.002500, 'loss_cls': '0.097421', 'loss_bbox': '0.024360', 'loss_rpn_cls': '0.023935', 'loss_rpn_bbox': '0.006053', 'loss': '0.161697', time: 1.053, eta: 8 days, 9:03:48
2019-12-08 11:37:55,721-INFO: iter: 32680, lr: 0.002500, 'loss_cls': '0.104007', 'loss_bbox': '0.027913', 'loss_rpn_cls': '0.023525', 'loss_rpn_bbox': '0.008149', 'loss': '0.181776', time: 1.053, eta: 8 days, 9:06:54
2019-12-08 11:38:17,213-INFO: iter: 32700, lr: 0.002500, 'loss_cls': '0.115015', 'loss_bbox': '0.037233', 'loss_rpn_cls': '0.026973', 'loss_rpn_bbox': '0.007489', 'loss': '0.198169', time: 1.099, eta: 8 days, 17:47:08
2019-12-08 11:38:37,801-INFO: iter: 32720, lr: 0.002500, 'loss_cls': '0.107442', 'loss_bbox': '0.030250', 'loss_rpn_cls': '0.026453', 'loss_rpn_bbox': '0.006128', 'loss': '0.170408', time: 1.023, eta: 8 days, 3:16:08
2019-12-08 11:38:58,985-INFO: iter: 32740, lr: 0.002500, 'loss_cls': '0.115347', 'loss_bbox': '0.037755', 'loss_rpn_cls': '0.022134', 'loss_rpn_bbox': '0.003413', 'loss': '0.182344', time: 1.065, eta: 8 days, 11:21:58
2019-12-08 11:39:20,945-INFO: iter: 32760, lr: 0.002500, 'loss_cls': '0.100408', 'loss_bbox': '0.030180', 'loss_rpn_cls': '0.031180', 'loss_rpn_bbox': '0.006408', 'loss': '0.176443', time: 1.091, eta: 8 days, 16:18:49
2019-12-08 11:39:42,146-INFO: iter: 32780, lr: 0.002500, 'loss_cls': '0.090289', 'loss_bbox': '0.019610', 'loss_rpn_cls': '0.024948', 'loss_rpn_bbox': '0.003633', 'loss': '0.158710', time: 1.066, eta: 8 days, 11:26:46
2019-12-08 11:40:03,092-INFO: iter: 32800, lr: 0.002500, 'loss_cls': '0.096883', 'loss_bbox': '0.028336', 'loss_rpn_cls': '0.028111', 'loss_rpn_bbox': '0.007136', 'loss': '0.154402', time: 1.048, eta: 8 days, 7:57:52
2019-12-08 11:40:25,261-INFO: iter: 32820, lr: 0.002500, 'loss_cls': '0.127951', 'loss_bbox': '0.041875', 'loss_rpn_cls': '0.027269', 'loss_rpn_bbox': '0.006420', 'loss': '0.202579', time: 1.100, eta: 8 days, 17:53:07
2019-12-08 11:40:47,515-INFO: iter: 32840, lr: 0.002500, 'loss_cls': '0.105165', 'loss_bbox': '0.026979', 'loss_rpn_cls': '0.026087', 'loss_rpn_bbox': '0.006410', 'loss': '0.165277', time: 1.106, eta: 8 days, 19:01:51
2019-12-08 11:41:10,274-INFO: iter: 32860, lr: 0.002500, 'loss_cls': '0.113844', 'loss_bbox': '0.035535', 'loss_rpn_cls': '0.021415', 'loss_rpn_bbox': '0.005199', 'loss': '0.179511', time: 1.148, eta: 9 days, 3:02:05
2019-12-08 11:41:32,032-INFO: iter: 32880, lr: 0.002500, 'loss_cls': '0.107139', 'loss_bbox': '0.034098', 'loss_rpn_cls': '0.023092', 'loss_rpn_bbox': '0.005283', 'loss': '0.178474', time: 1.094, eta: 8 days, 16:51:14
2019-12-08 11:41:53,913-INFO: iter: 32900, lr: 0.002500, 'loss_cls': '0.107068', 'loss_bbox': '0.033079', 'loss_rpn_cls': '0.025516', 'loss_rpn_bbox': '0.005883', 'loss': '0.179267', time: 1.096, eta: 8 days, 17:05:30
2019-12-08 11:42:15,265-INFO: iter: 32920, lr: 0.002500, 'loss_cls': '0.084352', 'loss_bbox': '0.025512', 'loss_rpn_cls': '0.026543', 'loss_rpn_bbox': '0.005927', 'loss': '0.174963', time: 1.060, eta: 8 days, 10:17:00
2019-12-08 11:42:35,988-INFO: iter: 32940, lr: 0.002500, 'loss_cls': '0.094424', 'loss_bbox': '0.026548', 'loss_rpn_cls': '0.028392', 'loss_rpn_bbox': '0.004338', 'loss': '0.166381', time: 1.043, eta: 8 days, 6:58:18
2019-12-08 11:42:57,952-INFO: iter: 32960, lr: 0.002500, 'loss_cls': '0.077417', 'loss_bbox': '0.029750', 'loss_rpn_cls': '0.025627', 'loss_rpn_bbox': '0.006614', 'loss': '0.165685', time: 1.098, eta: 8 days, 17:34:56
2019-12-08 11:43:19,212-INFO: iter: 32980, lr: 0.002500, 'loss_cls': '0.104759', 'loss_bbox': '0.029914', 'loss_rpn_cls': '0.024789', 'loss_rpn_bbox': '0.005739', 'loss': '0.170735', time: 1.056, eta: 8 days, 9:36:00
2019-12-08 11:43:39,756-INFO: iter: 33000, lr: 0.002500, 'loss_cls': '0.116220', 'loss_bbox': '0.036502', 'loss_rpn_cls': '0.019375', 'loss_rpn_bbox': '0.004633', 'loss': '0.182570', time: 1.028, eta: 8 days, 4:06:34
2019-12-08 11:43:39,761-INFO: Save model to output/faster_rcnn_dcn_x101_vd_64x4d_fpn_1x/33000.
2019-12-08 11:43:55,919-INFO: Test iter 0
2019-12-08 11:44:14,859-INFO: Test iter 100
2019-12-08 11:44:33,713-INFO: Test iter 200
2019-12-08 11:44:53,128-INFO: Test iter 300
2019-12-08 11:45:12,317-INFO: Test iter 400
2019-12-08 11:45:30,595-INFO: Test iter 500
2019-12-08 11:45:48,932-INFO: Test iter 600
2019-12-08 11:46:07,444-INFO: Test iter 700
2019-12-08 11:46:25,915-INFO: Test iter 800
2019-12-08 11:46:44,161-INFO: Test iter 900
2019-12-08 11:47:02,660-INFO: Test iter 1000
2019-12-08 11:47:16,320-INFO: Test finish iter 1076
2019-12-08 11:47:16,321-INFO: Total number of images: 1076, inference time: 5.363484210675358 fps.
2019-12-08 11:47:16,698-INFO: Start evaluate...
2019-12-08 11:47:19,054-INFO: Best test box ap: 0.03341993076262028, in iter: 30000
2019-12-08 11:47:40,249-INFO: iter: 33020, lr: 0.002500, 'loss_cls': '0.108531', 'loss_bbox': '0.043173', 'loss_rpn_cls': '0.017900', 'loss_rpn_bbox': '0.004340', 'loss': '0.169767', time: 12.031, eta: 95 days, 15:54:32
2019-12-08 11:48:01,339-INFO: iter: 33040, lr: 0.002500, 'loss_cls': '0.079204', 'loss_bbox': '0.025323', 'loss_rpn_cls': '0.019523', 'loss_rpn_bbox': '0.004204', 'loss': '0.132759', time: 1.051, eta: 8 days, 8:30:30
2019-12-08 11:48:23,153-INFO: iter: 33060, lr: 0.002500, 'loss_cls': '0.090260', 'loss_bbox': '0.036725', 'loss_rpn_cls': '0.027528', 'loss_rpn_bbox': '0.006311', 'loss': '0.179389', time: 1.095, eta: 8 days, 17:01:20
2019-12-08 11:48:44,463-INFO: iter: 33080, lr: 0.002500, 'loss_cls': '0.093552', 'loss_bbox': '0.034076', 'loss_rpn_cls': '0.018074', 'loss_rpn_bbox': '0.007104', 'loss': '0.152736', time: 1.065, eta: 8 days, 11:12:37
2019-12-08 11:49:06,337-INFO: iter: 33100, lr: 0.002500, 'loss_cls': '0.136960', 'loss_bbox': '0.034778', 'loss_rpn_cls': '0.019313', 'loss_rpn_bbox': '0.004328', 'loss': '0.193764', time: 1.087, eta: 8 days, 15:29:17
2019-12-08 11:49:27,727-INFO: iter: 33120, lr: 0.002500, 'loss_cls': '0.096826', 'loss_bbox': '0.027477', 'loss_rpn_cls': '0.027583', 'loss_rpn_bbox': '0.003562', 'loss': '0.162150', time: 1.071, eta: 8 days, 12:25:21
2019-12-08 11:49:48,957-INFO: iter: 33140, lr: 0.002500, 'loss_cls': '0.111899', 'loss_bbox': '0.039398', 'loss_rpn_cls': '0.017612', 'loss_rpn_bbox': '0.005294', 'loss': '0.181398', time: 1.044, eta: 8 days, 7:10:46
2019-12-08 11:50:09,371-INFO: iter: 33160, lr: 0.002500, 'loss_cls': '0.114998', 'loss_bbox': '0.040129', 'loss_rpn_cls': '0.021477', 'loss_rpn_bbox': '0.005812', 'loss': '0.186288', time: 1.042, eta: 8 days, 6:47:26
2019-12-08 11:50:30,408-INFO: iter: 33180, lr: 0.002500, 'loss_cls': '0.088878', 'loss_bbox': '0.026715', 'loss_rpn_cls': '0.027719', 'loss_rpn_bbox': '0.007981', 'loss': '0.171643', time: 1.040, eta: 8 days, 6:19:52
2019-12-08 11:50:51,979-INFO: iter: 33200, lr: 0.002500, 'loss_cls': '0.127711', 'loss_bbox': '0.037682', 'loss_rpn_cls': '0.028521', 'loss_rpn_bbox': '0.004866', 'loss': '0.217983', time: 1.069, eta: 8 days, 11:54:39
2019-12-08 11:51:13,415-INFO: iter: 33220, lr: 0.002500, 'loss_cls': '0.135208', 'loss_bbox': '0.038123', 'loss_rpn_cls': '0.029232', 'loss_rpn_bbox': '0.005568', 'loss': '0.215429', time: 1.088, eta: 8 days, 15:28:55
2019-12-08 11:51:34,174-INFO: iter: 33240, lr: 0.002500, 'loss_cls': '0.124688', 'loss_bbox': '0.035213', 'loss_rpn_cls': '0.030661', 'loss_rpn_bbox': '0.007142', 'loss': '0.207017', time: 1.044, eta: 8 days, 7:04:49
2019-12-08 11:51:55,334-INFO: iter: 33260, lr: 0.002500, 'loss_cls': '0.107978', 'loss_bbox': '0.040368', 'loss_rpn_cls': '0.029550', 'loss_rpn_bbox': '0.005072', 'loss': '0.180974', time: 1.060, eta: 8 days, 10:09:37
2019-12-08 11:52:16,021-INFO: iter: 33280, lr: 0.002500, 'loss_cls': '0.117390', 'loss_bbox': '0.036064', 'loss_rpn_cls': '0.022095', 'loss_rpn_bbox': '0.004055', 'loss': '0.184817', time: 1.032, eta: 8 days, 4:57:10
2019-12-08 11:52:39,507-INFO: iter: 33300, lr: 0.002500, 'loss_cls': '0.099554', 'loss_bbox': '0.028488', 'loss_rpn_cls': '0.030340', 'loss_rpn_bbox': '0.004249', 'loss': '0.173735', time: 1.168, eta: 9 days, 6:50:47
2019-12-08 11:53:00,719-INFO: iter: 33320, lr: 0.002500, 'loss_cls': '0.091982', 'loss_bbox': '0.025819', 'loss_rpn_cls': '0.026813', 'loss_rpn_bbox': '0.004227', 'loss': '0.173283', time: 1.058, eta: 8 days, 9:50:57
/home/admin/.local/lib/python3.6/site-packages/paddle/fluid/executor.py:774: UserWarning: The following exception is not an EOF exception.
"The following exception is not an EOF exception.")
Traceback (most recent call last):
File "tools/train.py", line 340, in <module>
main()
File "tools/train.py", line 246, in main
outs = exe.run(compiled_train_prog, fetch_list=train_values)
File "/home/admin/.local/lib/python3.6/site-packages/paddle/fluid/executor.py", line 775, in run
six.reraise(*sys.exc_info())
File "/opt/conda/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/home/admin/.local/lib/python3.6/site-packages/paddle/fluid/executor.py", line 770, in run
use_program_cache=use_program_cache)
File "/home/admin/.local/lib/python3.6/site-packages/paddle/fluid/executor.py", line 829, in _run_impl
return_numpy=return_numpy)
File "/home/admin/.local/lib/python3.6/site-packages/paddle/fluid/executor.py", line 669, in _run_parallel
tensors = exe.run(fetch_var_names)._move_to_list()
paddle.fluid.core_avx.EnforceNotMet:
--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2 void paddle::operators::GPUGather<float, int>(paddle::platform::DeviceContext const&, paddle::framework::Tensor const&, paddle::framework::Tensor const&, paddle::framework::Tensor*)
3 paddle::operators::GatherOpCUDAKernel<float>::Compute(paddle::framework::ExecutionContext const&) const
4 std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::GatherOpCUDAKernel<float>, paddle::operators::GatherOpCUDAKernel<double>, paddle::operators::GatherOpCUDAKernel<long>, paddle::operators::GatherOpCUDAKernel<int>, paddle::operators::GatherOpCUDAKernel<paddle::platform::float16> >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
5 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_,boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::RuntimeContext*) const
6 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_,boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const
7 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&)
8 paddle::framework::details::ComputationOpHandle::RunImpl()
9 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*)
10 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp(paddle::framework::details::OpHandleBase*, std::shared_ptr<paddle::framework::BlockingQueue<unsigned long> > const&, unsigned long*)
11 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<void>, std::__future_base::_Result_base::_Deleter>, void> >::_M_invoke(std::_Any_data const&)
12 std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
13 ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const
------------------------------------------
Python Call Stacks (More useful to users):
------------------------------------------
File "/home/admin/.local/lib/python3.6/site-packages/paddle/fluid/framework.py", line 2459, in append_op
attrs=kwargs.get("attrs", None))
File "/home/admin/.local/lib/python3.6/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
return self.main_program.current_block().append_op(*args, **kwargs)
File "/home/admin/.local/lib/python3.6/site-packages/paddle/fluid/layers/nn.py", line 10806, in gather
attrs={'overwrite': overwrite})
File "/home/admin/.local/lib/python3.6/site-packages/paddle/fluid/layers/detection.py", line 428, in rpn_target_assign
predicted_cls_logits = nn.gather(cls_logits, score_index)
File "/data/nas/workspace/jupyter/PaddleDetection-release-0.1/ppdet/core/workspace.py", line 113, in partial_apply
return op(*args, **kwargs_)
File "/data/nas/workspace/jupyter/PaddleDetection-release-0.1/ppdet/modeling/anchor_heads/rpn_head.py", line 227, in get_loss
im_info=im_info)
File "/data/nas/workspace/jupyter/PaddleDetection-release-0.1/ppdet/modeling/architectures/faster_rcnn.py", line 100, in build
rpn_loss = self.rpn_head.get_loss(im_info, gt_box, is_crowd)
File "/data/nas/workspace/jupyter/PaddleDetection-release-0.1/ppdet/modeling/architectures/faster_rcnn.py", line 196, in train
return self.build(feed_vars, 'train')
File "tools/train.py", line 128, in main
train_fetches = model.train(feed_vars)
File "tools/train.py", line 340, in <module>
main()
----------------------
Error Message Summary:
----------------------
PaddleCheckError: Expected index.dims()[0] > 0, but received index.dims()[0]:0 <= 0:0.
The index of gather_op should not be empty when the index's rank is 1. at [/paddle/paddle/fluid/operators/gather.cu.h:82]
[operator < gather > error]
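The Python call stack above ends in `rpn_target_assign`, where `nn.gather` receives `score_index`. One thing that might be worth ruling out (an assumption on my part, not a confirmed cause) is whether the training annotation file contains images with no usable ground-truth boxes. The sketch below scans a COCO-format annotation dict for such images. The helper names `images_without_usable_boxes` and `check_file`, and the `min_size` threshold, are hypothetical and chosen only for this example.

```python
import json
from collections import defaultdict


def images_without_usable_boxes(coco, min_size=1e-5):
    """Return ids of images that have no box with width and height > min_size.

    `coco` is a dict in COCO annotation format. `min_size` is an arbitrary
    threshold picked for this sketch, not a value taken from PaddleDetection.
    """
    usable = defaultdict(int)
    for ann in coco.get("annotations", []):
        x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        if w > min_size and h > min_size:
            usable[ann["image_id"]] += 1
    return [img["id"] for img in coco.get("images", [])
            if usable[img["id"]] == 0]


def check_file(path):
    """Load a COCO-format JSON file and list images without usable boxes."""
    with open(path) as f:
        return images_without_usable_boxes(json.load(f))


# Tiny in-memory example (made-up data): image 1 has a valid box,
# image 2 only has a zero-width box, image 3 has no annotation at all.
sample = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [
        {"image_id": 1, "bbox": [10, 10, 20, 20]},
        {"image_id": 2, "bbox": [5, 5, 0, 12]},
    ],
}
print(images_without_usable_boxes(sample))
```

On the real data this could be run as `check_file("dataset/coco/annotations/instances_train2017.json")`, the training annotation path that appears in the log. An empty result would suggest the annotations are not the problem and the difference between the two machines lies elsewhere.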
terminate called without an active exception
W1208 11:53:03.514448 366 init.cc:205] *** Aborted at 1575777183 (unix time) try "date -d @1575777183" if you are using GNU date ***
W1208 11:53:03.517763 366 init.cc:205] PC: @ 0x0 (unknown)
W1208 11:53:03.519001 366 init.cc:205] *** SIGABRT (@0x1f90000013d) received by PID 317 (TID 0x7f0f2b4bf700) from PID 317; stack trace: ***
W1208 11:53:03.525950 366 init.cc:205] @ 0x7f139613d100 (unknown)
W1208 11:53:03.542176 366 init.cc:205] @ 0x7f1395da15f7 __GI_raise
W1208 11:53:03.555480 366 init.cc:205] @ 0x7f1395da2ce8 __GI_abort
W1208 11:53:03.578831 366 init.cc:205] @ 0x7f137646d84a __gnu_cxx::__verbose_terminate_handler()
W1208 11:53:03.585741 366 init.cc:205] @ 0x7f137646bf47 __cxxabiv1::__terminate()
W1208 11:53:03.608299 366 init.cc:205] @ 0x7f137646bf7d std::terminate()
W1208 11:53:03.621994 366 init.cc:205] @ 0x7f137646bc5a __gxx_personality_v0
W1208 11:53:03.625728 366 init.cc:205] @ 0x7f1388dd3b97 _Unwind_ForcedUnwind_Phase2
W1208 11:53:03.629989 366 init.cc:205] @ 0x7f1388dd3e7d _Unwind_ForcedUnwind
W1208 11:53:03.634865 366 init.cc:205] @ 0x7f139613bd60 __GI___pthread_unwind
W1208 11:53:03.639715 366 init.cc:205] @ 0x7f1396136dd5 __pthread_exit
W1208 11:53:03.664770 366 init.cc:205] @ 0x559fb4fe2289 PyThread_exit_thread
W1208 11:53:03.670189 366 init.cc:205] @ 0x559fb4e7447a PyEval_RestoreThread.cold.736
W1208 11:53:03.674211 366 init.cc:205] @ 0x7f13427e65b9 pybind11::gil_scoped_release::~gil_scoped_release()
W1208 11:53:03.675829 366 init.cc:205] @ 0x7f134279af23 _ZZN8pybind1112cpp_function10initializeIZN6paddle6pybindL22pybind11_init_core_avxERNS_6moduleEEUlRNS2_9operators6reader22LoDTensorBlockingQueueERKSt6vectorINS2_9framework9LoDTensorESaISC_EEE58_bIS9_SG_EINS_4nameENS_9is_methodENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE1_4_FUNESY_
W1208 11:53:03.678143 366 init.cc:205] @ 0x7f13427fa6e6 pybind11::cpp_function::dispatcher()
W1208 11:53:03.705709 366 init.cc:205] @ 0x559fb4f23fd4 _PyCFunction_FastCallDict
W1208 11:53:03.722385 366 init.cc:205] @ 0x559fb4fb1d3e call_function
W1208 11:53:03.749676 366 init.cc:205] @ 0x559fb4fd619a _PyEval_EvalFrameDefault
W1208 11:53:03.776196 366 init.cc:205] @ 0x559fb4fac8c8 PyEval_EvalCodeEx
W1208 11:53:03.791987 366 init.cc:205] @ 0x559fb4fad456 function_call
W1208 11:53:03.819322 366 init.cc:205] @ 0x559fb4f23dde PyObject_Call
W1208 11:53:03.846752 366 init.cc:205] @ 0x559fb4fd7994 _PyEval_EvalFrameDefault
W1208 11:53:03.861977 366 init.cc:205] @ 0x559fb4fab7db fast_function
W1208 11:53:03.877820 366 init.cc:205] @ 0x559fb4fb1cc5 call_function
W1208 11:53:03.905719 366 init.cc:205] @ 0x559fb4fd619a _PyEval_EvalFrameDefault
W1208 11:53:03.921274 366 init.cc:205] @ 0x559fb4fab7db fast_function
W1208 11:53:03.937258 366 init.cc:205] @ 0x559fb4fb1cc5 call_function
W1208 11:53:03.964187 366 init.cc:205] @ 0x559fb4fd619a _PyEval_EvalFrameDefault
W1208 11:53:03.989130 366 init.cc:205] @ 0x559fb4fabe4b _PyFunction_FastCallDict
W1208 11:53:04.013437 366 init.cc:205] @ 0x559fb4f2439f _PyObject_FastCallDict
W1208 11:53:04.037919 366 init.cc:205] @ 0x559fb4f28ff3 _PyObject_Call_Prepend