Created by: vslyu
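Fix the error raised when phases are configured in the runner:
- If the runner does not configure phases, all phases are loaded automatically.
- If the runner configures phases, they are checked and the corresponding phases are loaded.
Format the output messages of save_step. Add the following runner to config.yaml: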
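
The two cases above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `select_phases`, the dict-based shape of the config, and the choice to raise `ValueError` on an unknown phase name are all assumptions made for the example.

```python
def select_phases(all_phases, runner):
    """Pick the phases a runner should execute (hypothetical helper).

    all_phases: list of dicts, each with a "name" key.
    runner: dict for one runner entry; the "phases" key is optional.
    """
    configured = runner.get("phases")
    if not configured:
        # Runner does not configure phases: load every phase.
        return list(all_phases)

    by_name = {phase["name"]: phase for phase in all_phases}
    # Runner configures phases: check that each one exists before loading it.
    unknown = [name for name in configured if name not in by_name]
    if unknown:
        raise ValueError("runner references unknown phases: {}".format(unknown))
    return [by_name[name] for name in configured]
```

With `phases: [phase1]` in the runner, only `phase1` is returned; with the key omitted, every defined phase is returned.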
- name: single_multi_gpu_train
  class: train
  # num of epochs
  epochs: 1
  # device to run training or infer
  device: gpu
  selected_gpus: "0,1" # select multiple GPUs for training
  save_checkpoint_interval: 1 # save model interval of epochs
  save_inference_interval: 4 # save inference
  save_step_interval: 40
  save_checkpoint_path: "increment_dnn" # save checkpoint path
  save_inference_path: "inference" # save inference path
  save_step_path: "step_save"
  save_inference_feed_varnames: [] # feed vars of save inference
  save_inference_fetch_varnames: [] # fetch vars of save inference
  print_interval: 1
  phases: [phase1]
Formatted output:
2020-09-27 08:00:38,775-INFO: save epoch_id:0, batch_id:0 model into: "step_save/epoch_0_batch_0"
2020-09-27 08:00:39,392-INFO: [Train], epoch: 0, batch: 1, time_each_interval: 1.10s, BATCH_AUC: [0.], AUC: [0.]
2020-09-27 08:00:39,414-INFO: [Train], epoch: 0, batch: 2, time_each_interval: 0.02s, BATCH_AUC: [0.4], AUC: [0.4]
2020-09-27 08:00:39,434-INFO: [Train], epoch: 0, batch: 3, time_each_interval: 0.02s, BATCH_AUC: [0.57142857], AUC: [0.57142857]
2020-09-27 08:00:39,452-INFO: [Train], epoch: 0, batch: 4, time_each_interval: 0.02s, BATCH_AUC: [0.3125], AUC: [0.3125]
2020-09-27 08:00:39,469-INFO: [Train], epoch: 0, batch: 5, time_each_interval: 0.02s, BATCH_AUC: [0.2962963], AUC: [0.2962963]
2020-09-27 08:00:39,487-INFO: [Train], epoch: 0, batch: 6, time_each_interval: 0.02s, BATCH_AUC: [0.36363636], AUC: [0.36363636]
2020-09-27 08:00:39,505-INFO: [Train], epoch: 0, batch: 7, time_each_interval: 0.02s, BATCH_AUC: [0.41025641], AUC: [0.41025641]
2020-09-27 08:00:39,522-INFO: [Train], epoch: 0, batch: 8, time_each_interval: 0.02s, BATCH_AUC: [0.46666667], AUC: [0.46666667]
2020-09-27 08:00:39,540-INFO: [Train], epoch: 0, batch: 9, time_each_interval: 0.02s, BATCH_AUC: [0.515625], AUC: [0.515625]
2020-09-27 08:00:39,558-INFO: [Train], epoch: 0, batch: 10, time_each_interval: 0.02s, BATCH_AUC: [0.56944444], AUC: [0.56944444]
2020-09-27 08:00:39,575-INFO: [Train], epoch: 0, batch: 11, time_each_interval: 0.02s, BATCH_AUC: [0.52631579], AUC: [0.52631579]
2020-09-27 08:00:39,593-INFO: [Train], epoch: 0, batch: 12, time_each_interval: 0.02s, BATCH_AUC: [0.57142857], AUC: [0.57142857]
2020-09-27 08:00:39,610-INFO: [Train], epoch: 0, batch: 13, time_each_interval: 0.02s, BATCH_AUC: [0.59130435], AUC: [0.59130435]
2020-09-27 08:00:39,628-INFO: [Train], epoch: 0, batch: 14, time_each_interval: 0.02s, BATCH_AUC: [0.6], AUC: [0.6]
2020-09-27 08:00:39,646-INFO: [Train], epoch: 0, batch: 15, time_each_interval: 0.02s, BATCH_AUC: [0.57051282], AUC: [0.57051282]
2020-09-27 08:00:39,664-INFO: [Train], epoch: 0, batch: 16, time_each_interval: 0.02s, BATCH_AUC: [0.58928571], AUC: [0.58928571]
2020-09-27 08:00:39,687-INFO: [Train], epoch: 0, batch: 17, time_each_interval: 0.02s, BATCH_AUC: [0.61111111], AUC: [0.61111111]
2020-09-27 08:00:39,708-INFO: [Train], epoch: 0, batch: 18, time_each_interval: 0.02s, BATCH_AUC: [0.61979167], AUC: [0.61979167]
2020-09-27 08:00:39,728-INFO: [Train], epoch: 0, batch: 19, time_each_interval: 0.02s, BATCH_AUC: [0.61038961], AUC: [0.61038961]
2020-09-27 08:00:39,750-INFO: [Train], epoch: 0, batch: 20, time_each_interval: 0.02s, BATCH_AUC: [0.54411765], AUC: [0.58367347]
2020-09-27 08:00:39,768-INFO: [Train], epoch: 0, batch: 21, time_each_interval: 0.02s, BATCH_AUC: [0.60294118], AUC: [0.60617761]
2020-09-27 08:00:39,786-INFO: [Train], epoch: 0, batch: 22, time_each_interval: 0.02s, BATCH_AUC: [0.51953125], AUC: [0.5015015]
2020-09-27 08:00:39,804-INFO: [Train], epoch: 0, batch: 23, time_each_interval: 0.02s, BATCH_AUC: [0.55555556], AUC: [0.5]
2020-09-27 08:00:39,822-INFO: [Train], epoch: 0, batch: 24, time_each_interval: 0.02s, BATCH_AUC: [0.56640625], AUC: [0.5025]
2020-09-27 08:00:39,840-INFO: [Train], epoch: 0, batch: 25, time_each_interval: 0.02s, BATCH_AUC: [0.5021645], AUC: [0.49047619]
2020-09-27 08:00:39,858-INFO: [Train], epoch: 0, batch: 26, time_each_interval: 0.02s, BATCH_AUC: [0.578125], AUC: [0.52854123]
2020-09-27 08:00:39,880-INFO: [Train], epoch: 0, batch: 27, time_each_interval: 0.02s, BATCH_AUC: [0.65666667], AUC: [0.55456172]
2020-09-27 08:00:39,907-INFO: [Train], epoch: 0, batch: 28, time_each_interval: 0.03s, BATCH_AUC: [0.71428571], AUC: [0.59224806]
2020-09-27 08:00:39,931-INFO: [Train], epoch: 0, batch: 29, time_each_interval: 0.02s, BATCH_AUC: [0.64576803], AUC: [0.56740741]
2020-09-27 08:00:39,956-INFO: [Train], epoch: 0, batch: 30, time_each_interval: 0.03s, BATCH_AUC: [0.63988095], AUC: [0.57880435]
2020-09-27 08:00:39,979-INFO: [Train], epoch: 0, batch: 31, time_each_interval: 0.02s, BATCH_AUC: [0.5862069], AUC: [0.5546875]
2020-09-27 08:00:40,001-INFO: [Train], epoch: 0, batch: 32, time_each_interval: 0.02s, BATCH_AUC: [0.58035714], AUC: [0.56662665]
2020-09-27 08:00:40,029-INFO: [Train], epoch: 0, batch: 33, time_each_interval: 0.03s, BATCH_AUC: [0.62362637], AUC: [0.60687433]
2020-09-27 08:00:40,055-INFO: [Train], epoch: 0, batch: 34, time_each_interval: 0.03s, BATCH_AUC: [0.65104167], AUC: [0.63556851]
2020-09-27 08:00:40,076-INFO: [Train], epoch: 0, batch: 35, time_each_interval: 0.02s, BATCH_AUC: [0.60266667], AUC: [0.61064426]
2020-09-27 08:00:40,105-INFO: [Train], epoch: 0, batch: 36, time_each_interval: 0.03s, BATCH_AUC: [0.57552083], AUC: [0.61538462]
2020-09-27 08:00:40,128-INFO: [Train], epoch: 0, batch: 37, time_each_interval: 0.02s, BATCH_AUC: [0.53964194], AUC: [0.61607875]
2020-09-27 08:00:40,151-INFO: [Train], epoch: 0, batch: 38, time_each_interval: 0.02s, BATCH_AUC: [0.47314578], AUC: [0.5944664]
2020-09-27 08:00:40,176-INFO: [Train], epoch: 0, batch: 39, time_each_interval: 0.03s, BATCH_AUC: [0.43229167], AUC: [0.57360793]
epoch 0 done, use time: 1.88568782806
2020-09-27 08:00:40,178-INFO: save epoch_id:0 model into: "increment_dnn/0"
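
For reference, the save_step line in the log above is consistent with a rule like the one sketched here. This is an assumption inferred from the log, not the PR's actual code: the helper names are hypothetical, and the `batch_id % save_step_interval == 0` condition is a guess that happens to match the single save at batch 0 with `save_step_interval: 40`.

```python
def should_save_step(batch_id, save_step_interval):
    # Assumed rule: save whenever the batch id is a multiple of the interval.
    return save_step_interval > 0 and batch_id % save_step_interval == 0


def step_save_dir(save_step_path, epoch_id, batch_id):
    # Directory layout as it appears in the log: <save_step_path>/epoch_<e>_batch_<b>
    return "{}/epoch_{}_batch_{}".format(save_step_path, epoch_id, batch_id)


def format_save_step_message(save_step_path, epoch_id, batch_id):
    # Message text copied from the log line for the step save.
    return 'save epoch_id:{}, batch_id:{} model into: "{}"'.format(
        epoch_id, batch_id, step_save_dir(save_step_path, epoch_id, batch_id)
    )


if __name__ == "__main__":
    for batch_id in range(40):
        if should_save_step(batch_id, 40):
            print(format_save_step_message("step_save", 0, batch_id))
```

Under these assumptions, a 40-batch epoch with an interval of 40 triggers exactly one step save, at batch 0.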