Add early stopping by validation loss
Created by: MoZhonglin
In PaddleRec/gnn, the training process stops when the training loss no longer decreases. However, by that point the model may already be overfitting. Could you compute the validation loss after every training epoch and apply early stopping based on it? (A rough sketch of what I mean is at the end of this issue, after the logs.)
Here is the last part of the training log:
2019-05-22 07:47:53,949-INFO: global_step: 293500, loss: 1.5515, train_acc: 0.9171
2019-05-22 07:48:48,193-INFO: global_step: 294000, loss: 1.5512, train_acc: 0.9160
2019-05-22 07:49:43,223-INFO: global_step: 294500, loss: 1.5385, train_acc: 0.9189
2019-05-22 07:50:38,921-INFO: global_step: 295000, loss: 1.5691, train_acc: 0.9152
2019-05-22 07:51:32,363-INFO: global_step: 295500, loss: 1.5419, train_acc: 0.9180
2019-05-22 07:52:30,012-INFO: global_step: 296000, loss: 1.5412, train_acc: 0.9170
2019-05-22 07:53:27,100-INFO: global_step: 296500, loss: 1.5574, train_acc: 0.9179
2019-05-22 07:54:23,586-INFO: global_step: 297000, loss: 1.5737, train_acc: 0.9147
2019-05-22 07:55:20,923-INFO: global_step: 297500, loss: 1.5620, train_acc: 0.9165
2019-05-22 07:56:17,238-INFO: global_step: 298000, loss: 1.5629, train_acc: 0.9163
2019-05-22 07:57:12,421-INFO: global_step: 298500, loss: 1.5550, train_acc: 0.9171
2019-05-22 07:58:09,173-INFO: global_step: 299000, loss: 1.5560, train_acc: 0.9160
2019-05-22 07:59:03,355-INFO: global_step: 299500, loss: 1.5360, train_acc: 0.9182
2019-05-22 08:00:00,681-INFO: global_step: 300000, loss: 1.5704, train_acc: 0.9149
2019-05-22 08:00:57,691-INFO: global_step: 300500, loss: 1.5599, train_acc: 0.9163
2019-05-22 08:01:55,292-INFO: global_step: 301000, loss: 1.5554, train_acc: 0.9168
2019-05-22 08:02:53,234-INFO: global_step: 301500, loss: 1.5628, train_acc: 0.9169
2019-05-22 08:03:51,387-INFO: global_step: 302000, loss: 1.5470, train_acc: 0.9187
2019-05-22 08:04:48,840-INFO: global_step: 302500, loss: 1.5536, train_acc: 0.9166
2019-05-22 08:05:44,542-INFO: global_step: 303000, loss: 1.5630, train_acc: 0.9164
2019-05-22 08:06:07,681-INFO: epoch loss: 1.5558
And here is the validation (infer) log:
2019-05-22 11:25:00,348-INFO: TEST --> loss: 7.4646, Recall@20: 0.3964
2019-05-22 11:25:29,366-INFO: TEST --> loss: 7.0453, Recall@20: 0.4387
2019-05-22 11:25:57,792-INFO: TEST --> loss: 6.9206, Recall@20: 0.4504
2019-05-22 11:26:25,246-INFO: TEST --> loss: 6.9511, Recall@20: 0.4546
2019-05-22 11:26:54,117-INFO: TEST --> loss: 7.1196, Recall@20: 0.4438
2019-05-22 11:27:22,694-INFO: TEST --> loss: 7.5807, Recall@20: 0.4227
2019-05-22 11:27:51,848-INFO: TEST --> loss: 8.1819, Recall@20: 0.3984
2019-05-22 11:28:21,325-INFO: TEST --> loss: 8.7333, Recall@20: 0.3805
2019-05-22 11:28:50,026-INFO: TEST --> loss: 9.4080, Recall@20: 0.3667
2019-05-22 11:29:19,289-INFO: TEST --> loss: 9.9524, Recall@20: 0.3628
2019-05-22 11:29:48,549-INFO: TEST --> loss: 10.5251, Recall@20: 0.3534
2019-05-22 11:30:17,700-INFO: TEST --> loss: 11.2345, Recall@20: 0.3476
2019-05-22 11:30:48,250-INFO: TEST --> loss: 11.6404, Recall@20: 0.3412
2019-05-22 11:31:17,090-INFO: TEST --> loss: 12.3211, Recall@20: 0.3370
2019-05-22 11:31:47,434-INFO: TEST --> loss: 12.8373, Recall@20: 0.3335
2019-05-22 11:32:18,261-INFO: TEST --> loss: 13.4245, Recall@20: 0.3295
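For reference, below is a minimal sketch of the kind of loop I have in mind. It is not the actual PaddleRec/gnn code: `train_one_epoch()`, `evaluate()`, and `save_checkpoint()` are hypothetical placeholders standing in for the existing train and infer routines; only the patience bookkeeping on the validation loss is the point.

```python
def fit(max_epochs=30, patience=3):
    """Train with early stopping on validation loss (sketch, not PaddleRec API)."""
    best_val_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_loss = train_one_epoch()        # placeholder: existing training loop
        val_loss, recall_at_20 = evaluate()   # placeholder: run infer on the validation set

        print("epoch %d: train_loss=%.4f val_loss=%.4f Recall@20=%.4f"
              % (epoch, train_loss, val_loss, recall_at_20))

        if val_loss < best_val_loss:
            # Validation loss improved: remember this epoch and keep the checkpoint.
            best_val_loss = val_loss
            best_epoch = epoch
            epochs_without_improvement = 0
            save_checkpoint(epoch)            # placeholder: save the best model so far
        else:
            # No improvement: stop once patience is exhausted.
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                print("early stop at epoch %d, best epoch was %d"
                      % (epoch, best_epoch))
                break
```

In the infer log above, the validation loss bottoms out around 6.92 in the third epoch and only increases afterwards, while Recall@20 peaks at 0.4546 right after, so a small patience (2-3 epochs) would have stopped training near the best checkpoint instead of continuing until the training loss flattens.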