Training both SAST and DB hangs without finishing or raising an error
Created by: yingwei13mei
When training SAST and DB with label files, I hit the same problem in both cases: training neither finishes nor terminates, and it stalls at the "test tackling num" stage.

2020-08-31 18:02:16,085-INFO: epoch: 65, iter: 4998, 'lr': 1e-04, 'total_loss': 1.876235, 'score_loss': 0.840587, 'border_loss': 0.118486, 'tvo_loss': 0.454703, 'tco_loss': 0.116163, time: 1.140
2020-08-31 18:02:18,396-INFO: epoch: 65, iter: 5000, 'lr': 1e-04, 'total_loss': 1.876235, 'score_loss': 0.83565, 'border_loss': 0.118486, 'tvo_loss': 0.454703, 'tco_loss': 0.116163, time: 1.145
2020-08-31 18:02:18,458-INFO: test tackling num:1
2020-08-31 18:05:54,332-INFO: test tackling num:2
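To tell a slow evaluation apart from a real hang, it may help to measure the time between consecutive "test tackling num" lines. The following is a minimal sketch, assuming the log format shown above; the function name `eval_step_gaps` is hypothetical and not part of PaddleOCR.

```python
import re
from datetime import datetime

# Matches lines such as "2020-08-31 18:02:18,458-INFO: test tackling num:1".
EVAL_LINE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})-INFO: test tackling num:(\d+)"
)


def eval_step_gaps(log_text):
    """Return (num, seconds since the previous evaluation line) pairs."""
    stamps = []
    for match in EVAL_LINE.finditer(log_text):
        moment = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        # The three digits after the comma are milliseconds.
        moment = moment.replace(microsecond=int(match.group(2)) * 1000)
        stamps.append((int(match.group(3)), moment))
    return [
        (num, (moment - previous).total_seconds())
        for (_, previous), (num, moment) in zip(stamps, stamps[1:])
    ]


log = """2020-08-31 18:02:18,458-INFO: test tackling num:1
2020-08-31 18:05:54,332-INFO: test tackling num:2"""
print(eval_step_gaps(log))
```

For the two lines in my log, the gap between num:1 and num:2 is about 216 seconds.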
The GPU also still shows the process running, and its memory stays allocated. If I modify eval to skip evaluation, no best accuracy file is produced, so how can I run prediction afterwards? At the moment my output directory only contains intermediate checkpoint files.

ll output/det_sast/
total 9780252
-rw-r----- 1 root root   4036622 Aug 31 15:08 iter_epoch_100.pdmodel
-rw-r----- 1 root root 572987067 Aug 31 15:08 iter_epoch_100.pdopt
-rw-r----- 1 root root 287971231 Aug 31 15:08 iter_epoch_100.pdparams
-rw-r----- 1 root root   4036622 Aug 31 15:08 iter_epoch_120.pdmodel
-rw-r----- 1 root root 572987067 Aug 31 15:08 iter_epoch_120.pdopt
-rw-r----- 1 root root 287971231 Aug 31 15:08 iter_epoch_120.pdparams
-rw-r----- 1 root root   4036622 Aug 31 15:09 iter_epoch_140.pdmodel
-rw-r----- 1 root root 572987067 Aug 31 15:09 iter_epoch_140.pdopt
-rw-r----- 1 root root 287971231 Aug 31 15:09 iter_epoch_140.pdparams
-rw-r----- 1 root root   4036622 Aug 31 15:09 iter_epoch_160.pdmodel
-rw-r----- 1 root root 572987067 Aug 31 15:09 iter_epoch_160.pdopt
-rw-r----- 1 root root 287971231 Aug 31 15:09 iter_epoch_160.pdparams
-rw-r----- 1 root root   4036622 Aug 31 15:09 iter_epoch_180.pdmodel
-rw-r----- 1 root root 572987067 Aug 31 15:09 iter_epoch_180.pdopt
-rw-r----- 1 root root 287971231 Aug 31 15:09 iter_epoch_180.pdparams
-rw-r----- 1 root root   4036622 Aug 31 15:09 iter_epoch_200.pdmodel
-rw-r----- 1 root root 572987067 Aug 31 15:09 iter_epoch_200.pdopt
-rw-r----- 1 root root 287971231 Aug 31 15:09 iter_epoch_200.pdparams
-rw-r----- 1 root root   4036622 Aug 31 16:57 iter_epoch_20.pdmodel
-rw-r----- 1 root root 739485841 Aug 31 16:57 iter_epoch_20.pdopt
-rw-r----- 1 root root 288025062 Aug 31 16:57 iter_epoch_20.pdparams
-rw-r----- 1 root root   4036622 Aug 31 15:09 iter_epoch_220.pdmodel
-rw-r----- 1 root root 572987067 Aug 31 15:09 iter_epoch_220.pdopt
-rw-r----- 1 root root 287971231 Aug 31 15:09 iter_epoch_220.pdparams
-rw-r----- 1 root root   4036622 Aug 31 17:26 iter_epoch_40.pdmodel
-rw-r----- 1 root root 739664986 Aug 31 17:26 iter_epoch_40.pdopt
-rw-r----- 1 root root 288012440 Aug 31 17:26 iter_epoch_40.pdparams
-rw-r----- 1 root root   4036622 Aug 31 17:56 iter_epoch_60.pdmodel
-rw-r----- 1 root root 739639469 Aug 31 17:56 iter_epoch_60.pdopt
-rw-r----- 1 root root 288009444 Aug 31 17:56 iter_epoch_60.pdparams
-rw-r----- 1 root root   4036622 Aug 31 15:08 iter_epoch_80.pdmodel
-rw-r----- 1 root root 572987067 Aug 31 15:08 iter_epoch_80.pdopt
-rw-r----- 1 root root 287971231 Aug 31 15:08 iter_epoch_80.pdparams
-rw-r----- 1 root root         0 Aug 31 18:02 predicts_sast.txt
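Since no best accuracy file exists, one possible fallback is to pick the most recently written intermediate checkpoint (in the listing above, iter_epoch_60 at 17:56). The following is a minimal sketch that finds it by modification time; the helper name `latest_checkpoint_prefix` is hypothetical, and whether the prediction tooling accepts the resulting path prefix is an assumption I have not verified.

```python
from pathlib import Path


def latest_checkpoint_prefix(output_dir):
    """Return the path prefix (no extension) of the newest .pdparams file, or None."""
    params = sorted(
        Path(output_dir).glob("*.pdparams"),
        key=lambda path: path.stat().st_mtime,  # oldest first, newest last
    )
    if not params:
        return None
    # Drop ".pdparams" so the prefix also covers the matching .pdopt/.pdmodel files.
    return str(params[-1].with_suffix(""))
```

Calling `latest_checkpoint_prefix("output/det_sast")` returns the prefix of whichever checkpoint was written last, which is not necessarily the most accurate one.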