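I am training the Chinese version of SRN that was released yesterday. The only change I made was to add a Chinese dictionary, and the loss clearly oscillates and even diverges. Is this normal? Could it be caused by the two-stage loss? English words are made up of several letters, so the later correction step is easy. For Chinese training, does it need to be changed to two-step training?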
Created by: Meicsu199345
2020-08-26 16:15:09,724-INFO: epoch: 0, iter: 10, lr: 0.000100, 'loss': 8865.338, 'acc': 0.0, time: 1.455
2020-08-26 16:15:24,710-INFO: epoch: 0, iter: 20, lr: 0.000100, 'loss': 3628.9634, 'acc': 0.0, time: 1.457
2020-08-26 16:15:41,112-INFO: epoch: 0, iter: 30, lr: 0.000100, 'loss': 1778.2188, 'acc': 0.0, time: 1.493
2020-08-26 16:15:56,523-INFO: epoch: 0, iter: 40, lr: 0.000100, 'loss': 1531.7689, 'acc': 0.0, time: 1.464
2020-08-26 16:16:12,307-INFO: epoch: 0, iter: 50, lr: 0.000100, 'loss': 1424.8579, 'acc': 0.0, time: 1.667
2020-08-26 16:16:27,500-INFO: epoch: 0, iter: 60, lr: 0.000100, 'loss': 1349.9011, 'acc': 0.0, time: 1.952
2020-08-26 16:16:43,573-INFO: epoch: 0, iter: 70, lr: 0.000100, 'loss': 1281.2195, 'acc': 0.0, time: 1.495
2020-08-26 16:16:58,564-INFO: epoch: 0, iter: 80, lr: 0.000100, 'loss': 1256.5146, 'acc': 0.023438, time: 1.488
2020-08-26 16:17:14,166-INFO: epoch: 0, iter: 90, lr: 0.000100, 'loss': 1214.2979, 'acc': 0.072917, time: 1.521
2020-08-26 16:17:29,347-INFO: epoch: 0, iter: 100, lr: 0.000100, 'loss': 1194.3745, 'acc': 0.078125, time: 1.527
2020-08-26 16:17:45,022-INFO: epoch: 0, iter: 110, lr: 0.000100, 'loss': 1201.0668, 'acc': 0.083333, time: 1.538
2020-08-26 16:18:00,324-INFO: epoch: 0, iter: 120, lr: 0.000100, 'loss': 1177.1057, 'acc': 0.080729, time: 1.531
2020-08-26 16:18:15,603-INFO: epoch: 0, iter: 130, lr: 0.000100, 'loss': 1179.5132, 'acc': 0.075521, time: 1.534
2020-08-26 16:18:30,873-INFO: epoch: 0, iter: 140, lr: 0.000100, 'loss': 1429.1057, 'acc': 0.0625, time: 1.544
2020-08-26 16:18:46,195-INFO: epoch: 0, iter: 150, lr: 0.000100, 'loss': 2689.7197, 'acc': 0.0, time: 1.550
2020-08-26 16:19:01,544-INFO: epoch: 0, iter: 160, lr: 0.000100, 'loss': 2560.416, 'acc': 0.0, time: 1.528
2020-08-26 16:19:16,989-INFO: epoch: 0, iter: 170, lr: 0.000100, 'loss': 2499.3525, 'acc': 0.0, time: 1.562
2020-08-26 16:19:32,357-INFO: epoch: 0, iter: 180, lr: 0.000100, 'loss': 2436.241, 'acc': 0.0, time: 1.542
2020-08-26 16:19:47,691-INFO: epoch: 0, iter: 190, lr: 0.000100, 'loss': 2423.223, 'acc': 0.0, time: 1.533
2020-08-26 16:20:03,077-INFO: epoch: 0, iter: 200, lr: 0.000100, 'loss': 2407.2725, 'acc': 0.0, time: 1.537
2020-08-26 16:20:18,498-INFO: epoch: 0, iter: 210, lr: 0.000100, 'loss': 2414.9458, 'acc': 0.0, time: 1.537
2020-08-26 16:20:33,863-INFO: epoch: 0, iter: 220, lr: 0.000100, 'loss': 2416.734, 'acc': 0.0, time: 1.536
2020-08-26 16:20:49,241-INFO: epoch: 0, iter: 230, lr: 0.000100, 'loss': 2392.4749, 'acc': 0.0, time: 1.526
2020-08-26 16:21:04,614-INFO: epoch: 0, iter: 240, lr: 0.000100, 'loss': 2386.7158, 'acc': 0.0, time: 1.547
2020-08-26 16:21:20,399-INFO: epoch: 0, iter: 250, lr: 0.000100, 'loss': 2382.9746, 'acc': 0.0, time: 1.548
2020-08-26 16:21:35,856-INFO: epoch: 0, iter: 260, lr: 0.000100, 'loss': 2387.2427, 'acc': 0.0, time: 1.545
2020-08-26 16:21:51,843-INFO: epoch: 0, iter: 270, lr: 0.000100, 'loss': 2378.5024, 'acc': 0.0, time: 1.558
2020-08-26 16:22:07,344-INFO: epoch: 0, iter: 280, lr: 0.000100, 'loss': 2350.8608, 'acc': 0.0, time: 1.553
2020-08-26 16:22:22,760-INFO: epoch: 0, iter: 290, lr: 0.000100, 'loss': 2352.2654, 'acc': 0.0, time: 1.520
2020-08-26 16:22:38,166-INFO: epoch: 0, iter: 300, lr: 0.000100, 'loss': 2359.5093, 'acc': 0.0, time: 1.584
2020-08-26 16:22:53,566-INFO: epoch: 0, iter: 310, lr: 0.000100, 'loss': 2376.8528, 'acc': 0.0, time: 1.539
2020-08-26 16:23:10,429-INFO: epoch: 0, iter: 320, lr: 0.000100, 'loss': 2382.2124, 'acc': 0.0, time: 1.536
2020-08-26 16:23:26,290-INFO: epoch: 0, iter: 330, lr: 0.000100, 'loss': 3498.0493, 'acc': 0.0, time: 2.021
2020-08-26 16:23:42,608-INFO: epoch: 0, iter: 340, lr: 0.000100, 'loss': 3531.2666, 'acc': 0.0, time: 1.529
2020-08-26 16:23:59,385-INFO: epoch: 0, iter: 350, lr: 0.000100, 'loss': 3463.5503, 'acc': 0.0, time: 2.006
2020-08-26 16:24:15,266-INFO: epoch: 0, iter: 360, lr: 0.000100, 'loss': 3417.6025, 'acc': 0.0, time: 1.539
2020-08-26 16:24:30,682-INFO: epoch: 0, iter: 370, lr: 0.000100, 'loss': 3404.5903, 'acc': 0.0, time: 1.535
2020-08-26 16:24:46,718-INFO: epoch: 0, iter: 380, lr: 0.000100, 'loss': 3402.6914, 'acc': 0.0, time: 1.542
2020-08-26 16:25:02,105-INFO: epoch: 0, iter: 390, lr: 0.000100, 'loss': 3393.2122, 'acc': 0.0, time: 1.558
2020-08-26 16:25:17,635-INFO: epoch: 0, iter: 400, lr: 0.000100, 'loss': 3376.334, 'acc': 0.0, time: 1.545
2020-08-26 16:25:33,150-INFO: epoch: 0, iter: 410, lr: 0.000100, 'loss': 3365.7368, 'acc': 0.0, time: 1.575
2020-08-26 16:25:49,170-INFO: epoch: 0, iter: 420, lr: 0.000100, 'loss': 3362.427, 'acc': 0.0, time: 1.563
2020-08-26 16:26:04,570-INFO: epoch: 0, iter: 430, lr: 0.000100, 'loss': 3379.5005, 'acc': 0.0, time: 1.551
2020-08-26 16:26:20,042-INFO: epoch: 0, iter: 440, lr: 0.000100, 'loss': 3357.6929, 'acc': 0.0, time: 1.550
2020-08-26 16:26:35,445-INFO: epoch: 0, iter: 450, lr: 0.000100, 'loss': 3334.6873, 'acc': 0.0, time: 1.539
2020-08-26 16:26:51,423-INFO: epoch: 0, iter: 460, lr: 0.000100, 'loss': 3350.052, 'acc': 0.0, time: 1.539
2020-08-26 16:27:06,906-INFO: epoch: 0, iter: 470, lr: 0.000100, 'loss': 4661.0244, 'acc': 0.0, time: 1.554
2020-08-26 16:27:22,722-INFO: epoch: 0, iter: 480, lr: 0.000100, 'loss': 4608.0566, 'acc': 0.0, time: 1.541
2020-08-26 16:27:38,222-INFO: epoch: 0, iter: 490, lr: 0.000100, 'loss': 4511.8975, 'acc': 0.0, time: 1.523
2020-08-26 16:27:53,788-INFO: epoch: 0, iter: 500, lr: 0.000100, 'loss': 4396.5225, 'acc': 0.0, time: 1.529
2020-08-26 16:28:09,442-INFO: epoch: 0, iter: 510, lr: 0.000100, 'loss': 4277.032, 'acc': 0.0, time: 1.566
2020-08-26 16:28:24,901-INFO: epoch: 0, iter: 520, lr: 0.000100, 'loss': 4214.5713, 'acc': 0.0, time: 1.540
2020-08-26 16:28:41,259-INFO: epoch: 0, iter: 530, lr: 0.000100, 'loss': 4182.3135, 'acc': 0.0, time: 2.019
2020-08-26 16:28:57,224-INFO: epoch: 0, iter: 540, lr: 0.000100, 'loss': 4137.4746, 'acc': 0.0, time: 1.561
2020-08-26 16:29:12,695-INFO: epoch: 0, iter: 550, lr: 0.000100, 'loss': 4132.7363, 'acc': 0.0, time: 1.557
2020-08-26 16:29:29,008-INFO: epoch: 0, iter: 560, lr: 0.000100, 'loss': 4117.2603, 'acc': 0.0, time: 2.027
2020-08-26 16:29:45,298-INFO: epoch: 0, iter: 570, lr: 0.000100, 'loss': 4110.297, 'acc': 0.0, time: 1.561
2020-08-26 16:30:01,186-INFO: epoch: 0, iter: 580, lr: 0.000100, 'loss': 4083.0166, 'acc': 0.0, time: 1.560
2020-08-26 16:30:17,546-INFO: epoch: 0, iter: 590, lr: 0.000100, 'loss': 4066.9604, 'acc': 0.0, time: 1.528
2020-08-26 16:30:32,960-INFO: epoch: 0, iter: 600, lr: 0.000100, 'loss': 4050.258, 'acc': 0.0, time: 1.560
2020-08-26 16:30:48,864-INFO: epoch: 0, iter: 610, lr: 0.000100, 'loss': 4012.3652, 'acc': 0.0, time: 1.530
2020-08-26 16:31:04,544-INFO: epoch: 0, iter: 620, lr: 0.000100, 'loss': 4050.6157, 'acc': 0.0, time: 1.604
2020-08-26 16:31:20,761-INFO: epoch: 0, iter: 630, lr: 0.000100, 'loss': 4075.1973, 'acc': 0.0, time: 1.531
2020-08-26 16:31:37,031-INFO: epoch: 0, iter: 640, lr: 0.000100, 'loss': 4043.4785, 'acc': 0.0, time: 2.024
2020-08-26 16:31:52,503-INFO: epoch: 0, iter: 650, lr: 0.000100, 'loss': 4030.134, 'acc': 0.0, time: 1.541
2020-08-26 16:32:09,342-INFO: epoch: 0, iter: 660, lr: 0.000100, 'loss': 4192.7915, 'acc': 0.0, time: 1.548
2020-08-26 16:32:25,156-INFO: epoch: 0, iter: 670, lr: 0.000100, 'loss': 5275.8154, 'acc': 0.0, time: 1.551
2020-08-26 16:32:40,881-INFO: epoch: 0, iter: 680, lr: 0.000100, 'loss': 5486.115, 'acc': 0.0, time: 1.569
2020-08-26 16:32:56,387-INFO: epoch: 0, iter: 690, lr: 0.000100, 'loss': 5574.666, 'acc': 0.0, time: 1.539
2020-08-26 16:33:11,844-INFO: epoch: 0, iter: 700, lr: 0.000100, 'loss': 5286.5293, 'acc': 0.0, time: 1.546
2020-08-26 16:33:27,294-INFO: epoch: 0, iter: 710, lr: 0.000100, 'loss': 5117.9717, 'acc': 0.0, time: 1.573
2020-08-26 16:33:43,182-INFO: epoch: 0, iter: 720, lr: 0.000100, 'loss': 5074.859, 'acc': 0.0, time: 1.504
2020-08-26 16:33:59,445-INFO: epoch: 0, iter: 730, lr: 0.000100, 'loss': 5019.9023, 'acc': 0.0, time: 1.542
2020-08-26 16:34:15,890-INFO: epoch: 0, iter: 740, lr: 0.000100, 'loss': 5111.6816, 'acc': 0.0, time: 2.025
2020-08-26 16:34:32,308-INFO: epoch: 0, iter: 750, lr: 0.000100, 'loss': 6095.9155, 'acc': 0.0, time: 1.532
2020-08-26 16:34:47,730-INFO: epoch: 0, iter: 760, lr: 0.000100, 'loss': 6075.448, 'acc': 0.0, time: 1.554
2020-08-26 16:35:03,197-INFO: epoch: 0, iter: 770, lr: 0.000100, 'loss': 6026.6475, 'acc': 0.0, time: 1.545
2020-08-26 16:35:18,609-INFO: epoch: 0, iter: 780, lr: 0.000100, 'loss': 5866.498, 'acc': 0.0, time: 1.529
2020-08-26 16:35:34,120-INFO: epoch: 0, iter: 790, lr: 0.000100, 'loss': 5865.2324, 'acc': 0.0, time: 1.529
2020-08-26 16:35:49,651-INFO: epoch: 0, iter: 800, lr: 0.000100, 'loss': 5835.7197, 'acc': 0.0, time: 1.530
2020-08-26 16:36:05,555-INFO: epoch: 0, iter: 810, lr: 0.000100, 'loss': 5814.573, 'acc': 0.0, time: 1.534
2020-08-26 16:36:21,480-INFO: epoch: 0, iter: 820, lr: 0.000100, 'loss': 6927.5576, 'acc': 0.0, time: 1.548
2020-08-26 16:36:36,996-INFO: epoch: 0, iter: 830, lr: 0.000100, 'loss': 6910.6157, 'acc': 0.0, time: 1.508
2020-08-26 16:36:53,262-INFO: epoch: 0, iter: 840, lr: 0.000100, 'loss': 6722.688, 'acc': 0.0, time: 1.533
2020-08-26 16:37:09,181-INFO: epoch: 0, iter: 850, lr: 0.000100, 'loss': 6652.197, 'acc': 0.0, time: 1.530
2020-08-26 16:37:24,810-INFO: epoch: 0, iter: 860, lr: 0.000100, 'loss': 7368.076, 'acc': 0.0, time: 1.547
2020-08-26 16:37:40,285-INFO: epoch: 0, iter: 870, lr: 0.000100, 'loss': 7796.634, 'acc': 0.0, time: 1.558
2020-08-26 16:37:55,671-INFO: epoch: 0, iter: 880, lr: 0.000100, 'loss': 7734.249, 'acc': 0.0, time: 1.556
2020-08-26 16:38:12,026-INFO: epoch: 0, iter: 890, lr: 0.000100, 'loss': 7648.0654, 'acc': 0.0, time: 1.565
2020-08-26 16:38:27,987-INFO: epoch: 0, iter: 900, lr: 0.000100, 'loss': 7825.077, 'acc': 0.0, time: 1.541
2020-08-26 16:38:44,206-INFO: epoch: 0, iter: 910, lr: 0.000100, 'loss': 8521.256, 'acc': 0.0, time: 1.998
2020-08-26 16:38:59,716-INFO: epoch: 0, iter: 920, lr: 0.000100, 'loss': 8585.867, 'acc': 0.0, time: 1.645
2020-08-26 16:39:15,187-INFO: epoch: 0, iter: 930, lr: 0.000100, 'loss': 9137.176, 'acc': 0.0, time: 1.543
2020-08-26 16:39:30,737-INFO: epoch: 0, iter: 940, lr: 0.000100, 'loss': 9123.863, 'acc': 0.0, time: 1.538
2020-08-26 16:39:46,558-INFO: epoch: 0, iter: 950, lr: 0.000100, 'loss': 8777.811, 'acc': 0.0, time: 1.560
2020-08-26 16:40:02,481-INFO: epoch: 0, iter: 960, lr: 0.000100, 'loss': 7878.2676, 'acc': 0.0, time: 1.525
2020-08-26 16:40:18,059-INFO: epoch: 0, iter: 970, lr: 0.000100, 'loss': 7766.036, 'acc': 0.0, time: 1.734
2020-08-26 16:40:33,633-INFO: epoch: 0, iter: 980, lr: 0.000100, 'loss': 8081.324, 'acc': 0.0, time: 1.551
2020-08-26 16:40:49,907-INFO: epoch: 0, iter: 990, lr: 0.000100, 'loss': 10508.6875, 'acc': 0.0, time: 1.552
2020-08-26 16:41:05,448-INFO: epoch: 0, iter: 1000, lr: 0.000100, 'loss': 10679.992, 'acc': 0.0, time: 1.557
2020-08-26 16:41:21,434-INFO: epoch: 0, iter: 1010, lr: 0.000100, 'loss': 10756.498, 'acc': 0.0, time: 1.562
2020-08-26 16:41:37,745-INFO: epoch: 0, iter: 1020, lr: 0.000100, 'loss': 10211.674, 'acc': 0.0, time: 2.040
2020-08-26 16:41:53,275-INFO: epoch: 0, iter: 1030, lr: 0.000100, 'loss': 10060.361, 'acc': 0.0, time: 1.567
2020-08-26 16:42:09,594-INFO: epoch: 0, iter: 1040, lr: 0.000100, 'loss': 12289.946, 'acc': 0.0, time: 1.558
2020-08-26 16:42:25,207-INFO: epoch: 0, iter: 1050, lr: 0.000100, 'loss': 12379.883, 'acc': 0.0, time: 1.567
2020-08-26 16:42:41,646-INFO: epoch: 0, iter: 1060, lr: 0.000100, 'loss': 14202.0625, 'acc': 0.0, time: 1.537
2020-08-26 16:42:58,256-INFO: epoch: 0, iter: 1070, lr: 0.000100, 'loss': 16405.912, 'acc': 0.0, time: 2.082
2020-08-26 16:43:13,742-INFO: epoch: 0, iter: 1080, lr: 0.000100, 'loss': 17079.21, 'acc': 0.0, time: 1.570
2020-08-26 16:43:48,558-INFO: epoch: 1, iter: 1090, lr: 0.000100, 'loss': 15609.656, 'acc': 0.0, time: 1.482
2020-08-26 16:44:03,392-INFO: epoch: 1, iter: 1100, lr: 0.000100, 'loss': 2046.145, 'acc': 0.0, time: 1.518
2020-08-26 16:44:19,117-INFO: epoch: 1, iter: 1110, lr: 0.000100, 'loss': 1381.2078, 'acc': 0.0, time: 2.033
2020-08-26 16:44:34,358-INFO: epoch: 1, iter: 1120, lr: 0.000100, 'loss': 1174.2664, 'acc': 0.0, time: 1.535
2020-08-26 16:44:50,141-INFO: epoch: 1, iter: 1130, lr: 0.000100, 'loss': 1101.4368, 'acc': 0.067708, time: 1.537
2020-08-26 16:45:06,806-INFO: epoch: 1, iter: 1140, lr: 0.000100, 'loss': 1083.049, 'acc': 0.085938, time: 1.988
2020-08-26 16:45:22,118-INFO: epoch: 1, iter: 1150, lr: 0.000100, 'loss': 1047.7622, 'acc': 0.083333, time: 1.531
2020-08-26 16:45:37,865-INFO: epoch: 1, iter: 1160, lr: 0.000100, 'loss': 1023.84595, 'acc': 0.078125, time: 1.529
2020-08-26 16:45:53,073-INFO: epoch: 1, iter: 1170, lr: 0.000100, 'loss': 1002.95886, 'acc': 0.078125, time: 1.529
2020-08-26 16:46:08,362-INFO: epoch: 1, iter: 1180, lr: 0.000100, 'loss': 985.1726, 'acc': 0.080729, time: 1.537
2020-08-26 16:46:23,637-INFO: epoch: 1, iter: 1190, lr: 0.000100, 'loss': 989.4188, 'acc': 0.080729, time: 1.518
2020-08-26 16:46:39,925-INFO: epoch: 1, iter: 1200, lr: 0.000100, 'loss': 981.55054, 'acc': 0.072917, time: 2.031
2020-08-26 16:46:55,218-INFO: epoch: 1, iter: 1210, lr: 0.000100, 'loss': 972.88403, 'acc': 0.075521, time: 1.528
2020-08-26 16:47:10,541-INFO: epoch: 1, iter: 1220, lr: 0.000100, 'loss': 985.5654, 'acc': 0.067708, time: 1.532
2020-08-26 16:47:25,779-INFO: epoch: 1, iter: 1230, lr: 0.000100, 'loss': 2270.5752, 'acc': 0.0, time: 1.536
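
To locate where the loss jumps in a log like the one above, a small parser can help. The following is a minimal sketch, not part of the training code: `parse_log`, `find_jumps`, and the `ratio` threshold are hypothetical names chosen here for illustration, and the sample text is four entries copied from the log.

```python
import re

# Matches one entry of the log format shown above.
# \s* tolerates entries that were wrapped across lines when pasted.
ENTRY = re.compile(
    r"epoch:\s*(\d+),\s*iter:\s*(\d+),\s*lr:\s*([\d.]+),\s*"
    r"'loss':\s*([\d.]+),\s*'acc':\s*([\d.]+)"
)


def parse_log(text):
    """Return a list of (epoch, iter, loss, acc) tuples found in the log text."""
    return [
        (int(epoch), int(it), float(loss), float(acc))
        for epoch, it, _lr, loss, acc in ENTRY.findall(text)
    ]


def find_jumps(entries, ratio=1.5):
    """Return (iter, previous_loss, loss) wherever loss grows by more than `ratio`.

    `ratio` is an arbitrary threshold picked for this sketch.
    """
    jumps = []
    for prev, cur in zip(entries, entries[1:]):
        if cur[2] > ratio * prev[2]:
            jumps.append((cur[1], prev[2], cur[2]))
    return jumps


# Four consecutive entries copied from the log above.
sample = """
2020-08-26 16:18:15,603-INFO: epoch: 0, iter: 130, lr: 0.000100, 'loss': 1179.5132, 'acc': 0.075521, time: 1.534
2020-08-26 16:18:30,873-INFO: epoch: 0, iter: 140, lr: 0.000100, 'loss': 1429.1057, 'acc': 0.0625, time: 1.544
2020-08-26 16:18:46,195-INFO: epoch: 0, iter: 150, lr: 0.000100, 'loss': 2689.7197, 'acc': 0.0, time: 1.550
2020-08-26 16:19:01,544-INFO: epoch: 0, iter: 160, lr: 0.000100, 'loss': 2560.416, 'acc': 0.0, time: 1.528
"""

if __name__ == "__main__":
    for it, before, after in find_jumps(parse_log(sample)):
        print(f"iter {it}: loss {before} -> {after}")
```

Running the same functions over the full log lists every iteration where the loss steps up, which makes it easier to check whether the jumps line up with anything in the data or the schedule.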