Loss becomes abnormal after enabling memory_optimize
Created by: cserken
As the title says: after adding memory_optimize to model training, the loss appears to overflow after a few iterations:

Pass 0, trainbatch 0, loss 3.623, acc1 0.08333, acc5 0.4271 time 1.11 sec, batch 96
Pass 0, trainbatch 50, loss 4.479, acc1 0.1146, acc5 0.6458 time 0.80 sec, batch 96
Pass 0, trainbatch 100, loss 6.513, acc1 0.1146, acc5 0.6458 time 0.79 sec, batch 96
Pass 0, trainbatch 150, loss 3.125e+18, acc1 0.1875, acc5 0.6667 time 0.78 sec, batch 96
Pass 0, trainbatch 200, loss 1.042e+18, acc1 0.125, acc5 0.6458 time 0.79 sec, batch 96
Pass 0, trainbatch 250, loss 4.167e+18, acc1 0.1042, acc5 0.6042 time 0.80 sec, batch 96
Pass 0, trainbatch 300, loss 6.25e+18, acc1 0.09375, acc5 0.5104 time 0.79 sec, batch 96
Pass 0, trainbatch 350, loss 1.458e+19, acc1 0.1354, acc5 0.6667 time 0.80 sec, batch 96
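Without memory_optimize, the loss output is normal throughout.

I followed the approach from this issue, https://github.com/PaddlePaddle/Paddle/issues/11320, and passed fetch_list to memory_optimize, but the result is the same.

How can I track down the cause of a problem like this?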
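
As a possible first step, the log itself can be scanned to see where the loss first goes wrong. The snippet below is only a rough sketch and does not touch Paddle: the function name `first_abnormal_batch` and the `1e6` threshold are my own assumptions, and the regex assumes the log format shown above.

```python
import math
import re

# Matches lines such as:
# "Pass 0, trainbatch 150, loss 3.125e+18, acc1 0.1875, ..."
LOG_PATTERN = re.compile(r"Pass (\d+), trainbatch (\d+), loss (\S+?),")


def first_abnormal_batch(log_lines, threshold=1e6):
    """Return (pass_id, batch_id, loss) for the first logged loss that is
    non-finite or larger in magnitude than `threshold`, or None if all
    logged losses look sane. The threshold is an arbitrary assumption."""
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # skip lines that are not training log entries
        pass_id = int(match.group(1))
        batch_id = int(match.group(2))
        loss = float(match.group(3))
        if not math.isfinite(loss) or abs(loss) > threshold:
            return pass_id, batch_id, loss
    return None


log = [
    "Pass 0, trainbatch 50, loss 4.479, acc1 0.1146, acc5 0.6458 time 0.80 sec, batch 96",
    "Pass 0, trainbatch 100, loss 6.513, acc1 0.1146, acc5 0.6458 time 0.79 sec, batch 96",
    "Pass 0, trainbatch 150, loss 3.125e+18, acc1 0.1875, acc5 0.6667 time 0.78 sec, batch 96",
]
print(first_abnormal_batch(log))
```

Because the log is printed only every 50 batches, this narrows the failure to somewhere between batch 100 and batch 150. Printing the loss on every batch in that window should pin down the exact step.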