During LSTM training, the loss becomes NaN
Created by: Jinxi2
Version: paddle 1.5.1. Problem: when using fluid.layers.lstm() with the hidden parameter set to 128, the model trains normally; with all other settings unchanged, setting hidden to 512 makes both the loss and the accuracy turn into NaN after a few dozen batches. However, if the lstm is replaced with an fc layer, again with all other settings unchanged, the loss never becomes NaN. The input is a padded embedding of shape [seq_len, batch_size, embedding_size].
```python
import paddle.fluid as fluid

# The part above builds the embedding, with shape [seq_len, -1, embed_size].
hidden = 128
init_h = fluid.layers.fill_constant(
    [2, self.config.bat_size * 8, self.config.dim_hid], 'float32', 0.0)
init_c = fluid.layers.fill_constant(
    [2, self.config.bat_size * 8, self.config.dim_hid], 'float32', 0.0)
feature, last_h, last_c = fluid.layers.lstm(
    name='bilstm_' + type,
    input=embedding,
    is_bidirec=True,
    init_h=init_h,
    init_c=init_c,
    max_len=5,
    hidden_size=hidden,
    num_layers=1)
# Average over the time dimension before the fully connected layers.
feature = fluid.layers.reduce_mean(feature, dim=0)
# The fully connected layers follow.
```
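A common cause of NaN loss when widening an LSTM is exploding gradients: the larger hidden size increases the gradient magnitude until an update overflows. A minimal sketch of one common mitigation, global-norm gradient clipping applied before the optimizer step, using the fluid 1.x clip API; the variable name `avg_cost` and the clip threshold 5.0 are illustrative assumptions, not from the original report:

```python
import paddle.fluid as fluid

# Rescale all gradients so their global L2 norm stays <= 5.0 before each
# parameter update; this bounds the step size even when the raw LSTM
# gradients explode. (clip_norm=5.0 is an assumed, tunable value.)
fluid.clip.set_gradient_clip(
    clip=fluid.clip.GradientClipByGlobalNorm(clip_norm=5.0))

optimizer = fluid.optimizer.Adam(learning_rate=1e-3)
optimizer.minimize(avg_cost)  # avg_cost: the model's mean loss (assumed name)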
Training log:

```
INFO:(pid=21010): [2 epoch]-[3100 step] cost 6.8512 prec1 54.49% [p-dev 69.7%] each loss is 2.866, 0.446, 3.388, 0.151 each prec is 0.545, 0.826, 0.512, 0.949
INFO:(pid=21010): [2 epoch]-[3150 step] cost 7.0951 prec1 55.86% [p-dev 69.7%] each loss is 2.881, 0.51, 3.562, 0.142 each prec is 0.559, 0.777, 0.508, 0.949
INFO:(pid=21010): [3 epoch]-[3200 step] cost nan prec1 0.0% [p-dev 69.7%] each loss is nan, nan, nan, nan each prec is 0.0, 0.0, 0.0, 0.5
INFO:(pid=21010): [3 epoch]-[3250 step] cost nan prec1 0.0% [p-dev 69.7%] each loss is nan, nan, nan, nan each prec is 0.0, 0.0, 0.0, 0.5
```
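Since the log only shows NaN appearing somewhere between step 3150 and step 3200, one way to narrow down the offending batch is to check each fetched loss value and stop at the first non-finite one. A sketch assuming a typical fluid executor loop; `exe`, `train_program`, `feeder`, `train_reader`, and `avg_cost` are hypothetical names standing in for the report's actual training code:

```python
import numpy as np

for batch_id, data in enumerate(train_reader()):
    cost_val, = exe.run(train_program,
                        feed=feeder.feed(data),
                        fetch_list=[avg_cost])
    # Stop at the first non-finite loss so the offending batch (and its
    # inputs) can be inspected directly.
    if not np.all(np.isfinite(cost_val)):
        print('non-finite loss at batch %d' % batch_id)
        break
```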