NaN loss in deepvoice3 training
Created by: TheFlash10
Hey, I am currently training a deepvoice3 model on a custom dataset, and I am running into this error:
2020-06-04 12:01:42,075-WARNING: NaN or Inf found in input tensor.
2020-06-04 12:01:42,075-WARNING: NaN or Inf found in input tensor.
global_step: 20999 loss: nan 38it [01:14, 2.21s/it]
2020-06-04 12:01:43,406-WARNING: NaN or Inf found in input tensor.
2020-06-04 12:01:43,406-WARNING: NaN or Inf found in input tensor.
2020-06-04 12:01:43,407-WARNING: NaN or Inf found in input tensor.
2020-06-04 12:01:43,407-WARNING: NaN or Inf found in input tensor.
2020-06-04 12:01:43,407-WARNING: NaN or Inf found in input tensor.
2020-06-04 12:01:43,407-WARNING: NaN or Inf found in input tensor.
global_step: 21000 loss: nan 39it [01:17, 1.95s/it]
2020-06-04 12:01:45,869-WARNING: NaN or Inf found in input tensor.
2020-06-04 12:01:45,870-WARNING: NaN or Inf found in input tensor.
2020-06-04 12:01:45,870-WARNING: NaN or Inf found in input tensor.
2020-06-04 12:01:45,871-WARNING: NaN or Inf found in input tensor.
2020-06-04 12:01:45,872-WARNING: NaN or Inf found in input tensor.
2020-06-04 12:01:45,872-WARNING: NaN or Inf found in input tensor.
The loss comes out as 'nan'. When I rerun and resume from the latest checkpoint, training works fine again for a while, but the error still comes back from time to time.
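To pin down where the loss first turns non-finite, the training log can be scanned for the "global_step: ... loss: ..." fragments shown above. This is only a rough sketch: `first_non_finite_step` is a hypothetical helper, not part of deepvoice3, and the 0.41 loss value in the sample string is made up for illustration.

```python
import math
import re

# Matches fragments such as "global_step: 20999 loss: nan" from the log above.
STEP_LOSS = re.compile(r"global_step:\s*(\d+)\s+loss:\s*(\S+)")


def first_non_finite_step(log_text):
    """Return the first global_step whose loss is NaN or Inf, or None."""
    for match in STEP_LOSS.finditer(log_text):
        step, raw_loss = int(match.group(1)), match.group(2)
        try:
            loss = float(raw_loss)  # float() accepts "nan" and "inf"
        except ValueError:
            continue  # the token after "loss:" was not a number; skip it
        if not math.isfinite(loss):
            return step
    return None


# Illustrative input only: the first loss value is invented.
sample = (
    "global_step: 20998 loss: 0.41 "
    "global_step: 20999 loss: nan 38it [01:14, 2.21s/it]"
)
print(first_non_finite_step(sample))
```

Knowing the first bad step makes it easier to resume from a checkpoint saved shortly before it and inspect the batches around that point.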
My guess is that the learning rate is too high, which makes the loss blow up to 'nan'. I am using the default config file; the only thing I changed is the sampling rate.
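If the learning rate is indeed the cause, two common mitigations besides lowering it are clipping gradients by their global norm and skipping any update whose gradients are not finite. The snippet below is a framework-agnostic sketch in plain Python on flat lists of floats; `clip_by_global_norm` and `sgd_step` are hypothetical names and do not reflect how the deepvoice3 training script is actually written.

```python
import math


def global_norm(grads):
    """L2 norm over all gradient values taken together."""
    return math.sqrt(sum(g * g for g in grads))


def clip_by_global_norm(grads, max_norm):
    """Rescale gradients so their global L2 norm is at most max_norm."""
    norm = global_norm(grads)
    if norm <= max_norm or norm == 0.0:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


def sgd_step(params, grads, lr, max_norm):
    """Apply one plain SGD update.

    Returns (new_params, applied). The update is skipped, and the
    parameters are returned unchanged, if any gradient is NaN or Inf.
    """
    if not all(math.isfinite(g) for g in grads):
        return list(params), False
    clipped = clip_by_global_norm(grads, max_norm)
    return [p - lr * g for p, g in zip(params, clipped)], True


# A gradient of norm 5 is scaled down to norm 1 before the update.
print(sgd_step([1.0, 1.0], [3.0, 4.0], lr=0.1, max_norm=1.0))
# A NaN gradient leaves the parameters untouched.
print(sgd_step([1.0, 1.0], [float("nan"), 0.0], lr=0.1, max_norm=1.0))
```

Skipping a bad step only hides the symptom, so it is most useful together with a lower learning rate or clipping, which address the likely cause.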
Could you suggest something to fix this issue?
Thanks