Using the emotion detection model library in PaddleNLP: avg loss becomes nan when there are several thousand text classes. How can this be fixed?
Created by: xfyu1999
Versions: fluid 1.4, CUDA 9.0, cuDNN 7.0. My class labels range from 0 to 6600, so there are more than 6,000 classes. The log is as follows:
----------- Configuration Arguments -----------
batch_size: 64
config_path: ./config.json
data_dir: ./data/
do_infer: False
do_train: True
do_val: True
epoch: 60
init_checkpoint: None
lr: 0.002
output_dir: ./save_models/bilstm
random_seed: 0
save_steps: 1000
skip_steps: 1000
task_name: topic_detection
use_cuda: True
validation_steps: 1000
verbose: False
vocab_path: ./data/vocab.txt
Num train examples: 966363
Max train steps: 905966
step: 1000, avg loss: nan, avg acc: 0.312500, speed: 68.159301 steps/s
[dev evaluation] avg loss: nan, avg acc: 0.285054, elapsed time: 4.126837 s
step: 2000, avg loss: nan, avg acc: 0.328125, speed: 54.500267 steps/s
[dev evaluation] avg loss: nan, avg acc: 0.288556, elapsed time: 4.153094 s
step: 3000, avg loss: nan, avg acc: 0.265625, speed: 55.482941 steps/s
[dev evaluation] avg loss: nan, avg acc: 0.289574, elapsed time: 5.376837 s
step: 4000, avg loss: nan, avg acc: 0.234375, speed: 51.638327 steps/s
[dev evaluation] avg loss: nan, avg acc: 0.289897, elapsed time: 3.970590 s
step: 5000, avg loss: nan, avg acc: 0.312500, speed: 55.464049 steps/s
[dev evaluation] avg loss: nan, avg acc: 0.289997, elapsed time: 3.967897 s
step: 6000, avg loss: nan, avg acc: 0.281250, speed: 55.519344 steps/s
[dev evaluation] avg loss: nan, avg acc: 0.290493, elapsed time: 4.155741 s
step: 7000, avg loss: nan, avg acc: 0.281250, speed: 54.741381 steps/s
[dev evaluation] avg loss: nan, avg acc: 0.290816, elapsed time: 4.073371 s
step: 8000, avg loss: nan, avg acc: 0.359375, speed: 55.044019 steps/s
[dev evaluation] avg loss: nan, avg acc: 0.290990, elapsed time: 4.118042 s
step: 9000, avg loss: nan, avg acc: 0.343750, speed: 50.080564 steps/s
[dev evaluation] avg loss: nan, avg acc: 0.290990, elapsed time: 4.061233 s
step: 10000, avg loss: nan, avg acc: 0.281250, speed: 53.257747 steps/s
[dev evaluation] avg loss: nan, avg acc: 0.291015, elapsed time: 3.943664 s
step: 11000, avg loss: nan, avg acc: 0.281250, speed: 55.656405 steps/s
[dev evaluation] avg loss: nan, avg acc: 0.291139, elapsed time: 4.204237 s
step: 12000, avg loss: nan, avg acc: 0.250000, speed: 53.648392 steps/s
[dev evaluation] avg loss: nan, avg acc: 0.291164, elapsed time: 3.995078 s
At step 69000 the training loss is finite but enormous:
step: 69000, avg loss: 3125000062627741696.000000, avg acc: 0.359375, speed: 54.769828 steps/s
[dev evaluation] avg loss: nan, avg acc: 0.291909, elapsed time: 4.050528 s
After that it becomes nan again.
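Since the labels run from 0 to 6600, the classifier needs at least 6601 output classes, and a label at or beyond the class count the network was built with is one common source of nan losses. The sketch below is a hypothetical pre-training check, not part of PaddleNLP: it assumes each data line has the layout `label<TAB>text` with an optional `label` header row, and `find_bad_labels` and `num_classes` are illustrative names. Set `num_classes` to whatever class count the network definition and `config.json` actually use.

```python
from typing import Iterable, List, Tuple


def find_bad_labels(lines: Iterable[str], num_classes: int) -> List[Tuple[int, str]]:
    """Return (line_number, raw_label) for every label outside [0, num_classes).

    Assumes the hypothetical layout "label<TAB>text" with an optional
    "label" header on the first line; adapt the parsing to the real data.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        raw_label = line.split("\t", 1)[0]
        if lineno == 1 and raw_label == "label":
            continue  # skip the header row
        try:
            label = int(raw_label)
        except ValueError:
            bad.append((lineno, raw_label))  # non-integer label
            continue
        if not 0 <= label < num_classes:
            bad.append((lineno, raw_label))  # out-of-range label
    return bad


if __name__ == "__main__":
    sample = ["label\ttext_a", "0\thello", "6600\tfoo", "6601\tbar"]
    # With labels 0..6600 the model must have 6601 classes.
    print(find_bad_labels(sample, 6601))
    # A model built for only 3 classes would reject most of these labels.
    print(find_bad_labels(sample, 3))
```

On a real file the call would look like `find_bad_labels(open("./data/train.tsv", encoding="utf-8"), 6601)`, where the file name is an assumption; any non-empty result means some labels do not fit the model's output layer.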
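The finite but huge loss at step 69000 (about 3.1e18) suggests the logits are diverging. The generic illustration below is plain Python, not Paddle's implementation: it shows how computing softmax first and then taking the log turns one large logit into nan, while the log-sum-exp form stays finite. If the network applies softmax and cross-entropy as separate layers, Fluid's fused `softmax_with_cross_entropy` op, a learning rate below the current `lr: 0.002`, or gradient clipping are common things to try; these are general suggestions, not a confirmed fix for this issue.

```python
import math
from typing import List


def _exp_or_inf(z: float) -> float:
    # Python's math.exp raises on overflow; GPU float kernels return inf instead.
    try:
        return math.exp(z)
    except OverflowError:
        return math.inf


def naive_cross_entropy(logits: List[float], label: int) -> float:
    """Softmax first, then -log(p): breaks down for large logits."""
    exps = [_exp_or_inf(z) for z in logits]
    p = exps[label] / sum(exps)  # inf / inf -> nan
    return math.inf if p == 0.0 else -math.log(p)


def stable_cross_entropy(logits: List[float], label: int) -> float:
    """Log-sum-exp form: subtract the max logit before exponentiating."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


if __name__ == "__main__":
    big = [1000.0, 0.0, 0.0]
    print(naive_cross_entropy(big, 0))   # nan
    print(stable_cross_entropy(big, 0))  # 0.0
```

For moderate logits both functions agree; they only differ once an exponent overflows, which is the regime a diverging model ends up in.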