When training a recognition model with the attention model, train_acc is always 0.000000, and after 2 epochs train_cost becomes nan
Created by: yanmeizhao
When I use the attention model to train a recognition model, train_acc is always 0.000000. After 2 epochs, train_cost becomes nan and train_acc is still 0.000000. The training log is as follows:
I0802 11:10:00.041497 2313 parallel_executor.cc:329] The number of CUDAPlace, which is used in ParallelExecutor, is 4. And the Program will be copied 4 copies
I0802 11:10:01.277854 2313 build_strategy.cc:340] SeqOnlyAllReduceOps:0, num_trainers:1
Time: 1564715520.4697154; Iter[1000]; Avg loss: 25.504; Avg seq err: 1.000 kpis train_cost 25.504290 kpis train_acc 0.000000
Time: 1564715634.926904; Iter[2000]; Avg loss: 22.807; Avg seq err: 1.000 kpis train_cost 22.807307 kpis train_acc 0.000000
Time: 1564715748.7402227; Iter[3000]; Avg loss: nan; Avg seq err: 1.000 kpis train_cost nan kpis train_acc 0.000000
Time: 1564715862.1550791; Iter[4000]; Avg loss: nan; Avg seq err: 1.000 kpis train_cost nan kpis train_acc 0.000000
Time: 1564715975.961236; Iter[5000]; Avg loss: nan; Avg seq err: 1.000 kpis train_cost nan kpis train_acc 0.000000
Time: 1564716089.7519104; Iter[6000]; Avg loss: nan; Avg seq err: 1.000 kpis train_cost nan kpis train_acc 0.000000
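The following is a minimal Python sketch that parses the progress lines above. It reports the first logged iteration where the average loss is NaN and checks train_acc across all logged steps. The regular expression and the helper names `parse_log` and `first_nan_iteration` are hypothetical. The pattern was inferred only from the log format shown here and is not part of the model code.

```python
import math
import re

# The six progress lines copied verbatim from the training log above.
LOG = """\
Time: 1564715520.4697154; Iter[1000]; Avg loss: 25.504; Avg seq err: 1.000 kpis train_cost 25.504290 kpis train_acc 0.000000
Time: 1564715634.926904; Iter[2000]; Avg loss: 22.807; Avg seq err: 1.000 kpis train_cost 22.807307 kpis train_acc 0.000000
Time: 1564715748.7402227; Iter[3000]; Avg loss: nan; Avg seq err: 1.000 kpis train_cost nan kpis train_acc 0.000000
Time: 1564715862.1550791; Iter[4000]; Avg loss: nan; Avg seq err: 1.000 kpis train_cost nan kpis train_acc 0.000000
Time: 1564715975.961236; Iter[5000]; Avg loss: nan; Avg seq err: 1.000 kpis train_cost nan kpis train_acc 0.000000
Time: 1564716089.7519104; Iter[6000]; Avg loss: nan; Avg seq err: 1.000 kpis train_cost nan kpis train_acc 0.000000
"""

# Hypothetical pattern, inferred only from the line format shown above.
PATTERN = re.compile(
    r"Iter\[(\d+)\]; Avg loss: ([^;]+); Avg seq err: (\S+) .*train_acc (\S+)"
)


def parse_log(text):
    """Return (iteration, loss, seq_err, acc) tuples for each progress line."""
    rows = []
    for line in text.splitlines():
        match = PATTERN.search(line)
        if match:
            it, loss, seq_err, acc = match.groups()
            # float() accepts the literal "nan" printed in the log.
            rows.append((int(it), float(loss), float(seq_err), float(acc)))
    return rows


def first_nan_iteration(rows):
    """Return the first logged iteration whose average loss is NaN, or None."""
    for it, loss, _, _ in rows:
        if math.isnan(loss):
            return it
    return None


rows = parse_log(LOG)
print("first NaN loss at Iter", first_nan_iteration(rows))
print("max train_acc:", max(acc for *_, acc in rows))
```

On these six lines, the sketch reports that the loss first appears as NaN at Iter 3000. This means the loss diverged somewhere between Iter 2000 and Iter 3000. Over that same window, train_acc never rises above 0.0 and the sequence error stays at 1.000.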
The dataset used to train the recognition model is the one provided under "实例数据" (example data) in the README.
Thank you in advance!