error occurs when training model on my own dataset
Closed
Created by: wujsy
Hi, I tried to train a Mandarin model on my own dataset (200 h) using 6 GPUs (K80c) in Docker with Ubuntu 16.04, but I got the following error:
...................................................................................................
Pass: 0, Batch: 100, TrainCost: 43.864211
...................................................................................................
Pass: 0, Batch: 200, TrainCost: 46.124234
...................................................................................................
Pass: 0, Batch: 300, TrainCost: 60.454348
...................................................................................................
Pass: 0, Batch: 400, TrainCost: 50.351969
...................................................................................................
Pass: 0, Batch: 500, TrainCost: 60.703720
...................................................................................................
Pass: 0, Batch: 600, TrainCost: 65.057645
...................................................................................................
Pass: 0, Batch: 700, TrainCost: 45.140548
...................................................................................................
Pass: 0, Batch: 800, TrainCost: 61.789109
...................................................................................................
Pass: 0, Batch: 900, TrainCost: 54.016421
...................................................................................................
Pass: 0, Batch: 1000, TrainCost: 41.392280
...................................................................................................
Pass: 0, Batch: 1100, TrainCost: 37.685595
...................................................................................................
Pass: 0, Batch: 1200, TrainCost: 27.951182
...................................................................................................
Pass: 0, Batch: 1300, TrainCost: 34.722017
...................................................................................................
Pass: 0, Batch: 1400, TrainCost: 34.317093
..............................................................
Traceback (most recent call last):
  File "train.py", line 129, in <module>
    main()
  File "train.py", line 125, in main
    train()
  File "train.py", line 116, in train
    test_off=args.test_off)
  File "/data/wjx/workspace/DeepSpeech/model_utils/model.py", line 155, in train
    feeding=adapted_feeding_dict)
  File "/usr/local/lib/python2.7/dist-packages/paddle/v2/trainer.py", line 201, in train
    gm=self.gradient_machine))
  File "/data/wjx/workspace/DeepSpeech/model_utils/model.py", line 140, in event_handler
    feeding=adapted_feeding_dict)
  File "/usr/local/lib/python2.7/dist-packages/paddle/v2/trainer.py", line 234, in test
    evaluator=evaluator, cost=total_cost / num_samples)
ZeroDivisionError: float division by zero
Failed in training!
Can anyone help me? Thanks!
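Going only by the traceback, the failing expression is `total_cost / num_samples` in `trainer.test`, so `num_samples` was presumably zero when the test step ran. One possible cause is that the data fed to the test step yielded no samples, for example an empty manifest or one where every utterance is filtered out by a duration limit. The sketch below is a quick way to check that. It assumes a JSON-lines manifest with a `duration` field per entry. The function name `count_usable_samples`, its parameters, and the duration bounds are hypothetical and are not part of DeepSpeech or PaddlePaddle.

```python
import json


def count_usable_samples(manifest_lines, min_duration=0.0, max_duration=float("inf")):
    """Count manifest entries whose duration lies in [min_duration, max_duration].

    Assumption: each non-empty line is a JSON object with a numeric
    "duration" field (in seconds). This helper is hypothetical.
    """
    count = 0
    for line in manifest_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        if min_duration <= entry["duration"] <= max_duration:
            count += 1
    return count


if __name__ == "__main__":
    # Illustrative in-memory manifest; a real check would pass an open file object.
    sample = [
        '{"audio_filepath": "a.wav", "duration": 3.2, "text": "x"}',
        '{"audio_filepath": "b.wav", "duration": 40.0, "text": "y"}',
    ]
    usable = count_usable_samples(sample, min_duration=0.0, max_duration=27.0)
    if usable == 0:
        print("No usable samples: the test step would divide by zero.")
    else:
        print("Usable samples: %d" % usable)
```

If the count for the manifest used at test time comes out as zero, that would be consistent with the `ZeroDivisionError` above. A nonzero count would point to a cause elsewhere.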