error occurs when training model on my own dataset (#235) · Issue · PaddlePaddle / DeepSpeech

error occurs when training model on my own dataset

Created by: wujsy

Hi, I tried to train a Mandarin model on my own dataset (200 h) using 6 GPUs (K80c) in Docker with Ubuntu 16.04, but I got the following error:

...................................................................................................
Pass: 0, Batch: 100, TrainCost: 43.864211
...................................................................................................
Pass: 0, Batch: 200, TrainCost: 46.124234
...................................................................................................
Pass: 0, Batch: 300, TrainCost: 60.454348
...................................................................................................
Pass: 0, Batch: 400, TrainCost: 50.351969
...................................................................................................
Pass: 0, Batch: 500, TrainCost: 60.703720
...................................................................................................
Pass: 0, Batch: 600, TrainCost: 65.057645
...................................................................................................
Pass: 0, Batch: 700, TrainCost: 45.140548
...................................................................................................
Pass: 0, Batch: 800, TrainCost: 61.789109
...................................................................................................
Pass: 0, Batch: 900, TrainCost: 54.016421
...................................................................................................
Pass: 0, Batch: 1000, TrainCost: 41.392280
...................................................................................................
Pass: 0, Batch: 1100, TrainCost: 37.685595
...................................................................................................
Pass: 0, Batch: 1200, TrainCost: 27.951182
...................................................................................................
Pass: 0, Batch: 1300, TrainCost: 34.722017
...................................................................................................
Pass: 0, Batch: 1400, TrainCost: 34.317093
..............................................................
Traceback (most recent call last):
  File "train.py", line 129, in <module>
    main()
  File "train.py", line 125, in main
    train()
  File "train.py", line 116, in train
    test_off=args.test_off)
  File "/data/wjx/workspace/DeepSpeech/model_utils/model.py", line 155, in train
    feeding=adapted_feeding_dict)
  File "/usr/local/lib/python2.7/dist-packages/paddle/v2/trainer.py", line 201, in train
    gm=self.gradient_machine))
  File "/data/wjx/workspace/DeepSpeech/model_utils/model.py", line 140, in event_handler
    feeding=adapted_feeding_dict)
  File "/usr/local/lib/python2.7/dist-packages/paddle/v2/trainer.py", line 234, in test
    evaluator=evaluator, cost=total_cost / num_samples)
ZeroDivisionError: float division by zero
Failed in training!

Can anyone help me? Thanks!
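
The last frame of the traceback divides total_cost by num_samples inside the trainer's test step, so one possible reading is that the evaluation reader produced no samples at all, for example because the manifest used for testing is empty or every entry falls outside the allowed duration range. The following is a minimal sketch for checking that, not part of DeepSpeech itself. The helper name count_usable_entries is hypothetical, and it assumes the manifest holds one JSON object per line with a "duration" field in seconds; adjust the field name if your manifest differs.

```python
import json


def count_usable_entries(manifest_path, min_duration=0.0, max_duration=float("inf")):
    """Return (total, usable) entry counts for a JSON-lines manifest.

    Hypothetical helper. Assumes each non-empty line is a JSON object
    with a "duration" field given in seconds.
    """
    total = 0
    usable = 0
    with open(manifest_path) as manifest:
        for line in manifest:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            total += 1
            entry = json.loads(line)
            # Count the entry only if its duration lies inside the window.
            if min_duration <= float(entry["duration"]) <= max_duration:
                usable += 1
    return total, usable
```

Running it on the manifest used for evaluation, with the same duration limits as the training run, shows whether any samples survive filtering. If the usable count is 0, the division by num_samples would fail exactly as in the traceback above.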
