transformer: 'utf-8' codec can't decode byte 0xed
Created by: Genie-Liu
I downloaded the preprocessed WMT16_EN_DE data from https://transformer-res.bj.bcebos.com/wmt16_ende_data_bpe_clean.tar.gz.
When I run infer.py, a UnicodeDecodeError is raised:
Traceback (most recent call last):
  File "infer.py", line 323, in <module>
    args = parse_args()
  File "infer.py", line 81, in parse_args
    src_dict = reader.DataReader.load_dict(args.src_vocab_fpath)
  File "/Users/xxx/Documents/GitHub/models/PaddleNLP/neural_machine_translation/transformer/reader.py", line 281, in load_dict
    line = line.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 0: invalid continuation byte
Operating System: macOS 10.14
Python version: Python 3.6.5
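
For anyone trying to narrow this down, here is a minimal sketch that raises the same kind of error and shows one tolerant way to decode a line. It assumes the vocabulary file contains a line whose bytes are not valid UTF-8; I have not confirmed that this is the actual cause. The sample bytes and the helper name `decode_line` are hypothetical and are not taken from reader.py.

```python
def decode_line(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode one raw line, replacing undecodable bytes instead of raising.

    Hypothetical helper for illustration; not part of reader.py.
    """
    return raw.decode(encoding, errors="replace")


# Hypothetical sample: 0xed starts a 3-byte UTF-8 sequence, but the next
# byte (ASCII "a") is not a continuation byte, so strict decoding fails.
sample = b"\xedabc\n"

try:
    sample.decode()  # same call style as reader.py line 281 in the traceback
except UnicodeDecodeError as exc:
    error_message = str(exc)
else:
    error_message = ""

# The tolerant decode keeps the readable part and marks the bad byte with U+FFFD.
tolerant = decode_line(sample)

print(error_message)
print(repr(tolerant))
```

Replacing bad bytes only hides the symptom. If the dictionary file was written in a different encoding, passing that encoding to `decode` would be the better fix, but I don't know which encoding that would be.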