KeyError when using a custom dataset for sequence labeling
Created by: Stephen-L-Woo
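Platform: AI Studio. PaddleHub version: 1.5.0. Model version: ernie-1.1.0.
The project I followed: "PaddleHub in Practice: Using the Semantic Pretrained Model ERNIE to Improve Information Extraction" https://aistudio.baidu.com/aistudio/projectdetail/184200
The only difference in my dataset format is that there is no separator ‘’ between the items; otherwise it has the same format as MSRA_NER() in the official example.
However, training keeps failing with the error below, and several similar errors are reported as well. What is causing this?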
2020-02-06 14:58:53,414-WARNING: Your decorated reader has raised an exception!
Exception in thread Thread-9:
Traceback (most recent call last):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/io.py", line 474, in __provider_thread__
    six.reraise(*sys.exc_info())
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py", line 693, in reraise
    raise value
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/io.py", line 455, in __provider_thread__
    for tensors in func():
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/io.py", line 506, in __tensor_provider__
    for slots in paddle_reader():
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/data_feeder.py", line 488, in __reader_creator__
    for item in reader():
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlehub/reader/nlp_reader.py", line 257, in wrapper
    examples, batch_size, phase=phase):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlehub/reader/nlp_reader.py", line 187, in _prepare_batch_data
    self.tokenizer, phase)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlehub/reader/nlp_reader.py", line 463, in _convert_example_to_record
    for label in labels] + [no_entity_id]
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlehub/reader/nlp_reader.py", line 463, in <listcomp>
    for label in labels] + [no_entity_id]
KeyError: 'OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO'
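
The key in the KeyError is a whole run of labels rather than a single label, which suggests the label field was never split into per-token labels. The sketch below reproduces that symptom in isolation. It is not the PaddleHub code: `SEP`, `label_map`, and `labels_to_ids` are hypothetical stand-ins, and it assumes the reader splits each field on a separator character (taken here to be the control character `\x02`) before looking labels up.

```python
# Minimal reproduction of the symptom, independent of PaddleHub.
# Assumption: the reader splits text and label fields on a separator
# character before mapping each label to an id. SEP, label_map and
# labels_to_ids are illustrative names, not PaddleHub APIs.
SEP = "\x02"
label_map = {"B-PER": 0, "I-PER": 1, "O": 2}


def labels_to_ids(label_field, sep=SEP):
    """Split a label field on `sep` and map every piece to its id."""
    return [label_map[label] for label in label_field.split(sep)]


# With separators, each label is looked up individually.
with_sep = SEP.join(["B-PER", "I-PER", "O"])
print(labels_to_ids(with_sep))

# Without separators, split() returns the whole string as one "label",
# so the lookup fails with that whole string as the missing key.
without_sep = "OOO"
try:
    labels_to_ids(without_sep)
except KeyError as err:
    print("KeyError:", err)
```

If this assumption holds for the reader in use, joining the per-token labels (and the tokens) with the same separator when writing the dataset file should avoid the error.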