How can custom features be added when using ERNIE for a sequence labeling task? (#571) · Issue · PaddlePaddle / PaddleHub


Created by: yaweisun

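I am using the interfaces provided by paddlehub for sequence labeling, with "ernie_v2_eng_base" as the pretrained model. During data preprocessing I hand-collected some features and stored them in the "text_b" field of the data file. Previously the feature took only two values, 0 and 1, so I assigned it directly to "text_type_ids" in the Reader class, and the results were acceptable. The features I collect now can take any value from -1 to 10, and the same approach no longer works. Is there a more standard way to handle this?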
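One common way to handle a multi-valued categorical feature is to give it its own embedding table instead of reusing "text_type_ids", and to concatenate the looked-up vector onto each token's encoder output before the tagging layer. The sketch below illustrates only the index arithmetic and the concatenation, using NumPy. It is not PaddleHub API: `feature_to_index`, `add_feature_embedding`, `FEATURE_DIM`, and the random arrays standing in for encoder output are all hypothetical names and values chosen for illustration.

```python
import numpy as np

# All names here are hypothetical; none of this is PaddleHub API.
FEATURE_MIN, FEATURE_MAX = -1, 10                # value range stated in the question
NUM_FEATURE_IDS = FEATURE_MAX - FEATURE_MIN + 1  # 12 distinct feature ids
FEATURE_DIM = 16                                 # assumed width of the feature embedding


def feature_to_index(values):
    """Shift raw feature values in [-1, 10] to embedding indices in [0, 11]."""
    values = np.asarray(values, dtype=np.int64)
    if values.min() < FEATURE_MIN or values.max() > FEATURE_MAX:
        raise ValueError("feature value outside the expected range")
    return values - FEATURE_MIN


def add_feature_embedding(token_repr, feature_values, table):
    """Concatenate a looked-up feature embedding onto each token vector."""
    idx = feature_to_index(feature_values)  # shape: (seq_len,)
    feat = table[idx]                       # shape: (seq_len, FEATURE_DIM)
    return np.concatenate([token_repr, feat], axis=-1)


rng = np.random.default_rng(0)
# In a real model this table would be a trainable parameter.
feature_table = rng.normal(size=(NUM_FEATURE_IDS, FEATURE_DIM))
# Stand-in for the encoder's per-token output (5 tokens, hidden size 768).
token_repr = rng.normal(size=(5, 768))
features = [-1, 0, 3, 10, 2]
combined = add_feature_embedding(token_repr, features, feature_table)
```

The shift by `FEATURE_MIN` matters because an embedding lookup needs non-negative indices, so -1 cannot be used as an id directly. Whether concatenation or addition works better for a given task is an empirical question this sketch does not settle.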

The current error is shown below:

[2020-05-06 22:21:03,427] [ INFO] - Installing ernie_v2_eng_base module
[2020-05-06 22:21:03,457] [ INFO] - Module ernie_v2_eng_base already installed in /home/aistudio/.paddlehub/modules/ernie_v2_eng_base
[2020-05-06 22:21:04,137] [ INFO] - Set maximum sequence length of input tensor to 512
[2020-05-06 22:21:04,141] [ INFO] - The shape of input tensor[input_ids] set to [-1, 512, 1]
[2020-05-06 22:21:04,142] [ INFO] - The shape of input tensor[position_ids] set to [-1, 512, 1]
[2020-05-06 22:21:04,142] [ INFO] - The shape of input tensor[segment_ids] set to [-1, 512, 1]
[2020-05-06 22:21:04,143] [ INFO] - The shape of input tensor[input_mask] set to [-1, 512, 1]
[2020-05-06 22:21:04,143] [ INFO] - The shape of input tensor[task_ids] set to [-1, 512, 1]
[2020-05-06 22:21:04,144] [ INFO] - 199 pretrained paramaters loaded by PaddleHub
[2020-05-06 22:21:04,160] [ INFO] - Dataset label map = {'B': 0, 'I': 1, 'E': 2, 'S': 3, 'O': 4}
[2020-05-06 22:21:04,202] [ INFO] - Checkpoint dir: work/model/ernie_material_data_tfidf_True
2020-05-06 22:21:06,857-WARNING: paddle.fluid.layers.py_reader() may be deprecated in the near future. Please use paddle.fluid.io.DataLoader.from_generator() instead.
[2020-05-06 22:21:07,134] [ INFO] - Strategy with scheduler: {'warmup': 0.0, 'linear_decay': {'start_point': 0.0, 'end_learning_rate': 0}, 'noam_decay': False, 'discriminative': {'blocks': 0, 'factor': 2.6}, 'gradual_unfreeze': 0, 'slanted_triangle': {'cut_fraction': 0.0, 'ratio': 32}}, regularization: {'L2': 0.0, 'L2SP': 0.0, 'weight_decay': 0.01} and clip: {'GlobalNorm': 1.0, 'Norm': 0.0}
[2020-05-06 22:22:06,467] [ INFO] - Try loading checkpoint from work/model/ernie_material_data_tfidf_True/ckpt.meta
[2020-05-06 22:22:06,469] [ INFO] - PaddleHub model checkpoint not found, start from scratch...
[2020-05-06 22:22:06,549] [ INFO] - PaddleHub finetune start
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py:782: UserWarning: The following exception is not an EOF exception.
"The following exception is not an EOF exception.")
---------------------------------------------------------------------------
EnforceNotMet Traceback (most recent call last)
in
----> 1 my_task = my_code(data_type="material", sent_type="data_tfidf", max_seq_len=512, with_content_feature=True)

in my_code(data_type, sent_type, max_seq_len, with_content_feature, num_epoch)
     59 )
     60
---> 61 seq_label_task.finetune_and_eval()
     62 return seq_label_task

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlehub/finetune/task/base_task.py in finetune_and_eval(self)
    761
    762 def finetune_and_eval(self):
--> 763 return self.finetune(do_eval=True)
    764
    765 def finetune(self, do_eval=False):

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlehub/finetune/task/base_task.py in finetune(self, do_eval)
    773 while self.current_epoch <= self.config.num_epoch:
    774 self.config.strategy.step()
--> 775 run_states = self._run(do_eval=do_eval)
    776 self.env.current_epoch += 1
    777

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlehub/finetune/task/base_task.py in _run(self, do_eval)
    826 with fluid.program_guard(self.main_program, self.startup_program):
    827 if self.config.use_pyreader:
--> 828 return self._run_with_py_reader(do_eval=do_eval)
    829 return self._run_with_data_feeder(do_eval=do_eval)
    830

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlehub/finetune/task/base_task.py in _run_with_py_reader(self, do_eval)
    909 self.main_program_to_be_run,
    910 fetch_list=self.fetch_list,
--> 911 return_numpy=False)
    912 fetch_result = [np.array(x) for x in fetch_result]
    913

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py in run(self, program, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache)
    781 warnings.warn(
    782 "The following exception is not an EOF exception.")
--> 783 six.reraise(*sys.exc_info())
    784
    785 def _run_impl(self, program, feed, fetch_list, feed_var_name,

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py in reraise(tp, value, tb)
    691 if value.__traceback__ is not tb:
    692 raise value.with_traceback(tb)
--> 693 raise value
    694 finally:
    695 value = None

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py in run(self, program, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache)
    776 scope=scope,
    777 return_numpy=return_numpy,
--> 778 use_program_cache=use_program_cache)
    779 except Exception as e:
    780 if not isinstance(e, core.EOFException):

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py in _run_impl(self, program, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache)
    841 fetch_list=fetch_list,
    842 fetch_var_name=fetch_var_name,
--> 843 return_numpy=return_numpy)
    844
    845 def _run_program(self, program, feed, fetch_list, feed_var_name,

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py in _run_parallel(self, program, scope, feed, fetch_list, fetch_var_name, return_numpy)
    675
    676 fetch_var_names = list(map(_to_name_str, fetch_list))
--> 677 tensors = exe.run(fetch_var_names)._move_to_list()
    678 return as_numpy(tensors) if return_numpy else tensors
    679

EnforceNotMet:

C++ Call Stacks (More useful to developers):

0 std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2 paddle::platform::CUDADeviceContext::Wait() const
3 paddle::framework::details::ScopeBufferedMonitor::Apply(std::function<void ()> const&, bool)
4 paddle::framework::details::ScopeBufferedSSAGraphExecutor::Run(std::vector<std::string, std::allocator<std::string> > const&)
5 paddle::framework::ParallelExecutor::Run(std::vector<std::string, std::allocator<std::string> > const&)

Error Message Summary:

FatalError: cudaStreamSynchronize raises error: unspecified launch failure, errono: 4: unspecified launch failure at (/paddle/paddle/fluid/platform/device_context.cc:331)
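The CUDA message above does not say which operator failed. One possible cause, which is an assumption and is not confirmed by this log, is an embedding lookup receiving an index outside its table, for example a value of -1 or 10 placed in "text_type_ids" when the segment embedding table has only a few rows. A small check run on the CPU before feeding the batch can turn that into a readable error. In the sketch below, `check_id_range` and `ASSUMED_SEGMENT_ROWS` are hypothetical; the real table size should be read from the model configuration.

```python
import numpy as np


def check_id_range(name, ids, table_rows):
    """Raise a readable error if any id would fall outside an embedding table."""
    ids = np.asarray(ids)
    bad = ids[(ids < 0) | (ids >= table_rows)]
    if bad.size:
        offending = sorted(set(bad.tolist()))
        raise ValueError(
            f"{name}: ids {offending} outside [0, {table_rows - 1}]"
        )


# Assumption for illustration only; look up the real value in the model config.
ASSUMED_SEGMENT_ROWS = 2

# Binary features fit the assumed table, so this call passes silently.
check_id_range("segment_ids", [0, 1, 1, 0], ASSUMED_SEGMENT_ROWS)

# Features in [-1, 10] do not fit, and the check reports which values are at fault.
try:
    check_id_range("segment_ids", [-1, 0, 10], ASSUMED_SEGMENT_ROWS)
except ValueError as err:
    message = str(err)
```

If the check passes for every input tensor and the failure persists, the cause lies elsewhere and this assumption can be ruled out.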
