About the Token-Document Relation Prediction Task
Created by: tanaka-jp
I have a question about the pre-training task. The paper describes it as follows: "We add a task to predict whether the token in a segment appears in other segments of the original document. Empirically, the words that appear in many parts of a document are usually commonly-used words or relevant with the main topics of the document. Therefore, through identifying the key words of a document appearing in the segment, the task can enable the ability of a model to capture the key words of the document to some extent."
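I have two questions.
Question 1: How are the tokens to be predicted marked in the input data?
Question 2: If a token appears many times in a document, it may simply be a very common token, or it may be a token that is important to this particular document. Predicting only whether a token appears multiple times does not seem sufficient to yield a model that can "capture the key words". Was any further processing applied, for example with TF-IDF?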
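To make question 2 concrete, here is a minimal sketch of the labeling as I read the quoted sentence. It is not the paper's implementation: `relation_labels`, `smoothed_idf`, the toy segments, and the background corpus are all hypothetical, made up only for illustration.

```python
import math


def relation_labels(segments):
    """Label each token 1 if it also appears in another segment of the
    same document, else 0 (my reading of the quoted task description)."""
    labels = []
    for i, segment in enumerate(segments):
        # Collect every token that occurs in the other segments.
        others = set()
        for j, other in enumerate(segments):
            if j != i:
                others.update(other)
        labels.append([(tok, int(tok in others)) for tok in segment])
    return labels


def smoothed_idf(token, corpus):
    """Smoothed inverse document frequency over a background corpus
    (assumed weighting; the paper does not say that it uses one)."""
    df = sum(1 for doc in corpus if token in doc)
    return math.log((1 + len(corpus)) / (1 + df))


# Toy document split into three segments (hypothetical data).
segments = [
    "the model reads the segment".split(),
    "the task labels each token".split(),
    "the model predicts the label".split(),
]
labels = relation_labels(segments)

# Hypothetical background corpus, used only to compute IDF.
background = [
    set("the cat sat".split()),
    set("the dog ran".split()),
    set("the model works".split()),
]

for tok, label in labels[0]:
    print(tok, label, round(smoothed_idf(tok, background), 3))
```

In this toy case both "the" and "model" receive label 1 in the first segment, so the binary label alone cannot tell a common word from a topical one. An IDF-style weight does separate them, which is why I am asking whether something like TF-IDF was applied.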