About the Token-Document Relation Prediction Task (#263) · Issue · PaddlePaddle / ERNIE

Created by: tanaka-jp

I have a question about the pre-training tasks. The paper describes this one as follows: "We add a task to predict whether the token in a segment appears in other segments of the original document. Empirically, the words that appear in many parts of a document are usually commonly-used words or relevant with the main topics of the document. Therefore, through identifying the key words of a document appearing in the segment, the task can enable the ability of a model to capture the key words of the document to some extent."
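To make the quoted description concrete, here is a minimal sketch of how I read the labeling rule: a token gets label 1 if it also occurs in at least one other segment of the same document, and 0 otherwise. This is only my interpretation of the sentence above, not ERNIE's actual preprocessing; the function name `token_document_labels` and the toy segments are hypothetical.

```python
from collections import Counter
from typing import List


def token_document_labels(segments: List[List[str]]) -> List[List[int]]:
    """Label each token 1 if it also occurs in another segment, else 0.

    Assumption: "appears in other segments" is taken literally as
    "occurs in at least two distinct segments of the document".
    """
    # Count the number of distinct segments each token occurs in.
    segment_count = Counter()
    for segment in segments:
        segment_count.update(set(segment))

    return [
        [1 if segment_count[token] > 1 else 0 for token in segment]
        for segment in segments
    ]


# Toy document split into three segments (hypothetical data).
segments = [
    ["the", "model", "reads", "text"],
    ["the", "model", "learns"],
    ["text", "is", "tokenized"],
]
labels = token_document_labels(segments)
print(labels)
```

Under this reading every token position receives a label, which is exactly why I am unsure how the tokens to be predicted are selected or marked in the input.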

I have two questions.

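Question 1: How are the tokens to be predicted marked in the input data?

Question 2: If a token appears many times in a document, it may simply be a very common token, or it may be a token that is important to this particular document. Directly predicting whether a token appears multiple times should not, on its own, yield a model that can "capture the key words". Was any further processing applied, for example with a method such as TF-IDF?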
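To illustrate the concern in question 2, here is a small sketch of the kind of weighting I have in mind. It computes a smoothed inverse document frequency over a toy corpus, so that a token repeated within one document but present in every document scores low, while a token repeated within one document and rare elsewhere scores high. Whether ERNIE does anything like this is precisely what I am asking; the function name `inverse_document_frequency`, the smoothing formula, and the corpus are my own assumptions.

```python
import math
from collections import Counter
from typing import Dict, List


def inverse_document_frequency(corpus: List[List[str]]) -> Dict[str, float]:
    """Smoothed IDF: log((1 + N) / (1 + df)) + 1 for each token."""
    n_docs = len(corpus)
    # Count the number of documents each token occurs in.
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))
    return {
        token: math.log((1 + n_docs) / (1 + df)) + 1.0
        for token, df in doc_freq.items()
    }


# Hypothetical corpus: "the" and "ernie" both repeat inside the first
# document, but only "the" also occurs in the other documents.
corpus = [
    ["the", "ernie", "model", "the", "ernie", "task"],
    ["the", "weather", "is", "nice"],
    ["the", "cat", "sat"],
]
idf = inverse_document_frequency(corpus)
print(idf["the"], idf["ernie"])
```

A plain "appears more than once" label would treat "the" and "ernie" identically in the first document, whereas the IDF score separates them.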
