Commit 62a13fd8 authored by liyukun01

update readme

Parent 87d3c630
......@@ -16,6 +16,7 @@ English | [简体中文](./README.zh.md)
* [Discourse Relation Task](#discourse-relation-task)
* [IR Relevance Task](#ir-relevance-task)
* [ERNIE 1.0: <strong>E</strong>nhanced <strong>R</strong>epresentation through k<strong>N</strong>owledge <strong>I</strong>nt<strong>E</strong>gration](#ernie-10-enhanced-representation-through-knowledge-integration)
* [Compare ERNIE 1.0 and ERNIE 2.0](#compare-ernie-10-and-ernie-20)
* [Results on English Datasets](#results-on-english-datasets)
* [Results on Chinese Datasets](#results-on-chinese-datasets)
......@@ -96,6 +97,15 @@ In the example sentence above, BERT can identify the “K.” through the local
Integrating both phrase information and named entity information enables the model to obtain better language representations compared with BERT. ERNIE is trained on multi-source data and knowledge collected from encyclopedia articles, news, and forum dialogues, which improves its performance in context-based knowledge reasoning.
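As an illustration of this phrase- and entity-level masking strategy, the snippet below masks whole spans rather than individual sub-word pieces. The `knowledge_mask` helper and its span format are hypothetical and meant only to sketch the idea, not to reproduce the repository's actual pre-processing code:

```python
import random

MASK = "[MASK]"

def knowledge_mask(tokens, spans, mask_prob=0.15):
    """Mask whole phrase/entity spans instead of single sub-word tokens.

    `spans` holds (start, end) index pairs (end exclusive); when a span is
    selected, every token inside it is replaced by [MASK].  Hypothetical
    helper for illustration only.
    """
    masked = list(tokens)
    for start, end in spans:
        if random.random() < mask_prob:
            for i in range(start, end):
                masked[i] = MASK
    return masked

# "Harry Potter is written by J. K. Rowling" with two entity-level spans
tokens = ["Harry", "Potter", "is", "written", "by", "J.", "K.", "Rowling"]
spans = [(0, 2), (5, 8)]
print(knowledge_mask(tokens, spans, mask_prob=1.0))
# ['[MASK]', '[MASK]', 'is', 'written', 'by', '[MASK]', '[MASK]', '[MASK]']
```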
### Compare ERNIE 1.0 and ERNIE 2.0
#### Pre-Training Tasks
| Tasks | ERNIE 1.0 model | ERNIE 2.0 model (en) | ERNIE 2.0 model (zh) |
| ------------------- | -------------------------- | ------------------------------------------------------------ | ----------------------------------------- |
| **Word-aware** | ✅ Knowledge Masking | ✅ Knowledge Masking <br> ✅ Capitalization Prediction <br> ✅ Token-Document Relation Prediction | ✅ Knowledge Masking |
| **Structure-aware** | | ✅ Sentence Reordering | ✅ Sentence Reordering <br> ✅ Sentence Distance |
| **Semantic-aware** | ✅ Next Sentence Prediction | ✅ Discourse Relation | ✅ Discourse Relation <br> ✅ IR Relevance |
## Release Notes
- July 30, 2019: release ERNIE 2.0
......@@ -326,7 +336,7 @@ XNLI is a natural language inference dataset in 15 languages. It was jointly bui
*\*The DRCD dataset is converted from Traditional Chinese to Simplified Chinese based on tool: https://github.com/skydark/nstools/tree/master/zhtools*
\* *The pre-training data of ERNIE 1.0 BASE does not contain instances whose length exceeds 128, while the other models are pre-trained with instances whose length is 512. This leads to poorer performance of ERNIE 1.0 BASE on long-text tasks, so we have released [ERNIE 1.0 Base(max-len-512)](https://ernie.bj.bcebos.com/ERNIE_1.0_max-len-512.tar.gz) in July 29th, 2019*
\* *The pre-training data of ERNIE 1.0 BASE does not contain instances whose length exceeds 128, while the other models are pre-trained with instances whose length is 512. This leads to poorer performance of ERNIE 1.0 BASE on long-text tasks, so we have released [ERNIE 1.0 Base(max-len-512)](https://ernie.bj.bcebos.com/ERNIE_1.0_max-len-512.tar.gz) on July 29th, 2019*
......
......@@ -16,6 +16,7 @@
* [Discourse Relation Task](#discourse-relation-task)
* [IR Relevance Task](#ir-relevance-task)
* [ERNIE 1.0: <strong>E</strong>nhanced <strong>R</strong>epresentation through k<strong>N</strong>owledge <strong>I</strong>nt<strong>E</strong>gration](#ernie-10-enhanced-representation-through-knowledge-integration)
* [Compare ERNIE 1.0 and ERNIE 2.0](#compare-ernie-10-and-ernie-20)
* [Results on Chinese Datasets](#中文效果验证)
* [Results on English Datasets](#英文效果验证)
......@@ -90,6 +91,15 @@
As for training data, besides encyclopedia and news corpora in Chinese, **ERNIE** also introduces forum dialogue data. It uses **DLM** (Dialogue Language Model) to model the Query-Response dialogue structure: dialogue pairs are taken as input, a Dialogue Embedding is introduced to identify the roles in a dialogue, and a Dialogue Response Loss is used to learn the implicit relationship within the dialogue, which further improves the model's semantic representation ability.
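A minimal sketch of what "dialogue pair as input plus a role-identifying Dialogue Embedding" could look like at the data level is shown below; the function name, packing format, and role-id convention are illustrative assumptions, not the repository's actual data pipeline:

```python
def build_dlm_input(query_tokens, response_tokens):
    """Pack a Query-Response pair for DLM-style pre-training.

    Tokens are concatenated BERT-style, and a parallel role-id sequence
    (0 = query side, 1 = response side) plays the part of the Dialogue
    Embedding described above.  Hypothetical sketch for illustration only.
    """
    tokens = ["[CLS]"] + query_tokens + ["[SEP]"] + response_tokens + ["[SEP]"]
    role_ids = [0] * (len(query_tokens) + 2) + [1] * (len(response_tokens) + 1)
    return tokens, role_ids

tokens, role_ids = build_dlm_input(["how", "is", "the", "weather"], ["it", "is", "sunny"])
print(tokens)    # ['[CLS]', 'how', 'is', 'the', 'weather', '[SEP]', 'it', 'is', 'sunny', '[SEP]']
print(role_ids)  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
```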
### Compare ERNIE 1.0 and ERNIE 2.0
#### Pre-Training Tasks
| Tasks | ERNIE 1.0 model | ERNIE 2.0 model (en) | ERNIE 2.0 model (zh) |
| ------------------- | -------------------------- | ------------------------------------------------------------ | ----------------------------------------- |
| **Word-aware** | ✅ Knowledge Masking | ✅ Knowledge Masking <br> ✅ Capitalization Prediction <br> ✅ Token-Document Relation Prediction | ✅ Knowledge Masking |
| **Structure-aware** | | ✅ Sentence Reordering | ✅ Sentence Reordering <br> ✅ Sentence Distance |
| **Semantic-aware** | ✅ Next Sentence Prediction | ✅ Discourse Relation | ✅ Discourse Relation <br> ✅ IR Relevance |
## Release Notes
......