Update document (#3834)

* Update document
* Update link to PaddleLARK

d8b44efc · Yibing Liu · GitHub · e34627d7
5 changed files
--- a/PaddleNLP/PaddleLARK/XLNet/README.md
+++ b/PaddleNLP/PaddleLARK/XLNet/README.md
@@ -8,7 +8,21 @@ For more details, please refer to the research paper

 [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237)

+## Directory structure

+```
+├── model/                        # directory for model structure definition
+│   ├── classifier.py             # model for regression/classification
+│   ├── xlnet.py                  # model for XLNet
+├── reader/                       # directory for data reader
+│   ├── cls.py                    # data reader for regression/classification
+│   ├── squad.py                  # data reader for squad
+├── utils/                        # directory for utility files
+│── modeling.py                   # network modules
+│── optimization.py               # optimization method
+│── run_classifier.py             # script for running regression/classification task
+│── run_squad.py                  # script for running squad
+```

 ## Installation


--- a/PaddleNLP/PaddleLARK/XLNet/README_cn.md
+++ b/PaddleNLP/PaddleLARK/XLNet/README_cn.md
@@ -8,11 +8,27 @@ XLNet 与 [BERT](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/P

 [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237)

+## 目录结构
+
+```
+├── model/                        # 模型结构定义目录
+│   ├── classifier.py             # 回归/分类模型结构
+│   ├── xlnet.py                  # XLNet 模型结构
+├── reader/                       # 数据读取 reader 定义目录
+│   ├── cls.py                    # 分类任务数据读取
+│   ├── squad.py                  # squad 数据读取
+├── utils/                        # 辅助文件目录
+│── modeling.py                   # 网络定义模块
+│── optimization.py               # 优化方法
+│── run_classifier.py             # 运行回归/分类任务的脚本
+│── run_squad.py                  # 运行 squad 任务的脚本
+```
+
 ## 安装

 该项目要求 Paddle Fluid 1.6.0 及以上版本，请参考 [安装指南](https://www.paddlepaddle.org.cn/start) 进行安装。

-## Pre-trained models
+## 预训练模型

 这里提供了从官方开源模型转换而来的两个预训练模型供下载


--- a/PaddleNLP/PaddleLARK/XLNet/modeling.py
+++ b/PaddleNLP/PaddleLARK/XLNet/modeling.py
+#   Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import re
 import numpy as np
 import paddle.fluid as fluid
-import collections


 def log_softmax(logits, axis=-1):
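
Reviewer note: the hunk is cut off at the `log_softmax` signature, so the function body is not visible in this diff. As a rough illustration only, and not the Paddle Fluid code from `modeling.py`, a numerically stable log-softmax could be sketched in NumPy as follows. The name `log_softmax_np` is hypothetical and was chosen to avoid confusion with the function in the file.

```python
import numpy as np


def log_softmax_np(logits, axis=-1):
    """Illustrative NumPy log-softmax; NOT the implementation in modeling.py."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the per-axis maximum so exp() cannot overflow.
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    # log(softmax(x)) = x - logsumexp(x), computed on the shifted values.
    log_norm = np.log(np.sum(np.exp(shifted), axis=axis, keepdims=True))
    return shifted - log_norm
```

A quick sanity check for a sketch like this is that exponentiating the result and summing along `axis` gives 1 for every row, even when the inputs are large.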

--- a/PaddleNLP/README.md
+++ b/PaddleNLP/README.md
@@ -20,7 +20,7 @@

  - PaddleNLP为您提供持续的技术支持和模型算法更新，为您的NLP业务保驾护航。

-  
+

 快速安装
 -------
@@ -55,7 +55,7 @@ cd models/PaddleNLP/sentiment_classification
 |                    **语言模型**                    | [Language_model](https://github.com/PaddlePaddle/models/tree/release/1.6/PaddleNLP/language_model) | 基于循环神经网络（RNN）的经典神经语言模型（neural language model）。 |
 |                 **情感分类**:fire:                 | [Senta](https://github.com/PaddlePaddle/models/tree/release/1.6/PaddleNLP/sentiment_classification)，[EmotionDetection](https://github.com/PaddlePaddle/models/tree/release/1.6/PaddleNLP/emotion_detection) | Senta（Sentiment Classification，简称Senta）和EmotionDetection两个项目分别提供了面向*通用场景*和*人机对话场景专用*的情感倾向性分析模型。 |
 |              **文本相似度计算**:fire:              | [SimNet](https://github.com/PaddlePaddle/models/tree/release/1.6/PaddleNLP/similarity_net) | SimNet，又称为Similarity Net，为您提供高效可靠的文本相似度计算工具和预训练模型。 |
-|                 **语义表示**:fire:                 | [PaddleLARK](https://github.com/PaddlePaddle/models/tree/release/1.6/PaddleNLP/language_representations_kit) | PaddleLARK，全称为Paddle LAngauge Representation Toolkit，集成了ELMO，BERT，ERNIE 1.0，ERNIE 2.0，XLNet等热门中英文预训练模型。 |
+|                 **语义表示**:fire:                 | [PaddleLARK](https://github.com/PaddlePaddle/models/tree/release/1.6/PaddleNLP/PaddleLARK) | PaddleLARK，全称为Paddle LAngauge Representation Toolkit，集成了ELMO，BERT，ERNIE 1.0，ERNIE 2.0，XLNet等热门中英文预训练模型。 |
 |                    **文本生成**                    | [PaddleTextGEN](https://github.com/PaddlePaddle/models/tree/release/1.6/PaddleNLP/PaddleTextGEN) | Paddle Text Generation为您提供了一些列经典文本生成模型案例，如vanilla seq2seq，seq2seq with attention，variational seq2seq模型等。 |
 |                    **阅读理解**                    | [PaddleMRC](https://github.com/PaddlePaddle/models/tree/release/1.6/PaddleNLP/PaddleMRC) | PaddleMRC，全称为Paddle Machine Reading Comprehension，集合了百度在阅读理解领域相关的模型，工具，开源数据等一系列工作。包括DuReader (百度开源的基于真实搜索用户行为的中文大规模阅读理解数据集)，KT-Net (结合知识的阅读理解模型，SQuAD以及ReCoRD曾排名第一), D-Net (预训练-微调框架，在EMNLP2019 MRQA国际阅读理解评测获得第一)，等。 |
 |                    **对话系统**                    | [PaddleDialogue](https://github.com/PaddlePaddle/models/tree/release/1.6/PaddleNLP/PaddleDialogue) | 包括：1）DGU（Dialogue General Understanding，通用对话理解模型）覆盖了包括**检索式聊天系统**中context-response matching任务和**任务完成型对话系统**中**意图识别**，**槽位解析**，**状态追踪**等常见对话系统任务，在6项国际公开数据集中都获得了最佳效果。<br/> 2) knowledge-driven dialogue：百度开源的知识驱动的开放领域对话数据集，发表于ACL2019。<br/>3）ADEM（Auto Dialogue Evaluation Model）：对话自动评估模型，可用于自动评估不同对话生成模型的回复质量。 |
@@ -106,4 +106,3 @@ cd models/PaddleNLP/sentiment_classification
 扫描下方二维码，加入我们的QQ群，即刻获取来自百度的技术支持：

 ![Paddle_QQ](./appendix/Paddle_QQ.jpg)
-
--- a/README.md
+++ b/README.md
@@ -172,6 +172,7 @@ PaddlePaddle 提供了丰富的计算单元，使得用户可以采用模块化
 | ------------------------------------------------------------ | ------------------------------------------------------------ |
 | [ERNIE](https://github.com/PaddlePaddle/ERNIE)(Enhanced Representation from kNowledge IntEgration) | 百度自研的语义表示模型，通过建模海量数据中的词、实体及实体关系，学习真实世界的语义知识。相较于 BERT 学习原始语言信号，ERNIE 直接对先验语义知识单元进行建模，增强了模型语义表示能力。 |
 | [BERT](https://github.com/PaddlePaddle/models/tree/release/1.6/PaddleNLP/PaddleLARK/BERT)(Bidirectional Encoder Representation from Transformers) | 一个迁移能力很强的通用语义表示模型， 以 Transformer 为网络基本组件，以双向 Masked Language Model和 Next Sentence Prediction 为训练目标，通过预训练得到通用语义表示，再结合简单的输出层，应用到下游的 NLP 任务，在多个任务上取得了 SOTA 的结果。 |
+| [XLNet](https://github.com/PaddlePaddle/models/tree/release/1.6/PaddleNLP/PaddleLARK/XLNet)(XLNet: Generalized Autoregressive Pretraining for Language Understanding) | 重要的语义表示模型之一，引入 Transformer-XL 为骨架，以 Permutation Language Modeling 为优化目标，在若干下游任务上优于 BERT 的性能。 |
 | [ELMo](https://github.com/PaddlePaddle/models/tree/release/1.6/PaddleNLP/PaddleLARK/ELMo)(Embeddings from Language Models) | 重要的通用语义表示模型之一，以双向 LSTM 为网路基本组件，以 Language Model 为训练目标，通过预训练得到通用的语义表示，将通用的语义表示作为 Feature 迁移到下游 NLP 任务中，会显著提升下游任务的模型性能。 |

 #### 文本相似度计算