change readme

5a642ac2 · webyfdt · 5942cca9
12 changed files
--- a/README.md
+++ b/README.md
@@ -49,7 +49,7 @@ ERNIE is a knowledge-enhanced continual-learning semantic understanding framework pioneered by Baidu

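 # Environment Setup

-1. Install environment dependencies: [Environment Setup](./readme_env.md)
+1. Install environment dependencies: [Environment Setup](./README_ENV.md)
 2. Install the Ernie suite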

 ```plain
@@ -148,7 +148,7 @@ python run_infer.py --param_path ./examples/cls_enrie_fc_ch_infer.json

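 # Pre-trained Model Overview

- For the principles behind the pre-trained models, see [Model Overview](readme_model.md)
+- For the principles behind the pre-trained models, see [Model Overview](./nlp-ernie/wenxin_appzoo/wenxin_appzoo/models_hub/README.md)
 - Pre-trained model download: go to the ./wenxin_appzoo/models_hub directory; download example: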

 ```plain
@@ -162,7 +162,7 @@ sh downlaod_ernie3.0_base_ch.sh

 # Model Performance Evaluation

-[Model Performance Evaluation](readme_score.md)
+[Model Performance Evaluation](README_SCORE.md)

 # Dataset Download

@@ -192,7 +192,7 @@ sh downlaod_ernie3.0_base_ch.sh

 ### ERNIE 1.0

-```json
+```
 @article{sun2019ernie,
  title={Ernie: Enhanced representation through knowledge integration},
  author={Sun, Yu and Wang, Shuohuan and Li, Yukun and Feng, Shikun and Chen, Xuyi and Zhang, Han and Tian, Xin and Zhu, Danxiang and Tian, Hao and Wu, Hua},
@@ -203,7 +203,7 @@ sh downlaod_ernie3.0_base_ch.sh

 ### ERNIE 2.0

-```json
+```
 @inproceedings{sun2020ernie,
  title={Ernie 2.0: A continual pre-training framework for language understanding},
  author={Sun, Yu and Wang, Shuohuan and Li, Yukun and Feng, Shikun and Tian, Hao and Wu, Hua and Wang, Haifeng},
@@ -217,7 +217,7 @@ sh downlaod_ernie3.0_base_ch.sh

 ### ERNIE-GEN

-```json
+```
 @article{xiao2020ernie,
  title={Ernie-gen: An enhanced multi-flow pre-training and fine-tuning framework for natural language generation},
  author={Xiao, Dongling and Zhang, Han and Li, Yukun and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
@@ -228,7 +228,7 @@ sh downlaod_ernie3.0_base_ch.sh

 ### ERNIE-ViL

-```json
+```
 @article{yu2020ernie,
  title={Ernie-vil: Knowledge enhanced vision-language representations through scene graph},
  author={Yu, Fei and Tang, Jiji and Yin, Weichong and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
@@ -239,7 +239,7 @@ sh downlaod_ernie3.0_base_ch.sh

 ### ERNIE-Gram

-```json
+```
 @article{xiao2020ernie,
  title={ERNIE-Gram: Pre-Training with Explicitly N-Gram Masked Language Modeling for Natural Language Understanding},
  author={Xiao, Dongling and Li, Yu-Kun and Zhang, Han and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
@@ -250,7 +250,7 @@ sh downlaod_ernie3.0_base_ch.sh

 ### ERNIE-Doc

-```json
+```
 @article{ding2020ernie,
  title={ERNIE-Doc: A retrospective long-document modeling transformer},
  author={Ding, Siyu and Shang, Junyuan and Wang, Shuohuan and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
@@ -261,7 +261,7 @@ sh downlaod_ernie3.0_base_ch.sh

 ### ERNIE-UNIMO

-```json
+```
 @article{li2020unimo,
  title={Unimo: Towards unified-modal understanding and generation via cross-modal contrastive learning},
  author={Li, Wei and Gao, Can and Niu, Guocheng and Xiao, Xinyan and Liu, Hao and Liu, Jiachen and Wu, Hua and Wang, Haifeng},
@@ -272,7 +272,7 @@ sh downlaod_ernie3.0_base_ch.sh

 ### ERNIE-M

-```json
+```
 @article{ouyang2020ernie,
  title={Ernie-m: Enhanced multilingual representation by aligning cross-lingual semantics with monolingual corpora},
  author={Ouyang, Xuan and Wang, Shuohuan and Pang, Chao and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},

--- a/readme_score.md
+++ b/readme_score.md
--- a/nlp-ernie/wenxin_appzoo/wenxin_appzoo/models_hub/readme.md
+++ b/nlp-ernie/wenxin_appzoo/wenxin_appzoo/models_hub/readme.md
-# models hub
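- Pre-trained models
+# Model Overview

-  Contains the pre-trained model download scripts, the model configuration JSON files, the corresponding vocabulary files, and a brief introduction file (readme.txt)
+- Pre-trained models: contains the pre-trained model download scripts, the model configuration JSON files, the corresponding vocabulary files, and a brief introduction file (readme.txt)

- Out-of-the-box models
\ No newline at end of file
+- Out-of-the-box models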
+
+# Ernie2.0 
+
+[Ernie2.0 ](https://www.jiqizhixin.com/articles/2019-07-31-10)
+
+# Ernie-Doc
+
+[ERNIE-Doc](https://github.com/PaddlePaddle/ERNIE/blob/repro/ernie-doc/README_zh.md)
+
+# Ernie-M
+
+[Ernie-M](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-m)
+
+# Ernie-Gen
+
+[Ernie-Gen](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-gen)
\ No newline at end of file
--- a/nlp-ernie/wenxin_appzoo/wenxin_appzoo/tasks/data_distillation/README.md
+++ b/nlp-ernie/wenxin_appzoo/wenxin_appzoo/tasks/data_distillation/README.md
@@ -47,7 +47,7 @@ data_distillation/

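 ## Data Preparation

- Three data augmentation strategies are currently used; for different tasks they can be mixed in specific proportions. The three [data augmentation](../../tools/data/data_aug/readme.md) strategies are:
+- Three data augmentation strategies are currently used; for different tasks they can be mixed in specific proportions. The three [data augmentation](../../tools/data/data_aug/README.md) strategies are:
  - Adding noise: each word in the original sample is replaced with the "UNK" label with a certain probability (e.g. 0.1)
  - Same-POS word replacement: each word in the original sample is replaced, with a certain probability (e.g. 0.1), by a random word of the same part of speech from this dataset
  - N-sampling: a fragment of length m is cut from the original sample at a randomly chosen position to form a new sample, where m is a random value between 0 and the length of the original sample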
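
The three strategies listed above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the code in the data_aug tool: the function names (`add_noise`, `replace_same_pos`, `n_sampling`), the `pos_vocab` mapping from a part-of-speech tag to the dataset's words with that tag, and the choice to pass samples as pre-tokenized lists with parallel POS tags are all hypothetical.

```python
import random


def add_noise(tokens, p=0.1, rng=random):
    """Adding noise: replace each token with "UNK" with probability p."""
    return ["UNK" if rng.random() < p else tok for tok in tokens]


def replace_same_pos(tokens, pos_tags, pos_vocab, p=0.1, rng=random):
    """Same-POS replacement: with probability p, swap a token for a random
    word from the dataset that carries the same part-of-speech tag.

    pos_vocab is a hypothetical dict: POS tag -> list of dataset words.
    Tokens whose tag has no candidates are left unchanged.
    """
    out = []
    for tok, pos in zip(tokens, pos_tags):
        candidates = pos_vocab.get(pos)
        if candidates and rng.random() < p:
            out.append(rng.choice(candidates))
        else:
            out.append(tok)
    return out


def n_sampling(tokens, rng=random):
    """N-sampling: cut a fragment of random length m (0..len(tokens))
    starting at a random position, and return it as a new sample."""
    m = rng.randint(0, len(tokens))           # fragment length, inclusive bounds
    start = rng.randint(0, len(tokens) - m)   # start so the fragment fits
    return tokens[start:start + m]
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible. Mixing the strategies in a task-specific proportion could then be done by choosing which function to apply to each sample.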

--- a/nlp-ernie/wenxin_appzoo/wenxin_appzoo/tasks/information_extraction_many_to_many/readme.md
+++ b/nlp-ernie/wenxin_appzoo/wenxin_appzoo/tasks/information_extraction_many_to_many/readme.md
@@ -142,7 +142,7 @@ python convert_data.py

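 ### ERNIE Pre-trained Model Download

- The parameter and configuration files of the [ERNIE pre-trained models](../../../../../readme_model.md) provided by Wenxin are under the wenxin_appzoo/wenxin_appzoo/models_hub directory; run the corresponding sh script to fetch the model, dictionary, required environment, and other files.
+- The parameter and configuration files of the [ERNIE pre-trained models](../../../../../README_MODEL.md) provided by Wenxin are under the wenxin_appzoo/wenxin_appzoo/models_hub directory; run the corresponding sh script to fetch the model, dictionary, required environment, and other files.

 | Model name      | Download script                    | Notes                                      |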
 | --------------- | ---------------------------------- | ------------------------------------------ |

--- a/nlp-ernie/wenxin_appzoo/wenxin_appzoo/tasks/sequence_labeling/README.md
+++ b/nlp-ernie/wenxin_appzoo/wenxin_appzoo/tasks/sequence_labeling/README.md
@@ -40,8 +40,8 @@

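 ### Data Preparation

- In Wenxin, ERNIE-based models do not require users to tokenize text or generate a vocabulary file themselves; non-ERNIE models require users to tokenize the text in advance, with tokens separated by spaces, and to generate a vocabulary file. Tokenization and vocabulary generation can be done with the "[Word segmentation and vocabulary generation tools](../../tools/data/wordseg/readme.md)".
- All datasets in Wenxin, including vocabulary files, label_map files, and so on, must be UTF-8 encoded; if your data uses another encoding, convert it with the "[Encoding detection and conversion tool](../../tools/data/data_cleaning/readme.md)".
+- In Wenxin, ERNIE-based models do not require users to tokenize text or generate a vocabulary file themselves; non-ERNIE models require users to tokenize the text in advance, with tokens separated by spaces, and to generate a vocabulary file. Tokenization and vocabulary generation can be done with the "[Word segmentation and vocabulary generation tools](../../tools/data/wordseg/README.md)".
+- All datasets in Wenxin, including vocabulary files, label_map files, and so on, must be UTF-8 encoded; if your data uses another encoding, convert it with the "[Encoding detection and conversion tool](../../tools/data/data_cleaning/README.md)".
 - In Wenxin, the training, test, validation, and prediction sets and the vocabulary are stored in the train_data, test_data, dev_data, predict_data, and dict folders, respectively, under the ./wenxin_appzoo/tasks/sequence_labeling/data directory.

 #### Training/Test/Validation Set File Format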
@@ -98,7 +98,7 @@ B-PER   0

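 ### ERNIE Pre-trained Model Download

- The parameter and configuration files of the [ERNIE pre-trained models](../../../../../readme_model.md) provided by Wenxin are under the wenxin_appzoo/models_hub directory and are obtained by running the corresponding download_xx.sh script; they include the model's parameter files, configuration files, vocabulary, and so on.
+- The parameter and configuration files of the [ERNIE pre-trained models](../../../../../README_MODEL.md) provided by Wenxin are under the wenxin_appzoo/models_hub directory and are obtained by running the corresponding download_xx.sh script; they include the model's parameter files, configuration files, vocabulary, and so on.

 | Model name      | Download script                    | Notes                                      |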
 | --------------- | ---------------------------------- | ------------------------------------------ |

--- a/nlp-ernie/wenxin_appzoo/wenxin_appzoo/tasks/text_classification/README.md
+++ b/nlp-ernie/wenxin_appzoo/wenxin_appzoo/tasks/text_classification/README.md
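 # Text Classification

 - This chapter is presented in two parts:
-  - [Code structure and preparation](./readme_code.md): introduces the code structure of the text classification task, along with data, model structure, pre-trained model download, evaluation metrics, and related information
-  - Choose a specific model to train the text classification task; Wenxin ships with 4 preset model types for this, namely:
-    - [Bow (BowClassification) model](./readme_bow.md)
-    - [ERNIE (ErnieClassification) model](readme_ERNIE.md)
-    - [ERNIE-Doc (ErnieDocClassification) model](./readme_Doc.md)
-    - [ERNIE-M (ErnieClassification) model](./readme_m.md)
\ No newline at end of file
+  - [Code structure and preparation](./README_CODE.md): introduces the code structure of the text classification task, along with data, model structure, pre-trained model download, evaluation metrics, and related information
+  - Choose a specific model to train the text classification task; Wenxin ships with 3 preset model types for this, namely:
+    - [Bow (BowClassification) model](./README_BOW.md)
+    - [ERNIE (ErnieClassification) model](./README_ERNIE.md)
+    - [ERNIE-M (ErnieClassification) model](./README_M.md)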
\ No newline at end of file
--- a/nlp-ernie/wenxin_appzoo/wenxin_appzoo/tasks/text_classification/readme_code.md
+++ b/nlp-ernie/wenxin_appzoo/wenxin_appzoo/tasks/text_classification/readme_code.md
@@ -77,8 +77,8 @@

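 ## Data Preparation

- In Wenxin, ERNIE-based models do not require users to tokenize text or generate a vocabulary file themselves; non-ERNIE models require users to tokenize the text in advance, with tokens separated by spaces, and to generate a vocabulary file. Tokenization and vocabulary generation can be done with the "[Word segmentation and vocabulary generation tools](../../tools/data/wordseg/readme.md)".
- All datasets in Wenxin, including vocabulary files, label_map files, and so on, must be UTF-8 encoded; if your data uses another encoding, convert it with the "[Encoding detection and conversion tool](../../tools/data/data_cleaning/readme.md)".
+- In Wenxin, ERNIE-based models do not require users to tokenize text or generate a vocabulary file themselves; non-ERNIE models require users to tokenize the text in advance, with tokens separated by spaces, and to generate a vocabulary file. Tokenization and vocabulary generation can be done with the "[Word segmentation and vocabulary generation tools](../../tools/data/wordseg/README.md)".
+- All datasets in Wenxin, including vocabulary files, label_map files, and so on, must be UTF-8 encoded; if your data uses another encoding, convert it with the "[Encoding detection and conversion tool](../../tools/data/data_cleaning/README.md)".
 - In Wenxin, the training, test, validation, and prediction sets and the vocabulary are stored in the train_data, test_data, dev_data, predict_data, and dict folders, respectively, under the ./wenxin_appzoo/tasks/text_classification/data directory.
 - In classification tasks, the training, test, and validation sets share the same data format: two columns separated by **\t**. The first column is the text and the second is the label. An example follows: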

@@ -169,7 +169,7 @@ USB接口 只有 2个 ， 太 少 了 点 ， 不能 接 太多 外 接 设备 

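 ## ERNIE Pre-trained Model Download

-The parameter and configuration files of the ERNIE pre-trained models provided by Wenxin are under the ./wenxin_appzoo/tasks/models_hub/ directory and are obtained by running the corresponding download_xx.sh script. For an introduction to some of the ERNIE models, see the document "[ERNIE Model Overview](../../../../../readme_model.md)".
+The parameter and configuration files of the ERNIE pre-trained models provided by Wenxin are under the ./wenxin_appzoo/tasks/models_hub/ directory and are obtained by running the corresponding download_xx.sh script. For an introduction to some of the ERNIE models, see the document "[ERNIE Model Overview](../../../../../README_MODEL.md)".

 ## Choosing Model Evaluation Metrics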


--- a/nlp-ernie/wenxin_appzoo/wenxin_appzoo/tasks/text_generation/README.md
+++ b/nlp-ernie/wenxin_appzoo/wenxin_appzoo/tasks/text_generation/README.md
@@ -58,7 +58,7 @@ text_generation/

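 ### ERNIE Pre-trained Model Download

- The parameter and configuration files of the [ERNIE pre-trained models](../../../../../readme_model.md) provided by Wenxin are under the wenxin_appzoo/wenxin_appzoo/models_hub directory; run the corresponding sh script to fetch the model, dictionary, required environment, and other files.
+- The parameter and configuration files of the [ERNIE pre-trained models](../../../../../README_MODEL.md) provided by Wenxin are under the wenxin_appzoo/wenxin_appzoo/models_hub directory; run the corresponding sh script to fetch the model, dictionary, required environment, and other files.

 | Model name | Download script                | Notes                                                    |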
 | --------- | -------------------------------- | -------------------------------------------------------- |

--- a/nlp-ernie/wenxin_appzoo/wenxin_appzoo/tasks/text_matching/README.md
+++ b/nlp-ernie/wenxin_appzoo/wenxin_appzoo/tasks/text_matching/README.md
@@ -60,8 +60,8 @@

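 ### Data Preparation

- In Wenxin, ERNIE-based models do not require users to tokenize text or generate a vocabulary file themselves; non-ERNIE models require users to tokenize the text in advance, with tokens separated by spaces, and to generate a vocabulary file. Tokenization and vocabulary generation can be done with the "[Word segmentation and vocabulary generation tools](../../tools/data/wordseg/readme.md)".
- All datasets in Wenxin, including vocabulary files, label_map files, and so on, must be UTF-8 encoded; if your data uses another encoding, convert it with the "[Encoding detection and conversion tool](../../tools/data/data_cleaning/readme.md)".
+- In Wenxin, ERNIE-based models do not require users to tokenize text or generate a vocabulary file themselves; non-ERNIE models require users to tokenize the text in advance, with tokens separated by spaces, and to generate a vocabulary file. Tokenization and vocabulary generation can be done with the "[Word segmentation and vocabulary generation tools](../../tools/data/wordseg/README.md)".
+- All datasets in Wenxin, including vocabulary files, label_map files, and so on, must be UTF-8 encoded; if your data uses another encoding, convert it with the "[Encoding detection and conversion tool](../../tools/data/data_cleaning/README.md)".
 - In text matching tasks, the training set comes in two formats, Pointwise and Pairwise, depending on the training method; the test, validation, and prediction sets share the same format.
 - For non-ERNIE data, the pointwise training set, pairwise training set, test set, validation set, and prediction set are stored in the train_data_pointwise_tokenized, train_data_pairwise_tokenized, test_data_tokenized, dev_data_tokenized, and predict_data_tokenized folders, respectively, under the ./wenxin_appzoo/tasks/text_matching/data directory.
 - For ERNIE data, the pointwise training set, pairwise training set, test set, validation set, and prediction set are stored in the train_data_pointwise, train_data_pairwise, test_data, dev_data, and predict_data folders, respectively, under the ./wenxin_appzoo/tasks/text_matching/data directory.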
@@ -212,7 +212,7 @@

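 ### ERNIE Pre-trained Model Download

- The download scripts for the [ERNIE pre-trained models](../../../../../readme_model.md) provided by Wenxin are under the wenxin_appzoo/models_hub directory; each pre-trained model can be downloaded with its corresponding download_xx.sh script, and users can download whichever they need. Here, ernie_config.json is the ERNIE pre-trained model's configuration file, vocab.txt is its vocabulary file, and the params directory holds its parameter files.
+- The download scripts for the [ERNIE pre-trained models](../../../../../README_MODEL.md) provided by Wenxin are under the wenxin_appzoo/models_hub directory; each pre-trained model can be downloaded with its corresponding download_xx.sh script, and users can download whichever they need. Here, ernie_config.json is the ERNIE pre-trained model's configuration file, vocab.txt is its vocabulary file, and the params directory holds its parameter files.

 | Model name      | Download script                    | Notes                                      |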
 | --------------- | ---------------------------------- | ------------------------------------------ |

--- a/nlp-ernie/wenxin_appzoo/wenxin_appzoo/tools/readme.md
+++ b/nlp-ernie/wenxin_appzoo/wenxin_appzoo/tools/readme.md
 # Tool Usage

- [Word segmentation and vocabulary generation tools](./data/wordseg/readme.md)
- [Data augmentation](data/data_aug/readme.md)
- [Cross-validation](./run_preprocess/readme.md)
- [Grid search](../tasks/text_classification/readme_grid.md)
- [Encoding and conversion tool](./data/data_cleaning/readme.md)
+- [Word segmentation and vocabulary generation tools](./data/wordseg/README.md)
+- [Data augmentation](data/data_aug/README.md)
+- [Cross-validation](./run_preprocess/README.md)
+- [Grid search](../tasks/text_classification/README_GRID.md)
+- [Encoding and conversion tool](./data/data_cleaning/README.md)

--- a/readme_model.md
+++ b/readme_model.md
-# Model Overview
-
-# Ernie2.0 
-
-[Ernie2.0 ](https://www.jiqizhixin.com/articles/2019-07-31-10)
-
-# Ernie-Doc
-
-[ERNIE-Doc](https://github.com/PaddlePaddle/ERNIE/blob/repro/ernie-doc/README_zh.md)
-
-# Ernie-M
-
-[Ernie-M](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-m)
-
-# Ernie-Gen
-
-[Ernie-Gen](https://github.com/PaddlePaddle/ERNIE/tree/repro/ernie-gen)
\ No newline at end of file