diff --git a/PaddleNLP/README.md b/PaddleNLP/README.md
index a764d0d5a9b6f1ebadff3e4eb0035bab3317788a..00945a53eb9746c1c1d41e5212881936800a0db0 100644
--- a/PaddleNLP/README.md
+++ b/PaddleNLP/README.md
@@ -102,14 +102,15 @@ electra = ElectraModel.from_pretrained('chinese-electra-small')
## API Documentation
- [Transformer API](./docs/transformers.md)
-
+ * APIs for Transformer-based pretrained models, covering mainstream classic architectures such as ERNIE, BERT, RoBERTa, and Electra, along with their downstream tasks.
- [Data API](./docs/data.md)
-
+ * Documentation of APIs for the text data pipeline.
- [Dataset API](./docs/datasets.md)
-
+ * Dataset APIs, covering custom datasets, dataset contribution, and fast dataset loading.
- [Embedding API](./docs/embeddings.md)
-
+ * Word embedding APIs, supporting one-line loading of pretrained Chinese word embeddings and high-dimensional visualization with VisualDL (see the sketch after this list).
- [Metrics API](./docs/metrics.md)
+ * Evaluation metrics for NLP scenarios, compatible with the PaddlePaddle 2.0 high-level API.
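+
+For instance, the Embedding API is designed for one-line loading. Below is a minimal sketch; it assumes the `TokenEmbedding` class and the embedding name follow the Embedding API docs linked above:
+
+```python
+from paddlenlp.embeddings import TokenEmbedding
+
+# Load a pretrained Chinese word-embedding table by name (the name is illustrative).
+token_embedding = TokenEmbedding(embedding_name="w2v.baidu_encyclopedia.target.word-word.dim300")
+vector = token_embedding.search("中国")  # look up the vector for one word
+```
+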
## Interactive Notebook Tutorials
diff --git a/PaddleNLP/README_en.md b/PaddleNLP/README_en.md
index 34e8c9a533f315dc6719e5a84f39dfa44c416b06..62c7668a65bf8049c06841427a981ac7e2ddb64f 100644
--- a/PaddleNLP/README_en.md
+++ b/PaddleNLP/README_en.md
@@ -20,10 +20,13 @@ PaddleNLP aims to accelerate NLP applications through powerful model zoo, easy-t
* **Rich and Powerful Model Zoo**
Our Model Zoo covers mainstream NLP applications, including Lexical Analysis, Syntactic Parsing, Machine Translation, Text Classification, Text Generation, Text Matching, General Dialogue, and Question Answering.
+
* **Easy-to-use API**
The API is fully integrated with the PaddlePaddle high-level API system. It minimizes the number of user actions required for common use cases such as data loading, text pre-processing, training, and evaluation, which enables you to deal with text problems more productively.
+
* **High Performance and Large-scale Training**
We provide a highly optimized distributed training implementation for BERT with the Fleet API, which can fully utilize GPU clusters for large-scale model pre-training. Please refer to our [benchmark](./benchmark/bert) for more information.
+
* **Detailed Tutorials and Industrial Practices**
We offer detailed and interactive notebook tutorials to show you the best practices of PaddlePaddle 2.0.
@@ -91,13 +94,9 @@ For more pretrained model selection, please refer to [Pretrained-Models](./paddl
## API Usage
- [Transformer API](./docs/transformers.md)
-
- [Data API](./docs/data.md)
-
- [Dataset API](./docs/datasets.md)
-
- [Embedding API](./docs/embeddings.md)
-
- [Metrics API](./docs/metrics.md)
diff --git a/PaddleNLP/docs/transformers.md b/PaddleNLP/docs/transformers.md
index 7735bac76b3fb00bc4d17d526a04442847192992..cea646846838fe28d22b3e8bba1f2bf92bc5dda8 100644
--- a/PaddleNLP/docs/transformers.md
+++ b/PaddleNLP/docs/transformers.md
@@ -1,9 +1,9 @@
-# PaddleNLP Transformer-style Pretrained Models
+# PaddleNLP Transformer API
-With the development of deep learning, a large number of high-quality Transformer-style pretrained models have emerged in NLP, repeatedly setting new SOTA results on various NLP tasks. PaddleNLP provides commonly used pretrained models such as BERT and ERNIE, allowing users to conveniently and quickly apply various Transformer-style models to their own tasks.
+With the development of deep learning, a large number of high-quality Transformer pretrained models have emerged in NLP, repeatedly setting new SOTA results on a wide range of NLP tasks. PaddleNLP provides pretrained models with mainstream classic architectures such as BERT, ERNIE, and RoBERTa, so that developers can conveniently and quickly apply various Transformer pretrained models to their downstream tasks.
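+
+For example, loading a model together with its matching tokenizer takes one line each. A minimal sketch using names from the table below (weights are typically downloaded and cached on first use):
+
+```python
+from paddlenlp.transformers import BertModel, BertTokenizer
+
+# The weight name selects both the architecture config and the pretrained parameters.
+model = BertModel.from_pretrained('bert-base-chinese')
+tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
+```
+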
-## Summary of Transformer Models
+## Summary of Transformer Pretrained Models
The table below summarizes the pretrained models currently supported by PaddleNLP. With these models, users can complete tasks such as question answering, sequence classification, and token classification. We also provide 22 kinds of pretrained parameter weights, including pretrained weights for 11 Chinese language models.
@@ -12,7 +12,7 @@
| [BERT](https://arxiv.org/abs/1810.04805) | BertTokenizer|BertModel<br>BertForQuestionAnswering<br>BertForSequenceClassification<br>BertForTokenClassification| `bert-base-uncased`<br>`bert-large-uncased`<br>`bert-base-multilingual-uncased`<br>`bert-base-cased`<br>`bert-base-chinese`<br>`bert-base-multilingual-cased`<br>`bert-large-cased`<br>`bert-wwm-chinese`<br>`bert-wwm-ext-chinese` |
|[ERNIE](https://arxiv.org/abs/1904.09223)|ErnieTokenizer<br>ErnieTinyTokenizer|ErnieModel<br>ErnieForQuestionAnswering<br>ErnieForSequenceClassification<br>ErnieForTokenClassification<br>ErnieForGeneration| `ernie-1.0`<br>`ernie-tiny`<br>`ernie-2.0-en`<br>`ernie-2.0-large-en`<br>`ernie-gen-base-en`<br>`ernie-gen-large-en`<br>`ernie-gen-large-en-430g`|
|[RoBERTa](https://arxiv.org/abs/1907.11692)|RobertaTokenizer| RobertaModel<br>RobertaForQuestionAnswering<br>RobertaForSequenceClassification<br>RobertaForTokenClassification| `roberta-wwm-ext`<br>`roberta-wwm-ext-large`<br>`rbt3`<br>`rbtl3`|
-|[ELECTRA](https://arxiv.org/abs/2003.10555) |ElectraTokenizer| ElectraModel<br>ElectraForSequenceClassification<br>ElectraForTokenClassification|`electra-small`<br>`electra-base`<br>`electra-large`<br>`chinese-electra-small`<br>`chinese-electra-base`|
+|[ELECTRA](https://arxiv.org/abs/2003.10555) | ElectraTokenizer| ElectraModel<br>ElectraForSequenceClassification<br>ElectraForTokenClassification|`electra-small`<br>`electra-base`<br>`electra-large`<br>`chinese-electra-small`<br>`chinese-electra-base`|
|[Transformer](https://arxiv.org/abs/1706.03762) |- | TransformerModel | - |
Note: the Chinese pretrained models are `bert-base-chinese, bert-wwm-chinese, bert-wwm-ext-chinese, ernie-1.0, ernie-tiny, roberta-wwm-ext, roberta-wwm-ext-large, rbt3, rbtl3, chinese-electra-base, chinese-electra-small`. The generation models `ernie-gen-base-en, ernie-gen-large-en, ernie-gen-large-en-430g` only support the `ErnieForGeneration` task.
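+
+For example, `ernie-tiny` pairs with its own tokenizer class, while the other ERNIE weights use `ErnieTokenizer`. A minimal sketch of the pairing, following the ERNIE row of the table:
+
+```python
+from paddlenlp.transformers import ErnieModel, ErnieTinyTokenizer
+
+# ernie-tiny ships with a dedicated tokenizer; other ERNIE weights use ErnieTokenizer.
+model = ErnieModel.from_pretrained('ernie-tiny')
+tokenizer = ErnieTinyTokenizer.from_pretrained('ernie-tiny')
+```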
@@ -47,7 +47,7 @@ for batch in train_data_loader:
probs = paddle.nn.functional.softmax(logits, axis=1)
loss.backward()
optimizer.step()
- optimizer.clear_gradients()
+ optimizer.clear_grad()
```
The code above gives a brief example of using a pretrained model. For more complete and detailed example code, please refer to [Fine-tuning a pretrained model for Chinese text classification](https://github.com/PaddlePaddle/models/tree/develop/PaddleNLP/examples/text_classification/pretrained_models).
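+
+The fragment above starts mid-loop, so `logits`, `loss`, and `optimizer` come from earlier code. A minimal sketch of that setup, with illustrative names and hyperparameters rather than the exact code from the linked example:
+
+```python
+import paddle
+from paddlenlp.transformers import ErnieForSequenceClassification, ErnieTokenizer
+
+# Illustrative setup for the training loop shown above; hyperparameters are placeholders.
+model = ErnieForSequenceClassification.from_pretrained('ernie-1.0', num_classes=2)
+tokenizer = ErnieTokenizer.from_pretrained('ernie-1.0')
+criterion = paddle.nn.CrossEntropyLoss()
+optimizer = paddle.optimizer.AdamW(learning_rate=5e-5, parameters=model.parameters())
+```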
diff --git a/PaddleNLP/paddlenlp/transformers/converter/run_glue_pp.py b/PaddleNLP/paddlenlp/transformers/converter/run_glue_pp.py
index e11374c04ce6beb222151a359049e9635495b914..fccbc0825538830e8e059329e8ee910e1f3aa2db 100644
--- a/PaddleNLP/paddlenlp/transformers/converter/run_glue_pp.py
+++ b/PaddleNLP/paddlenlp/transformers/converter/run_glue_pp.py
@@ -369,7 +369,7 @@ def do_train(args):
loss.backward()
optimizer.step()
lr_scheduler.step()
- optimizer.clear_gradients()
+ optimizer.clear_grad()
if global_step % args.save_steps == 0:
evaluate(model, loss_fct, metric, dev_data_loader)
if (not args.n_gpu > 1) or paddle.distributed.get_rank() == 0: