# PaddleNLP

![License](https://img.shields.io/badge/license-Apache%202-red.svg)
![python version](https://img.shields.io/badge/python-3.6+-orange.svg)
![support os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg)

## Introduction

PaddleNLP aims to accelerate NLP applications through a powerful model zoo and an easy-to-use API with detailed tutorials. It also serves as the NLP best practice for the PaddlePaddle 2.0 API system.

**This project is still UNDER ACTIVE DEVELOPMENT.**

## Features

* **Rich and Powerful Model Zoo**
  - Our Model Zoo covers mainstream NLP applications, including Lexical Analysis, Syntactic Parsing, Machine Translation, Text Classification, Text Generation, Text Matching, General Dialogue, and Question Answering.
* **Easy-to-use API**
  - The API is fully integrated with the PaddlePaddle high-level API system. It minimizes the number of user actions required for common use cases such as data loading, text pre-processing, training, and evaluation, which lets you handle text problems more productively.
* **High Performance and Large-scale Training**
  - We provide a highly optimized distributed training implementation for BERT with the Fleet API, which can fully utilize GPU clusters for large-scale model pre-training. Please refer to our [benchmark](./benchmark/bert) for more information.
* **Detailed Tutorials and Industrial Practices**
  - We offer detailed, interactive notebook tutorials that show you the best practices of PaddlePaddle 2.0.

## Installation

### Prerequisites

* python >= 3.6
* paddlepaddle >= 2.0.0-rc1

```
pip install paddlenlp==2.0.0a
```
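
Before installing, it can help to confirm that the environment meets the prerequisites listed above. The following is a minimal sketch that uses only the Python standard library. The helper name `check_prerequisites` is hypothetical and not part of PaddleNLP, and the sketch assumes PaddlePaddle is importable under the top-level name `paddle`. It only checks that the package is present, not that its version is at least 2.0.0-rc1.

```python
import sys
from importlib import util


def check_prerequisites(min_python=(3, 6)):
    """Return a dict describing whether each prerequisite appears to be met.

    Hypothetical helper for illustration; not part of PaddleNLP.
    """
    return {
        # Compare only (major, minor) against the required minimum.
        "python": sys.version_info[:2] >= min_python,
        # find_spec returns None when the top-level package is not installed.
        # Assumes PaddlePaddle is imported as `paddle`.
        "paddlepaddle": util.find_spec("paddle") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

If `paddlepaddle` is reported as missing, install PaddlePaddle first and then run the `pip install` command for PaddleNLP.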

## Quick Start

### Quick Dataset Loading

```python
from paddlenlp.datasets import ChnSentiCorp

# Load the train and test splits of the ChnSentiCorp dataset.
train_ds, test_ds = ChnSentiCorp.get_datasets(['train', 'test'])
```

### Chinese Text Embedding Loading

```python
from paddlenlp.embeddings import TokenEmbedding
wordemb = TokenEmbedding("word2vec.baike.300d")
print(wordemb.search("中国"))
# Output: [0.260801, 0.1047, 0.129453 ... 0.096542, 0.0092513]
```

### One-Line Classical Model Building

```python
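from paddlenlp.models import Ernie

# Build an ERNIE model configured for sequence classification.
ernie = Ernie(Ernie.Task.SeqCls)
# input_ids and segment_ids are assumed to be tensors prepared by the caller.
ernie.forward(input_ids, segment_ids)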
```

### Rich Chinese Pre-trained Models

```python
from paddlenlp.transformers import ErnieModel, BertModel, RobertaModel, ElectraModel
ernie = ErnieModel.from_pretrained('ernie-1.0')
bert = BertModel.from_pretrained('bert-wwm-ext-large')
electra = ElectraModel.from_pretrained('electra-chinese')
roberta = RobertaModel.from_pretrained('roberta-wwm-ext')
```

For more pretrained models, please refer to [PretrainedModels](./paddlenlp/transformers/README.md).

## API Usage

* [Transformer API](./docs/transformers.md)
* [Dataset API](./docs/datasets.md)
* [Embedding API](./docs/embeddings.md)
* [Metrics API](./docs/embeddings.md)
* [Models API](./docs/models.md)

## Tutorials

Notebook tutorials based on AI Studio will be listed here. TBD

## Community

* SIG for Pretrained Model Contribution
* SIG for Dataset Integration
TBD

## FAQ

## License

PaddleNLP is provided under the [Apache-2.0 License](./LICENSE).