简体中文 | [English](./README_en.md)

# PaddleNLP

![License](https://img.shields.io/badge/license-Apache%202-red.svg)
![python version](https://img.shields.io/badge/python-3.6+-orange.svg)
![support os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg)

## Introduction

PaddleNLP aims to accelerate NLP applications through a powerful model zoo and easy-to-use APIs with detailed tutorials. It is also the NLP best practice for the PaddlePaddle 2.0 API system.

**This project is still UNDER ACTIVE DEVELOPMENT.**

## Features

* **Rich and Powerful Model Zoo**
  - Our Model Zoo covers mainstream NLP applications, including Lexical Analysis, Syntactic Parsing, Machine Translation, Text Classification, Text Generation, Text Matching, General Dialogue, and Question Answering.
* **Easy-to-use API**
  - The API is fully integrated with the PaddlePaddle high-level API system. It minimizes the number of user actions required for common use cases such as data loading, text pre-processing, training, and evaluation, which lets you handle text problems more productively.
* **High Performance and Large-scale Training**
  - We provide a highly optimized distributed training implementation for BERT with the Fleet API, which can fully utilize GPU clusters for large-scale model pre-training. Please refer to our [benchmark](./benchmark/bert) for more information.
* **Detailed Tutorials and Industrial Practices**
  - We offer detailed, interactive notebook tutorials that show you the best practices of PaddlePaddle 2.0.

## Installation

### Prerequisites

* python >= 3.6
* paddlepaddle >= 2.0.0-rc1

```
pip install "paddlenlp>=2.0.0a"
```
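
After installing, it can be useful to confirm that the prerequisites are in place. The following is a minimal sketch, assuming the packages are importable as `paddle` and `paddlenlp`; the helper name `check_prerequisites` is hypothetical and not part of PaddleNLP:

```python
import importlib.util
import sys


def check_prerequisites(modules=("paddle", "paddlenlp"), min_python=(3, 6)):
    """Report whether Python is new enough and each module can be found."""
    report = {"python": sys.version_info[:2] >= tuple(min_python)}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```

Note that this only checks that the modules can be found; it does not verify the installed package versions.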

## Quick Start

### Quick Dataset Loading

```python
from paddlenlp.datasets import ChnSentiCorp

# Load the train and test splits of the ChnSentiCorp dataset.
train_ds, test_ds = ChnSentiCorp.get_datasets(['train', 'test'])
```

For more Dataset API usage, please refer to [Dataset API](./docs/datasets.md).

### Chinese Text Embedding Loading

```python
from paddlenlp.embeddings import TokenEmbedding

wordemb = TokenEmbedding("w2v.baidu_encyclopedia.target.word-word.dim300")
# "国王" (king) vs. "王后" (queen)
print(wordemb.cosine_sim("国王", "王后"))
# >>> 0.63395125
# "艺术" (art) vs. "火车" (train)
print(wordemb.cosine_sim("艺术", "火车"))
# >>> 0.14792643
```
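
The scores above are cosine similarities between word vectors. As a rough illustration of what such a score measures, here is a pure-Python sketch with made-up 3-dimensional vectors; it is not PaddleNLP's actual implementation of `cosine_sim`, and the vectors are hypothetical:

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors; the embeddings loaded above have 300 dimensions.
king = [0.8, 0.6, 0.0]
queen = [0.6, 0.8, 0.0]
train = [0.0, 0.0, 1.0]

print(cosine_similarity(king, queen))  # high: the vectors point in similar directions
print(cosine_similarity(king, train))  # 0.0: the vectors are orthogonal
```

A score near 1 means the two vectors point in nearly the same direction, while a score near 0 means they are close to orthogonal.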

For more token embedding usage, please refer to [examples/word_embedding](./examples/word_embedding/README.md).

### One-Line Classical Model Building

```python
from paddlenlp.models import Ernie, Senta, SimNet

ernie = Ernie("ernie-1.0", num_classes=2, task="seq-cls")

senta = Senta(network="bow", vocab_size=1024, num_classes=2)

simnet = SimNet(network="gru", vocab_size=1024, num_classes=2)

```

### Rich Chinese Pre-trained Models

```python
from paddlenlp.transformers import ErnieModel, BertModel, RobertaModel, ElectraModel

ernie = ErnieModel.from_pretrained('ernie-1.0')
bert = BertModel.from_pretrained('bert-wwm-chinese')
roberta = RobertaModel.from_pretrained('roberta-wwm-ext')
electra = ElectraModel.from_pretrained('chinese-electra-small')
```

For the full selection of pretrained models, please refer to [Pretrained-Models](./docs/transformers.md).

## API Usage

* [Transformer API](./docs/transformers.md)
* [Dataset API](./docs/datasets.md)
* [Embedding API](./docs/embeddings.md)
* [Metrics API](./docs/metrics.md)
* [Models API](./docs/models.md)

## Tutorials

Please refer to our official AI Studio account for more interactive tutorials: [PaddleNLP on AI Studio](https://aistudio.baidu.com/aistudio/personalcenter/thirdview/574995)

## Community

* SIG for Pretrained Model Contribution
* SIG for Dataset Integration
* SIG for Tutorial Writing

## License

PaddleNLP is provided under the [Apache-2.0 License](./LICENSE).