English | [简体中文](./README.md)

<p align="center">
  <img src="./docs/imgs/paddlenlp.png" width="520" height="100" />
</p>

---------------------------------------------------------------------------------

![License](https://img.shields.io/badge/license-Apache%202-red.svg)
![python version](https://img.shields.io/badge/python-3.6+-orange.svg)
![support os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg)

## Introduction

PaddleNLP aims to accelerate NLP applications through a powerful model zoo and easy-to-use APIs with detailed tutorials. It also serves as the NLP best practice for the PaddlePaddle 2.0 API system.

**This project is still UNDER ACTIVE DEVELOPMENT.**

## Features

* **Rich and Powerful Model Zoo**
  - Our Model Zoo covers mainstream NLP applications, including Lexical Analysis, Syntactic Parsing, Machine Translation, Text Classification, Text Generation, Text Matching, General Dialogue, and Question Answering.
* **Easy-to-use API**
  - The API is fully integrated with the PaddlePaddle high-level API system. It minimizes the number of user actions required for common use cases such as data loading, text pre-processing, training, and evaluation, which enables you to deal with text problems more productively.
* **High Performance and Large-scale Training**
  - We provide a highly optimized distributed training implementation for BERT with the Fleet API that can fully utilize GPU clusters for large-scale model pre-training. Please refer to our [benchmark](./benchmark/bert) for more information.
* **Detailed Tutorials and Industrial Practices**
  - We offer detailed, interactive notebook tutorials that show you the best practices of PaddlePaddle 2.0.

## Installation

### Prerequisites

* python >= 3.6
* paddlepaddle >= 2.0.0

```
pip install "paddlenlp>=2.0.0rc"
```
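
Before installing, it can help to confirm that the interpreter meets the prerequisites above. The following is a minimal sketch using only the standard library. The helper names `python_is_supported` and `is_installed` are illustrative and not part of PaddleNLP, and the sketch assumes the two packages are importable as `paddle` and `paddlenlp`.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 6)  # taken from the prerequisites listed above


def python_is_supported(version=None, minimum=MIN_PYTHON):
    """Return True if a (major, minor, ...) version tuple meets the minimum."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= tuple(minimum)


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    # Assumed import names for the two packages (hypothetical check, not an official tool).
    for name in ("paddle", "paddlenlp"):
        print(name, "installed:", is_installed(name))
```

This only checks that the modules can be found; it does not verify their versions.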

## Quick Start

### Quick Dataset Loading

```python
from paddlenlp.datasets import ChnSentiCorp

# Load the train and test splits of the ChnSentiCorp dataset
train_ds, test_ds = ChnSentiCorp.get_datasets(['train', 'test'])
```
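
Once the splits are loaded, a quick sanity check is to look at the label balance. The sketch below uses a small hand-written list in place of `train_ds` so that it runs without downloading anything. It assumes each example is a `(text, label)` pair, which is an assumption about the dataset layout rather than something stated here, and `label_distribution` is a hypothetical helper.

```python
from collections import Counter


def label_distribution(examples):
    """Count how often each label occurs in an iterable of (text, label) pairs."""
    return Counter(label for _text, label in examples)


# Hand-written stand-in data (not taken from ChnSentiCorp).
toy_examples = [
    ("the room was clean and quiet", 1),
    ("terrible service", 0),
    ("would stay again", 1),
]

counts = label_distribution(toy_examples)
print(counts[1], counts[0])  # 2 1
```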

### Chinese Text Embedding Loading

```python

from paddlenlp.embeddings import TokenEmbedding

wordemb = TokenEmbedding("w2v.baidu_encyclopedia.target.word-word.dim300")
# "国王" (king) vs. "王后" (queen)
print(wordemb.cosine_sim("国王", "王后"))
# 0.63395125
# "艺术" (art) vs. "火车" (train)
print(wordemb.cosine_sim("艺术", "火车"))
# 0.14792643
```
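
For readers unfamiliar with the score printed above, cosine similarity is the dot product of two vectors divided by the product of their lengths. The following is a minimal pure-Python sketch of that formula. It is not PaddleNLP's implementation, and the toy vectors are made up for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors; real word vectors have many more dimensions.
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # points in the same direction as a
c = [3.0, 0.0, -1.0]  # orthogonal to a (dot product is 0)

print(cosine_similarity(a, b))  # close to 1: same direction
print(cosine_similarity(a, c))  # 0: orthogonal
```

Scores near 1 indicate vectors pointing in nearly the same direction, which is why related words such as "国王" and "王后" score higher than unrelated ones.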

### Rich Chinese Pre-trained Models

```python
from paddlenlp.transformers import ErnieModel, BertModel, RobertaModel, ElectraModel

ernie = ErnieModel.from_pretrained('ernie-1.0')
bert = BertModel.from_pretrained('bert-wwm-chinese')
roberta = RobertaModel.from_pretrained('roberta-wwm-ext')
electra = ElectraModel.from_pretrained('chinese-electra-small')
```

For more pretrained models, please refer to [Pretrained-Models](./paddlenlp/transformers/README.md).

## Model Zoo and Applications

- [Word Embedding](./examples/word_embedding/README.md)
- [Lexical Analysis](./examples/lexical_analysis/README.md)
- [Language Model](./examples/language_model)
- [Text Classification](./examples/text_classification/README.md)
- [Text Generation](./examples/text_generation/README.md)
- [Semantic Matching](./examples/text_matching/README.md)
- [Named Entity Recognition](./examples/named_entity_recognition/README.md)
- [Text Graph](./examples/text_graph/README.md)
- [General Dialogue](./examples/dialogue)
- [Machine Translation](./examples/machine_translation)
- [Question Answering](./examples/machine_reading_comprehension)

## API Usage

- [Transformer API](./docs/transformers.md)
- [Data API](./docs/data.md)
- [Dataset API](./docs/datasets.md)
- [Embedding API](./docs/embeddings.md)
- [Metrics API](./docs/metrics.md)

## Tutorials

Please refer to our official AI Studio account for more interactive tutorials: [PaddleNLP on AI Studio](https://aistudio.baidu.com/aistudio/personalcenter/thirdview/574995)

* [What's Seq2Vec?](https://aistudio.baidu.com/aistudio/projectdetail/1283423) shows how to use LSTM to do sentiment analysis.
* [Sentiment Analysis with ERNIE](https://aistudio.baidu.com/aistudio/projectdetail/1294333) shows how to use the pretrained ERNIE model to improve sentiment analysis.
* [Waybill Information Extraction with BiGRU-CRF Model](https://aistudio.baidu.com/aistudio/projectdetail/1317771) shows how to use BiGRU and CRF for information extraction.
* [Waybill Information Extraction with ERNIE](https://aistudio.baidu.com/aistudio/projectdetail/1329361) shows how to use the pretrained ERNIE model to improve information extraction.


## Community

Join our QQ technical group for technical discussion! ⬇️

<div align="center">
  <img src="./docs/imgs/qq.png" width="200" height="200" />
</div>

## License

PaddleNLP is provided under the [Apache-2.0 License](./LICENSE).