English | [简体中文](./README.md)

<p align="center">
  <img src="./docs/imgs/paddlenlp.png" width="520" height="100" />
</p>

# PaddleNLP

![License](https://img.shields.io/badge/license-Apache%202-red.svg)
![python version](https://img.shields.io/badge/python-3.6+-orange.svg)
![support os](https://img.shields.io/badge/os-linux%2C%20win%2C%20mac-yellow.svg)

## Introduction

PaddleNLP aims to accelerate NLP applications through a powerful model zoo and an easy-to-use API with detailed tutorials. It also serves as the NLP best practice for the PaddlePaddle 2.0 API system.

**This project is still UNDER ACTIVE DEVELOPMENT.**

## Features

* **Rich and Powerful Model Zoo**
  - Our Model Zoo covers mainstream NLP applications, including Lexical Analysis, Syntactic Parsing, Machine Translation, Text Classification, Text Generation, Text Matching, General Dialogue, and Question Answering.
* **Easy-to-use API**
  - The API is fully integrated with the PaddlePaddle high-level API system. It minimizes the number of user actions required for common use cases such as data loading, text pre-processing, training, and evaluation, which lets you handle text problems more productively.
* **High Performance and Large-scale Training**
  - We provide a highly optimized distributed training implementation for BERT with the Fleet API that can fully utilize GPU clusters for large-scale model pre-training. Please refer to our [benchmark](./benchmark/bert) for more information.
* **Detailed Tutorials and Industrial Practices**
  - We offer detailed, interactive notebook tutorials that show the best practices of PaddlePaddle 2.0.

## Installation

### Prerequisites

* python >= 3.6
* paddlepaddle >= 2.0.0-rc1

```
pip install "paddlenlp>=2.0.0a"
```
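
To confirm that the interpreter satisfies the Python prerequisite before installing, a minimal check might look like the sketch below. The helper name `meets_python_requirement` is hypothetical and not part of PaddleNLP; it only compares version tuples.

```python
import sys

# Minimum Python version taken from the prerequisites list.
MIN_PYTHON = (3, 6)


def meets_python_requirement(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor, ...) tuple satisfies the minimum.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only major and minor, e.g. (3, 8) >= (3, 6).
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if meets_python_requirement():
        print("Python version OK")
    else:
        print("Python >= 3.6 is required")
```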

## Quick Start

### Quick Dataset Loading

```python
from paddlenlp.datasets import ChnSentiCorp

train_ds, test_ds = ChnSentiCorp.get_datasets(['train','test'])
```

### Chinese Text Embedding Loading

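```python
from paddlenlp.embeddings import TokenEmbedding

# Load pre-trained Chinese word embeddings (dimension 300, per the model name).
wordemb = TokenEmbedding("w2v.baidu_encyclopedia.target.word-word.dim300")

# Related words ("king", "queen") score high.
print(wordemb.cosine_sim("国王", "王后"))  # 0.63395125
# Unrelated words ("art", "train") score low.
print(wordemb.cosine_sim("艺术", "火车"))  # 0.14792643
```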
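
For intuition about these scores, cosine similarity is the dot product of two vectors divided by the product of their norms. The sketch below computes it in plain Python on toy vectors. It assumes `cosine_sim` follows this standard definition; the function name `cosine_similarity` and the sample vectors are illustrative and not part of the PaddleNLP API.

```python
import math


def cosine_similarity(u, v):
    """Standard cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy vectors, not real embeddings.
    same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
    orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
    print(same_direction, orthogonal)
```

Values close to 1 indicate vectors pointing in nearly the same direction, values near 0 indicate unrelated directions.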

### One-Line Classical Model Building

```python
from paddlenlp.models import Ernie

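# Build an ERNIE model configured for sequence classification.
ernie = Ernie(Ernie.Task.SeqCls)
# input_ids and segment_ids are assumed to be token-ID and segment-ID tensors prepared beforehand.
ernie.forward(input_ids, segment_ids)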
```

### Rich Chinese Pre-trained Models

```python
from paddlenlp.transformers import ErnieModel, BertModel, RobertaModel, ElectraModel

ernie = ErnieModel.from_pretrained('ernie-1.0')
bert = BertModel.from_pretrained('bert-wwm-chinese')
roberta = RobertaModel.from_pretrained('roberta-wwm-ext')
electra = ElectraModel.from_pretrained('chinese-electra-small')
```

For more pretrained models, please refer to [PretrainedModels](./paddlenlp/transformers/README.md).

## API Usage


- [Transformer API](./docs/transformers.md)

- [Dataset API](./docs/datasets.md)

- [Embedding API](./docs/embeddings.md)

- [Metrics API](./docs/metrics.md)

- [Models API](./docs/models.md)


## Tutorials

Please refer to our official AI Studio account for more interactive tutorials: [PaddleNLP on AI Studio](https://aistudio.baidu.com/aistudio/personalcenter/thirdview/574995)

* [What's Seq2Vec?](https://aistudio.baidu.com/aistudio/projectdetail/1294333) shows how to use an LSTM for sentiment analysis.

* [Sentiment Analysis with ERNIE](https://aistudio.baidu.com/aistudio/projectdetail/1283423) shows how to use the pretrained ERNIE model to improve sentiment analysis.

* [Waybill Information Extraction with BiGRU-CRF Model](https://aistudio.baidu.com/aistudio/projectdetail/1317771) shows how to use BiGRU and CRF for information extraction.

* [Waybill Information Extraction with ERNIE](https://aistudio.baidu.com/aistudio/projectdetail/1329361) shows how to use the pretrained ERNIE model to improve information extraction.


## Community

Join our QQ technical group for technical discussion! ⬇️

<div align="center">
  <img src="./docs/imgs/qq.png" width="200" height="200" />
</div>

## License

PaddleNLP is provided under the [Apache-2.0 License](./LICENSE).