# Tutorial: Exporting HuggingFace Models to Paddle

X2Paddle now supports converting HuggingFace models. The currently supported models are listed below.

## TorchScript

<font size=0.5>

| | <font size=2> CausalLM  |<font size=2> MaskedLM  | <font size=2> Seq2SeqLM | <font size=2>SequenceClassification |<font size=2> MultipleChoice |<font size=2>NextSentencePrediction |<font size=2>TokenClassification | <font size=2>QuestionAnswering |<font size=2> AudioClassification |
|---|---|---|---|---|---|---|---|---|---|
| <font size=2> [BERT](https://huggingface.co/docs/transformers/main/model_doc/bert#transformers.BertModel) |✅ | ✅ | None |✅|✅|✅|✅|✅|None|
| <font size=2> [RoBERTa](https://huggingface.co/docs/transformers/main/model_doc/roberta#transformers.RobertaModel) |✅ |✅ | None |✅|✅| None |✅|✅| None |
|<font size=2>  [T5](https://huggingface.co/docs/transformers/main/model_doc/t5#transformers.T5Model) | None | None |✅|None|None|None|None|None|None|
|<font size=2>  [GPT2](https://huggingface.co/docs/transformers/main/model_doc/gpt2#transformers.GPT2Model) |✅ | None |None|✅|None|None|✅|None|None|
|<font size=2>  [MarianMT](https://huggingface.co/docs/transformers/main/model_doc/marian#transformers.MarianModel) |✅ | None |✅|None|None|None|None|None|None|
|<font size=2>  [ELECTRA](https://huggingface.co/docs/transformers/main/model_doc/electra#transformers.ElectraModel) | None | ✅ |None|✅|✅|None|✅|✅|None|
|<font size=2>  [DistilBERT](https://huggingface.co/docs/transformers/main/model_doc/distilbert#transformers.DistilBertModel)|None | ✅ |None|✅|✅|None|✅|✅|None|
|<font size=2>  [BART](https://huggingface.co/docs/transformers/main/model_doc/bart#transformers.BartModel) |✅ |✅ |✅|✅|None|None|None|✅|None|
|<font size=2>  [XLM-RoBERTa](https://huggingface.co/docs/transformers/main/model_doc/xlm-roberta#transformers.XLMRobertaModel) |✅ |✅ | None |✅|✅|None|✅|✅|None|
|<font size=2>  [ALBERT](https://huggingface.co/docs/transformers/main/model_doc/albert#transformers.AlbertModel) |None | ✅ |None|✅|✅|None|✅|✅|None|
|<font size=2>  [LayoutLM](https://huggingface.co/docs/transformers/main/model_doc/layoutlm#transformers.LayoutLMModel) |None | ✅ |None|✅|None|None|✅|None|None|
|<font size=2>  [BigBird](https://huggingface.co/docs/transformers/main/model_doc/big_bird#transformers.BigBirdModel) |✅ | ✅ |None|✅|✅|None|✅|✅|None|
|<font size=2>  [Wav2Vec2](https://huggingface.co/docs/transformers/main/model_doc/wav2vec2#transformers.Wav2Vec2Model) | None | None |None|None|None|None|None|None|✅|

Notes:

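- In the table above, each row is a backbone and each column is a task type.
- None means the backbone itself does not support that task (this is unrelated to model conversion).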
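
The task names in the table header line up with the `AutoModelFor...` classes in `transformers`. As a rough sketch of that correspondence, the helper below builds the class name from a column name and loads a checkpoint with it. The function names `auto_class_name` and `load_for_task` are hypothetical, introduced only for illustration; `return_dict=False` mirrors the conversion example later in this tutorial.

```python
# Task columns of the support table, in the order they appear.
TASKS = [
    "CausalLM",
    "MaskedLM",
    "Seq2SeqLM",
    "SequenceClassification",
    "MultipleChoice",
    "NextSentencePrediction",
    "TokenClassification",
    "QuestionAnswering",
    "AudioClassification",
]


def auto_class_name(task):
    """Map a table column name to the matching transformers Auto class name."""
    if task not in TASKS:
        raise ValueError(f"unknown task type: {task!r}")
    return "AutoModelFor" + task


def load_for_task(checkpoint, task):
    """Load `checkpoint` with the Auto class for `task` (hypothetical helper)."""
    import transformers  # imported lazily so the name mapping works without it

    model_cls = getattr(transformers, auto_class_name(task))
    model = model_cls.from_pretrained(checkpoint, return_dict=False)
    model.eval()  # inference mode before tracing or export
    return model
```

For example, `load_for_task("bert-base-uncased", "SequenceClassification")` would load the same model as the conversion example below. A cell marked None in the table means the corresponding class does not exist for that backbone, so loading would fail.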

## ONNX

<font size=0.5>

| | <font size=2> CausalLM  |<font size=2> MaskedLM  | <font size=2> Seq2SeqLM | <font size=2>SequenceClassification |<font size=2> MultipleChoice |<font size=2>NextSentencePrediction |<font size=2>TokenClassification | <font size=2>QuestionAnswering |<font size=2> AudioClassification |
|---|---|---|---|---|---|---|---|---|---|
| <font size=2> [BERT](https://huggingface.co/docs/transformers/main/model_doc/bert#transformers.BertModel) |✅ | ✅ | None |✅|None|None|✅|✅|None|
| <font size=2> [RoBERTa](https://huggingface.co/docs/transformers/main/model_doc/roberta#transformers.RobertaModel) |✅ |✅ | None |✅|None| None |✅|✅| None |
|<font size=2>  [T5](https://huggingface.co/docs/transformers/main/model_doc/t5#transformers.T5Model) | None | None |✅|None|None|None|None|None|None|
|<font size=2>  [GPT2](https://huggingface.co/docs/transformers/main/model_doc/gpt2#transformers.GPT2Model) |✅ | None |None|✅|None|None|✅|None|None|
|<font size=2>  [MarianMT](https://huggingface.co/docs/transformers/main/model_doc/marian#transformers.MarianModel) |✅ | None |✅|None|None|None|None|None|None|
|<font size=2>  [ELECTRA](https://huggingface.co/docs/transformers/main/model_doc/electra#transformers.ElectraModel) | None | ✅ |None|✅|None|None|✅|✅|None|
|<font size=2>  [DistilBERT](https://huggingface.co/docs/transformers/main/model_doc/distilbert#transformers.DistilBertModel)|None | ✅ |None|✅|None|None|✅|✅|None|
|<font size=2>  [BART](https://huggingface.co/docs/transformers/main/model_doc/bart#transformers.BartModel) |✅ |None |✅|✅|None|None|None|✅|None|
|<font size=2>  [XLM-RoBERTa](https://huggingface.co/docs/transformers/main/model_doc/xlm-roberta#transformers.XLMRobertaModel) |✅ |✅ | None |✅|None|None|✅|✅|None|
|<font size=2>  [ALBERT](https://huggingface.co/docs/transformers/main/model_doc/albert#transformers.AlbertModel) |None | ✅ |None|✅|None|None|✅|✅|None|
|<font size=2>  [LayoutLM](https://huggingface.co/docs/transformers/main/model_doc/layoutlm#transformers.LayoutLMModel) |None | ✅ |None|✅|None|None|✅|None|None|

Notes:

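- In the table above, each row is a backbone and each column is a task type.
- None means HuggingFace does not support ONNX export for that combination (this is unrelated to model conversion).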

## Conversion tutorial

### Requirements

- Python >= 3.5
- HuggingFace Transformers 4.16.0
- PyTorch 1.7.1
- PaddlePaddle 2.3.0
- ONNX 1.9.0
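
To confirm that an environment matches these pins before converting, a small check along the following lines may help. It is only a sketch: the PyPI distribution names in `PINNED` are assumptions (a GPU build of Paddle, for instance, is published under a different name), `check_versions` is a hypothetical helper, and `importlib.metadata` needs Python 3.8 or newer.

```python
from importlib import metadata

# Pins copied from the requirements list; the keys are assumed PyPI names.
PINNED = {
    "transformers": "4.16.0",
    "torch": "1.7.1",
    "paddlepaddle": "2.3.0",
    "onnx": "1.9.0",
}


def installed(name):
    """Return the installed version of `name`, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(pinned, lookup=installed):
    """Return {package: (wanted, found)} for each package that differs from its pin."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        # Drop local build tags such as "+cu110" before comparing.
        base = found.split("+")[0] if found else None
        if base != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in check_versions(PINNED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

An empty result means every pinned package is installed at the listed version.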

### Converting a PyTorch model to Paddle

Taking BERT as the model and SequenceClassification as the task, run the following code:

```python
import torch

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from x2paddle.convert import pytorch2paddle

save_dir = "pd_model"
jit_type = "trace"

# Load the tokenizer and PyTorch weights from the Hub
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", return_dict=False)
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
# return_dict=False makes the model return plain tuples instead of ModelOutput objects
pt_model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", return_dict=False)
pt_model.eval()
# Reference PyTorch output, useful for checking the converted model later
result = pt_model(inputs["input_ids"], inputs["attention_mask"], inputs["token_type_ids"])

# Trace the model with the example inputs and write the Paddle model to save_dir
pytorch2paddle(pt_model, save_dir, jit_type, [inputs["input_ids"], inputs["attention_mask"], inputs["token_type_ids"]])
```
```
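
After conversion it is worth checking that the Paddle model reproduces the PyTorch output stored in `result`. The sketch below is one possible way to do that, not part of X2Paddle itself. It assumes the converter wrote a static-graph model under `pd_model/inference_model/` with the file prefix `model`; check the contents of `save_dir` if your layout differs. `max_abs_diff` and `paddle_first_output` are hypothetical helper names.

```python
def _flatten(values):
    """Yield every number in an arbitrarily nested list or tuple as a float."""
    if isinstance(values, (list, tuple)):
        for item in values:
            yield from _flatten(item)
    else:
        yield float(values)


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two nested lists."""
    flat_a, flat_b = list(_flatten(a)), list(_flatten(b))
    if len(flat_a) != len(flat_b):
        raise ValueError("outputs have different numbers of elements")
    return max((abs(x - y) for x, y in zip(flat_a, flat_b)), default=0.0)


def paddle_first_output(model_prefix, input_arrays):
    """Run the converted model and return its first output as nested lists."""
    import paddle  # imported lazily; only needed when the model is actually run

    layer = paddle.jit.load(model_prefix)
    layer.eval()
    outputs = layer(*[paddle.to_tensor(array) for array in input_arrays])
    first = outputs[0] if isinstance(outputs, (list, tuple)) else outputs
    return first.numpy().tolist()


# Usage with the variables from the conversion script (assumed model path):
# names = ("input_ids", "attention_mask", "token_type_ids")
# pd_logits = paddle_first_output("pd_model/inference_model/model",
#                                 [inputs[n].numpy() for n in names])
# print(max_abs_diff(result[0].detach().numpy().tolist(), pd_logits))
```

A small difference is expected from floating-point reordering; a large one suggests an operator was converted incorrectly.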

### Converting an ONNX model to Paddle

#### Step 1: Export an ONNX model with HuggingFace

From the command line:

```shell
python -m transformers.onnx --model=bert-base-uncased onnx/
```

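For more details, see the HuggingFace [ONNX export guide](https://huggingface.co/docs/transformers/main/serialization#exporting-a-model-to-onnx).

#### Step 2: Convert the ONNX model to Paddle format with X2Paddle

Run X2Paddle on the exported ONNX file. The export command in step 1 writes it to `onnx/model.onnx`.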

```shell
x2paddle --framework=onnx --model=onnx/model.onnx --save_dir=pd_model_dynamic
```
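
The two steps can also be chained from a script. The sketch below builds both command lines and runs them with `subprocess`. The `--feature` option and the feature names in `FEATURES` are written from memory of the `transformers.onnx` command-line tool and are assumptions; verify them with `python -m transformers.onnx --help` for your installed version. The helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Assumed mapping from the table's task columns to transformers.onnx feature names.
FEATURES = {
    "CausalLM": "causal-lm",
    "MaskedLM": "masked-lm",
    "Seq2SeqLM": "seq2seq-lm",
    "SequenceClassification": "sequence-classification",
    "TokenClassification": "token-classification",
    "QuestionAnswering": "question-answering",
}


def export_command(checkpoint, task, out_dir):
    """Command line for step 1: export `checkpoint` to ONNX for `task`."""
    return [
        sys.executable, "-m", "transformers.onnx",
        f"--model={checkpoint}",
        f"--feature={FEATURES[task]}",
        str(out_dir),
    ]


def convert_command(onnx_path, save_dir):
    """Command line for step 2: convert the ONNX file to Paddle."""
    return [
        "x2paddle",
        "--framework=onnx",
        f"--model={onnx_path}",
        f"--save_dir={save_dir}",
    ]


def export_and_convert(checkpoint, task, out_dir="onnx", save_dir="pd_model_dynamic"):
    """Run both steps in order, stopping at the first failure."""
    subprocess.run(export_command(checkpoint, task, out_dir), check=True)
    onnx_path = Path(out_dir) / "model.onnx"
    subprocess.run(convert_command(onnx_path, save_dir), check=True)
```

Calling `export_and_convert("bert-base-uncased", "SequenceClassification")` would export a sequence-classification model rather than the default feature used by the plain command in step 1.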