Unverified · Commit 091b1644 · Author: WJJ1995 · Committer: GitHub

Add HuggingFace Model Conversion Tutorial (#823)

* add replication pad

* update op_list.md

* re-lint

* fixed aten::index

* rm useless code

* Support Wav2vec2

* Support onnx roberta

* update op_list.md

* deal with comments

* deal with comments

* Support T5

* fixed for ci

* deal with comments

* deal with scalar tensor

* fixed misspell

* Add doc for HuggingFace

* rm useless info

* fixed bug
Parent 68f5861b
# Tutorial: Converting HuggingFace Models to Paddle Models
X2Paddle now supports HuggingFace models: 55+ models via TorchScript and 40+ models via ONNX. The currently supported models are listed below.
## TorchScript
<font size=0.5>
| | <font size=2> CausalLM |<font size=2> MaskedLM | <font size=2> Seq2SeqLM | <font size=2>SequenceClassification |<font size=2> MultipleChoice |<font size=2>NextSentencePrediction |<font size=2>TokenClassification | <font size=2>QuestionAnswering |<font size=2> AudioClassification |
|---|---|---|---|---|---|---|---|---|---|
| <font size=2> [BERT](https://huggingface.co/docs/transformers/main/model_doc/bert#transformers.BertModel) |✅ | ✅ | None |✅|✅|✅|✅|✅|None|
| <font size=2> [RoBERTa](https://huggingface.co/docs/transformers/main/model_doc/roberta#transformers.RobertaModel) |✅ |✅ | None |✅|✅| None |✅|✅| None |
|<font size=2> [T5](https://huggingface.co/docs/transformers/main/model_doc/t5#transformers.T5Model) | None | None |✅|None|None|None|None|None|None|
|<font size=2> [GPT2](https://huggingface.co/docs/transformers/main/model_doc/gpt2#transformers.GPT2Model) |✅ | None |None|✅|None|None|✅|None|None|
|<font size=2> [MarianMT](https://huggingface.co/docs/transformers/main/model_doc/marian#transformers.MarianModel) |✅ | None |✅|None|None|None|None|None|None|
|<font size=2> [ELECTRA](https://huggingface.co/docs/transformers/main/model_doc/electra#transformers.ElectraModel) | None | ✅ |None|✅|✅|None|✅|✅|None|
|<font size=2> [DistilBERT](https://huggingface.co/docs/transformers/main/model_doc/distilbert#transformers.DistilBertModel)|None | ✅ |None|✅|✅|None|✅|✅|None|
|<font size=2> [BART](https://huggingface.co/docs/transformers/main/model_doc/bart#transformers.BartModel) |✅ |✅ |✅|✅|None|None|None|✅|None|
|<font size=2> [XLM-RoBERTa](https://huggingface.co/docs/transformers/main/model_doc/xlm-roberta#transformers.XLMRobertaModel) |✅ |✅ | None |✅|✅|None|✅|✅|None|
|<font size=2> [ALBERT](https://huggingface.co/docs/transformers/main/model_doc/albert#transformers.AlbertModel) |None | ✅ |None|✅|✅|None|✅|✅|None|
|<font size=2> [LayoutLM](https://huggingface.co/docs/transformers/main/model_doc/layoutlm#transformers.LayoutLMModel) |None | ✅ |None|✅|None|None|✅|None|None|
|<font size=2> [BigBird](https://huggingface.co/docs/transformers/main/model_doc/big_bird#transformers.BigBirdModel) |✅ | ✅ |None|✅|✅|None|✅|✅|None|
|<font size=2> [Wav2Vec2](https://huggingface.co/docs/transformers/main/model_doc/wav2vec2#transformers.Wav2Vec2Model) | None | None |None|None|None|None|None|None|✅|
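
Notes:
- In the table above, rows are backbone types and columns are task types.
- None means the backbone itself does not support that task (this is unrelated to model conversion).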
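
The task names in the table header appear to line up with the `AutoModelFor...` classes in `transformers` (the tutorial further down uses `AutoModelForSequenceClassification`). The sketch below builds the class name from a task name; the one-to-one mapping is an assumption based on that naming pattern, and `auto_class_name` is a hypothetical helper, not part of X2Paddle.

```python
# Task names exactly as they appear in the table header.
TASKS = [
    "CausalLM", "MaskedLM", "Seq2SeqLM", "SequenceClassification",
    "MultipleChoice", "NextSentencePrediction", "TokenClassification",
    "QuestionAnswering", "AudioClassification",
]


def auto_class_name(task: str) -> str:
    """Return the transformers Auto class name assumed to match a task."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return "AutoModelFor" + task


# With transformers installed, the name can be resolved on the module:
#   import transformers
#   cls = getattr(transformers, auto_class_name("MaskedLM"))
#   model = cls.from_pretrained("bert-base-uncased", return_dict=False)
```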
## ONNX
<font size=0.5>
| | <font size=2> CausalLM |<font size=2> MaskedLM | <font size=2> Seq2SeqLM | <font size=2>SequenceClassification |<font size=2>TokenClassification | <font size=2>QuestionAnswering |
|---|---|---|---|---|---|---|
| <font size=2> [BERT](https://huggingface.co/docs/transformers/main/model_doc/bert#transformers.BertModel) |✅ | ✅ | None |✅|✅|✅|
| <font size=2> [RoBERTa](https://huggingface.co/docs/transformers/main/model_doc/roberta#transformers.RobertaModel) |✅ |✅ | None |✅|✅|✅|
|<font size=2> [T5](https://huggingface.co/docs/transformers/main/model_doc/t5#transformers.T5Model) | None | None |✅|None|None|None|
|<font size=2> [GPT2](https://huggingface.co/docs/transformers/main/model_doc/gpt2#transformers.GPT2Model) |✅ | None |None|✅|✅|None|
|<font size=2> [MarianMT](https://huggingface.co/docs/transformers/main/model_doc/marian#transformers.MarianModel) |✅ | None |✅|None|None|None|
|<font size=2> [ELECTRA](https://huggingface.co/docs/transformers/main/model_doc/electra#transformers.ElectraModel) | None | ✅ |None|✅|✅|✅|
|<font size=2> [DistilBERT](https://huggingface.co/docs/transformers/main/model_doc/distilbert#transformers.DistilBertModel)|None | ✅ |None|✅|✅|✅|
|<font size=2> [BART](https://huggingface.co/docs/transformers/main/model_doc/bart#transformers.BartModel) |✅ |None |✅|✅|None|✅|
|<font size=2> [XLM-RoBERTa](https://huggingface.co/docs/transformers/main/model_doc/xlm-roberta#transformers.XLMRobertaModel) |✅ |✅ | None |✅|✅|✅|
|<font size=2> [ALBERT](https://huggingface.co/docs/transformers/main/model_doc/albert#transformers.AlbertModel) |None | ✅ |None|✅|✅|✅|
|<font size=2> [LayoutLM](https://huggingface.co/docs/transformers/main/model_doc/layoutlm#transformers.LayoutLMModel) |None | ✅ |None|✅|✅|None|
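
Notes:
- In the table above, rows are backbone types and columns are task types.
- None means HuggingFace does not support exporting that model to ONNX (this is unrelated to model conversion).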
## Conversion tutorial
### Dependencies
- Python >= 3.5
- HuggingFace Transformers 4.16.0
- PyTorch 1.7.1
- PaddlePaddle 2.3.0
- ONNX 1.9.0
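
Before converting, it can help to compare the installed packages against the versions listed above. The following is a minimal sketch: the PyPI distribution names in `PINNED` are assumptions (the list names the projects, not the packages), and `importlib.metadata` requires Python 3.8 or later, which is newer than the minimum stated above.

```python
from importlib import metadata  # standard library, Python 3.8+

# Assumed PyPI distribution names mapped to the versions listed above.
PINNED = {
    "transformers": "4.16.0",
    "torch": "1.7.1",
    "paddlepaddle": "2.3.0",
    "onnx": "1.9.0",
}


def installed_versions(pinned):
    """Map each package to (listed version, installed version or None)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        report[name] = (wanted, found)
    return report


for name, (wanted, found) in installed_versions(PINNED).items():
    status = "ok" if found == wanted else "differs"
    print(f"{name}: listed {wanted}, installed {found} ({status})")
```

A "differs" line is only a hint: builds with a local version suffix (for example a CUDA-specific PyTorch wheel) report a longer version string than the one listed.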
### Converting a PyTorch model to Paddle
Taking BERT as the model and SequenceClassification as the task, run the following code:
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from x2paddle.convert import pytorch2paddle

save_dir = "pd_model"  # output directory for the converted Paddle model
jit_type = "trace"     # convert via TorchScript tracing

# Load the tokenizer and PyTorch weights from the Hub
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", return_dict=False)
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
pt_model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", return_dict=False)
pt_model.eval()  # switch to inference mode (disables dropout) before tracing

# Reference PyTorch output, useful for checking the converted model later
input_examples = [inputs["input_ids"], inputs["attention_mask"], inputs["token_type_ids"]]
result = pt_model(*input_examples)

# Convert to Paddle
pytorch2paddle(pt_model, save_dir, jit_type, input_examples)
```
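
After conversion, a common sanity check is to compare the PyTorch output (`result` above) with the output of the converted Paddle model on the same inputs. The sketch below shows only the comparison step, on plain nested lists. The helper names, the sample numbers, and the tolerance are all illustrative assumptions, and how the Paddle-side output is obtained depends on the files X2Paddle writes to `save_dir`, which is not covered here.

```python
def flatten(values):
    """Yield every number from an arbitrarily nested list or tuple."""
    for v in values:
        if isinstance(v, (list, tuple)):
            yield from flatten(v)
        else:
            yield float(v)


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two nested lists."""
    fa, fb = list(flatten(a)), list(flatten(b))
    if len(fa) != len(fb):
        raise ValueError(f"size mismatch: {len(fa)} vs {len(fb)}")
    return max((abs(x - y) for x, y in zip(fa, fb)), default=0.0)


# Hypothetical logits for illustration only. Real values would come from
# result[0].detach().tolist() on the PyTorch side and from the converted
# Paddle model on the other side.
torch_logits = [[0.1234, -0.5678]]
paddle_logits = [[0.1234, -0.5679]]
assert max_abs_diff(torch_logits, paddle_logits) < 1e-3  # tolerance is an assumption
```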
### Converting an ONNX model to Paddle
#### Step 1: Export the ONNX model with HuggingFace
Use the command line:
```shell
python -m transformers.onnx --model=bert-base-uncased onnx/
```
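For more details, see the HuggingFace [ONNX export tutorial](https://huggingface.co/docs/transformers/main/serialization#exporting-a-model-to-onnx).
#### Step 2: Convert the ONNX model to Paddle format with X2Paddle
Run X2Paddle on the exported file. The command below expects `model.onnx` in the current directory, so run it from the export directory or adjust the `--model` path: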
```shell
x2paddle --framework=onnx --model=model.onnx --save_dir=pd_model_dynamic
```
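
The two steps can also be scripted. The sketch below only assembles the two commands shown above as argument lists. `build_commands` is a hypothetical helper, and it assumes the exporter writes a file named `model.onnx` into the output directory; `shlex.join` requires Python 3.8 or later.

```python
import os
import shlex


def build_commands(model_name, onnx_dir, save_dir):
    """Return the export and conversion commands as argument lists."""
    export_cmd = [
        "python", "-m", "transformers.onnx",
        f"--model={model_name}", onnx_dir,
    ]
    # Assumption: the exporter writes a file named model.onnx into onnx_dir.
    onnx_path = os.path.join(onnx_dir, "model.onnx")
    convert_cmd = [
        "x2paddle", "--framework=onnx",
        f"--model={onnx_path}", f"--save_dir={save_dir}",
    ]
    return export_cmd, convert_cmd


export_cmd, convert_cmd = build_commands("bert-base-uncased", "onnx", "pd_model_dynamic")
print(shlex.join(export_cmd))
print(shlex.join(convert_cmd))
# Each list can be passed to subprocess.run(cmd, check=True) to execute it.
```

Keeping the commands as lists rather than a single string avoids shell quoting issues when a path contains spaces.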