diff --git a/docs/inference_model_convertor/toolkits/HuggingFace2paddle.md b/docs/inference_model_convertor/toolkits/HuggingFace2paddle.md
new file mode 100644
index 0000000000000000000000000000000000000000..7e45870bf7bd2d35d407c5cf0a1b21ce856b8973
--- /dev/null
+++ b/docs/inference_model_convertor/toolkits/HuggingFace2paddle.md
@@ -0,0 +1,105 @@
+# Tutorial: Exporting HuggingFace Models to Paddle Models
+
+X2Paddle now supports HuggingFace models. The currently supported models are listed below.
+
+## TorchScript
+
+
+
+| | CausalLM | MaskedLM | Seq2SeqLM | SequenceClassification | MultipleChoice | NextSentencePrediction | TokenClassification | QuestionAnswering | AudioClassification |
+|---|---|---|---|---|---|---|---|---|---|
+| [BERT](https://huggingface.co/docs/transformers/main/model_doc/bert#transformers.BertModel) | ✅ | ✅ | None | ✅ | ✅ | ✅ | ✅ | ✅ | None |
+| [RoBERTa](https://huggingface.co/docs/transformers/main/model_doc/roberta#transformers.RobertaModel) | ✅ | ✅ | None | ✅ | ✅ | None | ✅ | ✅ | None |
+| [T5](https://huggingface.co/docs/transformers/main/model_doc/t5#transformers.T5Model) | None | None | ✅ | None | None | None | None | None | None |
+| [GPT2](https://huggingface.co/docs/transformers/main/model_doc/gpt2#transformers.GPT2Model) | ✅ | None | None | ✅ | None | None | ✅ | None | None |
+| [MarianMT](https://huggingface.co/docs/transformers/main/model_doc/marian#transformers.MarianModel) | ✅ | None | ✅ | None | None | None | None | None | None |
+| [ELECTRA](https://huggingface.co/docs/transformers/main/model_doc/electra#transformers.ElectraModel) | None | ✅ | None | ✅ | ✅ | None | ✅ | ✅ | None |
+| [DistilBERT](https://huggingface.co/docs/transformers/main/model_doc/distilbert#transformers.DistilBertModel) | None | ✅ | None | ✅ | ✅ | None | ✅ | ✅ | None |
+| [BART](https://huggingface.co/docs/transformers/main/model_doc/bart#transformers.BartModel) | ✅ | ✅ | ✅ | ✅ | None | None | None | ✅ | None |
+| [XLM-RoBERTa](https://huggingface.co/docs/transformers/main/model_doc/xlm-roberta#transformers.XLMRobertaModel) | ✅ | ✅ | None | ✅ | ✅ | None | ✅ | ✅ | None |
+| [ALBERT](https://huggingface.co/docs/transformers/main/model_doc/albert#transformers.AlbertModel) | None | ✅ | None | ✅ | ✅ | None | ✅ | ✅ | None |
+| [LayoutLM](https://huggingface.co/docs/transformers/main/model_doc/layoutlm#transformers.LayoutLMModel) | None | ✅ | None | ✅ | None | None | ✅ | None | None |
+| [BigBird](https://huggingface.co/docs/transformers/main/model_doc/big_bird#transformers.BigBirdModel) | ✅ | ✅ | None | ✅ | ✅ | None | ✅ | ✅ | None |
+| [Wav2Vec2](https://huggingface.co/docs/transformers/main/model_doc/wav2vec2#transformers.Wav2Vec2Model) | None | None | None | None | None | None | None | None | ✅ |
+
+Notes:
+
+- In the table above, each row is a backbone and each column is a task type.
+- None means the backbone does not support the task (this is unrelated to model conversion).
+
+## ONNX
+
+
+
+| | CausalLM | MaskedLM | Seq2SeqLM | SequenceClassification | MultipleChoice | NextSentencePrediction | TokenClassification | QuestionAnswering | AudioClassification |
+|---|---|---|---|---|---|---|---|---|---|
+| [BERT](https://huggingface.co/docs/transformers/main/model_doc/bert#transformers.BertModel) | ✅ | ✅ | None | ✅ | None | None | ✅ | ✅ | None |
+| [RoBERTa](https://huggingface.co/docs/transformers/main/model_doc/roberta#transformers.RobertaModel) | ✅ | ✅ | None | ✅ | None | None | ✅ | ✅ | None |
+| [T5](https://huggingface.co/docs/transformers/main/model_doc/t5#transformers.T5Model) | None | None | ✅ | None | None | None | None | None | None |
+| [GPT2](https://huggingface.co/docs/transformers/main/model_doc/gpt2#transformers.GPT2Model) | ✅ | None | None | ✅ | None | None | ✅ | None | None |
+| [MarianMT](https://huggingface.co/docs/transformers/main/model_doc/marian#transformers.MarianModel) | ✅ | None | ✅ | None | None | None | None | None | None |
+| [ELECTRA](https://huggingface.co/docs/transformers/main/model_doc/electra#transformers.ElectraModel) | None | ✅ | None | ✅ | None | None | ✅ | ✅ | None |
+| [DistilBERT](https://huggingface.co/docs/transformers/main/model_doc/distilbert#transformers.DistilBertModel) | None | ✅ | None | ✅ | None | None | ✅ | ✅ | None |
+| [BART](https://huggingface.co/docs/transformers/main/model_doc/bart#transformers.BartModel) | ✅ | None | ✅ | ✅ | None | None | None | ✅ | None |
+| [XLM-RoBERTa](https://huggingface.co/docs/transformers/main/model_doc/xlm-roberta#transformers.XLMRobertaModel) | ✅ | ✅ | None | ✅ | None | None | ✅ | ✅ | None |
+| [ALBERT](https://huggingface.co/docs/transformers/main/model_doc/albert#transformers.AlbertModel) | None | ✅ | None | ✅ | None | None | ✅ | ✅ | None |
+| [LayoutLM](https://huggingface.co/docs/transformers/main/model_doc/layoutlm#transformers.LayoutLMModel) | None | ✅ | None | ✅ | None | None | ✅ | None | None |
+
+Notes:
+
+- In the table above, each row is a backbone and each column is a task type.
+- None means HuggingFace does not support ONNX export for that combination (this is unrelated to model conversion).
+
+## Conversion Tutorial
+
+### Requirements
+
+- Python >= 3.5
+- HuggingFace Transformers 4.16.0
+- PyTorch 1.7.1
+- PaddlePaddle 2.3.0
+- ONNX 1.9.0
+
+### Converting a PyTorch Model to Paddle
+
+Taking BERT with the SequenceClassification task as an example, run the following code:
+
+```python
+import torch
+
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+from x2paddle.convert import pytorch2paddle
+
+save_dir = "pd_model"
+jit_type = "trace"
+
+# Load the tokenizer and PyTorch weights from the Hub
+tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", return_dict=False)
+inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
+pt_model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", return_dict=False)
+pt_model.eval()
+result = pt_model(inputs["input_ids"], inputs["attention_mask"], inputs["token_type_ids"])
+
+# Convert the traced PyTorch model to Paddle, using the same tensors as example inputs
+pytorch2paddle(pt_model, save_dir, jit_type, [inputs["input_ids"], inputs["attention_mask"], inputs["token_type_ids"]])
+```
+
+### Converting an ONNX Model to Paddle
+
+#### Step 1: Export the ONNX model with HuggingFace
+
+Run the following command:
+
+```shell
+python -m transformers.onnx --model=bert-base-uncased onnx/
+```
+
+For more details, see the HuggingFace [ONNX export tutorial](https://huggingface.co/docs/transformers/main/serialization#exporting-a-model-to-onnx).
+
+#### Step 2: Convert the ONNX model to Paddle format with X2Paddle
+
+Point X2Paddle at the `model.onnx` file written to `onnx/` in step 1:
+
+```shell
+x2paddle --framework=onnx --model=onnx/model.onnx --save_dir=pd_model_dynamic
+```
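
The two ONNX steps above can be chained from a single script. The following is a minimal sketch, not part of X2Paddle or Transformers: `build_commands` and `run_pipeline` are hypothetical helper names, and the sketch assumes that the HuggingFace exporter writes a file named `model.onnx` into the output directory and that the `x2paddle` executable is on `PATH`.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(model_name, onnx_dir, save_dir):
    """Return the export and convert commands from the tutorial as argument lists."""
    # Assumption: the exporter writes its result to <onnx_dir>/model.onnx.
    onnx_path = Path(onnx_dir) / "model.onnx"
    export_cmd = [
        sys.executable, "-m", "transformers.onnx",
        f"--model={model_name}", str(onnx_dir),
    ]
    convert_cmd = [
        "x2paddle", "--framework=onnx",
        f"--model={onnx_path.as_posix()}", f"--save_dir={save_dir}",
    ]
    return export_cmd, convert_cmd


def run_pipeline(model_name="bert-base-uncased", onnx_dir="onnx",
                 save_dir="pd_model_dynamic"):
    """Run step 1 (ONNX export), then step 2 (X2Paddle conversion)."""
    export_cmd, convert_cmd = build_commands(model_name, onnx_dir, save_dir)
    # check=True raises CalledProcessError, so a failed export stops
    # the script before the conversion step is attempted.
    subprocess.run(export_cmd, check=True)
    subprocess.run(convert_cmd, check=True)
```

Calling `run_pipeline()` with the defaults issues the same two commands shown in the tutorial. Building the argument lists separately from running them makes it possible to print and inspect the commands before executing anything.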