# KIE Algorithm - LayoutXLM

- [1. Introduction](#1-introduction)
- [2. Environment](#2-environment)
- [3. Model Training / Evaluation / Prediction](#3-model-training--evaluation--prediction)
- [4. Inference and Deployment](#4-inference-and-deployment)
    - [4.1 Python Inference](#41-python-inference)
    - [4.2 C++ Inference](#42-c-inference)
    - [4.3 Serving](#43-serving)
    - [4.4 More](#44-more)
- [5. FAQ](#5-faq)
- [Citation](#Citation)

## 1. Introduction

Paper:

> [LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding](https://arxiv.org/abs/2104.08836)
>
> Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei
>
> 2021

On the XFUND_zh dataset, the reproduced Hmean of the algorithm is as follows.

|Model|Backbone|Task|Config|Hmean|Download link|
| --- | --- | --- | --- | --- | --- |
|LayoutXLM|LayoutXLM-base|SER|[ser_layoutxlm_xfund_zh.yml](../../configs/kie/layoutlm_series/ser_layoutxlm_xfund_zh.yml)|90.38%|[trained model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar)/[inference model](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh_infer.tar)|
|LayoutXLM|LayoutXLM-base|RE|[re_layoutxlm_xfund_zh.yml](../../configs/kie/layoutlm_series/re_layoutxlm_xfund_zh.yml)|74.83%|[trained model](https://paddleocr.bj.bcebos.com/pplayout/re_LayoutXLM_xfun_zh.tar)/inference model (coming soon)|

## 2. Environment

Please refer to ["Environment Preparation"](./environment_en.md) to configure the PaddleOCR environment, and to ["Project Clone"](./clone_en.md) to clone the project code.

## 3. Model Training / Evaluation / Prediction

Please refer to the [KIE tutorial](./kie_en.md). PaddleOCR has a modular code structure, so you only need to **replace the configuration file** to train different models.

## 4. Inference and Deployment

### 4.1 Python Inference

**Note:** Currently, the RE model inference process is still being adapted.
We take the SER model as an example to introduce the KIE process based on the LayoutXLM model.

First, the trained model needs to be exported as an inference model. Take the LayoutXLM model trained on XFUND_zh as an example ([trained model download link](https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar)). Use the following commands to download the trained model and export it.

```bash
# Download and unpack the trained SER model
wget https://paddleocr.bj.bcebos.com/pplayout/ser_LayoutXLM_xfun_zh.tar
tar -xf ser_LayoutXLM_xfun_zh.tar
# Export to an inference model; the output directory must match --ser_model_dir used below
python3 tools/export_model.py \
    -c configs/kie/layoutlm_series/ser_layoutxlm_xfund_zh.yml \
    -o Architecture.Backbone.checkpoints=./ser_LayoutXLM_xfun_zh/best_accuracy \
       Global.save_inference_dir=./inference/ser_layoutxlm_infer
```

Use the following command to run inference with the LayoutXLM SER model.

```bash
cd ppstructure
python3 kie/predict_kie_token_ser.py \
  --kie_algorithm=LayoutXLM \
  --ser_model_dir=../inference/ser_layoutxlm_infer \
  --image_dir=./docs/kie/input/zh_val_42.jpg \
  --ser_dict_path=../train_data/XFUND/class_list_xfun.txt \
  --vis_font_path=../doc/fonts/simfang.ttf
```

The SER visualization results are saved in the `./output` directory by default. The results are as follows.