"VIMER-StrucTexT is a joint segment-level and token-level representation enhancement model for document image understanding, covering documents such as PDFs, invoices, and receipts. Due to the complexity of content and layout in visually rich documents (VRDs), structured text understanding is a challenging task. Most existing studies decouple this problem into two sub-tasks: entity labeling and entity linking, which require a full understanding of document context at both the token and segment levels. VIMER-StrucTexT is a unified framework that can flexibly and effectively handle both sub-tasks at their required representation granularity. In addition, we design a novel pre-training strategy with two self-supervised tasks, named **Segment Length Prediction** and **Paired Box Direction**, to learn a richer multi-modal representation.\n",
"We fine-tune VIMER-StrucTexT on three structured text understanding tasks, i.e., Token-based Entity Labeling (**T-ELB**), Segment-based Entity Labeling (**S-ELB**), and Segment-based Entity Linking (**S-ELK**). **Note:** the performance figures reported below for all **ELB** tasks are **entity-level F1 scores**, computed as the Macro F1 score over the related entity types.\n",
"\n",
"### 2.1 Token-based Entity Labeling\n",
"\n",
"#### 2.1.1 Datasets\n",
"\n",
"- [EPHOIE](https://github.com/HCIILAB/EPHOIE) is collected from scanned Chinese examination papers.\n",
"There are 10 text types, annotated at the character level, which means a single text segment may be composed of characters of different categories.\n",
"The required token-based entity types are as follows: `Subject, Test Time, Name, School, Examination Number, Seat Number, Class, Student Number, Grade, and Score`.\n",
"- [SROIE](https://rrc.cvc.uab.es/?ch=13&com=introduction) is a public dataset for receipt information extraction from the ICDAR 2019 Challenge. It contains 626 receipts for training and 347 receipts for testing. Every receipt is annotated with four predefined values: `company, date, address, and total`.\n",
"- [FUNSD](https://guillaumejaume.github.io/FUNSD/) is a form understanding benchmark with 199 real, fully annotated, scanned form images, such as marketing, advertising, and scientific reports, split into 149 training samples and 50 testing samples. The FUNSD dataset is suitable for a variety of tasks; we focus on the sub-tasks of **T-ELB** and **S-ELK** for the semantic entities `question, answer, and header`.\n",
"- [XFUND](https://github.com/doc-analysis/XFUND) is a multilingual form understanding benchmark dataset that includes human-labeled forms with key-value pairs in 7 languages. Each language includes 199 forms: 149 for training and 50 for testing. We evaluate on the Chinese section of XFUND.\n",
"- [FUNSD](https://guillaumejaume.github.io/FUNSD) annotates links as `(entity_from, entity_to)` pairs, each forming a question–answer pair, which guides the task of predicting the relations between semantic entities.\n",
"- [XFUND](https://github.com/doc-analysis/XFUND) uses the same setting as FUNSD. We evaluate on the Chinese section of the dataset.\n",
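"The `(entity_from, entity_to)` links are stored per entity in the annotation JSON. A simplified sketch of one FUNSD-style entity record (fields abridged for illustration; see the dataset page for the full schema):\n",
"```json\n",
"{\n",
"  \"id\": 0,\n",
"  \"text\": \"DATE:\",\n",
"  \"box\": [84, 109, 136, 119],\n",
"  \"label\": \"question\",\n",
"  \"linking\": [[0, 1]]\n",
"}\n",
"```\n",
"Here `\"linking\": [[0, 1]]` records one `(entity_from, entity_to)` pair: entity `0` (this question) links to entity `1` (its answer).\n",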
"This code base requires `PaddlePaddle 2.1.0+`. You can follow the [paddlepaddle-quick](https://www.paddlepaddle.org.cn/install/quick) guide to prepare the environment, or use pip:\n",
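"For example, to install the CPU build (GPU wheels depend on your CUDA version; see the guide linked above for the matching package name):\n",
"```shell\n",
"# CPU build shown; pick the matching GPU wheel for your CUDA version\n",
"python -m pip install paddlepaddle==2.1.0\n",
"```\n",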
"The following visualized images are sampled from practical domestic applications of StrucTexT. *Different mask colors represent different entity categories. Black lines connect segments or text lines that belong to the same entity, and orange lines indicate the link relationships between entities.*\n",
"- For more information and applications, please visit [Baidu OCR](https://ai.baidu.com/tech/ocr) open platform."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Citation\n",
"You can cite our paper as follows:\n",
"```\n",
"@inproceedings{li2021structext,\n",
" title={StrucTexT: Structured Text Understanding with Multi-Modal Transformers},\n",
" author={Li, Yulin and Qian, Yuxi and Yu, Yuechen and Qin, Xiameng and Zhang, Chengquan and Liu, Yan and Yao, Kun and Han, Junyu and Liu, Jingtuo and Ding, Errui},\n",
" booktitle={Proceedings of the 29th ACM International Conference on Multimedia},\n",