提交 9e1a77ea 作者:andyjpaddle

update dict

...@@ -131,7 +131,7 @@ pip3 install dist/PPOCRLabel-1.0.2-py2.py3-none-any.whl -i https://mirror.baidu.
> 注意:如果表格中存在空白单元格,同样需要使用一个标注框将其标出,使得单元格总数与图像中保持一致。
3. **调整单元格顺序**:点击软件`视图-显示框编号` 打开标注框序号,在软件界面右侧拖动 `识别结果` 一栏下的所有结果,使得标注框编号按照从左到右、从上到下的顺序排列,按行依次标注。
4. 标注表格结构:**在外部Excel软件中,将存在文字的单元格标记为任意标识符(如 `1` )**,保证Excel中的单元格合并情况与原图相同即可(即不需要Excel中的单元格文字与图片中的文字完全相同)。
......
[English](README_en.md) | 简体中文
# 场景应用

PaddleOCR场景应用覆盖通用、制造、金融、交通行业的主要OCR垂类应用,在PP-OCR、PP-Structure的通用能力基础之上,以notebook的形式展示利用场景数据微调、模型优化方法、数据增广等内容,为开发者快速落地OCR应用提供示范与启发。

- [教程文档](#1)
- [通用](#11)
- [制造](#12)
- [金融](#13)
- [交通](#14)
- [模型下载](#2)

<a name="1"></a>
## 教程文档
<a name="11"></a>
### 通用
| 类别 | 亮点 | 模型下载 | 教程 |
| ---------------------- | ------------ | -------------- | --------------------------------------- |
| 高精度中文识别模型SVTR | 比PP-OCRv3识别模型精度高3%,可用于数据挖掘或对预测效率要求不高的场景。| [模型下载](#2) | [中文](./高精度中文识别模型.md)/English |
| 手写体识别 | 新增字形支持 | | |
<a name="12"></a>
### 制造

| 类别 | 亮点 | 模型下载 | 教程 | 示例图 |
| -------------- | ------------------------------ | -------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
| 数码管识别 | 数码管数据合成、漏识别调优 | [模型下载](#2) | [中文](./光功率计数码管字符识别/光功率计数码管字符识别.md)/English | <img src="https://ai-studio-static-online.cdn.bcebos.com/7d5774a273f84efba5b9ce7fd3f86e9ef24b6473e046444db69fa3ca20ac0986" width = "200" height = "100" /> |
| 液晶屏读数识别 | 检测模型蒸馏、Serving部署 | [模型下载](#2) | [中文](./液晶屏读数识别.md)/English | <img src="https://ai-studio-static-online.cdn.bcebos.com/901ab741cb46441ebec510b37e63b9d8d1b7c95f63cc4e5e8757f35179ae6373" width = "200" height = "100" /> |
| 包装生产日期 | 点阵字符合成、过曝过暗文字识别 | [模型下载](#2) | [中文](./包装生产日期识别.md)/English | <img src="https://ai-studio-static-online.cdn.bcebos.com/d9e0533cc1df47ffa3bbe99de9e42639a3ebfa5bce834bafb1ca4574bf9db684" width = "200" height = "100" /> |
| PCB文字识别 | 小尺寸文本检测与识别 | [模型下载](#2) | [中文](./PCB字符识别/PCB字符识别.md)/English | <img src="https://ai-studio-static-online.cdn.bcebos.com/95d8e95bf1ab476987f2519c0f8f0c60a0cdc2c444804ed6ab08f2f7ab054880" width = "200" height = "100" /> |
| 电表识别 | 大分辨率图像检测调优 | [模型下载](#2) | | |
| 液晶屏缺陷检测 | 非文字字符识别 | | | |
<a name="13"></a>

### 金融
| 类别 | 亮点 | 模型下载 | 教程 | 示例图 |
| -------------- | ------------------------ | -------------- | ----------------------------------- | ------------------------------------------------------------ |
| 表单VQA | 多模态通用表单结构化提取 | [模型下载](#2) | [中文](./多模态表单识别.md)/English | <img src="https://ai-studio-static-online.cdn.bcebos.com/a3b25766f3074d2facdf88d4a60fc76612f51992fd124cf5bd846b213130665b" width = "200" height = "200" /> |
| 增值税发票 | 敬请期待 | | | |
| 印章检测与识别 | 端到端弯曲文本识别 | | | |
| 通用卡证识别 | 通用结构化提取 | | | |
| 身份证识别 | 结构化提取、图像阴影 | | | |
| 合同比对 | 密集文本检测、NLP串联 | | | |
<a name="14"></a>

### 交通

| 类别 | 亮点 | 模型下载 | 教程 | 示例图 |
| ----------------- | ------------------------------ | -------------- | ----------------------------------- | ------------------------------------------------------------ |
| 车牌识别 | 多角度图像、轻量模型、端侧部署 | [模型下载](#2) | [中文](./轻量级车牌识别.md)/English | <img src="https://ai-studio-static-online.cdn.bcebos.com/76b6a0939c2c4cf49039b6563c4b28e241e11285d7464e799e81c58c0f7707a7" width = "200" height = "100" /> |
| 驾驶证/行驶证识别 | 敬请期待 | | | |
| 快递单识别 | 敬请期待 | | | |
<a name="2"></a>
## 模型下载
如需下载上述场景中已经训练好的垂类模型,可以扫描下方二维码,关注公众号填写问卷后,加入PaddleOCR官方交流群获取20G OCR学习大礼包(内含《动手学OCR》电子书、课程回放视频、前沿论文等重磅资料)
<div align="center">
<img src="https://ai-studio-static-online.cdn.bcebos.com/dd721099bd50478f9d5fb13d8dd00fad69c22d6848244fd3a1d3980d7fefc63e" width = "150" height = "150" />
</div>
如果您是企业开发者且未在上述场景中找到合适的方案,可以填写[OCR应用合作调研问卷](https://paddle.wjx.cn/vj/QwF7GKw.aspx),免费与官方团队展开不同层次的合作,包括但不限于问题抽象、确定技术方案、项目答疑、共同研发等。如果您已经使用PaddleOCR落地项目,也可以填写此问卷,与飞桨平台共同宣传推广,提升企业技术品宣。期待您的提交!

<a href="https://trackgit.com">
<img src="https://us-central1-trackgit-analytics.cloudfunctions.net/token/ping/l63cvzo0w09yxypc7ygl" alt="traffic" />
</a>
\ No newline at end of file
# 基于PP-OCRv3的手写文字识别
- [1. 项目背景及意义](#1-项目背景及意义)
- [2. 项目内容](#2-项目内容)
- [3. PP-OCRv3识别算法介绍](#3-PP-OCRv3识别算法介绍)
- [4. 安装环境](#4-安装环境)
- [5. 数据准备](#5-数据准备)
- [6. 模型训练](#6-模型训练)
- [6.1 下载预训练模型](#61-下载预训练模型)
- [6.2 修改配置文件](#62-修改配置文件)
- [6.3 开始训练](#63-开始训练)
- [7. 模型评估](#7-模型评估)
- [8. 模型导出推理](#8-模型导出推理)
- [8.1 模型导出](#81-模型导出)
- [8.2 模型推理](#82-模型推理)
## 1. 项目背景及意义
目前光学字符识别(OCR)技术在我们的生活中被广泛使用,但大多数模型在通用场景下的准确性仍有待提高。针对这一问题,我们借助飞桨提供的PaddleOCR套件,较容易地实现了垂类场景下的应用。手写体在日常生活中较为常见,但由于每个人的书写风格各不相同,手写体识别对视觉模型来说仍然相当有挑战。因此,训练一个手写体识别模型具有很好的现实意义。下面给出一些手写体的示例图:
![example](https://ai-studio-static-online.cdn.bcebos.com/7a8865b2836f42d382e7c3fdaedc4d307d797fa2bcd0466e9f8b7705efff5a7b)
## 2. 项目内容
本项目基于PaddleOCR套件,以PP-OCRv3识别模型为基础,针对手写文字识别场景进行优化。
Aistudio项目链接:[OCR手写文字识别](https://aistudio.baidu.com/aistudio/projectdetail/4330587)
## 3. PP-OCRv3识别算法介绍
PP-OCRv3的识别模块是基于文本识别算法[SVTR](https://arxiv.org/abs/2205.00159)优化而来的。SVTR不再采用RNN结构,而是通过引入Transformer结构更加有效地挖掘文本行图像的上下文信息,从而提升文本识别能力。如下图所示,PP-OCRv3采用了6个优化策略。
![v3_rec](https://ai-studio-static-online.cdn.bcebos.com/d4f5344b5b854d50be738671598a89a45689c6704c4d481fb904dd7cf72f2a1a)
优化策略汇总如下:
* SVTR_LCNet:轻量级文本识别网络
* GTC:Attention指导CTC训练策略
* TextConAug:挖掘文字上下文信息的数据增广策略
* TextRotNet:自监督的预训练模型
* UDML:联合互学习策略
* UIM:无标注数据挖掘方案
详细优化策略描述请参考[PP-OCRv3优化策略](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.5/doc/doc_ch/PP-OCRv3_introduction.md#3-%E8%AF%86%E5%88%AB%E4%BC%98%E5%8C%96)
## 4. 安装环境
```bash
# 首先克隆官方的 PaddleOCR 项目,并安装所需依赖
git clone https://github.com/PaddlePaddle/PaddleOCR.git
cd PaddleOCR
pip install -r requirements.txt
```
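安装完成后,可以用几行 Python 代码简单确认 PaddlePaddle 环境是否可用(以下仅为可选的示意检查):
```python
# 可选:检查 PaddlePaddle 是否安装成功(示意)
import paddle

print(paddle.__version__)   # 打印已安装的 Paddle 版本
paddle.utils.run_check()    # 运行官方自检,确认 CPU/GPU 环境可正常工作
```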
## 5. 数据准备
本项目使用公开的手写文本识别数据集,包含Chinese OCR、中科院自动化研究所手写中文数据集[CASIA-HWDB2.x](http://www.nlpr.ia.ac.cn/databases/handwriting/Download.html),以及由中科院手写数据和网上开源数据合并而成的[数据集](https://aistudio.baidu.com/aistudio/datasetdetail/102884/0)等。该项目已经挂载处理好的数据集,可直接下载使用进行训练。
```bash
# 下载并解压数据
tar -xf hw_data.tar
```
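训练使用的标注文件为 PaddleOCR 识别任务常用的 `SimpleDataSet` 格式:每行由图片相对路径和对应文本组成,二者之间以制表符 `\t` 分隔。下面给出一个示意性的解析片段(其中的文件名与文本仅为举例,并非数据集中的真实样本):
```python
# 示意:解析识别任务标签文件中的一行(格式:图片路径\t文本标注)
label_line = "handwrite/some_image.jpg\t这是一行手写文本"  # 示例行,路径与文本为虚构

img_path, text = label_line.rstrip("\n").split("\t", maxsplit=1)
print(img_path)  # 图片相对路径,相对于配置中的 data_dir
print(text)      # 该图片对应的文本标注
```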
## 6. 模型训练
### 6.1 下载预训练模型
首先下载所需的PP-OCRv3识别预训练模型,也可以根据需要选择其他[文字识别模型](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.5/doc/doc_ch/models_list.md#2-%E6%96%87%E6%9C%AC%E8%AF%86%E5%88%AB%E6%A8%A1%E5%9E%8B)。
```bash
# 使用该指令下载需要的预训练模型
wget -P ./pretrained_models/ https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_train.tar
# 解压预训练模型文件
tar -xf ./pretrained_models/ch_PP-OCRv3_rec_train.tar -C pretrained_models
```
### 6.2 修改配置文件
我们使用`configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml`,主要修改训练轮数和学习率等相关参数,设置预训练模型路径,设置数据集路径。另外,batch_size可根据自己机器显存大小进行调整。具体修改如下几个地方:
```yaml
epoch_num: 100 # 训练epoch数
save_model_dir: ./output/ch_PP-OCR_v3_rec
save_epoch_step: 10
eval_batch_step: [0, 100] # 评估间隔,每隔100step评估一次
pretrained_model: ./pretrained_models/ch_PP-OCRv3_rec_train/best_accuracy # 预训练模型路径
lr:
name: Cosine # 修改学习率衰减策略为Cosine
learning_rate: 0.0001 # 修改fine-tune的学习率
warmup_epoch: 2 # 修改warmup轮数
Train:
dataset:
name: SimpleDataSet
data_dir: ./train_data # 训练集图片路径
ext_op_transform_idx: 1
label_file_list:
- ./train_data/chineseocr-data/rec_hand_line_all_label_train.txt # 训练集标签
- ./train_data/handwrite/HWDB2.0Train_label.txt
- ./train_data/handwrite/HWDB2.1Train_label.txt
- ./train_data/handwrite/HWDB2.2Train_label.txt
- ./train_data/handwrite/hwdb_ic13/handwriting_hwdb_train_labels.txt
- ./train_data/handwrite/HW_Chinese/train_hw.txt
ratio_list:
- 0.1
- 1.0
- 1.0
- 1.0
- 0.02
- 1.0
loader:
shuffle: true
batch_size_per_card: 64
drop_last: true
num_workers: 4
Eval:
dataset:
name: SimpleDataSet
data_dir: ./train_data # 测试集图片路径
label_file_list:
- ./train_data/chineseocr-data/rec_hand_line_all_label_val.txt # 测试集标签
- ./train_data/handwrite/HWDB2.0Test_label.txt
- ./train_data/handwrite/HWDB2.1Test_label.txt
- ./train_data/handwrite/HWDB2.2Test_label.txt
- ./train_data/handwrite/hwdb_ic13/handwriting_hwdb_val_labels.txt
- ./train_data/handwrite/HW_Chinese/test_hw.txt
loader:
shuffle: false
drop_last: false
batch_size_per_card: 64
num_workers: 4
```
由于数据集大多是长文本,因此需要**注释**掉下面的数据增广策略,以便训练出更好的模型。
```yaml
- RecConAug:
prob: 0.5
ext_data_num: 2
image_shape: [48, 320, 3]
```
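除了手动编辑 yml,也可以用一小段脚本按上述说明批量修改配置,或在启动训练时通过 `-o` 参数覆盖个别配置项。下面是一个示意脚本(假设已安装 PyYAML;键名以实际配置文件为准,脚本会丢失 yml 中的注释与锚点引用,仅供参考):
```python
# 示意:用 PyYAML 按上文说明修改训练配置(非必需步骤,键名以实际 yml 为准)
import yaml

cfg_path = "configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml"
with open(cfg_path, "r", encoding="utf-8") as f:
    cfg = yaml.safe_load(f)

# 训练轮数与预训练模型路径
cfg["Global"]["epoch_num"] = 100
cfg["Global"]["pretrained_model"] = "./pretrained_models/ch_PP-OCRv3_rec_train/best_accuracy"

# 学习率相关参数
cfg["Optimizer"]["lr"]["name"] = "Cosine"
cfg["Optimizer"]["lr"]["learning_rate"] = 0.0001
cfg["Optimizer"]["lr"]["warmup_epoch"] = 2

# 去掉 RecConAug 数据增广,等价于在 yml 中注释掉该项
transforms = cfg["Train"]["dataset"]["transforms"]
cfg["Train"]["dataset"]["transforms"] = [t for t in transforms if "RecConAug" not in t]

with open(cfg_path, "w", encoding="utf-8") as f:
    yaml.safe_dump(cfg, f, allow_unicode=True, sort_keys=False)
```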
### 6.3 开始训练
我们使用上面修改好的配置文件`configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml`。在预训练模型、数据集路径、学习率、训练轮数等都设置完毕后,可以使用下面命令开始训练:
```bash
# 开始训练识别模型
python tools/train.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml
```
## 7. 模型评估
在训练之前,我们可以直接使用下面命令来评估预训练模型的效果:
```bash
# 评估预训练模型
python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model="./pretrained_models/ch_PP-OCRv3_rec_train/best_accuracy"
```
```
[2022/07/14 10:46:22] ppocr INFO: load pretrain successful from ./pretrained_models/ch_PP-OCRv3_rec_train/best_accuracy
eval model:: 100%|████████████████████████████| 687/687 [03:29<00:00, 3.27it/s]
[2022/07/14 10:49:52] ppocr INFO: metric eval ***************
[2022/07/14 10:49:52] ppocr INFO: acc:0.03724954461811258
[2022/07/14 10:49:52] ppocr INFO: norm_edit_dis:0.4859541065843199
[2022/07/14 10:49:52] ppocr INFO: Teacher_acc:0.0371584699368947
[2022/07/14 10:49:52] ppocr INFO: Teacher_norm_edit_dis:0.48718814890536477
[2022/07/14 10:49:52] ppocr INFO: fps:947.8562684823883
```
可以看出,直接加载预训练模型进行评估,效果较差,因为预训练模型并不是基于手写文字进行单独训练的,所以我们需要基于预训练模型进行finetune。
训练完成后,可以进行测试评估,评估命令如下:
```bash
# 评估finetune效果
python tools/eval.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_rec/best_accuracy"
```
评估结果如下,可以看出识别准确率为54.3%。
```
[2022/07/14 10:54:06] ppocr INFO: metric eval ***************
[2022/07/14 10:54:06] ppocr INFO: acc:0.5430100180913
[2022/07/14 10:54:06] ppocr INFO: norm_edit_dis:0.9203322593158589
[2022/07/14 10:54:06] ppocr INFO: Teacher_acc:0.5401183969626324
[2022/07/14 10:54:06] ppocr INFO: Teacher_norm_edit_dis:0.919827504507755
[2022/07/14 10:54:06] ppocr INFO: fps:928.948733797251
```
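日志中的 `acc` 表示整行文本完全预测正确的比例,`norm_edit_dis` 则大致可以理解为 1 减去预测文本与标注文本之间的归一化编辑距离,对"预测接近但不完全正确"的样本更宽容。下面的计算方式仅为帮助理解的示意,并非 PaddleOCR 的源码实现:
```python
# 示意:归一化编辑距离指标的大致含义(非 PaddleOCR 源码实现)
def edit_distance(a: str, b: str) -> int:
    """经典 Levenshtein 编辑距离,单行滚动数组的动态规划实现。"""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

pred, gt = "品结构,差异化的多品牌渗透使欧莱雅确立了", "品结构,差异化的多品牌渗透使欧莱雅确立了其"
norm_edit_dis = 1 - edit_distance(pred, gt) / max(len(pred), len(gt), 1)
print(norm_edit_dis)  # 完全一致时为 1.0,差异越大数值越低
```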
如需获取已训练模型,请扫码填写问卷,加入PaddleOCR官方交流群获取全部OCR垂类模型下载链接、《动手学OCR》电子书等全套OCR学习资料🎁
<div align="left">
<img src="https://ai-studio-static-online.cdn.bcebos.com/dd721099bd50478f9d5fb13d8dd00fad69c22d6848244fd3a1d3980d7fefc63e" width = "150" height = "150" />
</div>
将下载或训练完成的模型放置在对应目录下即可完成模型推理。
## 8. 模型导出推理
训练完成后,可以将训练模型转换成inference模型。inference 模型会额外保存模型的结构信息,在预测部署、加速推理上性能优越,灵活方便,适合于实际系统集成。
### 8.1 模型导出
导出命令如下:
```bash
# 转化为推理模型
python tools/export_model.py -c configs/rec/PP-OCRv3/ch_PP-OCRv3_rec_distillation.yml -o Global.pretrained_model="./output/ch_PP-OCR_v3_rec/best_accuracy" Global.save_inference_dir="./inference/rec_ppocrv3/"
```
### 8.2 模型推理
导出模型后,可以使用如下命令进行推理预测:
```bash
# 推理预测
python tools/infer/predict_rec.py --image_dir="train_data/handwrite/HWDB2.0Test_images/104-P16_4.jpg" --rec_model_dir="./inference/rec_ppocrv3/Student"
```
```
[2022/07/14 10:55:56] ppocr INFO: In PP-OCRv3, rec_image_shape parameter defaults to '3, 48, 320', if you are using recognition model with PP-OCRv2 or an older version, please set --rec_image_shape='3,32,320
[2022/07/14 10:55:58] ppocr INFO: Predicts of train_data/handwrite/HWDB2.0Test_images/104-P16_4.jpg:('品结构,差异化的多品牌渗透使欧莱雅确立了其在中国化妆', 0.9904912114143372)
```
```python
# 可视化文字识别图片
from PIL import Image
import matplotlib.pyplot as plt

img_path = 'train_data/handwrite/HWDB2.0Test_images/104-P16_4.jpg'

def vis(img_path):
    # 读取并显示待识别的手写文本行图片
    plt.figure()
    image = Image.open(img_path)
    plt.imshow(image)
    plt.show()

vis(img_path)
```
![res](https://ai-studio-static-online.cdn.bcebos.com/ad7c02745491498d82e0ce95f4a274f9b3920b2f467646858709359b7af9d869)
...@@ -2,7 +2,7 @@

## 1. 简介

PP-OCRv3是百度开源的超轻量级场景文本检测识别模型库,其中超轻量的场景中文识别模型SVTR_LCNet使用了SVTR算法结构。为了保证速度,SVTR_LCNet将SVTR模型的Local Blocks替换为LCNet,使用两层Global Blocks。在中文场景中,PP-OCRv3识别主要使用如下优化策略([详细技术报告](../doc/doc_ch/PP-OCRv3_introduction.md)):
- GTC:Attention指导CTC训练策略;
- TextConAug:挖掘文字上下文信息的数据增广策略;
- TextRotNet:自监督的预训练模型;
......
...@@ -6,11 +6,11 @@ Global:
  save_model_dir: ./output/re_layoutlmv2_xfund_zh
  save_epoch_step: 2000
  # evaluation is run every 10 iterations after the 0th iteration
  eval_batch_step: [ 0, 19 ]
  cal_metric_during_train: False
  save_inference_dir:
  use_visualdl: False
  seed: 2022
  infer_img: ppstructure/docs/vqa/input/zh_val_21.jpg
  save_res_path: ./output/re_layoutlmv2_xfund_zh/res/
......
Global:
  use_gpu: True
  epoch_num: &epoch_num 130
  log_smooth_window: 10
  print_batch_step: 10
  save_model_dir: ./output/re_layoutxlm_xfund_zh
  save_epoch_step: 2000
  # evaluation is run every 10 iterations after the 0th iteration
  eval_batch_step: [ 0, 19 ]
...@@ -12,7 +12,7 @@ Global:
  use_visualdl: False
  seed: 2022
  infer_img: ppstructure/docs/vqa/input/zh_val_21.jpg
  save_res_path: ./output/re_layoutxlm_xfund_zh/res/

Architecture:
  model_type: vqa
...@@ -81,7 +81,7 @@ Train:
  loader:
    shuffle: True
    drop_last: False
    batch_size_per_card: 2
    num_workers: 8
    collate_fn: ListCollator
......
...@@ -6,13 +6,13 @@ Global:
  save_model_dir: ./output/ser_layoutlm_xfund_zh
  save_epoch_step: 2000
  # evaluation is run every 10 iterations after the 0th iteration
  eval_batch_step: [ 0, 19 ]
  cal_metric_during_train: False
  save_inference_dir:
  use_visualdl: False
  seed: 2022
  infer_img: ppstructure/docs/vqa/input/zh_val_42.jpg
  save_res_path: ./output/re_layoutlm_xfund_zh/res

Architecture:
  model_type: vqa
...@@ -55,6 +55,7 @@ Train:
    data_dir: train_data/XFUND/zh_train/image
    label_file_list:
      - train_data/XFUND/zh_train/train.json
    ratio_list: [ 1.0 ]
    transforms:
      - DecodeImage: # load image
          img_mode: RGB
......
...@@ -27,6 +27,7 @@ Architecture:

Loss:
  name: VQASerTokenLayoutLMLoss
  num_classes: *num_classes
  key: "backbone_out"

Optimizer:
  name: AdamW
......
...@@ -27,6 +27,7 @@ Architecture:

Loss:
  name: VQASerTokenLayoutLMLoss
  num_classes: *num_classes
  key: "backbone_out"

Optimizer:
  name: AdamW
......
Global: Global:
use_gpu: True use_gpu: True
epoch_num: &epoch_num 200 epoch_num: &epoch_num 130
log_smooth_window: 10 log_smooth_window: 10
print_batch_step: 10 print_batch_step: 10
save_model_dir: ./output/re_layoutxlm_funsd save_model_dir: ./output/re_vi_layoutxlm_xfund_zh
save_epoch_step: 2000 save_epoch_step: 2000
# evaluation is run every 10 iterations after the 0th iteration # evaluation is run every 10 iterations after the 0th iteration
eval_batch_step: [ 0, 57 ] eval_batch_step: [ 0, 19 ]
cal_metric_during_train: False cal_metric_during_train: False
save_inference_dir: save_inference_dir:
use_visualdl: False use_visualdl: False
seed: 2022 seed: 2022
infer_img: train_data/FUNSD/testing_data/images/83624198.png infer_img: ppstructure/docs/vqa/input/zh_val_21.jpg
save_res_path: ./output/re_layoutxlm_funsd/res/ save_res_path: ./output/re/xfund_zh/with_gt
Architecture: Architecture:
model_type: vqa model_type: vqa
...@@ -21,6 +21,7 @@ Architecture: ...@@ -21,6 +21,7 @@ Architecture:
Backbone: Backbone:
name: LayoutXLMForRe name: LayoutXLMForRe
pretrained: True pretrained: True
mode: vi
checkpoints: checkpoints:
Loss: Loss:
...@@ -50,10 +51,9 @@ Metric: ...@@ -50,10 +51,9 @@ Metric:
Train: Train:
dataset: dataset:
name: SimpleDataSet name: SimpleDataSet
data_dir: ./train_data/FUNSD/training_data/images/ data_dir: train_data/XFUND/zh_train/image
label_file_list: label_file_list:
- ./train_data/FUNSD/train_v4.json - train_data/XFUND/zh_train/train.json
# - ./train_data/FUNSD/train.json
ratio_list: [ 1.0 ] ratio_list: [ 1.0 ]
transforms: transforms:
- DecodeImage: # load image - DecodeImage: # load image
...@@ -62,8 +62,9 @@ Train: ...@@ -62,8 +62,9 @@ Train:
- VQATokenLabelEncode: # Class handling label - VQATokenLabelEncode: # Class handling label
contains_re: True contains_re: True
algorithm: *algorithm algorithm: *algorithm
class_path: &class_path ./train_data/FUNSD/class_list.txt class_path: &class_path train_data/XFUND/class_list_xfun.txt
use_textline_bbox_info: &use_textline_bbox_info True use_textline_bbox_info: &use_textline_bbox_info True
order_method: &order_method "tb-yx"
- VQATokenPad: - VQATokenPad:
max_seq_len: &max_seq_len 512 max_seq_len: &max_seq_len 512
return_attention_mask: True return_attention_mask: True
...@@ -79,22 +80,20 @@ Train: ...@@ -79,22 +80,20 @@ Train:
order: 'hwc' order: 'hwc'
- ToCHWImage: - ToCHWImage:
- KeepKeys: - KeepKeys:
# dataloader will return list in this order keep_keys: [ 'input_ids', 'bbox','attention_mask', 'token_type_ids', 'image', 'entities', 'relations'] # dataloader will return list in this order
keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'entities', 'relations']
loader: loader:
shuffle: False shuffle: True
drop_last: False drop_last: False
batch_size_per_card: 8 batch_size_per_card: 2
num_workers: 16 num_workers: 4
collate_fn: ListCollator collate_fn: ListCollator
Eval: Eval:
dataset: dataset:
name: SimpleDataSet name: SimpleDataSet
data_dir: ./train_data/FUNSD/testing_data/images/ data_dir: train_data/XFUND/zh_val/image
label_file_list: label_file_list:
- ./train_data/FUNSD/test_v4.json - train_data/XFUND/zh_val/val.json
# - ./train_data/FUNSD/test.json
transforms: transforms:
- DecodeImage: # load image - DecodeImage: # load image
img_mode: RGB img_mode: RGB
...@@ -104,6 +103,7 @@ Eval: ...@@ -104,6 +103,7 @@ Eval:
algorithm: *algorithm algorithm: *algorithm
class_path: *class_path class_path: *class_path
use_textline_bbox_info: *use_textline_bbox_info use_textline_bbox_info: *use_textline_bbox_info
order_method: *order_method
- VQATokenPad: - VQATokenPad:
max_seq_len: *max_seq_len max_seq_len: *max_seq_len
return_attention_mask: True return_attention_mask: True
...@@ -119,11 +119,11 @@ Eval: ...@@ -119,11 +119,11 @@ Eval:
order: 'hwc' order: 'hwc'
- ToCHWImage: - ToCHWImage:
- KeepKeys: - KeepKeys:
# dataloader will return list in this order keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'entities', 'relations'] # dataloader will return list in this order
keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'entities', 'relations']
loader: loader:
shuffle: False shuffle: False
drop_last: False drop_last: False
batch_size_per_card: 8 batch_size_per_card: 8
num_workers: 8 num_workers: 8
collate_fn: ListCollator collate_fn: ListCollator
Global:
use_gpu: True
epoch_num: &epoch_num 130
log_smooth_window: 10
print_batch_step: 10
save_model_dir: ./output/re_vi_layoutxlm_xfund_zh_udml
save_epoch_step: 2000
# evaluation is run every 10 iterations after the 0th iteration
eval_batch_step: [ 0, 19 ]
cal_metric_during_train: False
save_inference_dir:
use_visualdl: False
seed: 2022
infer_img: ppstructure/docs/vqa/input/zh_val_21.jpg
save_res_path: ./output/re/xfund_zh/with_gt
Architecture:
model_type: &model_type "vqa"
name: DistillationModel
algorithm: Distillation
Models:
Teacher:
pretrained:
freeze_params: false
return_all_feats: true
model_type: *model_type
algorithm: &algorithm "LayoutXLM"
Transform:
Backbone:
name: LayoutXLMForRe
pretrained: True
mode: vi
checkpoints:
Student:
pretrained:
freeze_params: false
return_all_feats: true
model_type: *model_type
algorithm: *algorithm
Transform:
Backbone:
name: LayoutXLMForRe
pretrained: True
mode: vi
checkpoints:
Loss:
name: CombinedLoss
loss_config_list:
- DistillationLossFromOutput:
weight: 1.0
model_name_list: ["Student", "Teacher"]
key: loss
reduction: mean
- DistillationVQADistanceLoss:
weight: 0.5
mode: "l2"
model_name_pairs:
- ["Student", "Teacher"]
key: hidden_states_5
name: "loss_5"
- DistillationVQADistanceLoss:
weight: 0.5
mode: "l2"
model_name_pairs:
- ["Student", "Teacher"]
key: hidden_states_8
name: "loss_8"
Optimizer:
name: AdamW
beta1: 0.9
beta2: 0.999
clip_norm: 10
lr:
learning_rate: 0.00005
warmup_epoch: 10
regularizer:
name: L2
factor: 0.00000
PostProcess:
name: DistillationRePostProcess
model_name: ["Student", "Teacher"]
key: null
Metric:
name: DistillationMetric
base_metric_name: VQAReTokenMetric
main_indicator: hmean
key: "Student"
Train:
dataset:
name: SimpleDataSet
data_dir: train_data/XFUND/zh_train/image
label_file_list:
- train_data/XFUND/zh_train/train.json
ratio_list: [ 1.0 ]
transforms:
- DecodeImage: # load image
img_mode: RGB
channel_first: False
- VQATokenLabelEncode: # Class handling label
contains_re: True
algorithm: *algorithm
class_path: &class_path train_data/XFUND/class_list_xfun.txt
use_textline_bbox_info: &use_textline_bbox_info True
# [None, "tb-yx"]
order_method: &order_method "tb-yx"
- VQATokenPad:
max_seq_len: &max_seq_len 512
return_attention_mask: True
- VQAReTokenRelation:
- VQAReTokenChunk:
max_seq_len: *max_seq_len
- Resize:
size: [224,224]
- NormalizeImage:
scale: 1
mean: [ 123.675, 116.28, 103.53 ]
std: [ 58.395, 57.12, 57.375 ]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
keep_keys: [ 'input_ids', 'bbox','attention_mask', 'token_type_ids', 'image', 'entities', 'relations'] # dataloader will return list in this order
loader:
shuffle: True
drop_last: False
batch_size_per_card: 2
num_workers: 4
collate_fn: ListCollator
Eval:
dataset:
name: SimpleDataSet
data_dir: train_data/XFUND/zh_val/image
label_file_list:
- train_data/XFUND/zh_val/val.json
transforms:
- DecodeImage: # load image
img_mode: RGB
channel_first: False
- VQATokenLabelEncode: # Class handling label
contains_re: True
algorithm: *algorithm
class_path: *class_path
use_textline_bbox_info: *use_textline_bbox_info
order_method: *order_method
- VQATokenPad:
max_seq_len: *max_seq_len
return_attention_mask: True
- VQAReTokenRelation:
- VQAReTokenChunk:
max_seq_len: *max_seq_len
- Resize:
size: [224,224]
- NormalizeImage:
scale: 1
mean: [ 123.675, 116.28, 103.53 ]
std: [ 58.395, 57.12, 57.375 ]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'entities', 'relations'] # dataloader will return list in this order
loader:
shuffle: False
drop_last: False
batch_size_per_card: 8
num_workers: 8
collate_fn: ListCollator
...@@ -3,30 +3,38 @@ Global: ...@@ -3,30 +3,38 @@ Global:
epoch_num: &epoch_num 200 epoch_num: &epoch_num 200
log_smooth_window: 10 log_smooth_window: 10
print_batch_step: 10 print_batch_step: 10
save_model_dir: ./output/ser_layoutlm_funsd save_model_dir: ./output/ser_vi_layoutxlm_xfund_zh
save_epoch_step: 2000 save_epoch_step: 2000
# evaluation is run every 10 iterations after the 0th iteration # evaluation is run every 10 iterations after the 0th iteration
eval_batch_step: [ 0, 57 ] eval_batch_step: [ 0, 19 ]
cal_metric_during_train: False cal_metric_during_train: False
save_inference_dir: save_inference_dir:
use_visualdl: False use_visualdl: False
seed: 2022 seed: 2022
infer_img: train_data/FUNSD/testing_data/images/83624198.png infer_img: ppstructure/docs/vqa/input/zh_val_42.jpg
save_res_path: ./output/ser_layoutlm_funsd/res/ # if you want to predict using the groundtruth ocr info,
# you can use the following config
# infer_img: train_data/XFUND/zh_val/val.json
# infer_mode: False
save_res_path: ./output/ser/xfund_zh/res
Architecture: Architecture:
model_type: vqa model_type: vqa
algorithm: &algorithm "LayoutLM" algorithm: &algorithm "LayoutXLM"
Transform: Transform:
Backbone: Backbone:
name: LayoutLMForSer name: LayoutXLMForSer
pretrained: True pretrained: True
checkpoints: checkpoints:
# one of base or vi
mode: vi
num_classes: &num_classes 7 num_classes: &num_classes 7
Loss: Loss:
name: VQASerTokenLayoutLMLoss name: VQASerTokenLayoutLMLoss
num_classes: *num_classes num_classes: *num_classes
key: "backbone_out"
Optimizer: Optimizer:
name: AdamW name: AdamW
...@@ -43,7 +51,7 @@ Optimizer: ...@@ -43,7 +51,7 @@ Optimizer:
PostProcess: PostProcess:
name: VQASerTokenLayoutLMPostProcess name: VQASerTokenLayoutLMPostProcess
class_path: &class_path ./train_data/FUNSD/class_list.txt class_path: &class_path train_data/XFUND/class_list_xfun.txt
Metric: Metric:
name: VQASerTokenMetric name: VQASerTokenMetric
...@@ -52,9 +60,10 @@ Metric: ...@@ -52,9 +60,10 @@ Metric:
Train: Train:
dataset: dataset:
name: SimpleDataSet name: SimpleDataSet
data_dir: ./train_data/FUNSD/training_data/images/ data_dir: train_data/XFUND/zh_train/image
label_file_list: label_file_list:
- ./train_data/FUNSD/train.json - train_data/XFUND/zh_train/train.json
ratio_list: [ 1.0 ]
transforms: transforms:
- DecodeImage: # load image - DecodeImage: # load image
img_mode: RGB img_mode: RGB
...@@ -64,6 +73,8 @@ Train: ...@@ -64,6 +73,8 @@ Train:
algorithm: *algorithm algorithm: *algorithm
class_path: *class_path class_path: *class_path
use_textline_bbox_info: &use_textline_bbox_info True use_textline_bbox_info: &use_textline_bbox_info True
# one of [None, "tb-yx"]
order_method: &order_method "tb-yx"
- VQATokenPad: - VQATokenPad:
max_seq_len: &max_seq_len 512 max_seq_len: &max_seq_len 512
return_attention_mask: True return_attention_mask: True
...@@ -78,8 +89,7 @@ Train: ...@@ -78,8 +89,7 @@ Train:
order: 'hwc' order: 'hwc'
- ToCHWImage: - ToCHWImage:
- KeepKeys: - KeepKeys:
# dataloader will return list in this order keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels'] # dataloader will return list in this order
keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels']
loader: loader:
shuffle: True shuffle: True
drop_last: False drop_last: False
...@@ -89,9 +99,9 @@ Train: ...@@ -89,9 +99,9 @@ Train:
Eval: Eval:
dataset: dataset:
name: SimpleDataSet name: SimpleDataSet
data_dir: train_data/FUNSD/testing_data/images/ data_dir: train_data/XFUND/zh_val/image
label_file_list: label_file_list:
- ./train_data/FUNSD/test.json - train_data/XFUND/zh_val/val.json
transforms: transforms:
- DecodeImage: # load image - DecodeImage: # load image
img_mode: RGB img_mode: RGB
...@@ -101,6 +111,7 @@ Eval: ...@@ -101,6 +111,7 @@ Eval:
algorithm: *algorithm algorithm: *algorithm
class_path: *class_path class_path: *class_path
use_textline_bbox_info: *use_textline_bbox_info use_textline_bbox_info: *use_textline_bbox_info
order_method: *order_method
- VQATokenPad: - VQATokenPad:
max_seq_len: *max_seq_len max_seq_len: *max_seq_len
return_attention_mask: True return_attention_mask: True
...@@ -115,8 +126,7 @@ Eval: ...@@ -115,8 +126,7 @@ Eval:
order: 'hwc' order: 'hwc'
- ToCHWImage: - ToCHWImage:
- KeepKeys: - KeepKeys:
# dataloader will return list in this order keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels'] # dataloader will return list in this order
keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels']
loader: loader:
shuffle: False shuffle: False
drop_last: False drop_last: False
......
...@@ -3,30 +3,84 @@ Global: ...@@ -3,30 +3,84 @@ Global:
epoch_num: &epoch_num 200 epoch_num: &epoch_num 200
log_smooth_window: 10 log_smooth_window: 10
print_batch_step: 10 print_batch_step: 10
save_model_dir: ./output/ser_layoutxlm_funsd save_model_dir: ./output/ser_vi_layoutxlm_xfund_zh_udml
save_epoch_step: 2000 save_epoch_step: 2000
# evaluation is run every 10 iterations after the 0th iteration # evaluation is run every 10 iterations after the 0th iteration
eval_batch_step: [ 0, 57 ] eval_batch_step: [ 0, 19 ]
cal_metric_during_train: False cal_metric_during_train: False
save_inference_dir: save_inference_dir:
use_visualdl: False use_visualdl: False
seed: 2022 seed: 2022
infer_img: train_data/FUNSD/testing_data/images/83624198.png infer_img: ppstructure/docs/vqa/input/zh_val_42.jpg
save_res_path: output/ser_layoutxlm_funsd/res/ save_res_path: ./output/ser_layoutxlm_xfund_zh/res
Architecture: Architecture:
model_type: vqa model_type: &model_type "vqa"
algorithm: &algorithm "LayoutXLM" name: DistillationModel
Transform: algorithm: Distillation
Backbone: Models:
name: LayoutXLMForSer Teacher:
pretrained: True pretrained:
checkpoints: freeze_params: false
num_classes: &num_classes 7 return_all_feats: true
model_type: *model_type
algorithm: &algorithm "LayoutXLM"
Transform:
Backbone:
name: LayoutXLMForSer
pretrained: True
# one of base or vi
mode: vi
checkpoints:
num_classes: &num_classes 7
Student:
pretrained:
freeze_params: false
return_all_feats: true
model_type: *model_type
algorithm: *algorithm
Transform:
Backbone:
name: LayoutXLMForSer
pretrained: True
# one of base or vi
mode: vi
checkpoints:
num_classes: *num_classes
Loss: Loss:
name: VQASerTokenLayoutLMLoss name: CombinedLoss
num_classes: *num_classes loss_config_list:
- DistillationVQASerTokenLayoutLMLoss:
weight: 1.0
model_name_list: ["Student", "Teacher"]
key: backbone_out
num_classes: *num_classes
- DistillationSERDMLLoss:
weight: 1.0
act: "softmax"
use_log: true
model_name_pairs:
- ["Student", "Teacher"]
key: backbone_out
- DistillationVQADistanceLoss:
weight: 0.5
mode: "l2"
model_name_pairs:
- ["Student", "Teacher"]
key: hidden_states_5
name: "loss_5"
- DistillationVQADistanceLoss:
weight: 0.5
mode: "l2"
model_name_pairs:
- ["Student", "Teacher"]
key: hidden_states_8
name: "loss_8"
Optimizer: Optimizer:
name: AdamW name: AdamW
...@@ -36,25 +90,29 @@ Optimizer: ...@@ -36,25 +90,29 @@ Optimizer:
name: Linear name: Linear
learning_rate: 0.00005 learning_rate: 0.00005
epochs: *epoch_num epochs: *epoch_num
warmup_epoch: 2 warmup_epoch: 10
regularizer: regularizer:
name: L2 name: L2
factor: 0.00000 factor: 0.00000
PostProcess: PostProcess:
name: VQASerTokenLayoutLMPostProcess name: DistillationSerPostProcess
class_path: &class_path ./train_data/FUNSD/class_list.txt model_name: ["Student", "Teacher"]
key: backbone_out
class_path: &class_path train_data/XFUND/class_list_xfun.txt
Metric: Metric:
name: VQASerTokenMetric name: DistillationMetric
base_metric_name: VQASerTokenMetric
main_indicator: hmean main_indicator: hmean
key: "Student"
Train: Train:
dataset: dataset:
name: SimpleDataSet name: SimpleDataSet
data_dir: ./train_data/FUNSD/training_data/images/ data_dir: train_data/XFUND/zh_train/image
label_file_list: label_file_list:
- ./train_data/FUNSD/train.json - train_data/XFUND/zh_train/train.json
ratio_list: [ 1.0 ] ratio_list: [ 1.0 ]
transforms: transforms:
- DecodeImage: # load image - DecodeImage: # load image
...@@ -64,6 +122,8 @@ Train: ...@@ -64,6 +122,8 @@ Train:
contains_re: False contains_re: False
algorithm: *algorithm algorithm: *algorithm
class_path: *class_path class_path: *class_path
# one of [None, "tb-yx"]
order_method: &order_method "tb-yx"
- VQATokenPad: - VQATokenPad:
max_seq_len: &max_seq_len 512 max_seq_len: &max_seq_len 512
return_attention_mask: True return_attention_mask: True
...@@ -78,20 +138,19 @@ Train: ...@@ -78,20 +138,19 @@ Train:
order: 'hwc' order: 'hwc'
- ToCHWImage: - ToCHWImage:
- KeepKeys: - KeepKeys:
# dataloader will return list in this order keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels'] # dataloader will return list in this order
keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels']
loader: loader:
shuffle: True shuffle: True
drop_last: False drop_last: False
batch_size_per_card: 8 batch_size_per_card: 4
num_workers: 4 num_workers: 4
Eval: Eval:
dataset: dataset:
name: SimpleDataSet name: SimpleDataSet
data_dir: train_data/FUNSD/testing_data/images/ data_dir: train_data/XFUND/zh_val/image
label_file_list: label_file_list:
- ./train_data/FUNSD/test.json - train_data/XFUND/zh_val/val.json
transforms: transforms:
- DecodeImage: # load image - DecodeImage: # load image
img_mode: RGB img_mode: RGB
...@@ -100,6 +159,7 @@ Eval: ...@@ -100,6 +159,7 @@ Eval:
contains_re: False contains_re: False
algorithm: *algorithm algorithm: *algorithm
class_path: *class_path class_path: *class_path
order_method: *order_method
- VQATokenPad: - VQATokenPad:
max_seq_len: *max_seq_len max_seq_len: *max_seq_len
return_attention_mask: True return_attention_mask: True
...@@ -114,10 +174,10 @@ Eval: ...@@ -114,10 +174,10 @@ Eval:
order: 'hwc' order: 'hwc'
- ToCHWImage: - ToCHWImage:
- KeepKeys: - KeepKeys:
# dataloader will return list in this order keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels'] # dataloader will return list in this order
keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels']
loader: loader:
shuffle: False shuffle: False
drop_last: False drop_last: False
batch_size_per_card: 8 batch_size_per_card: 8
num_workers: 4 num_workers: 4
Global:
use_gpu: True
epoch_num: &epoch_num 200
log_smooth_window: 10
print_batch_step: 10
save_model_dir: ./output/re_layoutlmv2_funsd
save_epoch_step: 2000
# evaluation is run every 10 iterations after the 0th iteration
eval_batch_step: [ 0, 57 ]
cal_metric_during_train: False
save_inference_dir:
use_visualdl: False
seed: 2022
infer_img: train_data/FUNSD/testing_data/images/83624198.png
save_res_path: ./output/re_layoutlmv2_funsd/res/
Architecture:
model_type: vqa
algorithm: &algorithm "LayoutLMv2"
Transform:
Backbone:
name: LayoutLMv2ForRe
pretrained: True
checkpoints:
Loss:
name: LossFromOutput
key: loss
reduction: mean
Optimizer:
name: AdamW
beta1: 0.9
beta2: 0.999
clip_norm: 10
lr:
learning_rate: 0.00005
warmup_epoch: 10
regularizer:
name: L2
factor: 0.00000
PostProcess:
name: VQAReTokenLayoutLMPostProcess
Metric:
name: VQAReTokenMetric
main_indicator: hmean
Train:
dataset:
name: SimpleDataSet
data_dir: ./train_data/FUNSD/training_data/images/
label_file_list:
- ./train_data/FUNSD/train.json
ratio_list: [ 1.0 ]
transforms:
- DecodeImage: # load image
img_mode: RGB
channel_first: False
- VQATokenLabelEncode: # Class handling label
contains_re: True
algorithm: *algorithm
class_path: &class_path train_data/FUNSD/class_list.txt
- VQATokenPad:
max_seq_len: &max_seq_len 512
return_attention_mask: True
- VQAReTokenRelation:
- VQAReTokenChunk:
max_seq_len: *max_seq_len
- Resize:
size: [224,224]
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
# dataloader will return list in this order
keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'entities', 'relations']
loader:
shuffle: True
drop_last: False
batch_size_per_card: 8
num_workers: 8
collate_fn: ListCollator
Eval:
dataset:
name: SimpleDataSet
data_dir: ./train_data/FUNSD/testing_data/images/
label_file_list:
- ./train_data/FUNSD/test.json
transforms:
- DecodeImage: # load image
img_mode: RGB
channel_first: False
- VQATokenLabelEncode: # Class handling label
contains_re: True
algorithm: *algorithm
class_path: *class_path
- VQATokenPad:
max_seq_len: *max_seq_len
return_attention_mask: True
- VQAReTokenRelation:
- VQAReTokenChunk:
max_seq_len: *max_seq_len
- Resize:
size: [224,224]
- NormalizeImage:
scale: 1./255.
mean: [0.485, 0.456, 0.406]
std: [0.229, 0.224, 0.225]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
# dataloader will return list in this order
keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'entities', 'relations']
loader:
shuffle: False
drop_last: False
batch_size_per_card: 8
num_workers: 8
collate_fn: ListCollator
Global:
use_gpu: True
epoch_num: &epoch_num 200
log_smooth_window: 10
print_batch_step: 10
save_model_dir: ./output/ser_layoutlm_sroie
save_epoch_step: 2000
# evaluation is run every 10 iterations after the 0th iteration
eval_batch_step: [ 0, 200 ]
cal_metric_during_train: False
save_inference_dir:
use_visualdl: False
seed: 2022
infer_img: train_data/SROIE/test/X00016469670.jpg
save_res_path: ./output/ser_layoutlm_sroie/res/
Architecture:
model_type: vqa
algorithm: &algorithm "LayoutLM"
Transform:
Backbone:
name: LayoutLMForSer
pretrained: True
checkpoints:
num_classes: &num_classes 9
Loss:
name: VQASerTokenLayoutLMLoss
num_classes: *num_classes
Optimizer:
name: AdamW
beta1: 0.9
beta2: 0.999
lr:
name: Linear
learning_rate: 0.00005
epochs: *epoch_num
warmup_epoch: 2
regularizer:
name: L2
factor: 0.00000
PostProcess:
name: VQASerTokenLayoutLMPostProcess
class_path: &class_path ./train_data/SROIE/class_list.txt
Metric:
name: VQASerTokenMetric
main_indicator: hmean
Train:
dataset:
name: SimpleDataSet
data_dir: ./train_data/SROIE/train
label_file_list:
- ./train_data/SROIE/train.txt
transforms:
- DecodeImage: # load image
img_mode: RGB
channel_first: False
- VQATokenLabelEncode: # Class handling label
contains_re: False
algorithm: *algorithm
class_path: *class_path
use_textline_bbox_info: &use_textline_bbox_info True
- VQATokenPad:
max_seq_len: &max_seq_len 512
return_attention_mask: True
- VQASerTokenChunk:
max_seq_len: *max_seq_len
- Resize:
size: [224,224]
- NormalizeImage:
scale: 1
mean: [ 123.675, 116.28, 103.53 ]
std: [ 58.395, 57.12, 57.375 ]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
# dataloader will return list in this order
keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels']
loader:
shuffle: True
drop_last: False
batch_size_per_card: 8
num_workers: 4
Eval:
dataset:
name: SimpleDataSet
data_dir: ./train_data/SROIE/test
label_file_list:
- ./train_data/SROIE/test.txt
transforms:
- DecodeImage: # load image
img_mode: RGB
channel_first: False
- VQATokenLabelEncode: # Class handling label
contains_re: False
algorithm: *algorithm
class_path: *class_path
use_textline_bbox_info: *use_textline_bbox_info
- VQATokenPad:
max_seq_len: *max_seq_len
return_attention_mask: True
- VQASerTokenChunk:
max_seq_len: *max_seq_len
- Resize:
size: [224,224]
- NormalizeImage:
scale: 1
mean: [ 123.675, 116.28, 103.53 ]
std: [ 58.395, 57.12, 57.375 ]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
# dataloader will return list in this order
keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels']
loader:
shuffle: False
drop_last: False
batch_size_per_card: 8
num_workers: 4
Global:
use_gpu: True
epoch_num: &epoch_num 200
log_smooth_window: 10
print_batch_step: 10
save_model_dir: ./output/ser_layoutlmv2_funsd
save_epoch_step: 2000
# evaluation is run every 10 iterations after the 0th iteration
eval_batch_step: [ 0, 100 ]
cal_metric_during_train: False
save_inference_dir:
use_visualdl: False
seed: 2022
infer_img: train_data/FUNSD/testing_data/images/83624198.png
save_res_path: ./output/ser_layoutlmv2_funsd/res/
Architecture:
model_type: vqa
algorithm: &algorithm "LayoutLMv2"
Transform:
Backbone:
name: LayoutLMv2ForSer
pretrained: True
checkpoints:
num_classes: &num_classes 7
Loss:
name: VQASerTokenLayoutLMLoss
num_classes: *num_classes
Optimizer:
name: AdamW
beta1: 0.9
beta2: 0.999
lr:
name: Linear
learning_rate: 0.00005
epochs: *epoch_num
warmup_epoch: 2
regularizer:
name: L2
factor: 0.00000
PostProcess:
name: VQASerTokenLayoutLMPostProcess
class_path: &class_path train_data/FUNSD/class_list.txt
Metric:
name: VQASerTokenMetric
main_indicator: hmean
Train:
dataset:
name: SimpleDataSet
data_dir: ./train_data/FUNSD/training_data/images/
label_file_list:
- ./train_data/FUNSD/train.json
transforms:
- DecodeImage: # load image
img_mode: RGB
channel_first: False
- VQATokenLabelEncode: # Class handling label
contains_re: False
algorithm: *algorithm
class_path: *class_path
- VQATokenPad:
max_seq_len: &max_seq_len 512
return_attention_mask: True
- VQASerTokenChunk:
max_seq_len: *max_seq_len
- Resize:
size: [224,224]
- NormalizeImage:
scale: 1
mean: [ 123.675, 116.28, 103.53 ]
std: [ 58.395, 57.12, 57.375 ]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
# dataloader will return list in this order
keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels']
loader:
shuffle: True
drop_last: False
batch_size_per_card: 8
num_workers: 4
Eval:
dataset:
name: SimpleDataSet
data_dir: ./train_data/FUNSD/testing_data/images/
label_file_list:
- ./train_data/FUNSD/test.json
transforms:
- DecodeImage: # load image
img_mode: RGB
channel_first: False
- VQATokenLabelEncode: # Class handling label
contains_re: False
algorithm: *algorithm
class_path: *class_path
- VQATokenPad:
max_seq_len: *max_seq_len
return_attention_mask: True
- VQASerTokenChunk:
max_seq_len: *max_seq_len
- Resize:
size: [224,224]
- NormalizeImage:
scale: 1
mean: [ 123.675, 116.28, 103.53 ]
std: [ 58.395, 57.12, 57.375 ]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
# dataloader will return list in this order
keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels']
loader:
shuffle: False
drop_last: False
batch_size_per_card: 8
num_workers: 4
Global:
use_gpu: True
epoch_num: &epoch_num 200
log_smooth_window: 10
print_batch_step: 10
save_model_dir: ./output/ser_layoutlmv2_sroie
save_epoch_step: 2000
# evaluation is run every 10 iterations after the 0th iteration
eval_batch_step: [ 0, 200 ]
cal_metric_during_train: False
save_inference_dir:
use_visualdl: False
seed: 2022
infer_img: train_data/SROIE/test/X00016469670.jpg
save_res_path: ./output/ser_layoutlmv2_sroie/res/
Architecture:
model_type: vqa
algorithm: &algorithm "LayoutLMv2"
Transform:
Backbone:
name: LayoutLMv2ForSer
pretrained: True
checkpoints:
num_classes: &num_classes 9
Loss:
name: VQASerTokenLayoutLMLoss
num_classes: *num_classes
Optimizer:
name: AdamW
beta1: 0.9
beta2: 0.999
lr:
name: Linear
learning_rate: 0.00005
epochs: *epoch_num
warmup_epoch: 2
regularizer:
name: L2
factor: 0.00000
PostProcess:
name: VQASerTokenLayoutLMPostProcess
class_path: &class_path ./train_data/SROIE/class_list.txt
Metric:
name: VQASerTokenMetric
main_indicator: hmean
Train:
dataset:
name: SimpleDataSet
data_dir: ./train_data/SROIE/train
label_file_list:
- ./train_data/SROIE/train.txt
transforms:
- DecodeImage: # load image
img_mode: RGB
channel_first: False
- VQATokenLabelEncode: # Class handling label
contains_re: False
algorithm: *algorithm
class_path: *class_path
- VQATokenPad:
max_seq_len: &max_seq_len 512
return_attention_mask: True
- VQASerTokenChunk:
max_seq_len: *max_seq_len
- Resize:
size: [224,224]
- NormalizeImage:
scale: 1
mean: [ 123.675, 116.28, 103.53 ]
std: [ 58.395, 57.12, 57.375 ]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
# dataloader will return list in this order
keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels']
loader:
shuffle: True
drop_last: False
batch_size_per_card: 8
num_workers: 4
Eval:
dataset:
name: SimpleDataSet
data_dir: ./train_data/SROIE/test
label_file_list:
- ./train_data/SROIE/test.txt
transforms:
- DecodeImage: # load image
img_mode: RGB
channel_first: False
- VQATokenLabelEncode: # Class handling label
contains_re: False
algorithm: *algorithm
class_path: *class_path
- VQATokenPad:
max_seq_len: *max_seq_len
return_attention_mask: True
- VQASerTokenChunk:
max_seq_len: *max_seq_len
- Resize:
size: [224,224]
- NormalizeImage:
scale: 1
mean: [ 123.675, 116.28, 103.53 ]
std: [ 58.395, 57.12, 57.375 ]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
# dataloader will return list in this order
keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels']
loader:
shuffle: False
drop_last: False
batch_size_per_card: 8
num_workers: 4
Global:
use_gpu: True
epoch_num: &epoch_num 200
log_smooth_window: 10
print_batch_step: 10
save_model_dir: ./output/ser_layoutxlm_sroie
save_epoch_step: 2000
# evaluation is run every 10 iterations after the 0th iteration
eval_batch_step: [ 0, 200 ]
cal_metric_during_train: False
save_inference_dir:
use_visualdl: False
seed: 2022
infer_img: train_data/SROIE/test/X00016469670.jpg
save_res_path: res_img_aug_with_gt
Architecture:
model_type: vqa
algorithm: &algorithm "LayoutXLM"
Transform:
Backbone:
name: LayoutXLMForSer
pretrained: True
checkpoints:
num_classes: &num_classes 9
Loss:
name: VQASerTokenLayoutLMLoss
num_classes: *num_classes
Optimizer:
name: AdamW
beta1: 0.9
beta2: 0.999
lr:
name: Linear
learning_rate: 0.00005
epochs: *epoch_num
warmup_epoch: 2
regularizer:
name: L2
factor: 0.00000
PostProcess:
name: VQASerTokenLayoutLMPostProcess
class_path: &class_path ./train_data/SROIE/class_list.txt
Metric:
name: VQASerTokenMetric
main_indicator: hmean
Train:
dataset:
name: SimpleDataSet
data_dir: ./train_data/SROIE/train
label_file_list:
- ./train_data/SROIE/train.txt
ratio_list: [ 1.0 ]
transforms:
- DecodeImage: # load image
img_mode: RGB
channel_first: False
- VQATokenLabelEncode: # Class handling label
contains_re: False
algorithm: *algorithm
class_path: *class_path
- VQATokenPad:
max_seq_len: &max_seq_len 512
return_attention_mask: True
- VQASerTokenChunk:
max_seq_len: *max_seq_len
- Resize:
size: [224,224]
- NormalizeImage:
scale: 1
mean: [ 123.675, 116.28, 103.53 ]
std: [ 58.395, 57.12, 57.375 ]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
# dataloader will return list in this order
keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels']
loader:
shuffle: True
drop_last: False
batch_size_per_card: 8
num_workers: 4
Eval:
dataset:
name: SimpleDataSet
data_dir: train_data/SROIE/test
label_file_list:
- ./train_data/SROIE/test.txt
transforms:
- DecodeImage: # load image
img_mode: RGB
channel_first: False
- VQATokenLabelEncode: # Class handling label
contains_re: False
algorithm: *algorithm
class_path: *class_path
- VQATokenPad:
max_seq_len: *max_seq_len
return_attention_mask: True
- VQASerTokenChunk:
max_seq_len: *max_seq_len
- Resize:
size: [224,224]
- NormalizeImage:
scale: 1
mean: [ 123.675, 116.28, 103.53 ]
std: [ 58.395, 57.12, 57.375 ]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
# dataloader will return list in this order
keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels']
loader:
shuffle: False
drop_last: False
batch_size_per_card: 8
num_workers: 4
Global:
use_gpu: True
epoch_num: &epoch_num 100
log_smooth_window: 10
print_batch_step: 10
save_model_dir: ./output/ser_layoutxlm_wildreceipt
save_epoch_step: 2000
# evaluation is run every 10 iterations after the 0th iteration
eval_batch_step: [ 0, 200 ]
cal_metric_during_train: False
save_inference_dir:
use_visualdl: False
seed: 2022
infer_img: train_data//wildreceipt/image_files/Image_12/10/845be0dd6f5b04866a2042abd28d558032ef2576.jpeg
save_res_path: ./output/ser_layoutxlm_wildreceipt/res
Architecture:
model_type: vqa
algorithm: &algorithm "LayoutXLM"
Transform:
Backbone:
name: LayoutXLMForSer
pretrained: True
checkpoints:
num_classes: &num_classes 51
Loss:
name: VQASerTokenLayoutLMLoss
num_classes: *num_classes
Optimizer:
name: AdamW
beta1: 0.9
beta2: 0.999
lr:
name: Linear
learning_rate: 0.00005
epochs: *epoch_num
warmup_epoch: 2
regularizer:
name: L2
factor: 0.00000
PostProcess:
name: VQASerTokenLayoutLMPostProcess
class_path: &class_path ./train_data/wildreceipt/class_list.txt
Metric:
name: VQASerTokenMetric
main_indicator: hmean
Train:
dataset:
name: SimpleDataSet
data_dir: ./train_data/wildreceipt/
label_file_list:
- ./train_data/wildreceipt/wildreceipt_train.txt
ratio_list: [ 1.0 ]
transforms:
- DecodeImage: # load image
img_mode: RGB
channel_first: False
- VQATokenLabelEncode: # Class handling label
contains_re: False
algorithm: *algorithm
class_path: *class_path
- VQATokenPad:
max_seq_len: &max_seq_len 512
return_attention_mask: True
- VQASerTokenChunk:
max_seq_len: *max_seq_len
- Resize:
size: [224,224]
- NormalizeImage:
scale: 1
mean: [ 123.675, 116.28, 103.53 ]
std: [ 58.395, 57.12, 57.375 ]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
# dataloader will return list in this order
keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels']
loader:
shuffle: True
drop_last: False
batch_size_per_card: 8
num_workers: 4
Eval:
dataset:
name: SimpleDataSet
data_dir: train_data/wildreceipt
label_file_list:
- ./train_data/wildreceipt/wildreceipt_test.txt
transforms:
- DecodeImage: # load image
img_mode: RGB
channel_first: False
- VQATokenLabelEncode: # Class handling label
contains_re: False
algorithm: *algorithm
class_path: *class_path
- VQATokenPad:
max_seq_len: *max_seq_len
return_attention_mask: True
- VQASerTokenChunk:
max_seq_len: *max_seq_len
- Resize:
size: [224,224]
- NormalizeImage:
scale: 1
mean: [ 123.675, 116.28, 103.53 ]
std: [ 58.395, 57.12, 57.375 ]
order: 'hwc'
- ToCHWImage:
- KeepKeys:
# dataloader will return list in this order
keep_keys: [ 'input_ids', 'bbox', 'attention_mask', 'token_type_ids', 'image', 'labels']
loader:
shuffle: False
drop_last: False
batch_size_per_card: 8
num_workers: 4
...@@ -53,10 +53,11 @@ PP-OCRv3检测模型是对PP-OCRv2中的[CML](https://arxiv.org/pdf/2109.03144.p

|序号|策略|模型大小|hmean|速度(cpu + mkldnn)|
|-|-|-|-|-|
|baseline teacher|PP-OCR server|49M|83.2%|171ms|
|teacher1|DB-R50-LK-PAN|124M|85.0%|396ms|
|teacher2|DB-R50-LK-PAN-DML|124M|86.0%|396ms|
|baseline student|PP-OCRv2|3M|83.2%|117ms|
|student0|DB-MV3-RSE-FPN|3.6M|84.5%|124ms|
|student1|DB-MV3-CML(teacher2)|3M|84.3%|117ms|
|student2|DB-MV3-RSE-FPN-CML(teacher2)|3.6M|85.4%|124ms|
...@@ -184,7 +185,7 @@ UDML(Unified-Deep Mutual Learning)联合互学习是PP-OCRv2中就采用的

**(6)UIM:无标注数据挖掘方案**

UIM(Unlabeled Images Mining)是一种非常简单的无标注数据挖掘方案。核心思想是利用高精度的文本识别大模型对无标注数据进行预测,获取伪标签,并且选择预测置信度高的样本作为训练数据,用于训练小模型。使用该策略,识别模型的准确率进一步提升到79.4%(+1%)。实际操作中,我们使用全量数据集训练高精度SVTR-Tiny模型(acc=82.5%)进行数据挖掘,点击获取[模型下载地址和使用教程](../../applications/高精度中文识别模型.md)。

<div align="center">
<img src="../ppocr_v3/UIM.png" width="500">
......
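UIM 的“伪标签 + 置信度过滤”思路可以用下面的示意代码帮助理解(仅为示意:`big_model.predict` 假定返回识别文本与置信度,函数名与阈值均为举例,并非 PaddleOCR 的实际接口):
```python
# 示意:UIM 无标注数据挖掘的核心思路(伪标签 + 置信度过滤),非 PaddleOCR 源码
def mine_pseudo_labels(big_model, unlabeled_images, score_thresh=0.95):
    """big_model.predict(img) 假定返回 (识别文本, 置信度);返回可直接用于训练小模型的伪标签行。"""
    pseudo_labels = []
    for img_path in unlabeled_images:
        text, score = big_model.predict(img_path)
        if score >= score_thresh:              # 只保留高置信度的预测结果
            pseudo_labels.append(f"{img_path}\t{text}")
    return pseudo_labels
```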
...@@ -65,7 +65,7 @@ python3 -m paddle.distributed.launch --gpus '0,1,2,3' tools/train.py -c configs/
```

上述指令中,通过-c 选择训练使用configs/det/det_mv3_db.yml配置文件。
有关配置文件的详细解释,请参考[链接](./config.md)

您也可以通过-o参数在不需要修改yml文件的情况下,改变训练的参数,比如,调整训练的学习率为0.0001
......
...@@ -55,10 +55,11 @@ The ablation experiments are as follows:

|ID|Strategy|Model Size|Hmean|The Inference Time(cpu + mkldnn)|
|-|-|-|-|-|
|baseline teacher|PP-OCR server|49M|83.2%|171ms|
|teacher1|DB-R50-LK-PAN|124M|85.0%|396ms|
|teacher2|DB-R50-LK-PAN-DML|124M|86.0%|396ms|
|baseline student|PP-OCRv2|3M|83.2%|117ms|
|student0|DB-MV3-RSE-FPN|3.6M|84.5%|124ms|
|student1|DB-MV3-CML(teacher2)|3M|84.3%|117ms|
|student2|DB-MV3-RSE-FPN-CML(teacher2)|3.6M|85.4%|124ms|
...@@ -199,7 +200,7 @@ UDML (Unified-Deep Mutual Learning) is a strategy proposed in PP-OCRv2 which is

**(6)UIM:Unlabeled Images Mining**

UIM (Unlabeled Images Mining) is a very simple unlabeled data mining strategy. The main idea is to use a high-precision text recognition model to predict unlabeled images to obtain pseudo-labels, and select samples with high prediction confidence as training data for training lightweight models. Using this strategy, the accuracy of the recognition model is further improved to 79.4% (+1%). In practice, we use the full data set to train the high-precision SVTR_Tiny model (acc=82.5%) for data mining. [SVTR_Tiny model download and tutorial](../../applications/高精度中文识别模型.md).

<div align="center">
<img src="../ppocr_v3/UIM.png" width="500">
......
...@@ -51,7 +51,7 @@ python3 tools/train.py -c configs/det/det_mv3_db.yml \
-o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
```

In the above instruction, use `-c` to select the training to use the `configs/det/det_mv3_db.yml` configuration file.
For a detailed explanation of the configuration file, please refer to [config](./config_en.md).

You can also use `-o` to change the training parameters without modifying the yml file. For example, adjust the training learning rate to 0.0001
......
...@@ -26,6 +26,7 @@ import copy ...@@ -26,6 +26,7 @@ import copy
from random import sample from random import sample
from ppocr.utils.logging import get_logger from ppocr.utils.logging import get_logger
from ppocr.data.imaug.vqa.augment import order_by_tbyx
class ClsLabelEncode(object): class ClsLabelEncode(object):
...@@ -873,6 +874,7 @@ class VQATokenLabelEncode(object): ...@@ -873,6 +874,7 @@ class VQATokenLabelEncode(object):
add_special_ids=False, add_special_ids=False,
algorithm='LayoutXLM', algorithm='LayoutXLM',
use_textline_bbox_info=True, use_textline_bbox_info=True,
order_method=None,
infer_mode=False, infer_mode=False,
ocr_engine=None, ocr_engine=None,
**kwargs): **kwargs):
...@@ -902,6 +904,8 @@ class VQATokenLabelEncode(object): ...@@ -902,6 +904,8 @@ class VQATokenLabelEncode(object):
self.infer_mode = infer_mode self.infer_mode = infer_mode
self.ocr_engine = ocr_engine self.ocr_engine = ocr_engine
self.use_textline_bbox_info = use_textline_bbox_info self.use_textline_bbox_info = use_textline_bbox_info
self.order_method = order_method
assert self.order_method in [None, "tb-yx"]
def split_bbox(self, bbox, text, tokenizer): def split_bbox(self, bbox, text, tokenizer):
words = text.split() words = text.split()
...@@ -941,6 +945,14 @@ class VQATokenLabelEncode(object): ...@@ -941,6 +945,14 @@ class VQATokenLabelEncode(object):
# load bbox and label info # load bbox and label info
ocr_info = self._load_ocr_info(data) ocr_info = self._load_ocr_info(data)
for idx in range(len(ocr_info)):
if "bbox" not in ocr_info[idx]:
ocr_info[idx]["bbox"] = self.trans_poly_to_bbox(ocr_info[idx][
"points"])
if self.order_method == "tb-yx":
ocr_info = order_by_tbyx(ocr_info)
# for re # for re
train_re = self.contains_re and not self.infer_mode train_re = self.contains_re and not self.infer_mode
if train_re: if train_re:
...@@ -980,7 +992,10 @@ class VQATokenLabelEncode(object): ...@@ -980,7 +992,10 @@ class VQATokenLabelEncode(object):
info["bbox"] = self.trans_poly_to_bbox(info["points"]) info["bbox"] = self.trans_poly_to_bbox(info["points"])
encode_res = self.tokenizer.encode( encode_res = self.tokenizer.encode(
text, pad_to_max_seq_len=False, return_attention_mask=True) text,
pad_to_max_seq_len=False,
return_attention_mask=True,
return_token_type_ids=True)
if not self.add_special_ids: if not self.add_special_ids:
# TODO: use tok.all_special_ids to remove # TODO: use tok.all_special_ids to remove
...@@ -1052,10 +1067,10 @@ class VQATokenLabelEncode(object): ...@@ -1052,10 +1067,10 @@ class VQATokenLabelEncode(object):
return data return data
    def trans_poly_to_bbox(self, poly):
        x1 = int(np.min([p[0] for p in poly]))
        x2 = int(np.max([p[0] for p in poly]))
        y1 = int(np.min([p[1] for p in poly]))
        y2 = int(np.max([p[1] for p in poly]))
        return [x1, y1, x2, y2]
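As a quick illustration of the updated behavior (the polygon values below are invented), the min/max coordinates are now truncated to integers:

```python
import numpy as np

poly = [[10.6, 20.2], [50.1, 20.2], [50.1, 45.8], [10.6, 45.8]]
bbox = [int(np.min([p[0] for p in poly])), int(np.min([p[1] for p in poly])),
        int(np.max([p[0] for p in poly])), int(np.max([p[1] for p in poly]))]
# bbox == [10, 20, 50, 45], i.e. integer pixel coordinates instead of floats
```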
def _load_ocr_info(self, data): def _load_ocr_info(self, data):
......
...@@ -13,12 +13,10 @@
# limitations under the License.

from .token import VQATokenPad, VQASerTokenChunk, VQAReTokenChunk, VQAReTokenRelation

__all__ = [
    'VQATokenPad',
    'VQASerTokenChunk',
    'VQAReTokenChunk',
    'VQAReTokenRelation',
]
...@@ -16,22 +16,18 @@ import os
import sys
import numpy as np
import random
from copy import deepcopy


def order_by_tbyx(ocr_info):
    res = sorted(ocr_info, key=lambda r: (r["bbox"][1], r["bbox"][0]))
    for i in range(len(res) - 1):
        for j in range(i, 0, -1):
            if abs(res[j + 1]["bbox"][1] - res[j]["bbox"][1]) < 20 and \
                    (res[j + 1]["bbox"][0] < res[j]["bbox"][0]):
                tmp = deepcopy(res[j])
                res[j] = deepcopy(res[j + 1])
                res[j + 1] = deepcopy(tmp)
            else:
                break
    return res
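As a usage illustration of the new ordering (the boxes are invented for the example), boxes are sorted top-to-bottom, and boxes whose vertical offset is under 20 pixels are re-ordered left-to-right:

```python
boxes = [
    {"bbox": [300, 42, 380, 70], "text": "row2-right"},
    {"bbox": [20, 10, 120, 38], "text": "row1"},
    {"bbox": [30, 44, 110, 72], "text": "row2-left"},  # slightly lower but further left
]
ordered = order_by_tbyx(boxes)
# [b["text"] for b in ordered] -> ["row1", "row2-left", "row2-right"]
```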
...@@ -63,18 +63,21 @@ class KLJSLoss(object): ...@@ -63,18 +63,21 @@ class KLJSLoss(object):
def __call__(self, p1, p2, reduction="mean"): def __call__(self, p1, p2, reduction="mean"):
if self.mode.lower() == 'kl': if self.mode.lower() == 'kl':
loss = paddle.multiply(p2, paddle.log((p2 + 1e-5) / (p1 + 1e-5) + 1e-5)) loss = paddle.multiply(p2,
paddle.log((p2 + 1e-5) / (p1 + 1e-5) + 1e-5))
loss += paddle.multiply( loss += paddle.multiply(
p1, paddle.log((p1 + 1e-5) / (p2 + 1e-5) + 1e-5)) p1, paddle.log((p1 + 1e-5) / (p2 + 1e-5) + 1e-5))
loss *= 0.5 loss *= 0.5
elif self.mode.lower() == "js": elif self.mode.lower() == "js":
loss = paddle.multiply(p2, paddle.log((2*p2 + 1e-5) / (p1 + p2 + 1e-5) + 1e-5)) loss = paddle.multiply(
p2, paddle.log((2 * p2 + 1e-5) / (p1 + p2 + 1e-5) + 1e-5))
loss += paddle.multiply( loss += paddle.multiply(
p1, paddle.log((2*p1 + 1e-5) / (p1 + p2 + 1e-5) + 1e-5)) p1, paddle.log((2 * p1 + 1e-5) / (p1 + p2 + 1e-5) + 1e-5))
loss *= 0.5 loss *= 0.5
else: else:
raise ValueError("The mode.lower() if KLJSLoss should be one of ['kl', 'js']") raise ValueError(
"The mode.lower() if KLJSLoss should be one of ['kl', 'js']")
if reduction == "mean": if reduction == "mean":
loss = paddle.mean(loss, axis=[1, 2]) loss = paddle.mean(loss, axis=[1, 2])
elif reduction == "none" or reduction is None: elif reduction == "none" or reduction is None:
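For reference, the two branches above compute the following smoothed quantities element-wise (with $\epsilon = 10^{-5}$) before the mean/sum reduction handled below:

$$
L_{KL} = \tfrac{1}{2}\left[p_2\log\left(\tfrac{p_2+\epsilon}{p_1+\epsilon}+\epsilon\right) + p_1\log\left(\tfrac{p_1+\epsilon}{p_2+\epsilon}+\epsilon\right)\right]
$$

$$
L_{JS} = \tfrac{1}{2}\left[p_2\log\left(\tfrac{2p_2+\epsilon}{p_1+p_2+\epsilon}+\epsilon\right) + p_1\log\left(\tfrac{2p_1+\epsilon}{p_1+p_2+\epsilon}+\epsilon\right)\right]
$$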
...@@ -154,7 +157,9 @@ class LossFromOutput(nn.Layer): ...@@ -154,7 +157,9 @@ class LossFromOutput(nn.Layer):
self.reduction = reduction self.reduction = reduction
def forward(self, predicts, batch): def forward(self, predicts, batch):
loss = predicts[self.key] loss = predicts
if self.key is not None and isinstance(predicts, dict):
loss = loss[self.key]
if self.reduction == 'mean': if self.reduction == 'mean':
loss = paddle.mean(loss) loss = paddle.mean(loss)
elif self.reduction == 'sum': elif self.reduction == 'sum':
......
...@@ -24,6 +24,9 @@ from .distillation_loss import DistillationCTCLoss ...@@ -24,6 +24,9 @@ from .distillation_loss import DistillationCTCLoss
from .distillation_loss import DistillationSARLoss from .distillation_loss import DistillationSARLoss
from .distillation_loss import DistillationDMLLoss from .distillation_loss import DistillationDMLLoss
from .distillation_loss import DistillationDistanceLoss, DistillationDBLoss, DistillationDilaDBLoss from .distillation_loss import DistillationDistanceLoss, DistillationDBLoss, DistillationDilaDBLoss
from .distillation_loss import DistillationVQASerTokenLayoutLMLoss, DistillationSERDMLLoss
from .distillation_loss import DistillationLossFromOutput
from .distillation_loss import DistillationVQADistanceLoss
class CombinedLoss(nn.Layer): class CombinedLoss(nn.Layer):
......
...@@ -21,8 +21,10 @@ from .rec_ctc_loss import CTCLoss ...@@ -21,8 +21,10 @@ from .rec_ctc_loss import CTCLoss
from .rec_sar_loss import SARLoss from .rec_sar_loss import SARLoss
from .basic_loss import DMLLoss from .basic_loss import DMLLoss
from .basic_loss import DistanceLoss from .basic_loss import DistanceLoss
from .basic_loss import LossFromOutput
from .det_db_loss import DBLoss from .det_db_loss import DBLoss
from .det_basic_loss import BalanceLoss, MaskL1Loss, DiceLoss from .det_basic_loss import BalanceLoss, MaskL1Loss, DiceLoss
from .vqa_token_layoutlm_loss import VQASerTokenLayoutLMLoss
def _sum_loss(loss_dict): def _sum_loss(loss_dict):
...@@ -322,3 +324,133 @@ class DistillationDistanceLoss(DistanceLoss): ...@@ -322,3 +324,133 @@ class DistillationDistanceLoss(DistanceLoss):
loss_dict["{}_{}_{}_{}".format(self.name, pair[0], pair[1], loss_dict["{}_{}_{}_{}".format(self.name, pair[0], pair[1],
idx)] = loss idx)] = loss
return loss_dict return loss_dict
class DistillationVQASerTokenLayoutLMLoss(VQASerTokenLayoutLMLoss):
def __init__(self,
num_classes,
model_name_list=[],
key=None,
name="loss_ser"):
super().__init__(num_classes=num_classes)
self.model_name_list = model_name_list
self.key = key
self.name = name
def forward(self, predicts, batch):
loss_dict = dict()
for idx, model_name in enumerate(self.model_name_list):
out = predicts[model_name]
if self.key is not None:
out = out[self.key]
loss = super().forward(out, batch)
loss_dict["{}_{}".format(self.name, model_name)] = loss["loss"]
return loss_dict
class DistillationLossFromOutput(LossFromOutput):
def __init__(self,
reduction="none",
model_name_list=[],
dist_key=None,
key="loss",
name="loss_re"):
super().__init__(key=key, reduction=reduction)
self.model_name_list = model_name_list
self.name = name
self.dist_key = dist_key
def forward(self, predicts, batch):
loss_dict = dict()
for idx, model_name in enumerate(self.model_name_list):
out = predicts[model_name]
if self.dist_key is not None:
out = out[self.dist_key]
loss = super().forward(out, batch)
loss_dict["{}_{}".format(self.name, model_name)] = loss["loss"]
return loss_dict
class DistillationSERDMLLoss(DMLLoss):
"""
"""
def __init__(self,
act="softmax",
use_log=True,
num_classes=7,
model_name_pairs=[],
key=None,
name="loss_dml_ser"):
super().__init__(act=act, use_log=use_log)
assert isinstance(model_name_pairs, list)
self.key = key
self.name = name
self.num_classes = num_classes
self.model_name_pairs = model_name_pairs
def forward(self, predicts, batch):
loss_dict = dict()
for idx, pair in enumerate(self.model_name_pairs):
out1 = predicts[pair[0]]
out2 = predicts[pair[1]]
if self.key is not None:
out1 = out1[self.key]
out2 = out2[self.key]
out1 = out1.reshape([-1, out1.shape[-1]])
out2 = out2.reshape([-1, out2.shape[-1]])
attention_mask = batch[2]
if attention_mask is not None:
active_output = attention_mask.reshape([-1, ]) == 1
out1 = out1[active_output]
out2 = out2[active_output]
loss_dict["{}_{}".format(self.name, idx)] = super().forward(out1,
out2)
return loss_dict
class DistillationVQADistanceLoss(DistanceLoss):
def __init__(self,
mode="l2",
model_name_pairs=[],
key=None,
name="loss_distance",
**kargs):
super().__init__(mode=mode, **kargs)
assert isinstance(model_name_pairs, list)
self.key = key
self.model_name_pairs = model_name_pairs
self.name = name + "_l2"
def forward(self, predicts, batch):
loss_dict = dict()
for idx, pair in enumerate(self.model_name_pairs):
out1 = predicts[pair[0]]
out2 = predicts[pair[1]]
attention_mask = batch[2]
if self.key is not None:
out1 = out1[self.key]
out2 = out2[self.key]
if attention_mask is not None:
max_len = attention_mask.shape[-1]
out1 = out1[:, :max_len]
out2 = out2[:, :max_len]
out1 = out1.reshape([-1, out1.shape[-1]])
out2 = out2.reshape([-1, out2.shape[-1]])
if attention_mask is not None:
active_output = attention_mask.reshape([-1, ]) == 1
out1 = out1[active_output]
out2 = out2[active_output]
loss = super().forward(out1, out2)
if isinstance(loss, dict):
for key in loss:
loss_dict["{}_{}nohu_{}".format(self.name, key,
idx)] = loss[key]
else:
loss_dict["{}_{}_{}_{}".format(self.name, pair[0], pair[1],
idx)] = loss
return loss_dict
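These distillation losses are intended to be listed under a `CombinedLoss` in a SER/RE distillation config; the snippet below is a hypothetical sketch (weights, keys, and model names are assumptions for illustration, not copied from a shipped config):

```yaml
Loss:
  name: CombinedLoss
  loss_config_list:
    - DistillationVQASerTokenLayoutLMLoss:   # supervised SER loss on both models
        weight: 1.0
        model_name_list: ["Student", "Teacher"]
        key: backbone_out
        num_classes: 7
    - DistillationSERDMLLoss:                # mutual learning between the two heads
        weight: 1.0
        act: softmax
        use_log: true
        model_name_pairs: [["Student", "Teacher"]]
        key: backbone_out
    - DistillationVQADistanceLoss:           # feature distance on attention-masked tokens
        weight: 0.5
        key: hidden_states
        model_name_pairs: [["Student", "Teacher"]]
```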
...@@ -17,26 +17,30 @@ from __future__ import division ...@@ -17,26 +17,30 @@ from __future__ import division
from __future__ import print_function from __future__ import print_function
from paddle import nn from paddle import nn
from ppocr.losses.basic_loss import DMLLoss
class VQASerTokenLayoutLMLoss(nn.Layer): class VQASerTokenLayoutLMLoss(nn.Layer):
def __init__(self, num_classes): def __init__(self, num_classes, key=None):
super().__init__() super().__init__()
self.loss_class = nn.CrossEntropyLoss() self.loss_class = nn.CrossEntropyLoss()
self.num_classes = num_classes self.num_classes = num_classes
self.ignore_index = self.loss_class.ignore_index self.ignore_index = self.loss_class.ignore_index
self.key = key
def forward(self, predicts, batch): def forward(self, predicts, batch):
if isinstance(predicts, dict) and self.key is not None:
predicts = predicts[self.key]
labels = batch[5] labels = batch[5]
attention_mask = batch[2] attention_mask = batch[2]
if attention_mask is not None: if attention_mask is not None:
active_loss = attention_mask.reshape([-1, ]) == 1 active_loss = attention_mask.reshape([-1, ]) == 1
active_outputs = predicts.reshape( active_output = predicts.reshape(
[-1, self.num_classes])[active_loss] [-1, self.num_classes])[active_loss]
active_labels = labels.reshape([-1, ])[active_loss] active_label = labels.reshape([-1, ])[active_loss]
loss = self.loss_class(active_outputs, active_labels) loss = self.loss_class(active_output, active_label)
else: else:
loss = self.loss_class( loss = self.loss_class(
predicts.reshape([-1, self.num_classes]), predicts.reshape([-1, self.num_classes]),
labels.reshape([-1, ])) labels.reshape([-1, ]))
return {'loss': loss} return {'loss': loss}
\ No newline at end of file
...@@ -19,6 +19,8 @@ from .rec_metric import RecMetric ...@@ -19,6 +19,8 @@ from .rec_metric import RecMetric
from .det_metric import DetMetric from .det_metric import DetMetric
from .e2e_metric import E2EMetric from .e2e_metric import E2EMetric
from .cls_metric import ClsMetric from .cls_metric import ClsMetric
from .vqa_token_ser_metric import VQASerTokenMetric
from .vqa_token_re_metric import VQAReTokenMetric
class DistillationMetric(object): class DistillationMetric(object):
......
...@@ -73,28 +73,40 @@ class BaseModel(nn.Layer): ...@@ -73,28 +73,40 @@ class BaseModel(nn.Layer):
self.return_all_feats = config.get("return_all_feats", False) self.return_all_feats = config.get("return_all_feats", False)
def forward(self, x, data=None): def forward(self, x, data=None):
y = dict() y = dict()
if self.use_transform: if self.use_transform:
x = self.transform(x) x = self.transform(x)
x = self.backbone(x) x = self.backbone(x)
y["backbone_out"] = x if isinstance(x, dict):
y.update(x)
else:
y["backbone_out"] = x
final_name = "backbone_out"
if self.use_neck: if self.use_neck:
x = self.neck(x) x = self.neck(x)
y["neck_out"] = x if isinstance(x, dict):
y.update(x)
else:
y["neck_out"] = x
final_name = "neck_out"
if self.use_head: if self.use_head:
x = self.head(x, targets=data) x = self.head(x, targets=data)
# for multi head, save ctc neck out for udml # for multi head, save ctc neck out for udml
if isinstance(x, dict) and 'ctc_neck' in x.keys(): if isinstance(x, dict) and 'ctc_neck' in x.keys():
y["neck_out"] = x["ctc_neck"] y["neck_out"] = x["ctc_neck"]
y["head_out"] = x y["head_out"] = x
elif isinstance(x, dict): elif isinstance(x, dict):
y.update(x) y.update(x)
else: else:
y["head_out"] = x y["head_out"] = x
final_name = "head_out"
if self.return_all_feats: if self.return_all_feats:
if self.training: if self.training:
return y return y
elif isinstance(x, dict):
return x
else: else:
return {"head_out": y["head_out"]} return {final_name: x}
else: else:
return x return x
...@@ -22,13 +22,22 @@ from paddle import nn ...@@ -22,13 +22,22 @@ from paddle import nn
from paddlenlp.transformers import LayoutXLMModel, LayoutXLMForTokenClassification, LayoutXLMForRelationExtraction from paddlenlp.transformers import LayoutXLMModel, LayoutXLMForTokenClassification, LayoutXLMForRelationExtraction
from paddlenlp.transformers import LayoutLMModel, LayoutLMForTokenClassification from paddlenlp.transformers import LayoutLMModel, LayoutLMForTokenClassification
from paddlenlp.transformers import LayoutLMv2Model, LayoutLMv2ForTokenClassification, LayoutLMv2ForRelationExtraction from paddlenlp.transformers import LayoutLMv2Model, LayoutLMv2ForTokenClassification, LayoutLMv2ForRelationExtraction
from paddlenlp.transformers import AutoModel
__all__ = ["LayoutXLMForSer", "LayoutLMForSer"]

pretrained_model_dict = {
    LayoutXLMModel: {
        "base": "layoutxlm-base-uncased",
        "vi": "layoutxlm-wo-backbone-base-uncased",
    },
    LayoutLMModel: {
        "base": "layoutlm-base-uncased",
    },
    LayoutLMv2Model: {
        "base": "layoutlmv2-base-uncased",
        "vi": "layoutlmv2-wo-backbone-base-uncased",
    },
}
...@@ -36,42 +45,47 @@ class NLPBaseModel(nn.Layer): ...@@ -36,42 +45,47 @@ class NLPBaseModel(nn.Layer):
def __init__(self, def __init__(self,
base_model_class, base_model_class,
model_class, model_class,
type='ser', mode="base",
type="ser",
pretrained=True, pretrained=True,
checkpoints=None, checkpoints=None,
**kwargs): **kwargs):
super(NLPBaseModel, self).__init__() super(NLPBaseModel, self).__init__()
if checkpoints is not None: if checkpoints is not None: # load the trained model
self.model = model_class.from_pretrained(checkpoints) self.model = model_class.from_pretrained(checkpoints)
elif isinstance(pretrained, (str, )) and os.path.exists(pretrained): else: # load the pretrained-model
self.model = model_class.from_pretrained(pretrained) pretrained_model_name = pretrained_model_dict[base_model_class][
else: mode]
pretrained_model_name = pretrained_model_dict[base_model_class]
if pretrained is True: if pretrained is True:
base_model = base_model_class.from_pretrained( base_model = base_model_class.from_pretrained(
pretrained_model_name) pretrained_model_name)
else: else:
base_model = base_model_class( base_model = base_model_class.from_pretrained(pretrained)
**base_model_class.pretrained_init_configuration[ if type == "ser":
pretrained_model_name])
if type == 'ser':
self.model = model_class( self.model = model_class(
base_model, num_classes=kwargs['num_classes'], dropout=None) base_model, num_classes=kwargs["num_classes"], dropout=None)
else: else:
self.model = model_class(base_model, dropout=None) self.model = model_class(base_model, dropout=None)
self.out_channels = 1 self.out_channels = 1
self.use_visual_backbone = True
class LayoutLMForSer(NLPBaseModel): class LayoutLMForSer(NLPBaseModel):
def __init__(self, num_classes, pretrained=True, checkpoints=None, def __init__(self,
num_classes,
pretrained=True,
checkpoints=None,
mode="base",
**kwargs): **kwargs):
super(LayoutLMForSer, self).__init__( super(LayoutLMForSer, self).__init__(
LayoutLMModel, LayoutLMModel,
LayoutLMForTokenClassification, LayoutLMForTokenClassification,
'ser', mode,
"ser",
pretrained, pretrained,
checkpoints, checkpoints,
num_classes=num_classes) num_classes=num_classes, )
self.use_visual_backbone = False
def forward(self, x): def forward(self, x):
x = self.model( x = self.model(
...@@ -85,62 +99,92 @@ class LayoutLMForSer(NLPBaseModel): ...@@ -85,62 +99,92 @@ class LayoutLMForSer(NLPBaseModel):
class LayoutLMv2ForSer(NLPBaseModel): class LayoutLMv2ForSer(NLPBaseModel):
def __init__(self, num_classes, pretrained=True, checkpoints=None, def __init__(self,
num_classes,
pretrained=True,
checkpoints=None,
mode="base",
**kwargs): **kwargs):
super(LayoutLMv2ForSer, self).__init__( super(LayoutLMv2ForSer, self).__init__(
LayoutLMv2Model, LayoutLMv2Model,
LayoutLMv2ForTokenClassification, LayoutLMv2ForTokenClassification,
'ser', mode,
"ser",
pretrained, pretrained,
checkpoints, checkpoints,
num_classes=num_classes) num_classes=num_classes)
self.use_visual_backbone = True
if hasattr(self.model.layoutlmv2, "use_visual_backbone"
) and self.model.layoutlmv2.use_visual_backbone is False:
self.use_visual_backbone = False
def forward(self, x): def forward(self, x):
if self.use_visual_backbone is True:
image = x[4]
else:
image = None
x = self.model( x = self.model(
input_ids=x[0], input_ids=x[0],
bbox=x[1], bbox=x[1],
attention_mask=x[2], attention_mask=x[2],
token_type_ids=x[3], token_type_ids=x[3],
image=x[4], image=image,
position_ids=None, position_ids=None,
head_mask=None, head_mask=None,
labels=None) labels=None)
if not self.training: if self.training:
res = {"backbone_out": x[0]}
res.update(x[1])
return res
else:
return x return x
return x[0]
class LayoutXLMForSer(NLPBaseModel): class LayoutXLMForSer(NLPBaseModel):
def __init__(self, num_classes, pretrained=True, checkpoints=None, def __init__(self,
num_classes,
pretrained=True,
checkpoints=None,
mode="base",
**kwargs): **kwargs):
super(LayoutXLMForSer, self).__init__( super(LayoutXLMForSer, self).__init__(
LayoutXLMModel, LayoutXLMModel,
LayoutXLMForTokenClassification, LayoutXLMForTokenClassification,
'ser', mode,
"ser",
pretrained, pretrained,
checkpoints, checkpoints,
num_classes=num_classes) num_classes=num_classes)
self.use_visual_backbone = True
def forward(self, x): def forward(self, x):
if self.use_visual_backbone is True:
image = x[4]
else:
image = None
x = self.model( x = self.model(
input_ids=x[0], input_ids=x[0],
bbox=x[1], bbox=x[1],
attention_mask=x[2], attention_mask=x[2],
token_type_ids=x[3], token_type_ids=x[3],
image=x[4], image=image,
position_ids=None, position_ids=None,
head_mask=None, head_mask=None,
labels=None) labels=None)
if not self.training: if self.training:
res = {"backbone_out": x[0]}
res.update(x[1])
return res
else:
return x return x
return x[0]
class LayoutLMv2ForRe(NLPBaseModel): class LayoutLMv2ForRe(NLPBaseModel):
def __init__(self, pretrained=True, checkpoints=None, **kwargs): def __init__(self, pretrained=True, checkpoints=None, mode="base",
super(LayoutLMv2ForRe, self).__init__(LayoutLMv2Model, **kwargs):
LayoutLMv2ForRelationExtraction, super(LayoutLMv2ForRe, self).__init__(
're', pretrained, checkpoints) LayoutLMv2Model, LayoutLMv2ForRelationExtraction, mode, "re",
pretrained, checkpoints)
def forward(self, x): def forward(self, x):
x = self.model( x = self.model(
...@@ -158,18 +202,27 @@ class LayoutLMv2ForRe(NLPBaseModel): ...@@ -158,18 +202,27 @@ class LayoutLMv2ForRe(NLPBaseModel):
class LayoutXLMForRe(NLPBaseModel): class LayoutXLMForRe(NLPBaseModel):
def __init__(self, pretrained=True, checkpoints=None, **kwargs): def __init__(self, pretrained=True, checkpoints=None, mode="base",
super(LayoutXLMForRe, self).__init__(LayoutXLMModel, **kwargs):
LayoutXLMForRelationExtraction, super(LayoutXLMForRe, self).__init__(
're', pretrained, checkpoints) LayoutXLMModel, LayoutXLMForRelationExtraction, mode, "re",
pretrained, checkpoints)
self.use_visual_backbone = True
if hasattr(self.model.layoutxlm, "use_visual_backbone"
) and self.model.layoutxlm.use_visual_backbone is False:
self.use_visual_backbone = False
def forward(self, x): def forward(self, x):
if self.use_visual_backbone is True:
image = x[4]
else:
image = None
x = self.model( x = self.model(
input_ids=x[0], input_ids=x[0],
bbox=x[1], bbox=x[1],
attention_mask=x[2], attention_mask=x[2],
token_type_ids=x[3], token_type_ids=x[3],
image=x[4], image=image,
position_ids=None, position_ids=None,
head_mask=None, head_mask=None,
labels=None, labels=None,
......
...@@ -31,8 +31,8 @@ from .rec_postprocess import CTCLabelDecode, AttnLabelDecode, SRNLabelDecode, \ ...@@ -31,8 +31,8 @@ from .rec_postprocess import CTCLabelDecode, AttnLabelDecode, SRNLabelDecode, \
SPINLabelDecode, VLLabelDecode SPINLabelDecode, VLLabelDecode
from .cls_postprocess import ClsPostProcess from .cls_postprocess import ClsPostProcess
from .pg_postprocess import PGPostProcess from .pg_postprocess import PGPostProcess
from .vqa_token_ser_layoutlm_postprocess import VQASerTokenLayoutLMPostProcess from .vqa_token_ser_layoutlm_postprocess import VQASerTokenLayoutLMPostProcess, DistillationSerPostProcess
from .vqa_token_re_layoutlm_postprocess import VQAReTokenLayoutLMPostProcess from .vqa_token_re_layoutlm_postprocess import VQAReTokenLayoutLMPostProcess, DistillationRePostProcess
from .table_postprocess import TableMasterLabelDecode, TableLabelDecode from .table_postprocess import TableMasterLabelDecode, TableLabelDecode
...@@ -45,7 +45,9 @@ def build_post_process(config, global_config=None): ...@@ -45,7 +45,9 @@ def build_post_process(config, global_config=None):
'SEEDLabelDecode', 'VQASerTokenLayoutLMPostProcess', 'SEEDLabelDecode', 'VQASerTokenLayoutLMPostProcess',
'VQAReTokenLayoutLMPostProcess', 'PRENLabelDecode', 'VQAReTokenLayoutLMPostProcess', 'PRENLabelDecode',
'DistillationSARLabelDecode', 'ViTSTRLabelDecode', 'ABINetLabelDecode', 'DistillationSARLabelDecode', 'ViTSTRLabelDecode', 'ABINetLabelDecode',
'TableMasterLabelDecode', 'SPINLabelDecode', 'VLLabelDecode' 'TableMasterLabelDecode', 'SPINLabelDecode',
'DistillationSerPostProcess', 'DistillationRePostProcess',
'VLLabelDecode'
] ]
if config['name'] == 'PSEPostProcess': if config['name'] == 'PSEPostProcess':
......
...@@ -49,3 +49,25 @@ class VQAReTokenLayoutLMPostProcess(object): ...@@ -49,3 +49,25 @@ class VQAReTokenLayoutLMPostProcess(object):
result.append((ocr_info_head, ocr_info_tail)) result.append((ocr_info_head, ocr_info_tail))
results.append(result) results.append(result)
return results return results
class DistillationRePostProcess(VQAReTokenLayoutLMPostProcess):
"""
DistillationRePostProcess
"""
def __init__(self, model_name=["Student"], key=None, **kwargs):
super().__init__(**kwargs)
if not isinstance(model_name, list):
model_name = [model_name]
self.model_name = model_name
self.key = key
def __call__(self, preds, *args, **kwargs):
output = dict()
for name in self.model_name:
pred = preds[name]
if self.key is not None:
pred = pred[self.key]
output[name] = super().__call__(pred, *args, **kwargs)
return output
...@@ -93,3 +93,25 @@ class VQASerTokenLayoutLMPostProcess(object): ...@@ -93,3 +93,25 @@ class VQASerTokenLayoutLMPostProcess(object):
ocr_info[idx]["pred"] = self.id2label_map_for_show[int(pred_id)] ocr_info[idx]["pred"] = self.id2label_map_for_show[int(pred_id)]
results.append(ocr_info) results.append(ocr_info)
return results return results
class DistillationSerPostProcess(VQASerTokenLayoutLMPostProcess):
"""
DistillationSerPostProcess
"""
def __init__(self, class_path, model_name=["Student"], key=None, **kwargs):
super().__init__(class_path, **kwargs)
if not isinstance(model_name, list):
model_name = [model_name]
self.model_name = model_name
self.key = key
def __call__(self, preds, batch=None, *args, **kwargs):
output = dict()
for name in self.model_name:
pred = preds[name]
if self.key is not None:
pred = pred[self.key]
output[name] = super().__call__(pred, batch=batch, *args, **kwargs)
return output
...@@ -53,8 +53,12 @@ def load_model(config, model, optimizer=None, model_type='det'): ...@@ -53,8 +53,12 @@ def load_model(config, model, optimizer=None, model_type='det'):
checkpoints = global_config.get('checkpoints') checkpoints = global_config.get('checkpoints')
pretrained_model = global_config.get('pretrained_model') pretrained_model = global_config.get('pretrained_model')
best_model_dict = {} best_model_dict = {}
is_float16 = False
if model_type == 'vqa': if model_type == 'vqa':
# NOTE: for vqa model, resume training is not supported now
if config["Architecture"]["algorithm"] in ["Distillation"]:
return best_model_dict
checkpoints = config['Architecture']['Backbone']['checkpoints'] checkpoints = config['Architecture']['Backbone']['checkpoints']
# load vqa method metric # load vqa method metric
if checkpoints: if checkpoints:
...@@ -78,6 +82,7 @@ def load_model(config, model, optimizer=None, model_type='det'): ...@@ -78,6 +82,7 @@ def load_model(config, model, optimizer=None, model_type='det'):
logger.warning( logger.warning(
"{}.pdopt is not exists, params of optimizer is not loaded". "{}.pdopt is not exists, params of optimizer is not loaded".
format(checkpoints)) format(checkpoints))
return best_model_dict return best_model_dict
if checkpoints: if checkpoints:
...@@ -96,6 +101,9 @@ def load_model(config, model, optimizer=None, model_type='det'): ...@@ -96,6 +101,9 @@ def load_model(config, model, optimizer=None, model_type='det'):
key, params.keys())) key, params.keys()))
continue continue
pre_value = params[key] pre_value = params[key]
if pre_value.dtype == paddle.float16:
pre_value = pre_value.astype(paddle.float32)
is_float16 = True
if list(value.shape) == list(pre_value.shape): if list(value.shape) == list(pre_value.shape):
new_state_dict[key] = pre_value new_state_dict[key] = pre_value
else: else:
...@@ -103,7 +111,10 @@ def load_model(config, model, optimizer=None, model_type='det'): ...@@ -103,7 +111,10 @@ def load_model(config, model, optimizer=None, model_type='det'):
"The shape of model params {} {} not matched with loaded params shape {} !". "The shape of model params {} {} not matched with loaded params shape {} !".
format(key, value.shape, pre_value.shape)) format(key, value.shape, pre_value.shape))
model.set_state_dict(new_state_dict) model.set_state_dict(new_state_dict)
if is_float16:
logger.info(
"The parameter type is float16, which is converted to float32 when loading"
)
if optimizer is not None: if optimizer is not None:
if os.path.exists(checkpoints + '.pdopt'): if os.path.exists(checkpoints + '.pdopt'):
optim_dict = paddle.load(checkpoints + '.pdopt') optim_dict = paddle.load(checkpoints + '.pdopt')
...@@ -122,9 +133,10 @@ def load_model(config, model, optimizer=None, model_type='det'): ...@@ -122,9 +133,10 @@ def load_model(config, model, optimizer=None, model_type='det'):
best_model_dict['start_epoch'] = states_dict['epoch'] + 1 best_model_dict['start_epoch'] = states_dict['epoch'] + 1
logger.info("resume from {}".format(checkpoints)) logger.info("resume from {}".format(checkpoints))
elif pretrained_model: elif pretrained_model:
load_pretrained_params(model, pretrained_model) is_float16 = load_pretrained_params(model, pretrained_model)
else: else:
logger.info('train from scratch') logger.info('train from scratch')
best_model_dict['is_float16'] = is_float16
return best_model_dict return best_model_dict
...@@ -138,19 +150,28 @@ def load_pretrained_params(model, path): ...@@ -138,19 +150,28 @@ def load_pretrained_params(model, path):
params = paddle.load(path + '.pdparams') params = paddle.load(path + '.pdparams')
state_dict = model.state_dict() state_dict = model.state_dict()
new_state_dict = {} new_state_dict = {}
is_float16 = False
for k1 in params.keys(): for k1 in params.keys():
if k1 not in state_dict.keys(): if k1 not in state_dict.keys():
logger.warning("The pretrained params {} not in model".format(k1)) logger.warning("The pretrained params {} not in model".format(k1))
else: else:
if params[k1].dtype == paddle.float16:
params[k1] = params[k1].astype(paddle.float32)
is_float16 = True
if list(state_dict[k1].shape) == list(params[k1].shape): if list(state_dict[k1].shape) == list(params[k1].shape):
new_state_dict[k1] = params[k1] new_state_dict[k1] = params[k1]
else: else:
logger.warning( logger.warning(
"The shape of model params {} {} not matched with loaded params {} {} !". "The shape of model params {} {} not matched with loaded params {} {} !".
format(k1, state_dict[k1].shape, k1, params[k1].shape)) format(k1, state_dict[k1].shape, k1, params[k1].shape))
model.set_state_dict(new_state_dict) model.set_state_dict(new_state_dict)
if is_float16:
logger.info(
"The parameter type is float16, which is converted to float32 when loading"
)
logger.info("load pretrain successful from {}".format(path)) logger.info("load pretrain successful from {}".format(path))
return model return is_float16
def save_model(model, def save_model(model,
...@@ -166,15 +187,19 @@ def save_model(model, ...@@ -166,15 +187,19 @@ def save_model(model,
""" """
_mkdir_if_not_exist(model_path, logger) _mkdir_if_not_exist(model_path, logger)
model_prefix = os.path.join(model_path, prefix) model_prefix = os.path.join(model_path, prefix)
paddle.save(optimizer.state_dict(), model_prefix + '.pdopt') if config['Architecture']["model_type"] != 'vqa':
paddle.save(optimizer.state_dict(), model_prefix + '.pdopt')
if config['Architecture']["model_type"] != 'vqa': if config['Architecture']["model_type"] != 'vqa':
paddle.save(model.state_dict(), model_prefix + '.pdparams') paddle.save(model.state_dict(), model_prefix + '.pdparams')
metric_prefix = model_prefix metric_prefix = model_prefix
else: else: # for vqa system, we follow the save/load rules in NLP
if config['Global']['distributed']: if config['Global']['distributed']:
model._layers.backbone.model.save_pretrained(model_prefix) arch = model._layers
else: else:
model.backbone.model.save_pretrained(model_prefix) arch = model
if config["Architecture"]["algorithm"] in ["Distillation"]:
arch = arch.Student
arch.backbone.model.save_pretrained(model_prefix)
metric_prefix = os.path.join(model_prefix, 'metric') metric_prefix = os.path.join(model_prefix, 'metric')
# save metric and config # save metric and config
with open(metric_prefix + '.states', 'wb') as f: with open(metric_prefix + '.states', 'wb') as f:
......
...@@ -216,7 +216,7 @@ Use the following command to complete the tandem prediction of `OCR + SER` based
```shell
cd ppstructure
CUDA_VISIBLE_DEVICES=0 python3.7 vqa/predict_vqa_token_ser.py --vqa_algorithm=LayoutXLM --ser_model_dir=../output/ser/infer --ser_dict_path=../train_data/XFUND/class_list_xfun.txt --vis_font_path=../doc/fonts/simfang.ttf --image_dir=docs/vqa/input/zh_val_42.jpg --output=output
```
After the prediction is successful, the visualization images and results will be saved in the directory specified by the `output` field.
......
...@@ -215,7 +215,7 @@ python3.7 tools/export_model.py -c configs/vqa/ser/layoutxlm.yml -o Architecture
```shell
cd ppstructure
CUDA_VISIBLE_DEVICES=0 python3.7 vqa/predict_vqa_token_ser.py --vqa_algorithm=LayoutXLM --ser_model_dir=../output/ser/infer --ser_dict_path=../train_data/XFUND/class_list_xfun.txt --vis_font_path=../doc/fonts/simfang.ttf --image_dir=docs/vqa/input/zh_val_42.jpg --output=output
```
After the prediction is successful, the visualization images and results will be saved in the directory specified by the `output` field.
......
...@@ -153,7 +153,7 @@ def main(args):
        img_res = draw_ser_results(
            image_file,
            ser_res,
            font_path=args.vis_font_path, )
        img_save_path = os.path.join(args.output,
                                     os.path.basename(image_file))
......
...@@ -114,7 +114,7 @@ Train: ...@@ -114,7 +114,7 @@ Train:
name: SimpleDataSet name: SimpleDataSet
data_dir: ./train_data/ic15_data/ data_dir: ./train_data/ic15_data/
label_file_list: label_file_list:
- ./train_data/ic15_data/rec_gt_train4w.txt - ./train_data/ic15_data/rec_gt_train.txt
transforms: transforms:
- DecodeImage: - DecodeImage:
img_mode: BGR img_mode: BGR
......
...@@ -153,7 +153,7 @@ Train: ...@@ -153,7 +153,7 @@ Train:
data_dir: ./train_data/ic15_data/ data_dir: ./train_data/ic15_data/
ext_op_transform_idx: 1 ext_op_transform_idx: 1
label_file_list: label_file_list:
- ./train_data/ic15_data/rec_gt_train4w.txt - ./train_data/ic15_data/rec_gt_train.txt
transforms: transforms:
- DecodeImage: - DecodeImage:
img_mode: BGR img_mode: BGR
......
...@@ -52,8 +52,9 @@ null:null ...@@ -52,8 +52,9 @@ null:null
===========================infer_benchmark_params========================== ===========================infer_benchmark_params==========================
random_infer_input:[{float32,[3,48,320]}] random_infer_input:[{float32,[3,48,320]}]
===========================train_benchmark_params========================== ===========================train_benchmark_params==========================
batch_size:128 batch_size:64
fp_items:fp32|fp16 fp_items:fp32|fp16
epoch:1 epoch:1
--profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile --profiler_options:batch_range=[10,20];state=GPU;tracer_option=Default;profile_path=model.profile
flags:FLAGS_eager_delete_tensor_gb=0.0;FLAGS_fraction_of_gpu_memory_to_use=0.98;FLAGS_conv_workspace_size_limit=4096 flags:FLAGS_eager_delete_tensor_gb=0.0;FLAGS_fraction_of_gpu_memory_to_use=0.98;FLAGS_conv_workspace_size_limit=4096
===========================cpp_infer_params=========================== ===========================cpp_infer_params===========================
model_name:ch_ppocr_mobile_v2.0 model_name:ch_ppocr_mobile_v2_0
use_opencv:True use_opencv:True
infer_model:./inference/ch_ppocr_mobile_v2.0_det_infer/ infer_model:./inference/ch_ppocr_mobile_v2.0_det_infer/
infer_quant:False infer_quant:False
......
===========================ch_ppocr_mobile_v2.0=========================== ===========================ch_ppocr_mobile_v2.0===========================
model_name:ch_ppocr_mobile_v2.0 model_name:ch_ppocr_mobile_v2_0
python:python3.7 python:python3.7
infer_model:./inference/ch_ppocr_mobile_v2.0_det_infer/ infer_model:./inference/ch_ppocr_mobile_v2.0_det_infer/
infer_export:null infer_export:null
......
===========================paddle2onnx_params=========================== ===========================paddle2onnx_params===========================
model_name:ch_ppocr_mobile_v2.0 model_name:ch_ppocr_mobile_v2_0
python:python3.7 python:python3.7
2onnx: paddle2onnx 2onnx: paddle2onnx
--det_model_dir:./inference/ch_ppocr_mobile_v2.0_det_infer/ --det_model_dir:./inference/ch_ppocr_mobile_v2.0_det_infer/
......
===========================serving_params=========================== ===========================serving_params===========================
model_name:ch_ppocr_mobile_v2.0 model_name:ch_ppocr_mobile_v2_0
python:python3.7 python:python3.7
trans_model:-m paddle_serving_client.convert trans_model:-m paddle_serving_client.convert
--det_dirname:./inference/ch_ppocr_mobile_v2.0_det_infer/ --det_dirname:./inference/ch_ppocr_mobile_v2.0_det_infer/
......
===========================serving_params=========================== ===========================serving_params===========================
model_name:ch_ppocr_mobile_v2.0 model_name:ch_ppocr_mobile_v2_0
python:python3.7 python:python3.7
trans_model:-m paddle_serving_client.convert trans_model:-m paddle_serving_client.convert
--det_dirname:./inference/ch_ppocr_mobile_v2.0_det_infer/ --det_dirname:./inference/ch_ppocr_mobile_v2.0_det_infer/
......
===========================cpp_infer_params=========================== ===========================cpp_infer_params===========================
model_name:ch_ppocr_mobile_v2.0_det model_name:ch_ppocr_mobile_v2_0_det
use_opencv:True use_opencv:True
infer_model:./inference/ch_ppocr_mobile_v2.0_det_infer/ infer_model:./inference/ch_ppocr_mobile_v2.0_det_infer/
infer_quant:False infer_quant:False
......
===========================infer_params=========================== ===========================infer_params===========================
model_name:ch_ppocr_mobile_v2.0_det model_name:ch_ppocr_mobile_v2_0_det
python:python python:python
infer_model:./inference/ch_ppocr_mobile_v2.0_det_infer infer_model:./inference/ch_ppocr_mobile_v2.0_det_infer
infer_export:null infer_export:null
......
===========================paddle2onnx_params=========================== ===========================paddle2onnx_params===========================
model_name:ch_ppocr_mobile_v2.0_det model_name:ch_ppocr_mobile_v2_0_det
python:python3.7 python:python3.7
2onnx: paddle2onnx 2onnx: paddle2onnx
--det_model_dir:./inference/ch_ppocr_mobile_v2.0_det_infer/ --det_model_dir:./inference/ch_ppocr_mobile_v2.0_det_infer/
......
===========================serving_params=========================== ===========================serving_params===========================
model_name:ch_ppocr_mobile_v2.0_det model_name:ch_ppocr_mobile_v2_0_det
python:python3.7 python:python3.7
trans_model:-m paddle_serving_client.convert trans_model:-m paddle_serving_client.convert
--det_dirname:./inference/ch_ppocr_mobile_v2.0_det_infer/ --det_dirname:./inference/ch_ppocr_mobile_v2.0_det_infer/
......
===========================train_params=========================== ===========================train_params===========================
model_name:ch_ppocr_mobile_v2.0_det model_name:ch_ppocr_mobile_v2_0_det
python:python3.7 python:python3.7
gpu_list:0|0,1 gpu_list:0|0,1
Global.use_gpu:True|True Global.use_gpu:True|True
......
===========================train_params=========================== ===========================train_params===========================
model_name:ch_ppocr_mobile_v2.0_det model_name:ch_ppocr_mobile_v2_0_det
python:python3.7 python:python3.7
gpu_list:192.168.0.1,192.168.0.2;0,1 gpu_list:192.168.0.1,192.168.0.2;0,1
Global.use_gpu:True Global.use_gpu:True
......
===========================train_params=========================== ===========================train_params===========================
model_name:ch_ppocr_mobile_v2.0_det model_name:ch_ppocr_mobile_v2_0_det
python:python3.7 python:python3.7
gpu_list:0|0,1 gpu_list:0|0,1
Global.use_gpu:True|True Global.use_gpu:True|True
......
===========================train_params=========================== ===========================train_params===========================
model_name:ch_ppocr_mobile_v2.0_det_PACT model_name:ch_ppocr_mobile_v2_0_det_PACT
python:python3.7 python:python3.7
gpu_list:0|0,1 gpu_list:0|0,1
Global.use_gpu:True|True Global.use_gpu:True|True
......
===========================kl_quant_params=========================== ===========================kl_quant_params===========================
model_name:ch_ppocr_mobile_v2.0_det_KL model_name:ch_ppocr_mobile_v2_0_det_KL
python:python3.7 python:python3.7
Global.pretrained_model:null Global.pretrained_model:null
Global.save_inference_dir:null Global.save_inference_dir:null
......
===========================train_params=========================== ===========================train_params===========================
model_name:ch_ppocr_mobile_v2.0_det_FPGM model_name:ch_ppocr_mobile_v2_0_det_FPGM
python:python3.7 python:python3.7
gpu_list:0|0,1 gpu_list:0|0,1
Global.use_gpu:True|True Global.use_gpu:True|True
......
===========================train_params=========================== ===========================train_params===========================
model_name:ch_ppocr_mobile_v2.0_det_FPGM model_name:ch_ppocr_mobile_v2_0_det_FPGM
python:python3.7 python:python3.7
gpu_list:0|0,1 gpu_list:0|0,1
Global.use_gpu:True|True Global.use_gpu:True|True
......
===========================cpp_infer_params=========================== ===========================cpp_infer_params===========================
model_name:ch_ppocr_mobile_v2.0_det_KL model_name:ch_ppocr_mobile_v2_0_det_KL
use_opencv:True use_opencv:True
infer_model:./inference/ch_ppocr_mobile_v2.0_det_klquant_infer infer_model:./inference/ch_ppocr_mobile_v2.0_det_klquant_infer
infer_quant:False infer_quant:False
......
===========================serving_params=========================== ===========================serving_params===========================
model_name:ch_ppocr_mobile_v2.0_rec_KL model_name:ch_ppocr_mobile_v2_0_det_KL
python:python3.7 python:python3.7
trans_model:-m paddle_serving_client.convert trans_model:-m paddle_serving_client.convert
--det_dirname:./inference/ch_ppocr_mobile_v2.0_det_klquant_infer/ --det_dirname:./inference/ch_ppocr_mobile_v2.0_det_klquant_infer/
......
===========================serving_params=========================== ===========================serving_params===========================
model_name:ch_ppocr_mobile_v2.0_det_KL model_name:ch_ppocr_mobile_v2_0_det_KL
python:python3.7 python:python3.7
trans_model:-m paddle_serving_client.convert trans_model:-m paddle_serving_client.convert
--det_dirname:./inference/ch_ppocr_mobile_v2.0_det_klquant_infer/ --det_dirname:./inference/ch_ppocr_mobile_v2.0_det_klquant_infer/
......
===========================cpp_infer_params=========================== ===========================cpp_infer_params===========================
model_name:ch_ppocr_mobile_v2.0_det_PACT model_name:ch_ppocr_mobile_v2_0_det_PACT
use_opencv:True use_opencv:True
infer_model:./inference/ch_ppocr_mobile_v2.0_det_pact_infer infer_model:./inference/ch_ppocr_mobile_v2.0_det_pact_infer
infer_quant:False infer_quant:False
......
===========================serving_params=========================== ===========================serving_params===========================
model_name:ch_ppocr_mobile_v2.0_rec_PACT model_name:ch_ppocr_mobile_v2_0_det_PACT
python:python3.7 python:python3.7
trans_model:-m paddle_serving_client.convert trans_model:-m paddle_serving_client.convert
--det_dirname:./inference/ch_ppocr_mobile_v2.0_det_pact_infer/ --det_dirname:./inference/ch_ppocr_mobile_v2.0_det_pact_infer/
......
===========================serving_params=========================== ===========================serving_params===========================
model_name:ch_ppocr_mobile_v2.0_det_PACT model_name:ch_ppocr_mobile_v2_0_det_PACT
python:python3.7 python:python3.7
trans_model:-m paddle_serving_client.convert trans_model:-m paddle_serving_client.convert
--det_dirname:./inference/ch_ppocr_mobile_v2.0_det_pact_infer/ --det_dirname:./inference/ch_ppocr_mobile_v2.0_det_pact_infer/
......
===========================cpp_infer_params=========================== ===========================cpp_infer_params===========================
model_name:ch_ppocr_mobile_v2.0_rec model_name:ch_ppocr_mobile_v2_0_rec
use_opencv:True use_opencv:True
infer_model:./inference/ch_ppocr_mobile_v2.0_rec_infer/ infer_model:./inference/ch_ppocr_mobile_v2.0_rec_infer/
infer_quant:False infer_quant:False
......
===========================paddle2onnx_params=========================== ===========================paddle2onnx_params===========================
model_name:ch_ppocr_mobile_v2.0_rec model_name:ch_ppocr_mobile_v2_0_rec
python:python3.7 python:python3.7
2onnx: paddle2onnx 2onnx: paddle2onnx
--det_model_dir: --det_model_dir:
......
===========================serving_params=========================== ===========================serving_params===========================
model_name:ch_ppocr_mobile_v2.0_rec model_name:ch_ppocr_mobile_v2_0_rec
python:python3.7 python:python3.7
trans_model:-m paddle_serving_client.convert trans_model:-m paddle_serving_client.convert
--det_dirname:null --det_dirname:null
......
===========================train_params=========================== ===========================train_params===========================
model_name:ch_ppocr_mobile_v2.0_rec model_name:ch_ppocr_mobile_v2_0_rec
python:python3.7 python:python3.7
gpu_list:0|0,1 gpu_list:0|0,1
Global.use_gpu:True|True Global.use_gpu:True|True
......
===========================train_params=========================== ===========================train_params===========================
model_name:ch_ppocr_mobile_v2.0_rec model_name:ch_ppocr_mobile_v2_0_rec
python:python3.7 python:python3.7
gpu_list:192.168.0.1,192.168.0.2;0,1 gpu_list:192.168.0.1,192.168.0.2;0,1
Global.use_gpu:True Global.use_gpu:True
......
===========================train_params=========================== ===========================train_params===========================
model_name:ch_ppocr_mobile_v2.0_rec model_name:ch_ppocr_mobile_v2_0_rec
python:python3.7 python:python3.7
gpu_list:0|0,1 gpu_list:0|0,1
Global.use_gpu:True|True Global.use_gpu:True|True
......
===========================train_params=========================== ===========================train_params===========================
model_name:ch_ppocr_mobile_v2.0_rec_PACT model_name:ch_ppocr_mobile_v2_0_rec_PACT
python:python3.7 python:python3.7
gpu_list:0 gpu_list:0
Global.use_gpu:True|True Global.use_gpu:True|True
...@@ -14,7 +14,7 @@ null:null ...@@ -14,7 +14,7 @@ null:null
## ##
trainer:pact_train trainer:pact_train
norm_train:null norm_train:null
pact_train:deploy/slim/quantization/quant.py -c test_tipc/configs/ch_ppocr_mobile_v2.0_rec_PACT/rec_chinese_lite_train_v2.0.yml -o pact_train:deploy/slim/quantization/quant.py -c test_tipc/configs/ch_ppocr_mobile_v2_0_rec_PACT/rec_chinese_lite_train_v2.0.yml -o
fpgm_train:null fpgm_train:null
distill_train:null distill_train:null
null:null null:null
...@@ -28,7 +28,7 @@ null:null ...@@ -28,7 +28,7 @@ null:null
Global.save_inference_dir:./output/ Global.save_inference_dir:./output/
Global.checkpoints: Global.checkpoints:
norm_export:null norm_export:null
quant_export:deploy/slim/quantization/export_model.py -c test_tipc/configs/ch_ppocr_mobile_v2.0_rec_PACT/rec_chinese_lite_train_v2.0.yml -o quant_export:deploy/slim/quantization/export_model.py -c test_tipc/configs/ch_ppocr_mobile_v2_0_rec_PACT/rec_chinese_lite_train_v2.0.yml -o
fpgm_export:null fpgm_export:null
distill_export:null distill_export:null
export1:null export1:null
......
===========================kl_quant_params=========================== ===========================kl_quant_params===========================
model_name:ch_ppocr_mobile_v2.0_rec_KL model_name:ch_ppocr_mobile_v2_0_rec_KL
python:python3.7 python:python3.7
Global.pretrained_model:null Global.pretrained_model:null
Global.save_inference_dir:null Global.save_inference_dir:null
infer_model:./inference/ch_ppocr_mobile_v2.0_rec_infer/ infer_model:./inference/ch_ppocr_mobile_v2.0_rec_infer/
infer_export:deploy/slim/quantization/quant_kl.py -c test_tipc/configs/ch_ppocr_mobile_v2.0_rec_KL/rec_chinese_lite_train_v2.0.yml -o infer_export:deploy/slim/quantization/quant_kl.py -c test_tipc/configs/ch_ppocr_mobile_v2_0_rec_KL/rec_chinese_lite_train_v2.0.yml -o
infer_quant:True infer_quant:True
inference:tools/infer/predict_rec.py --rec_image_shape="3,32,320" inference:tools/infer/predict_rec.py --rec_image_shape="3,32,320"
--use_gpu:False|True --use_gpu:False|True
......
===========================train_params=========================== ===========================train_params===========================
model_name:ch_ppocr_mobile_v2.0_rec_FPGM model_name:ch_ppocr_mobile_v2_0_rec_FPGM
python:python3.7 python:python3.7
gpu_list:0 gpu_list:0
Global.use_gpu:True|True Global.use_gpu:True|True
...@@ -15,7 +15,7 @@ null:null ...@@ -15,7 +15,7 @@ null:null
trainer:fpgm_train trainer:fpgm_train
norm_train:null norm_train:null
pact_train:null pact_train:null
fpgm_train:deploy/slim/prune/sensitivity_anal.py -c test_tipc/configs/ch_ppocr_mobile_v2.0_rec_FPGM/rec_chinese_lite_train_v2.0.yml -o Global.pretrained_model=./pretrain_models/ch_ppocr_mobile_v2.0_rec_train/best_accuracy fpgm_train:deploy/slim/prune/sensitivity_anal.py -c test_tipc/configs/ch_ppocr_mobile_v2_0_rec_FPGM/rec_chinese_lite_train_v2.0.yml -o Global.pretrained_model=./pretrain_models/ch_ppocr_mobile_v2.0_rec_train/best_accuracy
distill_train:null distill_train:null
null:null null:null
null:null null:null
...@@ -29,7 +29,7 @@ Global.save_inference_dir:./output/ ...@@ -29,7 +29,7 @@ Global.save_inference_dir:./output/
Global.checkpoints: Global.checkpoints:
norm_export:null norm_export:null
quant_export:null quant_export:null
fpgm_export:deploy/slim/prune/export_prune_model.py -c test_tipc/configs/ch_ppocr_mobile_v2.0_rec_FPGM/rec_chinese_lite_train_v2.0.yml -o fpgm_export:deploy/slim/prune/export_prune_model.py -c test_tipc/configs/ch_ppocr_mobile_v2_0_rec_FPGM/rec_chinese_lite_train_v2.0.yml -o
distill_export:null distill_export:null
export1:null export1:null
export2:null export2:null
......
===========================train_params=========================== ===========================train_params===========================
model_name:ch_ppocr_mobile_v2.0_rec_FPGM model_name:ch_ppocr_mobile_v2_0_rec_FPGM
python:python3.7 python:python3.7
gpu_list:0 gpu_list:0
Global.use_gpu:True|True Global.use_gpu:True|True
...@@ -15,7 +15,7 @@ null:null ...@@ -15,7 +15,7 @@ null:null
trainer:fpgm_train trainer:fpgm_train
norm_train:null norm_train:null
pact_train:null pact_train:null
fpgm_train:deploy/slim/prune/sensitivity_anal.py -c test_tipc/configs/ch_ppocr_mobile_v2.0_rec_FPGM/rec_chinese_lite_train_v2.0.yml -o Global.pretrained_model=./pretrain_models/ch_ppocr_mobile_v2.0_rec_train/best_accuracy fpgm_train:deploy/slim/prune/sensitivity_anal.py -c test_tipc/configs/ch_ppocr_mobile_v2_0_rec_FPGM/rec_chinese_lite_train_v2.0.yml -o Global.pretrained_model=./pretrain_models/ch_ppocr_mobile_v2.0_rec_train/best_accuracy
distill_train:null distill_train:null
null:null null:null
null:null null:null
...@@ -29,7 +29,7 @@ Global.save_inference_dir:./output/ ...@@ -29,7 +29,7 @@ Global.save_inference_dir:./output/
Global.checkpoints: Global.checkpoints:
norm_export:null norm_export:null
quant_export:null quant_export:null
fpgm_export:deploy/slim/prune/export_prune_model.py -c test_tipc/configs/ch_ppocr_mobile_v2.0_rec_FPGM/rec_chinese_lite_train_v2.0.yml -o fpgm_export:deploy/slim/prune/export_prune_model.py -c test_tipc/configs/ch_ppocr_mobile_v2_0_rec_FPGM/rec_chinese_lite_train_v2.0.yml -o
distill_export:null distill_export:null
export1:null export1:null
export2:null export2:null
......
===========================cpp_infer_params=========================== ===========================cpp_infer_params===========================
model_name:ch_ppocr_mobile_v2.0_rec_KL model_name:ch_ppocr_mobile_v2_0_rec_KL
use_opencv:True use_opencv:True
infer_model:./inference/ch_ppocr_mobile_v2.0_rec_klquant_infer infer_model:./inference/ch_ppocr_mobile_v2.0_rec_klquant_infer
infer_quant:False infer_quant:False
......
===========================serving_params=========================== ===========================serving_params===========================
model_name:ch_ppocr_mobile_v2.0_det_KL model_name:ch_ppocr_mobile_v2_0_rec_KL
python:python3.7 python:python3.7
trans_model:-m paddle_serving_client.convert trans_model:-m paddle_serving_client.convert
--det_dirname:./inference/ch_ppocr_mobile_v2.0_det_klquant_infer/ --det_dirname:./inference/ch_ppocr_mobile_v2.0_det_klquant_infer/
......
===========================serving_params=========================== ===========================serving_params===========================
model_name:ch_ppocr_mobile_v2.0_rec_KL model_name:ch_ppocr_mobile_v2_0_rec_KL
python:python3.7 python:python3.7
trans_model:-m paddle_serving_client.convert trans_model:-m paddle_serving_client.convert
--det_dirname:null --det_dirname:null
......
===========================cpp_infer_params=========================== ===========================cpp_infer_params===========================
model_name:ch_ppocr_mobile_v2.0_rec_PACT model_name:ch_ppocr_mobile_v2_0_rec_PACT
use_opencv:True use_opencv:True
infer_model:./inference/ch_ppocr_mobile_v2.0_rec_pact_infer infer_model:./inference/ch_ppocr_mobile_v2.0_rec_pact_infer
infer_quant:False infer_quant:False
......
===========================serving_params=========================== ===========================serving_params===========================
model_name:ch_ppocr_mobile_v2.0_det_PACT model_name:ch_ppocr_mobile_v2_0_rec_PACT
python:python3.7 python:python3.7
trans_model:-m paddle_serving_client.convert trans_model:-m paddle_serving_client.convert
--det_dirname:./inference/ch_ppocr_mobile_v2.0_det_pact_infer/ --det_dirname:./inference/ch_ppocr_mobile_v2.0_det_pact_infer/
......
===========================serving_params=========================== ===========================serving_params===========================
model_name:ch_ppocr_mobile_v2.0_rec_PACT model_name:ch_ppocr_mobile_v2_0_rec_PACT
python:python3.7 python:python3.7
trans_model:-m paddle_serving_client.convert trans_model:-m paddle_serving_client.convert
--det_dirname:null --det_dirname:null
......
===========================cpp_infer_params=========================== ===========================cpp_infer_params===========================
model_name:ch_ppocr_server_v2.0 model_name:ch_ppocr_server_v2_0
use_opencv:True use_opencv:True
infer_model:./inference/ch_ppocr_server_v2.0_det_infer/ infer_model:./inference/ch_ppocr_server_v2.0_det_infer/
infer_quant:False infer_quant:False
......
===========================ch_ppocr_server_v2.0=========================== ===========================ch_ppocr_server_v2.0===========================
model_name:ch_ppocr_server_v2.0 model_name:ch_ppocr_server_v2_0
python:python3.7 python:python3.7
infer_model:./inference/ch_ppocr_server_v2.0_det_infer/ infer_model:./inference/ch_ppocr_server_v2.0_det_infer/
infer_export:null infer_export:null
......
===========================paddle2onnx_params=========================== ===========================paddle2onnx_params===========================
model_name:ch_ppocr_server_v2.0 model_name:ch_ppocr_server_v2_0
python:python3.7 python:python3.7
2onnx: paddle2onnx 2onnx: paddle2onnx
--det_model_dir:./inference/ch_ppocr_server_v2.0_det_infer/ --det_model_dir:./inference/ch_ppocr_server_v2.0_det_infer/
......
===========================serving_params=========================== ===========================serving_params===========================
model_name:ch_ppocr_server_v2.0 model_name:ch_ppocr_server_v2_0
python:python3.7 python:python3.7
trans_model:-m paddle_serving_client.convert trans_model:-m paddle_serving_client.convert
--det_dirname:./inference/ch_ppocr_server_v2.0_det_infer/ --det_dirname:./inference/ch_ppocr_server_v2.0_det_infer/
......
===========================serving_params=========================== ===========================serving_params===========================
model_name:ch_ppocr_server_v2.0 model_name:ch_ppocr_server_v2_0
python:python3.7 python:python3.7
trans_model:-m paddle_serving_client.convert trans_model:-m paddle_serving_client.convert
--det_dirname:./inference/ch_ppocr_server_v2.0_det_infer/ --det_dirname:./inference/ch_ppocr_server_v2.0_det_infer/
......
(9 additional file diffs are collapsed in this view.)