Commit c167bdae, authored by 文幕地方

update doc

Parent faa66531
......@@ -3,7 +3,7 @@
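This document provides a full-process guide to the PaddleOCR table recognition model, covering data preparation, model training, tuning, evaluation, and prediction, with detailed instructions for each stage:
- [1. Data Preparation](#1-数据准备)
- [1.1. Dataset Preparation](#11-准备数据集)
- [1.1. Dataset Preparation](#11-数据集格式)
- [1.2. Data Download](#12-数据下载)
- [1.3. Dataset Generation](#13-数据集生成)
- [2. Start Training](#2-开始训练)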
......@@ -23,7 +23,7 @@
# 1. Data Preparation
## 1.1. Dataset Preparation
## 1.1. Dataset Format
The dataset format for the PaddleOCR table recognition model is as follows:
```txt
......@@ -71,8 +71,8 @@ TableGeneration is an open-source table dataset generation tool that, through a browser
|Type|Sample|
|---|---|
|Simple Table|![](https://github.com/WenmuZhou/TableGeneration/blob/main/imgs/simple.jpg)|
|Color Table|![](https://github.com/WenmuZhou/TableGeneration/blob/main/imgs/color.jpg)|
|Simple Table|![](https://raw.githubusercontent.com/WenmuZhou/TableGeneration/main/imgs/simple.jpg)|
|Color Table|![](https://raw.githubusercontent.com/WenmuZhou/TableGeneration/main/imgs/color.jpg)|
# 2. Start Training
......
......@@ -5,7 +5,7 @@ This article provides a full-process guide for the PaddleOCR table recognition m
- [1. Data Preparation](#1-data-preparation)
- [1.1. DataSet Preparation](#11-dataset-preparation)
- [1.2. Data Download](#12-data-download)
- [1.3. Dataset Generation](#13-dataset-generation)
- [1.3. Dataset Generation](#13-dataset-format)
- [2. Training](#2-training)
- [2.1. Start Training](#21-start-training)
- [2.2. Resume Training](#22-resume-training)
......@@ -23,7 +23,7 @@ This article provides a full-process guide for the PaddleOCR table recognition m
# 1. Data Preparation
## 1.1. DataSet Preparation
## 1.1. DataSet Format
The format of the PaddleOCR table recognition model dataset is as follows:
```txt
......@@ -35,15 +35,15 @@ img_label
The json format of each line is:
```json
{
'filename': PMC5755158_010_01.png, # image name
'split': ’train‘, # whether the image belongs to the training set or the validation set
'imgid': 0, # index of image
'filename': PMC5755158_010_01.png,# image name
'split': ’train‘, # whether the image belongs to the training set or the validation set
'imgid': 0,# index of image
'html': {
'structure': {'tokens': ['<thead>', '<tr>', '<td>', ...]}, # HTML string of the table
'structure': {'tokens': ['<thead>', '<tr>', '<td>', ...]}, # HTML string of the table
'cell': [
{
'tokens': ['P', 'a', 'd', 'd', 'l', 'e', 'P', 'a', 'd', 'd', 'l', 'e'], # text in cell
'bbox': [x0, y0, x1, y1] # bbox of cell
'tokens': ['P', 'a', 'd', 'd', 'l', 'e', 'P', 'a', 'd', 'd', 'l', 'e'], # text in cell
'bbox': [x0, y0, x1, y1] # bbox of cell
}
]
}
......@@ -73,8 +73,8 @@ Some samples are as follows:
|Type|Sample|
|---|---|
|Simple Table|![](https://github.com/WenmuZhou/TableGeneration/blob/main/imgs/simple.jpg)|
|Simple Color Table|![](https://github.com/WenmuZhou/TableGeneration/blob/main/imgs/color.jpg)|
|Simple Table|![](https://raw.githubusercontent.com/WenmuZhou/TableGeneration/main/imgs/simple.jpg)|
|Simple Color Table|![](https://raw.githubusercontent.com/WenmuZhou/TableGeneration/main/imgs/color.jpg)|
# 2. Training
......
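The per-line label format shown in the English doc above can be read with a few lines of Python. The following is a minimal sketch, not the PaddleOCR loader: it assumes each line of the label file is valid JSON with the keys shown in the diff (`filename`, `split`, `html.structure.tokens`, `html.cell`), and the helper name `parse_label_line` and the sample bbox values are hypothetical. Real label files may name the cell list differently, so check the key before relying on it.

```python
import json


def parse_label_line(line):
    """Parse one JSON label line into a small summary dict (hypothetical helper)."""
    record = json.loads(line)
    html = record["html"]
    # Key name taken from the doc snippet; treat it as an assumption for real files.
    cells = html["cell"]
    return {
        "filename": record["filename"],
        "split": record["split"],
        # Structure tokens joined back into one HTML string.
        "structure": "".join(html["structure"]["tokens"]),
        # Per-cell character tokens joined into the cell text.
        "cell_texts": ["".join(cell["tokens"]) for cell in cells],
        # Cells without a bbox yield None.
        "bboxes": [cell.get("bbox") for cell in cells],
    }


# Illustrative sample line; the bbox numbers are made up for the example.
sample = json.dumps({
    "filename": "PMC5755158_010_01.png",
    "split": "train",
    "imgid": 0,
    "html": {
        "structure": {"tokens": ["<thead>", "<tr>", "<td>", "</td>", "</tr>", "</thead>"]},
        "cell": [{"tokens": list("PaddlePaddle"), "bbox": [1, 2, 3, 4]}],
    },
})

summary = parse_label_line(sample)
print(summary["filename"], summary["cell_texts"])
```

Joining the tokens is only for inspection; a training pipeline would normally keep the token lists as they are.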
......@@ -45,9 +45,13 @@
## 3. Demo
![image](http://agroup.baidu-int.com/file/stream/bj/bj-e50a465becdbde9bffb84a84d41d196ac1acf1b6)
![image](http://agroup.baidu-int.com/file/stream/bj/bj-17ea53b181408a35d977c6c26b1ea308b4c27a79)
![image](http://agroup.baidu-int.com/file/stream/bj/bj-b905f57beca7115d54b907deac70c10056274858)
![image](http://agroup.baidu-int.com/file/stream/bj/bj-894694c9558fe7deb8cc896f9411fdfd252bca72)
![image](http://agroup.baidu-int.com/file/stream/bj/bj-03a0a67378b41a353257bd2fe8a1e9a864c89cb5)
## 4. Usage
......