diff --git a/doc/doc_ch/table_recognition.md b/doc/doc_ch/table_recognition.md
index fea95222cf68f9436b43edf63f8e1a549cb65491..558adf7bd04df745e0c508958bd110cb8d0f30b1 100644
--- a/doc/doc_ch/table_recognition.md
+++ b/doc/doc_ch/table_recognition.md
@@ -3,7 +3,7 @@
 本文提供了PaddleOCR表格识别模型的全流程指南,包括数据准备、模型训练、调优、评估、预测,各个阶段的详细说明:
 
 - [1. 数据准备](#1-数据准备)
-  - [1.1. 准备数据集](#11-准备数据集)
+  - [1.1. 准备数据集](#11-数据集格式)
   - [1.2. 数据下载](#12-数据下载)
   - [1.3. 数据集生成](#13-数据集生成)
 - [2. 开始训练](#2-开始训练)
@@ -23,7 +23,7 @@
 
 # 1. 数据准备
 
-## 1.1. 准备数据集
+## 1.1. 数据集格式
 
 PaddleOCR 表格识别模型数据集格式如下:
 ```txt
@@ -71,8 +71,8 @@ TableGeneration是一个开源表格数据集生成工具,其通过浏览器
 
 |类型|样例|
 |---|---|
-|简单表格|![](https://github.com/WenmuZhou/TableGeneration/blob/main/imgs/simple.jpg)|
-|彩色表格|![](https://github.com/WenmuZhou/TableGeneration/blob/main/imgs/color.jpg)|
+|简单表格|![](https://raw.githubusercontent.com/WenmuZhou/TableGeneration/main/imgs/simple.jpg)|
+|彩色表格|![](https://raw.githubusercontent.com/WenmuZhou/TableGeneration/main/imgs/color.jpg)|
 
 # 2. 开始训练
 
diff --git a/doc/doc_en/table_recognition_en.md b/doc/doc_en/table_recognition_en.md
index 28f8c6fa98246d584532ebcea1d77394a03921f8..e2d55865f1265b83c7dc4b2b52c3172abe2da070 100644
--- a/doc/doc_en/table_recognition_en.md
+++ b/doc/doc_en/table_recognition_en.md
@@ -5,7 +5,7 @@ This article provides a full-process guide for the PaddleOCR table recognition m
 - [1. Data Preparation](#1-data-preparation)
   - [1.1. DataSet Preparation](#11-dataset-preparation)
   - [1.2. Data Download](#12-data-download)
-  - [1.3. Dataset Generation](#13-dataset-generation)
+  - [1.3. Dataset Generation](#13-dataset-format)
 - [2. Training](#2-training)
   - [2.1. Start Training](#21-start-training)
   - [2.2. Resume Training](#22-resume-training)
@@ -23,7 +23,7 @@ This article provides a full-process guide for the PaddleOCR table recognition m
 
 # 1. Data Preparation
 
-## 1.1. DataSet Preparation
+## 1.1. DataSet Format
 
 The format of the PaddleOCR table recognition model dataset is as follows:
 ```txt
@@ -35,15 +35,15 @@ img_label
 The json format of each line is:
 ```json
 {
-   'filename': PMC5755158_010_01.png, # image name
-   'split': ’train‘, # whether the image belongs to the training set or the validation set
-   'imgid': 0, # index of image
+   'filename': PMC5755158_010_01.png,# image name
+   'split': ’train‘, # whether the image belongs to the training set or the validation set
+   'imgid': 0,# index of image
    'html': {
-     'structure': {'tokens': ['', '', '', ...]}, # HTML string of the table
+     'structure': {'tokens': ['', '', '', ...]}, # HTML string of the table
      'cell': [
        {
-         'tokens': ['P', 'a', 'd', 'd', 'l', 'e', 'P', 'a', 'd', 'd', 'l', 'e'], # text in cell
-         'bbox': [x0, y0, x1, y1] # bbox of cell
+         'tokens': ['P', 'a', 'd', 'd', 'l', 'e', 'P', 'a', 'd', 'd', 'l', 'e'], # text in cell
+         'bbox': [x0, y0, x1, y1] # bbox of cell
        }
      ]
    }
@@ -73,8 +73,8 @@ Some samples are as follows:
 
 |Type|Sample|
 |---|---|
-|Simple Table|![](https://github.com/WenmuZhou/TableGeneration/blob/main/imgs/simple.jpg)|
-|Simple Color Table|![](https://github.com/WenmuZhou/TableGeneration/blob/main/imgs/color.jpg)|
+|Simple Table|![](https://raw.githubusercontent.com/WenmuZhou/TableGeneration/main/imgs/simple.jpg)|
+|Simple Color Table|![](https://raw.githubusercontent.com/WenmuZhou/TableGeneration/main/imgs/color.jpg)|
 
 # 2. Training
 
diff --git a/ppstructure/table/README_ch.md b/ppstructure/table/README_ch.md
index a21b3d1c26c57e045ae017deddba8b12c3421143..f82aa786ffd724f45322c57dfd94f2bca5f8a616 100644
--- a/ppstructure/table/README_ch.md
+++ b/ppstructure/table/README_ch.md
@@ -45,9 +45,13 @@
 ## 3. 效果演示
 
 ![图片](http://agroup.baidu-int.com/file/stream/bj/bj-e50a465becdbde9bffb84a84d41d196ac1acf1b6)
+
 ![图片](http://agroup.baidu-int.com/file/stream/bj/bj-17ea53b181408a35d977c6c26b1ea308b4c27a79)
+
 ![图片](http://agroup.baidu-int.com/file/stream/bj/bj-b905f57beca7115d54b907deac70c10056274858)
+
 ![图片](http://agroup.baidu-int.com/file/stream/bj/bj-894694c9558fe7deb8cc896f9411fdfd252bca72)
+
 ![图片](http://agroup.baidu-int.com/file/stream/bj/bj-03a0a67378b41a353257bd2fe8a1e9a864c89cb5)
 
 ## 4. 使用
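The label format in the English hunk stores each cell's text as a list of single-character tokens and the table layout as a list of HTML structure tokens. The Python sketch after this paragraph shows one way to read a single label line. It assumes each line of the label file is strict JSON (double-quoted keys and strings), which the documented example only approximates with Python-style quotes. The function `parse_table_label` and the sample values are hypothetical and are not part of the PaddleOCR API.

```python
import json
from typing import Any, Dict, List


def parse_table_label(line: str) -> Dict[str, Any]:
    """Parse one line of a table-recognition label file (hypothetical helper).

    Assumes the line is strict JSON with the fields from the documented
    format: filename, split, imgid, html.structure.tokens, html.cell[].
    """
    record = json.loads(line)
    html = record["html"]
    cells: List[Dict[str, Any]] = []
    for cell in html["cell"]:
        cells.append({
            # Cell text is stored as a list of single-character tokens.
            "text": "".join(cell["tokens"]),
            # Assumption: cells without a bbox (e.g. empty cells) get None.
            "bbox": cell.get("bbox"),
        })
    return {
        "filename": record["filename"],
        "split": record["split"],
        "imgid": record["imgid"],
        # Structure tokens are HTML fragments; joining them gives the table skeleton.
        "structure": "".join(html["structure"]["tokens"]),
        "cells": cells,
    }


if __name__ == "__main__":
    # Hypothetical sample built from the documented field names.
    sample = json.dumps({
        "filename": "PMC5755158_010_01.png",
        "split": "train",
        "imgid": 0,
        "html": {
            "structure": {"tokens": ["<tr>", "<td>", "</td>", "</tr>"]},
            "cell": [{"tokens": list("PaddlePaddle"), "bbox": [10, 5, 120, 30]}],
        },
    })
    parsed = parse_table_label(sample)
    # prints: PMC5755158_010_01.png PaddlePaddle [10, 5, 120, 30]
    print(parsed["filename"], parsed["cells"][0]["text"], parsed["cells"][0]["bbox"])
```

The sketch keeps `bbox` as `None` for cells that omit it. That is an assumption about empty cells and should be checked against the actual label files.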