Commit 8ea84de6 authored by 文幕地方

rm title

Parent f0c73691
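This commit removes the self-referencing top-level title entry (for example `- [OCR datasets](#ocr-datasets)`) from the tables of contents of the dataset, detection, and recognition documents. Each TOC therefore starts at the first numbered section. If you generate such TOCs yourself, the rule can be sketched as below. This is a hypothetical helper, not part of PaddleOCR or of whatever tool produced these TOCs. `slugify` only approximates GitHub-style anchors (lowercase, punctuation dropped, spaces turned into hyphens) and does not de-duplicate repeated headings.

```python
import re
from typing import List

FENCE = "`" * 3  # marker that opens or closes a fenced code block


def slugify(heading: str) -> str:
    """Approximate GitHub-style anchor: lowercase, drop punctuation, spaces -> hyphens."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\- ]", "", slug)  # \w is Unicode-aware, so CJK characters are kept
    return slug.replace(" ", "-")


def build_toc(markdown: str, skip_title: bool = True) -> List[str]:
    """Build nested '- [text](#anchor)' entries from ATX headings.

    With skip_title=True, level-1 headings (the document title) are omitted and
    level-2 headings become the top level of the list.
    """
    toc = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # ignore '#' lines inside fenced code blocks
            continue
        match = re.match(r"^(#{1,6})\s+(.*?)\s*$", line)
        if in_fence or not match:
            continue
        level, text = len(match.group(1)), match.group(2)
        if skip_title and level == 1:
            continue  # drop the document title, as this commit does by hand
        depth = level - 2 if skip_title else level - 1
        toc.append("  " * depth + f"- [{text}](#{slugify(text)})")
    return toc
```

With `skip_title=True`, the headings of the English OCR datasets page would produce a list that starts at `- [1. Text detection](#1-text-detection)` rather than at the page title.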
# OCR Datasets
- [1. Text detection](#1-文本检测)
- [1.1 PaddleOCR text detection data format](#11-paddleocr-文字检测数据格式)
- [1.2 Public datasets](#12-公开数据集)
- [1.2.1 ICDAR 2015](#121-icdar-2015)
- [2. Text recognition](#2-文本识别)
- [2.1 PaddleOCR text recognition data format](#21-paddleocr-文字识别数据格式)
- [2.2 Public datasets](#22-公开数据集)
- [2.1 ICDAR 2015](#21-icdar-2015)
- [3. Data storage path](#3-数据存放路径)
This page collects public datasets commonly used in OCR. The list is continuously updated, and dataset contributions are welcome.
......
# Table Recognition Datasets
- [Dataset Summary](#数据集汇总)
- [1. PubTabNet](#1-pubtabnet数据集)
- [2. TAL Table Recognition Competition Dataset](#2-好未来表格识别竞赛数据集)
This page collects commonly used table recognition datasets. The list is continuously updated, and dataset contributions are welcome.
......
......@@ -2,7 +2,6 @@
This section uses the ICDAR 2015 dataset as an example to show how to train, evaluate, and test detection models in PaddleOCR.
- [Text Detection](#文字检测)
- [1. Prepare Data and Models](#1-准备数据和模型)
- [1.1 Prepare the Dataset](#11-准备数据集)
- [1.2 Download Pre-trained Models](#12-下载预训练模型)
......
......@@ -2,19 +2,18 @@
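This document is an end-to-end guide to text recognition in PaddleOCR, with detailed instructions for each stage: data preparation, model training, tuning, evaluation, and prediction.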
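- [1. Data Preparation](#1-数据准备)
- [1.1 Prepare the Dataset](#11-准备数据集)
- [1.2 Dictionary](#12-字典)
- [1.3 Add a Space Category](#13-添加空格类别)
- [2. Start Training](#2-启动训练)
- [2.1 Data Augmentation](#21-数据增强)
- [2.2 General Model Training](#22-通用模型训练)
- [2.3 Multilingual Model Training](#23-多语言模型训练)
- [2.4 Knowledge Distillation Training](#24-知识蒸馏训练)
- [3. Evaluation](#3-评估)
- [4. Prediction](#4-预测)
- [5. Convert to an Inference Model and Test](#5-转inference模型测试)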
<a name="数据准备"></a>
......
# OCR datasets
- [1. Text detection](#1-text-detection)
- [1.1 PaddleOCR text detection format annotation](#11-paddleocr-text-detection-format-annotation)
- [1.2 Public dataset](#12-public-dataset)
- [1.2.1 ICDAR 2015](#121-icdar-2015)
- [2. Text recognition](#2-text-recognition)
- [2.1 PaddleOCR text recognition format annotation](#21-paddleocr-text-recognition-format-annotation)
- [2.2 Public dataset](#22-public-dataset)
- [2.1 ICDAR 2015](#21-icdar-2015)
- [3. Data storage path](#3-data-storage-path)
Here is a list of public datasets commonly used in OCR. The list is continuously updated, and dataset contributions are welcome.
......@@ -129,9 +128,9 @@ Similar to the training set, the test set also needs to be provided a folder con
|ICDAR 2015| http://rrc.cvc.uab.es/?ch=4&com=downloads | [train](https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt)/ [test](https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt) |
| Multilingual datasets |[Baidu Netdisk](https://pan.baidu.com/s/1bS_u207Rm7YbY33wOECKDA) Extraction code: frgi <br> [Google Drive](https://drive.google.com/file/d/18cSWX7wXSy4G0tbKJ0d9PuIaiwRLHpjA/view) | Included in the downloaded image zip |
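The ICDAR 2015 label files in the table above can also be fetched from a script. Below is a minimal sketch using the standard-library `urllib.request.urlretrieve`. Only the two URLs come from the table. The helper name `download_labels` and any destination directory you pass in are hypothetical. Files that already exist are not downloaded again.

```python
import os
import urllib.request
from typing import Callable, Dict, List

# Label-file URLs taken from the ICDAR 2015 row of the table above.
ICDAR2015_REC_LABELS: Dict[str, str] = {
    "rec_gt_train.txt": "https://paddleocr.bj.bcebos.com/dataset/rec_gt_train.txt",
    "rec_gt_test.txt": "https://paddleocr.bj.bcebos.com/dataset/rec_gt_test.txt",
}


def download_labels(
    dest_dir: str,
    urls: Dict[str, str] = ICDAR2015_REC_LABELS,
    fetch: Callable[[str, str], object] = urllib.request.urlretrieve,
) -> List[str]:
    """Download each label file into dest_dir, skipping files that already exist."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for filename, url in urls.items():
        path = os.path.join(dest_dir, filename)
        if not os.path.exists(path):
            fetch(url, path)  # urlretrieve(url, filename) writes the response to disk
        paths.append(path)
    return paths
```

For example, `download_labels("train_data/ic15_data")` would place both files under a (hypothetical) `ic15_data` folder inside the default `train_data` directory. The `fetch` parameter exists so the download step can be swapped out, for instance in tests or behind a proxy-aware downloader.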
#### 2.1 ICDAR 2015
The ICDAR 2015 dataset can be downloaded from the link in the table above for quick validation. The lmdb-format dataset required by the English (en) benchmark can also be downloaded from the table above.
Then download the PaddleOCR-format annotation files from the table above.
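The annotation files list one sample per line. Below is a minimal reader. It assumes the tab-separated `image_path<TAB>label` layout used for PaddleOCR recognition labels; this excerpt does not spell that layout out. `parse_rec_labels` is a hypothetical helper, not a PaddleOCR API, and the sample paths in any usage are illustrative only.

```python
from typing import Iterable, List, Tuple


def parse_rec_labels(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Split each 'image_path<TAB>label' line into a (path, label) pair."""
    samples = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        path, sep, label = line.partition("\t")  # split on the first tab only
        if not sep:
            raise ValueError(f"line {lineno}: expected '<image_path>\\t<label>', got {line!r}")
        samples.append((path, label))
    return samples
```

A file object can be passed directly, e.g. `with open("rec_gt_train.txt", encoding="utf-8") as fh: samples = parse_rec_labels(fh)`. Splitting only on the first tab keeps any spaces inside a label intact.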
......@@ -146,7 +145,7 @@ The data format is as follows, (a) is the original picture, (b) is the Ground Tr
![](../../datasets/icdar_rec.png)
## 3. Data storage path
The default storage path for PaddleOCR training data is `PaddleOCR/train_data`. If you already have a dataset on your disk, just create a soft (symbolic) link to the dataset directory:
......
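The soft link described above can also be created from Python. Below is a minimal sketch using `os.symlink`. The helper name `link_dataset` and its default `train_data` target are illustrative assumptions, not part of PaddleOCR. The sketch assumes a POSIX system; on Windows, creating symlinks may require extra privileges.

```python
import os


def link_dataset(src_dir: str, train_data_dir: str = "train_data", name: str = "") -> str:
    """Create train_data_dir/<name> as a symbolic link to an existing dataset directory."""
    src_dir = os.path.abspath(src_dir)  # absolute target so the link works from any cwd
    if not os.path.isdir(src_dir):
        raise FileNotFoundError(f"dataset directory not found: {src_dir}")
    os.makedirs(train_data_dir, exist_ok=True)
    link_path = os.path.join(train_data_dir, name or os.path.basename(src_dir))
    if os.path.islink(link_path):
        os.remove(link_path)  # replace an existing link, similar to `ln -sf`
    # Raises FileExistsError if a real file or directory already occupies link_path.
    os.symlink(src_dir, link_path, target_is_directory=True)
    return link_path
```

Linking instead of copying keeps a single copy of a large dataset on disk while still letting the training configs refer to paths under `train_data`.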
# Table Recognition Datasets
- [Dataset Summary](#dataset-summary)
- [1. PubTabNet](#1-pubtabnet)
- [2. TAL Table Recognition Competition Dataset](#2-tal-table-recognition-competition-dataset)
Here are the commonly used table recognition datasets. The list is continuously updated, and dataset contributions are welcome.
......
......@@ -2,20 +2,19 @@
This section uses the ICDAR 2015 dataset as an example to show how to train, evaluate, and test detection models in PaddleOCR.
- [1. Data and Weights Preparation](#1-data-and-weights-preparation)
- [1.1 Data Preparation](#11-data-preparation)
- [1.2 Download Pre-trained Model](#12-download-pre-trained-model)
- [2. Training](#2-training)
- [2.1 Start Training](#21-start-training)
- [2.2 Load Trained Model and Continue Training](#22-load-trained-model-and-continue-training)
- [2.3 Training with New Backbone](#23-training-with-new-backbone)
- [2.4 Training with knowledge distillation](#24-training-with-knowledge-distillation)
- [3. Evaluation and Test](#3-evaluation-and-test)
- [3.1 Evaluation](#31-evaluation)
- [3.2 Test](#32-test)
- [4. Inference](#4-inference)
- [5. FAQ](#5-faq)
## 1. Data and Weights Preparation
......
# Text Recognition
- [1. Data Preparation](#1-data-preparation)
- [1.1 Dataset Preparation](#11-dataset-preparation)
- [1.2 Dictionary](#12-dictionary)
- [1.4 Add Space Category](#14-add-space-category)
- [2. Training](#2training)
- [2.1 Data Augmentation](#21-data-augmentation)
- [2.2 General Training](#22-general-training)
- [2.3 Multi-language Training](#23-multi-language-training)
- [2.4 Training with Knowledge Distillation](#24-training-with-knowledge-distillation)
- [3. Evaluation](#3-evalution)
- [4. Prediction](#4-prediction)
- [5. Convert to Inference Model](#5-convert-to-inference-model)
<a name="DATA_PREPARATION"></a>
## 1. Data Preparation
......