Commit 5b4675e0 authored by LDOUBLEV

fix problems refer comments

Parent e91f3703
# Introduction
-PaddleOCR aims to create a rich, leading, and practical set of text detection and recognition models and tools that help users train better models and put them into practice.
+PaddleOCR aims to create a rich, leading, and practical OCR toolkit that helps users train better models and put them into practice.
-## Tutorials
-- [Quick installation](./doc/installation.md)
-- [Text detection model training/evaluation/prediction](./doc/detection.md)
-- [Text recognition model training/evaluation/prediction](./doc/recognition.md)
## Features
- Ultra-lightweight models
- (detection model 4.1M + recognition model 4.5M = 8.6M)
-- Supports vertical text
+- Supports vertical text recognition
- (a single model supports both horizontal and vertical text recognition)
- Supports long text recognition
- Supports recognition of mixed Chinese, English, and digits
- Provides training code
- Supports model deployment
+## Tutorials
+- [Quick installation](./doc/installation.md)
+- [Quick start]()
+- [Text detection model training/evaluation/prediction](./doc/detection.md)
+- [Text recognition model training/evaluation/prediction](./doc/recognition.md)
+- [Prediction with inference models](./doc/)
## Text detection algorithms
Text detection algorithms open-sourced by PaddleOCR:
-- [EAST](https://arxiv.org/abs/1704.03155)
+- [x] [EAST](https://arxiv.org/abs/1704.03155)
-- [DB](https://arxiv.org/abs/1911.08947)
+- [x] [DB](https://arxiv.org/abs/1911.08947)
-- [SAST](https://arxiv.org/abs/1908.05498)
+- [x] [SAST](https://arxiv.org/abs/1908.05498)
Results:
-|Model|Backbone|Dataset|Hmean|
-|-|-|-|-|
-|EAST|ResNet50_vd|ICDAR2015|85.85%|
-|EAST|MobileNetV3|ICDAR2015|79.08%|
-|DB|ResNet50_vd|ICDAR2015|83.30%|
-|DB|MobileNetV3|ICDAR2015|73.00%|
+|Model|Backbone|Hmean|
+|-|-|-|
+|EAST^[1]^|ResNet50_vd|85.85%|
+|EAST^[1]^|MobileNetV3|79.08%|
+|DB^[2]^|ResNet50_vd|83.30%|
+|DB^[2]^|MobileNetV3|73.00%|
For training and usage of PaddleOCR text detection algorithms, please refer to the [documentation](./doc/detection.md).
## 文本识别算法: ## 文本识别算法:
PaddleOCR开源的文本识别算法列表: PaddleOCR开源的文本识别算法列表:
- (CRNN)[https://arxiv.org/abs/1507.05717] - [CRNN](https://arxiv.org/abs/1507.05717)
- [Rosetta](https://arxiv.org/abs/1910.05085) - [Rosetta](https://arxiv.org/abs/1910.05085)
- [STAR-Net](http://www.bmva.org/bmvc/2016/papers/paper043/index.html) - [STAR-Net](http://www.bmva.org/bmvc/2016/papers/paper043/index.html)
- [RARE](https://arxiv.org/abs/1603.03915v1) - [RARE](https://arxiv.org/abs/1603.03915v1)
...@@ -59,7 +64,8 @@ PaddleOCR开源的文本识别算法列表: ...@@ -59,7 +64,8 @@ PaddleOCR开源的文本识别算法列表:
PaddleOCR文本识别算法的训练与使用请参考[文档](./doc/recognition.md) PaddleOCR文本识别算法的训练与使用请参考[文档](./doc/recognition.md)
-## End-to-end algorithms
+## TODO
+**End-to-end OCR algorithms**
PaddleOCR will soon open-source [End2End-PSL](https://arxiv.org/abs/1909.07808), Baidu's self-developed end-to-end OCR model. Stay tuned.
- End2End-PSL (coming soon)
@@ -67,7 +73,7 @@
# References
```
-EAST:
+1. EAST:
@inproceedings{zhou2017east,
  title={EAST: an efficient and accurate scene text detector},
  author={Zhou, Xinyu and Yao, Cong and Wen, He and Wang, Yuzhi and Zhou, Shuchang and He, Weiran and Liang, Jiajun},
@@ -76,7 +82,7 @@ EAST:
  year={2017}
}
-DB:
+2. DB:
@article{liao2019real,
  title={Real-time Scene Text Detection with Differentiable Binarization},
  author={Liao, Minghui and Wan, Zhaoyi and Yao, Cong and Chen, Kai and Bai, Xiang},
@@ -84,7 +90,7 @@ DB:
  year={2019}
}
-DTRB:
+3. DTRB:
@inproceedings{baek2019wrong,
  title={What is wrong with scene text recognition model comparisons? dataset and model analysis},
  author={Baek, Jeonghun and Kim, Geewook and Lee, Junyeop and Park, Sungrae and Han, Dongyoon and Yun, Sangdoo and Oh, Seong Joon and Lee, Hwalsuk},
@@ -93,7 +99,7 @@ DTRB:
  year={2019}
}
-SAST:
+4. SAST:
@inproceedings{wang2019single,
  title={A Single-Shot Arbitrarily-Shaped Text Detector based on Context Attended Multi-Task Learning},
  author={Wang, Pengfei and Zhang, Chengquan and Qi, Fei and Huang, Zuming and En, Mengyi and Han, Junyu and Liu, Jingtuo and Ding, Errui and Shi, Guangming},
@@ -102,7 +108,7 @@ SAST:
  year={2019}
}
-SRN:
+5. SRN:
@article{yu2020towards,
  title={Towards Accurate Scene Text Recognition with Semantic Reasoning Networks},
  author={Yu, Deli and Li, Xuan and Zhang, Chengquan and Han, Junyu and Liu, Jingtuo and Ding, Errui},
@@ -110,7 +116,7 @@ SRN:
  year={2020}
}
-end2end-psl:
+6. end2end-psl:
@inproceedings{sun2019chinese,
  title={Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning},
  author={Sun, Yipeng and Liu, Jiaming and Liu, Wei and Han, Junyu and Ding, Errui and Liu, Jingtuo},
......
# Text detection
-This section uses the icdar15 dataset as an example to introduce how to use the detection models in PaddleOCR.
+This section uses the icdar15 dataset as an example to introduce training, evaluation, and testing of detection models in PaddleOCR.
## Data preparation
The icdar2015 dataset can be downloaded from the [official website](https://rrc.cvc.uab.es/?ch=4&com=downloads); registration is required before the first download.
@@ -26,8 +26,9 @@ wget -P /PaddleOCR/train_data/ test annotation file link
" image filename    json.dumps-encoded image annotation info"
ch4_test_images/img_61.jpg    [{"transcription": "MASA", "points": [[310, 104], [416, 141], [418, 216], [312, 179]], ...}]
```
-The image annotation before json.dumps encoding is a list of dicts; `points` in each dict gives the position of the text box. If you want to train PaddleOCR on other datasets,
-you can build annotation files in the form above.
+The image annotation before json.dumps encoding is a list of dicts. `points` holds the (x, y) coordinates of the four corner points of the text box, listed clockwise starting from the top-left corner.
+`transcription` is the text inside the current box; this information is not needed for the text detection task.
+If you want to train PaddleOCR on other datasets, you can build annotation files in the form above.
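The annotation format described above can be parsed with a few lines of Python; a minimal sketch (the sample line is the one from the snippet above, with the elided `...` fields dropped):

```python
import json

# One label-file line: "<image name>\t<json.dumps-encoded annotation list>"
line = ('ch4_test_images/img_61.jpg\t'
        '[{"transcription": "MASA", "points": '
        '[[310, 104], [416, 141], [418, 216], [312, 179]]}]')

img_name, annotation = line.strip().split('\t')
boxes = json.loads(annotation)  # a list with one dict per text box
for box in boxes:
    points = box['points']        # four (x, y) corners, clockwise from top-left
    text = box['transcription']   # box text; unused by the detection task
    assert len(points) == 4
```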
## Quick start training
@@ -62,7 +63,7 @@ PaddleOCR computes three OCR detection metrics: Precision, Recall
Run the following command to compute the evaluation metrics from the test-set detection result file specified by save_res_path in the configuration file det_db_mv3.yml.
```
-python3 tools/eval.py -c configs/det/det_db_mv3.yml -o checkpoints="./output/best_accuracy"
+python3 tools/eval.py -c configs/det/det_db_mv3.yml -o Global.checkpoints="./output/best_accuracy"
```
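For reference, the Hmean reported alongside Precision and Recall is their harmonic mean; a minimal sketch of that relationship (not PaddleOCR's actual evaluation code):

```python
def hmean(precision, recall):
    # Harmonic mean of precision and recall; 0 when both are 0.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# e.g. precision 0.8 and recall 0.7 give an Hmean of about 0.747
score = hmean(0.8, 0.7)
```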
## Test detection results
......
@@ -25,10 +25,3 @@ cd PaddleOCR
pip3 install --upgrade pip
pip3 install -r requirements.txt
```
-## Quick run
-```
-python3 tools/infer/predict_eval.py --image_file="./"
-```
-[Visualized results]
@@ -22,6 +22,7 @@ import string
from ppocr.utils.utility import initial_logger
logger = initial_logger()
from ppocr.utils.utility import create_module
+from tools.infer.utility import get_image_file_list
import time
@@ -72,16 +73,8 @@ class EvalTestReader(object):
            self.params)
        batch_size = self.params['test_batch_size_per_card']
-        flag_test_single_img = False
-        if mode == "test":
-            single_img_path = self.params['single_img_path']
-            if single_img_path is not None:
-                flag_test_single_img = True
        img_list = []
-        if flag_test_single_img:
-            img_list.append([single_img_path, single_img_path])
-        else:
+        if mode != "test":
            img_set_dir = self.params['img_set_dir']
            img_name_list_path = self.params['label_file_path']
            with open(img_name_list_path, "rb") as fin:
@@ -90,6 +83,9 @@ class EvalTestReader(object):
                img_name = line.decode().strip("\n").split("\t")[0]
                img_path = img_set_dir + "/" + img_name
                img_list.append([img_path, img_name])
+        else:
+            img_path = self.params['single_img_path']
+            img_list = get_image_file_list(img_path)
        def batch_iter_reader():
            batch_outs = []
......
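The `batch_iter_reader` closure above follows a common batching pattern: accumulate items, yield every `batch_size` of them, and flush the remainder at the end. A standalone sketch of just that pattern:

```python
def batch_iter(items, batch_size):
    # Yield items in fixed-size batches; the last batch may be smaller.
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

batches = list(batch_iter(range(5), 2))  # → [[0, 1], [2, 3], [4]]
```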
@@ -61,16 +61,21 @@ def parse_args():
    return parser.parse_args()
-def get_image_file_list(image_dir):
-    image_file_list = []
-    if image_dir is None:
-        return image_file_list
-    if os.path.isfile(image_dir):
-        image_file_list = [image_dir]
-    elif os.path.isdir(image_dir):
-        for single_file in os.listdir(image_dir):
-            image_file_list.append(os.path.join(image_dir, single_file))
-    return image_file_list
+def get_image_file_list(img_file):
+    imgs_lists = []
+    if img_file is None or not os.path.exists(img_file):
+        raise Exception("not found any img file in {}".format(img_file))
+
+    img_end = ['jpg', 'png', 'jpeg', 'JPEG', 'JPG', 'bmp']
+    if os.path.isfile(img_file) and img_file.split('.')[-1] in img_end:
+        imgs_lists.append(img_file)
+    elif os.path.isdir(img_file):
+        for single_file in os.listdir(img_file):
+            if single_file.split('.')[-1] in img_end:
+                imgs_lists.append(os.path.join(img_file, single_file))
+    if len(imgs_lists) == 0:
+        raise Exception("not found any img file in {}".format(img_file))
+    return imgs_lists
def create_predictor(args, mode):
......
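The rewritten `get_image_file_list` can be exercised in isolation. A minimal sketch that copies the function body from the diff above and feeds it a temporary directory of dummy files:

```python
import os
import tempfile


def get_image_file_list(img_file):
    # Copied from the diff above: collect image files from a path or directory,
    # filtering on a fixed list of image extensions.
    imgs_lists = []
    if img_file is None or not os.path.exists(img_file):
        raise Exception("not found any img file in {}".format(img_file))
    img_end = ['jpg', 'png', 'jpeg', 'JPEG', 'JPG', 'bmp']
    if os.path.isfile(img_file) and img_file.split('.')[-1] in img_end:
        imgs_lists.append(img_file)
    elif os.path.isdir(img_file):
        for single_file in os.listdir(img_file):
            if single_file.split('.')[-1] in img_end:
                imgs_lists.append(os.path.join(img_file, single_file))
    if len(imgs_lists) == 0:
        raise Exception("not found any img file in {}".format(img_file))
    return imgs_lists


with tempfile.TemporaryDirectory() as d:
    for name in ["a.jpg", "b.png", "notes.txt"]:
        open(os.path.join(d, name), "w").close()
    files = get_image_file_list(d)
    # only the two image files survive the extension filter
    assert sorted(os.path.basename(f) for f in files) == ["a.jpg", "b.png"]
```

Note that a directory containing no matching images (or a nonexistent path) now raises instead of silently returning an empty list, which is what allows the callers in the diff to drop their own emptiness checks.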
@@ -68,50 +68,6 @@ def draw_det_res(dt_boxes, config, img_name, ino):
    logger.info("The detected Image saved in {}".format(save_path))
-def simple_reader(img_file, config):
-    imgs_lists = []
-    if img_file is None or not os.path.exists(img_file):
-        raise Exception("not found any img file in {}".format(img_file))
-    img_end = ['jpg', 'png', 'jpeg', 'JPEG', 'JPG', 'bmp']
-    if os.path.isfile(img_file) and img_file.split('.')[-1] in img_end:
-        imgs_lists.append(img_file)
-    elif os.path.isdir(img_file):
-        for single_file in os.listdir(img_file):
-            if single_file.split('.')[-1] in img_end:
-                imgs_lists.append(os.path.join(img_file, single_file))
-    if len(imgs_lists) == 0:
-        raise Exception("not found any img file in {}".format(img_file))
-    batch_size = config['Global']['test_batch_size_per_card']
-    global_params = config['Global']
-    params = deepcopy(config['TestReader'])
-    params.update(global_params)
-    reader_function = params['process_function']
-    process_function = create_module(reader_function)(params)
-
-    def batch_iter_reader():
-        batch_outs = []
-        for img_path in imgs_lists:
-            img = cv2.imread(img_path)
-            if img.shape[-1] == 1 or len(list(img.shape)) == 2:
-                img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
-            if img is None:
-                logger.info("load image error:" + img_path)
-                continue
-            outs = process_function(img)
-            outs.append(os.path.basename(img_path))
-            print(outs[0].shape, outs[2])
-            batch_outs.append(outs)
-            if len(batch_outs) == batch_size:
-                yield batch_outs
-                batch_outs = []
-        if len(batch_outs) != 0:
-            yield batch_outs
-
-    return batch_iter_reader
def main():
    config = program.load_config(FLAGS.config)
    program.merge_config(FLAGS.opt)
@@ -148,9 +104,7 @@ def main():
    save_res_path = config['Global']['save_res_path']
    with open(save_res_path, "wb") as fout:
-        # test_reader = reader_main(config=config, mode='test')
-        single_img_path = config['TestReader']['single_img_path']
-        test_reader = simple_reader(img_file=single_img_path, config=config)
+        test_reader = reader_main(config=config, mode='test')
        tackling_num = 0
        for data in test_reader():
            img_num = len(data)
......