Unverified commit 74be6033, authored by neonhuang, committed by GitHub

add multi lang ocr (#1702)

Parent d53f6412
@@ -231,3 +231,4 @@ We welcome you to contribute code to PaddleHub, and thank you for your feedback.
* Many thanks to [zl1271](https://github.com/zl1271) for fixing serving docs typo
* Many thanks to [AK391](https://github.com/AK391) for adding the webdemo of UGATIT and deoldify models in Hugging Face spaces
* Many thanks to [itegel](https://github.com/itegel) for fixing quick start docs typo
* Many thanks to [AK391](https://github.com/AK391) for adding the webdemo of Photo2Cartoon model in Hugging Face spaces
@@ -247,3 +247,4 @@ print(results)
* Many thanks to [zl1271](https://github.com/zl1271) for fixing a typo in the serving docs
* Many thanks to [AK391](https://github.com/AK391) for adding web demos of the UGATIT and deoldify models in Hugging Face spaces
* Many thanks to [itegel](https://github.com/itegel) for fixing a typo in the quick start docs
* Many thanks to [AK391](https://github.com/AK391) for adding the web demo of the Photo2Cartoon model in Hugging Face spaces
@@ -50,6 +50,8 @@
**UGATIT Selfie2anime Huggingface Web Demo**: Integrated into [Huggingface Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/akhaliq/U-GAT-IT-selfie2anime)
**Photo2Cartoon Huggingface Web Demo**: Integrated into [Huggingface Spaces](https://huggingface.co/spaces) with [Gradio](https://github.com/gradio-app/gradio). See demo: [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/akhaliq/photo2cartoon)
### Object Detection
- Pedestrian detection, vehicle detection, and more industrial-grade ultra-large-scale pretrained models are provided.
# arabic_ocr_db_crnn_mobile
|Model Name|arabic_ocr_db_crnn_mobile|
| :--- | :---: |
|Category|Image - Text Recognition|
|Network|Differentiable Binarization+CRNN|
|Dataset|ICDAR2015|
|Fine-tuning Supported|No|
|Latest Update|2021-12-2|
|Data Metrics|-|
## I. Basic Information
- ### Module Introduction
  - The arabic_ocr_db_crnn_mobile module recognizes Arabic-script text in images, covering Arabic, Persian, and Uyghur. Starting from the text boxes detected by multi_languages_ocr_db_crnn, it recognizes the Arabic-script text inside each box. The recognition algorithm is CRNN (Convolutional Recurrent Neural Network), a combination of DCNN and RNN designed for sequence-like objects in images. Paired with CTC loss, it learns directly from word- or line-level annotations and needs no detailed character-level labels. This module is a lightweight OCR model for Arabic script and supports direct prediction.
  - For more details, see:
    - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
    - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## II. Installation
- ### 1. Environment Dependencies
  - PaddlePaddle >= 2.0.2
  - Python >= 3.6
  - PaddleOCR >= 2.0.1 | [How to install PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1)
  - PaddleHub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
  - Paddle2Onnx >= 0.9.0 | [How to install paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
  - shapely
  - pyclipper
  - ```shell
    $ pip3.6 install "paddleocr==2.3.0.2"
    $ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple
    $ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple
    ```
  - **This module depends on the third-party libraries shapely and pyclipper; please install both before using it.**
- ### 2. Installation
  - ```shell
    $ hub install arabic_ocr_db_crnn_mobile
    ```
  - If you run into problems during installation, see: [Windows installation guide](../../../../docs/docs_ch/get_start/windows_quickstart.md) | [Linux installation guide](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS installation guide](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command Line Prediction
  - ```shell
    $ hub run arabic_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
    $ hub run arabic_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
    ```
  - This invokes the text recognition model from the command line; for more options, see [PaddleHub command line usage](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Prediction Code Example
  - ```python
    import paddlehub as hub
    import cv2
    ocr = hub.Module(name="arabic_ocr_db_crnn_mobile", enable_mkldnn=True)       # MKL-DNN acceleration is effective only on CPU
    result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
    # or
    # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
    ```
- ### 3. API
  - ```python
    def __init__(self,
                 det=True,
                 rec=True,
                 use_angle_cls=False,
                 enable_mkldnn=False,
                 use_gpu=False,
                 box_thresh=0.6,
                 angle_classification_thresh=0.9)
    ```
    - Constructs an ArabicOCRDBCRNNMobile object.
    - **Parameters**
      - det(bool): whether to enable text detection. Defaults to True.
      - rec(bool): whether to enable text recognition. Defaults to True.
      - use_angle_cls(bool): whether to enable the orientation classifier, which recognizes text rotated by 180 degrees. Defaults to False.
      - enable_mkldnn(bool): whether to enable MKL-DNN to speed up CPU computation; effective only when running on CPU. Defaults to False.
      - use\_gpu (bool): whether to use the GPU. Defaults to False. **If you use the GPU, set the CUDA_VISIBLE_DEVICES environment variable first.**
      - box\_thresh (float): confidence threshold for detected text boxes. Defaults to 0.6.
      - angle_classification_thresh(float): confidence threshold for text orientation classification. Defaults to 0.9.
  - ```python
    def recognize_text(images=[],
                       paths=[],
                       output_dir='ocr_result',
                       visualization=False)
    ```
    - Prediction API: detects the positions of all text in the input images and recognizes the text content.
    - **Parameters**
      - paths (list\[str\]): paths of the images;
      - images (list\[numpy.ndarray\]): image data, each of ndarray.shape \[H, W, C\] in BGR format;
      - output\_dir (str): directory for saving output images; defaults to ocr\_result;
      - visualization (bool): whether to save the recognition results as image files; effective only when detection is enabled; defaults to False;
    - **Returns** (see the usage sketch after this list)
      - res (list\[dict\]): list of recognition results; each element is a dict with the fields:
        - data (list\[dict\]): recognized text results; each element is a dict with the fields:
          - text(str): the recognized text
          - confidence(float): confidence of the recognized text
          - text_box_position(list): pixel coordinates of the text box in the original image, a 4*2 matrix listing the bottom-left, bottom-right, top-right, and top-left vertices in order; data is \[\] if nothing is recognized
          - orientation(str): the classified orientation, output only when only orientation classification is enabled
          - score(float): the classification score, output only when only orientation classification is enabled
        - save_path (str, optional): path where the result image is saved; '' if no image is saved
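  - The sketch below shows one way to consume this structure (a minimal example, assuming a readable image at the placeholder path `/PATH/TO/IMAGE`); it prints each recognized line and draws its `text_box_position` back onto the image:
  - ```python
    import cv2
    import numpy as np
    import paddlehub as hub

    ocr = hub.Module(name="arabic_ocr_db_crnn_mobile")
    image = cv2.imread('/PATH/TO/IMAGE')                  # placeholder path
    res = ocr.recognize_text(images=[image])

    for item in res:                                      # one dict per input image
        for line in item['data']:                         # one dict per recognized text line
            print(line['text'], line['confidence'])
            # text_box_position is a 4x2 matrix of pixel coordinates
            pts = np.array(line['text_box_position'], dtype=np.int32).reshape(-1, 1, 2)
            cv2.polylines(image, [pts], isClosed=True, color=(0, 255, 0), thickness=2)
    cv2.imwrite('ocr_boxes.jpg', image)
    ```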
## IV. Server Deployment
- PaddleHub Serving can deploy an online text recognition service.
- ### Step 1: Start the PaddleHub Serving
  - Run the startup command:
  - ```shell
    $ hub serving start -m arabic_ocr_db_crnn_mobile
    ```
  - This deploys the text recognition API service, on the default port 8866.
  - **NOTE:** to predict on GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise no setting is needed.
- ### Step 2: Send a prediction request
  - With the server up, the few lines below send a prediction request and fetch the result
  - ```python
    import requests
    import json
    import cv2
    import base64

    def cv2_to_base64(image):
        data = cv2.imencode('.jpg', image)[1]
        return base64.b64encode(data.tobytes()).decode('utf8')

    # Send the HTTP request
    data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
    headers = {"Content-type": "application/json"}
    url = "http://127.0.0.1:8866/predict/arabic_ocr_db_crnn_mobile"
    r = requests.post(url=url, headers=headers, data=json.dumps(data))

    # Print the prediction results
    print(r.json()["results"])
    ```
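- Besides the APIs above, the module also implements `export_onnx_model` (see the module code below). A minimal sketch, assuming the module is installed; the output directory and input shape here are illustrative:
- ```python
  import paddlehub as hub

  model = hub.Module(name="arabic_ocr_db_crnn_mobile")
  # -1 marks a dynamic dimension; 'x' follows the example in the method's docstring
  model.export_onnx_model(dirname='./onnx_model', input_shape_dict={'x': [-1, 3, -1, -1]}, opset_version=10)
  ```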
import paddlehub as hub
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="arabic_ocr_db_crnn_mobile",
version="1.1.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class ArabicOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="arabic",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
            images (list(numpy.ndarray)): images data, shape of each is [H, W, C]; used when paths is not set
            paths (list[str]): the paths of images; used when images is not set
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
Returns:
res (list): The result of text detection box and save path of images.
"""
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
results = self.model.run_cmd(argvs)
return results
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
            input_shape_dict: dictionary ``{input_name: input_shape}``, e.g. ``{'x': [-1, 3, -1, -1]}``
            opset_version(int): ONNX operator set version. Default is 10.
'''
self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
# chinese_cht_ocr_db_crnn_mobile
|Model Name|chinese_cht_ocr_db_crnn_mobile|
| :--- | :---: |
|Category|Image - Text Recognition|
|Network|Differentiable Binarization+CRNN|
|Dataset|ICDAR2015|
|Fine-tuning Supported|No|
|Latest Update|2021-12-2|
|Data Metrics|-|
## I. Basic Information
- ### Module Introduction
  - The chinese_cht_ocr_db_crnn_mobile module recognizes Traditional Chinese text in images. Starting from the text boxes detected by multi_languages_ocr_db_crnn, it recognizes the Traditional Chinese text inside each box. The recognition algorithm is CRNN (Convolutional Recurrent Neural Network), a combination of DCNN and RNN designed for sequence-like objects in images. Paired with CTC loss, it learns directly from word- or line-level annotations and needs no detailed character-level labels. This module is a lightweight OCR model for Traditional Chinese and supports direct prediction.
  - For more details, see:
    - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
    - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## II. Installation
- ### 1. Environment Dependencies
  - PaddlePaddle >= 2.0.2
  - Python >= 3.6
  - PaddleOCR >= 2.0.1 | [How to install PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1)
  - PaddleHub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
  - Paddle2Onnx >= 0.9.0 | [How to install paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
  - shapely
  - pyclipper
  - ```shell
    $ pip3.6 install "paddleocr==2.3.0.2"
    $ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple
    $ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple
    ```
  - **This module depends on the third-party libraries shapely and pyclipper; please install both before using it.**
- ### 2. Installation
  - ```shell
    $ hub install chinese_cht_ocr_db_crnn_mobile
    ```
  - If you run into problems during installation, see: [Windows installation guide](../../../../docs/docs_ch/get_start/windows_quickstart.md) | [Linux installation guide](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS installation guide](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command Line Prediction
  - ```shell
    $ hub run chinese_cht_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
    $ hub run chinese_cht_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
    ```
  - This invokes the text recognition model from the command line; for more options, see [PaddleHub command line usage](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Prediction Code Example
  - ```python
    import paddlehub as hub
    import cv2
    ocr = hub.Module(name="chinese_cht_ocr_db_crnn_mobile", enable_mkldnn=True)       # MKL-DNN acceleration is effective only on CPU
    result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
    # or
    # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
    ```
- ### 3. API
  - ```python
    def __init__(self,
                 det=True,
                 rec=True,
                 use_angle_cls=False,
                 enable_mkldnn=False,
                 use_gpu=False,
                 box_thresh=0.6,
                 angle_classification_thresh=0.9)
    ```
    - Constructs a ChineseChtOCRDBCRNNMobile object.
    - **Parameters**
      - det(bool): whether to enable text detection. Defaults to True.
      - rec(bool): whether to enable text recognition. Defaults to True.
      - use_angle_cls(bool): whether to enable the orientation classifier, which recognizes text rotated by 180 degrees. Defaults to False.
      - enable_mkldnn(bool): whether to enable MKL-DNN to speed up CPU computation; effective only when running on CPU. Defaults to False.
      - use\_gpu (bool): whether to use the GPU. Defaults to False. **If you use the GPU, set the CUDA_VISIBLE_DEVICES environment variable first.**
      - box\_thresh (float): confidence threshold for detected text boxes. Defaults to 0.6.
      - angle_classification_thresh(float): confidence threshold for text orientation classification. Defaults to 0.9.
  - ```python
    def recognize_text(images=[],
                       paths=[],
                       output_dir='ocr_result',
                       visualization=False)
    ```
    - Prediction API: detects the positions of all text in the input images and recognizes the text content.
    - **Parameters**
      - paths (list\[str\]): paths of the images;
      - images (list\[numpy.ndarray\]): image data, each of ndarray.shape \[H, W, C\] in BGR format;
      - output\_dir (str): directory for saving output images; defaults to ocr\_result;
      - visualization (bool): whether to save the recognition results as image files; effective only when detection is enabled; defaults to False;
    - **Returns**
      - res (list\[dict\]): list of recognition results; each element is a dict with the fields:
        - data (list\[dict\]): recognized text results; each element is a dict with the fields:
          - text(str): the recognized text
          - confidence(float): confidence of the recognized text
          - text_box_position(list): pixel coordinates of the text box in the original image, a 4*2 matrix listing the bottom-left, bottom-right, top-right, and top-left vertices in order; data is \[\] if nothing is recognized
          - orientation(str): the classified orientation, output only when only orientation classification is enabled
          - score(float): the classification score, output only when only orientation classification is enabled
        - save_path (str, optional): path where the result image is saved; '' if no image is saved
## IV. Server Deployment
- PaddleHub Serving can deploy an online text recognition service.
- ### Step 1: Start the PaddleHub Serving
  - Run the startup command:
  - ```shell
    $ hub serving start -m chinese_cht_ocr_db_crnn_mobile
    ```
  - This deploys the text recognition API service, on the default port 8866.
  - **NOTE:** to predict on GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise no setting is needed.
- ### Step 2: Send a prediction request
  - With the server up, the few lines below send a prediction request and fetch the result
  - ```python
    import requests
    import json
    import cv2
    import base64

    def cv2_to_base64(image):
        data = cv2.imencode('.jpg', image)[1]
        return base64.b64encode(data.tobytes()).decode('utf8')

    # Send the HTTP request
    data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
    headers = {"Content-type": "application/json"}
    url = "http://127.0.0.1:8866/predict/chinese_cht_ocr_db_crnn_mobile"
    r = requests.post(url=url, headers=headers, data=json.dumps(data))

    # Print the prediction results
    print(r.json()["results"])
    ```
import paddlehub as hub
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="chinese_cht_ocr_db_crnn_mobile",
version="1.0.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class ChineseChtOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="chinese_cht",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
            images (list(numpy.ndarray)): images data, shape of each is [H, W, C]; used when paths is not set
            paths (list[str]): the paths of images; used when images is not set
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
Returns:
res (list): The result of text detection box and save path of images.
"""
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
results = self.model.run_cmd(argvs)
return results
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
            input_shape_dict: dictionary ``{input_name: input_shape}``, e.g. ``{'x': [-1, 3, -1, -1]}``
            opset_version(int): ONNX operator set version. Default is 10.
'''
self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
# cyrillic_ocr_db_crnn_mobile
|Model Name|cyrillic_ocr_db_crnn_mobile|
| :--- | :---: |
|Category|Image - Text Recognition|
|Network|Differentiable Binarization+CRNN|
|Dataset|ICDAR2015|
|Fine-tuning Supported|No|
|Latest Update|2021-12-2|
|Data Metrics|-|
## I. Basic Information
- ### Module Introduction
  - The cyrillic_ocr_db_crnn_mobile module recognizes Cyrillic-script text in images, covering Russian, Serbian, Belarusian, Bulgarian, Ukrainian, Mongolian, Adyghe, Avar, Dargwa, Ingush, Lak, Lezghian, and Tabassaran. Starting from the text boxes detected by multi_languages_ocr_db_crnn, it recognizes the Cyrillic-script text inside each box. The recognition algorithm is CRNN (Convolutional Recurrent Neural Network), a combination of DCNN and RNN designed for sequence-like objects in images. Paired with CTC loss, it learns directly from word- or line-level annotations and needs no detailed character-level labels. This module is a lightweight OCR model for Cyrillic script and supports direct prediction.
  - For more details, see:
    - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
    - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## II. Installation
- ### 1. Environment Dependencies
  - PaddlePaddle >= 2.0.2
  - Python >= 3.6
  - PaddleOCR >= 2.0.1 | [How to install PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1)
  - PaddleHub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
  - Paddle2Onnx >= 0.9.0 | [How to install paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
  - shapely
  - pyclipper
  - ```shell
    $ pip3.6 install "paddleocr==2.3.0.2"
    $ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple
    $ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple
    ```
  - **This module depends on the third-party libraries shapely and pyclipper; please install both before using it.**
- ### 2. Installation
  - ```shell
    $ hub install cyrillic_ocr_db_crnn_mobile
    ```
  - If you run into problems during installation, see: [Windows installation guide](../../../../docs/docs_ch/get_start/windows_quickstart.md) | [Linux installation guide](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS installation guide](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command Line Prediction
  - ```shell
    $ hub run cyrillic_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
    $ hub run cyrillic_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
    ```
  - This invokes the text recognition model from the command line; for more options, see [PaddleHub command line usage](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Prediction Code Example
  - ```python
    import paddlehub as hub
    import cv2
    ocr = hub.Module(name="cyrillic_ocr_db_crnn_mobile", enable_mkldnn=True)       # MKL-DNN acceleration is effective only on CPU
    result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
    # or
    # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
    ```
- ### 3. API
  - ```python
    def __init__(self,
                 det=True,
                 rec=True,
                 use_angle_cls=False,
                 enable_mkldnn=False,
                 use_gpu=False,
                 box_thresh=0.6,
                 angle_classification_thresh=0.9)
    ```
    - Constructs a CyrillicOCRDBCRNNMobile object.
    - **Parameters**
      - det(bool): whether to enable text detection. Defaults to True.
      - rec(bool): whether to enable text recognition. Defaults to True.
      - use_angle_cls(bool): whether to enable the orientation classifier, which recognizes text rotated by 180 degrees. Defaults to False.
      - enable_mkldnn(bool): whether to enable MKL-DNN to speed up CPU computation; effective only when running on CPU. Defaults to False.
      - use\_gpu (bool): whether to use the GPU. Defaults to False. **If you use the GPU, set the CUDA_VISIBLE_DEVICES environment variable first.**
      - box\_thresh (float): confidence threshold for detected text boxes. Defaults to 0.6.
      - angle_classification_thresh(float): confidence threshold for text orientation classification. Defaults to 0.9.
  - ```python
    def recognize_text(images=[],
                       paths=[],
                       output_dir='ocr_result',
                       visualization=False)
    ```
    - Prediction API: detects the positions of all text in the input images and recognizes the text content.
    - **Parameters**
      - paths (list\[str\]): paths of the images;
      - images (list\[numpy.ndarray\]): image data, each of ndarray.shape \[H, W, C\] in BGR format;
      - output\_dir (str): directory for saving output images; defaults to ocr\_result;
      - visualization (bool): whether to save the recognition results as image files; effective only when detection is enabled; defaults to False;
    - **Returns**
      - res (list\[dict\]): list of recognition results; each element is a dict with the fields:
        - data (list\[dict\]): recognized text results; each element is a dict with the fields:
          - text(str): the recognized text
          - confidence(float): confidence of the recognized text
          - text_box_position(list): pixel coordinates of the text box in the original image, a 4*2 matrix listing the bottom-left, bottom-right, top-right, and top-left vertices in order; data is \[\] if nothing is recognized
          - orientation(str): the classified orientation, output only when only orientation classification is enabled
          - score(float): the classification score, output only when only orientation classification is enabled
        - save_path (str, optional): path where the result image is saved; '' if no image is saved
## IV. Server Deployment
- PaddleHub Serving can deploy an online text recognition service.
- ### Step 1: Start the PaddleHub Serving
  - Run the startup command:
  - ```shell
    $ hub serving start -m cyrillic_ocr_db_crnn_mobile
    ```
  - This deploys the text recognition API service, on the default port 8866.
  - **NOTE:** to predict on GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise no setting is needed.
- ### Step 2: Send a prediction request
  - With the server up, the few lines below send a prediction request and fetch the result
  - ```python
    import requests
    import json
    import cv2
    import base64

    def cv2_to_base64(image):
        data = cv2.imencode('.jpg', image)[1]
        return base64.b64encode(data.tobytes()).decode('utf8')

    # Send the HTTP request
    data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
    headers = {"Content-type": "application/json"}
    url = "http://127.0.0.1:8866/predict/cyrillic_ocr_db_crnn_mobile"
    r = requests.post(url=url, headers=headers, data=json.dumps(data))

    # Print the prediction results
    print(r.json()["results"])
    ```
import paddlehub as hub
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="cyrillic_ocr_db_crnn_mobile",
version="1.0.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class CyrillicOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="cyrillic",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
            images (list(numpy.ndarray)): images data, shape of each is [H, W, C]; used when paths is not set
            paths (list[str]): the paths of images; used when images is not set
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
Returns:
res (list): The result of text detection box and save path of images.
"""
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
results = self.model.run_cmd(argvs)
return results
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
            input_shape_dict: dictionary ``{input_name: input_shape}``, e.g. ``{'x': [-1, 3, -1, -1]}``
            opset_version(int): ONNX operator set version. Default is 10.
'''
self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
# devanagari_ocr_db_crnn_mobile
|Model Name|devanagari_ocr_db_crnn_mobile|
| :--- | :---: |
|Category|Image - Text Recognition|
|Network|Differentiable Binarization+CRNN|
|Dataset|ICDAR2015|
|Fine-tuning Supported|No|
|Latest Update|2021-12-2|
|Data Metrics|-|
## I. Basic Information
- ### Module Introduction
  - The devanagari_ocr_db_crnn_mobile module recognizes Devanagari-script text in images, covering Hindi, Marathi, Nepali, Bihari, Maithili, Angika, Bengali, Magahi, Nagpuri, and Newari. Starting from the text boxes detected by multi_languages_ocr_db_crnn, it recognizes the Devanagari-script text inside each box. The recognition algorithm is CRNN (Convolutional Recurrent Neural Network), a combination of DCNN and RNN designed for sequence-like objects in images. Paired with CTC loss, it learns directly from word- or line-level annotations and needs no detailed character-level labels. This module is a lightweight OCR model for Devanagari script and supports direct prediction.
  - For more details, see:
    - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
    - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## II. Installation
- ### 1. Environment Dependencies
  - PaddlePaddle >= 2.0.2
  - Python >= 3.6
  - PaddleOCR >= 2.0.1 | [How to install PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1)
  - PaddleHub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
  - Paddle2Onnx >= 0.9.0 | [How to install paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
  - shapely
  - pyclipper
  - ```shell
    $ pip3.6 install "paddleocr==2.3.0.2"
    $ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple
    $ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple
    ```
  - **This module depends on the third-party libraries shapely and pyclipper; please install both before using it.**
- ### 2. Installation
  - ```shell
    $ hub install devanagari_ocr_db_crnn_mobile
    ```
  - If you run into problems during installation, see: [Windows installation guide](../../../../docs/docs_ch/get_start/windows_quickstart.md) | [Linux installation guide](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS installation guide](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command Line Prediction
  - ```shell
    $ hub run devanagari_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
    $ hub run devanagari_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
    ```
  - This invokes the text recognition model from the command line; for more options, see [PaddleHub command line usage](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Prediction Code Example
  - ```python
    import paddlehub as hub
    import cv2
    ocr = hub.Module(name="devanagari_ocr_db_crnn_mobile", enable_mkldnn=True)       # MKL-DNN acceleration is effective only on CPU
    result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
    # or
    # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
    ```
- ### 3. API
  - ```python
    def __init__(self,
                 det=True,
                 rec=True,
                 use_angle_cls=False,
                 enable_mkldnn=False,
                 use_gpu=False,
                 box_thresh=0.6,
                 angle_classification_thresh=0.9)
    ```
    - Constructs a DevanagariOCRDBCRNNMobile object.
    - **Parameters**
      - det(bool): whether to enable text detection. Defaults to True.
      - rec(bool): whether to enable text recognition. Defaults to True.
      - use_angle_cls(bool): whether to enable the orientation classifier, which recognizes text rotated by 180 degrees. Defaults to False.
      - enable_mkldnn(bool): whether to enable MKL-DNN to speed up CPU computation; effective only when running on CPU. Defaults to False.
      - use\_gpu (bool): whether to use the GPU. Defaults to False. **If you use the GPU, set the CUDA_VISIBLE_DEVICES environment variable first.**
      - box\_thresh (float): confidence threshold for detected text boxes. Defaults to 0.6.
      - angle_classification_thresh(float): confidence threshold for text orientation classification. Defaults to 0.9.
  - ```python
    def recognize_text(images=[],
                       paths=[],
                       output_dir='ocr_result',
                       visualization=False)
    ```
    - Prediction API: detects the positions of all text in the input images and recognizes the text content.
    - **Parameters**
      - paths (list\[str\]): paths of the images;
      - images (list\[numpy.ndarray\]): image data, each of ndarray.shape \[H, W, C\] in BGR format;
      - output\_dir (str): directory for saving output images; defaults to ocr\_result;
      - visualization (bool): whether to save the recognition results as image files; effective only when detection is enabled; defaults to False;
    - **Returns**
      - res (list\[dict\]): list of recognition results; each element is a dict with the fields:
        - data (list\[dict\]): recognized text results; each element is a dict with the fields:
          - text(str): the recognized text
          - confidence(float): confidence of the recognized text
          - text_box_position(list): pixel coordinates of the text box in the original image, a 4*2 matrix listing the bottom-left, bottom-right, top-right, and top-left vertices in order; data is \[\] if nothing is recognized
          - orientation(str): the classified orientation, output only when only orientation classification is enabled
          - score(float): the classification score, output only when only orientation classification is enabled
        - save_path (str, optional): path where the result image is saved; '' if no image is saved
## IV. Server Deployment
- PaddleHub Serving can deploy an online text recognition service.
- ### Step 1: Start the PaddleHub Serving
  - Run the startup command:
  - ```shell
    $ hub serving start -m devanagari_ocr_db_crnn_mobile
    ```
  - This deploys the text recognition API service, on the default port 8866.
  - **NOTE:** to predict on GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise no setting is needed.
- ### Step 2: Send a prediction request
  - With the server up, the few lines below send a prediction request and fetch the result
  - ```python
    import requests
    import json
    import cv2
    import base64

    def cv2_to_base64(image):
        data = cv2.imencode('.jpg', image)[1]
        return base64.b64encode(data.tobytes()).decode('utf8')

    # Send the HTTP request
    data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
    headers = {"Content-type": "application/json"}
    url = "http://127.0.0.1:8866/predict/devanagari_ocr_db_crnn_mobile"
    r = requests.post(url=url, headers=headers, data=json.dumps(data))

    # Print the prediction results
    print(r.json()["results"])
    ```
import paddlehub as hub
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="devanagari_ocr_db_crnn_mobile",
version="1.0.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class DevanagariOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="devanagari",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
            images (list(numpy.ndarray)): images data, shape of each is [H, W, C]; used when paths is not set
            paths (list[str]): the paths of images; used when images is not set
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
Returns:
res (list): The result of text detection box and save path of images.
"""
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
results = self.model.run_cmd(argvs)
return results
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
            input_shape_dict: dictionary ``{input_name: input_shape}``, e.g. ``{'x': [-1, 3, -1, -1]}``
            opset_version(int): ONNX operator set version. Default is 10.
'''
self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
# french_ocr_db_crnn_mobile
|Model Name|french_ocr_db_crnn_mobile|
| :--- | :---: |
|Category|Image - Text Recognition|
|Network|Differentiable Binarization+CRNN|
|Dataset|ICDAR2015|
|Fine-tuning Supported|No|
|Latest Update|2021-12-2|
|Data Metrics|-|
## I. Basic Information
- ### Module Introduction
  - The french_ocr_db_crnn_mobile module recognizes French text in images. Starting from the text boxes detected by multi_languages_ocr_db_crnn, it recognizes the French text inside each box. The recognition algorithm is CRNN (Convolutional Recurrent Neural Network), a combination of DCNN and RNN designed for sequence-like objects in images. Paired with CTC loss, it learns directly from word- or line-level annotations and needs no detailed character-level labels. This module is a lightweight OCR model for French and supports direct prediction.
  - For more details, see:
    - [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
    - [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## II. Installation
- ### 1. Environment Dependencies
  - PaddlePaddle >= 2.0.2
  - Python >= 3.6
  - PaddleOCR >= 2.0.1 | [How to install PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1)
  - PaddleHub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
  - Paddle2Onnx >= 0.9.0 | [How to install paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
  - shapely
  - pyclipper
  - ```shell
    $ pip3.6 install "paddleocr==2.3.0.2"
    $ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple
    $ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple
    ```
  - **This module depends on the third-party libraries shapely and pyclipper; please install both before using it.**
- ### 2. Installation
  - ```shell
    $ hub install french_ocr_db_crnn_mobile
    ```
  - If you run into problems during installation, see: [Windows installation guide](../../../../docs/docs_ch/get_start/windows_quickstart.md) | [Linux installation guide](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS installation guide](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command Line Prediction
  - ```shell
    $ hub run french_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
    $ hub run french_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
    ```
  - This invokes the text recognition model from the command line; for more options, see [PaddleHub command line usage](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Prediction Code Example
  - ```python
    import paddlehub as hub
    import cv2
    ocr = hub.Module(name="french_ocr_db_crnn_mobile", enable_mkldnn=True)       # MKL-DNN acceleration is effective only on CPU
    result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
    # or
    # result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
    ```
- ### 3. API
  - ```python
    def __init__(self,
                 det=True,
                 rec=True,
                 use_angle_cls=False,
                 enable_mkldnn=False,
                 use_gpu=False,
                 box_thresh=0.6,
                 angle_classification_thresh=0.9)
    ```
    - Constructs a FrenchOCRDBCRNNMobile object.
    - **Parameters**
      - det(bool): whether to enable text detection. Defaults to True.
      - rec(bool): whether to enable text recognition. Defaults to True.
      - use_angle_cls(bool): whether to enable the orientation classifier, which recognizes text rotated by 180 degrees. Defaults to False.
      - enable_mkldnn(bool): whether to enable MKL-DNN to speed up CPU computation; effective only when running on CPU. Defaults to False.
      - use\_gpu (bool): whether to use the GPU. Defaults to False. **If you use the GPU, set the CUDA_VISIBLE_DEVICES environment variable first.**
      - box\_thresh (float): confidence threshold for detected text boxes. Defaults to 0.6.
      - angle_classification_thresh(float): confidence threshold for text orientation classification. Defaults to 0.9.
  - ```python
    def recognize_text(images=[],
                       paths=[],
                       output_dir='ocr_result',
                       visualization=False)
    ```
    - Prediction API: detects the positions of all text in the input images and recognizes the text content.
    - **Parameters**
      - paths (list\[str\]): paths of the images;
      - images (list\[numpy.ndarray\]): image data, each of ndarray.shape \[H, W, C\] in BGR format;
      - output\_dir (str): directory for saving output images; defaults to ocr\_result;
      - visualization (bool): whether to save the recognition results as image files; effective only when detection is enabled; defaults to False;
    - **Returns**
      - res (list\[dict\]): list of recognition results; each element is a dict with the fields:
        - data (list\[dict\]): recognized text results; each element is a dict with the fields:
          - text(str): the recognized text
          - confidence(float): confidence of the recognized text
          - text_box_position(list): pixel coordinates of the text box in the original image, a 4*2 matrix listing the bottom-left, bottom-right, top-right, and top-left vertices in order; data is \[\] if nothing is recognized
          - orientation(str): the classified orientation, output only when only orientation classification is enabled
          - score(float): the classification score, output only when only orientation classification is enabled
        - save_path (str, optional): path where the result image is saved; '' if no image is saved
## IV. Server Deployment
- PaddleHub Serving can deploy an online text recognition service.
- ### Step 1: Start the PaddleHub Serving
  - Run the startup command:
  - ```shell
    $ hub serving start -m french_ocr_db_crnn_mobile
    ```
  - This deploys the text recognition API service, on the default port 8866.
  - **NOTE:** to predict on GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise no setting is needed.
- ### Step 2: Send a prediction request
  - With the server up, the few lines below send a prediction request and fetch the result
  - ```python
    import requests
    import json
    import cv2
    import base64

    def cv2_to_base64(image):
        data = cv2.imencode('.jpg', image)[1]
        return base64.b64encode(data.tobytes()).decode('utf8')

    # Send the HTTP request
    data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
    headers = {"Content-type": "application/json"}
    url = "http://127.0.0.1:8866/predict/french_ocr_db_crnn_mobile"
    r = requests.post(url=url, headers=headers, data=json.dumps(data))

    # Print the prediction results
    print(r.json()["results"])
    ```
import paddlehub as hub
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="french_ocr_db_crnn_mobile",
version="1.0.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class FrenchOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="fr",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
            images (list(numpy.ndarray)): images data, shape of each is [H, W, C]; used when paths is not set
            paths (list[str]): the paths of images; used when images is not set
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
Returns:
res (list): The result of text detection box and save path of images.
"""
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
results = self.model.run_cmd(argvs)
return results
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
            input_shape_dict: dictionary ``{input_name: input_shape}``, e.g. ``{'x': [-1, 3, -1, -1]}``
            opset_version(int): ONNX operator set version. Default is 10.
'''
self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
!
"
$
%
&
'
(
)
+
,
-
.
/
0
1
2
3
4
5
6
7
8
9
:
;
>
?
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
[
]
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
£
§
­
²
´
µ
·
º
¼
½
¿
À
Á
Ä
Å
Ç
É
Í
Ï
Ô
Ö
Ø
Ù
Ü
ß
à
á
â
ã
ä
å
æ
ç
è
é
ê
ë
í
ï
ñ
ò
ó
ô
ö
ø
ù
ú
û
ü
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import string
class CharacterOps(object):
""" Convert between text-label and text-index """
def __init__(self, config):
self.character_type = config['character_type']
self.loss_type = config['loss_type']
self.max_text_len = config['max_text_length']
if self.character_type == "en":
self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
dict_character = list(self.character_str)
elif self.character_type in [
"ch", 'japan', 'korean', 'french', 'german'
]:
character_dict_path = config['character_dict_path']
add_space = False
if 'use_space_char' in config:
add_space = config['use_space_char']
self.character_str = ""
with open(character_dict_path, "rb") as fin:
lines = fin.readlines()
for line in lines:
line = line.decode('utf-8').strip("\n").strip("\r\n")
self.character_str += line
if add_space:
self.character_str += " "
dict_character = list(self.character_str)
elif self.character_type == "en_sensitive":
# same with ASTER setting (use 94 char).
self.character_str = string.printable[:-6]
dict_character = list(self.character_str)
else:
self.character_str = None
assert self.character_str is not None, \
"Nonsupport type of the character: {}".format(self.character_str)
self.beg_str = "sos"
self.end_str = "eos"
if self.loss_type == "attention":
dict_character = [self.beg_str, self.end_str] + dict_character
elif self.loss_type == "srn":
dict_character = dict_character + [self.beg_str, self.end_str]
self.dict = {}
for i, char in enumerate(dict_character):
self.dict[char] = i
self.character = dict_character
def encode(self, text):
"""convert text-label into text-index.
input:
text: text labels of each image. [batch_size]
output:
text: concatenated text index for CTCLoss.
[sum(text_lengths)] = [text_index_0 + text_index_1 + ... + text_index_(n - 1)]
length: length of each text. [batch_size]
"""
if self.character_type == "en":
text = text.lower()
text_list = []
for char in text:
if char not in self.dict:
continue
text_list.append(self.dict[char])
text = np.array(text_list)
return text
def decode(self, text_index, is_remove_duplicate=False):
""" convert text-index into text-label. """
char_list = []
char_num = self.get_char_num()
if self.loss_type == "attention":
beg_idx = self.get_beg_end_flag_idx("beg")
end_idx = self.get_beg_end_flag_idx("end")
ignored_tokens = [beg_idx, end_idx]
else:
ignored_tokens = [char_num]
for idx in range(len(text_index)):
if text_index[idx] in ignored_tokens:
continue
if is_remove_duplicate:
if idx > 0 and text_index[idx - 1] == text_index[idx]:
continue
char_list.append(self.character[int(text_index[idx])])
text = ''.join(char_list)
return text
def get_char_num(self):
return len(self.character)
def get_beg_end_flag_idx(self, beg_or_end):
if self.loss_type == "attention":
if beg_or_end == "beg":
idx = np.array(self.dict[self.beg_str])
elif beg_or_end == "end":
idx = np.array(self.dict[self.end_str])
else:
assert False, "Unsupport type %s in get_beg_end_flag_idx"\
% beg_or_end
return idx
else:
err = "error in get_beg_end_flag_idx when using the loss %s"\
% (self.loss_type)
assert False, err
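# Usage sketch (illustrative, not part of the original file): an encode/decode
# round-trip with the built-in English alphabet and CTC settings.
#
#   ops = CharacterOps({'character_type': 'en', 'loss_type': 'ctc', 'max_text_length': 25})
#   ids = ops.encode("Hub2021")          # lowercased, then mapped to indices via ops.dict
#   assert ops.decode(ids) == "hub2021"  # CTC decode drops only the ignored blank token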
def cal_predicts_accuracy(char_ops,
preds,
preds_lod,
labels,
labels_lod,
is_remove_duplicate=False):
acc_num = 0
img_num = 0
for ino in range(len(labels_lod) - 1):
beg_no = preds_lod[ino]
end_no = preds_lod[ino + 1]
preds_text = preds[beg_no:end_no].reshape(-1)
preds_text = char_ops.decode(preds_text, is_remove_duplicate)
beg_no = labels_lod[ino]
end_no = labels_lod[ino + 1]
labels_text = labels[beg_no:end_no].reshape(-1)
labels_text = char_ops.decode(labels_text, is_remove_duplicate)
img_num += 1
if preds_text == labels_text:
acc_num += 1
acc = acc_num * 1.0 / img_num
return acc, acc_num, img_num
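# Usage sketch (illustrative): accuracy over one image whose prediction matches
# its label, both packed in LoD form ([0, 2] marks a single sequence of length 2).
#
#   ops = CharacterOps({'character_type': 'en', 'loss_type': 'ctc', 'max_text_length': 25})
#   labels = ops.encode("ab").reshape(-1, 1)
#   preds = ops.encode("ab").reshape(-1, 1)
#   acc, acc_num, img_num = cal_predicts_accuracy(ops, preds, [0, 2], labels, [0, 2])
#   # -> acc == 1.0, acc_num == 1, img_num == 1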
def cal_predicts_accuracy_srn(char_ops,
preds,
labels,
max_text_len,
is_debug=False):
acc_num = 0
img_num = 0
char_num = char_ops.get_char_num()
total_len = preds.shape[0]
img_num = int(total_len / max_text_len)
for i in range(img_num):
cur_label = []
cur_pred = []
for j in range(max_text_len):
if labels[j + i * max_text_len] != int(char_num - 1): #0
cur_label.append(labels[j + i * max_text_len][0])
else:
break
for j in range(max_text_len + 1):
if j < len(cur_label) and preds[j + i * max_text_len][
0] != cur_label[j]:
break
elif j == len(cur_label) and j == max_text_len:
acc_num += 1
break
elif j == len(cur_label) and preds[j + i * max_text_len][0] == int(
char_num - 1):
acc_num += 1
break
acc = acc_num * 1.0 / img_num
return acc, acc_num, img_num
def convert_rec_attention_infer_res(preds):
img_num = preds.shape[0]
target_lod = [0]
convert_ids = []
for ino in range(img_num):
end_pos = np.where(preds[ino, :] == 1)[0]
if len(end_pos) <= 1:
text_list = preds[ino, 1:]
else:
text_list = preds[ino, 1:end_pos[1]]
target_lod.append(target_lod[ino] + len(text_list))
convert_ids = convert_ids + list(text_list)
convert_ids = np.array(convert_ids)
convert_ids = convert_ids.reshape((-1, 1))
return convert_ids, target_lod
def convert_rec_label_to_lod(ori_labels):
img_num = len(ori_labels)
target_lod = [0]
convert_ids = []
for ino in range(img_num):
target_lod.append(target_lod[ino] + len(ori_labels[ino]))
convert_ids = convert_ids + list(ori_labels[ino])
convert_ids = np.array(convert_ids)
convert_ids = convert_ids.reshape((-1, 1))
return convert_ids, target_lod
import paddlehub as hub
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
    name="german_ocr_db_crnn_mobile",
    version="1.1.0",
    summary="ocr service",
    author="PaddlePaddle",
    type="cv/text_recognition")
class GermanOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="german",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
            images (list(numpy.ndarray)): images data, shape of each is [H, W, C]; used when paths is not set
            paths (list[str]): the paths of images; used when images is not set
            output_dir (str): The directory to store output images.
            visualization (bool): Whether to save image or not.
        Returns:
            res (list): The result of text detection box and save path of images.
"""
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
    def serving_method(self, images, **kwargs):
        """
        Run as a service.
        """
        images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
    @runnable
    def run_cmd(self, argvs):
        """
        Run as a command
        """
        results = self.model.run_cmd(argvs)
        return results

    def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
        '''
        Export the model to ONNX format.
        Args:
            dirname(str): The directory to save the onnx model.
            input_shape_dict: dictionary ``{input_name: input_shape}``, e.g. ``{'x': [-1, 3, -1, -1]}``
            opset_version(int): ONNX operator set version. Default is 10.
        '''
        self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
params_filename=None,
combined=True):
if combined:
model_filename = "__model__" if not model_filename else model_filename
params_filename = "__params__" if not params_filename else params_filename
place = fluid.CPUPlace()
exe = fluid.Executor(place)
model_file_path = os.path.join(self.cls_pretrained_model_path, 'model')
params_file_path = os.path.join(self.cls_pretrained_model_path,
'params')
program, feeded_var_names, target_vars = fluid.io.load_inference_model(
dirname=self.cls_pretrained_model_path,
model_filename=model_file_path,
params_filename=params_file_path,
executor=exe)
fluid.io.save_inference_model(
dirname=dirname,
main_program=program,
executor=exe,
feeded_var_names=feeded_var_names,
target_vars=target_vars,
model_filename=model_filename,
params_filename=params_filename)
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
self.parser = argparse.ArgumentParser(
description="Run the %s module." % self.name,
prog='hub run %s' % self.name,
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(
title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options",
description=
"Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
args = self.parser.parse_args(argvs)
results = self.recognize_text(
paths=[args.input_path],
use_gpu=args.use_gpu,
output_dir=args.output_dir,
visualization=args.visualization)
results = self.model.run_cmd(argvs)
return results
def add_module_config_arg(self):
"""
Add the command config options
"""
self.arg_config_group.add_argument(
'--use_gpu',
type=ast.literal_eval,
default=False,
help="whether use GPU or not")
self.arg_config_group.add_argument(
'--output_dir',
type=str,
default='ocr_result',
help="The directory to save output images.")
self.arg_config_group.add_argument(
'--visualization',
type=ast.literal_eval,
default=False,
help="whether to save output as images.")
def add_module_input_arg(self):
"""
Add the command input options
"""
self.arg_input_group.add_argument(
'--input_path', type=str, default=None, help="path to the input image")
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
    '''
    Export the model to ONNX format.

    Args:
        dirname(str): The directory to save the onnx model.
        input_shape_dict: dictionary ``{ input_name: input_value }``, e.g. {'x': [-1, 3, -1, -1]}
        opset_version(int): operator set version
    '''
    self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)

if __name__ == '__main__':
    ocr = GermanOCRDBCRNNMobile(enable_mkldnn=False, use_angle_classification=True)
    image_path = [
        '/mnt/zhangxuefei/PaddleOCR/doc/imgs/ger_1.jpg',
        '/mnt/zhangxuefei/PaddleOCR/doc/imgs/12.jpg',
        '/mnt/zhangxuefei/PaddleOCR/doc/imgs/test_image.jpg'
    ]
    res = ocr.recognize_text(paths=image_path, visualization=True)
    ocr.save_inference_model('save')
    print(res)
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
# -*- coding:utf-8 -*-
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
from PIL import Image, ImageDraw, ImageFont
import base64
import cv2
import numpy as np
def draw_ocr(image,
boxes,
txts,
scores,
font_file,
draw_txt=True,
drop_score=0.5):
"""
Visualize the results of OCR detection and recognition
args:
image(Image|array): RGB image
boxes(list): boxes with shape(N, 4, 2)
txts(list): the texts
scores(list): the corresponding score of each text
draw_txt(bool): whether to draw the text panel or not
drop_score(float): only results with a score greater than drop_score will be visualized
return(array):
the visualized img
"""
if scores is None:
scores = [1] * len(boxes)
for (box, score) in zip(boxes, scores):
if score < drop_score or math.isnan(score):
continue
box = np.reshape(np.array(box), [-1, 1, 2]).astype(np.int64)
image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2)
if draw_txt:
img = np.array(resize_img(image, input_size=600))
txt_img = text_visual(
txts,
scores,
font_file,
img_h=img.shape[0],
img_w=600,
threshold=drop_score)
img = np.concatenate([np.array(img), np.array(txt_img)], axis=1)
return img
return image
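# Hypothetical usage sketch: `image`, `boxes`, `txts` and `scores` come from the detector and
# recognizer, and `font_file` must point to a font covering the target script, e.g.:
#   vis = draw_ocr(image, boxes, txts, scores, font_file='/PATH/TO/FONT.ttf', drop_score=0.5)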
def text_visual(texts, scores, font_file, img_h=400, img_w=600, threshold=0.):
"""
create new blank img and draw txt on it
args:
texts(list): the texts to be drawn
scores(list|None): corresponding score of each text
img_h(int): the height of the blank img
img_w(int): the width of the blank img
return(array): the rendered text panel
"""
if scores is not None:
assert len(texts) == len(
scores), "The number of txts and corresponding scores must match"
def create_blank_img():
blank_img = np.ones(shape=[img_h, img_w], dtype=np.uint8) * 255
blank_img[:, img_w - 1:] = 0
blank_img = Image.fromarray(blank_img).convert("RGB")
draw_txt = ImageDraw.Draw(blank_img)
return blank_img, draw_txt
blank_img, draw_txt = create_blank_img()
font_size = 20
txt_color = (0, 0, 0)
font = ImageFont.truetype(font_file, font_size, encoding="utf-8")
gap = font_size + 5
txt_img_list = []
count, index = 1, 0
for idx, txt in enumerate(texts):
index += 1
if scores[idx] < threshold or math.isnan(scores[idx]):
index -= 1
continue
first_line = True
while str_count(txt) >= img_w // font_size - 4:
tmp = txt
txt = tmp[:img_w // font_size - 4]
if first_line:
new_txt = str(index) + ': ' + txt
first_line = False
else:
new_txt = ' ' + txt
draw_txt.text((0, gap * count), new_txt, txt_color, font=font)
txt = tmp[img_w // font_size - 4:]
if count >= img_h // gap - 1:
txt_img_list.append(np.array(blank_img))
blank_img, draw_txt = create_blank_img()
count = 0
count += 1
if first_line:
new_txt = str(index) + ': ' + txt + ' ' + '%.3f' % (scores[idx])
else:
new_txt = " " + txt + " " + '%.3f' % (scores[idx])
draw_txt.text((0, gap * count), new_txt, txt_color, font=font)
# whether add new blank img or not
if count >= img_h // gap - 1 and idx + 1 < len(texts):
txt_img_list.append(np.array(blank_img))
blank_img, draw_txt = create_blank_img()
count = 0
count += 1
txt_img_list.append(np.array(blank_img))
if len(txt_img_list) == 1:
blank_img = np.array(txt_img_list[0])
else:
blank_img = np.concatenate(txt_img_list, axis=1)
return np.array(blank_img)
def str_count(s):
"""
Count the effective display length of a string: each Chinese
character counts as one unit, while a single English letter or
digit counts as half a Chinese character.
args:
    s(string): the input string
return(int):
    the effective length in Chinese-character units
"""
import string
count_zh = count_pu = 0
s_len = len(s)
en_dg_count = 0
for c in s:
if c in string.ascii_letters or c.isdigit() or c.isspace():
en_dg_count += 1
elif c.isalpha():
count_zh += 1
else:
count_pu += 1
return s_len - math.ceil(en_dg_count / 2)
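# Worked example: str_count('中文ab') == 3 -- the two Chinese characters count fully,
# while the two ASCII characters together count as one (4 - ceil(2 / 2) = 3).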
def resize_img(img, input_size=600):
img = np.array(img)
im_shape = img.shape
im_size_min = np.min(im_shape[0:2])
im_size_max = np.max(im_shape[0:2])
im_scale = float(input_size) / float(im_size_max)
im = cv2.resize(img, None, None, fx=im_scale, fy=im_scale)
return im
def get_image_ext(image):
if image.shape[2] == 4:
return ".png"
return ".jpg"
def sorted_boxes(dt_boxes):
"""
Sort text boxes in order from top to bottom, left to right
args:
dt_boxes(array): detected text boxes with shape [N, 4, 2]
return:
    sorted boxes(list): N boxes, each with shape [4, 2]
"""
num_boxes = dt_boxes.shape[0]
sorted_boxes = sorted(dt_boxes, key=lambda x: (x[0][1], x[0][0]))
_boxes = list(sorted_boxes)
for i in range(num_boxes - 1):
if abs(_boxes[i + 1][0][1] - _boxes[i][0][1]) < 10 and \
(_boxes[i + 1][0][0] < _boxes[i][0][0]):
tmp = _boxes[i]
_boxes[i] = _boxes[i + 1]
_boxes[i + 1] = tmp
return _boxes
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
!
"
#
$
%
&
'
(
)
*
+
,
-
.
/
0
1
2
3
4
5
6
7
8
9
:
;
<
=
>
?
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
[
]
_
`
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
©
°
²
´
½
Á
Ä
Å
Ç
È
É
Í
Ó
Ö
×
Ü
ß
à
á
â
ã
ä
å
æ
ç
è
é
ê
ë
í
ð
ñ
ò
ó
ô
õ
ö
ø
ú
û
ü
ý
ā
ă
ą
ć
Č
č
đ
ē
ė
ę
ğ
ī
ı
Ł
ł
ń
ň
ō
ř
Ş
ş
Š
š
ţ
ū
ż
Ž
ž
Ș
ș
ț
Δ
α
λ
μ
φ
Г
О
а
в
л
о
р
с
т
я
 
丿
使
便
姿
婿
宿
寿
尿
廿
忿
椿
槿
橿
殿
沿
湿
滿
漿
禿
稿
竿
簿
綿
耀
西
調
谿
貿
輿
退
駿
鹿
麿
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import numpy as np
import string
class CharacterOps(object):
""" Convert between text-label and text-index """
def __init__(self, config):
self.character_type = config['character_type']
self.loss_type = config['loss_type']
self.max_text_len = config['max_text_length']
if self.character_type == "en":
self.character_str = "0123456789abcdefghijklmnopqrstuvwxyz"
dict_character = list(self.character_str)
elif self.character_type in [
"ch", 'japan', 'korean', 'french', 'german'
]:
character_dict_path = config['character_dict_path']
add_space = False
if 'use_space_char' in config:
add_space = config['use_space_char']
self.character_str = ""
with open(character_dict_path, "rb") as fin:
lines = fin.readlines()
for line in lines:
line = line.decode('utf-8').strip("\n").strip("\r\n")
self.character_str += line
if add_space:
self.character_str += " "
dict_character = list(self.character_str)
elif self.character_type == "en_sensitive":
# same with ASTER setting (use 94 char).
self.character_str = string.printable[:-6]
dict_character = list(self.character_str)
else:
self.character_str = None
assert self.character_str is not None, \
    "Unsupported character type: {}".format(self.character_type)
self.beg_str = "sos"
self.end_str = "eos"
if self.loss_type == "attention":
dict_character = [self.beg_str, self.end_str] + dict_character
elif self.loss_type == "srn":
dict_character = dict_character + [self.beg_str, self.end_str]
self.dict = {}
for i, char in enumerate(dict_character):
self.dict[char] = i
self.character = dict_character
def encode(self, text):
"""convert text-label into text-index.
input:
text: text labels of each image. [batch_size]
output:
text: concatenated text index for CTCLoss.
[sum(text_lengths)] = [text_index_0 + text_index_1 + ... + text_index_(n - 1)]
length: length of each text. [batch_size]
"""
if self.character_type == "en":
text = text.lower()
text_list = []
for char in text:
if char not in self.dict:
continue
text_list.append(self.dict[char])
text = np.array(text_list)
return text
def decode(self, text_index, is_remove_duplicate=False):
""" convert text-index into text-label. """
char_list = []
char_num = self.get_char_num()
if self.loss_type == "attention":
beg_idx = self.get_beg_end_flag_idx("beg")
end_idx = self.get_beg_end_flag_idx("end")
ignored_tokens = [beg_idx, end_idx]
else:
ignored_tokens = [char_num]
for idx in range(len(text_index)):
if text_index[idx] in ignored_tokens:
continue
if is_remove_duplicate:
if idx > 0 and text_index[idx - 1] == text_index[idx]:
continue
char_list.append(self.character[int(text_index[idx])])
text = ''.join(char_list)
return text
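# Worked example (CTC, character_type 'en', charset "0123456789abc...z"): encode("12")
# gives [1, 2]; decode([1, 1, 2], is_remove_duplicate=True) collapses the repeated
# index and returns "12".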
def get_char_num(self):
return len(self.character)
def get_beg_end_flag_idx(self, beg_or_end):
if self.loss_type == "attention":
if beg_or_end == "beg":
idx = np.array(self.dict[self.beg_str])
elif beg_or_end == "end":
idx = np.array(self.dict[self.end_str])
else:
assert False, "Unsupport type %s in get_beg_end_flag_idx"\
% beg_or_end
return idx
else:
err = "error in get_beg_end_flag_idx when using the loss %s"\
% (self.loss_type)
assert False, err
def cal_predicts_accuracy(char_ops,
preds,
preds_lod,
labels,
labels_lod,
is_remove_duplicate=False):
acc_num = 0
img_num = 0
for ino in range(len(labels_lod) - 1):
beg_no = preds_lod[ino]
end_no = preds_lod[ino + 1]
preds_text = preds[beg_no:end_no].reshape(-1)
preds_text = char_ops.decode(preds_text, is_remove_duplicate)
beg_no = labels_lod[ino]
end_no = labels_lod[ino + 1]
labels_text = labels[beg_no:end_no].reshape(-1)
labels_text = char_ops.decode(labels_text, is_remove_duplicate)
img_num += 1
if preds_text == labels_text:
acc_num += 1
acc = acc_num * 1.0 / img_num
return acc, acc_num, img_num
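# Worked example (hypothetical lods): with preds_lod = labels_lod = [0, 2] and a single
# image whose decoded prediction equals its decoded label, the function returns (1.0, 1, 1).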
def cal_predicts_accuracy_srn(char_ops,
preds,
labels,
max_text_len,
is_debug=False):
acc_num = 0
img_num = 0
char_num = char_ops.get_char_num()
total_len = preds.shape[0]
img_num = int(total_len / max_text_len)
for i in range(img_num):
cur_label = []
cur_pred = []
for j in range(max_text_len):
if labels[j + i * max_text_len] != int(char_num - 1):  # char_num - 1 is the padding/end index
cur_label.append(labels[j + i * max_text_len][0])
else:
break
for j in range(max_text_len + 1):
if j < len(cur_label) and preds[j + i * max_text_len][
0] != cur_label[j]:
break
elif j == len(cur_label) and j == max_text_len:
acc_num += 1
break
elif j == len(cur_label) and preds[j + i * max_text_len][0] == int(
char_num - 1):
acc_num += 1
break
acc = acc_num * 1.0 / img_num
return acc, acc_num, img_num
def convert_rec_attention_infer_res(preds):
img_num = preds.shape[0]
target_lod = [0]
convert_ids = []
for ino in range(img_num):
end_pos = np.where(preds[ino, :] == 1)[0]
if len(end_pos) <= 1:
text_list = preds[ino, 1:]
else:
text_list = preds[ino, 1:end_pos[1]]
target_lod.append(target_lod[ino] + len(text_list))
convert_ids = convert_ids + list(text_list)
convert_ids = np.array(convert_ids)
convert_ids = convert_ids.reshape((-1, 1))
return convert_ids, target_lod
def convert_rec_label_to_lod(ori_labels):
img_num = len(ori_labels)
target_lod = [0]
convert_ids = []
for ino in range(img_num):
target_lod.append(target_lod[ino] + len(ori_labels[ino]))
convert_ids = convert_ids + list(ori_labels[ino])
convert_ids = np.array(convert_ids)
convert_ids = convert_ids.reshape((-1, 1))
return convert_ids, target_lod
# -*- coding:utf-8 -*-
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import ast
import copy
import math
import os
import time
from paddle.fluid.core import AnalysisConfig, create_paddle_predictor, PaddleTensor
from paddlehub.common.logger import logger
from paddlehub.module.module import moduleinfo, runnable, serving
from PIL import Image
import cv2
import numpy as np
import paddle.fluid as fluid
import paddlehub as hub
from japan_ocr_db_crnn_mobile.character import CharacterOps
from japan_ocr_db_crnn_mobile.utils import base64_to_cv2, draw_ocr, get_image_ext, sorted_boxes
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="japan_ocr_db_crnn_mobile",
version="1.0.0",
summary=
"The module can recognize the japan texts in an image. Firstly, it will detect the text box positions based on the differentiable_binarization module. Then it recognizes the german texts. ",
author="paddle-dev",
author_email="paddle-dev@baidu.com",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class JapanOCRDBCRNNMobile(hub.Module):
def _initialize(self, text_detector_module=None, enable_mkldnn=False, use_angle_classification=False):
class JapanOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.character_dict_path = os.path.join(self.directory, 'assets',
'japan_dict.txt')
char_ops_params = {
'character_type': 'japan',
'character_dict_path': self.character_dict_path,
'loss_type': 'ctc',
'max_text_length': 25,
'use_space_char': True
}
self.char_ops = CharacterOps(char_ops_params)
self.rec_image_shape = [3, 32, 320]
self._text_detector_module = text_detector_module
self.font_file = os.path.join(self.directory, 'assets', 'japan.ttc')
self.enable_mkldnn = enable_mkldnn
self.use_angle_classification = use_angle_classification
self.rec_pretrained_model_path = os.path.join(
self.directory, 'inference_model', 'character_rec')
self.rec_predictor, self.rec_input_tensor, self.rec_output_tensors = self._set_config(
self.rec_pretrained_model_path)
if self.use_angle_classification:
self.cls_pretrained_model_path = os.path.join(
self.directory, 'inference_model', 'angle_cls')
self.cls_predictor, self.cls_input_tensor, self.cls_output_tensors = self._set_config(
self.cls_pretrained_model_path)
def _set_config(self, pretrained_model_path):
"""
predictor config path
"""
model_file_path = os.path.join(pretrained_model_path, 'model')
params_file_path = os.path.join(pretrained_model_path, 'params')
config = AnalysisConfig(model_file_path, params_file_path)
try:
_places = os.environ["CUDA_VISIBLE_DEVICES"]
int(_places[0])
use_gpu = True
except:
use_gpu = False
if use_gpu:
config.enable_use_gpu(8000, 0)
else:
config.disable_gpu()
if self.enable_mkldnn:
# cache 10 different shapes for mkldnn to avoid memory leak
config.set_mkldnn_cache_capacity(10)
config.enable_mkldnn()
config.disable_glog_info()
config.delete_pass("conv_transpose_eltwiseadd_bn_fuse_pass")
config.switch_use_feed_fetch_ops(False)
predictor = create_paddle_predictor(config)
input_names = predictor.get_input_names()
input_tensor = predictor.get_input_tensor(input_names[0])
output_names = predictor.get_output_names()
output_tensors = []
for output_name in output_names:
output_tensor = predictor.get_output_tensor(output_name)
output_tensors.append(output_tensor)
return predictor, input_tensor, output_tensors
@property
def text_detector_module(self):
"""
text detect module
"""
if not self._text_detector_module:
self._text_detector_module = hub.Module(
name='chinese_text_detection_db_mobile',
enable_mkldnn=self.enable_mkldnn,
version='1.0.4')
return self._text_detector_module
def read_images(self, paths=[]):
images = []
for img_path in paths:
assert os.path.isfile(
img_path), "The {} isn't a valid file.".format(img_path)
img = cv2.imread(img_path)
if img is None:
logger.info("error in loading image:{}".format(img_path))
continue
images.append(img)
return images
def get_rotate_crop_image(self, img, points):
'''
img_height, img_width = img.shape[0:2]
left = int(np.min(points[:, 0]))
right = int(np.max(points[:, 0]))
top = int(np.min(points[:, 1]))
bottom = int(np.max(points[:, 1]))
img_crop = img[top:bottom, left:right, :].copy()
points[:, 0] = points[:, 0] - left
points[:, 1] = points[:, 1] - top
'''
img_crop_width = int(
max(
np.linalg.norm(points[0] - points[1]),
np.linalg.norm(points[2] - points[3])))
img_crop_height = int(
max(
np.linalg.norm(points[0] - points[3]),
np.linalg.norm(points[1] - points[2])))
pts_std = np.float32([[0, 0], [img_crop_width, 0],
[img_crop_width, img_crop_height],
[0, img_crop_height]])
M = cv2.getPerspectiveTransform(points, pts_std)
dst_img = cv2.warpPerspective(
img,
M, (img_crop_width, img_crop_height),
borderMode=cv2.BORDER_REPLICATE,
flags=cv2.INTER_CUBIC)
dst_img_height, dst_img_width = dst_img.shape[0:2]
if dst_img_height * 1.0 / dst_img_width >= 1.5:
dst_img = np.rot90(dst_img)
return dst_img
def resize_norm_img_rec(self, img, max_wh_ratio):
imgC, imgH, imgW = self.rec_image_shape
assert imgC == img.shape[2]
h, w = img.shape[:2]
ratio = w / float(h)
if math.ceil(imgH * ratio) > imgW:
resized_w = imgW
else:
resized_w = int(math.ceil(imgH * ratio))
resized_image = cv2.resize(img, (resized_w, imgH))
resized_image = resized_image.astype('float32')
resized_image = resized_image.transpose((2, 0, 1)) / 255
resized_image -= 0.5
resized_image /= 0.5
padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
padding_im[:, :, 0:resized_w] = resized_image
return padding_im
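# Example: with rec_image_shape [3, 32, 320], a 32x100 crop is resized to height 32 and
# width ceil(32 * 100/32) = 100, normalized to [-1, 1], then zero-padded to (3, 32, 320).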
def resize_norm_img_cls(self, img):
cls_image_shape = [3, 48, 192]
imgC, imgH, imgW = cls_image_shape
h = img.shape[0]
w = img.shape[1]
ratio = w / float(h)
if math.ceil(imgH * ratio) > imgW:
resized_w = imgW
else:
resized_w = int(math.ceil(imgH * ratio))
resized_image = cv2.resize(img, (resized_w, imgH))
resized_image = resized_image.astype('float32')
if cls_image_shape[0] == 1:
resized_image = resized_image / 255
resized_image = resized_image[np.newaxis, :]
else:
resized_image = resized_image.transpose((2, 0, 1)) / 255
resized_image -= 0.5
resized_image /= 0.5
padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
padding_im[:, :, 0:resized_w] = resized_image
return padding_im
def recognize_text(self,
images=[],
paths=[],
use_gpu=False,
output_dir='ocr_result',
visualization=False,
box_thresh=0.5,
text_thresh=0.5,
angle_classification_thresh=0.9):
"""
Get the Japanese texts in the predicted images.
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="japan",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
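# Note: this wrapper delegates detection and recognition to the shared
# multi_languages_ocr_db_crnn module; the per-language packages differ
# only in the `lang` argument passed above.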
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
images (list(numpy.ndarray)): images data, shape of each is [H, W, C], BGR format. Use either images or paths.
paths (list[str]): The paths of images. Use either paths or images.
use_gpu (bool): Whether to use gpu.
batch_size(int): the program deals once with one
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
box_thresh(float): the threshold of the detected text box's confidence
text_thresh(float): the threshold of the Japanese text recognition confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
Returns:
res (list): The result of Japanese texts and save path of images.
res (list): The result of text detection box and save path of images.
"""
if use_gpu:
try:
_places = os.environ["CUDA_VISIBLE_DEVICES"]
int(_places[0])
except:
raise RuntimeError(
"Environment Variable CUDA_VISIBLE_DEVICES is not set correctly. If you wanna use gpu, please set CUDA_VISIBLE_DEVICES via export CUDA_VISIBLE_DEVICES=cuda_device_id."
)
self.use_gpu = use_gpu
if images != [] and isinstance(images, list) and paths == []:
predicted_data = images
elif images == [] and isinstance(paths, list) and paths != []:
predicted_data = self.read_images(paths)
else:
raise TypeError("The input data is inconsistent with expectations.")
assert predicted_data != [], "There is not any image to be predicted. Please check the input data."
detection_results = self.text_detector_module.detect_text(
images=predicted_data, use_gpu=self.use_gpu, box_thresh=box_thresh)
boxes = [
np.array(item['data']).astype(np.float32)
for item in detection_results
]
all_results = []
for index, img_boxes in enumerate(boxes):
original_image = predicted_data[index].copy()
result = {'save_path': ''}
if img_boxes.size == 0:
result['data'] = []
else:
img_crop_list = []
boxes = sorted_boxes(img_boxes)
for num_box in range(len(boxes)):
tmp_box = copy.deepcopy(boxes[num_box])
img_crop = self.get_rotate_crop_image(
original_image, tmp_box)
img_crop_list.append(img_crop)
if self.use_angle_classification:
img_crop_list, angle_list = self._classify_text(
img_crop_list,
angle_classification_thresh=angle_classification_thresh)
rec_results = self._recognize_text(img_crop_list)
# if the recognized text confidence score is lower than text_thresh, then drop it
rec_res_final = []
for index, res in enumerate(rec_results):
text, score = res
if score >= text_thresh:
rec_res_final.append({
'text':
text,
'confidence':
float(score),
'text_box_position':
boxes[index].astype(int).tolist()
})
result['data'] = rec_res_final
if visualization and result['data']:
result['save_path'] = self.save_result_image(
original_image, boxes, rec_results, output_dir,
text_thresh)
all_results.append(result)
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
......@@ -310,282 +67,21 @@ class JapanOCRDBCRNNMobile(hub.Module):
results = self.recognize_text(images_decode, **kwargs)
return results
def save_result_image(
self,
original_image,
detection_boxes,
rec_results,
output_dir='ocr_result',
text_thresh=0.5,
):
image = Image.fromarray(cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB))
txts = [item[0] for item in rec_results]
scores = [item[1] for item in rec_results]
draw_img = draw_ocr(
image,
detection_boxes,
txts,
scores,
font_file=self.font_file,
draw_txt=True,
drop_score=text_thresh)
if not os.path.exists(output_dir):
os.makedirs(output_dir)
ext = get_image_ext(original_image)
saved_name = 'ndarray_{}{}'.format(time.time(), ext)
save_file_path = os.path.join(output_dir, saved_name)
cv2.imwrite(save_file_path, draw_img[:, :, ::-1])
return save_file_path
def _classify_text(self, image_list, angle_classification_thresh=0.9):
img_list = copy.deepcopy(image_list)
img_num = len(img_list)
# Calculate the aspect ratio of all text bars
width_list = []
for img in img_list:
width_list.append(img.shape[1] / float(img.shape[0]))
# Sorting can speed up the cls process
indices = np.argsort(np.array(width_list))
cls_res = [['', 0.0]] * img_num
batch_num = 30
for beg_img_no in range(0, img_num, batch_num):
end_img_no = min(img_num, beg_img_no + batch_num)
norm_img_batch = []
max_wh_ratio = 0
for ino in range(beg_img_no, end_img_no):
h, w = img_list[indices[ino]].shape[0:2]
wh_ratio = w * 1.0 / h
max_wh_ratio = max(max_wh_ratio, wh_ratio)
for ino in range(beg_img_no, end_img_no):
norm_img = self.resize_norm_img_cls(img_list[indices[ino]])
norm_img = norm_img[np.newaxis, :]
norm_img_batch.append(norm_img)
norm_img_batch = np.concatenate(norm_img_batch)
norm_img_batch = norm_img_batch.copy()
self.cls_input_tensor.copy_from_cpu(norm_img_batch)
self.cls_predictor.zero_copy_run()
prob_out = self.cls_output_tensors[0].copy_to_cpu()
label_out = self.cls_output_tensors[1].copy_to_cpu()
if len(label_out.shape) != 1:
prob_out, label_out = label_out, prob_out
label_list = ['0', '180']
for rno in range(len(label_out)):
label_idx = label_out[rno]
score = prob_out[rno][label_idx]
label = label_list[label_idx]
cls_res[indices[beg_img_no + rno]] = [label, score]
if '180' in label and score > angle_classification_thresh:
img_list[indices[beg_img_no + rno]] = cv2.rotate(
img_list[indices[beg_img_no + rno]], 1)
return img_list, cls_res
def _recognize_text(self, img_list):
img_num = len(img_list)
# Calculate the aspect ratio of all text bars
width_list = []
for img in img_list:
width_list.append(img.shape[1] / float(img.shape[0]))
# Sorting can speed up the recognition process
indices = np.argsort(np.array(width_list))
rec_res = [['', 0.0]] * img_num
batch_num = 30
for beg_img_no in range(0, img_num, batch_num):
end_img_no = min(img_num, beg_img_no + batch_num)
norm_img_batch = []
max_wh_ratio = 0
for ino in range(beg_img_no, end_img_no):
h, w = img_list[indices[ino]].shape[0:2]
wh_ratio = w * 1.0 / h
max_wh_ratio = max(max_wh_ratio, wh_ratio)
for ino in range(beg_img_no, end_img_no):
norm_img = self.resize_norm_img_rec(img_list[indices[ino]],
max_wh_ratio)
norm_img = norm_img[np.newaxis, :]
norm_img_batch.append(norm_img)
norm_img_batch = np.concatenate(norm_img_batch, axis=0)
norm_img_batch = norm_img_batch.copy()
self.rec_input_tensor.copy_from_cpu(norm_img_batch)
self.rec_predictor.zero_copy_run()
rec_idx_batch = self.rec_output_tensors[0].copy_to_cpu()
rec_idx_lod = self.rec_output_tensors[0].lod()[0]
predict_batch = self.rec_output_tensors[1].copy_to_cpu()
predict_lod = self.rec_output_tensors[1].lod()[0]
for rno in range(len(rec_idx_lod) - 1):
beg = rec_idx_lod[rno]
end = rec_idx_lod[rno + 1]
rec_idx_tmp = rec_idx_batch[beg:end, 0]
preds_text = self.char_ops.decode(rec_idx_tmp)
beg = predict_lod[rno]
end = predict_lod[rno + 1]
probs = predict_batch[beg:end, :]
ind = np.argmax(probs, axis=1)
blank = probs.shape[1]
valid_ind = np.where(ind != (blank - 1))[0]
if len(valid_ind) == 0:
continue
score = np.mean(probs[valid_ind, ind[valid_ind]])
# rec_res.append([preds_text, score])
rec_res[indices[beg_img_no + rno]] = [preds_text, score]
return rec_res
def save_inference_model(self,
dirname,
model_filename=None,
params_filename=None,
combined=True):
detector_dir = os.path.join(dirname, 'text_detector')
classifier_dir = os.path.join(dirname, 'angle_classifier')
recognizer_dir = os.path.join(dirname, 'text_recognizer')
self._save_detector_model(detector_dir, model_filename, params_filename,
combined)
if self.use_angle_classification:
self._save_classifier_model(classifier_dir, model_filename,
params_filename, combined)
self._save_recognizer_model(recognizer_dir, model_filename,
params_filename, combined)
logger.info("The inference model has been saved in the path {}".format(
os.path.realpath(dirname)))
def _save_detector_model(self,
dirname,
model_filename=None,
params_filename=None,
combined=True):
self.text_detector_module.save_inference_model(
dirname, model_filename, params_filename, combined)
def _save_recognizer_model(self,
dirname,
model_filename=None,
params_filename=None,
combined=True):
if combined:
model_filename = "__model__" if not model_filename else model_filename
params_filename = "__params__" if not params_filename else params_filename
place = fluid.CPUPlace()
exe = fluid.Executor(place)
model_file_path = os.path.join(self.rec_pretrained_model_path, 'model')
params_file_path = os.path.join(self.rec_pretrained_model_path,
'params')
program, feeded_var_names, target_vars = fluid.io.load_inference_model(
dirname=self.rec_pretrained_model_path,
model_filename=model_file_path,
params_filename=params_file_path,
executor=exe)
fluid.io.save_inference_model(
dirname=dirname,
main_program=program,
executor=exe,
feeded_var_names=feeded_var_names,
target_vars=target_vars,
model_filename=model_filename,
params_filename=params_filename)
def _save_classifier_model(self,
dirname,
model_filename=None,
params_filename=None,
combined=True):
if combined:
model_filename = "__model__" if not model_filename else model_filename
params_filename = "__params__" if not params_filename else params_filename
place = fluid.CPUPlace()
exe = fluid.Executor(place)
model_file_path = os.path.join(self.cls_pretrained_model_path, 'model')
params_file_path = os.path.join(self.cls_pretrained_model_path,
'params')
program, feeded_var_names, target_vars = fluid.io.load_inference_model(
dirname=self.cls_pretrained_model_path,
model_filename=model_file_path,
params_filename=params_file_path,
executor=exe)
fluid.io.save_inference_model(
dirname=dirname,
main_program=program,
executor=exe,
feeded_var_names=feeded_var_names,
target_vars=target_vars,
model_filename=model_filename,
params_filename=params_filename)
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
self.parser = argparse.ArgumentParser(
description="Run the %s module." % self.name,
prog='hub run %s' % self.name,
usage='%(prog)s',
add_help=True)
self.arg_input_group = self.parser.add_argument_group(
title="Input options", description="Input data. Required")
self.arg_config_group = self.parser.add_argument_group(
title="Config options",
description=
"Run configuration for controlling module behavior, not required.")
self.add_module_config_arg()
self.add_module_input_arg()
args = self.parser.parse_args(argvs)
results = self.recognize_text(
paths=[args.input_path],
use_gpu=args.use_gpu,
output_dir=args.output_dir,
visualization=args.visualization)
results = self.model.run_cmd(argvs)
return results
def add_module_config_arg(self):
"""
Add the command config options
"""
self.arg_config_group.add_argument(
'--use_gpu',
type=ast.literal_eval,
default=False,
help="whether use GPU or not")
self.arg_config_group.add_argument(
'--output_dir',
type=str,
default='ocr_result',
help="The directory to save output images.")
self.arg_config_group.add_argument(
'--visualization',
type=ast.literal_eval,
default=False,
help="whether to save output as images.")
def add_module_input_arg(self):
"""
Add the command input options
"""
self.arg_input_group.add_argument(
'--input_path', type=str, default=None, help="path to the input image")
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
    '''
    Export the model to ONNX format.

    Args:
        dirname(str): The directory to save the onnx model.
        input_shape_dict: dictionary ``{ input_name: input_value }``, e.g. {'x': [-1, 3, -1, -1]}
        opset_version(int): operator set version
    '''
    self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)

if __name__ == '__main__':
    ocr = JapanOCRDBCRNNMobile(enable_mkldnn=False, use_angle_classification=True)
    image_path = [
        '/mnt/zhangxuefei/PaddleOCR/doc/imgs/ger_1.jpg',
        '/mnt/zhangxuefei/PaddleOCR/doc/imgs/12.jpg',
        '/mnt/zhangxuefei/PaddleOCR/doc/imgs/test_image.jpg'
    ]
    res = ocr.recognize_text(paths=image_path, visualization=True)
    ocr.save_inference_model('save')
    print(res)
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
# -*- coding:utf-8 -*-
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
from PIL import Image, ImageDraw, ImageFont
import base64
import cv2
import numpy as np
def draw_ocr(image,
boxes,
txts,
scores,
font_file,
draw_txt=True,
drop_score=0.5):
"""
Visualize the results of OCR detection and recognition
args:
image(Image|array): RGB image
boxes(list): boxes with shape(N, 4, 2)
txts(list): the texts
scores(list): the corresponding score of each text
draw_txt(bool): whether to draw the text panel or not
drop_score(float): only results with a score greater than drop_score will be visualized
return(array):
the visualized img
"""
if scores is None:
scores = [1] * len(boxes)
for (box, score) in zip(boxes, scores):
if score < drop_score or math.isnan(score):
continue
box = np.reshape(np.array(box), [-1, 1, 2]).astype(np.int64)
image = cv2.polylines(np.array(image), [box], True, (255, 0, 0), 2)
if draw_txt:
img = np.array(resize_img(image, input_size=600))
txt_img = text_visual(
txts,
scores,
font_file,
img_h=img.shape[0],
img_w=600,
threshold=drop_score)
img = np.concatenate([np.array(img), np.array(txt_img)], axis=1)
return img
return image
def text_visual(texts, scores, font_file, img_h=400, img_w=600, threshold=0.):
"""
create new blank img and draw txt on it
args:
texts(list): the texts to be drawn
scores(list|None): corresponding score of each text
img_h(int): the height of the blank img
img_w(int): the width of the blank img
return(array): the rendered text panel
"""
if scores is not None:
assert len(texts) == len(
scores), "The number of txts and corresponding scores must match"
def create_blank_img():
blank_img = np.ones(shape=[img_h, img_w], dtype=np.uint8) * 255
blank_img[:, img_w - 1:] = 0
blank_img = Image.fromarray(blank_img).convert("RGB")
draw_txt = ImageDraw.Draw(blank_img)
return blank_img, draw_txt
blank_img, draw_txt = create_blank_img()
font_size = 20
txt_color = (0, 0, 0)
font = ImageFont.truetype(font_file, font_size, encoding="utf-8")
gap = font_size + 5
txt_img_list = []
count, index = 1, 0
for idx, txt in enumerate(texts):
index += 1
if scores[idx] < threshold or math.isnan(scores[idx]):
index -= 1
continue
first_line = True
while str_count(txt) >= img_w // font_size - 4:
tmp = txt
txt = tmp[:img_w // font_size - 4]
if first_line:
new_txt = str(index) + ': ' + txt
first_line = False
else:
new_txt = ' ' + txt
draw_txt.text((0, gap * count), new_txt, txt_color, font=font)
txt = tmp[img_w // font_size - 4:]
if count >= img_h // gap - 1:
txt_img_list.append(np.array(blank_img))
blank_img, draw_txt = create_blank_img()
count = 0
count += 1
if first_line:
new_txt = str(index) + ': ' + txt + ' ' + '%.3f' % (scores[idx])
else:
new_txt = " " + txt + " " + '%.3f' % (scores[idx])
draw_txt.text((0, gap * count), new_txt, txt_color, font=font)
# whether add new blank img or not
if count >= img_h // gap - 1 and idx + 1 < len(texts):
txt_img_list.append(np.array(blank_img))
blank_img, draw_txt = create_blank_img()
count = 0
count += 1
txt_img_list.append(np.array(blank_img))
if len(txt_img_list) == 1:
blank_img = np.array(txt_img_list[0])
else:
blank_img = np.concatenate(txt_img_list, axis=1)
return np.array(blank_img)
def str_count(s):
"""
Count the effective display length of a string: each Chinese
character counts as one unit, while a single English letter or
digit counts as half a Chinese character.
args:
    s(string): the input string
return(int):
    the effective length in Chinese-character units
"""
import string
count_zh = count_pu = 0
s_len = len(s)
en_dg_count = 0
for c in s:
if c in string.ascii_letters or c.isdigit() or c.isspace():
en_dg_count += 1
elif c.isalpha():
count_zh += 1
else:
count_pu += 1
return s_len - math.ceil(en_dg_count / 2)
def resize_img(img, input_size=600):
img = np.array(img)
im_shape = img.shape
im_size_min = np.min(im_shape[0:2])
im_size_max = np.max(im_shape[0:2])
im_scale = float(input_size) / float(im_size_max)
im = cv2.resize(img, None, None, fx=im_scale, fy=im_scale)
return im
def get_image_ext(image):
if image.shape[2] == 4:
return ".png"
return ".jpg"
def sorted_boxes(dt_boxes):
"""
Sort text boxes in order from top to bottom, left to right
args:
dt_boxes(array): detected text boxes with shape [N, 4, 2]
return:
    sorted boxes(list): N boxes, each with shape [4, 2]
"""
num_boxes = dt_boxes.shape[0]
sorted_boxes = sorted(dt_boxes, key=lambda x: (x[0][1], x[0][0]))
_boxes = list(sorted_boxes)
for i in range(num_boxes - 1):
if abs(_boxes[i + 1][0][1] - _boxes[i][0][1]) < 10 and \
(_boxes[i + 1][0][0] < _boxes[i][0][0]):
tmp = _boxes[i]
_boxes[i] = _boxes[i + 1]
_boxes[i + 1] = tmp
return _boxes
def base64_to_cv2(b64str):
data = base64.b64decode(b64str.encode('utf8'))
data = np.frombuffer(data, np.uint8)
data = cv2.imdecode(data, cv2.IMREAD_COLOR)
return data
# kannada_ocr_db_crnn_mobile
|Model Name|kannada_ocr_db_crnn_mobile|
| :--- | :---: |
|Category|Image - Text Recognition|
|Network|Differentiable Binarization+CRNN|
|Dataset|icdar2015|
|Fine-tuning supported|No|
|Latest update date|2021-12-2|
|Data metrics|-|
## 1. Model Basic Information
- ### Module Introduction
- The kannada_ocr_db_crnn_mobile module recognizes Kannada text in images. Starting from the text boxes detected by multi_languages_ocr_db_crnn, it recognizes the Kannada text inside those boxes. The final recognition algorithm is CRNN (Convolutional Recurrent Neural Network), a combination of DCNN and RNN designed for sequence-like objects in images. Used together with the CTC loss, it can learn directly from word-level or line-level annotations, without detailed character-level labels. This module is a lightweight OCR model for Kannada and supports direct prediction.
- For more details, please refer to:
- [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
- [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## 2. Installation
- ### 1. Environment Dependencies
- PaddlePaddle >= 2.0.2
- Python >= 3.6
- PaddleOCR >= 2.0.1 | [How to install PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1)
- PaddleHub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- Paddle2Onnx >= 0.9.0 | [How to install paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
- shapely
- pyclipper
- ```shell
$ pip3.6 install "paddleocr==2.3.0.2"
$ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple
$ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple
```
- **This module depends on the third-party libraries shapely and pyclipper. Please install shapely and pyclipper before using it.**
- ### 2. Installation
- ```shell
$ hub install kannada_ocr_db_crnn_mobile
```
- If you have trouble installing, please refer to: [Windows installation guide](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux installation guide](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS installation guide](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## 3. Module API Prediction
- ### 1. Command Line Prediction
- ```shell
$ hub run kannada_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
$ hub run kannada_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
```
- This invokes the text recognition model from the command line. For more usage, see [PaddleHub Command Line Instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst).
- ### 2. Prediction Code Example
- ```python
import paddlehub as hub
import cv2
ocr = hub.Module(name="kannada_ocr_db_crnn_mobile", enable_mkldnn=True)       # mkldnn acceleration is only effective on CPU
result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
```
- ### 3. API
- ```python
def __init__(self,
             det=True,
             rec=True,
             use_angle_cls=False,
             enable_mkldnn=False,
             use_gpu=False,
             box_thresh=0.6,
             angle_classification_thresh=0.9)
```
- Constructs a KannadaOCRDBCRNNMobile object.
- **Parameters**
- det(bool): whether to enable text detection. Default is True.
- rec(bool): whether to enable text recognition. Default is True.
- use_angle_cls(bool): whether to enable the orientation classifier, used to recognize text rotated by 180 degrees. Default is False.
- enable_mkldnn(bool): whether to enable MKL-DNN to accelerate CPU computation. Only effective when running on CPU. Default is False.
- use\_gpu (bool): whether to use the GPU; **if you use the GPU, please set the CUDA_VISIBLE_DEVICES environment variable first**
- box\_thresh (float): confidence threshold for detected text boxes;
- angle_classification_thresh(float): confidence threshold for text orientation classification
- ```python
def recognize_text(images=[],
                   paths=[],
                   output_dir='ocr_result',
                   visualization=False)
```
- Prediction API that locates all texts in the input images and recognizes them.
- **Parameters**
- paths (list\[str\]): paths of the images;
- images (list\[numpy.ndarray\]): image data, ndarray.shape is \[H, W, C\], BGR format;
- output\_dir (str): directory for saving result images, set to ocr\_result by default;
- visualization (bool): whether to save the recognition results as image files; only effective when detection is enabled; False by default;
- **Returns** (a parsing sketch follows this list)
- res (list\[dict\]): list of recognition results, one dict per image, with the fields:
- data (list\[dict\]): recognized text results, one dict per text box, with the fields:
- text(str): the recognized text
- confidence(float): confidence of the recognized text
- text_box_position(list): pixel coordinates of the text box in the original image, a 4*2 matrix whose rows are the bottom-left, bottom-right, top-right and top-left vertices in order; data is \[\] if nothing is recognized
- orientation(str): the classified orientation, output only when only orientation classification is enabled
- score(float): the classification score, output only when only orientation classification is enabled
- save_path (str, optional): save path of the result image, '' if no image is saved
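- A minimal sketch (the image path is a placeholder) showing how to walk the returned structure:
- ```python
import paddlehub as hub
import cv2

ocr = hub.Module(name="kannada_ocr_db_crnn_mobile")
res = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
for item in res:              # one dict per input image
    for rec in item['data']:  # one dict per recognized text box
        print(rec['text'], rec['confidence'], rec['text_box_position'])
```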
## 4. Server Deployment
- PaddleHub Serving can deploy an online text recognition service.
- ### Step 1: Start the PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m kannada_ocr_db_crnn_mobile
```
- This deploys the text recognition service API; the default port is 8866.
- **NOTE:** If you predict with a GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise there is no need to set it.
- ### Step 2: Send a prediction request
- With the server configured, the following lines of code send a prediction request and print the result:
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
# Send an HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/kannada_ocr_db_crnn_mobile"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Print the prediction results
print(r.json()["results"])
```
import paddlehub as hub
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="kannada_ocr_db_crnn_mobile",
version="1.0.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class KannadaOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="ka",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
images (list(numpy.ndarray)): images data, shape of each is [H, W, C], BGR format. Use either images or paths.
paths (list[str]): The paths of images. Use either paths or images.
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
Returns:
res (list): The result of text detection box and save path of images.
"""
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
results = self.model.run_cmd(argvs)
return results
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
input_shape_dict: dictionary ``{ input_name: input_value }, eg. {'x': [-1, 3, -1, -1]}``
opset_version(int): operator set
'''
self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
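# A hedged usage sketch (directory name and shape values are illustrative, not prescribed):
#   module = hub.Module(name='kannada_ocr_db_crnn_mobile')
#   module.export_onnx_model(dirname='onnx_export', input_shape_dict={'x': [-1, 3, -1, -1]})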
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
# korean_ocr_db_crnn_mobile
|Model Name|korean_ocr_db_crnn_mobile|
| :--- | :---: |
|Category|Image - Text Recognition|
|Network|Differentiable Binarization+CRNN|
|Dataset|icdar2015|
|Fine-tuning supported|No|
|Latest update date|2021-12-2|
|Data metrics|-|
## 1. Model Basic Information
- ### Module Introduction
- The korean_ocr_db_crnn_mobile module recognizes Korean text in images. Starting from the text boxes detected by multi_languages_ocr_db_crnn, it recognizes the Korean text inside those boxes. The final recognition algorithm is CRNN (Convolutional Recurrent Neural Network), a combination of DCNN and RNN designed for sequence-like objects in images. Used together with the CTC loss, it can learn directly from word-level or line-level annotations, without detailed character-level labels. This module is a lightweight OCR model for Korean and supports direct prediction.
- For more details, please refer to:
- [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
- [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## 2. Installation
- ### 1. Environment Dependencies
- PaddlePaddle >= 2.0.2
- Python >= 3.6
- PaddleOCR >= 2.0.1 | [How to install PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1)
- PaddleHub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- Paddle2Onnx >= 0.9.0 | [How to install paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
- shapely
- pyclipper
- ```shell
$ pip3.6 install "paddleocr==2.3.0.2"
$ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple
$ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple
```
- **This module depends on the third-party libraries shapely and pyclipper. Please install shapely and pyclipper before using it.**
- ### 2. Installation
- ```shell
$ hub install korean_ocr_db_crnn_mobile
```
- If you have trouble installing, please refer to: [Windows installation guide](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux installation guide](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [MacOS installation guide](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## 3. Module API Prediction
- ### 1. Command Line Prediction
- ```shell
$ hub run korean_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
$ hub run korean_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
```
- This invokes the text recognition model from the command line. For more usage, see [PaddleHub Command Line Instructions](../../../../docs/docs_ch/tutorial/cmd_usage.rst).
- ### 2. Prediction Code Example
- ```python
import paddlehub as hub
import cv2
ocr = hub.Module(name="korean_ocr_db_crnn_mobile", enable_mkldnn=True)       # mkldnn acceleration is only effective on CPU
result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
```
- ### 3. API
- ```python
def __init__(self,
             det=True,
             rec=True,
             use_angle_cls=False,
             enable_mkldnn=False,
             use_gpu=False,
             box_thresh=0.6,
             angle_classification_thresh=0.9)
```
- Constructs a KoreanOCRDBCRNNMobile object.
- **Parameters**
- det(bool): whether to enable text detection. Default is True.
- rec(bool): whether to enable text recognition. Default is True.
- use_angle_cls(bool): whether to enable the orientation classifier, used to recognize text rotated by 180 degrees. Default is False.
- enable_mkldnn(bool): whether to enable MKL-DNN to accelerate CPU computation. Only effective when running on CPU. Default is False.
- use\_gpu (bool): whether to use the GPU; **if you use the GPU, please set the CUDA_VISIBLE_DEVICES environment variable first**
- box\_thresh (float): confidence threshold for detected text boxes;
- angle_classification_thresh(float): confidence threshold for text orientation classification
- ```python
def recognize_text(images=[],
                   paths=[],
                   output_dir='ocr_result',
                   visualization=False)
```
- Prediction API that locates all texts in the input images and recognizes them.
- **Parameters**
- paths (list\[str\]): paths of the images;
- images (list\[numpy.ndarray\]): image data, ndarray.shape is \[H, W, C\], BGR format;
- output\_dir (str): directory for saving result images, set to ocr\_result by default;
- visualization (bool): whether to save the recognition results as image files; only effective when detection is enabled; False by default;
- **Returns**
- res (list\[dict\]): list of recognition results, one dict per image, with the fields:
- data (list\[dict\]): recognized text results, one dict per text box, with the fields:
- text(str): the recognized text
- confidence(float): confidence of the recognized text
- text_box_position(list): pixel coordinates of the text box in the original image, a 4*2 matrix whose rows are the bottom-left, bottom-right, top-right and top-left vertices in order; data is \[\] if nothing is recognized
- orientation(str): the classified orientation, output only when only orientation classification is enabled
- score(float): the classification score, output only when only orientation classification is enabled
- save_path (str, optional): save path of the result image, '' if no image is saved
## 4. Server Deployment
- PaddleHub Serving can deploy an online text recognition service.
- ### Step 1: Start the PaddleHub Serving
- Run the startup command:
- ```shell
$ hub serving start -m korean_ocr_db_crnn_mobile
```
- This deploys the text recognition service API; the default port is 8866.
- **NOTE:** If you predict with a GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise there is no need to set it.
- ### Step 2: Send a prediction request
- With the server configured, the following lines of code send a prediction request and print the result:
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')
# Send an HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/korean_ocr_db_crnn_mobile"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Print the prediction results
print(r.json()["results"])
```
import paddlehub as hub
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="korean_ocr_db_crnn_mobile",
version="1.0.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class KoreanOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="korean",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
images (list(numpy.ndarray)): images data, shape of each is [H, W, C], BGR format. Use either images or paths.
paths (list[str]): The paths of images. Use either paths or images.
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
Returns:
res (list): The result of text detection box and save path of images.
"""
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
results = self.model.run_cmd(argvs)
return results
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
input_shape_dict: dictionary ``{ input_name: input_value }, eg. {'x': [-1, 3, -1, -1]}``
opset_version(int): operator set
'''
self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
# latin_ocr_db_crnn_mobile
|模型名称|latin_ocr_db_crnn_mobile|
| :--- | :---: |
|类别|图像-文字识别|
|网络|Differentiable Binarization+CRNN|
|数据集|icdar2015数据集|
|是否支持Fine-tuning|否|
|最新更新日期|2021-12-2|
|数据指标|-|
## 一、模型基本信息
- ### 模型介绍
- latin_ocr_db_crnn_mobile Module用于识别图片当中的拉丁文。其基于multi_languages_ocr_db_crnn检测得到的文本框,继续识别文本框中的拉丁文文字。最终识别文字算法采用CRNN(Convolutional Recurrent Neural Network)即卷积递归神经网络。其是DCNN和RNN的组合,专门用于识别图像中的序列式对象。与CTC loss配合使用,进行文字识别,可以直接从文本词级或行级的标注中学习,不需要详细的字符级的标注。该Module是一个识别拉丁文的轻量级OCR模型,支持直接预测。
- 更多详情参考:
- [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
- [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## II. Installation
- ### 1. Environment Dependencies
- PaddlePaddle >= 2.0.2
- Python >= 3.6
- PaddleOCR >= 2.0.1 | [How to install PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1)
- PaddleHub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- Paddle2Onnx >= 0.9.0 | [How to install paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
- shapely
- pyclipper
- ```shell
$ pip3.6 install "paddleocr==2.3.0.2"
$ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple
$ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple
```
- **This module depends on the third-party libraries shapely and pyclipper; please install them before using it.**
- ### 2. Installation
- ```shell
$ hub install latin_ocr_db_crnn_mobile
```
- If you run into problems during installation, see: [Windows installation guide](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux installation guide](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS installation guide](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command-Line Prediction
- ```shell
$ hub run latin_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
$ hub run latin_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
```
- This invokes the text recognition model from the command line; for more, see [PaddleHub Command-Line Usage](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Code Example
- ```python
import paddlehub as hub
import cv2
ocr = hub.Module(name="latin_ocr_db_crnn_mobile", enable_mkldnn=True)       # MKL-DNN acceleration only takes effect on CPU
result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
```
- ### 3. API
- ```python
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9)
```
- Constructs a LatinOCRDBCRNNMobile object
- **Parameters**
- det(bool): whether to enable text detection. Default is True.
- rec(bool): whether to enable text recognition. Default is True.
- use_angle_cls(bool): whether to enable orientation classification, which uses an angle classifier to recognize text rotated by 180 degrees. Default is False.
- enable_mkldnn(bool): whether to enable MKL-DNN to accelerate CPU computation. Only effective when running on CPU. Default is False.
- use\_gpu (bool): whether to use the GPU; **if you use a GPU, set the CUDA_VISIBLE_DEVICES environment variable first**
- box\_thresh (float): confidence threshold for detected text boxes;
- angle_classification_thresh(float): confidence threshold for text orientation classification
- ```python
def recognize_text(images=[],
paths=[],
output_dir='ocr_result',
visualization=False)
```
- Prediction API: detects the position of all text in the input images and recognizes the text content.
- **Parameters**
- paths (list\[str\]): paths of the images;
- images (list\[numpy.ndarray\]): image data, ndarray.shape \[H, W, C\], in BGR format;
- output\_dir (str): directory for saving result images, defaults to ocr\_result;
- visualization (bool): whether to save the recognition results as image files; only effective when detection is enabled; defaults to False;
- **Returns** (see the sketch after this list)
- res (list\[dict\]): list of recognition results, where each element is a dict with the fields:
- data (list\[dict\]): text recognition results, where each element is a dict with the fields:
- text(str): the recognized text
- confidence(float): confidence of the recognized text
- text_box_position(list): pixel coordinates of the text box in the original image, a 4*2 matrix giving the bottom-left, bottom-right, top-right, and top-left vertices in order; data is \[\] if nothing is recognized
- orientation(str): the classified orientation, output only when orientation classification alone is enabled
- score(float): the classification score, output only when orientation classification alone is enabled
- save_path (str, optional): save path of the result image, '' if no image is saved
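- As a minimal sketch of walking this structure (the image path is a placeholder):
- ```python
import paddlehub as hub
import cv2

ocr = hub.Module(name="latin_ocr_db_crnn_mobile")
res = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
for image_result in res:                       # one dict per input image
    for item in image_result['data']:          # one dict per detected text box
        print(item['text'], item['confidence'])
        print(item['text_box_position'])       # 4x2 vertex coordinates
    print('saved to:', image_result['save_path'] or '(not saved)')
```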
## IV. Server Deployment
- PaddleHub Serving can deploy an online text recognition service.
- ### Step 1: Start PaddleHub Serving
- Run the start command:
- ```shell
$ hub serving start -m latin_ocr_db_crnn_mobile
```
- This deploys the text recognition API service, listening on port 8866 by default.
- **NOTE:** To predict with a GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise, no setup is needed.
- ### Step 2: Send a Prediction Request
- With the server configured, a few lines of code are enough to send a prediction request and fetch the result
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')  # tobytes() replaces the deprecated ndarray.tostring()
# Send the HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/latin_ocr_db_crnn_mobile"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Print the prediction results
print(r.json()["results"])
```
import paddlehub as hub
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="latin_ocr_db_crnn_mobile",
version="1.0.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class LatinOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="latin",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
images (list(numpy.ndarray)): images data, shape of each is [H, W, C]; pass either images or paths, not both
paths (list[str]): the paths of images; pass either images or paths, not both
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
Returns:
res (list): The result of text detection box and save path of images.
"""
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
results = self.model.run_cmd(argvs)
return results
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
input_shape_dict: dictionary ``{ input_name: input_value }, eg. {'x': [-1, 3, -1, -1]}``
opset_version(int): operator set
'''
self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
# multi_languages_ocr_db_crnn
|Model Name|multi_languages_ocr_db_crnn|
| :--- | :---: |
|Category|Image - Text Recognition|
|Network|Differentiable Binarization+CRNN|
|Dataset|ICDAR2015|
|Fine-tuning Supported|No|
|Last Updated|2021-11-24|
|Metrics|-|
## I. Basic Information
- ### Sample Results
- Example output:
<p align="center">
<img src="https://user-images.githubusercontent.com/76040149/133097562-d8c9abd1-6c70-4d93-809f-fa4735764836.png" width = "600" hspace='10'/> <br />
</p>
- ### Module Introduction
- The multi_languages_ocr_db_crnn module recognizes text in images. Built on PaddleOCR, it detects text boxes, recognizes the text inside them, and then classifies the orientation of the detected boxes. Detection uses DB (Differentiable Binarization), and recognition uses CRNN (Convolutional Recurrent Neural Network).
Besides general-purpose Chinese and English models, the module also provides lightweight models for [80 languages](#语种缩写).
<p align="center">
<img src="https://user-images.githubusercontent.com/76040149/133098254-7c642826-d6d7-4dd0-986e-371622337867.png" width = "300" height = "450" hspace='10'/> <br />
</p>
- For more details, see:
- [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
- [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## II. Installation
- ### 1. Environment Dependencies
- PaddlePaddle >= 2.0.2
- Python >= 3.6
- PaddleOCR >= 2.0.1 | [How to install PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1)
- PaddleHub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- Paddle2Onnx >= 0.9.0 | [How to install paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
- shapely
- pyclipper
- ```shell
$ pip3.6 install "paddleocr==2.3.0.2"
$ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple
$ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple
```
- **This module depends on the third-party libraries shapely and pyclipper; please install them before using it.**
- ### 2. Installation
- ```shell
$ hub install multi_languages_ocr_db_crnn
```
- If you run into problems during installation, see: [Windows installation guide](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux installation guide](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS installation guide](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command-Line Prediction
- ```shell
$ hub run multi_languages_ocr_db_crnn --input_path "/PATH/TO/IMAGE"
$ hub run multi_languages_ocr_db_crnn --input_path "/PATH/TO/IMAGE" --lang "ch" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
```
- This invokes the text recognition model from the command line; for more, see [PaddleHub Command-Line Usage](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Code Example
- ```python
import paddlehub as hub
import cv2
ocr = hub.Module(name="multi_languages_ocr_db_crnn", lang='en', enable_mkldnn=True)       # MKL-DNN acceleration only takes effect on CPU
result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
```
- multi_languages_ocr_db_crnn currently supports 80 languages, switched via the lang parameter; for the English model, set lang=en. The supported [languages](#语种缩写) are listed in the table below, and the sketch after this note shows how to switch.
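- As a minimal sketch of switching languages (the abbreviations are taken from the table in Section V):
- ```python
import paddlehub as hub

# "fr" (French) and "ta" (Tamil) come from the abbreviation table in Section V.
ocr_fr = hub.Module(name="multi_languages_ocr_db_crnn", lang="fr")
ocr_ta = hub.Module(name="multi_languages_ocr_db_crnn", lang="ta")
```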
- ### 3. API
- ```python
def __init__(self,
lang="ch",
det=True, rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9)
```
- Constructs a MultiLangOCR object
- **Parameters** (see the sketch after this list)
- lang(str): language model selection. Defaults to the Chinese model, i.e. lang="ch".
- det(bool): whether to enable text detection. Default is True.
- rec(bool): whether to enable text recognition. Default is True.
- use_angle_cls(bool): whether to enable orientation classification, which uses an angle classifier to recognize text rotated by 180 degrees. Default is False.
- enable_mkldnn(bool): whether to enable MKL-DNN to accelerate CPU computation. Only effective when running on CPU. Default is False.
- use\_gpu (bool): whether to use the GPU; **if you use a GPU, set the CUDA_VISIBLE_DEVICES environment variable first**
- box\_thresh (float): confidence threshold for detected text boxes;
- angle_classification_thresh(float): confidence threshold for text orientation classification
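- A minimal sketch combining these parameters (the values here are illustrative, not recommendations):
- ```python
import paddlehub as hub

# Illustrative settings: stricter box threshold, orientation classifier enabled.
ocr = hub.Module(
    name="multi_languages_ocr_db_crnn",
    lang="ch",
    use_angle_cls=True,                # also handle 180-degree rotated text
    box_thresh=0.7,                    # default is 0.6
    angle_classification_thresh=0.8)   # default is 0.9
```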
- ```python
def recognize_text(images=[],
paths=[],
output_dir='ocr_result',
visualization=False)
```
- Prediction API: detects the position of all text in the input images and recognizes the text content.
- **Parameters**
- paths (list\[str\]): paths of the images;
- images (list\[numpy.ndarray\]): image data, ndarray.shape \[H, W, C\], in BGR format;
- output\_dir (str): directory for saving result images, defaults to ocr\_result;
- visualization (bool): whether to save the recognition results as image files; only effective when detection is enabled; defaults to False;
- **Returns**
- res (list\[dict\]): list of recognition results, where each element is a dict with the fields:
- data (list\[dict\]): text recognition results, where each element is a dict with the fields:
- text(str): the recognized text
- confidence(float): confidence of the recognized text
- text_box_position(list): pixel coordinates of the text box in the original image, a 4*2 matrix giving the bottom-left, bottom-right, top-right, and top-left vertices in order; data is \[\] if nothing is recognized
- orientation(str): the classified orientation, output only when orientation classification alone is enabled
- score(float): the classification score, output only when orientation classification alone is enabled
- save_path (str, optional): save path of the result image, '' if no image is saved
## IV. Server Deployment
- PaddleHub Serving can deploy an online text recognition service.
- ### Step 1: Start PaddleHub Serving
- Run the start command:
- ```shell
$ hub serving start -m multi_languages_ocr_db_crnn
```
- This deploys the text recognition API service, listening on port 8866 by default.
- **NOTE:** To predict with a GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise, no setup is needed.
- ### Step 2: Send a Prediction Request
- With the server configured, a few lines of code are enough to send a prediction request and fetch the result
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')  # tobytes() replaces the deprecated ndarray.tostring()
# Send the HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/multi_languages_ocr_db_crnn"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Print the prediction results
print(r.json()["results"])
```
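- The JSON body mirrors the return value of recognize_text; a short sketch of reading it (assuming `r` is the response from the snippet above):
- ```python
# `r` is the requests.Response returned by requests.post(...) above.
for image_result in r.json()["results"]:       # one entry per input image
    for item in image_result["data"]:          # text boxes with text and confidence
        print(item["text"], item["confidence"])
```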
<a name="语种缩写"></a>
## V. Supported Languages and Abbreviations
| Language | Description | Abbreviation | | Language | Description | Abbreviation |
| --- | --- | --- | ---|--- | --- | --- |
|Chinese|Chinese & English|ch| |Bulgarian|Bulgarian |bg|
|English|English|en| |Ukrainian|Ukrainian|uk|
|French|French|fr| |Belarusian|Belarusian|be|
|German|German|german| |Telugu|Telugu |te|
|Japanese|Japanese|japan| | Abaza | Abaza | abq |
|Korean|Korean|korean| |Tamil|Tamil |ta|
|Traditional Chinese|Chinese Traditional |chinese_cht| |Afrikaans |Afrikaans |af|
|Italian| Italian |it| |Azerbaijani |Azerbaijani |az|
|Spanish|Spanish |es| |Bosnian|Bosnian|bs|
|Portuguese| Portuguese|pt| |Czech|Czech|cs|
|Russian|Russian|ru| |Welsh |Welsh |cy|
|Arabic|Arabic|ar| |Danish |Danish|da|
|Hindi|Hindi|hi| |Estonian |Estonian |et|
|Uyghur|Uyghur|ug| |Irish |Irish |ga|
|Persian|Persian|fa| |Croatian|Croatian |hr|
|Urdu|Urdu|ur| |Hungarian|Hungarian |hu|
|Serbian (Latin)| Serbian(latin) |rs_latin| |Indonesian|Indonesian|id|
|Occitan|Occitan |oc| |Icelandic |Icelandic|is|
|Marathi|Marathi|mr| |Kurdish |Kurdish|ku|
|Nepali|Nepali|ne| |Lithuanian|Lithuanian |lt|
|Serbian (Cyrillic)|Serbian(cyrillic)|rs_cyrillic| |Latvian |Latvian |lv|
|Maori|Maori|mi| | Dargwa|Dargwa |dar|
|Malay |Malay|ms| | Ingush|Ingush |inh|
|Maltese |Maltese |mt| | Lak|Lak |lbe|
|Dutch |Dutch |nl| | Lezghian|Lezghian |lez|
|Norwegian |Norwegian |no| |Tabassaran |Tabassaran |tab|
|Polish|Polish |pl| | Bihari|Bihari |bh|
| Romanian|Romanian |ro| | Maithili|Maithili |mai|
| Slovak|Slovak |sk| | Angika|Angika |ang|
| Slovenian|Slovenian |sl| | Bhojpuri|Bhojpuri |bho|
| Albanian|Albanian |sq| | Magahi |Magahi |mah|
| Swedish|Swedish |sv| | Nagpur|Nagpur |sck|
| Swahili|Swahili |sw| | Newari|Newari |new|
| Tagalog|Tagalog |tl| | Goan Konkani |Goan Konkani|gom|
| Turkish|Turkish |tr| | Saudi Arabia|Saudi Arabia|sa|
| Uzbek|Uzbek |uz| | Avar|Avar |ava|
| Vietnamese|Vietnamese |vi| | Avar|Avar |ava|
| Mongolian|Mongolian |mn| | Adyghe|Adyghe |ady|
import argparse
import sys
import os
import ast
import paddle
import paddle2onnx
import paddle2onnx as p2o
import paddle.fluid as fluid
from paddleocr import PaddleOCR
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
from .utils import read_images, save_result_image, mkdir
@moduleinfo(
name="multi_languages_ocr_db_crnn",
version="1.0.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class MultiLangOCR:
def __init__(self,
lang="ch",
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
lang(str): the selection of languages
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.lang = lang
self.logger = get_logger()
argc = len(sys.argv)
if argc == 1 or (argc > 1 and sys.argv[1] == 'serving'):
self.det = det
self.rec = rec
self.use_angle_cls = use_angle_cls
self.engine = PaddleOCR(
lang=lang,
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
det_db_box_thresh=box_thresh,
cls_thresh=angle_classification_thresh)
self.det_model_dir = self.engine.text_detector.args.det_model_dir
self.rec_model_dir = self.engine.text_detector.args.rec_model_dir
self.cls_model_dir = self.engine.text_detector.args.cls_model_dir
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
images (list(numpy.ndarray)): images data, shape of each is [H, W, C]; pass either images or paths, not both
paths (list[str]): the paths of images; pass either images or paths, not both
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
Returns:
res (list): The result of text detection box and save path of images.
"""
if images != [] and isinstance(images, list) and paths == []:
predicted_data = images
elif images == [] and isinstance(paths, list) and paths != []:
predicted_data = read_images(paths)
else:
raise TypeError("The input data is inconsistent with expectations.")
assert predicted_data != [], "There is not any image to be predicted. Please check the input data."
all_results = []
for img in predicted_data:
result = {'save_path': ''}
if img is None:
result['data'] = []
all_results.append(result)
continue
original_image = img.copy()
rec_results = self.engine.ocr(img, det=self.det, rec=self.rec, cls=self.use_angle_cls)
rec_res_final = []
for line in rec_results:
if self.det and self.rec:
boxes = line[0]
text, score = line[1]
rec_res_final.append({'text': text, 'confidence': float(score), 'text_box_position': boxes})
elif self.det and not self.rec:
boxes = line
rec_res_final.append({'text_box_position': boxes})
else:
if self.use_angle_cls and not self.rec:
orientation, score = line
rec_res_final.append({'orientation': orientation, 'score': float(score)})
else:
text, score = line
rec_res_final.append({'text': text, 'confidence': float(score)})
result['data'] = rec_res_final
if visualization and result['data']:
result['save_path'] = save_result_image(original_image, rec_results, output_dir, self.directory,
self.lang, self.det, self.rec, self.logger)
all_results.append(result)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
parser = self.arg_parser()
args = parser.parse_args(argvs)
if args.lang is not None:
self.lang = args.lang
self.det = args.det
self.rec = args.rec
self.use_angle_cls = args.use_angle_cls
self.engine = PaddleOCR(
lang=self.lang,
det=args.det,
rec=args.rec,
use_angle_cls=args.use_angle_cls,
enable_mkldnn=args.enable_mkldnn,
use_gpu=args.use_gpu,
det_db_box_thresh=args.box_thresh,
cls_thresh=args.angle_classification_thresh)
results = self.recognize_text(
paths=[args.input_path], output_dir=args.output_dir, visualization=args.visualization)
return results
def arg_parser(self):
parser = argparse.ArgumentParser(
description="Run the %s module." % self.name,
prog='hub run %s' % self.name,
usage='%(prog)s',
add_help=True)
parser.add_argument('--input_path', type=str, default=None, help="path to the input image. Required.", required=True)
parser.add_argument('--use_gpu', type=ast.literal_eval, default=False, help="whether use GPU or not")
parser.add_argument('--output_dir', type=str, default='ocr_result', help="The directory to save output images.")
parser.add_argument(
'--visualization', type=ast.literal_eval, default=False, help="whether to save output as images.")
parser.add_argument('--lang', type=str, default=None, help="the selection of languages")
parser.add_argument('--det', type=ast.literal_eval, default=True, help="whether use text detector or not")
parser.add_argument('--rec', type=ast.literal_eval, default=True, help="whether use text recognizer or not")
parser.add_argument(
'--use_angle_cls', type=ast.literal_eval, default=False, help="whether to use the text orientation classifier or not")
parser.add_argument('--enable_mkldnn', type=ast.literal_eval, default=False, help="whether use mkldnn or not")
parser.add_argument(
"--box_thresh", type=float, default=0.6, help="set the threshold of the detected text box's confidence")
parser.add_argument(
"--angle_classification_thresh",
type=float,
default=0.9,
help="set the threshold of the angle classification confidence")
return parser
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
input_shape_dict: dictionary ``{ input_name: input_value }, eg. {'x': [-1, 3, -1, -1]}``
opset_version(int): operator set
'''
v0, v1, v2 = paddle2onnx.__version__.split('.')
if int(v1) < 9:
raise ImportError("paddle2onnx>=0.9.0 is required")
if input_shape_dict is None:
input_shape_dict = {'x': [-1, 3, -1, -1]}
if input_shape_dict is not None and not isinstance(input_shape_dict, dict):
raise Exception("input_shape_dict should be dict, eg. {'x': [-1, 3, -1, -1]}.")
if opset_version <= 9:
raise Exception("opset_version <= 9 is not surpported, please try with higher opset_version >=10.")
path_dict = {"det": self.det_model_dir, "rec": self.rec_model_dir, "cls": self.cls_model_dir}
for (key, path) in path_dict.items():
model_filename = 'inference.pdmodel'
params_filename = 'inference.pdiparams'
save_file = os.path.join(dirname, '{}_{}.onnx'.format(self.name, key))
# convert a model saved with 'paddle.fluid.io.save_inference_model'
if hasattr(paddle, 'enable_static'):
paddle.enable_static()
exe = fluid.Executor(fluid.CPUPlace())
if model_filename is None and params_filename is None:
[program, feed_var_names, fetch_vars] = fluid.io.load_inference_model(path, exe)
else:
[program, feed_var_names, fetch_vars] = fluid.io.load_inference_model(
path, exe, model_filename=model_filename, params_filename=params_filename)
onnx_proto = p2o.run_convert(program, input_shape_dict=input_shape_dict, opset_version=opset_version)
mkdir(save_file)
with open(save_file, "wb") as f:
f.write(onnx_proto.SerializeToString())
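As a hedged usage sketch of the exporter above (the output directory is illustrative; paddle2onnx>=0.9.0 is required, as checked in the code):
```python
import paddlehub as hub

# Writes <module_name>_det.onnx, <module_name>_rec.onnx and <module_name>_cls.onnx
# under ./onnx_models (an illustrative path).
ocr = hub.Module(name="multi_languages_ocr_db_crnn", lang="ch")
ocr.export_onnx_model(dirname='./onnx_models', input_shape_dict={'x': [-1, 3, -1, -1]})
```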
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
import os
import time
import cv2
import numpy as np
from PIL import Image, ImageDraw
from paddleocr import draw_ocr
def save_result_image(original_image,
rec_results,
output_dir='ocr_result',
directory=None,
lang='ch',
det=True,
rec=True,
logger=None):
image = Image.fromarray(cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB))
if det and rec:
boxes = [line[0] for line in rec_results]
txts = [line[1][0] for line in rec_results]
scores = [line[1][1] for line in rec_results]
fonts_lang = 'fonts/simfang.ttf'
lang_fonts = {
'korean': 'korean',
'fr': 'french',
'german': 'german',
'hi': 'hindi',
'ne': 'nepali',
'fa': 'persian',
'es': 'spanish',
'ta': 'tamil',
'te': 'telugu',
'ur': 'urdu',
'ug': 'uyghur',
}
if lang in lang_fonts.keys():
fonts_lang = 'fonts/' + lang_fonts[lang] + '.ttf'
font_file = os.path.join(directory, 'assets', fonts_lang)
im_show = draw_ocr(image, boxes, txts, scores, font_path=font_file)
elif det and not rec:
boxes = rec_results
im_show = draw_boxes(image, boxes)
im_show = np.array(im_show)
else:
logger.warning("only cls or rec not supported visualization.")
return ""
if not os.path.exists(output_dir):
os.makedirs(output_dir)
ext = get_image_ext(original_image)
saved_name = 'ndarray_{}{}'.format(time.time(), ext)
save_file_path = os.path.join(output_dir, saved_name)
im_show = Image.fromarray(im_show)
im_show.save(save_file_path)
return save_file_path
def read_images(paths=[]):
images = []
for img_path in paths:
assert os.path.isfile(img_path), "The {} isn't a valid file.".format(img_path)
img = cv2.imread(img_path)
if img is None:
continue
images.append(img)
return images
def draw_boxes(image, boxes, scores=None, drop_score=0.5):
img = image.copy()
draw = ImageDraw.Draw(img)
if scores is None:
scores = [1] * len(boxes)
for (box, score) in zip(boxes, scores):
if score < drop_score:
continue
draw.line([(box[0][0], box[0][1]), (box[1][0], box[1][1])], fill='red')
draw.line([(box[1][0], box[1][1]), (box[2][0], box[2][1])], fill='red')
draw.line([(box[2][0], box[2][1]), (box[3][0], box[3][1])], fill='red')
draw.line([(box[3][0], box[3][1]), (box[0][0], box[0][1])], fill='red')
draw.line([(box[0][0] - 1, box[0][1] + 1), (box[1][0] - 1, box[1][1] + 1)], fill='red')
draw.line([(box[1][0] - 1, box[1][1] + 1), (box[2][0] - 1, box[2][1] + 1)], fill='red')
draw.line([(box[2][0] - 1, box[2][1] + 1), (box[3][0] - 1, box[3][1] + 1)], fill='red')
draw.line([(box[3][0] - 1, box[3][1] + 1), (box[0][0] - 1, box[0][1] + 1)], fill='red')
return img
def get_image_ext(image):
if image.shape[2] == 4:
return ".png"
return ".jpg"
def mkdir(path):
sub_dir = os.path.dirname(path)
if not os.path.exists(sub_dir):
os.makedirs(sub_dir)
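A small self-contained sketch of the draw_boxes helper above (the box coordinates are made up for illustration):
```python
from PIL import Image

# draw_boxes is defined in the utils module above; a box is a 4x2 list of vertices
# in bottom-left, bottom-right, top-right, top-left order.
blank = Image.new('RGB', (200, 100), 'white')
box = [[10, 80], [190, 80], [190, 20], [10, 20]]
out = draw_boxes(blank, [box])
out.save('boxes_demo.jpg')
```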
# tamil_ocr_db_crnn_mobile
|Model Name|tamil_ocr_db_crnn_mobile|
| :--- | :---: |
|Category|Image - Text Recognition|
|Network|Differentiable Binarization+CRNN|
|Dataset|ICDAR2015|
|Fine-tuning Supported|No|
|Last Updated|2021-12-02|
|Metrics|-|
## I. Basic Information
- ### Module Introduction
- The tamil_ocr_db_crnn_mobile module recognizes Tamil text in images. Starting from the text boxes detected by multi_languages_ocr_db_crnn, it recognizes the Tamil text inside each box. The recognition algorithm is CRNN (Convolutional Recurrent Neural Network), a combination of DCNN and RNN designed for sequence-like objects in images. Used together with the CTC loss, it can learn directly from word- or line-level annotations without detailed character-level labels. This module is a lightweight OCR model for Tamil and supports direct prediction.
- For more details, see:
- [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
- [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## II. Installation
- ### 1. Environment Dependencies
- PaddlePaddle >= 2.0.2
- Python >= 3.6
- PaddleOCR >= 2.0.1 | [How to install PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1)
- PaddleHub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- Paddle2Onnx >= 0.9.0 | [How to install paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
- shapely
- pyclipper
- ```shell
$ pip3.6 install "paddleocr==2.3.0.2"
$ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple
$ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple
```
- **This module depends on the third-party libraries shapely and pyclipper; please install them before using it.**
- ### 2. Installation
- ```shell
$ hub install tamil_ocr_db_crnn_mobile
```
- If you run into problems during installation, see: [Windows installation guide](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux installation guide](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS installation guide](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command-Line Prediction
- ```shell
$ hub run tamil_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
$ hub run tamil_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
```
- This invokes the text recognition model from the command line; for more, see [PaddleHub Command-Line Usage](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Code Example
- ```python
import paddlehub as hub
import cv2
ocr = hub.Module(name="tamil_ocr_db_crnn_mobile", enable_mkldnn=True)       # MKL-DNN acceleration only takes effect on CPU
result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
```
- ### 3. API
- ```python
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9)
```
- Constructs a TamilOCRDBCRNNMobile object
- **Parameters**
- det(bool): whether to enable text detection. Default is True.
- rec(bool): whether to enable text recognition. Default is True.
- use_angle_cls(bool): whether to enable orientation classification, which uses an angle classifier to recognize text rotated by 180 degrees. Default is False.
- enable_mkldnn(bool): whether to enable MKL-DNN to accelerate CPU computation. Only effective when running on CPU. Default is False.
- use\_gpu (bool): whether to use the GPU; **if you use a GPU, set the CUDA_VISIBLE_DEVICES environment variable first**
- box\_thresh (float): confidence threshold for detected text boxes;
- angle_classification_thresh(float): confidence threshold for text orientation classification
- ```python
def recognize_text(images=[],
paths=[],
output_dir='ocr_result',
visualization=False)
```
- Prediction API: detects the position of all text in the input images and recognizes the text content.
- **Parameters**
- paths (list\[str\]): paths of the images;
- images (list\[numpy.ndarray\]): image data, ndarray.shape \[H, W, C\], in BGR format;
- output\_dir (str): directory for saving result images, defaults to ocr\_result;
- visualization (bool): whether to save the recognition results as image files; only effective when detection is enabled; defaults to False;
- **Returns** (see the sketch after this list)
- res (list\[dict\]): list of recognition results, where each element is a dict with the fields:
- data (list\[dict\]): text recognition results, where each element is a dict with the fields:
- text(str): the recognized text
- confidence(float): confidence of the recognized text
- text_box_position(list): pixel coordinates of the text box in the original image, a 4*2 matrix giving the bottom-left, bottom-right, top-right, and top-left vertices in order; data is \[\] if nothing is recognized
- orientation(str): the classified orientation, output only when orientation classification alone is enabled
- score(float): the classification score, output only when orientation classification alone is enabled
- save_path (str, optional): save path of the result image, '' if no image is saved
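- A sketch of saving visualized results and reading save_path (paths are placeholders):
- ```python
import paddlehub as hub

ocr = hub.Module(name="tamil_ocr_db_crnn_mobile")
res = ocr.recognize_text(paths=['/PATH/TO/IMAGE'],
                         visualization=True,            # only effective when detection is enabled
                         output_dir='tamil_ocr_result')
print(res[0]['save_path'])   # '' if no image was saved
```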
## IV. Server Deployment
- PaddleHub Serving can deploy an online text recognition service.
- ### Step 1: Start PaddleHub Serving
- Run the start command:
- ```shell
$ hub serving start -m tamil_ocr_db_crnn_mobile
```
- This deploys the text recognition API service, listening on port 8866 by default.
- **NOTE:** To predict with a GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise, no setup is needed.
- ### Step 2: Send a Prediction Request
- With the server configured, a few lines of code are enough to send a prediction request and fetch the result
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')  # tobytes() replaces the deprecated ndarray.tostring()
# Send the HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/tamil_ocr_db_crnn_mobile"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Print the prediction results
print(r.json()["results"])
```
import paddlehub as hub
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="tamil_ocr_db_crnn_mobile",
version="1.0.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class TamilOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="ta",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
images (list(numpy.ndarray)): images data, shape of each is [H, W, C]; pass either images or paths, not both
paths (list[str]): the paths of images; pass either images or paths, not both
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
Returns:
res (list): The result of text detection box and save path of images.
"""
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
results = self.model.run_cmd(argvs)
return results
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
input_shape_dict: dictionary ``{ input_name: input_value }, eg. {'x': [-1, 3, -1, -1]}``
opset_version(int): operator set
'''
self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper
# telugu_ocr_db_crnn_mobile
|Model Name|telugu_ocr_db_crnn_mobile|
| :--- | :---: |
|Category|Image - Text Recognition|
|Network|Differentiable Binarization+CRNN|
|Dataset|ICDAR2015|
|Fine-tuning Supported|No|
|Last Updated|2021-12-02|
|Metrics|-|
## I. Basic Information
- ### Module Introduction
- The telugu_ocr_db_crnn_mobile module recognizes Telugu text in images. Starting from the text boxes detected by multi_languages_ocr_db_crnn, it recognizes the Telugu text inside each box. The recognition algorithm is CRNN (Convolutional Recurrent Neural Network), a combination of DCNN and RNN designed for sequence-like objects in images. Used together with the CTC loss, it can learn directly from word- or line-level annotations without detailed character-level labels. This module is a lightweight OCR model for Telugu and supports direct prediction.
- For more details, see:
- [Real-time Scene Text Detection with Differentiable Binarization](https://arxiv.org/pdf/1911.08947.pdf)
- [An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition](https://arxiv.org/pdf/1507.05717.pdf)
## II. Installation
- ### 1. Environment Dependencies
- PaddlePaddle >= 2.0.2
- Python >= 3.6
- PaddleOCR >= 2.0.1 | [How to install PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/quickstart.md#1)
- PaddleHub >= 2.0.0 | [How to install PaddleHub](../../../../docs/docs_ch/get_start/installation.rst)
- Paddle2Onnx >= 0.9.0 | [How to install paddle2onnx](https://github.com/PaddlePaddle/Paddle2ONNX/blob/develop/README_zh.md)
- shapely
- pyclipper
- ```shell
$ pip3.6 install "paddleocr==2.3.0.2"
$ pip3.6 install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple
$ pip3.6 install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple
```
- **This module depends on the third-party libraries shapely and pyclipper; please install them before using it.**
- ### 2. Installation
- ```shell
$ hub install telugu_ocr_db_crnn_mobile
```
- If you run into problems during installation, see: [Windows installation guide](../../../../docs/docs_ch/get_start/windows_quickstart.md)
| [Linux installation guide](../../../../docs/docs_ch/get_start/linux_quickstart.md) | [macOS installation guide](../../../../docs/docs_ch/get_start/mac_quickstart.md)
## III. Module API Prediction
- ### 1. Command-Line Prediction
- ```shell
$ hub run telugu_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE"
$ hub run telugu_ocr_db_crnn_mobile --input_path "/PATH/TO/IMAGE" --det True --rec True --use_angle_cls True --box_thresh 0.7 --angle_classification_thresh 0.8 --visualization True
```
- This invokes the text recognition model from the command line; for more, see [PaddleHub Command-Line Usage](../../../../docs/docs_ch/tutorial/cmd_usage.rst)
- ### 2. Code Example
- ```python
import paddlehub as hub
import cv2
ocr = hub.Module(name="telugu_ocr_db_crnn_mobile", enable_mkldnn=True)       # MKL-DNN acceleration only takes effect on CPU
result = ocr.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
# or
# result = ocr.recognize_text(paths=['/PATH/TO/IMAGE'])
```
- ### 3. API
- ```python
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9)
```
- Constructs a TeluguOCRDBCRNNMobile object
- **Parameters** (see the sketch after this list)
- det(bool): whether to enable text detection. Default is True.
- rec(bool): whether to enable text recognition. Default is True.
- use_angle_cls(bool): whether to enable orientation classification, which uses an angle classifier to recognize text rotated by 180 degrees. Default is False.
- enable_mkldnn(bool): whether to enable MKL-DNN to accelerate CPU computation. Only effective when running on CPU. Default is False.
- use\_gpu (bool): whether to use the GPU; **if you use a GPU, set the CUDA_VISIBLE_DEVICES environment variable first**
- box\_thresh (float): confidence threshold for detected text boxes;
- angle_classification_thresh(float): confidence threshold for text orientation classification
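- A sketch of running the orientation classifier alone; with det and rec disabled, the return fields below switch to orientation/score (the printed output is illustrative):
- ```python
import paddlehub as hub
import cv2

# Orientation classification only: disable detection and recognition.
cls_only = hub.Module(name="telugu_ocr_db_crnn_mobile",
                      det=False, rec=False, use_angle_cls=True)
res = cls_only.recognize_text(images=[cv2.imread('/PATH/TO/IMAGE')])
print(res[0]['data'])   # e.g. [{'orientation': '0', 'score': 0.99}]  (illustrative)
```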
- ```python
def recognize_text(images=[],
paths=[],
output_dir='ocr_result',
visualization=False)
```
- Prediction API: detects the position of all text in the input images and recognizes the text content.
- **Parameters**
- paths (list\[str\]): paths of the images;
- images (list\[numpy.ndarray\]): image data, ndarray.shape \[H, W, C\], in BGR format;
- output\_dir (str): directory for saving result images, defaults to ocr\_result;
- visualization (bool): whether to save the recognition results as image files; only effective when detection is enabled; defaults to False;
- **Returns**
- res (list\[dict\]): list of recognition results, where each element is a dict with the fields:
- data (list\[dict\]): text recognition results, where each element is a dict with the fields:
- text(str): the recognized text
- confidence(float): confidence of the recognized text
- text_box_position(list): pixel coordinates of the text box in the original image, a 4*2 matrix giving the bottom-left, bottom-right, top-right, and top-left vertices in order; data is \[\] if nothing is recognized
- orientation(str): the classified orientation, output only when orientation classification alone is enabled
- score(float): the classification score, output only when orientation classification alone is enabled
- save_path (str, optional): save path of the result image, '' if no image is saved
## IV. Server Deployment
- PaddleHub Serving can deploy an online text recognition service.
- ### Step 1: Start PaddleHub Serving
- Run the start command:
- ```shell
$ hub serving start -m telugu_ocr_db_crnn_mobile
```
- This deploys the text recognition API service, listening on port 8866 by default.
- **NOTE:** To predict with a GPU, set the CUDA\_VISIBLE\_DEVICES environment variable before starting the service; otherwise, no setup is needed.
- ### Step 2: Send a Prediction Request
- With the server configured, a few lines of code are enough to send a prediction request and fetch the result
- ```python
import requests
import json
import cv2
import base64
def cv2_to_base64(image):
    data = cv2.imencode('.jpg', image)[1]
    return base64.b64encode(data.tobytes()).decode('utf8')  # tobytes() replaces the deprecated ndarray.tostring()
# Send the HTTP request
data = {'images':[cv2_to_base64(cv2.imread("/PATH/TO/IMAGE"))]}
headers = {"Content-type": "application/json"}
url = "http://127.0.0.1:8866/predict/telugu_ocr_db_crnn_mobile"
r = requests.post(url=url, headers=headers, data=json.dumps(data))
# Print the prediction results
print(r.json()["results"])
```
import paddlehub as hub
from paddleocr.ppocr.utils.logging import get_logger
from paddleocr.tools.infer.utility import base64_to_cv2
from paddlehub.module.module import moduleinfo, runnable, serving
@moduleinfo(
name="telugu_ocr_db_crnn_mobile",
version="1.0.0",
summary="ocr service",
author="PaddlePaddle",
type="cv/text_recognition")
class TeluguOCRDBCRNNMobile:
def __init__(self,
det=True,
rec=True,
use_angle_cls=False,
enable_mkldnn=False,
use_gpu=False,
box_thresh=0.6,
angle_classification_thresh=0.9):
"""
initialize with the necessary elements
Args:
det(bool): Whether to use text detector.
rec(bool): Whether to use text recognizer.
use_angle_cls(bool): Whether to use text orientation classifier.
enable_mkldnn(bool): Whether to enable mkldnn.
use_gpu (bool): Whether to use gpu.
box_thresh(float): the threshold of the detected text box's confidence
angle_classification_thresh(float): the threshold of the angle classification confidence
"""
self.logger = get_logger()
self.model = hub.Module(
name="multi_languages_ocr_db_crnn",
lang="te",
det=det,
rec=rec,
use_angle_cls=use_angle_cls,
enable_mkldnn=enable_mkldnn,
use_gpu=use_gpu,
box_thresh=box_thresh,
angle_classification_thresh=angle_classification_thresh)
self.model.name = self.name
def recognize_text(self, images=[], paths=[], output_dir='ocr_result', visualization=False):
"""
Get the text in the predicted images.
Args:
images (list(numpy.ndarray)): images data, shape of each is [H, W, C]; pass either images or paths, not both
paths (list[str]): the paths of images; pass either images or paths, not both
output_dir (str): The directory to store output images.
visualization (bool): Whether to save image or not.
Returns:
res (list): The result of text detection box and save path of images.
"""
all_results = self.model.recognize_text(
images=images, paths=paths, output_dir=output_dir, visualization=visualization)
return all_results
@serving
def serving_method(self, images, **kwargs):
"""
Run as a service.
"""
images_decode = [base64_to_cv2(image) for image in images]
results = self.recognize_text(images_decode, **kwargs)
return results
@runnable
def run_cmd(self, argvs):
"""
Run as a command
"""
results = self.model.run_cmd(argvs)
return results
def export_onnx_model(self, dirname: str, input_shape_dict=None, opset_version=10):
'''
Export the model to ONNX format.
Args:
dirname(str): The directory to save the onnx model.
input_shape_dict: dictionary ``{ input_name: input_value }, eg. {'x': [-1, 3, -1, -1]}``
opset_version(int): operator set
'''
self.model.export_onnx_model(dirname=dirname, input_shape_dict=input_shape_dict, opset_version=opset_version)
paddleocr>=2.3.0.2
paddle2onnx>=0.9.0
shapely
pyclipper